9 datasets found
  1. Nekochu_Llama-3.1-8B-German-ORPO-details

    • huggingface.co
    Updated Jul 30, 2025
    Cite
    Open LLM Leaderboard (2025). Nekochu_Llama-3.1-8B-German-ORPO-details [Dataset]. https://huggingface.co/datasets/open-llm-leaderboard/Nekochu_Llama-3.1-8B-German-ORPO-details
    Dataset updated
    Jul 30, 2025
    Dataset authored and provided by
    Open LLM Leaderboard
    Description

    Dataset Card for Evaluation run of Nekochu/Llama-3.1-8B-German-ORPO

    Dataset automatically created during the evaluation run of model Nekochu/Llama-3.1-8B-German-ORPO. The dataset is composed of 38 configuration(s), each one corresponding to one of the evaluated tasks. The dataset has been created from 1 run(s). Each run can be found as a specific split in each configuration, the split being named using the timestamp of the run. The "train" split is always pointing to the latest… See the full description on the dataset page: https://huggingface.co/datasets/open-llm-leaderboard/Nekochu_Llama-3.1-8B-German-ORPO-details.
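
    The configuration-and-split layout described above can be inspected with the Hugging Face datasets library. A minimal sketch, assuming only the repository id from the citation; everything else is generic datasets usage:

    ```python
    # Minimal sketch: list the per-task configurations of this evaluation-run
    # dataset and load the latest run of one of them.
    from datasets import get_dataset_config_names, load_dataset

    repo = "open-llm-leaderboard/Nekochu_Llama-3.1-8B-German-ORPO-details"

    configs = get_dataset_config_names(repo)  # the 38 task configurations
    print(len(configs), configs[:3])

    # The "train" split of each configuration points to the latest run;
    # timestamped splits hold individual runs.
    ds = load_dataset(repo, configs[0], split="train")
    print(ds)
    ```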

  2. wsdm - open models - nbroad

    • kaggle.com
    Updated Jan 21, 2025
    Cite
    Nicholas Broad (2025). wsdm - open models - nbroad [Dataset]. https://www.kaggle.com/datasets/nbroad/wsdm-open-models-nbroad
    Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 21, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Nicholas Broad
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    8.5k + 13k + 5k rows (v1, v2, v3) of multilingual prompts and responses. Prompts are taken from the LMSYS 1M dataset, in the same format as the host-provided dataset. A download sketch follows the count tables below.

    **No winner column**

    v1 Model response counts:

    | Model | Count |
    | --- | --- |
    | mistralai/Mistral-Nemo-Instruct-2407 | 1867 |
    | meta-llama/Meta-Llama-3-8B-Instruct | 1702 |
    | mistralai/Mixtral-8x7B-Instruct-v0.1 | 1506 |
    | mistralai/Mistral-7B-Instruct-v0.3 | 1424 |
    | NousResearch/Hermes-3-Llama-3.1-8B | 1408 |
    | meta-llama/Llama-3.3-70B-Instruct | 1344 |
    | Qwen/Qwen2.5-72B-Instruct | 1322 |
    | 01-ai/Yi-1.5-34B-Chat | 1322 |
    | HuggingFaceH4/starchat2-15b-v0.1 | 1302 |
    | microsoft/Phi-3.5-mini-instruct | 1294 |
    | google/gemma-2-27b-it | 1230 |
    | Qwen/QwQ-32B-Preview | 1117 |

    v1 Language Counts

    | Language | Count |
    | --- | --- |
    | Portuguese | 1079 |
    | Russian | 966 |
    | Chinese | 909 |
    | English | 883 |
    | Spanish | 779 |
    | German | 615 |
    | French | 585 |
    | Italian | 493 |
    | unknown | 383 |
    | Japanese | 319 |
    | Korean | 201 |
    | Polish | 132 |
    | Indonesian | 104 |
    | Arabic | 75 |
    | Vietnamese | 57 |
    | Turkish | 57 |
    | Dutch | 50 |
    | Latin | 40 |
    | Hungarian | 37 |
    | Ukrainian | 36 |
    | Persian | 34 |
    | Danish | 33 |
    | Greek | 33 |
    | Czech | 29 |
    | Swedish | 25 |
    | Romanian | 24 |
    | Galician | 22 |
    | Hebrew | 19 |
    | Serbian | 18 |
    | Scots | 17 |
    | Norwegian | 17 |
    | Bulgarian | 15 |
    | Finnish | 14 |
    | Catalan | 14 |
    | Hawaiian | 13 |
    | Corsican | 13 |
    | Malay | 12 |
    | Slovak | 11 |
    | Thai | 10 |
    | Occitan | 9 |
    | Norwegian Nynorsk | 8 |
    | Afrikaans | 8 |
    | Haitian Creole | 8 |
    | Quechua | 8 |
    | Samoan | 7 |
    | Breton | 7 |
    | Uzbek | 7 |
    | Bangla | 7 |
    | Hausa | 6 |
    | Luxembourgish | 6 |
    | Tsonga | 6 |
    | Esperanto | 6 |
    | Interlingua | 5 |
    | Somali | 5 |
    | Basque | 5 |
    | Aymara | 5 |
    | Tatar | 5 |
    | Nauru | 4 |
    | Tagalog | 4 |
    | Tswana | 4 |
    | Wolof | 4 |
    | Guarani | 4 |
    | Faroese | 4 |
    | Croatian | 4 |
    | Malagasy | 4 |
    | Estonian | 4 |
    | Lithuanian | 3 |
    | Khasi | 3 |
    | Tongan | 3 |
    | Akan | 3 |
    | Manx | 3 |
    | Javanese | 3 |
    | Swahili | 3 |
    | Seselwa Creole French | 3 |
    | Oromo | 3 |
    | Latvian | 3 |
    | Lingala | 2 |
    | Interlingue | 2 |
    | Bosnian | 2 |
    | Yoruba | 2 |
    | Kazakh | 2 |
    | zzp | 2 |
    | Macedonian | 2 |
    | Tajik | 2 |
    | Southern Sotho | 2 |
    | Welsh | 2 |
    | Scottish Gaelic | 2 |
    | Northern Sotho | 2 |
    | Kinyarwanda | 2 |
    | Irish | 2 |
    | Fijian | 2 |
    | Amharic | 2 |
    | Bislama | 2 |
    | Hmong | 2 |
    | Hindi | 2 |
    | Waray | 2 |
    | Volapük | 2 |
    | Marathi | 1 |
    | Sundanese | 1 |
    | Kalaallisut | 1 |
    | Ganda | 1 |
    | Afar | 1 |
    | Rundi | 1 |
    | Sanskrit | 1 |
    | Bashkir | 1 |
    | Cebuano | 1 |
    | Zulu | 1 |
    | Sinhala | 1 |
    | Romansh | 1 |
    | Nepali | 1 |
    | Xhosa | 1 |
    | Tamil | 1 |
    | Māori | 1 |
    | Albanian | 1 |
    | Icelandic | 1 |
    | Slovenian | 1 |
    | xx | 1 |

    v2 Model Counts

    | Model Name | Count |
    | --- | --- |
    | google/gemma-2-9b-it | 1242 |
    | 01-ai/Yi-1.5-34B-Chat | 1229 |
    | microsoft/phi-4 | 1195 |
    | microsoft/Phi-3.5-mini-instruct | 1187 |
    | NousResearch/Hermes-3-Llama-3.1-8B | 1179 |
    | meta-llama/Llama-2-7b-chat-hf | 1179 |
    | mistralai/Mixtral-8x7B-Instruct-v0.1 | 1177 |
    | mistralai/Mistral-Nemo-Instruct-2407 | 1163 |
    | meta-llama/Meta-Llama-3-8B-Instruct | 1158 |
    | meta-llama/Llama-3.1-70B-Instruct | 1146 |
    | meta-llama/Llama-3.3-70B-Instruct | 1142 |
    | microsoft/Phi-3-mini-4k-instruct | 1141 |
    | Qwen/Qwen2.5-0.5B-Instruct | 1138 |
    | google/gemma-2-2b-it | 1133 |
    | google/gemma-1.1-7b-it | 1130 |
    | meta-llama/Llama-3.2-1B-Instruct | 1115 |
    | mistralai/Mistral-7B-Instruct-v0.3 | 1115 |
    | HuggingFaceH4/starchat2-15b-v0.1 | 1112 |
    | meta-llama/Llama-3.2-3B-Instruct | 1097 |
    | HuggingFaceTB/SmolLM2-1.7B-Instruct | 1092 |
    | Qwen/Qwen2.5-72B-Instruct | 1088 |
    | tiiuae/falcon-7b-instruct | 1064 |
    | Qwen/QwQ-32B-Preview | 964 |

    v2 Language Counts

    | Language | Count |
    | --- | --- |
    | English | 2724 |
    | Portuguese | 1482 |
    | Russian | 1410 |
    | Chinese | 1121 |
    | Spanish | 1088 |
    | French | 859 |
    | German | 814 |
    | Italian | 725 |
    | unknown | 502 |
    | Japanese | 378 |
    | Korean | 270 |
    | Polish | 151 |
    | Indonesian | 132 |
    | Arabic | 114 |
    | Vietnamese | 98 |
    | Latin | ... |
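
    For reference, a minimal sketch of fetching this dataset with the kagglehub client. The dataset handle comes from the citation URL; the file names inside the download are not listed here, so the sketch only enumerates them:

    ```python
    # Minimal sketch: download the Kaggle dataset and list its files.
    from pathlib import Path

    import kagglehub

    path = kagglehub.dataset_download("nbroad/wsdm-open-models-nbroad")
    for f in sorted(Path(path).rglob("*")):
        print(f.relative_to(path))
    ```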
  3. dolly-15k_de

    • huggingface.co
    Updated Aug 31, 2000
    Cite
    Mayflower GmbH (2000). dolly-15k_de [Dataset]. https://huggingface.co/datasets/mayflowergmbh/dolly-15k_de
    Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 31, 2000
    Dataset authored and provided by
    Mayflower GmbH
    Description

    A reformatted version of the DRXD1000/Dolly-15k-German dataset. Available for finetuning in hiyouga/LLaMA-Factory.

  4. Aleph-Alpha-GermanWeb

    • huggingface.co
    Updated Apr 25, 2025
    Cite
    Aleph Alpha (2025). Aleph-Alpha-GermanWeb [Dataset]. https://huggingface.co/datasets/Aleph-Alpha/Aleph-Alpha-GermanWeb
    Dataset updated
    Apr 25, 2025
    Dataset authored and provided by
    Aleph Alpha (https://aleph-alpha.com/)
    License

    Other: https://choosealicense.com/licenses/other/

    Description

    Aleph-Alpha-GermanWeb

    Aleph-Alpha-GermanWeb is a new German-language dataset that combines heuristic and model-based filtering techniques with synthetic data generation to achieve SOTA performance in German-language benchmarks. The dataset draws from three sources: (1) Common Crawl web data, (2) FineWeb2, and (3) synthetically-generated data conditioned on actual, organic web data. In our accompanying paper, we evaluated our dataset by training both a 1B Llama-style model and an 8B… See the full description on the dataset page: https://huggingface.co/datasets/Aleph-Alpha/Aleph-Alpha-GermanWeb.
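
    Given the scale of a web corpus like this, streaming is the practical way to sample it. A minimal sketch; "fineweb2" is an assumed configuration name (the dataset draws on FineWeb2 per the description), so list the real configurations first:

    ```python
    # Minimal sketch: stream a few records rather than downloading the corpus.
    from datasets import get_dataset_config_names, load_dataset

    repo = "Aleph-Alpha/Aleph-Alpha-GermanWeb"
    print(get_dataset_config_names(repo))  # check the real configuration names

    ds = load_dataset(repo, "fineweb2", split="train", streaming=True)  # assumed name
    for i, row in enumerate(ds):
        print(row)
        if i == 2:
            break
    ```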

  5. wiki_qa_de

    • huggingface.co
    Updated Jan 23, 2025
    + more versions
    Cite
    Mayflower GmbH (2025). wiki_qa_de [Dataset]. https://huggingface.co/datasets/mayflowergmbh/wiki_qa_de
    Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 23, 2025
    Dataset authored and provided by
    Mayflower GmbH
    Description

    A German translation of the wiki_qa dataset, extracted from seedboxventures/multitask_german_examples_32k. Translation created by seedbox ai for KafkaLM ❤️. Available for finetuning in hiyouga/LLaMA-Factory.

  6. 350M Model

    • figshare.com
    json
    Updated May 23, 2025
    Cite
    Pavel Chizhov (2025). 350M Model [Dataset]. http://doi.org/10.6084/m9.figshare.29135096.v1
    Available download formats: json
    Dataset updated
    May 23, 2025
    Dataset provided by
    figshare
    Authors
    Pavel Chizhov
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    350M Model

    **RAG-350M** is a 350-million-parameter Small Reasoning Model, trained for retrieval-augmented generation (RAG), search and source summarization. Along with RAG-1B it belongs to our family of specialized reasoning models. RAG-350M outperforms most SLMs (4 billion parameters and below) on standardized benchmarks for retrieval-augmented generation (HotPotQA, 2wiki) and is a highly cost-effective alternative to popular larger models, including Qwen-2.5-7B, Llama-3.1-8B and Gemma-3-4B. It is the only SLM to date to maintain consistent RAG performance across leading European languages and to ensure systematic reference grounding for statements. Thanks to its small size, its ease of deployment on constrained infrastructure (including mobile phones) and its built-in support for factual and accurate information, RAG-350M unlocks a range of new use cases for generative AI.

    Features

    RAG-350M is a specialized language model that uses a series of special tokens to process a structured input (query and sources) and generate a structured output (reasoning sequence and answer with sources). For easier implementation, we encourage using the associated API library.

    Citation support

    RAG-350M natively generates grounded answers on the basis of excerpts and citations extracted from the provided sources, using a custom syntax inspired by Wikipedia. It is one of a handful of open-weights models to date developed with this feature, and the first one designed for actual deployment. In contrast with Anthropic's approach (citation mode), citations are generated entirely by the model rather than produced by external chunking. As a result, we can provide another desirable feature that simplifies source checking: citation shortening for longer excerpts (using "(…)").

    RAG reasoning

    RAG-350M generates a specific reasoning sequence incorporating several proto-agentic abilities for RAG applications. The model is able to make a series of decisions directly:

    * Assessing whether the query is understandable.
    * Assessing whether the query is trivial enough not to require a lengthy pre-analysis (adjustable reasoning).
    * Assessing whether the sources contain enough input to generate a grounded answer.

    The structured reasoning trace includes the following steps:

    * Language detection of the query. The model will always strive to answer in the language of the original query.
    * Query analysis and associated query report. The analysis can lead to a standard answer, a shortened reasoning trace/answer for trivial questions, a reformulated query, or a refusal (which could, in the context of the application, be turned into a request for user input).
    * Source analysis and associated source report. This step evaluates the coverage and depth of the provided sources with regard to the query.
    * Draft of the final answer.

    Multilinguality

    RAG-350M is able to read and write in the main European languages: French, German, Italian, Spanish and, to a lesser extent, Polish, Latin and Portuguese. To date, it is the only small language model with negligible loss of performance in leading European languages for RAG-related tasks: on a translated set of HotPotQA we observed a significant performance drop in most SLMs, from 10% up to 30-35% for sub-1B models. We expect the results of any standard English evaluation of our RAG models to be largely transferable to the main European languages, limiting the costs of evaluation and deployment in multilingual settings.

    Training

    RAG-350M is trained on a large synthetic dataset emulating retrieval over a wide variety of multilingual open sources from Common Corpus, with native support for citation and grounding with literal quotes. Following the latest trends in agentification, the model reintegrates multiple features associated with RAG workflows, such as query routing, query reformulation and source reranking.

    Evaluation

    RAG-350M was evaluated on three standard RAG benchmarks: 2wiki, HotpotQA and MuSique. All the benchmarks assess only the "trivial" mode, on questions requiring some form of multi-hop reasoning over sources (the answer is disseminated across different sources) as well as discrimination of distractor sources. RAG-350M is not simply a cost-effective version of larger models: we found it was able to answer correctly several hundred questions from HotPotQA that neither Llama-3-8b nor Qwen-2.5-7b could solve. Consequently, we encourage its use as part of multi-model RAG systems.
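
    As a rough illustration of the structured query-plus-sources input described above, a sketch with Hugging Face transformers. The model identifier and the prompt layout are assumptions; the model's own API library and special-token syntax are the documented path:

    ```python
    # Rough sketch of RAG-style structured prompting with transformers.
    # The model id is a placeholder and the query/source layout is an
    # assumption; consult the associated API library for the real syntax.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "org/RAG-350M"  # hypothetical identifier
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # One query plus one source; the model is meant to answer in the
    # language of the query and cite the source.
    prompt = (
        "Query: Wann wurde die Deutsche Bahn AG gegruendet?\n"
        "Source 1: Die Deutsche Bahn AG entstand 1994 aus der Bahnreform.\n"
        "Answer: "
    )
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=200)
    print(tok.decode(out[0], skip_special_tokens=True))
    ```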

  7. deutsche_bahn_faq_128

    • huggingface.co
    Updated Aug 8, 2024
    + more versions
    Cite
    adesso SE (2024). deutsche_bahn_faq_128 [Dataset]. https://huggingface.co/datasets/islam-hajosman/deutsche_bahn_faq_128
    Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 8, 2024
    Authors
    adesso SE
    Description

    Dataset Name: Deutsche Bahn FAQ in Llama 3 Format

    Dataset Description: This dataset contains 1000 question-answer pairs extracted from the official Deutsche Bahn (German Railways) FAQ section. The data has been specifically formatted to be compatible with the Llama 3 instruct models for supervised fine-tuning (SFT).

    Dataset Purpose: The primary purpose of this dataset is to facilitate the fine-tuning of Llama 3 instruct models for tasks related to customer service and information retrieval in… See the full description on the dataset page: https://huggingface.co/datasets/islam-hajosman/deutsche_bahn_faq_128.
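
    For orientation, a minimal sketch of the Llama 3 instruct chat format that such question-answer pairs are typically serialized into for SFT; the question-answer pair below is invented for illustration:

    ```python
    # Minimal sketch of the Llama 3 instruct serialization; the question and
    # answer are invented examples, not rows from this dataset.
    def to_llama3_instruct(question: str, answer: str) -> str:
        return (
            "<|begin_of_text|>"
            "<|start_header_id|>user<|end_header_id|>\n\n"
            f"{question}<|eot_id|>"
            "<|start_header_id|>assistant<|end_header_id|>\n\n"
            f"{answer}<|eot_id|>"
        )

    print(to_llama3_instruct(
        "Kann ich mein Sparpreis-Ticket stornieren?",
        "Sparpreis-Tickets sind nur eingeschraenkt erstattbar ...",
    ))
    ```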

  8. Data from: LLM-Supported Workflow for Processing Faulty OCR

    • pub.uni-bielefeld.de
    Updated Jul 11, 2025
    Cite
    Christian Wachter; Patrick Jentsch (2025). LLM-Supported Workflow for Processing Faulty OCR [Dataset]. https://pub.uni-bielefeld.de/record/3003406
    Dataset updated
    Jul 11, 2025
    Authors
    Christian Wachter; Patrick Jentsch
    Description

    Notebook based on Sarah Oberbichler's (oberbichler@ieg-mainz.de) notebook "Researching German Historical Newspapers with Llama AI Model" (https://github.com/soberbichler/Notebooks4Historical_Newspapers/blob/main/Llama3_OCR.ipynb), edited by Christian Wachter (christian.wachter@uni-bielefeld.de) and Patrick Jentsch (p.jentsch@uni-bielefeld.de). This notebook shows how LLMs can be used to support research with historical newspapers. In this example, the Llama 3.1 model is used to correct the OCR output of previously OCR'd historical newspaper pages. OCR quality has been a long-standing issue in digitization efforts: historical newspapers are particularly affected due to their complexity, historical fonts and degradation, and OCR technology has long faced limitations when dealing with historical scripts.
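
    The core step of such a workflow can be sketched as a single correction prompt to a chat model. A minimal illustration against an OpenAI-compatible endpoint; the deployment name, client configuration and the faulty OCR line are all assumptions (the notebook itself uses Llama 3.1):

    ```python
    # Minimal sketch of LLM-based OCR post-correction via an OpenAI-compatible
    # endpoint. Model name, endpoint configuration, and the OCR sample are
    # assumptions, not taken from the notebook.
    from openai import OpenAI

    client = OpenAI()  # assumes base_url/API key configured for a Llama 3.1 host

    ocr_text = "Dle Zeitunq berlchtet von elnem groBen Brande."  # invented OCR noise

    resp = client.chat.completions.create(
        model="llama-3.1-8b-instruct",  # hypothetical deployment name
        messages=[
            {
                "role": "system",
                "content": (
                    "Correct the OCR errors in the following historical German "
                    "newspaper text. Keep historical spellings; fix only obvious "
                    "recognition errors."
                ),
            },
            {"role": "user", "content": ocr_text},
        ],
    )
    print(resp.choices[0].message.content)
    ```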


    License: GNU GPLv3

  9. HARD-REASONING-DE

    • huggingface.co
    Updated Jul 16, 2025
    Cite
    Embraceable Technology GmbH (2025). HARD-REASONING-DE [Dataset]. https://huggingface.co/datasets/embraceableAI/HARD-REASONING-DE
    Dataset updated
    Jul 16, 2025
    Dataset authored and provided by
    Embraceable Technology GmbH
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    HARD-REASONING-DE

    The original dataset was obtained from German-RAG LLM-HARD BENCHMARK and was further cleaned, filtered and re-evaluated.

      Methodology: Reasoning-DE
    

    * Providing persona descriptions and rewriting them in a similar style with a different focus area and name, in German/English.
    * Generating simple logical problems out of persona-specific views & language.
    * Generating approaches, thinking steps & solutions, separately verified by Llama-3.1-405B-Instruct.
    * Quality… See the full description on the dataset page: https://huggingface.co/datasets/embraceableAI/HARD-REASONING-DE.

