100+ datasets found
  1. german

    • huggingface.co
    Updated Apr 13, 2023
    Cite
    Mattia (2023). german [Dataset]. https://huggingface.co/datasets/mstz/german
    Explore at:
    Dataset updated
    Apr 13, 2023
    Authors
    Mattia
    License

    https://choosealicense.com/licenses/cc/

    Description

    German

    The German dataset from the UCI ML repository. Dataset on loan grants to customers.

      Configurations and tasks
    

    Configuration | Task | Description
    encoding | - | Encoding dictionary showing original values of encoded features.
    loan | Binary classification | Has the loan request been accepted?

      Usage
    

    from datasets import load_dataset

    dataset = load_dataset("mstz/german", "loan")["train"]

      Features
    

    Feature Type… See the full description on the dataset page: https://huggingface.co/datasets/mstz/german.

  2. NLP for German News Articles

    • kaggle.com
    zip
    Updated Oct 1, 2022
    Cite
    Aman Chauhan (2022). NLP for German News Articles [Dataset]. https://www.kaggle.com/datasets/whenamancodes/nlp-for-10k-german-news-articles
    Explore at:
    Available download formats: zip (128989980 bytes)
    Dataset updated
    Oct 1, 2022
    Authors
    Aman Chauhan
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    :::: Ten Thousand German News Articles Dataset ::::

    A dataset for topic extraction from 10k German news articles and for NLP on the German language. English text classification datasets are common: examples include the large AG News, the class-rich 20 Newsgroups, and the large-scale DBpedia ontology datasets for topic classification, and the commonly used IMDb and Yelp datasets for sentiment analysis. Non-English datasets, especially German ones, are less common. There is a collection of sentiment analysis datasets assembled by the Interest Group on German Sentiment Analysis, and to my knowledge MLDoc contains German documents for classification. Because of grammatical differences between English and German, a classifier might be effective on an English dataset but less effective on a German one. German is more highly inflected, and long compound words are far more common than in English. One would need to evaluate a classifier on multiple German datasets to get a sense of its effectiveness.

    :::: What It Contains ::::

    The 10kGNAD dataset is intended to solve part of this problem as the first German topic classification dataset. It consists of 10273 German-language news articles from an Austrian online newspaper, categorized into nine topics. These articles are a previously unused part of the One Million Posts Corpus. In the One Million Posts Corpus each article has a topic path, for example Newsroom/Wirtschaft/Wirtschaftpolitik/Finanzmaerkte/Griechenlandkrise. The 10kGNAD uses the second part of the topic path, here Wirtschaft, as the class label. The article titles and texts are concatenated into one text, and the authors are removed so that a classifier cannot fall back on keyword-like matching of authors who write frequently in one class. I created and used this dataset in my thesis to train and evaluate four text classifiers on the German language. By publishing the dataset I hope to support the advancement of tools and models for the German language. Additionally, this dataset can be used as a benchmark dataset for German topic classification.
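The labeling rule described above (take the second component of the topic path) can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the description, not the dataset author's actual preprocessing code; the function name `topic_label` is hypothetical.

```python
def topic_label(topic_path: str) -> str:
    """Return the second component of a slash-separated topic path.

    Hypothetical helper: it mirrors the rule stated in the description,
    not the original 10kGNAD extraction script.
    """
    parts = topic_path.split("/")
    if len(parts) < 2:
        raise ValueError(f"topic path has no second component: {topic_path!r}")
    return parts[1]


# The example path quoted in the description.
example = "Newsroom/Wirtschaft/Wirtschaftpolitik/Finanzmaerkte/Griechenlandkrise"
label = topic_label(example)
print(label)
```

For the example path this yields the label Wirtschaft, matching the description.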

    Citations:

    @InProceedings{Schabus2017,
      Author    = {Dietmar Schabus and Marcin Skowron and Martin Trapp},
      Title     = {One Million Posts: A Data Set of German Online Discussions},
      Booktitle = {Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR)},
      Pages     = {1241--1244},
      Year      = {2017},
      Address   = {Tokyo, Japan},
      Doi       = {10.1145/3077136.3080711},
      Month     = aug
    }

    @InProceedings{Schabus2018,
      author    = {Dietmar Schabus and Marcin Skowron},
      title     = {Academic-Industrial Perspective on the Development and Deployment of a Moderation System for a Newspaper Website},
      booktitle = {Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC)},
      year      = {2018},
      address   = {Miyazaki, Japan},
      month     = may,
      pages     = {1602--1605},
      abstract  = {This paper describes an approach and our experiences from the development, deployment and usability testing of a Natural Language Processing (NLP) and Information Retrieval system that supports the moderation of user comments on a large newspaper website. We highlight some of the differences between industry-oriented and academic research settings and their influence on the decisions made in the data collection and annotation processes, selection of document representation and machine learning methods. We report on classification results, where the problems to solve and the data to work with come from a commercial enterprise. In this context typical for NLP research, we discuss relevant industrial aspects. We believe that the challenges faced as well as the solutions proposed for addressing them can provide insights to others working in a similar setting.},
      url       = {http://www.lrec-conf.org/proceedings/lrec2018/summaries/8885.html}
    }


  3. german-ler

    • huggingface.co
    • opendatalab.com
    Updated Nov 2, 2024
    Cite
    Elena Leitner (2024). german-ler [Dataset]. http://doi.org/10.57967/hf/0046
    Explore at:
    Dataset updated
    Nov 2, 2024
    Authors
    Elena Leitner
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for "German LER"

      Dataset Summary
    

    A dataset of Legal Documents from German federal court decisions for Named Entity Recognition. The dataset is human-annotated with 19 fine-grained entity classes. The dataset consists of approx. 67,000 sentences and contains 54,000 annotated entities. NER tags use the BIO tagging scheme. The dataset includes two different versions of annotations, one with a set of 19 fine-grained semantic classes (ner_tags) and another one… See the full description on the dataset page: https://huggingface.co/datasets/elenanereiss/german-ler.
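Since the summary states that the NER tags use the BIO scheme, the following minimal sketch shows how a BIO tag sequence is typically grouped into entity spans. It is a generic illustration, not code from the dataset card; the function name `bio_to_spans` is hypothetical, and the labels PER and ORG are placeholders rather than a claim about the dataset's exact tag set.

```python
def bio_to_spans(tags):
    """Group BIO tags into (label, start, end) spans; end is exclusive.

    Lenient by design: an I- tag that does not continue the current
    entity starts a new span instead of raising an error.
    """
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # An I- tag with the same label continues the open entity.
        if tag.startswith("I-") and label == tag[2:]:
            continue
        # Anything else closes the open entity, if there is one.
        if label is not None:
            spans.append((label, start, i))
            start, label = None, None
        # B- (or a stray I-) opens a new entity; "O" opens nothing.
        if tag.startswith("B-") or tag.startswith("I-"):
            start, label = i, tag[2:]
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


# Placeholder tag sequence for a five-token sentence.
tags = ["O", "B-PER", "I-PER", "O", "B-ORG"]
spans = bio_to_spans(tags)
print(spans)
```

The exclusive end index makes each span usable directly as a Python slice over the token list.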

  4. german-credit

    • kaggle.com
    zip
    Updated Oct 15, 2023
    Cite
    Kikoy (2023). german-credit [Dataset]. https://www.kaggle.com/datasets/rizkia14/german-credit
    Explore at:
    Available download formats: zip (17802 bytes)
    Dataset updated
    Oct 15, 2023
    Authors
    Kikoy
    Description

    Dataset

    This dataset was created by Kikoy

    Released under Other (specified in description)

    Contents

  5. German Dataset

    • hmn.shaip.com
    Updated Aug 6, 2024
    Cite
    Shaip (2024). German Dataset [Dataset]. https://hmn.shaip.com/offerings/speech-data-catalog/german-dataset/
    Explore at:
    Dataset updated
    Aug 6, 2024
    Dataset authored and provided by
    Shaip
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    German Dataset (Deutscher Datensatz): High-Quality German Call Center and IVR Dataset for AI & Speech Models
    Title (Language): German Language Dataset
    Dataset Types: Call Center, General Conversation, Music, Scripted Monologue
    Country: Germany
    Description: Unscripted conversations...

  6. German Credit Scoring Data

    • kaggle.com
    zip
    Updated Jan 17, 2024
    Cite
    Elshan Kazim (2024). German Credit Scoring Data [Dataset]. https://www.kaggle.com/datasets/elsnkazm/german-credit-scoring-data
    Explore at:
    Available download formats: zip (17947 bytes)
    Dataset updated
    Jan 17, 2024
    Authors
    Elshan Kazim
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Context

    This dataset classifies people described by a set of attributes as good or bad credit risks. Link to the original dataset: German Credit Data

    Dataset Characteristics | # Instances | # Features
    Multivariate | 1000 | 20

    Because the original dataset encodes its categorical features as opaque codes, which makes it hard to read, we have mapped those codes to descriptive labels.

    Content

    Features and explanations

    1. checking_acc_status (categorical) - Status of existing checking account
      • below_0: ... < 0 DM
      • below_200: 0 <= ... < 200 DM
      • above_200: ... >= 200 DM / salary assignments for at least 1 year
      • no_checking_acc: no checking account
    2. duration (numeric) - Agreed Loan Duration in months
    3. cred_hist (categorical) - Credit history status
      • no_loan_or_paid_duly_other: no credits taken/ all credits paid back duly
      • paid_duly_this_bank: all credits at this bank paid back duly
      • curr_loans_paid_duly: existing credits paid back duly till now
      • delay_in_past: delay in paying off in the past
      • risky_acc_or_curr_loan_other: critical account/ other credits existing (not at this bank)
    4. purpose (categorical) - Loan Request Purpose
      • car_new: car (new)
      • car_used: car (used)
      • furniture_equipment: furniture/equipment
      • radio_tv: radio/television
      • domestic_appliance: domestic appliances
      • repairs: repairs
      • education: education
      • retraining: retraining
      • business: business
      • others: others
    5. loan_amt (numerical) - Credit amount
    6. saving_acc_bonds (categorical) - Savings account/bonds
      • below_100: ... < 100 DM
      • below_500: 100 <= ... < 500 DM
      • below_1000: 500 <= ... < 1000 DM
      • above_1000: ... >= 1000 DM
      • unknown_no_saving_acc: unknown/ no savings account
    7. present_employment_since (categorical) - Present employment since
      • unemployed: unemployed
      • below_1y: ... < 1 year
      • below_4y: 1 <= ... < 4 years
      • below_7y: 4 <= ... < 7 years
      • above_7y: ... >= 7 years
    8. installment_rate (numerical) - Installment rate in percentage of disposable income
    9. personal_stat_gender (categorical) - Personal status and sex
      • male_divorced_separated
      • female_divorced_separated_married
      • male_single
      • male_married_widowed
      • female_single
    10. other_debtors_guarantors (categorical: co-applicant, guarantor, none)
    11. present_residence_since (numerical)
    12. property (categorical)
      • real_estate
      • life_insurance_or_agreements: if not real_estate: building society savings agreement/ life insurance
      • car_or_other: if not others: car or other, not in attribute 6
      • unknown_or_no_property: unknown / no property
    13. age (numerical)
    14. other_installment_plans (categorical: bank, stores, none)
    15. housing (categorical: rent, own, for_free)
    16. num_curr_loans - Number of existing credits at this bank
    17. job (categorical)
      • unemployed_non_resident: unemployed/ unskilled - non-resident
      • unskilled_resident: unskilled - resident
      • skilled_official: skilled employee / official
      • management_or_self_emp: management/ self-employed/highly qualified employee/ officer
    18. num_people_provide_maint (numerical) - Number of people being liable to provide maintenance for
    19. telephone (categorical)
    20. is_foreign_worker (categorical) - Indicates whether the individual is a foreign worker
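As a small illustration of how the binned labels in the feature list relate to raw values, the sketch below maps a checking-account balance in DM to the `checking_acc_status` categories listed above. It is an assumption-laden sketch rather than the dataset author's code: the function name is hypothetical, `None` is assumed to stand for "no checking account", and the alternative "salary assignments for at least 1 year" condition for `above_200` is not modeled.

```python
def checking_acc_status(balance_dm):
    """Map a checking-account balance in DM to the category labels above.

    Assumptions (not from the dataset): None means the applicant has no
    checking account, and only the balance thresholds are considered.
    """
    if balance_dm is None:
        return "no_checking_acc"
    if balance_dm < 0:
        return "below_0"        # ... < 0 DM
    if balance_dm < 200:
        return "below_200"      # 0 <= ... < 200 DM
    return "above_200"          # ... >= 200 DM


for value in (-50, 0, 199.99, 200, None):
    print(value, "->", checking_acc_status(value))
```

The same pattern applies to the other threshold-based features, such as `saving_acc_bonds` and `present_employment_since`.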
  7. German Dataset

    • ny.shaip.com
    Cite
    Shaip, German Dataset [Dataset]. https://ny.shaip.com/offerings/speech-data-catalog/german-dataset/
    Explore at:
    Dataset authored and provided by
    Shaip
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    German Dataset (Deutscher Datensatz): High-Quality German Call Center and IVR Dataset for AI & Speech Models
    Title (Language): German Language Dataset
    Dataset Types: Call Center, General Conversation, Music, Scripted Monologue
    Country: Germany
    Description: Unscripted, synthetic telephonic conversations...

  8. German Traffic Sign Recognition Dataset

    • universe.roboflow.com
    zip
    Updated Aug 4, 2023
    + more versions
    Cite
    TrafficSignRecognition (2023). German Traffic Sign Recognition Dataset [Dataset]. https://universe.roboflow.com/trafficsignrecognition/german-traffic-sign-recognition
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 4, 2023
    Dataset authored and provided by
    TrafficSignRecognition
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Traffic Signs Bounding Boxes
    Description

    German Traffic Sign Recognition

    ## Overview
    
    German Traffic Sign Recognition is a dataset for object detection tasks - it contains Traffic Signs annotations for 1,102 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  9. german-speech-recognition-dataset

    • huggingface.co
    Updated Mar 7, 2025
    + more versions
    Cite
    Unidata (2025). german-speech-recognition-dataset [Dataset]. https://huggingface.co/datasets/UniDataPro/german-speech-recognition-dataset
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 7, 2025
    Authors
    Unidata
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    German Speech Dataset for recognition task

    Dataset comprises 431 hours of telephone dialogues in German, collected from 590+ native speakers across various topics and domains, achieving an impressive 95% sentence accuracy rate. It is designed for research in automatic speech recognition (ASR) systems. By utilizing this dataset, researchers and developers can advance their understanding and capabilities in transcribing audio, and natural language processing (NLP). - Get the data… See the full description on the dataset page: https://huggingface.co/datasets/UniDataPro/german-speech-recognition-dataset.

  10. WMT 2014 English-to-German Dataset

    • resodate.org
    • service.tib.eu
    Updated Nov 25, 2024
    Cite
    Karim Ahmed; Nitish Shirish Keskar; Richard Socher (2024). WMT 2014 English-to-German Dataset [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9zZXJ2aWNlLnRpYi5ldS9sZG1zZXJ2aWNlL2RhdGFzZXQvd210LTIwMTQtZW5nbGlzaC10by1nZXJtYW4tZGF0YXNldA==
    Explore at:
    Dataset updated
    Nov 25, 2024
    Dataset provided by
    Leibniz Data Manager
    Authors
    Karim Ahmed; Nitish Shirish Keskar; Richard Socher
    Description

    The WMT 2014 English-to-German dataset consists of 4.5 million sentence pairs used for neural machine translation.

  11. Traffic German Dataset

    • universe.roboflow.com
    zip
    Updated Apr 29, 2024
    Cite
    Object detection (2024). Traffic German Dataset [Dataset]. https://universe.roboflow.com/object-detection-7sfqy/traffic-german
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 29, 2024
    Dataset authored and provided by
    Object detection
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Football Player Detection Bounding Boxes
    Description

    Traffic German

    ## Overview
    
    Traffic German is a dataset for object detection tasks - it contains Football Player Detection annotations for 6,523 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  12. UK German Dataset

    • shaip.com
    Updated Sep 26, 2025
    Cite
    Shaip (2025). UK German Dataset [Dataset]. https://www.shaip.com/offerings/speech-data-catalog/uk-german-dataset/
    Explore at:
    Dataset updated
    Sep 26, 2025
    Dataset authored and provided by
    Shaip
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    United Kingdom
    Description

    UK German Dataset: High-Quality UK German Call Center and Utterance Dataset for AI & Speech Models
    Title (Language): UK German Language Dataset
    Dataset Types: Call Center, Utterance
    Country: United Kingdom
    Description: This dataset includes unscripted…

  13. German Railway Signaling Dataset

    • universe.roboflow.com
    zip
    Updated Jan 12, 2024
    Cite
    Bruno Guiomar (2024). German Railway Signaling Dataset [Dataset]. https://universe.roboflow.com/bruno-guiomar/german-railway-signaling-dataset
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 12, 2024
    Dataset authored and provided by
    Bruno Guiomar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Germany
    Variables measured
    Signals Bounding Boxes
    Description

    German Railway Signaling Dataset

    ## Overview
    
    German Railway Signaling Dataset is a dataset for object detection tasks - it contains Signals annotations for 1,605 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  14. Replication Data for: German Credit

    • dataverse.harvard.edu
    Updated Apr 6, 2016
    Cite
    Christopher Bartley (2016). Replication Data for: German Credit [Dataset]. http://doi.org/10.7910/DVN/Q8MAW8
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 6, 2016
    Dataset provided by
    Harvard Dataverse
    Authors
    Christopher Bartley
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Original data from: https://archive.ics.uci.edu/ml/datasets/Statlog+(German+Credit+Data), using the file "german.data-numeric" version produced by Strathclyde University.

    Changes made:
    - Changed the ordering of Attribute 3 (Credit History) to try to extract a monotone relationship.
      ORIGINAL ORDERING: A30: no credits taken/all credits paid back duly, A31: all credits at this bank paid back duly, A32: existing credits paid back duly till now, A33: delay in paying off in the past, A34: critical account/other credits existing (not at this bank)
      NEW ORDERING: 0 = all credits paid back (A31), 1 = all credits paid back duly till now (A32), 2 = no credits taken (A30), 3 = delay in past (A33), 4 = critical acct (A34)

    ATTRIBUTES:
    0 CLASS Credit Rating: +1 is bad / -1 is good
    1 BalanceCheque
    2 Loan NumMonth
    3 CreditHistory
    4 CreditAmt
    5 SavingsBalance
    6 Mths in PresentEmployment
    7 PersonStatusSex
    8 PresentResidenceSince
    9 Property
    10 AgeInYears
    11 OtherInstallmentPlans (highest val is NO other installment plans)
    12 NumExistingCreditsThisBank
    13 NumPplLiablMaint
    14 Telephone
    15 ForeignWorker
    16 Purpose-CarNew
    17 Purpose-CarOld
    18 otherdebtor-none (compared to guarantor)
    19 otherdebt-coappl (compared to guarantor)
    20 house-rent (compared to 'for free')
    21 house-owns (compared to 'for free')
    22 job-unemployed (vs mgt)
    23 jobs-unskilled (vs mgt)
    24 job-skilled (vs mgt)
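The reordering of the Credit History attribute described above amounts to a lookup from the original UCI codes to the new ordinal values. The sketch below writes that lookup out as a dictionary. It is an illustration of the stated mapping only: the names `CREDIT_HISTORY_ORDER` and `recode_credit_history` are hypothetical, and this is not the replication author's code.

```python
# New ordinal position for each original Credit History code,
# as listed under "NEW ORDERING" in the description.
CREDIT_HISTORY_ORDER = {
    "A31": 0,  # all credits at this bank paid back duly
    "A32": 1,  # existing credits paid back duly till now
    "A30": 2,  # no credits taken / all credits paid back duly
    "A33": 3,  # delay in paying off in the past
    "A34": 4,  # critical account / other credits existing
}


def recode_credit_history(codes):
    """Translate original codes (e.g. 'A30') to the reordered integers."""
    return [CREDIT_HISTORY_ORDER[code] for code in codes]


recoded = recode_credit_history(["A30", "A31", "A34"])
print(recoded)
```

An unknown code raises `KeyError`, which is usually preferable to silently assigning a default rank.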

  15. German Dataset

    • universe.roboflow.com
    zip
    Updated Jun 15, 2023
    Cite
    faculty of engineering minia university (2023). German Dataset [Dataset]. https://universe.roboflow.com/faculty-of-engineering-minia-university/german-7b6mo/dataset/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 15, 2023
    Dataset authored and provided by
    faculty of engineering minia university
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Sign Bounding Boxes
    Description

    German

    ## Overview
    
    German is a dataset for object detection tasks - it contains Sign annotations for 898 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  16. Ten Thousand German News Articles Dataset

    • tblock.github.io
    • kaggle.com
    csv
    Updated Mar 5, 2019
    Cite
    T. Block (2019). Ten Thousand German News Articles Dataset [Dataset]. https://tblock.github.io/10kGNAD/
    Explore at:
    Available download formats: csv
    Dataset updated
    Mar 5, 2019
    Authors
    T. Block
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    10kGNAD - A german topic classification dataset. Visit the dataset page for more information: https://tblock.github.io/10kGNAD/

  17. German Open Ended Question Answer Text Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). German Open Ended Question Answer Text Dataset [Dataset]. https://www.futurebeeai.com/dataset/prompt-response-dataset/german-open-ended-question-answer-text-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    The German Open-Ended Question Answering Dataset is a meticulously curated collection of comprehensive Question-Answer pairs. It serves as a valuable resource for training Large Language Models (LLMs) and Question-answering models in the German language, advancing the field of artificial intelligence.

    Dataset Content:

    This QA dataset comprises a diverse set of open-ended questions paired with corresponding answers in German. There is no context paragraph given to choose an answer from, and each question is answered without any predefined context content. The questions cover a broad range of topics, including science, history, technology, geography, literature, current affairs, and more.

    Each question is accompanied by an answer, providing valuable information and insights to enhance the language model training process. Both the questions and answers were manually curated by native German people, and references were taken from diverse sources like books, news articles, websites, and other reliable references.

    This question-answer prompt completion dataset contains different types of prompts, including instruction type, continuation type, and in-context learning (zero-shot, few-shot) type. The dataset also contains questions and answers with different types of rich text, including tables, code, JSON, etc., with proper markdown.

    Question Diversity:

    To ensure diversity, this Q&A dataset includes questions with varying complexity levels, ranging from easy to medium and hard. Different types of questions, such as multiple-choice, direct, and true/false, are included. Additionally, questions are further classified into fact-based and opinion-based categories, creating a comprehensive variety. The QA dataset also contains questions with constraints and persona restrictions, which makes it even more useful for LLM training.

    Answer Formats:

    To accommodate varied learning experiences, the dataset incorporates different types of answer formats. These formats include single-word, short phrases, single sentences, and paragraph types of answers. The answer contains text strings, numerical values, date and time formats as well. Such diversity strengthens the Language model's ability to generate coherent and contextually appropriate answers.

    Data Format and Annotation Details:

    This fully labeled German Open Ended Question Answer Dataset is available in JSON and CSV formats. It includes annotation details such as id, language, domain, question_length, prompt_type, question_category, question_type, complexity, answer_type, rich_text.

    Quality and Accuracy:

    The dataset upholds the highest standards of quality and accuracy. Each question undergoes careful validation, and the corresponding answers are thoroughly verified. To prioritize inclusivity, the dataset incorporates questions and answers representing diverse perspectives and writing styles, ensuring it remains unbiased and avoids perpetuating discrimination.

    Both the questions and answers in German are grammatically accurate, without spelling or grammatical errors. No copyrighted, toxic, or harmful content was used in building this dataset.

    Continuous Updates and Customization:

    The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Continuous efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to collect custom question-answer data tailored to specific needs, providing flexibility and customization options.

    License:

    The dataset, created by FutureBeeAI, is now ready for commercial use. Researchers, data scientists, and developers can utilize this fully labeled and ready-to-deploy German Open Ended Question Answer Dataset to enhance the language understanding capabilities of their generative ai models, improve response generation, and explore new approaches to NLP question-answering tasks.

  18. Wake Word German Dataset

    • hmn.shaip.com
    Updated Aug 9, 2024
    Cite
    Shaip (2024). Wake Word German Dataset [Dataset]. https://hmn.shaip.com/offerings/speech-data-catalog/wake-word-german-dataset/
    Explore at:
    Dataset updated
    Aug 9, 2024
    Dataset authored and provided by
    Shaip
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    German Wake Word Dataset: High-Quality German Wake Word Dataset for AI & Speech Models
    Title (Language): German Language Dataset
    Dataset Types: Wake Word
    Country: Germany
    Description: Wake Words / Voice Command / Trigger Word /…

  19. Ratio of national debt to GDP in Germany 1991-2030

    • statista.com
    Updated Feb 24, 2025
    + more versions
    Cite
    Aaron O'Neill (2025). Ratio of national debt to GDP in Germany 1991-2030 [Dataset]. https://www.statista.com/topics/13131/german-election-2025/
    Explore at:
    Dataset updated
    Feb 24, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Aaron O'Neill
    Area covered
    Germany
    Description

    The ratio of national debt to gross domestic product (GDP) of Germany was about 63.89 percent in 2024. Between 1991 and 2024, the ratio rose by approximately 24.40 percentage points, though the increase followed an uneven trajectory rather than a consistent upward trend. The ratio will steadily rise by around 10.96 percentage points over the period from 2024 to 2030, reflecting a clear upward trend. The general government gross debt consists of all liabilities that require payment or payments of interest and/or principal by the debtor to the creditor at a date or dates in the future. Here it is depicted in relation to the country's GDP, which refers to the total value of goods and services produced during a year.
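The endpoints implied by these figures can be worked out with simple arithmetic. The sketch below derives the approximate 1991 and 2030 ratios from the three numbers quoted in the description. The derived values are not stated in the source; they follow only from treating the quoted "about" and "approximately" figures as exact, so they inherit that imprecision.

```python
# Figures quoted in the description (percent of GDP and percentage points).
ratio_2024 = 63.89
rise_1991_to_2024 = 24.40
rise_2024_to_2030 = 10.96

# Implied endpoints, rounded to two decimals like the inputs.
ratio_1991 = round(ratio_2024 - rise_1991_to_2024, 2)
ratio_2030 = round(ratio_2024 + rise_2024_to_2030, 2)

print(f"implied 1991 ratio: {ratio_1991:.2f} percent")
print(f"implied 2030 ratio: {ratio_2030:.2f} percent")
```

Under these assumptions the implied ratio is roughly 39.49 percent for 1991 and roughly 74.85 percent for 2030.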

  20. german-commons

    • huggingface.co
    Updated Aug 23, 2018
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    CORAL NLP Research (2018). german-commons [Dataset]. https://huggingface.co/datasets/coral-nlp/german-commons
    Explore at:
    Dataset updated
    Aug 23, 2018
    Dataset authored and provided by
    CORAL NLP Research
    License

    https://choosealicense.com/licenses/odc-by/

    Description

    German Commons - 154 Billion Tokens of Openly Licensed Text for German Language Models

    A comprehensive collection of German-language text data under open licenses for training German language models.

    Datasheet: DATASHEET.md. Paper: arxiv.org/abs/2510.13996 Code: github.com/coral-nlp/llmdata Bloom Filter (DOLMA-compatible): bloom_filter.bin

      Dataset Description
    

    This dataset is aggregated from 41 diverse sources and contains 154.56 billion tokens of German text data with… See the full description on the dataset page: https://huggingface.co/datasets/coral-nlp/german-commons.
