100+ datasets found
  1. german

    • huggingface.co
    Updated Apr 13, 2023
    Cite
    Mattia (2023). german [Dataset]. https://huggingface.co/datasets/mstz/german
    Dataset updated
    Apr 13, 2023
    Authors
    Mattia
    License

    https://choosealicense.com/licenses/cc/

    Description

    German

    The German Credit dataset from the UCI ML repository, containing data on customers' loan requests.

      Configurations and tasks
    

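    Configuration | Task                  | Description
    ------------- | --------------------- | ----------------------------------------------------------------
    encoding      |                       | Encoding dictionary showing original values of encoded features.
    loan          | Binary classification | Has the loan request been accepted?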

      Usage
    

    from datasets import load_dataset

    # Load the training split of the "loan" configuration (binary classification)
    dataset = load_dataset("mstz/german", "loan")["train"]

      Features
    

    Feature Type… See the full description on the dataset page: https://huggingface.co/datasets/mstz/german.

  2. Processed German Credit Data

    • zenodo.org
    zip
    Updated Mar 6, 2024
    Cite
    Pablo Sanchez Martin (2024). Processed German Credit Data [Dataset]. http://doi.org/10.5281/zenodo.10785677
    Available download formats: zip
    Dataset updated
    Mar 6, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Pablo Sanchez Martin
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The zip file contains two folders related to the German Credit Data (https://archive.ics.uci.edu/dataset/144/statlog+german+credit+data). The `german_credit` folder contains the CSV file of the dataset, and the `german_data` folder contains five different folds of the dataset.

  3. german-ler

    • huggingface.co
    • opendatalab.com
    Updated Nov 2, 2024
    Cite
    Elena Leitner (2024). german-ler [Dataset]. http://doi.org/10.57967/hf/0046
    Dataset updated
    Nov 2, 2024
    Authors
    Elena Leitner
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for "German LER"

      Dataset Summary
    

    A dataset of legal documents from German federal court decisions for Named Entity Recognition (NER). The dataset is human-annotated with 19 fine-grained entity classes. It consists of approximately 67,000 sentences and contains 54,000 annotated entities. NER tags use the BIO tagging scheme. The dataset includes two different versions of annotations, one with a set of 19 fine-grained semantic classes (ner_tags) and another one… See the full description on the dataset page: https://huggingface.co/datasets/elenanereiss/german-ler.
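    In the BIO scheme, `B-X` marks the first token of an entity of type X, `I-X` marks a following token of the same entity, and `O` marks a token outside any entity. The following is a minimal sketch of grouping BIO tags back into entity spans. It assumes the tags are available as strings. The Hugging Face version may store them as integer class IDs, which would first need to be mapped to names. The tag names in the example, such as `GRT` for a court, are for illustration only. Treating a stray `I-` tag as the start of a new entity is a lenient design choice made for this sketch.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, text) spans."""
    spans: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the currently open entity, if any, and record it.
        nonlocal current_type, current_tokens
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity.
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag continues the open entity of the same type.
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open entity.
            flush()
            if tag.startswith("I-"):
                # Lenient handling: treat a stray I- tag as a new entity.
                current_type, current_tokens = tag[2:], [token]
    flush()
    return spans


tokens = ["Das", "Oberlandesgericht", "München", "entschied"]
tags = ["O", "B-GRT", "I-GRT", "O"]
print(bio_to_spans(tokens, tags))  # [('GRT', 'Oberlandesgericht München')]
```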

  4. German Dataset

    • universe.roboflow.com
    zip
    Updated Dec 25, 2024
    Cite
    bbbbbbbj (2024). German Dataset [Dataset]. https://universe.roboflow.com/bbbbbbbj/german-blymn
    Available download formats: zip
    Dataset updated
    Dec 25, 2024
    Dataset authored and provided by
    bbbbbbbj
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Traffic Bounding Boxes
    Description

    German

    ## Overview
    
    German is a dataset for object detection tasks - it contains Traffic annotations for 2,593 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  5. German Dataset

    • hmn.shaip.com
    Updated Aug 6, 2024
    Cite
    Shaip (2024). German Dataset [Dataset]. https://hmn.shaip.com/offerings/speech-data-catalog/german-dataset/
    Dataset updated
    Aug 6, 2024
    Dataset authored and provided by
    Shaip
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    German Dataset (Deutscher Datensatz): High-Quality German Call-Center and IVR Dataset for AI & Speech Models. Overview: Title (Language): German Language Dataset. Dataset Types: Call Center, General Conversation, Music, Scripted Monologue. Country: Germany. Description: Unscripted conversations...

  6. Wake Word German Dataset

    • shaip.com
    Updated Oct 17, 2023
    Cite
    Shaip (2023). Wake Word German Dataset [Dataset]. https://www.shaip.com/offerings/speech-data-catalog/wake-word-german-dataset/
    Dataset updated
    Oct 17, 2023
    Dataset authored and provided by
    Shaip
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    German Wake Word Dataset: High-Quality German Wake Word Dataset for AI & Speech Models. Overview: Title (Language): German Language Dataset. Dataset Types: Wake Word. Country: Germany. Description: Wake Words / Voice Command / Trigger Word /…

  7. Ten Thousand German News Articles Dataset

    • kaggle.com
    • tblock.github.io
    zip
    Updated Jan 20, 2022
    Cite
    Timo Block (2022). Ten Thousand German News Articles Dataset [Dataset]. https://www.kaggle.com/tblock/10kgnad
    Available download formats: zip (21,144,764 bytes)
    Dataset updated
    Jan 20, 2022
    Authors
    Timo Block
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    (see https://tblock.github.io/10kGNAD/ for the original dataset page)

    This page introduces the 10k German News Articles Dataset (10kGNAD), a German topic classification dataset. The 10kGNAD is based on the One Million Posts Corpus and is available under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. You can download the dataset here.

    Why a German dataset?

    English text classification datasets are common. Examples include the large AG News, the class-rich 20 Newsgroups, and the large-scale DBpedia ontology datasets for topic classification, as well as the commonly used IMDb and Yelp datasets for sentiment analysis. Non-English datasets, especially German datasets, are less common. The Interest Group on German Sentiment Analysis has assembled a collection of sentiment analysis datasets. However, to my knowledge, no German topic classification dataset is available to the public.

    Due to grammatical differences between English and German, a classifier might be effective on an English dataset but less effective on a German one. German is more highly inflected than English, and long compound words are much more common. One would need to evaluate a classifier on multiple German datasets to get a sense of its effectiveness.

    The dataset

    The 10kGNAD dataset is intended to solve part of this problem as the first German topic classification dataset. It consists of 10,273 German-language news articles from an Austrian online newspaper, categorized into nine topics. These articles are a previously unused part of the One Million Posts Corpus.

    In the One Million Posts Corpus, each article has a topic path, for example Newsroom/Wirtschaft/Wirtschaftpolitik/Finanzmaerkte/Griechenlandkrise. The 10kGNAD uses the second part of the topic path, here Wirtschaft, as the class label. As a result, the dataset can be used for multi-class classification.
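    The following is a minimal sketch of the labeling rule described above: take the second "/"-separated component of a topic path. It assumes paths are plain strings shaped like the example. It is not the author's extraction script from the project's code directory, and the function name is hypothetical.

```python
def label_from_topic_path(path: str) -> str:
    """Return the second component of a topic path, used as the 10kGNAD class label."""
    parts = [p for p in path.split("/") if p]  # ignore empty parts from stray slashes
    if len(parts) < 2:
        raise ValueError(f"topic path has fewer than two components: {path!r}")
    return parts[1]


print(label_from_topic_path(
    "Newsroom/Wirtschaft/Wirtschaftpolitik/Finanzmaerkte/Griechenlandkrise"
))  # Wirtschaft
```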

    I created and used this dataset in my thesis to train and evaluate four text classifiers on the German language. By publishing the dataset, I hope to support the advancement of tools and models for the German language. Additionally, this dataset can be used as a benchmark dataset for German topic classification.

    Numbers and statistics

    As in most real-world datasets, the class distribution of the 10kGNAD is not balanced. The biggest class, Web, consists of 1,678 articles, while the smallest class, Kultur, contains only 539 articles. However, articles from the Web class have the fewest words on average, while articles from the Kultur (culture) class have the second most.

    Splitting into train and test

    I propose a stratified split of 10% for testing and the remaining articles for training. To use the dataset as a benchmark dataset, please use the train.csv and test.csv files located in the project root.
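    A stratified split holds out roughly the same fraction of every class, so rare classes such as Kultur stay represented in the test set. For benchmarking, the provided train.csv and test.csv should be used. The following is only a minimal, pure-Python sketch of what a stratified 10% split means, and the function name is hypothetical. scikit-learn's `train_test_split(..., stratify=labels)` is a common off-the-shelf alternative.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(labels: Sequence[str], test_fraction: float = 0.1,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices), holding out ~test_fraction of each class."""
    by_class: Dict[str, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed for a reproducible split
    train: List[int] = []
    test: List[int] = []
    for label in sorted(by_class):  # deterministic class order
        indices = by_class[label][:]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


labels = ["Web"] * 20 + ["Kultur"] * 10
train_idx, test_idx = stratified_split(labels, test_fraction=0.1, seed=1)
print(len(train_idx), len(test_idx))  # 27 3
```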

    Code

    Python scripts to extract the articles and split them into a training set and a test set are available in the code directory of this project. Make sure to install the requirements. The original corpus.sqlite3 is required to extract the articles (download here (compressed) or here (uncompressed)).

    License

    Creative Commons License

    This dataset is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Please consider citing the authors of the One Million Posts Corpus if you use the dataset.

  8. German Open Ended Question Answer Text Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). German Open Ended Question Answer Text Dataset [Dataset]. https://www.futurebeeai.com/dataset/prompt-response-dataset/german-open-ended-question-answer-text-dataset
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    The German Open-Ended Question Answering Dataset is a meticulously curated collection of comprehensive Question-Answer pairs. It serves as a valuable resource for training Large Language Models (LLMs) and Question-answering models in the German language, advancing the field of artificial intelligence.

    Dataset Content:

    This QA dataset comprises a diverse set of open-ended questions paired with corresponding answers in German. There is no context paragraph given to choose an answer from, and each question is answered without any predefined context content. The questions cover a broad range of topics, including science, history, technology, geography, literature, current affairs, and more.

    Each question is accompanied by an answer, providing valuable information and insights to enhance the language model training process. Both the questions and answers were manually curated by native German speakers, and references were taken from diverse sources like books, news articles, websites, and other reliable references.

    This question-answer prompt completion dataset contains different types of prompts, including instruction type, continuation type, and in-context learning (zero-shot, few-shot) type. The dataset also contains questions and answers with different types of rich text, including tables, code, JSON, etc., with proper markdown.

    Question Diversity:

    To ensure diversity, this Q&A dataset includes questions with varying complexity levels, ranging from easy to medium and hard. Different types of questions, such as multiple-choice, direct, and true/false, are included. Additionally, questions are further classified into fact-based and opinion-based categories, creating a comprehensive variety. The QA dataset also contains questions with constraints and persona restrictions, which makes it even more useful for LLM training.

    Answer Formats:

    To accommodate varied learning experiences, the dataset incorporates different types of answer formats. These formats include single-word, short-phrase, single-sentence, and paragraph answers. Answers contain text strings, numerical values, and date and time formats. Such diversity strengthens the language model's ability to generate coherent and contextually appropriate answers.

    Data Format and Annotation Details:

    This fully labeled German Open Ended Question Answer Dataset is available in JSON and CSV formats. It includes annotation details such as id, language, domain, question_length, prompt_type, question_category, question_type, complexity, answer_type, rich_text.
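    The annotation fields listed above make it possible to select subsets, for example only hard questions. The following is a minimal sketch that assumes the JSON export is a list of objects keyed by those field names. The example records and their values are invented for illustration, and `filter_records` is a hypothetical helper, not part of any FutureBeeAI tooling.

```python
import json
from typing import Dict, List

# Hypothetical records: the field names follow the dataset description,
# but the values are invented for illustration.
raw = """[
  {"id": "q1", "language": "German", "domain": "history", "complexity": "hard",
   "question_type": "direct", "answer_type": "paragraph"},
  {"id": "q2", "language": "German", "domain": "science", "complexity": "easy",
   "question_type": "true/false", "answer_type": "single-word"}
]"""


def filter_records(records: List[Dict], **criteria: str) -> List[Dict]:
    """Keep records whose fields match every key=value pair in criteria."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


records = json.loads(raw)
hard = filter_records(records, complexity="hard")
print([r["id"] for r in hard])  # ['q1']
```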

    Quality and Accuracy:

    The dataset upholds the highest standards of quality and accuracy. Each question undergoes careful validation, and the corresponding answers are thoroughly verified. To prioritize inclusivity, the dataset incorporates questions and answers representing diverse perspectives and writing styles, ensuring it remains unbiased and avoids perpetuating discrimination.

    Both the questions and the answers are written in grammatically correct German, without spelling or grammatical errors. No copyrighted, toxic, or harmful content was used to build this dataset.

    Continuous Updates and Customization:

    The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Continuous efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to collect custom question-answer data tailored to specific needs, providing flexibility and customization options.

    License:

    The dataset, created by FutureBeeAI, is now ready for commercial use. Researchers, data scientists, and developers can use this fully labeled and ready-to-deploy German Open Ended Question Answer Dataset to enhance the language understanding capabilities of their generative AI models, improve response generation, and explore new approaches to NLP question-answering tasks.

  9. Median age of the population in Germany 1950-2100

    • statista.com
    Updated Jan 13, 2025
    Cite
    Aaron O'Neill (2025). Median age of the population in Germany 1950-2100 [Dataset]. https://www.statista.com/topics/1903/germany/
    Dataset updated
    Jan 13, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Aaron O'Neill
    Area covered
    Germany
    Description

    The median age of Germans in 2025 was 45.5 years, meaning that half the German population was younger and half older. Following some fluctuation during the post-WWII baby boom waves, Germany's median age has been on an upward trajectory since the 1970s, with a sharp rise in the 1990s and 2000s, although the increase has slowed in recent years. It is projected to peak at over 48 years in the 2040s, before plateauing around 47 years for the remainder of the century.

    Aging in Germany

    This shift in Germany's age makeup is driven by having fewer young people and more old people. Although it has increased slightly in the last decade, the German fertility rate remains low. Fewer young people lead to a higher median age, as does rising life expectancy. These trends have significant economic and societal impacts: workforces shrink, the elderly population places greater demand on healthcare systems and public finances, and families must increasingly care for elderly relatives.

    Regional and global trends

    The entire European Union, due to higher levels of development, shows an upward shift in its age distribution. While this shift is occurring globally, Germany's median age is particularly high. In many other parts of the world, particularly Sub-Saharan Africa, the ratio of young to old inhabitants is skewed sharply toward the young, pulling the median age lower.

  10. German Human-Human Chat Dataset for Conversational AI & NLP

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). German Human-Human Chat Dataset for Conversational AI & NLP [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/german-general-domain-conversation-text-dataset
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    The German General Domain Chat Dataset is a high-quality, text-based dataset designed to train and evaluate conversational AI, NLP models, and smart assistants in real-world German usage. Collected through FutureBeeAI’s trusted crowd community, this dataset reflects natural, native-level German conversations covering a broad spectrum of everyday topics.

    Conversational Text Data

    This dataset includes over 15000 chat transcripts, each featuring free-flowing dialogue between two native German speakers. The conversations are spontaneous, context-rich, and mimic informal, real-life texting behavior.

    • Words per Chat: 300–700
    • Turns per Chat: Up to 50 dialogue turns
    • Contributors: 200 native German speakers from the FutureBeeAI Crowd Community
    • Format: TXT, DOCX, JSON or CSV (customizable)
    • Structure: Each record contains the full chat, topic tag, and metadata block
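    The stated limits of 300–700 words, at most 50 turns, and two participants per chat can be checked mechanically when ingesting the data. The following is a minimal sketch that assumes each chat has already been parsed into a list of `(speaker_id, message_text)` turns. That representation and the `check_chat` name are hypothetical, and word counts use simple whitespace splitting as an approximation.

```python
from typing import List, Tuple

Turn = Tuple[str, str]  # (speaker_id, message_text): hypothetical representation


def check_chat(turns: List[Turn], min_words: int = 300, max_words: int = 700,
               max_turns: int = 50) -> List[str]:
    """Return a list of problems; an empty list means the chat meets the stated limits."""
    problems: List[str] = []
    n_words = sum(len(text.split()) for _, text in turns)  # whitespace word count
    if not min_words <= n_words <= max_words:
        problems.append(f"word count {n_words} outside {min_words}-{max_words}")
    if len(turns) > max_turns:
        problems.append(f"{len(turns)} turns exceeds {max_turns}")
    if len({speaker for speaker, _ in turns}) != 2:
        problems.append("expected exactly two speakers")
    return problems


chat = [("A", "wort " * 10), ("B", "wort " * 10)] * 20  # 40 turns, 400 words
print(check_chat(chat))  # []
```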

    Diversity and Domain Coverage

    Conversations span a wide variety of general-domain topics to ensure comprehensive model exposure:

    • Music, books, and movies
    • Health and wellness
    • Children and parenting
    • Family life and relationships
    • Food and cooking
    • Education and studying
    • Festivals and traditions
    • Environment and daily life
    • Internet and tech usage
    • Childhood memories and casual chatting

    This diversity ensures the dataset is useful across multiple NLP and language understanding applications.

    Linguistic Authenticity

    Chats reflect informal, native-level German usage with:

    • Colloquial expressions and local dialect influence
    • Domain-relevant terminology
    • Language-specific grammar, phrasing, and sentence flow
    • Inclusion of realistic details such as names, phone numbers, email addresses, locations, dates, times, local currencies, and culturally grounded references
    • Representation of different writing styles and input quirks to ensure training data realism

    Metadata

    Every chat instance is accompanied by structured metadata, which includes:

    • Participant Age
    • Gender
    • Country/Region
    • Chat Domain
    • Chat Topic
    • Dialect

    This metadata supports model filtering, demographic-specific evaluation, and more controlled fine-tuning workflows.

    Data Quality Assurance

    All chat records pass through a rigorous QA process to maintain consistency and accuracy:

    • Manual review for content completeness
    • Format checks for chat turns and metadata
    • Linguistic verification by native speakers
    • Removal of inappropriate or unusable samples

    This ensures a clean, reliable dataset ready for high-performance AI model training.

    Applications

    This dataset is ideal for training and evaluating a wide range of text-based AI systems:

    • Conversational AI / Chatbots
    • Smart assistants and voicebots

  11. germanquad

    • huggingface.co
    • opendatalab.com
    Updated Jun 16, 2021
    Cite
    deepset (2021). germanquad [Dataset]. https://huggingface.co/datasets/deepset/germanquad
    Dataset updated
    Jun 16, 2021
    Dataset authored and provided by
    deepset (https://www.deepset.ai/)
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    To raise the bar for non-English QA, we are releasing a high-quality, human-labeled German QA dataset consisting of 13,722 questions, including a three-way annotated test set. The creation of GermanQuAD is inspired by insights from existing datasets as well as our labeling experience from several industry projects. We combine the strengths of SQuAD, such as high out-of-domain performance, with self-sufficient questions that contain all relevant information for open-domain QA, as in the NaturalQuestions dataset. Unlike some other popular datasets, our training and test sets do not overlap, and they include complex questions that cannot be answered with a single entity or only a few words.
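    Datasets modeled on SQuAD typically mark each answer by its text and a character offset (`answer_start`) into the context passage. The following is a minimal sanity check, assuming GermanQuAD records follow that SQuAD-style layout (`context`, `question`, and `answers` with `text` and `answer_start` lists). The layout should be verified against the dataset card, and the example record is invented.

```python
from typing import Dict


def answer_spans_valid(example: Dict) -> bool:
    """Check that every answer text occurs in the context at its answer_start offset."""
    context = example["context"]
    answers = example["answers"]
    return all(
        context[start:start + len(text)] == text
        for text, start in zip(answers["text"], answers["answer_start"])
    )


example = {  # invented record in the assumed SQuAD-style layout
    "context": "Berlin ist die Hauptstadt der Bundesrepublik Deutschland.",
    "question": "Was ist die Hauptstadt Deutschlands?",
    "answers": {"text": ["Berlin"], "answer_start": [0]},
}
print(answer_spans_valid(example))  # True
```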

  12. German Dataset

    • ny.shaip.com
    Updated Sep 20, 2025
    Cite
    Shaip (2025). German Dataset [Dataset]. https://ny.shaip.com/offerings/speech-data-catalog/german-dataset/
    Dataset updated
    Sep 20, 2025
    Dataset authored and provided by
    Shaip
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    German Dataset (Deutscher Datensatz): High-Quality German Call-Center and IVR Dataset for AI & Speech Models. Overview: Title (Language): German Language Dataset. Dataset Types: Call Center, General Conversation, Music, Scripted Monologue. Country: Germany. Description: Unscripted, synthetic telephonic conversations...

  13. GLips - German Lipreading Dataset

    • fdr.uni-hamburg.de
    zip
    Updated Mar 1, 2022
    Cite
    Schwiebert, Gerald; Weber, Cornelius; Qu, Leyuan; Siqueira, Henrique; Wermter, Stefan (2022). GLips - German Lipreading Dataset [Dataset]. http://doi.org/10.25592/uhhfdm.10048
    Available download formats: zip
    Dataset updated
    Mar 1, 2022
    Dataset provided by
    University of Hamburg
    Authors
    Schwiebert, Gerald; Weber, Cornelius; Qu, Leyuan; Siqueira, Henrique; Wermter, Stefan
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    The German Lipreading dataset consists of 250,000 publicly available videos of the faces of speakers in the Hessian Parliament, which were processed for word-level lip reading using an automatic pipeline. The format is similar to that of the English-language Lip Reading in the Wild (LRW) dataset: each H264-compressed MPEG-4 video encodes one word of interest in a context of 1.16 seconds' duration, which makes the two datasets compatible for studying transfer learning. Choosing video material based on naturally spoken language in a natural environment ensures more robust results for real-world applications than artificially generated datasets with as little noise as possible. The 500 different spoken words, ranging from 4 to 18 characters in length, each have 500 instances with separate MPEG-4 audio files and text metadata files, originating from 1,018 parliamentary sessions. The complete TextGrid files containing the segmentation information of those sessions are also included. The size of the uncompressed dataset is 16 GB.
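    The stated vocabulary size and per-word instance count account exactly for the total number of videos:

```latex
500~\text{words} \times 500~\tfrac{\text{instances}}{\text{word}} = 250{,}000~\text{videos}
```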

  14. German Weimar Republic Data, 1919-1933

    • icpsr.umich.edu
    ascii, sas, spss
    Updated Dec 22, 2005
    Cite
    Inter-university Consortium for Political and Social Research (2005). German Weimar Republic Data, 1919-1933 [Dataset]. http://doi.org/10.3886/ICPSR00042.v1
    Available download formats: spss, ascii, sas
    Dataset updated
    Dec 22, 2005
    Dataset authored and provided by
    Inter-university Consortium for Political and Social Research (https://www.icpsr.umich.edu/web/pages/)
    License

    https://www.icpsr.umich.edu/web/ICPSR/studies/42/terms

    Time period covered
    1919 - 1933
    Area covered
    Germany
    Description

    This data collection contains electoral and demographic data at several levels of aggregation (kreis, land/regierungsbezirk, and wahlkreis) for Germany in the Weimar Republic period of 1919-1933. Two datasets are available. Part 1, 1919 Data, presents raw and percentagized election returns at the wahlkreis level for the 1919 election to the Nationalversammlung. Information is provided on the number and percentage of eligible voters and the total votes cast for parties such as the German National People's Party, German People's Party, Christian People's Party, German Democratic Party, Social Democratic Party, and Independent Social Democratic Party. Part 2, 1920-1933 Data, consists of returns for elections to the Reichstag, 1920-1933, and for the Reichsprasident elections of 1925 and 1932 (including runoff elections in each year), returns for two national referenda, held in 1926 and 1929, and data pertaining to urban population, religion, and occupations, taken from the German Census of 1925. This second dataset contains data at several levels of aggregation and is a merged file. Cross-temporal discrepancies, such as changes in the names of the geographical units and the disappearance of units, have been adjusted for whenever possible. Variables in this file provide information for the total number and percentage of eligible voters and votes cast for parties, including the German Nationalist People's Party, German People's Party, German Center Party, German Democratic Party, German Social Democratic Party, German Communist Party, Bavarian People's Party, Nationalist-Socialist German Workers' Party (Hitler's movement), German Middle Class Party, German Business and Labor Party, Conservative People's Party, and other parties.
Data are also provided for the total number and percentage of votes cast in the Reichsprasident elections of 1925 and 1932 for candidates Jarres, Held, Ludendorff, Braun, Marx, Hellpach, Thalman, Hitler, Duesterburg, Von Hindenburg, Winter, and others. Additional variables provide information on occupations in the country, including the number of wage earners employed in agriculture, industry and manufacturing, trade and transportation, civil service, army and navy, clergy, public health, welfare, domestic and personal services, and unknown occupations. Other census data cover the total number of wage earners in the labor force and the number of female wage earners employed in all occupations. Also provided is the percentage of the total population living in towns with 5,000 inhabitants or more, and the number and percentage of the population who were Protestants, Catholics, and Jews.

  15. Share of economic sectors in gross domestic product in Germany 2024

    • statista.com
    Updated Jan 13, 2025
    Cite
    Aaron O'Neill (2025). Share of economic sectors in gross domestic product in Germany 2024 [Dataset]. https://www.statista.com/topics/1903/germany/
    Dataset updated
    Jan 13, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Aaron O'Neill
    Area covered
    Germany
    Description

    In 2024, the services sector's share of Germany's gross domestic product edged over 70 percent, while the secondary and primary sectors together generated less than a third of GDP.

    At your service

    The tertiary, or services, sector encompasses all kinds of intangible goods, like consulting and advice, transport, or attention. If a country generates its GDP mostly via services, this is often through industries like housing, tourism (including accommodation and hospitality), financial services, or telecommunications. Germany is a popular tourist destination and an important financial hub.

    Germany is not a "service desert"

    The services sector in Germany not only generates most of the country's GDP, it also employs the vast majority of the workforce, over 70 percent. Lately, business confidence in the German services sector has increased significantly, which suggests a stable economy and ideally an increase in production and output in the future. This projection is supported by rising GDP and a stable inflation rate of around two percent.

  16. German Traffic Sign Recognition Dataset

    • universe.roboflow.com
    zip
    Updated Aug 4, 2023
    Cite
    TrafficSignRecognition (2023). German Traffic Sign Recognition Dataset [Dataset]. https://universe.roboflow.com/trafficsignrecognition/german-traffic-sign-recognition
    Available download formats: zip
    Dataset updated
    Aug 4, 2023
    Dataset authored and provided by
    TrafficSignRecognition
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Traffic Signs Bounding Boxes
    Description

    German Traffic Sign Recognition

    ## Overview
    
    German Traffic Sign Recognition is a dataset for object detection tasks - it contains Traffic Signs annotations for 1,102 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  17. German Product Image OCR Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). German Product Image OCR Dataset [Dataset]. https://www.futurebeeai.com/dataset/ocr-dataset/german-product-image-ocr-dataset
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introducing the German Product Image Dataset: a diverse and comprehensive collection of images curated to advance text recognition and optical character recognition (OCR) models designed specifically for the German language.

    Dataset Contents & Diversity

    Containing a total of 2000 images, this German OCR dataset offers a diverse distribution of product front images. It includes a variety of text such as product names, taglines, logos, company names, addresses, and product content. The images showcase distinct fonts, writing formats, colors, designs, and layouts.

    To ensure the diversity of the dataset and to build a robust text recognition model, we allow only a limited number (fewer than five) of unique images from any single source. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that at least 80% of the space in each image contains visible German text.

    Images were captured under varying lighting conditions (both day and night), at different capture angles, and against different backgrounds to build a balanced OCR dataset. The collection features images in both portrait and landscape orientation.

    All images were captured by native Germans to ensure text quality and to avoid toxic content and PII. To maintain image quality, they were taken with recent iOS and Android mobile devices with cameras above 5 MP. Images in this training dataset are available in both JPEG and HEIC formats.

    Metadata

    Along with the image data, you will also receive detailed structured metadata in CSV format. For each image, it includes fields such as image orientation, country, language, and device information. Each image file is named to correspond to its metadata entry.

    The metadata serves as a valuable tool for understanding and characterizing the data, facilitating informed decision-making in the development of German text recognition models.
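    Because each image file is named after its metadata entry, the CSV can be used to select subsets of images. A minimal sketch using only the standard library, assuming hypothetical column names (`file_name`, `orientation`, `country`, `language`, `device`); the real header may differ, so check it before use. The sample rows below are made up for illustration.

```python
import csv
import io
from collections import defaultdict
from typing import Iterable

# Hypothetical column names; adjust to the actual CSV header.
FILE_COL = "file_name"
ORIENTATION_COL = "orientation"


def load_metadata(text: str) -> list[dict[str, str]]:
    """Parse CSV text (with a header row) into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def group_by_orientation(rows: Iterable[dict[str, str]]) -> dict[str, list[str]]:
    """Map each normalized orientation value (e.g. 'portrait') to its file names."""
    groups: dict[str, list[str]] = defaultdict(list)
    for row in rows:
        groups[row[ORIENTATION_COL].strip().lower()].append(row[FILE_COL])
    return dict(groups)


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    sample = (
        "file_name,orientation,country,language,device\n"
        "img_0001.jpg,Portrait,Germany,German,iOS\n"
        "img_0002.jpg,Landscape,Germany,German,Android\n"
        "img_0003.jpg,portrait,Germany,German,Android\n"
    )
    print(group_by_orientation(load_metadata(sample)))
    # {'portrait': ['img_0001.jpg', 'img_0003.jpg'], 'landscape': ['img_0002.jpg']}
```

    For a file on disk, open it with `newline=""` and pass the handle straight to `csv.DictReader`, as the `csv` module documentation recommends.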

    Update & Custom Collection

    We're committed to expanding this dataset by continuously adding more images with the assistance of our native German crowd community.

    If you require a custom product image OCR dataset tailored to your guidelines or specific device distribution, feel free to contact us. We're equipped to curate specialized data to meet your unique needs.

    Furthermore, we can annotate the images with bounding boxes or transcribe the text in them to align with your specific project requirements, using our crowd community.

    License

    This image dataset, created by FutureBeeAI, is now available for commercial use.

    Conclusion:

    Leverage the power of this product image OCR dataset to elevate the training and performance of text recognition, text detection, and optical character recognition models within the realm of the German language. Your journey to enhanced language understanding and processing starts here.

  18. UK German Dataset

    • shaip.com
    Updated Sep 26, 2025
    Cite
    Shaip (2025). UK German Dataset [Dataset]. https://www.shaip.com/offerings/speech-data-catalog/uk-german-dataset/
    Explore at:
    Dataset updated
    Sep 26, 2025
    Dataset authored and provided by
    Shaip
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    United Kingdom
    Description

    High-Quality UK German Call Center and Utterance Dataset for AI & Speech Models

    Overview
    Title (Language): UK German Language Dataset
    Dataset Types: Call Center, Utterance
    Country: United Kingdom
    Description: This dataset includes unscripted…

  19. german-speech-recognition-dataset

    • huggingface.co
    Updated Mar 7, 2025
    Cite
    Unidata (2025). german-speech-recognition-dataset [Dataset]. https://huggingface.co/datasets/UniDataPro/german-speech-recognition-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 7, 2025
    Authors
    Unidata
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    German Speech Dataset for recognition task

    The dataset comprises 431 hours of telephone dialogues in German, collected from 590+ native speakers across various topics and domains, and has a 95% sentence accuracy rate. It is designed for research on automatic speech recognition (ASR) systems. Researchers and developers can use it to advance their work on audio transcription and natural language processing (NLP). - Get the data… See the full description on the dataset page: https://huggingface.co/datasets/UniDataPro/german-speech-recognition-dataset.
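    The headline figures imply how much audio each speaker contributes on average. A quick check using only the numbers quoted in this description: because the speaker count is given as "590+", dividing by 590 yields an upper bound on the per-speaker average, not an exact value. The function name is hypothetical.

```python
TOTAL_HOURS = 431   # hours of telephone dialogue, as stated in the description
MIN_SPEAKERS = 590  # "590+" native speakers, so this is a lower bound


def max_avg_minutes_per_speaker(total_hours: float, min_speakers: int) -> float:
    """Upper bound on average audio per speaker, in minutes.

    Dividing by the *minimum* speaker count gives the largest possible average.
    """
    if min_speakers <= 0:
        raise ValueError("speaker count must be positive")
    return total_hours * 60 / min_speakers


if __name__ == "__main__":
    bound = max_avg_minutes_per_speaker(TOTAL_HOURS, MIN_SPEAKERS)
    print(f"<= {bound:.1f} min per speaker")  # prints: <= 43.8 min per speaker
```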

  20. German two-year treasury note yield 2014-2024, by month

    • statista.com
    Updated Jan 30, 2025
    Cite
    Statista (2025). German two-year treasury note yield 2014-2024, by month [Dataset]. https://www.statista.com/statistics/1203409/two-year-treasury-note-yield-germany/
    Explore at:
    Dataset updated
    Jan 30, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jan 2014 - Jun 2024
    Area covered
    Germany
    Description

    The yield on German two-year treasury notes was 2.09 percent as of the end of December 2024. For short-term debt traded on the capital market, the German federal government issues a two-year treasury note called a 'Schatz' in German. This is followed by five-year treasury notes called 'Bobl', and then by federal bonds with maturities of between 10 and 30 years ('Bund' in German).
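    The naming scheme described above can be expressed as a simple lookup by maturity. A minimal sketch that encodes only the ranges given in this description (2 years: Schatz; 5 years: Bobl; 10 to 30 years: Bund) and rejects any other maturity; the function name is hypothetical, and the mapping is not meant as a complete list of German federal securities.

```python
def federal_security_name(maturity_years: int) -> str:
    """Return the German name for a federal security of the given maturity.

    Covers only the maturities listed in the description; anything else raises.
    """
    if maturity_years == 2:
        return "Schatz"  # two-year treasury note
    if maturity_years == 5:
        return "Bobl"    # five-year treasury note
    if 10 <= maturity_years <= 30:
        return "Bund"    # federal bond, 10 to 30 years
    raise ValueError(f"no name listed for a {maturity_years}-year maturity")


if __name__ == "__main__":
    for years in (2, 5, 10, 30):
        print(years, federal_security_name(years))
```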
