100+ datasets found
  1. germanquad

    • huggingface.co
    • opendatalab.com
    Updated Jun 16, 2021
    Cite
    deepset (2021). germanquad [Dataset]. https://huggingface.co/datasets/deepset/germanquad
    Dataset authored and provided by
    deepset (https://www.deepset.ai/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In order to raise the bar for non-English QA, we are releasing a high-quality, human-labeled German QA dataset consisting of 13,722 questions, including a three-way annotated test set. The creation of GermanQuAD is inspired by insights from existing datasets as well as our labeling experience from several industry projects. We combine the strengths of SQuAD, such as high out-of-domain performance, with self-sufficient questions that contain all relevant information for open-domain QA, as in the NaturalQuestions dataset. Unlike some other popular datasets, our training and test sets do not overlap, and they include complex questions that cannot be answered with a single entity or only a few words.
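    GermanQuAD is an extractive QA dataset in the spirit of SQuAD, so each answer is expected to be a span of its passage. As a minimal sketch, assuming a SQuAD-style record layout (the `context`, `question` and `answers` field names with `text`/`answer_start` lists are an assumption here, so check the dataset card), the check below confirms that every answer string really starts at its stated character offset:

```python
from typing import Any, Dict, List


def answers_match_context(record: Dict[str, Any]) -> bool:
    """Check that each answer text occurs at its stated offset in the context.

    Assumed (hypothetical) SQuAD-style layout:
    {"context": str, "question": str,
     "answers": {"text": [str, ...], "answer_start": [int, ...]}}
    """
    context: str = record["context"]
    texts: List[str] = record["answers"]["text"]
    starts: List[int] = record["answers"]["answer_start"]
    if len(texts) != len(starts):
        return False  # every answer needs exactly one start offset
    for text, start in zip(texts, starts):
        # The answer must be the exact substring that begins at `start`.
        if context[start:start + len(text)] != text:
            return False
    return True


example = {
    "context": "Berlin ist die Hauptstadt von Deutschland.",
    "question": "Was ist die Hauptstadt von Deutschland?",
    "answers": {"text": ["Berlin"], "answer_start": [0]},
}
print(answers_match_context(example))  # True
```

    Keeping answers as parallel lists accommodates multiply annotated questions, such as those in a three-way annotated test set.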

  2. Ten Thousand German News Articles Dataset

    • kaggle.com
    • tblock.github.io
    zip
    Updated Jan 20, 2022
    Cite
    Timo Block (2022). Ten Thousand German News Articles Dataset [Dataset]. https://www.kaggle.com/tblock/10kgnad
    Available download formats: zip (21,144,764 bytes)
    Authors
    Timo Block
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    (see https://tblock.github.io/10kGNAD/ for the original dataset page)

    This page introduces the 10k German News Articles Dataset (10kGNAD), a German topic classification dataset. The 10kGNAD is based on the One Million Posts Corpus and is available under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. You can download the dataset here.

    Why a German dataset?

    English text classification datasets are common. Examples are the big AG News, the class-rich 20 Newsgroups and the large-scale DBpedia ontology datasets for topic classification, and the commonly used IMDb and Yelp datasets for sentiment analysis. Non-English datasets, especially German datasets, are less common. There is a collection of sentiment analysis datasets assembled by the Interest Group on German Sentiment Analysis. However, to my knowledge, no German topic classification dataset is available to the public.

    Due to grammatical differences between English and German, a classifier might be effective on an English dataset but less effective on a German one. German is more highly inflected than English, and long compound words are far more common. One would need to evaluate a classifier on multiple German datasets to get a sense of its effectiveness.

    The dataset

    The 10kGNAD dataset is intended to solve part of this problem as the first German topic classification dataset. It consists of 10,273 German-language news articles from an Austrian online newspaper, categorized into nine topics. These articles are a previously unused part of the One Million Posts Corpus.

    In the One Million Posts Corpus, each article has a topic path, for example Newsroom/Wirtschaft/Wirtschaftpolitik/Finanzmaerkte/Griechenlandkrise. The 10kGNAD uses the second part of the topic path, here Wirtschaft (economy), as the class label. As a result, the dataset can be used for multi-class classification.
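    Deriving the class label from a topic path amounts to taking its second slash-separated segment. A minimal sketch (the function name is hypothetical, not taken from the project's own scripts):

```python
def label_from_topic_path(topic_path: str) -> str:
    """Return the second segment of a slash-separated topic path."""
    parts = [part for part in topic_path.split("/") if part]  # drop empty segments
    if len(parts) < 2:
        raise ValueError(f"topic path has no second segment: {topic_path!r}")
    return parts[1]


path = "Newsroom/Wirtschaft/Wirtschaftpolitik/Finanzmaerkte/Griechenlandkrise"
print(label_from_topic_path(path))  # Wirtschaft
```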

    I created and used this dataset in my thesis to train and evaluate four text classifiers on the German language. By publishing the dataset I hope to support the advancement of tools and models for the German language. Additionally, this dataset can be used as a benchmark dataset for German topic classification.

    Numbers and statistics

    As in most real-world datasets, the class distribution of the 10kGNAD is not balanced. The biggest class, Web, consists of 1,678 articles, while the smallest class, Kultur (culture), contains only 539 articles. Article length runs the other way: articles from the Web class have, on average, the fewest words, while articles from the Kultur class have the second most.

    Splitting into train and test

    I propose a stratified split of 10% for testing and the remaining articles for training. To use the dataset as a benchmark dataset, please use the train.csv and test.csv files located in the project root.
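    A stratified split keeps each class's share of the data roughly equal in the train and test portions, which matters here because the classes are unbalanced. The sketch below is not the project's own split script; it is a stdlib-only illustration, assuming the labels are given as a list with one entry per article, that puts about 10% of every class into the test set:

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[str], test_fraction: float = 0.1, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices) with ~test_fraction of each class in test."""
    by_class: Dict[str, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # seeded for a reproducible split
    train: List[int] = []
    test: List[int] = []
    for label in sorted(by_class):  # fixed class order keeps results deterministic
        indices = by_class[label][:]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


labels = ["Web"] * 100 + ["Kultur"] * 30
train, test = stratified_split(labels)
print(len(train), len(test))  # 117 13
```

    Sampling per class, rather than over the whole dataset, guarantees that even a small class like Kultur is represented in the test set in proportion to its size.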

    Code

    Python scripts to extract the articles and split them into a train set and a test set are available in the code directory of this project. Make sure to install the requirements. The original corpus.sqlite3 is required to extract the articles (download here (compressed) or here (uncompressed)).

    License

    Creative Commons License

    This dataset is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Please consider citing the authors of the One Million Posts Corpus if you use the dataset.

  3. GERDA -- German Election Database

    • german-elections.com
    Updated Jan 2, 2006
    Cite
    Hanno Hilbig (2006). GERDA -- German Election Database [Dataset]. http://www.german-elections.com/
    Authors
    Hanno Hilbig
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Germany
    Description

    Comprehensive dataset of local, state, and federal election results in Germany, facilitating research on electoral behavior, representation, and political responsiveness.

  4. German Dataset

    • hmn.shaip.com
    Updated Aug 6, 2024
    Cite
    Shaip (2024). German Dataset [Dataset]. https://hmn.shaip.com/offerings/speech-data-catalog/german-dataset/
    Dataset authored and provided by
    Shaip
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Home > German Dataset (Deutscher Datensatz). High-Quality German Call-Center and IVR Dataset for AI & Speech Models. Contact Us. Call-Center Data, IVR Data.

  5. German Traffic Sign Recognition Dataset

    • universe.roboflow.com
    zip
    Updated Aug 4, 2023
    + more versions
    Cite
    TrafficSignRecognition (2023). German Traffic Sign Recognition Dataset [Dataset]. https://universe.roboflow.com/trafficsignrecognition/german-traffic-sign-recognition
    Available download formats: zip
    Dataset authored and provided by
    TrafficSignRecognition
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Traffic Signs Bounding Boxes
    Description

    German Traffic Sign Recognition

    ## Overview
    
    German Traffic Sign Recognition is a dataset for object detection tasks - it contains Traffic Signs annotations for 1,102 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License

    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  6. GLips - German Lipreading Dataset

    • fdr.uni-hamburg.de
    zip
    Updated Mar 1, 2022
    + more versions
    Cite
    Schwiebert, Gerald; Weber, Cornelius; Qu, Leyuan; Siqueira, Henrique; Wermter, Stefan (2022). GLips - German Lipreading Dataset [Dataset]. http://doi.org/10.25592/uhhfdm.10048
    Available download formats: zip
    Dataset provided by
    University of Hamburg
    Authors
    Schwiebert, Gerald; Weber, Cornelius; Qu, Leyuan; Siqueira, Henrique; Wermter, Stefan
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    The German Lipreading dataset consists of 250,000 publicly available videos of the faces of speakers in the Hessian Parliament, which were processed for word-level lip reading using an automatic pipeline. The format is similar to that of the English-language Lip Reading in the Wild (LRW) dataset: each H264-compressed MPEG-4 video encodes one word of interest in a context of 1.16 seconds, which makes the two datasets compatible for studying transfer learning between them. Using naturally spoken language recorded in a natural environment yields more robust results for real-world applications than artificially generated datasets with as little noise as possible. The 500 different spoken words, ranging from 4 to 18 characters in length, each have 500 instances, with separate MPEG-4 audio files and text metadata files, originating from 1,018 parliamentary sessions. Additionally, the complete TextGrid files containing the segmentation information of those sessions are included. The uncompressed dataset is 16 GB.

  7. German-PD-Newspapers

    • huggingface.co
    Updated Dec 5, 2024
    Cite
    Sebastian Majstorovic (2024). German-PD-Newspapers [Dataset]. https://huggingface.co/datasets/storytracer/German-PD-Newspapers
    Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Authors
    Sebastian Majstorovic
    License

    https://choosealicense.com/licenses/cc0-1.0/

    Description

    Dataset Card for Public Domain Newspapers (German)

    This dataset contains 13 billion words of OCR text extracted from German historical newspapers.

    Dataset Details

    Dataset Description

    Curated by: Sebastian Majstorovic
    Language(s) (NLP): German
    License: Dataset: CC0, Texts: Public Domain

    Dataset Sources

    Repository: https://www.deutsche-digitale-bibliothek.de/newspaper

    Copyright & License

    The newspaper texts have been… See the full description on the dataset page: https://huggingface.co/datasets/storytracer/German-PD-Newspapers.

  8. German Dataset

    • ig.shaip.com
    Updated Jun 11, 2023
    Cite
    Shaip (2023). German Dataset [Dataset]. https://ig.shaip.com/offerings/speech-data-catalog/german-dataset/
    Dataset authored and provided by
    Shaip
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Home > German Dataset (Deutscher Datensatz). High-Quality German Call-Center and IVR Dataset for AI & Speech Models. Contact Us. Call-Center Data, IVR Data.

  9. German Human-Human Chat Dataset for Conversational AI & NLP

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). German Human-Human Chat Dataset for Conversational AI & NLP [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/german-general-domain-conversation-text-dataset
    Available download formats: wav
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    The German General Domain Chat Dataset is a high-quality, text-based dataset designed to train and evaluate conversational AI, NLP models, and smart assistants in real-world German usage. Collected through FutureBeeAI’s trusted crowd community, this dataset reflects natural, native-level German conversations covering a broad spectrum of everyday topics.

    Conversational Text Data

    This dataset includes over 15000 chat transcripts, each featuring free-flowing dialogue between two native German speakers. The conversations are spontaneous, context-rich, and mimic informal, real-life texting behavior.

    Words per Chat: 300–700
    Turns per Chat: Up to 50 dialogue turns
    Contributors: 200 native German speakers from the FutureBeeAI Crowd Community
    Format: TXT, DOCS, JSON or CSV (customizable)
    Structure: Each record contains the full chat, topic tag, and metadata block

    Diversity and Domain Coverage

    Conversations span a wide variety of general-domain topics to ensure comprehensive model exposure:

    Music, books, and movies
    Health and wellness
    Children and parenting
    Family life and relationships
    Food and cooking
    Education and studying
    Festivals and traditions
    Environment and daily life
    Internet and tech usage
    Childhood memories and casual chatting

    This diversity ensures the dataset is useful across multiple NLP and language understanding applications.

    Linguistic Authenticity

    Chats reflect informal, native-level German usage with:

    Colloquial expressions and local dialect influence
    Domain-relevant terminology
    Language-specific grammar, phrasing, and sentence flow
    Inclusion of realistic details such as names, phone numbers, email addresses, locations, dates, times, local currencies, and culturally grounded references
    Representation of different writing styles and input quirks to ensure training data realism

    Metadata

    Every chat instance is accompanied by structured metadata, which includes:

    Participant Age
    Gender
    Country/Region
    Chat Domain
    Chat Topic
    Dialect

    This metadata supports model filtering, demographic-specific evaluation, and more controlled fine-tuning workflows.
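    As an illustration of metadata-based filtering, here is a minimal sketch that assumes the chats are exported as JSON Lines with a nested `metadata` object; the key names (`metadata`, `dialect`, `topic`) and the values used are hypothetical, since the delivery format is customizable and its schema is not documented here:

```python
import json
from typing import Any, Dict, Iterable, Iterator


def filter_chats(records: Iterable[Dict[str, Any]], **criteria: Any) -> Iterator[Dict[str, Any]]:
    """Yield records whose (hypothetical) metadata matches every key=value in criteria."""
    for record in records:
        metadata = record.get("metadata", {})
        if all(metadata.get(key) == value for key, value in criteria.items()):
            yield record


# Hypothetical JSON Lines input: one chat per line.
jsonl = "\n".join([
    '{"id": 1, "metadata": {"dialect": "Bavarian", "topic": "Food and cooking"}}',
    '{"id": 2, "metadata": {"dialect": "Swabian", "topic": "Food and cooking"}}',
])
chats = [json.loads(line) for line in jsonl.splitlines()]
print([chat["id"] for chat in filter_chats(chats, dialect="Bavarian")])  # [1]
```

    Using a generator keeps memory flat when filtering a large export line by line.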

    Data Quality Assurance

    All chat records pass through a rigorous QA process to maintain consistency and accuracy:

    Manual review for content completeness
    Format checks for chat turns and metadata
    Linguistic verification by native speakers
    Removal of inappropriate or unusable samples

    This ensures a clean, reliable dataset ready for high-performance AI model training.

    Applications

    This dataset is ideal for training and evaluating a wide range of text-based AI systems:

    Conversational AI / Chatbots
    Smart assistants and voicebots

  10. Wake Word German Dataset

    • hmn.shaip.com
    Updated Aug 9, 2024
    + more versions
    Cite
    Shaip (2024). Wake Word German Dataset [Dataset]. https://hmn.shaip.com/offerings/speech-data-catalog/wake-word-german-dataset/
    Dataset authored and provided by
    Shaip
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Home > German Dataset. High-Quality German Wake Word Dataset for AI & Speech Models. Contact Us. Specifications: Title: German Language Dataset; Dataset Type: Wake Word; Description: Wake Words / Voice Command / Trigger Word / Keyphrase collected in…

  11. German Dataset

    • sn.shaip.com
    Updated Sep 11, 2024
    Cite
    Shaip (2024). German Dataset [Dataset]. https://sn.shaip.com/offerings/speech-data-catalog/german-dataset/
    Dataset authored and provided by
    Shaip
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Home > German Dataset (Deutscher Datensatz). High-Quality German Call-Center and IVR Dataset for AI & Speech Models. Contact Us. Call-Center Data, IVR Data.

  12. 1,796 Hours – German Speech Dataset by Mobile Phone (Scripted Monologue)

    • nexdata.ai
    Updated Oct 8, 2023
    Cite
    Nexdata (2023). 1,796 Hours – German Speech Dataset by Mobile Phone (Scripted Monologue) [Dataset]. https://www.nexdata.ai/datasets/speechrecog/949
    Dataset provided by
    nexdata technology inc
    Nexdata
    Authors
    Nexdata
    Variables measured
    Format, Country, Speaker, Language, Accuracy Rate, Content category, Recording device, Recording condition, Language(Region) Code, Features of annotation
    Description

    1,796 Hours – German Speech Dataset by Mobile Phone, collected from monologues based on given scripts, covering the generic domain, human-machine interaction, smart home commands, in-car commands, numbers, and other domains. Transcribed with text content and other attributes. The dataset was collected from a large and diverse pool of speakers (3,442 native German speakers in total) with broad geographic coverage, enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring the maintenance of user privacy and legal rights throughout the data collection, storage, and usage processes; our datasets are all GDPR, CCPA, and PIPL compliant.

  13. German Time Series Dataset, 1834-2012

    • figshare.com
    xls
    Updated May 26, 2016
    Cite
    Thomas Rahlf; Paul Erker; Georg Fertig; Franz Rothenbacher; Jochen Oltmer; Volker Müller-Benedict; Reinhard Spree; Marcel Boldorf; Mark Spoerer; Marc Debus; Dietrich Oberwittler; Toni Pierenkemper; Heike Wolter; Bernd Wedemeyer-Kolwe; Thomas Großbölting; Markus Goldbeck; Rainer Metz; Richard Tilly; Christopher Kopper; Michael Kopsidis; Alfred Reckendrees; Günther Schulz; Markus Lampe; Nikolaus Wolf; Herman de Jong; Joerg Baten (2016). German Time Series Dataset, 1834-2012 [Dataset]. http://doi.org/10.6084/m9.figshare.1450809.v1
    Available download formats: xls
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Thomas Rahlf; Paul Erker; Georg Fertig; Franz Rothenbacher; Jochen Oltmer; Volker Müller-Benedict; Reinhard Spree; Marcel Boldorf; Mark Spoerer; Marc Debus; Dietrich Oberwittler; Toni Pierenkemper; Heike Wolter; Bernd Wedemeyer-Kolwe; Thomas Großbölting; Markus Goldbeck; Rainer Metz; Richard Tilly; Christopher Kopper; Michael Kopsidis; Alfred Reckendrees; Günther Schulz; Markus Lampe; Nikolaus Wolf; Herman de Jong; Joerg Baten
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Germany
    Description

    The aim of the project was to identify and compile the best available historical time series for Germany, and to complement or update them at reasonable expense. Time series were only included if data for the entire period from 1834 to 2012 was at least theoretically available. An integral aspect of the project's concept is the combination of data with critical commentaries on the time series by established expert scientists. The following themes are covered (authors in parentheses):

    1. Environment, Climate, and Nature (Paul Erker)
    2. Population, Households, Families (Georg Fertig/Franz Rothenbacher)
    3. Migration (Jochen Oltmer)
    4. Education and Science (Volker Müller-Benedict)
    5. Health Service (Reinhard Spree)
    6. Social Policy (Marcel Boldorf)
    7. Public Finance and Taxation (Mark Spoerer)
    8. Political Participation (Marc Debus)
    9. Crime and Justice (Dietrich Oberwittler)
    10. Work, Income, and Standard of Living (Toni Pierenkemper)
    11. Culture, Tourism, and Sports (Heike Wolter/Bernd Wedemeyer-Kolwe)
    12. Religion (Thomas Großbölting/Markus Goldbeck)
    13. National Accounts (Rainer Metz)
    14. Prices (Rainer Metz)
    15. Money and Credit (Richard Tilly)
    16. Transport and Communication (Christopher Kopper)
    17. Agriculture (Michael Kopsidis)
    18. Business, Industry, and Craft (Alfred Reckendrees)
    19. Building and Housing (Günther Schulz)
    20. Trade (Markus Lampe/Nikolaus Wolf)
    21. Balance of Payments (Nikolaus Wolf)
    22. International Comparisons (Herman de Jong/Joerg Baten)

    Basically, the structure of the dataset is guided by the tables in the print publication by the Federal Agency. The print publication allows for four to eight tables for each of the 22 chapters, which means the data record is correspondingly made up of 120 tables in total.

    The inner structure of the dataset is a consequence of a German idiosyncrasy: the numerous territorial changes. To account for this idiosyncrasy, we decided on a four-fold data structure. Four territorial units, with their respective data, are therefore differentiated in each table in separate columns:

    A. German Confederation/Customs Union/German Reich (1834-1945)
    B. German Federal Republic (1949-1989)
    C. German Democratic Republic (1949-1989)
    D. Germany since reunification (since 1990)

    Years in parentheses should be considered a guideline only. It is possible that series for the territory of the old Federal Republic or the new federal states are continued after 1990, or that all-German data from before 1990 were available or were reconstructed.

    All time series are identified by a distinct ID consisting of an “x” and a four-digit number (with leading zeros for numbers under 1000). Time series that exclusively contain GDR data are identified with a “c” prefix instead of the “x”.

    For the four territorial units, the time series are arranged in four blocks side by side within the XLSX files. That means: first all time series for the territory and period of the Customs Union and German Reich; the next columns contain, side by side, all time series for the territory of the German Federal Republic / the old federal states; then, if available, those for the territory of the German Democratic Republic / the new federal states; and finally those for reunified Germany. There is at most one row for each year. Years can be missing if no data for the respective year are available in any of the table's time series, but no year will appear twice. The four territorial units and the resulting time periods give the data tables a “stepwise” appearance.
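    The series ID convention described above is regular enough to generate and validate mechanically. A minimal sketch with hypothetical helper names, assuming an ID is exactly one lowercase prefix letter (“x”, or “c” for GDR-only series) followed by four ASCII digits:

```python
import re
from typing import Tuple

# "x" for ordinary series, "c" for series containing only GDR data.
_SERIES_ID = re.compile(r"([xc])([0-9]{4})")


def make_series_id(number: int, gdr_only: bool = False) -> str:
    """Build an ID such as 'x0042' (or 'c0042' for a GDR-only series)."""
    if not 0 <= number <= 9999:
        raise ValueError("series number must fit in four digits")
    prefix = "c" if gdr_only else "x"
    return f"{prefix}{number:04d}"  # zero-pad numbers under 1000


def parse_series_id(series_id: str) -> Tuple[int, bool]:
    """Return (number, gdr_only) for a valid ID, else raise ValueError."""
    match = _SERIES_ID.fullmatch(series_id)
    if match is None:
        raise ValueError(f"not a valid series ID: {series_id!r}")
    return int(match.group(2)), match.group(1) == "c"


print(make_series_id(42), make_series_id(42, gdr_only=True))  # x0042 c0042
print(parse_series_id("c1234"))  # (1234, True)
```

    Using `fullmatch` rather than `match` rejects IDs with trailing characters, such as a stray newline from a spreadsheet export.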

    If you find anything missing, unclear, incomprehensible, improvable, etc., please contact me (kontakt@deutschland-in-daten.de).

    Further reading:

    Rahlf, Thomas, The German Time Series Dataset 1834-2012, in: Journal of Economics and Statistics 236/1 (2016), pp. 129-143. [DOI: 10.1515/jbnst-2015-1005] Open Access.
    Rahlf, Thomas, Voraussetzungen für eine Historische Statistik von Deutschland (19./20. Jh.), in: Vierteljahrschrift für Sozial- und Wirtschaftsgeschichte 101/3 (2014), pp. 322-352. [PDF]
    Rahlf, Thomas (ed.), Dokumentation zum Zeitreihendatensatz für Deutschland, 1834-2012, Version 01 (= Historical Social Research Transition 26v01), Köln 2015. http://dx.doi.org/10.12759/hsr.trans.26.v01.2015
    Rahlf, Thomas (ed.), Deutschland in Daten. Zeitreihen zur Historischen Statistik, Bonn: Bundeszentrale für Politische Bildung, 2015. [EconStor]

  14. Germany Current Account

    • tradingeconomics.com
    • it.tradingeconomics.com
    • +13 more
    csv, excel, json, xml
    Updated Aug 12, 2025
    Cite
    TRADING ECONOMICS (2025). Germany Current Account [Dataset]. https://tradingeconomics.com/germany/current-account
    Available download formats: json, csv, xml, excel
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 31, 1956 - Jul 31, 2025
    Area covered
    Germany
    Description

    Germany recorded a current account surplus of EUR 14,774.93 million in July 2025. This dataset provides Germany's current account: actual values, historical data, forecast, chart, statistics, economic calendar and news.

  15. German Speech Recognition Dataset

    • kaggle.com
    Updated Jun 25, 2025
    + more versions
    Cite
    Unidata (2025). German Speech Recognition Dataset [Dataset]. https://www.kaggle.com/datasets/unidpro/german-speech-recognition-dataset
    Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Unidata
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    German Speech Dataset for recognition task

    The dataset comprises 431 hours of telephone dialogues in German, collected from 590+ native speakers across various topics and domains, with a reported sentence accuracy rate of 95%. It is designed for research in automatic speech recognition (ASR) systems.

    By using this dataset, researchers and developers can advance their work on audio transcription and natural language processing (NLP).

    The dataset contains diverse audio files that represent different accents and dialects, making it a comprehensive resource for training and evaluating recognition models.

    💵 Buy the Dataset: This is a limited preview of the data. To access the full dataset, please contact us at https://unidata.pro to discuss your requirements and pricing options.

    Metadata for the dataset:

    - Audio files: high-quality recordings in WAV format
    - Text transcriptions: accurate and detailed transcripts for each audio segment
    - Speaker information: metadata on native speakers, including gender, etc.
    - Topics: diverse domains such as general conversations, business, etc.

    This dataset is essential for anyone looking to improve speech recognition technology and develop more effective automatic speech systems.

    🌐 UniData provides high-quality datasets, content moderation, data collection and annotation for your AI/ML projects

  16. Traffic German Dataset

    • universe.roboflow.com
    zip
    Updated Apr 29, 2024
    Cite
    Object detection (2024). Traffic German Dataset [Dataset]. https://universe.roboflow.com/object-detection-7sfqy/traffic-german/dataset/1
    Available download formats: zip
    Dataset authored and provided by
    Object detection
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Football Player Detection Bounding Boxes
    Description

    Traffic German

    ## Overview
    
    Traffic German is a dataset for object detection tasks - it contains Football Player Detection annotations for 6,523 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License

    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  17. German Reichstag Election Data, 1871-1912

    • icpsr.umich.edu
    ascii, sas, spss
    Updated Jan 12, 2006
    Cite
    Inter-university Consortium for Political and Social Research (2006). German Reichstag Election Data, 1871-1912 [Dataset]. http://doi.org/10.3886/ICPSR00043.v1
    Available download formats: sas, spss, ascii
    Dataset authored and provided by
    Inter-university Consortium for Political and Social Research (https://www.icpsr.umich.edu/web/pages/)
    License

    https://www.icpsr.umich.edu/web/ICPSR/studies/43/terms

    Time period covered
    1871 - 1912
    Area covered
    Germany, Global
    Description

    This data collection contains electoral data at the Wahlkreis (constituency) and Staat (state) levels for the Reichstag elections of 1871, 1874, 1877, 1878, 1881, 1884, 1890, 1893, 1898, 1903, 1907, and 1912. The variables for each election provide information on the votes cast for parties, including the Conservative Party, the German Empire Party, the National-Liberals, the Liberal Empire Party, the People's Party, the Social Democrats, the Progress Party, the Catholic Center, the Particularists, the Poles Party, the Protest Party, the Antisemites, the Free-thinking People's Party, the German Reform Party, the Farmers' Union, the Peasants' Union, and splinter parties. Data are also provided on the total population in 1871 and every fifth year between 1875 and 1910, and on the proportions of Protestants and Catholics in the total population for 1871, 1875, 1880, 1885, 1890, 1905, and 1910. Additional variables provide information on the number of eligible voters, valid and invalid votes cast, and voter turnout.
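    Turnout and party shares can be recomputed from the vote counts to cross-check the supplied variables. A minimal sketch, assuming the common definitions (turnout as all votes cast, valid plus invalid, divided by eligible voters; party share as party votes over valid votes); the codebook's own definitions should take precedence, and the figures below are made up:

```python
def turnout(eligible: int, valid: int, invalid: int) -> float:
    """Share of eligible voters who cast a ballot (valid or invalid)."""
    if eligible <= 0:
        raise ValueError("eligible voters must be positive")
    return (valid + invalid) / eligible


def vote_share(party_votes: int, valid: int) -> float:
    """A party's share of the valid votes."""
    if valid <= 0:
        raise ValueError("valid votes must be positive")
    return party_votes / valid


# Made-up constituency figures, for illustration only.
print(turnout(eligible=20_000, valid=14_500, invalid=500))  # 0.75
print(vote_share(party_votes=5_800, valid=14_500))  # 0.4
```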

  18. WMT 2014 English-German

    • kaggle.com
    Updated Feb 21, 2024
    Cite
    Mohamed Lotfy (2024). WMT 2014 English-German [Dataset]. https://www.kaggle.com/datasets/mohamedlotfy50/wmt-2014-english-german/discussion
    Available format: Croissant, a format for machine-learning datasets (learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 21, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Mohamed Lotfy
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    The WMT 2014 English-German dataset is a cornerstone resource for researchers developing and evaluating machine translation (MT) systems. Released for the WMT 2014 shared task, it is widely used as a standard benchmark to compare different approaches and track progress in the field.

    Key Features

    • Size: 4.5 million parallel sentence pairs, providing ample data for training and testing MT models.
    • Origin: Drawn from Europarl (European Parliament proceedings), News Commentary, and Common Crawl, offering realistic and diverse text domains.
    • Preprocessing: Cleaned and normalized for consistency, ensuring model compatibility and training efficiency.
    • Task Diversity: Originally used for the WMT 2014 News Translation Task, but applicable to various MT research areas.

  19. German Dataset

    • shaip.com
    Updated Jun 11, 2023
    Cite
    Shaip (2023). German Dataset [Dataset]. https://www.shaip.com/offerings/speech-data-catalog/german-dataset/
    Dataset updated
    Jun 11, 2023
    Dataset authored and provided by
    Shaip
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    High-quality German call-center and IVR dataset for AI and speech models, covering call-center data and IVR data…

  20. Germany Terms Of Trade

    • tradingeconomics.com
    • es.tradingeconomics.com
    • +13 more
    csv, excel, json, xml
    Updated Jul 16, 2025
    Cite
    TRADING ECONOMICS (2025). Germany Terms Of Trade [Dataset]. https://tradingeconomics.com/germany/terms-of-trade
    Available download formats: xml, excel, csv, json
    Dataset updated
    Jul 16, 2025
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 31, 1962 - Jul 31, 2025
    Area covered
    Germany
    Description

    Terms of Trade in Germany increased to 103.90 points in July 2025 from 103.60 points in June 2025. This dataset provides actual values, historical data, forecasts, charts, statistics, an economic calendar, and news for Germany's terms of trade.
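
    Expressed as a month-over-month change, the two readings quoted above amount to an increase of roughly 0.29%. This is a simple percentage-change calculation on the quoted index values, not a figure published with the dataset:

    \[
    \frac{103.90 - 103.60}{103.60} \times 100\% = \frac{0.30}{103.60} \times 100\% \approx 0.29\%
    \]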

