Open Database License (ODbL) v1.0 — https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
This dataset contains all buildings in Germany with their footprint polygon and height. It is a partial dump of the ETHOS.BUILDA database (version v7_20240429). ETHOS.BUILDA is a database containing building-level data for the German building stock. It is based on various data sources that are combined and enriched with machine learning approaches to generate one consistent and complete building dataset.
ETHOS.BUILDA is made available under the Open Database License (ODbL). The licenses of the contents of the database depend on the data source. The sources of the building attributes and information on the type of processing that was done to assign the information from the raw data to the building in ETHOS.BUILDA are provided for each individual data point.
Building data is provided per federal state, the files are named according to the NUTS-1 region names. The building data has the following fields:
field name | description
ID | unique identifier of the building
source | the source of the building footprint
footprint | footprint polygon in WKT format, EPSG:3035
height_m | value: height of the building in [m]; source: source of the height data; lineage: height assignment method
A mapping of the abbreviations of "source" and "lineage" of individual data points to the descriptions is provided in sources.csv and lineages.csv. There is no source entry for the source "v7_model.json" in the sources.csv file, as this refers to the internally trained machine learning model and not to an external dataset.
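As a sketch of how the fields above could be consumed, the snippet below parses a hypothetical building record: the footprint WKT is read with a regular expression and its area computed with the shoelace formula (valid because EPSG:3035 is a metric projection), and the nested height_m entry is decoded. The row values and the JSON encoding of height_m are illustrative assumptions, not the actual file layout.

```python
import json
import re

# Hypothetical example row; the actual per-state files may differ in layout.
row = {
    "ID": "DEBW_0001",
    "source": "osm",
    "footprint": "POLYGON((4210000 2750000, 4210010 2750000, "
                 "4210010 2750012, 4210000 2750012, 4210000 2750000))",
    "height_m": '{"value": 7.5, "source": "lidar", "lineage": "footprint_intersection"}',
}

def parse_footprint(wkt: str) -> list[tuple[float, float]]:
    """Extract the outer-ring coordinates of a WKT POLYGON (EPSG:3035, metres)."""
    coords = re.findall(r"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)", wkt)
    return [(float(x), float(y)) for x, y in coords]

def shoelace_area(ring: list[tuple[float, float]]) -> float:
    """Planar polygon area via the shoelace formula over consecutive ring edges."""
    area = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

height = json.loads(row["height_m"])
ring = parse_footprint(row["footprint"])
print(shoelace_area(ring), height["value"])  # footprint area in m², height in m
```

For production use, a WKT-aware library such as Shapely would replace the regex parsing; the point here is only the field semantics.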
This work was supported by the Helmholtz Association under the program "Energy System Design".
Furthermore, the authors would like to express their gratitude to the Federal Ministry for Economic Affairs and Climate Action (BMWK.IIB4) for providing the necessary resources to conduct this study. Our research was supported by the WAAGE Grant Program (Grant No. 03EI1044/03EE 5031D), and we appreciate their financial assistance.
Attribution 4.0 (CC BY 4.0) — https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the New Germany median household income by race. The dataset can be utilized to understand the racial distribution of New Germany income.
When applicable, the dataset includes the following subsets.
Please note: The 2020 1-Year ACS estimates data was not reported by the Census Bureau due to the impact on survey collection and analysis caused by COVID-19. Consequently, median household income data for 2020 is unavailable for large cities (population 65,000 and above).
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
Explore our comprehensive data analysis and visual representations for a deeper understanding of New Germany median household income by race. You can refer to it here.
In this article, we introduce a unique dataset containing all written communication published by the German Bundestag between 1949 and 2017. Increasing numbers of scholars make use of protocols of parliamentary speeches, parliamentary questions, or the texts of legislative drafts in various fields of comparative politics including representation, responsiveness, professionalization and political careers, or parliamentary agenda studies. Since preparing parliamentary documents is rather resource intense, these studies remain limited to single points in time, types of documents and/or policy areas. The long time horizon and various types of documents covered by our new comprehensive dataset will enable scholars interested in parliaments, parties and representatives to answer various innovative research questions related to legislative studies.
505 Economics is on a mission to make academic economics accessible. We've developed the first monthly sub-national GDP data for EU and UK regions from January 2015 onwards.
Our GDP dataset uses luminosity as a proxy for GDP. The brighter a place, the more economic activity that place tends to have.
We produce the data using high-resolution night time satellite imagery and Artificial Intelligence.
This builds on our academic research at the London School of Economics, and we're producing the dataset in collaboration with the European Space Agency BIC UK.
We have published peer-reviewed academic articles on the usage of luminosity as an accurate proxy for GDP.
Key features:
The dataset can be used by:
We have created this dataset for all UK sub-national regions, 28 EU Countries and Switzerland.
Attribution 4.0 (CC BY 4.0) — https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In order to raise the bar for non-English QA, we are releasing a high-quality, human-labeled German QA dataset consisting of 13,722 questions, including a three-way annotated test set. The creation of GermanQuAD is inspired by insights from existing datasets as well as our labeling experience from several industry projects. We combine the strengths of SQuAD, such as high out-of-domain performance, with self-sufficient questions that contain all relevant information for open-domain QA, as in the NaturalQuestions dataset. Unlike other popular datasets, our training and test datasets do not overlap, and they include complex questions that cannot be answered with a single entity or only a few words.
https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the German Language In-car Speech Dataset, a comprehensive collection of audio recordings designed to facilitate the development of speech recognition models specifically tailored for in-car environments. This dataset aims to support research and innovation in automotive speech technology, enabling seamless and robust voice interactions within vehicles for drivers and co-passengers.
This dataset comprises over 5,000 high-quality audio recordings collected from various in-car environments. These recordings include scripted wake words and command-type prompts.
Participant Diversity:
- Speakers: 50+ native German speakers from the FutureBeeAI Community.
- Regions: Ensures a balanced representation of German accents, dialects, and demographics.
- Participant Profile: Participants range from 18 to 70 years old, with a 60:40 male-to-female ratio.
Recording Nature: Scripted wake-word and command-style audio recordings.
- Duration: Average duration of 5 to 20 seconds per audio recording.
- Format: WAV, mono channel, 16-bit depth. Recordings are provided at sample rates of 16 kHz and 48 kHz.
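Since the format properties above (mono, 16-bit, 16/48 kHz WAV) are exactly what a training pipeline should verify before ingesting audio, here is a minimal sketch using Python's standard `wave` module. The in-memory file is a synthetic stand-in for a dataset recording, not actual dataset content.

```python
import io
import wave

def wav_params(data: bytes) -> tuple[int, int, int]:
    """Return (channels, sample_width_bytes, sample_rate) of a WAV byte stream."""
    with wave.open(io.BytesIO(data), "rb") as w:
        return w.getnchannels(), w.getsampwidth(), w.getframerate()

# Build a 1-second silent mono / 16-bit / 16 kHz file in memory as a stand-in.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)      # mono, as specified for this dataset
    w.setsampwidth(2)      # 2 bytes = 16-bit depth
    w.setframerate(16000)  # 16 kHz subset; 48 kHz files would report 48000
    w.writeframes(b"\x00\x00" * 16000)

channels, width, rate = wav_params(buf.getvalue())
print(channels, width * 8, rate)  # → 1 16 16000
```

A pipeline could reject or resample any file whose reported parameters deviate from the expected configuration.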
Apart from participant diversity, the dataset is diverse in terms of different wake words, voice commands, and recording environments.
Different Automobile Related Wake Words: Hey Mercedes, Hey BMW, Hey Porsche, Hey Volvo, Hey Audi, Hi Genesis, Hey Mini, Hey Toyota, Ok Ford, Hey Hyundai, Ok Honda, Hello Kia, Hey Dodge.
Different Cars: Data collection was carried out in different types and models of cars.
Different Types of Voice Commands:
- Navigational Voice Commands
- Mobile Control Voice Commands
- Car Control Voice Commands
- Multimedia & Entertainment Commands
- General, Question Answer, Search Commands
Recording Time: Participants recorded the given prompts at various times to make the dataset more diverse.
- Morning
- Afternoon
- Evening
Recording Environment: Various recording environments were captured to acquire more realistic data and to make the dataset inclusive of various types of noises. Some of the environment variables are as follows:
- Noise Level: Silent, Low Noise, Moderate Noise, High Noise
- Parking Location: Indoor, Outdoor
- Car Windows: Open, Closed
- Car AC: On, Off
- Car Engine: On, Off
- Car Movement: Stationary, Moving
The dataset provides comprehensive metadata for each audio recording and participant:
Participant Metadata: Unique identifier, age, gender, country, state, district, accent, and dialect.
Other Metadata: Recording transcript, recording environment, device details, sample rate, bit depth, file format, recording time.
This metadata is a powerful tool for understanding and characterizing the data, enabling informed decision-making in the development of German voice assistant speech recognition models.
This In-car Speech Dataset is a valuable resource for various applications in the field of in-car voice recognition and AI-driven voice technology. This dataset can be leveraged to enhance the performance and functionality of voice-activated systems across different domains.
Speech Recognition Model Training: Provides high-quality audio data for training models to accurately recognize and respond to in-car voice commands.
Safety and Emergency Response: Supports the development of systems that recognize and respond to emergency commands and safety alerts.
Driver Assistance: Facilitates the creation of advanced driver-assistance systems (ADAS) that leverage voice commands for hands-free operation.
Our proprietary data collection platform, “Yugo,” was used throughout the process of this dataset creation.
Throughout the data collection process, the data remained within our secure platform and did not leave our environment, ensuring data security and confidentiality.
The data collection process adhered to strict ethical guidelines, ensuring the privacy and consent of all participants.
It does not include any personally identifiable information about any participant, which makes the dataset safe to use.
Understanding the importance of diverse environments for robust voice assistant models, our in-car voice dataset is regularly updated with new audio data captured in various real-world conditions.
Customization & Custom Collection Options:
- Environmental Conditions: Custom collection in specific environmental conditions upon request.
- Sample Rates: Customizable from 8kHz to 48kHz.
- Diverse Pace: Custom collection can be done at varied speaking paces upon request.
- Device Specific: Recording can be done with a specific mobile brand or operating system.
This German In-car audio dataset is created by FutureBeeAI and is available for commercial use.
Attribution 4.0 (CC BY 4.0) — https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Generation of multiple true-false questions
This project provides a natural language pipeline that takes German textbook sections as input and generates multiple true-false questions using GPT-2.
Assessments are an important part of the learning cycle and enable the development and promotion of competencies. However, the manual creation of assessments is very time-consuming. Therefore, the number of tasks in learning systems is often limited. In this repository, we provide an algorithm that can automatically generate an arbitrary number of German True False statements from a textbook using the GPT-2 model. The algorithm was evaluated with a selection of textbook chapters from four academic disciplines (see `data` folder) and rated by individual domain experts. One-third of the generated MTF Questions are suitable for learning. The algorithm provides instructors with an easier way to create assessments on chapters of textbooks to test factual knowledge.
As a type of Multiple-Choice question, Multiple True False (MTF) Questions are, among other question types, a simple and efficient way to objectively test factual knowledge. The learner is challenged to distinguish between true and false statements. MTF questions can be presented differently, e.g. by locating a true statement from a series of false statements, identifying false statements among a list of true statements, or separately evaluating each statement as either true or false. Learners must evaluate each statement individually because a question stem can contain both incorrect and correct statements. Thus, MTF Questions as a machine-gradable format have the potential to identify learners’ misconceptions and knowledge gaps.
Example MTF question:
Check the correct statements:
[ ] All trees have green leaves.
[ ] Trees grow towards the sky.
[ ] Leaves can fall from a tree.
Features
- generation of false statements
- automatic selection of true statements
- selection of an arbitrary similarity for true and false statements as well as the number of false statements
- generating false statements by adding or deleting negations as well as using a German GPT-2
Setup
Installation
1. Create a new environment: `conda create -n mtfenv python=3.9`
2. Activate the environment: `conda activate mtfenv`
3. Install dependencies using anaconda:
```
conda install -y -c conda-forge pdfplumber
conda install -y -c conda-forge nltk
conda install -y -c conda-forge pypdf2
conda install -y -c conda-forge pylatexenc
conda install -y -c conda-forge packaging
conda install -y -c conda-forge transformers
conda install -y -c conda-forge essential_generators
conda install -y -c conda-forge xlsxwriter
# spacy is required for the model download in the next step
conda install -y -c conda-forge spacy
```
4. Download the spaCy language model: `python3.9 -m spacy download de_core_news_lg`
Getting started
After installation, you can execute the bash script `bash run.sh` in the terminal to compile MTF questions for the provided textbook chapters.
To create MTF questions for your own texts use the following command:
`python3 main.py --answers 1 --similarity 0.66 --input ./`
The parameter `answers` indicates how many false answers should be generated.
By configuring the parameter `similarity` you can determine what portion of a sentence should remain the same. The remaining portion will be extracted and used to generate a false part of the sentence.
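To illustrate the semantics of the `similarity` parameter described above, here is a hedged sketch: the first `similarity` fraction of a sentence's tokens is kept verbatim, and the remainder is the span that would be replaced with generated text (the actual implementation in `main.py` may tokenize and split differently).

```python
def split_by_similarity(sentence: str, similarity: float) -> tuple[str, str]:
    """Keep the first `similarity` fraction of tokens; the remainder is the
    portion that would be regenerated (e.g. by GPT-2) to form a false statement."""
    tokens = sentence.split()
    keep = max(1, round(len(tokens) * similarity))
    return " ".join(tokens[:keep]), " ".join(tokens[keep:])

kept, to_replace = split_by_similarity(
    "Trees grow towards the sky because their shoots are negatively gravitropic",
    0.66,
)
print(kept)        # unchanged prefix of the statement
print(to_replace)  # suffix to be replaced with a generated false ending
```

With `similarity 0.66`, roughly two-thirds of the sentence stays intact, so the generated false statement remains superficially close to the original true one.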
History and roadmap
* Outlook third iteration: Automatic augmentation of text chapters with generated questions
* Second iteration: Generation of multiple true-false questions with improved text summarizer and German GPT2 sentence generator
* First iteration: Generation of multiple true false questions in the Bachelor thesis of Mirjam Wiemeler
Publications, citations, license
Publications
Citation of the Dataset
The source code and data are maintained at GitHub: https://github.com/D2L2/multiple-true-false-question-generation
Contact
License Distributed under the MIT License. See [LICENSE.txt](https://gitlab.pi6.fernuni-hagen.de/la-diva/adaptive-assessment/generationofmultipletruefalsequestions/-/blob/master/LICENSE.txt) for more information.
Acknowledgments This research was supported by CATALPA - Center of Advanced Technology for Assisted Learning and Predictive Analytics of the FernUniversität in Hagen, Germany.
This project was carried out as part of research in the CATALPA project [LA DIVA](https://www.fernuni-hagen.de/forschung/schwerpunkte/catalpa/forschung/projekte/la-diva.shtml)
Abstract
The Urban Green Raster Germany is a land cover classification for Germany that addresses in particular the urban vegetation areas. The raster dataset covers the terrestrial national territory of Germany and has a spatial resolution of 10 meters. The dataset is based on a fully automated classification of Sentinel-2 satellite data from a full 2018 vegetation period using reference data from the European LUCAS land use and land cover point dataset. The dataset identifies eight land cover classes. These include Built-up, Built-up with significant green share, Coniferous wood, Deciduous wood, Herbaceous vegetation (low perennial vegetation), Water, Open soil, Arable land (low seasonal vegetation). The land cover dataset provided here is offered as an integer raster in GeoTiff format. The assignment of the number coding to the corresponding land cover class is explained in the legend file.
Data acquisition
The data acquisition comprises two main processing steps: (1) Collection, processing, and automated classification of the multispectral Sentinel-2 satellite data with the “Land Cover DE method”, resulting in the raw land cover classification dataset, NDVI layer, and RF assignment frequency vector raster. (2) GIS-based postprocessing, including discrimination of (densely) built-up and loosely built-up pixels according to an NDVI threshold, creation of water-body and arable-land masks from geo-topographical base data (ATKIS Basic DLM), and reclassification of water and arable-land pixels based on the assignment frequency.
Data collection
Satellite data were searched and downloaded from the Copernicus Open Access Hub (https://scihub.copernicus.eu/).
The LUCAS reference and validation points were loaded from the Eurostat platform (https://ec.europa.eu/eurostat/web/lucas/data/database).
The processing of the satellite data was performed at the DLR data center in Oberpfaffenhofen.
GIS-based post-processing of the automatic classification result was performed at IOER in Dresden.
Value of the data
The dataset can be used to quantify the amount of green areas within cities on a homogeneous data base [5].
Thus it is possible to compare cities of different sizes regarding their greenery and with respect to their ratio of green and built-up areas [6].
Built-up areas within cities can be discriminated regarding their built-up density (dense built-up vs. built-up with higher green share).
Data description
A raster dataset in GeoTIFF format: The dataset is stored as an 8-bit integer raster with values ranging from 1 to 8 for the eight land cover classes. The coded values are: 1 = Built-up, 2 = Open soil, 3 = Coniferous wood, 4 = Deciduous wood, 5 = Arable land (low seasonal vegetation), 6 = Herbaceous vegetation (low perennial vegetation), 7 = Water, 8 = Built-up with significant green share. Name of the file: ugr2018_germany.tif. The dataset is zipped together with accompanying files: *.tfw (geo-referencing world file), *.ovr (overlay file for quick data preview in GIS), *.clr (color map file).
A text file with the integer value assignment of the land cover classes. Name of the file: Legend_LC-classes.txt.
Experimental design, materials and methods
The first essential step to create the dataset is the automatic classification of a satellite image mosaic of all available Sentinel-2 images from May to September 2018 with a maximum cloud cover of 60 percent. Points from the 2018 LUCAS (Land use and land cover survey) dataset from Eurostat [1] were used as reference and validation data. Using Random Forest (RF) classifier [2], seven land use classes (Deciduous wood, Coniferous wood, Herbaceous vegetation (low perennial vegetation), Built-up, Open soil, Water, Arable land (low seasonal vegetation)) were first derived, which is methodologically in line with the procedure used to create the dataset "Land Cover DE - Sentinel-2 - Germany, 2015" [3]. The overall accuracy of the data is 93 % [4].
Two downstream post-processing steps served to further qualify the product. The first step included the selective verification of pixels of the classes arable land and water. These are often misidentified by the classifier due to radiometric similarities with other land covers; in particular, radiometric signatures of water surfaces often resemble shadows or asphalt surfaces. Due to the heterogeneous inner-city structures, pixels are also frequently misclassified as cropland.
To mitigate these errors, all pixels classified as water and arable land were matched with another data source. This consisted of binary land cover masks for these two land cover classes originating from the Monitor of Settlement and Open Space Development (IOER Monitor). For all water and cropland pixels that were outside of their respective masks, the frequencies of class assignments from the RF classifier were checked. If the assignment frequency to water or arable land was at least twice that to the subsequent class, the classification was preserved. Otherwise, the classification strength was considered too weak and the pixel was recoded to the land cover with the second largest assignment frequency.
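The "at least twice the runner-up" rule described above can be sketched per pixel as follows. This is an illustrative reimplementation of the stated logic on single pixels with plain dictionaries, not the actual raster-based workflow; the class IDs follow the dataset legend (7 = Water, 1 = Built-up).

```python
def reclassify(label: int, freqs: dict[int, int], inside_mask: bool) -> int:
    """Post-process a water/arable pixel per the rule described above:
    a pixel inside its class mask keeps its label; outside the mask, the label
    survives only if its RF assignment frequency is at least twice that of the
    runner-up class, otherwise the pixel is recoded to the runner-up."""
    if inside_mask:
        return label
    ranked = sorted(freqs, key=freqs.get, reverse=True)
    first, second = ranked[0], ranked[1]
    if freqs[first] >= 2 * freqs[second]:
        return label
    return second

# Strong vote (80 vs 30): the Water label is preserved.
print(reclassify(7, {7: 80, 1: 30}, inside_mask=False))  # → 7
# Weak vote (45 vs 30, less than 2x): recoded to the runner-up, Built-up.
print(reclassify(7, {7: 45, 1: 30}, inside_mask=False))  # → 1
```

In the real workflow this decision is evaluated over the RF assignment frequency vector raster for every flagged pixel.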
Furthermore, an additional land cover class "Built-up with significant vegetation share" was introduced. For this purpose, all pixels of the Built-up class were intersected with the NDVI of the satellite image mosaic and assigned to the new category if an NDVI threshold was exceeded in the pixel. The associated NDVI threshold was previously determined using highest resolution reference data of urban green structures in the cities of Dresden, Leipzig and Potsdam, which were first used to determine the true green fractions within the 10m Sentinel pixels, and based on this to determine an NDVI value that could be used as an indicator of a significant green fraction within the built-up pixel. However, due to the wide dispersion of green fraction values within the built-up areas, it is not possible to establish a universally valid green percentage value for the land cover class of Built-up with significant vegetation share. Thus, the class essentially serves to the visual differentiability of densely and loosely (i.e., vegetation-dominated) built-up areas.
Acknowledgments
This work was supported by the Federal Institute for Research on Building, Urban Affairs and Spatial Development (BBSR) [10.06.03.18.101].The provided data has been developed and created in the framework of the research project “Wie grün sind bundesdeutsche Städte?- Fernerkundliche Erfassung und stadträumlich-funktionale Differenzierung der Grünausstattung von Städten in Deutschland (Erfassung der urbanen Grünausstattung)“ (How green are German cities?- Remote sensing and urban-functional differentiation of the green infrastructure of cities in Germany (Urban Green Infrastructure Inventory)). Further persons involved in the project were: Fabian Dosch (funding administrator at BBSR), Stefan Fina (research partner, group leader at ILS Dortmund), Annett Frick, Kathrin Wagner (research partners at LUP Potsdam).
References
[1] Eurostat (2021): Land cover / land use statistics database LUCAS. URL: https://ec.europa.eu/eurostat/web/lucas/data/database
[2] L. Breiman (2001). Random forests, Mach. Learn., 45, pp. 5-32
[3] M. Weigand, M. Wurm (2020). Land Cover DE - Sentinel-2—Germany, 2015 [Data set]. German Aerospace Center (DLR). doi: 10.15489/1CCMLAP3MN39
[4] M. Weigand, J. Staab, M. Wurm, H. Taubenböck, (2020). Spatial and semantic effects of LUCAS samples on fully automated land use/land cover classification in high-resolution Sentinel-2 data. Int J Appl Earth Obs, 88, 102065. doi: https://doi.org/10.1016/j.jag.2020.102065
[5] L. Eichler., T. Krüger, G. Meinel, G. (2020). Wie grün sind deutsche Städte? Indikatorgestützte fernerkundliche Erfassung des Stadtgrüns. AGIT Symposium 2020, 6, 306–315. doi: 10.14627/537698030
[6] H. Taubenböck, M. Reiter, F. Dosch, T. Leichtle, M. Weigand, M. Wurm (2021). Which city is the greenest? A multi-dimensional deconstruction of city rankings. Comput Environ Urban Syst, 89, 101687. doi: 10.1016/j.compenvurbsys.2021.101687
This is NOT a raw population dataset. We use our proprietary stack to combine detailed 'WorldPop' UN-adjusted, sex and age structured population data with a spatiotemporal OD matrix.
The result is a dataset where each record indicates how many people can be reached in a fixed timeframe (4 Hours in this case) from that record's location.
The dataset is broken down into sex and age bands at 5 year intervals, e.g - male 25-29 (m_25) and also contains a set of features detailing the representative percentage of the total that the count represents.
The dataset provides 76,174 records, one for each sampled location. These are labelled with an h3 index at resolution 7, which allows easy plotting and filtering in Kepler.gl / Deck.gl / Mapbox, or easy conversion to a centroid (lat/lng) or the hexagonal cell geometry for integration with your geospatial applications and analyses.
An h3 resolution of 7 corresponds to a hexagonal cell area of approximately 1.9928 sq miles (5.1613 sq km).
Higher resolutions or alternate geographies are available on request.
More information on the h3 system is available here: https://eng.uber.com/h3/
WorldPop data provides population counts on a grid at 1 arc-second intervals and is available for every geography.
More information on the WorldPop data is available here: https://www.worldpop.org/
One of the main historical use cases has been prospecting for site selection, comparative analysis, and network validation by asset investors and logistics companies. The data structure makes it simple to filter out areas that do not meet a requirement such as being able to access 70% of the German population within 4 hours by truck, and to show only the areas that do exhibit this characteristic.
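The 70%-reachability filter described above reduces to a simple threshold over the per-record totals. The sketch below assumes illustrative field names (`h3_7`, `total`) and toy records; the actual column names and population reference figure may differ.

```python
# Hypothetical records following the described schema: an h3 cell id at
# resolution 7 and the total population reachable within the timeframe.
records = [
    {"h3_7": "871f1d489ffffff", "total": 62_000_000},  # dense, well-connected cell
    {"h3_7": "871f1d48affffff", "total": 41_000_000},  # remote cell
]

GERMAN_POPULATION = 83_200_000  # rough recent figure, used only for the 70% example

# Keep only cells from which at least 70% of the German population
# is reachable within the dataset's fixed 4-hour timeframe.
reachable_70 = [r for r in records if r["total"] / GERMAN_POPULATION >= 0.70]
print([r["h3_7"] for r in reachable_70])
```

The surviving h3 indexes can be fed directly to Kepler.gl or converted to centroids for downstream analysis.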
Clients often combine different datasets either for different timeframes of interest, or to understand different populations, such as that of the unemployed, or those with particular qualifications within areas reachable as a commute.
https://www.futurebeeai.com/policies/ai-data-license-agreement
Introducing the German Scripted Monologue Speech Dataset for the Healthcare Domain, a voice dataset built to accelerate the development and deployment of German language automatic speech recognition (ASR) systems, with a sharp focus on real-world healthcare interactions.
This dataset includes over 6,000 high-quality scripted audio prompts recorded in German, representing typical voice interactions found in the healthcare industry. The data is tailored for use in voice technology systems that power virtual assistants, patient-facing AI tools, and intelligent customer service platforms.
The prompts span a broad range of healthcare-specific interactions, such as:
To maximize authenticity, the prompts integrate linguistic elements and healthcare-specific terms such as:
These elements make the dataset exceptionally suited for training AI systems to understand and respond to natural healthcare-related speech patterns.
Every audio recording is accompanied by a verbatim, manually verified transcription.
The number of small and medium-sized enterprises in Germany is forecast to increase continuously between 2024 and 2029 by a total of 0.8 thousand enterprises (+0.38 percent). According to this forecast, the number will have increased for the sixth consecutive year, reaching 212.45 thousand enterprises in 2029. The OECD defines an enterprise as the smallest combination of legal units, i.e. an organisational unit producing goods or services that benefits from a degree of autonomy with regard to the allocation of resources and decision making. Shown here are small and medium-sized enterprises, defined as companies with 1-249 employees. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic, and technological environment in more than 150 countries and regions worldwide. All input data are sourced from international institutions, national statistical offices, and trade associations. All data are processed to generate comparable datasets (see supplementary notes under details for more information). Find more key insights for the number of small and medium-sized enterprises in countries like Austria and Switzerland.
Attribution 4.0 (CC BY 4.0) — https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents the distribution of median household income among distinct age brackets of householders in Germany township. Based on the latest 2019-2023 5-Year Estimates from the American Community Survey, it displays how income varies among householders of different ages in Germany township. It showcases how household incomes typically rise as the head of the household gets older. The dataset can be utilized to gain insights into age-based household income trends and explore the variations in incomes across households.
Key observations: Insights from 2023
In terms of income distribution across age cohorts, in Germany township, the median household income stands at $139,318 for householders within the 45 to 64 years age group, followed by $111,071 for the 25 to 44 years age group. Notably, householders within the 65 years and over age group, had the lowest median household income at $66,250.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. All incomes have been adjusted for inflation and are presented in 2023-inflation-adjusted dollars.
Age groups classifications include:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is part of the main dataset for Germany township median household income by age. You can refer to it here.
Attribution 4.0 (CC BY 4.0) — https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset consists of 1,759,830 multi-spectral image patches from the Sentinel-2 mission, annotated with image- and pixel-level land cover and land usage labels from the German land cover model LBM-DE2018, with land cover classes based on the CORINE Land Cover database (CLC) 2018. It includes pixel-synchronous examples from each of the four seasons, plus an additional snowy set, spanning the time from April 2018 to February 2019. The patches were taken from 519,547 unique locations, covering the whole surface area of Germany, with each patch covering an area of 1.2km x 1.2km. The set is split into two overlapping grids of roughly 880,000 samples each, shifted by half the patch size in both dimensions. Within each grid, the images do not overlap.
Contents
Each sample includes:
3 10m resolution bands (RGB), 120px x 120px
1 10m resolution band (infrared), 120px x 120px
6 20m resolution bands, 60px x 60px
2 60m resolution bands, 20px x 20px
1 pixel-level label map
2 binary masks for cloud and snow coverage
2 binary masks for easy and medium segmentation difficulties, marks areas <300px and <100px respectively
1 JSON-file containing additional meta-information
The meta.csv contains the following information about each sample:
Which season it belongs to
Which of the two grids it belongs to
Coordinates of the patch center
Whether it was acquired from Sentinel-2 Satellite A or B
Date and time of image acquisition
Snow and cloud coverage percentages
Image-level multi-class labels
Three additional image-level urbanization labels, based on the center pixel (details below)
The path to the sample
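As a sketch of how the meta.csv fields above might be used for filtering, the following builds a tiny in-memory stand-in and selects nearly cloud-free patches from one grid. The column names here are assumptions based on the field list, not the dataset's actual schema:

```python
import pandas as pd

# Hypothetical subset of meta.csv rows; column names are assumptions
# derived from the fields described above.
meta = pd.DataFrame([
    {"season": "spring", "grid": 1, "cloud_cover": 5.0,  "snow_cover": 0.0,  "path": "spring/grid1/0001"},
    {"season": "winter", "grid": 2, "cloud_cover": 2.0,  "snow_cover": 80.0, "path": "winter/grid2/0002"},
    {"season": "summer", "grid": 1, "cloud_cover": 60.0, "snow_cover": 0.0,  "path": "summer/gr1/0003".replace("gr1", "grid1")},
])

# Keep only nearly cloud-free patches from grid 1, e.g. to build a clean
# training split from images that do not overlap each other.
clean = meta[(meta["grid"] == 1) & (meta["cloud_cover"] < 10.0)]
print(list(clean["path"]))  # → ['spring/grid1/0001']
```

Restricting to a single grid avoids mixing the two half-patch-shifted grids, whose samples overlap across grids but not within one.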
Classes
ID | Class |
1 | Continuous urban fabric |
2 | Discontinuous urban fabric |
3 | Industrial or commercial units |
4 | Road and rail networks and associated land |
5 | Port areas |
6 | Airports |
7 | Mineral extraction sites |
8 | Dump sites |
9 | Construction sites |
10 | Green urban areas |
11 | Sport and leisure facilities |
12 | Non-irrigated arable land |
13 | Vineyards |
14 | Fruit trees and berry plantations |
15 | Pastures |
16 | Broad-leaved forest |
17 | Coniferous forest |
18 | Mixed forest |
19 | Natural grasslands |
20 | Moors and heathland |
21 | Transitional woodland/shrub |
22 | Beaches, dunes, sands |
23 | Bare rock |
24 | Sparsely vegetated areas |
25 | Inland marshes |
26 | Peat bogs |
27 | Salt marshes |
28 | Intertidal flats |
29 | Water courses |
30 | Water bodies |
31 | Coastal lagoons |
32 | Estuaries |
33 | Sea and ocean |
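For convenience, the ID-to-name mapping above can be expressed as a lookup for decoding label-map pixel values (a sketch; the dataset itself does not ship this helper):

```python
# CORINE-derived land cover classes, copied from the table above.
CLC_CLASSES = {
    1: "Continuous urban fabric",
    2: "Discontinuous urban fabric",
    3: "Industrial or commercial units",
    4: "Road and rail networks and associated land",
    5: "Port areas",
    6: "Airports",
    7: "Mineral extraction sites",
    8: "Dump sites",
    9: "Construction sites",
    10: "Green urban areas",
    11: "Sport and leisure facilities",
    12: "Non-irrigated arable land",
    13: "Vineyards",
    14: "Fruit trees and berry plantations",
    15: "Pastures",
    16: "Broad-leaved forest",
    17: "Coniferous forest",
    18: "Mixed forest",
    19: "Natural grasslands",
    20: "Moors and heathland",
    21: "Transitional woodland/shrub",
    22: "Beaches, dunes, sands",
    23: "Bare rock",
    24: "Sparsely vegetated areas",
    25: "Inland marshes",
    26: "Peat bogs",
    27: "Salt marshes",
    28: "Intertidal flats",
    29: "Water courses",
    30: "Water bodies",
    31: "Coastal lagoons",
    32: "Estuaries",
    33: "Sea and ocean",
}

def class_name(class_id: int) -> str:
    """Map a label-map pixel value to its land cover class name."""
    return CLC_CLASSES[class_id]

print(class_name(17))  # → Coniferous forest
```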
Urbanization classes
SLRAUM
0: None
1: Ländlicher Raum (~ rural area)
2: Städtischer Raum (~ urban area)
RTYP3
0: None
1: Ländliche Regionen (~ rural areas)
2: Regionen mit Verstädterungsansätzen (~ urbanizing areas)
3: Städtische Regionen (~ urban areas)
KTYP4
0: None
1: Dünn besiedelte ländliche Kreise (~ sparsely populated rural districts)
2: Kreisfreie Großstädte (~ independent large cities)
3: Ländliche Kreise mit Verdichtungsansätzen (~ rural districts with densification tendencies)
4: Städtische Kreise (~ urban districts)
Further information on the urbanization classes can be found here:
SLRAUM
RTYP3
KTYP4
License of landcover model
Bundesamt für Kartographie und Geodäsie
dl-de/by-2-0 from https://www.govdata.de/dl-de/by-2-0
© GeoBasis-DE / BKG 2022
Source of landcover model
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Government Revenues in Germany increased to 590.21 EUR Billion in the fourth quarter of 2024 from 485.91 EUR Billion in the third quarter of 2024. This dataset provides Germany Government Revenues: actual values, historical data, forecast, chart, statistics, economic calendar and news.
https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the German General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of German speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world German communication.
Curated by FutureBeeAI, this 30-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade German speech models that understand and respond to authentic German accents and dialects.
The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of German. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.
The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.
Each audio file is paired with a human-verified, verbatim transcription available in JSON format.
These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.
The dataset comes with granular metadata for both speakers and recordings:
Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.
This dataset is a versatile resource for multiple German speech and language AI applications:
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Germany Demo is a dataset for object detection tasks - it contains Person annotations for 586 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Exports in Germany decreased to 130.20 EUR Billion in July from 130.90 EUR Billion in June of 2025. This dataset provides Germany Exports: actual values, historical data, forecast, chart, statistics, economic calendar and news.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
License Plate Recognition - 177,827 Images
This dataset provides 177,827 vehicle images captured in Germany, serving as a robust foundation for license plate recognition, license plate detection, and OCR tasks, supporting autonomous vehicles, traffic management, and smart city applications. - Get the data
Dataset characteristics:
Characteristic Data
Description License plate images with labeling for OCR tasks
Data types Image
Tasks Detection… See the full description on the dataset page: https://huggingface.co/datasets/ud-smart-city/germany-license-plate-dataset.
In this study, the development of the prices of grain, the staple food throughout Germany since the 17th century, is presented from the end of the 18th century onwards. This survey was carried out within the scope of a general historical examination of wholesale prices in Germany by the Reich Statistical Office ("Statistisches Reichsamt") in cooperation with the German Institute for Economic Research ("Institut für Konjunkturforschung"). Since the records of prices for rye, wheat, barley, and oats available in the primary sources are incomplete for the whole of the above-mentioned period, several values have been converted in order to make a comparison possible (conversion into German mark/Reichsmark per 1,000 kg). Furthermore, index numbers for the German grain prices have been calculated so that a continuous development becomes visible (base year: 1913 = 100). Apart from grain harvests and consumption in Germany since 1878/79, the study also gives an overview of the foreign trade in rye, wheat, barley, and oats.
Topics: List of data tables within the HISTAT research and download system:
A. Grain harvest, foreign trade, and consumption in Germany: rye, wheat, barley, and oats (1836–1934).
B. Index numbers of grain prices in Germany, 1913 = 100 (1792–1934).
C. Prices of different types of grain: Germany, other countries, and world market (rye, wheat, barley, and oats, 1,000 kg in German mark and Reichsmark) (1836–1934).
Sources: One part of these documents was taken from earlier publications by the Statistisches Reichsamt and the former Prussian Statistical Authority (Preußisches Statistisches Landesamt), as well as from other official and non-official bodies; a second part has been retrieved from official files.
Grain prices: cf. Jacobs, A. / Richter, H., 1935: Die Großhandelspreise in Deutschland von 1792 bis 1934 ("Wholesale Prices in Germany 1792–1934"). Sonderhefte des Instituts für Konjunkturforschung, 37. Berlin: Hanseat. Verl.-Anst. Hamburg, pp. 52–55.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book subjects. It has 7 rows and is filtered where the books is The conquest of nature : water, landscape and the making of modern Germany. It features 4 columns, including authors, books, and publication dates.