100+ datasets found

German Credit Dataset Dataset
paperswithcode.com
Cite
German Credit Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/german-credit-dataset
Description
Two datasets are provided. The original dataset, in the form provided by Prof. Hofmann, contains categorical/symbolic attributes and is in the file "german.data".

For algorithms that need numerical attributes, Strathclyde University produced the file "german.data-numeric". This file has been edited and several indicator variables added to make it suitable for algorithms which cannot cope with categorical variables. Several attributes that are ordered categorical (such as attribute 17) have been coded as integer. This was the form used by StatLog.
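
As a rough illustration of the two encodings mentioned above, the sketch below expands an unordered categorical attribute into indicator variables and maps an ordered categorical attribute to integers. The category names and their ordering are hypothetical placeholders, not the actual codes in "german.data" or the conversion used for "german.data-numeric".

```python
def indicator_encode(values, categories):
    """Expand one categorical column into one 0/1 indicator column per category."""
    return [[1 if v == c else 0 for c in categories] for v in values]


def ordinal_encode(values, ordered_categories):
    """Map an ordered categorical column to integers that follow the given order."""
    rank = {c: i + 1 for i, c in enumerate(ordered_categories)}
    return [rank[v] for v in values]


# Hypothetical example values (not taken from german.data).
housing = ["rent", "own", "for_free", "own"]
job_level = ["unskilled", "skilled", "highly_skilled", "skilled"]

housing_indicators = indicator_encode(housing, ["rent", "own", "for_free"])
job_codes = ordinal_encode(job_level, ["unskilled", "skilled", "highly_skilled"])
```

Indicator columns avoid imposing an artificial order on unordered categories, while integer codes keep the order of attributes where one exists.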

This dataset requires use of a cost matrix:

              Predicted Good   Predicted Bad
Actual Good         0                1
Actual Bad          5                0

The rows represent the actual classification and the columns the predicted classification.

It is worse to class a customer as good when they are bad (5), than it is to class a customer as bad when they are good (1).
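
A minimal sketch of how the cost matrix above could be applied, assuming string labels "good" and "bad" and a model that outputs a probability of "bad". The function names are illustrative and are not part of the dataset or of StatLog.

```python
# Cost matrix from the description, indexed as COST[actual][predicted].
COST = {
    "good": {"good": 0, "bad": 1},
    "bad": {"good": 5, "bad": 0},
}


def total_cost(actual, predicted):
    """Sum the misclassification cost over paired actual/predicted labels."""
    return sum(COST[a][p] for a, p in zip(actual, predicted))


def min_cost_label(p_bad):
    """Pick the label with the lower expected cost, given P(actual = bad)."""
    p_good = 1.0 - p_bad
    cost_predict_good = p_good * COST["good"]["good"] + p_bad * COST["bad"]["good"]
    cost_predict_bad = p_good * COST["good"]["bad"] + p_bad * COST["bad"]["bad"]
    return "bad" if cost_predict_bad < cost_predict_good else "good"


# Hypothetical labels for illustration only.
actual = ["good", "bad", "bad", "good"]
predicted = ["good", "good", "bad", "bad"]
example_cost = total_cost(actual, predicted)  # 0 + 5 + 0 + 1
```

With these costs, predicting "bad" has the lower expected cost whenever the estimated probability of "bad" exceeds 1/6, so a cost-aware classifier would flag customers well below the usual 0.5 threshold.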
German Credit Scoring Data
kaggle.com
Updated Jan 17, 2024
Cite
Elshan Kazim (2024). German Credit Scoring Data [Dataset]. https://www.kaggle.com/datasets/elsnkazm/german-credit-scoring-data
Dataset updated
Jan 17, 2024
Dataset provided by
Kaggle
Authors
Elshan Kazim
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Context

This dataset classifies people described by a set of attributes as good or bad credit risks. Link to the original dataset: German Credit Data

Dataset characteristics: Multivariate
Number of instances: 1000
Number of features: 20

Because the original dataset stores its categorical features as opaque codes, it is hard to interpret directly, so we have mapped those codes to descriptive labels.

Content

Features and explanations

checking_acc_status (categorical) - Status of existing checking account

below_0: ... < 0 DM

below_200: 0 <= ... < 200 DM

above_200: ... >= 200 DM / salary assignments for at least 1 year

no_checking_acc: no checking account

duration (numeric) - Agreed Loan Duration in months

cred_hist (categorical) - Credit history status

no_loan_or_paid_duly_other: no credits taken/ all credits paid back duly

paid_duly_this_bank: all credits at this bank paid back duly

curr_loans_paid_duly: existing credits paid back duly till now

delay_in_past: delay in paying off in the past

risky_acc_or_curr_loan_other: critical account/ other credits existing (not at this bank)

purpose (categorical) - Loan Request Purpose

car_new: car (new)

car_used: car (used)

furniture_equipment: furniture/equipment

radio_tv: radio/television

domestic_appliance: domestic appliances

repairs: repairs

education: education

retraining: retraining

business: business

others: others

loan_amt (numerical) - Credit amount

saving_acc_bonds (categorical) - Savings account/bonds

below_100: ... < 100 DM

below_500: 100 <= ... < 500 DM

below_1000: 500 <= ... < 1000 DM

above_1000: ... >= 1000 DM

unknown_no_saving_acc: unknown/ no savings account

present_employment_since (categorical) - Present employment since

unemployed: unemployed

below_1y: ... < 1 year

below_4y: 1 <= ... < 4 years

below_7y: 4 <= ... < 7 years

above_7y: ... >= 7 years

installment_rate (numerical) - Installment rate in percentage of disposable income

personal_stat_gender (categorical) - Personal status and sex

male_divorced_separated

female_divorced_separated_married

male_single

male_married_widowed

female_single

other_debtors_guarantors (categorical: co-applicant, guarantor, none)

present_residence_since (numerical)

property (categorical)

real_estate

life_insurance_or_agreements: if not real_estate: building society savings agreement/ life insurance

car_or_other: if neither of the above: car or other property, not covered by attribute 6 (saving_acc_bonds)

unknown_or_no_property: unknown / no property

age (numerical)

other_installment_plans (categorical: bank, stores, none)

housing (categorical: rent, own, for_free)

num_curr_loans - Number of existing credits at this bank

job (categorical)

unemployed_non_resident: unemployed/ unskilled - non-resident

unskilled_resident: unskilled - resident

skilled_official: skilled employee / official

management_or_self_emp: management/ self-employed/highly qualified employee/ officer

num_people_provide_maint (numerical) - Number of people the applicant is liable to provide maintenance for

telephone (categorical)

is_foreign_worker (categorical) - Indicates whether the individual is a foreign worker
German Credit Risk
kaggle.com
Updated Dec 14, 2016
Cite
UCI Machine Learning (2016). German Credit Risk [Dataset]. https://www.kaggle.com/uciml/german-credit/code
Dataset updated
Dec 14, 2016
Dataset provided by
Kaggle
Authors
UCI Machine Learning
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
Context

The original dataset contains 1000 entries with 20 categorical/symbolic attributes prepared by Prof. Hofmann. In this dataset, each entry represents a person who takes out credit from a bank. Each person is classified as a good or bad credit risk according to the set of attributes. The link to the original dataset can be found below.

Content

It is almost impossible to understand the original dataset due to its complicated system of categories and symbols. Thus, I wrote a small Python script to convert it into a readable CSV file. Several columns are simply ignored, because in my opinion either they are not important or their descriptions are obscure. The selected attributes are:

Age (numeric)

Sex (text: male, female)

Job (numeric: 0 - unskilled and non-resident, 1 - unskilled and resident, 2 - skilled, 3 - highly skilled)

Housing (text: own, rent, or free)

Saving accounts (text - little, moderate, quite rich, rich)

Checking account (numeric, in DM - Deutsche Mark)

Credit amount (numeric, in DM)

Duration (numeric, in months)

Purpose (text: car, furniture/equipment, radio/TV, domestic appliances, repairs, education, business, vacation/others)

Acknowledgements

Source: UCI
German Dataset
ig.shaip.com
Updated Jun 11, 2023
Cite
Shaip (2023). German Dataset [Dataset]. https://ig.shaip.com/offerings/speech-data-catalog/german-dataset/
Dataset updated
Jun 11, 2023
Dataset authored and provided by
Shaip
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
German Dataset (Deutscher Datensatz): high-quality German call-center and IVR dataset for AI and speech models. Available data: call-center data, IVR data.
German Dataset
universe.roboflow.com
zip
Updated Jun 15, 2023
Cite
faculty of engineering minia university (2023). German Dataset [Dataset]. https://universe.roboflow.com/faculty-of-engineering-minia-university/german-7b6mo
Available download formats: zip
Dataset updated
Jun 15, 2023
Dataset authored and provided by
faculty of engineering minia university
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Sign Bounding Boxes
Description
German

Overview

German is a dataset for object detection tasks. It contains Sign annotations for 898 images.

Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

License

This dataset is available under the CC BY 4.0 license (https://creativecommons.org/licenses/by/4.0/).
German-PD
huggingface.co
Updated Nov 6, 2024
Cite
PleIAs (2024). German-PD [Dataset]. https://huggingface.co/datasets/PleIAs/German-PD
Dataset updated
Nov 6, 2024
Dataset authored and provided by
PleIAs
Description
🇩🇪 German Public Domain 🇩🇪

German-Public Domain, or German-PD, is a large collection that aims to aggregate all German monographs and periodicals in the public domain. As of March 2024, it is the biggest German open corpus.

Dataset summary

The collection contains 260,638 individual texts making up 37,650,706,611 words recovered from multiple sources, including Internet Archive and various European national libraries and cultural heritage institutions. Each parquet file… See the full description on the dataset page: https://huggingface.co/datasets/PleIAs/German-PD.
Ten Thousand German News Articles Dataset
kaggle.com
tblock.github.io
zip
Updated Jan 20, 2022
Cite
Timo Block (2022). Ten Thousand German News Articles Dataset [Dataset]. https://www.kaggle.com/tblock/10kgnad
Available download formats: zip (21144764 bytes)
Dataset updated
Jan 20, 2022
Authors
Timo Block
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
(see https://tblock.github.io/10kGNAD/ for the original dataset page)

This page introduces the 10k German News Articles Dataset (10kGNAD), a German topic classification dataset. The 10kGNAD is based on the One Million Posts Corpus and is available under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. You can download the dataset here.

Why a German dataset?

English text classification datasets are common. Examples are the big AG News, the class-rich 20 Newsgroups, and the large-scale DBpedia ontology datasets for topic classification, as well as the commonly used IMDb and Yelp datasets for sentiment analysis. Non-English datasets, especially German datasets, are less common. There is a collection of sentiment analysis datasets assembled by the Interest Group on German Sentiment Analysis. However, to my knowledge, no German topic classification dataset is available to the public.

Due to grammatical differences between the English and the German language, a classifier might be effective on an English dataset but not as effective on a German dataset. German is more highly inflected than English, and long compound words are quite common. One would need to evaluate a classifier on multiple German datasets to get a sense of its effectiveness.

The dataset

The 10kGNAD dataset is intended to solve part of this problem as the first German topic classification dataset. It consists of 10273 German-language news articles from an Austrian online newspaper, categorized into nine topics. These articles are a previously unused part of the One Million Posts Corpus.

In the One Million Posts Corpus each article has a topic path, for example Newsroom/Wirtschaft/Wirtschaftpolitik/Finanzmaerkte/Griechenlandkrise. The 10kGNAD uses the second part of the topic path, here Wirtschaft, as the class label. As a result, the dataset can be used for multi-class classification.
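
The label extraction described above can be sketched in a few lines. This is an illustrative helper, not the author's extraction script; the function name is an assumption.

```python
def class_label(topic_path):
    """Return the second component of a slash-separated topic path."""
    parts = topic_path.split("/")
    if len(parts) < 2:
        raise ValueError(f"topic path has no second component: {topic_path!r}")
    return parts[1]


# Example path quoted in the dataset description.
label = class_label(
    "Newsroom/Wirtschaft/Wirtschaftpolitik/Finanzmaerkte/Griechenlandkrise"
)
```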

I created and used this dataset in my thesis to train and evaluate four text classifiers on the German language. By publishing the dataset I hope to support the advancement of tools and models for the German language. Additionally, this dataset can be used as a benchmark dataset for German topic classification.

Numbers and statistics

As in most real-world datasets, the class distribution of the 10kGNAD is not balanced. The biggest class, Web, consists of 1678 articles, while the smallest class, Kultur, contains only 539 articles. However, articles from the Web class have on average the fewest words, while articles from the Kultur class have the second most words.

Splitting into train and test

I propose a stratified split of 10% for testing and the remaining articles for training. To use the dataset as a benchmark dataset, please use the train.csv and test.csv files located in the project root.
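
For readers unfamiliar with stratified splitting, here is a minimal standard-library sketch that holds out roughly 10% of each class. It is not the author's split script, and the toy labels are made up; for benchmarking, the published train.csv and test.csv remain the files to use.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.1, seed=0):
    """Return (train_indices, test_indices), holding out test_fraction of each class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Toy labels for illustration only (not the real class sizes).
labels = ["Web"] * 30 + ["Kultur"] * 10
train_idx, test_idx = stratified_split(labels)
```

Sampling within each class keeps the class proportions of the test set close to those of the full dataset, which matters here because the classes are imbalanced.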

Code

Python scripts to extract the articles and split them into a train and a test set are available in the code directory of this project. Make sure to install the requirements. The original corpus.sqlite3 is required to extract the articles (download here (compressed) or here (uncompressed)).

License

This dataset is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Please consider citing the authors of the One Million Posts Corpus if you use the dataset.
German Dataset
la.shaip.com
Updated Dec 8, 2024
Cite
Shaip (2024). German Dataset [Dataset]. https://la.shaip.com/offerings/speech-data-catalog/german-dataset/
Dataset updated
Dec 8, 2024
Dataset authored and provided by
Shaip
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
German Dataset: high-quality German call-center and IVR dataset for AI and speech models. Available data: call-center data, IVR data.
German Fake News Dataset "GermanFakeNC"
live.european-language-grid.eu
explore.openaire.eu
json
Updated Apr 15, 2024
Cite
(2024). German Fake News Dataset "GermanFakeNC" [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/7564
Available download formats: json
Dataset updated
Apr 15, 2024
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Germany
Description
"GermanFakeNC" is a German Fake News Corpus including 490 texts which were retrieved from German alternative online media sources. Every fake statement in the text was verified claim-by-claim by authoritative sources (e.g. from local police authorities, scientific studies, the police press office, etc.). The time interval for most of the news is established from December 2015 to March 2018.
Steps to reproduce the data are described in the README file.
Please cite:
@inproceedings{TPDL_Vogel19,
  author = {Inna Vogel and Peter Jiang},
  title = {Fake News Detection with the New German Dataset "GermanFakeNC"},
  booktitle = {Digital Libraries for Open Knowledge - 23rd International Conference on Theory and Practice of Digital Libraries, {TPDL} 2019, Oslo, Norway, September 9-12, 2019, Proceedings},
  pages = {288--295},
  year = {2019},
  url = {https://doi.org/10.1007/978-3-030-30760-8_25},
  doi = {10.1007/978-3-030-30760-8_25},
}
Traffic German Dataset
universe.roboflow.com
zip
Updated Apr 29, 2024
Cite
Object detection (2024). Traffic German Dataset [Dataset]. https://universe.roboflow.com/object-detection-7sfqy/traffic-german
Available download formats: zip
Dataset updated
Apr 29, 2024
Dataset authored and provided by
Object detection
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Football Player Detection Bounding Boxes
Description
Traffic German

Overview

Traffic German is a dataset for object detection tasks. It contains Football Player Detection annotations for 6,523 images.

Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

License

This dataset is available under the CC BY 4.0 license (https://creativecommons.org/licenses/by/4.0/).
hatecheck-german
huggingface.co
Updated Jan 23, 2025
Cite
Paul Röttger (2025). hatecheck-german [Dataset]. https://huggingface.co/datasets/Paul/hatecheck-german
Dataset updated
Jan 23, 2025
Authors
Paul Röttger
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Card for Multilingual HateCheck

Dataset Description

Multilingual HateCheck (MHC) is a suite of functional tests for hate speech detection models in 10 different languages: Arabic, Dutch, French, German, Hindi, Italian, Mandarin, Polish, Portuguese and Spanish. For each language, there are 25+ functional tests that correspond to distinct types of hate and challenging non-hate. This allows for targeted diagnostic insights into model performance. For more details… See the full description on the dataset page: https://huggingface.co/datasets/Paul/hatecheck-german.
GERDA -- German Election Database
german-elections.com
Updated Jan 2, 2006
Cite
Hanno Hilbig (2006). GERDA -- German Election Database [Dataset]. http://www.german-elections.com/
Dataset updated
Jan 2, 2006
Authors
Hanno Hilbig
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Germany
Description
Comprehensive dataset of local, state, and federal election results in Germany, facilitating research on electoral behavior, representation, and political responsiveness.
German-PD-Newspapers
huggingface.co
Updated Dec 5, 2024
Cite
Sebastian Majstorovic (2024). German-PD-Newspapers [Dataset]. https://huggingface.co/datasets/storytracer/German-PD-Newspapers
Dataset updated
Dec 5, 2024
Authors
Sebastian Majstorovic
License
https://choosealicense.com/licenses/cc0-1.0/
Description
Dataset Card for Public Domain Newspapers (German)

This dataset contains 13 billion words of OCR text extracted from German historical newspapers.

Dataset Details

Dataset Description

Curated by: Sebastian Majstorovic

Language(s) (NLP): German

License: Dataset: CC0, Texts: Public Domain

Dataset Sources [optional]

Repository: https://www.deutsche-digitale-bibliothek.de/newspaper

Copyright & License

The newspapers texts have been… See the full description on the dataset page: https://huggingface.co/datasets/storytracer/German-PD-Newspapers.
german-hate-speech-superset
huggingface.co
Updated Nov 6, 2024
Cite
Manuel Tonneau (2024). german-hate-speech-superset [Dataset]. https://huggingface.co/datasets/manueltonneau/german-hate-speech-superset
Dataset updated
Nov 6, 2024
Authors
Manuel Tonneau
Description
German Hate Speech Superset

This dataset is a superset (N=50,545) of posts annotated as hateful or not. It results from preprocessing and merging all available German hate speech datasets in April 2024. These datasets were identified through a systematic survey of hate speech datasets conducted in early 2024. We only kept datasets that:

are documented

are publicly available

focus on hate speech, defined broadly as "any kind of communication in speech, writing or behavior, that…

See the full description on the dataset page: https://huggingface.co/datasets/manueltonneau/german-hate-speech-superset.
Voxforge German Dataset
paperswithcode.com
Updated Nov 14, 2022
Cite
(2022). Voxforge German Dataset [Dataset]. https://paperswithcode.com/dataset/voxforge-german
Dataset updated
Nov 14, 2022
Description
VoxForge is an open speech dataset that was set up to collect transcribed speech for use with Free and Open Source Speech Recognition Engines (on Linux, Windows and Mac).

We will make available all submitted audio files under the GPL license, and then 'compile' them into acoustic models for use with Open Source speech recognition engines such as CMU Sphinx, ISIP, Julius (github) and HTK (note: HTK has distribution restrictions).
German Reichstag Election Data, 1871-1912
icpsr.umich.edu
ascii, sas, spss
Updated Jan 12, 2006
Cite
Inter-university Consortium for Political and Social Research (2006). German Reichstag Election Data, 1871-1912 [Dataset]. http://doi.org/10.3886/ICPSR00043.v1
Available download formats: sas, spss, ascii
Unique identifier
https://doi.org/10.3886/ICPSR00043.v1
Dataset updated
Jan 12, 2006
Dataset authored and provided by
Inter-university Consortium for Political and Social Research: https://www.icpsr.umich.edu/web/pages/
License
https://www.icpsr.umich.edu/web/ICPSR/studies/43/terms
Time period covered
1871 - 1912
Area covered
Global, Germany
Description
This data collection contains electoral data at the wahlkreis and staat levels for the Reichstag elections of 1871, 1874, 1877, 1878, 1881, 1884, 1890, 1893, 1898, 1903, 1907, and 1912. The variables for each election provide information on the votes cast for parties, including the Conservative Party, the German Empire Party, the National-Liberals, the Liberal Empire Party, the People's Party, the Social Democrats, the Progress Party, the Catholic Center, the Particularists, the Poles Party, the Protest Party, the Antisemites, the Free-thinking People's Party, the German Reform Party, the Farmers' Union, the Peasants' Union, and splinter parties. Data are also provided on the total population in 1871 and every fifth year between 1875 and 1910, and the proportions of Protestants and of Catholics in the total population for 1871, 1875, 1880, 1885, 1890, 1905, and 1910. Additional variables provide information on the number of eligible voters, valid and invalid votes cast, and voter turnout.
Germany: civilian workforce by gender and foreign workers 1939-1944
statista.com
Updated Dec 18, 2012
Cite
Statista (2012). Germany: civilian workforce by gender and foreign workers 1939-1944 [Dataset]. https://www.statista.com/statistics/1290338/german-workforce-wwii-background/
Dataset updated
Dec 18, 2012
Dataset authored and provided by
Statista: http://statista.com/
Time period covered
May 1941 - Sep 1944
Area covered
Germany
Description
In late May 1939, just three months before the Second World War began in Europe, Germany's workforce was made up of almost 25 million men, 15 million women, and a very small number of foreign workers. The share of German men in the workforce decreased each year thereafter, as more were conscripted into the armed forces, and there were approximately 11 million fewer German male citizens in the workforce by September 1944. The number of German women fluctuated, but remained between 14 and 15 million throughout the given period, and it exceeded the number of German men in 1944. Despite the number of German men in the workforce dropping by 45 percent, the total number of workers in Germany was consistently around 36 million between 1940 and 1944, as this difference was offset by foreign and forced laborers. These workers were mostly drafted from annexed territories in Eastern Europe, and prisoners were transferred from concentration and POW camps to meet the labor demands in various areas of Germany.
The German Cliometrics Database (replication data)
journaldata.zbw.eu
csv, json
Updated Sep 17, 2024
Cite
Tobias A. Jopp; Mark Spoerer (2024). The German Cliometrics Database (replication data) [Dataset]. http://doi.org/10.15456/vswg.2024078.1048204229
Available download formats: csv (7674), csv (1184041), json (1643212)
Unique identifier
https://doi.org/10.15456/vswg.2024078.1048204229
Dataset updated
Sep 17, 2024
Dataset provided by
ZBW - Leibniz Informationszentrum Wirtschaft
Authors
Tobias A. Jopp; Mark Spoerer
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Germany
Description
This short article introduces the German Cliometrics Database as the foundation of Jopp and Spoerer (2024), who trace cliometric research on German history. This newly constructed database of every publication which (1) contributes to the historiography of Germany and (2) employs, as a baseline, inferential statistics enables researchers to find cliometric studies related to their own work much more quickly. Even though no full texts are provided along with the data file, the collected abstracts or summaries for every publication in the database allow for some baseline text mining approaches. Along with the remaining information provided, they may also form the basis for broader bibliometric or historiographical studies.
Population numbers in Germany 1990-2023
statista.com
Updated Feb 24, 2025
Cite
Statista Research Department (2025). Population numbers in Germany 1990-2023 [Dataset]. https://www.statista.com/topics/13131/german-election-2025/
Dataset updated
Feb 24, 2025
Dataset provided by
Statista: http://statista.com/
Authors
Statista Research Department
Area covered
Germany
Description
This statistic shows the development of population numbers in Germany from 1990 to 2023. As of December 31, 2023, the population of Germany amounted to 84.67 million people, an increase compared to the previous year.
Data from: German Socio-Economic Panel
dknet.org
neuinfo.org
Updated Jul 31, 2024
Cite
(2024). German Socio-Economic Panel [Dataset]. http://identifiers.org/RRID:SCR_013140
Unique identifier
https://identifiers.org/RRID:SCR_013140
Dataset updated
Jul 31, 2024
Description
A wide-ranging representative longitudinal study of private households that permits researchers to track yearly changes in the health and economic well-being of older people relative to younger people in Germany from 1984 to the present. Every year, nearly 11,000 households and more than 20,000 persons were sampled by the fieldwork organization TNS Infratest Sozialforschung. The data provide information on all household members, consisting of Germans living in the Old and New German States, Foreigners, and recent Immigrants to Germany. The Panel was started in 1984. Some of the many topics include household composition, occupational biographies, employment, earnings, health and satisfaction indicators. In addition to standard demographic information, the GSOEP questionnaire also contains objective measures (use of time, use of earnings, income, benefit payments, health, etc.) and subjective measures (level of satisfaction with various aspects of life, hopes and fears, political involvement, etc.) of the German population. The first wave, collected in 1984 in the western states of Germany, contains 5,921 households in two randomly sampled sub-groups: 1) German Sub-Sample: people in private households where the head of household was not of Turkish, Greek, Yugoslavian, Spanish, or Italian nationality; 2) Foreign Sub-Sample: people in private households where the head of household was of Turkish, Greek, Yugoslavian, Spanish, or Italian nationality. In each year since 1984, the GSOEP has attempted to re-interview original sample members unless they leave the country. A major expansion of the GSOEP was necessitated by German reunification. In June 1990, the GSOEP fielded a first wave of the eastern states of Germany. This sub-sample includes individuals in private households where the head of household was a citizen of the German Democratic Republic. The first wave contains 2,179 households.
In 1994 and 1995, the GSOEP added a sample of immigrants to the western states of Germany from 522 households who arrived after 1984, which in 2006 included 360 households and 684 respondents. In 1998 a new refreshment sample of 1,067 households was selected from the population of private households. In 2000 a sample was drawn using essentially similar selection rules as the original German sub-sample and the 1998 refreshment sample with some modifications. The 2000 sample includes 6,052 households covering 10,890 individuals. Finally, in 2002, an overrepresentation of high-income households was added with 2,671 respondents from 1,224 households, of which 1,801 individuals (689 households) were still included in the year 2006. Data Availability: The data are available to researchers in Germany and abroad in SPSS, SAS, TDA, STATA, and ASCII format for immediate use. Extensive documentation in English and German is available online. The SOEP data are available in German and English, alone or in combination with data from other international panel surveys (e.g., the Cross-National Equivalent Files which contain panel data from Canada, Germany, and the United States). The public use file of the SOEP with anonymous microdata is provided free of charge (plus shipping costs) to universities and research centers. The individual SOEP datasets cannot be downloaded from the DIW Web site due to data protection regulations. Use of the data is subject to special regulations, and data privacy laws necessitate the signing of a data transfer contract with the DIW. The English Language Public Use Version of the GSOEP is distributed and administered by the Department of Policy Analysis and Management, Cornell University. The data are available on CD-ROM from Cornell for a fee. 
Full instructions for accessing GSOEP data may be accessed on the project website, http://www.human.cornell.edu/che/PAM/Research/Centers-Programs/German-Panel/cnef.cfm
* Dates of Study: 1984-present
* Study Features: Longitudinal, International
* Sample Size:
** 1984: 12,290 (GSOEP West)
** 1990: 4,453 (GSOEP East)
** 2000: 20,000+
Links:
* Cornell Project Website: http://www.human.cornell.edu/che/PAM/Research/Centers-Programs/German-Panel/cnef.cfm
* GSOEP ICPSR: http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/00131
