7 datasets found
  1. Z

    Leibniz-ZAS corpus of MAIN

    • data.niaid.nih.gov
    Updated May 6, 2021
    Cite
    Gagarina, Natalia (2021). Leibniz-ZAS corpus of MAIN [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4724969
    Dataset updated
    May 6, 2021
    Dataset provided by
    Topaj, Nathalie
    Gagarina, Natalia
    Rizaeva, Zarina
    Sternharz, Alyona
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    The presented dataset is part of the narrative corpus collected at Leibniz-Centre General Linguistics (Leibniz-ZAS). It contains transcriptions of oral narratives elicited with the Multilingual Assessment Instrument for Narratives (MAIN), developed as part of the LITMUS battery of tests in the framework of COST Action IS0804 Language Impairment in a Multilingual Society: Linguistic Patterns and the Road to Assessment. Narratives were elicited in Russian, Turkish and German in the telling elicitation mode using two MAIN picture stories, Baby Birds and Baby Goats. The data were collected during two large-scale longitudinal studies conducted at ZAS in the framework of the Berlin Interdisciplinary Network for Multilingualism (BIVEM) and Interdisciplinary Research Alliance (IFV) projects. The participants of the studies were Russian-German and Turkish-German bilingual children from different areas of Berlin. Their language development was closely documented every year from early kindergarten up to the end of the third grade of primary school (age 2;9 to 10;4 years). It is the longest and largest study of language development in bilingual children in Germany, allowing for cross-sectional and longitudinal analyses from a cross-linguistic perspective.

    The narratives were audio recorded and transcribed in the standardized CHAT format (MacWhinney, 2000) using the CLAN program, following the CHILDES transcription rules, for later analysis. The transcriptions can be used to analyze the narrative abilities of bilingual children on macro- and microstructural levels.
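As a rough illustration of what working with such transcripts can look like, the sketch below groups utterances by speaker. It relies only on the general CHAT conventions that main tiers start with `*` plus a speaker code, headers start with `@`, dependent tiers start with `%`, and continuation lines start with a tab. The function name and the sample transcript are invented for this example and are not taken from the corpus; for real analyses, CLAN or a dedicated CHAT parser is the safer choice.

```python
from typing import Dict, List, Optional


def utterances_by_speaker(chat_text: str) -> Dict[str, List[str]]:
    """Group main-tier utterances of a CHAT transcript by speaker code."""
    result: Dict[str, List[str]] = {}
    current: Optional[str] = None  # speaker of the main tier being read
    for line in chat_text.splitlines():
        if line.startswith("*") and ":" in line:
            # Main tier, e.g. "*CHI:<tab>utterance text"
            speaker, _, text = line[1:].partition(":")
            current = speaker.strip()
            result.setdefault(current, []).append(text.strip())
        elif line.startswith("\t") and current is not None:
            # Continuation of the main tier that is currently open
            result[current][-1] += " " + line.strip()
        else:
            # Headers (@), dependent tiers (%) and blank lines close the tier
            current = None
    return result


# Invented sample, not taken from the Leibniz-ZAS corpus.
sample = (
    "@Begin\n"
    "@Participants:\tCHI Target_Child, INV Investigator\n"
    "*INV:\twhat happens here ?\n"
    "*CHI:\tthe bird is\n"
    "\tflying away .\n"
    "%com:\tpoints at picture\n"
    "*CHI:\tthe cat comes .\n"
    "@End\n"
)
by_speaker = utterances_by_speaker(sample)
print(by_speaker["CHI"])
```

Keeping only the `CHI` entries gives the child's narrative without the investigator's prompts, which is usually the starting point for macro- or microstructural coding.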

    In total, the dataset contains 210 transcriptions of narratives from 29 participants (10 Russian-German bilingual children and 19 Turkish-German bilingual children), who were tested 5 times after the initial testing (pretest). The testing points after the pretest are referred to as posttests and are numbered post1 to post6; the dataset covers five of them, because it does not contain data from post5 (oral narratives were not elicited at the end of the second grade). The corresponding age ranges at all testing points are given below for each part of the dataset. The dataset is divided into two parts, the Russian-German and the Turkish-German narrative corpus.

    The narrative corpus of Russian-German bilingual children includes two folders with narratives elicited in Russian and German, at 5 testing points.

    Total number of transcriptions=100

    Number of children=10

    Total age range=2;9-10;4

    Age range of children for narratives in Russian at each testing point:

    post 1: 2;9-4;3 (kindergarten)

    post 2: 3;9-5;2 (kindergarten)

    post 3: 4;9-6;1 (kindergarten)

    post 4: 6;9-7;6 (end of first grade)

    post 6: 8;7-9;10 (end of third grade)

    Age range of children for narratives in German at each testing point:

    post 1: 2;10-4;3 (kindergarten)

    post 2: 3;9-5;3 (kindergarten)

    post 3: 4;9-6;2 (kindergarten)

    post 4: 6;9-7;6 (end of first grade)

    post 6: 8;8-10;4 (end of third grade)

    The narrative corpus of Turkish-German bilingual children includes two folders.

    One folder contains narratives elicited in German at the earlier 3 testing points, which allows the analysis of early narrative development in one language.

    Total number of transcriptions=30

    Number of children=10

    Total age range=3;5-6;4

    Age range of children for narratives in German at each testing point:

    post 1: 3;5-4;3 (kindergarten)

    post 2: 4;4-5;4 (kindergarten)

    post 3: 5;3-6;4 (kindergarten)

    Another folder contains narratives elicited in both languages, Turkish and German, at 4 testing points starting from post2 and allowing for the analysis of narrative development up to the third grade in both languages.

    Total number of transcriptions=80

    Number of children=10

    Total age range=3;10-9;9

    Age range of children for narratives in Turkish at each testing point:

    post 2: 3;10-5;1 (kindergarten)

    post 3: 4;9-6;1 (kindergarten)

    post 4: 6;5-7;8 (end of first grade)

    post 6: 8;6-9;9 (end of third grade)

    Age range of children for narratives in German at each testing point:

    post 2: 4;1-5;4 (kindergarten)

    post 3: 5;1-6;4 (kindergarten)

    post 4: 6;6-7;8 (end of first grade)

    post 6: 8;5-9;8 (end of third grade)

    The files are named according to the following pattern: child’s code (letters refer to child’s first languages: r-Russian, t-Turkish), test (MAIN), story (bb=Baby Birds, bg=Baby Goats), language of elicitation (de/ru/tr), testing point (1=post1, 2=post2 etc.), and child’s age (year/month). Here is an example: r009_MAIN_bb_de_4_610.
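The naming pattern above can be turned into a small parser. The following is a minimal sketch based only on the pattern and the single example given in the description; the class name, the function name and the regular expression are assumptions made for illustration. The age field is kept as the raw code (for example `610`), because the description does not spell out how years and months are delimited.

```python
import re
from typing import NamedTuple


class MainFileName(NamedTuple):
    child_code: str
    first_language: str
    test: str
    story: str
    language: str
    testing_point: int
    age_code: str  # raw year/month code, left undecoded on purpose


# Assumed pattern, derived from the example r009_MAIN_bb_de_4_610.
_NAME = re.compile(
    r"^(?P<child>[rt]\d+)_(?P<test>MAIN)_(?P<story>bb|bg)_"
    r"(?P<lang>de|ru|tr)_(?P<point>\d)_(?P<age>\d+)$"
)
FIRST_LANGUAGES = {"r": "Russian", "t": "Turkish"}
STORIES = {"bb": "Baby Birds", "bg": "Baby Goats"}


def parse_name(name: str) -> MainFileName:
    """Split a file name (without extension) into its documented parts."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    child = match.group("child")
    return MainFileName(
        child_code=child,
        first_language=FIRST_LANGUAGES[child[0]],
        test=match.group("test"),
        story=STORIES[match.group("story")],
        language=match.group("lang"),
        testing_point=int(match.group("point")),
        age_code=match.group("age"),
    )


parsed = parse_name("r009_MAIN_bb_de_4_610")
print(parsed)
```

For the example name, this yields a Russian-German child (`r009`) telling Baby Birds in German at post4.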

  2. d

    Data from: German Socio-Economic Panel

    • dknet.org
    • neuinfo.org
    • +2 more
    Updated Jul 31, 2024
    Cite
    (2024). German Socio-Economic Panel [Dataset]. http://identifiers.org/RRID:SCR_013140
    Dataset updated
    Jul 31, 2024
    Description

    A wide-ranging representative longitudinal study of private households that permits researchers to track yearly changes in the health and economic well-being of older people relative to younger people in Germany from 1984 to the present. Every year, nearly 11,000 households and more than 20,000 persons were sampled by the fieldwork organization TNS Infratest Sozialforschung. The data provide information on all household members, consisting of Germans living in the Old and New German States, Foreigners, and recent Immigrants to Germany. The Panel was started in 1984. Some of the many topics include household composition, occupational biographies, employment, earnings, health and satisfaction indicators. In addition to standard demographic information, the GSOEP questionnaire also contains objective measures (use of time, use of earnings, income, benefit payments, health, etc.) and subjective measures (level of satisfaction with various aspects of life, hopes and fears, political involvement, etc.) of the German population. The first wave, collected in 1984 in the western states of Germany, contains 5,921 households in two randomly sampled sub-groups: 1) German Sub-Sample: people in private households where the head of household was not of Turkish, Greek, Yugoslavian, Spanish, or Italian nationality; 2) Foreign Sub-Sample: people in private households where the head of household was of Turkish, Greek, Yugoslavian, Spanish, or Italian nationality. In each year since 1984, the GSOEP has attempted to re-interview original sample members unless they leave the country. A major expansion of the GSOEP was necessitated by German reunification. In June 1990, the GSOEP fielded a first wave of the eastern states of Germany. This sub-sample includes individuals in private households where the head of household was a citizen of the German Democratic Republic. The first wave contains 2,179 households.
In 1994 and 1995, the GSOEP added a sample of immigrants to the western states of Germany from 522 households who arrived after 1984, which in 2006 included 360 households and 684 respondents. In 1998 a new refreshment sample of 1,067 households was selected from the population of private households. In 2000 a sample was drawn using essentially similar selection rules as the original German sub-sample and the 1998 refreshment sample with some modifications. The 2000 sample includes 6,052 households covering 10,890 individuals. Finally, in 2002, an overrepresentation of high-income households was added with 2,671 respondents from 1,224 households, of which 1,801 individuals (689 households) were still included in the year 2006. Data Availability: The data are available to researchers in Germany and abroad in SPSS, SAS, TDA, STATA, and ASCII format for immediate use. Extensive documentation in English and German is available online. The SOEP data are available in German and English, alone or in combination with data from other international panel surveys (e.g., the Cross-National Equivalent Files which contain panel data from Canada, Germany, and the United States). The public use file of the SOEP with anonymous microdata is provided free of charge (plus shipping costs) to universities and research centers. The individual SOEP datasets cannot be downloaded from the DIW Web site due to data protection regulations. Use of the data is subject to special regulations, and data privacy laws necessitate the signing of a data transfer contract with the DIW. The English Language Public Use Version of the GSOEP is distributed and administered by the Department of Policy Analysis and Management, Cornell University. The data are available on CD-ROM from Cornell for a fee. 
Full instructions for accessing GSOEP data can be found on the project website, http://www.human.cornell.edu/che/PAM/Research/Centers-Programs/German-Panel/cnef.cfm

* Dates of Study: 1984-present
* Study Features: Longitudinal, International
* Sample Size:
** 1984: 12,290 (GSOEP West)
** 1990: 4,453 (GSOEP East)
** 2000: 20,000+

Links:
* Cornell Project Website: http://www.human.cornell.edu/che/PAM/Research/Centers-Programs/German-Panel/cnef.cfm
* GSOEP ICPSR: http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/00131

  3. A

    ‘Population aged 25 to under 65 in NRW by immigration status, gender and...

    • analyst-2.ai
    Updated Jan 18, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Population aged 25 to under 65 in NRW by immigration status, gender and highest vocational education’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/data-europa-eu-population-aged-25-to-under-65-in-nrw-by-immigration-status-gender-and-highest-vocational-education-03d9/ca061c06/?iid=021-282&v=presentation
    Dataset updated
    Jan 18, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    North Rhine-Westphalia
    Description

    Analysis of ‘Population aged 25 to under 65 in NRW by immigration status, gender and highest vocational education’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from http://data.europa.eu/88u/dataset/ab109bdf-d992-5cee-a61a-2c8eaffcda1e on 18 January 2022.

    --- Dataset description provided by original source is as follows ---

    The statistics show the population aged 25 to under 65 in NRW by immigration status, gender and highest vocational education degree. People who are still in school or in training (i.e. trainees, pupils and students) are not included in the analyses. The highest level of vocational education is divided into: without a degree, completed vocational training, tertiary education. Other key data are: German (citizens, emigrants), not German (including Turkish nationality), with migration background, without migration background, overall.

    --- Original source retains full ownership of the source dataset ---

  4. o

    Dataset and Replication Materials for Surveying Citizens with a Migration...

    • osf.io
    Updated Mar 15, 2025
    Cite
    Sanne van Oosten (2025). Dataset and Replication Materials for Surveying Citizens with a Migration Background - A Quantitative Study of Identification versus Categorization [Dataset]. http://doi.org/10.17605/OSF.IO/BS6YN
    Dataset updated
    Mar 15, 2025
    Dataset provided by
    Center For Open Science
    Authors
    Sanne van Oosten
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Survey research on minoritized citizens in Germany and the Netherlands tends to categorize respondents from the top down using the category “migration background,” instead of allowing for identification from the bottom up. Through surveying 1864 respondents in Germany and the Netherlands, including 401 respondents with a background in Türkiye, this paper shows that those who identify as Turkish hold different attitudes than those who have a background in Türkiye but who identify as German or Dutch. In fact, those who do not identify as Turkish hold very similar attitudes to those without a migration background, and different attitudes than a researcher would find when categorizing them with those “having a migration background in Türkiye.” I provide proof of this with attitudes towards topics often associated with citizens with a migration background. Beyond these empirical advantages of identification over categorization, this paper outlines additional theoretical, methodological and conceptual advantages of following an identification approach in designing surveys.

  5. P

    MLSUM Dataset

    • paperswithcode.com
    + more versions
    Cite
    Thomas Scialom; Paul-Alexis Dray; Sylvain Lamprier; Benjamin Piwowarski; Jacopo Staiano, MLSUM Dataset [Dataset]. https://paperswithcode.com/dataset/mlsum
    Authors
    Thomas Scialom; Paul-Alexis Dray; Sylvain Lamprier; Benjamin Piwowarski; Jacopo Staiano
    Description

    A large-scale MultiLingual SUMmarization dataset. Obtained from online newspapers, it contains 1.5M+ article/summary pairs in five different languages: French, German, Spanish, Russian and Turkish. Together with English newspapers from the popular CNN/Daily Mail dataset, the collected data form a large-scale multilingual dataset which can enable new research directions for the text summarization community.

  6. Z

    Results of the paper "Composition Identification in Ottoman-Turkish Makam...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Cite
    Şentürk, Sertan (2020). Results of the paper "Composition Identification in Ottoman-Turkish Makam Music Using Transposition-Invariant Partial Audio-Score Alignment" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_56652
    Dataset updated
    Jan 24, 2020
    Dataset authored and provided by
    Şentürk, Sertan
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
    License information was derived automatically

    Area covered
    Ottoman Empire
    Description

    This repository contains the composition identification and tonic identification results along with the statistical significance values presented in the paper:

    Şentürk, S., & Serra, X. (2016). Composition Identification in Ottoman-Turkish Makam Music Using Transposition-Invariant Partial Audio-Score Alignment. In Proceedings of the 13th Sound and Music Computing Conference (SMC 2016) (pp. XX–XX). Hamburg, Germany.

    Please cite the publication above in any work using this dataset.

    For the details of the results, please refer to the paper. For any further information please contact the authors.

    This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

  7. o

    Harmonized Cultural Access & Participation Dataset for Music

    • explore.openaire.eu
    Updated Jan 29, 2022
    Cite
    Daniel Antal (2022). Harmonized Cultural Access & Participation Dataset for Music [Dataset]. http://doi.org/10.5281/zenodo.5917741
    Dataset updated
    Jan 29, 2022
    Authors
    Daniel Antal
    Description

    Changes since the last version: in the .csv export there was a naming problem.

    - visit_concert: This is a standard CAP variable about visiting frequencies, in numeric form.
    - fct_visit_concert: This is a standard CAP variable about visiting frequencies, in categorical form.
    - is_visit_concert: binary variable, 0 if the person had not visited concerts in the previous 12 months.
    - artistic_activity_played_music: A variable of the frequency of playing music as an amateur or professional practice. In some surveys we have only a binary variable (played in the last 12 months or not), in others we have frequencies. We will convert this into a binary variable.
    - fct_artistic_activity_played_music: The artistic_activity_played_music variable in categorical representation.
    - artistic_activity_sung: A variable of the frequency of singing as an amateur or professional practice, like played_music. Because of the liturgical use of singing, and the differences in religious practices among countries and genders, this is a significantly different variable from played_music.
    - fct_artistic_activity_sung: The artistic_activity_sung variable in categorical representation.
    - age_exact: The respondent’s age as an integer number.
    - country_code: an ISO country code.
    - geo: an ISO code that separates Germany into the former East and West Germany, the United Kingdom into Great Britain and Northern Ireland, and Cyprus into Cyprus and the Turkish Cypriot community. [we may leave Turkish Cyprus out for practical reasons.]
    - age_education: This is a harmonized education proxy. Because we work with the data of more than 30 countries, education levels are difficult to harmonize, and we use the Eurobarometer standard proxy, age of leaving education. It is a specially coded variable, and we will re-code it into two variables, age_education and is_student.
    - is_student: a dummy variable for the special coding in age_education for “still studying”, i.e. the person does not yet have a school leaving age. It would be tempting to impute age in this case to age_education, but we will show why this is not a good strategy.
    - w, w1: Post-stratification weights for the 15+ years old population of each country. Use w1 for averages of geo entities, treating Northern Ireland, Great Britain, the United Kingdom, the former GDR, the former West Germany, and Germany as geographical areas. Use w when treating the United Kingdom and Germany as one territory.
    - wex: Projected weight variable. For weighted average values, use w or w1; for projections onto the population size (i.e., with sums), use wex.
    - id: The identifier of the original survey.
    - rowid: A new identifier that is unique across all harmonized surveys, i.e., remains unique in the harmonized dataset.
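The difference between the averaging weights (w, w1) and the projection weight (wex) can be sketched in a few lines. This is only an illustration: the three rows and their weight values are made up, the helper names are hypothetical, and it assumes that wex can be read as the number of people in the population that each respondent stands for.

```python
from typing import Mapping, Sequence

Row = Mapping[str, float]


def weighted_mean(rows: Sequence[Row], value: str, weight: str) -> float:
    """Weighted average of `value`, e.g. with the w or w1 weights."""
    total_weight = sum(row[weight] for row in rows)
    return sum(row[value] * row[weight] for row in rows) / total_weight


def projected_total(rows: Sequence[Row], value: str, weight: str = "wex") -> float:
    """Population projection of `value`: a weighted sum using wex."""
    return sum(row[value] * row[weight] for row in rows)


# Made-up rows; only the column names come from the variable list above.
rows = [
    {"is_visit_concert": 1, "w": 0.5, "wex": 1000.0},
    {"is_visit_concert": 0, "w": 1.5, "wex": 3000.0},
    {"is_visit_concert": 1, "w": 1.0, "wex": 2000.0},
]

share = weighted_mean(rows, "is_visit_concert", "w")
visitors = projected_total(rows, "is_visit_concert")
print(share, visitors)
```

The first call estimates the share of concert visitors in the population, the second the projected number of visitors; mixing the two (summing w, or averaging with wex across countries of very different size) would answer a different question.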

