Cite: Vinicius Araujo (2024). Duolingo Spaced Repetition Data [Dataset]. https://www.kaggle.com/datasets/aravinii/duolingo-spaced-repetition-data

Duolingo Spaced Repetition Data

A dataset containing 13 million Duolingo student learning traces

Dataset updated: Feb 11, 2024
Dataset provided by: Kaggle (http://kaggle.com/)
Authors: Vinicius Araujo
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically

Description

Duolingo is an American educational technology company that produces learning apps and provides language certification. Its main app is widely considered the most popular language learning app in the world.

To progress in their learning journey, each user of the application completes lessons that present words of the language they want to learn. Across an open-ended series of lessons, each word appears in a different context. On top of that, Duolingo uses a spaced repetition approach: the user is shown an already known word again, after some time has passed, to reinforce their learning.

Each row in this file refers to one Duolingo lesson or practice session that had a target word (lexeme) to practice.

The columns are as follows:

  • p_recall - proportion of exercises from this lesson/practice where the word/lexeme was correctly recalled
  • timestamp - UNIX timestamp of the current lesson/practice
  • delta - time (in seconds) since the last lesson/practice that included this word/lexeme
  • user_id - student user ID who did the lesson/practice (anonymized)
  • learning_language - language being learned
  • ui_language - user interface language (presumably native to the student)
  • lexeme_id - system ID for the lexeme tag (i.e., word)
  • lexeme_string - lexeme tag (see below)
  • history_seen - total times user has seen the word/lexeme prior to this lesson/practice
  • history_correct - total times user has been correct for the word/lexeme prior to this lesson/practice
  • session_seen - times the user saw the word/lexeme during this lesson/practice
  • session_correct - times the user got the word/lexeme correct during this lesson/practice
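
To make the column definitions concrete, here is a minimal Python sketch that reads rows with these column names and derives a few per-row quantities. The sample rows, IDs, and the `summarize` helper are hypothetical and made up for illustration. The sketch also assumes, based only on the definitions above, that `p_recall` equals `session_correct / session_seen`; check this against the real file before relying on it.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical sample rows: the header matches the columns listed above,
# but every value (IDs, counts, lexeme strings) is invented for this sketch.
SAMPLE_CSV = """p_recall,timestamp,delta,user_id,learning_language,ui_language,lexeme_id,lexeme_string,history_seen,history_correct,session_seen,session_correct
1.0,1362076081,27649635,u:demo1,de,en,lex001,placeholder,6,4,2,2
0.5,1362076081,86400,u:demo2,es,en,lex002,placeholder,4,4,2,1
"""

SECONDS_PER_DAY = 86400


def summarize(row):
    """Derive readable quantities from one row (a dict of strings)."""
    session_seen = int(row["session_seen"])
    session_correct = int(row["session_correct"])
    history_seen = int(row["history_seen"])
    history_correct = int(row["history_correct"])
    return {
        # UNIX timestamp converted to a timezone-aware UTC datetime.
        "when": datetime.fromtimestamp(int(row["timestamp"]), tz=timezone.utc),
        # delta is given in seconds; days are easier to read.
        "days_since_last": int(row["delta"]) / SECONDS_PER_DAY,
        # Assumed relationship to p_recall (see the note above the code).
        "session_recall": session_correct / session_seen if session_seen else None,
        # Accuracy on this word before the current lesson/practice.
        "history_accuracy": history_correct / history_seen if history_seen else None,
        "p_recall": float(row["p_recall"]),
    }


rows = [summarize(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
for r in rows:
    print(r)
```

To run this against the actual download, replace `io.StringIO(SAMPLE_CSV)` with an open file handle for the dataset's CSV file.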

The lexeme_string column contains a string representation of the "lexeme tag" used by Duolingo for each lesson/practice (data instance) in our experiments. The lexeme_string field uses the following format:

`surface-form/lemma
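
Only the `surface-form/lemma` prefix of the format is visible here, so the following Python sketch does no more than split a lexeme string at its first slash. The helper name `split_lexeme_prefix` is hypothetical, and the second returned part starts with the lemma but may carry further tag information that this sketch deliberately leaves unparsed.

```python
def split_lexeme_prefix(lexeme_string):
    """Split a lexeme string at its first '/' into (surface_form, rest)."""
    surface_form, sep, rest = lexeme_string.partition("/")
    if not sep:
        # No slash at all: the string does not follow the documented prefix.
        raise ValueError(f"no '/' found in lexeme string: {lexeme_string!r}")
    return surface_form, rest


# The format string itself serves as the example input.
surface, rest = split_lexeme_prefix("surface-form/lemma")
print(surface, rest)
```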
