Duolingo Spaced Repetition Data
kaggle.com
Updated Feb 11, 2024
Cite
Vinicius Araujo (2024). Duolingo Spaced Repetition Data [Dataset]. https://www.kaggle.com/datasets/aravinii/duolingo-spaced-repetition-data
Dataset updated
Feb 11, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Vinicius Araujo
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Duolingo is an American educational technology company that produces learning apps and provides language certification. Its main app is widely considered the most popular language-learning app in the world.

To progress in their learning journey, each user of the app completes a series of lessons in which they are presented with words in the language they want to learn. Across an open-ended sequence of lessons, each word appears in different contexts. On top of that, Duolingo uses a spaced repetition approach: the user sees an already-known word again after some time to reinforce their learning.

Each line in this file corresponds to one Duolingo lesson or practice session in which a user practiced a target word.

The columns are as follows:

p_recall - proportion of exercises from this lesson/practice where the word/lexeme was correctly recalled

timestamp - UNIX timestamp of the current lesson/practice

delta - time (in seconds) since the last lesson/practice that included this word/lexeme

user_id - student user ID who did the lesson/practice (anonymized)

learning_language - language being learned

ui_language - user interface language (presumably native to the student)

lexeme_id - system ID for the lexeme tag (i.e., word)

lexeme_string - lexeme tag (see below)

history_seen - total times user has seen the word/lexeme prior to this lesson/practice

history_correct - total times user has been correct for the word/lexeme prior to this lesson/practice

session_seen - times the user saw the word/lexeme during this lesson/practice

session_correct - times the user got the word/lexeme correct during this lesson/practice
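The column definitions above map naturally onto a typed record. Below is a minimal Python sketch, assuming the file is a CSV whose header row uses exactly the column names listed above (an assumption; the file name and layout are not stated here). The Trace class and its helper names are hypothetical, and the sample row is made up. It converts the numeric columns, turns timestamp into a UTC datetime and delta into days, and checks that p_recall equals session_correct / session_seen, which is what the definitions of those three columns imply.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Trace:
    """One row: one user practicing one lexeme in one lesson/practice."""
    p_recall: float
    timestamp: int
    delta: int
    user_id: str
    learning_language: str
    ui_language: str
    lexeme_id: str
    lexeme_string: str
    history_seen: int
    history_correct: int
    session_seen: int
    session_correct: int

    @classmethod
    def from_row(cls, row: dict) -> "Trace":
        # csv.DictReader yields every field as a string; convert numeric columns.
        return cls(
            p_recall=float(row["p_recall"]),
            timestamp=int(row["timestamp"]),
            delta=int(row["delta"]),
            user_id=row["user_id"],
            learning_language=row["learning_language"],
            ui_language=row["ui_language"],
            lexeme_id=row["lexeme_id"],
            lexeme_string=row["lexeme_string"],
            history_seen=int(row["history_seen"]),
            history_correct=int(row["history_correct"]),
            session_seen=int(row["session_seen"]),
            session_correct=int(row["session_correct"]),
        )

    @property
    def seen_at(self) -> datetime:
        # timestamp is a UNIX timestamp, interpreted here as UTC.
        return datetime.fromtimestamp(self.timestamp, tz=timezone.utc)

    @property
    def delta_days(self) -> float:
        # delta is measured in seconds; there are 86,400 seconds in a day.
        return self.delta / 86_400

    @property
    def history_accuracy(self) -> Optional[float]:
        # Fraction of earlier exposures answered correctly; None if never seen before.
        if self.history_seen == 0:
            return None
        return self.history_correct / self.history_seen

    def p_recall_matches_session(self, tol: float = 1e-9) -> bool:
        # The column definitions imply p_recall == session_correct / session_seen.
        if self.session_seen == 0:
            return False
        return abs(self.p_recall - self.session_correct / self.session_seen) <= tol


# Hypothetical sample: header as listed above, one made-up row.
SAMPLE = (
    "p_recall,timestamp,delta,user_id,learning_language,ui_language,"
    "lexeme_id,lexeme_string,history_seen,history_correct,"
    "session_seen,session_correct\n"
    "0.5,1362076081,86400,u_demo,de,en,lex_demo,word/word,6,4,2,1\n"
)

rows = [Trace.from_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
t = rows[0]
print(t.seen_at.isoformat())
print(t.delta_days)                  # 1.0
print(t.history_accuracy)            # 0.6666666666666666
print(t.p_recall_matches_session())  # True
```

Keeping the consistency check as a method, rather than trusting p_recall blindly, makes it easy to count rows where the stored proportion and the session counts disagree.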

The lexeme_string column contains a string representation of the "lexeme tag" used by Duolingo for each lesson/practice (data instance) in our experiments. The lexeme_string field uses the following format:

`surface-form/lemma
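The format description above is cut off after the lemma, so only its first separator is known. A minimal Python sketch, assuming the surface form never contains a slash, can still split off the surface form at the first '/' and return the rest (starting with the lemma) unparsed. The function name is hypothetical and the example tags are made up.

```python
from typing import Tuple


def split_lexeme_string(lexeme_string: str) -> Tuple[str, str]:
    """Split a lexeme tag at its first '/' into (surface_form, rest).

    Assumes the surface form contains no '/'. 'rest' begins with the lemma;
    anything after the lemma is returned unparsed because its format is not
    fully described here.
    """
    surface_form, sep, rest = lexeme_string.partition("/")
    if not sep:
        raise ValueError(f"expected 'surface-form/lemma...', got {lexeme_string!r}")
    return surface_form, rest


# Made-up examples.
print(split_lexeme_string("runs/run"))      # ('runs', 'run')
print(split_lexeme_string("Katzen/Katze"))  # ('Katzen', 'Katze')
```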

A dataset containing 13 million Duolingo student learning traces
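With around 13 million rows, reading the whole file into memory at once may be inconvenient. A minimal sketch, assuming a CSV file with a header row containing the column names listed above, streams rows with Python's csv.DictReader and averages p_recall per learning_language. The function name, the file name in the comment, and the two-column sample are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable


def mean_p_recall_by_language(lines: Iterable[str]) -> Dict[str, float]:
    """Average p_recall per learning_language, reading one row at a time."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for row in csv.DictReader(lines):
        lang = row["learning_language"]
        totals[lang] += float(row["p_recall"])
        counts[lang] += 1
    return {lang: totals[lang] / counts[lang] for lang in totals}


# With the real file (name hypothetical):
#     with open("traces.csv", newline="") as f:
#         print(mean_p_recall_by_language(f))
# Demo on a made-up sample trimmed to the two columns the function reads:
SAMPLE = "p_recall,learning_language\n1.0,de\n0.5,de\n0.0,fr\n"
print(mean_p_recall_by_language(io.StringIO(SAMPLE)))  # {'de': 0.75, 'fr': 0.0}
```

Only running totals and counts are kept per language, so memory use grows with the number of languages rather than the number of rows.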
