Cleaned Duolingo Learning Data

    Citation
    Charity Githogora (2025). Cleaned Duolingo Learning Data [Dataset]. https://www.kaggle.com/datasets/charitygithogora/cleaned-duolingo-learning-data/data
    Dataset updated: Feb 25, 2025
    Dataset provided by: Kaggle (http://kaggle.com/)
    Authors: Charity Githogora
    License: Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically.

    Description

    This dataset contains pre-processed learning traces from Duolingo’s spaced repetition system. It includes timestamps, user interactions, and correctness data, structured for analyzing learning patterns over time. The data was cleaned in Google Colab and then used to generate visualizations, including a heatmap of learning activity trends.
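A heatmap of learning activity is typically built from a day-by-hour table of interaction counts. The sketch below shows that aggregation with the standard library only. It assumes each record carries the `hour` (0 to 23) and `day_of_week` columns, with `day_of_week` encoded as an integer where Monday = 0; the dataset's actual encoding is not stated and may be a day name instead. The helper `activity_matrix` and the sample records are hypothetical.

```python
from typing import Iterable, Mapping

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def activity_matrix(records: Iterable[Mapping[str, int]]) -> list[list[int]]:
    """Hypothetical helper: count interactions per (day_of_week, hour) cell.

    Assumes day_of_week is 0 (Monday) to 6 (Sunday) and hour is 0 to 23.
    The resulting 7 x 24 grid is what a day-by-hour heatmap would plot.
    """
    grid = [[0] * 24 for _ in range(7)]
    for rec in records:
        grid[rec["day_of_week"]][rec["hour"]] += 1
    return grid


# Hypothetical sample records, not taken from the dataset.
sample = [
    {"day_of_week": 0, "hour": 20},
    {"day_of_week": 0, "hour": 20},
    {"day_of_week": 2, "hour": 8},
]
grid = activity_matrix(sample)
for day, row in zip(DAYS, grid):
    print(day, row)
```

With pandas, `pd.crosstab(df["day_of_week"], df["hour"])` produces the same count table directly, which can then be passed to a plotting library.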

    Check out the heatmap visualization: https://github.com/Charity-Githogora/duolingo-heatmap-insights

    Source: The original dataset was obtained from https://www.kaggle.com/datasets/aravinii/duolingo-spaced-repetition-data and has been processed to improve its usability for data analysis and visualization.

    Columns:

    • timestamp – The time of the user interaction (converted to datetime format).
    • hour – The hour of the day at which the interaction occurred.
    • day_of_week – The day of the week on which the interaction occurred.
    • correct – Whether the response was correct (1) or incorrect (0).
    • Other relevant features extracted for analysis.
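The `hour` and `day_of_week` columns can be computed from `timestamp`. A minimal sketch of that conversion, assuming the raw timestamp is a Unix epoch value in seconds interpreted as UTC and that `day_of_week` follows Monday = 0; the cleaning notebook's exact conventions are not documented here, and the helper name `derive_time_features` is hypothetical.

```python
from datetime import datetime, timezone


def derive_time_features(unix_ts: int) -> dict:
    """Hypothetical helper: turn a Unix timestamp (seconds, assumed UTC)
    into a datetime plus hour and weekday fields."""
    dt = datetime.fromtimestamp(unix_ts, tz=timezone.utc)
    return {
        "timestamp": dt,              # timezone-aware datetime
        "hour": dt.hour,              # 0 to 23
        "day_of_week": dt.weekday(),  # Monday = 0 ... Sunday = 6
    }


row = derive_time_features(1362076081)
print(row["timestamp"].isoformat(), row["hour"], row["day_of_week"])
```

In pandas, the equivalent is `pd.to_datetime(df["timestamp"], unit="s", utc=True)` followed by `.dt.hour` and `.dt.dayofweek` (which also uses Monday = 0). If the timestamps were recorded in another time zone, the derived hours shift accordingly, so the assumed zone matters for any "peak hour" analysis.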

    Usage: The dataset supports analyses such as identifying peak learning hours, tracking performance trends over time, and understanding how engagement affects accuracy. Researchers and data enthusiasts can also use it for predictive modeling, time-series analysis, and interactive visualization, including heatmaps and other visual representations of learning activity.
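Finding peak learning hours and relating activity to accuracy both reduce to grouping records by `hour`. A minimal standard-library sketch, assuming `correct` is stored as 0/1 as described under Columns; the function `hourly_summary` and the sample records are hypothetical.

```python
from collections import defaultdict


def hourly_summary(records):
    """Hypothetical helper: return {hour: (attempts, accuracy)},
    where accuracy is the mean of the 0/1 `correct` column."""
    attempts = defaultdict(int)
    correct = defaultdict(int)
    for rec in records:
        attempts[rec["hour"]] += 1
        correct[rec["hour"]] += rec["correct"]
    return {h: (attempts[h], correct[h] / attempts[h]) for h in sorted(attempts)}


# Hypothetical records, not taken from the dataset.
sample = [
    {"hour": 8, "correct": 1},
    {"hour": 8, "correct": 0},
    {"hour": 20, "correct": 1},
    {"hour": 20, "correct": 1},
    {"hour": 20, "correct": 0},
]
summary = hourly_summary(sample)
peak = max(summary, key=lambda h: summary[h][0])  # hour with the most attempts
print(summary)
print("peak hour:", peak)
```

Because `summary` is keyed in ascending hour order and `max` returns the first maximal element, ties resolve to the earlier hour. Comparing attempt counts with accuracy per hour is a simple, correlational first look at how engagement and accuracy move together.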
