The dataset provides detailed information on the communications taking place between learners in two offerings of the Massively Open Online Course for Educators (MOOC-Eds) titled The Digital Learning Transition in K-12 Schools. The courses were offered to educators from the USA and abroad during the spring and fall of 2013. Though based on the same course, minor controlled variations were made to both MOOCs in terms of the course length, discussion prompts, and group size. The primary use of this dataset is to enable social network analyses (SNAs) of these communications. In particular, it allows modeling network mechanisms to better understand factors that facilitate or impede the exchange of information among educators, and includes relevant characteristics of the participants, such as their professional roles and their experience in education.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
2022
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this course, you will explore a variety of open-source technologies for working with geospatial data, performing spatial analysis, and undertaking general data science. The first component of the class focuses on the use of QGIS and associated technologies (GDAL, PROJ, GRASS, SAGA, and Orfeo Toolbox). The second component of the class introduces Python and associated open-source libraries and modules (NumPy, Pandas, Matplotlib, Seaborn, GeoPandas, Rasterio, WhiteboxTools, and Scikit-Learn) used by geospatial scientists and data scientists. We also provide an introduction to Structured Query Language (SQL) for performing table and spatial queries. This course is designed for individuals who have a background in GIS, such as working in the ArcGIS environment, but no prior experience using open-source software and/or coding. You will be asked to work through a series of lecture modules and videos broken into several topic areas, as outlined below. Fourteen assignments and the required data have been provided as hands-on opportunities to work with data and the discussed technologies and methods. If you have any questions or suggestions, feel free to contact us. We hope to continue to update and improve this course. This course was produced by West Virginia View (http://www.wvview.org/) with support from AmericaView (https://americaview.org/). This material is based upon work supported by the U.S. Geological Survey under Grant/Cooperative Agreement No. G18AP00077. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the opinions or policies of the U.S. Geological Survey. Mention of trade names or commercial products does not constitute their endorsement by the U.S. Geological Survey.
After completing this course you will be able to:
* apply QGIS to visualize, query, and analyze vector and raster spatial data.
* use available resources to further expand your knowledge of open-source technologies.
* describe and use a variety of open data formats.
* code in Python at an intermediate level.
* read, summarize, visualize, and analyze data using open Python libraries.
* create spatial predictive models using Python and associated libraries.
* use SQL to perform table and spatial queries at an intermediate level.
https://brightdata.com/license
We'll tailor a Udemy dataset to meet your unique needs, encompassing course titles, user engagement metrics, completion rates, demographic data of learners, enrollment numbers, review scores, and other pertinent metrics.
Leverage our Udemy datasets for diverse applications to bolster strategic planning and market analysis. Scrutinizing these datasets enables organizations to grasp learner preferences and online education trends, facilitating nuanced educational program development and learning initiatives. Customize your access to the entire dataset or specific subsets as per your business requisites.
Popular use cases involve optimizing educational content based on engagement insights, enhancing learning strategies through targeted learner segmentation, and identifying and forecasting trends to stay ahead in the online education landscape.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Please cite the following paper when using this dataset:
N. Thakur, “A Large-Scale Dataset of Twitter Chatter about Online Learning during the Current COVID-19 Omicron Wave,” Preprints, 2022, DOI: 10.20944/preprints202206.0146.v1
Abstract
The COVID-19 Omicron variant, reported to be the most immune evasive variant of COVID-19, is resulting in a surge of COVID-19 cases globally. This has caused schools, colleges, and universities in different parts of the world to transition to online learning. As a result, social media platforms such as Twitter are seeing an increase in conversations, centered around information seeking and sharing, related to online learning. Mining such conversations, such as Tweets, to develop a dataset can serve as a data resource for interdisciplinary research related to the analysis of interest, views, opinions, perspectives, attitudes, and feedback towards online learning during the current surge of COVID-19 cases caused by the Omicron variant. Therefore, this work presents a large-scale public Twitter dataset of conversations about online learning since the first detected case of the COVID-19 Omicron variant in November 2021. The dataset files contain the raw version that comprises 52,868 Tweet IDs (that correspond to the same number of Tweets) and the cleaned and preprocessed version that contains 46,208 unique Tweet IDs. The dataset is compliant with the privacy policy, developer agreement, and guidelines for content redistribution of Twitter and with the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) for scientific data management.
Data Description
The dataset comprises 7 .txt files. The raw version of this dataset comprises 6 .txt files (TweetIDs_Corona Virus.txt, TweetIDs_Corona.txt, TweetIDs_Coronavirus.txt, TweetIDs_Covid.txt, TweetIDs_Omicron.txt, and TweetIDs_SARS CoV2.txt) that contain Tweet IDs grouped together based on certain synonyms or terms that were used to refer to online learning and the Omicron variant of COVID-19 in the respective tweets. Table 1 shows the list of all the synonyms or terms that were used for the dataset development. The cleaned and preprocessed version of this dataset is provided in the .txt file - TweetIDs_Duplicates_Removed.txt. A description of these dataset files is provided in Table 2.
The dataset contains only Tweet IDs in compliance with the terms and conditions mentioned in the privacy policy, developer agreement, and guidelines for content redistribution of Twitter. The Tweet IDs need to be hydrated to be used. For hydrating this dataset the Hydrator application (link to download and a step-by-step tutorial on how to use Hydrator) may be used.
Table 1. List of commonly used synonyms, terms, and phrases for online learning and COVID-19 that were used for the dataset development
| Terminology | List of synonyms and terms |
| COVID-19 | Omicron, COVID, COVID19, coronavirus, coronaviruspandemic, COVID-19, corona, coronaoutbreak, omicron variant, SARS CoV-2, corona virus |
| online learning | online education, online learning, remote education, remote learning, e-learning, elearning, distance learning, distance education, virtual learning, virtual education, online teaching, remote teaching, virtual teaching, online class, online classes, remote class, remote classes, distance class, distance classes, virtual class, virtual classes, online course, online courses, remote course, remote courses, distance course, distance courses, virtual course, virtual courses, online school, virtual school, remote school, online college, online university, virtual college, virtual university, remote college, remote university, online lecture, virtual lecture, remote lecture, online lectures, virtual lectures, remote lectures |
Table 2. Description of the dataset files along with the information about the number of Tweet IDs in each of them
| Filename | No. of Tweet IDs | Description |
| TweetIDs_Corona Virus.txt | 321 | Tweet IDs correspond to tweets that comprise the keywords – "corona virus" and one or more keywords/terms that refer to online learning |
| TweetIDs_Corona.txt | 1819 | Tweet IDs correspond to tweets that comprise the keyword – "corona" or "coronaoutbreak" and one or more keywords/terms that refer to online learning |
| TweetIDs_Coronavirus.txt | 1429 | Tweet IDs correspond to tweets that comprise the keywords – "coronavirus" or "coronaviruspandemic" and one or more keywords/terms that refer to online learning |
| TweetIDs_Covid.txt | 41088 | Tweet IDs correspond to tweets that comprise the keywords – "COVID" or "COVID19" or "COVID-19" and one or more keywords/terms that refer to online learning |
| TweetIDs_Omicron.txt | 8198 | Tweet IDs correspond to tweets that comprise the keywords – "omicron" or "omicron variant" and one or more keywords/terms that refer to online learning |
| TweetIDs_SARS CoV2.txt | 13 | Tweet IDs correspond to tweets that comprise the keyword – "SARS-CoV-2" and one or more keywords/terms that refer to online learning |
| TweetIDs_Duplicates_Removed.txt | 46208 | A collection of unique Tweet IDs from all the 6 .txt files mentioned above after data preprocessing, data clearing, and removal of duplicate tweets |
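The relationship between the six raw files and the de-duplicated file can be illustrated with a short sketch. It assumes each .txt file holds one Tweet ID per line, which the description does not state, and the helper name `merge_unique_ids` is hypothetical. It only shows duplicate removal; it is not the author's preprocessing pipeline, which also involved other cleaning steps.

```python
from pathlib import Path
import tempfile


def merge_unique_ids(paths):
    """Return Tweet IDs from all files, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for path in paths:
        # Assumption: one Tweet ID per line; blank lines are ignored.
        for line in Path(path).read_text(encoding="utf-8").splitlines():
            tweet_id = line.strip()
            if tweet_id and tweet_id not in seen:
                seen.add(tweet_id)
                unique.append(tweet_id)
    return unique


# The per-file counts listed in Table 2 add up to the raw total in the abstract.
raw_counts = [321, 1819, 1429, 41088, 8198, 13]
raw_total = sum(raw_counts)

# Tiny demonstration with made-up IDs (not real Tweet IDs).
with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "a.txt"
    second = Path(tmp) / "b.txt"
    first.write_text("101\n102\n103\n", encoding="utf-8")
    second.write_text("103\n104\n\n101\n", encoding="utf-8")
    demo_ids = merge_unique_ids([first, second])
```

Keeping IDs as strings avoids any risk of precision loss if the list is later passed through tools that treat large integers as floating-point numbers.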
Online Courses Dataset
This repository provides a comprehensive dataset of online courses, including details about course categories, duration, platforms, enrollment numbers, completion rates, and ratings. The dataset can be used for trend analysis, platform comparisons, and market insights.
Key Features
Course Categories: Analyze trends across AI, Business, Data Science, Design, Finance, and more. Enrollment Metrics: Understand popularity with student enrollment… See the full description on the dataset page: https://huggingface.co/datasets/Mitul1999/online-courses-usage-and-history-dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Online education has become more prevalent in the 21st century, especially after the COVID-19 pandemic. One major trend is learning via Massive Open Online Courses (MOOCs), which are increasingly offered at many universities around the world. In these courses, learners interact with pre-designed materials and study mostly by themselves. Therefore, gaining insights into their satisfaction with such courses is vitally important to improve their learning experiences and performance. However, previous studies primarily focused on factors that affected learners’ satisfaction, not on how and what the satisfaction was. Moreover, past research mainly employed the narrative reviews posted on MOOC platforms; very few utilized survey and interview data obtained directly from MOOC users. The present study aims to fill these gaps by employing a mixed-methods approach including a survey design and semi-structured interviews with the participation of 120 students, who were taking academic writing courses on Coursera (one of the world-leading MOOC platforms), at a private university in Vietnam. Results from both quantitative and qualitative data showed that the overall satisfaction with courses on Coursera was relatively low. Furthermore, most learners were not satisfied with their learning experience on the platform, primarily due to inappropriate assessment, a lack of support and of interaction with teachers, and improper plagiarism checking. In addition, there were moderate correlations between students’ satisfaction and their perceived usefulness of Coursera courses. Pedagogically, teachers’ feedback and grading, faster support from course designers, and easier-to-use plagiarism checking tools are needed to secure learners’ satisfaction with MOOCs.
This dataset release is comprised of de-identified data from March 2014 - September 2015 of Canvas Network open courses, along with related documentation. In balancing data utility with thorough de-identification, this dataset favors utility; therefore, access and usage of this dataset is restricted as described in the Canvas Network Data Usage Agreement. These data use a star schema to organize various course, activity, and person records using dimensions and facts. The structure of this dataset is based on the Canvas Data star schema as described in https://portal.inshosteddata.com/docs. The first release of this dataset is the Canvas Network Courses, Activities, and Users (4/2014 - 9/2015) Dataset, version 1.0, created on March 3, 2016. The data set is split into multiple files for convenience:
* CNCAU_1403-1509_R_v1_03-03-2016.tgz contains the facts and dimensions representing the breadth of the dataset
* CNCAU_1403-1509_R_v1_03-03-2016_requests-01.gz - ...08.gz contain user page view requests
The resulting files are plain text, with tab-separated values.
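Since the request files are gzip-compressed, tab-separated plain text, they can be streamed without unpacking them to disk first. The following is a minimal sketch using only the Python standard library. The helper name `iter_tsv_rows` and the column names in the sample are made up for illustration; whether the real files carry a header row or quoted fields is not stated here, so the sketch assumes unquoted fields.

```python
import csv
import gzip
import io


def iter_tsv_rows(source):
    """Yield rows (lists of strings) from a gzip-compressed, tab-separated file.

    `source` may be a file path or a binary file object.
    """
    with gzip.open(source, mode="rt", encoding="utf-8", newline="") as text:
        # Assumption: fields are not quoted, so quote characters are kept literally.
        reader = csv.reader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
        yield from reader


# In-memory sample standing in for a requests file; the columns are hypothetical.
sample = gzip.compress(b"user_id\turl\n1\t/courses/1\n2\t/courses/2\n")
rows = list(iter_tsv_rows(io.BytesIO(sample)))
```

Iterating row by row keeps memory use flat, which matters for page-view logs that may be much larger than the dimension tables.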
Online Data Science Training Programs Market Size 2025-2029
The online data science training programs market size is forecast to increase by USD 8.67 billion, at a CAGR of 35.8% between 2024 and 2029.
The market is experiencing significant growth due to the increasing demand for data science professionals in various industries. The job market offers lucrative opportunities for individuals with data science skills, making online training programs an attractive option for those seeking to upskill or reskill. Another key driver in the market is the adoption of microlearning and gamification techniques in data science training. These approaches make learning more engaging and accessible, allowing individuals to acquire new skills at their own pace. Furthermore, the availability of open-source learning materials has democratized access to data science education, enabling a larger pool of learners to enter the field. However, the market also faces challenges, including the need for continuous updates to keep up with the rapidly evolving data science landscape and the lack of standardization in online training programs, which can make it difficult for employers to assess the quality of graduates. Companies seeking to capitalize on market opportunities should focus on offering up-to-date, high-quality training programs that incorporate microlearning and gamification techniques, while also addressing the challenges of continuous updates and standardization. By doing so, they can differentiate themselves in a competitive market and meet the evolving needs of learners and employers alike.
What will be the Size of the Online Data Science Training Programs Market during the forecast period?
The online data science training market continues to evolve, driven by the increasing demand for data-driven insights and innovations across various sectors. Data science applications, from computer vision and deep learning to natural language processing and predictive analytics, are revolutionizing industries and transforming business operations. Industry case studies showcase the impact of data science in action, with big data and machine learning driving advancements in healthcare, finance, and retail. Virtual labs enable learners to gain hands-on experience, while data scientist salaries remain competitive and attractive. Cloud computing and data science platforms facilitate interactive learning and collaborative research, fostering a vibrant data science community. Data privacy and security concerns are addressed through advanced data governance and ethical frameworks. Data science libraries, such as TensorFlow and Scikit-Learn, streamline the development process, while data storytelling tools help communicate complex insights effectively. Data mining and predictive analytics enable organizations to uncover hidden trends and patterns, driving innovation and growth. The future of data science is bright, with ongoing research and development in areas like data ethics, data governance, and artificial intelligence. Data science conferences and education programs provide opportunities for professionals to expand their knowledge and expertise, ensuring they remain at the forefront of this dynamic field.
How is this Online Data Science Training Programs Industry segmented?
The online data science training programs industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.
* Type: Professional degree courses, Certification courses
* Application: Students, Working professionals
* Language: R programming, Python, Big ML, SAS, Others
* Method: Live streaming, Recorded
* Program Type: Bootcamps, Certificates, Degree Programs
* Geography: North America (US, Mexico); Europe (France, Germany, Italy, UK); Middle East and Africa (UAE); APAC (Australia, China, India, Japan, South Korea); South America (Brazil); Rest of World (ROW)
By Type Insights
The professional degree courses segment is estimated to witness significant growth during the forecast period. The market encompasses various segments catering to diverse learning needs. The professional degree course segment holds a significant position, offering comprehensive and in-depth training in data science. This segment's curriculum covers essential aspects such as statistical analysis, machine learning, data visualization, and data engineering. Delivered by industry professionals and academic experts, these courses ensure a high-quality education experience. Interactive learning environments, including live lectures, webinars, and group discussions, foster a collaborative and engaging experience. Data science applications, including deep learning, computer vision, and natural language processing, are integral to the market's growth. Data analysis, a crucial application, is gaining traction due to the increasing demand
Although the commercial name for the USAID University - Learning Management System is CSOD InCompass, the agencies that use the system have renamed (or rebranded) their specific agency portals to meet their own needs. InCompass is a comprehensive talent management system that incorporates the following functional modules: 1) Learning -- The Learning module supports the management and tracking of training events and individual training records. Training events may be instructor led or online. Courses may be managed within the system to provide descriptions, availability, and registration. Online content is stored on the system. Training information stored for individuals includes courses completed, scores, and courses registered for; 2) Connect -- The Connect module supports employee collaboration efforts. Features include communities of practice, expertise location, blogs, and knowledge sharing support. Profile information that may be stored by the system includes job position, subject matter expertise, and previous accomplishments; 3) Performance -- The Performance module supports management of organizational goals and alignment of those goals to individual performance. The module supports managing skills and competencies for the organization. The module also supports employee performance reviews. The types of information gathered about employees include their skills, competencies, and performance evaluation; 4) Succession -- The Succession module supports workforce management and planning. The type of information gathered for this module includes prior work experience, skills, and competencies; and 5) Extended Enterprise -- The Extended Enterprise module supports delivery of training outside of the organization. Training provided may be for a fee. The type of information collected for this module includes individual data for identifying the person for training records management and related information for commercial transactions.
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Online learning (e-learning) course enrolment totals by course and year for public and Catholic schools. School boards report this data using the Ontario School Information System (OnSIS). Includes:
* course code
* course name
* online learning course enrolment totals by year
Enrolment totals include withdrawn or dropped courses. A student enrolled in more than one course is counted for each course. Data excludes private schools and Education and Community Partnership Program (ECPP) facilities. Not all courses offered by school boards are available to students via online learning. Cells are suppressed in categories with fewer than 10 students. Enrolment totals are rounded to the nearest five. Final as of October 4, 2024
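The two disclosure rules above (suppression below 10 and rounding to the nearest five) can be sketched as a single function. This is an illustration only: the function name `publish_total` is hypothetical, and the order in which the ministry applies suppression and rounding is not stated, so the sketch assumes suppression is checked on the raw count first.

```python
def publish_total(raw_total, threshold=10, base=5):
    """Return the published enrolment value, or None if the cell is suppressed."""
    # Assumption: suppression is decided on the raw count, before rounding.
    if raw_total < threshold:
        return None
    # Integer arithmetic rounds to the nearest multiple of `base`
    # (for base 5 there are no exact ties, so no tie-breaking rule is needed).
    return base * ((raw_total + base // 2) // base)


published = [publish_total(n) for n in (9, 10, 12, 13, 18)]
```

One consequence for analysis: a published total of 15 may stand for any raw count from 13 to 17, so differences of a few students between years should not be over-interpreted.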
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset accompanies the research paper titled "Enhancing Personalized Learning in Online Education through Integrated Cross-Course Learning Path Planning." The dataset consists of MATLAB data files (.mat format).
The dataset includes data on seven types of learner attributes, named from LearnerA.mat to LearnerG.mat. Each learner dataset contains two variables: L and LP. L is a 10x16 matrix that stores learner attributes, where each row represents a learner. The first column indicates the learner's ability level, the second column indicates the expected learning time, columns 3 to 6 represent normalized learning styles, and columns 7 to 16 represent learning objectives. LP is a structure that stores statistical information about this matrix.
The dataset also includes data on seven types of learning resource attributes, named DatasetA.mat, DatasetB.mat, DatasetC.mat, DatasetAB.mat, DatasetAC.mat, DatasetBC.mat, and DatasetABC.mat. Each resource dataset contains two variables: M and MP. M is a matrix that stores the attributes of learning materials, where each row represents a material. The first column indicates the material's difficulty level, the second column represents the learning time required for the material, columns 3 to 6 describe the type of material, columns 7 to 16 cover the knowledge points addressed by the material, and columns 17 to 26 list the prerequisite knowledge points required for the material. MP is a structure that stores statistical information about this matrix.
The dataset encompasses results from learning path planning involving seven types of learners across seven datasets, totaling 49 datasets, named in the format PathCost4_LSHADE_cnEpSin_D_X_L_Y.mat. Here, X represents the type of learning resource dataset (A, B, C, AB, AC, BC, ABC) and Y represents the type of learner (A to G). Each data file contains three variables: Gbest, Gtime, and S. Gbest is a 30x10 matrix, where each column stores the best cost function obtained from 30 runs of path planning for a learner on the corresponding dataset. Gtime is a 30x10 matrix, where each column stores the time spent on each run for a learner on the corresponding dataset. S is a 30x10 cell array storing the status information from each run.
Finally, the dataset includes a compilation of the best cost functions for all runs for all learners across all learning material datasets, named learnerBest.mat. The file contains a variable, learnerBest, which is a 7x7x10x30 four-dimensional array. The first dimension represents the type of learner, the second dimension represents the type of learning material, the third dimension represents the learner index, and the fourth dimension represents the run index.
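The column layouts described for L and M use MATLAB's 1-based numbering, which is easy to get off by one when the matrices are read in another language. The sketch below maps those column ranges onto 0-based Python slices. The field names and the helper `split_row` are hypothetical labels chosen for illustration, and loading the .mat files themselves is left out (for example, `scipy.io.loadmat` is one common option for files saved in pre-v7.3 formats).

```python
# Column ranges as (first, last), 1-based and inclusive, as in the description.
LEARNER_COLUMNS = {
    "ability_level": (1, 1),
    "expected_learning_time": (2, 2),
    "learning_styles": (3, 6),
    "learning_objectives": (7, 16),
}
MATERIAL_COLUMNS = {
    "difficulty_level": (1, 1),
    "required_learning_time": (2, 2),
    "material_type": (3, 6),
    "knowledge_points": (7, 16),
    "prerequisite_knowledge_points": (17, 26),
}


def split_row(row, layout):
    """Split one matrix row into named groups of columns."""
    width = max(last for _, last in layout.values())
    if len(row) != width:
        raise ValueError(f"expected {width} columns, got {len(row)}")
    # A 1-based inclusive range (first, last) is the 0-based slice [first - 1:last].
    return {name: list(row[first - 1:last]) for name, (first, last) in layout.items()}


# Stand-in row whose values equal their 1-based column numbers.
learner = split_row(list(range(1, 17)), LEARNER_COLUMNS)
material = split_row(list(range(1, 27)), MATERIAL_COLUMNS)
```

Using the column number as the stand-in value makes the mapping easy to check by eye: `learner["learning_styles"]` should contain exactly 3 to 6.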
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This seminar is an applied study of deep learning methods for extracting information from geospatial data, such as aerial imagery, multispectral imagery, digital terrain data, and other digital cartographic representations. We first provide an introduction and conceptualization of artificial neural networks (ANNs). Next, we explore appropriate loss and assessment metrics for different use cases followed by the tensor data model, which is central to applying deep learning methods. Convolutional neural networks (CNNs) are then conceptualized with scene classification use cases. Lastly, we explore semantic segmentation, object detection, and instance segmentation. The primary focus of this course is semantic segmentation for pixel-level classification. The associated GitHub repo provides a series of applied examples. We hope to continue to add examples as methods and technologies further develop. These examples make use of a variety of datasets (e.g., SAT-6, topoDL, Inria, LandCover.ai, vfillDL, and wvlcDL). Please see the repo for links to the data and associated papers. All examples have associated videos that walk through the process, which are also linked to the repo. A variety of deep learning architectures are explored including UNet, UNet++, DeepLabv3+, and Mask R-CNN. Currently, two examples use ArcGIS Pro and require no coding. The remaining five examples require coding and make use of PyTorch, Python, and R within the RStudio IDE. It is assumed that you have prior knowledge of coding in the Python and R environments. If you do not have experience coding, please take a look at our Open-Source GIScience and Open-Source Spatial Analytics (R) courses, which explore coding in Python and R, respectively.
After completing this seminar you will be able to:
* explain how ANNs work including weights, bias, activation, and optimization.
* describe and explain different loss and assessment metrics and determine appropriate use cases.
* use the tensor data model to represent data as input for deep learning.
* explain how CNNs work including convolutional operations/layers, kernel size, stride, padding, max pooling, activation, and batch normalization.
* use PyTorch, Python, and R to prepare data, produce and assess scene classification models, and infer to new data.
* explain common semantic segmentation architectures and how these methods allow for pixel-level classification and how they are different from traditional CNNs.
* use PyTorch, Python, and R (or ArcGIS Pro) to prepare data, produce and assess semantic segmentation models, and infer to new data.
There is no description for this dataset.
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/3UKVOR
[NOTE: Data are currently only accessible to qualified reviewers. For reviewers, detailed dataset descriptions are provided as text files associated with each dataset.] This dataset includes statistics about student actions in MITx and HarvardX courses, used in an analysis of Copying Answers using Multiple Existences Online (CAMEO) behavior. The data are partially anonymized, but insufficiently so for open release.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Datasets are critical for emotion analysis in the machine learning field. This study aims to explore emotion analysis datasets and related benchmarks in online learning, since there are currently very few studies that do so. We have scientifically labeled the topic and nine-category emotion of 4715 comment texts in online learning platforms using the “three-person voting label method” based on the “sentence-level” and multi-category labeling dimensions with our self-developed system. After testing the consistency of the labeling results using the Fleiss Kappa method, we found that the consistency of the dataset was about 0.51, representing a moderate strength of agreement. Based on the dataset, the prediction accuracy of the Long Short-Term Memory (LSTM) method is about 0.68. This dataset provides a benchmark for multi-category emotion datasets in the Chinese online learning field. It can provide a basis for subsequent work on emotion analysis, monitoring, and intervention in the education field. It can also provide a reference for constructing subsequent datasets in the education field. Please note that this is a Chinese dataset. If you want to use this dataset, please contact the author and request it below.
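For readers unfamiliar with the agreement statistic quoted above, the following is a minimal sketch of Fleiss' kappa for a fixed number of raters per item, written from the standard textbook definition. It is not the authors' code; the function name and the toy labels are made up, and the sketch does not handle the degenerate case in which every rating falls into a single category (expected agreement of 1).

```python
from collections import Counter


def fleiss_kappa(labels_per_item):
    """Fleiss' kappa for items that were each labeled by the same number of raters.

    `labels_per_item` is a list of lists; each inner list holds one label per rater.
    """
    if not labels_per_item:
        raise ValueError("at least one item is required")
    n_items = len(labels_per_item)
    n_raters = len(labels_per_item[0])
    if n_raters < 2 or any(len(labels) != n_raters for labels in labels_per_item):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    category_totals = Counter()
    agreement_sum = 0.0
    for labels in labels_per_item:
        counts = Counter(labels)
        category_totals.update(counts)
        # Share of rater pairs that agree on this item.
        agreeing = sum(c * c for c in counts.values()) - n_raters
        agreement_sum += agreeing / (n_raters * (n_raters - 1))

    observed = agreement_sum / n_items
    total_ratings = n_items * n_raters
    # Agreement expected by chance from the overall category proportions.
    expected = sum((c / total_ratings) ** 2 for c in category_totals.values())
    return (observed - expected) / (1 - expected)


# Toy examples with three raters per item (labels are illustrative only).
perfect = fleiss_kappa([["joy", "joy", "joy"], ["anger", "anger", "anger"]])
mixed = fleiss_kappa([["joy", "joy", "anger"], ["joy", "anger", "anger"]])
```

With three raters, full agreement on every item gives a kappa of 1, while two-to-one splits pull the value down quickly, which helps put a reported value of about 0.51 on a nine-category task into context.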
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/11.2/customlicense?persistentId=doi:10.7910/DVN/26147
This release is comprised of de-identified data from the first year (Academic Year 2013: Fall 2012, Spring 2013, and Summer 2013) of HarvardX courses on the edX platform along with related documentation. These data are aggregate records, and each record represents one individual's activity in one edX course. For more information about the existing analyses of these data and the first year of HarvardX courses, please see the HarvardX and MITx working paper "HarvardX and MITx: The first year of open online courses" by Andrew Ho, Justin Reich, Sergiy Nesterko, Daniel Seaton, Tommy Mullaney, Jim Waldo, and Isaac Chuang (http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2381263). The first release of this dataset is the HarvardX Person-Course Academic Year 2013 De-Identified dataset, version 3.0, created on November 12, 2019. File name: HXPC13_DI_v3_11-13-2019.csv The md5sum for this release (HXPC13_DI_v3_11-13-2019.csv) is: 53419b486c3b19c14d2f06612980f630
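The published md5sum makes it possible to check that a downloaded copy of the file is intact. A minimal sketch using Python's standard `hashlib` follows; the helper names are hypothetical, and the demonstration hashes a small temporary file rather than the real CSV.

```python
import hashlib
import tempfile
from pathlib import Path

# Checksum quoted in the release notes for HXPC13_DI_v3_11-13-2019.csv.
EXPECTED_MD5 = "53419b486c3b19c14d2f06612980f630"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_release(path):
    """True if the file at `path` has the checksum published for this release."""
    return md5_of_file(path) == EXPECTED_MD5


# Demonstration on a throwaway file (not the dataset itself).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.bin"
    demo_path.write_bytes(b"hello")
    demo_digest = md5_of_file(demo_path)
    demo_matches = matches_release(demo_path)
```

Reading in chunks keeps memory use constant regardless of file size. MD5 is adequate here for detecting accidental corruption, though it is not a safeguard against deliberate tampering.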
Learning Management System online courses for USAID staff to access.
LectureBank Dataset is a manually collected dataset of lecture slides. It contains 1,352 online lecture files from 60 courses covering 5 different domains, including Natural Language Processing (nlp), Machine Learning (ml), Artificial Intelligence (ai), Deep Learning (dl) and Information Retrieval (ir). In addition, it also contains the corresponding annotations for each slide.
The Stanford Online Products (SOP) dataset has 22,634 classes with 120,053 product images. The first 11,318 classes (59,551 images) are used for training and the remaining 11,316 classes (60,502 images) are used for testing.