INTRODUCTION
The growing advance and use of communication technologies have enabled new educational trends based on ubiquity and networking. New forms of education, such as the Massive Open Online Course (MOOC) [1-2], have gradually become very popular. MOOCs are open, participatory courses offered free of charge to hundreds or thousands of students, covering topics that range from technology to poetry. They have enabled a great expansion of online education, have developed rapidly, and have received considerable attention from many institutions and universities. MOOCs have been presented as a disruptive technology in the educational field [2-4].
MOOCs have also been presented as an opportunity to expand the coverage of higher education and to widen access for more students. They are seen as a new path for the expansion of knowledge, university innovation, employability, and the sustainable development of massive learning scenarios. This is why many universities are committed to incorporating MOOCs into higher education [5-7]. This incorporation, and the use of learning environments dedicated to offering MOOCs, has increased thanks to these advantages, which stem from the massive scale of the courses [8].
The popularity of MOOCs and their massive numbers of participants generate a large volume of data that can be used for analytical purposes. Some MOOC data are the same as those obtained in a face-to-face course, such as teaching materials, demographics, student background, enrollment information, assessment results, and grades [9]. In addition, the virtual setting leaves a trace of each student's interaction with the learning platform. Analyzed with data analysis techniques, these data can help to understand the behavior of students and how they learn in a virtual environment [10-12].
However, data from current MOOC platforms are hard to access and often difficult to collect and process. Aggregating learning platforms such as Coursera, edX, and MiriadaX do not readily deliver student interaction data to their partners because of their differing data processing policies. Self-hosted or private platforms, on the other hand, always keep the interaction records but do not include tools to extract and process these data for analysis.
Since the first period of 2016, an instance of the open-source learning platform “Open edX”, named Selene, has been deployed at the University of Cauca. The first courses were offered as Small Private Online Courses (SPOC) [13-14] and Massive Private Online Courses (MPOC) [15-16], variants of MOOCs that are limited in access (private courses) and therefore also in size, but with a wider scope of participation than any conventional online course [13-17].
From the first experiences with the courses offered on the platform, the need arose to follow up on the students and to know how they interact with the content. Because the courses carry academic recognition, the instructors in charge wanted greater control than in conventional MOOCs. However, the follow-up and analysis of student behavior are not built into the Open edX platform. This led to the following research question: How can the interaction data of the users of an Open edX instance be captured and processed?
To answer this question, the objective of this work was to design the mechanisms needed to capture Selene’s event records and process them into a data set that allows the behavior of students in Selene Unicauca to be understood through Learning Analytics. To this end, several scripts and a software component were developed, and they are presented in this document. Section 1 describes the related works that were taken into account for the study. Section 2 presents the proposed mechanism. Section 3 shows the results obtained. Section 4 presents conclusions and future work.
1. RELATED WORKS
In [18] and [19], the author presents a learning analytics system that uses video data from a MOOC. The system captures the students' interactions with the video player (pause, replay, forward, stop, etc.) using a YouTube API and, at the same time, collects information about the students' performance in summative assessments. In both works, each time a student presses a button, an abbreviation of the action's name and the time at which it occurred are stored in a Google database. Interaction data and assessment results can be displayed statistically to help tutors better understand student behavior. However, the data analysis is still performed by the tutors, a complicated task when the number of students is massive. Also, other types of interaction are not taken into account, such as logins to the platform and interaction with assessments and forums.
In [11], a data set from a Coursera course is analyzed. Several data mining techniques were used, and they provided indicators, ideas, and guidance for teacher intervention in the courses to improve the quality and delivery of MOOCs. The work obtains a classification of the students into groups. The first grouping criterion is the type of certificate for which students are enrolled. The second is the level of achievement or final grade. The authors conclude that the most successful students review the contents and carry out their assessment activities in a more structured and linear way than the least successful students. They note that data mining can help to understand student behaviors and contribute to improving course designs and the quality of the education offered. However, the data were provided directly by the Coursera team, so the authors did not perform the data extraction and processing themselves.
In [10], an exploratory study is presented on an instance of the Open edX platform and a course offered by the University of Cuenca and the Ecuadorian Consortium for Advanced Internet (Cedia). The study analyzes the browsing behavior of students and relates it to their level of self-regulation and their learning style. The results show how students with different levels of self-regulation and different learning styles interact with the course. They suggest that MOOCs should be designed to address the heterogeneity of students, with an adaptive course structure that proposes learning activities and presents content based on the particularities of each student. However, the course was attended by only 78 students, of whom 24 dropped out or did not complete it. This number is not sufficient to characterize the behavior of students in a MOOC well, and the study does not describe the process for extracting the data from the platform.
The previous works show how the interaction of students with the contents can be monitored to understand student behavior, to improve the contents and the instructional design, and even to track student progress. However, none of them describes the process of obtaining the data from the platform.
2. PROPOSED MECHANISM
The mechanism for monitoring the learning activities of students in Massive Open Online Courses was designed taking into account the structure and operation of the Open edX platform. The 4+1 views model was used to describe the software architecture, since it represents the architecture in a standard form through UML diagrams. Figure 1 shows the general architecture of the built mechanism [20].
A description of each of the components of the architecture and their relationship to the other elements within the architecture is given below.
Tracking.log: the Selene platform is an instance of Open edX deployed on a server running Ubuntu Server 14.04. The platform generates a log file called tracking.log, which records all the interactions of the students with the Selene learning environment, such as platform logins, course accesses, content browsing, forum participation, and interaction with the evaluative activities. The tracking.log file is located at /edx/var/log/tracking/ within the instance. A new log file is created there periodically, depending on the amount of interaction data generated, and the previous file is compressed and stored. This leaves a record of the interactions on the platform since its release.
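As an illustration of how such a folder of live and compressed logs can be read in order, the following minimal Python sketch iterates over every line of the archives and then of the live file. It is not the authors' code. The function name `iter_tracking_lines` is hypothetical, and the assumption that rotated files are gzip archives whose names start with "tracking.log" and end in ".gz" depends on the server's log rotation settings.

```python
import gzip
from pathlib import Path
from typing import Iterator


def iter_tracking_lines(log_dir: str) -> Iterator[str]:
    """Yield every non-empty line from the rotated archives, then the live log.

    Assumption: rotated files are gzip archives named "tracking.log*.gz";
    the exact naming depends on the server's log rotation settings.
    """
    directory = Path(log_dir)
    # Sorting by name keeps date-stamped archives in chronological order.
    archives = sorted(directory.glob("tracking.log*.gz"))
    live = directory / "tracking.log"
    for path in archives + ([live] if live.exists() else []):
        # Compressed archives need gzip.open; the live file is plain text.
        opener = gzip.open if path.suffix == ".gz" else open
        with opener(path, "rt", encoding="utf-8") as handle:
            for line in handle:
                if line.strip():
                    yield line.rstrip("\n")
```

Calling `iter_tracking_lines("/edx/var/log/tracking/")` on the server would then return the stored events from the oldest archive to the most recent entry, provided the archive names sort chronologically.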
Extraction script: a script is a set of instructions, generally stored in a text file, that is interpreted line by line when executed. The extraction script contains the commands that copy the complete folder containing the platform's tracking.log records and a line that synchronizes this folder with an external device. In this way, all the logs can be processed in near real time. Figure 2 shows the script, which is executed every 5 minutes.
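The idea of keeping an external copy of the log folder up to date can be sketched in Python as follows. This is a hypothetical equivalent, not the script of Figure 2. The function name `sync_folder` is an assumption, and the sketch detects changes by file size only, which is adequate for append-only logs but is a simplification compared with a full synchronization tool.

```python
import shutil
from pathlib import Path
from typing import List


def sync_folder(source: str, target: str) -> List[str]:
    """Copy files from source to target when they are new or changed in size.

    Returns the names of the files that were copied. Only the top level of
    the folder is mirrored, which is enough for a flat log directory.
    Assumption: comparing sizes is sufficient because logs only grow.
    """
    src, dst = Path(source), Path(target)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for item in sorted(src.iterdir()):
        if not item.is_file():
            continue
        dest = dst / item.name
        if not dest.exists() or dest.stat().st_size != item.stat().st_size:
            shutil.copy2(item, dest)  # copy2 also preserves the modification time
            copied.append(item.name)
    return copied
```

Running such a function on a fixed schedule copies only the files that have grown or appeared since the previous run, so the external device always holds the complete set of logs.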
Processing script: once all the records are available on the external device, the processing script takes the tracking.log file (the file that holds the students’ interactions with the platform), makes a copy of it, and places it at a specific path where Recolector.jar can process it. This operation is repeated every 5 minutes through the Cron program available in Ubuntu, so the processed interaction data can be used to follow up on students at all times. To obtain the data of past events, the script was modified to decompress the log files one by one and to run Recolector.jar for each log present in the synchronized folder. In this way, it was possible to create a data set with student interaction data from the moment the platform was created. Figure 3 shows the processing script, which is executed every 5 minutes.
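The historical variant of this step (decompress each archived log, place it where the collector expects it, run the collector) can be sketched in Python as shown below. This is not the script of Figure 3. The names `process_history` and `run_collector_jar` are hypothetical, and because the command line of Recolector.jar is not documented here, invoking it without arguments is an assumption.

```python
import gzip
import shutil
import subprocess
from pathlib import Path
from typing import Callable, List


def run_collector_jar(log_path: Path) -> None:
    """Hypothetical invocation: running the jar with no arguments is an
    assumption, since its command line is not documented."""
    subprocess.run(["java", "-jar", "Recolector.jar"], check=True)


def process_history(sync_dir: str, work_file: str,
                    collector: Callable[[Path], None] = run_collector_jar) -> List[str]:
    """Decompress each archived log to work_file and run the collector on it.

    Returns the archive names in the order in which they were processed.
    """
    processed = []
    target = Path(work_file)
    target.parent.mkdir(parents=True, exist_ok=True)
    for archive in sorted(Path(sync_dir).glob("*.gz")):
        # Overwrite the working copy with the decompressed archive.
        with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        collector(target)
        processed.append(archive.name)
    return processed
```

Passing the collector as a parameter keeps the loop independent of how the collector is actually launched, which also makes the loop easy to try out with a stand-in function.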
Recolector.jar: a Java-based executable. When executed, it looks for the tracking.log file in a specific location, reads it event by event, keeps in a buffer the events related to the activities to be captured, extracts their information in an orderly format, and saves it in a MySQL database. The tracking.log file is written in JSON, so the collector must interpret this information properly. Figure 4 shows an example of the events registered in the tracking.log file.
In the figure, each log line stores all the information needed to monitor the activities. For example, the pair “name: play_video” indicates that a student played a video within the learning platform, and the record also includes the student ID, the course, the date, the section and subsection of the course content, etc. Thus, it is possible to capture student interaction events in terms of access, content, resources, forums, and evaluations.
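The read, filter, and store cycle described for the collector can be illustrated with the following Python sketch. It is not the Java implementation of Recolector.jar. The field names (`username`, `context.course_id`, `time`) are assumptions about the layout of the log records, the set of event names other than play_video is illustrative, and SQLite is used only as a self-contained stand-in for the MySQL database.

```python
import json
import sqlite3
from typing import Iterable, Optional, Tuple

# Event names to keep: play_video comes from the text, the others are
# illustrative assumptions.
WANTED_EVENTS = {"play_video", "pause_video", "problem_check"}


def parse_event(line: str) -> Optional[Tuple]:
    """Return (username, course_id, name, time) for a wanted event, else None."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None  # skip truncated or malformed lines
    if not isinstance(record, dict):
        return None
    name = record.get("name") or record.get("event_type")
    if name not in WANTED_EVENTS:
        return None
    context = record.get("context")
    if not isinstance(context, dict):
        context = {}
    return (record.get("username"), context.get("course_id"), name, record.get("time"))


def store_events(conn: sqlite3.Connection, lines: Iterable[str]) -> int:
    """Insert the wanted events into an `events` table and return the count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events "
        "(username TEXT, course_id TEXT, name TEXT, time TEXT)"
    )
    rows = [row for row in map(parse_event, lines) if row is not None]
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

Filtering by event name before writing keeps the database limited to the activities of interest, and skipping lines that fail to parse prevents a single damaged record from stopping the whole run.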
3. RESULTS
By running the mechanism on Selene's log files, a data set was built with more than 1,557,683 events generated from the first period of 2016 until the first period of 2018. It contains the interaction data of four courses that have been offered at the University of Cauca and have been recognized academically. Table 1 shows the number of students enrolled in the Selene platform from the first period of 2016 to the first period of 2018.
Courses | 2016-I | 2016-II | 2017-I | 2017-II | 2018-I | Enrolled |
---|---|---|---|---|---|---|
Introducción al emprendimiento con Lean Startup | x | x | x | 265 | 317 | 582 |
Comprensión de textos Argumentativos | 105 | 110 | 109 | 103 | 97 | 524 |
Drones-Curso introductorio virtual FISH | x | x | 133 | 101 | 106 | 340 |
Introducción a la Edición de textos científicos y literarios con LaTeX | x | 102 | 99 | 104 | 100 | 405 |
Astronomía cotidiana | 433 | 428 | 517 | x | x | 1378 |
Total Enrolled | 538 | 640 | 858 | 573 | 620 | 3229 |
Source: Own elaboration.
One of the courses with the greatest participation is “Astronomía Cotidiana”, which enrolled 517 students in the first semester of 2017, more than 10 times the number of students enrolled in a classroom course at the University of Cauca.
This course is classified as an MPOC because it is private. To contribute to the understanding of student behavior in MOOCs, the student interactions of the course for the first semester of 2016 were obtained from the database. For the data analysis, a CSV file was generated by querying the MySQL database. Figure 5 shows a screenshot of the analyzed CSV.
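Generating a CSV file from a query can be sketched in Python as follows. The table name `events`, its columns, and the function name `export_course_csv` are hypothetical, since the schema of the database is not described here, and SQLite again stands in for MySQL (a MySQL driver would typically use `%s` placeholders instead of `?`). Timestamps are assumed to be ISO 8601 strings, so that comparing them as text orders them by date.

```python
import csv
import sqlite3


def export_course_csv(conn: sqlite3.Connection, course_id: str,
                      start: str, end: str, csv_path: str) -> int:
    """Write the events of one course in [start, end) to a CSV file.

    Returns the number of data rows written (the header is not counted).
    """
    cursor = conn.execute(
        "SELECT username, course_id, name, time FROM events "
        "WHERE course_id = ? AND time >= ? AND time < ? ORDER BY time",
        (course_id, start, end),
    )
    rows = cursor.fetchall()
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        # The header row reuses the column names reported by the query.
        writer.writerow([column[0] for column in cursor.description])
        writer.writerows(rows)
    return len(rows)
```

Restricting the query to one course and one date range yields a file that covers a single offering of the course, which is the unit analyzed in this section.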
A small statistical analysis was performed on the CSV, which served as a follow-up instrument for the teacher who taught the course. Figure 6 shows the number of interactions made by students throughout the course with contents, videos, forums, and exams. Students in an MPOC with academic recognition organize their activity around evaluative activities [21]: interaction peaks sharply on the dates on which the tests are scheduled.
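A count of interactions per day and per type of activity, of the kind plotted in such a figure, can be sketched in Python as follows. This is an illustration under stated assumptions, not the analysis that was actually performed: the mapping from event names to the four categories is hypothetical, and each row is assumed to carry a `name` and an ISO 8601 `time` field.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical mapping from event names to the four categories in the text.
CATEGORIES = {
    "seq_goto": "contents",
    "play_video": "videos",
    "edx.forum.thread.created": "forums",
    "problem_check": "exams",
}


def daily_counts(rows: Iterable[Dict[str, str]]) -> Counter:
    """Count events per (day, category), ignoring unmapped event names."""
    counts = Counter()
    for row in rows:
        category = CATEGORIES.get(row["name"])
        if category is None:
            continue
        day = row["time"][:10]  # ISO timestamps start with YYYY-MM-DD
        counts[(day, category)] += 1
    return counts


def peak_days(counts: Counter, category: str, top: int = 3) -> List[Tuple[str, int]]:
    """Return the `top` busiest days for one category as (day, count) pairs."""
    per_day = Counter()
    for (day, cat), n in counts.items():
        if cat == category:
            per_day[day] += n
    return per_day.most_common(top)
```

The rows can be read from the exported file with `csv.DictReader`. Comparing the days returned by `peak_days(counts, "exams")` with the scheduled test dates is one simple way to check whether activity concentrates around the evaluations.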
4. CONCLUSIONS AND FUTURE WORK
In recent years, MOOCs have positioned themselves as a new educational technology that is gradually making its way into higher education. However, there are still challenges to overcome, such as offering courses that adequately strengthen the skills and competencies of students. One alternative is to analyze the huge amounts of data collected on the platforms, which will make it possible to understand how learning processes develop in massive virtual environments.
This work showed that it is possible to extract the data from an instance of Open edX, described the methodology used, and began an analysis aimed at understanding learning, in which the behavior of the “Astronomía Cotidiana” course was observed in terms of student interactions. The course is a good example of how students behave according to evaluative activities, and it leaves an open question: How can students be motivated to perform learning activities without focusing on evaluative tasks?
Future work will continue the analysis of the data collected from the other courses offered through Selene, whose number is gradually increasing, in order to understand various areas of the learning process based on variables such as the student profile. Our focus in this line of research is to identify dishonest behaviors of students in MOOCs that have academic recognition, using learning analytics techniques.