1. INTRODUCTION
Virtual Education (VE) offers multiple benefits, not only because of its convenience and flexibility for students and teachers, but also because it can improve educational coverage, especially in remote areas with limited access to resources. Nevertheless, the quality of virtual education is controversial although, according to the U.S. Department of Education [1], online students achieve a better performance than those who take face-to-face classes. Furthermore, online students tend to be self-motivated, self-disciplined, and self-directed, which makes VE a very popular modality nowadays.
The freedom students experience in VE also produces security and reliability issues, especially when tests and exams are administered. According to Bretag [2], fraud in VE is higher and more worrying than in traditional education. For instance, 95 % of the students in Israel and 69 % in Korea admitted to committing fraud in virtual exams or tests, and the trend is similar in the rest of the world [2]. For this reason, universities do not use virtual tests in evaluations such as admission exams or final tests.
In general terms, biometric systems can be classified into two approaches: verification and identification [3]. In identification, the biometric features of a user are compared to those of multiple users in a database in order to find the identity of the user among all the individuals. In verification, a previously registered user logs in to the system and the biometric features captured at log-in are compared with the biometric features stored at registration.
Depending on the similarity of the features, the system decides whether the user is valid or not.
Keystroke Dynamics (KD) analysis is a suitable option to capture biometric information in order to control who has access to certain information or platforms. One of the main advantages of KD is that it does not require additional hardware, i.e., the identity of a user can be verified with a regular computer keyboard.
KD analysis started in the 20th century when telegraph operators had to transmit dozens of words in a short period of time, developing a distinctive rhythm that was recognized by the operators on the other side of the line to identify who was transmitting [4]. Later, in 1990, Joyce and Gupta [5] extracted specific digital signatures to identify users based on their KD.
The authors asked users to type their username and password 8 times to compute a curve with the average time they took to enter the data. At a later login, the system compared the average curve with the new curve generated in the new login. Then, the system detected whether the user was valid or an impostor based on a measure of similarity between the two curves. The system was evaluated with 30 valid users and 27 impostors.
As a result, there was a total of 30 valid access attempts and 810 intruder access attempts. The authors reported a False Positive Rate (FPR) of 0.25 % and a False Negative Rate (FNR) of 16.0 %. The system had several usability issues since the user was requested to type the data correctly.
The system was biased by cases where a user deleted wrong characters. A similar strategy was proposed in [6] to identify 173 users based on their KD. The users attended a programming course at the University of Helsinki, and the data were extracted from their programming exercises. The authors created a student profile based on the average hold time when pressing any key, the average hold time when pressing a particular key, the average time when pressing two particular keys, and a combination of the three previous times. The similarity between the evaluation sample and the database was measured with the Euclidean distance.
The authors reported accuracies of up to 97 %. In [7], the authors proposed a model to verify user identity with features extracted when users typed a password on a smartphone. The authors asked 94 different users to type the password “.tieRoanl” in order to extract features such as pressure when touching the screen, coordinates of the pressing point, and times when the finger presses or releases the screen. The authors computed several statistical functionals from the keystrokes and obtained a set of 155 features. The most important features were selected based on a minimum Redundancy Maximum Relevance (mRMR) algorithm. The selected features included pressure and coordinates. The authors reported an accuracy of 97.4 % using a Support Vector Machine (SVM) classifier. In recent years, identity verification based on KD has captured the attention of the research community. For instance, a keystroke dynamic application was presented in [8]. In the study, the authors created a keyprint (typing fingerprints) to authenticate users in online courses. The aim of a keyprint is to capture few data with specific characteristics of a user’s KD; therefore, only data with unusual values of typing dynamics are considered. The authors claim that this system is suitable for verification but not for identification.
They also showed that two samples from the same user are very unlikely to be exactly the same; therefore, to determine the similarity between the samples, a t-test (𝛼=0.05) is enough. The decision is made based on the equal error rate (EER), i.e., the operating point where FPR and FNR are the same. The authors reported an accuracy of 80 %, but the main drawback of the approach was that users needed to type at least 964 characters to be correctly identified.
A strategy to authenticate user identity based on KD is proposed in [9], where the identity of the users is verified by comparing enrollment and log-in information. A total of 63 users were asked to type 5 items of personal information: name, last name, email address, nationality, and national ID.
The database comprised 12 genuine accesses and 12 impostor accesses per enrolled user, for a total of 7560 samples. Six genuine samples were used to register the user, and the rest to log in. The authors tested different features, but the best result was obtained using the time between key press and release and the difference between the time of pressing a key and releasing the following one. With these features and a classifier based on the Modified Scaled Manhattan distance, they obtained an EER of 2.4 %. This result was achieved because the identity of users was verified using KD when they typed data such as name, email address, and other personal information. As users are familiar with these data, the KD will probably not vary much from one sample to the next, which allows systems to verify users’ identity more accurately.
1.1 Contribution of this study
This paper proposes a methodology to verify the identity of students based on their KD. The proposed approach is tested in two different modes: intrusive and non-intrusive. The first mode considers the case where the subject is aware of being tested. The second mode considers the case where the subject is not aware of the verification process, so a different writing task is required to verify the identity of the subject.
The features extracted from the writing tasks are used to create Gaussian Mixture Models (GMM). Those models are compared using probabilistic distances to make the decision whether a user is valid or not. The main difference of the proposed method with respect to others reported in the literature is that our approach is based on probabilistic models instead of the direct comparison of feature sets. The results indicate that it is possible to detect intruders with accuracies of up to 89 %, measured in the EER.
2. MATERIALS AND METHODS
2.1 Participants
A total of 170 subjects (116 male) participated in this study. The average age was 24 years old. The subjects were asked to perform 5 different tasks which were designed to capture the KD over different regions of the keyboard. Most users were undergraduate students from the University of Antioquia. Users with higher education attainment were also considered.
In addition, 20 of the 170 users performed different tasks in two different sessions. (Table 1) details participants’ information.
2.2 Data collection
Each user of the database performed 5 different tasks. The first 4 tasks were designed to capture specific movements on the keyboard. For instance, task 1 captures long horizontal displacements. In this task, the user typed the sentence “El sapo de mi casa come queso, zapallo y xoubas”. Here, the characters of each word follow other characters on the opposite side of the keyboard; thus, it is possible to define the user’s dynamics while moving from one side to the other. (Fig. 1a) shows the keyboard regions involved in this task.
The arrow indicates displacements between the two regions. Similarly, task 2, “En un pueblo un niño juega afuera y tu vejez es notable”, aims to capture short displacements along the horizontal axis. These displacements are shown in (Fig. 1b). Task 3, “La leña esta partida, la tijera se ha roto, yo quiero jugar y reír, dale a la gata sus gatitos y las fresas y las patatas del huerto”, connects characters in the middle row of the keyboard with some in the top row, defining top vertical displacements. This task is shown in (Fig. 1c).
Finally, task 4, “La vaca flaca, las lañas malvas, las jacas blancas, a la sal acabas la salsa, zancada flaca”, requires the user to connect characters in the middle row with characters in the bottom row, defining the lower vertical displacements, as shown in (Fig. 1d). To define users’ KD in normal conditions, we considered task 5, which has a total of 500 characters. This task was extracted from the novel Frankenstein; or, The Modern Prometheus by Mary Shelley [10]. (Table 2) details the size of each task.
2.3 Methods
A user registers in the platform by typing the first 4 tasks previously described in (Fig. 1). When the user types, the system returns data from the KD. The user model is created with the KD data. When the user logs into the platform, he/she must type one of the 5 tasks following a procedure similar to the one completed during the registration stage. A log-in model is created per user and compared to the model created during the registration stage. Finally, if the distance between these two models is short, the user is classified as valid; otherwise, the user is classified as an intruder. The general methodology is summarized in (Fig. 2). The next subsections detail the methods applied at each stage of the methodology.
2.4 Raw information extracted from the computer to model KD
Computers can provide the ASCII code of the characters that are typed when a text is written. They can also store the time the keys were pressed (P) and released (R). (Table 3) shows an example with the raw information that can be extracted when the word “Hola” is typed.
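As an illustration of the raw information described above, the following minimal sketch stores one record per typed character. The names `KeyEvent` and `record_word` are hypothetical, and the timestamps are invented for illustration; they are not the values reported in Table 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyEvent:
    code: int          # ASCII code of the typed character
    press_ms: float    # time at which the key was pressed (P)
    release_ms: float  # time at which the key was released (R)


def record_word(word, press_times, release_times):
    """Pair each character with its press and release times."""
    if not (len(word) == len(press_times) == len(release_times)):
        raise ValueError("one press and one release time are needed per character")
    return [KeyEvent(ord(c), p, r)
            for c, p, r in zip(word, press_times, release_times)]


# Hypothetical timestamps in milliseconds (illustrative only).
events = record_word("Hola", [0, 180, 330, 520], [95, 260, 410, 610])
for e in events:
    print(chr(e.code), e.code, e.press_ms, e.release_ms)
```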
2.5 User Characterization
The objective of this stage is to find a feature matrix 𝑋 ∈ ℝ^(𝑛×𝑘) associated with each user, where 𝑛 is the number of segments and 𝑘 is the number of extracted features. (Fig. 3) describes the feature matrix 𝑋. Note that each task might have a different number of segments (rows), but the number of features is fixed. The characterization process is divided into two parts: segmentation and feature extraction.
2.5.1. Segmentation
Each row in 𝑋 refers to a specific segment of the text that the user has typed. These segments were based on a tri-graph model, which consists of small packets with the information of three consecutive characters. A similar strategy was considered in another study for identity verification based on speech signals [11].
For our analysis a sliding window of 5 tri-graphs, with an overlap of 3 tri-graphs, was used, as shown in (Fig. 4).
2.5.2 Feature extraction
A total of six time series are created when the user types each character: three when the key is pressed and three when the key is released. These times are shown in Table 3. With this information, it is possible to extract 2 main features: Hold time, which is the time between the press and the release of a key; and Flight time, which is the time between pressing a key and pressing the next one, as described in (Fig. 5). The fourteen features that are extracted per segment are described below:
Total Hold Time (T_HT): the sum of the hold times of the characters.
Average Hold Time (A_HT): the sum of the hold times of the characters divided by the number of characters.
Standard Deviation of the Hold Time (σ_HT): the deviation of the hold times with respect to A_HT.
Strong Key (S_K): the code of the key with the shortest hold time.
Time Strong Key (T_SK): the minimum hold time.
Weak Key (W_K): the code of the key with the longest hold time.
Time Weak Key (T_WK): the maximum hold time.
Total Flight Time (T_FT): the sum of the flight times of the characters.
Average Flight Time (A_FT): the sum of the flight times of the characters divided by the number of characters.
Standard Deviation of the Flight Time (σ_FT): the deviation of the flight times with respect to A_FT.
Strong Key in Flight (S_KF): the code of the key with the shortest flight time.
Time Strong Key in Flight (T_SKF): the minimum flight time.
Weak Key in Flight (W_KF): the code of the key with the longest flight time.
Time Weak Key in Flight (T_WKF): the maximum flight time.
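The feature list above can be sketched in code as follows. This is an illustration under stated assumptions rather than the exact procedure of the study: the function name `segment_features` is hypothetical, the averages are taken over the available hold and flight intervals, the population standard deviation is used, and the key assigned to a flight time is assumed to be the key from which the flight starts. The input timestamps are invented.

```python
from statistics import mean, pstdev


def segment_features(keys):
    """keys: list of (code, press_ms, release_ms) in typing order."""
    holds = [release - press for _, press, release in keys]
    # Flight time: press of one key to press of the next key.
    flights = [keys[i + 1][1] - keys[i][1] for i in range(len(keys) - 1)]
    i_sk, i_wk = holds.index(min(holds)), holds.index(max(holds))
    i_skf, i_wkf = flights.index(min(flights)), flights.index(max(flights))
    return {
        "T_HT": sum(holds), "A_HT": mean(holds), "sigma_HT": pstdev(holds),
        "S_K": keys[i_sk][0], "T_SK": holds[i_sk],
        "W_K": keys[i_wk][0], "T_WK": holds[i_wk],
        "T_FT": sum(flights), "A_FT": mean(flights), "sigma_FT": pstdev(flights),
        # Assumption: a flight is attributed to the key where it starts.
        "S_KF": keys[i_skf][0], "T_SKF": flights[i_skf],
        "W_KF": keys[i_wkf][0], "T_WKF": flights[i_wkf],
    }


# Hypothetical events for the word "Hola" (times in milliseconds).
keys = [(72, 0, 95), (111, 180, 260), (108, 330, 410), (97, 520, 610)]
feats = segment_features(keys)
print(feats)
```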
Once the feature matrix has been created per user, it is necessary to find a representation to model the distribution of the features. The models created in the registration stage are compared with those created in the log-in stage. We considered Gaussian Mixture Models (GMM) to create those models and the Bhattacharyya distance to compare them, as explained below.
2.6 Gaussian Mixture Model
A GMM is a probabilistic model created to represent a population from a linear combination of Gaussian distributions.
Each Gaussian of the GMM models a specific group of samples in a population [12], [13]. Equation (1) shows the mathematical expression for a GMM of a multivariate random variable x, which corresponds to the sum of M Gaussian distributions, each weighted by a parameter C_m.
p(x) = \sum_{m=1}^{M} C_m \, \mathcal{N}(x; \mu_m, \Sigma_m)    (1)
A compact way to represent GMM models is indicated in (2).
Three parameters should be estimated in the GMM modeling approach: weight C_m, mean vector μ_m, and covariance matrix Σ_m, where m is the index of the Gaussians. These parameters are estimated using the Expectation-Maximization (EM) algorithm. The total number of Gaussians M must be defined before starting the estimation procedure, and it can be done according to the Bayesian Information Criterion (BIC) [14], which measures the quantity of information lost when the model is used.
However, in case of problems where there is no prior knowledge of the data, the number of Gaussian distributions is found experimentally [11].
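For reference, selecting M with the BIC can be sketched as below. The sketch assumes full covariance matrices and the usual definition BIC = p ln(n) − 2 ln(L), where p is the number of free parameters, n the number of samples, and L the maximized likelihood. The function names and the log-likelihood values are hypothetical; in practice the log-likelihoods would come from an EM fit for each candidate M.

```python
import math


def gmm_num_params(M, d):
    """Free parameters of a full-covariance GMM with M components in d dimensions."""
    weights = M - 1                    # the weights sum to one
    means = M * d
    covariances = M * d * (d + 1) // 2  # symmetric d x d matrices
    return weights + means + covariances


def bic(log_likelihood, M, d, n):
    """Bayesian Information Criterion; lower values are preferred."""
    return gmm_num_params(M, d) * math.log(n) - 2.0 * log_likelihood


# Hypothetical log-likelihoods for three candidate values of M.
candidates = {1: -5200.0, 2: -4700.0, 4: -4550.0}
d, n = 14, 400
best_M = min(candidates, key=lambda M: bic(candidates[M], M, d, n))
print(best_M)
```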
2.7 Classification
Each user is represented by a GMM.
Thus, to calculate the similarity between two models (registration: f_i(x) and log-in: g_i(x)), we can use the Bhattacharyya distance (D_bha), where the μ_m and the Σ_m of each GMM are taken into account [15].
D_bha can be expressed as in (3):
where the first term considers the mean vectors of the GMMs, and the second term considers the covariance matrices. As indicated in Equation (3), the similarity measurement between the two models, f_i(x) and g_i(x), considers the mean vectors and the covariance matrices separately. Mean vectors are compared in (4), while the covariance matrices are considered in (5).
Finally, depending on the similarity of both models, it is possible to classify the user’s identity. If the user is valid, the distance between the two models is expected to be less than the distance resulting from an impostor. However, it is necessary to define a threshold 𝑈 to decide whether a user is valid or an impostor.
This distance measurement has been considered in previous studies where GMM models resulting from speech recordings are compared [16].
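To illustrate the two terms of the distance, the sketch below computes the standard closed-form Bhattacharyya distance between a single pair of Gaussians. It assumes diagonal covariance matrices, which is a simplification introduced here, and it does not show how the distances between components are combined across the M Gaussians of each GMM. The function name is hypothetical.

```python
import math


def bhattacharyya_diag(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two Gaussians with diagonal covariances."""
    mean_term = 0.0
    cov_term = 0.0
    for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2):
        v = 0.5 * (v1 + v2)                # averaged variance
        mean_term += (m1 - m2) ** 2 / v    # compares the mean vectors
        cov_term += math.log(v / math.sqrt(v1 * v2))  # compares the covariances
    return 0.125 * mean_term + 0.5 * cov_term


same = bhattacharyya_diag([0.0, 1.0], [1.0, 2.0], [0.0, 1.0], [1.0, 2.0])
shifted = bhattacharyya_diag([0.0], [1.0], [2.0], [1.0])
print(same, shifted)
```

The distance is zero for identical Gaussians and grows as the means move apart or the variances differ, which is the behavior the threshold 𝑈 relies on.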
(Fig. 6) and (Fig. 7) show the flowcharts of the registration and log-in stages, respectively. The number of components 𝑀 and the decision threshold 𝑈 are found in the training and development stage explained below.
3. EXPERIMENTS AND RESULTS
The test stage aims to evaluate the performance and usability of the system in two different modes: intrusive and non-intrusive verification. In the intrusive mode, the user is aware that his/her identity is being verified through the keyboard. On the other hand, in the non-intrusive mode the user does not know that he/she is being verified.
3.1 Experiment 1: intrusive mode
In the intrusive mode, two sessions are required because the registration and log-in writing tasks are the same; thus, we use the first session to register the user and the second to log in. Only 20 of the 170 users have two sessions; therefore, this experiment was conducted with 20 users.
For this experiment, a 5-fold cross-validation strategy was carried out (subject independent in each fold).
Therefore, in each fold, 4 subjects were considered for the test and 16 for the training. (Fig. 8) shows the test and train sets for each fold.
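A subject-independent partition of this kind can be sketched as follows. The assignment of subjects to folds shown here is only one possible choice and is not necessarily the one depicted in (Fig. 8); the function name and the subject identifiers are hypothetical.

```python
def subject_folds(subject_ids, n_folds):
    """Split subjects so that no subject appears in both train and test of a fold."""
    folds = []
    for k in range(n_folds):
        test = subject_ids[k::n_folds]  # every n_folds-th subject, offset k
        train = [s for s in subject_ids if s not in test]
        folds.append((train, test))
    return folds


subjects = list(range(1, 21))  # 20 users with two sessions
folds = subject_folds(subjects, 5)
for train, test in folds:
    print(len(train), len(test), test)
```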
3.1.1 Training of the GMM-based model
The training stage is used to find the optimal hyper-parameters of the classifier that makes the decision. The number of Gaussian components (M) was optimized following a grid search strategy between 1 and 50 in steps of 3, using the minimum EER as the selection criterion.
The threshold U was optimized between 0 and 1 in steps of 10^(-3), also using the EER as the selection criterion.
These parameters are found for each fold. In each fold we keep the M that yields the lowest EER. For this intrusive verification mode, the optimal point is 𝑀 = 34 ± 3.356, which is the median of the best M across the folds. The U value was also varied from 0 to 1 for each M, and the average of the best thresholds across the folds is 𝑈 = 0.148 ± 0.006.
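The threshold search can be sketched as follows, assuming that a user is accepted when the distance between the registration and log-in models is below 𝑈. The function names and the distance values are hypothetical and serve only to show how the EER point is located on a grid from 0 to 1 in steps of 10^(-3).

```python
def error_rates(genuine, impostor, U):
    """FPR: impostors accepted (distance < U). FNR: genuine users rejected."""
    fpr = sum(d < U for d in impostor) / len(impostor)
    fnr = sum(d >= U for d in genuine) / len(genuine)
    return fpr, fnr


def find_eer(genuine, impostor, steps=1000):
    """Return the threshold where FPR and FNR are closest, and the EER there."""
    best = None
    for i in range(steps + 1):
        U = i / steps
        fpr, fnr = error_rates(genuine, impostor, U)
        gap = abs(fpr - fnr)
        if best is None or gap < best[0]:
            best = (gap, U, (fpr + fnr) / 2)
    return best[1], best[2]


# Hypothetical distances between registration and log-in models.
genuine = [0.05, 0.08, 0.10, 0.12, 0.30]
impostor = [0.09, 0.20, 0.25, 0.40, 0.60]
U, eer = find_eer(genuine, impostor)
print(U, eer)
```

In a grid search over M, this routine would be run once per candidate M, keeping the M whose EER is lowest.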
3.1.2 Test of experiment 1
The results of this experiment are shown in Table 4. The performance is measured in terms of FPR and FNR. The usability of the method is measured in terms of the Cost to a User to Enroll (CUE) and the Cost to a User to Authenticate (CUA) [17], [18], [19]. These costs refer to the number of keys required to be pressed to do the registration or authentication procedure. The registration model is generated with the first 4 tasks; therefore, the CUE is 314 keystrokes.
The log-in model is generated with Task 3 and Task 4, as indicated in Table 4. The minimum EER is obtained with Task 3; however, this is the task with the highest CUA. Task 1 and Task 2 do not have the minimum number of keystrokes required to model them with a GMM with M = 34. A minimum of 2M+1 keystrokes (69 for M = 34) is needed in order to estimate the GMM’s covariances; therefore, these tasks were not included in the analysis.
3.2 Experiment 2: Non-intrusive mode
In this experiment, tasks 1, 2, 3 and 4 were used to generate the user registration model, in the same way as in the previous experiment. The difference is that the log-in model is generated with task 5. Task 5 is divided into 5 chunks of equal length. For each chunk, the distance between the registration and log-in models is computed. To decide whether a user is valid or not, the average distance over the 5 chunks is computed and compared to the decision threshold U.
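The chunk-based decision can be sketched as follows. The function names are hypothetical, the per-chunk distances are invented for illustration, and the sketch assumes that the user is accepted when the average distance is below 𝑈.

```python
def split_chunks(keys, n_chunks=5):
    """Divide the typed keys into n_chunks pieces of equal length."""
    size = len(keys) // n_chunks
    return [keys[i * size:(i + 1) * size] for i in range(n_chunks)]


def verify(chunk_distances, U):
    """Average the per-chunk distances and compare with the threshold U."""
    avg = sum(chunk_distances) / len(chunk_distances)
    return avg < U, avg


# Task 5 has 500 characters; integers stand in for key events here.
chunks = split_chunks(list(range(500)))
# Hypothetical distances, one per chunk, and a hypothetical threshold.
is_valid, avg = verify([0.010, 0.020, 0.010, 0.015, 0.005], U=0.013)
print(len(chunks), is_valid, avg)
```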
For this experiment there are 170 different users. A cross-validation strategy similar to that of the previous experiment was used. The only difference is that here we used 10 folds; thus, in each fold 153 subjects are used for the training stage and 17 subjects are used in the test stage (subject independent in each fold).
3.2.1 Training of the GMM-based model
The strategy for the training of the GMM-based model in this experiment is the same as the previous experiment. For this experiment the optimal hyper-parameters are: M=36 ±6.074 and U=0.013 ±0.021.
3.2.2 Test of the experiment 2
Table 5 shows the performance and usability of the system by varying the number of chunks used to decide whether the user is valid or not. In this case the CUE is the same as the previous experiment, because the same tasks are used to create the registration model.
3.3 Experiment 3: comparison with another methodology in non-intrusive mode
In the literature there are several works on biometric verification based on keystroke dynamics, but few verify the identity of the user in a non-intrusive way. The methodology proposed in [6] performs identity verification in non-intrusive mode.
We implemented this methodology with the 170 users of our database. In [6], the authors propose to create a student profile based on the average Hold time when pressing different keys. The similarity between registered and log-in samples is calculated with the Euclidean distance, and a training set is used to optimize the decision threshold.
This methodology was adapted to the problem of non-intrusive verification, using tasks 1 to 4 as registration tasks and task 5 for log-in.
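A simplified sketch of a profile comparison of this kind is given below. It is not the implementation of [6]: the function names are hypothetical, the profile contains only the average hold time per key, and the Euclidean distance is computed only over the keys present in both profiles, which is an assumption made here. The timestamps are invented.

```python
import math
from collections import defaultdict


def hold_profile(keys):
    """Average hold time per key code; keys are (code, press_ms, release_ms)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for code, press, release in keys:
        sums[code] += release - press
        counts[code] += 1
    return {code: sums[code] / counts[code] for code in sums}


def profile_distance(p, q):
    """Euclidean distance over the keys shared by both profiles (assumption)."""
    shared = p.keys() & q.keys()
    return math.sqrt(sum((p[c] - q[c]) ** 2 for c in shared))


# Hypothetical registration and log-in samples.
registration = [(97, 0, 100), (97, 200, 320), (98, 400, 480)]
login = [(97, 0, 107), (98, 200, 284)]
dist = profile_distance(hold_profile(registration), hold_profile(login))
print(dist)
```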
(Fig. 9) shows the EER when varying the decision threshold from 0 to 7000 in steps of 10. The minimum EER obtained using the methodology proposed in [6] for a non-intrusive verification approach is 36 %, reached when the threshold is 390. On this database, the approach proposed here, based on GMMs, is therefore more accurate than the methodology of [6] in the non-intrusive mode.
4. CONCLUSIONS
This study proposed a method for identity verification based on the statistical modeling of KD using GMMs. The main application of the proposed approach is in virtual education platforms, to verify the identity of a student while he/she is taking a test. The system was evaluated in two modes: (1) intrusive mode, which is text dependent, and (2) non-intrusive mode, which is text independent (i.e., the user is not aware that his/her identity is being verified).
In the intrusive mode, the user logs in to the system with one of the tasks used in the registration stage. Since the log-in is performed with a fixed task, the user knows that his/her identity is being verified.
This mode showed an EER of 15.7 %. The usability of this mode was evaluated and showed a CUA of 133 keystrokes. This mode can be modified by changing the log-in tasks. For instance, the registration and log-in stages can be performed by typing the username and password. In this case, access to the system is the same as in the traditional manner. However, our proposed system provides an additional security layer because the user has to provide the username and the password with a valid KD to enter the system. The main drawback of this mode is that the verification is only performed when the user logs in to the platform. If the valid user logs in to the platform but the exam is taken by an intruder, the system will not be able to detect the fraud.
In the non-intrusive mode, the log-in task is independent of the tasks used in the registration stage. In this case the user is not aware that he/she is being verified. This mode achieved an EER of 11.7 %. This mode can be used during evaluation activities because the identity of the user can be constantly verified without interrupting the activity. Although this mode presents a higher CUA compared to the other mode, this is not a problem because the verification can be performed on any text typed by the user, including the texts written during the examination.