I. INTRODUCTION
Currently, universities and education centers are targets of cyber-attacks that focus on the alteration, extortion, and theft of sensitive information [1]. Such hazards raise several questions: Do universities guarantee the confidentiality, integrity, and availability of information in the face of cyber threats? Does their technical staff maintain adequate security procedures to minimize vulnerabilities? Do universities have the ability to detect and respond to a cyber-attack?
To adequately manage security incidents, organizational structures known as Computer Security Incident Response Teams (CSIRT) or Computer Emergency Response Teams (CERT) have been steadily adopted [2]. A CSIRT offers services such as analysis, coordination, support, and response to computer security incidents, based on an adequate vulnerability analysis [3]. However, in the case of an academic CSIRT (A-CSIRT), the volume of collected information may prevent these services from being delivered, partially or entirely.
Based on this scenario, this study aimed at generating a novel technique that optimizes the extraction of malicious traffic collected by intrusion detection and prevention systems in university networks in Ecuador. To fulfill this purpose, we applied the Action-Research methodology [4]: (1) we compared Snort and the Passive Vulnerability Scanner (PVS), two passive analysis tools used in the CSIRT, to establish their benefits and determine whether they are exclusive or complementary; (2) we collected the data through extraction, transformation, and loading (ETL) techniques; (3) we stored this information in a MySQL database; and (4) we designed an application based on Business Intelligence (BI) techniques to detect malicious events that may appear in the network and thus act immediately.
Among the main contributions of this study is the generation of original ETL algorithms that transport and filter the collected information, producing only data of interest. In addition, we implemented an application on Pentaho BI [5] that couples these processes securely.
II. RESEARCH DESIGN
This research is based on the conceptual framework illustrated in Fig. 1 and on the Ralph Kimball methodology [6]. First, we defined the requirements (upper part of the figure) and established the project planning; these two processes formed the basis for determining the fulfillment of the proposed objectives. Subsequently, we developed the application, obtaining a Web product. These processes are further explained below.
A. Requirements definition
The Academic CSIRT in Ecuador is responsible for reporting the computer incidents that the university networks record and detect, and for providing a vulnerability review service. This service comprises frequent reviews of the institutions' public networks; revisions of router and firewall configuration files; reviews of the configuration, updates, and security measures implemented in Linux and Windows servers; and policy compliance checks in database systems, servers, and routers, among others [7]. However, an effective automated solution to detect and classify malicious events is still missing, owing to the high volume of such events. For the CSIRT, it is fundamental that its members have a tool that allows them to analyze effectively the traffic tracked in their networks and make quick decisions about malicious events.
B. Evaluation of the traffic analysis tools
In this section, we evaluate the Snort and PVS tools. Snort is an open-source intrusion detection and prevention system capable of real-time traffic analysis [8]. PVS is a product of Tenable Network Security that stands out for analyzing all the traffic transmitted within a network. PVS generates two files: the first intended to monitor vulnerabilities in real time, and the second focused on monitoring Web and FTP activities [9]. We determined that PVS does not act as an IPS; it simply inspects the traffic that flows through the network, without acting on the circulating packets.
To conduct an unbiased comparison under similar conditions, we used data sets from the DARPA intrusion detection evaluation carried out by the MIT Lincoln Laboratory [10]. We analyzed the data in the alert file and the logs generated by Snort, as well as the real-time file and the reports generated by PVS. Based on this analysis, we compared the results shown in Table 1.
Table 1 documents the resource consumption that the tools registered on a CentOS 7 distribution with a 64-bit architecture. The data show that Snort consumes fewer resources than PVS, mainly because Snort lacks the Web graphical user interface that PVS deploys by default. In addition, the analysis in Snort is structured in "rules", which establish the parameters of the packets to be captured, whereas PVS analyzes through "plugins", which specify what is reported about the traffic they examine. Within this assessment, we also established that PVS offers more categories than Snort for classifying the risk of the events raised in the network.
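For illustration, the sketch below annotates a minimal, hypothetical Snort rule (the message, threshold values, and sid are ours and are not taken from any official ruleset); it is shown as a string constant so that each field parameterizing the packet capture can be commented.

```typescript
// Minimal, hypothetical Snort rule shown as a string constant so that its
// fields can be annotated; it is NOT reproduced from an official ruleset.
const sshScanRule: string = [
  'alert',                 // action: raise an alert when the rule matches
  'tcp',                   // protocol of the packets to inspect
  '$EXTERNAL_NET any',     // source address and port (any external host/port)
  '->',                    // direction of the traffic
  '$HOME_NET 22',          // destination: the protected network, SSH port
  '(msg:"Possible SSH scan (illustrative rule)";', // text reported with the alert
  'flags:S;',              // match only TCP SYN packets (connection attempts)
  'threshold:type both, track by_src, count 5, seconds 60;', // 5 SYNs per source in 60 s
  'classtype:attempted-recon;', // risk category assigned to the event
  'sid:1000001; rev:1;)',  // local rule identifier (>= 1,000,000) and revision
].join(' ');

console.log(sshScanRule); // e.g., appended to a local.rules file referenced by snort.conf
```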
With the capture of FTP and HTTP traffic, we identified a clear difference in the volume of data analyzed by the two tools. This occurred because Snort acts only on its default rules; hence, the optimal performance of Snort requires configuring the rules with rulesets available from a variety of sources. This evaluation allowed us to infer that the tools are not exclusive but complementary, together providing stronger protection of the network.
C. Design and implementation of algorithms and the application of business intelligence
Here, we used the phases of the methodology proposed by Ralph Kimball [11], dividing it into three sections: (1) specific software to analyze the benefits; (2) database design and structure of the ETL; and (3) specifications of the BI application and development of the product. Below, we further describe the developed process.
The ETL processes for data capture and filtering are fundamental for this study because they represent the technique that allows organizations to move, reformat, and clean data from multiple sources using SQL algorithms. ETL allows extracting, analyzing, and interpreting incoming information. Nonetheless, data formats may vary among organizations depending on their source of origin. Therefore, to homogenize the data from their sources, we designed and implemented original ETL algorithms and subsequently loaded the results into a MySQL database, Data Mart, or Data Warehouse to submit them to a business process [11]. In this study, the information from the Snort and PVS registers was relatively condensed and had no standardized format. The ETL algorithms we created solved this issue, preventing the database from being overloaded with potentially irrelevant information.
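As a minimal sketch of this homogenization step, the following code normalizes records from the two sources into a common shape before loading them into MySQL. The field names, the events table, and the use of the mysql2 Node.js driver are illustrative assumptions; the actual ETL flows of this work are built in Pentaho BI.

```typescript
import { createConnection } from 'mysql2/promise';

// Common shape into which records from both sources are normalized
// (illustrative field names; the real schema is defined by the ETL design).
interface NormalizedEvent {
  source: 'snort' | 'pvs';
  detectedAt: Date;
  srcIp: string;
  dstIp: string;
  description: string;
  risk: 'low' | 'medium' | 'high' | 'critical';
}

// Map a (hypothetical) Snort alert record to the common shape.
function fromSnort(a: { timestamp: string; src: string; dst: string; msg: string; priority: number }): NormalizedEvent {
  return {
    source: 'snort',
    detectedAt: new Date(a.timestamp),
    srcIp: a.src,
    dstIp: a.dst,
    description: a.msg,
    risk: a.priority <= 1 ? 'high' : a.priority === 2 ? 'medium' : 'low',
  };
}

// Map a (hypothetical) PVS real-time entry to the same shape.
function fromPvs(e: { date: string; host: string; target: string; plugin: string; riskFactor: string }): NormalizedEvent {
  return {
    source: 'pvs',
    detectedAt: new Date(e.date),
    srcIp: e.host,
    dstIp: e.target,
    description: e.plugin,
    risk: e.riskFactor.toLowerCase() as NormalizedEvent['risk'],
  };
}

// Load only the normalized records of interest into MySQL (illustrative table).
async function load(events: NormalizedEvent[]): Promise<void> {
  const conn = await createConnection({ host: 'localhost', user: 'etl', password: 'secret', database: 'csirt' });
  for (const e of events.filter((ev) => ev.risk !== 'low')) { // discard irrelevant records
    await conn.execute(
      'INSERT INTO events (source, detected_at, src_ip, dst_ip, description, risk) VALUES (?, ?, ?, ?, ?, ?)',
      [e.source, e.detectedAt, e.srcIp, e.dstIp, e.description, e.risk],
    );
  }
  await conn.end();
}
```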
We developed three solutions related to the PVS processes, based on vulnerability analysis, real-time activities, and risk filtering. Each solution processes its information from files, since the vulnerabilities are reported in an enriched ".XML" (.nessus) format [12], while both the real-time monitoring and the filtering work on ".txt" files. We generated a transformation flow for the vulnerabilities originating from the PVS system (Fig. 2), as well as flows that optimize the processed packets, avoiding redundant or delayed data. Figure 3 illustrates the flow that handles a "Job" generated by the Pentaho BI data-integration system; in this flow, the transformation is executed with a repetition parameter and with error notification via e-mail, followed by a call to an execution process that filters the risks according to the established parameters.
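To make the vulnerability flow concrete, the following sketch parses a .nessus report and keeps only the items above a risk threshold. It assumes the standard .nessus v2 layout (ReportHost and ReportItem elements with a risk_factor child) and uses the fast-xml-parser package; the actual flow is built with Pentaho BI transformations, so this code is only an illustrative analogue.

```typescript
import { readFileSync } from 'node:fs';
import { XMLParser } from 'fast-xml-parser';

// Risks considered relevant for the filtering step (an assumed parameter).
const RELEVANT = new Set(['Medium', 'High', 'Critical']);

// fast-xml-parser returns a single object when an element appears once and an
// array when it repeats, so normalize both cases to an array.
const toArray = (x: any): any[] => (Array.isArray(x) ? x : x != null ? [x] : []);

interface Finding {
  host: string;
  port: string;
  pluginName: string;
  riskFactor: string;
}

// Parse a .nessus (XML v2) report and keep only the relevant findings.
function filterVulnerabilities(path: string): Finding[] {
  const xml = readFileSync(path, 'utf8');
  const parser = new XMLParser({ ignoreAttributes: false, attributeNamePrefix: '@_' });
  const doc: any = parser.parse(xml);

  const findings: Finding[] = [];
  // Assumed layout: NessusClientData_v2 > Report > ReportHost[] > ReportItem[]
  for (const host of toArray(doc?.NessusClientData_v2?.Report?.ReportHost)) {
    for (const item of toArray(host?.ReportItem)) {
      const risk: string = item?.risk_factor ?? 'None';
      if (RELEVANT.has(risk)) {
        findings.push({
          host: String(host['@_name']),
          port: String(item['@_port']),
          pluginName: String(item['@_pluginName']),
          riskFactor: risk,
        });
      }
    }
  }
  return findings;
}

// Example: list the relevant findings of one (hypothetical) report file.
console.log(filterVulnerabilities('./pvs-report.nessus'));
```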
In parallel with the vulnerability report, the transformation flow for the events operates in real time (Fig. 4). In this flow, a control of the events was established, specifying a classification that is stored in the MySQL database. As with the vulnerabilities, failures in the transformation flow are signaled by a warning via e-mail (Fig. 5).
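The warning itself is configured in the Pentaho flow; the sketch below only illustrates the equivalent control logic in Node.js, assuming a hypothetical line format for the real-time file (date|event|source_ip|classification), a hypothetical SMTP account, and the nodemailer and mysql2 packages.

```typescript
import { createConnection } from 'mysql2/promise';
import nodemailer from 'nodemailer';

// Assumed line format of the PVS real-time file: date|event|source_ip|classification
function parseLine(line: string) {
  const [date, event, sourceIp, classification] = line.split('|').map((s) => s.trim());
  return { date: new Date(date), event, sourceIp, classification };
}

async function loadRealTimeEvents(lines: string[]): Promise<void> {
  const db = await createConnection({ host: 'localhost', user: 'etl', password: 'secret', database: 'csirt' });
  try {
    for (const line of lines) {
      const e = parseLine(line);
      // Store each event together with its classification (illustrative table).
      await db.execute(
        'INSERT INTO realtime_events (detected_at, event, source_ip, classification) VALUES (?, ?, ?, ?)',
        [e.date, e.event, e.sourceIp, e.classification],
      );
    }
  } catch (err) {
    // On failure, warn the operators by e-mail, mirroring the Pentaho mail step.
    const mailer = nodemailer.createTransport({
      host: 'smtp.example.org', port: 587,            // hypothetical SMTP server
      auth: { user: 'csirt-etl', pass: 'secret' },
    });
    await mailer.sendMail({
      from: 'csirt-etl@example.org',
      to: 'operators@example.org',
      subject: 'Real-time event transformation failed',
      text: String(err),
    });
    throw err;
  } finally {
    await db.end();
  }
}
```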
Snort is complemented by Barnyard2, a system that migrates the alerts detected by the IDS to a MySQL database (Figs. 6 and 7); hence, we adapted the fields to generate a relational model. For filtering, the transformation was programmed in Pentaho BI (Spoon Data Integration) [13]; based on the content registered in a file, it catalogs the packets obtained with Snort as relevant or not, preventing a potential database overload with records of no interest.
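A query of this kind over the schema written by Barnyard2 is sketched below; it assumes the standard Snort database layout (event, signature, and iphdr tables) and an illustrative priority threshold, again using the mysql2 driver.

```typescript
import { createConnection } from 'mysql2/promise';

// Fetch the latest relevant Snort alerts from the schema populated by Barnyard2.
// Assumes the standard Snort database layout (event, signature, and iphdr tables)
// and an illustrative priority threshold.
async function latestRelevantAlerts() {
  const db = await createConnection({ host: 'localhost', user: 'bi', password: 'secret', database: 'snort' });
  const [rows] = await db.query(
    `SELECT e.timestamp,
            s.sig_name,
            s.sig_priority,
            INET_NTOA(i.ip_src) AS src_ip,
            INET_NTOA(i.ip_dst) AS dst_ip
       FROM event e
       JOIN signature s ON s.sig_id = e.signature
       JOIN iphdr i     ON i.sid = e.sid AND i.cid = e.cid
      WHERE s.sig_priority <= 2          -- keep only higher-priority alerts
      ORDER BY e.timestamp DESC
      LIMIT 20`,
  );
  await db.end();
  return rows;
}

// Example: list the latest relevant alerts.
latestRelevantAlerts().then((rows) => console.table(rows));
```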
In agreement with Stephen Few [14], who defines a dashboard as a scorecard in which a dense array of information must be presented in a condensed manner, we used Pentaho BI with the Community Dashboard Framework (CDF) guidelines [15]. To visualize the data collected from PVS, we developed two control panels: the first devoted to the events in real time, and the second to the vulnerability analysis. For Snort, we developed a panel that displays a matrix with the latest detected events; this information complements the real-time events registered by PVS.
Regarding the implementation of the application, we applied Scrum as an agile project development methodology because it is independent of technologies and can be coupled to different programming models; it focuses on people, since users and developers have defined roles; and it provides quick results, considers time management, is iterative, responds to change, and keeps the client actively involved [16].
As the programming language we used Node.js, a platform built on Chrome's JavaScript runtime [17], since it allows implementing fast and scalable network applications. Node.js uses an event-driven, non-blocking I/O model that is lightweight and efficient, making it suitable for real-time, data-intensive applications.
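A minimal sketch of this model is shown below: a single-threaded HTTP endpoint that answers a dashboard request without blocking while its (stubbed) database query is pending. The route and response shape are illustrative assumptions, not the actual application code.

```typescript
import { createServer } from 'node:http';

// Stand-in for an asynchronous MySQL query; the real application would await
// the database driver here instead of a timer.
function countEventsToday(): Promise<number> {
  return new Promise((resolve) => setTimeout(() => resolve(42), 50));
}

// Event-driven, non-blocking HTTP endpoint: while one request waits for its
// query, the single thread keeps accepting and serving other requests.
const server = createServer(async (req, res) => {
  if (req.url === '/api/events/today') {
    const total = await countEventsToday();          // does not block the event loop
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ total }));
  } else {
    res.writeHead(404).end();
  }
});

server.listen(3000, () => console.log('BI API listening on http://localhost:3000'));
```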
III. RESULTS
A. Proof of concept
The environment and flow in which the concept was proven (Fig. 8) comprise: 1) the generation of a request for a dashboard or for access to the Pentaho BI system; 2) the system generating the queries to the MySQL database; 3) the system generating the control panel to be displayed in the case of a dashboard request, whereas an access request is resolved according to the user's profile, taking into account that the dashboard may be edited; 4) the requested dashboard being displayed in the BI system; and 5) the BI system allowing printing, as well as export of the data to Excel, generating a proper record file.
B. Results assessment
The application was verified during a real-time trial period, in which the users were able to browse the entire network. For example, in the records of the last week of June, we determined that the day with the most events was the 27th (Fig. 9). Fig. 10 illustrates the variety of events recorded during the trial period. The most common event was "ET SCAN Potential SSH Scan", which generated multiple connections to the server where the product had been installed.
The maximum number of vulnerabilities tracked in the protected network was recorded on day 14 (Fig. 11). Afterwards, the flow returned to normal, with fewer vulnerabilities on some days and more on others; consequently, we can infer that control actions were taken on the observed vulnerabilities.
During this analysis, the number of false positives decreased, which implies an improvement in the fulfillment of the Academic CSIRT services. The system therefore focused on supporting the Academic CSIRT in improving its services for detecting vulnerabilities and tracking malicious activity on the network; this aims at raising awareness among its members so that incidents occurring in their networks are handled more efficiently, achieving adequate and timely attention to weaknesses and threats.
IV. CONCLUSIONS AND FUTURE WORK
In the current study, we designed a solution implemented through Business Intelligence, which acts as a strategic factor in the vulnerability analysis of an Academic CSIRT. This was possible by applying the Action-Research methodology and the phases proposed by Ralph Kimball. The evaluation of the Passive Vulnerability Scanner and Snort provided security management based on network traffic and on the customization of their configurations, reducing false positives and thus enhancing the response to security incidents. We developed several algorithms to apply the ETL process to the non-standardized logs that the graphical interface then processed. Finally, we built a software application using Scrum, which allowed linking the obtained logs in Pentaho BI to generate early alerts of vulnerabilities and malicious code. The results demonstrate that this application has helped those responsible for the CSIRT to establish immediate priorities and to allocate resources to key areas that may be potential victims of digital attacks. Since Big Data techniques provide scalability in the high-data-volume scenarios of a CSIRT, we suggest applying them in future studies.