Introduction
The current growth of user subscriptions and video streaming services has generated an exponential increase in traffic on mobile networks. According to [1], 8,1 billion users of mobile wireless networks were registered worldwide by late 2020, and this number is expected to grow to 8,9 billion by 2025. Of these users, 50 % will use the LTE wireless access technology, 25 % will use the access technologies proposed by 5G, and 25 % will use other technologies (WCDMA/HSPA, GSM/EDGE, etc.). Mobile data traffic is estimated to reach 164 exabytes by 2025, 76 % of which will correspond to video services [1].
The deployment of 5G network infrastructure has now begun worldwide. 3GPP defines this migration in TR 21.915 V15.0.0 (2019-09) [2], which specifies two possible configurations: standalone (SA) and non-standalone (NSA). The standalone configuration uses a single radio access technology, whereas the non-standalone configuration combines several. In the Latin American landscape, operators are expected to adopt the non-standalone configuration (option 3) [2] because of its economic benefits, capacity expansion, use of frequency bands, and gradual introduction of the technology. This option uses the LTE core (EPC), an evolved Node B (eNB) that acts as master node, and a 5G NR base station (en-gNB) that acts as secondary node. The user equipment has dual connectivity with the master and secondary nodes, as illustrated in Fig. 1.
In this context, under this migration configuration, user traffic continues to flow completely or partially through the LTE access network and always through the LTE core. It is therefore vital, when deploying 5G in developing countries, to analyze the behavior of the traffic prevalent in the network, such as live video streaming (LVS), in order to keep the LTE core from becoming a bottleneck during the migration and to determine, together with user behavior, when to implement the subsequent phases of the 5G rollout.
The streaming techniques used to support video services, specifically video on demand (VoD) and LVS, use the Hypertext Transfer Protocol (HTTP) [3]. HTTP streaming is improved by adaptive streaming techniques, the most popular of which include HTTP Smooth Streaming (HSS) [4], HTTP Live Streaming (HLS) [5], HTTP Dynamic Streaming (HDS) [6], and Dynamic Adaptive Streaming over HTTP (DASH), the latter being the international standard ISO/IEC 23009-1:2012 adopted for LTE networks [7].
Given the high level of video traffic in wireless networks, telecommunication operators must make decisions that guarantee minimum quality conditions for the services offered to users. Hence, in network and resource management and administration, it is of paramount importance to have tools, such as traffic models, that characterize network behavior. Although there are mathematical models and simulation environments that characterize data traffic in wireless networks, they lack realistic information about the actual conditions of the environment [8]. In response, the academic and scientific communities currently show great interest in using and developing communication system test platforms that consider real operating conditions. In this context, traffic models obtained from real operating environments are widely accepted, since they characterize network traffic behavior without affecting the operation of current systems. [9] presents different approaches to traffic modeling, including test benches, simulation, emulation environments, and mathematical models. Among these, emulation environments are a hybrid between simulation and test benches that seeks to offset the disadvantages of one with the advantages of the other; this type of approach is currently widely accepted.
Within the emulation-based approach to traffic characterization, in [10] the authors characterize the traffic of a VoD service in an IPTV network and use a lexical analyzer designed for this task to facilitate the study. In [11], the authors characterize the traffic of the VoD service on HFC networks using the RTMP protocol. In [12], the researchers study and characterize VoD traffic in IPTV networks, formulate an optimization problem, and solve it through a sub-optimal model. The authors in [13] study, analyze, and model the traffic generated by the interactive services of a Virtual Academic Community (VAC), with high-quality audio and video content typical of the IPTV environment, where the main service is VoD supported on IPTV technology. In [14], the authors explore some aspects of IPTV streaming modeling and present general studies involving synthetic video trace generators. They also conclude that it is important to study the IPTV parameters before deploying the service in order to evaluate alternative network architectures and configurations that yield the best performance. [15] presents techniques to model and predict video traffic statistically, which are highly useful for this work, but it focuses only on modeling data traffic with ARIMA time series, specifically the SARIMA model, which the authors argue is the most accurate way to describe IPTV traffic. All of these works focus on characterizing traffic over IPTV networks. [16] presents an analytic traffic model of the HTTP adaptive video streaming service, the first model reported in the literature for this type of service. The proposed analytic model consists of three components: a video server model, a model of the IP network between the client and the server, and a client model for video playback.
To obtain this analytic model, [16] uses a simulation approach in which the whole system is modeled as a set of nested queues, and packet traffic is assumed to follow a binomial distribution. As an additional contribution, the present work derives the traffic model from an emulation environment that employs real elements and real-time consumption of the service, which reduces many of the assumptions required in simulations.
Accordingly, the data traffic characterization process includes building the conceptual model of the service, capturing the traffic, identifying the audio and video frames that make up the traffic under study, and identifying the probability density function (PDF) that describes its behavior [13]. In this context, video codecs organize images (frames) into a structure called Group of Pictures (GoP), which consists of an independently coded reference slice (I slice) followed by a sequence of P and B slices that encode only the motion changes with respect to previously coded reference slices. The stream also carries audio information alongside these slices [17]. Because capturing, identifying, filtering, and exporting each of the slices that make up video streaming traffic is laborious, tools such as lexical or syntactic analyzers [10] are needed to speed up these processes.
This work characterizes LVS traffic in an emulated LTE network that uses the DASH adaptive streaming technique. The main contributions of this work are the following: (i) the design of a syntactic analyzer that facilitates the traffic characterization tasks for the service under study; (ii) a tool for network operators and the scientific community that describes the traffic of the LVS service in an LTE network using the DASH adaptive streaming technique, which can support network and resource management and administration; and (iii) a study of the traffic behavior based on real traffic traces captured in the emulated LTE network, without depending on assumptions or studies from other particular research scenarios.
This article is organized as follows: section 2 presents the methodology used in this research; section 3 shows the results and their discussion; and section 4 presents the conclusions.
Methodology
To characterize the traffic of LVS services in an emulated LTE network, this study uses the DASH adaptive streaming technique and the emulation scenario presented in Fig. 2, adapted from [18] and validated by the authors in [19], [20]. The emulation scenario consists of a real video server, an LTE network built with the NS3 tool, and a client (User Equipment, UE). An overview of the equipment and software used in the test bench is shown in Table I. Five test environments are used. In the first, a static UE is located 30 m from the evolved Node B (eNB); in the second, third, and fourth, the UE moves away from the eNB at 1, 2, and 3 m/s, respectively, along the straight line y = x; and in the fifth, the UE moves around the eNB with random direction and speed. For all test environments, five videos covering four content types were selected for transmission: interview (category A), cartoons (categories B and D), football match (category C), and movie (category E), which match the spatial-temporal characteristics defined in Annex I of Recommendation ITU-T P.910 [21]. For each video, 10 tests are carried out, each consisting of 180 s of video consumed by a client. While the video is played, the traffic traces are captured at the receiver (client) with the Wireshark traffic analysis software [22]. Then, to obtain the mathematical traffic model, the audio and video frames are extracted from the traffic traces, specifically their arrival times, sizes, and slice types. Finally, statistical analysis identifies the PDF that describes the behavior of each component of the service, which constitutes the mathematical model.
Fig. 3 shows the functional scheme used in the traffic characterization process. To speed up the process of extraction of parameters from the traffic traces, the proposed syntactic analyzer was used.
Emulation scenario
Fig. 2 presents a diagram of the implemented emulation scenario. PC1 simulates the LTE network with NS3 (version 3.26) on Linux Ubuntu. The other two PCs act as server (PC2) and client (PC3) and connect to PC1 via Ethernet. PC2 runs the Wowza Streaming Engine software [23] as the video server. The Wowza server supports adaptive streaming technologies; to do so, it uses Synchronized Multimedia Integration Language (SMIL) files, which group streams of different bitrates for adaptive-bitrate HTTP delivery. PC3 runs a web-based video playback application served with Apache HTTP [24]. The open-source traffic and protocol analyzer Wireshark [22] was installed on both PC2 and PC3. On PC1, the LTE network is simulated with the LENA module of NS3 [25]. The LTE network consists of several nodes: a remote host node, a Serving Gateway/Packet Data Network Gateway (SGW/PGW), an eNB node, and a user equipment (UE) node that plays the role of the mobile device. Thus, real video traffic from the real LVS server is injected into the simulated LTE network and reaches the UE node. To allow communication between PC2 and the remote host, and between the UE and PC3, a Hardware in the Loop (HIL) tool is used, which turns PC1 into a black box with virtual components. This black box receives data from and delivers data to the real end systems, PC2 and PC3 [18]. Tables II and III show the initial configuration parameters and settings of the emulation tool built for the test environment.
Live video service
To carry out this study, the Wowza Streaming Engine video server was used with the H.264/MPEG-4 AVC encoder; the LVS service in LTE networks uses the Baseline Profile (BP) [26]. H.264 encoding is organized into two main layers: the video coding layer (VCL), where the video coding process takes place (Fig. 4), and the network abstraction layer (NAL), which adapts the information delivered by the VCL to a wide range of storage and transport technologies for compressed content. After the VCL, the video is organized into NAL units, each consisting of a header that indicates the type of data carried and a larger raw byte sequence payload (RBSP). NAL units can be classified according to their content: VCL NAL units carry video slice data, whereas non-VCL NAL units carry control information, such as parameter sets that can apply to several VCL NAL units.
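As an illustration of the NAL unit header mentioned above, the sketch below decodes the one-byte H.264 header into its three fields: forbidden_zero_bit (1 bit), nal_ref_idc (2 bits), and nal_unit_type (5 bits). Types 1 to 5 carry coded slice data (VCL), while types such as SEI, SPS, and PPS are non-VCL. It is a minimal Python sketch for orientation only, not part of the analyzer described in this work; the class and function names are hypothetical.

```python
from dataclasses import dataclass

# Names for a few common H.264 nal_unit_type values (not an exhaustive table).
NAL_TYPE_NAMES = {
    1: "coded slice, non-IDR picture",
    5: "coded slice, IDR picture",
    6: "SEI",
    7: "sequence parameter set (SPS)",
    8: "picture parameter set (PPS)",
    9: "access unit delimiter",
}


@dataclass
class NalHeader:
    forbidden_zero_bit: int  # must be 0 in a conforming stream
    nal_ref_idc: int         # 0 indicates content not used for reference
    nal_unit_type: int       # 5-bit type field

    @property
    def is_vcl(self) -> bool:
        # Types 1-5 carry coded slice data (VCL); the rest are non-VCL.
        return 1 <= self.nal_unit_type <= 5

    @property
    def name(self) -> str:
        return NAL_TYPE_NAMES.get(self.nal_unit_type, "other")


def parse_nal_header(first_byte: int) -> NalHeader:
    """Split the one-byte H.264 NAL unit header into its three fields."""
    return NalHeader(
        forbidden_zero_bit=(first_byte >> 7) & 0x01,
        nal_ref_idc=(first_byte >> 5) & 0x03,
        nal_unit_type=first_byte & 0x1F,
    )


if __name__ == "__main__":
    for b in (0x67, 0x68, 0x65, 0x41):
        h = parse_nal_header(b)
        print(f"0x{b:02X}: type={h.nal_unit_type} ({h.name}), VCL={h.is_vcl}")
```

For example, 0x65 decodes as an IDR slice (VCL) and 0x67 as an SPS (non-VCL), which is the kind of distinction between VCL and non-VCL units described above.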
Syntactic analyzer
Because of the high traffic volume of a live video service, among the most time-consuming characterization tasks is the identification of audio and video slices, specifically the video slice type (I, P, or B), arrival time, and size. These slices are delivered to the network interleaved, which makes data identification and extraction a complex task. The proposed syntactic analyzer speeds up these activities by automating slice type identification through a Matlab® script that delivers the information required for later analysis. To illustrate the traffic volume handled by the LVS service, Fig. 5 shows the packets captured during 60 s of a test lasting 180 s; audio (red) and video (blue) traffic appear as bars obtained with the Wireshark tool.
Presented in this format, Fig. 5 shows substantial differences between the video and audio throughputs. Additionally, at the packet level, the audio slice size intuitively appears to behave uniformly, which must be confirmed by obtaining the PDF for this type of slice. The video, however, is made up of GoP slices (I, P, B) and therefore requires a more granular analysis: the video components must be identified, together with the PDF that represents the behavior of each. At this point, the proposed syntactic analyzer automatically identifies, separates, and exports the information on the video and audio slices from the real traffic traces obtained in the experimental scenario.
The inputs (tokens) of the syntactic analyzer are, in this case, the traffic traces acquired with Wireshark and exported in plain-text format. First, the analyzer uses the Word variable to determine whether a record is an audio or a video slice; for an audio slice, the arrival time and size are extracted. For a video slice, the slice type within the GoP (I, P, or B) must also be identified. Fig. 6 shows the flow diagram of the proposed syntactic analyzer.
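The first branch of this flow diagram, which routes each record to the audio or video stream, can be sketched as follows. The sketch assumes a hypothetical plain-text layout in which each record reads `<time> <size> <word> [payload hex]`, with the word playing the role of the Word variable; the actual Wireshark export and the authors' Matlab script may be laid out differently, and all names here are illustrative.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class SliceRecord:
    kind: str         # "audio" or "video" (the role of the Word variable)
    time: float       # arrival time in seconds
    size: int         # slice size in bytes
    payload_hex: str  # hex payload, kept for the later slice-type lookup


def parse_line(line: str) -> Optional[SliceRecord]:
    """Parse one line of a HYPOTHETICAL export: '<time> <size> <word> [hex...]'.

    Returns None for headers, unrelated protocols, or malformed lines.
    """
    fields = line.split()
    if len(fields) < 3:
        return None
    try:
        time, size = float(fields[0]), int(fields[1])
    except ValueError:  # header lines or other non-numeric text
        return None
    word = fields[2].lower()
    if word not in ("audio", "video"):
        return None
    payload = "".join(fields[3:])  # hex bytes may be space-separated
    return SliceRecord(word, time, size, payload)


def split_streams(lines: Iterable[str]) -> Tuple[List[SliceRecord], List[SliceRecord]]:
    """Route each parsed record to the audio list or the video list."""
    audio: List[SliceRecord] = []
    video: List[SliceRecord] = []
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue
        (audio if record.kind == "audio" else video).append(record)
    return audio, video


if __name__ == "__main__":
    trace = ["0.010 320 audio", "0.033 15000 video 00 00 00 01 00 01"]
    audio, video = split_streams(trace)
    print(len(audio), len(video), video[0].payload_hex)
```

Keeping the raw payload only for video records mirrors the diagram: audio needs just arrival time and size, while video records still need their slice type.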
To identify the slice type of the video slices, the analyzer searches for the keyword that marks the start of a GoP unit. Considering the international standard ISO/IEC 13818-1 [27] and the work presented in [28] and [29], this keyword is defined as '00 00 00 01'. When this identification sequence is found, the two bytes that follow it are analyzed; the last of these two bytes, hexadecimally encoded, contains the slice type information. To extract it, the least significant bit of the first hexadecimal character and the most significant bit of the second hexadecimal character are selected. According to [17], the slice type is coded as follows: 00 = P, 01 = B, and 10 = I. Fig. 7 presents an example in which the Word variable indicates a video slice; the slice arrival time and size are also shown, together with the reassembled TCP segments. After the keyword identifying the GoP, the following two bytes appear: 00 01. The slice type is identified from the third and fourth hexadecimal characters, 01H (0000 0001): the least significant bit of the first character (0) and the most significant bit of the second character (0) form the binary code 00, which corresponds to a type P slice.
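The slice-type rule described above can be expressed as the following Python sketch. It reproduces the byte and bit selection given in the text (keyword 00 00 00 01, last of the two following bytes, least significant bit of its first hexadecimal character and most significant bit of its second) rather than a full H.264 slice-header parser, in which slice_type is Exp-Golomb coded. The function name is hypothetical.

```python
from typing import Optional

START_CODE = b"\x00\x00\x00\x01"                  # keyword searched for in the payload
SLICE_CODES = {"00": "P", "01": "B", "10": "I"}   # 2-bit codes as given in [17]


def slice_type(payload_hex: str) -> Optional[str]:
    """Apply the byte/bit rule from the text to a hex payload.

    Returns "I", "P" or "B", or None if no keyword or no valid code is found.
    Raises ValueError if payload_hex is not valid hexadecimal.
    """
    data = bytes.fromhex(payload_hex)       # whitespace between bytes is ignored
    pos = data.find(START_CODE)
    if pos < 0 or pos + 6 > len(data):      # need two bytes after the keyword
        return None
    second = data[pos + 5]                  # last of the two bytes after the keyword
    lsb_first_nibble = (second >> 4) & 1    # LSB of the first hex character
    msb_second_nibble = (second >> 3) & 1   # MSB of the second hex character
    return SLICE_CODES.get(f"{lsb_first_nibble}{msb_second_nibble}")


if __name__ == "__main__":
    print(slice_type("00 00 00 01 00 01"))  # the Fig. 7 example: bytes 00 01
```

Searching at byte level (via `bytes.find`) rather than on the hex string avoids matching the keyword at an odd character offset, where it would straddle two bytes.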
Finally, the syntactic analyzer outputs the identification of each slice (audio or video), the slice type (I, P, or B), its arrival time, and its size. With this information, and with the help of the Matlab dfittool application [30] and the R statistical analysis software [31], a statistical analysis is then conducted to identify the probability density functions (PDFs) that describe the behavior under study.
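Because the fitted PDFs describe the time between consecutive slices of the same class, the analyzer output has to be turned into inter-arrival times before fitting. A minimal sketch, assuming the output is available as (arrival time, class) pairs with hypothetical class labels such as "I", "P", and "audio":

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def inter_arrival_times(records: Iterable[Tuple[float, str]]) -> Dict[str, List[float]]:
    """Time between consecutive arrivals of the same slice class.

    records: (arrival_time_s, slice_class) pairs, e.g. class "I", "P" or "audio".
    Classes seen only once produce no gaps and are absent from the result.
    """
    last_seen: Dict[str, float] = {}
    gaps: Dict[str, List[float]] = defaultdict(list)
    for t, cls in sorted(records):          # process in arrival order
        if cls in last_seen:
            gaps[cls].append(t - last_seen[cls])
        last_seen[cls] = t
    return dict(gaps)


if __name__ == "__main__":
    out = inter_arrival_times([(0.0, "I"), (0.5, "P"), (1.0, "P"), (2.0, "I")])
    print(out)
```

Grouping by class before differencing keeps interleaved audio and video arrivals from shortening each other's gaps.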
The R statistical analysis software is first used for an exploratory analysis of the results delivered by the syntactic analyzer. The data are loaded and displayed with a box-and-whisker plot (Fig. 8), a visual representation of the dispersion and symmetry of the data. In this plot, i) the line inside the box represents the median; ii) the box spans the interquartile range (IQR = Q3 - Q1) between the first and third quartiles, which contains the central 50 % of the data; and iii) the whiskers, which extend from either side of the box, cover the lower 25 % and the upper 25 % of the data values, excluding outliers. An asymmetric plot indicates that the data may not be normally distributed. Atypical data are classified as extreme or mild outliers according to Tukey's test; only the extreme outliers are removed, while the mild outliers are kept for this analysis [32].
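The outlier rule can be sketched as below, assuming Tukey's usual fences: mild outliers lie beyond 1,5 × IQR from the quartiles and extreme outliers beyond 3 × IQR. The quartiles come from Python's `statistics.quantiles` with the "inclusive" method; quartile conventions differ slightly between tools (R's boxplot uses hinges), so the fences may not match Fig. 8 exactly. The function name is hypothetical.

```python
import statistics
from typing import List, Sequence


def remove_extreme_outliers(data: Sequence[float], k_extreme: float = 3.0) -> List[float]:
    """Drop values beyond Tukey's outer fences (Q1 - 3 IQR, Q3 + 3 IQR).

    Mild outliers (between the 1.5 IQR inner fences and the outer fences)
    are kept, as in the analysis described in the text. Order is preserved.
    """
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k_extreme * iqr, q3 + k_extreme * iqr
    return [x for x in data if low <= x <= high]


if __name__ == "__main__":
    sizes = list(range(1, 11)) + [20, 1000]
    print(remove_extreme_outliers(sizes))
```

In the usage line, 1000 falls far outside the outer fences and is removed, whereas a value such as 20, which is only a mild outlier for this data, is kept.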
Null hypothesis (H0)
The null hypothesis (H0) is selected with dfittool, a Matlab application that interactively fits probability distributions to data imported from the Workspace and offers 22 PDFs for analysis. With dfittool, the data set can be displayed as a histogram overlaid with a candidate PDF; the curve that best fits the histogram is taken as the null hypothesis. This null hypothesis is then validated through the Kolmogorov-Smirnov (K-S) goodness-of-fit test [33], implemented in a Matlab script that measures the agreement between the empirical distribution of the data and the theoretical distribution (null hypothesis). The evaluate function of dfittool is used to create vectors with the cumulative distribution function (CDF) of the sizes or arrival times of each set of slices; the goodness-of-fit test takes this CDF and the set of traffic samples as inputs. The contrast variable (Dα) of the K-S test, at a 0,05 significance level (95 % confidence), is calculated with Equation (1), which applies to more than 50 samples.
$$D_{\alpha} = \frac{1{,}63}{\sqrt{N}} \qquad (1)$$
where N is the number of samples.
Next, the cumulative probability observed (CPO) of the samples of each slice type is calculated, which corresponds to the CDF of the practical data. The cumulative probability expected (CPE) under the null hypothesis corresponds to the CDF generated from the theoretical distribution with the evaluate function of dfittool. The estimator or contrast variable of the test (D) is the maximum absolute difference between CPO and CPE:
$$D = \max_{i} \left| \mathrm{CPO}_i - \mathrm{CPE}_i \right|$$
D is compared with the contrast variable (Dα), and Equation (2) determines whether the null hypothesis is accepted or rejected:
$$H_0 \text{ is } \begin{cases} \text{accepted}, & D < D_{\alpha} \\ \text{rejected}, & D \geq D_{\alpha} \end{cases} \qquad (2)$$
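The K-S decision can be sketched in Python as follows. D is computed by comparing both sides of each step of the empirical CDF (the CPO) with the theoretical CDF (the CPE), and the critical value is assumed to have the large-sample form Dα = c/√N. With c = 1,63 this reproduces the contrast values reported in the results (N = 70 gives 0,1948 and N = 60 gives 0,2104); commonly tabulated asymptotic coefficients are about 1,36 for α = 0,05 and 1,63 for α = 0,01, so c is left as a parameter. Function names are hypothetical.

```python
import math
from typing import Callable, Sequence


def ks_statistic(samples: Sequence[float], cdf: Callable[[float], float]) -> float:
    """Two-sided K-S statistic D = max |F_empirical(x) - F_theoretical(x)|.

    Both the upper (i/N) and lower ((i-1)/N) values of each empirical-CDF
    step are compared with the theoretical CDF at the same sample.
    """
    x = sorted(samples)
    n = len(x)
    d = 0.0
    for i, xi in enumerate(x, start=1):
        f = cdf(xi)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def ks_critical_value(n: int, c: float = 1.63) -> float:
    """Large-sample critical value D_alpha = c / sqrt(N)."""
    return c / math.sqrt(n)


def accept_null_hypothesis(samples: Sequence[float],
                           cdf: Callable[[float], float],
                           c: float = 1.63) -> bool:
    """H0 is kept when D is lower than the critical value."""
    return ks_statistic(samples, cdf) < ks_critical_value(len(samples), c)


def uniform_cdf(v: float) -> float:
    """CDF of the uniform distribution on [0, 1], used only for the example."""
    return min(max(v, 0.0), 1.0)


if __name__ == "__main__":
    sample = [(i - 0.5) / 70 for i in range(1, 71)]
    print(ks_statistic(sample, uniform_cdf), ks_critical_value(len(sample)))
```

In practice the `cdf` argument would be the fitted candidate distribution (for example, one built from MLE parameters), and the same samples would be tested against each candidate in turn.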
The value of the D estimator from the K-S test, besides determining whether to accept or reject the hypothesis, is also used as a PDF selection criterion. The PDF parameters of the data analyzed are calculated by using maximum likelihood estimators (MLE) [34].
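For distributions with closed-form estimators, the MLE step is short. The sketch below uses the Inverse Gaussian, the PDF accepted for the P-slice inter-arrival times in the results, in its (μ, λ) parametrization: μ̂ is the sample mean and λ̂ = N / Σ(1/xᵢ - 1/μ̂). This is an illustrative stand-in, not necessarily how dfittool computes its estimates; most of the other PDFs used here (Weibull, Extreme Value, Rician, t location-scale) require numerical maximization. Function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_inverse_gaussian(samples: Sequence[float]) -> Tuple[float, float]:
    """Closed-form MLE (mu, lambda) of an Inverse Gaussian distribution."""
    n = len(samples)
    if n == 0 or any(x <= 0 for x in samples):
        raise ValueError("need a non-empty set of strictly positive samples")
    mu = sum(samples) / n                          # MLE of the mean
    s = sum(1.0 / x - 1.0 / mu for x in samples)   # >= 0 by Jensen's inequality
    if s <= 0:
        raise ValueError("lambda is unbounded: all samples are equal")
    return mu, n / s                               # MLE of the shape parameter


def inverse_gaussian_pdf(x: float, mu: float, lam: float) -> float:
    """f(x) = sqrt(lam / (2 pi x^3)) * exp(-lam (x - mu)^2 / (2 mu^2 x)) for x > 0."""
    if x <= 0:
        return 0.0
    return math.sqrt(lam / (2 * math.pi * x ** 3)) * math.exp(
        -lam * (x - mu) ** 2 / (2 * mu ** 2 * x)
    )


if __name__ == "__main__":
    mu, lam = fit_inverse_gaussian([1.0, 2.0, 4.0])
    print(mu, lam, inverse_gaussian_pdf(mu, mu, lam))
```

The fitted PDF can then be overlaid on the histogram or turned into a CDF for the K-S comparison described above.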
Results and discussion
After the traffic analysis of a particular scenario, the proposed analyzer delivers organized information on the inter-arrival times and sizes of the video (I and P) and audio slices. The histogram is then generated with dfittool, and the null hypothesis (H0) is checked. Fig. 9 shows the histogram of the inter-arrival times of the P slices and the PDFs of the candidate null hypotheses (H0) to be validated with the K-S goodness-of-fit test. Fig. 10 shows the CPO of these inter-arrival times and the CPE of the different candidate null hypotheses.
Table IV shows the calculated values of the test statistic (D) and their comparison with the contrast value (D0,05) for the candidate null hypotheses shown in Fig. 9 and 10. This information shows that only the null hypothesis stating that the analyzed data fit an Inverse Gaussian PDF is accepted. The parameters of this PDF are shown in Fig. 11, which presents the distribution type and the mean and variance calculated with the maximum likelihood estimator, along with the parameters that define the distribution and their statistical error.
Tables V and VI present the PDFs that describe the size and the relative time between slices for each type of frame (video and audio) in the five video categories (interview: category A; cartoons: categories B and D; football match: category C; and movie: category E) in all the test environments described in the methodology. They also show the D estimator of the K-S test, which is lower than the contrast variable (D0,05) in all cases; D0,05 ranges from 0,163 to 0,2104, depending on the number of samples used (between 60 and 100 in all scenarios). Because the encoder uses the Baseline Profile (BP) of H.264/MPEG-4 AVC, which does not support B slices, only type I and P video slices are detected. Table VII shows the PDFs used and the parameters that define them.
The following describes a particular case, highlighted in Tables IV and V, in which a UE plays a category C live video (football match) while moving away from the eNB at an average speed of 3 m/s (10,8 km/h).
According to the slice sizes and arrival times shown in Tables IV and V, the traffic characterized in this scenario can be modeled by the following PDFs:
The sizes of type I slices are described by an Extreme Value PDF with µ = 156931 and σ = 13736. The K-S test yields a D estimator of 0,1405, which is lower than the contrast variable (D0,05 = 0,1948) for N = 70 samples (see Equation (1)).
The sizes of type P slices are described by a Weibull PDF with scale α = 102749 and shape b = 6,555. The K-S test yields a D estimator of 0,0781, which is lower than the contrast variable at a 0,05 significance level (D0,05 = 0,2104) for N = 60 samples.
The sizes of the audio slices are described by a t location-scale (tLocationScale) PDF with µ = 1549,17, σ = 34,56, and ν = 1,716. The K-S test yields a D estimator of 0,0243, which is lower than the contrast variable at a 0,05 significance level (D0,05 = 0,1934) for N = 71 samples.
The inter-arrival times of type I slices are described by a Rician PDF with s = 2,884 and σ = 0,60487. The K-S test yields a D estimator of 0,0718, which is lower than the contrast variable (D0,05 = 0,1948) for N = 70 samples.
The inter-arrival times of type P slices are described by an Extreme Value PDF with µ = 0,45935 and σ = 0,31337. The K-S test yields a D estimator of 0,1340, which is lower than the contrast variable (D0,05 = 0,2021) for N = 65 samples.
The inter-arrival times of the audio slices are described by an Extreme Value PDF with µ = 0,45935 and σ = 0,31337. The K-S test yields a D estimator of 0,1340, which is lower than the contrast variable (D0,05 = 0,1948) for N = 85 samples.
For all the scenarios presented in Tables V and VI, the value of the D estimator from the K-S test is lower than the contrast variable (D0,05), so none of the proposed hypotheses is rejected.
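Because the fitted PDFs are intended to drive traffic generation in simulators, a minimal Python sketch can draw synthetic slice sizes and inter-arrival times from the parameters reported above for category C with the UE moving away from the eNB at 3 m/s. Several assumptions are made and should be checked against the tools used: the Extreme Value PDF is taken in its minimum (Gumbel) form, assumed to match Matlab's convention; the Weibull uses (scale, shape); the t location-scale uses (µ, σ, ν); each quantity is drawn independently of the others; and negative draws from unbounded PDFs are clamped to zero. All names are hypothetical.

```python
import math
import random
from typing import Dict, List

# Parameters reported for category C, UE moving away from the eNB at 3 m/s.
I_SIZE_EV = (156931.0, 13736.0)          # Extreme Value (mu, sigma), bytes
P_SIZE_WEIBULL = (102749.0, 6.555)       # Weibull (scale, shape), bytes
AUDIO_SIZE_T = (1549.17, 34.56, 1.716)   # t location-scale (mu, sigma, nu), bytes
I_GAP_RICIAN = (2.884, 0.60487)          # Rician (s, sigma), seconds
P_GAP_EV = (0.45935, 0.31337)            # Extreme Value (mu, sigma), seconds


def extreme_value(rng: random.Random, mu: float, sigma: float) -> float:
    """Inverse-CDF draw from F(x) = 1 - exp(-exp((x - mu) / sigma)) (minimum Gumbel)."""
    u = rng.random()
    while u == 0.0:                      # u = 0 would lead to log(0)
        u = rng.random()
    return mu + sigma * math.log(-math.log(1.0 - u))


def t_location_scale(rng: random.Random, mu: float, sigma: float, nu: float) -> float:
    """mu + sigma * T, with T = Z / sqrt(chi2_nu / nu) and chi2_nu ~ Gamma(nu/2, scale 2)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(nu / 2.0, 2.0)
    return mu + sigma * z / math.sqrt(chi2 / nu)


def rician(rng: random.Random, s: float, sigma: float) -> float:
    """Magnitude of a 2-D Gaussian vector with mean length s and per-axis std sigma."""
    return math.hypot(rng.gauss(s, sigma), rng.gauss(0.0, sigma))


def non_negative(value: float) -> float:
    # Assumption: sizes and gaps cannot be negative, so such draws are clamped to 0.
    return max(0.0, value)


def synthetic_trace(n: int, seed: int = 0) -> Dict[str, List[float]]:
    """Draw n independent values of each modeled quantity."""
    rng = random.Random(seed)
    return {
        "I_size": [non_negative(extreme_value(rng, *I_SIZE_EV)) for _ in range(n)],
        "P_size": [rng.weibullvariate(*P_SIZE_WEIBULL) for _ in range(n)],
        "audio_size": [non_negative(t_location_scale(rng, *AUDIO_SIZE_T)) for _ in range(n)],
        "I_gap": [rician(rng, *I_GAP_RICIAN) for _ in range(n)],
        "P_gap": [non_negative(extreme_value(rng, *P_GAP_EV)) for _ in range(n)],
    }


if __name__ == "__main__":
    trace = synthetic_trace(5, seed=42)
    for name, values in trace.items():
        print(name, [round(v, 3) for v in values])
```

Drawing each quantity independently is a simplification; if sizes and gaps turn out to be correlated in the captured traces, a joint model would be needed.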
Conclusions
According to the results and for the conditions pre-established in the different experimental scenarios, in order to implement LVS services in an LTE network, the following observations can be made:
Statistical traffic modeling is presented for the audio and video components that constitute the LVS service of an emulated LTE network. The model is developed from real traffic traces.
The traffic model of the LVS service differs for each of the defined test scenarios. The model depends on the conditions of each scenario, and no single model describes the general traffic behavior of LVS services in emulated LTE networks. This is a significant finding because, to design and plan their networks, operators need to characterize the different operating environments in which end users are normally found; the traffic models that must be considered to dimension the network correctly depend on this characterization.
The PDFs defined for each emulated scenario can be used to generate traffic in simulation systems in order to evaluate other aspects of this type of network, such as the number of users, interference, the performance of resource schedulers, and power management, since they are based on validated models whose behavior is statistically consistent with the captured traffic traces.
The studies presented in this work give network designers and planners additional inputs for QoS-focused performance analyses, from which decisions can be made on resizing the resources assigned to current LTE networks, helping end users of the service obtain a higher degree of satisfaction.