Fast estimation of earthquake arrival azimuth using a single seismological station and machine learning techniques

Ochoa Gutiérrez, Luis Hernán; Vargas Jiménez, Carlos Alberto; Niño Vásquez, Luis Fernando

doi:10.15446/esrj.v23n2.70581

Earth Sciences Research Journal

Print version ISSN 1794-6190

Earth Sci. Res. J. vol.23 no.2 Bogotá Apr./June 2019

https://doi.org/10.15446/esrj.v23n2.70581

Original Articles

Fast estimation of earthquake arrival azimuth using a single seismological station and machine learning techniques

Estimación rápida del azimut de llegada de un terremoto utilizando registros de una sola estación sismológica y técnicas de aprendizaje de máquinas

Luis Hernán Ochoa Gutiérrez¹^*

Carlos Alberto Vargas Jiménez¹

Luis Fernando Niño Vásquez¹

^¹ Universidad Nacional de Colombia

ABSTRACT

The objective of this research is to develop a new approach for estimating earthquake arrival azimuth from seismological records of the "El Rosal" station, near the city of Bogota, Colombia, by applying support vector machines (SVMs). The algorithm was trained with time series descriptors of 863 events recorded from January 1998 to October 2008, considering only events with magnitude ≥ 2 M_L. The earthquake signals were filtered to remove various kinds of low- and high-frequency noise not related to typical seismic activity in the area. During the SVM training stages, several combinations of kernel exponent and complexity factor were applied to time series of 5, 10 and 15 seconds, along with minimum earthquake magnitudes of 2.0, 2.5, 3.0 and 3.5 M_L. The best SVM model was obtained using time windows of 5 seconds and earthquake magnitudes greater than 3.0 M_L, with a kernel exponent of 2 and a complexity factor of 10, showing an accuracy of 45.4 degrees. This research improves on previous work on earthquake arrival azimuth determination from single-station data using machine learning techniques.

Keywords: Earthquake early warning; rapid response; earthquake arrival azimuth; seismic event; Bogota-Colombia; support vector machine (SVM); seismology; earthquakes

RESUMEN

El propósito de esta investigación es desarrollar un nuevo enfoque para estimar el azimut de llegada de terremotos utilizando registros sismológicos de la estación El Rosal, cercana a la ciudad de Bogotá - Colombia, mediante la aplicación de máquinas de vectores de soporte (MVS). El algoritmo fue entrenado con descriptores de series de tiempo de 863 eventos adquiridos desde Enero 1998 hasta Octubre de 2008, considerando solamente eventos con magnitudes ≥ 2 M_L. Las señales de los terremotos fueron filtradas para remover diversos tipos de ruidos de alta y baja frecuencia no relacionados con la actividad sísmica típica en el área. Durante las etapas de entrenamiento de la MVS fueron aplicadas varias combinaciones del exponente kernel y factor de complejidad, a series de tiempo de 5, 10 y 15 segundos junto con terremotos de magnitudes mayores a 2.0, 2.5, 3.0 y 3.5 M_L. La mejor clasificación de la MVS fue obtenida utilizando ventanas de tiempo de 5 segundos y terremotos de magnitud mayor a 3.0 M_L con exponente kernel de 10 y factor de complejidad de 2, mostrando una precisión de 45.4 grados. Esta investigación es una mejora a trabajos previos relacionados con determinación del azimut de llegada de un terremoto a partir de datos de una única estación sismológica empleando técnicas de aprendizaje de máquinas.

Palabras clave: Alerta temprana de terremoto; respuesta rápida; Azimuth de llegada; evento sísmico; Bogotá-Colombia; máquina de soporte vectorial (MSV); sismología; terremotos

Introduction

This study is part of a research line that proposes calculating earthquake hypocentral parameters with artificial intelligence methods in order to develop an early warning system for the city of Bogota. A destructive seismic event in this area would have harmful social and economic effects for the entire country; this is why a seismic early warning system around Bogota is important, and earthquake arrival azimuth is one of the main parameters of such a system. Nearly a third of Colombia's population lives in Bogota's Savannah and its surroundings, which is the country's main economic center with around 40% of the gross domestic product (^{Ojeda et al., 2002}). The density of seismological stations around the city is not high enough, so the time required to locate a seismic event is longer than the travel time of the seismic waves to the areas where the early warning is required. An alternative solution to this problem is to use seismological data from previous events recorded at a single station to estimate the earthquake hypocentral parameters (^{Ochoa et al., 2014}). Automatic computation algorithms for a single broadband three-component station have mainly been developed for detecting P- and S-wave onsets, allowing estimation of source location using the back-azimuth and apparent surface speed measurements (^{Magotra et al., 1987}; ^{Roberts et al., 1989}; ^{Saita & Nakamura, 2003}), or for seismic moment estimation (^{Talandier et al., 1987}; ^{Reymond et al., 1991}; ^{Odaka et al., 2003}; ^{Horiuchi et al., 2005}; ^{Wu et al., 1998}; ^{Espinosa, 1995}).

Supervised machine learning techniques based on kernel methods have become a very powerful tool for mathematicians, scientists, and engineers, providing solutions in areas such as signal processing and pattern recognition. Their implementation is quite simple: mathematical functions combine the input variables with one another, producing an enhanced space with more dimensions in which separation of classes can be achieved.

There are several methods for detecting a seismic wave and its arrival azimuth at a single three-component station (^{Magotra et al., 1987}; ^{Anant & Dowla, 1997}); these authors employed algorithms that measure the level of linear polarization in the P-wave arrival. The methodology proposed in this research consists of applying SVMs with a kernel function to estimate the arrival azimuth with minimal processing of the data acquired at the station, similar to the methodology applied to the fast determination of earthquake magnitude and epicenter distance using a single seismological station (^{Ochoa et al., 2017}; Ochoa et al., 2018).

Data Sets And Methods

The dataset used in this research comes from the "El Rosal" seismological station, located northwest of Bogota, as shown in Figure 1. This station is part of the Colombian Seismic Network operated by the "Servicio Geológico Colombiano - SGC" (Colombian Geological Service).

Figure 1 Location of the “El Rosal” seismological station and earthquake distribution around Bogota.

The "El Rosal" station employs a three-component Guralp CMG - T3E007 sensor and a Nanometrics RD3-HRD24 digitizer, which provides simultaneous sampling of three channels at 24-bit resolution (^{Bermudez & Rengifo, 2002}). The data consist of the three-component raw waveforms recorded directly at this station and a seismic catalog with 2164 characterized events, selected between January 1^st 1998 and October 27^th 2008, all of them located less than 120 kilometers from the station.

The Colombian seismic network consists of 42 stations, with an average distance of 162 kilometers between them, which record and transmit seismic data in real time for the entire country as shown in Figure 2.

Figure 2 Colombian Seismic Network.

Before starting the SVM processing, waveform files from the "El Rosal" station were converted to the American Standard Code for Information Interchange (ASCII) format using a Seisan package tool. Earthquakes with magnitudes lower than 2.0 M_L were ignored, so the subsequent processing was applied to the remaining 1011 events. Since the selected seismic records present variable levels of noise, it was necessary to filter them with both high-pass and low-pass filters. Low frequencies correspond to instrumental noise that can be easily eliminated with a high-pass filter with a cut-off frequency of 0.075 Hz (^{Wu & Zhao, 2006}), while high frequencies were removed with a low-pass filter with a cut-off frequency of 150 Hz.
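As an illustration of the high-pass step only, the following is a minimal sketch of a single-pole high-pass filter with the 0.075 Hz cut-off quoted above. The text does not state the filter type, its order, or the sampling interval, so the single-pole design, the function name `high_pass` and the value of `DT` are assumptions made for this example and are not the authors' processing.

```python
import math

def high_pass(samples, cutoff_hz, dt):
    """Single-pole high-pass filter (a simplification, not the authors' filter)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)  # time constant of the equivalent RC circuit
    alpha = rc / (rc + dt)
    out = [0.0] * len(samples)
    for i in range(1, len(samples)):
        # Keep the sample-to-sample change and slowly forget the accumulated past.
        out[i] = alpha * (out[i - 1] + samples[i] - samples[i - 1])
    return out

DT = 0.002         # hypothetical sampling interval in seconds (not given in the text)
CUTOFF_HZ = 0.075  # high-pass cut-off frequency quoted in the text

# A constant instrumental offset is removed completely.
constant_offset = [5.0] * 1000
offset_removed = high_pass(constant_offset, CUTOFF_HZ, DT)

# A 5 Hz tone, far above the cut-off, passes almost unchanged.
tone_5hz = [math.sin(2.0 * math.pi * 5.0 * i * DT) for i in range(5000)]
tone_filtered = high_pass(tone_5hz, CUTOFF_HZ, DT)
```

A higher-order design would give a sharper transition around the cut-off; the single pole is used here only because it fits in a few lines.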

The statistical distribution of azimuth values for the whole dataset is presented in Figure 3. This histogram shows a peak at an azimuth of 240 degrees, indicating an active seismogenic zone in that direction. Although this is not a homogeneous distribution, it represents the regular seismic behavior of the area.

Figure 3 Statistical distribution of earthquake azimuth recorded at "El Rosal" station.

Descriptors - Input data set of the SVM

In the first stage, parameters previously used by other authors for earthquake magnitude estimation were calculated and employed as input variables, or descriptors, for the SVM of this research. In this sense, the relationship between maximum P-wave amplitudes and local earthquake magnitudes was considered (^{Wu & Kanamori, 2005}), and a linear regression was performed for each of the three components of the station. Three parameters were taken from this linear regression: the slope (M), the independent term (B) and the correlation coefficient (R). The maximum amplitude value (Mx) obtained for each component's time window was used as a descriptor as well. Therefore, each event had 12 descriptors associated with this concept (4 parameters for each of the 3 components).
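The 12 descriptors of this first group can be sketched as follows. The text does not specify which quantities enter the regression, so this example assumes, purely for illustration, that the absolute amplitude of each component is regressed against time within the window. The names `linear_fit` and `amplitude_descriptors` and the sample values are hypothetical.

```python
import math

def linear_fit(x, y):
    """Least-squares line y = M*x + B and Pearson correlation R."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy) if syy > 0 else 0.0  # flat signal: R undefined, use 0
    return slope, intercept, r

def amplitude_descriptors(component, dt):
    """M, B, R and Mx for one component (assumed regression: |amplitude| vs time)."""
    amplitude = [abs(v) for v in component]
    times = [i * dt for i in range(len(amplitude))]
    m, b, r = linear_fit(times, amplitude)
    return {"M": m, "B": b, "R": r, "Mx": max(amplitude)}

# Hypothetical three-component window (vertical, north, east), 1 s sampling for clarity.
window = {
    "Z": [0.0, 1.0, 2.0, 3.0],
    "N": [0.0, -2.0, 4.0, -6.0],
    "E": [1.0, 1.0, 1.0, 1.0],
}
per_component = {name: amplitude_descriptors(trace, 1.0) for name, trace in window.items()}
descriptors = [value for d in per_component.values() for value in d.values()]
```

Flattening the three dictionaries gives the 4 x 3 = 12 values that would be fed to the SVM for one event and one window length.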

Second, 9 descriptors used for epicenter distance estimation were added by fitting the function B·t·exp(-A·t) of time (t) to the envelope of the seismic record; on a logarithmic scale this fit reduces to a linear regression (^{Odaka et al., 2003}), which also yields a correlation coefficient (R) for each component of the station. The correlation coefficient (R) and the parameters (A) and (B) were calculated for each component, where (B) represents the slope of the initial part of the P wave and (A) is a parameter related to amplitude variations in time.
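The logarithmic linearization can be sketched as follows: dividing the envelope y(t) = B·t·exp(-A·t) by t and taking the natural logarithm gives ln(y/t) = ln(B) - A·t, a straight line in t whose slope is -A and whose intercept is ln(B). The function name `fit_envelope` and the synthetic envelope are assumptions made for this example; real envelopes would need to be positive and sampled at t > 0.

```python
import math

def fit_envelope(times, envelope):
    """Fit y(t) = B*t*exp(-A*t) by regressing ln(y/t) on t; returns (A, B, R)."""
    x = list(times)
    y = [math.log(v / t) for t, v in zip(times, envelope)]  # ln(y/t) = ln(B) - A*t
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # correlation of the linearized fit
    return -slope, math.exp(intercept), r

# Synthetic, noise-free envelope with known A = 0.5 and B = 2.0 (hypothetical values).
times = [0.1 * k for k in range(1, 31)]
envelope = [2.0 * t * math.exp(-0.5 * t) for t in times]
a_fit, b_fit, r_fit = fit_envelope(times, envelope)
```

Repeating the fit for the three components gives the 3 x 3 = 9 descriptors (A, B, R) of this group.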

Third, the maximum eigenvalues of the two-dimensional covariance matrix were employed as input, calculated as described in ^{Magotra et al., 1987} and Magotra et al., 1989. A windowing scheme with one-second time windows was used to obtain consecutive eigenvalues, to which a linear regression was fitted, yielding the slope (M), the independent term (B) and the regression correlation factor (R); the arithmetic mean of the eigenvalues (P) was added as well. This processing works with all components of the station at the same time, so it contributes 4 descriptors to the input.
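A minimal sketch of the windowed eigenvalue computation follows. For a symmetric 2x2 covariance matrix [[a, b], [b, c]] the largest eigenvalue has the closed form (a + c)/2 + sqrt(((a - c)/2)^2 + b^2). The choice of the two horizontal components, the population (divide by n) covariance, non-overlapping windows, and all names and sample values are assumptions made for this example.

```python
import math

def max_eigenvalue_2x2(x, y):
    """Largest eigenvalue of the 2x2 covariance matrix of two traces."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    a = sum((v - mean_x) ** 2 for v in x) / n                           # var(x)
    c = sum((v - mean_y) ** 2 for v in y) / n                           # var(y)
    b = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y)) / n      # cov(x, y)
    return (a + c) / 2.0 + math.sqrt(((a - c) / 2.0) ** 2 + b ** 2)

def windowed_eigenvalues(x, y, samples_per_window):
    """One maximum eigenvalue per consecutive, non-overlapping window."""
    values = []
    for start in range(0, len(x) - samples_per_window + 1, samples_per_window):
        stop = start + samples_per_window
        values.append(max_eigenvalue_2x2(x[start:stop], y[start:stop]))
    return values

# Hypothetical horizontal traces; 4 samples stand in for one second of data.
north = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
east = [0.0, 0.0, 0.0, 0.0, 1.0, -1.0, 1.0, -1.0]
eigenvalues = windowed_eigenvalues(north, east, 4)
mean_eigenvalue = sum(eigenvalues) / len(eigenvalues)  # descriptor P
```

A straight-line fit of `eigenvalues` against the window index would then supply the remaining descriptors M, B and R.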

In summary, the SVM of this study employs 25 time-signal descriptors as input (Table 1): 12 related to work on magnitude calculation, 9 associated with epicenter distance estimation, and the last 4 used in back-azimuth determination. These descriptors were calculated for 5-, 10- and 15-second signals of the 863 selected events.

Table 1 Summary of Descriptors Employed as Input Data

The SVM Model

An SVM is a supervised classification technique that has its roots in statistical learning theory and has shown promising empirical results in many practical applications, from handwritten digit recognition to text categorization. SVMs also work very well with high-dimensional data and avoid the curse of dimensionality (Tan et al., 2006). In geosciences, for example, they have been applied to earthquake characterization (^{Ochoa et al., 2017}), automatic recognition of natural fractures (^{Leal et al., 2016}) and automatic identification of lithologies in open-hole logs (Leal et al., 2018), among other applications related to pattern recognition. The SVM model of this research was trained with the refined dataset for each time window using the Waikato Environment for Knowledge Analysis, WEKA 3.6 (^{Frank et al., 2016}), and the 25 descriptors explained above. This algorithm has strong statistical support and can be easily implemented at the station on electronic processing cards.

After several processing tests, a standard normalized polynomial kernel was selected. To choose the kernel exponent and the complexity factor, the correlation factors and mean absolute errors obtained by 10-fold cross-validation were compared. These processes were carried out by testing multiple combinations of exponent and complexity factor for the selected earthquake magnitudes and time signals. The correlation coefficient calculated for each partition is Pearson's coefficient, which measures the linear relationship between two variables independently of their scales. This coefficient takes values between -1 and 1; a value of zero means that no linear relationship between the two variables could be found. A positive value means that the two variables change in the same direction, i.e. high values of one variable correspond to high values of the other and vice versa. The closer this value is to one, the greater the certainty that the two variables have a linear relationship. Pearson's coefficient in this research is as high as 0.6, showing the relevance of SVMs in the estimation of earthquake arrival azimuth; the lowest value of this coefficient was 0.096, as shown in Table 4.
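To make the kernel concrete, the following sketch assumes that the normalized polynomial kernel takes the form K(x, y) = (x·y)^E / sqrt((x·x)^E (y·y)^E), which is how this kernel is commonly defined; the exact variant used inside WEKA is not stated in the text, so this form, the function names and the sample vectors are assumptions.

```python
import math

def dot(u, v):
    """Plain dot product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))

def normalized_poly_kernel(u, v, exponent):
    """Polynomial kernel divided by the geometric mean of the self-kernels.

    Assumes non-zero vectors and, for fractional exponents, a non-negative
    dot product (a negative base with a fractional power is not real-valued).
    """
    numerator = dot(u, v) ** exponent
    denominator = math.sqrt((dot(u, u) ** exponent) * (dot(v, v) ** exponent))
    return numerator / denominator

# Hypothetical two-dimensional descriptor vectors; exponent 2 as selected in the study.
k_same = normalized_poly_kernel([1.0, 2.0], [1.0, 2.0], 2)
k_scaled = normalized_poly_kernel([1.0, 2.0], [3.0, 6.0], 2)
k_45_degrees = normalized_poly_kernel([1.0, 0.0], [1.0, 1.0], 2)
k_orthogonal = normalized_poly_kernel([1.0, 0.0], [0.0, 1.0], 2)
```

Under this definition the kernel equals the cosine of the angle between the two vectors raised to the power E, so it depends only on the direction of the descriptor vectors and not on their length.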

Results and discussion

Using the 25 descriptors and the real magnitudes of each seismic event, a group of 12 datasets was evaluated (Table 2). Each dataset corresponds to a combination of one of 4 minimum magnitude filters (2.0, 2.5, 3.0 and 3.5 M_L) and one of 3 signal length filters (5, 10 and 15 seconds). For each dataset, combinations of 7 values of the kernel exponent (E = 1.5, 2, 4, 5, 10, 20 and 50) and 6 values of the complexity factor (C = 1, 3, 5, 10, 20 and 50) were evaluated, so that 12 × 7 × 6 = 504 SVM models were tested in order to find the combination of parameters with the best correlation factor in arrival azimuth determination. Table 2 shows the correlation coefficients for each combination of cut-off magnitude and time signal for which kernel exponents and complexity factors were calculated. According to Table 2, the best correlation coefficient is 0.6, for a time signal of 15 seconds (magnitude > 2 M_L).
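The size of this search can be checked with a short enumeration. The parameter values are those listed in the text; the variable names are chosen for this example, and the training call itself is omitted because it depends on the toolkit used.

```python
from itertools import product

MIN_MAGNITUDES = [2.0, 2.5, 3.0, 3.5]         # cut-off magnitudes, M_L
WINDOW_SECONDS = [5, 10, 15]                  # signal lengths
KERNEL_EXPONENTS = [1.5, 2, 4, 5, 10, 20, 50] # E
COMPLEXITY_FACTORS = [1, 3, 5, 10, 20, 50]    # C

# One dataset per (cut-off magnitude, window length) pair.
datasets = list(product(MIN_MAGNITUDES, WINDOW_SECONDS))

# One candidate model per dataset and (E, C) pair.
models = [
    {"min_magnitude": magnitude, "window_s": window, "E": exponent, "C": complexity}
    for (magnitude, window), exponent, complexity
    in product(datasets, KERNEL_EXPONENTS, COMPLEXITY_FACTORS)
]

print(len(datasets), "datasets,", len(models), "candidate models")
```

Each entry of `models` would be trained and scored by 10-fold cross-validation, and the (E, C) pair with the best correlation and error kept for each dataset.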

Table 2 Cut-off Magnitude and Time Length combination

Table 3 shows a statistical summary of the best earthquake arrival azimuth model for each combination of time signal (15, 10 and 5 seconds) and magnitude (2, 2.5, 3 and 3.5 M_L). According to the skewness values, all distributions have a similar shape except the samples for 5 seconds and 3.5 M_L (skewness = -4.86); these samples show a negative skew related to the low number of samples in this set (count = 33). The arithmetic mean can be affected by extremely low and high values; therefore, any interpretation based on this parameter might give a wrong perception of the data. High kurtosis values are related to over-fitting in some combinations, and lower kurtosis values are related to time signals of 10 and 15 seconds with a cut-off of 3.5 M_L, which suggests that the arrival azimuth estimation may be reliable for earthquakes greater than 3.5 M_L and a time signal of 15 seconds. However, the objective of this research is to develop the best model with the lowest base magnitude; consequently, and according to Table 3, the best choice is a time signal of 5 seconds and a cut-off magnitude of 3.0 M_L.
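For reference, skewness and kurtosis can be computed from central moments as in the sketch below. The text does not say which estimator produced the values in Table 3, so the population (moment-based) definitions, the use of excess kurtosis, and the sample values are assumptions made for this example.

```python
def skewness_and_kurtosis(values):
    """Moment-based skewness and excess kurtosis of a sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment (variance)
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0           # zero for a normal distribution
    return skewness, excess_kurtosis

# Hypothetical samples: one symmetric, one with a single very low value.
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
left_tailed = [10.0, 10.0, 10.0, 10.0, 1.0]

skew_sym, kurt_sym = skewness_and_kurtosis(symmetric)
skew_left, kurt_left = skewness_and_kurtosis(left_tailed)
```

The second sample illustrates how a few extreme low values in a small set are enough to produce a strongly negative skewness.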

Table 3 Summary for the best model of earthquake arrival azimuth in each combination

The choice of the final SVM parameters is shown in Table 4, where Pearson's correlation coefficient and the mean absolute error are presented for each combination of kernel exponent "E" and complexity factor "C", all of them for the time signal and cut-off magnitude previously selected. These parameters were calculated using the SVM algorithms in WEKA 3.6 (^{Frank et al., 2016}) with a standard normalized polynomial kernel and 10-fold cross-validation. According to Table 4, earthquake arrival azimuth estimation can be performed at the "El Rosal" station using the SVM-based model with a normalized polynomial kernel of exponent 2 and a complexity factor of 10, for a time signal of 5 seconds and an earthquake cut-off magnitude of 3.0 M_L, showing a standard deviation of 45.4 degrees.

Table 4 Pearson's Coefficient and Mean Absolute Error Used to Determine "E" and "C"
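The two scores compared in Table 4 can be computed as in the following sketch. The azimuth values are hypothetical, and azimuth is treated here as an ordinary linear variable, so differences across the 0/360 degree wrap-around are not given special handling; whether the original evaluation did so is not stated in the text.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

def mean_absolute_error(actual, predicted):
    """Average absolute difference between catalog and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical catalog and predicted azimuths, in degrees.
real_azimuth = [10.0, 100.0, 200.0, 300.0]
predicted_azimuth = [40.0, 90.0, 230.0, 260.0]

r = pearson(real_azimuth, predicted_azimuth)
mae = mean_absolute_error(real_azimuth, predicted_azimuth)
```

In a cross-validated search, these two numbers would be averaged over the folds for every (E, C) pair before the pairs are compared.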

Figure 4 shows the cross-plot of the real arrival azimuth (X-axis) against the azimuth calculated by the model (Y-axis), where a normal statistical pattern can be observed in the distribution of residual values (histogram). The dashed blue line represents the linear behavior of the predicted data, corresponding to the locus where the prediction is equal to the real values. This model tends to place the prediction further south than the real location, because seismogenic zones to the east and west of the station produce more data in those directions and a lower amount of information comes from the north and south; this is a normal operational condition of the "El Rosal" station and is therefore a behavior implicit in the model.

Figure 4 Correlation between real and calculated arrival azimuth with SVM.

Conclusions & Recommendations

This model is proposed and evaluated for fast earthquake arrival azimuth determination, based on support vector machine regression through pattern recognition and characterization of earthquake signals recorded at a three-component seismic station in only 5 seconds, anticipating the arrival of earthquakes in the city of Bogota. Additionally, this model can be implemented directly in the electronic devices of a seismological station, where the main mathematical process corresponds to a simple matrix product with a kernel function of exponent 2 and a complexity factor of 10, for earthquakes with a cut-off magnitude of 3.0 M_L.

The accuracy reported in this research is lower than that reported by ^{Lockman & Allen, 2005} and ^{Eisermann et al., 2015}; however, it must be considered that these authors work with data from several seismological stations and not with a single station as this research does. Nevertheless, the result of 45.4 degrees in earthquake arrival azimuth estimation shown in this research is an improvement on that of ^{Noda et al., 2012}, who obtained standard deviations between 49.0 and 67.9 degrees, also working with a single station.

The implementation of additional input variables, such as the predominant period and Fourier and wavelet frequency spectra, should be considered in order to obtain higher correlation factors. Furthermore, the use of an updated dataset is recommended, adding information from October 27^th 2008 to the present; these new data, along with additional input variables, might improve the model performance and give a better estimation of earthquake arrival azimuth.

It is important to find ways to improve the prediction accuracy based on further research, supported by computational intelligence and geophysics research groups as well as the seismological network in Bogota's Savannah and its surroundings managed by the Universidad Nacional de Colombia.

Acknowledgments

The authors are grateful to the Servicio Geológico Colombiano (SGC) for providing the dataset used in this study and to Universidad Nacional de Colombia for supporting our efforts to achieve a fast and reliable early warning system for Bogota D.C. - Colombia.

References

Anant, K. S. & Dowla, F. U. (1997). Wavelet transform methods for phase identification in three-component seismograms. Bulletin of the Seismological Society of America, 87(6), 1598-1612. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.871.8399&rep=rep1&type=pdf

Bermúdez, M. L. & Rengifo, F. (2002). EL ROSAL: La Estación Sismológica del CTBTO en Colombia. Bogota, Primer Simposio Colombiano de Sismología, p. 8.

Eisermann, A. S., Ziv, A. & Wust-Bloch, G. H. (2015). Real-Time Back Azimuth for Earthquake Early Warning. Bulletin of the Seismological Society of America, 105(4), 2274-2285. https://doi.org/10.1785/0120140298

Espinosa, J. M. (1995). Mexico City seismic alert system. Seismological Research Letters, 66(6), 42-53. https://doi.org/10.1785/gssrl.66.6.42

Frank, E., Hall, M. & Witten, I. (2016). The WEKA Workbench. Online Appendix for "Data Mining: Practical Machine Learning Tools and Techniques". Morgan Kaufmann, Fourth Edition, pp. 218-223.

Horiuchi, S. (2005). An automatic processing system for broadcasting earthquake alarms. Bulletin of the Seismological Society of America, 95(2), 708-718. https://doi.org/10.1785/0120030133

Leal, J., Ochoa, L. & García, J. (2016). Identification of natural fractures using resistive image logs, fractal dimension and support vector machines. Ingeniería e Investigación, 36(3), 125-132. https://doi.org/10.15446/ing.investig.v36n3.56198

Leal, J., Ochoa, L. & Contreras, C. (2018). Automatic Identification of Calcareous Lithologies Using Support Vector Machines, Borehole Logs and Fractal Dimension of Borehole Electrical Imaging. Earth Science Research Journal, 22(1), 7-12. https://doi.org/10.15446/esrj.v22n2.68320

Lockman, A. B. & Allen, R. M. (2005). Single-Station Earthquake Characterization for Early Warning. Bulletin of the Seismological Society of America, 95(6), 2029-2039. https://doi.org/10.1785/0120040241

Magotra, N., Ahmed, N. & Chael, E. (1987). Seismic event detection and source location using single station (three components) data. Bulletin of the Seismological Society of America, 77(3), 958-971.

Magotra, N., Ahmed, N. & Chael, E. (1989). Single-station seismic event detection and location. IEEE Transactions on Geoscience and Remote Sensing, 27(1), 15-23. DOI: 10.1109/36.20270

Noda, S. (2012). Improvement of back-azimuth estimation in real-time by using a single station record. Earth, Planets and Space, 64(3), 305-308. https://link.springer.com/article/10.5047/eps.2011.10.005

Ochoa, L. H., Niño, L. F. & Vargas, C. A. (2014). Severity Classification of a Seismic Event based on the Magnitude-Distance Ratio Using Only One Seismological Station. Earth Sciences Research Journal, 18(2), 115-122. https://doi.org/10.15446/esrj.v18n2.41083

Ochoa, L. H., Niño, L. F. & Vargas, C. A. (2017). Fast magnitude determination using a single seismological station record implementing machine learning techniques. Geodesy and Geodynamics, 1-8. https://doi.org/10.1016/j.geog.2017.03.010

Ochoa, L. H., Niño, L. F. & Vargas, C. A. (2018). Fast estimation of earthquake epicenter distance using a single seismological station with machine learning techniques. DYNA, 85(204), 161-168. https://doi.org/10.15446/dyna.v85n204.68408

Odaka, T. (2003). A new method for quickly estimating epicentral distance and magnitude from a single seismic record. Bulletin of the Seismological Society of America, 93(1), 526-532. DOI: 10.1785/0120020008

Ojeda, A., Martinez, S., Bermudez, M. & Atakan, K. (2002). The new accelerograph network for Santa Fe De Bogota, Colombia. Soil Dynamics and Earthquake Engineering, 22(9-12), 791-797. https://doi.org/10.1016/S0267-7261(02)00100-8

Saita, J. & Nakamura, Y. (2003). The early warning systems for mitigation of disasters caused by earthquakes and tsunamis. In: J. Zschau & A. Kuppers (eds). Early Warning Systems for Natural Disaster Reduction. Berlin: Springer-Verlag, pp. 453-460. https://doi.org/10.1007/978-3-642-55903-7_58

Reymond, D., Hyvernaud, O. & Talandier, J. (1991). Automatic detection, location and quantification of earthquakes. Pure and Applied Geophysics, 135(3), 361-382. https://doi.org/10.1007/BF00879470

Roberts, R. G., Christoffersson, A. & Cassidy, F. (1989). Real-time detection, phase identification and source location estimation using single station three component seismic data. Geophysical Journal, 97(3), 471-480. https://doi.org/10.1111/j.1365-246X.1989.tb00517.x

Talandier, J., Reymond, D. & Oka, E. A. (1987). Use of variable mantle magnitude for the rapid one-station estimation of teleseismic moments. Geophysical Research Letters, 14(8), 840-843. https://doi.org/10.1029/GL014i008p00840

Wu, Y. M. & Zhao, L. (2006). Magnitude estimation using the first three seconds P-wave amplitude in earthquake early warning. Geophysical Research Letters, 33(16), L16312. https://doi.org/10.1029/2006GL026871

Wu, Y. M. & Kanamori, H. (2005). Rapid assessment of damage potential of earthquakes in Taiwan from the beginning of P waves. Bulletin of the Seismological Society of America, 95(3), 1181-1185. https://doi.org/10.1785/0120040193

Wu, Y. M., Shin, T. C. & Tsai, Y. B. (1998). Quick and reliable determination of magnitude for seismic early warning. Bulletin of the Seismological Society of America, 88(5), 1254-1259. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.462.985&rep=rep1&type=pdf

How to cite item Ochoa-Gutierrez, L. H., Vargas-Jimenez, C. A., & Niño-Vásquez, L. F. (2019). Fast estimation of earthquake arrival azimuth using a single seismological station and machine learning techniques. Earth Sciences Research Journal, 23(2), 103-109.

Received: February 22, 2018; Accepted: March 15, 2019

^* Corresponding author: lhochoag@unal.edu.co

This is an open-access article distributed under the terms of the Creative Commons Attribution License