INTRODUCTION
Estimating the occupancy of indoor spaces is relevant today, as strategies have been proposed to mitigate the spread of the SARS-CoV-2 virus that caused the COVID-19 pandemic, such as avoiding crowds in both open and closed spaces, and in controlled and uncontrolled environments [1]. Computer vision refers to the set of techniques that enable the acquisition, processing, analysis and understanding of images, so that information can be extracted and processed by a computer [2]. These processes are based on fields such as geometry and statistics: through stages that analyze the size and shape of objects, together with computational learning, the objects present in an image frame are detected, tracked and classified [3].
Currently, processes based on artificial intelligence, such as machine learning [4] and deep learning [5], stand out. In machine learning processes, patterns are recognized from data in order to generate a prediction at the output, while in deep learning processes, machines reason and analyze with a certain degree of autonomy [6]. Although these processes offer high reliability in detection and classification, they have a high computational cost, which increases the cost of implementation [7]. Processes based on image treatment and processing are presented as an alternative, since they considerably reduce the computational resources required and achieve hit rates of over 90 % [8].
Background subtraction is an image processing technique that allows moving objects to be separated and detected by means of background segmentation [9]. The processing starts with preprocessing steps such as grayscale conversion and smoothing filtering, followed by image segmentation. The image is then refined with filters based on set operations, and a threshold with defined contours is used to identify whether the object belongs to the class in question. Among the applications of background subtraction, the detection of people stands out, used for presence detection, trajectory tracking and security, among others [10].
Simple Mail Transfer Protocol (SMTP) is a network protocol used for text-based e-mail communication. Sending an e-mail involves the validation of strings of ASCII characters sent to an SMTP server. Communication via this protocol requires three command sequences, MAIL, RCPT and DATA, which establish the sender address, the recipient address and the content of the mail respectively. The protocol can be used with various e-mail servers [11].
This paper presents the development of a low-cost, high-level system for indoor space occupancy estimation by image processing using the background subtraction technique, with alert and e-mail notification via the SMTP protocol when the number of people exceeds the number allowed in the area. The system is programmed in Python, using OpenCV, an open-source library specialized in computer vision. The hardware used is a Raspberry Pi 3B+ embedded board, and the image acquisition device is a 5 MP Raspberry Pi camera. In addition, the proposed system is compared with a conventional occupancy estimation system based on CO2 concentration measurement.
RELATED WORKS
Mutis et al. [12] developed a method to determine the occupancy of an indoor space focused on indoor air quality control. For the detection and counting of people they relied on the YOLOv3 deep neural network, and the training was based on the nada Dataset, in its section focused on action recognition. Along the same lines, Han et al. [13] propose a method that uses a dynamic neural network based on the data obtained in a first day of CO2 concentration measurement, decreasing the average error with respect to conventional neural network systems. Yuan et al. [14] implement a method based on fast-sampling infrared sensor detection for collecting real scene data and defining a non-homogeneous hidden Markov model, taking into consideration the effects produced by changes in the environment, followed by a softmax regression model for the calculation of occupancy estimation probabilities. Szczurek et al. [15] focus on estimating the occupancy of an intermittently occupied space by measuring time series of CO2 concentration, temperature and relative humidity. The tests were performed on 60-minute recordings and the results are applicable to occupancy time estimation. Similarly, Zhou et al. [16] use CO2 concentration level measurement as an occupancy estimation method. They apply a wavelet denoising method and employ the gcForest cascade technique for occupancy estimation. Their method presented acceptable performance in terms of reliability of detections, similar to support vector machines, classification trees and Markov models.
Additionally, Zemouri et al. [17] use a Raspberry Pi embedded board together with Raspberry-family sensors to measure environmental variables such as temperature and humidity in order to determine the occupancy level of offices, with estimation accuracies of up to 87 %. Disha et al. [18] develop a method based on machine learning algorithms and on information collected from humidity, light, temperature and CO2 sensors. In their study, the learning processes delivered an average occupancy estimation accuracy of about 95 %. On the other hand, Longo et al. [19] propose an alternative to the conventional methods based on the measurement of the CO2 concentration level, using a low-cost system that captures Wi-Fi and Bluetooth Low Energy management frames from people's mobile devices, achieving hit rates close to 95 %.
METHODOLOGY
Figure 1 shows the proposed methodology, based on three stages, for the development of the indoor occupancy estimation system. In the first stage, the area where the occupancy estimation will be performed is characterized in order to determine the parameters for locating the image or video capture device. The second stage covers the development of the image processing tool, with sub-stages for the definition of people counting conditions, image preprocessing, blur and morphological filtering, thresholding with contour search, and configuration for sending notifications via the SMTP protocol. Finally, in the third stage, the efficiency of the developed tool is measured.
Indoor space characterization
In this stage, the parameters required for locating the image and video capture device are taken into consideration. Using a digital luxmeter, the light level was measured at different times throughout the day: although it is an interior space with theoretically controlled light levels, sunlight enters through the windows and varies the light level in the area. In addition, the height and tilt angle at which the Raspberry Pi camera should be placed are determined so that its line of sight covers both the entrance to the area and a considerable region of its interior. Table 1 shows the parameters used to characterize the area.
As shown, the camera was located at heights of 2.0 and 3.5 meters, with tilt angles of 60° and 180°. All four combinations provide a view of the entrance to the area, but only a 60° tilt angle also provides a view of a considerable part of the interior space. The combination of 3.5 m height and 60° tilt angle is chosen over the 2.0 m combination, as it reduces the false negatives caused by people overlapping in the camera view. Additionally, the light level fluctuated between 460 and 865 lux, corresponding to normal illumination levels for an indoor space [20].
Image processing tool
The tool is developed in Python 3.6.8, under the Thonny development environment, version 3.2.7. To estimate the occupancy of an interior space, image processing and treatment stages are used, which require OpenCV, a library specialized in computer vision, and NumPy, a library for handling and operating on vectors and matrices. Likewise, the time library is used for timing during processing and for validation, while the smtplib library and the MIMEMultipart and MIMEText classes are used for communication and sending information through the SMTP protocol. The stages required for the development of the tool are presented below.
People counting conditions
People counting is performed by means of a test line: each time an object crosses it in an upward direction, a counter holding the number of people who have entered the zone is increased by 1. In the same way, each time the test line is crossed downward, the counter is decreased by 1. The test line is located in the part of the image that corresponds to the access door to the interior space. The objects to be counted are discriminated through a thresholding step applied after background subtraction. Figure 2 illustrates the conditions for people counting. Section 1 shows the location of the counting test line at the entrance zone to the indoor area. Section 2 shows how a person moves across the test line to increase the people counter by 1, while section 3 shows how a person moves out of the zone and decreases the counter by 1.
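The counting rule can be illustrated with a minimal sketch. It assumes that each detected person is reduced to the vertical coordinate of a centroid tracked between consecutive frames; the function name, the row of the test line and the sample track are hypothetical and are not taken from the original program.

```python
def update_count(count, prev_y, curr_y, line_y):
    """Return the new count after a centroid moves from prev_y to curr_y.

    Image rows grow downward, so moving "up" means the y coordinate decreases.
    """
    if prev_y > line_y >= curr_y:   # crossed the line moving upward: entry
        return count + 1
    if prev_y <= line_y < curr_y:   # crossed the line moving downward: exit
        return max(count - 1, 0)    # assumption: the count never goes negative
    return count


# Hypothetical centroid track in a 360-row frame, with the test line at row 180.
track = [220, 200, 185, 170, 150]
count = 0
for prev_y, curr_y in zip(track, track[1:]):
    count = update_count(count, prev_y, curr_y, line_y=180)
print(count)
```

Only the step in which the centroid passes from one side of the line to the other changes the counter, so a person who lingers near the door without crossing is not counted.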
Image preprocessing
The image delivered by the video camera is resized to 640x360 pixels, in order to keep a fixed relationship between the space occupied by a person in the image frame and the threshold value subsequently defined for segmenting the moving object from the image background. The preprocessing starts with a conversion of the video frames to grayscale, thus obtaining the equivalent of the luminance of the image. The original image in the RGB (Red, Green, Blue) color scale is transformed according to the light intensities perceived by the human eye, with weights of 30 % red, 59 % green and 11 % blue [21], as shown in equation 1.
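As a brief illustration of the weighted conversion, the following sketch applies the 30 %, 59 % and 11 % weights to individual pixels. The function name and the tiny 2x2 test frame are illustrative assumptions; the actual tool performs this step with OpenCV.

```python
def rgb_to_gray(r, g, b):
    """Weighted luminance of one RGB pixel (channels in the range 0..255)."""
    return int(round(0.30 * r + 0.59 * g + 0.11 * b))


# Hypothetical 2x2 frame: pure red, pure green, pure blue and white pixels.
frame = [[(255, 0, 0), (0, 255, 0)],
         [(0, 0, 255), (255, 255, 255)]]
gray = [[rgb_to_gray(*pixel) for pixel in row] for row in frame]
print(gray)
```

Green contributes most to the result and blue least, which mirrors the sensitivity of the human eye. OpenCV's own grayscale conversion uses the slightly finer weights 0.299, 0.587 and 0.114.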
Additionally, the image frames are segmented so that the moving objects are first separated from the background. For this, Gaussian mixture modeling is applied by means of the BackgroundSubtractorMOG2 algorithm, which adapts to objects that were static and start to move, so that they are reincorporated into the image [22]. Equation 2 presents the Gaussian mixture model used for image background segmentation, in which α represents the learning rate, and M_{k,t} works in binary mode, taking a value of 1 when a match is found and 0 otherwise.
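The role of the learning rate and of the binary match indicator can be sketched as follows. The sketch assumes that the model uses the usual weight update of Gaussian mixture background models, in which each component weight becomes (1 - alpha) times its previous value plus alpha times the match indicator; the function name, the three initial weights and the value alpha = 0.01 are hypothetical. In OpenCV this model is created with cv2.createBackgroundSubtractorMOG2.

```python
def update_weights(weights, matched_index, alpha):
    """Apply w_k <- (1 - alpha) * w_k + alpha * M_k to every mixture component.

    M_k is 1 for the component that matched the current pixel value and 0 for
    the others; matched_index=None means that no component matched.
    """
    updated = []
    for k, weight in enumerate(weights):
        match = 1.0 if k == matched_index else 0.0
        updated.append((1.0 - alpha) * weight + alpha * match)
    return updated


# Hypothetical mixture of three components for one pixel.
weights = [0.5, 0.3, 0.2]
new_weights = update_weights(weights, matched_index=1, alpha=0.01)
print(new_weights)
```

The matched component gains weight while the others decay, so a component that keeps explaining the pixel gradually becomes part of the background; a larger alpha makes this adaptation faster.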
Blur and morphological filter
The blur filter is applied with the objective of eliminating redundancies in the image, producing a loss of minor details and a smoothing effect [23]. In addition, this type of filtering does not perform a two-dimensional convolution, but two one-dimensional sweeps, one along the X-axis and one along the Y-axis. Equation 3 shows the mathematical basis of a Gaussian blurring or smoothing filter, in which X and Y represent the horizontal and vertical offset respectively, and σ indicates the standard deviation of the Gaussian distribution. Generally, kernel sizes ranging from 3x3 to 7x7 are employed in human detection processes. In the present case, a Gaussian mask of size 7x7 was used for blur filtering, since the objective is to prepare the image to a point where each of the objects present in the image frame is distinguishable, thus reducing the computational requirements.
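The two one-dimensional sweeps can be sketched in plain Python as shown below. This is only an illustration of the separable filter: the standard deviation of 1.5, the border handling (edge pixels are repeated) and the 9x9 test image are assumptions, since the text does not specify them.

```python
import math


def gaussian_kernel_1d(size, sigma):
    """Normalized 1-D Gaussian kernel of odd length `size`."""
    half = size // 2
    values = [math.exp(-(x * x) / (2.0 * sigma * sigma))
              for x in range(-half, half + 1)]
    total = sum(values)
    return [v / total for v in values]


def blur_rows(image, kernel):
    """Filter every row with the kernel, repeating the border pixels."""
    half = len(kernel) // 2
    out = []
    for row in image:
        n = len(row)
        out.append([
            sum(kernel[j] * row[min(max(i + j - half, 0), n - 1)]
                for j in range(len(kernel)))
            for i in range(n)
        ])
    return out


def gaussian_blur(image, size=7, sigma=1.5):
    """Separable blur: one sweep along X, then one sweep along Y."""
    kernel = gaussian_kernel_1d(size, sigma)
    horizontal = blur_rows(image, kernel)
    transposed = [list(col) for col in zip(*horizontal)]
    vertical = blur_rows(transposed, kernel)
    return [list(col) for col in zip(*vertical)]


# Hypothetical 9x9 image with a single bright pixel in the centre.
image = [[0.0] * 9 for _ in range(9)]
image[4][4] = 255.0
blurred = gaussian_blur(image)
print(round(blurred[4][4], 2))
```

For a 7x7 mask, the separable form needs 7 + 7 multiplications per pixel instead of 49, which is relevant on an embedded board.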
In addition to the Gaussian blur filtering, a morphological filtering step is applied. With this, the image is simplified while the shape characteristics of the objects are maintained. In this case, mathematical morphology is applied in order to highlight the structure of the moving object. Morphological operations are performed with respect to a structural element (an element whose shape and size can vary, in which the positions with a value of 1 are the ones that take part in the process). The most common morphological operations are dilation and erosion, which generate expansion and thinning effects respectively [24]. In the present study, the morphological opening operation is used, equivalent to applying an erosion followed by a dilation, thus eliminating the noise outside the structure [25]. In addition, an elliptical structural element of size 5x5 is used. The mathematical basis of the morphological opening operation is presented in equation 4.
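The effect of the opening operation can be seen in a small pure-Python sketch on a binary image. For brevity it uses a 3x3 square structural element instead of the 5x5 elliptical element of the study, and it treats pixels outside the image as background; both choices, as well as the function names and the test image, are assumptions made for illustration.

```python
def _probe(image, element, r, c):
    """Image values under the active (value 1) cells of the element centred at (r, c).

    Cells that fall outside the image are read as background (0).
    """
    rows, cols = len(image), len(image[0])
    half_r, half_c = len(element) // 2, len(element[0]) // 2
    values = []
    for i, element_row in enumerate(element):
        for j, active in enumerate(element_row):
            if active:
                y, x = r + i - half_r, c + j - half_c
                values.append(image[y][x] if 0 <= y < rows and 0 <= x < cols else 0)
    return values


def erode(image, element):
    """Keep a pixel only if every active cell of the element lies on foreground."""
    return [[1 if all(_probe(image, element, r, c)) else 0
             for c in range(len(image[0]))] for r in range(len(image))]


def dilate(image, element):
    """Set a pixel if any active cell of the (symmetric) element touches foreground."""
    return [[1 if any(_probe(image, element, r, c)) else 0
             for c in range(len(image[0]))] for r in range(len(image))]


def opening(image, element):
    """Erosion followed by dilation with the same structural element."""
    return dilate(erode(image, element), element)


square = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
# Hypothetical mask: a 3x3 blob plus one isolated noise pixel at row 5, column 5.
mask = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
opened = opening(mask, square)
```

The erosion removes the isolated pixel and shrinks the blob to its centre, and the dilation then restores the blob to its original 3x3 extent, so only the noise is lost.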
Thresholding and finding contours
This stage distinguishes the image sectors that correspond to moving objects. The binary thresholding technique is used, so that all image sectors whose pixel concentration and size are greater than the base value are counted as people entering or leaving the interior space [26]. In this case, a threshold value of T=50 was used. The thresholding process is completed by a contour search. An external contour search configuration is used, since only the detection of the person is required. In addition, the simple approximation method is used for contour extraction, so that only the contour endpoints are kept; since the system does not store all the contour points, it uses less memory and the image processing is faster.
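A minimal sketch of this stage is given below. The threshold T=50 is taken from the text, but the contour search is replaced here by a simpler stand-in, a flood fill that measures the area of each connected region, and the minimum area of 4 pixels and the test image are hypothetical values chosen only to make the example small.

```python
def threshold(gray, t=50):
    """Binary thresholding: pixels strictly above t become 255, the rest 0."""
    return [[255 if value > t else 0 for value in row] for row in gray]


def region_areas(binary):
    """Pixel counts of the 4-connected foreground regions (stand-in for contours)."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    areas = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] and not seen[r][c]:
                stack, area = [(r, c)], 0
                seen[r][c] = True
                while stack:
                    y, x = stack.pop()
                    area += 1
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                areas.append(area)
    return areas


# Hypothetical foreground image: one 2x3 blob and one stray pixel above T.
gray = [
    [10, 10, 10, 10, 10, 10],
    [10, 200, 200, 200, 10, 60],
    [10, 200, 200, 200, 10, 10],
    [40, 10, 10, 10, 10, 10],
]
MIN_AREA = 4  # assumption: smallest region accepted as a person in this toy image
areas = region_areas(threshold(gray))
people = [area for area in areas if area >= MIN_AREA]
print(len(people))
```

The stray pixel passes the threshold but is rejected by the size check, which is the role that the contour size validation plays in the tool.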
Configuration and notification via e-mail by SMTP protocol
Initially, an e-mail account is created to act as the server. The account is created with the Gmail provider and allows SMTP notification to accounts from other providers. At the configuration level, the user is required to activate, in the e-mail account, the option that allows access to applications of unknown origin. For compatibility and access purposes, this configuration is required for both the server account and the receiving account. Table 2 shows the configuration parameters, in which "From" and "Password" refer to the identification and authentication of the server e-mail address, and "To" refers to the destination e-mail address. "Subject" allows the receiver to preliminarily identify the content of the mail, and takes the name "ISOE 2021 Python", this being the name of the software for the interior space occupancy estimation. Additionally, "Message" contains the entire content of the e-mail, stating that it is an automatic message from the software and indicating the occupancy level in the interior zone. As "Security Protocol", Transport Layer Security (TLS) is configured, which provides confidentiality of the information contained in the message, as well as authentication of the message. Finally, "Port" is set to 587, as this port is supported by the vast majority of servers and guarantees security in the sending.
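The configuration parameters can be assembled with the standard smtplib and email packages as in the following sketch. The subject line and port 587 come from the configuration described above, while the function names, the example addresses, the message wording and the host name smtp.gmail.com are assumptions. The sending function is defined but not called, because it requires real credentials.

```python
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_alert(sender, recipient, occupancy, limit):
    """Build the notification e-mail with the From, To, Subject and Message fields."""
    msg = MIMEMultipart()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "ISOE 2021 Python"
    body = ("Automatic message: {} people detected in the area, "
            "the allowed maximum is {}.".format(occupancy, limit))
    msg.attach(MIMEText(body, "plain"))
    return msg


def send_alert(msg, password, host="smtp.gmail.com", port=587):
    """Send the message over SMTP, upgrading the connection with STARTTLS."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()                    # switch the session to TLS
        server.login(msg["From"], password)  # authenticate the server account
        server.send_message(msg)             # issues MAIL, RCPT and DATA


# Placeholder addresses; nothing is sent in this example.
alert = build_alert("sender@example.com", "receiver@example.com",
                    occupancy=12, limit=10)
print(alert["Subject"])
```

In the tool, a call such as send_alert(alert, password) would be made only when the people counter exceeds the allowed number.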
System efficiency
The efficiency of the computational tool is measured in two stages. First, the efficiency of the image processing for the detection of people is determined, since it directly influences the occupancy estimation. Equation 5 shows how the detection accuracy (da) is determined, in which Pp indicates the number of people in the area that the system was able to detect, Pnp indicates the number of people in the area that were not detected, and Np indicates the actual number of people in the area.
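Since the detected and undetected people together make up the actual number of people in the area, one plausible reading of the detection accuracy is the detected share of that total, expressed as a percentage. The sketch below implements that reading; the exact form of the equation is an assumption, and the counts used in the example are hypothetical.

```python
def detection_accuracy(pp, pnp):
    """Percentage of people detected, assuming Np = Pp + Pnp."""
    np_total = pp + pnp
    if np_total == 0:
        raise ValueError("no people were present in the area")
    return 100.0 * pp / np_total


# Hypothetical test: 45 people detected and 5 missed.
da = detection_accuracy(pp=45, pnp=5)
print(da)
```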
Currently, most systems that estimate the occupancy of interior spaces are based on the measurement of CO2 concentration levels, with approximations generated by sensing performed at intervals of at least one minute. The proposed system is based purely on computer vision, applying image processing tools; being a different technology from the conventional one for these applications, it eliminates the use of sensors for measuring CO2 levels and estimates the percentage of occupancy of the indoor space by automatic counting. In the second stage, the system developed for indoor space occupancy estimation based on image processing is compared with a conventional occupancy estimation system based on CO2. The systems are compared with respect to the speed of execution, the time in which the system detects the entry of a person into the area of interest, and the hit rate in the occupancy estimation. The method used for the comparison of the techniques is a prioritization matrix, in which the factors are first weighted against each other according to their level of importance, taking values of 1/10, 1/5, 1, 5 and 10 for much less important, less important, equally important, more important and much more important respectively. Table 3 presents the initial weighting of parameters for the prioritization matrix. For an indoor occupancy estimation system, the most important parameter is the implementation cost, with 65.23 % of the final consolidation, followed by the hit rate with 21.73 %, and the speed of detection with 13.04 %.
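The mechanics of a prioritization matrix can be sketched as follows: each criterion is compared with every other on the 1/10 to 10 scale, and the row sums are normalized into weights. The pairwise judgments below, the inclusion of the diagonal and the function name are hypothetical, so the resulting weights illustrate the procedure and are not expected to reproduce the percentages of Table 3.

```python
def priorities(pairwise):
    """Normalize the row sums of a pairwise comparison matrix into weights."""
    row_sums = [sum(row) for row in pairwise]
    total = sum(row_sums)
    return [row_sum / total for row_sum in row_sums]


criteria = ["implementation cost", "hit rate", "detection speed"]
# Hypothetical judgments: entry [i][j] states how important criterion i is
# relative to criterion j; reciprocal positions hold reciprocal values.
pairwise = [
    [1, 5, 10],
    [1 / 5, 1, 5],
    [1 / 10, 1 / 5, 1],
]
weights = priorities(pairwise)
for name, weight in zip(criteria, weights):
    print("{}: {:.2%}".format(name, weight))
```

The same normalization is then applied to the scores of the two systems under each criterion, and the final prioritization is the sum of those scores weighted by the criterion weights.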
RESULTS
Figure 3 represents the general architecture of the system. First, the Raspberry Pi camera is shown connected via the CSI (Camera Serial Interface) bus to the Raspberry Pi 3B+ board. On this board, the image processing is executed using mainly the OpenCV computer vision package under the Python programming language. Once the video images have been processed, the corresponding occupancy estimation notification is sent over an Internet connection, using the Simple Mail Transfer Protocol (SMTP), to an address previously defined in the program. Likewise, the final processing can be visualized through an HDMI-VGA connection from the Raspberry Pi 3B+ to a monitor. Regarding the power supply, the Raspberry Pi is connected via an adapter to 5 V direct current, while the monitor is connected to 110 V alternating current.
Table 4 presents the programming logic of the system. This includes the libraries used, the definition of the characteristics for counting and storing the number of people appearing in the video, and the configuration for communication via the SMTP protocol. In addition, it details the requirements for the processes of grayscale conversion, background segmentation, Gaussian filtering, creation of the structural element, morphological filtering, thresholding, contour generation, validation of the contours with respect to size, and counting of people for the occupancy estimation and its respective notification.
Additionally, Figure 4 presents the image processing in the interior space. Section (a) shows the original image, while (b) and (c) show the conversion to grayscale and the segmentation of the image background. Likewise, (d) shows the application of the Gaussian smoothing filter, (e) the application of the morphological filter, and (f) the thresholding stage with the embedded contour search.
Likewise, Figure 5 shows the notification by e-mail and using the SMTP protocol. The tests are performed with two of the most popular email domains at present. Section (a) shows the notification to a Gmail account, while section (b) shows the notification to an Outlook.com account.
Figure 6 shows the times at which the detection of a person in the inner zone is achieved. A high pulse (1) indicates that the person was detected, while a low pulse (0) indicates that the detection was not achieved. Section (a) shows the performance in the morning hours, (b) the average performance in the afternoon hours, and (c) the performance in the evening hours. The proposed method (shown in gray) performed better in terms of the speed with which a person entering the area is detected, since systems based on CO2 level measurement (shown in black) generally require about one minute of measurement of the ppm concentration levels in the room. The proposed image processing-based method detects an entrant in 2.15 seconds, while the CO2-based system detects a person in 1.74 seconds on average, but only after an average waiting time of 1 minute.
Likewise, results are presented regarding the hit rate of the image processing system for the estimation of indoor space occupancy. The system presented an average hit rate of 94.44 %, with 49 correct detections and 3 persons not detected out of a total of 52 persons present, calculated as shown in equation 6. The error rate of 5.56 % is caused by factors such as non-detection due to shadowing effects from the previous detection, which in a few cases generate occlusion in the line of sight of the video camera located in the characterized area.
In addition, Figure 7 shows the performance of the hit rate in the tests performed on the image processing system, and its comparison with systems based on CO2 measurement. For the occupancy estimation system based on image processing, the hit rate ranged from 92 % to 97 %, with an average of 94.44 %. For the system based on the measurement of carbon dioxide levels, the occupancy estimation hit rate ranged from 91.50 % to 95.40 %, with an average hit rate of 93.53 %.
Regarding the comparison of the systems in terms of implementation cost, the system based on image processing mainly requires a microprocessor board, such as the Raspberry Pi 3B+ embedded system, and the Raspberry Pi camera connected via the CSI standard. The market cost of these devices is around USD 75 to USD 80. As for the system based on the measurement of CO2 concentration levels, devices such as the MX1102 data logger are usually used [27], which have a market cost of approximately USD 600. These devices measure CO2 concentration in the range of 0 to 5000 parts per million (ppm), with self-calibrating NDIR CO2 sensor technology, ensuring accuracy in the measured levels [28].
Using the prioritization matrix, the proposed method for occupancy estimation using open-source tools based on image processing is compared with the system based on CO2 concentration level measurement, according to detection speed, detection accuracy and implementation cost. Regarding the speed of detection of people, the proposed method is 0.41 seconds slower on average when detecting a new person entering the area, but does not require a wait of one minute. Therefore, the proposed method reports the number of people entering the zone by near-instantaneous processing, performing better than the CO2-based system. In terms of average detection efficiency, the proposed method had an average hit rate of 94.44 %, while the CO2-based system achieved an average hit rate of 93.53 %. Although the proposed method improved on the conventional method, for the purposes of the prioritization matrix the techniques were treated as equal due to the similarity of their hit rates. Finally, in terms of implementation cost, the proposed method reduces the cost of implementation by up to 7.5 times, so it performs much better than the CO2 concentration measurement system.
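The figures used in this comparison can be checked with a few lines of arithmetic. The sketch assumes that the effective latency of the CO2-based system is its detection time plus the one-minute wait, and that the cost ratio is taken against the upper bound of USD 80 for the image processing hardware; the variable names are illustrative.

```python
image_detection_s = 2.15  # average detection time of the proposed method
co2_detection_s = 1.74    # average detection time of the CO2-based system
co2_wait_s = 60.0         # average waiting time before the CO2 reading

extra_delay_s = image_detection_s - co2_detection_s  # 0.41 s slower per detection
co2_latency_s = co2_detection_s + co2_wait_s         # effective CO2 latency

co2_cost_usd = 600.0
image_cost_usd = 80.0     # upper bound of the USD 75 to USD 80 range
cost_ratio = co2_cost_usd / image_cost_usd           # 7.5 times cheaper

print(round(extra_delay_s, 2), round(co2_latency_s, 2), cost_ratio)
```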
Table 5 shows the final decision matrix for the comparison between the proposed method based on image processing with free software tools and the conventional occupancy estimation system based on CO2 concentration measurement. The proposed method obtained a prioritization of 87.972 %, due to its better performance in the detection time of a person entering the area, its low implementation cost, and its improvement in the detection hit rate. For its part, the system based on CO2 measurement obtained a prioritization of 12.028 %.
The proposed method is comparable to systems based on CO2 measurement and is presented as an alternative to them. With respect to the work done by Zhou et al. [16], in which MX1102 data logger sensors were also used, the hit rate of above 90 % was matched, and the time in which people are detected was improved, given the waiting time that their method requires for sampling CO2 concentration data. Likewise, the proposed method improved on the 94 % hit rate obtained by Jiang et al. [29] with CL11 sensors, where a prior sampling time of one minute is required for CO2 concentration measurement and occupancy estimation. With respect to studies using Chauvin Arnoux CA 1510 sensors, where a sampling time of at least one minute is likewise required, the hit rate was matched with an efficiency of more than 90 %, and the time taken to detect a person was improved, since in some cases even sampling at 15-minute intervals was required. In addition, with respect to work such as that carried out by Ansanay [30], the detection rate was matched with correct detections in more than 90 % of the cases, and the detection time was improved, since in that case samples were taken at 10-minute intervals. Likewise, the proposed method improved considerably in detection successes compared to the work of Han et al. [13], who initially obtained error rates of up to 50 % and therefore went from 7-minute to 15-minute sampling, which is much longer than the time required by the proposed method based on image processing, which detects a new person in the area almost instantaneously.
CONCLUSIONS
The characterization of the indoor area made it possible to attenuate, from the beginning, the effects generated by shadows and overlapping, as these are directly related to parameters such as incident light level and the height and tilt angle of the image and video capture device. The developed method was tested with brightness values between 460 and 895 lx, values that fall within standard indoor and office brightness levels of 300 to 1000 lx. The attenuation of shadow effects can be complemented by varying the camera position within the characterized area, placing it at greater heights than those proposed and with tilt angles that increase the field of view of the device. Nevertheless, the image processing system achieved an average hit rate of 94.44 %, mostly determined by the rate of true positives and negatives in the detection, referring to the persons actually present or not in the main frame of the image (or in the interior space). In addition, the proposed method considerably reduced the time in which it counts a person entering the area with respect to conventional CO2 occupancy estimation systems, since the latter, on average, take measurements at intervals that vary from 1 to 15 minutes, while with image processing and automatic counting these times were reduced to 2.15 seconds on average. In addition, being a technology based on low-cost, high-level hardware and open-source tools, implementation costs are reduced by up to 7.5 times. The comparison between the technology developed for occupancy estimation through image processing and the CO2 concentration measurement system using the prioritization matrix technique made it possible to emphasize relevant parameters for large-scale implementation, such as implementation cost, hit rate and processing speed, and to define a prioritization between them according to the conditions and needs present in the implementation environment.
Finally, both the technology and the method developed are replicable in controlled environments where it is required, among others, to improve air quality conditions and mitigate the effects of the spread of highly contagious diseases generated by crowds of people.