DroidLight: Lightweight Anomaly-based Intrusion Detection System for Smartphone Devices

Smartphone malware attacks are increasing alongside the growth of smartphone applications in the market. Researchers have proposed techniques to detect malware attacks using various approaches, which broadly include signature-based and anomaly-based intrusion detection systems (IDSs). Anomaly-based IDSs usually require training machine learning models with datasets collected from running both benign and malware applications. This may result in low detection accuracy when detecting zero-day malwares, i.e. those not previously seen or recorded. In this paper, we propose DroidLight, a lightweight IDS which can detect zero-day malware efficiently and effectively. We designed an algorithm for DroidLight that is based on one class classification and probability distribution analysis. For each smartphone application, the classification model learns its normal CPU utilisation and network traffic pattern. The model flags an intrusion alert if there is any significant deviation from the normal pattern. By deploying three self-developed malwares, we performed a realistic evaluation of DroidLight, i.e. the evaluation was performed on a real device while a real user was interacting with it. Evaluation results demonstrate that DroidLight can detect smartphone malwares with accuracy ranging from 93.3% to 100% while imposing only 1.5% total overhead on device resources.


INTRODUCTION
Smartphones are becoming ever more popular. In 2019, the number of smartphone users reached 3.2 billion, and it is expected to reach 3.5 billion in 2020¹. This growth in smartphones has attracted an increasing number of security attacks on smartphone devices in recent years, mostly in the form of malicious applications (commonly termed malware). According to Kaspersky Lab², smartphone malware attacks doubled in 2018 (116.5 million) compared to 2017 (66.4 million).
Researchers have proposed techniques to detect smartphone intrusions using various approaches, which broadly include signature-based and anomaly-based techniques such as [1,5,6,8,12,13,20,21,23,26,27]. However, the majority of these state-of-the-art techniques emphasise detection accuracy while neglecting efficiency, in terms of both resource consumption, which manifests as performance overhead on the device (from here on we will use the terms device and smartphone interchangeably), and timeliness/detection latency, which is the time between the arrival of an intrusion and its detection. Although efficiency is usually high for signature-based intrusion detection, it always suffers in the case of anomaly-based intrusion detection. This is because anomaly-based intrusion detection usually incorporates computationally expensive data extraction and machine learning algorithms, which impose significant performance overhead on the device due to its constrained resources. The work in [7] experimentally demonstrates that running a classification algorithm on the smartphone device for anomaly-based intrusion detection can result in significant overhead on device resources such as CPU, memory, and battery.
Furthermore, anomaly-based smartphone IDSs usually deploy machine learning techniques, which require training binary class classification models with datasets collected from running both benign and malware samples inside sandboxes or emulators. Collecting samples of all the malware families is not an easy task, as there are always new variants of malware. Also, there are new malware families which can sense whether they are running on a real device or on some kind of emulator or sandbox, in order to evade analysis and thus bypass model training³. Therefore, machine learning based IDSs may not be able to detect zero-day intrusions, i.e., those not recorded by the classification models during the training phases. To address this, researchers in [19] and [25] introduced one class classification (OCC) based IDSs, which can work with only the normal behavioural data collected from running benign applications and do not require training with malware samples. However, such OCC based IDSs usually suffer from high false positive rates due to outliers present in the training datasets. These outliers may arise as a result of sudden behavioural changes in the smartphone, for example, a sudden CPU spike due to some Android OS maintenance activity, a sudden rise in network traffic due to some background download or installation process, etc. Also, OCC model training with device-wide behavioural data may result in a low detection accuracy rate. This is because device-wide behaviour usually covers a variety of behavioural patterns, which may make it challenging for the OCC classifier to identify the malicious pattern caused by a malware attack. To address the issues discussed above, we propose DroidLight, a lightweight smartphone IDS that detects zero-day malwares efficiently and effectively.
In particular, DroidLight makes the following contributions to address the issues in the existing smartphone IDSs:
(1) Probability distribution analysis on the raw data to reduce the false positive rate: DroidLight detects zero-day intrusions using OCC models. These models learn the normal pattern of the device resource utilisation and classify a behaviour as malicious if it deviates significantly from its normal pattern. DroidLight addresses the high false positive rate issue in the OCC by performing window-based probability distribution analysis on the raw resource utilisation data as a pre-processing step for model training and testing.
(2) Application-specific modelling to achieve high detection accuracy: We achieve high detection accuracy for DroidLight by building application-specific OCC models, which learn the normal resource utilisation pattern for each individual application available in the device. During the testing/detection phase, DroidLight chooses the OCC model that is specific to the application running in the foreground. This model can accurately perform the classification between normal and malicious application behaviour.
(3) Hybrid intrusion detection approach to improve DroidLight efficiency: We build DroidLight with hybrid services, i.e. DroidLight-Host and DroidLight-Server. DroidLight-Server performs the heavyweight task of training the application-specific OCC models on a remote server, whereas DroidLight-Host performs only a small amount of computation on the device for detecting the intrusions. DroidLight-Host processes only the previous one minute of CPU utilisation and network traffic data (each sampled at 5 seconds) in order to detect the intrusions on the device. Firstly, it extracts (60/5) = 12 samples for each metric, i.e. a total of (12×2) = 24 metric samples from the device and, secondly, it pre-processes these samples using probability distribution analysis before performing the OCC for the intrusion detection.
We created a realistic evaluation environment for DroidLight, where we deployed three self-developed malwares on a real smartphone device. The malwares represent the behaviour of information theft, a currency-mining bot, and a DDoS attack. We evaluated DroidLight accuracy and efficiency in detecting these malwares while a real user was interacting with the device. This realistic evaluation process ensures that DroidLight can maintain its accuracy even when detecting malwares on real devices under realistic usage.
Experimental evaluation with three different device usage conditions (device idle, user using WhatsApp, user playing a YouTube video) suggests that DroidLight can detect highly intensive (in terms of device resource usage) smartphone malwares with accuracy ranging from 93.3% to 100%. This accuracy is achieved by pre-processing the raw utilisation data using probability distribution analysis and then performing application-specific OCC modelling using the pre-processed data. Experimental results reveal that this accuracy is better than the accuracy achieved by performing application-specific OCC modelling using raw data and by performing device-wide OCC modelling using the pre-processed data. Results further show that false positive rates are reduced when we use probability distribution analysis as data pre-processing. Experiments also indicate that performing the intrusion detection by DroidLight on the device every one minute imposes a total of 1.5% performance overhead on the device resources, which is comparable with MADAM [20].
The remainder of the paper is organised as follows. Section 2 presents related work in efficient smartphone IDSs, use of OCC for smartphone IDSs, and realistic evaluation of smartphone IDSs. Section 3 discusses the DroidLight approach for smartphone intrusion detection. Section 4 and Section 5 provide details of the DroidLight algorithm and architecture, respectively. Experimental results are presented and discussed in Section 6. Finally, Section 7 concludes the paper and discusses future work.

RELATED WORK
We discuss the state of the art in smartphone IDSs under the following two categories: (i) efficient smartphone IDSs and (ii) one class classification for smartphone IDSs.
(i) Efficient smartphone IDSs: In general, there are two main approaches in smartphone IDSs: signature-based and anomaly-based. Signature-based systems [1], [12], [6], [27], [5] are based on matching target application signatures with known malware sample signatures, which are stored in a signature database. The signatures are obtained by using reverse engineering methods, which decompile application packages to extract the signatures. Since signature-based malware detection systems are based on known malwares, they cannot detect unknown malwares or variants of known malwares. Anomaly-based or behaviour-based systems [21], [8], [13], [20], [26], [23] monitor application or system behaviour in order to identify anomalous activities which may arise due to malware attacks. In general, they identify anomalous activities by deploying machine learning classification algorithms such as SVM, HMM, Naive Bayes, KNN, etc. The computational costs of these algorithms are generally high. Moreover, smartphone IDSs need to perform the detection in a timely fashion with low detection latency, which is an important aspect when one considers the potential of modern malware to cause serious damage in a very short period of time.
Recently, researchers proposed host-based IDSs like Andromaly [21], MADAM [20], and Drebin [3], which can perform malware detection on the device in a lightweight manner. MADAM is efficient, with CPU and memory overhead of 0.9% and 9.4%, respectively, whereas Andromaly shows CPU and memory overhead of 5.5% and 8.5%, respectively. On the other hand, Drebin is efficient in terms of its detection latency. However, the efficiency of these IDSs was explored only for the testing/detection phase, without considering efficiency during the training phase. For machine learning based IDSs, it is important to be efficient in both the training and the detection phases. Researchers in [7], [4], [14], [18] put their effort into distributing the malware detection tasks between device and remote servers/clouds in order to achieve better efficiency, by performing pre-processing of the data on the device and offloading execution of the heavyweight machine learning algorithms to the server. [4] runs static analysis on the host side before performing dynamic analysis on the server side for detailed malware analysis. [7] showcases the tradeoff between executing intrusion detection on the device and on a server, without proposing any particular scheduling algorithm to decide where to run the analysis; moreover, they do not provide a solution to distribute the analysis task between device and server. In [14] and [18] only architectures of the proposed hybrid solutions are presented, without any working proof-of-concept prototype. DroidLight uses the hybrid solution in a different way, where the heavyweight training of the classification models is performed on a remote server and the lightweight intrusion detection tasks are performed on the device. Importantly, the training is performed on a continuous basis until stable models are built, which produce accurate results in detecting the intrusions.
(ii) One class classification for smartphone IDSs: One class classification (OCC) is a form of classification which can work with only a single class of data, without requiring a second class of data as is the case in binary class classification. Researchers from different domains have used OCC algorithms for outlier/novelty detection and concept learning in scenarios where data from the negative class is absent, poorly sampled, or not well defined [11]. In the literature so far, only a couple of works, [19] and [25], propose OCC algorithms for detecting smartphone intrusions. In [19] a One-Class Support Vector Machine (SVM) algorithm is used. The algorithm trains the classification model using only the features (application permissions and control flow graphs) extracted from the Android manifest and bytecode files of benign applications. During the testing or detection phase, the classification model raises an alarm if the testing data are significantly different from the training data. Thus, [19] follows a static analysis approach for detecting the intrusions. The authors in [25] propose accurate detection of zero-day smartphone malware by using a One-Class SVM, which identifies anomalous behaviour that deviates from the normal behaviour of a large number of benign applications. They performed the analysis for intrusion detection on a remote cloud server in order to achieve better efficiency. DroidLight, on the other hand, uses OCC for performing dynamic analysis of behavioural data to detect zero-day intrusions in smartphones under realistic usage. Moreover, DroidLight performs the detection on the device efficiently and effectively.

DROIDLIGHT APPROACH TO INTRUSION DETECTION
DroidLight aims at detecting zero-day malwares, which have not been discovered by any IDS or anti-malware software before and do not have any samples available for training binary class classification based IDSs. DroidLight uses one class classification (OCC) for detecting zero-day malwares. For smartphone IDSs, OCC can learn the normal pattern of the device resource utilisation and classify a pattern as malicious if it deviates significantly from its normal pattern. At this point, it is important to note our first assumption: malware attacks (as discussed in the previous section) generate significant deviation in the normal resource utilisation pattern of the device. We support this assumption with observations from a preliminary experiment, where we considered playing a YouTube video on an Android device for a period of 10 minutes. In the second half of playing the video, i.e. in minutes 6-10, we injected a self-developed DDoS attack that continuously sends thousands of network packets per second to a remote web service. The time-series graph in Figure 1 presents the CPU utilisation pattern of the YouTube application in the experimental period. The DDoS attack period (second half, minutes 6-10) is marked in colour. The graph supports our assumption by showing the significant deviation in the CPU utilisation pattern during the DDoS attack period. In the literature we note works such as [21], [22], [2], [15], [9], [16], [17] where deviations in smartphone resource utilisation are used as indicators of smartphone intrusions.
[Figure 1: CPU utilisation (%) of the YouTube application over the 10-minute time series (minutes).] For DroidLight, we use the OCC algorithm proposed by [10]. The algorithm first generates artificial data from a multivariate normal distribution as estimated from the training data and, secondly, uses these artificial data as a second class in the construction of a binary class classification model. The classification is based on Bayes' Theorem⁴. Although the OCC algorithm brings the benefit of detecting a malware without requiring prior knowledge of it, it poses the challenge of reducing false malware alarms that may arise due to the outliers/spikes which are usually present in the training datasets. This challenge is rather common to all categories of classification algorithms. In the time-series graph in Figure 1, we observe two CPU utilisation spikes in minute 1, which are marked with green circles. These spikes match the malicious behaviour in the second half, and hence may be wrongly identified as malware and result in false alarms. However, we further observe that the utilisation spikes are instantaneous, meaning that they rise high but drop back to normal utilisation quickly, whereas the utilisation due to the DDoS attack remains continuously high for a period of time. Based on this observation we make our second assumption: resource utilisation spikes in smartphones are usually instantaneous, and high resource utilisation due to malware attacks is consistent for a period of time. Considering this assumption, we propose a solution for the false alarm issues which may arise due to the outliers/spikes in the training data. We pre-process the raw resource utilisation data using probability distribution analysis before they are used for training the OCC models. A window-based (per minute) probability distribution of the raw data gives us distribution curves for each minute in the experimental period. The distribution curves are presented in Figure 1.
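The artificial-data flavour of OCC can be illustrated with a simplified, self-contained Python sketch. This is our own illustration, not the implementation of [10] or of DroidLight: it models the target class with a per-feature Gaussian, stands in a widened Gaussian for the sampled artificial second class, and combines the two densities with Bayes' theorem under equal priors; all names are ours.

```python
import math

def gaussian_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    return math.exp(-((x - mean) ** 2) / (2 * sd * sd)) / (sd * math.sqrt(2 * math.pi))

class SimpleOneClassClassifier:
    """Toy one-class classifier in the spirit of artificial-data OCC:
    the target class is a per-feature Gaussian fitted to the training
    vectors; a 'reference' class is a widened Gaussian standing in for
    sampled artificial data. P(target | x) comes from Bayes' theorem
    with equal priors."""

    def fit(self, vectors):
        dims = list(zip(*vectors))
        self.means = [sum(d) / len(d) for d in dims]
        self.sds = [max(1e-6, (sum((v - m) ** 2 for v in d) / len(d)) ** 0.5)
                    for d, m in zip(dims, self.means)]

    def p_target(self, x):
        p_t = p_r = 1.0
        for xi, m, s in zip(x, self.means, self.sds):
            p_t *= gaussian_pdf(xi, m, s)
            p_r *= gaussian_pdf(xi, m, 3.0 * s)  # widened reference class
        return p_t / (p_t + p_r)                  # Bayes' theorem, equal priors

    def is_anomalous(self, x, threshold=0.5):
        return self.p_target(x) < threshold
```

A test vector close to the training distribution yields a high target probability, while a vector far from it (e.g. a sustained DDoS-like load) is flagged as anomalous.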
Each distribution curve can represent three distribution characteristics of the raw data: Mean, Standard Deviation (SD), and Maximum Probability Density (MaxPD). From the distribution curves we observe that the curves shift towards the right, with slightly thinner and taller appearances, in the second half when the DDoS attack is in action. These shifts actually take the form of changes in the values of the distribution characteristics, as seen in Table 1. Considering the average values in the first half (minutes 1-5) and in the second half (minutes 6-10), we observe the following changes in the second half: (i) the Mean value increases from 50.7 to 74.4, (ii) the SD value is reduced from 5.5 to 3.6, and (iii) the MaxPD value increases from 0.07 to 0.12. Importantly, we observe that the curve in minute 1 (where we noticed two spikes in the time-series graph) is flatter than the other curves in the first half, which is due to the fact that the CPU utilisation values are distributed between the high utilisation (spikes) and the normal utilisation. From Table 1, we see that the distribution characteristics in minute 1 (Mean=56.6, SD=15.1, MaxPD=0.02) do not match the characteristics in the second half: rather, they are close to the characteristics in the first half.
In summary, we can classify between normal behaviour and malicious behaviour, as well as between a normal spike/outlier and malicious behaviour, by pre-processing the raw resource utilisation data with probability distribution analysis and using the distribution characteristic values (Mean, SD, MaxPD) as input for the OCC models. One might argue for using only window-based averaging of the raw data to solve the false alarm issues, but the probability distribution characteristics (Mean, SD, MaxPD) give us more independent metrics with which to achieve better classification results.
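As a side note, if each window's distribution is modelled as a normal curve, MaxPD is simply the height of the curve's peak, 1/(σ√(2π)), so a large SD (a flat spike window) directly implies a small MaxPD. The quick Python check below applies this to the SD values quoted above; note this analytic peak is our own assumption (the paper does not state how MaxPD is computed), and it reproduces the reported MaxPD values only approximately.

```python
import math

def max_pd(sd):
    """Peak probability density of a normal distribution with standard deviation sd."""
    return 1.0 / (sd * math.sqrt(2.0 * math.pi))

print(round(max_pd(5.5), 2))   # benign first half, SD=5.5  -> 0.07 (reported: 0.07)
print(round(max_pd(15.1), 3))  # spike window, SD=15.1      -> 0.026 (reported: 0.02)
print(round(max_pd(3.6), 2))   # DDoS half, SD=3.6          -> 0.11 (reported: 0.12)
```

The ordering matches the paper's observation: the flat spike window has the lowest MaxPD, and the narrow attack-period curves have the highest.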
Furthermore, we follow an application-specific modelling approach for DroidLight. This is based on our third assumption: the resource consumption on the device is due to the application running in the foreground, and the background services or activities running in the device contribute a negligible amount towards this consumption. This assumption is valid for all Android OS based smartphone devices, as the Android OS always prioritises the foreground application over the background services and activities running in the device⁵.
Finally, while considering the evaluation of DroidLight, we found that researchers usually create malware datasets by running malware applications (downloaded from online malware repositories) individually on sandboxes or emulators, and they do not consider any user activities during the malware application run. These datasets may not represent realistic malware behaviour, as malwares are likely to perform attacks even when users are interacting with their devices and performing benign activities. This may result in degraded accuracy for anomaly-based IDSs when identifying intrusions on real devices. We address this issue by testing the classification models using data collected from a real smartphone device under both benign and malicious conditions. To avoid running genuine malware applications on real devices, we deploy self-developed malwares that represent the malicious activities of real-world malware applications. In Section 6, we further support the three assumptions that we made in this section, as well as validate our approach with rigorous experiments. In the next section, we describe the algorithm that we developed for DroidLight, based on the OCC algorithm proposed by [10] and our approach as discussed above.

DROIDLIGHT ALGORITHM
The DroidLight algorithm comprises three phases: data pre-processing, model training, and intrusion detection. We present the pseudocode of the algorithm in Algorithm 1 and explain the phases as follows.

Data Pre-processing
During this phase the DroidLight algorithm performs probability distribution analysis on the raw resource utilisation metric data (CPU utilisation and network traffic). This analysis is window-based: for each application the algorithm divides the time-series data into a number of data bins (pseudocode line 2), each with a fixed window size (1 minute), and performs statistical computation to prepare the following vector of distribution characteristics for each metric in each bin: [Mean, Standard Deviation (SD), Maximum Probability Density (MaxPD)] (pseudocode lines 3-9). The vectors created for all the metrics constitute the vector for each bin, for example [cpuMean, cpuSD, cpuMaxPD, networkMean, networkSD, networkMaxPD] (pseudocode line 10). Finally, in this phase, the algorithm returns an array (one for each application) that contains a number (equal to the number of data bins) of vectors (pseudocode lines 12 and 14).
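The pre-processing step described above can be sketched in Python as follows. This is a self-contained illustration under our assumptions (in particular, MaxPD is computed here as the peak of a normal distribution fitted to the bin); the function names are ours, not the pseudocode's.

```python
import math
import statistics

SAMPLES_PER_BIN = 12  # 60 s window / 5 s sampling interval

def bin_characteristics(samples):
    """[Mean, SD, MaxPD] for one metric in one 1-minute bin."""
    mean = statistics.fmean(samples)
    sd = statistics.pstdev(samples) or 1e-9          # guard against zero variance
    max_pd = 1.0 / (sd * math.sqrt(2.0 * math.pi))   # peak of a fitted normal
    return [mean, sd, max_pd]

def preprocess(cpu, net):
    """Turn two aligned raw time series into one 6-element vector per bin:
    [cpuMean, cpuSD, cpuMaxPD, networkMean, networkSD, networkMaxPD]."""
    vectors = []
    for i in range(0, min(len(cpu), len(net)) - SAMPLES_PER_BIN + 1, SAMPLES_PER_BIN):
        vectors.append(bin_characteristics(cpu[i:i + SAMPLES_PER_BIN])
                       + bin_characteristics(net[i:i + SAMPLES_PER_BIN]))
    return vectors
```

Each returned vector corresponds to one data bin and is what the model training phase consumes.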

Model Training
During this phase, the DroidLight algorithm builds OCC models for each application, taking as input an array of vector data points (pseudocode lines 1-4) generated by the data pre-processing phase.

Intrusion Detection
During this phase, the DroidLight algorithm identifies whether the testing data vector is normal or anomalous. The testing vector is generated by the data pre-processing phase by considering only the last 1 minute of data. The testing data period was decided experimentally, as it gives enough samples to distinguish between normal and anomalous data samples. The algorithm uses the trained OCC model (specific to the application running in the foreground) in order to perform the classification of the testing vector point (pseudocode line 1). Finally, the algorithm flags an intrusion when the OCC model classifies the testing vector point as anomalous (pseudocode lines 2-6). [Algorithm 1 (excerpt). Model Training: for each application i, set the class attribute of the pre-processed dataset A_i and build M_i = buildOCCModel(A_i); return M. Intrusion Detection: input X, a single vector with L metrics generated by the pre-processing phase (for the last 1 minute of data samples collected from the currently running foreground application), and M_x, the trained OCC model for that application; output 0 (normal) or 1 (anomaly).]
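One round of the per-minute detection step could look like the following Python sketch. This is our own, self-contained rendering of the phase, not the paper's code; the model here is any per-application object exposing an is_anomalous(vector) predicate, and MaxPD is again assumed to be the peak of a fitted normal.

```python
import math
import statistics

def last_minute_vector(cpu, net, n=12):
    """Build the 6-element test vector from the previous minute of samples
    (12 CPU + 12 network samples at a 5 s interval)."""
    def chars(xs):
        mean = statistics.fmean(xs)
        sd = statistics.pstdev(xs) or 1e-9
        return [mean, sd, 1.0 / (sd * math.sqrt(2.0 * math.pi))]
    return chars(cpu[-n:]) + chars(net[-n:])

def detect(cpu, net, model):
    """One detection round: returns 1 if an intrusion is flagged, else 0.
    `model` is the trained OCC model for the foreground application."""
    return 1 if model.is_anomalous(last_minute_vector(cpu, net)) else 0
```

Run every minute against the model of whichever application is in the foreground, this is the entire on-device computation.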

DroidLight-Host
DroidLight-Host implements the following modules on the smartphone device:

Behavioural Metrics
Collector. This module collects behavioural metrics such as CPU utilisation and the total number of network packets transmitted/received. The metrics are collected every 5 seconds, which we experimentally found to be optimal on our experimental device. Any higher frequency increases the overhead on device performance, whereas any lower frequency results in poor detection accuracy. However, this parameter may require tuning for some users and some devices with respect to their usage. The module uses top (the Linux system monitoring utility) for collecting CPU utilisation and reads proc files in the Android system for retrieving network packet information.
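For illustration, per-interface packet counts can be obtained from the standard Linux /proc/net/dev file. The Python sketch below is our own assumption about which proc file is read (the paper does not name it); it parses the file's fixed column layout into receive/transmit packet totals.

```python
def parse_proc_net_dev(text):
    """Parse /proc/net/dev content into {interface: (rx_packets, tx_packets)}.
    After the two header lines, each line is 'iface: <8 rx cols> <8 tx cols>',
    where packets is the 2nd column of each group."""
    counts = {}
    for line in text.splitlines()[2:]:          # skip the two header lines
        iface, _, data = line.partition(":")
        fields = data.split()
        if len(fields) >= 16:
            counts[iface.strip()] = (int(fields[1]), int(fields[9]))
    return counts

def total_packets(text):
    """Total rx+tx packets across all interfaces except loopback."""
    return sum(rx + tx for iface, (rx, tx) in parse_proc_net_dev(text).items()
               if iface != "lo")
```

On a device, the text would come from `open("/proc/net/dev").read()`; sampling this every 5 seconds and differencing consecutive totals yields the per-interval packet metric.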

Local Storage.
We use the Android SD card as the storage location for this module. The SD card stores the metrics collected by the Behavioural Metrics Collector module and, in addition, stores the OCC models trained by the DroidLight-Server. The metrics and the models are stored with labels specific to the applications which are active in the device. The module sends the metrics data periodically to the DroidLight-Server. The DroidLight-Server stores the metrics data in its remote storage module and performs the training of the OCC models. The module retrieves the updated models once they are trained successfully. This communication is established via HTTP with a PHP web service running on the DroidLight-Server. We use a time interval of 5 minutes for the communication, which we experimentally found to be optimal in terms of overhead on device performance. Again, users can tune this time interval based on their device and the network strength to communicate with the remote server. It is important to note that the module flushes the metrics data every time it communicates and sends the data to the DroidLight-Server, and hence local storage needs to store only a maximum of the last 5 minutes of metrics data. Also, the retrieval of updated models from the DroidLight-Server for an individual application stops depending on its model performance. For example, training for WhatsApp may stop once it has been trained with 60 minutes of data samples, when the trained model starts performing accurately, i.e. no more updates are required for the WhatsApp model after 60 minutes of its run. Hence, the size of the trained models stored in the local storage may vary.

DroidLight Intrusion
Detector. This module periodically executes the DroidLight algorithm (explained in Section 4) for intrusion detection. As the algorithm requires the previous 1 minute of testing data samples, the module executes the algorithm every 1 minute. The module gets the metrics data and the OCC model (specific to the foreground application) from the storage module. We design and implement this module as an Android background application using Android Studio⁶, which imports the Apache Commons Math⁷ and Weka⁸ libraries for performing statistical operations and OCC. The application integrates with the other two modules discussed above to complete the functionality of DroidLight-Host in performing the intrusion detection.

DroidLight-Server
The DroidLight-Server implements the following modules on a remote server:

Remote Storage.
This module receives the metrics data sent periodically by DroidLight-Host. The module stores the metrics as well as the OCC models trained by the DroidLight Model Trainer module, which is explained in the next section. The module flushes the metrics data for an application once it gets an accurate model, and updates the models every time they are trained with newer datasets.

DroidLight Model
Trainer. The DroidLight algorithm uses OCC models (specific to the applications active in the smartphone) in order to detect the intrusions. These models require training with the normal behavioural metrics data stored in the remote storage module in the DroidLight-Server. We perform the training of the models by implementing a Java application in the server that executes whenever there are updated metrics data in the remote storage. The Java application integrates the Weka library to build the OCC models for each application that has training metrics data stored in the remote storage. Each smartphone application has its own requirement of training data samples to produce an accurate model and, hence, the training period for the (application-specific) models varies from application to application, device to device, and also user to user.

EVALUATION
This section presents the evaluation of DroidLight in terms of its detection accuracy and efficiency.

Data Collection
To collect the training and testing data from the device, we installed DroidLight-Host (Section 5) on an Android smartphone (Samsung Galaxy S2 duos running Android Lollipop version 5.0). We refer to this smartphone as droid-device. DroidLight-Host monitors the behavioural metrics (CPU utilisation and network traffic) every 5 seconds and stores them on the device's SD card with the names of the running foreground applications as labels for the data points.
As the DroidLight architecture consists of hybrid (host- and server-based) services, DroidLight-Host sends the stored data periodically, every 5 minutes, to the remote DroidLight-Server (Section 5). DroidLight-Server stores the data in its remote storage for model training. DroidLight-Host flushes the metrics data every time it communicates and sends the data to the DroidLight-Server. Hence, local storage stores only a maximum of the last 5 minutes of metrics data, which occupies only 4KB of SD card space on our droid-device.

Self-developed malware applications
To perform our experiments with malicious application behaviour, we developed three malware applications: DroidDDoS, DroidThief, and DroidHijack, which mimic the behaviour of real malwares as detailed in Table 3. The typical functionalities that we coded for these malware applications are as follows: DroidDDoS: This malware application continuously transmits hundreds or thousands of network packets per second to a remote web server. Hence, the application converts the device into a bot to perform a DDoS attack on the web server. As presented in Table 2, a small DDoS attack per device would require a total of 1.5 million devices to perform a Mirai-scale DDoS⁹ attack on a web server, whereas a medium and a large attack would require 75,000 and 30,000 smartphone devices, respectively. This malware closely mimics the following real malwares from Table 3: Loicdos, Ksapp, SeaWeth, and Tascudap.
DroidThief: This malware application retrieves a file from the SD card and sends it to a remote web server (a MacBook running a PHP web service) via HTTP. The application runs in the background, performing this task periodically every 10 seconds. In Table 2 we categorise the attack size of the malware based on the size of the file that it steals. This malware closely mimics the following real malwares from Table 3: DroidDream, DroidKungFu, Geinimi, DroidDeluxe, and GingerMaster.
DroidHijack: This malware continuously runs in the background and consumes device CPU up to 90-100%. This malware closely mimics the CoinKrypt malware from Table 3.

DroidLight Intrusion Detection Performance
In this section we explore the performance of DroidLight both in terms of detection accuracy and false alarms.

Training and Testing Data Preparation.
We considered detection of the self-developed malwares, i.e. DroidDDoS, DroidThief, and DroidHijack, on droid-device under three realistic conditions: (i) the device is idle (running no user activities), (ii) the user is chatting on WhatsApp, and (iii) the user is playing a video on YouTube. We experimented with all three sizes of the DroidDDoS and DroidThief attacks, i.e. small, medium, and large, as described in Table 2.
During the training phase, we ran DroidLight-Host on the droid-device to collect the metric (CPU utilisation and network traffic) data samples under each of the aforementioned conditions. DroidLight-Host sends the metrics data to the DroidLight-Server every 5 minutes. Each time DroidLight-Server receives new data from DroidLight-Host, it executes data pre-processing to prepare the training datasets, and then builds the OCC models (one for each condition) with the training datasets. Thus, the training continues until stable OCC models are built. The decision on when to consider the model for a particular application stable, i.e. when to stop the training for that application, depends on the model false alarm rates. For example, if a trained model does not raise many false alarms for a significant period of time, then we may declare that the model is accurate or stable. The number of model false alarms further depends on a number of factors, such as device/application usage, the metric pattern for the application, outliers in the metrics data, etc. Hence, we need to decide when to stop the training by evaluating model false alarm rates at runtime. This is possible only with the help of dynamic decision making that considers the aforementioned factors affecting the model false alarms and evaluates the false alarm rates periodically at runtime. While we keep the adoption of such an approach as future work, for these experiments we used the training periods reported in Table 4. During the testing phase, DroidLight-Host continuously monitors metric (CPU utilisation and network traffic) data under each of the realistic conditions while one of the self-developed malwares performs its malicious activities in the background. The monitoring continues separately for 10 minutes for each condition. During the monitoring, every 1 minute, DroidLight-Host executes its intrusion detection algorithm in order to detect the malicious activities by analysing the previous one minute of monitored metrics.
Thus, DroidLight performs detection on 10 attack samples for each attack under each condition.
6.3.2 Accuracy of Detection. Table 5 presents the DroidLight intrusion detection results when detecting the self-developed malware attacks with different data pre-processing approaches in the DroidLight algorithm. The approaches are: (i) application-specific raw data analysis, which performs no pre-processing on the raw metrics data collected from a specific application and simply feeds the raw data to the training phase of the algorithm; (ii) device-wide probability distribution analysis, which performs probability distribution analysis on the metrics data collected from running different applications on the device; and (iii) application-specific probability distribution analysis, which performs probability distribution analysis on the metrics data collected from running a specific application, and which is the approach we employed in the DroidLight algorithm. From the results in Table 5 we observe the following: (1) Overall accuracy is improved with application-specific probability distribution analysis (chosen by DroidLight) as the data pre-processing approach. (2) DroidLight fails to detect small DroidThief attacks. This is not surprising, as we have already mentioned that DroidLight is intended to detect only those malwares that affect device resource utilisation significantly and cause a deviation in the utilisation pattern. Small DroidThief attacks represent the behaviour of small information theft attacks, which steal device information (IMEI, OS version, device version, etc.), recent call logs, or the contact list in order to send them to a remote server for further instruction to propagate spam or to participate in DDoS attacks. That means that although DroidLight fails to detect small information theft attacks, it can successfully detect attacks carried out as a result of small information theft attacks.
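A rough sketch of what application-specific probability distribution analysis can look like: converting a window of raw metric samples into an empirical probability distribution before classification, so that a single transient outlier contributes little probability mass. The function name and bin width below are hypothetical assumptions, not DroidLight's actual pre-processing code.

```python
from collections import Counter

def to_distribution(samples, bin_width=5):
    """Pre-process a window of raw metric samples into an empirical
    probability distribution over fixed-width bins. Hypothetical sketch of
    the probability distribution analysis step; DroidLight's exact
    procedure may differ."""
    counts = Counter(int(s // bin_width) for s in samples)
    total = len(samples)
    return {b * bin_width: c / total for b, c in sorted(counts.items())}

# One minute of CPU samples at 5-second intervals (12 samples), one spike
window = [4, 5, 6, 5, 4, 80, 5, 6, 4, 5, 5, 6]
dist = to_distribution(window)
print(dist)  # the single 80% spike holds only 1/12 of the probability mass
```

Classifying such distributions, rather than the raw samples, is what lets short-lived spikes (e.g. the periodic HTTP communication discussed later) pass without raising alarms.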
(3) If we consider highly intensive attacks, i.e. attacks that affect CPU utilisation or network traffic significantly, such as the DroidDDoS medium and large attacks, the DroidThief large attack, and the DroidHijack attack, then DroidLight accuracy lies in the range 93.3% to 100% for all three realistic experimental conditions that we considered.
The accuracy achieved by DroidLight in this experiment may vary for the same or different applications run by different smartphone users on different devices. The accuracy results simply show that training DroidLight with the normal behavioural metrics data collected from running benign applications, or while the device is idle, can achieve accuracy up to 100% in detecting malware attacks similar to real-world information theft, currency-mining bot, or DDoS attacks on smartphone devices. To adapt DroidLight to other realistic smartphone usage conditions, the training period for DroidLight needs to be adjusted, which, as we have discussed, will require dynamic decision making on how long to run the training in our continuous training process.
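The dynamic stopping decision discussed above could be sketched as a simple rule over recent false alarm rates. The function, window size, and threshold below are illustrative assumptions of ours, not part of DroidLight.

```python
from collections import deque

def training_is_stable(recent_alarms, window=12, max_rate=0.05):
    """Hedged sketch of a dynamic stopping rule: declare a model stable
    once the false alarm rate over the last `window` detection rounds stays
    at or below `max_rate`. All parameter values are illustrative."""
    if len(recent_alarms) < window:
        return False  # not enough evidence yet; keep training
    recent = list(recent_alarms)[-window:]
    return sum(recent) / window <= max_rate

# 12 detection rounds with no false alarms: model considered stable
alarms = deque([0] * 12)
print(training_is_stable(alarms))                 # True
print(training_is_stable(deque([1]*6 + [0]*6)))   # False: 50% alarm rate
```

A rule of this shape would let the training period adapt per application and per user, rather than being fixed in advance.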

6.3.3 False Alarms.
In order to identify the false alarms from DroidLight-Host, we carried out 60 minutes of experiments with each of the three realistic conditions that we considered for evaluating the accuracy. Table 6 presents the false positive results. From the table we find that the data pre-processing approach adopted by DroidLight produces fewer false positives (3 in total) than the raw data analysis approach (7 in total). On the other hand, the device-wide analysis approach has only 1 false positive; however, this is outweighed by its low accuracy as observed in the accuracy results.

6.4 DroidLight Efficiency
In this section we measure the overhead of DroidLight on the device resources and the timeliness/detection latency of DroidLight.
6.4.1 Overhead on Device Resources. We measured the overhead of DroidLight by running a standard Android benchmarking application, Quadrant Standard Edition (https://play.google.com/store/apps/details?id=com.aurorasoftworks.quadrant.ui.standard), on droid-device. We present the benchmark results in Table 7, which reports the performance of the device with and without running DroidLight. Higher values indicate better performance, and each value is an average of 10 executions of the benchmark application. From the table we see that the total performance overhead of DroidLight is 1.5%. This overhead is mainly caused by two DroidLight-Host modules: the Behavioural Metrics Collector module, which collects the CPU and network metrics every 5 seconds, and the Intrusion Detector module, which runs the DroidLight algorithm (pre-processing of the metrics data by probability distribution analysis and one class classification of the pre-processed data) every 1 minute on the device. The total overhead of DroidLight (1.5%) is similar to that of MADAM [20] (1.4%); however, DroidLight has less impact on memory (4.6%) and I/O (0.32%) than MADAM (memory: 9.4%, I/O: 4%). MADAM claims to have less overhead than the other smartphone IDSs.
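For concreteness, a benchmark overhead figure such as the 1.5% above can be derived as the relative drop in benchmark score while the IDS is running, since higher scores mean better performance. The scores below are made-up illustrative values, not those of Table 7.

```python
def overhead_pct(score_without, score_with):
    """Overhead as the relative drop in benchmark score caused by running
    the IDS. Illustrative arithmetic only; the input scores are not the
    paper's actual Table 7 values."""
    return 100.0 * (score_without - score_with) / score_without

# Hypothetical benchmark scores without and with DroidLight running
print(round(overhead_pct(2000, 1970), 1))  # -> 1.5
```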
In addition to identifying the overhead of DroidLight on CPU, memory, and I/O, we also examined the overhead of DroidLight on device network traffic. This overhead is important to consider as DroidLight establishes HTTP communication with a remote server every 5 minutes in order to send the metrics data and receive the model data, respectively. In Figure 3 we present the pattern of the CPU utilisation and the network traffic for 10 minutes while the device is idle and DroidLight-Host is running in the background. From the figure we observe that the network traffic pattern shows spikes at around 5 minute intervals (minutes 5 and 10). This is due to the HTTP communication that happens every 5 minutes. However, these spikes do not generate any false alarms, as DroidLight handles such spikes/outliers using probability distribution analysis. This is evident from the distribution graph in Figure 3, where the curves for the network traffic at minutes 5 and 10 are shifted towards the left, similar to the curves at other minutes in the graph and different from the curves generated by malicious behaviour as observed earlier in the paper. Moreover, the HTTP communication for the data transfer is required only for a period of time, until the OCC models for each active application in the smartphone are properly trained.

6.4.2 Timeliness/Detection Latency. Table 4 presents the testing/classification time for each experimental condition, which is in milliseconds. However, as DroidLight-Host executes its detection algorithm every 1 minute, the timeliness/detection latency of DroidLight may fluctuate from milliseconds (100-250 milliseconds) to 1 minute.
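The latency fluctuation between milliseconds and 1 minute follows directly from the fixed 1-minute detection interval, as this small sketch illustrates. The 250 ms classification time is an assumption taken from the reported 100-250 ms range, and the function itself is our own illustration, not DroidLight code.

```python
def detection_latency(attack_time_s, interval_s=60, classify_ms=250):
    """Illustrative detection latency calculation: the detector runs every
    `interval_s` seconds, so an attack waits until the next detection round
    plus the per-round classification time (assumed ~250 ms here)."""
    wait_s = (-attack_time_s) % interval_s  # time until the next round
    return wait_s + classify_ms / 1000.0

# Attack arriving just before a detection round: detected in well under 1 s
print(detection_latency(59.9))  # ~0.35 s
# Attack arriving just after a round: waits almost the full interval
print(detection_latency(0.1))   # ~60.15 s
```

This best/worst-case spread is exactly why the latency is reported as a range rather than a single figure.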
Similar to the accuracy results, both the timeliness and false positive results may vary depending on the training period of the OCC models: a model trained with a short period of training data may detect an intrusion quickly but may have high false positives, whereas a model trained with a long period of training data may require a longer time for detection but may have low false positives.

Table 8 summarises the features of the existing lightweight anomaly-based smartphone IDSs in order to show where DroidLight stands. It is important to note that the results presented in the table for the different IDSs were collected under different experimental setups and test environments, and hence their comparison cannot be truly justified; rather, the results give us an understanding of what these IDSs are capable of in terms of the stated features.

CONCLUSION
This paper proposes DroidLight, a lightweight smartphone IDS which can detect zero-day malwares effectively and efficiently without significant overhead. We evaluated DroidLight's performance under realistic smartphone usage conditions. The evaluation results showed that DroidLight can accurately detect highly intensive (in terms of device resource usage) self-developed malwares which represent the behaviour of information theft, currency-mining bot, and DDoS attacks.
In future work, we aim to extend DroidLight's functionality by incorporating a decision making algorithm which can dynamically decide how long to run DroidLight's model training for individual applications and individual users on a given device. This will help achieve high detection accuracy and low false positive rates for DroidLight under realistic smartphone usage conditions.