Effective identification of distributed energy resources using smart meters net-demand data

International policies and targets to globally reduce carbon dioxide emissions have contributed to increasing penetration of distributed energy resources (DER) in low-voltage distribution networks. The growth of technologies such as rooftop photovoltaic (PV) systems and electric vehicles (EV) has, to date, not been rigorously monitored and record keeping is deficient. Non-intrusive load monitoring (NILM) methods contribute to the effective integration of clean technologies within existing distribution networks. In this study, a novel NILM method is developed for the identification of DER electrical signatures from smart meter net-demand data. Electrical profiles of EV and PV systems are allocated within aggregated measurements including conventional electrical appliances. Data from several households in the United States are used to train and test classification and regression models. The usage of conventional machine learning techniques provides the proposed algorithm with fast processing times and low system complexity, key factors needed to differentiate highly variable DER power profiles from other loads. The results confirm the effectiveness of the proposed methodology to individually classify DER with performance metrics of 96% for EV and 99% for PV. This demonstrates the potential of the proposed method as an embedded function of smart meters to increase observability in distribution networks.

In [18], the authors previously presented a state-of-the-art review covering load modelling and NILM methods for DER electrical profile identification/forecasting. Relevant research studies were filtered, finding that the majority of NILM work focussed on conventional loads rather than DER electrical profiles. From over 127 research studies analysed, 4 were focussed on NILM for EV and 5 on PV electrical profile disaggregation, none of them combining both. Therefore, this paper introduces an innovative NILM method for the identification of DER from aggregated measurements using smart meter net-demand data. The method is built upon conventional machine learning algorithms such as k-nearest neighbour (kNN), random forest (RF), and an artificial neural network (ANN) known as multilayer perceptron. The novelty of the proposed algorithm consists of implementing a NILM method to identify the presence, and thus the electrical network connection point, of DERs, and also to quantify the capacity of EV and PV electrical signatures on the customer side. The classification and regression models are evaluated using data from the 'Pecan Street - Dataport' public dataset [19].
The main contribution of this research consists of evaluating the proposed NILM method for the identification of the characteristic electrical profiles produced by EV and PV systems in the residential sector. This helps to increase the observability of low-voltage distribution networks, a sector presently lacking observability and in need of advanced grid monitoring [20].
This study builds on preliminary research in [21] and [22]. In [21], a regression model based on RF and kNNs was proposed. The method was based on the disaggregation of a PV system generation profile from aggregated household measurements of house C recorded in the Smart star project dataset (denoted as Smart* project) [23]. In [22], a NILM classification model based on support vector machines (SVM) and statistical features of electrical measurements was proposed. Data acquired with an OpenPMU instrument [24] was used to classify the presence of a rooftop PV system from the aggregated load of a residential dwelling in the UK. The main differences and new contributions in this research are: (i) the inclusion of both classification and regression approaches; (ii) the identification of EV and PV electrical profiles using the same method; (iii) the addition of ANN as the classification and regression model; (iv) the inclusion of statistical variables from windows of the active power as the input of the classification and regression models; (v) a deeper evaluation of the NILM method performance; (vi) the usage of the Pecan Street household loads; and (vii) the training and validation on a larger dataset comprising 1 month of data from 18 different households.
The remainder of this study is divided into seven sections. In Section 2, relevant research studies are discussed and NILM methods focussed on DER load identification are summarised. Section 3 describes the proposed methodology and introduces the main concepts of the NILM experimental method. Section 4 reports the results of the classification and regression models for DER identification. In Section 5, a sensitivity analysis is presented focussing on the evaluated metrics and processing times. The paper continues in Section 6 with the benchmarking and validation of the model against other literature and datasets. A discussion is included in Section 7 to provide a comprehensive assessment of the strengths and limitations of the proposed NILM method. Finally, Section 8 presents the conclusions and lines of future work arising from this study.

| WORK TO DATE
NILM techniques based on machine learning methods proposed to date can be classified as supervised or unsupervised based on the information provided to the model during training and testing.
Supervised NILM methods use labelled data; thus, the class of an expected output is known. One of the main disadvantages of these techniques consists of acquiring labelled data, which requires monitoring not only aggregated loads but also specific ones. In contrast, unsupervised methods using cluster techniques, such as k-means, deep learning, Hidden Markov Models (HMM) and variations thereof do not require labelling. The disadvantage of these methods is their higher complexity compared with supervised techniques [18].
NILM techniques have been widely studied for the identification of conventional loads, yet limited work has considered these approaches applied to DER identification. However, the increasing penetration of DER in low-voltage networks has changed this trend and NILM for DER identification is gaining relevance in the scientific literature. In [25], the authors presented an event-based NILM method to classify EV and air conditioning units among aggregated measurements from a household. The method was based on statistical features from active power records, reaching recall and precision metrics of 93% and 88%, respectively. In [26], a NILM regression method was proposed to separate appliances' power contributions from individual houses in the USA. Among the loads, power consumption from EV and conventional electrical appliances was disaggregated using an unsupervised technique. As a result, an average root-mean-squared error of 1.101 was achieved for 20 households. Regarding solar generation systems, the authors previously proposed a classification approach to identify the contribution of a 3.5-kW peak (kWp) PV system [22]. SVMs and a principal component analysis were implemented as a supervised NILM method to separate PV generation from conventional loads, obtaining an F1 score of 96%. In [27], a regression model was developed to estimate PV power generation and PV module orientation. Based on a deep learning approach, the authors used a large dataset of over 1000 customers to identify the size of PV generation systems. The unsupervised approach contributed to achieving a mean absolute percentage error of 2.09% in the disaggregation of PV power generation. These and other recent research studies focussed on the disaggregation of EV and PV are summarised in Table 1.
Overall, it can be observed that the literature has focussed on assessing the penetration of either EV or PV, and the study of both DER combined is seldom addressed. An initial investigation of the classification was carried out on the IEEE European Low-Voltage Test Feeder, focussing on aggregated measurements on the low-voltage side of the distribution network [32]. However, the lack of suitable monitoring on low-voltage distribution electrical systems could make the implementation of such approaches difficult in the near future [30]. Therefore, the work in this study improves upon previous studies by addressing multiple DER (i.e. EV and PV) and by approaching the classification and regression problem from smart meter net-load demand data.

| PRINCIPLE AND METHOD
Aggregated measurements of an electrical system sum up the individual contributions from each load connected to the circuit. This can be represented as in Equation (1), where P(t) represents the net load at time t, P_i the power consumption of the ith load in a house with j loads, and T the simulation horizon. The value of T can be computed as in Equation (2).
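Equation (1) is not reproduced in this text. A form consistent with the symbol definitions above, offered as a reconstruction rather than the authors' exact notation, is:

```latex
P(t) = \sum_{i=1}^{j} P_i(t), \qquad t = 1, \dots, T
```

Under this convention, a generating source such as a PV system enters the sum with a negative sign, which is why the net load seen by the meter can fall below zero.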
The proposed algorithm is based on statistical features computed from windows of aggregated measurements, which are used as inputs of the chosen machine learning techniques to identify DER electrical profiles. The NILM method consists of six stages as in the flowchart illustrated in Figure 1.

| Data acquisition and interpretation
This research has been developed using the 'Pecan Street - Dataport' dataset [19]. Pecan Street is one of the largest public databases available for research studies. It includes data from the electrical, water and transport sectors, with more than a thousand residential dwellings and businesses. Volunteers from mainly four locations in the USA contributed to developing this dataset, which is used for research and product testing purposes. Additionally, it includes information from about 250 houses with PV generation and 65 load profiles from EV.
In this research study, aggregated loads from 18 houses located in Austin, Texas are used. These houses were selected within the dataset among those with PV and EV because they have different EV profiles, which add higher variability to the input dataset. The chosen profiles included EV owners with vehicles such as the Chevrolet Volt, Mitsubishi i-MiEV, Nissan Leaf, and Tesla Model S. These vehicles draw active power between 3.5 and 10 kW at each of their respective houses. Among the selected load profiles, 10 had rooftop PV systems installed on their premises, with a PV generation capacity between 3 and 12 kWp. Active power measurements are recorded every minute for a period of 1 month in 2016.

| Data pre-processing
The Pecan Street dataset is filtered to select DER electrical profiles (PV and EV) and aggregated household load from the properties studied. This data is required for the training and testing stages. Pecan Street also includes individual load profiles from conventional devices (i.e. kitchen appliances, electronics, lighting etc.). In practice, the utility implementing this method is likely to only have the aggregated household load, so these individual load profiles are not used in this study since they are already contained within the aggregate data. As a result, four electrical profiles are obtained as illustrated in Figure 2 for house 9647. It was found that EV owners charge their vehicle for about 3 h each day (between 10 and 21 kWh depending on the amount of energy the car needs), representing one-eighth of the existing data. Therefore, it is sensible to simulate additional EV charging patterns by adding the EV profile to the data at other times of day (5 h earlier and 5 h later) whenever EV is the objective of the proposed NILM. This provides richer and more diverse information to be used as input of the classification and regression algorithms. In contrast, the PV profiles are largely cyclical, lasting several hours per day with no generation at night. Thus, PV generation profiles are used as originally reported in the dataset.
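The shifting of EV profiles can be illustrated with a minimal sketch. It assumes 1-min samples (so 5 h is 300 samples), a base load that excludes the EV, and zero padding rather than wrap-around at the ends of the day; the function names `shift_profile` and `augment_ev_scenarios` are hypothetical and are not taken from the original study.

```python
def shift_profile(profile, offset):
    """Shift a sampled profile by `offset` samples, padding the gap with zeros.

    A positive offset moves the profile later in time, a negative one earlier.
    Samples pushed past either end are dropped (no wrap-around).
    """
    n = len(profile)
    shifted = [0.0] * n
    for t, value in enumerate(profile):
        target = t + offset
        if 0 <= target < n:
            shifted[target] = value
    return shifted


def augment_ev_scenarios(base_load, ev_profile, offsets=(-300, 300)):
    """Build (net_load, ev_target) pairs: the original day plus one per offset.

    Assumption: 1-min samples, so 300 samples correspond to 5 h.
    """
    scenarios = []
    for offset in (0,) + tuple(offsets):
        ev = shift_profile(ev_profile, offset)
        net = [b + e for b, e in zip(base_load, ev)]
        scenarios.append((net, ev))
    return scenarios


# Toy day: flat 0.5 kW base load, 3.3 kW EV charge from 10:00 to 13:00.
base = [0.5] * 1440
ev = [3.3 if 600 <= t < 780 else 0.0 for t in range(1440)]
scenarios = augment_ev_scenarios(base, ev)
```

Each scenario keeps the shifted EV profile alongside the resulting net load, so the same pair can serve as the regression target and the model input.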

| Data processing
The proposed method utilises sliding windows, a technique used for the identification of events in small periods of time within a dataset. This permits the division of the dataset into several parts to be trained and tested, enabling real-time identification [33] instead of having to wait the full day to get the load profile. The sliding-window technique relies on two main factors: the size of the window (n) and the shifting factor (a) between consecutive windows. Each window is formed as in Equation (3). The values in each window are then analysed to determine whether the window includes relevant information from a particular DER or mainly load demand from conventional loads, so that it can be used in training as a positive or negative example. To do so, a threshold value of 0.1 kW is used to filter noisy measurements from EV load demand or PV generation and to label each window as 'DER' or 'non_DER'. If the recorded DER power is lower than the set threshold value for more than half of the window size, the window is labelled as not having the targeted DER. In contrast, windows presenting a higher influence of DER in the house's net load are set as containing the DER. The labelling of each window is defined in Python as in Figure 3.
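Figure 3 is not reproduced in this text. As a stand-in, the following minimal sketch shows one way the windowing and labelling rule described above could be written. It is not the authors' code: the function names are hypothetical, windows are assumed to start every a samples, and the absolute value of the DER power is compared with the threshold so that the same rule can cover PV generation recorded with a negative sign (an assumption).

```python
def sliding_windows(series, n, a):
    """Return (start, window) pairs for windows of n samples shifted by a samples."""
    return [(s, series[s:s + n]) for s in range(0, len(series) - n + 1, a)]


def label_window(der_window, threshold_kw=0.1):
    """Return 'non_DER' when the DER power magnitude stays below the threshold
    for more than half of the window, and 'DER' otherwise."""
    below = sum(1 for p in der_window if abs(p) < threshold_kw)
    return "non_DER" if below > len(der_window) / 2 else "DER"


# Toy 1-min series: the DER switches on at minute 6.
net_load = [0.4, 0.5, 0.4, 0.6, 0.5, 0.4, 3.8, 3.9, 3.7, 3.8]
der_power = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.3, 3.3, 3.3, 3.3]

n, a = 5, 1
windows = sliding_windows(net_load, n, a)
labels = [label_window(der_power[s:s + n]) for s, _ in windows]
```

The label is derived from the sub-metered DER profile, while the window that is later fed to the model comes from the net load only.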
The labelling of windows into classes facilitates the identification of how unbalanced the training data is. Subsequently, the numbers of 'DER' and 'non_DER' windows are matched by randomly discarding windows from the predominant class. This creates a balanced dataset with both classes equally represented and reduces the probability of overfitting the machine learning algorithms to one of the classes.
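Random undersampling of the predominant class can be sketched as follows. The function name and the fixed seed are illustrative assumptions; the original study does not specify how the random discard was implemented.

```python
import random


def balance_by_undersampling(windows, labels, seed=0):
    """Keep the same number of windows per class by randomly discarding
    windows from the larger class(es)."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    target = min(len(indices) for indices in by_class.values())
    kept = []
    for indices in by_class.values():
        kept.extend(rng.sample(indices, target))
    kept.sort()  # preserve the original ordering of the retained windows
    return [windows[i] for i in kept], [labels[i] for i in kept]


# Toy example: 7 'DER' windows and 3 'non_DER' windows (window contents omitted).
toy_labels = ["DER"] * 7 + ["non_DER"] * 3
toy_windows = list(range(10))
balanced_windows, balanced_labels = balance_by_undersampling(toy_windows, toy_labels)
```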

| Feature extraction
The time series created as windows of n samples represents a single characteristic (aggregated net load) in the time domain of the aggregated household measurements. Thus, statistical variables are extracted from each window (W_i) to reduce noise in the net load signal, optimise computational times, and provide machine learning models with several time-domain features [22,34]. Namely, the minimum (W_i,min), maximum (W_i,max), range (W_i,max - W_i,min), mean (W̄_i), variance (σ²), standard deviation (σ) and kurtosis (K_u) are obtained from each window. The mathematical expressions to derive these features are defined as follows [22,34].
The input matrix (X), used for both classification and regression models, is then formed with the described features as expressed in Equation (9).
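The feature equations are not reproduced in this text, so the following sketch shows one plausible way to compute the seven features and stack them into X. The choices of population variance (division by n) and of Pearson kurtosis (fourth central moment divided by the squared variance, equal to 3 for a normal distribution) are assumptions; the original definitions in [22,34] may differ.

```python
import math


def window_features(window):
    """Return [min, max, range, mean, variance, std, kurtosis] for one window."""
    n = len(window)
    w_min, w_max = min(window), max(window)
    mean = sum(window) / n
    variance = sum((p - mean) ** 2 for p in window) / n  # population variance (assumed)
    std = math.sqrt(variance)
    if variance > 0:
        fourth_moment = sum((p - mean) ** 4 for p in window) / n
        kurtosis = fourth_moment / variance ** 2  # Pearson definition (assumed)
    else:
        kurtosis = 0.0  # flat window: kurtosis undefined, set to 0 by convention here
    return [w_min, w_max, w_max - w_min, mean, variance, std, kurtosis]


# One row of seven features per window, whatever the window size n.
toy_windows = [[1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 2.0, 2.0, 2.0, 2.0]]
X = [window_features(w) for w in toy_windows]
```

Because every window is reduced to seven values, the width of X does not depend on the window size.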

| Load identification
Processed data are used to train and test machine learning algorithms to obtain classification and regression models. In classification problems, the expected output (y_class) is defined as a DER class (EV/PV or not EV/not PV). In contrast, regression models use active power as expected values (y_reg). Since the proposed NILM method identifies only one DER at a time, two different models are trained, one for PV and one for EV prediction. Since each of those models can be either a regressor or a classifier, a total of four different models are trained.
Subsequently, processed data are divided into training and test sets. In this case, cross-validation is implemented to avoid overfitting and to provide a realistic measure of the proposed method's performance. This technique also makes efficient use of the available data. A k-fold of 5 is set to split the processed data into 5 random smaller sets, of which 4 are used to train the model and 1 is used to test its performance. This process is repeated 5 times until each fold has been used as a test set. To avoid overlapping windows in both training and test sets, the input data X and the expected output vector y have been created by drawing windows at random from the processed dataset. Performance metrics are provided as the average value of the results achieved on each fold. Each test is completed using three conventional machine learning algorithms: kNN, RF, and an ANN based on a multilayer perceptron.
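The 5-fold procedure can be sketched in plain Python as below. This is an illustration under assumptions, not the authors' implementation: the helper names are hypothetical and the shuffling seed is arbitrary. scikit-learn, which the study uses, offers equivalent functionality through `KFold` and `cross_val_score`.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) for k shuffled, non-overlapping folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[f::k] for f in range(k)]
    for f in range(k):
        train_idx = [i for g in range(k) if g != f for i in folds[g]]
        yield train_idx, folds[f]


def cross_validate(X, y, fit_and_score, k=5, seed=0):
    """Average the score returned by fit_and_score over the k folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k, seed):
        scores.append(fit_and_score(
            [X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in test_idx], [y[i] for i in test_idx],
        ))
    return sum(scores) / len(scores)


def majority_baseline(X_train, y_train, X_test, y_test):
    """Placeholder model: predict the most common training label, return accuracy."""
    guess = Counter(y_train).most_common(1)[0][0]
    return sum(1 for label in y_test if label == guess) / len(y_test)
```

Any model can be plugged in through `fit_and_score`, provided it trains on the first two arguments and returns a score computed on the last two.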
Neighbour-based methods (kNN) establish the expected values based on the correlation of samples (called neighbours) surrounding the unknown value [21]. Since the method does not need to know the input data or characteristics, it is adaptable to predict complex loads, such as EV and PV systems from aggregated measurements [32]. The proposed method uses uniform weights to predict new samples; thus, known values are equally evaluated to compute a prediction [35].
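A minimal sketch of neighbour-based prediction with uniform weights is given below, for both classification (majority vote) and regression (unweighted mean). The Euclidean distance and the toy data are assumptions made for illustration; the study itself relies on the scikit-learn implementations.

```python
import math
from collections import Counter


def _nearest_targets(X_train, y_train, x_new, k):
    """Return the targets of the k training samples closest to x_new."""
    ranked = sorted(
        range(len(X_train)),
        key=lambda i: math.sqrt(sum((a - b) ** 2 for a, b in zip(X_train[i], x_new))),
    )
    return [y_train[i] for i in ranked[:k]]


def knn_classify(X_train, y_train, x_new, k=5):
    """Majority vote among the k nearest neighbours (uniform weights)."""
    return Counter(_nearest_targets(X_train, y_train, x_new, k)).most_common(1)[0][0]


def knn_regress(X_train, y_train, x_new, k=5):
    """Unweighted mean of the k nearest neighbours' targets."""
    nearest = _nearest_targets(X_train, y_train, x_new, k)
    return sum(nearest) / len(nearest)


# Toy data: a single feature (mean net load of a window, in kW).
X_train = [[0.4], [0.5], [0.6], [3.6], [3.8], [4.0]]
y_class = ["non_DER", "non_DER", "non_DER", "DER", "DER", "DER"]
y_reg = [0.0, 0.0, 0.0, 3.2, 3.3, 3.4]

predicted_class = knn_classify(X_train, y_class, [3.7], k=3)
predicted_power = knn_regress(X_train, y_reg, [3.7], k=3)
```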
RF defines its predictions based on an agreement between estimators (called trees). Each estimator, which can be seen as an upside-down tree, works separately to make a prediction considering training records [21]. The final answer is then computed as the average of all predictions in the forest [33]. RF uses criteria such as Gini impurity and entropy to measure the quality of a sample classification at each node of the tree [36]. Gini impurity estimates how often a sample at a node would be misclassified if it were labelled at random according to the class distribution of that node; it is 0.0 for a node containing a single class and increases as the classes become more mixed. Entropy measures how well organised a dataset is [28].
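The two node-quality criteria can be made concrete with a short sketch using their standard definitions. The helper `split_gini`, which weights the impurity of two child nodes by their size, is an illustrative addition and not a function from the original study.

```python
import math
from collections import Counter


def gini_impurity(labels):
    """1 - sum of squared class proportions; 0.0 for a single-class node."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits; 0.0 for a single-class node."""
    n = len(labels)
    return -sum((count / n) * math.log2(count / n) for count in Counter(labels).values())


def split_gini(left, right):
    """Size-weighted Gini impurity of the two child nodes produced by a split."""
    total = len(left) + len(right)
    return (len(left) * gini_impurity(left) + len(right) * gini_impurity(right)) / total


pure_node = ["DER", "DER", "DER", "DER"]
mixed_node = ["DER", "DER", "non_DER", "non_DER"]
```

For two classes, the Gini impurity peaks at 0.5 and the entropy at 1 bit, both reached when the node is split evenly between the classes.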
A multilayer perceptron (ANN) with only one hidden layer is also proposed for the DER profile identification. ANNs are used in several research environments, such as computer vision [37]. These methods have been considered to present the highest performance for NILM methods [38]. In this work, a supervised multilayer perceptron (MLP) is implemented with an input of dimension n (window size), a hidden layer of 100 neurons (default value of the scikit-learn library in Python), and an output of dimension 1 (prediction). This method is trained using backpropagation, and the activation functions are rectified linear units and identity for the hidden and output layers, respectively. The theoretical background for ANN can be consulted in [35,36], and [39].
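The forward pass of such a network can be sketched as follows. This only illustrates the described architecture (one hidden layer with rectified linear units and an identity output); the weights are made-up values for a two-neuron hidden layer rather than trained parameters, and training by backpropagation is not shown.

```python
def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


def mlp_forward(x, W1, b1, W2, b2):
    """Single-hidden-layer perceptron with identity output.

    x: input features; W1: one weight row per hidden neuron; b1: hidden biases;
    W2: output weights (one per hidden neuron); b2: output bias.
    """
    hidden = relu([sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)])
    return sum(w * h for w, h in zip(W2, hidden)) + b2


# Hypothetical weights for a 2-input, 2-hidden-neuron network.
W1 = [[1.0, -1.0], [-1.0, 1.0]]
b1 = [0.0, 0.0]
W2 = [1.0, 1.0]
b2 = 0.5
output = mlp_forward([2.0, 1.0], W1, b1, W2, b2)
```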

| Evaluation metrics
The performance of classification and regression models is evaluated with different metrics. In classification approaches, Boolean values are compared to quantify the efficacy of a method. True and false predictions of positive (i.e. a window containing DER) and negative values (i.e. a window without DER) are used as the basis to calculate the performance of the NILM method. The metrics used are accuracy (A_CC), recall (Re), precision (Pr), and F1 score, which are the most common metrics used among NILM studies. These metrics are defined in Equations (10) to (13) [18].
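The equations themselves do not appear in this text. The standard definitions of these four metrics, given here as a reconstruction and numbered on the assumption that they follow the order in which the metrics are listed above, are:

```latex
\begin{align}
A_{CC} &= \frac{TP + TN}{TP + TN + FP + FN} \tag{10} \\
Re &= \frac{TP}{TP + FN} \tag{11} \\
Pr &= \frac{TP}{TP + FP} \tag{12} \\
F_1 &= \frac{2 \cdot Pr \cdot Re}{Pr + Re} \tag{13}
\end{align}
```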
where TP, TN, FP, and FN are true positive, true negative, false positive and false negative predictions, respectively. Regression methods require metrics focussed on error, which is measured by comparing expected values with predicted results. The performance of the regression model is evaluated using the mean bias error (MBE), mean absolute error (MAE), root-mean-squared error (RMSE), and coefficient of determination (R² score). Definitions of the implemented metrics for regression models are shown in Equations (14) to (17) [21,40].
$$\mathrm{RMSE} = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left( y_i - \hat{y}_i \right)^2}$$
where y_i is the ith expected value, ŷ_i is the ith prediction, ȳ is the mean value of the expected values vector (y_test), and m is the size of y_test.
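Since only part of the regression metric definitions survives in this text, the sketch below computes all four metrics from their standard definitions. The sign of the MBE (prediction minus expected value) is an assumption, chosen so that a positive value means over-forecasting, which matches how the MBE results are interpreted in this study. The function name is hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MBE, MAE, RMSE and R2 for two equally long sequences.

    Assumes y_true is not constant, otherwise R2 is undefined.
    """
    m = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]  # prediction minus expected
    mbe = sum(errors) / m
    mae = sum(abs(e) for e in errors) / m
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / m)
    mean_true = sum(y_true) / m
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return {"MBE": mbe, "MAE": mae, "RMSE": rmse, "R2": r2}


# Toy example in kW.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.5])
```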

| Description of dataset and initial conditions
One month of data, from selected houses with EV and PV electrical profiles from the Pecan Street dataset, has been used to train and test the machine learning models. This short period of data is extended when EV is selected as a target by shifting the EV load profiles as described in Section 3.2. This reduces dependency on large datasets for EV to test the proposed method and enables us to compare achieved results with the existing literature. While smart meters frequently report at sampling rates between 1 min and 1 h [41,42], recent research studies repeatedly use a variable window size between 1 s and 10 min. In this research study, window sizes of n = 5, 10 and 15 min with a = 1 min are initially evaluated, matching most of the literature. This serves as an initial approach to analyse the impact of parameter variation on each machine learning algorithm. Thus, the maximum number of iterations (max_i), neighbours (k), and estimators (r) have been selected for ANN, kNN and RF using grid search within the ranges provided in Table 2. The variation of these parameters allows the evaluation of their effect on the performance and processing times of the proposed method, a trade-off that must be considered for the real-time identification approach.
The proposed NILM method was tested using Python 3.8.8 and the machine learning library scikit-learn (version 0.24.1) on an Intel Core i7-8700 @ 3.20 GHz PC with 16 GB of DDR4 RAM. All experiments were run on the same machine to obtain comparable computational costs.

| Approach for DER classification
Results were clustered considering the target (EV or PV) and the machine learning technique in use. For the selected classification metrics, Equations (10) to (13), the best possible score is 1. A metric score over 0.5 indicates that the model is able to recognise the input data to provide a prediction, rather than making a guess. The results are presented using box plots to provide a complete picture of the initial 900 simulations (90 × 5 per DER). Each box plot can be read in terms of quartiles: each whisker (error bar) spans 25% of the results, and the box itself contains the central 50% of the values. The line in the centre of the box indicates the median value of all results for each experiment. Outliers are plotted outside of the whiskers.
Overall, RF exhibited better performance for the classification of both EV and PV than kNN and ANN, as shown in Figure 4. Comparing recall and precision, the models in general tend to provide a higher number of false negative predictions. In the first scenario, defined for the classification of the EV load profile, the reported metrics are between 75% and 97%, with an average result of 89%. Machine learning methods exhibited a consistent performance, with ANN metrics presenting low variability but the lowest performance of the three methods. The classification of PV generation profiles exhibited better performance on average but higher deviation in performance metrics than those for EV. Evaluation metrics were obtained in the range between 78% and 99%, with a mean value of 92%. As can be seen in Figure 4, with an overall F1 score of 94%, RF outperformed the ANN and kNN machine learning methods, which reached an average performance of about 89% and 91%, respectively, for the same metric.

| Approach for DER regression
The regression metrics MBE, MAE, RMSE, and R² score are used to evaluate the performance of the proposed NILM method. The best possible score is 0 for MBE, MAE, and RMSE, while an R² score of 1 indicates that predictions match the expected values.
Overall, error-based metrics were consistent, with EV load profile predictions presenting higher errors than those for PV. Two factors contribute to this: the negative power consumption observed in the smart meter when the PV generation is higher than the load demand in the house, and the larger amount of data available to train the model for PV than for EV. Achieved results for the proposed NILM method as a regression model are illustrated in Figure 5.
The MBE indicates the overall bias of the model when estimating unknown samples. While positive values indicate that the regression model over-forecasts new samples, negative ones indicate that the model is likely to underestimate them. In our case, the identification of EV yielded a negative MBE, which represents a large bias towards underestimating new samples. This can be caused by the similar characteristics of EV and other large loads in the houses, and by the effect of high variability in PV generation profiles affecting the training stage of the identification models. In the case of PV, the metric was still negative but much closer to 0. This means the model is likely to generate new predictions with low bias, and these estimations might be slightly lower than the expected values.
The MAE reveals the average deviation of the predictions from the expected values. The proposed method produced a larger error when identifying EV than PV systems. In the case of EV identification, this metric presented an average value of 0.42 kW using RF, which represents 12% and 4% error for identification of the 3.5- and 10-kW EV load profiles, respectively. This deviation increased with kNN and ANN as regressors, for which maximum average deviations of 16% and 24% were obtained, respectively, for the 3.5-kW EV. In the case of PV systems, lower values were obtained for this metric, indicating better performance in identifying this DER. Average MAE values of 0.39, 0.29 and 0.23 kW were reported for PV identification based on ANN, kNN and RF, respectively. This represents a deviation between 2% and 13% for the identification of the PV systems ranging between 3 and 12 kWp.
The RMSE and the R² score are metrics related to how well the model can predict new samples. Large errors are penalised by RMSE (in kW), which means the larger the deviation of the predictions from the actual values, the larger the RMSE. Electric vehicle identification produced the largest RMSE values. Average RMSE values of 1.41, 1.20 and 0.90 kW were obtained for ANN, kNN and RF, respectively. Relative to the EV power consumption in the selected houses, this represents a deviation between 9% and 40%. In the case of PV system identification, better RMSE values were obtained, with 0.68 kW for ANN, 0.58 kW for kNN and 0.47 kW for RF as regression models. Relative to the PV systems in the input dataset, this represents a deviation between 4% and 23%.
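The percentages quoted for MAE and RMSE appear to be the error divided by the rated power of the device. This is an inference from the reported figures rather than a stated definition, and the short check below, with a hypothetical helper name, reproduces the quoted ranges under that assumption.

```python
def relative_error_pct(error_kw, rated_kw):
    """Error expressed as a percentage of the device's rated power."""
    return 100.0 * error_kw / rated_kw


# Rated powers taken from the dataset description: EV 3.5-10 kW, PV 3-12 kWp.
ev_mae_rf = [relative_error_pct(0.42, rated) for rated in (3.5, 10.0)]
ev_rmse_range = [relative_error_pct(0.90, 10.0), relative_error_pct(1.41, 3.5)]
pv_mae_range = [relative_error_pct(0.23, 12.0), relative_error_pct(0.39, 3.0)]
pv_rmse_range = [relative_error_pct(0.47, 12.0), relative_error_pct(0.68, 3.0)]
```

The lower bound of each range pairs the smallest error with the largest device, and the upper bound pairs the largest error with the smallest device.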
The R² score provides a measure of how well the model makes new predictions, with a best possible score of 1.0. This metric exhibited average scores around 0.59, 0.70 and 0.82 for EV identification using ANN, kNN and RF, respectively. Therefore, even though the model works relatively well for the identification of EV, there is still room for improvement. Higher R² scores were reported for PV systems, with overall scores of 0.83 for ANN, 0.87 for kNN, and 0.92 for RF.

| SENSITIVITY ANALYSIS
In this section, the variation of several parameters is analysed to evaluate their effect on the performance and processing times of the proposed NILM method, namely, the DER, window width (n), and the machine learning parameter value set. The results are reported below considering each machine learning algorithm and the DER evaluated.

| Performance
Evaluation metrics for the classification approach present the model as a successful method to identify either EV or PV systems. Since the F1 score provides a balanced representation of accuracy, recall, and precision, results for the F1 score are provided as a function of the DER and the proposed parameters of each machine learning technique, that is, max_i, k, and r for ANN, kNN, and RF, respectively, as defined in Table 2. As an example, experimental results for a window width of n = 10 are exhibited in Figure 6. For this value of n, the F1 score oscillated between 85% and 91%. In general, the lowest performance was achieved for the identification of EV (dark blue) for the three algorithms. This lower performance can be understood from the negative effect caused by the high variability of PV generation and the mixed EV load demand when a small-capacity charger is used, as shown in Figure 2. In contrast, disaggregation of PV systems (red) exhibited better performance. This is due to the large number of windows available for PV in training (PV generation lasts on average 10 to 12 h, versus the shorter burst of a few hours for EV) and to the excess power delivered back to the grid, which is seen as a negative signal that is easier for the algorithm to identify during training. Looking at the machine learning algorithms, the performance of kNN decreased continuously for k higher than five neighbours. In contrast, ANN and RF slightly improved the performance of the NILM method as their parameters increased. Therefore, max_i = 500, k = 5 and r = 250 are selected as the best parameters of each algorithm.
Once the main machine learning variables are fixed, additional simulations are performed changing the window size within the values n = [5, 10, 15, 30, 45, 60] (min). The classification results using these parameters are summarised per DER, window size, and machine learning algorithm in Table 3. Although overall classification scores were close across window sizes, larger windows generally contributed to an increase in performance metrics, with RF providing the best performance among the chosen machine learning techniques.
Consistently, for the regression approach, the machine learning techniques provided the lowest error for the abovementioned parameters. As previously observed in the classification approach, predictions of the regression models were similar for the three evaluated window sizes. In general, all machine learning algorithms provided their best performance for 60-min windows. However, ANN did not present a consistent outcome for the identification of EV load profiles. A summary of all 900 simulations run for regression models is provided in Table 4 as a function of the scenario, the window width (n), and the machine learning technique.

| Processing times
Processing times of the evaluated NILM method were on the order of microseconds per processed window for the majority of the simulations. Each machine learning algorithm exhibited similar computational times to process data from EV and PV windows. Due to the use of statistical variables as inputs of classification and regression models, relatively constant times were observed between different window sizes. This is because the window size does not change the dimension of the input vector (X) used to train and test the algorithms, which is designed to have seven features.
Among the implemented machine learning methods, kNN proved to be the fastest for training and testing purposes, providing results under 3 and 17 µs per window, respectively. This method presented test processing times 12 and 5 times higher than training times for classification and regression approaches. In contrast, ANN and RF required longer computational times per window to train the model. In the case of ANN, the training process took up to 1 ms for both classification and regression models. This approach gave the fastest prediction times, requiring less than 3 µs per window, that is, less than 0.3% of the time needed to train the model. RF presented computational times of up to 500 µs per window during the fitting stage for classification purposes. Prediction was faster, with up to 60 µs per window required to predict a DER class from a particular window. However, the regression model based on RF provided the largest times during training, reaching up to 2 ms per window. Similarly, this method presented the largest computational time to disaggregate EV and PV power from windows of aggregated measurements, with predictions produced in up to 100 µs. Given that all those prediction times are on the order of microseconds and that the minimum window duration is 5 min, these results show that all the proposed methods can operate in real time, yielding a prediction for a window before the next load window is collected. Computational times for the classification and regression model performance evaluation are illustrated per DER and machine learning algorithm in Figure 7.

| Sensitivity to algorithms' input features
In this section, we compare the effectiveness of the statistical features used as inputs of the machine learning algorithms against using the raw load values. Using the machine learning model parameters defined in Section 5.1, classification and regression metrics are compared for two sets of inputs: windows of active power and statistical variables (see Equation 9). For the windows of active power, the input matrix X is Wi, as defined in Equation (3). Windows of 60 min were chosen as the base case to analyse the main differences in the performance metrics of the proposed method. This window size was selected because, in general, the three machine learning techniques presented their best performance for this value of n, as described in Section 5.1. The results are presented in Table 5.
Overall, using statistical variables as inputs of the NILM method did not significantly improve the ANN performance metrics. In fact, its performance decreased for n = 60. The MAE, RMSE and R² score deteriorated by 5%, 1% and 2% for EV, and by 3%, 4% and 1% for PV. The MBE also moved about 23% further from 0 for EV, although it improved by almost 50% for PV. By contrast, kNN and RF showed a considerable increase in the classification metrics as well as a substantial reduction in the deviation of the predicted EV and PV electrical profiles. While the F1 score improved by about 4% for both DER using kNN, the other metrics improved by over 10% for EV and 20% for PV. RF also exhibited a significant increase in the F1 score, with improvements of 3% and 2% for EV and PV identification, respectively. The R² score improved by about 4% for PV, and by a considerable 15% for EV identification.
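For reference, the regression metrics discussed above can be computed as in the following sketch. It uses the standard textbook definitions; the sign convention for the MBE (prediction minus measurement) and the function name `regression_metrics` are assumptions, since the paper's own definitions are given elsewhere.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, MBE, RMSE and R2 for two equal-length sequences.

    Assumes y_true is not constant; otherwise R2 is undefined (division by zero).
    """
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]  # assumed sign: prediction - truth
    mae = sum(abs(e) for e in errors) / n
    mbe = sum(errors) / n
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MBE": mbe, "RMSE": rmse, "R2": r2}


# Synthetic example: every prediction is 1 kW above the measured power.
print(regression_metrics([0.0, 1.0, 2.0, 3.0], [1.0, 2.0, 3.0, 4.0]))
```

An MBE close to 0 indicates little systematic over- or under-estimation, whereas MAE and RMSE measure the size of the deviation regardless of its sign.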

| BENCHMARKING AND VALIDATION
The performance of the proposed NILM technique is compared with other methods available in the literature. The developed NILM model is also tested on other publicly available datasets to evaluate its ability to identify DER from electrical profiles different from the ones used for training.

| Comparison with existing literature
Relevant research studies on the classification and identification of both EV and PV systems have been compared with the proposed NILM method. To provide a comprehensive analysis, the metrics of the proposed method are presented for the classification and regression models using n = 60 min and r = 250 for RF. This is because, as illustrated in Figure 6 and Table 4, RF outperformed both ANN and kNN in the evaluated scenarios for the classification and regression approaches. However, since the experimental settings and dataset differ from those in other studies, this is not a direct comparison but a reference for how well the method performs relative to other methods. Most of the references used for comparison focus on DER active power from the Pecan Street dataset, which contains both EV and PV data for several houses. The exceptions are [22] (OpenPMU [24] dataset with electric current as feature) and [21] (Smart* Project dataset). The main characteristics and reported performance metrics of NILM methods developed for the classification of EV or PV systems are summarised in Table 6.
As can be seen, other authors have previously proposed relatively similar NILM model parameter configurations in terms of window size (3-10 min), step size (1 min), and performance. In our case, a larger window width is proposed to improve the performance of the NILM method while using only one month of data to train and test the algorithm. As a result, the system can effectively identify EV and PV classes from household aggregated loads, with metrics that surpass those of previous research studies. For PV identification, all metrics were similar to or better than those of existing methods.
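The classification metrics compared in Table 6 (Acc, Pr, Re, and F1 score) follow from the confusion counts of a binary "DER present / DER absent" decision. The sketch below uses the standard definitions; the function name and the synthetic labels are illustrative assumptions.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = DER present)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"Acc": accuracy, "Pr": precision, "Re": recall, "F1": f1}


# Synthetic example: eight windows, three of which contain the target DER.
truth = [1, 1, 1, 0, 0, 0, 0, 0]
preds = [1, 1, 0, 1, 0, 0, 0, 0]
print(classification_metrics(truth, preds))
```

When the classes are imbalanced, accuracy can stay high even if the DER windows are missed, which is why precision, recall, and the F1 score are reported alongside it.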
Similar results were achieved when the proposed method was implemented as a regression model for EV and PV disaggregation. RF metrics were consistently used to compare the achieved results with other sources. As in the classification approach, results for the identification of EV and PV using 30 days of data from the Pecan Street dataset are compared with the existing literature. These results are presented and contrasted with other sources in Table 7.

TABLE 5 Effect of statistical variables as inputs of the non-intrusive load monitoring method
Compared with methods proposed in the current literature, the identification of EV power consumption from aggregated measurements provided a better RMSE score than the one presented in [26]. In that source, the normalised RMSE represented a deviation of 28% with respect to the maximum power of the selected EV. In our case, the RMSE ranged between 26% and 9% for EV load profiles between 3.5 and 10 kW. In addition, the performance of the system when disaggregating PV power generation is consistent with the literature. The identification of PV generation profiles yielded an RMSE deviation between 16% and 4% for installed capacities between 3.5 and 12 kWp. This is consistent with the MAE, which exhibits errors representing up to 8% of the original PV signal. Thus, the results presented by the authors of [27] and [30] are improved upon in this case. The R² score also shows the proposed method to be more accurate than the results previously presented in [21].
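The percentages above express the RMSE relative to the size of the DER. A minimal sketch of that normalisation follows, assuming the RMSE is divided by the rated (maximum) power, as described for [26]; the 0.9 kW error used in the example is a hypothetical value chosen for illustration, not a result from this study.

```python
def normalised_rmse_percent(rmse_kw, rated_power_kw):
    """Express an RMSE in kW as a percentage of the rated power of the DER."""
    return 100.0 * rmse_kw / rated_power_kw


# Hypothetical absolute error of 0.9 kW applied to two EV charger sizes.
for rated_kw in (3.5, 10.0):
    share = normalised_rmse_percent(0.9, rated_kw)
    print(f"{rated_kw} kW charger: {share:.1f}% of rated power")
```

The same absolute error is a smaller share of a larger installation, which is one reason why the normalised deviation is reported as a range over the installed capacities.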

| Model generalisation: comparison to other datasets
To evaluate the adaptability of the proposed method and remove any bias from the input data, external datasets have been used as testing sets, namely, 3 days of house 4336 from Pecan Street and 3 days of house C from the Smart* Project [23]. Considering the results achieved during the cross-validation stage, the model parameters have been set to n = 60, max_i = 500, k = 5, and r = 250. The training set was selected as 30 days of the 18 houses used in Sections 4 and 5. Additionally, the external data were prepared to create the test set as described in Section III. However, an initial step, described in [21], was required for House C of the Smart* Project to convert the electric current of the PV system into power. The NILM method was evaluated using ANN, kNN, and RF. The results are shown in Table 8.
Similar to previous results, RF provided the best metrics for the classification and the identification of EV and PV profiles from unknown testing sets. In the evaluated scenarios, the proposed NILM model performed well in obtaining classes from net load demand measurements. However, a slight reduction in performance metrics was observed in general, which is expected. In the case of PV power disaggregation, the system showed a considerable deviation between predictions and expected values. The errors appear relatively small when judged by metrics such as MAE, MSE, and RMSE. However, the R² score (which should be close to 1 in the best cases) indicates a low performance for kNN and an average performance for ANN and RF. Furthermore, poor metrics were obtained for EV identification. This is consistent with the difficulty of identifying the EV load profile using only active power, which can be similar to large loads or to the peak demand of other loads at certain times, for example, washing machines.

| DISCUSSION
The implementation of the balanced data step in the development of the proposed NILM method improved DER identification in terms of both processing times and performance metrics. The method performed up to 8 times faster for classification and about 3 times faster for regression when balanced data were used as input of the machine learning algorithms. Discriminating windows into those containing and those not containing the targeted DER led to training and test sets that were smaller in size and also richer in information. This is because the dominant window class is reduced to match the size of the class that appears less often. Thus, the data passed to the machine learning algorithms contain the same percentage of windows with the influence of DER power consumption/generation as those with only power consumption from conventional loads. For example, in the case of EV, there are more windows classified as not containing an EV than windows containing electrical information of the car in the charging stage. Reducing the number of windows that only contain power consumption from conventional loads helps to train the machine learning algorithm for a more general case. This is confirmed by the classification metrics, which exhibit relatively low variation among them for a fixed parameter setup (i.e. Acc, Re, Pr, and F1 score for a particular scenario).

TABLE 6 Proposed non-intrusive load monitoring method for distributed energy resources identification using classification models versus the existing literature
Columns: Ref.; Parameters (n, a, Data size); Houses; Acc (%); Pr (%); Re (%); F1 (%)
Electric vehicles (EV) as target: [12] 10 min 1 min -

Additionally, the extraction of statistical features from windows of aggregated measurements reduced the negative impact caused by an increased number of samples per window. This is because a larger input matrix X negatively impacts the processing times of the implemented machine learning techniques. Thus, keeping the dimension of X constant contributed to increasing the performance metrics of the NILM method with relatively close processing times for different window sizes.

In terms of the proposed machine learning techniques, the three algorithms provided consistent outcomes. First, kNN proved to be the fastest method during training. This method yielded F1 scores of 93% and 97% for the classification of EV and PV electrical signatures, respectively. The multilayer perceptron ANN provided slightly lower results than kNN but the shortest processing times to make predictions, achieving F1 scores of 85% and 92% when classifying EV and PV from aggregated measurements. RF proved effective at capturing the non-linearities of EV and PV profiles, outperforming the other two methods: in classification, the F1 score improved to 96% and 98% for EV and PV, respectively. Similar tendencies were observed in the regression approaches, with RF outperforming kNN, and kNN in turn surpassing the ANN metrics.
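The balanced data step discussed in this section reduces the dominant window class to the size of the less frequent one. A minimal sketch of such an undersampling step is given below; selecting the retained windows at random (with a fixed seed) and the function name `balance_windows` are assumptions, as the exact selection rule is not restated here.

```python
import random


def balance_windows(windows, labels, seed=0):
    """Undersample every class to the size of the least frequent class."""
    rng = random.Random(seed)
    by_class = {}
    for window, label in zip(windows, labels):
        by_class.setdefault(label, []).append(window)
    target = min(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        # Keep a random subset of 'target' windows from each class (assumption).
        balanced.extend((window, label) for window in rng.sample(group, target))
    rng.shuffle(balanced)  # avoid feeding the classes in blocks
    return [w for w, _ in balanced], [y for _, y in balanced]


# Synthetic example: 2 windows with an EV charging (1) and 8 without (0).
features = [[float(i)] for i in range(10)]
classes = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
X_bal, y_bal = balance_windows(features, classes)
print(len(X_bal), y_bal.count(1), y_bal.count(0))
```

The balanced set is smaller than the original one, which is consistent with the shorter processing times reported when balanced data are used.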
Overall, results obtained with the same database for training and testing demonstrated that RF presents the best performance. This makes the method applicable to houses where historical data is available. Although this performance is maintained when different datasets are used for training and testing, ANN also proved to be adaptable to this scenario. In this case, lower metrics were obtained for EV and PV, but in general the proposed method provides good predictions for classification and regression methods to disaggregate the DER electrical signature. Although a different setup was used in this manuscript, the balanced data led to consistent metrics across the classification and regression approaches. As a result, the proposed method provided a fair improvement on the scores reported in other research studies, outperforming most of them. Comparison of the findings with those of other studies confirms the beneficial performance of the proposed method as a classification and regression model.

TABLE 7 Proposed non-intrusive load monitoring method for distributed energy resources identification using regression models versus existing literature
Columns: Metrics; Window (n); Step

The main limitation of the proposed model is the resolution of the measured data (1 min). This makes it difficult to implement advanced feature projection techniques based on transient characteristics of the electrical system under analysis. Since both EV and PV systems are inverter-based technologies, harmonic distortion on electric systems is a potential feature to be selected as input of a NILM method. This could help to improve the performance of the proposed method when identifying DER from unknown data and, more specifically, for regression purposes.

| CONCLUSION
The objective of this study was to propose an online NILM method for the classification and power disaggregation of EV and PV systems using net-demand readings from smart meters. This study explored sliding windows and balanced data as inputs of the NILM method. Results from 900 simulations were presented using box plots to summarise all the simulations carried out as clearly as possible and to simplify the comparison with other methods. Additionally, a comprehensive sensitivity analysis assessed the effect of the parameters on the proposed NILM model. This helped to select the best parameters of the evaluated machine learning techniques, which were used to make a comparison with previous research studies on EV and PV profile identification. Lastly, a model generalisation evaluation was included to test the model on external datasets.
The proposed NILM method is an effective technique to increase the information provided by smart meters to utility companies and distribution network operators. Better access to information improves the observability of secondary, low-voltage distribution networks by enabling access to real-time information on DER presence. Similarly, the proposed NILM method contributes to an effective transition from conventional electric power systems to dynamic smart grids, which is a feature of the green energy transition and the decarbonisation of the power generation and transport sectors. Implementing the proposed NILM-type algorithms on the smart meters themselves could provide utility companies with real-time power generation/consumption from DER. Overall, these types of methods could be used as inputs of advanced control systems to mitigate the negative impacts of uncoordinated DER growth in low-voltage distribution networks, which could increase the system's flexibility, reliability, and efficiency. This approach also provides opportunities to integrate other DER, such as electric heating, to support the electrification of the heating sector.
Future work could use datasets with higher reporting rates (i.e. faster measurements of voltage and current to compute active power, reactive power, and other electrical characteristics), allowing transient analysis to be implemented. Furthermore, a more advanced setup for the ANN could be used to improve the performance of the method without considerably compromising processing times, as observed in Section 5.2. Additionally, the Pecan Street dataset does not include solar radiation correlated to the PV generation. However, knowing the location of each household, simulations could be made to obtain this feature. This and other environmental factors, such as ambient temperature, could be used as inputs of the algorithm to improve the performance of the proposed model for PV identification. In the case of EV systems, inner variables from the vehicle itself (e.g. battery state of charge, harmonic content etc.) could be used as additional inputs to increase the performance of regression models. Therefore, improving these methods requires not only the development of strategies in the academic world but also the participation of industry to provide real and updated datasets to achieve sensible solutions.