Fault Location in Low Voltage Smart Grids Based on Similarity Criteria in the Principal Component Subspace

This paper presents a new strategy based on multivariate statistical analysis for fault location and classification in power distribution networks with distributed energy resources, variable loads, and switches enabling grid reconfiguration. The statistical method relies on impedance measurements acquired at the substation buses to build a data-driven model of the network operating conditions with dimensionality reduction. It considers a few reference scenarios representing standard operating conditions and short-circuit operation, and performs fault location and classification using similarity criteria in the principal component subspace. A case study based on a real low voltage power distribution network is included to test and validate the methodology.


I. INTRODUCTION
The increasing application of digital technology is gradually transforming old-fashioned power distribution networks into modern smart grids with enhanced supervision, protection, and control features. Nevertheless, fault location and classification tasks still face challenges related to limited measurements along the feeders, usually available only at the distribution substation, and the inaccurate representation of network components, such as loads, distributed generators, lines, and the status of switches [1]. As a result, many faults are identified correctly only after trouble calls from affected customers, which may lead to unacceptable interruption times and have a negative impact on system operation in general. In the event of a fault, however, information about its type and location should be available as soon as possible to start grid reconfiguration and restore normal energy supply [2]. Therefore, an automated strategy capable of overcoming these obstacles is necessary for fast and accurate fault location and classification.
In this scenario, there are great opportunities for artificial intelligence (AI) techniques, as the increased processing power and reduced costs of computers enable the application of cutting-edge mathematical and information processing strategies to the search for faults. A few AI-based approaches for fault location at distribution level have been proposed recently. For instance, [3] combines principal component analysis, support vector classifiers, and feed-forward neural networks to perform fault location and classification in radial distribution networks, using measurements available at the substation together with information about circuit breaker and relay statuses; [4] presents a data-driven mixed-integer linear programming algorithm for fault location relying on smart meters at low voltage (LV) level and remote fault indicators at medium voltage (MV) level; [5] presents a feature selection method based on the information gain and minimum description length discretization algorithm together with a complementary expert information system to detect high-impedance faults; [6] uses the continuous wavelet transform to generate gray-scale images of transient zero-sequence current signals together with a convolutional neural network for feature extraction and fault detection in resonant grounding distribution systems; [7] applies the Stockwell transform to three-phase current signals and extracts features used as inputs to different machine learning tools with the goal of locating different types of faults in power distribution grids; [8] reduces the multiple estimation problem in fault location by combining support vector machines and k-nearest neighbors with features extracted from fundamental voltage and current signals; and [9] applies the Fisher-Rao registration method to preserve the shape of data from faults at different locations and operating conditions, followed by hierarchical cluster analysis for fault classification.
Fitting into this context, this paper presents a new strategy for fault location and classification in power distribution networks with distributed energy resources and variable loads installed along the feeders that is general enough to consider changes in the grid configuration (e.g. switch status). It assumes that phasor measurement units (PMUs) or similar measurement devices are deployed at the secondary of the substation transformers. The methodology relies on impedance measurements gathered at the substation buses to build a statistical model of the network operating conditions with dimensionality reduction and considers a few reference scenarios representing standard operation and short-circuits to perform fault location and classification based on similarity criteria in the principal component subspace. Moreover, it is capable of distinguishing faults from variations in the standard operating conditions and identifying the grid configuration correctly. Testing is conducted in an LV power distribution network based on a real grid under different fault conditions. This text is structured as follows: Section II presents the methodology, Section III describes the application example used to test and validate the method, Section IV includes simulation results and discussions, and Section V presents the conclusions.
xxx-x-xxxx-xxxx-x/20/$xx.00 © 2020 European Union

II. METHODOLOGY
This section introduces the multivariate statistical analysis used to build a data-driven model of the network operating conditions with dimensionality reduction and provides the theoretical background necessary to go through the rest of this paper. For an in-depth explanation, see [10].
First, consider a set $K$ of reference scenarios in which distinct operating conditions of the power distribution network under consideration are represented (i.e. including standard operation and short-circuits). Then, let $X^{(k)}$ be the $n \times m$ observation matrix (1) of the $k$-th scenario, centered (zero mean) and scaled (unit variance), where the $n$ observations (rows) are the samples taken over time and the $m$ variables (columns) are the phasor quantities measured at the substations.
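The centering and scaling step can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `standardize`, the use of the sample standard deviation, and the numerical values are assumptions made for the example and do not come from the case study.

```python
from statistics import mean, stdev

def standardize(rows):
    """Center each column to zero mean and scale it to unit variance."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Sample standard deviation (divides by n - 1); an assumption of this sketch.
    stds = [stdev(col) for col in columns]
    return [[(x - mu) / sd for x, mu, sd in zip(row, means, stds)]
            for row in rows]

# Hypothetical impedance magnitudes: n = 4 observations, m = 2 variables.
raw = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0], [4.0, 16.0]]
X = standardize(raw)
```

A constant column has zero standard deviation and would raise a division error here, so such variables would need to be removed or handled separately before building the model.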
The covariance matrix $S^{(k)}$ can be computed from $X^{(k)}$ and further decomposed into the $m \times m$ matrices $V^{(k)}$ and $\Lambda^{(k)}$ using eigenvalue decomposition according to (2). The columns of $V^{(k)}$ are the eigenvectors and contain the principal components, which are orthonormal vectors whose directions express the major variability of the data and the relative weights of the original variables. In turn, $\Lambda^{(k)}$ is a diagonal matrix and contains the eigenvalues, which express the variability in the direction of each principal component or column of $V^{(k)}$. The matrices $V^{(k)}$ and $\Lambda^{(k)}$ can be written as (3) and (5), respectively.
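The display equations referenced in this section are not reproduced in this text. Under the standard principal component analysis formulation, and consistent with the definitions above, the covariance matrix and its decomposition would read as follows. The $1/(n-1)$ normalization and the descending ordering of the eigenvalues are assumptions of this sketch.

```latex
\begin{align*}
S^{(k)} &= \frac{1}{n-1}\, X^{(k)\top} X^{(k)} = V^{(k)} \Lambda^{(k)} V^{(k)\top}, \\
V^{(k)} &= \begin{bmatrix} v_1^{(k)} & v_2^{(k)} & \cdots & v_m^{(k)} \end{bmatrix}, \\
\Lambda^{(k)} &= \operatorname{diag}\!\left(\lambda_1^{(k)}, \lambda_2^{(k)}, \ldots, \lambda_m^{(k)}\right),
\qquad \lambda_1^{(k)} \ge \lambda_2^{(k)} \ge \cdots \ge \lambda_m^{(k)} \ge 0 .
\end{align*}
```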
Once $V^{(k)}$ and $\Lambda^{(k)}$ are computed, dimensionality reduction is achieved by retaining the $r < m$ principal components or columns of $V^{(k)}$ which present the largest eigenvalues $\lambda_1^{(k)}, \cdots, \lambda_r^{(k)}$. As a result, $V^{(k)}$ is reduced to an $m \times r$ matrix $P^{(k)}$ given by (6), which represents the major trends of the data set with some loss of information.
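The selection of the leading components can be sketched as follows, assuming the eigenvalues and eigenvectors have already been computed. The function name `retain_components` and the toy eigenpairs are hypothetical; the retained variance fraction is included only to make the loss of information visible.

```python
def retain_components(eigenvalues, eigenvectors, r):
    """Keep the r eigenvectors with the largest eigenvalues.

    eigenvectors[j] is the eigenvector paired with eigenvalues[j].
    Returns the retained eigenvalues, the retained eigenvectors
    (the columns of P), and the fraction of variance they explain.
    """
    order = sorted(range(len(eigenvalues)),
                   key=lambda j: eigenvalues[j], reverse=True)
    keep = order[:r]
    top_values = [eigenvalues[j] for j in keep]
    top_vectors = [eigenvectors[j] for j in keep]
    retained = sum(top_values) / sum(eigenvalues)
    return top_values, top_vectors, retained

# Hypothetical eigenpairs for m = 3 variables, deliberately unsorted.
eigenvalues = [0.5, 3.0, 1.5]
eigenvectors = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
values, P_columns, retained = retain_components(eigenvalues, eigenvectors, r=2)
```

In this toy case, two of the three components keep 90% of the total variance.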
Next, consider a generic testing scenario denoted by $k'$, possibly containing a fault or some deviation from the standard operating conditions, and let $V^{(k')}$ and $\Lambda^{(k')}$ be its eigenvector matrix and eigenvalue matrix, respectively. For each $k'$ under consideration, the value of $r$ used to reduce $V^{(k')}$ to $P^{(k')}$ is chosen based on the similarity criterion $\phi_{k,k'}$, a weighted cosine sum in which the dot products of $v_j^{(k)}$ and $v_j^{(k')}$ are weighted by the normalized variance $\bar{\lambda}_{j,j}^{(k')}$, $j = 1, \cdots, r$, as in (7). Only the reference scenarios $k \in K_0$ with standard operation or faults at the secondary substation buses are considered to select the $r$ principal components. This procedure also identifies the correct network configuration and operating condition of the testing scenario, since $\phi_{k,k'}$ should be close to 1.0 if $v_j^{(k)}$ and $v_j^{(k')}$, $j = 1, \cdots, r$, are similar.
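Since (7) is not reproduced in this text, the following sketch shows one plausible reading of the weighted cosine sum described above. It assumes unit-norm eigenvectors sorted by decreasing eigenvalue, weights obtained by dividing each retained eigenvalue of the testing scenario by the sum of the $r$ retained eigenvalues, and the absolute value of each dot product (because the sign of an eigenvector is arbitrary). These choices, and the function name `similarity`, are assumptions rather than the exact definition used in the paper.

```python
def similarity(ref_vectors, test_vectors, test_eigenvalues, r):
    """Weighted cosine sum between two principal component subspaces.

    Assumed form: phi = sum_j w_j * |v_j(ref) . v_j(test)|, j = 1..r,
    with w_j = lambda_j(test) / sum of the r retained test eigenvalues.
    """
    total = sum(test_eigenvalues[:r])
    phi = 0.0
    for j in range(r):
        dot = sum(a * b for a, b in zip(ref_vectors[j], test_vectors[j]))
        # abs() is an assumption: eigenvector orientation is arbitrary.
        phi += (test_eigenvalues[j] / total) * abs(dot)
    return phi

# Hypothetical 2-D example: identical subspaces versus a rotated basis.
ref = [[1.0, 0.0], [0.0, 1.0]]
rotated = [[0.6, 0.8], [-0.8, 0.6]]
phi_same = similarity(ref, ref, [3.0, 1.0], r=2)
phi_rotated = similarity(ref, rotated, [3.0, 1.0], r=2)
```

With identical eigenvectors the criterion equals 1.0, and it decreases as the principal directions of the testing scenario rotate away from those of the reference scenario.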
Once $r$ is defined, if a fault has occurred, further investigation is conducted to identify its possible locations. In this case, fault location and classification are performed by comparing the statistical model given by $P^{(k')}$ and $\Lambda^{(k')}$ with those computed for the reference scenarios $k \in K \setminus K_0$ not evaluated previously. The results calculated with (7) are ranked in descending order. Finally, the fault buses of the reference (training) scenarios with the highest values of (7) are identified as the most probable locations of the fault in the testing scenario.
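The ranking step can be illustrated with a short sketch. The bus labels and similarity values below are hypothetical and serve only to show the ordering; `rank_candidates` is not a name used in the paper.

```python
def rank_candidates(scores):
    """Sort candidate fault buses by similarity value, highest first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical similarity values for three reference fault scenarios.
scores = {"1-9": 0.91, "1-15": 0.97, "1-12": 0.88}
ranking = rank_candidates(scores)

# The first-ranked buses are the most probable fault locations.
most_probable = [bus for bus, _ in ranking[:2]]
```

Returning the full ranking, rather than a single bus, matches the multiple-solution nature of the problem: several buses may yield almost the same similarity value.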

III. CASE STUDY
The methodology was tested in an LV power distribution network simulated in MATLAB and illustrated in Fig. 1. It represents an LV distribution network located in Catalonia, Spain, which consists of primary distribution feeders with branches connecting the substation node to the customers (i.e. local energy producers or consumers). In total, the network has 43 buses, 2 distribution substations (one with a 250-kVA and the other with a 630-kVA Dyn11 transformer, 400 V secondary), 41 feeders modeled as short R-L lines (with $R/X = 5.4$ for overhead lines and $R/X = 2.7$ for underground cables), 1 switch, 20 different energy consumers (among them, 1 industrial, three-phase customer with 70 kW of contracted power and 19 residential, single-phase customers with less than 10 kW of contracted power), and distributed generation from 4 solar photovoltaic (PV) modules (10 kWp each). The length of the primary distribution feeder connecting SS-1−0 to SS-2−0 is 325 m and the length of the longest lateral branch is 95 m.
PMUs installed at both substation nodes sample the phase voltage and line current phasors, from which the equivalent impedance is calculated. In other words, the number of variables is $m = 2 \times 3 = 6$ in all scenarios (two substations times three phases). It is noteworthy that the statistical models are built with impedance magnitudes only to suit the algorithm in use, as it is linear. This choice is not expected to have a negative impact on the accuracy of fault location, since the network behavior and loads are mainly resistive and the faults are purely resistive.
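As an illustration of how one observation could be assembled, the sketch below computes the impedance magnitude per phase as $|Z| = |V/I|$ from complex phasors. The per-phase ratio, the phasor values, and the reuse of the same values for the second substation are assumptions made for the example, not data from the case study.

```python
import cmath
import math

def phasor(magnitude, angle_deg):
    """Build a complex phasor from magnitude and angle in degrees."""
    return cmath.rect(magnitude, math.radians(angle_deg))

def impedance_magnitudes(voltages, currents):
    """Return |Z| = |V / I| for each phase."""
    return [abs(v / i) for v, i in zip(voltages, currents)]

# Hypothetical balanced phasors at one substation (230 V, 50 A per phase).
v = [phasor(230.0, 0.0), phasor(230.0, -120.0), phasor(230.0, 120.0)]
i = [phasor(50.0, -10.0), phasor(50.0, -130.0), phasor(50.0, 110.0)]
z_ss1 = impedance_magnitudes(v, i)

# One observation row: 3 phases x 2 substations = 6 variables.
# The second substation reuses the same values purely for illustration.
row = z_ss1 + z_ss1
```

Using magnitudes discards the phase angle of the impedance, which is consistent with the mainly resistive behavior noted above.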
In the reference scenarios, standard operation and faults are simulated with typical hourly values of PV generation and load consumption profiles over a year, which provides $n = 365$ observations per reference scenario. Although this time interval is chosen due to the real PV generation and load consumption profiles available for the simulations, it is noteworthy that a shorter or longer time interval can be used to build the statistical models without loss of generality. In addition, the reference fault scenarios include three-phase symmetrical faults with fault resistance $R_F = 1\,\mathrm{m\Omega}$ applied at the substation and load buses at midday. Both switch states (on and off) are considered in all training and testing scenarios.
In turn, testing scenarios consist of variations in the PV generation under normal operation, reduced by 25%, 50%, 75%, and 100% of the standard operation profiles, and three-phase symmetrical faults with fault resistance $R_F = 1\,\Omega$ simulated at the load buses at midday, considering the same

IV. RESULTS AND DISCUSSION
The deviations computed under normal operation with variations in the PV generation profiles are displayed in Table I, considering standard operation and faults at the substation buses. Meanwhile, the fault location results obtained with the switch off and on are displayed in Table II and Table III, respectively. The first column describes the fault scenario $k'$ (i.e. the faulty bus). The second column gives the results obtained with the similarity criteria in the principal component subspace, including the identification of the grid setting regarding the switch mode and the part of the network where the fault is. The third column gives the faulty bus $k \in K$ determined by considering the sum of dot products calculated with (7) for all candidate buses, given information about the right network setting. The fourth column gives the distance error between the actual fault bus $k'$ and the calculated bus $k$, and the fifth column gives the $r$ principal components used to compute (7) for all candidate scenarios.
The results displayed in Tables I to III indicate that the multivariate statistical case-based reasoning strategy is capable of distinguishing between faults, standard operation, and variations in the standard operating conditions correctly. Additionally, in the event of a fault, the methodology identifies the part of the network where it occurred correctly in all cases and the true location of the fault with good accuracy in most cases. The fault location is identified correctly in 2 out of 16 scenarios when the switch is off and in 4 out of 16 scenarios when the switch is on, whereas the maximum distance error is 194.2 m when the switch is off (faulty bus 2−15 identified as 2−17) and 174.7 m when the switch is on (faulty bus 1−15 identified as 1−9).
Despite the correct identification of the network setting, these errors are approximately double the length of the longest lateral branch and correspond to 59.8% and 53.8%, respectively, of the total length from one substation to the other. Nevertheless, the actual faulty bus is among the first ranked results in almost all scenarios with both switch modes on and off when only the right network setting is considered, which shows the importance of identifying the grid setting correctly before performing fault location with this methodology. Consequently, the maximum errors of this fault location procedure remain in the same part of the network delimited by the switch where the point of fault is. Moreover, the average errors of all scenarios listed in Tables II and III correspond to 27.0% and 23.0% of the total length from one substation to the other, which is less than the length of the longest lateral branch. The overall results are acceptable, as the fault location problem is a typical multiple-solution problem. Furthermore, Fig. 2 shows that the equivalent impedance seen at the substations is almost the same for faults at different buses in the same part of the network and a fixed grid configuration.
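The relative errors quoted above follow directly from the feeder length of 325 m and the longest lateral branch of 95 m given in Section III. The short check below reproduces the arithmetic; the variable and function names are illustrative only.

```python
FEEDER_LENGTH_M = 325.0   # primary feeder between the two substations
LONGEST_BRANCH_M = 95.0   # longest lateral branch

def percent_of_feeder(distance_m):
    """Express a distance error as a percentage of the feeder length."""
    return round(100.0 * distance_m / FEEDER_LENGTH_M, 1)

max_off = percent_of_feeder(194.2)   # maximum error, switch off
max_on = percent_of_feeder(174.7)    # maximum error, switch on

# Average errors converted from percentages back to meters.
avg_off_m = 0.270 * FEEDER_LENGTH_M
avg_on_m = 0.230 * FEEDER_LENGTH_M
```

The maximum errors evaluate to 59.8% and 53.8% of the feeder length, while the average errors correspond to about 87.8 m and 74.8 m, both below the 95 m lateral branch.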
It is noteworthy that the ranking of results according to (7) may be inaccurate over a range of scenarios due to differences between the training and testing scenarios in use, such as variations in the standard operating conditions, different faults, and inaccurate network representations. Therefore, the method can be improved by including more reference scenarios in the training data sets, covering different faults, timescales, and operating conditions, as well as additional information about the network topology and data from different sources.

V. CONCLUSIONS
The multivariate statistical case-based reasoning strategy presented in this article is capable of locating and classifying faults with good accuracy. Moreover, it is also capable of identifying the network configuration correctly and distinguishing faults from variations in the standard operating conditions. The procedure used to identify the correct grid configuration prior to the location of the fault improves the accuracy of the method, as it reduces the number of candidate scenarios and limits the search to the right part of the network in all testing scenarios. Nonetheless, the method may provide inaccurate results over a range of scenarios, since the fault location problem presents multiple solutions.