Comparison of Principal Component Analysis Techniques for PMU Data Event Detection

Principal component analysis (PCA) is a dimensionality reduction technique often applied to process large amounts of data collected by phasor measurement units (PMUs) at the transmission and distribution levels and to detect events in them. This article considers five different approaches to select an appropriate number of principal components, builds the statistical model of the PMU data online over sliding windows of 10 seconds and 1 minute, and evaluates the computation times and the accuracy of event detection using two statistical tests on a 1-hour data file from the UT-Austin Independent Texas Synchrophasor Network, with phasor quantities collected at different PMU substations.


I. INTRODUCTION
The increasing digitalisation of electric power systems is generating vast quantities of data at different locations, voltage levels, and time intervals. These data include distinct electrical quantities collected by smart meters and phasor measurement units (PMUs) that can be exploited to characterise energy behavioural patterns and identify anomalies of different natures. However, due to the high complexity and massive amounts of data, it is a difficult task to visualise and identify patterns, outliers, and abnormal behaviours at relevant scales.
In this scenario, dimensionality reduction techniques are appealing to compress a dataset with minimal loss of information. Among the dimensionality reduction techniques commonly applied to electric power systems, principal component analysis (PCA) is one of the most widely used (see, e.g. [1]-[5]). It builds a data-driven model of observations in which the covariance structure is described with a reduced number of dimensions through a few linear combinations of the original variables that express major trends in the dataset. Although strategies to define an appropriate number of principal components are well known in the literature (see, e.g. [6], [7]), there is a lack of consensus about how to adjust them to detect specific events of interest in PMU data. Therefore, a systematic evaluation of those procedures is necessary to perform dimensionality reduction and event detection with PCA effectively.
Fitting into this context, this paper provides an in-depth comparative analysis of five different approaches to select an adequate number of principal components defining the statistical model of a PMU network, in terms of the accuracy of event detection with distinct statistical tests. To do so, it relies on phasor quantities measured at multiple PMU substations to build a PCA model of the network operating conditions in real time, which increases the situational awareness of the analysis, over sliding windows of distinct lengths. The analysis is tested on a 1-hour data file from the UT-Austin Independent Texas Synchrophasor Network and is applicable to power transmission and distribution networks with multiple PMUs installed at different locations, without requiring any information about the network topology or its electrical parameters.
The text is organized as follows. The theoretical background is presented in Section II: II-A explains the building of the PCA model, II-B describes event detection in the projection subspace and residual subspace, and II-C presents five different methods to select the number of principal components. Afterwards, a case study is shown in Section III, the results and discussion are described in Section IV, and conclusions are finally presented in Section V.

II. THEORETICAL BACKGROUND ON PCA
The PCA methodology presented in this article, adapted from [8], builds a statistical model of the PMU data collected over a time window of duration τ and detects anomalies with use of two complementary indicators: Hotelling's T² and the square prediction error (SPE) statistics. The former measures the square distance of the projected data to the centre of the model, whereas the latter measures the square distance of the observation to the projection subspace.
Thereby, the method may be divided into three main steps, as presented in Sections II-A to II-C: building the PCA model with PMU data, event detection in the projection subspace and residual subspace, and selection of principal components for the projection subspace.

A. Building the PCA model with PMU data
Let X be the n × m observation matrix displayed in (1), with n observations (the number of samples of phasor quantities) and m variables (the PMU locations), assumed to be centred (zero mean) and scaled (unit variance):

X = [x_ij], i = 1, ..., n, j = 1, ..., m (1)

Then, compute the covariance matrix of X and apply eigenvalue decomposition to obtain two m × m matrices: V, whose columns are the eigenvectors and contain the principal components, and Λ, a diagonal matrix whose elements express the variability in the direction of each principal component (i.e. each column of V):

S = X^T X / (n − 1) = V Λ V^T (2)

Dimensionality reduction in the number of variables can be performed by retaining the r principal components of V (r < m) with the largest eigenvalues. The m × m matrix V then becomes an m × r matrix P, which defines a projection space of lower dimension representing the r most significant principal components. As the choice of an appropriate value of r is not straightforward, five different methods to select it are presented in Section II-C and compared in Section IV.
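As a concrete illustration, the scaling, covariance, and eigenvalue decomposition steps above can be sketched with NumPy. The data here are synthetic stand-ins for PMU voltage magnitudes, and the choice r = 2 is arbitrary:

```python
import numpy as np

# Synthetic stand-in for PMU data: n = 6 samples of m = 3 voltage magnitudes.
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 3))

# Centre (zero mean) and scale (unit variance) each column, as assumed above.
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Covariance matrix of X and its eigenvalue decomposition, as in (2).
S = X.T @ X / (X.shape[0] - 1)
eigvals, V = np.linalg.eigh(S)          # eigh exploits the symmetry of S
order = np.argsort(eigvals)[::-1]       # sort by decreasing variance
eigvals, V = eigvals[order], V[:, order]

# Retain the r components with the largest eigenvalues (here r = 2 < m).
r = 2
P = V[:, :r]                            # m x r projection matrix
```

Note that `np.linalg.eigh` returns eigenvalues in ascending order, hence the explicit reordering; for scaled data the eigenvalues sum to m, the trace of the correlation matrix.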
Using P instead of V to transform X into the principal component space results in a projection onto a space of lower dimension in which some of the information contained in the original data is lost. In fact, since V is a unitary matrix, the inverse operation is carried out with the transpose, i.e. V V^T = I; P, however, is not unitary, so P P^T ≠ I. The scores and the transformation of scores back into the original space with P are calculated with

t = P^T x, x̂ = P t
T = X P, X̂ = T P^T (3)

where t and x̂ denote the score and projection of a single observation x (top) and T and X̂ denote the score matrix and projection matrix of the whole dataset X (bottom). The difference between X and X̂ is the residual matrix X̃, which summarises the information contained in the m − r components of the residual space for each observation and can be calculated with the residual loading matrix C̃ = I − P P^T. Thereby, the complete PCA model can be described as

X = X̂ + X̃ = T P^T + X C̃ (4)
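The score, projection, and residual computations of (3) and (4) can be sketched as follows, continuing the same NumPy conventions (synthetic data, arbitrary r):

```python
import numpy as np

# Synthetic scaled data and PCA model (same construction as in Section II-A).
rng = np.random.default_rng(1)
X = rng.standard_normal((8, 4))
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
S = X.T @ X / (X.shape[0] - 1)
eigvals, V = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]
r = 2
P = V[:, :r]

# Scores, projections, and residuals, as in (3) and (4).
T = X @ P             # n x r score matrix
X_hat = T @ P.T       # projection onto the r-dimensional subspace
X_tilde = X - X_hat   # residual matrix: X = X_hat + X_tilde
```

The residuals lie entirely in the discarded m − r directions, so they are orthogonal to the projection space: X̃ P = 0.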

B. Detection in the projection subspace and residual subspace
Event detection in the projection subspace is evaluated with the T² index, which computes a weighted distance of the projected data to the centre of the model, using the eigenvalues λ_i as weights:

T² = Σ_{i=1}^{r} t_i² / λ_i (5)

For a single observation x whose score vector is t, the T²_x index is given by

T²_x = t^T Λ_r^{-1} t (6)

where Λ_r is the diagonal matrix of the r retained eigenvalues. The statistical limit T²_lim is calculated analytically with

T²_lim = r (n − 1) / (n − r) · F_α(r, n − r) (8)

where α is the confidence level and F_α(r, n − r) is the critical point of the Fisher-Snedecor distribution with r and n − r degrees of freedom. Any result that surpasses T²_lim is tagged as faulty by the T² statistic.
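A sketch of the T² computation and its limit in Python; the limit uses the common form r(n − 1)/(n − r) F_α(r, n − r), which is an assumption about the exact expression in (8), and the data are synthetic:

```python
import numpy as np
from scipy import stats

# Synthetic scaled data and PCA model (as in Section II-A).
rng = np.random.default_rng(1)
n, m, r = 40, 4, 2
X = rng.standard_normal((n, m))
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
S = X.T @ X / (n - 1)
eigvals, V = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]
T = X @ V[:, :r]                         # score matrix

# T^2 of every observation: score distance weighted by the eigenvalues, (5)-(6).
T2 = np.sum(T ** 2 / eigvals[:r], axis=1)

# Analytical limit for confidence level alpha, as in (8).
alpha = 0.95
T2_lim = r * (n - 1) / (n - r) * stats.f.ppf(alpha, r, n - r)
faulty = T2 > T2_lim                     # observations flagged by the T^2 statistic
```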
In turn, event detection in the residual subspace is evaluated with the SPE index, which evaluates the variation outside the projection space defined by the r principal components through the residual x̃. The SPE of an observation x, Q_x, is given by

Q_x = x̃^T x̃ = ||x − x̂||² (9)

The statistical limit Q_lim is calculated analytically with

Q_lim = θ₁ [c_α √(2 θ₂ h₀²) / θ₁ + 1 + θ₂ h₀ (h₀ − 1) / θ₁²]^{1/h₀}, θ_k = Σ_{i=r+1}^{m} λ_i^k, h₀ = 1 − 2 θ₁ θ₃ / (3 θ₂²) (10)

where c_α is the normal deviate for a confidence level α. Any result that surpasses Q_lim is tagged as faulty by the SPE statistic.
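Similarly, a sketch of the SPE statistic; the Jackson-Mudholkar approximation is assumed for the limit in (10), consistent with the normal deviate c_α mentioned above, and the data are again synthetic:

```python
import numpy as np
from scipy import stats

# Synthetic scaled data and PCA model (as in Section II-A).
rng = np.random.default_rng(1)
n, m, r = 40, 4, 2
X = rng.standard_normal((n, m))
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
S = X.T @ X / (n - 1)
eigvals, V = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]
P = V[:, :r]
X_tilde = X - X @ P @ P.T                # residual matrix

# SPE (Q) of every observation, as in (9).
Q = np.sum(X_tilde ** 2, axis=1)

# Jackson-Mudholkar approximation of Q_lim, as assumed for (10).
lam = eigvals[r:]                        # residual-subspace eigenvalues
th1, th2, th3 = (np.sum(lam ** k) for k in (1, 2, 3))
h0 = 1 - 2 * th1 * th3 / (3 * th2 ** 2)
c_alpha = stats.norm.ppf(0.95)
Q_lim = th1 * (c_alpha * np.sqrt(2 * th2 * h0 ** 2) / th1
               + 1 + th2 * h0 * (h0 - 1) / th1 ** 2) ** (1 / h0)
faulty = Q > Q_lim                       # observations flagged by the SPE statistic
```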

C. Selection of principal components to the projection subspace
Five different methods to select an appropriate value of r are taken into consideration, as described in the following subsections.
1) Kaiser criterion: In this method, r is selected such that all principal components whose eigenvalues fall below the average variance are dropped from the matrix P:

r = max{ i : λ_i > (1/m) Σ_{j=1}^{m} λ_j } (12)

In other words, this criterion consists in retaining all r principal components whose variance is larger than one, as X is a scaled matrix. This ensures that every principal component selected carries at least as much information, in terms of variance, as a single original variable.
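A minimal sketch of the Kaiser criterion, using hypothetical eigenvalues of a scaled 5-variable dataset:

```python
import numpy as np

# Hypothetical sorted eigenvalues of a scaled dataset (they sum to m = 5).
eigvals = np.array([2.5, 1.2, 0.8, 0.3, 0.2])

# Kaiser criterion: keep every component whose variance exceeds the
# average variance (equal to 1 for scaled data), as in (12).
r = int(np.sum(eigvals > eigvals.mean()))
print(r)  # -> 2
```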
2) Automatic scree plot: In this method, the eigenvalues λ_i are plotted in decreasing order as a function of their index i in the matrix Λ, and the chosen value of r corresponds to the eigenvalue whose distance to the origin of the coordinate system is the shortest:

r = arg min_i √(i² + λ_i²) (13)

The idea is to search for the elbow of the plot, which always displays a downward curve, beyond which the eigenvalues are approximately equal.
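A sketch of the automatic scree-plot rule with the same hypothetical eigenvalues; reading (13) as the minimum Euclidean distance of the scree-plot point (i, λ_i) to the origin is an interpretation of the text:

```python
import numpy as np

eigvals = np.array([2.5, 1.2, 0.8, 0.3, 0.2])   # hypothetical, sorted decreasingly
i = np.arange(1, eigvals.size + 1)

# Distance of each scree-plot point (i, lambda_i) to the origin; pick the closest.
dist = np.sqrt(i ** 2 + eigvals ** 2)
r = int(i[np.argmin(dist)])
```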
3) Explained variance: In this method, a minimum percentage of the total variance Var (%) is defined beforehand and r is taken as the smallest integer satisfying

Σ_{i=1}^{r} λ_i / Σ_{i=1}^{m} λ_i ≥ Var (14)

4) Variance reconstruction error: In this method, further explained in [6], the optimal value of r is determined by the minimum of the variance reconstruction error (VRE):

r = arg min_r VRE(r) (15)

considering a faulty observation x_f represented by an m-dimensional unit vector ξ_i multiplied by a fault magnitude f and the correlation matrix of the reconstruction error R. This procedure results in the best reconstruction of the variables, as the VRE decreases monotonically in the residual subspace and increases in the projection subspace with the number of principal components, and the selection of r can be adjusted to detect specific events of interest defined by x_f.
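The explained-variance rule in (14) reduces to a cumulative sum. A sketch with the same hypothetical eigenvalues and a 75 % target:

```python
import numpy as np

eigvals = np.array([2.5, 1.2, 0.8, 0.3, 0.2])   # hypothetical, sorted decreasingly
var_target = 0.75                                # minimum fraction of total variance

# Smallest r whose cumulative explained variance reaches the target, as in (14).
cum = np.cumsum(eigvals) / eigvals.sum()
r = int(np.searchsorted(cum, var_target) + 1)
```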

5) Statistical detectability: In this criterion, based on [8], r is chosen such that the smallest events of interest can be detected statistically in the projection subspace and residual subspace, according to Section II-B: T²_x ≥ T²_lim must hold for an event to be detected with the T² statistic, and Q_x ≥ Q_lim for the SPE statistic. For a faulty observation x_f, r is selected such that

T²_{x_f} ≥ T²_lim and Q_{x_f} ≥ Q_lim (16)

holds with a single PCA model, or such that each inequality holds for its own model,

T²_{x_f}(r_T) ≥ T²_lim(r_T), Q_{x_f}(r_Q) ≥ Q_lim(r_Q) (17)

with two PCA models built separately for the T² and SPE statistics, which enables adjustments to detect specific events of interest defined by x_f.
This criterion considers the worst-case scenario to build one-size-fits-all PCA models for the overall grid. The chosen r shall be able to detect the smallest theoretical values of T²_{x_f} and Q_{x_f} computed with each individual variable possibly involved in the event in (16) or (17), and consequently presents the highest theoretical statistical detectability without requiring different PCA models for each set of variables.

III. CASE STUDY
The procedures described in Section II-C to select an appropriate number of principal components and perform event detection are tested with PMU data from the UT-Austin Independent Texas Synchrophasor Network, available in [9]. A map of the locations of the PMUs, installed at distinct transmission and distribution voltage levels within the Electric Reliability Council of Texas (ERCOT), is shown in Fig. 1.
Only low-frequency oscillations below 15 Hz can be found in the dataset, as the phasor quantities are gathered at f_s = 30 Hz. Thus, this article is concerned with low-frequency voltage transients which last no more than a few hundred milliseconds, as highlighted in Fig. 2; the detection of faster dynamic events is out of the scope of this article, as it requires a higher PMU sampling frequency. The events found in the dataset, whose types and underlying causes are unknown, are listed in Table I. Overall, they are expected to be isolated events at a single location (which ensures detection of the smallest events in the worst-case scenario) and to occur once or less every 10 seconds. Their magnitudes are greater than 1 % of the nominal voltage, varying from a few volts at low voltage (the substation at Fort Davis) to a few thousand volts at high voltage (the substations at Edinburg and Waco). The duration of the disturbances is also heterogeneous, lasting from a few milliseconds (impulses) to a few hundred milliseconds (transients).
Relying on voltage magnitudes, the PCA model is built online over a sliding window. This approach captures the dynamic time-varying nature of power systems and adapts the PCA model to the most recent operating conditions. The five different methods to select the number of principal components described in Section II-C are considered to build the PCA model and further tested and compared in terms of performance of the T² and SPE statistics over a 10-second and a 1-minute sliding window, supposedly associated with different types of events and intrinsic characteristics. As a result, 20 different scenarios were produced per detected event (i.e. 2 statistical tests times 5 selection criteria of r times 2 window lengths).

TABLE I (excerpt)
Event  Location    Type
9      Edinburg    Transient
10     Edinburg    Impulse, multiple
11     Edinburg    Impulse, multiple
12     Edinburg    Impulse, multiple
13     Edinburg    Transient
14     Edinburg    Transient
15     Fort Davis  Impulse, single
16     Edinburg    Impulse, multiple
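The online model construction over a sliding window can be sketched as follows; the stream is synthetic, and the window length assumes the f_s = 30 Hz reporting rate of the dataset (a 10-second window is 300 samples):

```python
import numpy as np

fs = 30                  # PMU reporting rate (Hz)
window = 10 * fs         # 10-second window -> 300 samples

def sliding_pca(stream, window):
    """Yield (eigenvalues, eigenvectors) of the scaled covariance per window."""
    for start in range(stream.shape[0] - window + 1):
        Xw = stream[start:start + window]
        Xw = (Xw - Xw.mean(axis=0)) / Xw.std(axis=0, ddof=1)
        S = Xw.T @ Xw / (window - 1)
        eigvals, V = np.linalg.eigh(S)
        order = np.argsort(eigvals)[::-1]
        yield eigvals[order], V[:, order]

# Synthetic stand-in for 20 seconds of voltage magnitudes from 4 PMU locations.
rng = np.random.default_rng(3)
stream = rng.standard_normal((2 * window, 4))
models = list(sliding_pca(stream, window))
```

In practice the detection statistics of Section II-B would be evaluated against the model of each window as new samples arrive, so the model always reflects the most recent operating conditions.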

IV. RESULTS AND DISCUSSION
This section presents the results of event detection for the scenarios described in Section III, comparing the accuracy of the methods described in Section II-C to select the number of principal components r for the events shown in Table I. Tables II and III display the event detection results and the corresponding r obtained for each selection method, considering a 10-second and a 1-minute sliding window to build the PCA models, respectively. A confidence level α = 0.95 is chosen to calculate T² and SPE for all selection methods because it does not result in missed detections in this dataset with the statistical detectability criterion over a 10-second sliding window. The locations and event magnitudes listed in Table I are considered to compute the statistical detectability and the VRE. In addition, the explained variance criterion is computed with an explained variance of 75 % because it yields a residual subspace comparable, on average, to those defined by the other criteria.
On the whole, considering the correct detections with both T² and SPE statistics, the best method to select the number of principal components is the statistical detectability criterion, which detects more events than the other criteria for all window lengths evaluated. Nevertheless, it is noteworthy that all events of interest are detected with at least one of the T² or SPE statistics when a 10-second window is used to build the PCA model, and that event number 8 is always missed with a 1-minute window, regardless of the method chosen to define an appropriate number of principal components. Additionally, it can be noticed that the combined use of T² and SPE statistics leads to a higher number of correct event detections, reduces the number of missed detections, and consequently increases the detection capability of the PCA model.
In general, the SPE is expected to detect variations associated with changes in the correlation structure and presents small values, whereas the T² is expected to detect deviations from the average normal operating conditions and presents larger values. As a consequence, the SPE is more sensitive than the T² and tends to be a better indicator of abnormalities, since changes in the correlation structure of faulty observations are expected to appear in the residual matrix. In particular, the SPE is expected to present the best performance when the VRE criterion is applied to select r, as that criterion minimises the variance reconstruction error associated with the residual subspace to ensure detection of specific events of interest.
In most cases, the highest number of correct event detections occurs when a 10-second window is applied to build the PCA model with both T² and SPE statistics. This happens because the PCA models built over this length of time are more sensitive to the dynamics of the system and consequently are more suitable to detect the events of interest, which last no more than a few hundred milliseconds. Therefore, a higher number of missed detections is expected with longer window sizes, which are associated with different phenomena and/or a static representation of the operating conditions of the grid. This explains why the results obtained with a 1-minute window are slightly worse than those obtained with a 10-second window in most cases.
Furthermore, a comparative analysis of the five approaches presented in Section II-C to select an adequate value of r over time shows that the Kaiser criterion, the automatic scree plot, and the explained variance are variance-based and arbitrary, whereas the statistical detectability and the VRE are event-oriented. In fact, the former criteria rely on the variance of the data and do not allow for adjustments that ensure the detection of specific events of interest, whereas the latter can be adjusted to detect events defined by specific magnitudes f and direction vectors ξ. This explains why the results obtained with the statistical detectability criterion with both T² and SPE statistics are the most accurate, which implies that the detectability in the projection subspace and the residual subspace is the best indicator to select an adequate number of principal components and detect distinct events of interest.

V. CONCLUSION
This paper presents a PCA-based strategy for event detection in PMU data using T² and SPE statistics, comparing five different methods to select an appropriate number of principal components in terms of correct event detections. The results indicate that the PCA methodology is able to identify different types of events regardless of the approach used to select the number of principal components, and that the results calculated with T² and SPE statistics are complementary in some cases, which enhances the detection capability of the PCA model. Nonetheless, the results obtained when the statistical detectability criterion is used to define the number of principal components are the most accurate with both T² and SPE statistics, which implies that the detectability in the projection subspace and the residual subspace is the best indicator to select an adequate number of principal components. Moreover, the window size applied to build the PCA model also affects detection performance and shall be adjusted according to the events of interest.