-------------------------------------------------------------------------
| [vc, vcJ, vcK, h] = variance_captured(Z, P, J, K, vars, rep)          |
-------------------------------------------------------------------------
-----------------
| FUNCTIONALITY |
-------------------------------------------------------------------------
| This function computes the variance captured (vc) in a PCA model      |
| depending on the number of principal components retained.             |
| Additionally, the method also computes the variance captured for each |
| variable (vcJ) and time sample (vcK) according to the number of       |
| principal components retained if the input data matrix is a           |
| batch-wise unfolded matrix (the number of columns of the pre-treated  |
| data matrix - Z - is equal to J x K). Otherwise, these values are set |
| to the empty set. Finally, if the user requests it (rep = true),      |
| the method plots each of the computed variances captured by the       |
| PCA model.                                                            |
-------------------------------------------------------------------------
--------------------
| INPUT PARAMETERS |
-------------------------------------------------------------------------
| Z:      Double matrix with the pre-processed data. Rows are related   |
|         to observations, while columns are the measured variables. In |
|         case of batch-wise unfolded matrices, columns can be          |
|         decomposed as variables measured along time, so there are     |
|         J x K of them. Additionally, rows for a batch-wise            |
|         unfolded matrix are the different batches monitored.          |
| P:      Double matrix storing the loading matrix obtained from the    |
|         eigenvalue decomposition of the covariance/correlation matrix |
|         of the data matrix. Rows are the measured variables, while    |
|         columns are the principal components retained to build the    |
|         PCA model.                                                    |
| J:      Integer value that indicates the number of variables measured |
|         for each observation. Must be a value greater than 1.         |
| K:      Integer value that indicates the number of samples measured   |
|         for each variable. Must be a value greater than 0. When K > 1 |
|         we are using unfolded data matrices. This means that the      |
|         number of columns in Z must be equal to J x K.                |
| vars:   Cell array that contains the labels of the measured           |
|         variables. It must have J elements.                           |
| rep:    Boolean that indicates whether the computed variance captured |
|         (vc, and vcJ and vcK when applicable) has to be plotted       |
|         (rep = true) or not (rep = false).                            |
-------------------------------------------------------------------------
---------------------
| OUTPUT PARAMETERS |
-------------------------------------------------------------------------
| vc:     Double matrix indicating the percentage of variance captured  |
|         for each variable measured (columns in Z). It has as many     |
|         rows as principal components were retained when building the  |
|         PCA model (size(P, 2)) and as many columns as variables       |
|         measured (size(Z, 2)). Values returned in this matrix are in  |
|         the following range: [0, 100]%.                               |
| vcJ:    Double matrix containing the percentage of variance explained |
|         for each measured variable. These values are only computed    |
|         when the matrix comes from a batch-wise unfolded matrix       |
|         (size(Z, 2) == J x K), and otherwise the returned value is    |
|         the empty set. Values in this matrix are in range [0, 100]%.  |
| vcK:    Double matrix containing the percentage of variance explained |
|         for each sample measured. These values are computed only when |
|         the data matrix (Z) comes from a batch-wise unfolded matrix   |
|         (size(Z, 2) == J x K). Otherwise, the value returned is the   |
|         empty set. Values in this matrix are in range [0, 100]%.      |
| h:      Array of double values (in case of a batch-wise unfolded      |
|         matrix) or double value (otherwise) indicating the handles to |
|         the representations of the variance captured for vc, vcJ and  |
|         vcK (when applicable).                                        |
-------------------------------------------------------------------------
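
The parameter description above can be illustrated with a short call. This is only a sketch under stated assumptions: the data are random and purely illustrative, the autoscaling and the eigendecomposition of the covariance matrix are one plausible way of obtaining Z and P, and the variable labels are made up. Only three outputs are requested because no figures are drawn when rep = false.

```matlab
% Illustrative (hypothetical) data: 20 observations of 4 variables, K = 1.
rng(0);
X = randn(20, 4);

% Autoscale the data (zero mean, unit variance per column).
Z = bsxfun(@rdivide, bsxfun(@minus, X, mean(X)), std(X));

% Loadings from the eigendecomposition of the covariance matrix,
% sorted by decreasing eigenvalue; we retain two components.
[V, D] = eig(cov(Z));
[~, order] = sort(diag(D), 'descend');
P = V(:, order(1:2));

% One label per measured variable (J = 4 labels).
vars = {'x1', 'x2', 'x3', 'x4'};

% Non-unfolded case: J = size(Z, 2), K = 1, no plots.
[vc, vcJ, vcK] = variance_captured(Z, P, 4, 1, vars, false);

disp(size(vc));   % one row per retained component, one column per variable
disp(sum(vc));    % percentage captured per variable by both components
```

With K = 1 the function returns vcJ and vcK as empty matrices. Note that rep must be a logical value (true or false), not 0 or 1.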
% -------------------------------------------------------------------------
% | [vc, vcJ, vcK, h] = variance_captured(Z, P, J, K, vars, rep)          |
% -------------------------------------------------------------------------
% -----------------
% | FUNCTIONALITY |
% -------------------------------------------------------------------------
% | This function computes the variance captured (vc) in a PCA model      |
% | depending on the number of principal components retained.             |
% | Additionally, the method also computes the variance captured for each |
% | variable (vcJ) and time sample (vcK) according to the number of       |
% | principal components retained if the input data matrix is a           |
% | batch-wise unfolded matrix (the number of columns of the pre-treated  |
% | data matrix - Z - is equal to J x K). Otherwise, these values are set |
% | to the empty set. Finally, if the user requests it (rep = true),      |
% | the method plots each of the computed variances captured by the       |
% | PCA model.                                                            |
% -------------------------------------------------------------------------
% --------------------
% | INPUT PARAMETERS |
% -------------------------------------------------------------------------
% | Z:      Double matrix with the pre-processed data. Rows are related   |
% |         to observations, while columns are the measured variables. In |
% |         case of batch-wise unfolded matrices, columns can be          |
% |         decomposed as variables measured along time, so there are     |
% |         J x K of them. Additionally, rows for a batch-wise            |
% |         unfolded matrix are the different batches monitored.          |
% | P:      Double matrix storing the loading matrix obtained from the    |
% |         eigenvalue decomposition of the covariance/correlation matrix |
% |         of the data matrix. Rows are the measured variables, while    |
% |         columns are the principal components retained to build the    |
% |         PCA model.                                                    |
% | J:      Integer value that indicates the number of variables measured |
% |         for each observation. Must be a value greater than 1.         |
% | K:      Integer value that indicates the number of samples measured   |
% |         for each variable. Must be a value greater than 0. When K > 1 |
% |         we are using unfolded data matrices. This means that the      |
% |         number of columns in Z must be equal to J x K.                |
% | vars:   Cell array that contains the labels of the measured           |
% |         variables. It must have J elements.                           |
% | rep:    Boolean that indicates whether the computed variance captured |
% |         (vc, and vcJ and vcK when applicable) has to be plotted       |
% |         (rep = true) or not (rep = false).                            |
% -------------------------------------------------------------------------
% ---------------------
% | OUTPUT PARAMETERS |
% -------------------------------------------------------------------------
% | vc:     Double matrix indicating the percentage of variance captured  |
% |         for each variable measured (columns in Z). It has as many     |
% |         rows as principal components were retained when building the  |
% |         PCA model (size(P, 2)) and as many columns as variables       |
% |         measured (size(Z, 2)). Values returned in this matrix are in  |
% |         the following range: [0, 100]%.                               |
% | vcJ:    Double matrix containing the percentage of variance explained |
% |         for each measured variable. These values are only computed    |
% |         when the matrix comes from a batch-wise unfolded matrix       |
% |         (size(Z, 2) == J x K), and otherwise the returned value is    |
% |         the empty set. Values in this matrix are in range [0, 100]%.  |
% | vcK:    Double matrix containing the percentage of variance explained |
% |         for each sample measured. These values are computed only when |
% |         the data matrix (Z) comes from a batch-wise unfolded matrix   |
% |         (size(Z, 2) == J x K). Otherwise, the value returned is the   |
% |         empty set. Values in this matrix are in range [0, 100]%.      |
% | h:      Array of double values (in case of a batch-wise unfolded      |
% |         matrix) or double value (otherwise) indicating the handles to |
% |         the representations of the variance captured for vc, vcJ and  |
% |         vcK (when applicable). Empty when rep = false.                |
% -------------------------------------------------------------------------

function [vc, vcJ, vcK, h] = variance_captured(Z, P, J, K, vars, rep)

%% PARAMETER CHECKING

% First of all, we check if the user passed all input parameters.
if nargin < 6
    % No, we are missing some parameters, so we inform of the problem.
    error('variance_captured:paramCheck', ...
        'ERROR: Method needs 6 input parameters!');
end

% Next, we check if the input parameters are correct.

% - Z -
% Is it a numeric matrix?
if ~isnumeric(Z)
    % - No, so we inform of the problem.
    error('variance_captured:wrongType', ...
        'ERROR: Z must be a numeric type!');
else
    % - We get the dimensions of Z.
    dims = size(Z);
    % - Is it a two-dimensional matrix?
    if length(dims) ~= 2
        % -- No, so we inform the user of the problem.
        error('variance_captured:wrongType', ...
            'ERROR: Z must be a two-dimensional matrix!');
    else
        % Is it a matrix (and not a scalar or a vector)?
        if dims(1) <= 1 || dims(2) <= 1
            % -- No, so we inform the user of the problem.
            error('variance_captured:wrongType', ...
                'ERROR: Z must be a two-dimensional matrix!');
        end
    end
end

% - P -
% Is it a numeric matrix?
if ~isnumeric(P)
    % - No, so we inform of the problem.
    error('variance_captured:wrongType', ...
        'ERROR: P must be a numeric type!');
else
    % - We get the dimensions of P.
    dims = size(P);
    % - Is it a two-dimensional matrix?
    if length(dims) ~= 2
        % -- No, so we inform the user of the problem.
        error('variance_captured:wrongType', ...
            'ERROR: P must be a two-dimensional matrix!');
    else
        % Is it a matrix (and not a scalar or a vector)?
        if dims(1) <= 1 || dims(2) <= 1
            % -- No, so we inform the user of the problem.
            error('variance_captured:wrongType', ...
                'ERROR: P must be a two-dimensional matrix!');
        elseif dims(1) ~= size(Z, 2)
            % -- It does not have the same number of measured variables, so
            % we inform the user of this problem.
            error('variance_captured:wrongDimensions', ...
                'ERROR: Rows in P must be equal to columns in Z!');
        end
    end
end

% - J -
% Is it a numeric value?
if ~isnumeric(J)
    error('variance_captured:wrongType', ...
        'ERROR: J must be a numeric value!');
end

% Is it a scalar value?
if size(J, 1) ~= 1 || size(J, 2) ~= 1
    % - No, it isn't. We inform the user of the problem.
    error('variance_captured:wrongType', ...
        'ERROR: J must be a [1 x 1] value!');
end

% Is it within the range?
if J <= 1
    % - No, it isn't. We inform the user of the problem.
    error('variance_captured:outOfRange', ...
        'ERROR: J must be a value greater than 1!');
end

% - K -
% Is it a numeric value?
if ~isnumeric(K)
    error('variance_captured:wrongType', ...
        'ERROR: K must be a numeric value!');
end

% Is it a scalar value?
if size(K, 1) ~= 1 || size(K, 2) ~= 1
    % - No, it isn't. We inform the user of the problem.
    error('variance_captured:wrongType', ...
        'ERROR: K must be a [1 x 1] value!');
end

% Is it within the range?
if K <= 0
    % - No, it isn't. We inform the user of the problem.
    error('variance_captured:outOfRange', ...
        'ERROR: K must be a value greater than 0!');
elseif K == 1
    % - Additionally, if K is equal to 1, we are working with a
    % non batch-wise unfolded data matrix, so size(Z, 2) must be
    % equal to J.
    if J ~= size(Z, 2)
        % -- It's not the case, so we inform of the problem.
        error('variance_captured:valueNotValid', ...
            ['ERROR: When K == 1, J must be equal to the number of ', ...
            'columns in Z!']);
    end
    % - We indicate that we are not dealing with batch-wise unfolded data
    % matrices.
    bw = false;
    % - And thus, we do not have to compute the other variance captured
    % values (vcJ and vcK).
    vcJ = []; vcK = [];
else
    % - We are working with batch-wise unfolded matrices, so we check if J
    % x K is equal to the number of columns in Z.
    if J * K ~= size(Z, 2)
        % -- It's not the case, so we inform of the problem.
        error('variance_captured:valueNotValid', ...
            ['ERROR: When K > 1, J x K must be equal to the number ', ...
            'of columns in Z!']);
    end
    % - Finally, we indicate that we are working with batch-wise unfolded
    % data matrices.
    bw = true;
end

% - vars -
% Is it a cell array of strings?
if ~iscellstr(vars)
    % - No, it isn't, so we inform of the problem.
    error('variance_captured:wrongType', ...
        'ERROR: vars must be a cell array of strings!');
elseif length(vars) ~= J
    % - vars does not have the same number of measured variables, so we
    % inform of the situation.
    error('variance_captured:wrongDimensions', ...
        'ERROR: vars must have J elements!');
end

% - rep -
% Is it a boolean variable?
if ~islogical(rep)
    % - No, it isn't. Therefore, we indicate the problem.
    error('variance_captured:wrongType', ...
        'ERROR: rep must be a boolean variable!');
end

%% COMPUTATION OF THE VARIANCE CAPTURED FOR EACH COLUMN IN Z.

% We initialise the handles output so that it is defined even when no
% figure is created (rep = false).
h = [];

% First of all, we compute the total variation of the original data.
tV = sum(Z .^ 2);

% Now, we define the variable to store the normalised loading values.
nP = zeros(size(P));

% And we initialise the variable to store the variance captured for each
% variable and principal component.
vc = zeros(size(P, 2), size(Z, 2));

% Now, for each principal component retained...
for i = 1:size(P, 2)
    % - We compute the normalised loading vector...
    nP(:, i) = P(:, i) / norm(P(:, i));
    % - Now, we compute the estimated Z matrix for the i-th loading vector.
    Zhat = Z * nP(:, i) * nP(:, i)';
    % - And finally, we compute the variance captured for each variable
    % using the i-th principal component.
    vc(i, :) = 100 * sum(Zhat .^ 2) ./ tV;
end

% Do we have to represent the variance captured?
if rep
    % - Yes, we do. But how many figure handles do we need? If we are not
    % dealing with batch-wise unfolded matrices (K == 1), we only need 1
    % handle. Otherwise, we need 3 of them (one for vc, another to
    % represent vcJ and the last one to represent vcK).
    if bw
        % - We need three handles, so we initialise the variable to store
        % the handles.
        h = zeros(1, 3);
        % - And the first one is the handle associated with the variance
        % captured for each column in Z.
        h(1) = figure;
    else
        % - We only need one handle, so we initialise its value now.
        h = figure;
    end
    % Now, we represent the stacked variance captured for each principal
    % component.
    bar(vc', 'stacked');
    % Now, we rescale the axes.
    axis([0, size(Z, 2) + 1, 0, max(sum(vc)) * 1.1]);
    % If we have a batch-wise unfolded matrix...
    if bw
        % For each variable (minus 1)...
        for i = 1 : J - 1
            % We add a line to divide variable i from variable i + 1. Bars
            % are centred on integer positions, so the boundary lies
            % halfway between bar i * K and bar i * K + 1.
            line([i * K, i * K] + 0.5, [0, max(sum(vc)) * 1.1], ...
                'LineStyle', '-', 'Color', 'k');
        end
        % And now, we label the figure axes.
        xlabel('Variables along time');
        title('Variance captured for each variable and sample');
    else
        % Since we do not have variables along time, we label the axes
        % differently.
        xlabel('Measured variable');
        title('Variance captured for each measured variable');
    end
    % And finally, we indicate the variable described within each division:
    % one tick per variable, centred on its group of K bars.
    set(gca, 'XTick', (K + 1) / 2 : K : size(Z, 2), ...
        'XTickLabel', vars);
    % We label the remaining figure axis.
    ylabel('Variance captured');
end

%% VARIANCE CAPTURED FOR EACH VARIABLE AND EACH TIME SAMPLE.

% In this stage, we compute the variance captured for each variable
% (accumulating all time samples for that variable for each principal
% component retained) and each time sample (we accumulate the variance
% captured for each variable at a given time sample for each principal
% component retained), but only if we are dealing with batch-wise unfolded
% data matrices.
if bw
    % First, we initialise the variable to store the variance captured for
    % each variable for each principal component retained.
    vcJ = zeros(size(P, 2), J);

    % Now, for each measured variable...
    for i = 1:J
        % - We compute the total variance captured for that variable by
        % all retained principal components.
        vJi = sum(sum(vc(:, (i - 1) * K + 1 : i * K)));
        % - Now, for each principal component retained...
        for j = 1:size(P, 2)
            % -- We compute the variance captured for all samples of the
            % i-th variable based on the j-th principal component.
            vcJ(j, i) = sum(vc(j, (i - 1) * K + 1 : i * K)) * 100 / vJi;
        end
    end

    % Now that we have the information, do we have to represent it?
    if rep
        % - Yes, so we create the figure to represent the information.
        h(2) = figure;
        % - Now, we represent the stacked variance captured.
        bar(vcJ', 'stacked');
        % - Next, we rescale the figure axes.
        axis([0, J + 1, 0, max(sum(vcJ)) * 1.1]);
        % - And finally, we indicate the variable whose captured variance
        % each bar describes.
        set(gca, 'XTick', 1 : J, 'XTickLabel', vars);
        % - And we label the figure axes.
        xlabel('Variable measured'); ylabel('Variance captured');
        title('Variance captured for each variable (samples accumulated)');
    end

    % The last step is to compute the variance captured for each time
    % sample (we accumulate the variance captured for all variables at that
    % time sample). First, we create the variable to store the values.
    vcK = zeros(size(P, 2), K);

    % Now, for each sample...
    for i = 1 : K
        % - We compute the total variance captured for the i-th time sample
        % for all principal components and variables.
        vKi = sum(sum(vc(:, i : K : size(Z, 2))));
        % - And now, for each principal component retained...
        for j = 1 : size(P, 2)
            % -- We compute the variance captured for all variables for the
            % i-th sample for the j-th principal component retained.
            vcK(j, i) = sum(vc(j, i : K : size(Z, 2))) * 100 / vKi;
        end
    end

    % Now that we have the information, do we have to represent it?
    if rep
        % - Yes, so we update the handles array with the new figure.
        h(3) = figure;
        % - Next, we represent the bar plot.
        bar(vcK', 'stacked');
        % - Now, we rescale the figure axes.
        axis([0, K + 1, 0, max(sum(vcK)) * 1.1]);
        % - And we label the figure axes.
        xlabel('Time sample'); ylabel('Variance captured');
        title('Variance captured for each sample (variables accumulated)');
    end

end
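
The quantities computed in the listing can be written compactly as follows. This is a reading of the code rather than a statement by the original author. The notation is introduced here for illustration: $\mathbf{p}_a$ is the $a$-th loading after normalisation (`nP(:, a)`), $A$ is the number of retained components (`size(P, 2)`), $z_{n,m}$ is an entry of `Z`, and the column index $m = (j-1)K + k$ reflects the variable-major column ordering implied by the indexing in the loops.

```latex
\hat{\mathbf{Z}}^{(a)} = \mathbf{Z}\,\mathbf{p}_a\,\mathbf{p}_a^{\top},
\qquad
\mathrm{vc}_{a,m} = 100\,
  \frac{\sum_{n} \bigl(\hat{z}^{(a)}_{n,m}\bigr)^{2}}{\sum_{n} z_{n,m}^{2}}

\mathrm{vcJ}_{a,j} = 100\,
  \frac{\sum_{k=1}^{K} \mathrm{vc}_{a,(j-1)K+k}}
       {\sum_{a'=1}^{A}\sum_{k=1}^{K} \mathrm{vc}_{a',(j-1)K+k}},
\qquad
\mathrm{vcK}_{a,k} = 100\,
  \frac{\sum_{j=1}^{J} \mathrm{vc}_{a,(j-1)K+k}}
       {\sum_{a'=1}^{A}\sum_{j=1}^{J} \mathrm{vc}_{a',(j-1)K+k}}
```

Because vcJ and vcK are normalised by the total captured by all retained components, each of their columns sums to 100 across the components. They therefore describe how the captured variance is shared among components, whereas vc is expressed relative to the total variation of each column of Z.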