variance_captured

SYNOPSIS

function [vc, vcJ, vcK, h] = variance_captured(Z, P, J, K, vars, rep)

DESCRIPTION

 -------------------------------------------------------------------------
 |      [vc, vcJ, vcK, h] = variance_captured(Z, P, J, K, vars, rep)     |
 -------------------------------------------------------------------------
 -----------------
 | FUNCTIONALITY |
 -------------------------------------------------------------------------
 | This function computes the variance captured (vc) in a PCA model      |
 | depending on the number of principal components retained.             |
 | Additionally, the method also computes the variance captured for each |
 | variable (vcJ) and time sample (vcK) according to the number of       |
 | principal components retained if the input data matrix is a           |
 | batch-wise unfolded matrix (the number of columns of the pre-treated  |
 | data matrix - Z - is equal to J x K). Otherwise these values are set  |
 | to the empty set. Finally, and if the user indicated this (rep =      |
 | true), the method represents each of the computed variances captured  |
 | by the PCA model.                                                     |
 -------------------------------------------------------------------------
 --------------------
 | INPUT PARAMETERS |
 -------------------------------------------------------------------------
 | Z:      Double matrix with the pre-processed data. Rows are related   |
 |         to observations, while columns are the measured variables. In |
 |         case of batch-wise unfolded matrices, columns can be          |
 |         decomposed as variables measured along time, and thus, they   |
 |         are equal to J x K. Additionally, rows for a batch-wise       |
 |         unfolded matrix are the different batches monitored.          |
 | P:      Double matrix storing the loading matrix obtained from the    |
 |         eigenvalue decomposition of the covariance/correlation matrix |
 |         of the data matrix. Rows are the measured variables, while    |
 |         columns are the principal components retained to build the    |
 |         PCA model.                                                    |
 | J:      Integer value that indicates the number of variables measured |
 |         for each observation. Must be a value greater than 1.         |
 | K:      Integer value that indicates the number of samples measured   |
 |         for each variable. Must be a value greater than 0. When K > 1 |
 |         we are using unfolded data matrices. This means that the      |
 |         number of columns in Z must be equal to J x K.                |
 | vars:   Cell array that contains the labels of the measured           |
 |         variables. It must have J elements.                           |
 | rep:    Boolean that indicates whether the computed variance captured |
 |         (vc, and vcJ and vcK when applicable) has to be plotted (rep  |
 |         = true) or not (rep = false).                                 |
 -------------------------------------------------------------------------
 ---------------------
 | OUTPUT PARAMETERS |
 -------------------------------------------------------------------------
 | vc:     Double matrix indicating the percentage of variance captured  |
 |         for each variable measured (columns in Z). It has as many     |
 |         rows as principal components were retained when building the  |
 |         PCA model (size(P, 2)) and as many columns as variables       |
 |         measured (size(Z, 2)). Values returned in this matrix are in  |
 |         the following range: [0, 100]%.                               |
 | vcJ:    Double matrix containing the percentage of variance explained |
 |         for each measured variable. These values are only computed    |
 |         when the matrix comes from a batch-wise unfolded matrix       |
 |         (size(Z, 2) == J x K), and otherwise the returned value is    |
 |         the empty set. Values in this matrix are in range [0, 100]%.  |
 | vcK:    Double matrix containing the percentage of variance explained |
 |         for each sample measured. These values are computed only when |
 |         the data matrix (Z) comes from a batch-wise unfolded matrix   |
 |         (size(Z, 2) == J x K). Otherwise, the value returned is the   |
 |         empty set. Values in this matrix are in range [0, 100]%.      |
 | h:      Array of double values (in case of a batch-wise unfolded      |
 |         matrix) or double value (otherwise) indicating the handles to |
 |         the representations of the variance captured for vc, vcJ and  |
 |         vcK (when applicable).                                        |
 -------------------------------------------------------------------------
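
The inputs described above can be prepared as in the following minimal sketch. It assumes a three-way array X (batches x variables x time samples) filled with synthetic data, column mean-centring as the pre-treatment, and loadings taken from the eigendecomposition of the covariance matrix. The sizes, the variable labels and the name X are illustrative assumptions, not part of the original function. The unfolding is variable-major, so column (j - 1) * K + k of Z holds variable j at time sample k, which is the ordering the function uses when it accumulates vcJ and vcK.

```matlab
% Hypothetical sizes: I batches, J variables, K time samples (assumptions).
I = 20; J = 3; K = 5;
X = randn(I, J, K);                  % synthetic three-way array

% Batch-wise unfolding, variable-major: column (j - 1) * K + k holds
% variable j at time sample k.
Z = reshape(permute(X, [1, 3, 2]), I, J * K);

% Pre-treatment (assumed here): column mean-centring.
Z = Z - repmat(mean(Z, 1), I, 1);

% Loadings from the eigendecomposition of the covariance matrix,
% sorted by decreasing eigenvalue; A components are retained.
A = 2;
[V, D] = eig(cov(Z));
[~, order] = sort(diag(D), 'descend');
P = V(:, order(1:A));

% Hypothetical labels, one per measured variable (J elements).
vars = {'Temperature', 'Pressure', 'Flow'};

% Call the documented function without plotting, if it is on the path.
if exist('variance_captured', 'file') == 2
    [vc, vcJ, vcK] = variance_captured(Z, P, J, K, vars, false);
    disp(size(vc));                  % A rows, J * K columns
end
```

Passing rep = false skips the figures, so only the three numeric outputs are requested here.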

CROSS-REFERENCE INFORMATION

This function calls:
This function is called by:

SOURCE CODE

% -------------------------------------------------------------------------
% |      [vc, vcJ, vcK, h] = variance_captured(Z, P, J, K, vars, rep)     |
% -------------------------------------------------------------------------
% -----------------
% | FUNCTIONALITY |
% -------------------------------------------------------------------------
% | This function computes the variance captured (vc) in a PCA model      |
% | depending on the number of principal components retained.             |
% | Additionally, the method also computes the variance captured for each |
% | variable (vcJ) and time sample (vcK) according to the number of       |
% | principal components retained if the input data matrix is a           |
% | batch-wise unfolded matrix (the number of columns of the pre-treated  |
% | data matrix - Z - is equal to J x K). Otherwise these values are set  |
% | to the empty set. Finally, and if the user indicated this (rep =      |
% | true), the method represents each of the computed variances captured  |
% | by the PCA model.                                                     |
% -------------------------------------------------------------------------
% --------------------
% | INPUT PARAMETERS |
% -------------------------------------------------------------------------
% | Z:      Double matrix with the pre-processed data. Rows are related   |
% |         to observations, while columns are the measured variables. In |
% |         case of batch-wise unfolded matrices, columns can be          |
% |         decomposed as variables measured along time, and thus, they   |
% |         are equal to J x K. Additionally, rows for a batch-wise       |
% |         unfolded matrix are the different batches monitored.          |
% | P:      Double matrix storing the loading matrix obtained from the    |
% |         eigenvalue decomposition of the covariance/correlation matrix |
% |         of the data matrix. Rows are the measured variables, while    |
% |         columns are the principal components retained to build the    |
% |         PCA model.                                                    |
% | J:      Integer value that indicates the number of variables measured |
% |         for each observation. Must be a value greater than 1.         |
% | K:      Integer value that indicates the number of samples measured   |
% |         for each variable. Must be a value greater than 0. When K > 1 |
% |         we are using unfolded data matrices. This means that the      |
% |         number of columns in Z must be equal to J x K.                |
% | vars:   Cell array that contains the labels of the measured           |
% |         variables. It must have J elements.                           |
% | rep:    Boolean that indicates whether the computed variance captured |
% |         (vc, and vcJ and vcK when applicable) has to be plotted (rep  |
% |         = true) or not (rep = false).                                 |
% -------------------------------------------------------------------------
% ---------------------
% | OUTPUT PARAMETERS |
% -------------------------------------------------------------------------
% | vc:     Double matrix indicating the percentage of variance captured  |
% |         for each variable measured (columns in Z). It has as many     |
% |         rows as principal components were retained when building the  |
% |         PCA model (size(P, 2)) and as many columns as variables       |
% |         measured (size(Z, 2)). Values returned in this matrix are in  |
% |         the following range: [0, 100]%.                               |
% | vcJ:    Double matrix containing the percentage of variance explained |
% |         for each measured variable. These values are only computed    |
% |         when the matrix comes from a batch-wise unfolded matrix       |
% |         (size(Z, 2) == J x K), and otherwise the returned value is    |
% |         the empty set. Values in this matrix are in range [0, 100]%.  |
% | vcK:    Double matrix containing the percentage of variance explained |
% |         for each sample measured. These values are computed only when |
% |         the data matrix (Z) comes from a batch-wise unfolded matrix   |
% |         (size(Z, 2) == J x K). Otherwise, the value returned is the   |
% |         empty set. Values in this matrix are in range [0, 100]%.      |
% | h:      Array of double values (in case of a batch-wise unfolded      |
% |         matrix) or double value (otherwise) indicating the handles to |
% |         the representations of the variance captured for vc, vcJ and  |
% |         vcK (when applicable).                                        |
% -------------------------------------------------------------------------

function [vc, vcJ, vcK, h] = variance_captured(Z, P, J, K, vars, rep)

%% PARAMETER CHECKING

% First of all, we check if the user passed all input parameters.
if nargin < 6
    % No, we are missing some parameters, so we inform of the problem.
    error('variance_captured:paramCheck', ...
        'ERROR: Method needs 6 input parameters!');
end

% Next, we check if the input parameters are correct.

% - Z -
% Is it a numeric matrix?
if ~isnumeric(Z)
    % - No, so we inform of the problem.
    error('variance_captured:wrongType', ...
        'ERROR: Z must be a numeric type!');
else
    % - We get the dimensions of Z.
    dims = size(Z);
    % - Is it a two-dimensional matrix?
    if length(dims) ~= 2
        % -- No, so we inform the user of the problem.
        error('variance_captured:wrongType', ...
            'ERROR: Z must be a two-dimensional matrix!');
    else
        % Does it have more than one row and more than one column?
        if dims(1) <= 1 || dims(2) <= 1
            % -- No, so we inform the user of the problem.
            error('variance_captured:wrongType', ...
                'ERROR: Z must be a two-dimensional matrix');
        end
    end
end

% - P -
% Is it a numeric matrix?
if ~isnumeric(P)
    % - No, so we inform of the problem.
    error('variance_captured:wrongType', ...
        'ERROR: P must be a numeric type!');
else
    % - We get the dimensions of P.
    dims = size(P);
    % - Is it a two-dimensional matrix?
    if length(dims) ~= 2
        % -- No, so we inform the user of the problem.
        error('variance_captured:wrongType', ...
            'ERROR: P must be a two-dimensional matrix!');
    else
        % Does it have more than one row and more than one column?
        if dims(1) <= 1 || dims(2) <= 1
            % -- No, so we inform the user of the problem.
            error('variance_captured:wrongType', ...
                'ERROR: P must be a two-dimensional matrix');
        elseif dims(1) ~= size(Z, 2)
            % -- It does not have the same number of measured variables, so
            % we inform the user of this problem.
            error('variance_captured:wrongDimensions', ...
                'ERROR: Rows in P must be equal to columns in Z!');
        end
    end
end

% - J -
% Is it a numeric value?
if ~isnumeric(J)
    error('variance_captured:wrongType', ...
        'ERROR: J must be a numeric value!');
end

% Is it a scalar?
if size(J, 1) ~= 1 || size(J, 2) ~= 1
    % - No, it isn't. We inform the user of the problem.
    error('variance_captured:wrongType', ...
        'ERROR: J must be a [1 x 1] value!');
end

% Is it within the range?
if J <= 1
    % - No, it isn't. We inform the user of the problem.
    error('variance_captured:outOfRange' , ...
        'ERROR: J must be a value greater than 1!');
end

% - K -
% Is it a numeric value?
if ~isnumeric(K)
    error('variance_captured:wrongType', ...
        'ERROR: K must be a numeric value!');
end

% Is it a scalar?
if size(K, 1) ~= 1 || size(K, 2) ~= 1
    % - No, it isn't. We inform the user of the problem.
    error('variance_captured:wrongType', ...
        'ERROR: K must be a [1 x 1] value!');
end

% Is it within the range?
if K <= 0
    % - No, it isn't. We inform the user of the problem.
    error('variance_captured:outOfRange' , ...
        'ERROR: K must be a value greater than 0!');
elseif K == 1
    % - Additionally, if K is equal to 1, we are working with a
    % non batch-wise unfolded data matrix, so size(Z, 2) must be
    % equal to J.
    if J ~= size(Z, 2)
        % -- It's not the case, so we inform of the problem.
        error('variance_captured:valueNotValid', ...
            ['ERROR: When K == 1, J must be equal to the number of ', ...
            'columns in Z!']);
    end
    % - We indicate that we are not dealing with batch-wise unfolded data
    % matrices.
    bw = false;
    % - And thus, we do not have to compute the other variance captured
    % values (vcJ and vcK).
    vcJ = []; vcK = [];
else
    % - We are working with batch-wise unfolded matrices, so we check if J
    % x K is equal to the number of columns in Z.
    if J * K ~= size(Z, 2)
        % -- It's not the case, so we inform of the problem.
        error('variance_captured:valueNotValid', ...
            ['ERROR: When K > 1, J x K must be equal to the number ', ...
            'of columns in Z!']);
    end
    % - Finally, we indicate that we are working with batch-wise unfolded
    % data matrices.
    bw = true;
end

% - vars -
% Is it a cell array of strings?
if ~iscellstr(vars)
    % - No, it isn't, so we inform of the problem.
    error('variance_captured:wrongType', ...
        'ERROR: vars must be a cell array of strings!');
elseif length(vars) ~= J
    % - vars does not have the same number of measured variables, so we
    % inform of the situation.
    error('variance_captured:wrongDimensions', ...
        'ERROR: vars must have J elements!');
end

% - rep -
% Is it a boolean variable?
if ~islogical(rep)
    % - No, it isn't. Therefore, we indicate the problem.
    error('variance_captured:wrongType', ...
        'ERROR: rep must be a boolean variable!');
end

%% COMPUTATION OF THE VARIANCE CAPTURED FOR EACH COLUMN IN Z.

% First of all, we compute the total variation of the original data
% (column-wise sum of squares).
tV = sum(Z .^ 2);

% Now, we define the variable to store the normalised loading values.
nP = zeros(size(P));

% And we initialise the variable to store the variance captured for each
% variable and principal component.
vc = zeros(size(P, 2), size(Z, 2));

% Now, for each principal component retained...
for i = 1:size(P, 2)
    % - We compute the normalised (unit-norm) loading vector...
    nP(:, i) = P(:, i) / norm(P(:, i));
    % - Now, we compute the estimated Z matrix for the i-th loading vector.
    Zhat = Z * nP(:, i) * nP(:, i)';
    % - And finally, we compute the variance captured for each variable
    % using the i-th principal component.
    vc(i, :) = 100 * sum(Zhat .^ 2) ./ tV;
end

% By default no figure is created, so the handle output is empty. Without
% this, requesting h with rep = false would leave the output unassigned.
h = [];

% Do we have to represent the variance captured?
if rep
    % - Yes, we do. But how many figure handles do we need? If we are not
    % dealing with batch-wise unfolded matrices (K == 1), we only need 1
    % handle. Otherwise, we need 3 of them (one for vc, another to
    % represent vcJ and the last one to represent vcK).
    if bw
        % - We need three handles, so we initialise the variable to store
        % the handles.
        h = zeros(1, 3);
        % - And the first one is the handle associated with the variance
        % captured for each column in Z.
        h(1) = figure;
    else
        % - We only need one handle, so we initialise its value now.
        h = figure;
    end
    % Now, we represent the stacked variance captured for each principal
    % component.
    bar(vc', 'Stacked');
    % Now, we rescale the axes.
    axis([0, size(Z, 2) + 1, 0, max(sum(vc)) * 1.1]);
    % If we have a batch-wise unfolded matrix...
    if bw
        % For each variable (minus 1)...
        for i = 1 : J - 1
            % We add a line between the last bar of variable i (at i * K)
            % and the first bar of variable i + 1.
            line([i * K + 0.5, i * K + 0.5], [0, max(sum(vc)) * 1.1], ...
                'LineStyle', '-', 'Color', 'k');
        end
        % And now, we label the figure axes.
        xlabel('Variables along time');
        title('Variance captured for each variable and sample');
    else
        % Since we do not have variables along time, we label the axes
        % differently.
        xlabel('Measured variable');
        title('Variance captured for each measured variable');
    end
    % And finally, we place one tick at the centre of each group of K bars
    % and label it with the variable described within that division.
    set(gca, 'XTick', (K + 1) / 2 : K : size(Z, 2), ...
        'XTickLabel', vars);
    % We label the remaining figure axis.
    ylabel('Variance captured');
end

%% VARIANCE CAPTURED FOR EACH VARIABLE AND EACH TIME SAMPLE.

% In this stage, we compute the variance captured for each variable
% (accumulating all time samples for that variable for each principal
% component retained) and each time sample (we accumulate the variance
% captured for each variable at a given time sample for each principal
% component retained), but only if we are dealing with batch-wise unfolded
% data matrices.
if bw
    % First, we initialise the variable to store the variance captured for
    % each variable for each principal component retained.
    vcJ = zeros(size(P, 2), J);

    % Now, for each measured variable...
    for i = 1:J
        % - We compute the total variance captured for that variable by
        % all retained principal components.
        vJi = sum(sum(vc(:, (i - 1) * K + 1 : i * K)));
        % - Now, for each principal component retained...
        for j = 1:size(P, 2)
            % -- We compute the variance captured for all samples of the
            % i-th variable based on the j-th principal component, as a
            % percentage of vJi (so each column of vcJ sums to 100).
            vcJ(j, i) = sum(vc(j, (i - 1) * K + 1 : i * K)) * 100 / vJi;
        end
    end

    % Now that we have the information, do we have to represent it?
    if rep
        % - Yes, so we create the figure to represent the information.
        h(2) = figure;
        % - Now, we represent the stacked variance captured.
        bar(vcJ', 'Stacked');
        % - Next, we rescale the figure axes.
        axis([0, J + 1, 0, max(sum(vcJ)) * 1.1]);
        % - And finally, we indicate which variable each bar refers to.
        set(gca, 'XTick', 1 : J, 'XTickLabel', vars);
        % - And we label the figure axes.
        xlabel('Variable measured'); ylabel('Variance captured');
        title('Variance captured for each variable (samples accumulated)');
    end

    % The last step is to compute the variance captured for each time
    % sample (we accumulate the variance captured for all variables at that
    % time sample). First, we create the variable to store the values.
    vcK = zeros(size(P, 2), K);

    % Now, for each sample...
    for i = 1 : K
        % - We compute the total variance captured for the i-th time sample
        % for all principal components and variables.
        vKi = sum(sum(vc(:, i : K : size(Z, 2))));
        % - And now, for each principal component retained...
        for j = 1 : size(P, 2)
            % -- We compute the variance captured for all variables for the
            % i-th sample for the j-th principal component retained, as a
            % percentage of vKi (so each column of vcK sums to 100).
            vcK(j, i) = sum(vc(j, i : K : size(Z, 2))) * 100 / vKi;
        end
    end

    % Now that we have the information, do we have to represent it?
    if rep
        % - Yes, so we update the handles array with the new figure.
        h(3) = figure;
        % - Next, we represent the bar plot.
        bar(vcK', 'Stacked');
        % - Now, we rescale the figure axes.
        axis([0, K + 1, 0, max(sum(vcK)) * 1.1]);
        % - And we label the figure axes.
        xlabel('Time sample'); ylabel('Variance captured');
        title('Variance captured for each sample (variables accumulated)');
    end

end
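
The three quantities computed by the listing can also be sketched in vectorised form. The block below is a hedged illustration, not a replacement for the function: it uses synthetic data, takes unit-norm loadings from the SVD of Z (an assumption made only to keep the example self-contained), and skips all parameter checking and plotting. It relies on the fact that the column sums of squares of Z * p * p' equal norm(Z * p)^2 times p .^ 2 for a unit-norm loading p. As in the listing, vcJ and vcK are normalised by the variance captured across all retained components, so each of their columns sums to 100.

```matlab
% Synthetic stand-in data (assumption): I batches, J variables, K samples,
% A retained components.
I = 30; J = 4; K = 6; A = 3;
Z = randn(I, J * K);
Z = Z - repmat(mean(Z, 1), I, 1);            % column mean-centring

% Unit-norm loadings from the SVD of Z; keep the first A components.
[~, ~, V] = svd(Z, 'econ');
P = V(:, 1:A);

% Per-column variance captured by each component, in percent.
tV = sum(Z .^ 2, 1);                         % 1 x (J * K) total variation
ss = sum((Z * P) .^ 2, 1);                   % 1 x A squared score norms
vc = 100 * (repmat(ss', 1, J * K) .* (P') .^ 2) ./ repmat(tV, A, 1);

% Column (j - 1) * K + k of vc is variable j at sample k, so vc can be
% viewed as an A x K x J array and summed along one dimension.
vc3 = reshape(vc, A, K, J);
sJ = reshape(sum(vc3, 2), A, J);             % accumulated over time samples
sK = sum(vc3, 3);                            % accumulated over variables

% Share of the captured variance attributed to each component.
vcJ = 100 * sJ ./ repmat(sum(sJ, 1), A, 1);  % A x J
vcK = 100 * sK ./ repmat(sum(sK, 1), A, 1);  % A x K
```

The reshape works only because the unfolding is variable-major; with a time-major unfolding the dimensions of vc3 would have to be swapped.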

Generated on Wed 12-Sep-2012 13:03:54 by m2html © 2005