A new efficient approach to the direct restricted active space self-consistent field method

We present an implicitly parallel method for integral-block driven restricted active space self-consistent field~RASSCF! algorithms. Our algorithm entirely avoids testing the index space for nonzero contributions to the CI vector, by finding entire blocks of contributions through use of simple algebraic rules~propagation rules !. The blocks themselves are efficiently identified by introducing a RASmodel space . Our algorithm is capable of making efficient use of modern supercomputer hardware, supporting both shared and distributed memory architectures and hybrids. Applicability of our method is demonstrated with a RASSCF investigation of the first two excited states of indole. ©2003 American Institute of Physics. @DOI: 10.1063/1.1578620 #


I. INTRODUCTION
The restricted active space SCF ͑RASSCF͒ variant of the MCSCF method has been first proposed by Olsen et al. 1 and many successful applications have since been reported ͑e.g., Refs.2-4͒.The RASSCF method may be considered a logical extension to the CASSCF method, which allows one to tackle two limitations of the latter method.
The practical limit of the system size in a CASSCF computation is around 12-14 active orbitals, at least in cases where the number of electrons is about the same as the number of active orbitals and no symmetry can be exploited.However, the resulting orbital occupancies of several orbitals in the active space are frequently either close to 2 or 0 over the entire range of electronic states and spatial configurations of interest.
Dynamic correlation effects are frequently included in an electronic structure computation in a subsequent step, for example, by adding a calculation treating dynamic correlation perturbatively ͑e.g., Refs. 5 and 6͒, or by performing a MRCI calculation, where selected configuration state functions ͑CSFs͒ from the CASSCF wave function are used as reference configurations. 7,8However, a qualitatively adequate description of some systems sometimes requires the inclusion of dynamical correlation, i.e., an extended active space.Such systems are, for example, negative ions, electronic dipoles and excited states.
In the RASSCF formalism, the configuration space is specified by dividing the active molecular orbitals into three subsets and imposing restrictions on the allowed configurations based upon occupations within those subsets.The first subset, denoted RAS1, typically includes all doubly occupied MOs in some accepted reference, e.g., a CASSCF wave function.Allowed configurations must contain a specified minimum number of electrons within RAS1.The second subset, denoted RAS2, includes MOs believed to be particularly important for the system under investigation.No occupancy restrictions are imposed on RAS2.The third subset, RAS3, consists of weakly occupied MOs which contribute relatively less to the description of the system of interest.Any allowed configuration can only have a specified maximum number of electrons in RAS3.
The main challenge in large scale CASSCF problems is the efficient computation of coupling coefficients, and this is not different in the RASSCF method.However, the restrictions introduced for the RAS1 and RAS3 subspaces mean that the indexing of the CSFs is more complicated.This in turn means that highly efficient strategies such as the method of reduced excitation strings 9 cannot be employed.
Our aim in this paper is to propose an efficient, integral block driven algorithm of the RASSCF method, and to describe its implementation on a massively parallel computer.Our approach is based on a model space representation of the RAS active orbitals, and an efficient expansion of the model subspaces.This idea is inspired by the classic paper of Saunders and van Lenthe. 8The contributions of a block of integrals ͑with an identical representation in the model space͒ are computed together.The reconstruction of binary strings is avoided and the full list of substring addresses is constructed directly, using simple recursive algebraic rules ͑propagation rules͒.Finally, the proof of applicability of our RAS implementation is given in the form of an application in Sec.V.

II. THEORY
As in any other MCSCF method, we seek the solution of the CI eigenvalue problem, where Hϭ͕͗K͉H ˆ͉L͖͘ϭ͕H KL ͖ ͑2͒ is the representation matrix in a basis of orthonormal manyparticle configuration state functions ͑CSF͒, which we denote as ͕͉K͖͘.In general, only the lowest eigenvalues and their eigenvectors are of interest.In practice, use is normally made of iterative eigenvector procedures, such as the methods of Lanczos 10 or Davidson. 11The time consuming step in these methods is a linear transformation of the CI vector,

ϭHC, ͑3͒
where C is an approximate eigenvector from the previous iteration.The central practical problem is the evaluation of the ͕H KL ͖.The general result can be expressed as where the summation is over orbital indices, and (i͉ j) and (i j͉kl) are the usual one and two electron repulsion integrals.The A i j KL and B i jkl KL are numerical vector coupling coefficients that depend on the nature of ͉K͘ and ͉L͘.In second quantization, these numerical vector coupling coefficients emerge as matrix elements of creation and annihilation operators, a ˆi † and a ˆj , where ,␥ denote spin.Each Slater determinant of the CI expansion can be written as the tensor product of two strings, one of the occupied ␣-orbitals, K ␣ , and one of the occupied ␤-orbitals, K ␤ , ͉K͘ϭ͉K ␣ K ␤ ͘.

͑7͒
A string K is an ordered product of creation operators acting on the vacuum, and can be represented by a binary word of length M , where each bit b i represents a spin orbital.The bit value of b i indicates whether the orbital i is occupied or empty, i.e., the binary representation of K , K , contains N 1's and (M ϪN ) 0's.We denote by L the list of all strings in a defined ͑lexical͒ order.Each string K can be identified by its address, A͕K ͖, which is identical to its position in L. Knowles  where k refers to an electron, l to an orbital, M is the number of orbitals, N the number of electrons and denotes spin.This addressing scheme for strings in full CI is simple.However, in anticipation of the added complexity that will arise when addressing RAS strings we now introduce a graphical representation of Eq. ͑9͒. Figure 1͑a͒ illustrates the ␣-string space for M ϭ8 orbitals and N ␣ ϭ4 electrons. 13Each string corresponds to one path or walk on the grid of the graph.All paths begin at the foot of the graph, and finish in its head, advancing one level upwards for each orbital.If an orbital is occupied, the sloped segment step upwards is being used.The elements of the addressing array are added to the graph in Fig. 1͑a͒.According to Eq. ͑8͒ the address of a string K ␣ is thus the sum of all circled values visited by the appropriate walk, plus 1.The ␤-string address is obtained in the same way.The address of the Slater determinant ͉K͘ may then be defined as

͑10͒
Equation ͑4͒ is a simple sum of products.While the one and two electron integrals are generally nonzero quantities, the coupling coefficients are mostly zero.Efficient computer implementations of any CI method take advantage of this fact.One can distinguish between two main types of strategies to deal with Eq. ͑4͒, namely the configuration driven approach ͑CDA͒ and the integral driven approach ͑IDA͒.In the CDA, given a configuration pair, ͉K͘,͉L͘, all index pairs (i, j) and quadruples (i, j,k,l) that give nonzero coupling coefficients need to be found.In the IDA, by contrast, all configuration pairs ͉K͘,͉L͘ for nonzero coupling coefficients need to be found, given a set of orbital indices, i, j,k,l.In modern implementations both strategies have found their application.To a degree, the optimum approach will depend on the properties of the computer hardware, and efficient implementations that take advantage of vector processors have been devised.More recently, the development of computer hardware has been towards parallel configurations, where distributed and shared memory architectures are often found together, and vector architectures are less relevant.This should be reflected in the design of any implementation.In particular, methods for a well functioning load balancing should be considered.
The true computational cost of evaluating the matrix multiplication of Eq. ͑3͒ consists of finding all nonzero coupling coefficients, A i j KL ,B i jkl KL , and performing the SAXPY operations, for all nonzero coupling coefficients.As the proportion of nonzero coupling coefficients is tiny ͓for example, Ӷ0.01% for a CASSCF͑12,12͔͒ an efficient method of finding nonzero coupling coefficients implicitly and without trying or testing the complete space of orbital and/or configuration indices is thus highly desirable.To find a solution to this indexing problem, we start by rewriting the vector of Eq. ͑3͒ as a sum of 2 terms: The one-electron term, 1e K , may be written with the outer sum over orbital indices i and j and the inner sums over ␣-strings and ␤-strings,

͑13͒
Likewise, the two-electron term, 2e K , may be rewritten similarly, yielding Thus, in Eqs.͑13͒ and ͑14͒ the outer summation is over orbital indices i, j and i, j,k,l, respectively.This implies the use of an IDA, where the orbital indices and the repulsion integrals are given, and all string pairs K ␣ ,L ␣ and K ␤ ,L ␤ that give a nonzero contribution to 1e and 2e need to be found.
In restricted CI methods in general, and in the RASSCF method in particular, the restrictions imposed on the active orbital space give rise to certain complications concerning the construction and indexing of CSFs and their strings.However, an efficient solution to the indexing problem should, nevertheless, as in the case of full CI/CASSCF implementations, aim to minimize or eliminate redundant index space testing.We propose in this paper a solution to this problem, based on two steps: the introduction of a RAS model orbital space, and the efficient reconstruction of string pairs K ␣ ,L ␣ from the model strings in that space, using propagation rules.
The graphical representation of CSFs used in this work and outlined in Sec.II A greatly facilitates insight in and treatment of the RASSCF indexing problem.In Sec.II B we introduce the model space representation of the RAS space, again aided by a graphical representation.Based on the model space concept in Sec.II C we develop an efficient method for reconstructing the string pairs that give nonzero contributions to Eqs. ͑13͒ and ͑14͒, using propagation rules.The method described in this work entirely avoids potentially costly index space testing.

A. A graphical representation of the RAS configurations
The string space in the RASSCF method is a subset of the corresponding CAS string space, where the total number of active orbitals and electrons are identical.Due to the particle number restriction in the RAS1 and RAS3 orbital subspaces, the construction of CSFs is relatively more complicated.The complication arises on two levels: the indexing of strings and the combination of ␣-strings with ␤-strings.The remedy to both problems lies in the classification of certain groups of RAS strings.To this end we will make use of the concept of string categories. 14Within any string category indexing is then simple, as no restrictions apply.Allowed combinations of ␣-strings with ␤-strings are found by identifying allowed combinations of string categories.
It is convenient to define several graphs.Each graph corresponds to a subset of the paths on the FCI graph.The space of strings is then described by the sum of all possible walks on all graphs.In addition, there will usually be restrictions on the allowed combination of graphs.If the state of interest is a singlet, then the set of graphs for both spin spaces will be identical.Figure 2 shows three example ␣-string graphs ͑of a total of nine graphs͒ for the case with N ␣ ϭ6, M ϭ12, M (RAS1)ϭ4, M (RAS3)ϭ4 and a maximum of 2 electrons excited out of RAS1 and into RAS3, respectively.As in the CASSCF case, the graphical representation of the ␣-strings and ␤-strings may be used to order the paths.This is best done by assigning consecutive local addresses to strings within one graph.If the graphs themselves are ordered, then the global string address will be the sum of the local string address and an offset accounting for all paths in preceding graphs.
Kozlowski and Pulay 14 proposed a two level addressing scheme, the first level being a string category determined by the number of holes, i h , in RAS1 and the number of electrons, i e , in RAS3, while the second level gives the local string address within a given category.A category in this scheme corresponds exactly to a string graph as described above, and we will largely adopt the notation used in Ref. 14.The category Cat(i h ,i e ) is defined as Cat͑i h ,i e ͒ϭ͑ i h ϩ1 ͒ϩ͑ MxHoleϩ1 ͒i e , ͑15͒ and the number of categories ͑graphs͒ for each spin space is given by Cat͑MxHole,MxElec͒ϭ͑MxHoleϩ1 ͒͑ MxElecϩ1 ͒. ͑16͒ The length of that category ͑number of paths in the corresponding graph͒ is The address of an ␣-string K ␣ i h ,i e in category Cat(i h ,i e ) can then be calculated as

L͓Cat͔, ͑18͒
where A͕K ␣ ͖ denotes the local string address and the second term is the sum of lengths of all previous categories.The local address can be assigned in the following way: for a given category, Cat(i h ,i e ), any string can be considered as a combination of appropriate subspace strings, where each term K ␣ RAS1 , K ␣ RAS2 , K ␣ RAS3 denote the corresponding subspace strings.This is the crucial difference to the addressing scheme used by Olsen et al., 1 who chose to not logically separate the addressing of the individual RAS subspaces.The advantage of defining the local string address as in Eq. ͑19͒ will become clear shortly.
A choice must be made as to how the subspace strings themselves should be addressed.Since the string subspaces resemble CASSCF string spaces, it is straightforward to again use the indexing formula of Knowles and Handy 12 ͓Eqs.͑8͒ and ͑9͔͒, i.e., The number of orbitals, M , and electrons N in Eq. ͑9͒ are substituted with the corresponding RAS subspace variables, M (RASX) and N RAS X, respectively, for the RASX subspace (X͕1,2,3͖).
An example may serve to clarify this.Consider a RAS problem with M ϭ12 orbitals, N ␣ ϭ6 electrons, 4 orbitals in each RAS subspace, and maximum numbers of holes in RAS1 and particles in RAS3 both equal 2. One ␣-string that observes these restrictions is K ␣ ϭ124578.Its binary representation is K ␣ ϭ0000 1101 1011.The string category is Cat͑1,0͒ϭ2 ͑i.e., i h ϭ1, i e ϭ0). Figure 3 shows the graphi-FIG.2. Three example RAS ␣-string graphs for N ␣ ϭ6, M ϭ12 with RAS subspace definitions M (RAS1)ϭ4; MxHoleϭ2 and M (RAS3)ϭ4; MxElecϭ2.The total number of graphs ͑ϭcategories͒ is here ͑MxHoleϩ1͒ ϫ͑MxElecϩ1͒ϭ9.
FIG. 3.An example of the addressing of an RAS ␣-string: M (RAS1) ϭM (RAS2)ϭM (RAS3)ϭ4, K ␣ ϭ124578, K ␣ ϭ0000 1101 1011, i h ϭ1, i e ϭ0.The numbers on the slopes are the arc weights, Z, for each subspace, as obtained from Eq. ͑9͒, and the resulting substring addresses are shown next to the corresponding graph segment.The local string address, A͕K ␣ ͖, is then obtained from Eq. ͑19͒.
cal representation of K ␣ as a walk on the corresponding RAS graph ͓cf.Fig. 2͑b͔͒.The arc weights are obtained individually for each subspace graph ͓Eq.͑9͔͒ and the local string address is Only the graph corresponding to Catϭ1 precedes the current RAS graph, i.e., the lexical ␣-string address The ␣-strings cannot be combined freely with all ␤-strings, because of the restrictions for the allowed number of holes (i h ϩi h ЈрMxHole; the prime Ј signifies ␤-spin͒ in RAS1 and the allowed number of electrons (i e ϩi e Ј рMxElec) in RAS3.The resulting expression for valid SDs in the RAS expansion can be obtained using once again a two level addressing scheme.The local address for a string pair in a given set of string graphs, A͕K ␣ ,K ␤ ͖, is given by where L͓Cat(i h Ј ,i e Ј)͔ is the total number of ␤-string walks on the corresponding ␤-string graph.The address of CSF ͉K͘ where F Cat CatЈ is the offset accounting for all CSFs prior to K. It is defined as L͓Cat͑i h ,i e ͔͒ϫL͓CatЈ͔.͑23͒ Equation ͑21͒ assigns a unique local address to any combination of paths in a given ␣-string graph and a ␤-string graph ͑there are L͓Cat(i h ,i e )͔ϫL͓Cat(i h Ј ,i e Ј)͔ such combina- tions͒.The offset, F Cat CatЈ , may be precomputed for every allowed graph combination.It accounts for all previous allowed combinations, giving the global address for the SD.Although we are conducting the discussion of the RAS algorithm on the basis of SDs as basis CSFs, note that it is straightforward to use spin-adapted CSFs, by appropriately redefining A͕K ␣ ,K ␤ ͖ and F Cat CatЈ .It is apparent from Eq. ͑19͒, and perhaps even more so from the graphs in Fig. 2, that the subspace strings K ␣ RAS1 , K ␣ RAS2 , K ␣ RAS3 may be generated as all possible walks on their respective subspace graphs.The task of finding all allowed configurations in the RAS expansion is then simply to find all allowed combinations of subspace graphs.

B. The RAS model space
We will now introduce a model space representation of the RAS string space.The RAS model space closely re-sembles CAS string spaces, thus simply avoiding the problems associated with the restrictions imposed on the RAS1 and RAS3 orbital subspaces.
Suppose we would like to construct a RAS wave function using the determinantal approach, with a maximum number of holes ͑MxHole͒ allowed in the RAS1 space and a maximum number of electrons ͑MxElec͒ allowed in the RAS3 space.A model RAS space may then be defined with M m orbitals, given by and N m ϭN ␣ m ϩN ␤ m electrons where and The remaining ͑spin͒ orbitals excluded from this model space will always be occupied in the RAS1 subspace and unoccupied in the RAS3 subspace.Figure 4 shows a RAS type ␣-string consisting of RAS1, RAS2 and RAS3 strings, and the model space string representing it ͑MxHoleϭ2, MxElecϭ2͒.Note that the number of electrons in the model space is constant ͑other than in the individual RAS sub-spaces͒.
It is useful to think of the model space as ͑1͒ a compact way of representing the total RAS space, or ͑2͒ a combination of the RAS2 space with compacted RAS1 and RAS3 subspaces.
These two interpretations are visualized in Fig. 5 and Fig. 6, respectively.The most striking feature of representation 1 ͑Fig.5͒ of the model string space is the fact that, although representing restricted string spaces, they are entirely unrestricted and are in fact identical to those used in a CASSCF problem with M (CAS)ϭM m (RAS) orbitals and N(CAS)ϭN m (RAS) electrons.This paves the way for an efficient construction of all model strings, since this is a well studied problem.In particular, it is clear that the method of reduced excitation lists presented in Ref. 9 may be applied and we shall return to the details shortly.
Crucial to the usefulness of the model space idea is representation 2: the subdivision of the model space into the RAS2 subspace and compacted RAS1 and RAS3 model subspaces.Because the dimension of these model subspaces is MxHole and MxElec, respectively, the number of model space graphs in this representation equals the number of categories as given by Eq. ͑16͒ for each set of RAS spin graphs.Furthermore, Eq. ͑15͒ is not dependent on any absolute number of particles in any RAS subspace, so that there is a one to one correspondence of each complete graph with one model graph ͑cf.Fig. 2 and Fig. 6͒.This property will be of particular use once a model space string has been found, and is central to the method described here.The idea is simple: once a model string is found, it may be associated with a certain model string space.It is then by implication also associated with a certain string graph in the complete RAS space.
We have already mentioned that the unrestricted character of the model string space may be exploited, since the problem of finding model string pairs ͕K m ,L m ͖ closely re- sembles the corresponding problem for full CI.This problem is well studied and a number of efficient direct methods have been proposed ͑e.g., Refs. 1 and 15-17͒.Here we will use the method of reduced excitation strings, which we have introduced in Ref. 9 to entirely avoid index space testing in direct full CI/CASSCF computations.However, the model space and reduced string method are not interdependent, and any direct method tackling the indexing problem in full CI could be used.The reduced list concept may be applied here without change, and we will now briefly remind ourselves of the method, while placing it in the current context.
In order to avoid confusion between orbital labels of different RAS subspaces, the model space and the full RAS space, we will use separate orbital indices for labeling the respective orbitals: ͕a,b,c,d͖RAS1; ͕i, j,k,l͖RAS2; ͕p,q,r,s͖RAS3; ͕w,x,y,z͖model space; ͕i, j,k,l͖full RAS space.Note that we use the same set of indices i, j,k,l for both the full RAS space and the RAS2 subspace.The reason for this is that it is ͑at different times͒ useful to liken both sets of orbitals to the active space of a CASSCF.It will always be clear from the context which orbital set is meant.
The set of orbitals under consideration is restricted to those contained in the RAS model space.Thus, given the -model space orbital indices w,x, we define the model space excitation list X w x containing, in lexical order, all string pair addresses A͕K m ͖, A͕L m ͖ and sgn wx K m for which the relation . Thus all model space excitation lists in the model space may be constructed in this fashion, and the efficiency of this method comes from the fact that no redundant index space testing is needed in order to find the needed string pairs.M m Ϫ2 , will be used in the same manner.
Once a model string pair K m , L m is found, each model string may be associated with one category, simply by counting the number of holes/electrons in the RAS1 and RAS3 model substrings.In graphical terms this means that for each walk corresponding to a -model string, we need to find the -model string space graph ͑cf.Fig. 6͒ that has the correct dimension to superimpose the walk onto it.If the loop stretches over more than one RAS subspace, the model strings K m ,L m might then belong to different model space graphs ͑categories͒.In Sec.II C we will show how to expand the model substrings.However, the RAS2 substrings, K RAS2 and L RAS2 are already in the final form.At this point we should mention an additional benefit of using the model space approach.In traditional implementations of the RASSCF method ͑e.g., Refs. 1 and 14͒, one has to deal with the possibility of out of space excitations, i.e., the excitation a ˆi † a ˆj ͉L ͘ could result in a string ͉K ͘ whose number of holes in RAS1 or the number of electrons in RAS3 exceed the limits, namely MxHole and MxElec, respectively.Due to the resemblance of the RAS model space to the string space for a full CI it is, by construction, not possible to produce an out of space excitation, and the problem is entirely avoided.This holds true also for double excitations.

C. An efficient expansion technique: Propagation rules
Each walk on a model string graph represents a set of strings on the full graph.Correspondingly, each pair of walks on the model string graph, related by a single (a ˆw † a ˆx ) or double (a ˆw † a ˆy † a ˆz a ˆx ) excitation and thus building a loop, represents a set of string pairs on the full graph.We now need to translate each model string pair K m ,L m , together with its associated parity factor, sgn wx K m , into a set of RAS string pairs and their parity factor, ͕K While the RAS2 substring is fully determined by the model string, the RAS1 and RAS3 substrings need to be evaluated separately by expanding the model substrings.
To achieve this, we propose a set of propagation rules.The aim of the propagation rules is to efficiently compute all string pairs, ͕K ,L ͖, that give nonzero contributions for a given excitation term ͗K ͉a ˆi † a ˆj ͉L ͘.All RASX substrings that are represented by a model substring can be constructed by propagating the binary representation of the excitation operator from the right to the left of a binary representation of the remaining orbitals, which we will denote r.s.Importantly, it can be shown that, by defining the address of subspace strings through Eq. ͑20͒ the actual construction of the subspace strings is not necessary and the substring addresses can be obtained directly.This is done by means of simple recursive evaluation of blocks of substring addresses ˆA͕K RASX (2)͖,...,A͕K RASX (n)͖‰ from the first substring address A͕K RASX (1)͖ of each block.Propagation Rule I. Table I shows a series of RASX string pairs.They are constructed from the complete list of strings generated from two 0Јs and two 1Јs, ordered according to Handy's index equation ͓Eq.͑9͔͒ and supplemented by the short strings 01 and 10 to the right.The resulting list is, by construction, consecutive elements of a list of strings generated from three 0Јs and three 1Јs, also ordered according to Handy's index equation.If we refer to consecutive strings of the short list, ͕0011,...,1100͖, as r.s. and r.s.Ј, respec- tively, then, according to Eq. ͑20͒ we have the trivial relationship  where the first argument ͑1͒ simply indicates an unchanged position of the added ͑bold͒ bits.Propagation Rule II.Table II shows a series of string pairs.They are constructed from a constant binary string, 01011 and a bit b a whose position a is variable.The bit value of b a is 1 for K RASX and 0 for L RASX.As b a propagates from the right to the left of the strings, the total string addresses change if the bit values in fixed bit positions in the strings change.In particular, if after the propagation of the variable bit to position a ͑from position aϪ1) we have b a ϭb aϪ1 , the string and its address have not changed.It is necessary to know only a number of reduced RAS1 and RAS3 string lists and some binomial coefficients, both of which may be precomputed at certain stages of the computation.The outlined scheme avoids redundant index space testing in a similar fashion the reduced list approach devel-oped in Ref. 9 does for unrestricted CI.The algorithm is based on the systematic propagation of substring variables a,b,c,d ͑or p,q,r,s) over the entries of a reduced string list of the RAS1 ͑or RAS3͒ orbital space.
It can be shown that all possible lists of string addresses can be constructed using the propagator function, where u͕a,b,c,d͖ and the parameter is the number of variable indices in the RASX substring.The propagation rule, P, for the RASX subspace may then be expressed as for a given reduced string ͑r.s.͒.When proceeding to the next reduced string ͑in the case of 0 indices in RASX each entry may be interpreted as a reduced string͒, the relationship is trivial, as shown above, If the number of holes and electrons in the respective subspaces RAS1 and RAS3 is greater than 2, care must be taken to ensure each term ͗K ͉a ˆi † a ˆj ͉L ͘ is represented by only one model space term ͗K m ͉a ˆw † a ˆx ͉L m ͘, in order to avoid multiple contributions.Figure 8 illustrates the problem.The same string pair, K ,L , is arrived at by expanding the model space string pairs in the complete RAS space, using the appropriate propagation rules.The string pair,   † a ˆ6 ͉L m ͘ both give ͑among others͒ the con- tribution ͗K ͉a ˆ12 † a ˆ10 ͉L ͘.
K ,L , has been constructed using the same propagation rule, but starting from different model string pairs, K m (1),L m (1) and K m (2),L m (2).Each model string pair is related to a different excitation operator in the model space, namely a ˆ7 † a ˆ6 and a ˆ8 † a ˆ6 , respectively.This problem is, however, easily avoided by requiring the model space indices w,x,y,z to be flush-right in the RASX m model subspaces.In graphical terms this means that loop-opening and loopclosing segments will be concentrated on the lowest possible orbital level in the RASX m model subspaces ͑cf.In the special case of both MxHole and MxElec assuming their maximum values the configuration spaces of CAS and RAS become identical.
The 2e Excitation Contribution.The 2e excitation term ͓Eq.͑14͔͒ of the vector ͓Eq.͑3͔͒ is by far the computationally more expensive one.It consists of two parts that differ conceptually.The ␣␣/␤␤ parts ͑first two terms͒ each represent a double excitation involving electrons of one spin type only.All that is needed to accommodate for the double excitations in a single string space, using the model space approach, is an extension of the model subspace by up to two extra model space orbitals for each model subspace.The ␣␤-part and ␤␣-part ͓the last two terms in Eq. ͑14͔͒ are simply combinations of single excitations in both the ␣-string space and the ␤-string space.Thus, for the purpose of finding all spin-string pairs, the ␣␤ part of the 2e excitation may be treated in an identical fashion to the 1e excitation.
As in the case of the 1e excitation, out-of-space excitations within a spin-string are entirely avoided by construction, through usage of the model space approach.The combination of two valid model string pairs for the ␣␤-part and ␤␣-part may, though, result in an out-of-space excitation.However, such a combination may simply be rejected at the model space level, before any expansion of external subspaces takes place.
The symmetry properties of the two-electron integrals (i j͉kl) mean that it is possible to reduce the loops over i, j,k,l to unique integrals and the summation in Eq. ͑4͒ over iу j, kуl and i(iϪ1)/2ϩ jуk(kϪ1)/2ϩl.By introducing the model space we replace summation over orbital indices i, j,k,l with model space orbital indices w,x,y,z.A single set of model space indices potentially represent a set of orbital indices.Nevertheless, the symmetry condition also holds for the RAS model space: Because of these restrictions we will only find a restricted number of excitation combinations between RAS subspaces.
The possible combinations are shown in Table III V summarizes the number of unique index quadruples, due to the restriction on possible combinations of (wx) with (yz) ͓cf.Eq. ͑36͔͒.The following can be seen.
͑1͒ The contribution of (w→x,y→z)ϭ(2→2,2→2) is the largest one and its relative importance grows with increasing size of the RAS2 subspace, M (RAS2).This means that in most cases the extension of RAS1 and RAS3 subspaces will simply result in a list of consecutive subspace addresses, and that both sign, sgn ijkl K , and integral, (i j͉kl), are constant for all contributions derived from a model space string pair.͑2͒ All other relatively large contributions stem from excitations with at least one of the two excitations being a 2 →2.The advantage in this case is that the problem of finding relevant pairs K ␣ i h ,i e , L ␣ i h ,i e is very similar to the 1e case.
In summary, most double excitations are computed from consecutive CSFs, with constant integral (i j͉kl) and parity factor sgn ijkl K .This part is getting relatively more important as the system size grows.The next most relevant part consists logically of a 1e excitation type problem.This is particularly true for the ␣␣ 2e part.In  of the discussed cases is summarized analytically and for the same example cases already used above.The two dominating contributions are becoming more important with the growing size of the orbital space, and as a consequence, the efficiency of the propagation rule scheme becomes relatively greater for larger problems.

D. The effect of propagation rules on the order of integrals and the computation of parity factors
In an actual implementation of the propagation rule algorithm, once a model string pair is found for both the ␣-spin and ␤-spin model string space, the RAS1 and RAS3 subspace strings need to be expanded.This will be done in a nested loop over all RAS1 and RAS3 expansions.If all model space indices w,x (w,x,y,z) lie within the RAS2 subspace, then all contributions will be for the same integral (i͉ j) "(i j͉kl)….If, on the other hand, some or all model space indices fall into one or both of the other subspaces, the contributions will be for a list of integrals.It is therefore useful to assemble the corresponding list of integrals, for every set of model space indices, in the order they will be accessed.This order is entirely given by the propagation rules, and only one list needs to be assembled for any set of model space indices.The integral lists for ␣␣-, ␣␤-, ␤␣and ␤␤contributions will be identical and need to be assembled only once.
It is by now clear that the value associated with a bra-ket term of the form ͗K ͉a ˆi † a ˆj ͉L ͘ may differ from its model space analog sgn 0 m, ϭ͗K m ͉a ˆw † a ˆx ͉L m ͘ at most by a factor of Ϫ1.If, for example, the index i lies within the RAS3 orbital subspace, then the propagation of the appropriate bit towards the left will result in a sign change every time the bit value of b iϪ1 equals 1.Because of this behavior and the direct association of the sign change with the propagation of variable bits, it is convenient to pre-compute the sign-changing factors for each spin subspace together with the RAS1 and RAS3 subspace string address pairs.The term sgn ij K ϭ͗K ͉a ˆi † a ˆj ͉L ͘ is then computed as

͑37͒
The same applies without change to the 2e equivalent, sgn ij K ϭ͗K ͉a ˆi † a ˆk † a ˆl a ˆj ͉L ͘,

III. THE DIRECT RAS ALGORITHM
The algorithm that follows from the use of an integral driven CI is inherently parallel, and the parallel loop may include any combination of orbital indices i, j,k,l.We will now provide the algorithms for one-and two-electron excitation term contributions of Eq. ͑12͒.All reduced lists of model space and RAS subspaces and the associated Handy's index matrices are precomputed only once.The outer loop is over the model space orbital indices w,x,y,z.The computation of model space string pairs is analogous to string pair computation as described in Ref. 9 for full CI string pairs.Once the model string pair K ,L is known, its category and propagation rules for both the RAS1 and RAS3 subspaces are identified.This information is used to assemble the list of integrals into a vector, where they will be accessed in a sequential manner during updating of the -vector.Then all excitation lists are assembled, by application of the propagation rules to the RAS subspace model strings, RAS1 m and RAS3 m .We denote these subspace excitation lists E RASX .Finally, nested loops over subspace excitation list entries lead to the local address, and by combining ␣and ␤-strings, to the CSF address.The appropriate algorithms for 2e ␣␣ and 2e ␣␤ parts, respectively, of the -vector are summarized in the Appendix.
The dimension of the integral list is a function of the number of orbital indices in RAS1 and RAS3, n1 and n3, respectively,

͑39͒
If all orbital indices lie within RAS2, then the integral list contains only one element.The maximum length of the integral vector depends on the relative size of the RAS1 and RAS3 orbital subspaces.In the special case of M (RAS1) ϭM (RAS3)Ͼ2 the maximum dimension is for two indices each in the RAS1 and RAS3 subspaces, For example, with M (RAS1)ϭM (RAS3)ϭ10, the maximum length of the integral vector is 2025.At any stage the length of the integral vector will be small when compared to the CI vector.The memory requirements for the reduced lists of RAS subspace strings and model space strings are equally small compared to the CI vector, as are the partial excitation strings, E RASX .
Thus the memory requirements are dominated by the vectors and C.However, if the working memory is not large enough to accommodate the entire CI vectors, it is straightforward to adapt the algorithm to use only certain blocks of these vectors, by taking advantage of the blocking of string addresses of all strings that belong to a certain graph.This means that the vectors and C with elements K ␣ K ␤ and C K ␣ K ␤ are also blocked with respect to graph combinations ͑categories͒.Since the graph combination is fixed for all contributions of a found model string pair, it is possible to adapt the described scheme by either reading in the required block of the CI vectors from distributed memory, or by restricting contributions to a given CI vector block at a time.
Particularly in the first iterations of a typical MCSCF computation, there are only very few nonzero elements C K ␣ K ␤ .This means that in principle, only those contributions, with C K ␣ K ␤ 0 need to be considered to compute .This is an inherent weakness of the integral driven approach, since the resulting algorithm does not provide for efficient exclusion of blocks of the CI vector.However, to increase efficiency in the model space approach proposed here, one could assemble a short list of flags that indicate whether any of the ␣-strings obtained by expanding a ␣-model string belongs to a block of C with at least one nonzero element An analogous list would be needed for ␤-string blocks.The most simple realization of this idea would be to define the blocks as having contributions of a unique ␣-graph/␤-graph combination.More elaborate methods could include, for example, lists for specific RAS2 substrings, possibly combined with the appropriate propagation rules.Finally, it should be noted that use of the propagation rules means that it is not straightforward to take advantage of point group symmetry.

IV. PARALLELIZATION
We will now describe the parallel implementation of the direct RASSCF algorithm.A thorough discussion of the method of parallelization used and its performance can be found in Ref. 9. We assume a scalable parallel distributed memory computer architecture consisting of nodes, each with local memory.The nodes themselves may be symmetric multiprocessor ͑SMP͒ machines with shared memory.As mentioned in the previous section, the integral driven algorithm is already inherently parallel.The main issue to address is thus how to ensure good load balancing.A flexible load balancing mechanism helps to achieve optimum scaling and is essential in heterogenous environments, where a cluster may consist of nodes of different architecture.It is also important in the realistic situation of uneven preloading of allocated nodes, i.e., in situations where some workers must share resources with concurrently running programs on the same node.
Load balancing can be achieved when the total number of independent, parallel tasks is large compared to the number of processing elements ͑PEs, also: CPUs͒.The overhead of subdividing the total work into parallel sub-tasks should ideally be kept small compared to total execution time.Communication between workers should be avoided and should be handled by a separate process, in order to avoid synchronization delays. 18he sub-tasks that are executed on parallel nodes correspond to model space index pairs w,x (1e contribution͒, or model space index quadruples, w,x,y,z (2e contribution͒.The relatively small size of the model space ͑as opposed to the orbital space comprising all active orbitals͒ means the total number of independent tasks is much reduced.Table VII shows for different sizes of the RAS2 subspace and MxHoleϭMxElecϭ2 the number of parallel tasks for both the 1e and 2e contributions.The number of tasks is independent of the size of the RAS1 and RAS3 subspaces, and lies within a useful range, i.e., there are sufficiently many sub-tasks to enable load balancing for a large number of PEs, even for the smallest RAS2 orbital subspace.For larger RAS2 subspaces, in order to reduce overhead due to communication, several subtasks may be grouped together in order to adapt granularity of the parallelism.This may be done dynamically, in order to adapt to the number of requested or allocated CPUs.
The IDA approach followed by us here involves the outer loops over the model space orbital indices w,x,y,z.Inside these nested loops, the assembly of certain blocks of the vector takes place.However, with the exception of the CI vector itself, all data inside the loops are static.Therefore, we can minimize communication overhead by passing these static data to all worker nodes only once at the beginning of each eigenvector iteration.This is of advantage especially for distributed memory architectures, and, in particular, for low-cost configurations with standard network interprocessor communication.On shared memory architectures ͑e.g., an SMP node͒ these static data need to exist only once.
We now discuss the implementation of the general strategy just discussed for distributed memory architectures using Linda. 18We begin with a few definitions.We assume that each node may be an SMP.The number of processors on each SMP is indicated as NProcS ͑on single processor nodes NProcSϭ1).The value NTask is computed, with NTask ϭNProcSϫH and H is chosen to both optimize loadbalancing and minimize communication overhead.The main process is called the master process ͑master͒.The process spawned on the first processing node retrieves the index (wxyz) ͑initially (wxyz)ϭ (wx) max "(wx) max ϩ1…/2 with (wx) max ϭM m (M m ϩ1)/2).A list of NTask indices is assembled on the worker, by testing the validity of (wxyz), storing it if valid, testing (wxyz)Ϫ1, and so forth.Invalid indices are simply dropped when found.The worker then puts a new index (wxyz)Јϭ(wxyz)ϪNTaskϪN invalid into tuple space ͑the virtual shared memory area in the Linda model͒, which is then retrieved by the next worker, etc.If NProcSϾ1, then NProcSϪ1 shared memory processes are created, each of which will be assigned a subset of NTask/NProcS tasks.Thus the original loop over model orbital space indices, w,x,y,z, is parallelized.Each processor then carries out the assembly of using the original serial algorithm.On finishing, the next available index (wxyz) is retrieved from tuple space and this procedure is continued until all model space generator combinations are processed, i.e., the index (wxyz)ϭ0 is found.Finally the intermediate results are passed through tuple space to be combined to give the final result of the computation.
SMP architectures are supported in three ways: ͑i͒ using Linda, ͑ii͒ using shared memory, or ͑iii͒ a combination of both.In the first case, ͑i͒, one Linda worker is created on each PE.The communication overhead in this scheme is extremely small.Furthermore, using Linda even on SMPs takes full advantage of the load balancing mechanism described above.However, the static data ͑e.g., the CI vector C, integrals, etc.͒ must be replicated for each Linda worker.
The alternative ͑ii͒ is the usage of shared memory (NProcSϾ1).In this case all static data get replicated only once per SMP and are shared by all PEs.All result vectors ͑one per PE͒ are summed before the result of this worker is passed back to tuple space, where it is subsequently retrieved by the master process.This method has the advantage that all static data exist only once in working memory on an SMP node.For example, on an SMP with NProcSϭ2 the memory requirements are for three CI vectors; i.e., one result vector for each PE plus one shared CI vector ͑C͒.This is particularly useful if available working memory is limited and/or if the size of the CI vector is very large.A further advantage of this option is a reduction of communication by a factor of NProcS, since all static data and also the combined results have to be passed only once per NProcS workers.The price to be paid in this case is that there is no load balancing between PEs on an SMP.The tasks defined by the NProcS indices are of very similar length, but will be slightly different.In practice, this means that efficiency will decrease if NProcS becomes very large.
The third option is any combination of the previous two.In this case the number of PEs using shared memory, NProcS, may be chosen to comprise any available number of PEs on each node, i.e., on a 4way SMP node NProcS can assume the values 1, 2, 3 or 4. Case ͑iii͒ would then correspond to two Linda workers running on that node, each of them comprising NProcSϭ2 shared memory processes.Taking advantage of this option the mode of parallelism may be tailored to the available architecture.
The approach described above ensures great flexibility in choosing the number of processors and leads, due to the relatively high number of tasks, to an effective load balancing, provided the number of PEs is small compared to the number of tasks.The intermediate assembly of a sublist of NTask tasks ensures the optimum granularity, through dynamical adaptation of NTask.The load balancing scheme implemented also allows for the fact that, in many environments, the CPU-time on different nodes of a parallel machine may be shared among a number of running programs and thus automatically uses the resources as they become available.

V. EXAMPLE APPLICATION
The RASSCF program developed and implemented as part of this work, has so far been applied in a small number of projects, including the calculation of the first two excited states of indole.We will in this section denote by RAS(N,M I ϩM II ϩM III )͓MxHole,MxElec͔ a RASSCF calculation with N electrons in an active space of M I RAS1 orbitals, M II RAS2 orbitals and M III RAS3 orbitals.MxHole and MxElec signify the maximum number of particles excited out of the RAS1 space, and into the RAS3 orbital space, respectively.
Indole has two low singlet excited states, 1 L b ͑covalent͒ and 1 L a ͑ionic͒, which are separated by a small energy gap of approximately 0.4 eV.A CASSCF(10,9)/6-31G* calculation overestimates this gap ͑1.20 eV͒, mainly because the energy of the ionic 1 L a state is overestimated.CAS-PT2/ANO calculations by Serrano and Roos 19 reproduce the experimental value.The goal of this study is to give a balanced description of the excited states that includes a approximation to the energy gap of 0.4 eV between the S1 and S2 excited states.
In Table VIII and Fig. 9 we illustrate the results of the CASSCF computation.A comparison with experimental results in Table VIII clearly shows that the results of the CASSCF computation are not only quantitatively, but also qualitatively, incorrect.According to the experimental data, the two states of indole ( 1 L a and 1 L b ) have similar 0-0 transitions ͑emissions from the minimum for the state to the ground-state minimum͒.Therefore, the potential energy surface probably has two different minima on the surface of S1: one for each state.Presumably there is a conical intersection ͑surface crossing͒ between S1 and S2, because the 1 L a state is S2 at the Franck-Condon geometry.At the CASSCF(10,9)/6-31G* level, the surface is wrong: the 1 L a minimum lies on S2.
A common technique used in order to enhance qualitatively the description of electronic states is to ''double'' the active space of -orbitals in a CASSCF computation.The corresponding CASSCF͑10,18͒ in the case of indole is, however, outside the scope of practical computation.The RASSCF treatment of this enlarged active space on the other hand is absolutely practical.A second possibility for qualitatively improving the wavefunction is to include -orbitals into the set of active orbitals.
Preliminary computations using the 3-21G basis set were performed in order to identify the most suitable RAS active space definition.In a first test the active space was doubled by including the 9 2p orbitals of AЉ symmetry ͑a -system used in the CASSCF computation͒ and the 9 3p orbitals of AЉ symmetry.Ten of these orbitals were placed in the RAS3 subspace, and the 3 nearly doubly occupied orbitals from the CASSCF computation in the RAS1 subspace.This corresponds to a CAS͑4,5͒ reference space in RAS2.Two computations were performed, one with MxHoleϭMxElecϭ2 and one with MxHoleϭMxElecϭ1.In a second test the active FIG.9. Potential energy surfaces ͑S1,S2͒ of indole, calculated at the CASSCF level ͓CAS(10,9)/6-31G*͔.orbital space from the first test was augmented by 10 -orbitals and *-orbitals corresponding to the 10 CC and CN -bonds.The RAS1 subspace now contained 13 orbitals, RAS2 5 orbitals and RAS3 20 orbitals, and excitations out of RAS1 and into RAS3 were limited to 1 particle each.Again the RAS2 subspace translates into a CAS͑4,5͒ reference space.In Table IX we summarize the computations and their results.It is clear from these preliminary results that inclusion of -orbitals and single excitations relative to the reference space are particularly important.The strategy from now on was ͑a͒ to increase the basis to 6-31G*, ͑b͒ to reduce the RAS space by eliminating the MOs with occupancies closest to 2.00 and 0.00, and vary the RAS2 subspace, and ͑c͒ to improve the energies with the 6-311ϩG* basis set.Table X shows the orbital occupancies resulting from step ͑a͒.A number of orbitals that are virtually doubly occupied or unoccupied are excluded from the active space and the orbital occupancies from the resulting RAS(20,8ϩ5ϩ11)͓1,1͔ are also shown in Table X.The excitation energies obtained from these calculations are shown in Table XI.In general, all calculated energies for S1 are 0.7-1.4eV higher than the experimental value.However, the energy gap between S1 and S2 is much more accurate than the CASSCF value of 1.20 eV ͑cf.Table VIII͒.The reduction of the active space has only a small effect, while the use of the larger (6-311ϩG*) basis set improves the results substantially.
Figure 10 shows the potential energy surface as computed at the RAS(20,7ϩ6ϩ11)͓1,1͔/6-31G* level.The results in the figure are qualitatively correct, although the 0-0 energy for the 1 L a state is off by more than 1 eV.In contrast to the vertical excitations, the potential energy surface corrected with the 6-311ϩG* basis is worse than the ones with 6-31G*.The reason for this is probably that the points were optimized with the latter basis.
The result of this first application of our RASSCF program is encouraging.The improvements of the obtained excitation energies compared to the CASSCF(10,9)/6-31G* is considerable, but the crucial result is the qualitatively correct potential energy surface.The computational cost increases only moderately, with the number of CSFs increasing from 8001 for the CASSCF͑10,9͒ computation, to 53735 for the RASSCF(20,7ϩ6ϩ11)͓1,1͔.On the other hand, exploratory calculations have shown that in order to complete the study of the potential energy surface and optimize the transition structure between the two minima on the S1 surface, extension of the RAS2 subspace is necessary.

VI. SUMMARY
In this paper we have developed an efficient method for updating the CI vector in a RASSCF computation within the iterative Davidson/Lanczos methods.The string based algorithm arises due to two new concepts: ͑i͒ the RAS model space concept ensures a very efficient identification of blocks of contributions to the -vector, and we have made use of the reduced list algorithm previously introduced in Ref. 9 for efficient CASSCF computations.The RAS model space allows the representation of a restricted space problem in an unrestricted manner, thus avoiding the complications inherent in the RASSCF method.͑ii͒ Propagation rules are used to efficiently construct the string addresses ͑as opposed to the strings͒ in a predefined order.Propagation rules are a simple, but very efficient algebraic tool avoiding table-lookups and the construction of contributing strings themselves.
The resulting method is memory efficient, since it utilizes compressed storage of precomputed model excitation lists in the main memory.The construction of the full excitation strings is fully avoided.The algorithm developed follows an integral driven approach.The resulting outer nested loops over model orbital indices w,x,y,z lends itself to efficient parallelization.Every model space index quadruple w,x,y,z represents a list of integrals (i j͉kl), i.e., we have an integral block driven approach.Clusters of index quadruples with variable cluster size ensure adjustable granularity of the parallelism, thus avoiding scaling problems.Communication overhead ͑if run in parallel͒ is minimal through implicit task definition by only one integer.This technique also leads to a natural load balancing.
The algorithm is implemented in the current development version of the Gaussian package of programs. 20The implementation comprises the option of parallel execution.Running in parallel may be done using distributed memory ͑Linda͒ or shared memory, or a combination of these.As well as the most general case of Slater determinants, we also implemented the straightforward simplifications for singlets and triplets using Hartree Waller functions.
A test example application has been performed with the current version of the program.The results show that the RASSCF method is a useful tool allowing us to approach problems that are currently not tractable with the CASSCF method.

FIG. 1 .
FIG.1.͑a͒: ␣-string graph for N ␣ ϭ4 electrons in M ϭ8 orbitals.Every possible path from the bottom ͑foot͒ to the top ͑head͒ of the graph corresponds to an ␣-string.The orbitals are ordered and at each orbital level an ␣-electron may be added.At each vertex in the graph there are up to two paths upwards, corresponding to the next ␣-spin orbital being occupied ͑sloped path͒ or unoccupied.A lexical addressing of the strings is achieved by using Handy's addressing array ͑Ref.12͒ ͓cf.Eqs.͑8͒ and ͑9͔͒.The numbers on the slopes are the arc weights, Z(k,l), and the address of any string is obtained simply by taking the sum of the arc weights.This addressing scheme corresponds to a strict left-to-right ordering of the strings.͑b͒ The bold line represents the walk corresponding to the binary string 11001001.The corresponding string address is 31.

FIG. 4 .
FIG.4.An example of a RAS ␣-string consisting of four orbitals in each RAS subspace, and its representing model space string (MxHoleϭ2, MxElecϭ2).The model string is constructed using the RAS2 substring and parts of the RAS1 and RAS3 substrings, of dimension MxHole and MxElec, respectively.The RAS model string has no occupancy restricted subspaces.

͑28͒͑30͒
where b n is the bit value of the nth bit in the binary representation of K m , K m.The sums represent the number of occupied orbitals ͑i.e., b n ϭ1) between orbitals w and x.K m and L m are constructed by inserting the appropriate bits b w ,b x into the binary representation of a reduced string, consisting of M m Ϫ2 bits, with N m Ϫ1 1s.The list of all such reduced strings in lexical order is denoted L N m Ϫ1 M m Ϫ2 .Similarly, for 2e excitations a model space excitation list X w y x z is defined, containing all string pair addresses A͕K m ͖, A͕L m ͖ and sgn wxyz K m for which the relation ͗K m ͉a ˆw † a ˆy † a ˆz a ˆx ͉L The model string pairs K m ,L m are obtained by inserting the bits b w ,b x ,b y ,b z into the corresponding reduced string.To accommodate for all possible 1e and 2e excitations we need only five reduced lists, which may be precomputed:

FIG. 5 .
FIG. 5. A graph representing the complete ␣-model string space; M m ϭM (RAS2)ϩMxHoleϩMxElec; ͓ M (RAS2)ϭ4, MxHoleϭMxElecϭ2͔ and N ␣ m ϭ4.The model string graph resembles a CAS string graph, due to the absence of occupancy restrictions in the RAS model space.

Figure 7
Figure 7 shows an example for the reconstruction of one string pair K ␣ m ,L ␣ m , for the model space excitation a ˆ6␣ † a ˆ3␣ and using an arbitrary reduced model string.The resulting walks for K ␣ m and L ␣ m form a loop on the model string graph.For the assembly of the model excitation list X 3␣ 6␣ the complete set of walks of reduced model strings ͑in lexical order͒, i.e., all entries of L N m Ϫ1 If in contrast b a b aϪ1 , the string address has changed by a number equal to the number of binary permutations of all bits to the left of position a, represented by the binomial coefficient ( N left (a) M (RASX)Ϫa ),where N left (a) simply denotes the number of 1Јs to the left of position a.This is again due to the nature of Handy's index equation.If b a ϭ1; b aϪ1 ϭ0 the string address rises, if b a ϭ0; b aϪ1 ϭ1 the string address lowers, i.e., ⌬aϭ1: ⌬A͕K RASX ͖ϭ͑b a Ϫb aϪ1 ͒ͩ M ͑ RASX ͒Ϫa N left ͑ a ͒ ͪ .͑32͒ These two propagation rules indicate an efficient algorithm for the computation of the remaining RAS1/RAS3 string addresses, if the first address of a given list is known.
͗K m ͉a ˆw † a ˆx ͉L m ͘ result in the same contributing term ͗K ͉a ˆi † a ˆj ͉L ͘, through an application of the propaga- tion rules.The example shown here illustrates the problem: the two model space excitations ͗K m ͉a ˆ7 † a ˆ6 ͉L m ͘ and ͗K m ͉a ˆ8 Fig. 8͒.It is then straightforward to see that the only restrictions on the values of MxHole and MxElec are 0рMxHoleр2M ͑ RAS1 ͒, 0рMxElecр2M ͑ RAS3 ͒.

-
Do loop over reduced model ␣-string list L N ␣ m Ϫ2 M m Ϫ4 -Insert bits b w ,b x ,b y ,b z into reduced model string and form model excitation string(→ K ␣ m ,L ␣ m ) -If K ␣ m ,L ␣ m areinvalid, jump to loop end ͓this is to avoid double contributions; cf.Sec.II C͔ →sgn ␣

2 .
The 2e excitation "␣␤ part… contribution to the -vector -Do (parallel) loop over model space index quadruples w,x,y,z, ͓cf.Eq. ͑36͔͒ -Integral list ͕(i j͉kl) ͖ is already known from ␣␣ part.-Do loop over reduced model ␣-string list L N ␣ m Ϫ1 M m Ϫ2 -Insert bits b w ,b . 7. The construction of the RAS model string pair K ␣ m ,L ␣ m from a reduced model string, for the model space excitation a ˆ6␣ † a ˆ3␣ .The string K ␣ m is represented by the left walk of the loop. FIG

TABLE I .
The addresses of RAS1/RAS3 substrings for consecutive reduced strings are themselves consecutive numbers.This follows directly from the strict left-to-right ordering of strings on the corresponding unrestricted substring graph.

TABLE II .
Propagation of the variable bits from left to right through all bit-positions of a fixed reduced string ͑01011͒ results in a change of substring address only if the two interchanged bits, b a and b aϪ1 , differ.The number by which the address changes corresponds to a precomputable binomial coefficient.

TABLE V .
Number of double excitations x→w, z→y between RAS model orbital subspaces; analytically and numerically for different RAS2 orbital subspaces.͓Constructed using Tables III and IV and Eq.͑36͔͒.

TABLE VII .
The number of independent parallel tasks, defined as unique model space index quadruples, w,x,y,z, for various RAS2 orbital subspaces (MxHoleϭMxElecϭ2).

TABLE IX .
Results of preliminary RAS computations of the first two singlet excited states of indole.The 3-21G basis set was used for these computations.

TABLE VIII .
CASSCF and CAS-PT2 ͑Ref.19͒ results for indole.For the source of experimental results see the references in ͑Ref.19͒.