Select-divide-and-conquer method for large-scale configuration interaction

A select-divide-and-conquer variational method to approximate configuration interaction CI is presented. Given an orthonormal set made up of occupied orbitals Hartree-Fock or similar and suitable correlation orbitals natural or localized orbitals , a large N-electron target space S is split into subspaces S0 ,S1 ,S2 , . . . ,SR. S0, of dimension d0, contains all configurations K with attributes energy contributions, etc. above thresholds T0 T 0 ,T 0 ; the CI coefficients in S0 remain always free to vary. S1 accommodates Ks with attributes above T1 T0. An eigenproblem of dimension d0+d1 for S0+S1 is solved first, after which the last d1 rows and columns are contracted into a single row and column, thus freezing the last d1 CI coefficients hereinafter. The process is repeated with successive Sj j 2 chosen so that corresponding CI matrices fit random access memory RAM . Davidson’s eigensolver is used R times. The final energy eigenvalue lowest or excited one is always above the corresponding exact eigenvalue in S. Threshold values T j ; j=0,1 ,2 , . . . ,R regulate accuracy; for large-dimensional S, high accuracy requires S0+S1 to be solved outside RAM. From there on, however, usually a few Davidson iterations in RAM are needed for each step, so that Hamiltonian matrix-element evaluation becomes rate determining. One hartree accuracy is achieved for an eigenproblem of order 24 106, involving 1.2 1012 nonzero matrix elements, and 8.4 109 Slater determinants. © 2006 American Institute of Physics. DOI: 10.1063/1.2207621


I. INTRODUCTION
Quantum chemistry largely revolves around the development and application of methods to approximate Schrödinger's equation for stationary states, H⌿ = E ⌿ .͑1͒ Orbital basis set methods provide the most general approach 1 when the number of active electrons is not too large. 2After selection of a suitable orbital set, which is essential to obtain meaningful chemical or physical results, 3 the main question is what to do with a computationally intractable representation H of the Hamiltonian in the given orbital set, ][6][7][8][9][10] Within the CI approaches, highly correlated configuration interaction 11 ͑HCCI͒ methods, viz., CI going well beyond the singles and doubles treatment, constitutes a practical alternative to using a full Hamiltonian representation, particularly after the developments in the companion paper 12 to be referred to as I.
One possibility is to select a priori a subspace S having invariant properties with respect to separate unitary transformations of the occupied orbitals and of distinct sets of cor-relation orbitals grouped in a well-defined manner, 11 giving rise to multireference CI ͑MRCI͒, [13][14][15][16][17][18] complete active space ͑CAS͒ CI, 19 restricted active space ͑RAS͒ CI, 20 and so on. 21he corresponding CI-matrix eigenvalue problem is where the subindex was dropped in the understanding that the following also applies to excited states.Equation ͑3͒ is often solved by Davidson's method. 22,23Typically, an accuracy of 10 −8 a.u.for a ground state requires 40-60 iterations dominated by the evaluation of a vector , = Hv, ͑4͒ in terms of an approximate eigenvector v.In a basis of N-electron symmetry eigenfunctions or configuration-statefunctions ͑CSFs͒ the corresponding H s are given by lengthy formulas whose application may require hundreds of computer processor cycles, justifying to store them in disk memory after being evaluated only once.Thus in a CSF framework, the upper-limit size for evaluation of Eq. ͑4͒ is determined by the capacity of the disks employed and by the willingness to spend comparatively large amounts of time to retrieve H while the computer processors remain idle or are used by competing programs.Alternatively, for determinantal CI spaces, each nonzero H can be evaluated in a few cycles of computer time 11 so that the entire Hamiltonian matrix can be recalculated on-the-fly as needed for each iteration of Eq. ͑4͒ therefore overcoming the necessity to store H s in disk.This was, in fact, the reason behind the move from CSFs to determinants about two decades ago. 20,24oncurrently, and departing from invariant spaces, another possibility was advanced in the previous paper 12 where it was shown that a model space M can be split a priori into a selected space S harboring the energetically most important configurations, and a remainder whose energy effect ⌬E can be predicted with fair accuracy through variational-like estimates before embarking on an HCCI calculation.The corresponding energy E M is given by where ␦E is a residual error that can be estimated after some work. 12Whatever HCCI method is invoked, the energy of a stationary state can be approximated by the rhs of Eq. ͑5͒ which in turn demands the solution of ͑3͒.In order to obtain small and reliable ⌬E and ␦E values, however, it is still necessary to deal with S spaces which are too large to handle by modern HCCI. 11f the input-output bottleneck of the eigenproblem 25 is overcome, a return to CSFs has the following advantages: ͑i͒ the number of nonzero H s is significantly reduced, resulting in a corresponding reduction of the number of arithmetic operations to evaluate Eq. ͑4͒, ͑ii͒ by diminishing the size of vectors and v by several orders of magnitude, data localization is improved thus speeding up data transit between memory hierarchies, and ͑iii͒ excited states are readily and unequivocally identified.While retaining Davidson's or similar algorithms, 26 this paper presents a select-divide-andconquer variational method ͑SDC-CI͒ for the specific solution of the atomic or molecular CI-matrix eigenvalue problem with little or no resort to external storage devices.After introducing CI notation and a general strategy to select configurations in Sec.II, SDC-CI is formulated in Sec.III.An example and a general discussion are given in Sec.IV.

A. Motivation
Let S be a target space calling for a CI treatment.It is convenient that S be made up of CSFs.These are built up from a suitable orthonormal set of occupied orbitals.As before, H S is the representation of the Hamiltonian in S; Here we have rewritten Eq. ͑3͒ just to emphasize a new and necessary requirement ͓not present in Eq. ͑3͔͒.The use of natural orbitals 27,28 or localized orbitals 29 as correlation orbitals, in order to facilitate an a priori and quantitative selection and deletion of CSFs. 12Distinct from general eigenproblems, Eq. ͑6͒ is a very special eigenvalue problem in which various eigenvector components are related among themselves in a way which can be anticipated 12 before knowing its solution.This circumstance allows the subdivision of S into subspaces S 0 , S 1 , S 2 , ... ,S R each characterized by configurations with decreasing importance in their energy contributions.Other criteria to select S i subspaces will also be discussed.

B. CI notation and Brown's formula
A general HCCI wave function can be written as 30 In Eq. ͑7͒, K and g label configurations and degenerate elements, respectively, and C gK denotes a CI coefficient.F gK is a CSF expressed as a linear combination of Slater determinants D iK , where O͑⌫ , ␥͒ is a symmetric projection operator 31 for all appropriate symmetry operators ⌫ and a given irreducible representation ␥. 32 The calculations in this paper will use the full range of g K degenerate elements, although this is not necessary, in general. 30,33iven a CI wave function ⌿, the energy contribution ⌬E gK of F gK is defined as where ⌿͑−F gK ͒ denotes N͑⌿ − F gK C gK ͒ and N is a normalization factor, such that the wave function ⌿͑−F gK ͒ has the same remaining expansion coefficients as ⌿ except for renormalization.⌬E gK can be approximated by Brown's formula, 34 where E = ͗⌿͉H͉⌿͘.Equation ͑7͒ is now rewritten as in an obvious notation.In Eq. ͑12͒ and ͑13͒, Sign is the minus sign if the contribution of negative C gK coefficients to the sum of squares is larger than the one provided by the positive coefficients, and vice versa.Unsigned B K coefficients are also of interest. 12Similarly as ⌬E gK in Eq. ͑10͒, ⌬E K for expansion ͑11͒ is given by which is used just for estimating the truncation energy error.A thorough discussion of Eq. ͑14͒ together with predictive formulas for B K coefficients up to sextuply excited configurations is found in Paper I.

C. Energy threshold T egy
Brown's formula provides a useful criterion to select configurations K based on energy thresholds T egy , provided the coefficients B K can be predicted.This happens when K can be formed as a product of combinations of singly and doubly excited configurations, such as any triply or quadruply excited configuration in a MRCI singles and doubles out of a single reference configuration.Such configurations are called disconnected configurations. 12A subdivision of S into subspaces S j , ͑j =0,1,2, ... ,R͒ can be partially characterized by

D. Occupation number threshold T on
The rest of the configurations are called connected configurations.They are selected according to occupation number thresholds T on based on density matrix concepts.As already mentioned, the SDC-CI method may require the correlation orbitals a , b , c to be approximate natural orbitals, 27,28 viz., eigenfunctions a of the reduced first-order density matrix ␥͑1,1Ј͒, where the n a s are the eigenvalues or occupation numbers.Alternatively, in extended molecules, if localized orbitals 29 are used, the occupation numbers n a have to be replaced by effective occupation numbers n a eff , ͑18͒ For each q-excited configuration K the product P͑q , K͒ of corresponding occupation numbers is given by where K i represents either a correlation natural orbital or a localized orbital.The characterization of the subspaces S j , ͑j =0,1,2, ... ,R͒ is enriched by enforcing In ͑20͒ and ͑21͒, T j on may be a number or, more generally, a suitable function of some parameters, where F dh ജ 1 is a deep-hole factor 12 associated to holes involving inner electrons, and F con ജ 1 is a factor for connected configurations. 12The introduction of F dh and F con is to recognize the well characterized families of configurations that are comparatively less important for a given value of P͑q , K͒. 12 The parameter m is shown explicitly on the lefthand side ͑lhs͒ of ͑22͒ for later purposes.10 −m ͑F dh F con ͒ 1/4 may be interpreted as an average occupation number below which configurations involving that natural orbital are deleted from an original model space M.

E. Harmonic truncation threshold T har
A final subdivision of S into subspaces S j , ͑j =0,1,2, ... ,R͒ is specified by a harmonic truncation threshold T j har indicating that subspace S j is truncated in the orbital basis after a given harmonic ᐉ j , Resort to a fraction is dictated by the rule that first occurring subspaces have larger thresholds than subspaces further down the sequence

F. Selection strategy
A subspace S j consists of the following configurations: ͑i͒ All ͑connected and disconnected͒ triples with Here, the occurrence of disconnected triples is due to their carrying connected contributions in the many-body perturbation theory sense. 12ii͒ Of the remaining ͑disconnected͒ triples, those with Here it is assumed that disconnected quadruples have negligible connected contributions; otherwise, they must also enter in ͑iii͒.͑v͒ Quintuple-and higher-excited configurations can be selected according to ͑iii͒ and ͑iv͒.͑vi͒ Finally, for atoms, S j contains configurations with harmonics up to 1 / T j har .A similar threshold for molecules may be provided if warranted.

III. SELECT-DIVIDE-AND-CONQUER CI
In synthesis, the target space S is split into subspaces S 0 , S 1 , S 2 , ... ,S R each characterized by configurations ͕⌽ ij ; i =1,2, ... ,d j ; j =0,1, ... ,R͖ above thresholds ͕T j ϵ T j egy , T j on , T j har ; j =0,1, ... ,R͖: S 0 ϵ ͕⌽ i0 ;i = 1,2, ... ,d 0 ͖, T 0 egy ,T 0 on ,T 0 har , ͑25a͒ Successive S j ͑j ജ 2͒ are chosen so that corresponding CI matrices fit random access memory ͑RAM͒.We start by solving an eigenproblem of dimension d 0 + d 1 in S 0 + S 1 , after which the last d 1 rows and columns are contracted into a single row and column, giving where subspace S 1 was contracted into a single dimensional space c 1 ; of course, E ͑0+1͒ in ͑27͒ is equal to E ͑0+c 1 ͒ in ͑30͒ except for roundoff errors.This amounts to freeze the last d 1 CI coefficients hereinafter.Upon completion of this first step, the resulting matrix of dimension d 0 + 1 is enlarged by adding d 2 rows and columns associated to S 2 , and the process is repeated, H S 0 +s r +S r C ͑0+s r +r͒ = E ͑0+s r +r͒ C ͑0+s r +r͒ , ͑32a͒ until incorporation of subspace S R exhausts the target space S. Each step yields increasingly accurate eigenvalues E ͑0+s r +r͒ converging from above, and wave functions ⌿ ͑r͒ ,

,R͒. ͑33b͒
In practice, the value of R may attain several thousands.In the present computer code, d 0 + r −1+d r is required not to exceed 65 536, so that each index and in H can be stored in only two bytes ͑16 bits͒, a demand that can be lifted for large RAM memories.͑We actually use suitable offsets to extend the range of and values to a large extent past 65 536 while still using two bytes per index.͒Since H s with ഛ d 0 , ഛ d 0 , are present in all CI matrices, the CI coefficients C i0 ͑r͒ in S 0 always remain free to vary.Clearly, the final energy eigenvalue E ͑0+s R +R͒ ͑lowest or an excited one͒ is at or above its exact partner in S, where ␦⑀ is another residual error ͑with positive sign͒ that needs to be assessed.The first two terms of the sequence ͕T j ͖, T 0 , and T 1 , are the main threshold values regulating ␦⑀, as illustrated in the next section.

A. Application and accuracy
SDC-CI interferes with the rigorous solution of the eigenproblem ͑6͒ by freezing most of the linear variational coefficients after a first variational estimate on a relatively small subspace of CSFs.Only the linear variational coefficients of CSFs in S 0 are free to vary all the time.Both the energy and the wave function are affected by these variational constraints; we shall only consider energy effects, exhibited by ␦⑀ in Eq. ͑34͒.
As an example, we choose the CI matrix of the Ne ground state for one of the largest calculations of Paper I. The model space M is a full CISDTQ ͑CI singles, doubles, triples, and quadruples͒ in a 12s12p11d10f10g9h8i7k6l5m4n3o3q3r orbital basis spanning a CSF space of dimension 1.4ϫ 10 9 , and involving 1.1ϫ 10 12 distinct Slater determinants. 12This model space M is pruned before evaluation of symmetry eigenfunctions, as discussed in Paper I; the pruned subspace P is about one sixth the size of M, and carries a truncation energy error about 13 hartree ͑Ref.12͒ with respect to M, which is mentioned just to clarify the general context.
The target subspace S is obtained by truncating the pruned subspace P with the following thresholds: T R egy =10 −11 a.u., and T R on given by Eq. ͑22͒ with m = 6.5 ͑T on = F dh F con •3ϫ 10 −7 ͒.Thus, S is here obtained by the selected CI ͑SCI͒ method. 12As it turns out, subspace S harbours 859 903 configurations, 24.06ϫ 10 6 CSFs, 8.36ϫ 10 9 detors, and gives rise to 1.18ϫ 10 12 nonzero Hamiltonian matrix elements.The truncation energy error accumulated in going from the pruned space P into the selected space S is of no concern here but is carefully discussed in Paper I.
We now focus on assessing the uncertainty ␦⑀ in Eq.

͑34͒.
In Table I the SDC-CI energy E ͑0+s R +R͒ ͑fifth column͒ is shown as a function of T 1 on values ͑first column͒, and also as a function of the highest ᐉ value in the orbitals spanning the S 0 and S 1 subspaces ͑second column͒.The remaining perti- nent thresholds are fixed T 0 egy =10 −5 a.u., T 1 egy =10 −8 a.u., and T j egy =10 −11 a.u.͑j =2,3, ... ,R͒; the T j on s are given by Eq. ͑21͒ with m = 2.7 ͑j =0͒, and m = 6.5 ͑j =2,3, ... ,R͒, respectively.The third and fourth columns show the order d 0 + d 1 of the matrix to be treated outside RAM, and the corresponding energy eigenvalue, respectively.It is seen that achieving a low energy eigenvalue E 0+1 in S 0 + S 1 is not a requisite for an accurate energy E ͑0+s R +R͒ .
As thresholds are tightened, the evolution of the energy E ͑0+s R +R͒ towards the exact result E S ͑usually unknown and unattainable with contemporary workstations͒, is from above, of course.Monotonic behavior is generally obtained for fixed values of T 0 har and T 1 har .The most favorable results, viz., those with lowest and most rapidly convergent energies, occur for T har =1/ᐉ =1/4, and are used to push down T on hoping to reach adequate convergence, as shown in lines 2-5 of Table I.These results suggest an accuracy better than 1 hartree ͑␦⑀ =1 hartree͒, which is also supported by smooth energy convergence patterns in Paper I, at the onetenth of 1 hartree level.This is more precise than the few hartree typically used to assess convergence in full CI benchmarks. 35he occurence of energy oscillations with decreasing values of T har shows that significant interactions are being frozen at various steps, thus imposing constraints which impair the full extent of the variational procedure to the degree shown in the last column of Table I.The problem is finding an appropriate way to select a priori those sets of configurations that must be taken together in order to achieve a desired accuracy.Our approach here has been circumscribed to vary T 1 on and T 1 har thresholds after reaching convergence with the less sensitive thresholds T 1 egy and T 0 .Another obvious criterion ͑not considered͒ is to limit natural orbitals in S 1 to those with occupation numbers above some threshold.At any rate, there is still much to be learnt about accuracy control in SCI-CI.
The studies of Table I were repeated by changing the occupied orbitals from Hartree-Fock to Brueckner ones, 36,37 viz., orbitals for which the coefficients of singly excited configurations are zero.Brueckner orbitals were approximated in the framework of a CISD using a general method, 38 valid for the ground state, and also for excited states.Unfortunately, the results were similar to those shown in Table I, the energy converging 24 hartree above the one obtained with the Hartree-Fock orbitals, indicating a higher energy limit for CISDTQ of Ne ground state when calculated in terms of Brueckner orbitals instead of the Hartree-Fock orbitals.

B. Further thoughts
Since Davidson's eigensolver is used R times, the usual threshold for energy convergence in Davidson's algorithm must be divided by R, viz., for an accuracy of 10 −8 a.u. and R = 1000, each Davidson application must be required to converge to within 10 −11 a.u.
High accuracy may require a large value of d 0 + d 1 demanding the eigenproblem in S 0 + S 1 to be solved outside RAM.That, however, only affects the first iteration.From there on, all work proceeds in RAM.Also, in general, just a few Davidson iterations are needed for each step, even with such tight energy convergence threshold as 10 −11 a.u.for each Davidson application, since the most important energy contributions and eigenvector components are already in S 0 and S 1 .Moreover, because of the relatively small dimension of the vectors and v in ͑4͒, the data set is considerably more localized.Profiles of program execution running 13 h entirely in RAM ͑Ref.12͒ show that about 8% of the time is spent on Eq. ͑4͒ ͑3% on code in RAM and 5% on code using disk͒, the rest being used mainly in the evaluation of H s.

C. Comparison with previous work and outlook
The idea to freeze CI coefficients is not a new one.Back in 1969, one of the authors ͑Bunge͒ was asked 39 whether it would be sensible to use the configurational expansion Eq. ͑7͒ in small chuncks, then lock C gK coefficients ͑g =1,2, ... ,g K ͒ by Eq. ͑12͒, and finally use the corresponding configurations in the less general Eq.͑11͒.Formally, after some generalization, the idea can be embodied in Eqs.͑27͒-͑33͒.In retrospective, had Schaefer's suggestion been seriously pursued, the ingredients of SCI to achieve acceptable accuracy might have appeared shortly afterwards, probably following Kutzlenigg's review on electron-pair theories 40 and Shavitt's review on CI ͑Ref.41͒ in consecutive chapters of Schaefer's volume 3 of Modern Theoretical Chemistry.More recently, the other author ͑Carbó-Dorca͒ played with the idea of avoiding the eigenproblem altogether; 42 that idea was also short of providing the necessary accuracy.These two ideas, however, met with SCI needs and SCI possibilities, giving way to SDC-CI.
In an effort to lower the size of MRCI expansions, internally contracted MRCI was proposed. 43Meyer's and similar approaches 11 using perturbation theory estimates of various CI coefficients are the nearest relatives of SDC-CI.The latter is more accurate at the expense of having to evaluate all pertinent matrix elements.We encourage young workers to apply perturbation theory 44,45 to SDC-CI in an effort to search for a middle ground between our's and Meyer's ideas, where not all matrix elements would be necessary.
Other investigations are also warranted: ͑i͒ to reduce ␦⑀ in Eq. ͑34͒, ͑ii͒ to lessen the computational time to evaluate H s, and ͑iii͒ to run simultaneously on many processors.With regard to ͑i͒, more flexible selections of subspaces have to be examined.Wholesale evaluation of H s, on the other hand, is open to several theoretical alternatives: for atoms, 33,46 and for molecules, 11,33 and a wealth of computer implementations yet to be explored.

D. Conclusions
Typical H S matrices encountered in SCI need tens or hundreds of terabytes of disk to store expensive-to-evaluate H s, and considerable disk retrieval performance for each Davidson iteration. 47,48A general eigensolver has to face the full implications of such disk-read bottleneck, 25,48 and pressing demands through memory hierarchies to access widely scattered elements of the vectors and v in the evaluation of ͑4͒.The computationally tractable SDC-CI method, on the other hand, overcomes the input-output bottleneck and part of the scattered data problem, and can be applied to a target space S provided the orbital set is made up of Hartree-Fock ͑or Brueckner͒ occupied orbitals, plus approximate natural orbitals ͑or localized orbitals͒.The price paid by SDC-CI is a small loss of accuracy relative to general algorithms 22,23 when both methods can be used.This small loss of accuracy, however, is more than compensated by the SDC-CI capability to deal with CI matrices having trillions of expensive-toevaluate nonzero matrix elements between CSFs, 12 much larger than ever attempted by other methods on a single processor, and comparable with a recent calculation on 432 processors. 48CI ͑Ref.12͒ and SDC-CI are intimately intertwined: without SDC-CI the accuracy of SCI would fall short by several orders of magnitude, and without SCI, SDC-CI cannot even be formulated.For this reason, we believe that all HCCI methods eventually making use of SDC-CI will ultimately be assimilated into an SCI framework.The reverse, however, is not true, selection of configurations finds application in many other methods that do not require a variational solution. 12

TABLE I .
Convergence of the Ne ground state energy with T 1 on and T 1