NWChem: Past, Present, and Future

Specialized computational chemistry packages have permanently reshaped the landscape of chemical and materials science by providing tools to support and guide experimental efforts and for the prediction of atomistic and electronic properties. In this regard, electronic structure packages have played a special role by using first-principledriven methodologies to model complex chemical and materials processes. Over the last few decades, the rapid development of computing technologies and the tremendous increase in computational power have offered a unique chance to study complex transformations using sophisticated and predictive many-body techniques that describe correlated behavior of electrons in molecular and condensed phase systems at different levels of theory. In enabling these simulations, novel parallel algorithms have been able to take advantage of computational resources to address the polynomial scaling of electronic structure methods. In this paper, we briefly review the NWChem computational chemistry suite, including its history, design principles, parallel tools, current capabilities, outreach and outlook.


I. INTRODUCTION
The NorthWest Chemistry (NWChem) modeling software is a popular computational chemistry package that has been designed and developed to work efficiently on massively parallel processing supercomputers [1][2][3] . It contains an umbrella of modules that can be used to tackle most electronic structure theory calculations being carried out today. Since 2010, the code is distributed as open-source under the terms of the Educational Community License version 2.0 (ECL 2.0).
Electronic structure theory provides a foundation for our understanding of chemical transformations and processes in complex chemical environments. For this reason, accurate electronic structure formulations have already permeated several key areas of chemistry, biology, biochemistry, and materials sciences, where they have become indispensable elements for building synergies between theoretical and experimental efforts and for predictions. Over the last few decades, intense theoretical developments have resulted in a broad array of electronic structure methods and their implementations, designed to describe structures, interactions, chemical reactivity, dynamics, thermodynamics, and spectral properties of molecular and material systems. The success of these computational tools hinges upon several requirements regarding the accuracy of many-body models, reliable algorithms for dealing with processes at various spatial and temporal scales, and effective utilization of ever-growing computational resources. For instance, a) Electronic mail: karol.kowalski@pnnl.gov b) Current Affiliation: Schrödinger, Inc, New York, NY 10036, USA c) Current Affiliation: Max Planck Institute für Kohlenforschung, 45470 Mülheim an der Ruhr, Germany d) Current Affiliation: Gaussian Inc., Wallingford, CT 06492, USA the predictive power of computational chemistry requires sophisticated quantum mechanical approaches that systematically account for electronic correlation effects. Therefore, the design of versatile electronic structure codes is a major undertaking that requires close collaboration between experts in theoretical and computational chemistry, applied mathematics, and computer science.
NWChem [2][3][4][5][6][7][8] , like other widely used electronic structure programs, was developed to fully realize the potential of computational modeling to answer key scientific questions. It provides a wide range of capabilities that can be deployed on supercomputing platforms to solve two fundamental equations of quantum mechanics 9-11 -time-independent and time-dependent Schrödinger equations: ih and a fundamental equation of Newtonian mechanics where forces F i include information about quantum effects. Given the breadth of electronic structure theory, it does not come as a surprise that equations (1)-(2) can be solved using various representations of quantum mechanics employing wavefunctions (|Ψ ), electron densities (ρ( r)), or self-energies (Σ(ω)), which comprise the wide spectrum of NWChem's functionalities to compute the electronic wavefunctions, densities, and associated properties of molecular and periodic systems. These functionalities include Hartree-Fock [12][13][14][15] self-consistent field (SCF) and post-SCF correlated many-body approaches that build on the SCF wavefunction to tackle static and dynamic correlation effects. Among correlated approaches, NWChem offers second-order Møller-Plesset perturbation theory; single-and multi-reference, ground-and excited-state, and linearresponse coupled-cluster (CC) theories; multi-configuration self-consistent field (MCSCF); and selected and full configuration interaction (CI) codes. NWChem provides extensive density functional theory [16][17][18] (DFT) capabilities with Gaussian and plane-wave basis set implementations. Within the Gaussian basis set framework, a broad range of DFT response properties, ground and excited-state molecular dynamics, linear-response (LR) and real-time (RT) timedependent density functional theory (TDDFT) are available. The plane-wave DFT implementations offer the capability to run scalable ab initio and Car-Parrinello molecular dynamics 19 , and band-structure simulations. The plane-wave code supports both norm-conserving [20][21][22] and projector augmented wave (PAW) 23 pseudopotentials.
For all DFT methods outlined above, both analytical or numerical gradients and Hessians are available for geometry optimization and vibrational analysis. Additionally, NWChem is capable of performing classical molecular dynamics (MD) simulations using either AMBER or CHARMM force fields. Through its modular design, the ab initio methods can be coupled with the classical MD to perform mixed quantum mechanics and molecular mechanics simulations (QM/MM). Various solvent models and relativistic approaches are also available, with the spin-orbit contribution being only supported at the Hartree-Fock (HF) and DFT levels of theory and associated response properties. The NWChem functionality described is only a subset of its full capabilities. We refer the reader to the NWChem website 8 to learn about the full suite of functionalities available to the user community.
Currently, NWChem is developed and maintained primarily by researchers at the Department of Energy (DOE) Pacific Northwest National Laboratory (PNNL), with help from researchers at other research institutions. It has a broad user base, and it is being used across the national laboratory system and throughout academia and industry around the world. In this paper, we provide a high-level overview of NWChem's core capabilities, recent developments in electronic methods, and a short discussion of ongoing and future efforts. We also illustrate the strengths of NWChem stemming from the possibility of seamless integration of methodologies at various scales and review scientific results that would not otherwise be obtainable without using its highly-scalable implementations of electronic structure methods.

II. BRIEF HISTORY
The NWChem project [1][2][3][4][5][6][7]24,25 started in 1992. It was originally designed and implemented as part of the construction project associated with the EMSL user facility at PNNL. Therefore, the software project started around four years before the EMSL computing center was up and running. This raised challenges for the software developers working on the project, such as predicting the features of future hardware architectures and how to deliver high performing software, while maintaining programmer productivity. Overcoming these challenges led to a design effort that strove for flexibility and extensibility, as well as high-level interfaces to functionality that hid some of the hardware issues from the chemistry software application developer. Over the years, this design and implementation have successfully advanced multiple science agendas, and NWChem's extensive code base of more than 2 million lines provides high-performance, scalable software code with advanced scientific capabilities that are used throughout the molecular sciences community.
NWChem is an example of a co-design effort harnessing the expertise of researchers from multiple scientific disciplines to provide users with computational chemistry tools that are scalable both in their ability to treat large scientific computational chemistry problems efficiently and in their use of computing resources from high-performance parallel supercomputers to conventional workstation clusters. In particular, NWChem has been designed to handle • biomolecules, nanostructures, interfaces, and solidstate, • chemical processes in complex environments, • hybrid quantum/classical simulations, • ground and excited-states and non-linear optical properties, • simulations of UV-Vis, photo-electron, X-ray spectroscopies, • Gaussian basis functions or plane-waves, • ab-initio molecular dynamics on the ground and excited states, • relativistic effects.
The scalability of NWChem has provided a computational platform to deliver new scientific results that would be unobtainable if parallel computational platforms were not used. For example, NWChem's implementation of a non-orthogonally spin adapted CCSD(T) method has been demonstrated to scale to 210,000 processors available at the Oak Ridge National Laboratory's (ORNL) Leadership Computing Facilities, [26][27][28] whereas the plane-wave DFT code has been able to utilize close to 100,000 processor cores on NERSC's Cray-XE6 supercomputer. 29 Although implemented only for the perturbative part of coupled-cluster with singles and doubles (CCSD) 30 and triples correction (CCSD(T)), 31 NWChem was one of the first computational chemistry codes to have been ported to utilize graphics processing units (GPUs). 32 Several parts of the code have also been rewritten to take advantage of the Intel Xeon Phi family of processors -good scalability and performance have been demonstrated for the ab initio molecular dynamics plane-wave DFT code on the most recent Knights Landing version of the processor. 33,34 The non-iterative triples part of the CCSD(T) method has been demonstrated to scale to 55,200 Intel Phi threads and 62,560 cores through concurrent utilization of CPU and Intel Xeon Phi Knights Corner accelerators. 35

III. DESIGN PRINCIPLES
NWChem has a five-tiered modular architecture. The first tier is the Generic Task Interface. This interface (an abstract programming interface, not a user interface) serves as the mechanism that transfers control to the different modules in the second tier, which consists of the Molecular Calculation Modules. The molecular calculation modules are the high-level programming modules that accomplish computational tasks, performing particular operations using the specified theories defined by the user in the input file. These independent modules of NWChem share data only through a disk-resident database, which allows modules to share data or to share access to files containing data. The third tier consists of the Molecular Modeling Tools. These routines provide basic chemical functionality such as symmetry, basis sets, grids, geometry, and integrals. The fourth tier is the Software Development Toolkit, which is the basic foundation of the code. The fifth tier provides the Utility Functions needed by nearly all modules in the code. These include such functionality as input processing, output processing, and timing. [topsep=0pt, partopsep=0pt] The Generic Task Interface controls the execution of NWChem. The flow of control proceeds in the following steps: 1. Identify and open the input file. 2. Complete the initialization of the parallel environment. 3. Process start-up directives. 4. Summarize start-up information and write it to the output file. 5. Open the run-time database. 6. Process the input sequentially (ignoring start-up directives), including the first task directive. 7. Execute the task. 8. Repeat steps 6 and 7 until reaching the end of the input file or encountering a fatal error condition.
The input parser processes the user's input file and translates the information into a form meaningful to the main program and the driver routines for specific tasks.
As mentioned in step 5 of the task flow control, NWChem makes use of a run-time database to store the main computational parameters. This is in the same spirit of check-pointing features available in other quantum chemistry codes. The information stored in the run-time database can be used at a later time in order to restart a calculation. Restart capabilities are available for most modules. For example, SCF generated files (run-time database and molecular orbitals) can be used either to continue a geometry optimization or to compute molecular properties. The important second and fourth tiers are discussed as part of the subsequent sections.

IV. PARALLEL TOOLS
The design and early development of Global Arrays [36][37][38][39] (GA) toolkit occurred in the same period when the NWChem project started. The GA toolkit, which is the central component of the Software Development Toolkit, was adopted by the NWChem developers as the main approach for the parallelization of the dense matrices present in quantum chemistry methods that make use of local basis functions. In current computer science parlance, Global Arrays can be viewed as a Partitioned Global Address Space (PGAS) model that provides a high level of abstraction for the programmer to the dense distributed arrays. In contrast to message passing constructs such as MPI, where the developer has to worry about coordinating send and receive operations, the use of Global Arrays in NWChem requires so-called single-sided functions (e.g. put, get, accumulate) to manipulate data structures in a single operation. The choice of distribution model for sharing a given global array among the memory available to the processes in use plays a crucial role in efficient parallelization at large scale.
The GA toolkit has been ported to a variety of parallel computer architectures. The porting process has focused in the past in optimizing the ARMCI 40 library. The Aggregate Remote Memory Copy (ARMCI) library optimizes performance by fully exploiting network characteristics such as latency, bandwidth, and packet injection rate through the use of low-level network protocols (e.g. Infiniband Verbs). More recent porting options make use either of ComEx 41 or of the ARMCI-MPI 42 communication runtimes. Both ComEx and ARMCI-MPI make use of MPI libraries, instead of low-level network protocols, albeit with different approaches.

V. MAIN METHODOLOGIES
In this section, we describe the key methods that comprise the Molecular Calculation Modules. We first describe the Gaussian basis HF and DFT implementations for molecular systems. This is followed by the post-SCF wavefunction-based perturbative (MP2), multiconfiguration SCF, and high-accuracy (coupled-cluster theory) approaches for molecules, including the tensor contraction engine (TCE). Molecular response properties and relativistic approaches are then described. The plane-wave based DFT implementation for Car-Parrinello molecular dynamics and periodic condensed phase systems is described next, followed by classical molecular dynamics and hybrid methods.
The most expensive part to compute in the SCF code is the two-electron contribution to the matrix element of the Fock operator (resulting from the sum of Coulomb and Exchange operators). To compute these matrix elements, NWChem developers have implemented parallel algorithms using either a distributed data approach 44 (where the Fock matrix is distributed among the aggregate memory of the processes involved in the calculation) or a replicated data approach (where an entire copy of the Fock matrix is stored in memory of each process).
Several options are available for the initial guess of the SCF calculations. The default choice uses the eigenvectors of a Fock-like matrix formed from a superposition of the atomic densities. Other options include the use of eigenvectors of the bare-nucleus Hamiltonian or the one-electron Hamiltonian, the projections of molecular orbital from a smaller basis to a larger one, or molecular orbitals formed by superimposing the orbitals of fragments of the molecule being studied. Symmetry can be used to speed up the Fock matrix construction via the petite-list algorithm. Molecular orbitals are symmetry adapted as well in NWChem. The resolution of the identity (RI) four-center, two-electron integral approximation has also been implemented. 45 In order to avoid full matrix diagonalization, the SCF program uses a preconditioned conjugate gradient (PCG) method that is unconditionally convergent. Basically, a search direction is generated by multiplying the orbital gradient (the derivative of the energy with respect to the orbital rotations) by an approximation to the inverse of the level-shifted orbital Hessian. In the initial iterations, an inexpensive one-electron approximation to the inverse orbital Hessian is used. Closer to convergence, the full orbital Hessian is used, which should provide quadratic convergence. For both the full or one-electron orbital Hessians, the inverse-Hessian matrix-vector product is formed iteratively. Subsequently, an approximate line search is performed along the new search direction.
Both all-electron basis sets and effective core potentials (ECPs) can be used. Effective core potentials are a useful means of replacing the core electrons in a calculation with an effective potential, thereby eliminating the need for the core basis functions, which usually require a large set of Gaussians to describe them. In addition to replacing the core, they may be used to represent relativistic effects, which will be discussed later.

B. Density Functional Theory
The NWChem DFT module for molecular systems uses a Gaussian basis set to compute closed-and open-shell densities and Kohn-Sham orbitals in the local density approximation (LDA), generalized gradient approximation (GGA), τ-dependent and Laplacian-dependent meta-generalized gradient approximation (metaGGA), any combination of local and non-local approximations (including exact exchange and range-separated exchange), and asymptotically corrected exchange-correlation potentials. NWChem contains energy-gradient implementations of most exchangecorrelation functionals available in the literature, including a flexible framework to combine different functionals. However, second derivatives are not supported for metafunctionals and third derivatives are supported only for a selected set of functionals. For a detailed description, we refer the reader to the online documentation 46 .
The DFT module reuses elements of the Gaussian basis SCF module for the evaluation of the Hartree-Fock exchange and of the Coulomb matrices by using 4-index 2-electron electron repulsion integrals; the formal scaling of the DFT computation can be reduced by choosing to use auxiliary Gaussian basis sets to fit the charge density 47 and use 3-index 2-electron integrals, instead.
The DFT module supports both a distributed data approach and a mirrored arrays 48 approach for the evaluation of the exchange-correlation potential and energy. The mirrored arrays option, used by default, allows the calculation to hide network communication overhead by replicating the data between processes belonging to the same network node.
In analogy with what is available in the SCF module, the DFT module can perform restricted closed-shell, unrestricted open-shell, and restricted open-shell calculations. However, in contrast to the SCF module that uses PCG to solve the SCF equation, the DFT module implements diagonalization with parallel eigensolvers. 49-54 DIIS (direct inversion in the iterative subspace or direct inversion of the iterative subspace) 55 , level-shifting 56,57 and density matrix damping can be used to accelerate the convergence of the iterative SCF process. Another technique that can be used to help SCF convergence makes use of electronic smearing of the molecular orbital occupations, by using a gaussian broadening function following the prescription of Warren and Dunlap 58 . Additionally, calculations with fractional numbers of electrons can be performed to analyze the behavior of exchange-correlation functionals and their impact on molecular excited states and response properties. [59][60][61][62][63][64][65][66] The Perdew and Zunger 67 method to remove the selfinteraction contained in many exchange-correlation functionals has been implemented 68 within the Optimized Effective Potential (OEP) method 69,70 and within the Krieger-Li-Iafrate (KLI) approximation. 71,72 The asymptotic region of the exchange-correlation potential can be modified by the van-Leeuwen-Baerends exchange-correlation potential that has the correct − 1 r asymptotic behavior. The total energy is then computed using the definition of the exchange-correlation functional. This scheme is known to tend to over-correct the deficiency of most uncorrected exchange-correlation potentials 73,74 and can improve TDDFT-based excitation calculations, but it is not variational. A variationally consistent approach to address this issue is via range-separated exchangecorrelation functionals and the recently developed nearly correct asymptotic potential or NCAP 75 , which are implemented in NWChem.
To describe dispersion interactions, both the exchangehole dipole moment dispersion model (XDM) 76 and Grimme's DFT-D3 dispersion correction (both zerodamped and BJ-damped variants) for DFT functionals 77,78 are available. In many cases, one can obtain reasonably accurate non-covalent interaction energies at van der Waals distances with meta-functionals in NWChem even without adding extra dispersion terms. 79 Numerical integration is necessary for the evaluation of the exchange-correlation contribution to the density functional when Gaussian basis functions are used. The threedimensional molecular integration problem is reduced to a sum of atomic integrations by using the approach first proposed by Becke 80 . NWChem implements a modification of the Stratmann algorithm 81 , where the polynomial partition function w A (r) is replaced by a modified error function erfn (where n can be 1 or 2).
The default quadrature used for the atomic centered numerical integration is an Euler-MacLaurin scheme for the radial components (with a modified Mura-Knowles 82 transformation) and a Lebedev 83 scheme for the angular components.
On top of the petite-list symmetry algorithm used in the same fashion as in the SCF module, the evaluation of the exchange-correlation kernel incurs additional time savings when the molecular symmetry is a subset of the O h point group, exploiting the octahedral symmetry of the Lebedev angular grid.
NWChem also has an implementation of a variational treatment of the one-electron spin-orbit operator within the DFT framework. Calculations can be performed either with an all-electron relativistic approach (for example, ZORA) or with an ECP and a matching spin-orbit (SO) potential.

Time-Dependent Density Functional Theory
a. Linear-Response Time-Dependent Density Functional Theory: NWChem supports a spectrum of single excitation theories for vertical excitation energy calculations, namely, configuration interaction singles (CIS) 92 , time-dependent Hartree-Fock (TDHF or also known as random-phase approximation RPA), time-dependent density functional theory (TDDFT) [93][94][95] , and Tamm-Dancoff approximation 96 to TDDFT. These methods are implemented in a single framework that invokes Davidson's trial vector algorithm (or its modification for a non-Hermitian eigenvalue problem). An efficient special symmetric Lanczos algorithm and kernel polynomial method has also been implemented. 97 In addition to valence vertical excitation energies, corelevel excitations 98 and emission spectra 99,100 can also be computed. Analytical first derivatives of vertical excitation energies with a selected set of exchange-correlation functionals can also be computed, 101 which allows excited-state optimizations and dynamics. Origin-independent optical rotation and rotatory strength tensors can also be calculated with the LR-TDDFT module within the gauge including atomic orbital (GIAO) basis formulation. 62,[102][103][104] Extensions to compute excited-state couplings are currently underway and will be available in a future release.
b. Real-Time Time-Dependent Density Functional Theory: Real-time time-dependent density functional theory (RT-TDDFT) is a DFT-based approach to electronic excited states based on integrating the time-dependent Kohn-Sham (TDKS) equations in time. The theoretical underpinnings, strengths, and limitations are similar to traditional linear-response (LR) TDDFT methods, but instead of a frequency domain solution to the TDKS equations, RT-TDDFT yields a full time-resolved, potentially non-linear solution. Real-time simulations can be used to compute not only spectroscopic properties (e.g., ground and excitedstate absorption spectra, polarizabilities, etc.) 98,105-108 , but also the time and space-resolved electronic response to arbitrary external stimuli (e.g., electron charge dynamics after laser excitation) 105,109 and non-linear spectroscopies. 110,111 RT-TDDFT has the potential to be efficient for computing spectra in systems with a high density of states 112 as, in principle, an entire absorption spectrum can be computed from only one dynamics simulation.
This functionality is developed on the Gaussian basis set DFT module for both restricted and unrestricted calculations and can be run with essentially any combination of basis set and exchange-correlation functional in NWChem. A number of time propagation algorithms have been implemented 113 within this module, with the default being the Magnus propagator. 114 Unlike LR-TDDFT, which requires second derivatives, RT-TDDFT can be used with all the functionals since only first derivatives are needed for the propagation. The current RT-TDDFT implementation assumes frozen nuclei and no dissipation.

Ab Initio Molecular Dynamics
This module leverages the Gaussian basis set methods to allow for seamless molecular dynamics of molecular systems. The nuclei are treated as classical point particles and their motion is integrated via the velocity Verlet algorithm. 115,116 In addition to being able to perform simulations in the microcanonical ensemble, we have implemented several thermostats to control the kinetic energy of the nuclei. These include the stochastic velocity rescaling approach of Bussi, Donadio, and Parrinello 117 , Langevin dynamics according to the implementation of Bussi and Parrinello 118 , the Berendsen thermostat 119 , and simple velocity rescaling.
The potential energy surface upon which the nuclei move can be provided by any level of theory implemented within NWChem, including DFT, TDDFT, MP2, and the correlated wavefunction methods in the TCE module. If analytical gradients are implemented for the specified method, these are automatically used. Numerical gradients will be used in the event that analytical gradients are not available at the requested level of theory. This module has been used to demonstrate how the molecular dynamics based determination of vibrational properties can complement those determined through normal mode analysis, therefore allowing to achieve a deeper understanding of complex dynamics and to help interpret complex experimental signatures. 120 Extensions to include non-adiabatic dynamics have been implemented in a development version and will be available in a future release.

C. Wavefunction Formulations
The wavefunction-based methods play a special role in all electronic structure packages. Their strengths originate in the possibility of introducing, using either various orders of perturbation theory or equivalently through the linked cluster theorem (for example, see Refs. 121 and 122) various ranks of excitations, a systematic hierarchy of electron correlation effects. NWChem offers implementations of several correlated wavefunction approaches including many-body perturbation theory approaches and coupledcluster methods.  125 uses the RI approximation and is, therefore, only exact in the limit of a complete fitting basis. However, with some care, high accuracy may be obtained with relatively modest fitting basis sets. An RI-MP2 calculation can cost over 40 times less than the corresponding exact MP2 calculation. RHF and UHF references with only energies are available.

Multi-configurational Self-Consistent Field (MCSCF)
A large-scale parallel multi-configurational selfconsistent field (MCSCF) method has been developed in NWChem by integration of the serial LUCIA program of Olsen 126,127 . The generalized active space approach is used to partition large configuration interaction (CI) vectors and generate a sufficient number of nearly equal batches for parallel distribution. This implementation allows the execution of complete active space self-consistent field (CASSCF) calculations with non-conventional active spaces. An unprecedented CI step for an expansion composed of almost one trillion Slater determinants has been reported 127 .

Coupled-Cluster Theory
The coupled-cluster module of NWChem contains two classes of implementations (a) parallel implementation of the CCSD(T) formalism 31 for closed-shell systems, and (b) a wide array of CC formalisms for arbitrary reference functions. The latter class of implementations automatically generated by Tensor Contraction Engine 128,129 is an example of a successful co-design effort.
a. Closed-Shell CCSD(T): The coupled-cluster method was introduced to chemistry byČížek 130 (see also Ref. 131), and is a post-Hartree-Fock electron correlation method. Development of the canonical coupled-cluster code in NWChem commenced in 1995 under a collaboration with Cray Inc to develop a massively parallel coupled-cluster program designed to run on a Cray T3E. Full details of the implementation are given in Kobayashi and Rendell 132 .
The coupled-cluster wavefunction is written as an exponential of excitation operators acting on the Hartree-Fock reference: where T = T 1 + T 2 + ... is a cluster operator represented as a sum of its many-body components, i.e., singles T 1 , doubles T 2 , etc. and |Φ is the so-called reference function (usually chosen as a reference determinant). In practical applications the above sum is truncated at some excitation rank. For example, the CCSD method 30 is defined by including singles and doubles, i.e., T T 1 + T 2 . Introducing the exponential ansatz (4) into the Schrödinger equation, premultiplying both sides by e −T , using the Hausdorff formula, and projecting onto the subspace of excitation functions, gives a set of coupled non-linear equations that are solved iteratively to yield the coupled-cluster energy and amplitudes. For example, for the CCSD formulation one obtains where H N is the electronic Hamiltonian in normal product form (H N = H − Φ|H|Φ ), subscript C represents a connected part of a given operator expression, and ∆E CCSD is CCSD correlation energy. The closed-shell CCSD implementation employs the optimized form of the CC equations discussed by Scuseria et al. 133 as was programmed in the TITAN program 134 . The nature of the Cray T3E hardware required significant rewriting of earlier coupled-cluster algorithms to take into account the limited memory available per core (8 MW) and the prohibitive penalty of I/O operations. Of the various four indexed quantities, those with four occupied indices were replicated in local memory (i.e. the memory associated with a single core), and those with one or two virtual indices were distributed across the global memory of the machine (i.e. the sum of the memory of all the processors), and accessed in computational batches. The terms involving integrals with three and four virtual orbital indices still proved too costly for the available memory and to circumvent this problem, these terms were evaluated in a "direct" fashion. This structure distinguishes NWChem from most other coupled-cluster programs. Thus, to make effective use of the available memory, as much as possible should be allocated, by using global arrays, with the bare minimum for the arrays replicated in local memory. The canonical CCSD implementation in NWChem also contains the perturbative triples correction, denoted (T), of Raghavachari et al. 31 . This correction is an estimate from Møller-Plesset perturbation theory 123 and evaluates the triples contribution to MP4 using the optimized cluster amplitudes at the end of a CCSD calculation. The CCSD(T) method is commonly referred to as the gold standard for ab initio electronic structure theory calculations. Its computational cost scales as n 7 , making it considerably more expensive than a CCSD calculation. However, the triples are non-iterative and only require two-electron integrals with at most three virtual orbital indices, hence avoiding the previous memory and I/O issues and so the correction was easily adapted from the "aijkbc algorithm" of an earlier work by Rendell et al 135 .
In recent years, a great deal of effort was invested to enhance the performance of the iterative and non-iterative parts of the CCSD(T) workflow. Performance tuning of the iterative part resulted in scaling the code up to 223,200 processors of the ORNL Jaguar computer. 26,136 Significant speedups for the CCSD iterative part were achieved by introducing efficient optimization techniques to alleviate the communication bottlenecks caused by a copious amount of communication requests introduced by a large class of low-dimensionality tensor contractions. This optimization provided a significant 2-to 5-fold performance increase in the CCSD iteration time depending on the problem size and available memory, and improved the CCSD scaling to 20,000 nodes of the NCSA Blue Waters supercomputer 137 .
b. Tensor Contraction Engine and High-Accuracy Formulations: NWChem implements a large number of highrank electron-correlation methods for the ground, excited, and electron-detached/attached states as well as for molecular properties. The underlying ansatzes span configuration interaction (CI), coupled-cluster (CC), many-body perturbation theories (MBPT), and various combinations thereof. A distinguishing feature of these implementations is their uniquely forward-looking development strategy. These parallel-executable codes, as well as their formulations and algorithms, were computer-generated by the symbolic algebra program 138 called Tensor Contraction Engine (TCE). 128 TCE was one of the first attempts to provide a scalable tensor library for parallel implementations of many-body methods, which extends the ideas of automatic CC code generation introduced by Janssen and Schaefer, 139 Li and Paldus, 140 and Nooijen and co-workers. 141,142 The merits of such a symbolic system are many: (1) It expedites otherwise time-consuming and error-prone derivation and programming processes; (2) It facilitates parallelization and other laborious optimizations of the synthesized programs; (3) It enhances the portability, maintainability, extensibility, and thus the lifespan of the whole program module; (4) It enables new or higher-ranked methods to be implemented and tested rapidly which are practically impossible to write manually. TCE is, therefore, one of the earliest examples 139 of an expert system that lifts the burden of derivation/programming labor so that computational chemists can focus on imagining new ansatz-a development paradigm embraced quickly by other chemistry software developers. [143][144][145] The working equations of an ab initio electroncorrelation method are written with sums-of-products of matrices, whose elements are integrals of operators in the Slater determinants. For many methods, the matrices have the general form: 146 where Φ i is the whole set of the i-electron excited (or electron-detached/attached) Slater determinants,Ĥ is the Hamiltonian operator,T k is a k-electron excitation operator,R l is an l-electron excitation (or electron detachment/attachment) operator, andL † j is a j-electron deexcitation (or electron detachment/attachment) operator. Subscript 'C/L' means that the operators can be required to be connected and/or linked diagrammatically. For example, the so-called T 2 -amplitude equation of coupled-cluster singles and doubles (CCSD) is written as With the ansatz of a method given in terms of Eq. (8), TCE first (1) evaluates these operator-determinant expressions into sums-of-products of matrices (molecular integrals and excitation amplitudes) using normal-ordered second quantization and Wick's theorem, second (2) transforms the latter into a computational sequence (algorithm), which consists in an ordered series of binary matrix multiplications and additions, and third (3) generates parallelexecution programs implementing these matrix multiplications and additions, which can be directly copied into appropriate directories of the NWChem source code and which are called by a short, high-level driver subroutine humanly written (see Fig. 1).
In step (2), TCE finds the (near-)minimum cost path of evaluating sums-of-products of matrices by solving the matrix-chain problem (defining the so-called "intermediates") and by performing common subexpression elimination and intermediate reuse. In step (3), the computergenerated codes perform dynamically-load-balanced parallel matrix multiplications and additions, taking advantage of spin, spatial, and index-permutation symmetries. The parallelism, symmetry usage, and memory/disk space management are all achieved by virtue of TCE's data structure: every matrix (molecular integrals, excitation amplitudes, intermediates, etc.) is split into spin-and spatial-symmetryadapted tiles, whose sizes are determined at runtime so that the several largest tiles can fit in core memory. Only symmetrically-unique, non-zero tiles are stored gapless (with their storage addresses recorded in hash tables which are also auto-generated by TCE) and used in parallel tilewise multiplications and additions, which are dynamically distributed to idle processors on a first-come, first-served basis. NWChem's parallel middleware, especially Global Arrays, was essential for making the computer-generated parallel codes viable.

D. Relativistic Methods
Methods which include treatment of relativistic effects are based on the Dirac equation 193 , which has a fourcomponent wavefunction. The solutions to the Dirac equation describe both positrons (the "negative energy" states) and electrons (the "positive energy" states), as well as both spin orientations, hence the four components. The wavefunction may be broken down into two-component functions traditionally known as the large and small components; these may further be broken down into the spin components. [194][195][196][197] The implementation of approximate all-electron relativistic methods in quantum chemical codes requires the removal of the negative energy states and the factoring out of the spin-free terms. Both of these may be achieved using a transformation of the Dirac Hamiltonian known in general as a Foldy-Wouthuysen (FW) transformation. Unfortunately, this transformation cannot be represented in closed form for a general potential, and must be approximated. One popular approach is the Douglas and Kroll 198 method developed by Hess 199,200 . This approach decouples the positive and negative energy parts to second-order in the external potential (and also fourth-order in the fine structure constant, α). Other approaches include the zeroth order regular approximation (ZORA) [201][202][203][204] , modification of the Dirac equation by Dyall 205 , which involves an exact FW transformation on the atomic basis set level 206,207 and the exact 2-component (X2C) formulation, which is a catch-all for a variety of methods that arrive at an exactly decoupled two-component Hamiltonian using matrix algebra. 197,[208][209][210][211] NWChem contains released implementations of the DKH, ZORA, and Dyall approaches, while the X2C method is available in a development version. 209,211 Since these approximations only modify the integrals, they can, in principle, be used at all levels of theory. At present, the Douglas-Kroll, ZORA and X2C implementations can be used at all levels of theory, whereas Dyall's approach is currently available at the Hartree-Fock level.
a. Douglas-Kroll Approximation: NWChem contains three second-order Douglas-Kroll approximations termed as FPP, DKH, and DKHFULL. The FPP is the approximation based on free-particle projection operators 199 , whereas the DKH and DKFULL approximations are based on external-field projection operators 200 . The latter two are considerably better approximations than the former. DKH is the Douglas-Kroll-Hess approach and is the approach that is generally implemented in quantum chemistry codes. DKFULL includes certain cross-product integral terms ignored in the DKH approach (see for example, Häberlen and Rösch 212 ). The third-order Douglas-Kroll approximation (DK3) implements the method by Nakajima and Hirao 213,214 . b. Zeroth Order Regular Approximation (ZORA): The spin-free and spin-orbit versions of the one-electron zeroth order regular approximation (ZORA) have been implemented. Since the ZORA correction depends on the potential, it is not gauge invariant. This is addressed by using the atomic approximation of van Lenthe and coworkers. 215,216 Within this approximation, the ZORA corrections are calculated using the superposition of densities of the atoms in the system. As a result, only intra-atomic contributions are involved, and no gradient or second derivatives of these corrections need to be calculated. In addition, the corrections need only to be calculated once at the start of the calculation and stored. The ZORA approach is implemented in two ways in NWChem, one where the ZORA potential components are directly computed on an all-electron grid 204 and a second approach, where the ZORA potential is computed using the model potential approach due to van Wüllen and co-workers. 217,218 c. Dyall's Modified Dirac Hamiltonian Approximation: The approximate methods described in this section are all based on Dyall's modified Dirac Hamiltonian. This Hamiltonian is entirely equivalent to the original Dirac Hamiltonian, and its solutions have the same properties. The modification is achieved by a transformation on the small component. This gives the modified small component the same symmetry as the large component. The advantage of the modification is that the operators now resemble those of the Breit-Pauli Hamiltonian, and can be classified in a similar fashion into spin-free, spin-orbit, and spin-spin terms. It is the spin-free terms which have been implemented in NWChem, with a number of further approximations. Negative energy states are removed by a normalized elimination of the small component (NESC), which is equivalent to an exact Foldy-Wouthuysen (EFW) transformation. Both one-electron and two-electron versions of NESC (NESC1E and NESC2E, respectively) are available, and both have analytic gradients. [205][206][207]

E. Molecular Properties
A broad array of simple and response-based molecular properties can be calculated using the HF and DFT wavefunctions in NWChem. These include: natural bond analysis, dipole, quadrupole, octupole moments, Mulliken population analysis and bond order analysis, Löwdin population analysis, electronic couplings for electron transfer, 84,85 , Raman spectroscopy, 219,220 , electrostatic potential (diamagnetic shielding) at nuclei, electric field and field gradient at nuclei, electric field gradients with relativistic effects 221 , electron and spin density at nuclei, GIAO-based NMR properties like shielding, hyperfine coupling (Fermi-Contact and Spin-Dipole expectation values), indirect spinspin coupling, 222-224 G-shift, 225 EPR, paramagnetic NMR parameters, 226,227 and optical activity. 102,103,228,229 Note that only linear-response is supported and for single frequency, electric field, and mixed electric-magnetic field perturbations. Ground state and dynamic dipole polarizabilities for molecules can be calculated at the CCSD, CCSDT, and CCSDTQ levels using the linear-response formalism. 230 For additional information, we refer the reader to the online manual. 8

F. Periodic Plane-Wave Density Functional Theory
The NWChem plane-wave density functional theory (NWPW) module contains two programs: • PSPW -A pseudopotential and projector augmented (PAW) plane-wave Γ-point code for calculating molecules, liquids, crystals, and surfaces, • BAND -A pseudopotential plane-wave band structure code for calculating crystals and surfaces with small band gaps (e.g. semi-conductors and metals), These programs use a common infrastructure for carrying out operations related to plane-wave basis sets that is paral-lelized with the MPI and OpenMP libraries 29,33,34,[231][232][233][234][235] The NWPW module can be used to carry out many different kinds of simulations. In addition to the standard simulations implemented in other modules, e.g. energy, optimize, and freq, there are additional capabilities specific to PSPW and BAND that can be used to carry out NVE and NVT 236  A variety of exchange-correlation functionals have been implemented in both codes, including the local density approximation (LDA) functionals, generalized gradient approximation (GGA) functionals, full Hartree-Fock and screened exchange, hybrid DFT functionals, selfinteraction correction (SIC) functionals 256 , localized exchange method, DFT+U method, and Grimme dispersion corrections 77,78 , as well as recently implemented vdW dispersion functionals 257 , and meta-generalized gradient approximation (metaGGA) functionals. The program contains several codes for generating pseudopotentials, including Hamann 20 and Troulier-Martin 21 , and PAW 23 potentials. These codes have the option for generating potentials with multiple projectors and semi-core corrections. It also contains codes for reading in HGH 258 , GTH 259 , and normconserving pseudopotentials in the CPI and TETER formats. Codes for reading Optimized Norm-Conserving Vanderbilt (ONCV) pseudopotentials 260,261 and USPP PAW potentials will become available in future releases of NWChem.
The pseudopotential plane-wave DFT methods implemented in NWChem are a fast and efficient way to calculate molecular and solid-state properties using DFT 16,17,19,29,235,[262][263][264][265][266][267][268][269][270] . In these approaches, the fast varying parts of the valence wavefunctions inside the atomic core regions and the atomic core wavefunctions are removed and replaced by pseudopotentials [20][21][22][271][272][273][274] . Pseudopotentials are chosen such that the resulting pseudoatoms have the same scattering properties as the original atoms. The rationale for this approach is that the changes in the electronic structure associated with making and breaking bonds only occur in the interstitial region outside the atomic core regions (see Fig. 2). Therefore, removing the core regions should not affect the bonding of the system. For this approach to be useful, it is necessary for the pseudopotentials to be smooth in order for plane-wave basis sets to be used. As the atomic potential becomes stronger the core region becomes smaller and the pseudopotential grows steep. As a result, the pseudopotential can become very stiff, requiring large plane-wave basis sets (aka cutoff energies), for the first-row transition metals atoms, the lanthanide atoms, and towards the right-hand side of the periodic table (fluorine). The projected augmented plane-wave method (PAW) 23,232,[275][276][277] is another related approach that removes many of the problems of the somewhat ad hoc nature of the pseudopotentials approach. However, in the PAW approach, instead of discarding the rapidly varying parts of the electronic functions, these are projected onto a local basis set (e.g., a basis of atomic functions), and no part of the electron density is removed from the problem.
Another key feature of PAW is that by maintaining a local description of the system, the norm-conservation condition (needed for proper scattering from the core) can be relaxed, which facilitates the use of smaller plane-wave basis sets (aka cutoff energies) then for many standard pseudopotentials. Historically, the PAW method was implemented as a separate program in the NWPW module, rather than being fully integrated into the PSPW and BAND codes. This separation significantly hindered its development and use. As of NWChem version 6.8 (released in 2017), the PAW approach has been integrated into the PSPW code, and it is currently being integrated into the BAND code. It will become available in future releases of NWChem.
In recent years, with advances in High-Performance Computing (HPC) algorithms and computers, it is now possible to run AIMD simulations up to ∼1 ns for nontrivial system sizes. As a result, it is now possible to effectively use free-energy methods with AIMD and AIMD/MM approaches. Free energy approaches are useful for simulating reactions where traditional quantum chemistry approaches can be difficult to use and often require the expertise of a very experienced quantum chemist, e.g. reactions that are complex with concerted or multi-step components and/or interact strongly with the solvent. Recent examples include solvent coordination and hydrolysis of actinides metals 197,278-281 (see Fig. 3), hydrolysis of explosives 234 , and ion association in AlCl 3 237 . To help users learn how to use these new techniques, we developed a tutorial on Figure 3. Snapshots from a metadynamics simulation of the hydrolysis of the U 4+ aqua ion 278 . During the simulation a proton jumps from a first shell water molecule to a second shell water molecule and then subsequently to other water molecules via a Grotthuss mechanism. carrying out finite temperature free energy calculations in NWChem 282 .
The NWPW module continues to be actively developed. There are on-going developments for RPA and GW-RPA methods, an electron transfer MCSCF method, Raman and Mössbauer spectroscopy, and a hybrid method that integrates classical DFT 283 into ab initio molecular dynamics (AIMD-CDFT). In addition to these developments, we are actively developing the next generation of plane-wave codes as part of the NWChemEx project. These new codes, which are being completely written from scratch, will contain all the features currently existing in the NWPW module. Besides implementing fast algorithms to use an even larger number of cores and new algorithms to run efficiently on GPUs, it includes a more robust infrastructure to facilitate the implementation of an O(N) DFT code based on the work of Fattebert et al. 284

G. Optimization, Transition State, and Rate theory Approaches
A variety of drivers and interfaces are available in NWChem to perform geometry minimization and transition state optimizations. The default algorithms in NWChem for performing these optimizations are quasi-Newton methods with line searches. These methods are fairly robust, and they can be used to optimize molecules, clusters, and periodic unit cells and surfaces. They can also be used in conjunction with both point group and space group symmetries, excited state TDDFT surfaces, as well as with a variety of external fields, such as external point charges, COSMO 285 or SMD 286 Model. The default methods also work seamlessly with electronic structure methods that do not have nuclear gradients implemented by automatically using finite difference gradients. NWChem also contains default methods for calculating harmonic vibrational frequencies and phonon spectra for periodic systems. These methods are able to make use of analytic Hessians if they are available, otherwise a finite difference approach is used. A vibrational self-consistent field 287 (VSCF) method is also available in NWChem and it can be used to calculate anharmonic contributions to specified vibrational modes. There is also an interface called DIRDYVTST 288 that uses NWChem to compute energies, gradients, and Hessians for direct dynamics calculations with POLYRATE 289 .
A variety of external packages, such as ASE 253,290 and Sella 291,292 , can also be used for finding energy minima, saddle points on energy surfaces, and frequencies using either python scripting or a newly developed i-PI 252 interface. Python programs may be directly embedded into the NWChem input and used to control the execution of NWChem. The python scripting language provides useful features, such as variables, conditional branches, and loops, and is also readily extended. Other example applications for which it could be used include scanning potential energy surfaces, computing properties in a variety of basis sets, optimizing the energy with respect to parameters in the basis set, computing polarizabilities with a finite field, simple molecular dynamics, and parallel in time molecular dynamics 293 .
NWChem also contains an implementation of the nudged elastic band (NEB) method of Jónsson and coworkers [294][295][296][297] and the zero-temperature string method of vanden Eijden et al. 298 Both these methods can be used to find minimum energy paths. Currently, a quasi-Newton algorithm is used for the NEB optimization. A better approach for this kind of optimization is to use a nonlinear multi-grid algorithm, such as the Full Approximation Scheme (FAS) 299 . A new implementation of NEB based on FAS is available on Bitbucket 300 , and an integrated version will soon be available in NWChem.

H. Classical Molecular Dynamics
The integration of a molecular dynamics (MD) module in NWChem enables the generation of time evolution trajectories based on Newton's equation of motion of molecular systems in which the required forces can originate from a classical force field, any implemented quantum mechanical method for which spatial derivatives have been implemented, or hybrid quantum mechanical/molecular mechanical (QM/MM) approaches. The method is based on the ARGOS molecular dynamics software, originally designed for vector processors, 301 but later redesigned for massively parallel architectures. 25,[302][303][304] a. System Preparation: The preparation of a molecular system is done by a separate prepare module that reads the molecular structure and assembles a topology from the databases with parameters for the selected force field. The topology file contains all static information for the system. In addition, this module generates a so-called restart file with all dynamic information. The prepare module has a wide range of capabilities that include the usual functions of placing counter-ions and solvation with any solvent defined in the database. The prepare module is also used to define Hamiltonian changes for free energy difference calculations, and the definition of those parts of the molecular systems that will be treated quantum mechanically in QM/MM simulations. Some of the more unique features include setting up a system for proton hopping (QHOP) simulations, 305,306 and the setup of biological membranes from a single lipid-like molecule. This last capability has been successfully used for the first extensive simulation studies of complex asymmetric lipopolysaccharide membranes of Gramnegative microbes [307][308][309][310][311] and their role in the capture of recalcitrant environmental heavy metal ions, 312 microbial adhesion to geochemical surfaces, [313][314][315][316] and the structure and dynamics of trans-membrane proteins including ion transporters [317][318][319] (Fig. 4).
b. Force Fields: The force field implemented in NWChem consists of harmonic terms for bonded, angle and out of plane bending interactions, and trigonometric terms for torsions. Non-bonded van der Waals and electrostatic interactions are represented by Lennard-Jones and Coulombic terms, respectively. Non-bonded terms are evaluated using charge groups and subject to a user-specified cutoff radius. Electrostatic interaction corrections beyond the cutoff radius are estimated using the smooth particle mesh Ewald method. 320 Parameter databases are provided for the AMBER 321 and CHARMM 322 force fields.
Even for purely classical MD simulations, the integration with the electronic structure methods provides a convenient way of determining electrostatic parameters for missing fragments in standard force field databases, through the use of restrained electrostatic potential fitting 323,324 to which a variety of additional constraints and restraints can be applied.
c. Simulation Capabilities: Ensemble types available in NWChem are NVE, NVT, and NPT, using the Berendsen thermostat and barostat. 119 Newton's equations of motion are integrated using the standard leap-frog Verlet or velocity Verlet algorithms. A variety of fundamental properties are evaluated by default during any molecular dynamics simulation. Parallel execution time analysis is available to determine the parallel efficiency.
The MD module has extensive free energy simulation capabilities, [325][326][327][328][329][330] which are implemented in a so-called multi-configuration approach. For each incremental change of the Hamiltonian to move from the initial to the final state, sometimes referred to as a window, a full molecular simulation is carried out. This allows for a straightforward evaluation of statistical and systematic errors where needed, including a correlation analysis. 331 Based on the ARGOS code 301 it has some unique features, such as the separationshifted scaling technique to allow atoms to appear from or disappear to dummy atoms. 332 One of the advantages of the integration of MD into the electronic structure methods framework in NWChem is the ability to carry out hybrid QM/MM simulations (discussed in the next section). The preparation of molecular systems for the MD module allows for flexibly specifying parts of the molecular system to be treated by any of the implemented electronic structure methods capable of evaluating positional gradients.
A unique feature in the NWChem MD module is the optional specification of protonatable sites on both solute and solvent molecules. Pairs of such sites can dynamically change between protonated or unprotonated state, effectively exchanging a proton. Transitions are governed by a Monte Carlo type stochastic method to determine when transitions occur. This so-called QHOP approach was developed by the research group of Helms. 306 d. Analysis Capabilities: The NWChem MD capability includes two analysis modules. The original analysis module, analyze, analyzes trajectories in a way that reads individual structures one time step at a time and distributes the data in a domain decomposition fashion as in the molecular simulation that generated the data. The second data-intensive analysis module, diana, reads entire trajectories and distributes the data in the time domain. This is especially effective for analyses that require multiple passes through a trajectory, but requires the availability of potentially large amounts of memory. 333,334 An example of such analyses is the Essential Dynamics Analysis, a principal component analysis (PCA) based calculation to determine the dominant motions in molecular trajectories.
e. Parallel Implementation Strategy: The most effective way of distributing a system with large numbers of particles is through the use of domain decomposition of the physical space. The implementation in NWChem, facilitated through the use of the Global Arrays (GA) toolkit, partitions the simulation space into rectangular cells that are assigned to different processes ranks or threads. Each of these ranks carries out the calculation of intra-cell atomic energies and forces of the cells assigned. Inter-cell energies and forces are evaluated by one of the ranks that was assigned one or the other of the cell pairs.
Two load balancing methods have been implemented in NWChem, both based on measured computation time. In the first one, the assignment of inter-node cell pair calculations is redefined such that assignments move from the busiest node to the less busy node. This scheme requires minimal additional communication, and since only two nodes are involved in the redistribution of work, the communication is local, i.e. node to node. In the second scheme, the physical size of the most time-consuming cell is reduced, while all other cells are made slightly larger. This scheme requires communication and redistribution of atoms on all nodes. In practice, the first scheme is used until performance no longer improves, after which the second scheme is used once followed by returning to use the first scheme. This approach has been found to improve load balancing even in systems with a very asymmetric distribution

VI. HYBRID METHODS
We define hybrid methods as those coupling different levels of description to provide an efficient calculation of a chemical system, which otherwise may be outside the scope of conventional single-theory approaches. The physical motivation for such methods rests on the observation that, in the majority of complex chemical systems, the chemical transformation occurs in localized regions surrounded by an environment, which can be considered chemically inert to a reasonable approximation. Since hybrid methods require the combination of multiple theoretical methods in a single simulation, the diversity of simulation methodologies available in NWChem makes it a platform particularly apt for this purpose.
One common example involves chemical transformations in a bulk solution environment, forming the foundations of wide variety of spectroscopic measurements (UV-vis, NMR, EPR, etc.). The reactive region, referred to as the "solute", involves electronic structure degrees of freedom and thus requires the quantum mechanical (QM) based description, such as DFT or more complex wavefunction methods. In the conventional approach, such QM description would be necessarily extended to the entire system making the problem a heroic computational task. In a hybrid approach, the treatment of a surrounding environment ("solvent") would be delegated to a much simpler description, such as the continuum model (CM), for example. The latter is supported in NWChem via two models -COSMO 285 (COnductor-like Screening MOdel) and SMD 286 (Solvation Model based on Density) Model. The resulting QM/CM approaches are particularly well suited for accurate and efficient calculation of solvation free energies, geometries in solution, and spectroscopy in solution. The SMD model employs the Poisson equation with non-homogeneous dielectric constant for bulk electrostatic effects, and solvent-accessible-surface tensions for cavitation, dispersion, and solvent-structure effects, including hydrogen bonding. For spectroscopy in solution, the Vertical Excitation (or Emission) Model (VEM) has also been implemented for calculating the vertical excitation (absorption) or vertical emission (fluorescence) energy in solution according to a two-time-scale model of solvent polarization 336 .
For systems where an explicit solvation environment treatment is needed (for example, heterogeneous systems like a protein matrix), NWChem provides a solution in terms of combined quantum mechanics/molecular mechanics (QM/MM) approach. 337,338 Here, the environment is described at the classical molecular mechanics level. This offers more fidelity compared with a continuum solvent description, while still keeping the computational costs down. The total energy of the system in QM/MM approach can be represented as a sum of the energies corresponding to QM and MM regions: where ψ denotes electronic degrees of freedom, and r, R refer to nuclear coordinates of QM and MM regions correspondingly. The QM energy term can be further decomposed into internal and external parts where ρ is the electron density. As a generic module, the QM/MM implementation can utilize any of the Gaussian basis set based QM modules available in NWChem and supports nearly all the task functionalities. The calculation of QM energy remains the main computational expense in the QM/MM approach. This issue is more pronounced compared with the continuum coupling case, because of the additional atomistic degrees of freedom associated with MM description. The latter comes into play because any change in the MM degrees of freedom will, in general, trigger the recalculation of the QM energy (E qm (r, R; ψ)). To alleviate these issues during the optimization, the QM/MM module offers the option of alternating relaxation of QM and MM regions. During the latter phase, the user may utilize an approximation where the QM degrees of freedom are kept frozen until the next cycle of QM region relaxation, offering significant computational savings. A similar technique can be utilized in the dynamical equilibration of the MM region and calculations of reaction pathways and free energies. In addition to the native MD module, the NWChem QM/MM module can also utilize the external AMBER MD code 339 for running the classical part of the calculations. In this case, QM/MM simulations involve two separate NWChem and AMBER calculations with data exchange mediated through files written to disk.
Additionally, the QM/MM capability in NWChem has resulted in the development and refinement of force-field parameters, that can, in turn, be used in classical molecular dynamics simulations. Over the last two decades, classical parameters obtained using NWChem have been employed to address the underlying mechanisms of a variety of novel complex biological systems and their interactions (e.g., lipopolysaccharide membranes, carbohydrate moieties, mineral surfaces, radionuclides, organophosphorous compounds) 307,308,[311][312][313][340][341][342][343][344] which has led to a significant expansion of the database of AMBER-and Glycamcompatible force fields and the GROMOS force field for lipids, carbohydrates and nucleic acids. [345][346][347][348][349][350][351] For cases where a classical description of the environment is deemed insufficient, NWChem offers an option to perform an ONIOM type calculation. 352 The latter differs from QM/MM in that the lower level of theory is not restricted to its region but also encompasses regions from all the higher levels of description. For example, in the case of the two-level description, the energy is written as where subscripts H, L refer to high and low levels of theory correspondingly. The high-level treatment is restricted to a smaller portion of the system (R H ), while the low level of theory goes over the entire space (R). The second term in the above equation takes care of overcounting. The NWChem ONIOM module implements two-and threelayer ONIOM models for use in energy, gradient, geometry optimization, and vibrational frequency calculations with any of the pure QM methods within NWChem.
A new development in hybrid method capabilities of NWChem involves classical density functional theory (cDFT). [353][354][355] The latter represents a classical variant of electronic structure DFT, where the main variable is the classical density of the atoms. 356,357 Conceptually, this type of description lies between continuum and classical force field models, providing orders of magnitude improvements over classical MD simulations. The approach is based on incorporating important structural features of the environment in the form of classical correlation functions. This allows for efficient and reliable calculations of thermodynamical quantities, providing an essential link between electronic structure description at the atomistic level and phenomena observed at the macroscopic scale.

VII. PARALLEL PERFORMANCE
The design and development of NWChem from the outset was driven by parallel scalability and performance to enable large scale calculations and achieve fast time-tosolution by using many CPUs where possible. The parallel tools outlined in section IV provided the programming framework for this.
The advent of new architectures such as the GPU 358 platforms have required the parallel coding strategy within NWChem to be revisited. At present, the coupled-cluster code within TCE can utilize both the CPU and GPU hardware at a massive scale 32,359 . The emergence of many-core processors in the last ten years provided the opportunity for starting a collaborative effort with Intel corporation to optimize NWChem on this new class of computer architecture. As part of this collaboration, the TCE implementation of the CCSD(T) code was ported to the Intel Xeon Phi line of many-core processors 35 using a parallelization strategy based on a hybrid GA-OpenMP approach. The ab initio plane-wave molecular dynamics code (section V F) has also been optimized to take full advantage of these Intel many-core processors 33,231 .
In the rest of this section, we will discuss the parallel scalability and performance of the main capabilities in NWChem. a. Gaussian Basis Density Functional Theory: In Fig. 5, we report the parallel performance of the Gaussian basis set DFT module in NWChem. This calculation involved performing a PBE0 energy calculation (four SCF iterations in direct mode) on the C 240 molecule with the 6-31G* basis set (3600 basis functions) without symmetry. These calculations were performed on the Cascade supercomputer located at PNNL. c. Closed-shell CCSD(T): The parallel implementation of the CCSD(T) approach by Kobayashi and Rendell 132 , employing the spin adaptation scheme based on the unitary group approach (UGA) 133 within NWChem, was one of the first scalable implementations of the CC formalism capable of taking advantage of several hundred processors. This implementation was used in simulations involving tera-and peta-scale architectures where chemical accuracy is required to describe ground-state potential energy surfaces. One of the best illustrations of the performance of the CCSD(T) implementation is provided by calculations for water clusters 26 184 and IP-EOMCCSD calculations for carbon nanotubes. 362 A good illustration of the scalability of the TCE module is provided by the application of GA-based TCE implementations of the iterative (CCSD/EOMCCSD) and non-iterative (CR-EOMCCSD(T)) methods in studies of excited states of β -carotene 363 and functionalized forms of porphyrin 28 (see Fig.7(a) and (b), respectively). While non-iterative methods are much easier to scale across a large number of cores ( Fig.7 (b)), scalability of the iterative CC methods is less easy to achieve. However, using early task-flow algorithms for TCE CCSD/EOMCCSD methods 28 it was possible to achieve satisfactory scalability in the range of 1,000-8,000 cores.
e. Recent Implementation of Plane-Wave DFT AIMD for Many-Core Architectures: The very high degree of parallelism available on machines with many-core processors is forcing developers to carefully revisit the implementation of their programs in order to make use of this hardware efficiently. In this section, after a brief overview of the computational costs and parallel strategies for AIMD, we present our recent work 33 on adding thread-level parallelism to the AIMD method implemented in NWChem 3,29,364 . The main computational costs of an energy minimization or AIMD simulation are the evaluation of the electronic gradient δ E total /δ ψ * i = Hψ i and algorithms used to maintain orthogonality. These costs are illustrated in Figure 8. Due to their computational complexity, the electron gradient Hψ i and orthogonalization need to be calculated as efficiently as possible. The main parameters that determine the cost of a calculation are N g , N e , N a , and N pro j , where N g is the size of the three-dimensional FFT grid, N e is the number of occupied orbitals, N a is the number of atoms, N pro j is the number of projectors per atom, and N pack is the size of the reciprocal space.
The evaluation of the electron gradient (and orthogonality) contains three major computational pieces that need to be efficiently parallelized: • applying V H and V xc , involving the calculation of 2N e 3D FFTs; • calculating the non-local pseudopotential, V NL , dominated by the cost of the matrix multiplications W = P T Y , and Y 2 = PW , where P is an N pack × (N pro j · N a ) matrix, Y and Y 2 are N pack × N e matrices, and W is an (N pro j N a ) × N e matrix; • enforcing orthogonality, where the most expensive matrix multiplications are S = Y T Y and Y 2 = Y S, where Y and Y 2 are N pack × N e matrices, and S is an N e × N e matrix. In this work, Lagrange multiplier kernels are used for maintaining orthogonality of Kohn-Sham orbitals 29,365-368 .
In Fig. 9 the timing results for a full AIMD simulation of 256 water molecules on 16, 32, 64, 128, 256, and 1024 KNL nodes are shown. The "Cori" system at NERSC was used to run this benchmark. This benchmark was taken from Car Parrinello simulations of 256 H 2 O with an FFT grid of N g = 180 3 (N e =2056) using the plane-wave DFT module (PSPW) in NWChem. In these timings, the number of threads per node was 66. The size of this benchmark simulation is about 4 times larger than many mid-size AIMD simulations carried out in recent years, e.g. in recent work by Bylaska and co-workers. 279,280,[369][370][371][372] The overall timings show strong scaling up to 1024 KNL nodes (69632 cores) and the timings of the major kernels, the pipelined 3D FFTs, non-local pseudopotential, and Lagrange multiplier kernels all displayed significant speedups.
f. Classical Molecular Dynamics: The molecular dynamics module in the current NWChem release is based on the distribution of cells over available ranks in the calculation. Simulations exhibit good scalability when cells only require communication with immediately neighboring cells. When the combination of cell size and cutoff radius is such that interactions with atoms in cells beyond the immediate neighbors are required, performance is significantly affected. This limits the number of ranks that can effectively be used. For example, a system with 500,000 atoms will only scale well up to 1000 ranks. In future implementations, the cell-cell pair-list will be distributed over the available ranks. While this leads to additional communication for ranks that do not "own" a cell, the implementation of a new communication scheme that avoids global communication has been demonstrated to improve scalability by at least an order of magnitude. 304

VIII. OUTREACH
Given the various electronic structure methods available in NWChem, it does not come as a surprise that many of these functionalities have been integral to various projects focused on extensions of quantum chemical capabilities to exa-scale architectures and emerging quantum computing (see Fig. 10 253,255,[373][374][375][376][377][378][379][380] NWChem initially developed its own graphical user interface called the Extensible Computational Chemistry Environment 381 , which is currently supported by a group of open-source developers. In addition, multiple codes use quantities from the NWChem simulation, such as wavefunctions as input for the calculation of additional properties not directly available in the code. [382][383][384][385][386][387][388][389][390][391] NWChem is able to export electrostatic potential and charge densities with the Gaussian cube format 392 and can use the Molden format 393 to write or read molecular orbitals. This allows codes 394-398 to utilize NWChem's data to, for example, display charge densities and electrostatic potentials. NWChem can also generate AIM wavefunction files that have been used by a variety of codes to calculate various properties. 76,[399][400][401] Recently, NWChem has also been interfaced with the SEMIEMP code 402 , which can be used to perform real-time electronic dynamics using the INDO/S Hamiltonian. 403,404 b. Common Component Architecture: It is an attractive idea to encapsulate complex scientific applications as components with standardized interfaces. The components interact only through these well-defined interfaces and can be combined into full applications. The main motivation is to be able to reuse and swap components as needed and seamlessly create complex applications. There have been a few attempts to introduce this approach to the scientific software development community. The most notable DOEled effort was the Common Component Architecture (CCA) Forum 405 , which was launched in 1998 as a scientific community effort to create components designed specifically for the needs of high-performance scientific computing. A more recent development is the rise of Simulation Development Environment (SDE) framework 406 , which has features that are related to the components of CCA.
NWChem developers have participated in CCA and SDE effort resulting in the creation of the NWChem component. As an example, the NWChem CCA component was used in the building applications for molecular geometry optimization from multiple quantum chemistry and numerical optimization packages 407 , combination of multiple theoretical methods to improve multi-level parallelism 408 , demonstration of multi-level parallelism 409 , and standardization of integral interfaces in quantum chemistry 410 . In the end, the CCA framework was too cumbersome to use for developers, requiring significant efforts to develop interfaces and making components to work together. It resulted in the retirement of CCA Forum in 2010, but the work done on standardization of interfaces is continuing to benefit the quantum chemistry community to this day.
c. NWChemEx: The NWChemEx project is a natural extension of NWChem to overcome the scalability challenges associates with migrating the current code base to exa-scale platforms. NWChemEx is being developed to address two outstanding problems in advanced biofuels research: (i ) development of a molecular understanding of proton controlled membrane transport processes and (ii) development of catalysts for the efficient conversion of Figure 10. A "connected diagram" describing ongoing efforts to extend computational chemistry models to exa-scale and quantum computing. In each case, NWChem provides a testing and development platform. A significant role in these projects is played by Tensor Algebra for Many-body Methods (TAMM) library. The ECC acronym stands for the Exa-scale Catalytic Chemistry project supported by BES. 411 The QDK-NWChem interface with the lib-DUCC library is used for downfolding electronic Hamiltonians. 412 biomass-derived intermediates into biofuels, hydrogen, and other bioproducts; therefore the main focus is on enabling scalable implementations of the ground-state canonical CC formalisms utilizing Cholesky decomposed form of the two-electron integrals [413][414][415][416][417][418] as well as linear scaling CC formulations based on the domain-based local pair natural orbital CC formulations (DLPNO-CC). [419][420][421] and embedding methods.

d. Scalable Predictive methods for Excitations and Correlated phenomena (SPEC):
The main focus of the SPEC software project is to provide the users with a new generation of methodologies to simulate excited states and excited-state processes using existing peta-and emerging exa-scale architectures. These new capabilities will play an important role in supporting the experimental efforts at light source facilities, which require accurate and reliable modeling tools. The existing NWChem capabilities are being used to verify and validate SPEC implementations including excitation energy, ionization potential, and electron affinity variants of the EOMCC theory as well as hierarchical Green's function formulations ranging from the lower order GW+Bethe-Salpeter equation (GW+BSE) 422 to hierarchical coupled-cluster Green's function (GFCC) methods [185][186][187][188]423 and multi-reference CC methods.
e. Quantum Information Sciences: Quantum computing not only offers the promise of overcoming exponential computational barriers of conventional computing but also in achieving the ultimate level of accuracy in studies of challenging processes involving multi-configurational states in catalysis, biochemistry, photochemistry, and materials science to name only a few areas where quantum information technologies can lead to the transformative changes in the way how quantum simulations are performed. NWChem, with its computational infrastructure to characterize secondquantized forms of electronic Hamiltonians in various basis sets (Gaussian and plane-waves) and with wavefunction methodologies to provide an initial characterization of the ground-and excited-state wavefunctions, can be used as a support platform for various types of quantum simulators. The recently developed QDK-NWChem interface 424 (QDK designates Quantum Development Kit developed by Microsoft Research team) for quantum simulations and libraries for CC downfolded electronic Hamiltonians for quantum computing 412 are good illustrations of the utilization of NWChem in supporting the quantum computing effort.

IX. DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.

X. CONCLUSIONS
The NWChem project is an example of a successful codesign effort that harnesses the expertise and experience of researchers in several complementary areas, including quantum chemistry, applied mathematics, and high-performance computing. Over the last three decades, NWChem has evolved into a code that offers a unique combination of computational tools to tackle complex chemical processes at various spatial and time scales.
In addition to the development of new methodologies, NWChem is being continuously upgraded, with new algorithms, to take advantage of emerging computer architectures and quantum information technologies. We believe the community model of NWChem will continue to spur exciting new developments well into the future. This manuscript has been authored in part by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (https://energy.gov/downloads/doe-public-access-plan).
This article may be downloaded for personal use only. Any other use requires prior permission of the author and AIP Publishing. This article appeared in volume 152, issue 18, page 184102 of the Journal of Chemical Physics and may be found at https://doi.org/10. 1063/5.0004997