New tuned range-separated density functional for the accurate calculation of second hyperpolarizabilities

The calculation of nonlinear optical properties (NLOPs) using density functional theory (DFT) remains a challenge in computational chemistry. Although existing range-separated functionals display the best performance for the calculation of this type of properties, their errors strongly depend on the family of molecules studied. Herein, we have explored a new strategy to empirically tune the range-separated LC-BLYP method to improve the accuracy of the calculation of the second hyperpolarizabilities ( γ ), which are poorly described by current density functional approximations. First, we benchmarked nine of the most accurate commonly used range-separated hybrid and optimally-tuned functionals ( i.e. B3LYP, PBE0, BH&HLYP, M06-2X, MN15, ωB97XD, CAM -B3LYP, LC-BLYP and OT-LC-BLYP) for the calculation of γ using as reference the CCSD(T) values of a chemically diverse set of 60 molecules. Among these nine functionals, LC-BLYP gives the lowest average errors. We determined the value of the range-separation parameter ω required to reproduce the CCSD(T) second hyperpolarizabilities with LC-BLYP functional (ω CC ) for the set of 60 molecules. Our new tuned range-separated functional, Tα -LC-BLYP, employs a quadratic correlation between ω CC and a molecular descriptor in terms of the linear polarizability and the number of electrons of the molecule. The average error of the γ values obtained with Tα -LC-BLYP is reduced by half or more as compared with the most accurate among the nine density functional approximations benchmarked.


Introduction
Chemical compounds bearing high nonlinear optical (NLO) features have raised as key building blocks of several devices for a variety of technologies such as electro-optics, 1, 2 optical switching [3][4][5][6] or three-dimension fluorescence microscopy. [7][8][9] In particular, materials bearing high second hyperpolarizabilities (γ) have gained special interest. 6,10 Although γ can be measured for instance through third harmonic generation experiments, 11,12 computational insights are very useful to complement and rationalize experimental data or design new prototypes. At the molecular level, Hartree-Fock (HF) (hyper)polarizabilities are not accurate enough and, therefore, a correlated wavefunction is required to compute the NLO properties (NLOPs). [13][14][15] Typically, second-order Møller-Plesset 16 (MP2) perturbation theory, coupled-cluster 17 with singles and doubles (CCSD), or including triple excitations through perturbative treatment, i.e. CCSD(T), together with a flexible basis set, 14,18 are considered appropriate methodologies to obtain accurate values of γ. [19][20][21][22] Unfortunately, post-HF approaches become computationally too expensive as the size of the molecule increases.
Kohn-Sham 23 density functional theory (KS-DFT) is a powerful alternative to correlated wavefunction methods due to its good balance between computational cost and accuracy. However, the expression for the exact functional is not known, existing a large variety of alternative approximate definitions. Most common density functional approximations (DFAs) available nowadays are designed and constructed to accurately predict kinetic and thermodynamic properties directly related to the difference of Gibbs energies, such as reaction barriers and reaction energies, 24 regardless of other observables such as electro-optical properties. For this reason, molecular electric properties still remain a challenge for most common DFAs, specially concerning the prediction of NLOPs. 25 This failure is related with the wrong asymptotic behaviour of approximate exchange−correlation (XC) potentials, the "near-sightedness" of the XC response kernels, and the lack of the integer discontinuity. 26,27 Instead, the HF potential does have the proper asymptotic behaviour and HF exchange does not suffer from self-interaction errors. The range-separated hybrid (RSH) functionals that include 100% of exact HF exchange at large electron-electron distances correctly describe the asymptotic behaviour of XC potentials, and are one of the DFA types providing the most accurate NLOPs up to date. 19,21,22 Further improvement on the results of the RSH functionals can be ideally achieved by tuning the RSH functionals. 26,[28][29][30] For instance, Kronik and co-workers reproduced experimental band gaps 31 and optical absorption energies 32 for a series of organic solids applying one-and/or two-dimensional optimally tuning (OT) protocols. The most popular strategy for the construction of OT-RSH functionals is based on the optimization of the range-Please do not adjust margins Please do not adjust margins separation parameter in the RSH to satisfy the DFT version of Koopmans' theorem. 33,34 Several research groups have previously explored the behaviour of RSH and OT-RSH functionals to compute the NLOP of a specific type of target molecules such as conjugated polymeric chains. 21,22,25,35 They have shown that a large drawback of DFAs is the presence of the delocalization error. The minimization of the delocalization error together with an optimal tuning of ω, is the key to improve the description of some systems. 26 Particularly, for lower-order response properties such as the electric first-hyperpolarizability (β), ω tuning largely improves the results of push-pull systems such as p-nitroanilyne (PNA) or dimethylaminonitrostylbene (DANS) oligomers. 36,37 Unfortunately, this tuning scheme is, in general, not accurate enough to provide reliable results for γ calculations, giving sometimes even worse results than their corresponding nonoptimized RSH functional. 21,38 Indeed, for cases like conjugated organic oligomers and polymers, which have raised a lot of interest in the field of nonlinear optics due to their very high NLOPs, the usual OT-RSH even decreases the accuracy for γ's. 19,21 Therefore, a novel and more specific strategy is required to compute accurate second hyperpolarizabilities at a moderate computational cost. In this work, we present a new alternative tuning scheme based on the correlations found between the linear polarizability (α) obtained at LC-BLYP level of theory and the optimal value of range-separation parameter ω that reproduces the values of γ obtained at CCSD(T) level. Furthermore, to the best of our knowledge, we have constructed one of the largest benchmark molecular sets including γ values computed at CCSD(T) level of theory.

Range-separated functionals
In RSH functionals, the electron-electron repulsion operator 1/r12 for the exchange energy is split into short-range (SR) and long-range (LR) terms depending on the interelectronic distance: 39 where erf is the standard error function; r12 is the interelectronic distance between electrons 1 and 2; λ governs the amount of exact exchange (eX) incorporated at both SR and LR; κ controls the extra amount of eX added only at LR; and ω is the rangeseparation parameter, defining the steepness of the error function and the interelectronic distance at which the transition from SR to LR takes place. If the main modification of the exchange potential with respect to the original functional occurs at large interelectronic distances, the functional is called longrange corrected (LC) functional. The implementation of the range-separated electron-electron repulsion operator into the exchange functional leads to the following expression for the exchange-correlation energy: where EeX is the exact HF exchange, EDFAx denotes the approximate (semi)local exchange energy, and Ec is the correlation energy.

Tuned range-separated functionals
In the most popular RSH functionals, the range-separation parameter ω is given as a universal constant defined after a fitting procedure against an appropriate dataset. For instance, the parameter values of Eq. 2 for three common RSH functionals are: LC-ωPBE 40 (λ=0.0, κ=1.0, ω=0.40 Bohr -1 ), CAM-B3LYP 41 (λ=0.19, κ=0.46, ω=0.33 Bohr -1 ) and LC-BLYP 39 (λ=0.0, κ=1.0, ω=0.47 Bohr -1 ). Tuned RSH functionals are usually based on one of these three widely accepted RSH functionals, in which the value of one (usually ω), two (ω and κ/λ) or three (ω, κ and λ) parameters are tuned specifically for each chemical system. 33 Among the nonempirical tuning schemes, the most popular approach is based on optimizing the value of ω to enforce the fulfilment of the DFT version of the Koopmans' theorem (i.e. Janak's theorem). 34,42 This theorem states that for the exact Kohn-Sham functional the energy of the highest occupied molecular orbital (HOMO) is equal to the negative of the ionization potential, εHOMO = -IP. 33 Usually this condition is imposed for both the neutral and the anionic state, where the ionization potential corresponds to the electron affinity of the neutral state. 34 Thus, the optimal tuning of the ω parameter is obtained through the minimization of the following expression for a N-electron system: Unfortunately, this approach to design OT-RSH functionals does not systematically improve the accuracy of the second hyperpolarizabilites. 19,21,43 Therefore, clearly another approach to tune ω should be used to design new RSH functionals for the evaluation of high-order NLOPs.

Molecular set
The molecular set ( Figure 1) used for the development of our new OT-RSH functional for computing static second hyperpolarizabilities (γ-NLO set hereafter) is built up with 60 molecular systems containing from 2 to 36 atoms of the second period and/or hydrogen. The first five/six oligomers of three of the most typical π-conjugated NLO polymers such as all-trans polyacetylene (PA), polydiacetylene (PDA) and all-trans polymethyneimine (PMI) are included in the set. Some small organic and inorganic molecules, and hydrogen chains (which are known to be challenging systems for the calculation of the second hyperpolarizability) are also included. Except dioxygen, all molecules in the set present a closed-shell ground state.

Computational details
All electronic structure calculations were carried out using Gaussian09 Revision E.01 44  For all DFT calculations, numerical integrations were performed by using the default grid in Gaussian09, which corresponds to a pruned grid of 75 radial shells and 302 angular points per shell.
The spin-unrestricted formalism was used for O2 (triplet ground state) and open-shell singlet o-, m-, and pbenzyne. The geometries of molecular set 2 (vide infra) and all systems present in the γ-NLO set but hydrogen chains were optimized at the MP2 level of theory, requesting tight convergence criteria on the root-mean square (RMS) and maxima of the forces and displacements as defined in Gaussian09. Molecular set 1 geometries were optimized using CCSD, which properly reproduces the CASPT2 singlet-triplet split for these diradical molecules. 46 For hydrogen chains, the alternated separation of 2.0 and 3.0 Bohr between hydrogen atoms was set in order to design a system consisting of moderately interacting molecules of dihydrogen, thus avoiding scenarios where the inclusion of strong or nondynamic correlation is required. All minima in the potential energy surface were characterized by analytical vibrational frequency calculations. All molecules were reoriented after the optimization in order to set the Z axis as the principal inertia axis with the highest moment of inertia, allowing us to focus only on the longitudinal component of the polarizability matrix (i.e. αzz) and the second hyperpolarizability tensor (i.e. γzzzz).
The CCSD(T) single-point energy calculations necessary for the finite-field calculations (see SI1, section 6) were done with an energy convergence criterion of 10 -8 atomic units (a.u.). The cutoff of the two-electron integrals accuracy was set to 10 -11 a.u. For all the molecules of γ-NLO set, set 1, and set 2, the T1 diagnostic was computed to confirm that CCSD(T) energies and γ values can be considered a good reference. O-, mand pbenzyne have an open-shell singlet ground state. The largest CCSD amplitudes for the benzynes are given by t2 amplitudes, but they are lower than 0.6, and therefore the perturbative (T) correction is not problematic. 47 Furthermore, in spite of their multireference character, CCSD(T) properly reproduces the experimental singlet-triplet splitting of o-, mand pbenzyne. 47 CCSD(T) static linear polarizability and second hyperpolarizabilities were computed as derivatives of the electronic energy with respect to an external electric field using the finite field (FF) approach, At the DFT level, both α and β can be analytically determined. Hence, γ was calculated numerically from the first derivative of β with respect to the external electric field using the FF approach. The number of field strengths chosen in the FF procedure varies for each molecule and method (DFAs or CCSD(T)). In particular, the range of field strength used was F=2 j ·10 -4 a.u. with integer values of j=0-7 for all molecules, and the addition of j=8-10 when the former field strengths were not enough to obtain converged numerical derivatives.
The truncation error affecting the numerical estimation of the derivatives, which comes from the higher-order terms neglected in the standard central-differentiation formulae, was minimized thanks to application of the Rutishauser-Romberg scheme: 48, 49 where R is the calculated property, i is the iteration number, and j indicates that the property has been computed using an electric field-strength of 2 j ·10 -4 a.u. in the FF procedure. By using this methodology, the average numerical errors of CCSD(T) γ is below 2.2% (see SI2, section 1 for further details).

Results and discussion
New tuned RSH for NLO properties RSH functionals usually provide a better description of NLOPs than regular global hybrid DFAs. 25,43 A possible approach to improve the accuracy of the NLOP obtained with the RSH functionals is tuning the parameters that determine their shortrange and long-range regions in the calculation of the exchange energy. Unfortunately, the most popular tuned RSH functionals, which are based on the minimization of J 2 , do not improve the Please do not adjust margins Please do not adjust margins  21,26,38 The main goal of this work is to propose a new type of tuned RSH replacing the minimization of J 2 by an alternative function that provides more accurate second hyperpolarizabilities.
As starting point to build the new OT-RSH functionals, we have used the LC-BLYP method, which is among the best RSH functionals to compute NLOPs. 43 Our first step was to optimize, for each molecule of the γ-NLO set, the ω parameter of LC-BLYP in order to reproduce the γ value computed with CCSD(T). We applied the root-finding bisection method 50 until finding the best value of ω with a precision of Δω=0.01. This accuracy is enough to reproduce the CCSD(T) values of γ (i.e. relative errors with respect to CCSD(T) are smaller than 1%) and obtain smooth profiles describing the dependence of LC-BLYP γ in terms of ω. This set of optimized ω are labelled as ωCC (see Table 1).
Our second goal is to find a correlation between ωCC and some molecular descriptor or property that can be easily computed for any molecule. In order to achieve this challenging goal, several possible molecular descriptors were tested, such as HOMO-LUMO gap, the first excitation energies computed within time-dependent DFT formalism or delocalization indices. [51][52][53] We obtained satisfactory correlations between functions based on these molecular descriptors and ωCC within subfamilies of the γ-NLO set, as for instance PA chains (see Figures SI2.5-SI2.6). However, we did not find any function based on these descriptors displaying a sufficiently good correlation with ωCC and involving all 60 molecules of the set.
Motivated by the idea of using γzzzz as self-descriptors to predict the best ω to reproduce CCSD(T) γ using LC-BLYP, we found good correlations using as molecular descriptor a function of LC-BLYP γzzzz (see Figure SI2.1). Considering that α and γ are both properties of even order (second and fourth derivatives with respect to the external electric field, respectively), and that usually their variation with respect to the size and type of the chemical systems follows similar trends (although variations of Journal Name | 5 Please do not adjust margins Please do not adjust margins γ are usually far larger than variations of α), we also checked the correlation of ωCC with αzz, the calculation of which bears a lower computational cost (see Figure SI2.2). However, since the γ-NLO set includes molecules of different sizes, one wants to consider size-extensive descriptors. This goal could be achieved by using γ/α, α/N or γ/N 2 (where N is the number of electrons in the molecule) instead of α or γ. For instance, by using as descriptor the ratio log(γ/α), we obtained fairly good correlations for all molecules of the set with the exception of the hydrogen chains (see Figure SI2.3). The use of a logarithm in the definition of the descriptors is motivated by the restriction of ωCC, which only takes values in the interval [0,1]. The results were even better using as descriptor log(γ/N 2 ), which provides a sufficiently good correlation for all the molecules of the set (see Figure SI2.4). However, the best correlation was obtained using as descriptor the linear polarizability obtained at LC-BLYP level and the number of electrons in the following way: This descriptor will be used throughout this work (see Figure 2). LC-BLYP α can be determined analytically using most computational packages. Therefore, lα descriptor bears a low computational cost and it can be fairly easily obtained.
As shown in Figure 2, there is fairly good quadratic correlation between ωCC and lα descriptor (Eq. 7, R 2 =0.77), providing a prediction of ω that leads to second hyperpolarizability values close to CCSD(T).
= 0.6269 · 2 − 0.4556 · + 0.3791 The predicted ωTα values obtained with Eq. 7 have a mean absolute error (MAE) with respect to the optimal ωCC of only 0.04 Bohr -1 . The correlation was stable under a leave-N-out (N=1, 2, 3) cross-validation tests 54 since the decrease in R 2 was only 0.02 for the three tests, indicating that there is no overfitting of the model.
γ values close to CCSD(T) quality could be achieved using the ωTα obtained from Eq. 7 within the OT-LC-BLYP framework. We refer to this new tuned range-separated hybrid density functional approach as Tα-LC-BLYP. In order to compute the second hyperpolarizability of a molecule using the Tα-LC-BLYP method, one must first compute the α value with the LC-BLYP functional, and obtain the corresponding ωTα from Eq. 6 and Eq. 7 to be used into the Tα-LC-BLYP functional for the final evaluation of γ.
It is worth mentioning that all tested descriptors based on α or γ showed two clearly distinct behaviours for molecules with either large or small γ, leading to the classification of the molecules in two subsets. The two subsets present linear correlations between ωCC and the (N)LOP-based indices with two very different slopes (see Figure SI2.7). The boundary between the small and large second hyperpolarizability regions is, to some extent, arbitrary.
Nevertheless, in order to keep our scheme as simple as possible, we chose a parabolic function, which can explain the variability of ωCC with respect to lα for the whole molecular set (see Figure  2). Interestingly, the MAE in the predictions of ω using one unique parabolic correlation for the whole set of molecules was lower than using two different linear correlations for each subset. Differences smaller than 0.05 Bohr -1 between the predicted ωTα and the optimal ωCC barely affect the final γ obtained for molecules with large second hyperpolarizabilities. For this reason, the values of ω were rounded at the second decimal. This threshold value could be even increased for molecules presenting low values of γ because then the value of the property becomes much less dependent on the RSH parameter. The small sensitivity of γ to the small variations of ωTα is responsible for the good performance of Eq. 7, which accounts for 77% of the variability of the second hyperpolarizability calculated at the CCSD(T) level of theory.

Benchmarking of DFA for the evaluation of the second hyperpolarizability
Once the correlation between ωCC and lα was established, we used the Tα-LC-BLYP functional to compute the static hyperpolarizabilities of the molecules of the γ-NLO set (see Table 2), and compare them to the ones obtained from other nine commonly used DFAs, namely the global hybrid GGAs B3LYP, 55, 56 PBE0, 57 BH&HLYP; 58 the global hybrid meta-GGAs M06-2X, 59 and MN15; 60 the long-range corrected functionals LC-BLYP, CAM-B3LYP, and ωB97xD 61 and the optimally-tuned LC-BLYP based on the minimization of J 2 values (Eq. 3), which we labelled as OT-LC-BLYP. We did not include GGA functionals because they usually give far less accurate γ. 43 The results presented in Figure 3 show that the mean absolute percentage error (MAPE) and MAE with respect to CCSD(T) values obtained with Tα-LC-BLYP are clearly lower than the errors obtained with the other nine functionals.
In agreement with the conclusions of previous studies, 43 the average errors in the calculation of γ are lower for the functionals that include more exact exchange. This is true for both hybrid GGAs and hybrid meta-GGAs. In this vein, BH&HLYP and M06-2X provide more accurate values than B3LYP, PBE0, or MN15. As expected, for the γ-NLO set, the most accurate values of γ are obtained by the range-separated functionals LC-BLYP, CAM-B3LYP, ωB97xD, and OT-LC-BLYP. Interestingly, the MAPEs obtained for these four DFAs are quite similar and only slightly smaller than for hybrid and meta-hybrids with a high percentage of exact exchange. Although LC-BLYP has a poor performance for the molecules with small γ, it gives more accurate second hyperpolarizabilities than the other RSHs for the molecules of the γ-NLO set with the highest γ values.

Please do not adjust margins
Please do not adjust margins  To explore the robustness of Tα-LC-BLYP, we have checked its performance for two sets of molecules with electronic structures that differ significantly from the ones included in the γ-NLO molecular set (see Figure 4). Molecular set 1 includes obenzyne, m-benzyne, and p-benzyne in their diradical singlet ground state. The NLOPs of these systems strongly depend on their diradical character. Molecular set 2 includes 8 hydrogen-Please do not adjust margins Please do not adjust margins bonded small dimers taken from reference 62. The NLOPs of these molecules arise from the interaction between the two monomers.
The best results in the calculation of second hyperpolarizabilities of the hydrogen-bonded complexes included in set 2 are given by M06-2X, CAM-B3LYP, and MN15 functionals, with MAEs of 3.24·10 2 , 4.49·10 2 , and 5.14·10 2 a.u., respectively (see Figure 6).    The values of ωCC and lα obtained for the new molecules do not trigger a substantial change in the correlation found for the γ-NLO molecular set (see Figure 2).
Since almost all points of the sets fit properly into the quadratic form that was previously fitted for γ-NLO molecular set (Eq. 7), qualitative good γ results are expected for the molecules in the sets 1 and 2 (see Tables SI2.8-SI2.10). For both sets, Tα-LC-BLYP clearly improves the results obtained with the LC-BLYP functional, being the Tα-LC-BLYP average errors about half the LC-BLYP ones. Indeed, for both sets, Tα-LC-BLYP is one of the best DFAs among the ones tested.
We also benchmarked the ten studied DFAs for the calculation of μz, αzz, and βzzz using as reference the CCSD(T) values of the 60 molecules of γ-NLO set (Tables SI1. [4][5][6][7][8][9][10][11][12]. Although Tα-LC-BLYP has been designed to give accurate results for second hyperpolarizabilities, this specific tailoring is not detrimental for the calculation of the dipole moment, the linear polarizability and the first hyperpolarizability. As a matter of fact, the MAPE of Tα-LC-BLYP is the smallest for μz, the second smallest for αzz and third smallest for βzzz. In order to test whether Tα-LC-BLYP is biased towards the calculation of second hyperpolarizabilites, we have also analyzed other properties. In particular, we have optimized the geometries and computed the vibrational frequencies of the molecules of the γ-NLO set with the B3LYP, LC-BLYP and Tα-LC-BLYP functionals and compared their values with respect to MP2 values (Tables SI1.13-15). The root-meansquare-deviation (RMSD) of the bond distances, angles, dihedrals and vibrational frequencies show that, although the best results are obtained with the B3LYP functional, the errors of Tα-LC-BLYP are lower than the errors of its parent functional LC-BLYP.

Conclusions
In this manuscript, 9 commonly used DFAs (the global hybrid GGAs B3LYP, PBE0, BH&HLYP, the global hybrid meta-GGAs M06-2X and MN15, the long-range corrected functionals LC-BLYP, CAM-B3LYP, and ωB97xD, and the optimally-tuned LC-BLYP based on the minimization of the J 2 values, OT-LC-BLYP) have been benchmarked for the evaluation of the second hyperpolarizability using as reference the CCSD(T) values. To this end, we have designed the γ-NLO molecular set, which contains a total of 60 entries. For the hybrid GGAs and hybrid meta-GGAs, the average errors of γ decrease with the increase of the amount of exact exchange included in the DFA. Among these nine functionals, the long-range corrected functionals are the most accurate, LC-BLYP giving the lowest average errors in the calculation of the second hyperpolarizabilities of the γ-NLO set.
For every system of the γ-NLO molecular set, we have determined the value of the range-separation parameter ω required to reproduce the CCSD(T) γ with LC-BLYP functional (ωCC). We have shown that ω is a key parameter in the RSH construction for γ calculation. Then, the γ for molecules with ωCC close to the LC-BLYP ω (i.e., ω=0.47) is more accurate if they are computed with this functional rather than with CAM-B3LYP, whereas the opposite is true for molecules with ωCC close to the CAM-B3LYP ω. The most accurate γ computed is obtained by the new tuned Tα-LC-BLYP functional, reducing about half the average errors of LC-BLYP, which ranks as the second most accurate DFA tested in this benchmark.
The design of Tα-LC-BLYP is based on a quadratic correlation between ωCC and the index given by Eq. 6. This correlation includes both the molecules displaying small and large γ. The tuned range-separated functional Tα-LC-BLYP employs this quadratic correlation to predict the tuned ω (ωTα) that reproduces the CCSD(T) γ. Tα-LC-BLYP clearly improves the results obtained with the most popular hybrid and rangeseparated hybrids and bears a computational cost only slightly higher than LC-BLYP, since, in addition to the calculation of γ, it only requires the previous calculation of the linear polarizability with LC-BLYP. Even though Tα-LC-BLYP is tailored for the calculation of γ, Tα-LC-BLYP performs also well for the calculation of other linear and non-linear optical properties, such as μz, αzz, and βzzz.
The results obtained for set 1 and set 2 are very promising and suggest that the correlation found herein could be generalized to include a wider spectrum of molecular systems and molecules that display high NLOP for reasons other than πdelocalization.

Author contributions
P.B-S. performed all the NLOP calculations, wrote the scripts and Fortran codes required for the data generation and analysis, and wrote the first draft of the manuscript. S.P.S. helped with the calculations that required the optimization of J and contributed to the draft. P.S., E.M. and J.M.L. designed and supervised the project and co-wrote the manuscript. All authors have contributed to the results' analysis as well as give approval to the final version of the manuscript.

Conflicts of interest
There are no conflicts to declare.