\documentclass[aps,prl,twocolumn,groupedaddress]{revtex4}
\usepackage{graphicx}
\bibliographystyle{apsrev}
\begin{document}
\title{Supplementary Information, Path Finding on High-Dimensional
Free Energy Landscapes}
\author{Grisell Diaz Leines}
\author{Bernd Ensing}
\email[]{b.ensing@uva.nl}
\homepage[]{http://molsim.chem.uva.nl}
\affiliation{Van 't Hoff Institute for Molecular Sciences,
Universiteit van Amsterdam, Science Park 904, 1098 XH
Amsterdam, The Netherlands}
\date{\today}
\maketitle
\subsection{Derivation of the path collective variable}
To define a reaction coordinate as a function of collective
variables $\sigma({\bf z})$, we use a projection
of ${\bf z}$ onto a discretized path of $M+1$ equidistant points, or
nodes, ${\bf s}_0, {\bf s}_1,\dots, {\bf s}_M$. The path of nodes forms a
smooth curve that connects the reactant and product free energy minima
and the projection provides a foliation of the ${\bf z}$-space such
that it attributes to each configuration of the system a measure of
the progress along the reaction coordinate, $\sigma({\bf z})$.
The left panel in figure~\ref{fig:foliation} illustrates the geometric
projection of a point ${\bf z}$ in a three-dimensional collective
variable space onto a path of nodes, yielding the point ${\bf
s}(\sigma({\bf z}))$. Here, $\sigma({\bf z})$ is the desired reaction
coordinate value, which we wish to be equal to zero for points ${\bf
z}$ that are in the plane $S_0$ perpendicular to the curve through the
reactant state minimum and equal to one for ${\bf z}$-configurations
that lie in the perpendicular plane $S_1$ through the product state
minimum. The geometric expression (eq.~4 in the paper) that we use
for this projection is derived in the following way.
First, we locate the path node, ${\bf s}_m$, that is closest to the
point ${\bf z}$ that we wish to project onto the path. Next, we make
use of the fact that, irrespective of the dimensionality, we can draw
a circle with center ${\bf c}$ through the triplet of nodes ${\bf
s}_m$ and its neighboring nodes ${\bf s}_{m-1}$ and ${\bf
s}_{m+1}$. In figure~\ref{fig:foliation} (left), point ${\bf z}$
happens to lie in front of the plane through the nodes. A gray
triangle depicts the plane $S_\sigma$ through ${\bf c}$ and ${\bf z}$
that is perpendicular to the plane of the circle. As the circle is a
smooth curve through the nodes, and the plane $S_\sigma$ is normal to
this curve, we choose $S_\sigma$ to be the plane that connects all
points that are projected to the same point ${\bf s}(\sigma({\bf z}))$
on the path and thus have the same $\sigma$ value.
The white slice indicates the range of all possible planes $S_\sigma$
that are closest to node ${\bf s}_m$. This range runs from $\sigma({\bf
z})=\frac{m-1/2}{M}$, where the plane bisects the line segment
between ${\bf s}_{m-1}$ and ${\bf s}_m$, to $\sigma({\bf
z})=\frac{m+1/2}{M}$, where the plane is equidistant from
${\bf s}_{m}$ and ${\bf s}_{m+1}$. As at these limits the
plane is also perpendicular to the line segment, it is easy to see
that the plane at one end coincides with the plane at the
corresponding end of the next triplet of nodes, resulting a smooth
foliation along the entire path.
To find the plane through ${\bf z}$ and thus the reaction coordinate
value $\sigma({\bf z})$, we set out to locate the point ${\bf x}$ on
the circle arc between ${\bf s}_{m}$ and ${\bf s}_{m+1}$, for which
\begin{equation}
|{\bf z} - {\bf s}_{m-1}| = |{\bf x} - {\bf z}|.
\end{equation}
\noindent
However, to avoid computation of arc lengths, we make an approximation
by assuming ${\bf x}$ to lie on the linear segment between ${\bf
s}_{m}$ and ${\bf s}_{m+1}$ (see figure~\ref{fig:foliation}) and
then define the fraction of the arc length from ${\bf
s}_{m}$ to ${\bf x}$ as:
\begin{equation}
f=\frac{|{\bf x} - {\bf s}_m|}{|{\bf s}_{m+1} - {\bf s}_m|}.
\label{eq_pointx}
\end{equation}
\noindent
This fraction is directly related to the projection of
${\bf z}$ to $\sigma$ as:
\begin{equation}
f\in[0,1] \Rightarrow \sigma \in \Big[\frac{m-1/2}{M},\frac{m}{M}\Big]
\label{eq_fraction}
\end{equation}
\begin{figure*}
\includegraphics[width=70mm,clip=true]{figure_suppinfo1.png}
\includegraphics[width=75mm,clip=true]{figure_suppinfo2.png}
\caption{\label{fig:foliation}
Left: a point ${\bf z}$ in a, here, three-dimensional, collective
variable space is projected onto a path of nodes, ${\bf s}(\sigma)$,
leading to point ${\bf s}(\sigma({\bf z}))$. Right: the reaction
coordinate value $\sigma$ as a function of the two torsion angle
collective variables used for the alanine dipeptide.
}
\end{figure*}
\noindent
Determining $\sigma (z)$ thus requires finding the fraction $f$. By
combining equations~\ref{eq_pointx} and~\ref{eq_fraction}, we can
write:
\begin{equation}
|{\bf z}-{\bf s}_{m-1}| = |f ({\bf s}_{m+1}-{\bf s}_{m}) +
{\bf s}_{m} - {\bf z}|
\end{equation}
\noindent
Introducing ${\bf v}_{1}= {\bf s}_{m} - {\bf z}$,
${\bf v}_{2}= {\bf z}-{\bf s}_{m-1}$ and ${\bf
v}_{3}={\bf s}_{m+1}-{\bf s}_{m}$, this equation is simplified to
\begin{equation}
|{\bf v}_{2}| = |f {\bf v}_{3} + {\bf v}_{1}|.
\end{equation}
\noindent
Rewriting to solve for $f$ results in a quadratic equation,
\begin{equation}
\label{eq16}
|{\bf v}_{3}|^{2} f^{2} +2({\bf v}_{1} \cdot {\bf v}_{3}) f + (|{\bf
v}_{1}|^{2} - |{\bf v}_{2}|^{2}) = 0
\end{equation}
\noindent with (relevant) root:
\begin{equation}
\label{eq17}
f = \frac {\sqrt{ ({\bf v}_{1} \cdot {\bf v}_{3})^{2} - |{\bf
v}_{3}|^{2} (|{\bf v}_{1}|^{2}- |{\bf v}_{2}|^{2})} -({\bf
v}_{1} \cdot {\bf v}_{3})}{|{\bf v}_{3}|^{2}}.
\end{equation}
The reaction coordinate value is related to $f$ by:
\begin{equation}
\label{eq18}
\sigma = \frac{m}{M} \pm \Big(\frac{f-1}{2M} \Big),
\end{equation}
\noindent
where the $\pm$ sign depends on whether $\sigma({\bf z})$ is on the
right or the left side of node ${\bf s}_{m}$. Combining the
expression for $f$ (eq.~\ref{eq17}) with equation~\ref{eq18} results
in the full expression for the projection:
\begin{eqnarray}
\label{eq19}
\sigma &=& \frac{m}{M} \pm \frac{ \sqrt{ ({\bf v}_{1} \cdot
{\bf v}_{3})^{2} - |{\bf v}_{3}|^{2} (|{\bf v}_{1}|^{2}- |{\bf
v}_{2}|^{2})}}{2M|{\bf v}_{3}|^{2}} \nonumber \\
&-& \frac{({\bf v}_{1} \cdot {\bf v}_{3}) - |{\bf v}_3|^2}{2M|{\bf
v}_{3}|^{2}}
\end{eqnarray}
\noindent
This measure of the progress along the path collective variable is
robust and very easy to compute, also for large sets of collective
variables. Moreover, it provides in the neighborhood of the path a
smooth foliation in ${\bf z}$-space as illustrated in the right panel
of figure~\ref{fig:foliation} for the case of the alanine dipeptide.
\begin{figure}[tb]
\includegraphics[width=85mm,clip=true]{figure_suppinfo3.png}
\caption{\label{fig:dynamics}Evolution of the path collective
variable, $\sigma$ (bottom panel) and the torsion angles $\psi$ and
$\phi$ in radians of the alanine dipeptide during the
path-metadynamics simulation (top and middle panel).}
\end{figure}
\subsection{Convergence of the path and the free energy}
The path-metadynamics simulation of the C7$_\mathrm{eq} \rightarrow$
C7$_\mathrm{ax}$ transition for the alanine dipeptide started from a
configuration that belongs to the C7$_\mathrm{eq}$ reactant state. The
(biased) dynamics of the collective variables of the first 1.5~ns
simulation is shown in figure~\ref{fig:dynamics}. The top and middle
panels show the evolution of the torsion angles $\psi$ and $\phi$. The
bottom panel shows the corresponding $\sigma$-value along the (moving)
path of nodes. The first barrier crossing occurs after about 175~ps
and after about 280~ps the system re-crosses back to the reactant
state. The time between successive barrier crossings does not reduce
here because after each recrossing the size of the repulsive Gaussian
potentials that are continuously added to the metadynamics biasing
potential is reduced in order to further refine the estimation of the
free energy profile.
\begin{figure}[tb]
\includegraphics[width=80mm,clip=true]{figure_suppinfo4.png}
\caption{\label{fig:convergence} Convergence of the path shown as the
mean square deviation of the node positions with respect to those of
the initial guess path as a function of time (top) and convergence
of the free energy profile shown as the mean square deviation of the
free energy at each node with respect to the final value (bottom).}
\end{figure}
Figure~\ref{fig:convergence} shows the convergence behavior of the
path-metadynamics method for the case of the alanine dipeptide
conformational transition. The convergence of the path is computed by
summing the root mean square distance between the path nodes at time $t$
and at time zero, as follows:
\begin{equation}
\mathrm{RMSD}(s) = \sqrt{\frac{1}{M} \sum_{i=1}^{M} \Big(|{\bf s}_{i}(t)-{\bf
s}_{i}(0)|\Big)^2},
\end{equation}
\noindent
with $M$ the number of nodes. The top panel in
figure~\ref{fig:convergence} shows that the path is converged after
about 500~ps path-metadynamics simulation. The dip in the curve
between 175 and 280~ps matches with the time that the system visits
and reconstructs the product state for the first time. Due to
algorithm used for the reparametrization of the curve to maintain
equidistant nodes, the positions of the nodes in the reactant state
deteriorate somewhat while the system is in the product state. After
the next barrier crossing this is fixed again and the path converges
quickly.
The convergence of the free energy profile is monitored using the root
mean square deviation of the free energy at each node with respect to
the final values:
\begin{figure}[tb]
\includegraphics[width=85mm,clip=true]{figure_suppinfo5.png}
\caption{\label{fig:comparisson} Convergence of the path (top panel)
and the free energy (bottom panel) of the path-metadynamics as
presented in the Letter (black lines) compared to that of a variant
in which the path follows the steepest descent along the gradient of
the free energy landscape (red lines) and a third variant in which
the free energy and path are optimized in an alternating fashion
until convergence (blue lines).}
\end{figure}
\begin{equation}
\mathrm{RMSD}(F) = \sqrt{\frac{1}{M} \sum_{i=1}^{M}
\Big(F_i(t)-F_i^\mathrm{final}\Big)^2}.
\end{equation}
As expected, the free energy profile converges soon after the path
nodes are converged, as shown in the bottom panel of
figure~\ref{fig:convergence}.
\subsection{Using the gradient of the free energy landscape}
Here we briefly discuss the effects of two important features of the
path-metadynamics method. The first is the approach to advance the
path of nodes to the highest transition density. This is a new
alternative to the approach of evolving the nodes along the gradient
of the free energy. Secondly, the method reconstructs the free energy
profile along the path at the same time as it evolves the guess path,
rather than converging the free energy and the path in an alternating
fashion.
We performed a series of simulations, using as a simplified model of
the alanine dipeptide a single particle on the two-dimensional
Ramachandran potential energy surface coupled to a Langevin thermostat
with a friction coefficient of 0.5, such that it mimics to a high
degree the dynamics of the alanine dipeptide molecule. The black graphs
in Figure~\ref{fig:comparisson} show the convergence of the path and
the free energy for this system using path-metadynamics, in very good
agreement with figure~\ref{fig:convergence}.
By using a very stiff harmonic ``tube'' potential that confines the
sampling to stay very close to the guess path, we can effectively
accumulate the average force due to the tube potential at each
node. This average force is an estimator of the gradient of the free
energy perpendicular to the gradient. We evolved the guess path
following the steepest descent toward the minimum free energy path,
which is where the perpendicular gradient is zero. The step-size for
the evolution of the nodes was taken as a compromise between good
efficiency and stability of the integrator for the evolution of the
nodes. The metric tensor was taken to be the 2x2 unit matrix.
Figure~\ref{fig:comparisson} shows the convergence of the path and the
free energy for this model system using two different sampling lengths
between each update of the path: 5~ps (solid red lines) and 12.5~ps
(dashed lines). Clearly, for this system, the approach of following
stepwise the free energy gradient is significantly slower than the
path-metadynamics technique of allowing the dynamics of the nodes to
move unhindered to the highest transition density. Note that a shorter
sampling length leads initially to a faster convergence, however, the
less accurate estimates of the average force results in larger
fluctuations in path motion on the long run, without really
converging. We also tried shorter sampling lengths of 1.0 and 0.5~ps
(data not shown), which do not converge at all. The algorithm can be
improved by increasing the sampling length during the simulation. A
factor that would further decrease the performance of the gradient
approach for an actual molecular system is that each update of the
nodes would require an unproductive equilibration simulation to
accommodate the system to its new confining path configuration.
Finally, we show the advantage of optimizing simultaneously the path
and the free energy profile by comparing to the alternative approach
of optimizing them in an alternating fashion. That is, first the free
energy profile is converged along a fixed guess path. Then the path is
moved to the sampled average transition density, after which the
second metadynamics simulation reconstructs the free energy profile
along the new guess path, and so forth. Not surprisingly this approach
is much less efficient than simultaneous optimization. We tried to
improve the alternating approach by, in the first four cycles, only
performing a rough free energy reconstruction, by only sampling one
barrier crossing and recrossing. Although this renders a 3 to 5 times
speedup compared to converging the free energy profile in every cycle,
this is still much less efficient compared to the simultaneous
optimization as done in the path-metadynamics method. The results are
shown by the blue lines in Figure~\ref{fig:comparisson}.
\end{document}