transition_sampling.likelihood package

Submodules

transition_sampling.likelihood.likelihood_max module

Likelihood maximization of aimless shooting results.

class transition_sampling.likelihood.likelihood_max.Maximizer(colvars_file, csv_file, niter=100, use_jac=True)

Bases: object

Class for finding the best reaction coordinate with likelihood maximization.

Utilizes the results of aimless shooting and colvar calculation to determine the optimum reaction coordinate by maximizing the likelihood of accepted being on a transition path and rejected states being off it. See optimization.optimize for more details on the maximization.

This is used to iteratively perform the maximization for all combinations of CVs starting from a single CV and going to longer combinations if adding another yields a statistically significant increase in likelihood. Construct an instance and call maximize.

Parameters
  • colvars_file (str) – File path to the CVs calculated with plumed in the default format.

  • csv_file (str) – File path to the cvs generated by aimless shooting that corresponds to the colvars_file

  • niter (Union[int, list[int]]) – Number of local optimizations to determine a global optimum. This can either be a single integer or a list of integers with the same length as the number of CVs. The list gives the number of optimizations to do for each combination length - it may be beneficial to have less iterations for higher lengths of CVs if the optimization becomes too long

  • use_jac (bool) – If true, the analytical jacobian will be calculated and used in the optimization. Usually results in some speedup.

colvars

A DataFrame of all the collective variables loaded from colvars_file

Type

pd.DataFrame

metadata

A DataFrame containing the metadata of each state in colvars_file, specifically if each was accepted or not.

Type

pd.DataFrame

niter

how many local optimizations to do for each global optimization with a given number of cvs. Length == # of cvs

Type

list[int]

use_jac

True if the analytical jacobian should be used in optimization

Type

bool

Raises

ValueError – If the number of states in the colvars_file != number of states in the csv_file

maximize(max_num_cvs=None)

Find the combination of CVs that maximize the likelihood of this data.

Only adds more CVs if they result in a statistically significant increase.

Parameters

max_num_cvs (Optional[int]) – The maximum number of CVs in a combination to evaluate (inclusive). If this number is reached before the statistical threshold for adding new CVs is, this function returns early. Set to None to set no limit

Return type

MaximizerSolution

Returns

  • All values for each tested combination, as well as the maximum

  • significant combination. See MaximizerSolution

class transition_sampling.likelihood.likelihood_max.MaximizerSolution(req_improvement)

Bases: object

Wrapper class to carry results for all tested combinations in maximization.

Parameters

req_improvement (float) – The required improvement of the objective function for an additional CV to be added to max

combinations

A dictionary representing the solution to all tested combinations. The first key indicates the length of the combination. Within that dictionary, the keys are sets of CV combinations, with values being a the SingleSolution representation of the optimization.

Type

dict[int: dict[frozenset: SingleSolution]]

max

The solution that has the maximum likelihood while still being significant as determined by req_improvement to add another CV

Type

SingleSolution

req_improvement

The required improvement of the objective function for an additional CV to be added to max

Type

float

to_csv(file)

Write the contents to a CSV. Non-present CVs get a NaN weight.

Parameters

file (str) – The filename to write the CSV to. Will append if a file is already present

Return type

None

to_dataframe()

Turn the contents into a rectangular data frame, one row per combination

In order to keep a rectangular format, cvs that aren’t in a specific combination get a weight of NaN. The objective function and all weights are

Return type

DataFrame

Returns

  • DataFrame with combination results as rows and columns of

  • [n_cvs, obj_val, p0, alpha0, <weight of each CV>]

class transition_sampling.likelihood.likelihood_max.SingleSolution(comb, obj, sol)

Bases: object

Carries the result of a single CV combination optimization.

Parameters
  • comb (tuple) – An ordered tuple of CVs that this solution represents

  • obj (float) – The objective function of this result

  • sol (ndarray) – The optimized solution. An ordered array of [p0, alpha0, coeffs_for_each_CV in comb]

See Parameters

transition_sampling.likelihood.optimization module

Likelihood maximization to determine reaction coordinates from aimless shooting results.

In all documentation below, n refers to the number of shooting points being used as data and m refers to the number of collective variables contributing to the reaction coordinate.

transition_sampling.likelihood.optimization.calc_p(p_0, r_vals, r_jac)

Calculate P(TP|r) for reaction coordinate given a weights vector and colvars. Also returns the jacobian if requested.

Parameters
  • p_0 (float) – Constant multiplier

  • r_vals (ndarray) – A length n vector of calculated reaction coordinates.

  • r_jac (ndarray) – If None, the jacobian will not be calculated. Otherwise, this should be an (n x m + 2) matrix representing the jacobian of each r with respect to each alpha. The 0 index column should be empty space. Columns 1:m+2 represent jacobians of the alphas

Return type

tuple[ndarray, Optional[ndarray]]

Returns

  • A tuple of two numpy arrays.

  • First term – A length n vector of P(TP|r) for every r

  • Second term – None if r_jac is None. Otherwise an (n x m + 2) matrix representing the jacobian of each p with respect to all parameters. The columns are ordered as p_0, alphas.

transition_sampling.likelihood.optimization.calc_r(alphas, colvars, calc_jac)

Calculate linear combination of reaction coordinates given a weights vector and colvars matrix. Also returns the jacobian with respect to the alphas if requested.

Parameters
  • alphas (ndarray) – a length m+1 array where m is the number of collective variables being analyzed. alphas[0] refers to the constant added term.

  • colvars (ndarray) – an (n x m) matrix where n is the number of shooting points and m is the number of collective variables.

  • calc_jac (bool) – True if the jacobian should be calculated and returned.

Return type

tuple[ndarray, Optional[ndarray]]

Returns

  • A tuple of two numpy arrays.

  • First term – A length n vector of calculated reaction coordinates, one for each shooting point.

  • Second term – None if calc_jac = False. Otherwise, an (n x m + 2) matrix representing the jacobian of each coordinate with respect to each alpha. The 0 index column is empty space allocated for p_0 later on. Columns 1:m+2 represent jacobians of the alphas

transition_sampling.likelihood.optimization.obj_func(to_opt, colvars, is_accepted, calc_jac)

Objective function to minimize for colvar maximization with fixed p_0

Parameters
  • to_opt (ndarray) – Array of m+2 parameters to optimize. Convention is - to_opt[0] : p_0 term - to_opt[1] : alpha_0, i.e. the constant added weight - to_opt[2:] : alphas_1…alphas_m for m collective variables.

  • colvars (ndarray) – an (n x m) matrix where n is the number of shooting points and m is the number of collective variables.

  • is_accepted (ndarray) – A length n boolean np array corresponding to if the ith state in colvars was accepted

  • calc_jac (bool) – True if the jacobian should be calculated and also returned

Return type

tuple[float, Optional[ndarray]]

Returns

  • A tuple of float, numpy array.

  • First term – The value to be minimized for likelihood maximization.

  • Second term – None if calc_jac is None. Otherwise: an m+2 length array representing the jacobian of the objective function for each optimized parameter

transition_sampling.likelihood.optimization.optimize(colvars, is_accepted, niter=100, use_jac=True)

Use basinhopping for global optimization of rxn coords as a linear combination of CVs.

The reaction coordinate \(r\) is linear combination of \(m\) collective variables \(\mathbf{x}\) with weights \(\mathbf{\alpha}\) plus a constant \(\alpha_0\):

\[r(\mathbf{x}) = \alpha_0 + \sum_{i=1}^m \alpha_ix_i\]

The probability that a shooting point is on a transition path (TP) is chosen to be modeled as a non-gaussian bell curve and is given by:

\[\Pr(TP|\mathbf{x}) = p_0 (1 - \tanh^2(r(\mathbf{x}))\]

This probability is maximized for CVs of accepted shooting points and minimized for rejected shooting points by optimizing \(\mathbf{\alpha}\), \(\alpha_0\), and \(p_0\).

Parameters
  • colvars (ndarray) – An (n x m) matrix where n is the number of shooting points and m is the number of collective variables to include in the reaction

  • is_accepted (ndarray) – A length n array of booleans indicating if the ith state was accepted or not.

  • niter (int) – Number of local optimizations to perform with basinhopping. Linearly scales.

  • use_jac (bool) – True if the analytical jacobian should be used during gradient descent. Otherwise, the default finite difference approximation will be used. In most cases, enabling this offers a speedup

Return type

tuple[float, ndarray]

Returns

  • The objective function of the solution and the solution which is an

  • array of length (m+2) of optimized parameters as [p0, alpha0, alphas]

Raises

RuntimeWarning – If the optimization was not successful

Module contents