Usage
This page details how to use lpath. Follow the Getting Started page to learn how to install lpath.
Introduction
lpath is made of four separate steps: discretize, extract, match, and plot. All these steps take a variety of different parameters, which are all listed in the API page.
The
discretizestep is used to discretize your trajectories for the purpose of finding successful transitions.The
extractstep takes what’s assigned indiscretizeand identifies all instances where there is a successful transition.The
matchstep takes what’s outputted inextractand cross pattern match to identify pathway classes. It is possible to reassign states in this step.The
plotstep takes what’s outputted inmatchand allows you to easily make different plots. The LPATHPlot class objects contains many pre-calculated datasets for custom plotting.
There are two different ways of running these steps. Due to the sheer amount of parameter options, it is recommended that users start with the Jupyter notebook.
Import each step’s
main()function and run everything in an interactive python session (e.g., Jupyter notebook). [RECOMMENDED]Run through the command line (e.g.,
lpath discretize -I west.h5 --assign-arguments='--config-from-file --scheme TEST'.
Molecular Dynamics
lpath is written to cluster pathways from molecular dynamics simulation. The input file will be a numpy or text file of the features used to assign states. For example, an alanine dipeptide system will use the phi, psi angles.
Discretize
In this step, we will assign each frame of an MD trajectory based on the phi/psi dihedral angles saved in dihedral.npy.
Here is an example assign function (in a file called module.py) used for assigning states:
def assign_dih(input_array):
"""
This is an example function for mapping a list of features to state IDs. This should be subclassed.
Parameters
----------
input_array : numpy.ndarray
An array generated from load_file.
Returns
-------
state_list : list
A list containing
"""
state_list = []
for val in input_array:
if val[0] >= -180 and val[0] <= -45 and val[1] >= -55 and val[1] <= 30: # Phi/Psi for Alpha Helix
state_list.append(0)
elif val[0] >= 165 and val[0] <= 180 and val[1] >= -55 and val[1] <= 30:
state_list.append(0)
elif val[0] >= -170 and val[0] <= -55 and val[1] >= 40 and val[1] <= 100: # Phi/Psi for C7eq
state_list.append(1)
elif val[0] >= 25 and val[0] <= 90 and val[1] >= -55 and val[1] <= 0: # Phi/Psi for C7ax
state_list.append(2)
else:
state_list.append(3)
return state_list
We will monkey-patch this function into lpath.
From the command line, run the following:
lpath discretize -I dihedral.npy -O states.npy -af module.assign_dih --stride 100
We’ve read in
dihedral.npywith a stride step of 100. This smaller dataset will be discretized with module.assign_dih. This will generate astates.npyfile to be used in theextractstep.
Extract
In this step, we will identify any successful transitions in the trajectory. We will be looking at the C7eq to C7ax transition.
Since we already read in the data every 100 frames with --stride 100 in the discretize step, we do not need to use --stride again.
Users who haven’t used stride in discretize and would like their dataset reduced at this point may use --stride here.
Note: If you read in your trajectory in discretize with a different stride, your other pre-calculated datasets might no longer match frame-by-frame. For that dataset to be used for reassignment later on (i.e. in match), use the --feature-stride 100 option and provide the appropriate stride step.
From the command line, run the following:
lpath extract --extract-input states.npy --extract-output pathways.pickle --source-state 1 --target-state 2
Match
In this step, we will pattern match any successful transitions we’ve identified in extract. We will, again, be looking at the C7eq to C7ax transition.
From the command line, run the following:
lpath match --input-pickle succ_traj/pathways.pickle --output-pickle succ_traj/match-output.pickle \ --cluster-labels-output succ_traj/cluster_labels.npy
After the comparison process is completed, it should show you the dendrogram. Closing the figure should trigger prompts to guide you further.
Input
yif you think the threshold (horizontal line which dictates how many clusters there are) should be at a different value. Otherwise, inputnand tell the program how many clusters you want at the end.
Plot
This step will help you plot some of the most common graphs, such as dendrograms and histograms, directly from the pickle object generated from match. Users may also elect to use the plotting scripts from the examples folder.
There is a script to plot NetworkX plots there.
More specifically, the following graphs will be made in the plots folder:
* Dendrogram showing separation between clusters
* Weights/Cluster bar graph
* Target iteration histograms (per cluster)
* Event duration histograms (per cluster)
From the command line, run the following and it should generate a separate file for each of the above graphs:
lpath plot --plot-input succ_traj/match-output.pickle
More options for customizing the graphs can be found by running lpath plot --help.
Weighted Ensemble Simulations
lpath is written to cluster pathways generated by the WESTPA software suite. Make sure WESTPA is installed. See the Getting Started page for more information.
Discretize
We will use WESTPA’s w_assign tool to assign to states. See the tool’s wiki page and Sphinx documentation for more information about the tool.
We’ll try to discretize a multi.h5 (generated with w_multi_west --ibstates) with w_assign based on what’s defined with the TEST scheme in the west.cfg. In TEST, a rectilinear grid is constructed in the progress coordinate space and certain “bins” are selected and assigned to each state.
Run the following in the command line to run
w_assign:lpath discretize -we -W multi.h5 -A ANALYSIS/TEST/assign.h5 \ --assign-args="-W multi.h5 -r west.cfg --config-from-file --scheme TEST"
Extract
In this step, we will identify any successful transitions in the trajectory. We will be looking at the C7eq to C7ax transition.
If you are looking to compare using segment IDs in the next step (not recommended for simulations combined with w_multi_west) or want to include the waiting time (time spent in the source state) in the pattern matching, make sure you turn on --trace-basis to trace all the way back to the basis state. Do note that this significantly increases the time it requires to extract all successful trajectories.
From the command line, run the following:
lpath extract -we -W multi.h5 -A ANALYSIS/TEST/assign.h5 --source-state 1 \ --target-state 2 --extract-output output.pickle --out-dir succ_traj
Match
In this step, we will pattern match any successful transitions we’ve identified in extract. We will, again, be looking at the C7eq to C7ax transition.
This will do the pattern matching and output individual h5 files for each cluster.
From the command line, run the following:
lpath match -we --input-pickle succ_traj/output.pickle --output-pickle succ_traj/match-output.pickle --cluster-labels-output succ_traj/cluster_labels.npy \ --export-h5 --file-pattern "west_succ_c{}.h5"
After the comparison process is completed, it should show you the dendrogram. Closing the figure should trigger prompts to guide you further.
Input
yif you think the threshold (horizontal line which dictates how many clusters there are) should be at a different value. Otherwise, inputnand tell the program how many clusters you want at the end.
For cases where you want to run pattern matching comparison between segment IDs, you will have to use the largest common substring --substring option. By default, the longest common subsequence algorithm is used.:
lpath match -we --input-pickle succ_traj/output.pickle --output-pickle succ_traj/match-output.pickle --cluster-labels-output succ_traj/cluster_labels.npy \
--export-h5 --file-pattern "west_succ_c{}.h5" --reassign-method "reassign_segid" --substring
Plot
This step will help you plot some of the most common graphs, such as dendrograms and histograms, directly from the pickle object generated from match. Users may also elect to use the plotting scripts from the examples folder.
There is a script to plot NetworkX plots there.
More specifically, the following graphs will be made in the plots folder:
* Dendrogram showing separation between clusters
* Weights/Cluster bar graph
* Target iteration histograms (per cluster)
* Event duration histograms (per cluster)
From the command line, run the following and it should generate a separate file for each of the above graphs:
lpath plot --plot-input succ_traj/match-output.pickle
More options for customizing the graphs can be found by running lpath plot --help.
Example Reassign file
The following is a reassign function if you decides to reclassify your states:
def reassign_custom(data, pathways, dictionary, assign_file=None):
"""
Reclassify/assign frames into different states. This is highly
specific to the system. If w_assign's definition is sufficient,
you can proceed with what's made in the previous step
using ``reassign_identity``.
In this example, the dictionary maps state idx to its corresponding ``state_string``.
We suggest using alphabets as states.
Parameters
----------
data : list
An array with the data necessary to reassign, as extracted from ``output.pickle``.
pathways : numpy.ndarray
An empty array with shapes for iter_id/seg_id/state_id/pcoord_or_auxdata/frame#/weight.
dictionary : dict
An empty dictionary obj for mapping ``state_id`` with ``state string``. The last entry in
the dictionary should be the "unknown" state.
assign_file : str, default : None
A string pointing to the ``assign.h5`` file. Needed as a parameter for all functions,
but is ignored if it's an MD trajectory.
Returns
-------
dictionary : dict
A dictionary mapping each ``state_id`` (float/int) with a ``state string`` (character).
The last entry in the dictionary should be the "unknown" state.
"""
# Other example for grouping multiple states into one.
for idx, pathway in enumerate(data):
# The following shows how you can "merge" multiple states into
# a single one.
pathway = numpy.asarray(pathway)
# Further downsizing... to if pcoord is less than 5
first_contact = numpy.where(pathway[:, 3] < 5)[0][0]
for jdx, frame in enumerate(pathway):
# First copy all columns over
pathways[idx, jdx] = frame
# ortho is assigned to state 0
if frame[2] in [1, 3, 4, 6, 7, 9]:
frame[2] = 0
# para is assigned to state 1
elif frame[2] in [2, 5, 8]:
frame[2] = 1
# Unknown state is assigned 2
if jdx < first_contact:
frame[2] = 2
pathways[idx, jdx] = frame
# Generating a dictionary mapping each state
dictionary = {0: 'A', 1: 'B', 2: '!'}
return dictionary
Analyzing the Pickle Files
By default, the pickle outputs of lpath contains a nested list of lists (or a multi-dimensional numpy array). The
output from the extract step would be a list of lists. The output from match is a multi-dimensional numpy array,
with the shortest pathway 0-padded to the longest length. Regardless of format, they should contain very similar information.
Formatting of the Pickle Files
The shape of the output is in the format:
(n_pathways, n_frames,
(iter_id/seg_id/state_id/pcoord_or_auxdata/frame#/weight))
The first dimension would be the total number of pathways. The second dimension would be the number of frames in that pathways,
which would vary depending if --trace-basis is chosen. each frame would always contain information like iteration number, segment number,
state id, frame number, and weight. The number of columns for pcoord or auxdata would depend on the options used to generate the
pickle object.
Loading in the File with Python
To load in the pickle object from the binary file, one can run the following in a python or ipython terminal:
import pickle
with open('pickle.output', 'rb') as f:
pickle_obj = pickle.load(f)
pickle_obj[0] would be the first pathway traced. pickle_obj[1] would be second. len(pickle_obj) would be the total number of
successful transitions found.
Analyzing each pathway
Let’s look at the first traced pathway:
pathway = pickle_obj[0]
pathway[0] would be the first frame of the first pathway. Typically, this could correspond to the frame it last exited
from the source state. With -trace-basis, this will trace all the way to the basis state.
If pickle_obj is a list of lists (created using lpath extract), then pickle_obj[0][-1] should always point to
the last frame (first time it hits the target state). If the pickle obj is a multi-dimensional numpy array (created using
lpath match), then pickle_obj[0][-1] could point to the last frame (first time it hits the target state) or it could
be a 0-padded frame since numpy does not like ragged arrays. The 0-padded frames will always have an iteration number of 0.
One can filter those frames by searching for columns using numpy indexing (e.g., pathway[pathway[:,0] != 0]).