diffnets package¶
Subpackages¶
Submodules¶
diffnets.analysis module¶
-
class
diffnets.analysis.Analysis(net, netdir, datadir)[source]¶ Bases: object
Core object for running analysis.
Parameters: - net (nnutils object) – Neural network to perform analysis with
- netdir (str) – path to directory with neural network results
- datadir (str) – path to directory with the data required for training. Includes cm.npy, wm.npy, uwm.npy, master.pdb, an aligned_xtcs dir, and an indicators dir.
-
assign_labels_to_variants(plot_labels=False)[source]¶ Map DiffNet labels to each variant, with the option to plot a histogram of the labels.
Parameters: plot_labels (optional, boolean) – Save a matplotlib figure of the label histogram. Returns: lab_v – Dictionary mapping labels to their respective variants. Return type: dictionary
-
find_feats(inds, out_fn, n_states=2000, num2plot=100, clusters=None)[source]¶ Generate a .pml file that shows the distances whose changes are most correlated with changes in the classification score.
Parameters: - inds (np.ndarray,) – Indices of the topology file that are to be included in calculating what distances are most correlated with classification score.
- out_fn (str) – Name of the output file.
- n_states (int (default=2000)) – How many cluster centers to calculate and use for correlation measurement.
- num2plot (int (default=100)) – Number of distances to be shown.
- clusters (enspara cluster object) – Cluster object with center_indices attribute
-
get_rmsd()[source]¶ Calculate RMSD between actual trajectory frames and autoencoder reconstructed frames
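As a rough illustration of the quantity get_rmsd() describes, the sketch below computes a per-frame RMSD between flattened XYZ frames and their reconstructions with NumPy. The function name frame_rmsd and the (n_frames, 3*n_atoms) layout are assumptions for this example; the method's actual implementation may differ (for instance, it may rely on MDTraj).

```python
import numpy as np

def frame_rmsd(coords, recon):
    """Per-frame RMSD between flattened XYZ frames and their reconstructions.

    Both arrays are assumed to have shape (n_frames, 3 * n_atoms) and to be
    aligned to the same reference already (hypothetical helper, not diffnets API).
    """
    coords = np.asarray(coords, dtype=float)
    recon = np.asarray(recon, dtype=float)
    n_frames = coords.shape[0]
    # Reshape to (n_frames, n_atoms, 3) so squared displacements are per atom.
    diff = (coords - recon).reshape(n_frames, -1, 3)
    sq_dist = np.sum(diff ** 2, axis=2)        # (n_frames, n_atoms)
    return np.sqrt(np.mean(sq_dist, axis=1))   # (n_frames,)
```

A frame whose atoms are all displaced by the same distance d has an RMSD of exactly d, which makes a convenient sanity check.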
diffnets.data_processing module¶
-
exception
diffnets.data_processing.ImproperlyConfigured[source]¶ Bases: Exception
The given configuration is incomplete or otherwise not usable.
-
class
diffnets.data_processing.ProcessTraj(traj_dir_paths, pdb_fn_paths, outdir, atom_sel=None, stride=1)[source]¶ Bases: object
Process raw trajectory data to select a subset of atoms and align all frames to a reference pdb. Results in a directory structure that the training relies on.
Parameters: - traj_dir_paths (list of str’s, required) – One string/path for each variant to a dir that contains ALL trajectory files for that variant. ORDER MATTERS – when training you will set a value “act_map” that depends on this order.
- pdb_fn_paths (list of str’s, required) – One string/path for each variant to a dir that contains the starting pdb file. Variants must be in same order as traj_dir_paths.
- outdir (str) – Name of dir to output processed data to. This dir will be used as input during DiffNet training.
- atom_sel (str, or array-like, shape=(n_variants, n_inds)) – (default=“name CA or name CB or name N or name C”) If str, it should follow the selection syntax used in MDTraj; e.g. for pdb.top.select(“name CA”), “name CA” would be appropriate. If list, there should be a list of indices for each variant, since choosing equivalent atoms may require different indexing for each variant.
- stride (integer, default=1) – Subsample every nth data frame. A value of 1 means no subsampling.
-
make_master_pdb()[source]¶ Creates a reference pdb centered at the origin using the first variant pdb specified in self.pdb_fn_paths.
-
make_traj_list()[source]¶ Makes a list of all variant trajectories where each item is a list that contains 1) a path to the trajectory, 2) a path to the corresponding topology (pdb) file, 3) a trajectory number - from 0 to n where n is total number of trajectories, and 4) an integer to indicate which variant simulation the trajectory came from.
-
preprocess_traj(inputs)[source]¶ Strip all trajectories to a subset of atoms and align to a reference pdb. Also, calculate and write out the mean center of mass of all atoms across all trajectories. Will write out new trajectories (.xtc files) and corresponding “indicator” lists to indicate which variant simulation each data frame came from.
Parameters: inputs (array-like, shape=(n_trajectories,4)) – For each trajectory there should be 1) path to trajectory, 2) path to corresponding topology file, 3) output trajectory number, and 4) integer indicating which variant the trajectory came from.
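The four-field entries described for make_traj_list() and consumed by preprocess_traj() can be pictured with a small pure-Python sketch. The helper name build_traj_list and the file names are hypothetical, and the sketch takes explicit lists of trajectory files per variant rather than discovering them in directories as the class presumably does.

```python
def build_traj_list(traj_fns_per_variant, pdb_fn_paths):
    """Return [traj_path, pdb_path, traj_number, variant_index] entries.

    traj_fns_per_variant: one list of trajectory file names per variant
    (hypothetical input; the real class is given directories instead).
    """
    traj_list = []
    traj_num = 0
    for variant_idx, (traj_fns, pdb_fn) in enumerate(
        zip(traj_fns_per_variant, pdb_fn_paths)
    ):
        for traj_fn in sorted(traj_fns):
            # Trajectory numbers run consecutively across all variants.
            traj_list.append([traj_fn, pdb_fn, traj_num, variant_idx])
            traj_num += 1
    return traj_list

example = build_traj_list(
    [["wt/traj0.xtc", "wt/traj1.xtc"], ["mut/traj0.xtc"]],
    ["wt/start.pdb", "mut/start.pdb"],
)
```

Because the variant index comes from the position in the input lists, the order of variants here is the order that a later act_map would have to follow.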
-
class
diffnets.data_processing.WhitenTraj(data_dir)[source]¶ Bases: object
Normalize the trajectories with a data whitening procedure [1] that removes covariance between atoms in trajectories.
Parameters: data_dir (str) – Path to a directory that contains a topology file, a file with the mean center of mass of all atoms across all trajectories, and a dir named “aligned_xtcs” with all aligned trajectories.
References
[1] Wehmeyer C, Noé F. Time-lagged autoencoders: Deep learning of slow collective variables for molecular kinetics. J Chem Phys. 2018. doi:10.1063/1.5011399
-
apply_unwhitening(whitened, uwm, cm)[source]¶ Apply unwhitening to whitened XYZ coordinates.
Parameters: - whitened (np.ndarray, shape=(n_frames,3*n_atoms)) – Whitened XYZ coordinates of a trajectory.
- uwm (np.ndarray, shape=(n_atoms*3,n_atoms*3)) – unwhitening matrix
- cm (np.ndarray, shape=(3*n_atoms,)) – Avg. center of mass of each atom across all trajectories.
Returns: coords – XYZ coordinates of a trajectory.
Return type: np.ndarray, shape=(n_frames,3*n_atoms)
-
apply_whitening(coords, wm, cm)[source]¶ Apply whitening to XYZ coordinates.
Parameters: - coords (np.ndarray, shape=(n_frames,3*n_atoms)) – XYZ coordinates of a trajectory.
- wm (np.ndarray, shape=(n_atoms*3,n_atoms*3)) – whitening matrix
- cm (np.ndarray, shape=(3*n_atoms,)) – Avg. center of mass of each atom across all trajectories.
Returns: whitened – Whitened XYZ coordinates of a trajectory.
Return type: np.ndarray, shape=(n_frames,3*n_atoms)
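A minimal NumPy sketch of how whitening and unwhitening relate, assuming whitening means "subtract cm, then multiply by wm" and unwhitening means "multiply by uwm, then add cm back". The multiplication order and the helper names whiten and unwhiten are assumptions for illustration; the demo uses a symmetric matrix, for which the order does not matter.

```python
import numpy as np

def whiten(coords, wm, cm):
    # Center each coordinate on its mean, then project through the whitening matrix.
    return (coords - cm) @ wm

def unwhiten(whitened, uwm, cm):
    # Undo the projection, then restore the mean.
    return whitened @ uwm + cm

rng = np.random.default_rng(0)
coords = rng.normal(size=(5, 6))      # 5 frames, 2 atoms (3 * 2 coordinates)
cm = coords.mean(axis=0)
a = rng.normal(size=(6, 6))
wm = a @ a.T + np.eye(6)              # symmetric positive definite stand-in for a whitening matrix
uwm = np.linalg.inv(wm)               # the unwhitening matrix inverts it
roundtrip = unwhiten(whiten(coords, wm, cm), uwm, cm)
```

The round trip recovers the original coordinates up to floating-point error, which is the property the two methods are meant to satisfy together.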
-
apply_whitening_xtc_dir(xtc_dir, top, wm, cm, n_cores, outdir)[source]¶ Apply data whitening parallelized across many trajectories
Parameters: - xtc_dir (str) – Path to the directory containing the trajectories.
- top (md.Trajectory object) – Topology corresponding to the trajectories
- outdir (str) – Directory to output whitened trajectory
- wm (np.ndarray, shape=(n_atoms*3,n_atoms*3)) – whitening matrix
- cm (np.ndarray, shape=(3*n_atoms,)) – Avg. center of mass of each atom across all trajectories.
- n_cores (int) – Number of threads to parallelize task across.
-
get_c00(coords, cm, traj_num)[source]¶ Calculates the covariance matrix.
Parameters: - coords (np.ndarray, shape=(n_frames,3*n_atoms)) – XYZ coordinates of a trajectory.
- cm (np.ndarray, shape=(3*n_atoms,)) – Avg. center of mass of each atom across all trajectories.
- traj_num (integer) – Used to name the covariance matrix we are going to write out for a trajectory.
-
get_c00_xtc_list(xtc_fns, top, cm, n_cores)[source]¶ Calculate the covariance matrix across all trajectories.
Parameters: - xtc_fns (list of str’s) – Paths to trajectories.
- top (md.Trajectory object) – Topology corresponding to the trajectories
- cm (np.ndarray, shape=(3*n_atoms,)) – Avg. center of mass of each atom across all trajectories.
- n_cores (int) – Number of threads to parallelize task across.
Returns: c00 – Covariance matrix across all trajectories
Return type: np.ndarray, shape=(n_atoms*3,n_atoms*3)
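One way to picture a covariance matrix "across all trajectories" is to accumulate per-trajectory sums of outer products and normalize once at the end. The sketch below does this serially with NumPy; the function name, the division by the total frame count (rather than N-1), and the in-memory input are assumptions, and the parallelization over n_cores is omitted.

```python
import numpy as np

def covariance_across_trajs(trajs, cm):
    """Covariance of flattened coordinates pooled over several trajectories.

    trajs: iterable of arrays with shape (n_frames_i, 3 * n_atoms)
    cm:    array with shape (3 * n_atoms,), the mean used for centering
    """
    n_feats = cm.shape[0]
    c00 = np.zeros((n_feats, n_feats))
    n_frames = 0
    for coords in trajs:
        centered = coords - cm
        c00 += centered.T @ centered   # sum of outer products for this trajectory
        n_frames += coords.shape[0]
    return c00 / n_frames
```

Accumulating per trajectory keeps memory use bounded by one trajectory at a time, which is presumably why the per-trajectory get_c00 step exists.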
-
get_wuw_mats(c00)[source]¶ Calculate whitening matrix and unwhitening matrix. Method adapted from deeptime (https://github.com/markovmodel/deeptime/blob/master/time-lagged-autoencoder/tae/utils.py)
Parameters: c00 (np.ndarray, shape=(n_atoms*3,n_atoms*3)) – Covariance matrix Returns: - uwm (np.ndarray, shape=(n_atoms*3,n_atoms*3)) – unwhitening matrix
- wm (np.ndarray, shape=(n_atoms*3,n_atoms*3)) – whitening matrix
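A common way to obtain such a pair of matrices is a symmetric eigendecomposition of the covariance, taking the inverse square root as the whitening matrix and the square root as the unwhitening matrix. The sketch below follows that recipe; it is an assumption that get_wuw_mats works this way, and the eps floor and function name are illustrative choices.

```python
import numpy as np

def whitening_matrices(c00, eps=1e-10):
    """Return (uwm, wm) as the matrix square root and inverse square root of c00."""
    # c00 is symmetric, so eigh gives real eigenvalues and orthonormal eigenvectors.
    evals, evecs = np.linalg.eigh(c00)
    # Guard against tiny or slightly negative eigenvalues caused by round-off.
    evals = np.clip(evals, eps, None)
    uwm = evecs @ np.diag(np.sqrt(evals)) @ evecs.T        # c00^(1/2)
    wm = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T   # c00^(-1/2)
    return uwm, wm
```

With these definitions, wm @ c00 @ wm is the identity, so whitened coordinates have unit variance and no covariance, and uwm undoes wm.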
-
diffnets.exmax module¶
Copyright 2015 by Washington University in Saint Louis. Authored by S. Joshua Swamidass. A license to use for strictly non-commercial use is granted. Derivative work is not permitted without prior written authorization. All other rights reserved.
-
diffnets.exmax.distribution_of_sum(P, ignore_idx={})[source]¶ Given a set of binomial random variables parameterized by a vector P. Ignoring variables in ignore_idx… What is the distribution of their sum?
Output is the discrete distribution D, where the probability of a specific sum s is D[s]. O(N^2) time in length of P
For example, using this P…
>>> P = [0.5, 0.25, 0.5]
>>> distribution_of_sum(P)
array([ 0.1875, 0.4375, 0.3125, 0.0625])
Ignoring the 2nd (1 in zero indexing) variable, we have…
>>> distribution_of_sum(P, [1])
array([ 0.25, 0.5 , 0.25, 0. ])
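The O(N^2) behaviour can be reproduced with a short dynamic program that folds in one variable at a time. This is a pure-Python sketch written for illustration, not the module's code; the name distribution_of_sum_sketch is hypothetical, and the output keeps length len(P) + 1 even when variables are ignored, matching the documented example.

```python
def distribution_of_sum_sketch(P, ignore_idx=()):
    """Distribution of the number of successes among independent trials.

    dist[s] is the probability that exactly s of the non-ignored variables succeed.
    """
    ignore = set(ignore_idx)
    dist = [0.0] * (len(P) + 1)
    dist[0] = 1.0
    used = 0
    for i, p in enumerate(P):
        if i in ignore:
            continue
        used += 1
        # Walk downwards so dist[s - 1] still holds the value from before this variable.
        for s in range(used, 0, -1):
            dist[s] = dist[s] * (1.0 - p) + dist[s - 1] * p
        dist[0] *= (1.0 - p)
    return dist

full = distribution_of_sum_sketch([0.5, 0.25, 0.5])
skip_second = distribution_of_sum_sketch([0.5, 0.25, 0.5], [1])
```

For the documented input, full is [0.1875, 0.4375, 0.3125, 0.0625] and skip_second is [0.25, 0.5, 0.25, 0.0].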
-
diffnets.exmax.expectation_E_EXP(P, E_or)[source]¶ Given a set of binomial random variables parameterized by a vector P. Conditioned on E[at least one success] = E_or, what is the expectation of each random variable?
Output is a vector E of expectations.
Alternate, equivalent implementation for error checking. The problem with this implementation is that it runs in exponential time.
-
diffnets.exmax.expectation_or(P, E_or)¶ Given a set of binomial random variables parameterized by a vector P. Conditioned on E[at least one success] = E_or, what is the expectation of each random variable?
Output is a vector E of expectations.
All the implementations should produce the same results.
>>> R = rand(10)
>>>
>>> EL = expectation_or_LINEAR(R, 1)
>>> EC = expectation_or_CUBIC(R, 1)
>>> EE = expectation_E_EXP(R, 1)
>>> correlation, pvalue = pearsonr(EL, EC)
>>> correlation > .99 and pvalue < .01
True
>>> allclose(EL, EE)
True
>>> correlation, pvalue = pearsonr(EL, EE)
>>> correlation > .99 and pvalue < .01
True
This shows that all versions yield results that are > 99% correlated.
And we know the results for some simple cases.
>>> expectation_or([0.5, 0.5], 1)
array([ 0.66666667, 0.66666667])
>>> expectation_or([0.5, 0.5], .75)
array([ 0.5, 0.5])
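The two simple cases above are consistent with a closed form: if no variable succeeds, every variable is 0, so each expectation is E_or times the probability of that variable given at least one success, i.e. E_or * p_i / (1 - prod(1 - p_j)). The sketch below encodes that reading; it is an interpretation that matches the documented examples, not the module's implementation, and the function name is hypothetical.

```python
def expectation_or_sketch(P, E_or):
    """Expectation of each variable when E[at least one success] equals E_or."""
    none = 1.0
    for p in P:
        none *= (1.0 - p)          # probability that every variable fails
    at_least_one = 1.0 - none
    # P(X_i = 1 | at least one success) = p_i / at_least_one; given no success it is 0.
    return [E_or * p / at_least_one for p in P]

certain = expectation_or_sketch([0.5, 0.5], 1)
partial = expectation_or_sketch([0.5, 0.5], 0.75)
```

Here certain evaluates to two thirds for each variable and partial to one half, in line with the documented outputs.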
-
diffnets.exmax.expectation_or_CUBIC(P, E_or)[source]¶ Given a set of binomial random variables parameterized by a vector P. Conditioned on E[at least one success] = E_or, what is the expectation of each random variable?
Output is a vector E of expectations.
Alternate, equivalent implementation for error checking. The problem with this implementation is that it runs in O(N^3) time.
-
diffnets.exmax.expectation_or_LINEAR(P, E_or)[source]¶ Given a set of binomial random variables parameterized by a vector P. Conditioned on E[at least one success] = E_or, what is the expectation of each random variable?
Output is a vector E of expectations.
All the implementations should produce the same results.
>>> R = rand(10)
>>>
>>> EL = expectation_or_LINEAR(R, 1)
>>> EC = expectation_or_CUBIC(R, 1)
>>> EE = expectation_E_EXP(R, 1)
>>> correlation, pvalue = pearsonr(EL, EC)
>>> correlation > .99 and pvalue < .01
True
>>> allclose(EL, EE)
True
>>> correlation, pvalue = pearsonr(EL, EE)
>>> correlation > .99 and pvalue < .01
True
This shows that all versions yield results that are > 99% correlated.
And we know the results for some simple cases.
>>> expectation_or([0.5, 0.5], 1)
array([ 0.66666667, 0.66666667])
>>> expectation_or([0.5, 0.5], .75)
array([ 0.5, 0.5])
-
diffnets.exmax.expectation_range(P, lower, upper)¶ Given a set of binomial random variables parameterized by a vector P. Conditioned on the number of successes being between lower and upper (inclusive), what is the expectation of each random variable?
Output is a vector E of expectations. O(N^3) time in length of P
>>> R = rand(10) # a random vector of probabilities 10 elements long.
>>>
>>> lower, upper = 3, 6
>>>
>>> EC = expectation_range_CUBIC(R, lower, upper)
>>> EE = expectation_range_EXP(R, lower, upper)
>>> correlation, pvalue = pearsonr(EE, EC)
>>> correlation > .99 and pvalue < .01
True
This shows that both versions yield results that are > 99% correlated.
-
diffnets.exmax.expectation_range_CUBIC(P, lower, upper)[source]¶ Given a set of binomial random variables parameterized by a vector P. Conditioned on the number of successes being between lower and upper (inclusive), what is the expectation of each random variable?
Output is a vector E of expectations. O(N^3) time in length of P
>>> R = rand(10) # a random vector of probabilities 10 elements long.
>>>
>>> lower, upper = 3, 6
>>>
>>> EC = expectation_range_CUBIC(R, lower, upper)
>>> EE = expectation_range_EXP(R, lower, upper)
>>> correlation, pvalue = pearsonr(EE, EC)
>>> correlation > .99 and pvalue < .01
True
This shows that both versions yield results that are > 99% correlated.
-
diffnets.exmax.expectation_range_EXP(P, lower, upper)[source]¶ Given a set of binomial random variables parameterized by a vector P. Conditioned on the number of successes being between lower and upper (inclusive), what is the expectation of each random variable?
This version is slow, but more conceptually clear.
Output is a vector E of expectations. O(2^N) time in length of P
This version suffers from floating point error, and should not be used for anything other than testing.
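The conceptually clear O(2^N) approach can be sketched by enumerating every outcome, keeping those whose success count falls in the allowed range, and averaging. This pure-Python version is an illustration under that reading, with a hypothetical function name; like the documented function, it is only practical for short P.

```python
from itertools import product

def expectation_range_bruteforce(P, lower, upper):
    """E[X_i | lower <= number of successes <= upper], by full enumeration."""
    n = len(P)
    total = 0.0               # probability mass of all allowed outcomes
    weighted = [0.0] * n      # mass of allowed outcomes in which X_i = 1
    for outcome in product((0, 1), repeat=n):
        if not lower <= sum(outcome) <= upper:
            continue
        prob = 1.0
        for p, x in zip(P, outcome):
            prob *= p if x else (1.0 - p)
        total += prob
        for i, x in enumerate(outcome):
            if x:
                weighted[i] += prob
    return [w / total for w in weighted]
```

Two quick checks follow from the definition: with lower=0 and upper=len(P) the condition is vacuous and the result is P itself, and with lower=upper=len(P) every expectation is 1.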
-
diffnets.exmax.rand()¶ scipy.rand is deprecated and will be removed in SciPy 2.0.0, use numpy.random.rand instead
diffnets.nnutils module¶
-
class
diffnets.nnutils.ae(layer_sizes, wm, uwm)[source]¶ Bases: torch.nn.modules.module.Module
Unsupervised autoencoder
Parameters: - layer_sizes (list) – List of integers indicating the size of each layer in the encoder including the latent layer. First two must be identical.
- wm (np.ndarray, shape=(n_inputs,n_inputs)) – Whitening matrix – is applied to input data
- uwm (np.ndarray, shape=(n_inputs,n_inputs)) – unwhitening matrix
-
decode(x)[source]¶ Pass the latent space vector through the decoder
Parameters: x (torch.cuda.FloatTensor or torch.FloatTensor) – Latent space vector Returns: recon – Reconstruction of the original input data Return type: torch.cuda.FloatTensor or torch.FloatTensor
-
encode(x)[source]¶ Pass the data through the encoder to the latent layer.
Parameters: x (torch.cuda.FloatTensor or torch.FloatTensor) – Input data for a given sample Returns: latent – Latent space vector associated with encoder1 Return type: torch.cuda.FloatTensor or torch.FloatTensor
-
forward(x)[source]¶ Pass data through the entire network
Parameters: x (torch.cuda.FloatTensor or torch.FloatTensor) – Input data for a given sample Returns: - recon (torch.cuda.FloatTensor or torch.FloatTensor) – Reconstruction of the original input data
- latent (torch.cuda.FloatTensor or torch.FloatTensor) – Latent space vector
-
freeze_weights(old_net=None)[source]¶ Procedure to make the whitening matrix and unwhitening matrix untrainable layers. Additionally, freezes weights associated with a previously learned encoder layer.
Parameters: old_net (ae object) – Previously trained network with overlapping architecture. Weights learned in this previous network’s encoder will be frozen in the new network.
-
class
diffnets.nnutils.classify_ae(n_latent)[source]¶ Bases: torch.nn.modules.module.Module
Logistic Regression model
Parameters: n_latent (int) – Number of latent variables
-
diffnets.nnutils.my_l1(x, x_recon)[source]¶ Calculate l1 loss
Parameters: - x (torch.cuda.FloatTensor or torch.FloatTensor) – Input data
- x_recon (torch.cuda.FloatTensor or torch.FloatTensor) – Reconstructed input
Returns: Return type: torch.cuda.FloatTensor or torch.FloatTensor
-
diffnets.nnutils.my_mse(x, x_recon)[source]¶ Calculate mean squared error loss
Parameters: - x (torch.cuda.FloatTensor or torch.FloatTensor) – Input data
- x_recon (torch.cuda.FloatTensor or torch.FloatTensor) – Reconstructed input
Returns: Return type: torch.cuda.FloatTensor or torch.FloatTensor
-
class
diffnets.nnutils.sae(layer_sizes, wm, uwm)[source]¶ Bases: diffnets.nnutils.ae
Supervised autoencoder
Parameters: - layer_sizes (list) – List of integers indicating the size of each layer in the encoder including the latent layer. First two must be identical.
- wm (np.ndarray, shape=(n_inputs,n_inputs)) – Whitening matrix – is applied to input data
- uwm (np.ndarray, shape=(n_inputs,n_inputs)) – unwhitening matrix
-
classify(latent)[source]¶ Perform classification task using latent space representation
Parameters: latent (torch.cuda.FloatTensor or torch.FloatTensor) – Latent space vector Returns: Return type: Value between 0 and 1
-
forward(x)[source]¶ Pass through the entire network
Parameters: x (torch.cuda.FloatTensor or torch.FloatTensor) – Input data for a given sample Returns: - recon (torch.cuda.FloatTensor or torch.FloatTensor) – Reconstruction of the original input data
- latent (torch.cuda.FloatTensor or torch.FloatTensor) – Latent space vector
- label – Value between 0 and 1
-
class
diffnets.nnutils.split_ae(layer_sizes, inds1, inds2, wm, uwm)[source]¶ Bases: torch.nn.modules.module.Module
Unsupervised autoencoder with a split input (i.e. 2 encoders)
Parameters: - layer_sizes (list) – List of integers indicating the size of each layer in the encoder including the latent layer. First two must be identical.
- inds1 (np.ndarray) – Indices in the training input array that go into encoder1.
- inds2 (np.ndarray) – Indices in the training input array that go into encoder2.
- wm (np.ndarray, shape=(n_inputs,n_inputs)) – Whitening matrix – is applied to input data
- uwm (np.ndarray, shape=(n_inputs,n_inputs)) – unwhitening matrix
-
decode(latent)[source]¶ Pass the latent space vector through the decoder
Parameters: latent (torch.cuda.FloatTensor or torch.FloatTensor) – Latent space vector Returns: recon – Reconstruction of the original input data Return type: torch.cuda.FloatTensor or torch.FloatTensor
-
encode(x)[source]¶ Pass the data through the encoder to the latent layer.
Parameters: x (torch.cuda.FloatTensor or torch.FloatTensor) – Input data for a given sample Returns: - lat1 (torch.cuda.FloatTensor or torch.FloatTensor) – Latent space vector associated with encoder1
- lat2 (torch.cuda.FloatTensor or torch.FloatTensor) – Latent space vector associated with encoder2
-
forward(x)[source]¶ Pass data through the entire network
Parameters: x (torch.cuda.FloatTensor or torch.FloatTensor) – Input data for a given sample Returns: - recon (torch.cuda.FloatTensor or torch.FloatTensor) – Reconstruction of the original input data
- latent (torch.cuda.FloatTensor or torch.FloatTensor) – Latent space vector
-
freeze_weights(old_net=None)[source]¶ Procedure to make the whitening matrix and unwhitening matrix untrainable layers. Additionally, freezes weights associated with a previously learned encoder layer.
Parameters: old_net (split_ae object) – Previously trained network with overlapping architecture. Weights learned in this previous network’s encoder will be frozen in the new network.
-
split_inds¶
-
diffnets.nnutils.split_inds(pdb, resnum, focus_dist)[source]¶ Identify indices close to and far from a residue of interest. Each index corresponds to an X, Y, or Z coordinate of an atom in the pdb.
Parameters: - pdb (md.Trajectory object) – Structure used to find close/far indices.
- resnum (integer) – The residue number of interest.
- focus_dist (float (nanometers)) – All indices within this distance of resnum will be selected as close indices.
Returns: - close_xyz_inds (np.ndarray) – Indices of x,y,z positions of atoms in pdb that are close to resnum.
- non_close_xyz_inds (np.ndarray) – Indices of x,y,z positions of atoms in pdb that are not close to resnum.
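The idea of mapping "close" atoms to flattened coordinate indices can be sketched with plain NumPy, assuming coordinates are flattened atom by atom so that atom i owns indices 3*i, 3*i+1 and 3*i+2. The function below takes raw coordinates and the atom indices of the residue of interest instead of an md.Trajectory and a residue number; its name, its inputs, and the flattening order are assumptions for illustration.

```python
import numpy as np

def split_xyz_inds(atom_xyz, focus_atom_inds, focus_dist):
    """Split flattened XYZ indices into those near a set of focus atoms and the rest.

    atom_xyz:        array (n_atoms, 3), positions in nanometers
    focus_atom_inds: indices of the atoms in the residue of interest
    focus_dist:      cutoff distance in nanometers
    """
    atom_xyz = np.asarray(atom_xyz, dtype=float)
    focus = atom_xyz[np.asarray(focus_atom_inds)]
    # Distance from every atom to every focus atom: shape (n_atoms, n_focus).
    dists = np.linalg.norm(atom_xyz[:, None, :] - focus[None, :, :], axis=2)
    close_atoms = np.where(dists.min(axis=1) <= focus_dist)[0]
    # Atom i owns flattened coordinate indices 3*i, 3*i+1, 3*i+2.
    close_xyz = (3 * close_atoms[:, None] + np.arange(3)).ravel()
    all_xyz = np.arange(3 * atom_xyz.shape[0])
    non_close_xyz = np.setdiff1d(all_xyz, close_xyz)
    return close_xyz, non_close_xyz
```

The two returned arrays partition all 3*n_atoms indices, which is the shape of input that inds1 and inds2 of a split architecture would need.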
-
class
diffnets.nnutils.split_sae(layer_sizes, inds1, inds2, wm, uwm)[source]¶ Bases: diffnets.nnutils.split_ae
Supervised autoencoder with split architecture
-
classify(latent)[source]¶ Perform classification task using latent space representation
Parameters: latent (torch.cuda.FloatTensor or torch.FloatTensor) – Latent space vector Returns: Return type: Value between 0 and 1
-
forward(x)[source]¶ Pass data through the entire network
Parameters: x (torch.cuda.FloatTensor or torch.FloatTensor) – Input data for a given sample Returns: - recon (torch.cuda.FloatTensor or torch.FloatTensor) – Reconstruction of the original input data
- latent (torch.cuda.FloatTensor or torch.FloatTensor) – Latent space vector
- label – Value between 0 and 1
-
-
class
diffnets.nnutils.svae(layer_sizes)[source]¶ Bases: diffnets.nnutils.vae
-
forward(x)[source]¶ Pass data through the entire network
Parameters: x (torch.cuda.FloatTensor or torch.FloatTensor) – Input data for a given sample Returns: - recon (torch.cuda.FloatTensor or torch.FloatTensor) – Reconstruction of the original input data
- latent (torch.cuda.FloatTensor or torch.FloatTensor) – Latent space vector
-
-
class
diffnets.nnutils.vae(layer_sizes)[source]¶ Bases: diffnets.nnutils.ae
-
encode(x)[source]¶ Pass the data through the encoder to the latent layer.
Parameters: x (torch.cuda.FloatTensor or torch.FloatTensor) – Input data for a given sample Returns: latent – Latent space vector associated with encoder1 Return type: torch.cuda.FloatTensor or torch.FloatTensor
-
forward(x)[source]¶ Pass data through the entire network
Parameters: x (torch.cuda.FloatTensor or torch.FloatTensor) – Input data for a given sample Returns: - recon (torch.cuda.FloatTensor or torch.FloatTensor) – Reconstruction of the original input data
- latent (torch.cuda.FloatTensor or torch.FloatTensor) – Latent space vector
-
diffnets.training module¶
-
class
diffnets.training.Dataset(train_inds, labels, data)[source]¶ Bases: torch.utils.data.dataset.Dataset
Characterizes a dataset for PyTorch
-
class
diffnets.training.Trainer(job)[source]¶ Bases: object
-
apply_exmax(inputs)[source]¶ Apply expectation maximization to a batch of data.
Parameters: inputs (list) – list where the 0th index is a list of current classification labels of length == batch_size. 1st index is a corresponding list of variant simulation indicators. 2nd index is em_bounds. Returns: Return type: Updated labels – length == batch size
-
em_parallel(net, em_generator, train_inds, em_batch_size, indicators, em_bounds, em_n_cores, label_str, epoch)[source]¶ Use expectation maximization to update all training classification labels.
Parameters: - net (nnutils neural network object) – Neural network
- em_generator (Dataset object) – Training data
- train_inds (np.ndarray) – Indices in data that are to be trained on
- em_batch_size (int) – Number of examples that have their classification labels updated in a single round of expectation maximization.
- indicators (np.ndarray, shape=(len(data),)) – Value to indicate which variant each data frame came from.
- em_bounds (np.ndarray, shape=(n_variants,2)) – A range that sets what fraction of conformations you expect a variant to have the biochemical property. Rank order of variants is more important than the ranges themselves.
- em_n_cores (int) – CPU cores to use for expectation maximization calculation
Returns: new_labels – Updated classification labels for all training examples
Return type: np.ndarray, shape=(len(data),)
-
get_targets(act_map, indicators, label_spread=None)[source]¶ Convert variant indicators into classification labels.
Parameters: - act_map (np.ndarray, shape=(n_variants,)) – Initial classification labels to give each variant.
- indicators (np.ndarray, shape=(len(data),)) – Value to indicate which variant each data frame came from.
Returns: targets – Classification labels for training.
Return type: np.ndarray, shape=(len(data),)
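Converting per-frame variant indicators into per-frame labels amounts to a lookup into act_map. The NumPy sketch below shows that lookup with made-up values; it ignores the optional label_spread argument, whose behaviour is not described here, and is not taken from the Trainer code.

```python
import numpy as np

# One initial classification label per variant (illustrative values).
act_map = np.array([0.0, 1.0, 1.0])
# Variant index of each data frame (illustrative values).
indicators = np.array([0, 0, 1, 2, 2, 1])
# Integer-array indexing assigns every frame the label of its variant.
targets = act_map[indicators]
```

This is also why the order of variants during preprocessing matters: position k in act_map must describe the variant whose frames carry indicator k.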
-
run(data_in_mem=False)[source]¶ Wrapper for running the training code
Parameters: data_in_mem (boolean) – If true, load all training data into memory. Training is faster this way. Returns: net – Trained DiffNet Return type: nnutils neural network object
-
set_training_data(job, train_inds, test_inds, labels, data)[source]¶ Construct generators out of the dataset for training, validation, and expectation maximization.
Parameters: - job (dict) – See training_dict.tx for all keys.
- train_inds (np.ndarray) – Indices in data that are to be trained on
- test_inds (np.ndarray) – Indices in data that are to be validated on
- labels (np.ndarray,) – classification labels used for training
- data (np.ndarray, shape=(n_frames,3*n_atoms) OR str to path) – All data
-
split_test_train(n, frac_test)[source]¶ Split data into training and validation sets.
Parameters: - n (int) – number of data points
- frac_test (float between 0 and 1) – Fraction of dataset to reserve for validation set
Returns: - train_inds (np.ndarray) – Indices in data that are to be trained on
- test_inds (np.ndarray) – Indices in data that are to be validated on
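A random train/validation split over n data points can be sketched with a single permutation. The function name, the seed argument, the rounding of the test-set size, and the sorting of the returned indices are assumptions made for this example rather than details of split_test_train.

```python
import numpy as np

def split_test_train_sketch(n, frac_test, seed=0):
    """Randomly partition range(n) into training and validation indices."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_test = int(n * frac_test)          # size of the validation set, rounded down
    test_inds = np.sort(perm[:n_test])
    train_inds = np.sort(perm[n_test:])
    return train_inds, test_inds
```

Drawing both sets from one permutation guarantees they are disjoint and together cover every data point.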
-
train(data, training_generator, validation_generator, em_generator, targets, indicators, train_inds, test_inds, net, label_str, job, lr_fact=1.0)[source]¶ Core method for training
Parameters: - data (np.ndarray, shape=(n_frames,3*n_atoms) OR str to path) – Training data
- training_generator (Dataset object) – Generator to sample training data
- validation_generator (Dataset object) – Generator to sample validation data
- em_generator (Dataset object) – Generator to sample training data in batches for expectation maximization
- targets (np.ndarray, shape=(len(data),)) – classification labels used for training
- indicators (np.ndarray, shape=(len(data),)) – Value to indicate which variant each data frame came from.
- train_inds (np.ndarray) – Indices in data that are to be trained on
- test_inds (np.ndarray) – Indices in data that are to be validated on
- net (nnutils neural network object) – Neural network
- label_str (int) – For file naming. Indicates what iteration of training we’re on. Training goes through several iterations where neural net architecture is progressively built deeper.
- job (dict) – See training_dict.tx for all keys.
- lr_fact (float) – Factor to multiply the learning rate by.
Returns: - best_nn (nnutils neural network object) – Neural network that has the lowest reconstruction error on the validation set.
- targets (np.ndarray, shape=(len(data),)) – Classification labels after training.
-
diffnets.utils module¶
Module contents¶
diffnets Supervised and self-supervised autoencoders to identify the mechanistic basis for biochemical differences between protein variants.