Subset¶
- class sasmol.subset.Mask[source]¶
Bases: object
Base class containing methods to extract or combine system objects using numpy masks.
Examples
First example shows how to use class methods from system object:
>>> import sasmol.system as system
>>> molecule = system.Molecule('hiv1_gag.pdb')
>>> basis_filter = 'name[i] == "CA" and resid[i] < 10'
>>> error, mask = molecule.get_subset_mask(basis_filter)
>>> import numpy
>>> numpy.nonzero(mask)
(array([ 4, 11, 21, 45, 55, 66, 82, 101, 112]),)
Note
The self parameter is not shown in the Parameters section of the documentation.
- apply_biomt(frame, selection, U, M, **kwargs)[source]¶
Apply biological unit transforms (BIOMT) to the coordinates of the chosen selection and frame.
Information on BIOMT available at: http://pdb101.rcsb.org/learn/guide-to-understanding-pdb-data/biological-assemblies
- Parameters
frame – integer : frame number with coordinates to transform
selection – string : selection string in standard SASMOL format specifying the coordinates to be transformed
U – numpy array : 3 x 3 rotation matrix
M – numpy array : 3 x 1 translation vector
kwargs – optional future arguments
- Returns
updated self._coor
- Return type
None
Examples
Note
TODO: add example
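Until a library example is added, the transform itself can be sketched without sasmol. The following is a minimal illustration, assuming the BIOMT operation is applied to each selected atom as x' = U·x + M. The helper name apply_biomt_to_selected and the plain-list coordinate layout are hypothetical stand-ins and are not part of the sasmol API.

```python
def apply_biomt_to_selected(coor, mask, U, M):
    """Return a copy of coor with x' = U.x + M applied where mask == 1.

    coor : list of [x, y, z] rows (hypothetical stand-in for one frame)
    mask : list of 0/1 flags, one per atom
    U    : 3 x 3 rotation matrix as nested lists
    M    : translation vector of length 3
    """
    new_coor = []
    for xyz, flag in zip(coor, mask):
        if flag == 1:
            # Rotate: each output component is a row of U dotted with xyz
            rotated = [sum(U[r][c] * xyz[c] for c in range(3)) for r in range(3)]
            # Translate by M
            new_coor.append([rotated[r] + M[r] for r in range(3)])
        else:
            # Unselected atoms are copied unchanged
            new_coor.append(list(xyz))
    return new_coor


# 90-degree rotation about the z axis, followed by a shift of 10 along x
U = [[0.0, -1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]
M = [10.0, 0.0, 0.0]

coor = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
mask = [1, 0]  # transform only the first atom

transformed = apply_biomt_to_selected(coor, mask, U, M)
```

In this sketch the first atom moves from (1, 0, 0) to (10, 1, 0) and the second atom is left where it was.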
- copy_apply_biomt(other, frame, selection, U, M, **kwargs)[source]¶
Copy selected atoms (with initial coordinates from the given frame) to new Molecule object (other) and apply transforms taken from biological unit (BIOMT) to the coordinates.
Information on BIOMT available at: http://pdb101.rcsb.org/learn/guide-to-understanding-pdb-data/biological-assemblies
- Parameters
other – system object : object to copy transformed information into
frame – integer : frame number with coordinates to transform
selection – string : selection string in standard SASMOL format specifying the coordinates to be transformed
U – numpy array : 3 x 3 rotation matrix
M – numpy array : 3 x 1 translation vector
kwargs – optional future arguments
- Returns
updated self._coor
- Return type
None
Examples
Note
TODO: add example
- copy_molecule_using_mask(other, mask, frame)[source]¶
This method initializes the standard descriptors and coordinates for a subset molecule defined by the supplied mask array.
usage:
Here is a way to create a mask and use it to initialize a subset molecule:
m1 = system.Molecule(0)    ### create a molecule m1
m1.read_pdb(filename)      ### read in variables, coor, etc.
… do stuff …
basis_filter = XXXX        ### your filter string (see the examples under get_subset_mask)
error, mask = m1.get_subset_mask(basis_filter)    ### get a mask
sub_m1 = system.Molecule(1)    ### create a new molecule sub_m1
error = m1.copy_molecule_using_mask(sub_m1, mask, frame)    ### initializes sub_m1
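The idea behind a mask-based copy can be illustrated with plain Python lists. This is only a sketch of the concept, assuming each descriptor and the coordinates are stored as per-atom sequences; the helper subset_by_mask and the toy data are hypothetical and do not reproduce the library's implementation.

```python
def subset_by_mask(values, mask):
    """Keep the entries of values whose mask flag is 1."""
    return [v for v, flag in zip(values, mask) if flag == 1]


# Toy per-atom descriptors for two residues (hypothetical data)
name = ['N', 'CA', 'C', 'O', 'N', 'CA']
resid = [1, 1, 1, 1, 2, 2]
coor = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.0, 1.4, 0.0],
        [1.2, 2.4, 0.0], [3.3, 1.5, 0.0], [4.1, 2.7, 0.0]]

# A mask with 1 for every CA atom and 0 elsewhere
mask = [1 if n == 'CA' else 0 for n in name]

# Every descriptor is filtered with the same mask, so they stay aligned
sub_name = subset_by_mask(name, mask)
sub_resid = subset_by_mask(resid, mask)
sub_coor = subset_by_mask(coor, mask)
```

Applying one mask to every per-atom array is what keeps names, residue numbers, and coordinates consistent in the subset.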
- duplicate_molecule(number_of_duplicates, **kwargs)[source]¶
This method copies all attributes of a molecule into a user-supplied number of duplicate molecules.
- Parameters
number_of_duplicates – integer : number of copies to make
kwargs – optional future arguments
- Returns
list of system objects
- Return type
molecules
Examples
>>> import sasmol.system as system
>>> molecule = system.Molecule('hiv1_gag.pdb')
>>> molecule.coor()[0][0]
array([-21.52499962, -67.56199646, 86.75900269])
>>> molecule.name()[:10]
['N', 'HT1', 'HT2', 'HT3', 'CA', 'HA1', 'HA2', 'C', 'O', 'N']
>>> import sasmol.util as utilities
>>> number_of_duplicates = 108
>>> molecules = utilities.duplicate_molecule(molecule, number_of_duplicates)
>>> molecules[-1].coor()[0][0]
array([-21.52499962, -67.56199646, 86.75900269])
>>> molecules[-1].name()[:10]
['N', 'HT1', 'HT2', 'HT3', 'CA', 'HA1', 'HA2', 'C', 'O', 'N']
Note
Using deepcopy directly in subset.py leads to inheritance conflict. Therefore subset calls a method held in utilities to make duplicates.
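The practical consequence of a deep copy is that each duplicate owns its data. The sketch below illustrates this with the standard library only; TinyMolecule and duplicate are hypothetical stand-ins and are not the sasmol classes or the utilities function mentioned above.

```python
import copy


class TinyMolecule:
    """Hypothetical stand-in holding two per-atom attributes."""

    def __init__(self, name, coor):
        self.name = name
        self.coor = coor


def duplicate(molecule, number_of_duplicates):
    """Return a list of independent deep copies of molecule."""
    return [copy.deepcopy(molecule) for _ in range(number_of_duplicates)]


original = TinyMolecule(['N', 'CA'], [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
copies = duplicate(original, 3)

# Changing one duplicate leaves the original and the other copies untouched
copies[0].coor[0][0] = 99.0
```

A shallow copy would share the nested coordinate lists, so the edit above would leak into every copy; that is the reason a deep copy is the usual choice for this kind of duplication.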
- get_coor_using_mask(frame, mask)[source]¶
This method extracts coordinates from frame=frame of system object (self) using a supplied mask which has been created before this method is called.
Coordinates are chosen for the elements that are equal to 1 in the supplied mask array.
- Parameters
frame – integer : trajectory frame number to use
mask – integer array : mask array of length of the number of atoms with 1 or 0 for each atom depending on the selection used to create the mask
kwargs – optional future arguments
- Returns
error – string : error statement
coor – coordinates corresponding to those determined by the input mask
Examples
>>> import sasmol.system as system
>>> molecule = system.Molecule('hiv1_gag.pdb')
>>> basis_filter = "name[i] == 'CA'"
>>> error, mask = molecule.get_subset_mask(basis_filter)
>>> frame = 0
>>> error, coor = molecule.get_coor_using_mask(frame, mask)
>>> coor[0][0]
array([-21.72500038, -66.91000366, 85.45700073], dtype=COORD_DTYPE)
- get_dihedral_subset_mask(flexible_residues, mtype)[source]¶
This method creates an array of ones and/or zeros of the length of the number of atoms in “self”. It uses the user-supplied flexible_residues array to determine which atoms to include in the mask. This version is hard-wired for proteins or RNA: it chooses the C(n-1), N(n), CA(n), C(n), and N(n+1) atoms, or the O3’(n-1), P(n), O5’(n), C5’(n), C4’(n), C3’(n), O3’(n) and P(n+1) atoms, that form the basis set for rotation about the phi & psi angles, or the alpha, beta, delta, epsilon, and eta angles, respectively. This method calls a C method called mask to speed up the calculation (24.5 X faster).
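The protein atom selection described above can be sketched in pure Python for a single residue n. This is an illustration of the selection rule only, assuming per-atom name and resid lists; the function protein_dihedral_mask is hypothetical, and the library's C-accelerated method may organize its output differently.

```python
def protein_dihedral_mask(name, resid, residue):
    """Mask selecting C(n-1), N(n), CA(n), C(n) and N(n+1) for residue n."""
    # Atom names wanted in the previous, current and next residue
    wanted = {
        residue - 1: {'C'},
        residue: {'N', 'CA', 'C'},
        residue + 1: {'N'},
    }
    return [1 if n in wanted.get(r, ()) else 0 for n, r in zip(name, resid)]


# Three residues with backbone atoms N, CA, C, O each (hypothetical data)
name = ['N', 'CA', 'C', 'O'] * 3
resid = [1] * 4 + [2] * 4 + [3] * 4

mask = protein_dihedral_mask(name, resid, 2)
```

For residue 2 the mask picks the C of residue 1, the N, CA and C of residue 2, and the N of residue 3, which are the five atoms that define phi and psi.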
- get_indices_from_mask(mask)[source]¶
This method returns the internal indices for the supplied mask.
- Parameters
mask – integer array : mask array of length of the number of atoms with 1 or 0 for each atom depending on the selection used to create the mask
kwargs – optional future arguments
- Returns
integer array : indices of atoms determined by the input mask
- Return type
indices
Examples
>>> import sasmol.system as system
>>> molecule = system.Molecule('hiv1_gag.pdb')
>>> basis_filter = "name[i] == 'CA'"
>>> error, mask = molecule.get_subset_mask(basis_filter)
>>> indices = molecule.get_indices_from_mask(mask)
>>> indices[:10]
array([ 4, 11, 21, 45, 55, 66, 82, 101, 112, 119])
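The relationship between a mask and its indices is simple enough to show without the library. The sketch below is a pure-Python equivalent, assuming the mask is a sequence of 0/1 flags; indices_from_mask is a hypothetical helper name.

```python
def indices_from_mask(mask):
    """Return the positions at which the mask equals 1."""
    return [i for i, flag in enumerate(mask) if flag == 1]


# Toy mask of 12 atoms with atoms 4 and 11 selected
mask = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]
indices = indices_from_mask(mask)
```

The number of indices always equals the number of ones in the mask, which is a quick sanity check when a selection returns fewer atoms than expected.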
- get_subset_mask(basis_filter)[source]¶
This method creates an array of ones and/or zeros of the length of the number of atoms in “self” and uses the user-supplied filter string to filter the parameter descriptors to obtain a subset array that can be used to filter entities in other methods either in this class or elsewhere.
usage:
Here is a way to create a mask to be used somewhere else:
m1 = system.Molecule(0)    ### create a molecule m1
m1.read_pdb(filename)      ### read in variables, coor, etc.
… do stuff …
basis_filter = XXXX        ### your filter string (see examples below)
error, mask = m1.get_subset_mask(basis_filter)    ### get a mask
… do something with the mask using other functions in this class …
Here are some example basis_filter strings:
basis_filter = 'name[i] == "CA" and resid[i] < 10'
basis_filter = 'name[i][0] == "H" and resid[i] < 10'
basis_filter = 'name[i] == "CA" and resid[i] >= 1 and resid[i] < 10'
The syntax for basis selection can be quite elaborate. For example,
basis_filter = 'name[i] == "CA" and resid[i] >= 1 and resid[i] < 10 and moltype=="protein" and chain=="F" and occupancy==1 and beta>10.0 and element=="C" …'
could be used for advanced selection needs. See API for full details.
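Because each filter is a Python expression indexed by the atom counter i, one way to picture how a mask could be built is to evaluate the expression once per atom. The sketch below does that with eval; it is an assumption about the general approach, not the library's code, and build_mask and the toy descriptors are hypothetical.

```python
def build_mask(basis_filter, name, resid):
    """Evaluate basis_filter for each atom index i and return a 0/1 mask."""
    mask = []
    for i in range(len(name)):
        # Only the listed descriptors and i are visible to the expression.
        # eval should never be used on filter strings from untrusted sources.
        selected = eval(basis_filter, {"__builtins__": {}},
                        {"name": name, "resid": resid, "i": i})
        mask.append(1 if selected else 0)
    return mask


# Toy descriptors (hypothetical data)
name = ['N', 'HT1', 'CA', 'C', 'N', 'CA']
resid = [1, 1, 1, 1, 12, 12]

ca_mask = build_mask('name[i] == "CA" and resid[i] < 10', name, resid)
h_mask = build_mask('name[i][0] == "H" and resid[i] < 10', name, resid)
```

The second CA atom is excluded from ca_mask because its residue number is 12, which shows how the conditions in a filter combine.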
- init_child(descriptor)[source]¶
This method allows one to create a list of Molecule objects that are defined by the input descriptor.
usage:
This is a way to create a mask to be used somewhere else:
m1 = system.Molecule(0)      ### create a molecule m1
m1.read_pdb(filename)        ### read in variables, coor, etc.
m1.initialize_children()     ### set up the masks etc.
… do stuff …
This initializes the following “children” with their masks already defined to the “parent” molecule
names() : names_mask()
resnames() : resnames_mask()
resids() : resids_mask()
chains() : chains_mask()
segnames() : segnames_mask()
occupancies() : occupancies_mask()
betas() : betas_mask()
elements() : elements_mask()
The objects on the left contain the unique values and the objects on the right contain the masks that have the indices to extract the information for each unique value from the parent molecule.
NOTE: the plural form of these words is chosen deliberately to distinguish them from the singular words used to keep track of the parent variables (name –> name[i] for each atom, while names –> the unique names in the parent: len(names) <= len(name))
For “min3.pdb” if one wants to know the unique elements you would type:
m1.elements()
which yields:
['N', 'H', 'C', 'O', 'S', 'ZN']
So, given a pre-defined object that has atomic information initialized by reading in the PDB file and initializing all children as shown above, one can get a list of subset objects for each type of element by typing:
element_molecules = m1.init_child('elements')
then you could parse the full-subset molecule as its own entity
com = element_molecules[0].calccom(0)
which would give the center of mass for all the “N” atoms in the parent molecule.
Another example would be to get the COM of each amino acid in a protein.
residue_molecules = m1.init_child('resids')
for i in range(m1.number_of_resids()):
    print(residue_molecules[i].calccom(0))
NOTE: coordinates will have to be updated separately using get_coor_using_mask … using the mask(s) generated by file_io.initialize_children()
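The pairing of unique values with one mask per value can be sketched as follows. This is a conceptual illustration, assuming unique values are kept in order of first appearance; unique_values_and_masks is a hypothetical helper and the ordering used by the library is not confirmed here.

```python
def unique_values_and_masks(values):
    """Return the unique values and, for each one, a 0/1 mask over values."""
    unique = []
    for v in values:
        if v not in unique:
            unique.append(v)  # keep order of first appearance
    masks = [[1 if v == u else 0 for v in values] for u in unique]
    return unique, masks


# Per-atom element symbols for a toy parent molecule (hypothetical data)
element = ['N', 'H', 'C', 'O', 'C', 'N']

elements, elements_mask = unique_values_and_masks(element)
```

Here elements plays the role of the plural, unique list and element the singular, per-atom list, so len(elements) <= len(element) holds as described in the note on naming.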
- merge_two_molecules(mol1, mol2, **kwargs)[source]¶
This method combines two molecules into a single, new molecule. It will assign coordinates from the first frame of a molecule.
usage:
m1 = system.Molecule(0)    ### create a molecule m1
m1.read_pdb(filename1)     ### read in variables, coor, etc.
m2 = system.Molecule(1)    ### create a molecule m2
m2.read_pdb(filename2)     ### read in variables, coor, etc.
m3 = system.Molecule(2)    ### create a molecule m3
… do stuff …
error = m3.merge_two_molecules(m1, m2)    ### sets the values that define m3
If report_missing_descriptors=True is passed, descriptors that are not present in both input molecules are listed in the returned error list. These messages are informational; the merge still proceeds when the structural essentials are valid.
Might do: add a caller-supplied required_descriptors keyword so simulation, PDB-writing, scattering, or coarse-grain callers can define their own descriptor completeness requirements without imposing a global molecule schema.
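The reporting behaviour described above can be pictured with dictionaries of per-atom lists. This is a rough sketch under stated assumptions: descriptors are stored as dictionary entries, shared descriptors are concatenated, and descriptors present in only one input are left out of the result. The function merge_descriptors, its return shape, and the message text are hypothetical.

```python
def merge_descriptors(mol1, mol2, report_missing_descriptors=False):
    """Concatenate shared per-atom descriptors and optionally report the rest."""
    errors = []
    merged = {}
    for key in mol1:
        if key in mol2:
            # Atoms of mol1 come first, followed by the atoms of mol2
            merged[key] = list(mol1[key]) + list(mol2[key])
    if report_missing_descriptors:
        # Descriptors found in exactly one of the two inputs
        for key in sorted(set(mol1) ^ set(mol2)):
            errors.append("descriptor '%s' is not present in both molecules" % key)
    return errors, merged


m1 = {'name': ['N', 'CA'], 'resid': [1, 1], 'charge': [0.0, 0.1]}
m2 = {'name': ['C', 'O'], 'resid': [2, 2]}

errors, merged = merge_descriptors(m1, m2, report_missing_descriptors=True)
quiet_errors, _ = merge_descriptors(m1, m2)
```

The messages are informational in this sketch as well: the merge completes either way, and the caller decides whether a missing descriptor matters.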
- set_coor_using_mask(other, frame, mask)[source]¶
This method replaces coordinates from frame=frame of system object (self) using a supplied mask which has been created before this method is called.
Coordinates are chosen for the elements that are equal to 1 in the supplied mask array.
- Parameters
other – system object : molecule supplying the replacement coordinates
frame – integer : trajectory frame number to use
mask – integer array : mask array of length of the number of atoms with 1 or 0 for each atom depending on the selection used to create the mask
kwargs – optional future arguments
- Returns
string : error statement
updated self._coor
- Return type
error
Examples
>>> import sasmol.system as system
>>> molecule_1 = system.Molecule('hiv1_gag.pdb')
>>> molecule_2 = system.Molecule('other_hiv1_gag.pdb')
>>> basis_filter = "name[i] == 'CA'"
>>> error, mask = molecule_1.get_subset_mask(basis_filter)
>>> frame = 0
>>> error = molecule_1.set_coor_using_mask(molecule_2, frame, mask)
Note
molecule_2 must be smaller than or equal in size to molecule_1, and the coordinates in molecule_2 must be in the same order as in molecule_1.
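The ordering requirement becomes clearer with a small stand-alone sketch. It assumes that the other molecule supplies exactly one coordinate row per selected atom, in the same order as the selected atoms of the target; that reading, the function replace_selected_coor, and the toy data are assumptions rather than the library's documented behaviour.

```python
def replace_selected_coor(coor, other_coor, mask):
    """Return a copy of coor with masked rows replaced, in order, by other_coor."""
    n_selected = sum(1 for flag in mask if flag == 1)
    if n_selected != len(other_coor):
        raise ValueError("mask selects %d atoms but other has %d"
                         % (n_selected, len(other_coor)))
    replacement = iter(other_coor)
    new_coor = []
    for xyz, flag in zip(coor, mask):
        if flag == 1:
            # Take the next row from the other molecule
            new_coor.append(list(next(replacement)))
        else:
            new_coor.append(list(xyz))
    return new_coor


coor = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [3.0, 3.0, 3.0]]
other_coor = [[10.0, 10.0, 10.0], [30.0, 30.0, 30.0]]
mask = [0, 1, 0, 1]

updated = replace_selected_coor(coor, other_coor, mask)
```

Because rows are consumed in sequence, swapping the order of other_coor would silently assign coordinates to the wrong atoms, which is why matching order matters.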
- set_descriptor_using_mask(mask, descriptor, value)[source]¶
This method writes “value” into the given descriptor at every position where the supplied mask array is equal to 1.
- Parameters
mask – integer array : mask array of length of the number of atoms with 1 or 0 for each atom depending on the selection used to create the mask
descriptor – system property : a property defined in an instance of a system object
value – string : new value to apply to selection defined by mask
kwargs – point = True : will translate to a fixed point given by value variable
- Returns
updated self._descriptor
- Return type
None
Examples
>>> import sasmol.system as system
>>> molecule = system.Molecule('hiv1_gag.pdb')
>>> molecule.beta()[:10]
['0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00']
>>> basis_filter = "name[i] == 'CA'"
>>> error, mask = molecule.get_subset_mask(basis_filter)
>>> descriptor = molecule.beta()
>>> value = '1.00'
>>> error = molecule.set_descriptor_using_mask(mask, descriptor, value)
>>> descriptor[:10]
['0.00', '0.00', '0.00', '0.00', '1.00', '0.00', '0.00', '0.00', '0.00', '0.00']
which can then be used to set the new values into the molecule
>>> molecule.setBeta(descriptor)
>>> molecule.beta()[:10]
['0.00', '0.00', '0.00', '0.00', '1.00', '0.00', '0.00', '0.00', '0.00', '0.00']
Note
Coordinate arrays cannot be manipulated by this method.
TODO: If possible, get rid of loop
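The masked write can also be expressed without an explicit index loop. The sketch below uses a list comprehension on plain lists; it is an illustration only, set_values_using_mask is a hypothetical helper, and it returns a new list instead of modifying the descriptor in place.

```python
def set_values_using_mask(descriptor, mask, value):
    """Return a new list with value written wherever mask == 1."""
    return [value if flag == 1 else old for old, flag in zip(descriptor, mask)]


# Toy beta column with the fifth atom selected (hypothetical data)
beta = ['0.00'] * 6
mask = [0, 0, 0, 0, 1, 0]

new_beta = set_values_using_mask(beta, mask, '1.00')
```

If the descriptor and mask were NumPy arrays, a boolean-index assignment of the form descriptor[mask == 1] = value would perform the same write in a single statement.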