Subset

class sasmol.subset.Mask[source]

Bases: object

Base class containing methods to extract or combine system objects using numpy masks

Examples

The first example shows how to use the class methods from a system object:

>>> import sasmol.system as system
>>> molecule = system.Molecule('hiv1_gag.pdb')
>>> basis_filter = 'name[i] == "CA" and resid[i] < 10'
>>> error, mask = molecule.get_subset_mask(basis_filter)
>>> import numpy
>>> numpy.nonzero(mask)
(array([  4,  11,  21,  45,  55,  66,  82, 101, 112]),)

Note

self parameter is not shown in the Parameters section in the documentation

apply_biomt(frame, selection, U, M, **kwargs)[source]

Apply biological unit transforms (BIOMT) to the coordinates of the chosen selection and frame.

Information on BIOMT available at: http://pdb101.rcsb.org/learn/guide-to-understanding-pdb-data/biological-assemblies

Parameters
  • frame – integer : frame number with coordinates to transform

  • selection – string : selection string in standard SASMOL format specifying the coordinates to be transformed

  • U – numpy array : 3 x 3 rotation matrix

  • M – numpy array : 3 x 1 translation vector

  • kwargs – optional future arguments

Returns

updated self._coor

Return type

None

Examples

Note

TODO: add example
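As a rough illustration of what a BIOMT transform does to coordinates, the sketch below applies a rotation U followed by a translation M to each point, assuming the usual PDB REMARK 350 convention x' = U·x + M. It does not call sasmol: apply_transform is a hypothetical helper, and plain lists stand in for the numpy arrays (M is a flat list of three values rather than a 3 x 1 array).

```python
def apply_transform(coords, U, M):
    """Hypothetical helper: return U.x + M for every [x, y, z] row in coords."""
    transformed = []
    for x in coords:
        transformed.append([
            sum(U[row][col] * x[col] for col in range(3)) + M[row]
            for row in range(3)
        ])
    return transformed


# Rotate 90 degrees about the z axis, then shift by +10 along x.
U = [[0.0, -1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]
M = [10.0, 0.0, 0.0]

coords = [[1.0, 0.0, 0.0], [0.0, 2.0, 5.0]]
new_coords = apply_transform(coords, U, M)
print(new_coords)
```

The input list is left untouched here; apply_biomt, by contrast, is documented as updating self._coor in place.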

copy_apply_biomt(other, frame, selection, U, M, **kwargs)[source]

Copy selected atoms (with initial coordinates from the given frame) to new Molecule object (other) and apply transforms taken from biological unit (BIOMT) to the coordinates.

Information on BIOMT available at: http://pdb101.rcsb.org/learn/guide-to-understanding-pdb-data/biological-assemblies

Parameters
  • other – system object : object to copy transformed information into

  • frame – integer : frame number with coordinates to transform

  • selection – string : selection string in standard SASMOL format specifying the coordinates to be transformed

  • U – numpy array : 3 x 3 rotation matrix

  • M – numpy array : 3 x 1 translation vector

  • kwargs – optional future arguments

Returns

updated self._coor

Return type

None

Examples

Note

TODO: add example

copy_molecule_using_mask(other, mask, frame)[source]

This method initializes the standard descriptors and coordinates for a subset molecule defined by the supplied mask array.

usage:

Here is a way to create a mask to be used somewhere else:

    m1 = system.Molecule(0)     ### create a molecule m1
    m1.read_pdb(filename)       ### read in variables, coor, etc.

    … do stuff …

    basis_filter = XXXX         ### your filter string (see the examples under get_subset_mask)

    error, mask = m1.get_subset_mask(basis_filter)     ### get a mask

    sub_m1 = system.Molecule(1) ### create a new molecule sub_m1

    error = m1.copy_molecule_using_mask(sub_m1, mask, frame)     ### initializes sub_m1
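The idea of building a subset from a mask can be sketched without sasmol. In the hypothetical helper below, descriptors are stored as a dictionary of per-atom lists and the mask keeps every atom whose entry is 1; the real method works on Molecule objects and numpy arrays, so treat this only as a picture of the selection step.

```python
def copy_using_mask(descriptors, coor, mask):
    """Hypothetical helper: keep per-atom entries whose mask value is 1."""
    keep = [i for i, flag in enumerate(mask) if flag == 1]
    sub_descriptors = {
        key: [values[i] for i in keep] for key, values in descriptors.items()
    }
    sub_coor = [list(coor[i]) for i in keep]  # copy rows so the parent is untouched
    return sub_descriptors, sub_coor


descriptors = {"name": ["N", "CA", "C", "O"], "resid": [1, 1, 1, 1]}
coor = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.0, 1.4, 0.0], [3.2, 1.6, 0.0]]
mask = [0, 1, 1, 0]

sub_descriptors, sub_coor = copy_using_mask(descriptors, coor, mask)
print(sub_descriptors["name"], sub_coor)
```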

duplicate_molecule(number_of_duplicates, **kwargs)[source]

This method copies all attributes from one molecule to a user-supplied number of new duplicate molecules.

Parameters
  • number_of_duplicates – integer : number of copies to make

  • kwargs – optional future arguments

Returns

list of system objects

Return type

molecules

Examples

>>> import sasmol.system as system
>>> molecule = system.Molecule('hiv1_gag.pdb')
>>> molecule.coor()[0][0]
array([-21.52499962, -67.56199646,  86.75900269])
>>> molecule.name()[:10]
['N', 'HT1', 'HT2', 'HT3', 'CA', 'HA1', 'HA2', 'C', 'O', 'N']
>>> import sasmol.util as utilities
>>> number_of_duplicates = 108
>>> molecules = utilities.duplicate_molecule(molecule, number_of_duplicates)
>>> molecules[-1].coor()[0][0]
array([-21.52499962, -67.56199646,  86.75900269])
>>> molecules[-1].name()[:10]
['N', 'HT1', 'HT2', 'HT3', 'CA', 'HA1', 'HA2', 'C', 'O', 'N']

Note

Using deepcopy directly in subset.py leads to inheritance conflict. Therefore subset calls a method held in utilities to make duplicates.

get_coor_using_mask(frame, mask)[source]

This method extracts coordinates from frame=frame of system object (self) using a supplied mask which has been created before this method is called.

Coordinates are chosen for the elements that are equal to 1 in the supplied mask array.

Parameters
  • frame – integer : trajectory frame number to use

  • mask – integer array : mask array of length of the number of atoms, with 1 or 0 for each atom depending on the selection used to create the mask

  • kwargs – optional future arguments

Returns

  • error – string : error statement

  • coor – coordinates corresponding to those determined by the input mask

Examples

>>> import sasmol.system as system
>>> molecule = system.Molecule('hiv1_gag.pdb')
>>> basis_filter = "name[i] == 'CA'"
>>> error, mask = molecule.get_subset_mask(basis_filter)
>>> frame = 0
>>> error, coor = molecule.get_coor_using_mask(frame, mask)
>>> coor[0][0]
array([-21.72500038, -66.91000366,  85.45700073], dtype=COORD_DTYPE)

get_dihedral_subset_mask(flexible_residues, mtype)[source]

This method creates an array of ones and/or zeros of the length of the number of atoms in “self”. It uses the user-supplied flexible_residues array to determine which atoms to include in the mask. This version is hard-wired for proteins or RNA: it chooses the C(n-1), N(n), CA(n), C(n), and N(n+1) atoms, or the O3’(n-1), P(n), O5’(n), C5’(n), C4’(n), C3’(n), O3’(n), and P(n+1) atoms, which form the basis set for rotation about the phi and psi angles or the alpha, beta, delta, epsilon, and eta angles, respectively. This method calls a C routine called mask to speed up the calculation (24.5x faster).
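For the protein case, the atom selection described above can be pictured with a small pure-Python sketch. protein_dihedral_mask is a hypothetical helper, not the sasmol routine; it builds one mask per flexible residue n, and how the real method arranges or combines masks for several residues is not specified here.

```python
def protein_dihedral_mask(names, resids, n):
    """Hypothetical helper: mark C(n-1), N(n), CA(n), C(n) and N(n+1) with 1."""
    wanted = {(n - 1, "C"), (n, "N"), (n, "CA"), (n, "C"), (n + 1, "N")}
    return [
        1 if (resid, name) in wanted else 0
        for name, resid in zip(names, resids)
    ]


# Three residues, each with backbone atoms N, CA, C, O.
names = ["N", "CA", "C", "O"] * 3
resids = [1] * 4 + [2] * 4 + [3] * 4

flexible_residues = [2]
masks = [protein_dihedral_mask(names, resids, n) for n in flexible_residues]
print(masks[0])
```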

get_indices_from_mask(mask)[source]

This method returns the internal indices for the supplied mask.

Parameters
  • mask – integer array : mask array of length of the number of atoms, with 1 or 0 for each atom depending on the selection used to create the mask

  • kwargs – optional future arguments

Returns

integer array : indices of atoms determined by the input mask

Return type

indices

Examples

>>> import sasmol.system as system
>>> molecule = system.Molecule('hiv1_gag.pdb')
>>> basis_filter = "name[i] == 'CA'"
>>> error, mask = molecule.get_subset_mask(basis_filter)
>>> indices = molecule.get_indices_from_mask(mask)
>>> indices[:10]
array([  4,  11,  21,  45,  55,  66,  82, 101, 112, 119])

get_subset_mask(basis_filter)[source]

This method creates an array of ones and/or zeros of the length of the number of atoms in “self” and uses the user-supplied filter string to filter the parameter descriptors to obtain a subset array that can be used to filter entities in other methods either in this class or elsewhere.

usage:

Here is a way to create a mask to be used somewhere else:

    m1 = system.Molecule(0)     ### create a molecule m1
    m1.read_pdb(filename)       ### read in variables, coor, etc.

    … do stuff …

    basis_filter = XXXX         ### your filter string (see examples below)

    error, mask = m1.get_subset_mask(basis_filter)     ### get a mask

… do something with the mask using other functions in this class …

Here are some example basis_filter strings:

    basis_filter = 'name[i] == "CA" and resid[i] < 10'
    basis_filter = 'name[i][0] == "H" and resid[i] < 10'
    basis_filter = 'name[i] == "CA" and resid[i] >= 1 and resid[i] < 10'

The syntax for basis selection can be quite elaborate. For example,

    basis_filter = 'name[i] == "CA" and resid[i] >= 1 and resid[i] < 10 and moltype=="protein" and chain=="F" and occupancy==1 and beta>10.0 and element=="C" …'

could be used for advanced selection needs. See API for full details.
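One way to picture how a filter string such as the ones above turns into a mask is to evaluate it once per atom index i. The sketch below does that with Python's eval on plain lists; subset_mask is a hypothetical stand-in, and the assumption that the filter is evaluated per atom is an illustration rather than a description of the sasmol internals.

```python
def subset_mask(basis_filter, **descriptors):
    """Hypothetical helper: evaluate basis_filter for each atom index i."""
    natoms = len(next(iter(descriptors.values())))
    mask = []
    for i in range(natoms):
        scope = dict(descriptors, i=i)  # names visible to the filter expression
        # eval is only acceptable here because the filter string is trusted.
        selected = eval(basis_filter, {"__builtins__": {}}, scope)
        mask.append(1 if selected else 0)
    return mask


name = ["N", "CA", "HA", "CA", "CA"]
resid = [1, 1, 1, 2, 12]

ca_mask = subset_mask('name[i] == "CA" and resid[i] < 10', name=name, resid=resid)
h_mask = subset_mask('name[i][0] == "H" and resid[i] < 10', name=name, resid=resid)
print(ca_mask, h_mask)
```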

init_child(descriptor)[source]

This method allows one to create a list of Molecule objects that are defined by the input descriptor.

usage:

This is a way to create a mask to be used somewhere else:

    m1 = system.Molecule(0)     ### create a molecule m1
    m1.read_pdb(filename)       ### read in variables, coor, etc.
    m1.initialize_children()    ### set up the masks etc.

… do stuff …

This initializes the following “children” with their masks already defined to the “parent” molecule

    names()       : names_mask()
    resnames()    : resnames_mask()
    resids()      : resids_mask()
    chains()      : chains_mask()
    segnames()    : segnames_mask()
    occupancies() : occupancies_mask()
    betas()       : betas_mask()
    elements()    : elements_mask()

The objects on the left contain the unique values and the objects on the right contain the masks that have the indices to extract the information for each unique value from the parent molecule.

NOTE: the plural form of these words is chosen deliberately, to distinguish them from the singular words used to keep track of the parent variables (name --> name[i] for each atom, while names --> corresponds to the unique names in the parent: len(names) <= len(name))

For “min3.pdb” if one wants to know the unique elements you would type:

m1.elements()

which yields:

    ['N', 'H', 'C', 'O', 'S', 'ZN']

So, given a pre-defined object that has atomic information initialized by reading in the PDB file and initializing all children as shown above, one can get a list of subset objects for each type of element by typing:

    element_molecules = m1.init_child('elements')

then you could parse the full-subset molecule as its own entity

com = element_molecules[0].calccom(0)

which would give the center of mass for all the “N” atoms in the parent molecule.

Another example would be to get the COM of each amino acid in a protein.

    residue_molecules = m1.init_child('resids')

    for i in range(m1.number_of_resids()):
        print(residue_molecules[i].calccom(0))

NOTE: coordinates will have to be updated separately using get_coor_using_mask … using the mask(s) generated by file_io.initialize_children()
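The relationship between the per-atom descriptor (for example element), the list of unique values (elements), and the per-value masks (elements_mask) can be sketched in plain Python. unique_values_and_masks is a hypothetical helper; the ordering of unique values by first appearance is an assumption of this sketch and may differ from what sasmol produces.

```python
def unique_values_and_masks(values):
    """Hypothetical helper: unique values plus one 0/1 mask per unique value."""
    unique = []
    for value in values:
        if value not in unique:
            unique.append(value)
    masks = [[1 if v == u else 0 for v in values] for u in unique]
    return unique, masks


element = ["N", "H", "C", "C", "O", "N"]
elements, elements_mask = unique_values_and_masks(element)

print(elements)
print(elements_mask[0])  # mask selecting every "N" atom in the parent
```

Each mask has the length of the parent atom list, so len(elements) <= len(element), matching the naming note above.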

merge_two_molecules(mol1, mol2, **kwargs)[source]

This method combines two molecules into a single, new molecule. It will assign coordinates from the first frame of a molecule.

usage:

    m1 = system.Molecule(0)     ### create a molecule m1
    m1.read_pdb(filename1)      ### read in variables, coor, etc.

    m2 = system.Molecule(1)     ### create a molecule m2
    m2.read_pdb(filename2)      ### read in variables, coor, etc.

    m3 = system.Molecule(2)     ### create a molecule m3

    … do stuff …

    error = m3.merge_two_molecules(m1, m2)     ### sets the values that define m3

If report_missing_descriptors=True is passed, descriptors that are not present in both input molecules are listed in the returned error list. These messages are informational; the merge still proceeds when the structural essentials are valid.

Might do: add a caller-supplied required_descriptors keyword so simulation, PDB-writing, scattering, or coarse-grain callers can define their own descriptor completeness requirements without imposing a global molecule schema.
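The informational reporting described for report_missing_descriptors=True can be pictured with a small sketch on dictionaries of per-atom lists. merge_descriptors is hypothetical, the message wording is invented for illustration, and dropping descriptors that are not shared is a choice made for this sketch only; the source does not say how the real merge treats them.

```python
def merge_descriptors(mol1, mol2, report_missing_descriptors=False):
    """Hypothetical helper: concatenate shared descriptors, optionally report the rest."""
    error = []
    if report_missing_descriptors:
        # Keys found in exactly one of the two inputs.
        for key in sorted(set(mol1) ^ set(mol2)):
            error.append("descriptor not present in both molecules: " + key)
    shared = [key for key in mol1 if key in mol2]
    merged = {key: list(mol1[key]) + list(mol2[key]) for key in shared}
    return error, merged


mol1 = {"name": ["N", "CA"], "resid": [1, 1], "charge": [0.0, 0.1]}
mol2 = {"name": ["C"], "resid": [2]}

error, merged = merge_descriptors(mol1, mol2, report_missing_descriptors=True)
print(error)
print(merged)
```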

set_coor_using_mask(other, frame, mask)[source]

This method replaces coordinates from frame=frame of system object (self) using a supplied mask which has been created before this method is called.

Coordinates are chosen for the elements that are equal to 1 in the supplied mask array.

Parameters
  • other – system object : molecule that supplies the replacement coordinates

  • frame – integer : trajectory frame number to use

  • mask – integer array : mask array of length of the number of atoms, with 1 or 0 for each atom depending on the selection used to create the mask

  • kwargs – optional future arguments

Returns

string : error statement

updated self._coor

Return type

error

Examples

>>> import sasmol.system as system
>>> molecule_1 = system.Molecule('hiv1_gag.pdb')
>>> molecule_2 = system.Molecule('other_hiv1_gag.pdb')
>>> basis_filter = "name[i] == 'CA'"
>>> error, mask = molecule_1.get_subset_mask(basis_filter)
>>> frame = 0
>>> error = molecule_1.set_coor_using_mask(molecule_2, frame, mask)

Note

molecule_2 must be smaller than or equal in size to molecule_1, and the coordinates in molecule_2 must be in the same order as in molecule_1.
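The ordering requirement in the note can be seen in a minimal sketch: source coordinates are written, in order, into the target positions where the mask is 1. replace_masked_coor is a hypothetical helper working on plain lists, and the check that the source has exactly as many atoms as the mask selects is an assumption of this sketch, not documented sasmol behaviour.

```python
def replace_masked_coor(target_coor, source_coor, mask):
    """Hypothetical helper: overwrite masked rows of target_coor, in order."""
    selected = [i for i, flag in enumerate(mask) if flag == 1]
    if len(selected) != len(source_coor):
        return ["mask selects %d atoms but source has %d"
                % (len(selected), len(source_coor))]
    for source_index, target_index in enumerate(selected):
        target_coor[target_index] = list(source_coor[source_index])
    return []


target = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]
source = [[9.0, 9.0, 9.0], [8.0, 8.0, 8.0]]
mask = [1, 0, 1]

error = replace_masked_coor(target, source, mask)
print(error, target)
```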

set_descriptor_using_mask(mask, descriptor, value)[source]

This method writes “value” into the given descriptor for the elements that are equal to 1 in the supplied mask array.

Parameters
  • mask – integer array : mask array of length of the number of atoms, with 1 or 0 for each atom depending on the selection used to create the mask

  • descriptor – system property : a property defined in an instance of a system object

  • value – string : new value to apply to selection defined by mask

  • kwargs – point = True : will translate to a fixed point given by the value variable

Returns

updated self._descriptor

Return type

None

Examples

>>> import sasmol.system as system
>>> molecule = system.Molecule('hiv1_gag.pdb')
>>> molecule.beta()[:10]
['0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00']
>>> basis_filter = "name[i] == 'CA'"
>>> error, mask = molecule.get_subset_mask(basis_filter)
>>> descriptor = molecule.beta()
>>> value = '1.00'
>>> error = molecule.set_descriptor_using_mask(mask, descriptor, value)
>>> descriptor[:10]
['0.00', '0.00', '0.00', '0.00', '1.00', '0.00', '0.00', '0.00', '0.00', '0.00']

which can then be used to set the new values into the molecule

>>> molecule.setBeta(descriptor)
>>> molecule.beta()[:10]
['0.00', '0.00', '0.00', '0.00', '1.00', '0.00', '0.00', '0.00', '0.00', '0.00']

Note

Coordinate arrays cannot be manipulated by this method.

TODO: If possible, get rid of loop