It takes a nucleotide sequence as input and operates at the amino acid substitution level. Results are reported through three novel indicators and a codon diversity coefficient, each based on the subset of amino acid substitutions generated at the protein level:
a) Protein structure indicator
It measures the fraction of substitutions that disrupt protein structure or function (stop codons) and the fraction that are likely destabilizing (Gly and Pro residues).
i) Fraction of variants with stop codons:
Fraction of single nucleotide substitutions resulting in a stop codon (TAA, TGA or TAG)
ii) Fraction of variants with Gly or Pro:
Fraction of single nucleotide substitutions resulting in a glycine or proline codon (GGA, GGT, GGG, GGC, CCA, CCT, CCG or CCC)
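As a rough illustration of how these two fractions could be computed, the Python sketch below enumerates every single nucleotide substitution of an in-frame coding sequence. It rests on several assumptions that are not stated in this text: an unbiased mutational spectrum (each base replaced by each of the three others with equal weight, whereas a real analysis would weight substitutions by the spectrum of each mutagenesis method), the standard genetic code, and counting only substitutions that newly introduce a stop, Gly or Pro codon. The function names are hypothetical and are not part of the server.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def single_nucleotide_variants(seq):
    """Yield (original_codon, mutant_codon) for every single-base substitution.

    Assumes an in-frame sequence containing only A, C, G and T;
    a trailing incomplete codon is ignored.
    """
    seq = seq.upper()
    for start in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[start:start + 3]
        for pos in range(3):
            for base in BASES:
                if base != codon[pos]:
                    yield codon, codon[:pos] + base + codon[pos + 1:]


def protein_structure_indicator(seq):
    """Return (fraction of stop variants, fraction of Gly/Pro variants).

    Assumption: every substitution has equal weight (unbiased spectrum),
    and a variant is counted only if it changes the encoded residue.
    """
    total = stops = gly_pro = 0
    for codon, mutant in single_nucleotide_variants(seq):
        old, new = CODON_TABLE[codon], CODON_TABLE[mutant]
        total += 1
        if new == "*" and old != "*":
            stops += 1
        elif new in ("G", "P") and new != old:
            gly_pro += 1
    return stops / total, gly_pro / total


stop_fraction, gly_pro_fraction = protein_structure_indicator("TGG")
```

For the single Trp codon TGG, nine substitutions are possible: two give a stop codon (TAG, TGA) and one gives glycine (GGG).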
b) Amino acid diversity indicator
It is measured by the fraction of variants with preserved amino acids and the average number of amino acid substitutions per residue.
i) Fraction of variants with preserved amino acids:
Fraction of single nucleotide substitutions that do not change the encoded amino acid
ii) Average amino acid substitution per residue:
Average number of amino acid substitutions after single nucleotide exchange of a codon
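The two values of this indicator can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the server's implementation: it assumes an unbiased mutational spectrum, the standard genetic code, and it interprets "average amino acid substitution per residue" as the mean number of distinct new amino acids reachable from each codon by one nucleotide exchange (stop codons excluded). The function name is hypothetical.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def amino_acid_diversity(seq):
    """Return (fraction of preserved variants, mean distinct substitutions per codon).

    Assumes an in-frame sequence of A, C, G and T with at least one codon;
    every single-base substitution is given equal weight.
    """
    seq = seq.upper()
    total = preserved = 0
    distinct_per_codon = []
    for start in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[start:start + 3]
        original = CODON_TABLE[codon]
        new_residues = set()
        for pos, base in product(range(3), BASES):
            if base == codon[pos]:
                continue  # not a substitution
            mutant = CODON_TABLE[codon[:pos] + base + codon[pos + 1:]]
            total += 1
            if mutant == original:
                preserved += 1  # synonymous exchange
            elif mutant != "*":
                new_residues.add(mutant)  # stop codons are not counted here
        distinct_per_codon.append(len(new_residues))
    return preserved / total, sum(distinct_per_codon) / len(distinct_per_codon)


preserved_fraction, mean_substitutions = amino_acid_diversity("GGG")
```

For the glycine codon GGG, the three exchanges at the third position are synonymous, and the remaining six yield five distinct residues (W, R, V, A, E).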
c) Codon diversity coefficient
It is a coefficient that measures how random mutations are distributed among codons of a gene. In a method with non-biased mutational spectra (equal occurrence of A-N, T-N, G-N and C-N), the codon diversity coefficient has a value of 0. Biased methods show preferences toward certain nucleotide exchanges and mutate certain nucleotides in codons preferentially. In other words, biased mutagenesis methods generate "hot spots" of mutagenesis that compromise genetic diversity.
d) Chemical diversity indicator
It analyzes how chemically different the substituted amino acids are. For this indicator, amino acids are grouped into four categories according to their chemical properties. An ideal mutagenesis method would substitute each residue equally with the 19 other amino acids at each amino acid position.
i) Chemically categorized amino acid substitution graph:
Shows the percentages of aliphatic, aromatic, neutral and charged amino acid substitutions generated by all 19 random mutagenesis methods
ii) Chemically categorized amino acid substitution values:
Data are reported as deviation of each random mutagenesis method from the ideal chemical distribution described above
iii) Amino acid substitution patterns (matrix):
Shows the extent to which each amino acid species is generated. These figures show the substitution pattern for each of the 20 amino acids in the protein of interest for all 19 random mutagenesis methods. The Y-axis represents the 20 amino acid species in the protein of interest, and the X-axis represents the amino acid substitution pattern generated.
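To make the idea of "deviation from the ideal chemical distribution" concrete, here is a minimal Python sketch. The membership of the four categories is an assumption made for illustration (this text names the categories but does not list their members), and the function names are hypothetical. The ideal case is modelled as described above: every residue of the protein is replaced equally by each of the 19 other amino acids.

```python
from collections import Counter

CLASSES = ("aliphatic", "aromatic", "neutral", "charged")

# Assumed grouping, for illustration only.
CHEMICAL_CLASS = {
    **dict.fromkeys("GAVLI", "aliphatic"),
    **dict.fromkeys("FYW", "aromatic"),
    **dict.fromkeys("CMPSTNQ", "neutral"),
    **dict.fromkeys("DEHKR", "charged"),
}


def class_percentages(substituted_residues):
    """Percentage of substitutions falling in each chemical category."""
    counts = Counter(CHEMICAL_CLASS[aa] for aa in substituted_residues)
    total = sum(counts.values())
    return {name: 100.0 * counts[name] / total for name in CLASSES}


def ideal_percentages(protein):
    """Ideal case: each residue is replaced equally by the 19 other amino acids."""
    pool = [aa for original in protein for aa in CHEMICAL_CLASS if aa != original]
    return class_percentages(pool)


def deviation_from_ideal(protein, substituted_residues):
    """Observed minus ideal percentage for each category."""
    observed = class_percentages(substituted_residues)
    ideal = ideal_percentages(protein)
    return {name: observed[name] - ideal[name] for name in CLASSES}


# Example: the seven non-stop residues reachable from a single Trp codon (TGG).
deviation = deviation_from_ideal("W", "RRGLSCC")
```

In this example no aromatic residue is generated, so the aromatic category shows a negative deviation from the ideal distribution.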
Structure based analysis
It is performed on one selected random mutagenesis method at a time and requires a crystallographic structure or a reliable homology model of the protein, along with the nucleotide sequence, as input. In addition to the basic MAP indicators, it reports factors related to protein stability, flexibility and activity by correlating the mutational spectra with the local structure environment of the protein and the molecular interactions of its residues:
a) Local structure environment
The local structural environment of the protein comprises secondary structure elements, residue flexibility and solvent accessibility.
i) Secondary structure assessment:
Secondary structure assignments are important to assure the optimal yield of experimental structures and to select mutagenesis targets judiciously. The server provides secondary structure information using the DSSP program. Each residue is assigned one of four states: H: alpha helix, B: beta bridge and extended strand, T: hydrogen bonded turn and bend, *: loop or irregular structure. Ref
ii) Residue flexibility:
Proteins are dynamic molecules in constant motion, which enables the structural flexibility associated with biological processes such as molecular recognition and catalytic activity. Crystallographic B-factors (obtained from the crystallographic structure of the protein) are used as a measure of residue flexibility. The relative B-factor value of the backbone atoms is used to differentiate flexible regions of the protein from rigid ones. Ref
iii) Relative solvent accessibility:
Solvent-inaccessible residues have a lower rate of acceptance of mutations than those on the surface, and solvent accessibility has been used together with residue flexibility to estimate protein stability. Relative solvent accessibility (RSA) is used to differentiate between exposed and buried residues. RSA is calculated as the ratio of the number of water molecules in contact with the residue to the total surface area of the residue. A threshold on RSA separates exposed (RSA >= 0.16) from buried (RSA < 0.16) residues. Ref
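The RSA threshold described above can be expressed as a short Python sketch. Only the 0.16 cut-off comes from this text; the function names, and the idea of passing a per-residue accessibility value together with a reference (maximum) surface area for that residue type, are assumptions made for illustration.

```python
RSA_THRESHOLD = 0.16  # cut-off stated in the text


def relative_solvent_accessibility(accessible_area, max_area):
    """Ratio of a residue's accessibility value to its reference surface area.

    Both arguments are assumed to be supplied by the caller, for example
    from a per-residue accessibility calculation and a reference table.
    """
    if max_area <= 0:
        raise ValueError("max_area must be positive")
    return accessible_area / max_area


def burial_state(rsa, threshold=RSA_THRESHOLD):
    """Classify a residue as 'exposed' (RSA >= threshold) or 'buried'."""
    return "exposed" if rsa >= threshold else "buried"


# Example with made-up numbers.
state = burial_state(relative_solvent_accessibility(50.0, 250.0))
```

Keeping the threshold as a parameter makes it easy to see how sensitive the exposed/buried split is to the chosen cut-off.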
b) Molecular interactions
Inter-residue interactions play an important part in protein folding, stability and function. Knowledge of these molecular interactions helps to evaluate the effect of amino acid substitutions on the stability or activity of the protein.
i) Salt bridges:
They are relatively weak ionic bonds between oppositely charged residues in protein structures. The server uses a script that defines a salt bridge when the charged-group atoms of the two residues lie within a distance of 2.0 to 4.0 Å in the structure. Ref
ii) Hydrophobic interactions:
Non-polar amino acids are considered to be involved in a hydrophobic interaction if the distance between their hydrophobic side chains is within 5 Å. The server applies the same criterion using a Perl script at the backend. Ref
iii) Aromatic interactions:
Aromatic residues separated by a distance of 4.5 to 7.0 Å are considered to be involved in an aromatic interaction. Ref
iv) Side chain hydrogen bonds:
A hydrogen bond is defined by a donor-acceptor distance within 3.5 Å (oxygen and nitrogen) or 4.0 Å (sulphur). Angular criteria are not considered when calculating side chain to side chain and side chain to main chain hydrogen bonds in MAP analysis. Ref
v) Disulphide bonds:
These are covalent bonds formed by the coupling of thiol groups on cysteines; they are calculated by the DSSP program. Ref
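The distance criteria listed above for salt bridges, hydrophobic interactions, aromatic interactions and side chain hydrogen bonds can be summarized in a small Python sketch. The server itself is described as using Perl scripts, so this is only an illustration of the cut-offs, not its code. The function names are hypothetical, and the choice of which atoms or points to measure between (for example ring centroids for aromatic residues) is an assumption left to the caller.

```python
import math


def is_salt_bridge(charged_atom_a, charged_atom_b):
    """Charged-group atoms within 2.0 to 4.0 Å of each other."""
    return 2.0 <= math.dist(charged_atom_a, charged_atom_b) <= 4.0


def is_hydrophobic_contact(side_chain_atom_a, side_chain_atom_b):
    """Hydrophobic side-chain atoms of non-polar residues within 5 Å."""
    return math.dist(side_chain_atom_a, side_chain_atom_b) <= 5.0


def is_aromatic_interaction(ring_point_a, ring_point_b):
    """Aromatic residues separated by 4.5 to 7.0 Å.

    Assumption: the caller passes one representative point per ring,
    such as the ring centroid.
    """
    return 4.5 <= math.dist(ring_point_a, ring_point_b) <= 7.0


def is_side_chain_hbond(donor, acceptor, involves_sulphur=False):
    """Donor-acceptor distance within 3.5 Å (O, N) or 4.0 Å (S); no angle test."""
    cutoff = 4.0 if involves_sulphur else 3.5
    return math.dist(donor, acceptor) <= cutoff


# Example with made-up coordinates (x, y, z) in Å.
bridge = is_salt_bridge((0.0, 0.0, 0.0), (3.0, 0.0, 0.0))
```

`math.dist` requires Python 3.8 or later; coordinates are passed as (x, y, z) tuples in Å.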