Quick Help

PDB ID

Each structure in the Protein Data Bank (PDB) is represented by a four-character alphanumeric identifier, assigned upon its deposition. For example, 1bxl is the identification code for the Bcl-xL/Bak complex. For more information, see the PDB home page.

User-provided PDB file

A PDB file can be loaded locally using the browse button. The user-provided PDB file must follow the PDB file format.

Chain identifier

To run the server, a chain identifier from the corresponding PDB file must be specified. The chain identifier can be found in the standard PDB file in the third field (column 12) of the SEQRES record or in the sixth field (column 22) of the ATOM record. The chain identifier may be any single character. More than one chain can be given as input. For example, if a protein complex is composed of the chains L and H, then "LH" should be supplied in the chain identifier text box. If there is only a single chain with no identifier, then "NONE" should be written in the chain identifier text box.
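As an illustration of the column rule above, the following minimal sketch reads column 22 of each ATOM record and builds the string to type into the chain identifier text box. The function name `chain_field` and the sample records are hypothetical and are not part of the server.

```python
def chain_field(pdb_lines):
    """Build the chain identifier text from the ATOM records of a PDB file."""
    chains = []
    for line in pdb_lines:
        if line.startswith("ATOM") and len(line) >= 22:
            chain = line[21]  # column 22 (1-based) is index 21 (0-based)
            if chain not in chains:
                chains.append(chain)
    named = [c for c in chains if c != " "]
    # A blank chain column corresponds to the "NONE" convention described above.
    return "".join(named) if named else "NONE"


# Made-up ATOM records, truncated after the coordinates.
example = [
    "ATOM      1  N   ASP L   1      27.340  24.430   2.614",
    "ATOM      2  CA  ASP L   1      26.266  25.413   2.842",
    "ATOM      3  N   GLU H   1      10.000  11.000  12.000",
]
print(chain_field(example))
```

Chains are reported in order of first appearance, so the sample above yields the two chains L and H as a single string.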

Peptides file format

A list of peptides to be aligned to the PDB structure should be supplied in a "Peptides file format". Each peptide sequence should be preceded by a line starting with >. Optionally, the peptide name and a number can follow the > mark. This number specifies the number of times the peptide was isolated in the experiment. The second line contains the peptide sequence itself. The number of times each peptide was isolated is currently taken into account only when using the PepSurf algorithm and not when using Mapitope. Blank lines are ignored, and so are spaces or other gap symbols (dashes, underscores, and periods).
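The format described above can be read with a few lines of code. This is a minimal sketch, not the server's parser: it assumes that the name and the number on the > line are separated by whitespace, that a trailing all-digit token is the isolation count, and that the count defaults to 1 when it is omitted. The function name `parse_peptides` and the sample peptides are made up for illustration.

```python
def parse_peptides(text):
    """Return a list of (name, count, sequence) tuples from a peptides file."""
    peptides = []
    name, count = "", 1
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines are ignored
        if line.startswith(">"):
            tokens = line[1:].split()
            count = 1  # assumed default when no number is given
            if tokens and tokens[-1].isdigit():
                count = int(tokens.pop())
            name = " ".join(tokens)
        else:
            # Drop spaces and the gap symbols: dashes, underscores, periods.
            seq = "".join(ch for ch in line if not ch.isspace() and ch not in "-_.")
            peptides.append((name, count, seq))
    return peptides


sample = """>pep1 3
CKLH-PWTC

>pep2
A_SG T.R
"""
for record in parse_peptides(sample):
    print(record)
```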

Phage-display library properties

Library type

A phage display library is a heterogeneous mixture of phage clones, each carrying a different foreign DNA insert. Each DNA insert codes for a different peptide, which is displayed on the surface of the phage. The most commonly used phages for library construction are filamentous bacteriophages such as fd, f1, and M13. In most cases the insert is fused to the solvent-accessible amino terminus of one of the phage's proteins, and thus a large number of random peptides are produced and presented on the outer surface of the phage.

The expected amino acid frequencies are not necessarily identical (i.e., the probability of each amino acid is not necessarily 1/20), because they depend on how the phage-display library that produced the peptides was constructed. In one type of library, for example, the first and second codon positions are chosen from A, C, G, and T with equal probability, and the third codon position is chosen only from G or T, again with equal probability. This library is often called an NNK library, with N representing an equal mixture of A, G, C, and T and K representing an equal mixture of G and T. An NNK library contains 32 possible codons, only one of which (TGG) encodes tryptophan, so the expected frequency of tryptophan is 1/32, or 3.1% (compared to 5% if all amino acids are uniformly sampled, or 1.6% when considering the number of codons for each amino acid in the full genetic code).

In Pepitope, the user can choose between 3 different library types:
1. NNK: N stands for A,C,G or T; K stands for G or T.
2. NNS: N stands for A,C,G or T; S stands for G or C.
3. RANDOM_AA: All amino-acid frequencies are equal (5% each).
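The expected frequencies for the NNK and NNS library types follow directly from the standard genetic code. The sketch below enumerates the codons allowed by each scheme and counts how many encode each amino acid. The function name `library_frequencies` is hypothetical, and keeping the stop codon in the result as "*" is an assumption made here for illustration; the server may handle it differently.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def library_frequencies(third_bases):
    """Expected residue frequencies when positions 1 and 2 are N (any base)
    and position 3 is drawn uniformly from third_bases."""
    codons = ["".join(c) for c in product(BASES, BASES, third_bases)]
    counts = {}
    for codon in codons:
        aa = CODON_TABLE[codon]
        counts[aa] = counts.get(aa, 0) + 1
    return {aa: n / len(codons) for aa, n in counts.items()}


nnk = library_frequencies("GT")  # K = G or T
nns = library_frequencies("GC")  # S = G or C
print(f"Trp in NNK: {nnk['W']:.3%}")
print(f"Leu in NNK: {nnk['L']:.3%}")
```

Tryptophan has a single codon (TGG) among the 32 NNK codons, which gives 1/32, while leucine, serine, and arginine each keep three codons and are therefore expected about three times as often.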

When using the PepSurf algorithm the similarity matrix is modified according to the library type. If the user chooses to ignore this information, the original similarity matrix (for example, BLOSUM62) is used without any modifications.

When using the Mapitope algorithm, the first step is to identify pairs of amino acids that are significantly overrepresented in the panel of peptides compared to their expected frequencies. The expected frequencies are calculated according to the specified library type.

Stop codon modification

Typically, random peptides are encoded by random oligonucleotides in which each codon is NNK (N = G, A, T, or C; K = G or T). This design avoids the stop codons UAA and UGA and reduces the overall codon redundancy. The bacterial strain used to produce the phages can also influence the amino acid composition. For example, in the bacterial strain genotype DH5AlphaF' the remaining stop codon, UAG, is read as glutamine. This function is encoded by the gene supE, which suppresses the UAG stop codon by inserting an amino acid at that position instead of terminating translation. Therefore, the expected frequency of glutamine is 6.2% versus 3.1% in the universal genetic code.

PepSurf advanced options

Substitution matrix

The substitution matrix specifies the similarity score between any two amino acids. Generally, the similarity score between two related amino acids, such as leucine and isoleucine, is higher compared to the score of two dissimilar amino acids, such as leucine and arginine.

The substitution matrix can influence the resulting alignments. Currently supported matrices are BLOSUM30, BLOSUM62, BLOSUM80, and GRANTHAM. The BLOSUM matrices are based on evolutionary data and are often used in the context of aligning protein sequences. BLOSUM62 is the default matrix, while BLOSUM80 and BLOSUM30 can be used for aligning more closely or more remotely related sequences, respectively. The GRANTHAM matrix is based on physico-chemical properties, such as side chain composition, polarity, and molecular volume.

Gap penalty

The gap penalty accounts for unmatched peptide residues. For example, a gap score of -1 indicates that each gap reduces the alignment score by 1. The best value depends on the choice of substitution matrix. The default value of -0.5 was optimized for the BLOSUM62 matrix for protein sequences.
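To make the interplay between the substitution matrix and the gap penalty concrete, here is a minimal scoring sketch. It assumes that the alignment score is the sum of the similarity scores of the matched residues plus the gap penalty for every unmatched peptide residue; the exact scoring used by PepSurf may differ. The names `similarity` and `alignment_score` are hypothetical, and the dictionary holds only a small excerpt of BLOSUM62 values.

```python
# Excerpt of BLOSUM62 scores for a few residue pairs (the matrix is symmetric).
BLOSUM62_EXCERPT = {
    ("L", "L"): 4, ("I", "I"): 4, ("R", "R"): 5,
    ("L", "I"): 2, ("L", "R"): -2, ("I", "R"): -3,
}


def similarity(a, b):
    """Look up a pair in either order; raises KeyError if it is not in the excerpt."""
    if (a, b) in BLOSUM62_EXCERPT:
        return BLOSUM62_EXCERPT[(a, b)]
    return BLOSUM62_EXCERPT[(b, a)]


def alignment_score(aligned_pairs, gap_penalty=-0.5):
    """Score a list of (peptide_residue, structure_residue) pairs.
    A structure_residue of None marks an unmatched peptide residue (a gap)."""
    score = 0.0
    for pep, res in aligned_pairs:
        score += gap_penalty if res is None else similarity(pep, res)
    return score


# L matched to I, R left unmatched, L matched to L.
print(alignment_score([("L", "I"), ("R", None), ("L", "L")]))
```

With the default penalty of -0.5 the unmatched arginine costs less than matching it to a dissimilar residue such as leucine (-2), which shows why the best penalty value depends on the scale of the chosen matrix.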

Probability for obtaining the best path

The alignment algorithm implemented in PepSurf guarantees finding the best alignment between each peptide and the PDB structure only up to a certain probability, which can be set by the user. As this value increases (closer to 1.0), the running time of the algorithm increases. To limit the computation to a manageable time, the default probability set by the server is high for short peptides (up to 10 amino acids) and lower for longer ones. In practice, we have found that this parameter hardly influences the results of the algorithm.

Mapitope advanced options

Library constraints file

The expected amino acid frequencies may be biased when the peptides contain fixed residues. One example is constant cysteine residues that are introduced at either end of the peptide sequences to impose disulfide-constrained looped configurations.

A user can specify such constraints using a "Library Constraints File Format". Each line in this file should be as follows:
#PAIR_FIXED XY = N
where X and Y are one-letter amino-acid codes and N stands for the number of times this pair is fixed in the peptides file. For example, if all 7 peptides start with alanine, then the line should be:
#PAIR_FIXED A* = 7
'*' indicates that any amino acid can follow the first A. If 5 peptides start and end with a cysteine, two lines are needed:
#PAIR_FIXED C* = 5
#PAIR_FIXED *C = 5
Finally, if an alanine-valine pair is fixed in the middle of 5 out of 7 peptides, the following lines should be introduced:
#PAIR_FIXED AV = 5
#PAIR_FIXED V* = 5
#PAIR_FIXED *A = 5
An example for a library constraints file that represents 7 peptides from a library that was constructed with 2 cysteine residues at both ends of each peptide can be found here.
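The line format above is simple enough to check before uploading. The following sketch parses #PAIR_FIXED lines into a dictionary; the function name `parse_constraints` is hypothetical, and silently skipping lines that do not match the pattern is an assumption made here, not documented server behavior.

```python
import re

# Matches lines of the form "#PAIR_FIXED XY = N", where X and Y are
# one-letter amino-acid codes or '*', and N is a non-negative integer.
PAIR_LINE = re.compile(r"^#PAIR_FIXED\s+([A-Z*])([A-Z*])\s*=\s*(\d+)$")


def parse_constraints(text):
    """Return a dict mapping each fixed pair (e.g. 'C*') to its count."""
    constraints = {}
    for line in text.splitlines():
        match = PAIR_LINE.match(line.strip())
        if match:
            first, second, count = match.groups()
            constraints[first + second] = int(count)
    return constraints


sample = """#PAIR_FIXED AV = 5
#PAIR_FIXED V* = 5
#PAIR_FIXED *A = 5
"""
print(parse_constraints(sample))
```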

Distance threshold

The algorithm considers two residues to be a valid pair if the Euclidean distance between them (in angstroms) is less than the "Distance threshold". Specifically, the distance between any two amino acids is defined as the distance between their carbon-alpha atoms, as computed from the coordinates in the PDB file.
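The distance test can be sketched as follows, assuming standard fixed-column ATOM records (atom name in columns 13-16, chain in column 22, residue number in columns 23-26, coordinates in columns 31-54). The helper names `atom_line`, `ca_coordinates`, and `is_valid_pair`, the sample coordinates, and the threshold values in the example are all made up for illustration.

```python
import math


def atom_line(serial, name, resname, chain, resseq, x, y, z):
    """Format a minimal fixed-column ATOM record (only used for the example)."""
    return (f"ATOM  {serial:5d} {name:^4s} {resname:>3s} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


def ca_coordinates(pdb_lines, chain):
    """Map residue number -> (x, y, z) of the carbon-alpha atom for one chain."""
    coords = {}
    for line in pdb_lines:
        if (line.startswith("ATOM") and len(line) >= 54
                and line[12:16].strip() == "CA" and line[21] == chain):
            coords[int(line[22:26])] = (
                float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return coords


def is_valid_pair(coord_a, coord_b, threshold):
    """True if the Euclidean distance (in angstroms) is below the threshold."""
    return math.dist(coord_a, coord_b) < threshold


lines = [
    atom_line(1, "N", "ALA", "A", 1, 1.0, 1.0, 1.0),
    atom_line(2, "CA", "ALA", "A", 1, 0.0, 0.0, 0.0),
    atom_line(3, "CA", "VAL", "A", 2, 3.0, 4.0, 0.0),
]
ca = ca_coordinates(lines, "A")
print(math.dist(ca[1], ca[2]), is_valid_pair(ca[1], ca[2], threshold=9.0))
```

Because the comparison is strict, two residues whose distance equals the threshold exactly are not counted as a valid pair in this sketch.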

Statistical threshold

In the Mapitope algorithm only amino-acid pairs that are significantly more enriched in the peptide panel compared with the random expectation are mapped to the antigen. To determine which amino acid pairs are significantly enriched, the number of standard deviations above the expected random occurrence is computed for all amino-acid pairs. Only pairs for which this number exceeds the statistical threshold parameter are considered statistically significant.
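The text above does not give the exact statistical model, so the following is only a sketch under a stated assumption: each of the observed pairs is treated as an independent draw, so the count of a given pair follows a binomial distribution with mean n*p and standard deviation sqrt(n*p*(1-p)). Mapitope's actual computation may differ. The function names and the numbers in the example are hypothetical.

```python
import math


def z_score(observed, total_pairs, expected_prob):
    """Number of standard deviations above the expected count,
    assuming a binomial model (an assumption, not the documented method)."""
    mean = total_pairs * expected_prob
    sd = math.sqrt(total_pairs * expected_prob * (1 - expected_prob))
    return (observed - mean) / sd


def significant_pairs(observed_counts, expected_probs, total_pairs, threshold):
    """Return the pairs whose z-score exceeds the statistical threshold."""
    return {
        pair
        for pair, count in observed_counts.items()
        if z_score(count, total_pairs, expected_probs[pair]) > threshold
    }


# Made-up counts: 100 pairs observed in total, each pair expected 1% of the time.
observed = {"AV": 5, "GS": 1}
expected = {"AV": 0.01, "GS": 0.01}
print(significant_pairs(observed, expected, total_pairs=100, threshold=3.0))
```

In this example only the AV pair, seen five times against an expectation of one, lies more than three standard deviations above the random expectation.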

Maximum gap "Fill-in"

When short gaps appear between two predicted amino-acid segments, the user may want to "fill in" the gaps and include the otherwise missed residues as part of the final prediction. The maximum gap "Fill-in" parameter defines the maximum gap between two residues to be connected. For example, if amino acids 4, 5, 6, 7, 10, 11, and 12 are part of the epitope, it seems logical that positions 8 and 9 are also part of the epitope. When the Fill-in parameter is set to 2 (or higher), these two missed residues will be included in the final prediction. Otherwise, they will not be added to the prediction.
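The fill-in rule can be sketched in a few lines. This sketch assumes that the gap size is the number of missing residues between two consecutive predicted positions, which matches the example above (positions 8 and 9 form a gap of 2). The function name `fill_gaps` is hypothetical.

```python
def fill_gaps(positions, max_gap):
    """Add the residues missing between consecutive predicted positions
    whenever the number of missing residues is at most max_gap."""
    ordered = sorted(set(positions))
    filled = ordered[:1]
    for prev, cur in zip(ordered, ordered[1:]):
        missing = cur - prev - 1
        if 0 < missing <= max_gap:
            filled.extend(range(prev + 1, cur))
        filled.append(cur)
    return filled


predicted = [4, 5, 6, 7, 10, 11, 12]
print(fill_gaps(predicted, max_gap=2))
print(fill_gaps(predicted, max_gap=1))
```

With a value of 2 the two missing positions are added, while a value of 1 leaves the prediction unchanged because the gap is too long.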
