Pepitope Overview


The Pepitope server can be used to computationally predict epitopes based on peptides extracted from a phage display library, or to align a linear peptide sequence onto a three-dimensional protein structure.

Epitope mapping using phage display libraries

The interaction between an antibody and its antigen is at the heart of the humoral immune response. An antibody's immunological activity depends upon its specific binding to a discrete site on its target antigen known as the epitope. Epitope mapping is the process of identifying the molecular determinants of the antigen that are recognized by the antibody. One strategy for epitope mapping involves the use of a phage display library. This technology is used to select, from a large set of random peptides, those with a high binding affinity to an antibody of interest. This set of peptides is regarded as mimicking the genuine epitope of the antibody's cognate antigen, and can be used to define it using computational methods. The methodology is a general technique that can also be used to detect the interfaces of various types of interacting proteins outside the immunological context.

A phage display library is a heterogeneous mixture of phage clones, each carrying a different foreign DNA sequence inserted into the reading frame of a phage surface protein. As such, each phage displays a different peptide on its surface. Peptides with a high binding affinity to an antibody of interest are selected from this library. This selection process, termed biopanning, can be used to characterize the interacting interface of the antigen in the following manner. Let us assume that we are investigating an antibody and an antigen which are known to interact. A random peptide library is scanned against the antibody. The selected peptides are assumed to mimic the epitope in terms of physico-chemical properties and spatial organization. The algorithmic task is thus to use the information contained in the set of peptides selected by the antibody to correctly predict the corresponding epitope on the surface of the antigen.


For epitope mapping, the standard input to Pepitope is a Protein Data Bank file of the antigen and a set of peptides selected by the antibody in a biopanning experiment. In the more general case, any protein with a solved 3D structure can be used as input and any peptide(s) can be aligned to this structure. The first output of the program is the alignment of each peptide to the 3D structure. When several peptides are aligned, the server implements a clustering algorithm to detect one or more patches of residues on the surface of the 3D structure. Thus, the second output is the predicted patches. For antigens, such a patch corresponds to a putative epitope site, and in general, a patch is a predicted interacting surface region.

Pepitope assumes that the peptides mimic surface residues (i.e., solvent exposed residues). Thus, buried residues are eliminated from the search. The exposed residues of the given structure are extracted using the Surface Racer program (Tsodikov et al. 2002).

The server implements three epitope mapping algorithms: PepSurf, Mapitope, and a combination of the two. The algorithms are briefly described below. A full description of each algorithm is given in the corresponding paper.

    Epitope mapping using PepSurf

    PepSurf aligns each peptide to a graph which represents the surface of the input 3D structure. The surface is represented as an undirected graph, in which vertices represent exposed residues, and two residues are connected by an edge if they are close to one another on the protein's surface. Each of the input peptides is aligned to the graph. Amino-acid similarities are scored using a substitution matrix. Further, unmatched peptide residues are allowed by combining gap-costs in the alignment scores. The substitution matrix and the gap costs can be adjusted by the user. Each aligned peptide corresponds to a path of residues on the 3D structure that exhibits a high similarity to the input peptide. The resulting paths are clustered and the epitope location is inferred.
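
    The surface graph described above can be illustrated with a minimal sketch. It assumes that each exposed residue is reduced to a single representative 3D point and that "close" means a Euclidean distance below a cutoff; the function name, the residue labels, and the 4.0 cutoff are hypothetical choices for illustration, not values taken from PepSurf. The alignment and clustering steps are not shown.

```python
from itertools import combinations
from math import dist


def build_surface_graph(residues, cutoff=4.0):
    """Build an undirected graph over exposed residues.

    residues: dict mapping a residue label to an (x, y, z) point.
    cutoff:   hypothetical distance threshold; two residues are
              connected by an edge when their points are at most
              this far apart.
    Returns an adjacency dict: label -> set of neighbouring labels.
    """
    graph = {label: set() for label in residues}
    for a, b in combinations(residues, 2):
        if dist(residues[a], residues[b]) <= cutoff:
            graph[a].add(b)
            graph[b].add(a)
    return graph


# Made-up coordinates for four exposed residues.
exposed = {
    "A10": (0.0, 0.0, 0.0),
    "K11": (3.0, 0.0, 0.0),
    "D45": (3.0, 3.5, 0.0),
    "W90": (20.0, 0.0, 0.0),
}
graph = build_surface_graph(exposed, cutoff=4.0)
```

    In this toy input, A10 and K11 are neighbors, K11 and D45 are neighbors, and W90 has no neighbors, so a peptide could only be aligned along the path A10, K11, D45.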

    Epitope mapping using Mapitope

    The Mapitope algorithm is based on the underlying assumption that the entire set of peptides is enriched with amino-acid pairs which mimic the epitope. Thus, Mapitope first identifies pairs of residues that are significantly overrepresented in the panel of peptides, compared to their expected frequencies. Mapitope then searches for patches on the surface that are enriched with these pairs. The resulting patches are presented on the 3D protein structure. The user can specify a number of optional parameters: a distance threshold defining whether two surface residues are neighbors on the 3D structure, a statistical threshold defining to what extent a pair of residues has to be enriched in the panel in order to be considered a significant pair, and a maximum gap "fill-in" value, which is needed in the last step of the clustering to define which residues are considered part of the predicted patch.
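
    The idea of pair overrepresentation can be sketched as follows. This is a deliberately simplified illustration, not Mapitope's statistical test: it assumes that pairs are adjacent residues within a peptide, that order within a pair does not matter, and that the expected frequency of a pair follows from the single-residue frequencies under independence. The function name and the example peptides are hypothetical.

```python
from collections import Counter


def pair_enrichment(peptides):
    """Return {pair: observed / expected} for adjacent residue pairs.

    A pair is an unordered tuple of two neighbouring residues.
    The expected count assumes residues occur independently with
    their observed single-residue frequencies (a simplification).
    """
    residue_counts = Counter()
    pair_counts = Counter()
    for pep in peptides:
        residue_counts.update(pep)
        for a, b in zip(pep, pep[1:]):
            pair_counts[tuple(sorted((a, b)))] += 1

    total_residues = sum(residue_counts.values())
    total_pairs = sum(pair_counts.values())
    ratios = {}
    for (a, b), observed in pair_counts.items():
        fa = residue_counts[a] / total_residues
        fb = residue_counts[b] / total_residues
        # An unordered pair of two different residues can occur
        # in either order, hence the factor of 2.
        prob = fa * fb if a == b else 2 * fa * fb
        ratios[(a, b)] = observed / (prob * total_pairs)
    return ratios


peptides_demo = ["WPWP", "AGWP"]
ratios = pair_enrichment(peptides_demo)
```

    A ratio above 1 means the pair occurs more often than expected under this simple model. With small panels, rare pairs can reach high ratios by chance, which is why a proper statistical threshold is needed in practice.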

    Epitope mapping using the Combined algorithm

    In this option, both PepSurf and Mapitope algorithms are executed and the results are combined into a single prediction. The predicted clusters in this option include only residues which were predicted to be part of the epitope in both algorithms.
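
    Keeping only the residues predicted by both algorithms amounts to a set intersection. The sketch below assumes that each prediction is a collection of (chain, residue number) tuples; this representation and the function name are hypothetical.

```python
def combine_predictions(pepsurf_residues, mapitope_residues):
    """Return the residues predicted by both algorithms, sorted."""
    return sorted(set(pepsurf_residues) & set(mapitope_residues))


# Made-up predictions for chain A.
pepsurf_demo = {("A", 45), ("A", 46), ("A", 90)}
mapitope_demo = {("A", 46), ("A", 90), ("A", 120)}
combined = combine_predictions(pepsurf_demo, mapitope_demo)
```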

    Peptide to structure alignment using PepSurf

    The PepSurf algorithm can be used as a general 1D to 3D alignment tool. In this setup, any synthesized, hypothetical, or affinity selected peptide can be aligned to a protein with a known 3D structure. Thus, local similarities between the surface of the protein and the peptide are computationally detected.


    PDB 3D-structure

    Three-dimensional structures of biological macromolecules are available in the Protein Data Bank (Berman et al. 2000). The user can specify either a PDB identifier or browse for a local PDB file.

    Chain Identifier

    A chain identifier of the corresponding PDB file should be specified in order to run Pepitope. The chain identifier is a single alphabetic character that can be found in the standard PDB file in the third field (column 12) of the SEQRES records or the sixth field (column 22) of the ATOM records. If there is only a single chain with no identifier, then "NONE" should be written in the chain identifier text box. The user can also run the server on a protein complex composed of multiple chains. For example, if a complex is composed of the chains L and H, then "LH" should be supplied in the chain identifier text box.
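
    To find out which chain identifiers a local PDB file contains, column 22 of the ATOM records can be read directly. The following is a minimal sketch; the function names and the sample records are made up for illustration, and the "NONE" fallback simply mirrors the convention described above.

```python
def chain_ids(pdb_lines):
    """Collect chain identifiers from column 22 of ATOM records,
    in order of first appearance."""
    seen = []
    for line in pdb_lines:
        if line.startswith("ATOM") and len(line) >= 22:
            chain = line[21]  # column 22 is index 21
            if chain not in seen:
                seen.append(chain)
    return seen


def chain_field_value(pdb_lines):
    """Text for the chain identifier box: joined chain ids,
    or "NONE" when the only chain has a blank identifier."""
    joined = "".join(chain_ids(pdb_lines)).strip()
    return joined or "NONE"


# Made-up records following the fixed-column PDB layout.
demo_lines = [
    "SEQRES   1 L  214  ASP ILE GLN",
    "ATOM      1  N   MET L   1",
    "ATOM      2  CA  MET L   1",
    "ATOM    120  N   GLU H   1",
]
chains_demo = chain_field_value(demo_lines)
```

    For these records the value is "LH", matching the two-chain example above.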

    Peptides File

    A list of peptides to be aligned to the PDB structure should be supplied in a Peptide file format. An example can be found here. Each peptide sequence should be preceded by a line starting with >. Optionally, the peptide name and a number can follow the > mark. This number specifies the number of times the peptide was isolated in the experiment and is used as a weighting factor in the calculations. The second line contains the sequence itself. Blank lines are ignored, and so are spaces or other gap symbols (dashes, underscores, and periods).
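
    The file format described above can be read with a short parser. This sketch rests on assumptions that the description does not spell out: the name and the count on the > line are taken to be separated by whitespace, a trailing integer is taken to be the count, and a missing count defaults to 1. A name that consists only of digits would therefore be misread as a count. The function name is hypothetical.

```python
import re

# Spaces and gap symbols (dashes, underscores, periods) are ignored.
GAP_CHARS = re.compile(r"[\s\-_.]")


def parse_peptides(text):
    """Return a list of (name, count, sequence) tuples."""
    records = []
    header = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines are ignored
        if line.startswith(">"):
            header = line[1:].split()
            continue
        if header is None:
            raise ValueError("sequence line without a preceding '>' line")
        count = 1  # assumed default when no number is given
        if header and header[-1].isdigit():
            count = int(header[-1])
            header = header[:-1]
        records.append((" ".join(header), count, GAP_CHARS.sub("", line)))
        header = None
    return records


demo_text = """>pep1 3
ACD-EF GH

>
K_L.M
"""
records = parse_peptides(demo_text)
```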


In each run, Pepitope produces an output file called "Pepitope Job Status Page". This file is automatically updated every 30 seconds, showing messages regarding the different stages of the server activity. When the calculation finishes, an email is sent to the user and several links appear in the Pepitope Job Status Page:

    View Pepitope results with FirstGlance in Jmol

    This link leads to a graphic visualization of the predicted clusters through the FirstGlance in Jmol visualization tool. The target protein chain is represented as a space-filling model colored in grey. All other chains in the PDB file are displayed in backbone representation and ligands are presented in a ball-and-stick representation. The user can further control the modes of representation using the Jmol visualization capabilities. Each predicted cluster is given a different color. The alignment between a specific peptide and the 3D structure can be viewed by clicking on the checkbox next to the peptide sequence on the left-hand side of the page.

    Predicted clusters

    This link includes a list of up to three clusters predicted by the server. The highest scoring cluster is cluster 1. By clicking on each cluster the user can receive the list of residues that the cluster contains and the peptides that are aligned to this cluster.

    Resulting alignments for each peptide

    This link provides a textual display of the top scoring alignments between each peptide and the surface residues. More than one alignment for each peptide can be viewed. Note that only the highest scoring alignment for each peptide is considered when predicting the clusters.

    RasMol coloring script source

    This link includes the command script for coloring the PDB file according to the predicted clusters obtained by Pepitope. This file can be downloaded and used locally with RasMol (Sayle and Milner-White 1995) to produce the same color-coded CPK scheme generated by the server.

    PDB file updated with Pepitope results in its header

    This link provides a PDB file updated with Pepitope results in its header. This file can be uploaded to the FirstGlance in Jmol interface, enabling users to save and view Pepitope results after they are removed from the server.

Comparison to other epitope mapping tools

This section describes other available epitope-mapping computational tools that rely on affinity-selected peptides. For each tool, a brief overview of the method is given, followed by the features that distinguish Pepitope from it.


    3DEX

    Main reference: Schreiber et al. (2005)
    Availability: A standalone program available upon request.
    Overview: Searches for a linear peptide sequence on a 3D protein surface. The distance between neighboring residues on the 3D structure can be adjusted using a user-defined parameter. Unmatched peptide residues can be accounted for using a "joker" function. The results are viewed in a text format.
    Comparison to Pepitope: 3DEX only allows for identical matches but not for potential matches between similar amino acids. The algorithm allows a constant number of gaps but does not include a penalty for introducing them. It does not differentiate between gaps at the middle or at the ends of the peptides. The program is applied only for a single peptide, and results from different peptides are not combined. Each peptide is weighted equally, regardless of the number of times it was selected in the phage-display experiment.


    MIMOP

    Main reference: Moreau et al. (2006)
    Availability: A web server available upon request.
    Overview: Bases its prediction on one of two methodologies, MimAlign or MimCons, or on a combination of the two. MimAlign uses four different sequence alignment programs to align the set of peptides to the linear sequence of the antigen. Positions in the linear antigen sequence that are best aligned to peptide sequences are grouped according to their 3D distance, resulting in candidate epitopes. MimCons first identifies consensus patterns within the peptide sequences. The antigen 3D surface is then scanned to locate exposed regions that encompass amino acids similar to the consensus patterns. The results are viewed in a text format and using a 3D graphical display.
    Comparison to Pepitope: MIMOP aligns the peptides to the antigen at the sequence level rather than directly to the 3D structure. The 3D structure is considered only following the alignment stage. Gaps are accounted for only in the sequence alignment rather than on the 3D structure. Each peptide is weighted equally, regardless of the number of times it was selected in the phage-display experiment.


    MEPS

    Main reference: Castrignano et al. (2007)
    Availability: A web server available at
    Overview: Represents the 3D antigen structure as a surface graph. A "surface ensemble" of all possible peptides under a given length is derived from the surface graph. Each linear peptide is then searched against this ensemble to retrieve the location, if it exists, of a possible match to the antigen 3D surface. Putative matches are viewed in a text format.
    Comparison to Pepitope: MEPS only allows for identical matches but not for potential matches between similar amino acids. The algorithm allows a constant number of unmatched peptide residues but does not include a penalty for introducing such gaps. In addition, MEPS does not differentiate between gaps at the middle or at the ends of the peptides. The program is applied only for a single peptide, and results from different peptides are not combined. Each peptide is weighted equally, regardless of the number of times it was selected in the phage-display experiment.

Validation on a benchmark dataset

The performance of the different epitope-mapping programs was assessed using publicly available phage-library datasets that fulfill the following requirements: (1) a set of affinity-selected peptides was derived by scanning an antibody in a biopanning experiment, and (2) a 3D structure of the antibody-antigen complex is available. For each dataset, the prediction was compared to the "true" epitope, which was inferred using the Contact Map Analysis server. For more details see Mayrose et al. (2007). As can be seen in the table below, on the datasets tested PepSurf outperforms the other algorithms, yielding a statistically significant prediction in 8 out of 9 cases. The performance of the MEPS server could not be assessed since the results of different peptides are not combined into a single predicted region.

PDB ID  Peptides file  Source  # residues in true epitope  PepSurf  Mapitope  MIMOP    3DEX
1JRH    59 peptides    [5]     12                          10 / 28  9 / 59    0 / 9    8 / 35
1BJ1    42 peptides    [3]     16                          11 / 30  7 / 167   0 / 0    0 / 35
1G9M    11 peptides    [4]     18                          14 / 36  14 / 34   2 / 26   0 / 56
1E6J    16 peptides    [4]     15                          14 / 23  7 / 11    11 / 19  0 / 20
1N8Z    5 peptides     [8]     23                          8 / 11   9 / 27    4 / 21   0 / 8
1IQD    27 peptides    [13]    19                          12 / 30  12 / 65   6 / 11   10 / 48
1AVZ    18 peptides    [7]     16                          14 / 29  1 / 11    3 / 4    0 / 18
1G83    18 peptides    [7]     13                          0 / 20   2 / 11    0 / 0    0 / 13
1HX1    9 peptides     [11]    22                          12 / 27  4 / 16    8 / 27   0 / 13
*Accuracy is defined as the number of true positives out of the total number of predicted residues.
*P-value is calculated based on the hyper-geometric distribution.
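
The two footnotes can be made concrete with a short sketch. It assumes that the P-value is the upper tail of the hypergeometric distribution, that is, the probability of obtaining at least the observed number of true positives when the predicted residues are drawn at random from the surface residues. The function names are hypothetical, and the surface size of 200 in the example is a made-up figure, since the number of surface residues per antigen is not given here.

```python
from math import comb


def accuracy(true_positives, predicted):
    """True positives out of the total number of predicted residues.
    Returns 0.0 when nothing was predicted (the "0 / 0" entries)."""
    return true_positives / predicted if predicted else 0.0


def hypergeom_p_value(true_positives, predicted, epitope_size, surface_size):
    """P(X >= true_positives), where X counts epitope residues among
    `predicted` residues drawn without replacement from
    `surface_size` surface residues, `epitope_size` of which
    belong to the true epitope."""
    upper = min(predicted, epitope_size)
    tail = sum(
        comb(epitope_size, k) * comb(surface_size - epitope_size, predicted - k)
        for k in range(true_positives, upper + 1)
    )
    return tail / comb(surface_size, predicted)


# 1JRH, PepSurf entry: 10 true positives out of 28 predicted residues,
# 12 residues in the true epitope. The surface size is hypothetical.
acc_1jrh = accuracy(10, 28)
p_1jrh = hypergeom_p_value(10, 28, 12, surface_size=200)
```

The smaller the P-value, the less likely it is that the overlap between the prediction and the true epitope arose by chance.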

