User guide

Analysis Programmes

CONTINLL

The original version of CONTIN implemented the ridge regression algorithm of Provencher & Glockner (1982). The latest version incorporates the locally linearised model (Van Stokkum et al.) to select the basis-set proteins from the reference database.
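
As a rough illustration of the ridge regression idea, the sketch below fits a spectrum as a weighted sum of reference spectra while penalising large weights. It is a generic ridge fit, not the CONTIN code: the function name `ridge_fit`, the regularisation parameter `alpha` and the toy data are all assumptions made for the example, and CONTIN's own regulariser and constraints are not reproduced.

```python
import numpy as np

def ridge_fit(ref_spectra, spectrum, alpha):
    """Return weights w minimising ||A w - y||^2 + alpha^2 ||w||^2."""
    A = np.asarray(ref_spectra, dtype=float)   # (n_wavelengths, n_proteins)
    y = np.asarray(spectrum, dtype=float)      # (n_wavelengths,)
    n_proteins = A.shape[1]
    # Regularised normal equations: (A^T A + alpha^2 I) w = A^T y
    return np.linalg.solve(A.T @ A + alpha**2 * np.eye(n_proteins), A.T @ y)

# Toy data (hypothetical): 5 reference spectra sampled at 40 wavelengths.
rng = np.random.default_rng(0)
A = rng.normal(size=(40, 5))
true_w = np.array([0.5, 0.2, 0.1, 0.1, 0.1])
y = A @ true_w

w_unregularised = ridge_fit(A, y, alpha=0.0)   # plain least squares
w_ridge = ridge_fit(A, y, alpha=10.0)          # weights shrunk towards zero
```

With `alpha` set to zero the fit reduces to ordinary least squares; increasing `alpha` trades goodness of fit for smaller, more stable weights.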
 

Average run time: < 1 minute
Graphical output produced
Choice between 7 reference datasets


SELCON3

SELCON was designed by Sreerama & Woody (1993). It combines the self-consistent method with the SVD algorithm to assign protein secondary structure, and it reports results from several stages of the analysis. The first stage assigns an initial guess at the fractional composition, and its result corresponds to the Hennessey & Johnson method using SVD. In the second stage, the SVD calculations are iterated until a convergent solution is produced (equivalent to the original self-consistent method). The third stage selects a number of likely solutions from the basis-set calculations by constraining the summed fractional contents to equal one and each individual fraction to be greater than -0.05. The fourth stage applies a fourth constraint, the helix limit theorem: a range for the helix content is taken from the Hennessey & Johnson solution, and the results are screened against it.
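
The first and third stages can be sketched as follows. This is a minimal illustration under stated assumptions, not the SELCON3 source: the function names, the number of SVD components, the toy data and the tolerance `sum_tol` on the sum rule are hypothetical; only the -0.05 lower limit on individual fractions comes from the description above. The iterative second stage and the helix limit screening are not shown.

```python
import numpy as np

def svd_estimate(ref_spectra, ref_fractions, spectrum, n_components):
    """SVD-based estimate of secondary structure fractions for one spectrum."""
    C = np.asarray(ref_spectra, dtype=float)     # (n_wavelengths, n_proteins)
    F = np.asarray(ref_fractions, dtype=float)   # (n_classes, n_proteins)
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    k = n_components
    # Truncated pseudo-inverse of C built from the k largest singular values.
    C_pinv = Vt[:k].T @ np.diag(1.0 / s[:k]) @ U[:, :k].T
    X = F @ C_pinv                               # maps a spectrum to fractions
    return X @ np.asarray(spectrum, dtype=float)

def passes_sum_and_fraction_rules(fractions, sum_tol=0.05, min_fraction=-0.05):
    """Sum close to one (sum_tol is an assumption) and every fraction > -0.05."""
    f = np.asarray(fractions, dtype=float)
    return bool(abs(f.sum() - 1.0) <= sum_tol and f.min() > min_fraction)

# Toy reference set (hypothetical): 8 proteins, 30 wavelengths, 3 classes.
rng = np.random.default_rng(0)
C = rng.normal(size=(30, 8))
F = rng.dirichlet(np.ones(3), size=8).T          # each column sums to one

# Analysing a spectrum that is already in the reference set recovers its fractions.
estimate = svd_estimate(C, F, C[:, 0], n_components=8)
```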
 

Average run time: < 1 minute
Graphical output produced
Choice between 7 reference datasets


CDSSTR

This programme is a modification of the original VARSLC written by W. C. Johnson. It implements the variable selection method by performing all possible calculations using a fixed number of proteins from the reference set. The algorithm recognises reference proteins that possess characteristics not reflected by the test protein, or that fail to reflect the characteristics of the test protein, and removes them from the basis set. The SVD algorithm then assigns secondary structure.

This method probably produces the most accurate analysis results, but it can take up to 15 minutes to run because of the sheer volume of calculations. It will, however, produce results for proteins that other methods fail to analyse.
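
The volume of calculations comes from trying every combination of a fixed number of reference proteins. The sketch below shows the idea on toy data; it is an assumption-laden illustration rather than the CDSSTR code. The names `variable_selection` and `is_valid`, the acceptance thresholds and the toy reference set are all hypothetical.

```python
from itertools import combinations
import numpy as np

def is_valid(fractions, sum_tol=0.05, min_fraction=-0.05):
    """Hypothetical acceptance test: sum near one, no strongly negative fraction."""
    f = np.asarray(fractions, dtype=float)
    return bool(abs(f.sum() - 1.0) <= sum_tol and f.min() >= min_fraction)

def variable_selection(ref_spectra, ref_fractions, spectrum, subset_size):
    """Solve for the fractions with every subset of reference proteins."""
    C = np.asarray(ref_spectra, dtype=float)     # (n_wavelengths, n_proteins)
    F = np.asarray(ref_fractions, dtype=float)   # (n_classes, n_proteins)
    y = np.asarray(spectrum, dtype=float)
    solutions = []
    for subset in combinations(range(C.shape[1]), subset_size):
        idx = list(subset)
        # np.linalg.pinv computes the pseudo-inverse through an SVD.
        fractions = F[:, idx] @ (np.linalg.pinv(C[:, idx]) @ y)
        if is_valid(fractions):
            solutions.append((subset, fractions))
    return solutions

# Toy reference set (hypothetical): 6 proteins, 30 wavelengths, 3 classes.
rng = np.random.default_rng(1)
C = rng.normal(size=(30, 6))
F = rng.dirichlet(np.ones(3), size=6).T          # each column sums to one
solutions = variable_selection(C, F, C[:, 0], subset_size=4)
```

The number of subsets grows combinatorially with the size of the reference set, which is why the run time is so much longer than for the other programmes.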
 
 
 

Average run time: ~5 minutes
Graphical output produced
Choice between 7 reference datasets


VARSLC

The original implementation of the variable selection method. The programme is flexible in that the user may configure input data files to specify the number of proteins to be selected from the reference set, the number of proteins to be eliminated at a time from the reference set, and the total number of calculations tried before solutions are selected. The constraints applied can also be configured: for example, results are selected if their RMSD, sum-of-squares error, individual fractional content and summed total content are within certain limits.

To incorporate some of this flexibility into the website, several configuration files have been set up. The first follows the guidelines set out in the readme.txt that comes with the programme. The recommended values are about 500 iterations with a basis set of 5-7 proteins, removing 1-2 proteins per iteration.

Details of the settings files:
 
 

Choice      RMSD max  Individual fraction min  Total sum of fractions  No. proteins removed  No. basis proteins  No. calculations
Default     0.55      -0.15                    0.95 - 1.14             1                     6                   300
Settings 1  0.55      -0.15                    0.95 - 1.30             1                     20                  528
Settings 2  0.55      -0.20                    0.95 - 1.40             1                     30                  700
Settings 3  2.55      -0.20                    0.95 - 1.20             1                     6                   900

The second settings file reflects the recommended values for accurate protein analysis. The default settings exist for quicker analyses in which only 300 calculations are performed. Testing the programme with a variety of CD data files showed that, in the majority of cases, results are rejected because the total fraction of secondary structures is significantly greater than 1. Settings 2 therefore exists for cases where the default and Settings 1 have not produced valid results and it is useful to see the kind of values resulting from the analysis as a rough guide. Settings 3 is an extension of Settings 1 with 900 calculations and a high maximum RMSD value. If no results are obtained with any of the settings files, try CDSSTR, which uses the same method but with no restriction on the number of calculations.
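
To make the effect of the settings concrete, the sketch below encodes the values from the table and checks whether a candidate result would be accepted. The names `VarslcSettings`, `SETTINGS` and `is_accepted` are hypothetical, and treating every limit as inclusive is an assumption; VARSLC itself reads these limits from its input files.

```python
from typing import NamedTuple, Sequence

class VarslcSettings(NamedTuple):
    rmsd_max: float
    fraction_min: float
    sum_min: float
    sum_max: float
    n_removed: int
    n_basis: int
    n_calculations: int

# Values copied from the settings table above.
SETTINGS = {
    "Default":    VarslcSettings(0.55, -0.15, 0.95, 1.14, 1, 6, 300),
    "Settings 1": VarslcSettings(0.55, -0.15, 0.95, 1.30, 1, 20, 528),
    "Settings 2": VarslcSettings(0.55, -0.20, 0.95, 1.40, 1, 30, 700),
    "Settings 3": VarslcSettings(2.55, -0.20, 0.95, 1.20, 1, 6, 900),
}

def is_accepted(fractions: Sequence[float], rmsd: float,
                settings: VarslcSettings) -> bool:
    """Apply the RMSD, individual-fraction and total-sum limits (inclusive)."""
    total = sum(fractions)
    return (rmsd <= settings.rmsd_max
            and min(fractions) >= settings.fraction_min
            and settings.sum_min <= total <= settings.sum_max)

# A result whose fractions sum to about 1.2 fails the default limits
# but passes the wider total-sum range of Settings 1.
candidate = [0.6, 0.3, 0.3]
rejected_by_default = not is_accepted(candidate, 0.3, SETTINGS["Default"])
accepted_by_settings_1 = is_accepted(candidate, 0.3, SETTINGS["Settings 1"])
```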

Only one reference database, containing 33 reference proteins, comes with the programme. The programme does not produce reconstructed spectral data, so there is no graphical output.


K2D

K2D is one of a few neural network programmes. The neural network consists of an input layer whose neurons are interconnected with those of the output layer. The output layer (secondary structure) is calculated as a function of the input layer (CD data) using the weightings assigned to each neuron. In a training phase the weightings start at random values; the network is then fed large volumes of CD and structural data (equivalent to reference proteins), and the weightings are adjusted in an iterative process until an accurate secondary structure profile is obtained.
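
The training idea (random starting weights, then iterative adjustment against known examples) can be illustrated with a deliberately simple network. The sketch below is a generic single linear layer trained by gradient descent on synthetic data; it is not K2D's own architecture or training procedure, and the function name, learning rate, epoch count and data are all assumptions.

```python
import numpy as np

def train_linear_layer(inputs, targets, n_epochs=500, learning_rate=0.1, seed=0):
    """Fit weights W so that inputs @ W approximates targets."""
    X = np.asarray(inputs, dtype=float)    # (n_samples, n_inputs), e.g. CD data
    T = np.asarray(targets, dtype=float)   # (n_samples, n_outputs), e.g. fractions
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(X.shape[1], T.shape[1]))  # random start
    for _ in range(n_epochs):
        Y = X @ W                           # current network output
        grad = X.T @ (Y - T) / len(X)       # gradient of half the mean squared error
        W -= learning_rate * grad           # adjust the weightings
    return W

# Synthetic training set (hypothetical): 200 samples, 5 inputs, 3 outputs.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
W_true = rng.uniform(size=(5, 3))
T = X @ W_true

W = train_linear_layer(X, T)
mse_before = np.mean((X @ train_linear_layer(X, T, n_epochs=0) - T) ** 2)
mse_after = np.mean((X @ W - T) ** 2)
```

Once training is finished the weights are simply stored and reused, which is the situation in K2D: the user supplies only the CD data and the fixed weights file does the rest.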

In K2D the weights file is fixed and therefore there is no choice of reference dataset. Accuracy is calculated by , and results for beta sheet and mixed proteins tend to be far less accurate than for helical proteins, although when compared with other methods (Greenfield 1996) these results are an improvement.