User guide
Analysis Programmes
CONTINLL
The original version of CONTIN implemented the ridge regression
algorithm of Provencher & Glockner, 1982. The latest version incorporates
the locally linearised model (Van Stokkum et al.) to select basis-set
proteins from the reference database.
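As a rough illustration of ridge regression in this setting, the sketch below fits the test spectrum as a weighted combination of reference-protein spectra, with a penalty on the size of the weights. It then converts those weights into secondary structure fractions. This is a minimal sketch with several assumptions:

- It uses NumPy.
- It uses a plain ridge penalty alpha*||x||^2 and no further constraints.
- CONTIN's own regulariser, its constraints and the locally linearised basis selection are not reproduced.
- `ridge_fit`, `estimate_fractions` and `alpha` are hypothetical names.

```python
import numpy as np

def ridge_fit(A: np.ndarray, y: np.ndarray, alpha: float) -> np.ndarray:
    """Minimise ||A x - y||^2 + alpha * ||x||^2 (simplified ridge regression).

    A: (n_wavelengths, n_refs), one reference-protein CD spectrum per column.
    y: (n_wavelengths,), the CD spectrum of the test protein.
    alpha: regularisation strength (hypothetical; larger values shrink x).
    """
    n_refs = A.shape[1]
    # Regularised normal equations: (A^T A + alpha I) x = A^T y
    lhs = A.T @ A + alpha * np.eye(n_refs)
    rhs = A.T @ y
    return np.linalg.solve(lhs, rhs)

def estimate_fractions(A: np.ndarray, F: np.ndarray, y: np.ndarray, alpha: float) -> np.ndarray:
    """F: (n_classes, n_refs), known secondary structure fractions of each reference."""
    x = ridge_fit(A, y, alpha)
    # The test protein's structure is the same weighted combination of the references' structures.
    return F @ x

# Toy example: two reference spectra sampled at three wavelengths.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
F = np.array([[0.8, 0.1],   # helix fraction of each reference
              [0.2, 0.9]])  # "other" fraction of each reference
y = A @ np.array([0.3, 0.7])
print(estimate_fractions(A, F, y, alpha=0.0))  # approximately 0.31 helix, 0.69 other
```

With alpha = 0 this reduces to ordinary least squares. Increasing alpha trades some fit quality for smaller, more stable weights, which matters when reference spectra are nearly collinear.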
Average run time: < 1 minute
Graphical output produced
Choice between 7 reference datasets
SELCON3
SELCON was designed by Sreerama & Woody (1993) and
combines the self-consistent method with the SVD algorithm
to assign protein secondary structure. The programme works through
a number of stages. The first stage makes an initial
guess at the fractional composition; this result corresponds
to the Hennessey & Johnson method using SVD. In the second stage, the
SVD calculations are iterated until a convergent solution is produced (equivalent
to the original self-consistent method). The third stage selects a number
of likely solutions from the basis-set calculations by requiring
the fractional contents to sum to one and each individual fraction
to be greater than -0.05. The fourth stage applies a further constraint,
the helix limit theorem: a range for helix content is determined
and the solutions are screened against it. This range is derived from the
Hennessey & Johnson solution.
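The stages above can be sketched roughly as follows. This is a simplified illustration, not SELCON3's code. It makes these assumptions:

- It uses NumPy.
- The Hennessey & Johnson step uses a truncated-SVD pseudo-inverse, f = F C^+ c.
- The self-consistent step adds the test spectrum and its current estimate to the basis, then re-solves until the estimate stops changing.
- Candidates are screened with the stage 3 and stage 4 rules.
- The tolerance `sum_tol`, the number of singular vectors `k` and the helix range passed to `passes_helix_rule` are hypothetical parameters.
- How SELCON3 derives the helix range from the first-stage solution is not reproduced.

```python
import numpy as np

def svd_solve(C, F, c, k):
    """Stage 1 (Hennessey & Johnson style): f = F C^+ c using the top k singular vectors.

    C: (n_wavelengths, n_refs) reference CD spectra, one per column.
    F: (n_classes, n_refs) secondary structure fractions of the references.
    c: (n_wavelengths,) CD spectrum of the test protein.
    k must not exceed the number of non-zero singular values of C.
    """
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    # Truncated pseudo-inverse: V_k diag(1/s_k) U_k^T
    C_pinv = Vt[:k].T @ np.diag(1.0 / s[:k]) @ U[:, :k].T
    return F @ C_pinv @ c

def self_consistent(C, F, c, k, tol=1e-6, max_iter=100):
    """Stage 2: add the test protein to the basis with its current estimate and re-solve."""
    f = svd_solve(C, F, c, k)                 # initial guess from stage 1
    C_aug = np.column_stack([C, c])
    for _ in range(max_iter):
        f_new = svd_solve(C_aug, np.column_stack([F, f]), c, k)
        if np.max(np.abs(f_new - f)) < tol:   # estimate has stopped changing
            return f_new
        f = f_new
    return f                                  # no convergence within max_iter

def passes_sum_and_fraction_rules(f, sum_tol=0.05, min_fraction=-0.05):
    """Stage 3: fractions sum to one (within a hypothetical tolerance) and each is > -0.05."""
    return bool(abs(f.sum() - 1.0) <= sum_tol) and bool(np.all(f > min_fraction))

def passes_helix_rule(f, helix_index, helix_range):
    """Stage 4: helix content must lie inside a range derived from the stage 1 solution."""
    low, high = helix_range
    return bool(low <= f[helix_index] <= high)
```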
Average run time: < 1 minute
Graphical output produced
Choice between 7 reference datasets
CDSSTR
This programme is a modification of the original VARSLC written by W.C. Johnson. It implements the variable selection method by performing all possible calculations using a fixed number of proteins from the reference set. The algorithm identifies reference proteins whose characteristics are not reflected in the test protein, or which do not reflect the characteristics of the test protein, and removes them from the basis set. Secondary structure is then assigned by the SVD algorithm.
This method probably produces the most accurate analysis
results, but because of the sheer volume of calculations it can take up to
15 minutes to run. It will, however, produce results for proteins that
other methods fail to analyse.
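Here is a minimal sketch of variable selection, assuming NumPy. It works in four steps:

1. Every subset of `n_basis` reference proteins is tried.
2. Each subset is solved with an SVD-based pseudo-inverse (`np.linalg.pinv`).
3. Solutions that fail simple validity checks are discarded.
4. The surviving solutions are averaged.

The acceptance thresholds and the averaging step are illustrative assumptions, not CDSSTR's exact selection rules, and `variable_selection` is a hypothetical name. The loop over `itertools.combinations` also shows why run time climbs so quickly: the number of subsets grows combinatorially with the size of the reference set.

```python
import itertools
import numpy as np

def variable_selection(C, F, c, n_basis, sum_range=(0.95, 1.05), min_fraction=-0.05):
    """Try every subset of n_basis references, keep valid solutions, and average them.

    C: (n_wavelengths, n_refs) reference CD spectra, one per column.
    F: (n_classes, n_refs) secondary structure fractions of the references.
    c: (n_wavelengths,) CD spectrum of the test protein.
    Returns (mean fractions or None, number of accepted subsets).
    """
    n_refs = C.shape[1]
    accepted = []
    for subset in itertools.combinations(range(n_refs), n_basis):
        idx = list(subset)
        # Solve the reduced problem with an SVD-based pseudo-inverse.
        f = F[:, idx] @ np.linalg.pinv(C[:, idx]) @ c
        total = f.sum()
        # Hypothetical validity checks: summed content near one, no strongly negative fraction.
        if sum_range[0] <= total <= sum_range[1] and np.all(f >= min_fraction):
            accepted.append(f)
    if not accepted:
        return None, 0
    return np.mean(accepted, axis=0), len(accepted)
```

A typical call would be `fractions, n_ok = variable_selection(C, F, c, n_basis=5)`. Returning the number of accepted subsets makes it easy to see when the checks reject nearly everything.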
Average run time: ~5 minutes
Graphical output produced
Choice between 7 reference datasets
VARSLC
The original implementation of the variable selection method. The programme is flexible in that the user may configure input data files to specify the number of proteins to be selected from the reference set, the number of proteins to be eliminated at a time from the reference set, and the total number of calculations tried before selecting solutions. The constraints applied can also be configured; for example, results are selected only if their RMSD, sum of squares error, individual fractional contents and summed total content are within certain limits.
To bring some of this flexibility to the website, several configuration files have been set up. The first follows the guidelines set out in the readme.txt that comes with the programme, which recommends about 500 iterations with a basis set of 5-7 proteins, removing 1-2 proteins per iteration.
Details of the settings files:
Choice | RMSD max | Individual fraction min | Total sum of fractions | No. proteins removed | No. basis proteins | No. calculations
Default | 0.55 | -0.15 | 0.95 - 1.14 | 1 | 6 | 300
Settings 1 | 0.55 | -0.15 | 0.95 - 1.30 | 1 | 20 | 528
Settings 2 | 0.55 | -0.20 | 0.95 - 1.40 | 1 | 30 | 700
Settings 3 | 2.55 | -0.20 | 0.95 - 1.20 | 1 | 6 | 900
The second settings file reflects the recommended values for accurate protein analysis. The default settings file exists for quicker analyses, in which only 300 calculations are performed. When this programme was tested with a variety of CD data files, it was found that in the majority of cases solutions were rejected because the total fraction of secondary structure was significantly greater than 1. Settings file 2, which accepts totals of up to 1.40, therefore exists for cases where the default and Settings 1 have not produced valid results; its output is useful as a rough guide to the kind of values the analysis yields. Settings 3 is an extension of Settings 1 with 900 calculations and a high maximum RMSD value (2.55). If no results are obtained with any of the settings files, note that CDSSTR uses the same method but with no restriction on the number of calculations.
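The acceptance limits in the table can be expressed as a small check. The sketch below encodes the four settings files as data and reports which of them would accept a candidate solution, given its fractions and RMSD. It illustrates why a solution whose fractions sum to about 1.25 is rejected by the default settings.

Some assumptions apply:

- `VarslcSettings` and `accepting_settings` are hypothetical names.
- The other columns (proteins removed, basis size, number of calculations) are stored but not used by the check.
- Treating the limits as inclusive is an assumption about VARSLC's behaviour.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VarslcSettings:
    name: str
    rmsd_max: float
    fraction_min: float
    total_min: float
    total_max: float
    n_removed: int        # stored for reference; not used by the check below
    n_basis: int          # stored for reference; not used by the check below
    n_calculations: int   # stored for reference; not used by the check below

# Values copied from the settings table above.
SETTINGS = [
    VarslcSettings("Default", 0.55, -0.15, 0.95, 1.14, 1, 6, 300),
    VarslcSettings("Settings 1", 0.55, -0.15, 0.95, 1.30, 1, 20, 528),
    VarslcSettings("Settings 2", 0.55, -0.20, 0.95, 1.40, 1, 30, 700),
    VarslcSettings("Settings 3", 2.55, -0.20, 0.95, 1.20, 1, 6, 900),
]

def accepts(s: VarslcSettings, fractions: list[float], rmsd: float) -> bool:
    """True if a candidate passes the RMSD, individual-fraction and total-sum limits (inclusive)."""
    total = sum(fractions)
    return (rmsd <= s.rmsd_max
            and all(f >= s.fraction_min for f in fractions)
            and s.total_min <= total <= s.total_max)

def accepting_settings(fractions: list[float], rmsd: float) -> list[str]:
    """Names of the settings files whose limits the candidate satisfies."""
    return [s.name for s in SETTINGS if accepts(s, fractions, rmsd)]

# A candidate whose fractions sum to 1.25 passes only the wider total-sum ranges.
print(accepting_settings([0.5, 0.4, 0.35], rmsd=0.3))  # ['Settings 1', 'Settings 2']
```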
Only one reference database, containing 33 reference proteins, comes with the programme. The programme does not produce reconstructed spectra, so no graphical output is available.
K2D
K2D is one of a few neural network programmes. The network consists of an input layer connected through neurons to an output layer. The output layer (secondary structure) is calculated as a function of the input layer (CD data) by assigning weightings to each neuron. At the start of a training phase the weightings are assigned random values. The network is then fed large volumes of CD and structural data (equivalent to reference proteins), and the weightings are adjusted iteratively until an accurate secondary structure profile is obtained.
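To make the training description concrete, here is a generic single-layer network trained by gradient descent. CD values go in, and a softmax output layer gives fractions that sum to one. The weights start from small random values and are adjusted iteratively to reduce the mismatch with the known structures of the training proteins.

This is a sketch under stated assumptions:

- It uses NumPy, a softmax output and a cross-entropy loss.
- The learning rate and epoch count are hypothetical.
- `train` and `predict` are hypothetical names.
- It does not reproduce K2D's own network architecture or its fixed weights file.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row maximum for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train(X, T, n_epochs=2000, lr=0.5, seed=0):
    """X: (n_proteins, n_wavelengths) CD data; T: (n_proteins, n_classes) known fractions."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.01, size=(X.shape[1], T.shape[1]))  # random initial weightings
    b = np.zeros(T.shape[1])
    for _ in range(n_epochs):
        P = softmax(X @ W + b)        # predicted fractions for every training protein
        G = (P - T) / len(X)          # gradient of mean cross-entropy with respect to the logits
        W -= lr * X.T @ G             # adjust the weightings
        b -= lr * G.sum(axis=0)
    return W, b

def predict(X, W, b):
    """Secondary structure fractions (each row sums to one) for new CD data."""
    return softmax(X @ W + b)
```

In use, one would call `W, b = train(X_train, T_train)` once. After that, `predict(X_new, W, b)` can be applied to any number of new spectra. This mirrors the point that follows: once training is done, the weights are fixed.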
In K2D the weights file is fixed and therefore there is
no choice of reference dataset. Accuracy is calculated by , and results
for beta sheet and mixed proteins tend to be far less accurate than for
helical proteins, although when compared with other methods (Greenfield
1996) these results are an improvement.