User guide
- Obtaining an account for the server
- The Input Form
Obtaining an account for the server
We have an on-line form to sign up for an account. It asks for your academic or non-profit contact details. Users from other sectors who wish to have access should send email to cdweb@mail.cryst.bbk.ac.uk.
UserID
This is your user identification; you will have been informed of it when you received confirmation that your account had been created. It is case-sensitive.
IDpassword
This is your password; you will have been informed of its value when you received confirmation that your account had been created. It is case-sensitive.
Protein Name
The Protein Name element of the form acts as an identifier and becomes the prefix of any input and output files produced from an analysis. It is restricted to 12 alphanumeric characters only.
File Location
The file location is the path name of the CD data file on your local computer. This string is checked for errors; if the server cannot locate the file, the analysis will be terminated and an error message generated. It is advisable to use the browse button, as it will specify the correct file location automatically. The files that you upload should be in text format, for example raw text (.txt). DichroWeb will not accept non-text file formats such as .exe, .gif, .jpg, .doc or .ppt.
Also, because some users utilise the comma as a decimal separator, DichroWeb does not interpret comma-separated value (.csv) files.
File Format
The select options in the file format field are derived from the file formats output by different CD spectroscopy machines. Mainly, the formats differ in the size of the header and the column layout of the data. Example file formats can be viewed below:
Format | Example file | Header | Data columns
Applied Photophysics | see manufacturer specifications for TEXT format files (not instrument-specific coded formats) | - | -
Aviv 60 DS v4.1* | Aviv1.txt | 25 lines | 2
Aviv CDS | AvivC.txt | 14 lines | 2
Aviv v2.86 | Aviv2.txt | 19 lines | 2
BP (2nd column)** | BP2.txt | 22 lines | 2
BP (4th column)** | BP.txt | 22 lines | 4
DRS | DRS.txt | 13 lines | 7
Jasco 1.30 | Jasco.txt | 19 lines | 2
JASCO 1.50 | see manufacturer specifications for TEXT format files (not instrument-specific coded formats) | - | -
SDS 2000 | see manufacturer specifications for TEXT format files (not instrument-specific coded formats) | - | -
YY | YY.txt | 4 lines | 1 or 5 (reading across rows)
* The 60 DS format may be obtained, even in later versions of the software, by choosing the "export to 60 DS format" option in the instrument data browser window, from the "export data set" pulldown.
** It has been reported that the dichroism data can appear in either column 2 or column 4 for the BP format. Please check which column your BP file uses and select BP (data in col. 2) or BP (data in col. 4) accordingly.
If your data exists in some other format, please edit it to match one of the above file formats or use the FREE format option, which requires two columns: wavelength and CD data respectively. The data may begin with either the highest or the lowest wavelength. If the format has been chosen incorrectly, an error message will be generated stating that the file uploaded was not suitable for analysis.
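For illustration, the first few lines of a two-column FREE format file might look like this (the wavelengths and CD values shown are arbitrary and only indicate the layout):

260.0   -0.5
259.0   -1.2
258.0   -2.8
257.0   -4.1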
Input Units
Circular dichroism can be measured in several ways. Within
the literature there are several conflicting measures and definitions.
Most of these have been accommodated in the select box, but for clarity,
the conversion equations used are detailed below:
Delta Epsilon Δε
The per-residue molar absorption units of circular dichroism, measured in M⁻¹ cm⁻¹.
Δε is sometimes referred to as molar circular dichroism.
Data peaks are usually in the range 0 - 10.
All of the analysis programmes except K2D accept these input units, so if your data is in Δε then no conversions are required.
Mean Residue Ellipticity MRE [θ]
Mean residue ellipticity is the most commonly reported unit and is measured in degrees cm² dmol⁻¹ residue⁻¹.
Data peaks are usually in the tens of thousands, and the relationship between [θ] and Δε is shown below:
Δε = [θ] / 3298
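For example, a peak of [θ] = 33,000 degrees cm² dmol⁻¹ residue⁻¹ (a purely illustrative value) corresponds to Δε = 33,000 / 3298 ≈ 10 M⁻¹ cm⁻¹.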
Theta Machine Units θ
To convert from machine units in millidegrees to delta epsilons, the following equation is applied. Machine units measure the difference in absorption of left- and right-handed circularly polarised light, usually between 1 and 100 millidegrees, and need to be corrected to account for the amount of protein used in the sample.
Note: on selection of this option you will be asked to specify the mean residue weight in amu for the protein (MRW = protein molecular weight in atomic mass units/daltons divided by the number of residues), the path length (P) in cm and the protein concentration (CONC) in mg/ml.
Δε = ( θ × 0.1 × MRW ) / ( P × CONC × 3298 )
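For example, with purely illustrative values of θ = 10 millidegrees, MRW = 110, P = 0.1 cm and CONC = 1 mg/ml, this gives Δε = ( 10 × 0.1 × 110 ) / ( 0.1 × 1 × 3298 ) ≈ 0.33 M⁻¹ cm⁻¹.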
DRS yy units
Often, CD data values require rescaling in order to achieve accurate measures after unit conversion, and it may be necessary to multiply the machine values. These units are commonly used at Daresbury with the yy file format; the data are usually in the range 0.001 - 0.01. To convert DRS-yy units to Delta epsilons, the values are multiplied by a factor of 100 and then treated as Theta machine units, giving the following relationship:
Δε = ( θ × 100 × 0.1 × MRW ) / ( P × CONC × 3298 )
DRS units
These are standard Daresbury units
(machine units that have been divided by a factor of 10,000).
The relationship with delta epsilons is
shown below:
Δε = ( θ / 10 000 ) × ( 0.1 × MRW ) / ( P × CONC × 3298 )
Molar Ellipticity (θ)m
Molar ellipticity is a little-used unit which has the dimensions degrees decilitres mol⁻¹ decimetre⁻¹.
DichroWeb does not accept data in units of (θ)m,
but such data may be converted to units of Δε
by using the following formula, where Nr represents the number of amino acids in the protein:
Δε = (θ)m / ( Nr × 3298 )
If you have data in units of (θ)m, please convert the values to units of Δε and then submit to DichroWeb.
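For readers who prefer to script their conversions, the sketch below implements the formulas given above in Python. It is purely an illustration (the function names are not part of DichroWeb), and it assumes MRW in amu, P in cm, CONC in mg/ml and Nr as the number of amino acids, as defined above.

def mre_to_delta_epsilon(mre):
    # Mean residue ellipticity [theta] -> delta epsilon.
    return mre / 3298.0

def machine_units_to_delta_epsilon(theta_mdeg, mrw, path_cm, conc_mg_ml):
    # Theta machine units (millidegrees) -> delta epsilon.
    return (theta_mdeg * 0.1 * mrw) / (path_cm * conc_mg_ml * 3298.0)

def drs_yy_to_delta_epsilon(theta_yy, mrw, path_cm, conc_mg_ml):
    # DRS yy units -> delta epsilon (values are multiplied by 100 first).
    return machine_units_to_delta_epsilon(theta_yy * 100.0, mrw, path_cm, conc_mg_ml)

def drs_to_delta_epsilon(theta_drs, mrw, path_cm, conc_mg_ml):
    # Standard DRS (Daresbury) units -> delta epsilon (values are divided by 10 000 first).
    return machine_units_to_delta_epsilon(theta_drs / 10000.0, mrw, path_cm, conc_mg_ml)

def molar_ellipticity_to_delta_epsilon(theta_molar, n_residues):
    # Molar ellipticity (theta)m -> delta epsilon; n_residues is Nr.
    return theta_molar / (n_residues * 3298.0)

# Illustrative use only:
print(mre_to_delta_epsilon(33000.0))                          # ~ 10
print(machine_units_to_delta_epsilon(10.0, 110.0, 0.1, 1.0))  # ~ 0.33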
Initial Wavelength
The initial wavelength should correspond to the first wavelength that appears in your data file (i.e. towards the top). This could be either the numerically highest or lowest. If in doubt, open up your data file in a text editor and take a look.
Final Wavelength
The final wavelength should correspond to the last wavelength that appears in your data file (i.e. towards the bottom). This could be either the numerically highest or lowest. If in doubt, open up your data file in a text editor and take a look.
Wavelength Step
CD spectrophotometers can be set to record data at various wavelength intervals. All of the DichroWeb-supported analysis programmes accept data at 1 nm intervals only, so all other data points will be discarded. DichroWeb performs no smoothing of the data; if you believe that smoothing is required, you must perform this yourself beforehand. If the wrong wavelength step is specified, the server will detect this and return an error message stating that your file is unsuitable for analysis.
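If your instrument recorded data at finer than 1 nm intervals, you may wish to reduce the file yourself rather than rely on points being discarded. The sketch below is purely illustrative (it is not part of DichroWeb) and assumes a simple two-column, whitespace-separated wavelength/CD file such as the FREE format described above.

def keep_whole_nanometre_points(in_path, out_path):
    # Copy only data points whose wavelength falls on a whole nanometre.
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            parts = line.split()
            if len(parts) < 2:
                continue
            try:
                wavelength = float(parts[0])
            except ValueError:
                continue  # skip any header or comment lines
            if abs(wavelength - round(wavelength)) < 1e-6:
                fout.write("%.1f\t%s\n" % (wavelength, parts[1]))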
Lowest Datapoint
Sometimes part of a data set may be collected under conditions which are less than optimal. In these cases, it is desirable to remove the block of unreliable data points from the dataset and avoid trying to use them in any analysis. The "lowest wavelength datapoint" box allows for this without the need to edit the input file which is being submitted to DichroWeb. Just enter the wavelength of the last data point which is of good quality and DichroWeb will ensure that any data below that value cannot be submitted in an analysis. The suspect data is always taken as being the wavelengths below the entered value as the low wavelength data is generally the problematic area of a CD spectrum.
Why would data be unreliable?
With a conventional radiation source (such as a Xenon lamp), the intensity of the emitted signal
drops significantly towards the lowest wavelengths in its range. The lower intensities can still be
collected and utilised, but in order to compensate for the loss of signal strength, the detector (typically
a photomultiplier unit) has to increase its sensitivity and consequently requires an increased
high tension voltage. There is a maximum high tension voltage at which a photomultiplier unit can
accurately record transmitted radiation, and when this is approached, the readings become unreliable.
Data collected when the high tension voltage is abnormally high, should not be used in the analysis and
the "lowest wavelength datapoint" box allows a convenient method for truncating a dataset for this purpose.
After applying this cut off criterion, if your data does not extend to sufficiently low wavelengths to
enable the various databases and methods to be used for the analyses, then it is suggested that you
re-collect the data under changed conditions, e.g. using shorter pathlengths, lower concentrations of
buffers/additives, or different buffers/additives. As a good practice guideline, the high tension voltage
should not be above 550 mV at 190 nm for the sample, and should not be above 500 mV at any wavelength for the baseline.
Analysis Programmes
CONTINLL
The original version of CONTIN implemented the ridge regression
algorithm of Provencher & Glockner (1981). The latest version incorporates
the locally linearised model (van Stokkum et al.) in selecting basis set
proteins from the reference database.
Average run time: < 1 minute
Graphical output produced
Choice between 7 reference datasets
Provencher, S.W. and Glockner, J. (1981)
Estimation of globular protein secondary structure from circular dichroism.
Biochemistry 20, 33-37.
SELCON3
SELCON was designed by Sreerama & Woody (1993) and
incorporates the self-consistent method together with the SVD algorithm
to assign protein secondary structure. The programme analyses results from
a number of stages in the analysis. The first stage assigns an initial
guess at the fractional composition; this result corresponds
to the Hennessey & Johnson method using SVD. In the second stage, the
SVD calculations are iterated until a convergent solution is produced (equivalent
to the original self-consistent method). The third stage selects a number
of likely solutions from the calculations of the basis set by constraining
the summed fractional contents to equal one and each individual fraction
to be greater than -0.05. The fourth stage applies a further constraint,
the helix limit theorem, from which a range for helix content is determined
and the results screened. The range is taken from the solution using the Hennessey
and Johnson method.
Average run time: <1 min
Graphical output is produced
Choice between 7 reference datasets
Sreerama, N. and Woody, R.W. (1993)
A self-consistent method for the analysis of protein secondary structure from circular dichroism.
Anal. Biochem. 209, 32-44.
CDSSTR
This programme is a modification of the original VARSLC written by W.C. Johnson. It implements the variable selection method by performing all possible calculations using a fixed number of proteins from the reference set. The algorithm recognises proteins possessing characteristics not reflected by the test protein, or proteins not reflecting the characteristics of the test protein, and removes them from the basis set. The SVD algorithm assigns secondary structure.
This method probably produces the most accurate analysis
results, but can take up to 15 minutes to run due to the sheer volume of
calculations. It will however produce results where other methods fail
to analyse proteins.
Average run time ~5min
Graphical output
7 Reference datasets
Compton, L.A. and Johnson, W.C., Jr. (1986)
Analysis of protein circular dichroism spectra for secondary structure using a simple matrix multiplication.
Anal. Biochem. 155, 155-167.
VARSLC
The original implementation of the variable selection method. The programme is flexible in that the user may configure input data files to specify the number of proteins to be selected from the reference set, the number of proteins to be eliminated at a time from the reference set, and the total number of calculations tried before selecting solutions. The constraints applied can also be configured; for example, results are selected if their RMSD, sum of squares error, individual fractional contents and summed total content are within certain limits.
To incorporate some of this flexibility into the website, several configuration files have been set up. The first follows the guidelines set out in the readme.txt that comes with the programme, which recommends ~500 iterations with a basis set of 5-7 proteins, removing 1-2 proteins per iteration.
Details of the settings files:
Choice | RMSD max | Individual Fraction min | Total sum of Fractions | No. proteins removed | No. basis proteins | No. Calculations
Default | 0.55 | -0.15 | 0.95 - 1.14 | 1 | 6 | 300
Settings 1 | 0.55 | -0.15 | 0.95 - 1.30 | 1 | 20 | 528
Settings 2 | 0.55 | -0.20 | 0.95 - 1.40 | 1 | 30 | 700
Settings 3 | 2.55 | -0.20 | 0.95 - 1.20 | 1 | 6 | 900
The second settings file reflects the recommended values for accurate protein analysis, while the default settings file exists for quicker analyses where only 300 calculations are performed. When testing this programme with various CD data files, it was found that in the majority of cases results were overlooked because the total fraction of secondary structures was significantly greater than 1. Settings file 2 therefore exists for cases where the default and settings 1 have not produced valid results, and it may be useful to look at the kind of values resulting from such an analysis as a rough guide. Settings 3 is an extension of settings 1 with 900 calculations and a high maximum RMSD value. If no results are obtained with any of the settings files, then CDSSTR, which uses the same method but with no restriction on the number of calculations, can be used instead.
Only one reference database comes with the programme, containing 33 reference proteins. The programme does not produce reconstructed spectral data, and therefore no graphical output exists.
Compton, L.A. and Johnson, W.C., Jr. (1986)
Analysis of protein circular dichroism spectra for secondary structure using a simple matrix multiplication.
Anal. Biochem. 155, 155-167.
K2D
K2D is one of a few neural network programmes. The neural network operates via an input layer connected by interconnecting neurons to the output layer. The output layer (secondary structure) is calculated as a function of the input layer (CD data) by assigning weightings to each neuron. The weightings are initially assigned random values in a training phase, during which the network is fed large volumes of CD and structural data (equivalent to reference proteins) and the weightings are adjusted in an iterative process until an accurate secondary structure profile is obtained.
In K2D the weights file is fixed, and therefore there is no choice of reference dataset. Results for beta sheet and mixed proteins tend to be far less accurate than for helical proteins, although when compared with other methods (Greenfield, 1996) these results are an improvement.
Andrade, M.A., Chacón, P., Merelo, J.J. and Morán, F. (1993)
Evaluation of secondary structure of proteins from UV circular dichroism using an unsupervised learning neural network.
Prot. Engineering 6, 383-390.
Reference Set
All of the programmes except K2D rely upon reference datasets of proteins, from which a set of basis spectra will be selected for the analysis. CONTINLL, SELCON3 and CDSSTR offer a choice of reference database, which should be chosen in accordance with the range of the input data. It should also be noted that the choice of reference dataset affects the analysis results, particularly if there is mixed or high beta sheet content. The reference set that best represents the characteristics of the protein of interest is likely to give the most accurate result. A full breakdown of the contents of the reference sets can be found here.
Optional Scaling Factor
The scaling factor allows the user to modify the experimental data by small amounts in order to try to compensate for errors in the intensity of the spectra and thus, hopefully, improve the fit. It is possible that some spectrometers have incorrect intensity calibration, and where this is known a scaling factor may be applied to compensate for such errors. The scaling factor is applied to all data points and has a default value of 1.0, meaning no scaling. It would be highly unusual to require a large scaling factor; typical scaling values are in the range 0.95 - 1.05. Scaling factors outside the range 0.5 - 1.5 are unfeasibly large and will be ignored by DichroWeb.
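As a purely illustrative sketch of how such a factor operates (this is not DichroWeb's own code), the function below multiplies every CD value by the factor and ignores factors outside the accepted 0.5 - 1.5 range:

def apply_scaling_factor(cd_values, factor=1.0):
    # Multiply every CD data point by the scaling factor.
    # Factors outside 0.5 - 1.5 are unfeasibly large and are ignored.
    if not 0.5 <= factor <= 1.5:
        return list(cd_values)
    return [value * factor for value in cd_values]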
WARNING
Scaling factors should only be applied to data where there is a known reason for doing so.
It is possible to improve the NRMSD of an analysis by tweaking the scaling factor randomly,
but this does not necessarily mean that the structure assignment is improved. Scaling factors
should be used with caution.
For more information regarding correct scaling, please read:
Miles AJ, Whitmore L, Wallace BA (2005) Spectral magnitude effects on the analyses of secondary structure from circular dichroism spectroscopic data.
Prot. Sci. 14, 368-374.
Output Units
Output units are selectable, irrespective of the input units. The options for output units are the same as those for the input units.