
 

Features-How to use?

The following part introduces the features of our software. For more details, please refer to the user manual.

Database

The database used for protein identification in this paper is sprot45 from UniProtKB/Swiss-Prot (last updated in January 2005), together with 40 soybean proteins (generated after January 2005) that we have identified but that are not included in that release. The database has 163275 proteins in total, originally in FASTA format. After data extraction, the database was transformed into a program-defined format with eight fields per entry: accession number, peptide number, peptide sequences, peptide masses, peptide lengths, protein sequence, protein name, and protein molecular weight. The molecular weight of a peptide of N residues is calculated as
$$M = \sum_{i=1}^{N} m_i + 18.015 \qquad (1)$$
where m_i is the mass of the i-th residue.
Equation (1) takes into account an amino-terminal hydrogen and a carboxy-terminal hydroxyl group, which sum to 18.015. In this study, we only consider proteins whose peptides all have a charge state of 1 and carry no post-translational modifications.
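
As a minimal sketch of the mass calculation described above, the Python snippet below sums per-residue masses and adds 18.015 for the terminal groups. The residue table uses standard average masses, which is an assumption: the text does not say whether the software uses average or monoisotopic values. The name `peptide_mass` is illustrative and is not part of the software.

```python
# Average residue masses in daltons (assumed; the software's own table may differ).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}

TERMINAL_MASS = 18.015  # amino-terminal H plus carboxy-terminal OH


def peptide_mass(sequence: str) -> float:
    """Mass of an unmodified peptide: sum of residue masses plus terminal groups."""
    return sum(RESIDUE_MASS[residue] for residue in sequence) + TERMINAL_MASS
```

For example, `peptide_mass("GG")` adds two glycine residue masses to the terminal term. A sequence containing a letter outside the table raises `KeyError`, which is a deliberate choice here so that unexpected residues are not silently ignored.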

Scoring Schema (PBSF)

To handle the statistical properties of PMF protein identification more systematically, we developed a new scoring scheme based on the MOWSE occurrence table. A mass distribution of peptides (n fragment molecular weights in the spectrum) is compared with a database entry of molecular weights (protein k in column j). When the difference between two peptide weights is within a tolerance value, it is a hit, or match; otherwise it is a non-match. The probability of a match between a mass distribution of peptides and protein k in the database is computed via

formula2

where R(l) is the row number of the table for the l-th fragment of the mass spectrum, H_k is the set of fragments in the mass spectrum that match protein k, and n_k_ij is the number of peptides (an integer) in cell ij of the MOWSE table for protein k (column j). m_ij indicates the average number of peptides in cell ij divided by the total number of proteins in column j. M_j is the average number of peptides per protein in the j-th column.
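
The hit rule that defines the matched set H_k can be sketched as follows. This covers only the split into hits and non-hits, not the PBSF probability itself. The function name `split_hits` and the use of an absolute tolerance in the same unit as the masses are assumptions made for illustration.

```python
def split_hits(spectrum_masses, peptide_masses, tolerance):
    """Split spectrum fragments into hits and non-hits for one protein.

    A fragment is a hit when at least one peptide mass of the protein lies
    within `tolerance` (absolute, same unit as the masses) of it.
    """
    hits, non_hits = [], []
    for observed in spectrum_masses:
        if any(abs(observed - theoretical) <= tolerance
               for theoretical in peptide_masses):
            hits.append(observed)
        else:
            non_hits.append(observed)
    return hits, non_hits
```

The non-hit list is returned as well because one of the running parameters optionally folds non-hit fragments into the score.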

Statistic Model

Current computational methods assume that the target protein is in the sequence database and use the best hit ranked by raw score as the prospective target. This may lead to false positives, since the top hit may not be the target protein and the protein sample may not be in the search database. Given the potential for inaccurate data analysis, we developed a confidence assessment for PMF data analysis results (1) to indicate to what extent a user can trust the protein identification result and (2) to re-rank the protein hits based on the confidence assessment instead of raw scores. Such a capability may significantly improve the computational analysis of PMF data.

Running Parameters

Running parameters affect the calculation of the final result. The software has default values for all parameters; however, users can set their own for a specific application. The "Setting Running Parameters" dialog provides the following parameters.

"Minimum Matches" and "Minimum Percentage":

These parameters give lower bounds on the number and the percentage of peptides that must match between the mass spectrum and a database protein for that protein to be considered a candidate. They reduce the search space.
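
A minimal sketch of how the two lower bounds might be applied is given below. It assumes that the percentage is the number of hits divided by the number of spectrum peaks; the text does not state the denominator, so this is only one plausible reading. The name `is_candidate` is hypothetical.

```python
def is_candidate(num_hits, num_spectrum_peaks, min_matches, min_percentage):
    """Return True when a protein passes both lower bounds.

    Assumption: the percentage is hits divided by the number of spectrum peaks.
    """
    if num_spectrum_peaks == 0:
        return False
    percentage = 100.0 * num_hits / num_spectrum_peaks
    return num_hits >= min_matches and percentage >= min_percentage
```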

"Maximum Peptide Mass" and "Maximum Protein Mass":

These parameters give upper bounds on the peptide mass and protein mass that can be considered. Proteins and peptides outside the range are filtered out, which reduces the search space.
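
The effect of the two upper bounds can be illustrated with the sketch below. The dictionary keys `protein_mass` and `peptide_masses` and the function name `filter_by_mass` are hypothetical and do not reflect the program-defined database format.

```python
def filter_by_mass(proteins, max_peptide_mass, max_protein_mass):
    """Drop proteins above the protein bound and peptides above the peptide bound.

    `proteins` is a list of dicts with the hypothetical keys
    "protein_mass" and "peptide_masses". The input list is not modified.
    """
    kept = []
    for protein in proteins:
        if protein["protein_mass"] > max_protein_mass:
            continue  # whole protein is out of range
        peptides = [m for m in protein["peptide_masses"] if m <= max_peptide_mass]
        kept.append({**protein, "peptide_masses": peptides})
    return kept
```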

"Output Numbers":

It defines the size of the output set of candidate proteins.

"Tolerance":

It defines the "matching area" between the mass spectrum and a digested peptide.

"Miss-Cleavage":

It supports incomplete digestion with the specified number of missed cleavages. Peptides containing missed cleavages are then also considered when matching against the experimental data.
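
To show what a missed-cleavage setting means in practice, the sketch below generates peptides from an in-silico digest. It assumes a trypsin-like rule (cleave after K or R unless the next residue is P); the text does not name the enzyme, so this rule and the function name `digest` are assumptions.

```python
def digest(protein_sequence, max_missed_cleavages):
    """Return peptides from a trypsin-like digest with up to N missed cleavages.

    Assumption: cleavage occurs after K or R unless the next residue is P.
    """
    if not protein_sequence:
        return []

    # Positions where the protein is cut, including both ends.
    cut_sites = [0]
    for i, residue in enumerate(protein_sequence[:-1]):
        if residue in "KR" and protein_sequence[i + 1] != "P":
            cut_sites.append(i + 1)
    cut_sites.append(len(protein_sequence))

    peptides = []
    for start in range(len(cut_sites) - 1):
        # Joining (missed + 1) consecutive fragments skips `missed` cut sites.
        for missed in range(max_missed_cleavages + 1):
            end = start + missed + 1
            if end >= len(cut_sites):
                break
            peptides.append(protein_sequence[cut_sites[start]:cut_sites[end]])
    return peptides
```

With a setting of 0 only fully cleaved peptides are produced; each increment adds the peptides that span one more uncut site, so the candidate list grows with the setting.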

"Performance":

Users can set the performance value according to their computer's configuration. When the slider is at its rightmost position, the program reaches its highest performance; however, it also requires the most memory.


"Incorporate Non-Hit Infor"

If the option is "Yes", the scoring schema used in the program integrates the "non-hit" spectrum fragment information as part of the score. The option is available to users but is not recommended in most cases.

"Incorporate Intensity in Confidence Estimation"

If the option is "Yes", the program makes use of the peak intensity in confidence evaluation.

"Score Schema Option"

If the option is "PBSF", the program uses the PBSF score schema; if the option is "MPBSF", it uses the modified PBSF.

For more details on the options, see our paper in Electrophoresis 28:864-870, 2007.

User Interface

We provide a user-friendly interface for the software. For example, the following figure shows the interface of setting running parameters.

[Figure: set parameters]

Web site and all contents Copyright Digbio 2007, All rights reserved.