Here it is! The much-longed-for MaxQuant entry that I have been planning for some time now, and even promised.
MaxQuant is a software package for quantitative proteomics, specifically aimed at high-resolution MS data. It is designed to analyze large-scale MS data sets and supports all main labeling techniques, such as SILAC, dimethyl, TMT and iTRAQ, as well as label-free quantification. The package comes with the Andromeda peptide search engine and the Perseus framework for statistical analysis.
How does it work?
The software is based on a set of algorithms, including peak detection and peptide scoring. It calibrates masses, searches sequence databases to identify peptides and proteins, quantifies the identified proteins and provides summary statistics. See here for the user’s manual.
1) Raw data: correction for systematic inaccuracies of measured peptide masses and corresponding retention times of extracted peptides. The raw data can be inspected with the viewer application.
Viewer app: the Viewer module can be used either to a) get some preliminary information out of the generated raw files or to b) follow up on specific findings after the raw files have been processed (tutorial here).
2) Peptide identification: the mass and intensity of the peptide peaks in the MS spectra are detected and assembled into 3D peak hills over the m/z-retention time plane. These are filtered to identify isotope patterns by applying graph theory algorithms.
High mass accuracy is achieved by weighted averaging and through mass recalibration: the corrected mass is the measured mass (of each MS isotope pattern) minus the determined systematic mass error.
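To make the two ideas above a bit more concrete, here is a minimal sketch. It assumes that peak intensity is used as the weight and that the systematic error is expressed in ppm; both function names are made up for illustration and are not part of MaxQuant.

```python
def weighted_mean_mass(masses, intensities):
    """Average several mass measurements of the same peak, weighting each by its intensity.

    Assumption: intensity is the weight; the real weighting scheme may differ.
    """
    total = sum(intensities)
    if total <= 0:
        raise ValueError("intensities must sum to a positive value")
    return sum(m * w for m, w in zip(masses, intensities)) / total


def recalibrate(measured_mass, systematic_error_ppm):
    """Subtract a systematic mass error (given in ppm) from a measured mass."""
    return measured_mass - measured_mass * systematic_error_ppm * 1e-6


# Two scans of the same peak; the more intense scan dominates the average.
average = weighted_mean_mass([1000.0, 1000.002], [3.0, 1.0])
corrected = recalibrate(average, systematic_error_ppm=2.0)
print(average, corrected)
```

The point is only that intense, well-measured scans count more, and that a known systematic offset is simply subtracted afterwards.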
Peptide and fragment mass search: the peptide masses and, in the case of MS/MS, the fragment masses are searched against an organism-specific sequence database. MaxQuant has the search engine Andromeda integrated.
Andromeda is a peptide search engine. It can assign and score complex patterns of post-translational modifications (PTMs), such as highly phosphorylated peptides, and it accommodates extremely large databases. Identifying co-fragmented peptides increases the number of identified peptides. More info here.
Scoring: peptide and fragment masses are scored by a probability-based approach termed peptide score.
Target-decoy approach and FDR: a target-decoy-based FDR (false discovery rate) approach is used to limit the proportion of accepted matches that occur by chance.
- The FDR is determined using statistical methods that account for multiple hypothesis testing.
- The organism specific database search includes the reverse counterparts of the target sequences (together with the “forward”/normal sequences) and contaminants to help determine a statistical cutoff for acceptable spectral matches.
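The target-decoy idea can be sketched in a few lines. This is a simplified illustration, not MaxQuant's actual procedure: it assumes each match is a (score, is_decoy) pair, ranks by score alone, and estimates the FDR as decoys divided by targets. The function name is hypothetical.

```python
def score_cutoff_for_fdr(matches, max_fdr=0.01):
    """Return the lowest score at which decoys / targets is still <= max_fdr.

    matches: iterable of (score, is_decoy) pairs, where is_decoy is True for
    hits against the reversed (decoy) sequences.
    Returns None if no cutoff satisfies the requested FDR.
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = 0
    decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among everything accepted down to this score.
        if targets > 0 and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


matches = [(100, False), (90, False), (80, True), (70, False)]
print(score_cutoff_for_fdr(matches, max_fdr=0.1))
```

Everything scoring at or above the returned cutoff would be accepted; a stricter `max_fdr` pushes the cutoff up.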
3) Assembly of peptide hits into protein hits – to identify proteins: each identified peptide of a protein contributes to the overall identification accuracy.
- Matching between runs: an FDR-controlled algorithm that enables MS/MS-free identification of MS features across the complete data set for each single measurement = increased number of quantified proteins per sample.
- Perseus performs bioinformatic analyses of the output of MaxQuant and so completes the proteomics analysis pipeline (tutorials here).
What is the input?
The .raw data files from the MS run. In our case the LC-MS/MS analysis was carried out on a QE-HF (FT-MS or Orbitrap), which generated the raw data.
What does it output?
MQ outputs several files, among others a .txt file called proteinGroups, in which proteins that share the same peptides are grouped together. This file can easily be opened in Excel.
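If you prefer a script over Excel, the file is plain tab-separated text, so it can also be read with Python's standard library. A minimal sketch follows; the column names ("Protein IDs", "Reverse", "Potential contaminant", "Only identified by site") are assumptions based on typical MaxQuant output and may differ between versions, and the function name is my own.

```python
import csv
import io

# Columns in which a "+" marks rows that are usually filtered out.
# Assumption: these names match your MaxQuant version.
FLAG_COLUMNS = ("Reverse", "Potential contaminant", "Only identified by site")


def read_protein_groups(handle):
    """Read a tab-separated proteinGroups table and drop flagged rows."""
    reader = csv.DictReader(handle, delimiter="\t")
    kept = []
    for row in reader:
        if any(row.get(column, "") == "+" for column in FLAG_COLUMNS):
            continue  # decoy hit, contaminant, or identified by site only
        kept.append(row)
    return kept


# Tiny in-memory stand-in for a real proteinGroups.txt file.
sample = (
    "Protein IDs\tReverse\tPotential contaminant\n"
    "P1;P2\t\t\n"
    "REV__P3\t+\t\n"
    "CON__P4\t\t+\n"
)
for row in read_protein_groups(io.StringIO(sample)):
    print(row["Protein IDs"])
```

For a real file you would pass `open("proteinGroups.txt", newline="")` instead of the `io.StringIO` object.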
What are the different columns we get?
- Protein groups – a group of proteins that share the same identified peptides. All proteins in a group have the same number of identified peptides, or fewer.
- Unique peptide – unique sequence obtained by removing the redundancy from the peptide hits.
- Peptide hits / spectra hits – the number of peptide-spectrum matches. This describes the relative abundance of a protein: the larger the number of hits, the more abundant the protein tends to be.
- Razor peptides – a peptide that has been assigned to the protein group with the largest total number of identified peptides.
- If unique, the razor peptide only matches to this single protein group.
- If not unique, the razor peptide will only be a razor peptide for the group with the largest number of peptide IDs.
If all peptide IDs from a sample analysis can be explained with the presence of a proteinGroup (A), the peptide should be assigned to this group as a razor peptide and you need not assume that there is a second proteinGroup (B) too. NB MaxQuant will still assign the peptide to the second proteinGroup (B) for your information, but not as a razor peptide. This group (B) will however only appear in the output proteinGroup file if it’s also identified by at least one unique peptide, since MQ will always generate the shortest proteinGroup list that is sufficient to explain all peptide IDs. Every peptide sequence is a razor peptide for one proteinGroup only. Read more here.
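The razor rule described above can be sketched like this. It is a toy version under stated assumptions: each protein group is just a set of peptide sequences, ties between equally large groups are broken alphabetically (my choice, not necessarily MaxQuant's), and the function name is hypothetical.

```python
def assign_razor_peptides(groups):
    """Map each peptide to the one protein group for which it is a razor peptide.

    groups: dict of group name -> set of identified peptide sequences.
    A shared peptide goes to the group with the most identified peptides.
    """
    razor = {}
    # Visit the largest groups first; ties are broken by name (an assumption).
    for name in sorted(groups, key=lambda g: (-len(groups[g]), g)):
        for peptide in groups[name]:
            # Only the first (largest) group to claim a peptide keeps it.
            razor.setdefault(peptide, name)
    return razor


groups = {
    "A": {"PEPTIDEK", "SAMPLER", "SHAREDK"},
    "B": {"SHAREDK", "UNIQUEBK"},
}
print(assign_razor_peptides(groups))
```

Here the shared peptide becomes a razor peptide for group A only, while group B still survives in the list because it has its own unique peptide.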
[Other columns… coming soon]
What are LFQ and iBAQ?
Intensities are the sums of all individual peptide intensities belonging to a particular protein group. Unique and razor peptide intensities are used as default.
- LFQ (Label-free quantification) intensities are based on the (raw) intensities and normalized on multiple levels to make sure that profiles of LFQ intensities across samples accurately reflect the relative amounts of the proteins.
- iBAQ (Intensity Based Absolute Quantification) values calculated by MaxQuant are the (raw) intensities divided by the number of theoretical peptides. Thus, iBAQ values are proportional to the molar quantities of the proteins. The iBAQ algorithm can roughly estimate the relative abundance of the proteins within each sample.
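As a rough illustration of the iBAQ idea, here is a small sketch. The digestion rule (cleave after K or R unless followed by P) and the 6 to 30 residue length window for counting theoretical peptides are assumptions for the example, and the function names are hypothetical rather than anything taken from MaxQuant.

```python
def tryptic_peptides(sequence):
    """In-silico digest: cleave after K or R, except when the next residue is P."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal remainder
    return peptides


def ibaq(intensity, sequence, min_len=6, max_len=30):
    """Divide a protein's summed intensity by its number of theoretical peptides."""
    count = sum(1 for p in tryptic_peptides(sequence) if min_len <= len(p) <= max_len)
    if count == 0:
        raise ValueError("no theoretical peptides in the allowed length range")
    return intensity / count


# Made-up sequence: two of its three fragments fall inside the length window.
print(ibaq(1000.0, "GASGASKLLLLLLRPEEEKDD"))
```

Dividing by the peptide count is what corrects for the fact that long proteins produce more peptides, and therefore more summed intensity, than short ones at the same molar amount.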
What is the rationale for choosing iBAQ over LFQ?
How do you interpret the data?
What was the rationale for choosing MaxQuant over OpenMS?
In the end we chose to work with MaxQuant rather than OpenMS (TOPPAS). The main reason was that we were running out of time and needed some results so that we could proceed with analyzing the processed data. OpenMS was just too time-consuming for us (being more or less inexperienced with this type of analysis) because of the many steps involved in building a proper workflow to process our data. We needed to convert files, install different software tools, find/choose/build pipelines and troubleshoot when things didn’t work. So even though we were very excited about this approach in the beginning, we just had to let go in the end and choose MaxQuant, which was easier to use. What we really liked about OpenMS was how it felt like we were actually building something on our own, which required more research and detective work along with problem-solving skills. For example, Magnus and I both really appreciated the different pipelines we tried and how you could combine them in different ways in TOPPAS. Oh well, maybe in the future, who knows…
What are the pros and cons with MaxQuant?
- Peptide identification rates
- Peptide mass accuracy
- Proteome-wide protein quantification
[more on advantages and disadvantages coming soon]
Hm, that’s all for now. Let’s see if and when I might have time and energy to complete this before our entry deadline at noon on Tuesday next week… After this no more entries are allowed.
Links of interest