Automatic determination of collision energy of mass spectrometer
Reading note: this technique, Automatic determination of collision energy of mass spectrometer, was devised by P. F. Yip, H. L. Cardasis and James L. Stephenson, Jr. on 2018-05-07. Summary: The present disclosure establishes new dissociation parameters that can be used to determine the collision energy (CE) required to achieve a desired degree of dissociation of a given analyte precursor ion using collision-cell-type collision-induced dissociation. This selection is based solely on the molecular weight MW and charge state z of the analyte precursor ion. Metrics are proposed that can serve as parameters for the "degree of dissociation", and a predictive model is built for the CE required to achieve a range of values of each metric. Each model is simply a smooth function of the MW and z of the precursor ion. By incorporating a real-time mass spectral deconvolution (m/z to mass) algorithm, the method according to the invention enables control of the degree of dissociation by automatic real-time selection of the collision energy in a precursor-dependent manner.
1. A method for identifying an intact protein in a sample containing a plurality of intact proteins using a mass spectrometer, the method comprising:
(a) introducing the sample into an ionization source of the mass spectrometer;
(b) generating a plurality of ion species from the plurality of intact proteins using the ionization source, each protein thereby generating a respective subset of the plurality of ion species, wherein each ion species in each subset is a multiply-protonated ion species generated from a respective one of the intact proteins;
(c) mass analysing the plurality of ion species using a mass analyser of the mass spectrometer;
(d) automatically identifying each subset of the plurality of ion species by performing a mathematical analysis on the data generated by the mass analysis and assigning a charge state z to each identified ion species and a molecular weight MW to each intact protein;
(e) selecting one of the ion species;
(f) automatically calculating a collision energy CE for fragmenting the selected ion species using the following relationship:
CE(D_p) = c + (1/k)[ln(1/D_p) - 1],
wherein D_p is the fraction of the selected ion species that is expected to remain unfragmented after the fragmentation, and c and k are functions of only the charge state z of the selected ion species and the molecular weight MW of the intact protein from which the selected ion species was derived;
(g) isolating the selected ion species and fragmenting it using the automatically calculated collision energy, thereby forming fragment ion species; and
(h) mass analysing the fragment ion species.
2. A method for identifying intact proteins within a sample containing a plurality of intact proteins using a mass spectrometer, the method comprising:
(a) introducing the sample into an ionization source of the mass spectrometer;
(b) generating a plurality of ion species from the plurality of intact proteins using the ionization source, each protein thereby generating a respective subset of the plurality of ion species, wherein each ion species in each subset is a multiply-protonated ion species generated from a respective one of the intact proteins;
(c) mass analysing the plurality of ion species using a mass analyser of the mass spectrometer;
(d) automatically identifying each subset of the plurality of ion species by performing a mathematical analysis on the data generated by the mass analysis and assigning a charge state z to each identified ion species and a molecular weight MW to each intact protein;
(e) selecting one of the ion species;
(f) automatically calculating a collision energy CE for fragmenting the selected ion species using the following relationship:
wherein D_E is a parameter corresponding to a desired distribution of fragment ion species produced by said fragmentation, z is the assigned charge state of said selected ion species, MW is the molecular weight of said intact protein from which said selected ion species was produced, and b_1, b_2 and b_3 are predetermined parameters that vary according to D_E;
(g) isolating the selected ion species and fragmenting it using the automatically calculated collision energy, thereby forming fragment ion species; and
(h) mass analysing the fragment ion species.
Technical Field
The present invention relates to mass spectrometry, and more particularly to methods and apparatus for mass spectrometry of complex mixtures of proteins or polypeptides by tandem mass spectrometry. More particularly, the present invention relates to a method and apparatus for fragmenting precursor ions using collision induced dissociation, and in which the selection of precursor ions to fragment and the magnitude of the collision energy to be imparted to the selected precursor ions are automatically determined.
Background
The study of proteins in living cells and tissues (proteomics) is an active area of clinical and basic scientific research, as metabolic control in cells and tissues is exercised at the protein level. For example, comparison of protein expression levels between healthy and diseased tissues, or between pathogenic and non-pathogenic microbial strains, may accelerate the discovery and development of new pharmaceutical compounds or agricultural products. Furthermore, analysis of protein expression patterns in diseased tissue, or in tissue excised from an organism receiving treatment, can serve as a diagnostic of the disease state or of the effectiveness of a treatment strategy, and can provide prognostic information regarding the appropriate treatment regimen and treatment options for individual patients. Further, identification of the proteome in samples derived from microorganisms (e.g., bacteria) can provide a means to identify the species and/or strain of the microorganism and, in the case of bacteria, possible drug resistance of such species or strains.
Mass Spectrometry (MS) is currently considered to be a valuable analytical tool for biochemical mixture analysis and protein identification, as it can be used to provide detailed protein and peptide structural information. Thus, conventional protein analysis methods typically combine two-dimensional (2D) gel electrophoresis for separation and quantification with mass spectrometric identification of proteins. Also, capillary liquid chromatography and various other "front-end" separation or chemical fractionation techniques have been used in conjunction with electrospray ionization tandem mass spectrometry to facilitate large-scale protein identification without gel electrophoresis. Qualitative differences between mass spectra can be identified by using mass spectrometry, and proteins corresponding to peaks that occur only in certain mass spectra serve as candidate biomarkers.
The term "top-down proteomics" refers to an analytical method in which a protein sample is introduced intact into a mass spectrometer without prior enzymatic, chemical or other digestion. Top-down analysis allows the study of intact proteins, enabling identification, determination of primary structure and localization of post-translational modifications (PTMs) directly at the protein level. Top-down proteomic analysis typically consists of: introducing an intact protein into the ionization source of a mass spectrometer; determining the intact mass of the protein; fragmenting the protein ions; and measuring the mass-to-charge ratio (m/z) and abundance of each fragment thus produced. This sequence of instrument steps is commonly referred to as tandem mass spectrometry, or alternatively as "MS/MS" analysis. The technique can be used to advantage in polypeptide studies, although the resulting fragmentation is many times more complex than that of simple peptides. Interpretation of such fragment mass spectra typically involves comparing the observed fragmentation pattern either to a protein sequence database comprising compiled experimental fragmentation results generated from known samples, or to theoretically predicted fragmentation patterns. For example, Liu et al. ("Top-Down Protein Identification/Characterization of a Priori Unknown Proteins via Ion Trap Collision-Induced Dissociation and Ion/Ion Reactions in a Quadrupole/Time-of-Flight Tandem Mass Spectrometer") describe top-down identification and characterization of modified and unmodified unknown proteins with masses of up to 28 kDa.
One advantage of top-down over bottom-up analysis is that proteins can be identified directly, rather than being inferred from peptides as in so-called "bottom-up" analysis. Another advantage is that alternative forms of a protein, such as post-translational modifications and splice variants, can be identified. However, top-down analysis has a disadvantage compared to bottom-up analysis, in that many proteins can be difficult to isolate and purify. Thus, in mass spectrometry, each protein in an incompletely separated mixture can produce a plurality of ion species, each corresponding to a different degree of protonation and hence a different charge state, and each such ion species can produce a plurality of isotopic variants. A single MS mass spectrum measured in a top-down analysis can easily contain hundreds or even thousands of peaks belonging to different analytes, all interleaved within a given m/z range, with ion signals of very different intensities overlapping one another.
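The charge-state ladder described above can be made concrete with a short Python sketch. It is an illustration only, not the deconvolution algorithm of the disclosure: it assumes ideal multiply-protonated ions [M + zH]z+, ignores isotopic variants, and uses hypothetical function names.

```python
PROTON_MASS = 1.007276  # Da, mass of one proton


def mz_of_charge_state(mw, z):
    """m/z of an [M + zH]z+ ion for a neutral molecular weight mw (Da)."""
    return (mw + z * PROTON_MASS) / z


def charge_from_adjacent_peaks(mz_upper, mz_lower):
    """Charge z of the peak at mz_upper, given the neighbouring peak at
    mz_lower (lower m/z), which is assumed to carry charge z + 1."""
    return round((mz_lower - PROTON_MASS) / (mz_upper - mz_lower))


def mw_from_peak(mz, z):
    """Neutral molecular weight recovered from one peak of known charge."""
    return z * (mz - PROTON_MASS)


# Illustrative values only: a 29 kDa protein observed at charge states +22 and +23.
mz22 = mz_of_charge_state(29000.0, 22)
mz23 = mz_of_charge_state(29000.0, 23)
z = charge_from_adjacent_peaks(mz22, mz23)
mw = mw_from_peak(mz22, z)
```

Two neighbouring peaks of one protein are enough to assign z and MW in this idealized case; real spectra with interleaved analytes and isotopic envelopes need a more robust procedure.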
When front-end sample fractionation, such as two-dimensional gel electrophoresis or liquid chromatography, is performed prior to MS analysis, the complexity of each individual mass spectrum can be reduced. Nevertheless, the mass spectrum of such sample portions may still include characteristics of multiple proteins and/or polypeptides. A general technique of performing Mass Spectrometry (MS) analysis of ions generated from compounds separated by Liquid Chromatography (LC) may be referred to as "LC-MS". If the mass spectrometry is performed as tandem mass spectrometry (MS/MS), the process can be referred to as "LC-MS/MS". In a conventional LC-MS/MS experiment, a sample is first analyzed by mass spectrometry to determine the mass-to-charge ratio (m/z) of ions derived from the sample, and to identify (i.e., select) the mass spectral peak of interest. The sample is then further analyzed by performing a product ion MS/MS scan of the selected peak or peaks. More specifically, a full scan mass spectrum including an initial survey scan is obtained in the first stage of analysis (commonly referred to as "
FIG. 1A shows a hypothetical experimental scenario in which different fractions, attributable to different analyte species, are well resolved chromatographically (in time) before being introduced into the mass spectrometer. Curves a10 and a12 represent the hypothetical concentration of each respective analyte at various times, where concentration is expressed as a percentage on a relative intensity (R.I.) scale and time is plotted along the abscissa as retention time. Curves a10 and a12 can be readily determined from measurements of the total ion current input into the mass spectrometer. A threshold intensity level A8 of the total ion current is set, below which only MS1 data are acquired. When the first analyte (detected as peak a10) elutes, the total ion current intensity crosses the threshold A8 at
In more complex analyte mixtures, there may be components whose elution peaks overlap almost completely, as shown in the graph of ion current intensity versus retention time in FIG. 1B. In this example, elution peak a11 represents the ion current attributable to a precursor ion produced by a first analyte, while elution peak a13 represents the ion current attributable to a different precursor ion produced by a second analyte, where these precursor ions differ from each other in mass and/or charge state. In the hypothetical case shown in FIG. 1B, the elution of the compounds that produce the different ions overlaps almost completely, and the mass spectral intensity of the first precursor ion is consistently greater than that of the second precursor ion during co-elution. As shown hypothetically in FIG. 1C, mass spectral lines of all the precursor ions may be present at any time during co-elution of the two analytes, e.g., between time t6 and time t7, where the line group indicated by
One common method of causing ion fragmentation in MS/MS analysis is collision induced dissociation (CID), in which a population of analyte precursor ions is accelerated into target neutral gas molecules, such as nitrogen (N2) or argon (Ar), to impart internal vibrational energy to the precursor ions, which can lead to bond breakage and dissociation. The fragment ions are analysed to provide useful information about the structure of the precursor ions. The term "collision induced dissociation" encompasses techniques in which energy is imparted to precursor ions by a resonance excitation process, which may be referred to as RE-CID techniques. This resonance excitation method includes applying an auxiliary alternating voltage (AC) to the trapping electrodes in addition to the main RF trapping voltage. The auxiliary voltage typically has a relatively low amplitude (about 1 volt (V)) and a duration of about tens of milliseconds. The frequency of the auxiliary voltage is chosen to match the frequency of motion of the ions, which in turn is determined by the amplitude and frequency of the main trapping field and by the mass-to-charge ratio (m/z) of the ions. Because the motion of the ions is in resonance with the applied voltage, the energy of the ions increases and the amplitude of their motion grows.
Figure 2 schematically illustrates another method of collision induced dissociation, sometimes referred to as high energy collision dissociation (HCD). In the HCD method, selected ions are temporarily stored in or passed through a multipole
When using HCD or RE-CID to generate fragment ions in MS/MS experiments, it is highly desirable to set up the instrument so as to impart the correct amount of collision energy to the selected precursor ions. For HCD, the collision energy (CE) is set by setting the potential difference through which ions are accelerated into the HCD cell. There, the ions collide with the resident gas one or more times until they exceed the vibrational energy threshold for bond breakage, thereby producing dissociation product ions. The product ions may retain sufficient kinetic energy that further collisions result in successive dissociation events. The optimum collision energy varies with the nature of the selected precursor ion. Setting the HCD collision energy too high can lead to such successive dissociation events, resulting in a large number of small, non-specific product ions. Conversely, setting this potential too low yields few informative product ions at all, since the mass spectral signatures of at least some fragment ions may be weak or absent. In either case, sufficient structural information about the precursor ion will not be available from the product ion mass spectrum, and thus no identification or structural (or sequence) elucidation can be provided. Analytes of different sizes, structures and charge capacities dissociate to varying degrees at any given CE. Thus, using a single collision energy setting for all precursor ions during an automated mass spectrometry experiment carries the risk of undesirable or unacceptable fragmentation of certain ions. Nevertheless, mass spectrometry procedures are often performed on samples or sample fractions with reduced chemical diversity for a variety of reasons (e.g., ionization, chromatography, fragmentation, etc.). Reducing chemical diversity increases the likelihood that a collision energy tuned on similar analytes will be appropriate.
Although resonance excitation CID (RE-CID) and HCD produce similar mass spectra from the same charge state of the same protein, the exact optimum collision energy required to produce the maximum amount of structural information may differ greatly between them. In the case of RE-CID, since the applied auxiliary frequency matches the fundamental frequency of motion of the precursor ions, the internal energy of the precursor ions is increased until the minimum dissociation energy is reached and product ions are produced. The degree of fragmentation reaches a maximum with increasing applied energy and levels off as the precursor ions are depleted. If the applied fragmentation energy is increased further, the relative abundances of the individual product ions will generally not change. That is, as the fragmentation energy increases beyond the onset of the plateau region, the relative abundances of the product ions remain approximately constant, and little or no additional relevant structural information is obtained.
In contrast, in the case of HCD fragmentation, the collisional activation process is a function only of the potential difference between the HCD cell and the adjacent ion optical element. Thus, any product ions formed in the HCD cell may undergo further fragmentation depending on their excess internal energy. Since the HCD process uses nitrogen as the collision gas, rather than the helium commonly used in RE-CID experiments, higher energy deposition and more structural information can be obtained from the HCD process if a near-optimal collision energy is applied. In the RE-CID process, increasing the applied collision energy beyond its optimum reduces the amount of precursor ions remaining, but does not significantly alter the relative amounts of the fragment ions. In HCD fragmentation, increasing the applied collision energy beyond its optimum value typically results in further fragmentation of the fragment ions.
Fig. 3A shows a general comparison between the effect of increasing energy on the number of identifiable protein fragment ions produced by HCD fragmentation (curve 151) and the effect of increasing energy on the number of such identifiable ions produced by RE-CID fragmentation (curve 152). Curve 152 shows the effect of varying the applied resonance energy on fragmentation of the precursor ion from the protein myoglobin. In this example, the amount of structural information will remain relatively constant as the collision energy increases beyond 25% RCE. In contrast, when the HCD process is used (curve 151), the structural information content obtained has a well-defined maximum for an HCD energy of about 28% RCE. In the case where the collision energy is less than or exceeds the optimal RCE setting, the quality of the structural information obtained from the HCD experiment may be drastically degraded.
The effect of varying the applied HCD fragmentation energy is well illustrated by the fragmentation of the +8 charge state precursor ion of the protein ubiquitin, as shown in the product ion mass spectra of FIGS. 3B-3D. FIG. 3B shows the limited number of fragment ions produced when using a suboptimal RCE setting of 25%. In many experimental situations, such limited fragmentation will not allow correct identification of the protein by searching standard tandem mass spectral libraries or by using sequence information from available databases. However, when the RCE setting is changed to 30%, HCD fragmentation of the same precursor ion is optimal, and the resulting product ion mass spectrum (FIG. 3C) shows a rich array of fragments of various charge states that enables identification of the protein by any of several methods. Finally, as shown in FIG. 3D, further increasing the RCE setting to 40% results in over-fragmentation, in which most of the product ions generated are singly charged, low-mass fragments that reflect the amino acid composition of the protein more than its actual sequence. Therefore, it is highly desirable to adjust the collision energy for HCD fragmentation of unknown proteins and complex mixtures in real time so as to maximize the available information content.
U.S. Patent No. 6,124,591, in the names of inventors Schwartz et al., describes a method of generating product ions in a quadrupole ion trap by RE-CID in which the amplitude of the applied resonance excitation voltage is substantially linearly related to the precursor ion m/z ratio. The technique described in U.S. Patent No. 6,124,591 attempts to normalize both the primary variation in optimum resonance excitation voltage amplitude among different ions and the variation arising from instrument-to-instrument differences. Schwartz et al. further found that differences in structure, charge state and stability have only secondary effects on the required applied collision energy, and that these secondary effects can be modeled by simple correction factors.
According to the teachings of Schwartz et al., a simple and fast calibration of the substantially linear relationship between the applied optimal CE and m/z can be performed on an instrument-by-instrument basis. FIG. 4A schematically illustrates the principle of generating and using such a calibration curve. Initially, a calibration curve for a particular mass spectrometer is generated by fitting a linear relationship to calibration data at which a particular percentage reduction (e.g., 90%) in precursor ion intensity is observed. The linear relationship is shown in FIG. 4A by
Once the instrument calibration is determined, subsequent operation of the mass spectrometer typically does not take the full CE value represented by
where CE_actual is the applied collision energy, typically expressed in electron volts (eV); RCE is the relative collision energy, a percentage value typically defined by the user for each experiment; and f(z) is the charge correction factor. Table 1 in FIG. 4B lists acceptable charge correction factors. Note that the numerator and denominator of the fraction in parentheses are both expressed in units of daltons, Da (or more precisely thomson, Th). Although this equation is generally sufficient to fine-tune the absolute CE applied to the sample over a narrow range of precursor ion properties, it should be noted that, since f(z) takes a fixed value for z ≥ 5, the resulting collision energy is too high for heavier molecules with higher charge states (such as proteins and polypeptides), leading to excessive fragmentation of these species.
Recently, mass spectrometry of intact proteins and polypeptides has gained widespread popularity. For such applications, the size, structure and charge capacity of the analytes within the sample may vary greatly, thus requiring very different collision energies to achieve the same degree of dissociation. It has been found that even if the range of charge coefficients is extended and extrapolated to charge states above +5, the above equation does not sufficiently normalize the collision energy for all precursors in a polypeptide or whole protein sample. Therefore, these specific analytes require a modified model.
Disclosure of Invention
The present teachings relate to establishing new dissociation parameters for determining the HCD (collision-cell-type CID) collision energy (CE) required to achieve a desired degree of dissociation of a given analyte precursor ion. The selection is based solely on the molecular weight (MW) and charge state (z) of the analyte precursor ion. To this end, the inventors devised two different metrics that can serve as measures of the "degree of dissociation" D and that replace the previous relative collision energy and normalized collision energy parameters. These two new metrics are relative precursor decay (D_p) and spectral entropy (D_E), although other metrics describing the degree of dissociation are conceivable in the future. The inventors further developed predictive models for the collision energy values required to achieve a range of values of each such metric. Each model is simply a smooth function of the MW and z of the precursor ion. By incorporating a real-time spectral deconvolution algorithm capable of determining the molecular weights of the analyte molecules, these new techniques can control the degree of dissociation through automatic, real-time, precursor-dependent selection of the collision energy. These novel collision energy determination methods eliminate the need for the user to "tune" or "optimize" the collision energy for different compounds or applications, since a single "degree of dissociation" parameter setting applies to all sampled MW and z. This capability is advantageous for intact protein analysis, in which the precursors in a single sample may span a wide range of physical properties. Existing methods are tailored to a limited range of analyte properties (such as those of simple peptides) and do not adequately address the complexity of intact protein and polypeptide analysis.
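To give a feel for how an entropy-type metric responds to fragmentation, the following Python sketch computes a Shannon entropy over normalized peak intensities. This is an assumption made for illustration: this section does not state how the disclosure computes its spectral entropy D_E, so the formula and the function name below should not be read as the inventors' definition.

```python
import math


def spectral_entropy(intensities):
    """Shannon entropy (natural log) of a peak list, treating the normalized
    intensities as a probability distribution. Illustrative assumption only."""
    positive = [i for i in intensities if i > 0]
    total = sum(positive)
    if total == 0:
        return 0.0
    return -sum((i / total) * math.log(i / total) for i in positive)


# Invented peak lists: one dominant precursor versus signal spread over fragments.
barely_fragmented = [1000.0, 5.0, 3.0, 2.0]
well_fragmented = [120.0, 110.0, 95.0, 105.0, 90.0, 100.0]
low = spectral_entropy(barely_fragmented)
high = spectral_entropy(well_fragmented)
```

Under this assumed definition, a spectrum dominated by a single peak has entropy near zero, and the value grows as ion current is distributed over more product ions.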
Drawings
To further clarify the above and other advantages and features of the present disclosure, a more particular description of the disclosure will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. It is appreciated that these drawings depict only illustrative embodiments of the disclosure and are therefore not to be considered limiting of its scope. The disclosure will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:
FIG. 1A is a schematic of an analysis of two analyte fractions showing well separated chromatographic elution peaks;
FIG. 1B is a schematic of a portion of a chromatogram having highly overlapping elution peaks, both above an analysis threshold;
FIG. 1C is a schematic representation of a hypothetical plurality of staggered mass spectral peaks of two simultaneously eluting protein or polypeptide analytes;
FIG. 2 is a schematic illustration of a conventional apparatus and method for fragmenting ions by collision-induced dissociation;
FIG. 3A is a general graphical comparison between the effect of increasing energy on the number of identifiable protein fragment ions produced by HCD fragmentation and the effect of increasing energy on the number of such identifiable ions produced by RE-CID fragmentation;
FIGS. 3B, 3C and 3D are mass spectra of fragment ions generated by HCD fragmentation of the +8 charge state precursor ion of the protein ubiquitin using relative collision energy settings of 25, 30 and 40, respectively;
FIG. 4A is a graph showing the relationship between applied collision energy and precursor ion mass-to-charge ratio according to a known "normalized collision energy" manipulation technique;
FIG. 4B is a table showing correction coefficients applied to known normalized collision energy manipulation techniques to compensate for the effect of precursor ion charge states on the degree of fragmentation produced by collision induced dissociation;
FIG. 5A is a schematic diagram of a system for generating and automatically analyzing a chromatography/mass spectrum according to the present teachings;
FIG. 5B is a schematic diagram of an exemplary mass spectrometer suitable for use in conjunction with methods according to the present teachings, the mass spectrometer including a hybrid system including a quadrupole mass filter, a dual pressure quadrupole ion trap mass analyzer, and an electrostatic trap mass analyzer;
FIG. 6A is a set of plots of the percentage of each precursor ion species remaining after fragmentation as a function of applied collision energy, together with logistic regression fits to the data, where the precursor ion species are the +22, +24, +26 and +28 charge states of carbonic anhydrase, which has a molecular weight of about 29 kDa;
FIG. 6B is a table of parameters that may be used in a model according to the present teachings to calculate the collision energies that should be applied experimentally to yield various desired precursor ion survival percentages D_p, tabulated for each selected D_p value;
FIG. 7A is a set of five representative product ion mass spectra with different degrees of collision-induced dissociation, showing the variation in the total mass spectral entropy values calculated in accordance with the present teachings;
FIG. 7B is an example of partitioning each of two product ion mass spectra into two regions, determining a first mass spectral entropy E_1 for each first region and a second mass spectral entropy E_2 for each second region, and comparing E_1, E_2 and the total mass spectral entropy E_tot;
FIG. 8A is a set of plots of total mass spectral entropy (top panel), E_1 (middle panel) and E_2 (bottom panel), calculated from product ion mass spectra according to the present teachings, as functions of the collision energy imparted to the indicated precursor ion charge states of myoglobin (about 17 kDa);
FIG. 8B is a table of parameters that may be used in another model according to the present teachings to calculate the collision energies that should be applied experimentally to yield a set of product ions distributed according to the product ion entropy parameter D_E, tabulated for each selected D_E value;
FIG. 9A is a comparison between collision energy calculated conventionally (solid line) and collision energy calculated according to the entropy model of the present teachings (dashed line), each as a function of mass-to-charge ratio, for an ion charge state of +5 and a default setting of the conventional relative collision energy;
FIG. 9B is a comparison between the scaled conventionally calculated collision energy (solid line) and the collision energy calculated according to the entropy model of the present teachings (dashed line), where the conventionally calculated collision energy of FIG. 9A is scaled by a factor of 0.79475;
FIG. 10 is a graph of charge state scaling factors that may be applied to conventionally calculated collision energies to reconcile those conventionally calculated collision energies with certain calculations determined in accordance with the present teachings;
FIG. 11 is a tabular representation of the charge state scaling factor graphically depicted in FIG. 10;
FIG. 12 is a flow chart of a method for tandem mass spectrometry analysis of a protein or polypeptide using automated collision energy determination according to the present teachings;
FIG. 13A is a depiction of a computer screen information display showing calculated peak cluster decomposition results from mass spectrometry of a five-component protein mixture consisting of cytochrome c, lysozyme, myoglobin, trypsin inhibitor and carbonic anhydrase, produced by computer software employing a method according to the present teachings;
FIG. 13B is a diagram of a computer screen information display showing peak cluster decomposition results produced by computer software employing a method according to the present teachings, the display showing an expanded portion of the decomposition results shown in FIG. 13A.
FIG. A1 shows a mass spectrum and a series of m/z values studied by the methods taught in the appendix.
Detailed Description
The following description is presented to enable any person skilled in the art to make and use the invention, and is provided in the context of a particular application and its requirements. Various modifications to the described embodiments will be readily apparent to those skilled in the art, and the generic principles herein may be applied to other embodiments. Therefore, the present invention is not intended to be limited to the embodiments and examples shown, but is intended to be accorded the widest scope possible consistent with the claims. The specific features and advantages of the present invention will become more apparent in conjunction with the following discussion with reference to fig. 1-13.
FIG. 5A is a schematic example of a
Still referring to fig. 5A, the
FIG. 5B is a schematic diagram of a particular example
In operation of the
The quadrupole
The
The ion
As shown in fig. 5B,
Data collection for model development
Dissociation mass spectral data (MS/MS tandem mass spectral data) were collected on the following 11 protein standards: ubiquitin (~8 kDa), cytochrome c (~12 kDa), lysozyme (~14 kDa), RNAse A (~14 kDa), myoglobin (~17 kDa), trypsin inhibitor (~19 kDa), rituximab LC (~25 kDa), carbonic anhydrase (~29 kDa), GAPDH (~35 kDa), enolase (~46 kDa) and bovine serum albumin (~66 kDa). The samples were introduced by direct injection and ionized by electrospray ionization. These proteins were selected for model construction because of their well-known fragmentation patterns and their performance as typical top-down protein standards. Approximately 10 charge states of each protein were selected for MS/MS analysis by HCD dissociation. In these experiments, the absolute collision energy (CE) applied to each precursor ion was varied from 5 to 50 eV in steps of 1 electron volt (eV). From the resulting decay curves, a logistic regression fit was obtained for each charge state analyzed. The metrics $D_p$ and $D_E$ were calculated for each mass spectrum, and these values were then used to build predictive models for the CE (i.e., a function of the precursor MW and z) required to achieve a range of D values.
Precursor decay model
For each protein standard, the residual precursor ion intensity relative to the measured total ion current, $D_p$, was calculated at each absolute collision energy (CE) for each precursor ion charge state z. The variation of $D_p$ with CE follows a standard decay curve as shown in FIG. 6A, where decay curves 302, 304, 306 and 308 represent precursor ion decay curves for the +22, +24, +26 and +28 charge states of carbonic anhydrase, respectively. The inventors modeled this variation by logistic regression:
$\mathrm{CE} = c + (1/k)\,\ln(1/D_p - 1)$
where the parameter c represents the CE at which 50% of the precursor remains, and the parameter k sets the (negative) slope of the decay curve at c.
$c = 0.0018 \times \mathrm{MW}^{1.6} \times z^{-2.2}$ (Equation 3)
$k = 0.00025 \times \mathrm{MW}^{1.7} \times z^{1.9}$ (Equation 4)
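As a minimal sketch of the logistic decay model, the snippet below assumes the standard logistic form D_p = 1 / (1 + exp(k (CE - c))), which is consistent with c being the CE at which 50% of the precursor remains. The function names and the numerical values of c and k in the test are illustrative assumptions, not values taken from the disclosure.

```python
import math


def residual_precursor_fraction(ce, c, k):
    """Assumed logistic decay: fraction of precursor remaining at collision energy ce."""
    return 1.0 / (1.0 + math.exp(k * (ce - c)))


def ce_for_residual_fraction(d_p, c, k):
    """Invert the assumed logistic curve: CE that leaves a fraction d_p of the precursor."""
    if not 0.0 < d_p < 1.0:
        raise ValueError("d_p must lie strictly between 0 and 1")
    return c + math.log(1.0 / d_p - 1.0) / k
```

With this form, asking for D_p = 0.5 returns c itself, and smaller target values of D_p (more dissociation) return collision energies above c whenever k is positive.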
Using
After the step of modeling each decay curve by the logistic regression of
$\mathrm{CE}(D_p) = a1 \times \mathrm{MW}^{a2} \times z^{a3}$ (Equation 5)
where a1, a2 and a3 are parameters that are pre-calculated and tabulated for each $D_p$ value of interest. These parameters are provided for selected values of $D_p$ in Table 2, shown in FIG. 6B.
Entropy model
For centroid product ion mass spectra, another measure of the degree of dissociation, total spectral entropy, is defined as follows:
$E_{total} = -\sum_i p_i \ln(p_i)$
where $p_i$ is the centroid intensity (or area) of the mass spectral peak with index i (in m/z), normalized by the total intensity (or area) of all such peaks, or by the total ion current (TIC). The sum runs over all centroids in the mass spectrum (all i). As described above, the calculated value of the total spectral entropy of the HCD product ion spectrum was found to closely reflect the degree of dissociation observed in the data, up to an $E_{total}$ of about 0.7, beyond which the position of the ion current becomes an important consideration (FIG. 7A). To enhance the ability to distinguish (or resolve) "ideal dissociation" from the excessive fragmentation range (high total spectral entropy), the total entropy is divided into a first partial entropy ($E_1$) and a second partial entropy ($E_2$), where $E_1$ represents the entropy of the region of the MS/MS spectrum from the minimum m/z to half of the precursor ion m/z, and $E_2$ represents the entropy of the spectral region from half of the precursor ion m/z to the final m/z (FIG. 7B). Thus, E is calculated using
FIG. 8A shows the calculated $E_{total}$, $E_1$ and $E_2$ for selected precursor ion charge states of myoglobin, a protein of approximately 17 kDa from the model dataset.
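The entropy metrics described above can be illustrated with a short sketch. It assumes the conventional sign for entropy (a negative sum of p ln p, so that values are non-negative), assumes that both partial entropies use the same total-intensity normalization (so that they add up to the total), and applies no further scaling, which may differ from the scale used in the figures. The function name and argument layout are hypothetical.

```python
import math


def spectral_entropies(mz_values, intensities, precursor_mz):
    """Return (E_total, E1, E2) for a centroided product ion spectrum.

    E1 covers centroids below half the precursor m/z; E2 covers the rest.
    """
    total = sum(intensities)
    if total <= 0:
        raise ValueError("spectrum has no positive intensity")
    split = precursor_mz / 2.0
    e1 = 0.0
    e2 = 0.0
    for mz, intensity in zip(mz_values, intensities):
        if intensity <= 0:
            continue  # a zero-intensity centroid contributes nothing
        p = intensity / total
        term = -p * math.log(p)
        if mz < split:
            e1 += term
        else:
            e2 += term
    return e1 + e2, e1, e2
```

For a spectrum with two equal centroids, one on each side of half the precursor m/z, this gives a total of ln 2 split evenly between the two partial entropies.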
Considering all protein profiles, it was observed that: (a) the $E_1$ values increase monotonically over the CE range of interest; (b) the $E_1$ curves are much smoother than the $E_2$ curves; and (c) all $E_1$ curves can be modeled well by logistic regression. The disadvantage of using only $E_1$ data is that the curves are relatively featureless, so it is difficult to normalize the different $E_1$ values. However, use can be made of the fact that each $E_2$ curve almost always contains a well-defined maximum, which defines a reference CE for each charge state of each protein standard. Accordingly, the inventors modeled the relationship among MW, precursor z and the CE value at the maximum of the $E_2$ curve, which results in the following Equation 7:
$\mathrm{CE}_{E2max} = 0.1 \times \mathrm{MW}^{0.93} \times z^{-1.5}$ (Equation 7)
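The power-law form shared by the predictive models can be evaluated in a few lines. The sketch below reads the coefficients of Equation 7 as 0.1, 0.93 and -1.5; the function names are hypothetical, and the same helper could be reused with tabulated coefficients for other target values.

```python
def power_law_ce(mw, z, coeff, mw_exp, z_exp):
    """Evaluate CE = coeff * MW**mw_exp * z**z_exp (generic power-law model)."""
    if mw <= 0 or z <= 0:
        raise ValueError("mw and z must be positive")
    return coeff * mw ** mw_exp * z ** z_exp


def reference_ce_e2max(mw, z):
    """Reference CE at the E2 maximum, using the Equation 7 coefficients as read above."""
    return power_law_ce(mw, z, 0.1, 0.93, -1.5)
```

Because the exponent on z is negative, a higher charge state of the same protein is assigned a lower reference collision energy, while a heavier protein at the same charge state is assigned a higher one.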
Applying this set of reference CE values to the $E_1$ curves, the $E_1$ value at the $E_2$ maximum can be determined for each charge state of each protein standard. Furthermore, by fitting a logistic function to each $E_1$ curve, a CE that yields any desired fractional value of the reference entropy can be defined for each z of each standard. This fractional reference entropy becomes the new parameter $D_E$. Specifically, for any particular z, $D_E$ is defined as the fraction of a reference entropy, where the reference is the value of the first partial entropy $E_1$ at the collision energy $\mathrm{CE}_{E2max}$ that corresponds to the maximum of the second partial entropy $E_2$. The CE values for any particular fractional entropy value can be fitted to a power function of the form
$\mathrm{CE}(D_E) = b1 \times \mathrm{MW}^{b2} \times z^{b3}$ (Equation 9)
where b1, b2 and b3 are parameters that are pre-calculated and tabulated for various values of $D_E$, as shown in Table 3 appearing in FIG. 8B. As expected, at $D_E$ = 1 we recover
The b1, b2 and b3 values listed in each row of Table 3 correspond to a certain product ion spread ("entropy fraction") $D_E$, the spread being given by
Real-time fine calibration
Small instrument-to-instrument variability and drift over time for any particular instrument are anticipated. In view of this, a mechanism is provided to automatically correct for variability that results in a fixed offset for any given model. For example, given the entropy model, if $D_E$ is set to 0.68 and the rolling average $D_E$ of the most recent mass spectra (e.g., the latest 100 mass spectra) differs from this value by more than ±15%, the system should automatically adjust so that the actually measured $D_E$ moves closer to the desired "target" $D_E$. We expect that a simple multiplicative correction coefficient will suffice, without modifying the coefficients of the underlying equations.
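One way such a correction could be organized is sketched below. The class name, the update rule (scaling the coefficient by target / rolling mean, on the assumption that a higher CE raises the measured entropy fraction) and the decision to clear the window after each adjustment are all assumptions made for illustration; the text only calls for a multiplicative coefficient triggered by a deviation of more than 15% in the rolling average.

```python
from collections import deque


class EntropyDriftCorrector:
    """Hypothetical rolling-average corrector for the entropy fraction D_E."""

    def __init__(self, target, window=100, tolerance=0.15):
        self.target = target
        self.tolerance = tolerance
        self.recent = deque(maxlen=window)  # most recent measured D_E values
        self.factor = 1.0                   # multiplicative CE correction

    def update(self, measured_d_e):
        """Record one measured D_E and return the current correction factor."""
        self.recent.append(measured_d_e)
        if len(self.recent) < self.recent.maxlen:
            return self.factor  # wait until the window is full
        mean = sum(self.recent) / len(self.recent)
        if mean > 0 and abs(mean - self.target) > self.tolerance * self.target:
            # Assumed rule: push CE up when D_E runs low, down when it runs high.
            self.factor *= self.target / mean
            self.recent.clear()  # start a fresh window after adjusting
        return self.factor

    def corrected_ce(self, model_ce):
        """Apply the correction to a CE predicted by the unmodified model."""
        return self.factor * model_ce
```

The model coefficients themselves are never touched; only the single factor applied to the predicted CE changes.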
New method for adapting conventional charge state correction coefficients
FIG. 9A shows a comparison between collision energies calculated conventionally, for example as taught in U.S. Pat. No. 6,124,591, for z = 5 and a relative collision energy (RCE) of 35% (curve 703), and collision energies calculated according to the entropy model using an entropy fraction $D_E$ of 1.0 (curve 704). For the entropy model calculation, the molecular weight was calculated as (m/z - 1.007) × z. Like the NCE curve (by definition, a straight line), the curve computed from the entropy model appears linear over the relevant m/z range of 500 to 2000. It should therefore be possible to apply a scaling factor to the NCE curve to obtain a fitted curve that matches the trend of the collision energy values calculated from the entropy model. In fact, the fitted
The resulting scaling factors for the first 5 charge states are significantly lower than 1, which means that the entropy model tends to assign lower collision energies than the standard NCE method using a default RCE value of 35%. Therefore, the scaling factors for z = {1..5} resulting from the fit differ significantly from the conventional correction factors used in the normalized collision energy model, and similar deviations are expected for "intermediate" charge states in the range of about 6 to 10 (when the RCE correction factors are extrapolated to charge states higher than 5). However, for compatibility reasons, modification of the established correction coefficients (Table 1) for the low charge states should be avoided.
To solve this problem, two methods are combined as follows: the curve of conventional correction coefficients is extrapolated in steps of-0.05 until intersecting the curve of scaling coefficients determined by curve fitting herein. The intersection was observed at z ≈ 10, which marks the transition of the traditional approach to the novel entropy approach described herein. The resulting scaling factors are shown in fig. 10 as
For z = {1..5}, the conventional correction coefficients given in Table 1 are used.
For z = {6..10}, the correction coefficients are extrapolated by decreasing the last value, f(5) = 0.75, in steps of 0.05, i.e., f(z = {6..10}) = {0.70, 0.65, 0.60, 0.55, 0.50}.
For z >10, the correction factor is given by the scale factor resulting from the above fit and normalized to the applied NCE correction factor of 0.75 (to avoid the use of double scaling).
The extended NCE coefficients are given in table 4 shown in fig. 11.
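The three-range rule above can be sketched as a small lookup function. Only f(5) = 0.75 and the extrapolated values for z = 6..10 come from the text; the other entries of the table used here for z = 1..4 are illustrative placeholders standing in for Table 1, and the values for z > 10 are delegated to a caller-supplied function because Table 4 is not reproduced here.

```python
# Placeholder stand-in for Table 1; only the z = 5 entry (0.75) is stated in the text.
ASSUMED_TABLE_1 = {1: 1.00, 2: 0.90, 3: 0.85, 4: 0.80, 5: 0.75}


def extended_correction_factor(z, table_1=ASSUMED_TABLE_1, factor_above_10=None):
    """Return the extended NCE correction coefficient for charge state z."""
    if z < 1:
        raise ValueError("charge state must be at least 1")
    if z <= 5:
        return table_1[z]  # conventional coefficients
    if z <= 10:
        # Extrapolate downward from f(5) in steps of 0.05.
        return round(table_1[5] - 0.05 * (z - 5), 2)
    if factor_above_10 is None:
        raise ValueError("no coefficient source supplied for z > 10")
    return factor_above_10(z)  # hypothetical lookup of the fitted, normalized values
```

Keeping the z > 10 branch as a separate callable leaves the conventional and extrapolated ranges untouched if the fitted values are later revised.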
Summary of examples of molecular weight calculation methods
The above models require knowledge of the molecular weight (MW) of the analyte in order to estimate the optimal collision energy for fragmenting selected ions of that analyte. When protein and polypeptide molecules are ionized by electrospray ionization, the ions mainly comprise intact molecules with multiple adducted protons. In this case, the charge on each major analyte ion species is simply equal to the number of adducted protons, and the molecular weight can be readily determined, at least in theory, provided that the individual multiply protonated molecular ion species represented in the mass spectrum can be identified and assigned to groups (i.e., charge state series) according to their molecular origin. Unfortunately, this identification and assignment is often complicated by the fact that typical mass spectra contain lines representing multiple overlapping charge state series, and by the fact that the signature of each ion species of a given charge state can be split by isotopic variants.
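For the idealized case of a single, clean charge state series, the arithmetic is short. The sketch below uses the relation MW = (m/z - 1.007) × z and the standard two-peak relation for adjacent charge states of the same multiply protonated molecule; the function names are hypothetical, and real spectra need the deconvolution described in the following paragraphs.

```python
PROTON_MASS = 1.007  # approximate mass of one adducted proton, in Da


def molecular_weight(mz, z):
    """Neutral molecular weight of a multiply protonated ion observed at mz with charge z."""
    return (mz - PROTON_MASS) * z


def charge_from_adjacent_peaks(mz_lower_charge, mz_higher_charge):
    """Charge z of the peak at mz_lower_charge, given the neighbouring peak with charge z + 1.

    The peak with the lower charge lies at the higher m/z.
    """
    if mz_lower_charge <= mz_higher_charge:
        raise ValueError("the lower-charge peak must lie at the higher m/z")
    z = (mz_higher_charge - PROTON_MASS) / (mz_lower_charge - mz_higher_charge)
    return round(z)
```

For two neighbouring peaks of a 16,950 Da protein at charges 20 and 21, the second function recovers z = 20 and the first then recovers the molecular weight.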
Since samples of biological origin are often very complex, a single MS mass spectrum can easily contain hundreds or even thousands of peaks belonging to different analytes, interwoven within a given m/z range, where ion signals of very different intensities overlap and suppress one another. The resulting computational challenge is to trace each peak back to a certain analyte or analytes. Eliminating "noise" and determining the correct charge assignments are the first steps in addressing this challenge. Once the charges of the peaks are determined, the charge states associated with an analyte can be grouped using the known relationships between charge states in a charge state series. This information can further be used to determine the molecular weight of one or more analytes in a process best described as mathematical decomposition (also known in the art as mathematical deconvolution).
Furthermore, the mathematical deconvolution required to identify the various overlapping charge state series must be performed in "real time" (i.e., as the mass spectral data are acquired), because the deconvolution results of the precursor ion mass spectra are immediately used to select the ion species to be dissociated and to determine the appropriate collision energy to apply during dissociation, which may vary from species to species. To be successful, a data acquisition strategy to predict multiple mass spectra for each ion species and an optimized real-time data analysis strategy are required. Typically, the deconvolution process should be completed in less than one second. An algorithm that achieves the desired analysis of complex samples within such time limits and operates as application software is described in U.S. pre-grant publication No. 2016/0268112 A1, the disclosure of which is incorporated herein by reference in its entirety. Alternatively, co-pending European patent application No. 16188157, filed on September 9, 2016, teaches another suitable mathematical deconvolution algorithm. The text of the aforementioned European patent application is included as an appendix to this document, and its drawing is included as FIG. A1 in the accompanying set of drawings. The algorithm can also be encoded in a hardware processor connected to the mass spectrometer so that it runs faster. The following paragraphs briefly summarize some of the main features of the computational deconvolution algorithm described in the above-mentioned patent application publication No. 2016/
Only the centroid is used.
Standard mass spectrometry charge assignment algorithms use the complete profile data of the lines in the mass spectrum. In contrast, the calculation method described in U.S. pre-grant publication No. 2016/0268112 A1 uses centroids. The main advantage of using centroids rather than line profiles is data reduction. Typically, the number of profile data points is about one order of magnitude greater than the number of centroids, so any algorithm that uses centroids gains a significant advantage in computational efficiency over the standard assignment method. For applications requiring real-time charge assignment, it is preferable to design an algorithm that requires only centroid data. The main drawback of using centroids is the inaccuracy of the m/z values. Factors such as mass accuracy, resolution and peak extraction efficiency tend to compromise the quality of the centroid data. However, these concerns can be greatly alleviated by taking m/z inaccuracies into account in algorithms that employ centroid data.
The intensity is binary.
As described in U.S. pre-grant publication No. 2016/0268112 A1, mass spectral line intensities are encoded as binary (or Boolean) variables (true/false or present/absent). The Boolean method only considers whether the centroid intensity is above a threshold. An intensity takes the Boolean value "True" if it meets user-settable criteria based on signal intensity, signal-to-noise ratio, or both; otherwise it is assigned the value "False", regardless of its actual magnitude. A well-known disadvantage of using Boolean values is the loss of information. However, when a large number of data points is available, e.g., the thousands of centroids in a typical high-resolution mass spectrum, the number of Boolean variables is more than enough to compensate for the loss of intensity information. The cited deconvolution algorithm takes advantage of this data abundance to achieve both efficiency and accuracy.
In an alternative embodiment, additional accuracy can be achieved without significant loss of computation speed by using approximate intensity values rather than only Boolean true/false variables. For example, one can envision comparing only peaks of similar height with one another. By discretizing the intensity values into a small number of low-resolution bins (e.g., "low," "medium," "high," and "very high"), this additional information can be easily accommodated. Such binning can strike a good balance, retaining "height information" without sacrificing the computational simplicity of a highly simplified intensity representation.
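The two intensity encodings just described can be sketched as follows. The threshold defaults, the bin edges and the function names are hypothetical, chosen only to make the example concrete; the text says only that the criteria are user-settable and that a small number of coarse bins is used.

```python
import bisect

BIN_LABELS = ("low", "medium", "high", "very high")


def to_boolean(intensity, noise, min_intensity=0.0, min_snr=3.0):
    """True if the centroid passes both an absolute and a signal-to-noise threshold."""
    return intensity >= min_intensity and intensity >= min_snr * noise


def to_bin(intensity, edges=(1e4, 1e5, 1e6)):
    """Map an intensity to one of four coarse bins separated by three ascending edges."""
    return BIN_LABELS[bisect.bisect_right(edges, intensity)]
```

Either encoding discards the exact intensity; the binned version keeps just enough information to restrict comparisons to peaks of similar height.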
To achieve computational efficiency comparable to that obtained with Boolean variables alone while still incorporating intensity information, one approach is to encode each intensity in a single byte, the same size as a Boolean variable. This is easily achieved by using the logarithm of the intensity (rather than the raw intensity) with an appropriate logarithm base, and converting the logarithm to an integer. If the base is chosen appropriately, the log(intensity) values all fall comfortably within the range 0-255, which can be represented in one byte. In addition, rounding errors in converting double-precision variables to integers can be minimized by careful selection of the logarithm base.
To further minimize any performance degradation that might be caused by byte arithmetic (rather than Boolean arithmetic), the computations for separating or grouping centroids may only need intensity ratios rather than the byte-valued intensities themselves. The calculation of a ratio is very efficient because: 1) the logarithm of a ratio requires no floating-point division, only a difference of logarithms, which here reduces to a subtraction of two bytes; and 2) recovering the actual ratio from the difference of the logarithms requires only exponentiation of that difference. Since such a calculation encounters only a limited, predefined set of exponents (i.e., all possible integer differences between two bytes, -255 to +255), the exponentials can be pre-calculated and stored in a lookup array. Therefore, using a byte representation of the log intensity together with a pre-computed exponential lookup array does not compromise computational efficiency.
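A minimal sketch of the byte encoding and ratio lookup follows. The way the logarithm base is chosen here (so that an assumed maximum intensity maps to 255) is one possible choice, not necessarily the one used in the cited publication, and all names are hypothetical.

```python
import math


def choose_log_base(max_intensity):
    """Pick a base such that log_base(max_intensity) equals 255 (assumed strategy)."""
    return max_intensity ** (1.0 / 255.0)


def encode_intensity(intensity, base):
    """Encode an intensity as an integer in 0..255 using the rounded logarithm."""
    if intensity < 1.0:
        return 0
    return min(255, round(math.log(intensity, base)))


def build_ratio_table(base):
    """Pre-compute base**d for every possible byte difference d in -255..255."""
    return [base ** d for d in range(-255, 256)]


def intensity_ratio(byte_a, byte_b, table):
    """Approximate intensity_a / intensity_b with one subtraction and one lookup."""
    return table[byte_a - byte_b + 255]
```

With an assumed maximum intensity of 1e10, the base is about 1.09, so each byte step corresponds to roughly a 9% change in intensity and a recovered ratio is accurate to within about one such step.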
Binning of mass-to-charge ratios
As described in U.S. pre-grant publication No. 2016/0268112 A1, the mass-to-charge ratio values are converted and assembled into low-resolution bins, and the relative charge state intervals are pre-calculated once and cached for efficiency. In addition, the m/z values of the mass spectral lines are converted from their normal linear scale in daltons to a more natural dimensionless logarithmic representation. This conversion greatly simplifies the calculation of the m/z values of peaks that belong, for example, to the same protein but potentially represent different charge states. The conversion does not affect accuracy. The cached relative m/z values can then be used to improve computational efficiency when computing with the converted variables.
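The benefit of a logarithmic m/z axis can be illustrated with a short sketch. It assumes multiply protonated ions, so that ln(m/z - 1.007) = ln(MW) - ln(z); under that assumption the spacing between two charge states depends only on the two charges and can be cached once. Subtracting the proton mass before taking the logarithm, the bin width and all names are assumptions for illustration, not details taken from the cited publication.

```python
import math

PROTON_MASS = 1.007
MAX_CHARGE = 50

# Cached once: LOG_Z[z] = ln(z) for z = 1..MAX_CHARGE (index 0 is unused).
LOG_Z = [0.0] + [math.log(z) for z in range(1, MAX_CHARGE + 1)]


def log_coordinate(mz):
    """Transform an m/z value to the assumed logarithmic coordinate."""
    return math.log(mz - PROTON_MASS)


def expected_log_coordinate(log_mz, z_from, z_to):
    """Where the same molecule would appear at charge z_to, given its peak at charge z_from."""
    return log_mz + LOG_Z[z_from] - LOG_Z[z_to]


def bin_index(log_mz, bin_width=1e-4):
    """Assign a logarithmic coordinate to a low-resolution bin (width is hypothetical)."""
    return int(round(log_mz / bin_width))
```

Finding candidate partner peaks for a centroid then reduces to adding cached offsets and comparing bin indices, with no multiplication or division by the unknown molecular weight.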
Charge state scores based on simple counts and statistical selection criteria.
All relevant mass spectra were encoded as a boolean array as described in U.S. pre-grant publication No. 2016/
Iterative optimization of charge state assignments
The teachings of the aforementioned U.S. pre-grant publication No. 2016/0268112 A1 use an iterative process defined by complete self-consistency of the charge assignments. The final key feature of the method is the use of an appropriate optimality condition to steer the charge assignments toward a solution. The optimality condition is simply defined as the most consistent assignment of charges to all centroids in the mass spectrum. The rationale for this condition is that the charge state assigned to each centroid should be consistent with the charge states assigned to the other centroids in the mass spectrum. The algorithm described in said publication implements an iterative process that guides the generation of charge state assignments according to this optimality condition. The process conforms to the accepted norms for an optimization procedure: an appropriate optimality condition is first defined, an algorithm is then designed to satisfy it, and finally the effectiveness of the algorithm is judged by how well it satisfies the optimality condition.
Examples of mass spectrum deconvolution results
FIG. 13A shows the deconvolution results for a five protein mixture consisting of cytochrome c, lysozyme, myoglobin, trypsin inhibitor, and carbonic anhydrase, where deconvolution was performed according to the teachings of U.S. Pub. No. 2016/
Figure 12 is a flow chart of a method (method 800) for tandem mass spectrometry analysis of a protein or polypeptide using automated collision energy determination according to the present teachings. In
After the first generation ions are introduced into the mass spectrometer, the first generation ions are mass analyzed in
In
The optimal collision energy may be calculated in
In
If there are any remaining non-fragmented selected precursor ion species after
Conclusion: model testing
The precursor decay model and the entropy model were tested by incorporating the parameters $D_p$ and $D_E$, together with the mass spectrum deconvolution algorithm of the aforementioned U.S. pre-grant publication No. 2016/0268112 A1, into existing data acquisition control software. The protein fraction of E. coli cell lysates was analyzed by MS/MS analysis of liquid chromatography fractions using the precursor ion decay and product ion entropy models, and also using various optimized fixed normalized collision energies. In these experiments, calculating the optimal collision energy with either model improved control of the degree of dissociation relative to the optimized fixed conventional normalized collision energy scheme. Using the methods of the present teachings, this improved fragmentation led to improvements in protein identification in various data sets.
Appendix: method for identifying monoisotopic mass of molecular species