feature_grouping

Overview

The feature_grouping module provides functions that group features by their m/z values, retention times, MS2 data, and scan-to-scan correlation. It is designed for untargeted metabolomics workflows, where several features may represent the same compound as isotopes, in-source fragments, or adducts. Grouping can be performed either across aligned files, based on the reference file, or within a single file.

Functions

group_features_after_alignment

group_features_after_alignment(features, params: Params)

Groups features after alignment based on the reference file. This function requires reloading the raw data to examine the scan-to-scan correlation between features. The annotated feature groups are stored in the feature_group_id attribute of the AlignedFeature objects.

Parameters:

  • features (list): A list of AlignedFeature objects.
  • params (Params object): A Params object that contains the parameters for feature grouping.

group_features_single_file

group_features_single_file(d)

Groups features from a single file based on the m/z, retention time, MS2, and scan-to-scan correlation. The annotated feature groups are stored in the feature_group_id attribute of the Feature objects.

Parameters:

  • d (MSData object): An MSData object containing the detected ROIs to be grouped.

generate_search_dict

generate_search_dict(feature, adduct_form, ion_mode)

Generates a search dictionary for feature grouping based on the adduct form and ionization mode.

Parameters:

  • feature (Feature object): The feature object to be grouped.
  • adduct_form (str): The adduct form of the feature.
  • ion_mode (str): The ionization mode, either "positive" or "negative".

Returns:

  • dict: A dictionary containing the possible adducts and in-source fragments.

find_isotope_signals

find_isotope_signals(mz, signals, mz_tol=0.015, charge_state=1, num=5)

Finds isotope patterns from the MS1 signals based on the m/z value and intensity.

Parameters:

  • mz (float): The m/z value of the feature.
  • signals (np.array): The MS1 signals as [[m/z, intensity], …].
  • mz_tol (float): The m/z tolerance to find isotopes (default 0.015 Da).
  • charge_state (int): The charge state of the feature (default 1).
  • num (int): The maximum number of isotopes to be found (default 5).

Returns:

  • np.array: The m/z and intensity of the isotopes.
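The search can be pictured with a minimal sketch. This is not the module's implementation: the function name find_isotopes_sketch, the use of the 13C/12C mass difference as the isotope spacing, the rule of keeping the most intense match, and stopping at the first missing isotope are all assumptions made for illustration.

```python
import numpy as np

# Mass difference between 13C and 12C in Da (assumed isotope spacing).
C13_C12_DELTA = 1.003355


def find_isotopes_sketch(mz, signals, mz_tol=0.015, charge_state=1, num=5):
    """Hypothetical sketch: collect up to `num` isotope peaks above `mz`."""
    signals = np.asarray(signals, dtype=float).reshape(-1, 2)
    spacing = C13_C12_DELTA / charge_state
    found = []
    for k in range(1, num + 1):
        target = mz + k * spacing
        # Indices of signals whose m/z lies within the tolerance of the target.
        candidates = np.where(np.abs(signals[:, 0] - target) <= mz_tol)[0]
        if candidates.size == 0:
            break  # assumed rule: stop at the first gap in the isotope ladder
        # Assumed rule: keep the most intense candidate.
        best = candidates[np.argmax(signals[candidates, 1])]
        found.append(signals[best])
    return np.array(found).reshape(-1, 2)


ms1 = [[200.0, 1000.0], [201.0034, 110.0], [202.0067, 8.0], [250.0, 500.0]]
isotopes = find_isotopes_sketch(200.0, ms1)
```

Here isotopes holds the two peaks near 201.0034 and 202.0067; the unrelated signal at m/z 250 is ignored.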

scan_to_scan_cor_intensity

scan_to_scan_cor_intensity(a, b)

Calculates the scan-to-scan correlation between two intensity profiles using Pearson correlation.

Parameters:

  • a (np.array): Intensity array of the first m/z.
  • b (np.array): Intensity array of the second m/z.

Returns:

  • float: The scan-to-scan correlation between the two features.
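For reference, the Pearson correlation of two intensity arrays can be computed as in the sketch below. The helper name pearson_intensity_correlation and the choice to return 0.0 for constant or mismatched inputs are assumptions for illustration; the module's own handling of such edge cases may differ.

```python
import numpy as np


def pearson_intensity_correlation(a, b):
    """Hypothetical helper: Pearson correlation of two equal-length arrays."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape or a.size < 2:
        return 0.0  # assumed fallback for unusable input
    a_centered = a - a.mean()
    b_centered = b - b.mean()
    denom = np.sqrt(np.sum(a_centered ** 2) * np.sum(b_centered ** 2))
    if denom == 0:
        return 0.0  # assumed fallback when one profile is constant
    return float(np.sum(a_centered * b_centered) / denom)


# Two profiles that rise and fall together across the same scans.
r = pearson_intensity_correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Co-eluting signals from the same compound are expected to give a value close to 1, which is why this measure is useful for grouping.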

scan_to_scan_correlation

scan_to_scan_correlation(feature_a, feature_b)

Calculates the scan-to-scan correlation between two features using Pearson correlation based on their intensity profiles.

Parameters:

  • feature_a (Feature object): The first feature object.
  • feature_b (Feature object): The second feature object.

Returns:

  • float: The scan-to-scan correlation between the two features.

get_charge_state

get_charge_state(mz_seq, valid_charge_states=[1,2])

Determines the charge state of the isotopes based on the m/z sequence.

Parameters:

  • mz_seq (list): A list of m/z values of isotopes.
  • valid_charge_states (list): A list of valid charge states (default [1,2]).

Returns:

  • int: The charge state of the isotopes.
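One way to infer a charge state from an isotope m/z sequence is to compare the spacing between adjacent peaks with the expected spacing of about 1.003355/z. The sketch below illustrates that idea only; the function name infer_charge_state, the mz_tol parameter, the use of only the first two peaks, and the fallback to charge 1 are assumptions, not the module's documented behavior.

```python
def infer_charge_state(mz_seq, valid_charge_states=(1, 2), mz_tol=0.01):
    """Hypothetical sketch: infer charge from the first isotope spacing."""
    if len(mz_seq) < 2:
        return 1  # assumed default when no spacing is available
    spacing = mz_seq[1] - mz_seq[0]
    for z in valid_charge_states:
        # Isotope peaks of a z-charged ion are about 1.003355 / z apart.
        if abs(spacing - 1.003355 / z) <= mz_tol:
            return z
    return 1  # assumed default when no candidate charge matches


singly = infer_charge_state([200.0, 201.0034])
doubly = infer_charge_state([200.0, 200.5017])
```

A tuple is used for the default here instead of a list to avoid a mutable default argument.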

Constants

ADDUCT_POS

A dictionary of positive adducts with the adduct form as the key and the m/z shift, charge state, and multiplier as the values.

ADDUCT_NEG

A dictionary of negative adducts with the adduct form as the key and the m/z shift, charge state, and multiplier as the values.
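To show how such an adduct table can be used, the sketch below builds a small example dictionary and converts between neutral mass and m/z. The dictionary EXAMPLE_ADDUCTS_POS, the value order (shift, charge, multiplier), the interpretation of the shift as the total mass added before dividing by the charge, and both helper functions are assumptions for illustration; they are not the contents of ADDUCT_POS.

```python
# Hypothetical table: adduct form -> (mass shift in Da, charge state, multiplier).
EXAMPLE_ADDUCTS_POS = {
    "[M+H]+": (1.007276, 1, 1),
    "[M+Na]+": (22.989218, 1, 1),
    "[2M+H]+": (1.007276, 1, 2),
    "[M+2H]2+": (2.014552, 2, 1),
}


def adduct_mz(neutral_mass, adduct):
    """Expected m/z of `adduct` for a compound with the given neutral mass."""
    shift, charge, multiplier = EXAMPLE_ADDUCTS_POS[adduct]
    return (multiplier * neutral_mass + shift) / charge


def neutral_mass_from_mz(mz, adduct):
    """Inverse of adduct_mz: neutral mass implied by an observed m/z."""
    shift, charge, multiplier = EXAMPLE_ADDUCTS_POS[adduct]
    return (mz * charge - shift) / multiplier


glucose_mass = 180.063388  # monoisotopic mass of C6H12O6 in Da
protonated = adduct_mz(glucose_mass, "[M+H]+")
```

Given a feature assumed to be [M+H]+, the same arithmetic yields the m/z values at which its other adducts would be expected.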