feature_detection

Overview

Untargeted feature detection.

Classes

`Feature`

A class to store a feature characterized by a unique pair of m/z and retention time.

Attributes:

rt_seq (list of float): Retention time sequence.
signals (list): Signal sequence organized as [[m/z, intensity], …].
scan_idx_seq (list of int): Scan index sequence.
ms2_seq (list): MS2 spectra.
gap_counter (int): Counter for the number of consecutive zeros in the peak tail.
id (int): Feature ID.
feature_group_id (int): Peak group ID.
mz (float): m/z value.
rt (float): Retention time.
scan_idx (int): Scan index of the peak apex.
peak_height (float): Peak height.
peak_area (float): Peak area.
top_average (float): Average of the highest three intensities.
ms2 (object): Representative MS2 spectrum.
length (int): Number of valid scans in the feature.
gaussian_similarity (float): Gaussian similarity.
noise_score (float): Noise score.
asymmetry_factor (float): Asymmetry factor.
sse (float): Squared error to the smoothed curve.
is_segmented (bool): Indicates if the feature is segmented.
is_isotope (bool): Indicates if the feature is an isotope.
charge_state (int): Charge state of the feature.
isotope_signals (list): Isotope signals [[m/z, intensity], …].
is_in_source_fragment (bool): Indicates if the feature is an in-source fragment.
adduct_type (str): Adduct type.
annotation_algorithm (str): Annotation algorithm.
search_mode (str): Search mode (‘identity search’, ‘fuzzy search’, or ‘mzrt_search’).
similarity (float): Similarity score (0-1).
annotation (str): Name of annotated compound.
formula (str): Molecular formula.
matched_peak_number (int): Number of matched peaks.
smiles (str): SMILES notation.
inchikey (str): InChIKey notation.
matched_precursor_mz (float): Matched precursor m/z.
matched_ms2 (object): Matched MS2 spectra.
matched_adduct_type (str): Matched adduct type.

Methods:

extend(self, rt, signal, scan_idx): Extends the chromatographic peak with new data points.
get_mz_error(self): Calculates the 3*sigma error of the feature’s m/z.
get_rt_error(self): Calculates the 3*sigma error of the feature’s retention time.
summarize(self, ph=True, pa=True, ta=True, g_score=True, n_score=True, a_score=True): Summarizes the feature by calculating summary statistics.
subset(self, start, end, summarize=True): Keeps a subset of the feature based on start and end positions.
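The summary statistics named above can be illustrated with a small standalone sketch. It is not the library's implementation: the function name `summarize_signals` is hypothetical, and the use of trapezoidal integration for the peak area and of the population standard deviation for the 3*sigma m/z error are assumptions made for illustration. Only the input layout (`rt_seq` and `signals` as [[m/z, intensity], …]) and the "highest three intensities" definition of `top_average` come from the attribute list.

```python
from statistics import pstdev


def summarize_signals(rt_seq, signals):
    """Hypothetical helper: summary statistics from a feature's raw sequences.

    rt_seq  : list of float, retention time of each scan
    signals : list of [m/z, intensity] pairs, one per scan
    """
    intensities = [pair[1] for pair in signals]

    # Index of the apex, i.e. the scan with the highest intensity.
    apex = max(range(len(intensities)), key=intensities.__getitem__)
    peak_height = intensities[apex]

    # Assumption: peak area by trapezoidal integration over retention time.
    peak_area = sum(
        0.5 * (intensities[i] + intensities[i + 1]) * (rt_seq[i + 1] - rt_seq[i])
        for i in range(len(intensities) - 1)
    )

    # Average of the highest three intensities (fewer if the trace is shorter).
    top = sorted(intensities, reverse=True)[:3]
    top_average = sum(top) / len(top)

    # Assumption: 3*sigma m/z error over scans with non-zero intensity.
    mzs = [mz for mz, inten in signals if inten > 0]
    mz_error = 3 * pstdev(mzs) if len(mzs) > 1 else 0.0

    return {
        "rt": rt_seq[apex],
        "peak_height": peak_height,
        "peak_area": peak_area,
        "top_average": top_average,
        "mz_error": mz_error,
    }
```

The zero-intensity scans at both ends of a trace contribute to the area only through their neighbours, which is why they are excluded from the m/z error but kept for the integration.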

Functions

`detect_features`

detect_features(d)

Detects features in the MS data.

Parameters:

d (MSData object): An object that contains the MS data.

Returns:

final_features (list of Feature objects): A list of detected features.
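Because the return value is a plain list of `Feature` objects, it can be filtered and sorted with ordinary Python. The sketch below is an illustration only: `select_features` and its `min_peak_height` argument are hypothetical, and `SimpleNamespace` stand-ins are used in place of real `Feature` objects so that the snippet runs without any MS data. The attribute names (`id`, `mz`, `rt`, `peak_height`, `is_isotope`, `is_in_source_fragment`) are the ones listed for `Feature` above.

```python
from types import SimpleNamespace


def select_features(features, min_peak_height=1000.0):
    """Hypothetical helper: drop isotopes, in-source fragments, and weak peaks.

    Returns the remaining features sorted by descending peak height.
    """
    kept = [
        f for f in features
        if not f.is_isotope
        and not f.is_in_source_fragment
        and f.peak_height >= min_peak_height
    ]
    return sorted(kept, key=lambda f: f.peak_height, reverse=True)


# Stand-ins for Feature objects; the values are made up for the demonstration.
demo_features = [
    SimpleNamespace(id=0, mz=180.0634, rt=1.25, peak_height=5.0e4,
                    is_isotope=False, is_in_source_fragment=False),
    SimpleNamespace(id=1, mz=181.0667, rt=1.25, peak_height=3.0e3,
                    is_isotope=True, is_in_source_fragment=False),
    SimpleNamespace(id=2, mz=255.2330, rt=7.80, peak_height=8.0e5,
                    is_isotope=False, is_in_source_fragment=False),
    SimpleNamespace(id=3, mz=90.0550, rt=3.10, peak_height=400.0,
                    is_isotope=False, is_in_source_fragment=False),
]

for f in select_features(demo_features):
    print(f.id, f.mz, f.rt, f.peak_height)
```

With real data, the list returned by `detect_features(d)` would take the place of `demo_features`.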

`segment_feature`

segment_feature(feature, sigma=1.2, prominence_ratio=0.05, distance=10, peak_height_tol=1000, length_tol=5, sse_tol=0.5)

Segments a feature into multiple features based on edge detection.

Parameters:

feature (Feature object): The feature to segment.
sigma (float): The sigma value for the Gaussian filter. Default is 1.2.
prominence_ratio (float): The prominence ratio for finding peaks. Default is 0.05.
distance (int): The minimum distance between peaks. Default is 10.
peak_height_tol (float): The peak height tolerance for segmentation. Default is 1000.
length_tol (int): The length tolerance for segmentation. Default is 5.
sse_tol (float): The squared error tolerance for segmentation. Default is 0.5.

Returns:

segmented_features (list of Feature objects): A list of segmented features.
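To show how `sigma`, `prominence_ratio`, and `distance` might interact, here is a simplified, standard-library-only sketch of splitting an intensity trace at the valley between two apexes. It is a stand-in, not the `segment_feature` algorithm: the function names are hypothetical, a plain height threshold (`prominence_ratio` times the tallest smoothed value) replaces true peak prominence, and `peak_height_tol`, `length_tol`, and `sse_tol` are not modelled.

```python
import math


def gaussian_smooth(y, sigma=1.2):
    """Smooth a sequence with a truncated, normalized Gaussian kernel."""
    radius = max(1, int(4 * sigma + 0.5))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    total = sum(kernel)
    kernel = [w / total for w in kernel]
    n = len(y)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for k, w in zip(offsets, kernel):
            j = min(max(i + k, 0), n - 1)  # clamp indices at the edges
            acc += w * y[j]
        smoothed.append(acc)
    return smoothed


def find_apexes(smoothed, prominence_ratio=0.05, distance=10):
    """Local maxima above a height threshold, at least `distance` scans apart.

    Assumption: a height threshold stands in for true peak prominence.
    """
    threshold = prominence_ratio * max(smoothed)
    candidates = [
        i for i in range(1, len(smoothed) - 1)
        if smoothed[i] > smoothed[i - 1]
        and smoothed[i] >= smoothed[i + 1]
        and smoothed[i] >= threshold
    ]
    kept = []
    # Accept the tallest candidates first, then enforce the minimum spacing.
    for i in sorted(candidates, key=lambda idx: smoothed[idx], reverse=True):
        if all(abs(i - j) >= distance for j in kept):
            kept.append(i)
    return sorted(kept)


def split_points(smoothed, apexes):
    """Index of the lowest smoothed value between each pair of adjacent apexes."""
    return [
        min(range(a, b + 1), key=smoothed.__getitem__)
        for a, b in zip(apexes, apexes[1:])
    ]


# Synthetic trace with two overlapping-free Gaussian peaks (made-up data).
trace = [
    1000 * math.exp(-0.5 * ((i - 15) / 3) ** 2)
    + 800 * math.exp(-0.5 * ((i - 45) / 3) ** 2)
    for i in range(60)
]
smoothed = gaussian_smooth(trace, sigma=1.2)
apexes = find_apexes(smoothed, prominence_ratio=0.05, distance=10)
cuts = split_points(smoothed, apexes)
print(apexes, cuts)
```

Each cut index could then be passed as a boundary to `Feature.subset(start, end)` to produce the individual segments, under the assumption that `start` and `end` are positions in the feature's scan sequence.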

Please see the source code for more internal functions.
