alignment
Overview
This module provides functionality for aligning metabolic features from different samples in mass spectrometry data.
- Feature alignment: Align features across different samples, considering parameters like m/z tolerance and retention time tolerance.
- Gap filling: Fill in missing features across aligned samples using various strategies.
- Merge features: Clean the feature table by merging features with almost the same m/z and retention time.
- Retention time correction: Correct retention times to align features more accurately.
- Output feature table: Save the aligned features to a file.
Classes
AlignedFeature
A class to model a feature in mass spectrometry data. Generally, a feature is defined as a unique pair of m/z and retention time.
Attributes:
- feature_id_arr (np.array): Feature ID from individual files (-1 if not detected or gap filled).
- mz_arr (np.array): m/z values.
- rt_arr (np.array): Retention times.
- scan_idx_arr (np.array): Scan index of the peak apex.
- peak_height_arr (np.array): Peak height.
- peak_area_arr (np.array): Peak area.
- top_average_arr (np.array): Average of the highest three intensities.
- ms2_seq (list): Representative MS2 spectrum from each file (default: highest total intensity).
- length_arr (np.array): Length (i.e. non-zero scans in the peak).
- gaussian_similarity_arr (np.array): Gaussian similarity.
- noise_score_arr (np.array): Noise score.
- asymmetry_factor_arr (np.array): Asymmetry factor.
- sse_arr (np.array): Squared error to the smoothed curve.
- is_segmented_arr (np.array): Whether the peak is segmented.
- id (int): Index of the feature.
- feature_group_id (int): Feature group ID.
- mz (float): m/z.
- rt (float): Retention time.
- reference_file (str): The reference file with the highest peak height.
- reference_scan_idx (int): The scan index of the peak apex from the reference file.
- highest_intensity (float): The highest peak height from individual files (which is the reference file).
- ms2 (str): Representative MS2 spectrum.
- ms2_reference_file (str): The reference file for the representative MS2 spectrum.
- gaussian_similarity (float): Gaussian similarity from the reference file.
- noise_score (float): Noise level from the reference file.
- asymmetry_factor (float): Asymmetry factor from the reference file.
- detection_rate (float): Number of detected files / total number of files (blank not included).
- detection_rate_gap_filled (float): Number of detected files after gap filling / total number of files (blank not included).
- charge_state (int): Charge state.
- is_isotope (bool): Whether it is an isotope.
- isotope_signals (list): Isotope signals [[m/z, intensity], …].
- is_in_source_fragment (bool): Whether it is an in-source fragment.
- adduct_type (str): Adduct type.
- annotation_algorithm (str): Annotation algorithm. Not used now.
- search_mode (str): 'identity search', 'fuzzy search', or 'mzrt_search'.
- similarity (float): Similarity score (0-1).
- annotation (str): Name of annotated compound.
- formula (str): Molecular formula.
- matched_peak_number (int): Number of matched peaks.
- smiles (str): SMILES.
- inchikey (str): InChIKey.
- matched_precursor_mz (float): Matched precursor m/z.
- matched_adduct_type (str): Matched adduct type.
- matched_ms2 (str): Matched ms2 spectra.
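As a rough illustration of how detection_rate relates to feature_id_arr, the sketch below counts the files whose feature ID is not -1 and divides by the number of non-blank files. The helper name and the is_blank input are hypothetical, not part of this module, and plain lists stand in for NumPy arrays.

```python
def detection_rate(feature_id_arr, is_blank):
    """Fraction of non-blank files in which the feature was detected.

    feature_id_arr: per-file feature IDs, -1 where the feature was not detected.
    is_blank: per-file flags marking blank samples (hypothetical input).
    """
    # Blank samples are excluded from both the numerator and the denominator.
    ids = [fid for fid, blank in zip(feature_id_arr, is_blank) if not blank]
    if not ids:
        return 0.0
    detected = sum(1 for fid in ids if fid != -1)
    return detected / len(ids)


rate = detection_rate([12, -1, 7, 3, -1], [False, False, False, False, True])
print(rate)  # 0.75
```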
Functions
feature_alignment
feature_alignment(path: str, params: Params)
Align the features from multiple individually processed files stored in .txt format.
Parameters:
- path (str): The path to the feature tables of individual files.
- params (Params object): The parameters for alignment including sample names and sample groups.
Returns:
- features (list of AlignedFeature objects)
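The general idea of tolerance-based alignment can be sketched as follows. This is a simplified illustration, not the module's implementation: the function names, the (m/z, RT) tuple layout, and the default tolerances are all assumptions chosen for the example.

```python
def match_feature(mz, rt, aligned, mz_tol=0.01, rt_tol=0.2):
    """Return the index of the closest aligned feature within both tolerances, or -1."""
    best_idx, best_dist = -1, float("inf")
    for i, (ref_mz, ref_rt) in enumerate(aligned):
        d_mz, d_rt = abs(mz - ref_mz), abs(rt - ref_rt)
        if d_mz <= mz_tol and d_rt <= rt_tol:
            # Scale each difference by its tolerance so the two are comparable.
            dist = d_mz / mz_tol + d_rt / rt_tol
            if dist < best_dist:
                best_idx, best_dist = i, dist
    return best_idx


def align_files(files, mz_tol=0.01, rt_tol=0.2):
    """Align per-file feature lists (hypothetical layout: lists of (m/z, RT) tuples).

    Returns (aligned, table) where table[i][j] is the index of the feature in
    file j assigned to aligned feature i, or -1 if none was found.
    """
    aligned, table = [], []
    for j, features in enumerate(files):
        for k, (mz, rt) in enumerate(features):
            i = match_feature(mz, rt, aligned, mz_tol, rt_tol)
            # Start a new aligned feature if nothing matches or the slot is taken.
            if i == -1 or table[i][j] != -1:
                aligned.append((mz, rt))
                table.append([-1] * len(files))
                i = len(aligned) - 1
            table[i][j] = k
    return aligned, table


files = [
    [(180.0634, 1.50), (255.2320, 6.10)],
    [(180.0640, 1.55), (300.1000, 3.00)],
]
aligned, table = align_files(files)
print(table)  # [[0, 0], [1, -1], [-1, 1]]
```

The -1 entries in the table mirror the convention used by feature_id_arr for features that were not detected in a file.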
gap_filling
gap_filling(features, params: Params)
Fill the gaps for aligned features.
Parameters:
- features (list of AlignedFeature objects): The aligned features.
- params (Params object): The parameters used for gap filling.
Returns:
- features (list of AlignedFeature objects)
merge_features
merge_features(features: list, params: Params)
Clean features by merging features with almost the same m/z and retention time.
Parameters:
- features (list of AlignedFeature objects): The aligned features.
- params (Params object): The parameters used for merging features.
Returns:
- features (list of AlignedFeature objects)
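One simple way to merge near-duplicate features is to keep the most intense feature in each neighborhood and drop the rest. The sketch below assumes that strategy; the actual merging rule used by merge_features is not stated here, and the dict keys and tolerances are hypothetical.

```python
def merge_close_features(features, mz_tol=0.01, rt_tol=0.05):
    """Drop features that lie within both tolerances of a more intense feature.

    features: list of dicts with 'mz', 'rt', and 'height' keys (hypothetical layout).
    """
    kept = []
    # Visit features from most to least intense so the strongest one survives.
    for f in sorted(features, key=lambda f: f["height"], reverse=True):
        duplicate = any(
            abs(f["mz"] - k["mz"]) <= mz_tol and abs(f["rt"] - k["rt"]) <= rt_tol
            for k in kept
        )
        if not duplicate:
            kept.append(f)
    return sorted(kept, key=lambda f: (f["mz"], f["rt"]))


features = [
    {"mz": 200.1000, "rt": 5.00, "height": 1e5},
    {"mz": 200.1004, "rt": 5.02, "height": 4e4},  # near-duplicate of the first
    {"mz": 200.1000, "rt": 7.50, "height": 2e4},  # same m/z, different RT
]
merged = merge_close_features(features)
print([f["height"] for f in merged])  # [100000.0, 20000.0]
```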
convert_features_to_df
convert_features_to_df(features, sample_names, quant_method="peak_height")
Convert the aligned features to a DataFrame.
Parameters:
- features (list of AlignedFeature objects): The aligned features.
- sample_names (list): The sample names.
- quant_method (str): The quantification method, "peak_height", "peak_area" or "top_average".
Returns:
- feature_table (pd.DataFrame): The feature DataFrame.
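The role of quant_method can be illustrated with a dependency-free sketch that builds one row per feature and picks the per-sample values from the matching array. FeatureStub, features_to_rows, and the column names "id", "m/z", and "RT" are hypothetical; the real function returns a pd.DataFrame, which could be built from rows like these with pd.DataFrame(rows).

```python
from dataclasses import dataclass, field


@dataclass
class FeatureStub:
    """Hypothetical stand-in carrying a few AlignedFeature attributes."""
    id: int
    mz: float
    rt: float
    peak_height_arr: list = field(default_factory=list)
    peak_area_arr: list = field(default_factory=list)
    top_average_arr: list = field(default_factory=list)


# Map each quantification method to the attribute holding per-sample values.
QUANT_ATTR = {
    "peak_height": "peak_height_arr",
    "peak_area": "peak_area_arr",
    "top_average": "top_average_arr",
}


def features_to_rows(features, sample_names, quant_method="peak_height"):
    """Build one dict per feature with one column per sample."""
    if quant_method not in QUANT_ATTR:
        raise ValueError(f"unknown quant_method: {quant_method!r}")
    rows = []
    for f in features:
        values = getattr(f, QUANT_ATTR[quant_method])
        row = {"id": f.id, "m/z": f.mz, "RT": f.rt}
        row.update(zip(sample_names, values))
        rows.append(row)
    return rows


stub = FeatureStub(0, 180.0634, 1.5, [1000.0, 1200.0], [5000.0, 6100.0], [900.0, 1100.0])
rows = features_to_rows([stub], ["sample_a", "sample_b"], quant_method="peak_area")
print(rows[0]["sample_b"])  # 6100.0
```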
output_feature_to_msp
output_feature_to_msp(feature_table, output_path)
Output MS2 spectra to MSP format.
Parameters:
- feature_table (pd.DataFrame): The feature table.
- output_path (str): The path to the output MSP file.
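For orientation, an MSP file is a plain-text sequence of spectrum entries separated by blank lines, each with header fields followed by a peak list. The sketch below renders one such entry; the helper name and the exact header fields are assumptions based on common MSP usage, not a description of what output_feature_to_msp writes.

```python
def format_msp_entry(name, precursor_mz, peaks):
    """Render one spectrum as an MSP text block.

    peaks: list of (m/z, intensity) pairs. The header fields are an
    assumption about the layout, following common MSP conventions.
    """
    lines = [
        f"NAME: {name}",
        f"PRECURSORMZ: {precursor_mz:.4f}",
        f"Num Peaks: {len(peaks)}",
    ]
    lines.extend(f"{mz:.4f}\t{intensity:.0f}" for mz, intensity in peaks)
    # A blank line separates consecutive entries in an MSP file.
    return "\n".join(lines) + "\n\n"


entry = format_msp_entry("feature_1", 180.0634, [(85.0284, 1200.0), (163.0601, 5400.0)])
print(entry, end="")
```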
output_feature_table
output_feature_table(feature_table, output_path)
Output the aligned feature table.
Parameters:
- feature_table (pd.DataFrame): The aligned feature table.
- output_path (str): The path to save the aligned feature table.
retention_time_correction
retention_time_correction(mz_ref, rt_ref, mz_arr, rt_arr, mz_tol=0.01, rt_tol=2.0, mode='linear_interpolation', rt_max=None)
Correct retention times for feature alignment. The correction has three steps: (1) find the selected anchors in the given data, (2) build a model to correct retention times, and (3) apply the model to correct the retention times.
Parameters:
- mz_ref (np.array): The m/z values of the selected anchors from another reference file.
- rt_ref (np.array): The retention times of the selected anchors from another reference file.
- mz_arr (np.array): Feature m/z values in the current file.
- rt_arr (np.array): Feature retention times in the current file.
- mz_tol (float): The m/z tolerance for selecting anchors.
- rt_tol (float): The retention time tolerance for selecting anchors.
- mode (str): The mode for retention time correction. Only 'linear_interpolation' is available now.
- rt_max (float): End of the retention time range.
Returns:
- rt_arr (np.array): The corrected retention times.
- f (interp1d): The model for retention time correction.
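The three steps can be sketched without external dependencies as below. This is an illustration under stated assumptions rather than the module's code: the helper names are hypothetical, the model is pinned to the identity at 0 and rt_max, anchors are assumed to fall strictly inside that range, and a plain Python function plays the role of the interp1d model.

```python
from bisect import bisect_right


def find_anchors(mz_ref, rt_ref, mz_arr, rt_arr, mz_tol=0.01, rt_tol=2.0):
    """Step 1: pair each reference anchor with the closest-RT feature within both tolerances."""
    pairs = []
    for mz_a, rt_a in zip(mz_ref, rt_ref):
        candidates = [
            rt for mz, rt in zip(mz_arr, rt_arr)
            if abs(mz - mz_a) <= mz_tol and abs(rt - rt_a) <= rt_tol
        ]
        if candidates:
            observed = min(candidates, key=lambda rt: abs(rt - rt_a))
            pairs.append((observed, rt_a))
    return sorted(pairs)


def build_model(pairs, rt_max):
    """Step 2: piecewise-linear map from observed RT to reference RT."""
    # Assumption: pin the curve at 0 and rt_max so the ends are left unchanged.
    xs = [0.0] + [observed for observed, _ in pairs] + [rt_max]
    ys = [0.0] + [reference for _, reference in pairs] + [rt_max]

    def f(rt):
        if rt <= xs[0] or rt >= xs[-1]:
            return rt  # outside the modeled range: leave unchanged
        i = bisect_right(xs, rt) - 1  # xs[i] <= rt < xs[i + 1]
        x0, x1, y0, y1 = xs[i], xs[i + 1], ys[i], ys[i + 1]
        return y0 + (y1 - y0) * (rt - x0) / (x1 - x0)

    return f


mz_ref, rt_ref = [100.0, 200.0], [2.0, 8.0]
mz_arr, rt_arr = [100.001, 200.002, 150.0], [2.5, 8.5, 5.5]

pairs = find_anchors(mz_ref, rt_ref, mz_arr, rt_arr)
f = build_model(pairs, rt_max=10.0)
corrected = [f(rt) for rt in rt_arr]  # Step 3: apply the model
print(corrected)  # [2.0, 8.0, 5.0]
```

Here both anchors elute 0.5 later than in the reference file, so the feature at 5.5 between them is shifted back by the same amount.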
rt_anchor_selection
rt_anchor_selection(data_path, num=50, noise_score_tol=0.1, mz_tol=0.01)
Select retention time anchors from the feature tables. Retention time anchors have unique m/z values and low noise scores. From all candidate features, the top num features with the highest peak heights are selected as anchors.
Parameters:
- data_path (str): The absolute path to the directory containing the feature tables.
- num (int): The number of anchors to be selected.
- noise_score_tol (float): The noise level for the anchors.
- mz_tol (float): The m/z tolerance for selecting anchors.
Returns:
- anchors (list): A list of anchors (dict) for retention time correction.
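The selection rule described above (unique m/z, low noise score, then the top num by peak height) can be sketched as follows. The function name and dict keys are hypothetical, and the sketch assumes that m/z uniqueness is judged against every feature in the table, which the description does not specify.

```python
def select_anchors(features, num=50, noise_score_tol=0.1, mz_tol=0.01):
    """Pick up to `num` anchors from one feature table.

    features: list of dicts with 'mz', 'rt', 'peak_height', and 'noise_score'
    keys (hypothetical layout).
    """
    by_mz = sorted(features, key=lambda f: f["mz"])
    candidates = []
    for i, f in enumerate(by_mz):
        # Assumption: an m/z is unique if no neighbor lies within mz_tol.
        left_ok = i == 0 or f["mz"] - by_mz[i - 1]["mz"] > mz_tol
        right_ok = i == len(by_mz) - 1 or by_mz[i + 1]["mz"] - f["mz"] > mz_tol
        if left_ok and right_ok and f["noise_score"] <= noise_score_tol:
            candidates.append(f)
    # Keep the most intense candidates.
    candidates.sort(key=lambda f: f["peak_height"], reverse=True)
    return candidates[:num]


features = [
    {"mz": 100.000, "rt": 1.2, "peak_height": 5e5, "noise_score": 0.02},
    {"mz": 100.005, "rt": 3.4, "peak_height": 9e5, "noise_score": 0.01},  # m/z not unique
    {"mz": 250.100, "rt": 4.0, "peak_height": 3e5, "noise_score": 0.05},
    {"mz": 300.200, "rt": 5.1, "peak_height": 8e5, "noise_score": 0.40},  # too noisy
    {"mz": 410.300, "rt": 6.7, "peak_height": 7e5, "noise_score": 0.00},
]
anchors = select_anchors(features, num=2)
print([a["mz"] for a in anchors])  # [410.3, 250.1]
```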