alignment
Overview
This module provides functionality for aligning metabolic features from different samples in mass spectrometry data.
- Feature alignment: Align features across different samples, considering parameters like m/z tolerance and retention time tolerance.
- Gap filling: Fill in missing features across aligned samples using various strategies.
- Merge features: Clean the feature table by merging features with almost the same m/z and retention time.
- Retention time correction: Correct retention times to align features more accurately.
- Output feature table: Save the aligned features to a file.
Classes
AlignedFeature
A class to model a feature in mass spectrometry data. Generally, a feature is defined as a unique pair of m/z and retention time.
Attributes:
- feature_id_arr (np.array): Feature ID from individual files (-1 if not detected or gap filled).
- mz_arr (np.array): m/z values.
- rt_arr (np.array): Retention times.
- scan_idx_arr (np.array): Scan index of the peak apex.
- peak_height_arr (np.array): Peak height.
- peak_area_arr (np.array): Peak area.
- top_average_arr (np.array): Average of the highest three intensities.
- ms2_seq (list): Representative MS2 spectrum from each file (default: highest total intensity).
- length_arr (np.array): Length (i.e. non-zero scans in the peak).
- gaussian_similarity_arr (np.array): Gaussian similarity.
- noise_score_arr (np.array): Noise score.
- asymmetry_factor_arr (np.array): Asymmetry factor.
- sse_arr (np.array): Squared error to the smoothed curve.
- is_segmented_arr (np.array): Whether the peak is segmented.
- id (int): Index of the feature.
- feature_group_id (int): Feature group ID.
- mz (float): m/z.
- rt (float): Retention time.
- reference_file (str): The reference file with the highest peak height.
- reference_scan_idx (int): The scan index of the peak apex from the reference file.
- highest_intensity (float): The highest peak height from individual files (which is the reference file).
- ms2 (str): Representative MS2 spectrum.
- ms2_reference_file (str): The reference file for the representative MS2 spectrum.
- gaussian_similarity (float): Gaussian similarity from the reference file.
- noise_score (float): Noise level from the reference file.
- asymmetry_factor (float): Asymmetry factor from the reference file.
- detection_rate (float): Number of detected files / total number of files (blank not included).
- detection_rate_gap_filled (float): Number of detected files after gap filling / total number of files (blank not included).
- charge_state (int): Charge state.
- is_isotope (bool): Whether it is an isotope.
- isotope_signals (list): Isotope signals [[m/z, intensity], …].
- is_in_source_fragment (bool): Whether it is an in-source fragment.
- adduct_type (str): Adduct type.
- annotation_algorithm (str): Annotation algorithm. Not used now.
- search_mode (str): 'identity search', 'fuzzy search', or 'mzrt_search'.
- similarity (float): Similarity score (0-1).
- annotation (str): Name of annotated compound.
- formula (str): Molecular formula.
- matched_peak_number (int): Number of matched peaks.
- smiles (str): SMILES.
- inchikey (str): InChIKey.
- matched_precursor_mz (float): Matched precursor m/z.
- matched_adduct_type (str): Matched adduct type.
- matched_ms2 (str): Matched ms2 spectra.
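The detection_rate definition above can be illustrated with a short sketch. It relies on the documented convention that -1 in feature_id_arr marks a file where the feature was not detected or was gap filled. The function name and the is_blank input are hypothetical and are not part of this module.

```python
def detection_rate(feature_ids, is_blank):
    """Fraction of non-blank files in which the feature was detected.

    feature_ids: per-file feature IDs, -1 where not detected or gap filled.
    is_blank: per-file flags (hypothetical input) marking blank samples.
    """
    # Blank files are excluded from both the numerator and the denominator.
    ids = [fid for fid, blank in zip(feature_ids, is_blank) if not blank]
    if not ids:
        return 0.0
    return sum(1 for fid in ids if fid != -1) / len(ids)


# Five files, the last one is a blank; the feature is detected in three of
# the four non-blank files.
rate = detection_rate([12, -1, 7, 3, -1], [False, False, False, False, True])
```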
Functions
feature_alignment
feature_alignment(path: str, params: Params)
Align features from the feature tables of multiple individually processed files, which are stored in .txt format.
Parameters:
- path (str): The path to the feature tables of individual files.
- params (Params object): The parameters for alignment including sample names and sample groups.
Returns:
features (list of AlignedFeature objects): The aligned features.
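To make the idea of tolerance-based alignment concrete, here is a minimal greedy sketch. It is not the algorithm used by feature_alignment, whose internals are not described here. The function name, the tuple-based input, and the default tolerances are all assumptions chosen for illustration.

```python
def align(samples, mz_tol=0.01, rt_tol=0.3):
    """Greedily align (mz, rt) features from several samples.

    samples: list of per-sample feature lists; each feature is an (mz, rt) tuple.
    Returns a list of dicts holding a reference mz/rt and per-sample arrays,
    with None where a sample has no matching feature.
    """
    n = len(samples)
    aligned = []
    for s_idx, feats in enumerate(samples):
        for mz, rt in feats:
            best = None
            for a in aligned:
                if a["rt_arr"][s_idx] is not None:
                    continue  # this aligned feature already has a peak from this sample
                if abs(a["mz"] - mz) <= mz_tol and abs(a["rt"] - rt) <= rt_tol:
                    # Among candidates within both tolerances, prefer the closest RT.
                    if best is None or abs(a["rt"] - rt) < abs(best["rt"] - rt):
                        best = a
            if best is None:
                # No match: start a new aligned feature with this peak as reference.
                best = {"mz": mz, "rt": rt, "mz_arr": [None] * n, "rt_arr": [None] * n}
                aligned.append(best)
            best["mz_arr"][s_idx] = mz
            best["rt_arr"][s_idx] = rt
    return aligned


samples = [
    [(100.000, 5.0), (200.0, 8.0)],   # sample 0
    [(100.004, 5.1), (300.0, 2.0)],   # sample 1
]
aligned = align(samples)
```

In this sketch the first peak seen defines the reference m/z and retention time of an aligned feature; a real implementation might update them as more files are added.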
gap_filling
gap_filling(features, params: Params)
Fill the gaps for aligned features.
Parameters:
- features (list of AlignedFeature objects): The aligned features.
- params (Params object): The parameters used for gap filling.
Returns:
features (list of AlignedFeature objects): The gap-filled features.
merge_features
merge_features(features: list, params: Params)
Clean features by merging features with almost the same m/z and retention time.
Parameters:
- features (list of AlignedFeature objects): The aligned features.
- params (Params object): The parameters used for merging features.
Returns:
features (list of AlignedFeature objects): The merged features.
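One possible way to merge near-duplicate features is sketched below. The rule used here (keep the most intense feature of each near-identical group) is an assumption, as are the function name, the dict keys, and the tolerance values; merge_features may apply different criteria taken from params.

```python
def merge_close_features(features, mz_tol=0.01, rt_tol=0.05):
    """Drop features that nearly coincide with a more intense one.

    features: list of dicts with 'mz', 'rt', and 'height' keys (hypothetical layout).
    """
    kept = []
    # Visit the most intense features first so they win over weaker duplicates.
    for f in sorted(features, key=lambda f: f["height"], reverse=True):
        duplicate = any(
            abs(f["mz"] - k["mz"]) <= mz_tol and abs(f["rt"] - k["rt"]) <= rt_tol
            for k in kept
        )
        if not duplicate:
            kept.append(f)
    return kept


features = [
    {"mz": 150.000, "rt": 3.00, "height": 1000},
    {"mz": 150.002, "rt": 3.01, "height": 5000},  # near-duplicate of the first
    {"mz": 150.002, "rt": 6.00, "height": 200},   # same m/z, different RT
]
merged = merge_close_features(features)
```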
convert_features_to_df
convert_features_to_df(features, sample_names, quant_method="peak_height")
Convert the aligned features to a DataFrame.
Parameters:
- features (list of AlignedFeature objects): The aligned features.
- sample_names (list): The sample names.
- quant_method (str): The quantification method, "peak_height", "peak_area" or "top_average".
Returns:
feature_table (pd.DataFrame): The feature DataFrame.
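The role of quant_method can be sketched without pandas: it selects which per-file array supplies the sample columns. MiniFeature is a hypothetical stand-in that carries only a few of the AlignedFeature attributes listed above, and the column names "id", "m/z", and "RT" are assumptions rather than the actual table layout.

```python
from dataclasses import dataclass


@dataclass
class MiniFeature:
    """Hypothetical stand-in with a subset of AlignedFeature attributes."""
    id: int
    mz: float
    rt: float
    peak_height_arr: list
    peak_area_arr: list
    top_average_arr: list


def features_to_rows(features, sample_names, quant_method="peak_height"):
    """Build one row (dict) per feature, with one column per sample."""
    if quant_method not in ("peak_height", "peak_area", "top_average"):
        raise ValueError(f"unknown quant_method: {quant_method}")
    attr = quant_method + "_arr"  # e.g. "peak_area" -> "peak_area_arr"
    rows = []
    for f in features:
        row = {"id": f.id, "m/z": f.mz, "RT": f.rt}
        row.update(zip(sample_names, getattr(f, attr)))
        rows.append(row)
    return rows


feats = [MiniFeature(0, 180.0634, 4.2, [10.0, 20.0], [55.0, 90.0], [9.0, 18.0])]
rows = features_to_rows(feats, ["s1", "s2"], quant_method="peak_area")
```

A list of dicts like this can be passed directly to the pandas DataFrame constructor to obtain a table.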
output_feature_to_msp
output_feature_to_msp(feature_table, output_path)
Output MS2 spectra to MSP format.
Parameters:
- feature_table (pd.DataFrame): The feature table.
- output_path (str): The path to the output MSP file.
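For orientation, an MSP file is a plain-text list of spectrum records, each with header fields followed by m/z and intensity pairs. The sketch below formats one such record. Which header fields output_feature_to_msp actually writes is not stated here, so the field set, the number formatting, and the function name are assumptions.

```python
def format_msp_record(name, precursor_mz, peaks):
    """Format a single MSP-style record as text.

    peaks: list of (mz, intensity) tuples for the MS2 spectrum.
    """
    lines = [
        f"NAME: {name}",
        f"PRECURSORMZ: {precursor_mz:.4f}",
        f"Num Peaks: {len(peaks)}",
    ]
    # One peak per line: m/z and intensity separated by a tab.
    lines += [f"{mz:.4f}\t{intensity:.0f}" for mz, intensity in peaks]
    # A blank line separates consecutive records.
    return "\n".join(lines) + "\n\n"


record = format_msp_record("feature_1", 180.0634, [(89.0239, 1200.0), (119.0344, 300.0)])
```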
output_feature_table
output_feature_table(feature_table, output_path)
Output the aligned feature table.
Parameters:
- feature_table (pd.DataFrame): The aligned feature table.
- output_path (str): The path to save the aligned feature table.
retention_time_correction
retention_time_correction(mz_ref, rt_ref, mz_arr, rt_arr, mz_tol=0.01, rt_tol=2.0, mode='linear_interpolation', rt_max=None)
Correct retention times for feature alignment. The correction has three steps: (1) finding the selected anchors in the given data, (2) creating a model to correct retention times, and (3) correcting retention times.
Parameters:
- mz_ref (np.array): The m/z values of the selected anchors from another reference file.
- rt_ref (np.array): The retention times of the selected anchors from another reference file.
- mz_arr (np.array): Feature m/z values in the current file.
- rt_arr (np.array): Feature retention times in the current file.
- mz_tol (float): The m/z tolerance for selecting anchors.
- rt_tol (float): The retention time tolerance for selecting anchors.
- mode (str): The mode for retention time correction. Only 'linear_interpolation' is available now.
- rt_max (float): End of the retention time range.
Returns:
- rt_arr (np.array): The corrected retention times.
- f (interp1d): The model for retention time correction.
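The three steps can be sketched in plain Python as follows. This is an illustration only: it uses a hand-written piecewise-linear interpolation instead of the interp1d model the function returns, it ignores rt_max, and its handling of retention times outside the anchor range (shifting by the nearest anchor's offset) is an assumption. The helper names find_anchor_pairs and build_model are hypothetical.

```python
from bisect import bisect_right


def find_anchor_pairs(mz_ref, rt_ref, mz_arr, rt_arr, mz_tol=0.01, rt_tol=2.0):
    """Step 1: pair each reference anchor with a feature in the current file."""
    pairs = []
    for mz_a, rt_a in zip(mz_ref, rt_ref):
        hits = [rt for mz, rt in zip(mz_arr, rt_arr)
                if abs(mz - mz_a) <= mz_tol and abs(rt - rt_a) <= rt_tol]
        if hits:
            # Assumption: keep the candidate closest in RT to the anchor.
            pairs.append((min(hits, key=lambda rt: abs(rt - rt_a)), rt_a))
    return sorted(set(pairs))


def build_model(pairs):
    """Step 2: piecewise-linear map from observed RT to reference RT."""
    if not pairs:
        return lambda rt: rt  # no anchors found: leave RTs unchanged
    xs = [observed for observed, _ in pairs]
    ys = [reference for _, reference in pairs]

    def f(rt):
        # Outside the anchor range, apply the shift of the nearest anchor.
        if rt <= xs[0]:
            return rt + (ys[0] - xs[0])
        if rt >= xs[-1]:
            return rt + (ys[-1] - xs[-1])
        i = bisect_right(xs, rt)  # xs[i - 1] <= rt < xs[i]
        x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (rt - x0) / (x1 - x0)

    return f


mz_ref, rt_ref = [100.0, 200.0, 300.0], [1.0, 5.0, 9.0]
mz_arr, rt_arr = [100.001, 200.002, 300.0, 250.0], [1.2, 5.4, 9.2, 3.3]

pairs = find_anchor_pairs(mz_ref, rt_ref, mz_arr, rt_arr)
f = build_model(pairs)
corrected = [f(rt) for rt in rt_arr]  # Step 3: apply the model
```

In this example the three anchors are all found, and the non-anchor feature at 3.3 is moved to roughly 3.0 by interpolating between its two neighboring anchors.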
rt_anchor_selection
rt_anchor_selection(data_path, num=50, noise_score_tol=0.1, mz_tol=0.01)
Select retention time anchors from the feature tables. Retention time anchors have unique m/z values and low noise scores. From all candidate features, the top num features with the highest peak heights are selected as anchors.
Parameters:
- data_path (str): The absolute path to the directory containing the feature tables.
- num (int): The number of anchors to be selected.
- noise_score_tol (float): The noise level for the anchors.
- mz_tol (float): The m/z tolerance for selecting anchors.
Returns:
anchors (list): A list of anchors (dict) for retention time correction.
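The selection rule described above can be sketched on an in-memory list of features. Two points are assumptions: a "unique" m/z is read as no other feature lying within mz_tol, and noise_score_tol is treated as an upper bound on the noise score. The function name and dict keys are hypothetical, and reading the feature tables from data_path is left out.

```python
def select_anchors(features, num=50, noise_score_tol=0.1, mz_tol=0.01):
    """Pick up to `num` anchors: unique m/z, low noise score, highest peaks.

    features: list of dicts with 'mz', 'rt', 'noise_score', 'peak_height'.
    """
    def is_unique(mz):
        # Exactly one feature (the feature itself) lies within mz_tol.
        return sum(1 for other in features if abs(other["mz"] - mz) <= mz_tol) == 1

    candidates = [
        f for f in features
        if f["noise_score"] <= noise_score_tol and is_unique(f["mz"])
    ]
    # Keep the most intense candidates.
    candidates.sort(key=lambda f: f["peak_height"], reverse=True)
    return candidates[:num]


features = [
    {"mz": 100.000, "rt": 1.0, "noise_score": 0.05, "peak_height": 1000},
    {"mz": 100.005, "rt": 4.0, "noise_score": 0.02, "peak_height": 9000},  # m/z not unique
    {"mz": 200.000, "rt": 2.0, "noise_score": 0.50, "peak_height": 8000},  # too noisy
    {"mz": 300.000, "rt": 3.0, "noise_score": 0.00, "peak_height": 500},
    {"mz": 400.000, "rt": 6.0, "noise_score": 0.10, "peak_height": 2000},
]
anchors = select_anchors(features, num=2)
```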