utils_functions
Overview
The utils_functions module contains utility functions that are commonly used in the data processing and analysis of mass spectrometry data. The module includes functions for generating a sample table, getting timestamps for individual files, converting chemical formulas to m/z values, extracting signals from MS2 spectrum in string format, and converting signals to string format.
Functions
generate_sample_table
generate_sample_table(path=None, output=True)
Generate a sample table from the mzML or mzXML files in the specified path. The stucture of the path should be:
path
├── data
│ ├── sample1.mzml
│ ├── sample2.mzml
│ └── ...
└── ...Parameters:
path: str Path to the main directory that contains a subdirectory ‘data’ with mzML or mzXML files.output: bool If True, output the sample table to a csv file.
Return:
sample_table: pandas DataFrame A DataFrame with two columns: ‘Sample’ and ‘Groups’.
Output:
sample_table.csv: csv file A csv file with two columns: ‘Sample’ and ‘Groups’ in the specified path.
get_timestamps
get_timestamps(path=None, output=True)
Get timestamps for individual files and sort the files by time. The stucture of the path should be:
path
├── data
│ ├── sample1.mzml
│ ├── sample2.mzml
│ └── ...
└── ...Parameters:
path: str Path to the main directory that contains a subdirectory ‘data’ with mzML or mzXML files.output: bool If True, output the timestamps to a txt file with two columns: ‘file_name’ and ‘aquisition_time’.
Return:
file_times: list A list of tuples with two elements: ‘file_name’ and ‘aquisition_time’.
Output:
timestamps.txt: txt file A txt file with two columns: ‘file_name’ and ‘aquisition_time’ in the specified path.
formula_to_mz
formula_to_mz(formula, adduct, charge)
Calculate the m/z value of a molecule given its chemical formula, adduct and charge.
Parameters:
formula: str Chemical formula of the molecule.adduct: str Adduct of the molecule. The first character should be ‘+’ or ‘-’. In particular, for adduct like [M-H-H2O]-, use ‘-H3O’ or ‘-H2OH’.charge: int Charge of the molecule. Positive for cations and negative for anions.
Returns:
mz: float The m/z value of the molecule.
Examples:
formula_to_mz("C6H12O6", "+H", 1)
# 181.070665
formula_to_mz("C9H14N3O8P", "-H2OH", -1)
# 304.034010get_start_time
get_start_time(file_name)
Function to get the start time of the raw data.
Parameters:
file_name: str Absolute path of the raw data.
extract_signals_from_string
extract_signals_from_string(ms2)
Extract signals from MS2 spectrum in string format.
Parameters:
ms2: str MS2 spectrum in string format. Format: “mz1;intensity1|mz2;intensity2|…”
Returns:
peaks: numpy.array Peaks in numpy array format: [[mz1, intensity1], [mz2, intensity2], …]
convert_signals_to_string
convert_signals_to_string(signals)
Convert peaks to string format.
Parameters:
signals: numpy.array MS2 signals organized as [[mz1, intensity1], [mz2, intensity2], …]
Returns:
string: str Converted signals in string format. Format: “mz1;intensity1|mz2;intensity2|…”