utils_functions

utils_functions

Overview

The utils_functions module contains utility functions that are commonly used in the data processing and analysis of mass spectrometry data. The module includes functions for generating a sample table, getting timestamps for individual files, converting chemical formulas to m/z values, extracting signals from MS2 spectrum in string format, and converting signals to string format.

Functions

generate_sample_table

generate_sample_table(path=None, output=True)

Generate a sample table from the mzML or mzXML files in the specified path. The stucture of the path should be:

path
├── data
│   ├── sample1.mzml
│   ├── sample2.mzml
│   └── ...
└── ...

Parameters:

  • path : str Path to the main directory that contains a subdirectory ‘data’ with mzML or mzXML files.
  • output : bool If True, output the sample table to a csv file.

Return:

  • sample_table : pandas DataFrame A DataFrame with two columns: ‘Sample’ and ‘Groups’.

Output:

  • sample_table.csv : csv file A csv file with two columns: ‘Sample’ and ‘Groups’ in the specified path.

get_timestamps

get_timestamps(path=None, output=True)

Get timestamps for individual files and sort the files by time. The stucture of the path should be:

path
├── data
│   ├── sample1.mzml
│   ├── sample2.mzml
│   └── ...
└── ...

Parameters:

  • path : str Path to the main directory that contains a subdirectory ‘data’ with mzML or mzXML files.
  • output : bool If True, output the timestamps to a txt file with two columns: ‘file_name’ and ‘aquisition_time’.

Return:

  • file_times : list A list of tuples with two elements: ‘file_name’ and ‘aquisition_time’.

Output:

  • timestamps.txt : txt file A txt file with two columns: ‘file_name’ and ‘aquisition_time’ in the specified path.

formula_to_mz

formula_to_mz(formula, adduct, charge)

Calculate the m/z value of a molecule given its chemical formula, adduct and charge.

Parameters:

  • formula : str Chemical formula of the molecule.
  • adduct : str Adduct of the molecule. The first character should be ‘+’ or ‘-’. In particular, for adduct like [M-H-H2O]-, use ‘-H3O’ or ‘-H2OH’.
  • charge : int Charge of the molecule. Positive for cations and negative for anions.

Returns:

  • mz : float The m/z value of the molecule.

Examples:

formula_to_mz("C6H12O6", "+H", 1)
# 181.070665

formula_to_mz("C9H14N3O8P", "-H2OH", -1)
# 304.034010

get_start_time

get_start_time(file_name)

Function to get the start time of the raw data.

Parameters:

  • file_name : str Absolute path of the raw data.

extract_signals_from_string

extract_signals_from_string(ms2)

Extract signals from MS2 spectrum in string format.

Parameters:

  • ms2 : str MS2 spectrum in string format. Format: “mz1;intensity1|mz2;intensity2|…”

Returns:

  • peaks : numpy.array Peaks in numpy array format: [[mz1, intensity1], [mz2, intensity2], …]

convert_signals_to_string

convert_signals_to_string(signals)

Convert peaks to string format.

Parameters:

  • signals : numpy.array MS2 signals organized as [[mz1, intensity1], [mz2, intensity2], …]

Returns:

  • string : str Converted signals in string format. Format: “mz1;intensity1|mz2;intensity2|…”