Single file processing by applying default parameters depending on the data
Introduction
Processing each raw LC-MS data file individually can be useful for further customized analysis. This workflow shows how to process single files in batch mode using masscube. Note that the processed files will not be aligned across samples. Files acquired in different ion modes or on different instruments can be included in the same batch, because masscube applies different parameters depending on the data.
How to use
Organize the data
Similar to the untargeted metabolomics workflow, the data should be organized in the following structure:
my_project
├── data
│ ├── sample1.mzML
│ ├── sample2.mzML
│ └── ...
├── ...
Parameters cannot be set in this workflow; masscube applies default parameters chosen according to the data.
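Before running the workflow, it can help to confirm that the project folder follows the layout above. The following is a minimal sketch using only the Python standard library; the function name `list_raw_files` is hypothetical and is not part of masscube.

```python
from pathlib import Path


def list_raw_files(project_dir, extensions=(".mzml",)):
    """Return the raw files found in <project_dir>/data, sorted by name.

    Hypothetical helper for checking the folder layout; not a masscube function.
    """
    data_dir = Path(project_dir) / "data"
    if not data_dir.is_dir():
        raise FileNotFoundError(f"Expected a 'data' folder in {project_dir}")
    # Compare suffixes case-insensitively so that .mzML and .mzml both match.
    return sorted(
        p for p in data_dir.iterdir()
        if p.is_file() and p.suffix.lower() in extensions
    )
```

Calling `list_raw_files("my_project")` returns the paths that would be picked up from the data folder under this assumption, which makes an empty or misplaced folder easy to spot.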
Processing
In the project folder, open a terminal and run the following command:
process-files
Output
After processing, you will find the following files and folders in the project folder:
my_project
├── data
├── single_file_output
│ ├── sample1.txt
│ ├── sample2.txt
│ └── ...
├── chromatogram
│ ├── sample1.png
│ ├── sample2.png
│ └── ...
├── ...
single_file_output folder: a folder containing the feature table for each sample.
chromatogram folder: a folder containing the chromatogram for each sample.
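To check that every input file produced its outputs, the folder contents can be compared by file name. This is a sketch under the assumption that outputs share the sample name of the input, as in the tree above (sample1.mzML gives sample1.txt and sample1.png); `missing_outputs` is a hypothetical helper, not a masscube function.

```python
from pathlib import Path


def missing_outputs(project_dir):
    """Map each output folder to the sample names that have no file in it.

    Assumes output files reuse the input file's base name, as shown in the
    folder tree; this is an illustration, not masscube's own bookkeeping.
    """
    project = Path(project_dir)
    samples = {p.stem for p in (project / "data").glob("*.mzML")}
    expected = (("single_file_output", ".txt"), ("chromatogram", ".png"))
    missing = {}
    for folder_name, suffix in expected:
        folder = project / folder_name
        found = set()
        if folder.is_dir():
            found = {p.stem for p in folder.iterdir() if p.suffix == suffix}
        missing[folder_name] = sorted(samples - found)
    return missing
```

An empty list for a folder means every sample has an output there. The chromatogram list will be full if plotting was not requested.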
Explanation of the workflow
Step 1. Single file processing
Individual files are processed for feature detection, which involves the detection of peak apex, edges, area, and related MS/MS spectra. It also includes the determination of isotopes, charge states, adducts, and in-source fragments.
To accelerate the processing, masscube supports parallel computing for multiple files. By default, the number of threads is set to 80% of the total CPU cores (cpu_ratio=0.8) so that the computer can still be used for other tasks.
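As an illustration of how a CPU ratio translates into a worker count, the sketch below takes 80% of the available cores and keeps at least one worker. Rounding down and the minimum of one are assumptions made here; masscube's exact rule may differ. The function `worker_count` is hypothetical.

```python
import os


def worker_count(cpu_ratio=0.8, total_cores=None):
    """Return the number of workers for a given fraction of CPU cores.

    Rounds down and never returns fewer than one worker (both assumptions;
    this is not masscube's actual implementation).
    """
    if not 0 < cpu_ratio <= 1:
        raise ValueError("cpu_ratio must be in the interval (0, 1]")
    if total_cores is None:
        # os.cpu_count() may return None when the count cannot be determined.
        total_cores = os.cpu_count() or 1
    return max(1, int(total_cores * cpu_ratio))
```

Under these assumptions, a 16-core machine would use 12 workers, leaving the remaining cores free for other tasks.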
To control the memory usage, the files are processed in batches. The default batch size is 100 (batch_size=100).
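Batching limits memory use because only one batch of files is held in memory at a time. The idea can be sketched as follows; `iter_batches` is a hypothetical helper that shows the splitting only, not masscube's internal code.

```python
def iter_batches(items, batch_size=100):
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Example: 250 hypothetical file names are split into batches of 100, 100, and 50.
file_names = [f"sample{i}.mzML" for i in range(1, 251)]
batch_sizes = [len(batch) for batch in iter_batches(file_names)]
```

Each batch would be processed and its results written out before the next batch is loaded.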
Step 2. Data visualization
Users can choose to plot the base peak chromatogram (BPC) to verify possible bad injections or outliers.
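For reference, a base peak chromatogram records, for each scan, the intensity of the most intense ion against retention time. The sketch below computes it from already-parsed scans; the input format (pairs of retention time and a list of intensities, MS1 scans only) is an assumption for illustration and does not reflect masscube's data structures.

```python
def base_peak_chromatogram(scans):
    """Return (retention_times, base_peak_intensities) for a list of scans.

    `scans` is an iterable of (retention_time, intensities) pairs, assumed
    to contain MS1 scans only. Empty scans contribute an intensity of 0.0.
    """
    times = []
    base_peaks = []
    for retention_time, intensities in scans:
        times.append(retention_time)
        # The base peak is the most intense signal in the scan.
        base_peaks.append(max(intensities, default=0.0))
    return times, base_peaks
```

The two returned lists can be passed to any plotting library. A sample whose trace is much lower or flatter than the others in the batch is a candidate bad injection.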