ProcessData is a Python class designed to process data from MRXS, NDPI, CZI and BIF files, merge it with inventory data, and calculate immunopositivity statistics.
General Information
From this point on, the tutorial refers to Python code. If you are not familiar with Python or with a terminal prompt, first read the Git Bash section, then follow the remaining steps of this tutorial to run the files and produce the positivity rates for your data. On Linux, a standard Terminal works.
Step-by-Step Instructions:
Getting Started:
Create a ProjectFolder on your system:
Manually create an empty folder named ProjectFolder to hold the files for the following steps.
Configure the ProjectFolder:
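The ProjectFolder should contain the following items:
Results: folder with the QuPath output files (.txt)
Inventory.xlsx: the Inventory file linking the IDs
Data_output: empty folder for the generated files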
You can name the ProjectFolder as you like, as long as the name contains no spaces.
The Data_output folder is a new, empty folder that you create manually. The data files generated by the Python scripts in the next steps are stored there automatically and can be opened once processing has finished.
The Results folder contains all the QuPath output files in .txt format. These files contain the PositiveCell and NegativeCell classification, which is used to calculate the positivity rate.
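To make the quantity concrete, the sketch below assumes the positivity rate is the percentage of PositiveCell detections among all cells classified as PositiveCell or NegativeCell, i.e. rate = 100 × N_pos / (N_pos + N_neg). The function name positivity_rate and the idea of passing one class label per cell are hypothetical; how the package actually parses the QuPath .txt files is not shown here.

```python
from collections import Counter
from typing import Iterable


def positivity_rate(cell_classes: Iterable[str],
                    positive: str = "PositiveCell",
                    negative: str = "NegativeCell") -> float:
    """Percentage of positive cells among cells classified as positive or negative.

    Any other class label (e.g. unclassified detections) is ignored.
    """
    counts = Counter(cell_classes)
    total = counts[positive] + counts[negative]
    if total == 0:
        raise ValueError("no PositiveCell or NegativeCell detections found")
    return 100.0 * counts[positive] / total


# Placeholder labels, one per detected cell
labels = ["PositiveCell", "NegativeCell", "NegativeCell", "PositiveCell", "Other"]
print(positivity_rate(labels))  # 50.0
```

Ignoring other labels keeps unclassified detections from diluting the rate; whether the package does the same is an assumption.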
Because the ID_Slidescanner must be linked to the ID_Sample and the Antibody, you need to create an Inventory file manually that connects them. We created ours by opening the slides in CaseViewer, NDP.view2 or ZEISS ZEN lite (depending on the slide scanner), reading the "barcode" to identify the ID_Sample and matching it to the ID_Slidescanner.
For the ProcessScan package, the column names must be exactly ID_Sample, Antibody and ID_Slidescanner. The package expects a simple Excel sheet for a single antibody or, if multiple antibodies are analysed, a multi-sheet Excel workbook containing the information on all samples and antibodies.
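Misspelled column names are a common cause of errors later on, so a quick check before running the workflow can save time. This is a minimal standard-library sketch, not part of the package: the function name missing_columns is hypothetical, and it takes a mapping from sheet names to header lists so that single-sheet and multi-sheet workbooks are handled the same way.

```python
REQUIRED_COLUMNS = ("ID_Sample", "Antibody", "ID_Slidescanner")


def missing_columns(sheets: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each sheet name to the required columns it lacks.

    Headers are compared exactly, so typos, different capitalization
    and stray spaces are all reported.
    """
    problems: dict[str, list[str]] = {}
    for sheet_name, headers in sheets.items():
        present = {str(h) for h in headers}
        missing = [col for col in REQUIRED_COLUMNS if col not in present]
        if missing:
            problems[sheet_name] = missing
    return problems


# Hypothetical two-sheet workbook; the second sheet uses a lowercase "s".
example = {
    "Sheet1": ["ID_Sample", "Antibody", "ID_Slidescanner"],
    "Sheet2": ["ID_sample", "Antibody", "ID_Slidescanner"],
}
print(missing_columns(example))  # {'Sheet2': ['ID_Sample']}
```

To check a real file, pandas.read_excel(path, sheet_name=None) returns a dict of DataFrames keyed by sheet name, so {name: list(df.columns) for name, df in sheets.items()} produces the expected input (reading .xlsx files with pandas requires openpyxl). Exact matching is deliberate: a header such as "ID_Sample " with a trailing space is reported as missing.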
ID_Slidescanner
The ID_Slidescanner is the ID assigned automatically to a sample during slide scanning. In our case, these IDs look like slides-2023-06-02T09-24-14-R1-S1. Performing the analysis with the ID_Slidescanner keeps it blinded and reduces bias.
ID_Sample
The ID_Sample is the sample number.
In our lab, we mainly work with human samples. They are stored in a database with information about the donor, such as age and gender, and are labeled with numbers, e.g. 435.
However, the ID_Sample does not have to be a number; it can also be, e.g., Rat5_TreatmentX_WeekY.
Working with the ID_Slidescanner and linking it to the ID_Sample only at the end minimizes the risk of unconscious bias.
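The late unblinding step can be pictured as a lookup from each ID_Slidescanner to its inventory row. This sketch is illustrative only: unblind is a hypothetical name, it assumes one inventory row per ID_Slidescanner, and the positivity value is a placeholder, not a real result.

```python
def unblind(results: dict[str, float],
            inventory: list[dict[str, str]]) -> list[dict[str, object]]:
    """Attach ID_Sample and Antibody to results keyed by ID_Slidescanner."""
    by_scan_id: dict[str, dict[str, str]] = {}
    for row in inventory:
        scan_id = row["ID_Slidescanner"]
        if scan_id in by_scan_id:
            raise ValueError(f"duplicate ID_Slidescanner in inventory: {scan_id}")
        by_scan_id[scan_id] = row

    linked = []
    for scan_id, rate in results.items():
        if scan_id not in by_scan_id:
            raise KeyError(f"{scan_id} is not listed in the inventory")
        row = by_scan_id[scan_id]
        linked.append({
            "ID_Sample": row["ID_Sample"],
            "Antibody": row["Antibody"],
            "ID_Slidescanner": scan_id,
            "positivity_rate": rate,
        })
    return linked


inventory = [{"ID_Sample": "435", "Antibody": "antibody",
              "ID_Slidescanner": "slides-2023-06-02T09-24-14-R1-S1"}]
results = {"slides-2023-06-02T09-24-14-R1-S1": 12.5}  # placeholder value
print(unblind(results, inventory))
```

Raising on unknown or duplicate IDs makes gaps in the Inventory file visible instead of silently dropping slides.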
If you analysed multiple antibodies on the same slides, you can differentiate them by the annotations in the output file. In that case, write "antibody" in the Antibody column of the Excel file.
CaseViewer used to link the ID_Slidescanner with the ID_Sample
If you already renamed the ID_Slidescanner at the beginning of the project, use the same name for both the ID_Sample and the ID_Slidescanner and add the Antibody to the Excel sheet.
Excel spreadsheet linking the ID_Sample with the ID_Slidescanner
The directory names in the example are consistent with those used in QuPath; if you change them, make the same change in the previous QuPath step.
There should be no spaces in your directories or file names!
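Because spaces split the command-line arguments later on, it can help to scan the ProjectFolder before running anything. This is a minimal standard-library sketch with a hypothetical function name; the demo builds a throw-away folder so it runs anywhere.

```python
import os
import tempfile


def paths_with_spaces(root: str) -> list[str]:
    """Return the root (if its path contains a space) and every file or folder below it whose name does."""
    offenders = []
    root = os.path.abspath(root)
    if " " in root:
        offenders.append(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if " " in name:
                offenders.append(os.path.join(dirpath, name))
    return sorted(offenders)


# Demo on a temporary folder that mimics a ProjectFolder with one bad file name.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "Results"))
    os.makedirs(os.path.join(tmp, "Data_output"))
    open(os.path.join(tmp, "Results", "slide 1.txt"), "w").close()
    found = paths_with_spaces(tmp)

print(found)
```

For a real project, pass the path of your ProjectFolder instead of the temporary folder; an empty list means no names need changing.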
Set up the system for the analysis:
Open Git Bash:
To open Git Bash, search for it in your computer's menu and double-click the icon. A standard terminal window opens.
First time using a terminal window prompt?
Basic legend of the Git Bash prompt
You cannot navigate within the Git Bash interface with the mouse; use the arrow keys on the keyboard instead.
Open the project folder:
Open the ProjectFolder from the terminal with the cd command:
cd Path/To/ProjectFolder
How can you retrieve the path of your folder?
Windows:
To retrieve the path of your folder, open it in File Explorer. The address bar at the top displays the full path. You can copy it by right-clicking the address bar and selecting "Copy address", or by highlighting the path and pressing Ctrl+C.
Example of the address bar
Alternatively, navigate to the folder, right-click it and select "Properties".
In the "Properties" window, go to the "General" tab. The "Location" field displays the path of the folder that contains it. Copy it by selecting the text and pressing Ctrl+C, then add the folder name at the end.
macOS:
You can retrieve the path using Finder. Navigate to the folder for which you want the path, right-click (or Control-click) on the folder while holding down the Option key. Choose "Copy (folder name) as Pathname." The path is now copied to your clipboard.
You can also retrieve it through Get Info. Navigate to the folder, right-click (or Control-click) it and select "Get Info", or press Command+I. In the Get Info window, the "Where" section shows the path of the enclosing folder. Copy it by selecting the text and pressing Command+C, then add the folder name at the end.
Bash uses Unix-style paths, so on Windows every backslash (\) must be replaced by a forward slash (/):
C:/Users/andre/OneDrive/Desktop/ProjectFolder
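If you prefer not to edit the slashes by hand, Python's standard pathlib module can do the conversion. This is a small sketch; to_bash_path is a hypothetical helper name, and it assumes the path was copied from Windows Explorer.

```python
from pathlib import PureWindowsPath


def to_bash_path(windows_path: str) -> str:
    """Turn a path copied from Windows Explorer into the forward-slash form used in Git Bash."""
    # strip() removes stray whitespace; strip('"') removes the quotes added by "Copy as path"
    cleaned = windows_path.strip().strip('"')
    return PureWindowsPath(cleaned).as_posix()


print(to_bash_path(r"C:\Users\andre\OneDrive\Desktop\ProjectFolder"))
# C:/Users/andre/OneDrive/Desktop/ProjectFolder
```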
Download the Process_scan class:
Download this repository or clone it to your computer. To download it, click the Code button and select Download ZIP:
Download the ZIP and move it to your project folder
Move the directory to the ProjectFolder:
Move the downloaded archive, process_scan-main.zip (by default in your Downloads folder), into the ProjectFolder in whichever way is most convenient for you.
Everything installed?
Have you installed the required packages?
No? Then run this command in Git Bash/Terminal.
To use the ProcessScan Python class from the command line, re-open the Git Bash terminal, extract the archive and open the extracted directory with the cd command:
unzip process_scan-main
cd process_scan
cd process_scan-main
Launch the ProcessScan Classification from the terminal
Launch the Python script from the same Git Bash terminal. If Python is not yet installed on your system, first follow the instructions in the Python Installation tutorial. The command to paste into the terminal prompt is:
python workflow_template.py path/to/your/Results path/to/your/Inventory.xlsx path/to/your/Data_output xlsx
Replace the paths with your own. The last argument, after the paths, is the extension of the final spreadsheets: either xlsx or csv.
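For orientation, the four positional arguments match the usage message the script prints when an argument is missing (directory_path inventory_file output_path output_extension). The parser below is a hypothetical reconstruction of that interface with argparse, not the actual workflow_template.py; in particular, restricting the extension to xlsx or csv via choices is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the usage line printed by workflow_template.py."""
    parser = argparse.ArgumentParser(prog="workflow_template.py")
    parser.add_argument("directory_path", help="folder with the QuPath .txt results")
    parser.add_argument("inventory_file", help="Excel inventory linking the IDs")
    parser.add_argument("output_path", help="folder where the spreadsheets are written")
    parser.add_argument("output_extension", choices=["xlsx", "csv"],
                        metavar="output_extension",
                        help="format of the final spreadsheets (xlsx or csv)")
    return parser


args = build_parser().parse_args([
    "path/to/your/Results",
    "path/to/your/Inventory.xlsx",
    "path/to/your/Data_output",
    "xlsx",
])
print(args.directory_path, args.output_extension)
```

Because the arguments are positional, their order matters: swapping two paths would not raise an error here, it would simply pass the wrong folder to the wrong argument. Leaving out the last argument makes argparse exit with "the following arguments are required: output_extension".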
Don't know how to construct the command in the terminal prompt?
If you are not familiar with coding, it may be easiest to prepare the paths in a Word document and then copy them into the Git Bash terminal. The paths you will need are:
path/to/your/Results
path/to/your/Inventory.xlsx
path/to/your/Data_output
Remember that \ needs to be changed to / in Windows!
Remember that no spaces should be in the names of the folders or files!
A good practice is to replace spaces with underscores (_).
Once you have retrieved the paths of the items in the ProjectFolder, put them together on a new line, separated by single spaces:
path/to/your/Results path/to/your/Inventory.xlsx path/to/your/Data_output
Here, workflow_template.py is the script to be run; it is already in the process_mrxs directory. To complete the command, copy the line of paths above (Ctrl+C).
Back in the Git Bash terminal prompt, type:
python workflow_template.py
Move the cursor to the end of the command line, leave a single space after workflow_template.py and paste the copied paths by pressing Shift + Insert.
Finally, after the paths, type the extension for the final spreadsheets: xlsx or csv.
If you get an error message, check that there is exactly one space between the paths, that the spelling is correct and that everything is in place in the ProjectFolder.
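As an alternative to assembling the line in a Word document, a short Python helper can join the pieces and reject the usual mistakes (double spaces, spaces inside paths, a missing or wrong extension). build_command is a hypothetical helper, not part of the package, and the example paths reuse the Windows folder shown earlier.

```python
def build_command(results: str, inventory: str, output: str, extension: str) -> str:
    """Join the workflow arguments with exactly one space between them."""
    if extension not in ("xlsx", "csv"):
        raise ValueError("extension must be 'xlsx' or 'csv'")
    # strip() removes stray spaces or line breaks picked up while copying paths
    paths = [p.strip() for p in (results, inventory, output)]
    for path in paths:
        if not path or " " in path:
            raise ValueError(f"empty path or path containing a space: {path!r}")
    return " ".join(["python", "workflow_template.py", *paths, extension])


base = "C:/Users/andre/OneDrive/Desktop/ProjectFolder"
print(build_command(f"{base}/Results", f"{base}/Inventory.xlsx",
                    f"{base}/Data_output", "xlsx"))
```

The printed line can be pasted into Git Bash as-is. Rejecting paths with spaces, rather than quoting them, follows the tutorial's rule that names should contain no spaces at all.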
Some error messages and how to solve them
KeyError: "['ID_Sample'] not in index"
-> Check the column names in your Inventory file for spelling mistakes; they must be exactly ID_Sample, Antibody and ID_Slidescanner.
usage: workflow_template.py [-h] directory_path inventory_file output_path output_extension
workflow_template.py: error: the following arguments are required: output_extension
-> You forgot the extension (xlsx or csv) at the end of the command. It should read:
python workflow_template.py path/to/your/Results path/to/your/Inventory.xlsx path/to/your/Data_output xlsx
main\process_mrxs_data.py", line 7, in <module>
    from scipy.stats import pearsonr, spearmanr, kendalltau
ModuleNotFoundError: No module named 'scipy'
-> scipy is not installed. Run pip install scipy; the same applies to any other module that cannot be found.
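Rather than discovering missing modules one error at a time, you can check them all up front. This is a minimal sketch using importlib.util.find_spec; apart from scipy, which appears in the error above, the module list is an assumption about typical dependencies, so adjust it to the package's actual requirements.

```python
import importlib.util


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found in this Python environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# scipy comes from the error message above; pandas and numpy are assumed dependencies.
needed = ["pandas", "numpy", "scipy"]
missing = missing_modules(needed)
if missing:
    print("Missing modules, install with: python -m pip install " + " ".join(missing))
else:
    print("All listed modules are available.")
```

Import names and pip package names do not always match (for example, the module sklearn is installed as scikit-learn). Running python -m pip install installs into the same Python interpreter that runs the script.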
Once the code has been successfully run, you can find spreadsheets in your Data_output folder containing the positivity rate for each antibody and region, as well as basic heatmaps and scatterplots.
If you have multiple antibodies, or just want to link the antibody with the donor/sample information, use our merger to create a sheet that contains all the information linked by the ID_Sample.
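For illustration only, the core of such a merge can be reduced to grouping rows by ID_Sample. This is not the authors' merger: merge_by_sample is a hypothetical name, each input row is assumed to carry ID_Sample, Antibody and positivity_rate, and the numbers are placeholders.

```python
from collections import defaultdict


def merge_by_sample(rows: list[dict]) -> dict[str, dict[str, float]]:
    """Combine per-antibody rows into one entry per ID_Sample."""
    merged: defaultdict[str, dict[str, float]] = defaultdict(dict)
    for row in rows:
        # A repeated (ID_Sample, Antibody) pair overwrites the earlier value.
        merged[row["ID_Sample"]][row["Antibody"]] = row["positivity_rate"]
    return dict(merged)


rows = [  # placeholder values
    {"ID_Sample": "435", "Antibody": "AntibodyA", "positivity_rate": 12.5},
    {"ID_Sample": "435", "Antibody": "AntibodyB", "positivity_rate": 40.0},
    {"ID_Sample": "436", "Antibody": "AntibodyA", "positivity_rate": 8.0},
]
print(merge_by_sample(rows))
# {'435': {'AntibodyA': 12.5, 'AntibodyB': 40.0}, '436': {'AntibodyA': 8.0}}
```

Donor information from your database could be attached the same way, by looking up each ID_Sample key.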