Processing Package Tutorial

ProcessData is a Python class designed to process data from MRXS, NDPI, CZI and BIF files, merge it with inventory data, and calculate immunopositivity statistics.

General Information

From this point on, the tutorial refers to Python code. To run the scripts and compute the positivity rates for your data, first read the Git Bash section if you are not familiar with Python or with a terminal prompt, then follow the remaining steps of this tutorial. On Linux, a standard terminal works.

Step-by-Step Instructions:

Getting Started:

Create a ProjectFolder on your system:
Manually create an empty folder named ProjectFolder; the following steps all take place in it.
Configure the ProjectFolder:
The internal structure of the folder should be like this:

The Data_output folder is a new, empty folder that you create manually. The files generated by the Python scripts in the next steps are stored there automatically and can be opened once processing has finished.
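If you prefer to script this step, the following sketch creates ProjectFolder with an empty Data_output folder inside it, using Python's standard pathlib module. The helper name create_project_folder is illustrative, and where you place ProjectFolder is your choice (the example call assumes your home folder). Your results folder and Inventory file still need to be copied in by hand.

```python
from pathlib import Path


def create_project_folder(project: Path) -> Path:
    """Create `project` and an empty Data_output folder inside it; return the Data_output path."""
    output = project / "Data_output"
    # parents=True also creates ProjectFolder itself;
    # exist_ok=True makes re-running the function harmless.
    output.mkdir(parents=True, exist_ok=True)
    return output


# Example call (the location is an assumption; pick your own):
# create_project_folder(Path.home() / "ProjectFolder")
```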

There should be no spaces in your directories or file names!
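A quick way to find offending names before running anything is to walk the folder tree. This standard-library sketch (paths_with_spaces is an illustrative name) lists the ProjectFolder itself if its full path contains a space, plus every file or subfolder below it whose name does. An empty list means nothing needs renaming.

```python
from __future__ import annotations

import os
from pathlib import Path


def paths_with_spaces(root: str) -> list[str]:
    """List root (if its full path has a space) and every file/folder below it with a space in its name."""
    root_path = Path(root).resolve()
    offenders = []
    # A space anywhere in the full path (e.g. in a parent folder) also breaks the command line.
    if " " in str(root_path):
        offenders.append(str(root_path))
    # os.walk visits every subfolder; check both folder and file names.
    for dirpath, dirnames, filenames in os.walk(root_path):
        for name in dirnames + filenames:
            if " " in name:
                offenders.append(os.path.join(dirpath, name))
    return sorted(offenders)


# Example call (adjust the path to your own ProjectFolder):
# print(paths_with_spaces("C:/Users/andre/OneDrive/Desktop/ProjectFolder"))
```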

Set up the system for the analysis:

Open Git Bash:
To open Git Bash, search for it in your computer's menu and click its icon. A standard terminal window opens.

First time using a terminal window prompt?

You cannot move the cursor in the Git Bash window with the mouse; use the arrow keys on the keyboard instead.

Open the project folder:
Move into the ProjectFolder from the terminal with the cd (change directory) command:

cd Path/To/ProjectFolder

How can you retrieve the path of your folder?

Windows:
- Open the folder in File Explorer and click the address bar at the top: it displays the full path. Copy the path by right-clicking the address bar and selecting "Copy address", or by selecting the text and pressing Ctrl+C.
- Alternatively, right-click the folder and select "Properties".
  On the "General" tab of the "Properties" window, the "Location" field displays the path of the folder that contains it. Copy it (select the text and press Ctrl+C) and add the folder name at the end.
macOS:
- Using Finder: navigate to the folder, right-click (or Control-click) it while holding down the Option key, and choose "Copy (folder name) as Pathname". The path is now on your clipboard.
- Using Get Info: right-click (or Control-click) the folder and select "Get Info", or press Command+I. In the Get Info window, the "Where" section shows the path of the enclosing folder; select it, copy it (Command+C), and add the folder name at the end.

Bash uses Unix-style paths, so every backslash \ in a Windows path must be changed to a forward slash /, for example:

C:/Users/andre/OneDrive/Desktop/ProjectFolder
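The same conversion can be done in Python with the standard library's PureWindowsPath, which is handy if you prepare several paths at once. The helper name to_bash_path is illustrative; it also strips the surrounding double quotes that Windows adds when you use "Copy as path".

```python
from pathlib import PureWindowsPath


def to_bash_path(windows_path: str) -> str:
    """Turn a path like C:\\Users\\... into C:/Users/..., the form used in this tutorial."""
    # Remove surrounding whitespace and the quotes added by "Copy as path".
    cleaned = windows_path.strip().strip('"')
    # as_posix() keeps the drive letter and replaces every backslash with "/".
    return PureWindowsPath(cleaned).as_posix()


print(to_bash_path(r"C:\Users\andre\OneDrive\Desktop\ProjectFolder"))
# C:/Users/andre/OneDrive/Desktop/ProjectFolder
```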

Download the Process_scan class:
Download this repository to your computer (or clone it): click the Code button and select the Download ZIP option that appears:

Move the directory to the ProjectFolder:
Move the downloaded archive, process_scan-main.zip (normally found in your Downloads folder), into the ProjectFolder, using whichever method is most convenient for you.

Everything installed?

Have you installed the required packages?

No? Then run this command in Git Bash or your terminal:

pip install pandas numpy seaborn matplotlib scipy openpyxl
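To see which of these packages are missing before you run the workflow, you can ask Python directly. This small sketch uses importlib.util.find_spec from the standard library; the list mirrors the pip command above (all six packages are imported under the same names they are installed with), and missing_packages is an illustrative name.

```python
import importlib.util

# Packages from the pip command above (import names match the pip names).
REQUIRED = ["pandas", "numpy", "seaborn", "matplotlib", "scipy", "openpyxl"]


def missing_packages(names=REQUIRED):
    """Return the names that the current Python interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print(missing_packages() or "All required packages are installed.")
```

If anything is listed, running python -m pip install followed by the package name installs it for the same python that will later run workflow_template.py.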

Decompress the ProcessScan Python folder:

To use the ProcessScan Python class from the command line, go back to the Git Bash terminal (re-open it and cd into the ProjectFolder if you closed it), extract the archive, and move into the extracted directory with the cd command:

unzip process_scan-main.zip
cd process_scan-main
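If the unzip command is not available in your terminal, Python's standard zipfile module can extract the archive instead. In this sketch (extract_archive is an illustrative name), the function also returns the top-level names inside the archive, which tells you which folder to cd into afterwards.

```python
import zipfile


def extract_archive(zip_path: str, destination: str = ".") -> list:
    """Extract zip_path into destination and return its top-level entry names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(destination)
        # ZIP member names always use "/", so the first part is the top-level folder.
        return sorted({name.split("/")[0] for name in archive.namelist()})


# Example, run from inside ProjectFolder:
# print(extract_archive("process_scan-main.zip"))
```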

Launch the ProcessScan Classification from the terminal

The Python script is launched from the same Git Bash terminal. If Python is not yet installed on your system, first follow the instructions in the Python Installation tutorial. Then paste the following command into the terminal prompt:

python workflow_template.py path/to/your/Results path/to/your/Inventory.xlsx path/to/your/directory/to/Data_output xlsx

Replace the placeholder paths with your own. The last argument, typed after the paths, is the file extension for the output spreadsheets: either xlsx or csv.
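The four arguments are positional, so their order matters. As an illustration only, an argparse parser with the same interface could look like this. It is reconstructed from the usage message quoted in the error section of this tutorial, not taken from workflow_template.py, and restricting output_extension to xlsx or csv is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser matching the documented usage line of workflow_template.py."""
    parser = argparse.ArgumentParser(prog="workflow_template.py")
    parser.add_argument("directory_path", help="folder with your Results")
    parser.add_argument("inventory_file", help="Inventory spreadsheet (.xlsx or .csv)")
    parser.add_argument("output_path", help="Data_output folder")
    # Assumption: only the two documented extensions are accepted.
    # metavar keeps the usage line showing 'output_extension' instead of '{xlsx,csv}'.
    parser.add_argument("output_extension", choices=["xlsx", "csv"], metavar="output_extension")
    return parser


args = build_parser().parse_args(
    ["path/to/your/Results", "path/to/your/Inventory.xlsx", "path/to/your/Data_output", "xlsx"]
)
print(args.output_extension)  # xlsx
```

Leaving out the last argument makes argparse print the usage line and exit with "the following arguments are required: output_extension", the same message listed in the troubleshooting section.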

Don't know how to construct the command in the terminal prompt?

If you are not familiar with coding, it may be easiest to prepare the paths in a Word (or any text) document and then copy them into the Git Bash terminal. The paths you need are:

path/to/your/Results
path/to/your/Inventory.xlsx
path/to/your/Data_output

Remember that \ needs to be changed to / in Windows!

Remember that no spaces should be in the names of the folders or files!

A good practice is to replace spaces with underscores (_).

Once you have retrieved the paths of the items in the ProjectFolder, put them together on a new line, separated by a single space:

path/to/your/Results path/to/your/Inventory.xlsx path/to/your/Data_output

In the full command shown earlier, workflow_template.py is the script to run; it is already in the directory you moved into with cd (the extracted process_scan-main folder). To complete the command, copy the line of paths above (Ctrl+C).

Back in the Git Bash terminal prompt, type:

python workflow_template.py

Make sure the cursor is at the end of that line, type a space, and paste the copied paths by pressing Shift+Insert.

Finally, type a space and the extension for the output spreadsheets (xlsx or csv) after the paths, then press Enter.
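If you would rather let Python assemble the line, this sketch does the same job; build_command is a hypothetical helper, not part of the package. It converts backslashes to forward slashes, refuses paths that contain spaces, and joins everything with single spaces. Paste its output into Git Bash from inside the process_scan-main folder.

```python
from pathlib import PureWindowsPath


def build_command(results: str, inventory: str, output: str, extension: str = "xlsx") -> str:
    """Assemble 'python workflow_template.py <results> <inventory> <output> <extension>'."""
    if extension not in ("xlsx", "csv"):
        raise ValueError("extension must be 'xlsx' or 'csv'")
    parts = []
    for path in (results, inventory, output):
        # Drop stray whitespace/quotes and turn backslashes into forward slashes.
        clean = PureWindowsPath(path.strip().strip('"')).as_posix()
        if " " in clean:
            raise ValueError(f"rename this path so it has no spaces: {clean}")
        parts.append(clean)
    # Exactly one space between every element of the command.
    return " ".join(["python", "workflow_template.py", *parts, extension])


base = r"C:\Users\andre\OneDrive\Desktop\ProjectFolder"
print(build_command(base + r"\Results", base + r"\Inventory.xlsx", base + r"\Data_output"))
```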

If you get an error message, check that there is only a single space between the different paths, that the spelling is correct, and that everything is in place in the ProjectFolder.
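Many of these errors come from a path that does not point where you think it does. A standard-library sketch like the following (check_inputs is an illustrative name) reports which of the three paths is missing before you launch the workflow; the .xlsx/.csv check reflects the Inventory formats listed under Limitations.

```python
from pathlib import Path


def check_inputs(results: str, inventory: str, output: str) -> list:
    """Return human-readable problems with the three paths (an empty list if all look fine)."""
    problems = []
    if not Path(results).is_dir():
        problems.append(f"Results folder not found: {results}")
    inv = Path(inventory)
    if not inv.is_file():
        problems.append(f"Inventory file not found: {inventory}")
    elif inv.suffix.lower() not in (".xlsx", ".csv"):
        problems.append(f"Inventory should be .xlsx or .csv: {inventory}")
    if not Path(output).is_dir():
        problems.append(f"Output folder not found: {output}")
    return problems


# Example call (replace with your own paths):
# for problem in check_inputs("path/to/your/Results", "path/to/your/Inventory.xlsx", "path/to/your/Data_output"):
#     print(problem)
```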

Some error messages and how to solve them

KeyError: "['ID_Sample'] not in index"

-> Check your Inventory file: the ID_Sample column header is probably misspelled.

usage: workflow_template.py [-h] directory_path inventory_file output_path output_extension
workflow_template.py: error: the following arguments are required: output_extension

-> You forgot the output extension (xlsx or csv) at the end of the command: python workflow_template.py path/to/your/Results path/to/your/Inventory.xlsx path/to/your/directory/to/Data_output xlsx

main\process_mrxs_data.py", line 7, in <module>
    from scipy.stats import pearsonr, spearmanr, kendalltau
ModuleNotFoundError: No module named 'scipy'

-> scipy is not installed for the Python you are using. Install it with pip install scipy; the same fix applies to any other module that cannot be found.
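For the KeyError about ID_Sample, the header in one of the Inventory sheets usually differs slightly from the expected name, for example an extra space or a typo. This standard-library sketch (check_id_column is a hypothetical helper) takes the column headers of each sheet and suggests the closest match using difflib. It assumes, as the error message suggests, that the workflow looks for a column named exactly ID_Sample.

```python
import difflib


def check_id_column(sheet_columns: dict, required: str = "ID_Sample") -> dict:
    """Map each sheet lacking `required` to its closest existing header (None if nothing is close)."""
    report = {}
    for sheet, columns in sheet_columns.items():
        headers = [str(column) for column in columns]
        if required in headers:
            continue  # this sheet is fine
        # An extra space or a small typo usually still scores above the 0.6 cutoff.
        close = difflib.get_close_matches(required, headers, n=1, cutoff=0.6)
        report[sheet] = close[0] if close else None
    return report


print(check_id_column({"Sheet1": ["ID_Sample", "Region"], "Sheet2": ["ID Sample", "Donor"]}))
# {'Sheet2': 'ID Sample'}

# Example with a real workbook (pandas reads every sheet when sheet_name=None):
# import pandas as pd
# sheets = pd.read_excel("Inventory.xlsx", sheet_name=None)
# print(check_id_column({name: list(df.columns) for name, df in sheets.items()}))
```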

Once the script has run successfully, your Data_output folder contains spreadsheets with the positivity rate for each antibody and region, as well as basic heatmaps and scatterplots.

If you have multiple antibodies, or just want to link the antibody results with the donor/sample information, use our merger to create a sheet containing all the information, linked by ID_Sample.

We have a brief tutorial on how to do that here.
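The merger has its own tutorial. Purely to illustrate what "linked by ID_Sample" means, here is a minimal pandas sketch of such a join; it is not the merger's code, merge_on_id is a hypothetical helper, and the column names and values (Antibody, Region, Positivity, Donor) are invented for the example.

```python
import pandas as pd


def merge_on_id(results: pd.DataFrame, inventory: pd.DataFrame, key: str = "ID_Sample") -> pd.DataFrame:
    """Add the inventory columns to every result row that shares the same key."""
    # validate="many_to_one" raises an error if an ID_Sample appears twice in the inventory.
    merged = results.merge(inventory, on=key, how="left", indicator=True, validate="many_to_one")
    # '_merge' == 'left_only' marks results whose ID_Sample has no inventory entry.
    unmatched = merged.loc[merged["_merge"] == "left_only", key].drop_duplicates().tolist()
    if unmatched:
        print(f"No inventory entry for: {unmatched}")
    return merged.drop(columns="_merge")


# Invented example data:
results = pd.DataFrame({"ID_Sample": ["S1", "S2", "S3"], "Antibody": ["AB1", "AB1", "AB2"],
                        "Region": ["Cortex", "Cortex", "Hippocampus"], "Positivity": [12.5, 3.0, 7.1]})
inventory = pd.DataFrame({"ID_Sample": ["S1", "S2"], "Donor": ["D1", "D2"]})
print(merge_on_id(results, inventory))
```

A left join keeps every result row even when its sample is missing from the inventory, so nothing is silently dropped; the printed warning tells you which IDs to fix.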

Limitations

Please note that the current version is only intended to be run with the Prerequisites fulfilled, from an sh-compatible shell (such as Git Bash or a Linux terminal).

The Inventory file can be .csv or .xlsx; for .xlsx, a multi-sheet workbook is expected.

When setting output_extension, the options csv and xlsx are implemented; choose whichever you prefer.
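As a sketch of how the two formats can be handled with pandas (read_inventory and write_table are illustrative names, not functions from the package): an .xlsx workbook is read sheet by sheet, a .csv file is treated as a single sheet, and results are written with the extension you chose. Reading and writing .xlsx relies on openpyxl, which the pip command earlier installs.

```python
from pathlib import Path

import pandas as pd


def read_inventory(path: str) -> dict:
    """Load the Inventory as {sheet_name: DataFrame}, whether it is .xlsx or .csv."""
    suffix = Path(path).suffix.lower()
    if suffix == ".xlsx":
        # sheet_name=None reads every sheet of the workbook into a dict.
        return pd.read_excel(path, sheet_name=None)
    if suffix == ".csv":
        # A CSV file has no sheets, so it is wrapped as a single pseudo-sheet.
        return {Path(path).stem: pd.read_csv(path)}
    raise ValueError(f"Unsupported inventory format: {suffix}")


def write_table(df: pd.DataFrame, stem: str, output_dir: str, extension: str) -> Path:
    """Save df as <output_dir>/<stem>.<extension>, where extension is 'xlsx' or 'csv'."""
    target = Path(output_dir) / f"{stem}.{extension}"
    if extension == "xlsx":
        df.to_excel(target, index=False)
    elif extension == "csv":
        df.to_csv(target, index=False)
    else:
        raise ValueError("extension must be 'xlsx' or 'csv'")
    return target
```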

License

This project is licensed under the MIT License; see the LICENSE file for details, including the copyright holder.
