FlowKit Tutorial - Part 5 - The Session Class

https://flowkit.readthedocs.io/en/latest/?badge=latest

The Session class combines multiple Sample instances with a GatingStrategy to manage gates for the collection of FCS files. It supports custom gates so there can be a common template gate for a particular node in the gate tree, but that node can also be customized for a particular Sample. GatingResults are also retained allowing easier reporting and plotting of gated event data.

If you have any questions about FlowKit, find any bugs, or feel something is missing from these tutorials please submit an issue to the GitHub repository here.

Table of Contents

[1]:
import os
import bokeh
from bokeh.plotting import show

import flowkit as fk

bokeh.io.output_notebook()
Loading BokehJS ...
[2]:
# check version so users can verify they have the same version/API
fk.__version__
[2]:
'1.1.0'

Session Class

The Session class is intended as the main interface in FlowKit for complex flow cytometry analysis. A Session allows creating a gating strategy for a collection of FCS samples. The Session class also supports importing a GatingML-2.0 document to serve as the gating strategy.

The gates in a Session’s gating strategy can be shared across samples (common gates) or customized per sample. Unlike the GatingStrategy class, which does not retain any Sample instances, the Session class will store the Sample instances that have been loaded. This is also true for the GatingResults data after applying the gating strategy to loaded samples.

Let’s have a look at the constructor:

Session(
    gating_strategy=None,
    fcs_samples=None
)

The gating_strategy argument may be a GatingStrategy instance or a file path to a GatingML 2.0 compliant document. If None, then an empty GatingStrategy will be created for use in the Session.

The argument fcs_samples may be a Sample instance, string or a list. If given a string, it can be a directory path or a file path. If a directory, any .fcs files in the directory will be loaded. If a list, then it must be a list of file paths or a list of Sample instances. Lists of mixed types are not supported.

Many of the methods in the Session class are similar to those found in the GatingStrategy class, with the addition of an extra methods for managing the loaded Sample instances, retrieving gated event data, and plotting gated events.

Creating a Session

Let’s create a Session starting with a GatingML document and some FCS files.

[3]:
# setup some file paths for our data
base_dir = "../../data/8_color_data_set"

fcs_dir = os.path.join(base_dir, "fcs_files")
gml_path = os.path.join(base_dir, "8_color_ICS.xml")
[4]:
# Create a Session with the path to a GatingML document & directory containing our FCS files
session = fk.Session(gating_strategy=gml_path, fcs_samples=fcs_dir)

New samples can also be added after the Session is created using the add_samples method. The directory we provided above has all the FCS files we need, let’s get a list of their IDs.

[5]:
sample_list = session.get_sample_ids()
[6]:
sample_list
[6]:
['101_DEN084Y5_15_E01_008_clean.fcs',
 '101_DEN084Y5_15_E03_009_clean.fcs',
 '101_DEN084Y5_15_E05_010_clean.fcs']

Retrieving Gate Components

The gate tree in a Session works nearly identically to the GatingStrategy class. Gates can be added and removed, with support for custom sample gates. We can get the gate tree as text, get the gate IDs, compensation matrices, and transforms as well.

[7]:
# review the gating hierarchy that was in the GatingML document
print(session.get_gate_hierarchy())
root
╰── TimeGate
    ╰── Singlets
        ╰── aAmine-
            ╰── CD3-pos
                ├── CD4-pos
                ╰── CD8-pos
[8]:
# get the list of gate IDs (a gate ID is a tuple of a gate name and a gate path)
session.get_gate_ids()
[8]:
[('TimeGate', ('root',)),
 ('Singlets', ('root', 'TimeGate')),
 ('aAmine-', ('root', 'TimeGate', 'Singlets')),
 ('CD3-pos', ('root', 'TimeGate', 'Singlets', 'aAmine-')),
 ('CD4-pos', ('root', 'TimeGate', 'Singlets', 'aAmine-', 'CD3-pos')),
 ('CD8-pos', ('root', 'TimeGate', 'Singlets', 'aAmine-', 'CD3-pos'))]
[9]:
# get the gate instance by gate name
session.get_gate('TimeGate')
[9]:
RectangleGate(TimeGate, dims: 2)
[10]:
# a list of compensation matrices
session.get_comp_matrices()
[10]:
[Matrix(Acquisition-defined, dims: 8)]
[11]:
# or a single comp matrix by its ID
session.get_comp_matrix('Acquisition-defined')
[11]:
Matrix(Acquisition-defined, dims: 8)
[12]:
# a list of transforms
session.get_transforms()
[12]:
[LinearTransform(scatter-lin, t: 262144.0, a: 0.0),
 LogicleTransform(logicle-default, t: 262144.0, w: 1.0, m: 4.418539922, a: 0.0),
 LinearTransform(Time, t: 72.0, a: 0.8511997311)]
[13]:
# or a single transform by its ID
session.get_transform('logicle-default')
[13]:
LogicleTransform(logicle-default, t: 262144.0, w: 1.0, m: 4.418539922, a: 0.0)

Analyzing Samples and Retrieving Results

The previous tutorial demonstrated using a GatingStrategy with multiple samples. That approach works for simpler gate trees and a few samples, but can become difficult to manage for more complex scenarios. The Session class stores the associated samples so you don’t have to keep track of them. It also allows processing the gate tree for all samples with a single method call, and the results are stored for all samples.

Let’s run the analysis for our loaded samples.

[14]:
# looks good, let's analyze the samples (using verbose mode to see each gate as it's processed)
session.analyze_samples(verbose=True, use_mp=False)
#### Processing gates for 3 samples (multiprocessing is disabled) ####
101_DEN084Y5_15_E01_008_clean.fcs: processing gate TimeGate
101_DEN084Y5_15_E01_008_clean.fcs: processing gate Singlets
101_DEN084Y5_15_E01_008_clean.fcs: processing gate aAmine-
101_DEN084Y5_15_E01_008_clean.fcs: processing gate CD3-pos
101_DEN084Y5_15_E01_008_clean.fcs: processing gate CD4-pos
101_DEN084Y5_15_E01_008_clean.fcs: processing gate CD8-pos
101_DEN084Y5_15_E03_009_clean.fcs: processing gate TimeGate
101_DEN084Y5_15_E03_009_clean.fcs: processing gate Singlets
101_DEN084Y5_15_E03_009_clean.fcs: processing gate aAmine-
101_DEN084Y5_15_E03_009_clean.fcs: processing gate CD3-pos
101_DEN084Y5_15_E03_009_clean.fcs: processing gate CD4-pos
101_DEN084Y5_15_E03_009_clean.fcs: processing gate CD8-pos
101_DEN084Y5_15_E05_010_clean.fcs: processing gate TimeGate
101_DEN084Y5_15_E05_010_clean.fcs: processing gate Singlets
101_DEN084Y5_15_E05_010_clean.fcs: processing gate aAmine-
101_DEN084Y5_15_E05_010_clean.fcs: processing gate CD3-pos
101_DEN084Y5_15_E05_010_clean.fcs: processing gate CD4-pos
101_DEN084Y5_15_E05_010_clean.fcs: processing gate CD8-pos

The Session class can process multiple samples in parallel. The number of CPU cores used depends on the number of cores available, the number of samples to be run, and the size of the event arrays in the samples. After core-count, the next limiting factor in processing multiple samples simultaneously is available memory. The Session class attempts to estimate the required memory and limit the number of parallel analyses accordingly. If you encounter memory errors when analyzing multiple samples, try setting ``use_mp`` to False.

There are also scenarios where you may not want to analyze all samples at once. The analyze_samples method takes an optional sample_id keyword argument for analyzing only one sample at a time. Let’s read the documentation for more information:

[15]:
help(session.analyze_samples)
Help on method analyze_samples in module flowkit._models.session:

analyze_samples(sample_id=None, cache_events=False, use_mp=True, verbose=False) method of flowkit._models.session.Session instance
    Process gating strategy for samples. After running, results can be retrieved
    using the `get_gating_results`, `get_report`, and  `get_gate_membership`,
    methods.

    :param sample_id: optional sample ID, if specified only this sample will be processed
    :param cache_events: Whether to cache pre-processed events (compensated and transformed). This can
        be useful to speed up processing of gates that share the same pre-processing instructions for
        the same channel data, but can consume significantly more memory space. See the related
        clear_cache method for additional information. Default is False.
    :param use_mp: Controls whether multiprocessing is used to gate samples (default is True).
        Multiprocessing can fail for large workloads (lots of samples & gates) due to running out of
        memory. If encountering memory errors, set use_mp to False (processing will take longer,
        but will use significantly less memory).
    :param verbose: if True, print a line for every gate processed (default is False)
    :return: None

The Session class retains data from the analysis, allowing convenient access to results. Next, we’ll get the report of results for all samples. The report is a Pandas DataFrame, making it easy to filter. Let’s look at the time gate for all samples.

[16]:
# and a look at the report of the results
results_report = session.get_analysis_report()
results_report[results_report['gate_name'] == 'TimeGate']
[16]:
sample gate_path gate_name gate_type quadrant_parent parent count absolute_percent relative_percent level
0 101_DEN084Y5_15_E01_008_clean.fcs (root,) TimeGate RectangleGate None root 290166 99.997932 99.997932 1
6 101_DEN084Y5_15_E03_009_clean.fcs (root,) TimeGate RectangleGate None root 283968 99.999648 99.999648 1
12 101_DEN084Y5_15_E05_010_clean.fcs (root,) TimeGate RectangleGate None root 284846 99.844369 99.844369 1

We can see the less than optimal time gate we discovered in the last tutorial. We can fix that the same way using custom sample gates. But let’s continue looking at the various ways to get gating results. We can retrieve the GatingResults instance for a single sample and get its report.

[17]:
sample_id = '101_DEN084Y5_15_E01_008_clean.fcs'
sample_results = session.get_gating_results(sample_id)
sample_results.report
[17]:
sample gate_path gate_name gate_type quadrant_parent parent count absolute_percent relative_percent level
0 101_DEN084Y5_15_E01_008_clean.fcs (root,) TimeGate RectangleGate None root 290166 99.997932 99.997932 1
1 101_DEN084Y5_15_E01_008_clean.fcs (root, TimeGate) Singlets PolygonGate None TimeGate 239001 82.365287 82.366990 2
2 101_DEN084Y5_15_E01_008_clean.fcs (root, TimeGate, Singlets) aAmine- PolygonGate None Singlets 164655 56.743931 68.893017 3
3 101_DEN084Y5_15_E01_008_clean.fcs (root, TimeGate, Singlets, aAmine-) CD3-pos PolygonGate None aAmine- 133670 46.065782 81.181865 4
4 101_DEN084Y5_15_E01_008_clean.fcs (root, TimeGate, Singlets, aAmine-, CD3-pos) CD4-pos PolygonGate None CD3-pos 82484 28.425899 61.707189 5
5 101_DEN084Y5_15_E01_008_clean.fcs (root, TimeGate, Singlets, aAmine-, CD3-pos) CD8-pos PolygonGate None CD3-pos 47165 16.254153 35.284656 5

Extract Gated Event Data

Gated event information is available in 2 forms: as gated event arrays (pandas DataFrame) or as Boolean arrays of gate membership. The method get_gate_events retrieves only the sample events that are within a specified gate. The original indices of the sample events are preserved. The get_gate_membership method returns 1-D Boolean array indicating which sample events are inside the gate. For either of these methods, if the gate name is ambigious, the gate path must also be given.

Note: The events returned by get_gate_events in the Session class are not pre-processed by default. This is because all the channels (dimensions) are not guaranteed to be covered by a gate. Further, even if all channels were used in the gate tree, it is possible (though not common) for a channel to use different transforms in different gates of the same tree.

[18]:
# get gate events as a pandas DataFrame
cd3_pos_events = session.get_gate_events(sample_id=sample_id, gate_name='CD3-pos')
cd3_pos_events.head(5)
[18]:
pnn FSC-A FSC-H FSC-W SSC-A SSC-H SSC-W TNFa FITC FLR-A CD8 PerCP-Cy55 FLR-A IL2 BV421 FLR-A Aqua Amine FLR-A IFNg APC FLR-A CD3 APC-H7 FLR-A CD107a PE FLR-A CD4 PE-Cy7 FLR-A Time
pns
6 165875.515625 136158.0 79839.726562 30412.320312 29198.0 68261.585938 146.880005 68.339996 145.080002 126.480003 61.380001 1475.099976 393.800018 3631.100098 1.280
9 108877.015625 86248.0 82730.781250 52511.640625 47880.0 71875.578125 150.959991 678.299988 512.119995 163.680008 275.220001 2758.140137 700.700012 5526.399902 1.287
10 111956.429688 86024.0 85292.203125 77629.140625 68860.0 73881.835938 181.559998 499.799988 489.800018 189.720001 236.610001 2061.179932 542.299988 6578.000000 1.288
11 183806.140625 150217.0 80190.125000 42267.777344 39405.0 70297.203125 180.539993 4299.299805 64.480003 137.639999 154.440002 1152.359985 358.600006 803.000000 1.289
14 184054.203125 154340.0 78153.273438 28808.878906 28292.0 66733.304688 83.639999 233.580002 97.959999 76.879997 120.779999 1089.000000 321.200012 2814.900146 1.293
[19]:
# retrieve the gate membership as a boolean array (True value means the event is in the gate)
session.get_gate_membership(sample_id=sample_id, gate_name='Singlets')
[19]:
array([False, False, False, ..., False, False,  True])
[20]:
# Here we'll collect the gate membership arrays for all gates for a sample
results = {}

for gate_name, gate_path in session.get_gate_ids():
    result = session.get_gate_membership(
        sample_id=sample_id,
        gate_name=gate_name,
        gate_path=gate_path
    )
    results[(gate_name, gate_path)] = result
[21]:
list(results.keys())
[21]:
[('TimeGate', ('root',)),
 ('Singlets', ('root', 'TimeGate')),
 ('aAmine-', ('root', 'TimeGate', 'Singlets')),
 ('CD3-pos', ('root', 'TimeGate', 'Singlets', 'aAmine-')),
 ('CD4-pos', ('root', 'TimeGate', 'Singlets', 'aAmine-', 'CD3-pos')),
 ('CD8-pos', ('root', 'TimeGate', 'Singlets', 'aAmine-', 'CD3-pos'))]
[22]:
results[('aAmine-', ('root', 'TimeGate', 'Singlets'))]
[22]:
array([False, False, False, ..., False, False,  True])

Plot Gated Events

The Session class takes advantage of the fact that it retains the samples and their gating results to provide better plotting tools than the GatingStrategy. Gates and their events can be plotted using the plot_gate method. This method will display the events from the parent gate along with the gate boundaries.

[23]:
# plot the time gates for all samples
for sample_id in session.get_sample_ids():
    p = session.plot_gate(
        sample_id,
        'TimeGate',
        x_min=0,
        x_max=2.8,
        y_min=0,
        y_max=1.2
    )
    p.aspect_ratio = 2.0
    show(p)

There is also the plot_scatter method that creates a scatter plot of a gated population in any specified dimensions. To use it, we need to create Dimension instances, which provide the channel label for each axis as well as the compensation and transformation references to use for displaying the channel’s events.

To demonstrate, we’ll pick a sample and plot 2 cytokine channels from the CD8+ events. Let’s review the channel labels and then create the dimensions.

[24]:
sample_id = '101_DEN084Y5_15_E01_008_clean.fcs'
sample = session.get_sample(sample_id)
sample.channels
[24]:
channel_number pnn pns png pne pnr
0 1 FSC-A 1.0 (0.0, 0.0) 262144.0
1 2 FSC-H 1.0 (0.0, 0.0) 262144.0
2 3 FSC-W 1.0 (0.0, 0.0) 262144.0
3 4 SSC-A 1.0 (0.0, 0.0) 262144.0
4 5 SSC-H 1.0 (0.0, 0.0) 262144.0
5 6 SSC-W 1.0 (0.0, 0.0) 262144.0
6 7 TNFa FITC FLR-A 1.0 (0.0, 0.0) 262144.0
7 8 CD8 PerCP-Cy55 FLR-A 1.0 (0.0, 0.0) 262144.0
8 9 IL2 BV421 FLR-A 1.0 (0.0, 0.0) 262144.0
9 10 Aqua Amine FLR-A 1.0 (0.0, 0.0) 262144.0
10 11 IFNg APC FLR-A 1.0 (0.0, 0.0) 262144.0
11 12 CD3 APC-H7 FLR-A 1.0 (0.0, 0.0) 262144.0
12 13 CD107a PE FLR-A 1.0 (0.0, 0.0) 262144.0
13 14 CD4 PE-Cy7 FLR-A 1.0 (0.0, 0.0) 262144.0
14 15 Time 1.0 (0.0, 0.0) 262144.0
[25]:
x_label = 'IL2 BV421 FLR-A'
y_label = 'CD107a PE FLR-A'

# We know from a previous section there is only 1 comp matrix
# and I want both to use the Logicle transform
x_dim = fk.Dimension(
    x_label, compensation_ref='Acquisition-defined', transformation_ref='logicle-default'
)
y_dim = fk.Dimension(
    y_label, compensation_ref='Acquisition-defined', transformation_ref='logicle-default'
)
[26]:
p = session.plot_scatter(
    sample_id,
    x_dim,
    y_dim,
    gate_name='CD8-pos',
    subsample_count=5e5,
    x_min=0,
    x_max=1,
    y_min=0,
    y_max=1
)
show(p)

Or, we could take the gate membership array from our session and use it to highlight events using the Sample plot_scatter method. This will highlight the CD8+ events in the context of the non-gated events.

[27]:
cd8_pos_memb = session.get_gate_membership(sample_id, 'CD8-pos')
[28]:
sample.apply_compensation(session.get_comp_matrix('Acquisition-defined'))
sample.apply_transform(session.get_transform('logicle-default'))
sample.subsample_events(5e5)
p = sample.plot_scatter(x_label, y_label, highlight_mask=cd8_pos_memb, x_min=0, x_max=1, y_min=0, y_max=1)
show(p)
[ ]: