FlowKit Tutorial - Part 5 - The Session Class
https://flowkit.readthedocs.io/en/latest/?badge=latest
The Session
class combines multiple Sample
instances with a GatingStrategy
to manage gates for the collection of FCS files. It supports custom gates so there can be a common template gate for a particular node in the gate tree, but that node can also be customized for a particular Sample
. GatingResults
are also retained allowing easier reporting and plotting of gated event data.
If you have any questions about FlowKit, find any bugs, or feel something is missing from these tutorials please submit an issue to the GitHub repository here.
Table of Contents
[1]:
import os
import bokeh
from bokeh.plotting import show
import flowkit as fk
bokeh.io.output_notebook()
[2]:
# check version so users can verify they have the same version/API
fk.__version__
[2]:
'1.1.0'
Session Class
The Session class is intended as the main interface in FlowKit for complex flow cytometry analysis. A Session allows creating a gating strategy for a collection of FCS samples. The Session
class also supports importing a GatingML-2.0 document to serve as the gating strategy.
The gates in a Session’s gating strategy can be shared across samples (common gates) or customized per sample. Unlike the GatingStrategy class, which does not retain any Sample instances, the Session class will store the Sample instances that have been loaded. This is also true for the GatingResults
data after applying the gating strategy to loaded samples.
Let’s have a look at the constructor:
Session(
gating_strategy=None,
fcs_samples=None
)
The gating_strategy
argument may be a GatingStrategy
instance or a file path to a GatingML 2.0 compliant document. If None, then an empty GatingStrategy
will be created for use in the Session.
The argument fcs_samples
may be a Sample
instance, string or a list. If given a string, it can be a directory path or a file path. If a directory, any .fcs files in the directory will be loaded. If a list, then it must be a list of file paths or a list of Sample instances. Lists of mixed types are not supported.
Many of the methods in the Session
class are similar to those found in the GatingStrategy
class, with the addition of an extra methods for managing the loaded Sample
instances, retrieving gated event data, and plotting gated events.
Creating a Session
Let’s create a Session starting with a GatingML document and some FCS files.
[3]:
# setup some file paths for our data
base_dir = "../../data/8_color_data_set"
fcs_dir = os.path.join(base_dir, "fcs_files")
gml_path = os.path.join(base_dir, "8_color_ICS.xml")
[4]:
# Create a Session with the path to a GatingML document & directory containing our FCS files
session = fk.Session(gating_strategy=gml_path, fcs_samples=fcs_dir)
New samples can also be added after the Session is created using the add_samples
method. The directory we provided above has all the FCS files we need, let’s get a list of their IDs.
[5]:
sample_list = session.get_sample_ids()
[6]:
sample_list
[6]:
['101_DEN084Y5_15_E01_008_clean.fcs',
'101_DEN084Y5_15_E03_009_clean.fcs',
'101_DEN084Y5_15_E05_010_clean.fcs']
Retrieving Gate Components
The gate tree in a Session works nearly identically to the GatingStrategy class. Gates can be added and removed, with support for custom sample gates. We can get the gate tree as text, get the gate IDs, compensation matrices, and transforms as well.
[7]:
# review the gating hierarchy that was in the GatingML document
print(session.get_gate_hierarchy())
root
╰── TimeGate
╰── Singlets
╰── aAmine-
╰── CD3-pos
├── CD4-pos
╰── CD8-pos
[8]:
# get the list of gate IDs (a gate ID is a tuple of a gate name and a gate path)
session.get_gate_ids()
[8]:
[('TimeGate', ('root',)),
('Singlets', ('root', 'TimeGate')),
('aAmine-', ('root', 'TimeGate', 'Singlets')),
('CD3-pos', ('root', 'TimeGate', 'Singlets', 'aAmine-')),
('CD4-pos', ('root', 'TimeGate', 'Singlets', 'aAmine-', 'CD3-pos')),
('CD8-pos', ('root', 'TimeGate', 'Singlets', 'aAmine-', 'CD3-pos'))]
[9]:
# get the gate instance by gate name
session.get_gate('TimeGate')
[9]:
RectangleGate(TimeGate, dims: 2)
[10]:
# a list of compensation matrices
session.get_comp_matrices()
[10]:
[Matrix(Acquisition-defined, dims: 8)]
[11]:
# or a single comp matrix by its ID
session.get_comp_matrix('Acquisition-defined')
[11]:
Matrix(Acquisition-defined, dims: 8)
[12]:
# a list of transforms
session.get_transforms()
[12]:
[LinearTransform(scatter-lin, t: 262144.0, a: 0.0),
LogicleTransform(logicle-default, t: 262144.0, w: 1.0, m: 4.418539922, a: 0.0),
LinearTransform(Time, t: 72.0, a: 0.8511997311)]
[13]:
# or a single transform by its ID
session.get_transform('logicle-default')
[13]:
LogicleTransform(logicle-default, t: 262144.0, w: 1.0, m: 4.418539922, a: 0.0)
Analyzing Samples and Retrieving Results
The previous tutorial demonstrated using a GatingStrategy with multiple samples. That approach works for simpler gate trees and a few samples, but can become difficult to manage for more complex scenarios. The Session class stores the associated samples so you don’t have to keep track of them. It also allows processing the gate tree for all samples with a single method call, and the results are stored for all samples.
Let’s run the analysis for our loaded samples.
[14]:
# looks good, let's analyze the samples (using verbose mode to see each gate as it's processed)
session.analyze_samples(verbose=True, use_mp=False)
#### Processing gates for 3 samples (multiprocessing is disabled) ####
101_DEN084Y5_15_E01_008_clean.fcs: processing gate TimeGate
101_DEN084Y5_15_E01_008_clean.fcs: processing gate Singlets
101_DEN084Y5_15_E01_008_clean.fcs: processing gate aAmine-
101_DEN084Y5_15_E01_008_clean.fcs: processing gate CD3-pos
101_DEN084Y5_15_E01_008_clean.fcs: processing gate CD4-pos
101_DEN084Y5_15_E01_008_clean.fcs: processing gate CD8-pos
101_DEN084Y5_15_E03_009_clean.fcs: processing gate TimeGate
101_DEN084Y5_15_E03_009_clean.fcs: processing gate Singlets
101_DEN084Y5_15_E03_009_clean.fcs: processing gate aAmine-
101_DEN084Y5_15_E03_009_clean.fcs: processing gate CD3-pos
101_DEN084Y5_15_E03_009_clean.fcs: processing gate CD4-pos
101_DEN084Y5_15_E03_009_clean.fcs: processing gate CD8-pos
101_DEN084Y5_15_E05_010_clean.fcs: processing gate TimeGate
101_DEN084Y5_15_E05_010_clean.fcs: processing gate Singlets
101_DEN084Y5_15_E05_010_clean.fcs: processing gate aAmine-
101_DEN084Y5_15_E05_010_clean.fcs: processing gate CD3-pos
101_DEN084Y5_15_E05_010_clean.fcs: processing gate CD4-pos
101_DEN084Y5_15_E05_010_clean.fcs: processing gate CD8-pos
The Session class can process multiple samples in parallel. The number of CPU cores used depends on the number of cores available, the number of samples to be run, and the size of the event arrays in the samples. After core-count, the next limiting factor in processing multiple samples simultaneously is available memory. The Session class attempts to estimate the required memory and limit the number of parallel analyses accordingly. If you encounter memory errors when analyzing multiple samples, try setting ``use_mp`` to False.
There are also scenarios where you may not want to analyze all samples at once. The analyze_samples
method takes an optional sample_id
keyword argument for analyzing only one sample at a time. Let’s read the documentation for more information:
[15]:
help(session.analyze_samples)
Help on method analyze_samples in module flowkit._models.session:
analyze_samples(sample_id=None, cache_events=False, use_mp=True, verbose=False) method of flowkit._models.session.Session instance
Process gating strategy for samples. After running, results can be retrieved
using the `get_gating_results`, `get_report`, and `get_gate_membership`,
methods.
:param sample_id: optional sample ID, if specified only this sample will be processed
:param cache_events: Whether to cache pre-processed events (compensated and transformed). This can
be useful to speed up processing of gates that share the same pre-processing instructions for
the same channel data, but can consume significantly more memory space. See the related
clear_cache method for additional information. Default is False.
:param use_mp: Controls whether multiprocessing is used to gate samples (default is True).
Multiprocessing can fail for large workloads (lots of samples & gates) due to running out of
memory. If encountering memory errors, set use_mp to False (processing will take longer,
but will use significantly less memory).
:param verbose: if True, print a line for every gate processed (default is False)
:return: None
The Session class retains data from the analysis, allowing convenient access to results. Next, we’ll get the report of results for all samples. The report is a Pandas DataFrame, making it easy to filter. Let’s look at the time gate for all samples.
[16]:
# and a look at the report of the results
results_report = session.get_analysis_report()
results_report[results_report['gate_name'] == 'TimeGate']
[16]:
sample | gate_path | gate_name | gate_type | quadrant_parent | parent | count | absolute_percent | relative_percent | level | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 101_DEN084Y5_15_E01_008_clean.fcs | (root,) | TimeGate | RectangleGate | None | root | 290166 | 99.997932 | 99.997932 | 1 |
6 | 101_DEN084Y5_15_E03_009_clean.fcs | (root,) | TimeGate | RectangleGate | None | root | 283968 | 99.999648 | 99.999648 | 1 |
12 | 101_DEN084Y5_15_E05_010_clean.fcs | (root,) | TimeGate | RectangleGate | None | root | 284846 | 99.844369 | 99.844369 | 1 |
We can see the less than optimal time gate we discovered in the last tutorial. We can fix that the same way using custom sample gates. But let’s continue looking at the various ways to get gating results. We can retrieve the GatingResults instance for a single sample and get its report.
[17]:
sample_id = '101_DEN084Y5_15_E01_008_clean.fcs'
sample_results = session.get_gating_results(sample_id)
sample_results.report
[17]:
sample | gate_path | gate_name | gate_type | quadrant_parent | parent | count | absolute_percent | relative_percent | level | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 101_DEN084Y5_15_E01_008_clean.fcs | (root,) | TimeGate | RectangleGate | None | root | 290166 | 99.997932 | 99.997932 | 1 |
1 | 101_DEN084Y5_15_E01_008_clean.fcs | (root, TimeGate) | Singlets | PolygonGate | None | TimeGate | 239001 | 82.365287 | 82.366990 | 2 |
2 | 101_DEN084Y5_15_E01_008_clean.fcs | (root, TimeGate, Singlets) | aAmine- | PolygonGate | None | Singlets | 164655 | 56.743931 | 68.893017 | 3 |
3 | 101_DEN084Y5_15_E01_008_clean.fcs | (root, TimeGate, Singlets, aAmine-) | CD3-pos | PolygonGate | None | aAmine- | 133670 | 46.065782 | 81.181865 | 4 |
4 | 101_DEN084Y5_15_E01_008_clean.fcs | (root, TimeGate, Singlets, aAmine-, CD3-pos) | CD4-pos | PolygonGate | None | CD3-pos | 82484 | 28.425899 | 61.707189 | 5 |
5 | 101_DEN084Y5_15_E01_008_clean.fcs | (root, TimeGate, Singlets, aAmine-, CD3-pos) | CD8-pos | PolygonGate | None | CD3-pos | 47165 | 16.254153 | 35.284656 | 5 |
Extract Gated Event Data
Gated event information is available in 2 forms: as gated event arrays (pandas DataFrame) or as Boolean arrays of gate membership. The method get_gate_events
retrieves only the sample events that are within a specified gate. The original indices of the sample events are preserved. The get_gate_membership
method returns 1-D Boolean array indicating which sample events are inside the gate. For either of these methods, if the gate name is ambigious, the gate path must also be given.
Note: The events returned by
get_gate_events
in the Session class are not pre-processed by default. This is because all the channels (dimensions) are not guaranteed to be covered by a gate. Further, even if all channels were used in the gate tree, it is possible (though not common) for a channel to use different transforms in different gates of the same tree.
[18]:
# get gate events as a pandas DataFrame
cd3_pos_events = session.get_gate_events(sample_id=sample_id, gate_name='CD3-pos')
cd3_pos_events.head(5)
[18]:
pnn | FSC-A | FSC-H | FSC-W | SSC-A | SSC-H | SSC-W | TNFa FITC FLR-A | CD8 PerCP-Cy55 FLR-A | IL2 BV421 FLR-A | Aqua Amine FLR-A | IFNg APC FLR-A | CD3 APC-H7 FLR-A | CD107a PE FLR-A | CD4 PE-Cy7 FLR-A | Time |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
pns | |||||||||||||||
6 | 165875.515625 | 136158.0 | 79839.726562 | 30412.320312 | 29198.0 | 68261.585938 | 146.880005 | 68.339996 | 145.080002 | 126.480003 | 61.380001 | 1475.099976 | 393.800018 | 3631.100098 | 1.280 |
9 | 108877.015625 | 86248.0 | 82730.781250 | 52511.640625 | 47880.0 | 71875.578125 | 150.959991 | 678.299988 | 512.119995 | 163.680008 | 275.220001 | 2758.140137 | 700.700012 | 5526.399902 | 1.287 |
10 | 111956.429688 | 86024.0 | 85292.203125 | 77629.140625 | 68860.0 | 73881.835938 | 181.559998 | 499.799988 | 489.800018 | 189.720001 | 236.610001 | 2061.179932 | 542.299988 | 6578.000000 | 1.288 |
11 | 183806.140625 | 150217.0 | 80190.125000 | 42267.777344 | 39405.0 | 70297.203125 | 180.539993 | 4299.299805 | 64.480003 | 137.639999 | 154.440002 | 1152.359985 | 358.600006 | 803.000000 | 1.289 |
14 | 184054.203125 | 154340.0 | 78153.273438 | 28808.878906 | 28292.0 | 66733.304688 | 83.639999 | 233.580002 | 97.959999 | 76.879997 | 120.779999 | 1089.000000 | 321.200012 | 2814.900146 | 1.293 |
[19]:
# retrieve the gate membership as a boolean array (True value means the event is in the gate)
session.get_gate_membership(sample_id=sample_id, gate_name='Singlets')
[19]:
array([False, False, False, ..., False, False, True])
[20]:
# Here we'll collect the gate membership arrays for all gates for a sample
results = {}
for gate_name, gate_path in session.get_gate_ids():
result = session.get_gate_membership(
sample_id=sample_id,
gate_name=gate_name,
gate_path=gate_path
)
results[(gate_name, gate_path)] = result
[21]:
list(results.keys())
[21]:
[('TimeGate', ('root',)),
('Singlets', ('root', 'TimeGate')),
('aAmine-', ('root', 'TimeGate', 'Singlets')),
('CD3-pos', ('root', 'TimeGate', 'Singlets', 'aAmine-')),
('CD4-pos', ('root', 'TimeGate', 'Singlets', 'aAmine-', 'CD3-pos')),
('CD8-pos', ('root', 'TimeGate', 'Singlets', 'aAmine-', 'CD3-pos'))]
[22]:
results[('aAmine-', ('root', 'TimeGate', 'Singlets'))]
[22]:
array([False, False, False, ..., False, False, True])
Plot Gated Events
The Session class takes advantage of the fact that it retains the samples and their gating results to provide better plotting tools than the GatingStrategy. Gates and their events can be plotted using the plot_gate
method. This method will display the events from the parent gate along with the gate boundaries.
[23]:
# plot the time gates for all samples
for sample_id in session.get_sample_ids():
p = session.plot_gate(
sample_id,
'TimeGate',
x_min=0,
x_max=2.8,
y_min=0,
y_max=1.2
)
p.aspect_ratio = 2.0
show(p)
There is also the plot_scatter
method that creates a scatter plot of a gated population in any specified dimensions. To use it, we need to create Dimension
instances, which provide the channel label for each axis as well as the compensation and transformation references to use for displaying the channel’s events.
To demonstrate, we’ll pick a sample and plot 2 cytokine channels from the CD8+ events. Let’s review the channel labels and then create the dimensions.
[24]:
sample_id = '101_DEN084Y5_15_E01_008_clean.fcs'
sample = session.get_sample(sample_id)
sample.channels
[24]:
channel_number | pnn | pns | png | pne | pnr | |
---|---|---|---|---|---|---|
0 | 1 | FSC-A | 1.0 | (0.0, 0.0) | 262144.0 | |
1 | 2 | FSC-H | 1.0 | (0.0, 0.0) | 262144.0 | |
2 | 3 | FSC-W | 1.0 | (0.0, 0.0) | 262144.0 | |
3 | 4 | SSC-A | 1.0 | (0.0, 0.0) | 262144.0 | |
4 | 5 | SSC-H | 1.0 | (0.0, 0.0) | 262144.0 | |
5 | 6 | SSC-W | 1.0 | (0.0, 0.0) | 262144.0 | |
6 | 7 | TNFa FITC FLR-A | 1.0 | (0.0, 0.0) | 262144.0 | |
7 | 8 | CD8 PerCP-Cy55 FLR-A | 1.0 | (0.0, 0.0) | 262144.0 | |
8 | 9 | IL2 BV421 FLR-A | 1.0 | (0.0, 0.0) | 262144.0 | |
9 | 10 | Aqua Amine FLR-A | 1.0 | (0.0, 0.0) | 262144.0 | |
10 | 11 | IFNg APC FLR-A | 1.0 | (0.0, 0.0) | 262144.0 | |
11 | 12 | CD3 APC-H7 FLR-A | 1.0 | (0.0, 0.0) | 262144.0 | |
12 | 13 | CD107a PE FLR-A | 1.0 | (0.0, 0.0) | 262144.0 | |
13 | 14 | CD4 PE-Cy7 FLR-A | 1.0 | (0.0, 0.0) | 262144.0 | |
14 | 15 | Time | 1.0 | (0.0, 0.0) | 262144.0 |
[25]:
x_label = 'IL2 BV421 FLR-A'
y_label = 'CD107a PE FLR-A'
# We know from a previous section there is only 1 comp matrix
# and I want both to use the Logicle transform
x_dim = fk.Dimension(
x_label, compensation_ref='Acquisition-defined', transformation_ref='logicle-default'
)
y_dim = fk.Dimension(
y_label, compensation_ref='Acquisition-defined', transformation_ref='logicle-default'
)
[26]:
p = session.plot_scatter(
sample_id,
x_dim,
y_dim,
gate_name='CD8-pos',
subsample_count=5e5,
x_min=0,
x_max=1,
y_min=0,
y_max=1
)
show(p)
Or, we could take the gate membership array from our session and use it to highlight events using the Sample plot_scatter
method. This will highlight the CD8+ events in the context of the non-gated events.
[27]:
cd8_pos_memb = session.get_gate_membership(sample_id, 'CD8-pos')
[28]:
sample.apply_compensation(session.get_comp_matrix('Acquisition-defined'))
sample.apply_transform(session.get_transform('logicle-default'))
sample.subsample_events(5e5)
p = sample.plot_scatter(x_label, y_label, highlight_mask=cd8_pos_memb, x_min=0, x_max=1, y_min=0, y_max=1)
show(p)