FlowKit Tutorial - Part 4 - Programmatically creating a GatingStrategy using the gates Module

https://flowkit.readthedocs.io/en/latest/?badge=latest

Part 4 of the tutorial series covers the programmatic construction of a GatingStrategy. This will require using the Gate sub-classes in the gates module as well as a few supporting classes used in FlowKit.

If you have any questions about FlowKit, find any bugs, or feel something is missing from these tutorials please submit an issue to the GitHub repository here.

Table of Contents

[1]:
import copy
import bokeh
from bokeh.plotting import show
import numpy as np

import flowkit as fk

bokeh.io.output_notebook()
Loading BokehJS ...
[2]:
# check version so users can verify they have the same version/API
fk.__version__
[2]:
'1.1.0'

Dimension Class

Before we cover how gates are created in FlowKit, we’ll first need to go over a few supporting classes necessary to create them. The first of which is the Dimension class, which is a reference to a specific channel in one or more Sample instances, and specifies the preprocessing intructions for that channel. The Dimension class is modelled after the “dimension” XML element used in GatingML-2.0 and in FlowJo 10 workspaces.

Let’s look at the constructor:

Dimension(
    dimension_id,
    compensation_ref='uncompensated',
    transformation_ref=None,
    range_min=None,
    range_max=None
)

The ``dimension_id`` corresponds to a Sample PnN label of a channel. The ``compensation_ref`` is a string referencing a Matrix id. The values ‘uncompensated’ or ‘fcs’ are reserved for events in a gate that should not be compensated (‘uncompensated’) or events that should be compensated using the spillover matrix found in the FCS sample (‘fcs’). The ``transformation_ref`` is a string referencing the id of a Transform sub-class. None specifies the channel data should not be transformed.

Finally, we have the range_min and range_max arguments. These are only used for Dimension instances intended for use with a RectangleGate. It is admittedly a bit odd, but this is how the GatingML-2.0 specification and FlowJo 10 workspaces implement channel dimension references and rectangle / range gates. We’ll go over these values in more detail in the RectangleGate section.

gates Module

The gates module contains classes related to the various gate types available within FlowKit. These include:

  • RectangleGate

  • PolygonGate

  • EllipsoidGate

  • QuadrantGate (and the related Quadrant class)

  • BooleanGate

We’ll cover each of these gate types below, but first let’s load our simple diamond FCS to use for demonstration, review the channels, and plot the events. We’ll also make an empty GatingStrategy instance to add and apply our gates.

[3]:
sample = fk.Sample("../../data/simple_diamond_example/test_data_diamond_01.fcs")
[4]:
sample.channels
[4]:
channel_number pnn pns png pne pnr
0 1 channel_A 1.0 (0.0, 0.0) 262144.0
1 2 channel_B 1.0 (0.0, 0.0) 262144.0
[5]:
f = sample.plot_scatter('channel_A', 'channel_B', source='raw')
show(f)
[6]:
chan_a_idx = sample.get_channel_index('channel_A')
events_a = sample.get_channel_events(chan_a_idx, 'raw')
[7]:
events_a.shape
[7]:
(200000,)
[8]:
events_a.min(), events_a.max()
[8]:
(0.0, 99999.0)
[9]:
g_strat = fk.GatingStrategy()

RectangleGate Class

The RectangleGate represents a GatingML-2.0 rectangle gate, which covers rectangles (2-D), hyper-rectangles (>2-D), and the “range” gate (1-D). A RectangleGate must have one or more dimensions, and each dimension must specify at least one of a minimum or maximum value (or both).

In the 1-D case, a single Dimension instance must be provided and include one or both of a range_min or range_max value. If only one is given the range is open-ended, with the minimum range being inclusive of the min value, and the maximum range being exclusive of the max value. For example, if only a range_min of 1000 is provided, the gate will include events >= range_min. If only a range_max value is provided, the gate will include events < range_max.

In the 2-D case, the same rules apply. Given 2 Dimension instances, the gate can still be open-ended in one of each of the dimensions. Only if both min_range and max_range are provided in both dimensions will a true rectangular gate be defined. The same logic applies for the hyper-rectangle in 3 or more dimensions.

Let’s create an open-ended RectangleGate to inlude the top left side of the diamond

[10]:
# neither dimension needs compensation or transformation, so we'll use the defaults for those arguments
dim_a = fk.Dimension('channel_A', range_max=50000)
dim_b = fk.Dimension('channel_B', range_min=50000)
[11]:
rect_top_left_gate = fk.gates.RectangleGate('top-left', dimensions=[dim_a, dim_b])
[12]:
g_strat.add_gate(rect_top_left_gate, gate_path=('root',))

The gate has been added to our gating strategy, and we expect ~25% of the events to be included within the gate. Let’s apply the gate to our sample and look at the results.

[13]:
res = g_strat.gate_sample(sample)
[14]:
res.report
[14]:
sample gate_path gate_name gate_type quadrant_parent parent count absolute_percent relative_percent level
0 test_data_diamond_01.fcs (root,) top-left RectangleGate None root 50001 25.0005 25.0005 1

And we have ~25%, the extra 1 event is due to the inclusive limit of the range_min value in dim_b.

PolygonGate Class

The PolygonGate represents a GatingML-2.0 polygon gate, which is a closed polygon defined in exactly 2 dimensions. The vertices of the PolygonGate are defined by a list of 2-D coordinates.

Let’s create a PolygonGate, then add it to the gating strategy under the previous RectangleGate. Then, we’ll re-process our Sample.

[15]:
vertices = [
    (25000, 75000),
    (75000, 75000),
    (75000, 100000),
    (25000, 100000)
]

poly_gate = fk.gates.PolygonGate(
    'poly1',
    dimensions=[dim_a, dim_b],
    vertices=vertices
)
[16]:
g_strat.add_gate(poly_gate, gate_path=('root', 'top-left'))
res = g_strat.gate_sample(sample)
[17]:
res.report
[17]:
sample gate_path gate_name gate_type quadrant_parent parent count absolute_percent relative_percent level
0 test_data_diamond_01.fcs (root,) top-left RectangleGate None root 50001 25.0005 25.0005 1
1 test_data_diamond_01.fcs (root, top-left) poly1 PolygonGate None top-left 25000 12.5000 49.9990 2

EllipsoidGate Class

The EllipsoidGate represents a GatingML-2.0 ellipsoid gate, which is an ellipsoid defined in 2 or more dimensions. To define an EllipsoidGate, we must specify the ellipsoid’s mean value (center of the ellipsoid), its covariance matrix (defining the shape and orientation), and a distance square (the square of the Mahalanobis distance, defining its size)

Let’s define a 2-D ellipse around the right corner of of the diamond. We’ll place it at the root level.

[18]:
center = (100000, 50000)
cov = [[5000**2, 0], [0, 5000**2]]
dist = 1

ellipse_gate = fk.gates.EllipsoidGate(
    'ellipse1',
    dimensions=[dim_a, dim_b],
    coordinates=center,
    covariance_matrix=cov,
    distance_square=dist
)
[19]:
g_strat.add_gate(ellipse_gate, gate_path=('root',))
res = g_strat.gate_sample(sample)
[20]:
res.report
[20]:
sample gate_path gate_name gate_type quadrant_parent parent count absolute_percent relative_percent level
1 test_data_diamond_01.fcs (root,) ellipse1 EllipsoidGate None root 7070 3.5350 3.5350 1
0 test_data_diamond_01.fcs (root,) top-left RectangleGate None root 50001 25.0005 25.0005 1
2 test_data_diamond_01.fcs (root, top-left) poly1 PolygonGate None top-left 25000 12.5000 49.9990 2

QuadrantGate Class

The QuadrantGate represents a GatingML-2.0 quadrant gate. Quadrant gates are different from other gate types in that they are actually a collection of gates (quadrants), though even the term quadrant is misleading as they can divide a plane into more than 4 sections. A QuadrantGate must have at least 1 divider, and must specify the label of the resulting quadrants the dividers produce.

To construct a QuadrantGate, we need to introduce two new helper classes: the QuadrantDivider and the Quadrant. We’ll see how to put it all together below to make a QuadrantGate that divides the diamond into its 4 sides.

Note: When retrieving a single Quadrant from the GatingStrategy method ``get_gate``, the owning QuadrantGate will be returned. A single quadrant isn’t technically a Gate sub-class and has no parent reference.

[21]:
# QuadrantDivider instances are similar to a Dimension, they take compensation_ref and tranformation_ref
# arguments. However, we'll use the defaults here of 'uncompensated' & None.
quad_div1 = fk.QuadrantDivider(
    'chan-a-div',
    'channel_A',
    compensation_ref='uncompensated',
    transformation_ref=None,
    values=[50000]
)
quad_div2 = fk.QuadrantDivider(
    'chan-b-div',
    'channel_B',
    compensation_ref='uncompensated',
    transformation_ref=None,
    values=[50000]
)
quad_divs = [quad_div1, quad_div2]

# the 2 dividers above will be used to divide the space into 4 quadrants
quad_1 = fk.gates.Quadrant(
    quadrant_id='chanApos-chanBpos',
    divider_refs=['chan-a-div', 'chan-b-div'],
    divider_ranges=[(50000, None), (50000, None)]
)
quad_2 = fk.gates.Quadrant(
    quadrant_id='chanApos-chanBneg',
    divider_refs=['chan-a-div', 'chan-b-div'],
    divider_ranges=[(50000, None), (None, 50000)]
)
quad_3 = fk.gates.Quadrant(
    quadrant_id='chanAneg-chanBpos',
    divider_refs=['chan-a-div', 'chan-b-div'],
    divider_ranges=[(None, 50000), (50000, None)]
)
quad_4 = fk.gates.Quadrant(
    quadrant_id='chanAneg-chanBneg',
    divider_refs=['chan-a-div', 'chan-b-div'],
    divider_ranges=[(None, 50000), (None, 50000)]
)
quadrants = [quad_1, quad_2, quad_3, quad_4]

# We can now construct our QuadrantGate
quad_gate1 = fk.gates.QuadrantGate(
    'quadgate1',
    dividers=quad_divs,
    quadrants=quadrants
)
[22]:
g_strat.add_gate(quad_gate1, gate_path=('root',))
res = g_strat.gate_sample(sample)
[23]:
res.report
[23]:
sample gate_path gate_name gate_type quadrant_parent parent count absolute_percent relative_percent level
5 test_data_diamond_01.fcs (root,) chanAneg-chanBneg QuadrantGate quadgate1 root 49999 24.9995 24.9995 1
4 test_data_diamond_01.fcs (root,) chanAneg-chanBpos QuadrantGate quadgate1 root 50001 25.0005 25.0005 1
3 test_data_diamond_01.fcs (root,) chanApos-chanBneg QuadrantGate quadgate1 root 50000 25.0000 25.0000 1
2 test_data_diamond_01.fcs (root,) chanApos-chanBpos QuadrantGate quadgate1 root 50000 25.0000 25.0000 1
1 test_data_diamond_01.fcs (root,) ellipse1 EllipsoidGate None root 7070 3.5350 3.5350 1
0 test_data_diamond_01.fcs (root,) top-left RectangleGate None root 50001 25.0005 25.0005 1
6 test_data_diamond_01.fcs (root, top-left) poly1 PolygonGate None top-left 25000 12.5000 49.9990 2

BooleanGate Class

The BooleanGate represents a GatingML-2.0 Boolean gate, and performs the Boolean operations AND, OR, or NOT on one or more other gates. Note, the Boolean operation XOR is not supported in the GatingML specification but can be implemented using a combination of the supported operations.

Let’s create a BooleanGate from our ellipse and top right quadrant of the quadrant gate. The BooleanGate class takes an ID, Boolean type string (‘and’, ‘or’, or ‘not’), and a dictionary of gate references.

[24]:
gate_refs = [
    {
        'ref': 'ellipse1',
        'path': ('root',),
        'complement': False
    },
    {
        'ref': 'chanApos-chanBpos',
        'path': ('root', 'quadgate1'),
        'complement': False
    }
]

bool_gate = fk.gates.BooleanGate('ellipse_and_AposBpos', 'and', gate_refs)
g_strat.add_gate(bool_gate, ('root',))
[25]:
res = g_strat.gate_sample(sample)
[26]:
res.report
[26]:
sample gate_path gate_name gate_type quadrant_parent parent count absolute_percent relative_percent level
5 test_data_diamond_01.fcs (root,) chanAneg-chanBneg QuadrantGate quadgate1 root 49999 24.9995 24.9995 1
4 test_data_diamond_01.fcs (root,) chanAneg-chanBpos QuadrantGate quadgate1 root 50001 25.0005 25.0005 1
3 test_data_diamond_01.fcs (root,) chanApos-chanBneg QuadrantGate quadgate1 root 50000 25.0000 25.0000 1
2 test_data_diamond_01.fcs (root,) chanApos-chanBpos QuadrantGate quadgate1 root 50000 25.0000 25.0000 1
1 test_data_diamond_01.fcs (root,) ellipse1 EllipsoidGate None root 7070 3.5350 3.5350 1
7 test_data_diamond_01.fcs (root,) ellipse_and_AposBpos BooleanGate None root 3535 1.7675 1.7675 1
0 test_data_diamond_01.fcs (root,) top-left RectangleGate None root 50001 25.0005 25.0005 1
6 test_data_diamond_01.fcs (root, top-left) poly1 PolygonGate None top-left 25000 12.5000 49.9990 2

Removing Gates

So far we’ve seen how to create various types of gates and add them to a GatingStrategy. Now we’ll show how to remove unwanted gates from the gate tree using the GatingStrategy.remove_gate method.

Because of the heirarchical nature of a gate tree, it is important to understand the behavior of the remove_gate method on other gates that depend on the gate to be removed. By default, descendant gates will also be removed. The keyword argument keep_children controls this behavior and the descendant gates can be kept and promoted by setting this option to True. Boolean gates are also dependent on the gates they reference. Attempting to remove a gate that is referenced in a Boolean gate will throw a GateTreeError. The Boolean gate must be removed prior to removing the referenced gate.

Let’s look at the documentation for the remove_gate method:

[27]:
help(fk.GatingStrategy.remove_gate)
Help on function remove_gate in module flowkit._models.gating_strategy:

remove_gate(self, gate_name, gate_path=None, keep_children=False)
    Remove a gate from the gating strategy. Any descendant gates will also be removed
    unless keep_children=True. In all cases, if a BooleanGate exists that references
    the gate to remove, a GateTreeError will be thrown indicating the BooleanGate
    must be removed prior to removing the gate.

    :param gate_name: text string of a gate name
    :param gate_path: complete ordered tuple of gate names for unique set of gate ancestors.
        Required if gate_name is ambiguous
    :param keep_children: Whether to keep child gates. If True, the child gates will be
        remapped to the removed gate's parent. Default is False, which will delete all
        descendant gates.
    :return: None

Using our previous example, let’s try to remove the ellipse gate. It is referenced by the Boolean gate, so we expect a GateTreeError will be thrown.

[28]:
g_strat.remove_gate('ellipse1')
---------------------------------------------------------------------------
GateTreeError                             Traceback (most recent call last)
Cell In[28], line 1
----> 1 g_strat.remove_gate('ellipse1')

File ~/git/flowkit/src/flowkit/_models/gating_strategy.py:234, in GatingStrategy.remove_gate(self, gate_name, gate_path, keep_children)
    231     s_gate = s_gate_node.gate
    233     if isinstance(s_gate, fk_gates.BooleanGate):
--> 234         raise GateTreeError("BooleanGate %s references gate %s" % (s_gate.gate_name, gate_name))
    236 # At this point we're about to modify the tree and
    237 # removing a gate nullifies any previous results,
    238 # so clear cached events
    239 self.clear_cache()

GateTreeError: BooleanGate ellipse_and_AposBpos references gate ellipse1

We must remove the Boolean gate before removing the ellipse gate:

[29]:
g_strat.remove_gate('ellipse_and_AposBpos')
g_strat.remove_gate('ellipse1')
[30]:
print(g_strat.get_gate_hierarchy(output='ascii'))
root
├── top-left
│   ╰── poly1
╰── quadgate1
    ├── chanApos-chanBpos
    ├── chanApos-chanBneg
    ├── chanAneg-chanBpos
    ╰── chanAneg-chanBneg

That worked. Now, say we want to remove the ‘top-left’ gate but keep the ‘poly1’ gate. To do this, we can set the keep_children option to True. The ‘poly1’ gate will be kept and promoted to have the parent of the gate we remove.

[31]:
g_strat.remove_gate('top-left', keep_children=True)
[32]:
print(g_strat.get_gate_hierarchy(output='ascii'))
root
├── quadgate1
│   ├── chanApos-chanBpos
│   ├── chanApos-chanBneg
│   ├── chanAneg-chanBpos
│   ╰── chanAneg-chanBneg
╰── poly1
[33]:
# and re-apply our strategy for the sample
res = g_strat.gate_sample(sample)
[34]:
res.report
[34]:
sample gate_path gate_name gate_type quadrant_parent parent count absolute_percent relative_percent level
3 test_data_diamond_01.fcs (root,) chanAneg-chanBneg QuadrantGate quadgate1 root 49999 24.9995 24.9995 1
2 test_data_diamond_01.fcs (root,) chanAneg-chanBpos QuadrantGate quadgate1 root 50001 25.0005 25.0005 1
1 test_data_diamond_01.fcs (root,) chanApos-chanBneg QuadrantGate quadgate1 root 50000 25.0000 25.0000 1
0 test_data_diamond_01.fcs (root,) chanApos-chanBpos QuadrantGate quadgate1 root 50000 25.0000 25.0000 1
4 test_data_diamond_01.fcs (root,) poly1 PolygonGate None root 49999 24.9995 24.9995 1

Custom Gate Variants

Often in FCM analysis, different FCS samples may require slightly different gate boundaries for the same gate in a gate tree. The GatingStrategy class can store custom gate variants via a sample_id tag provided to the add_gate method. Let’s look at the documentation for this method.

[35]:
help(fk.GatingStrategy.add_gate)
Help on function add_gate in module flowkit._models.gating_strategy:

add_gate(self, gate, gate_path, sample_id=None)
    Add a gate to the gating strategy, see `gates` module. The gate ID and gate path
    must be unique in the gating strategy. Custom sample gates may be added by specifying
    an optional sample ID. Note, the gate & gate path must already exist prior to adding
    custom sample gates.

    :param gate: instance from a subclass of the Gate class
    :param gate_path: complete ordered tuple of gate names for unique set of gate ancestors
    :param sample_id: text string for specifying given gate as a custom Sample gate

    :return: None

To demonstrate how custom gates work we will use a set of 8-color FCS files and an existing gate strategy.

[36]:
samples = fk.load_samples("../../data/8_color_data_set/fcs_files")
[37]:
samples
[37]:
[Sample(v3.1, 101_DEN084Y5_15_E01_008_clean.fcs, 15 channels, 290172 events),
 Sample(v3.1, 101_DEN084Y5_15_E03_009_clean.fcs, 15 channels, 283969 events),
 Sample(v3.1, 101_DEN084Y5_15_E05_010_clean.fcs, 15 channels, 285290 events)]

Load gating strategy & review

We’ll load the GatingML file and print the gate tree. Then, gate our 3 samples and review the time gate.

[38]:
g_strat_8c = fk.parse_gating_xml('../../data/8_color_data_set/8_color_ICS.xml')
[39]:
print(g_strat_8c.get_gate_hierarchy(output='ascii'))
root
╰── TimeGate
    ╰── Singlets
        ╰── aAmine-
            ╰── CD3-pos
                ├── CD4-pos
                ╰── CD8-pos
[40]:
sample_008 = samples[0]
sample_009 = samples[1]
sample_010 = samples[2]

sample_008_results = g_strat_8c.gate_sample(sample_008)
sample_009_results = g_strat_8c.gate_sample(sample_009)
sample_010_results = g_strat_8c.gate_sample(sample_010)
[41]:
sample_008_timegate_memb = sample_008_results.get_gate_membership('TimeGate')
sample_009_timegate_memb = sample_009_results.get_gate_membership('TimeGate')
sample_010_timegate_memb = sample_010_results.get_gate_membership('TimeGate')

The membership arrays contain Boolean values where a True value means the event is inside the gate. Let’s see how many events are NOT inside the time gate for the 008 sample. To do this, we’ll use the unary operator to flip the bits of the membership array, turning all True values to False, and False values to True. Then, we can take advantage of summing the True values which conveniently gives us a count of the True values.

[42]:
np.sum(~sample_008_timegate_memb)
[42]:
6
[43]:
# where are these 6 events, get their index
np.argwhere(~sample_008_timegate_memb)
[43]:
array([[0],
       [1],
       [2],
       [3],
       [4],
       [5]])

Only the first 6 events are outside the time gate, that doesn’t seem right. The truth is that the time gate was drawn on a different sample. Those familiar with flow cytometry gating will know that time gates almost always need to be customized for each FCS sample.

Next, we will plot the time channel on the x-axis vs a scatter channel, look at the density. Then, we’ll extract the ‘TimeGate’ and look at how it was defined.

[44]:
p = sample_008.plot_scatter(
    'Time', 'FSC-A', source='raw', subsample=False
)
show(p)
[45]:
time_gate = g_strat_8c.get_gate('TimeGate')
[46]:
type(time_gate)
[46]:
flowkit._models.gates._gates.RectangleGate

We see the time gate is a RectangleGate. Recall from earlier in the notebook that the boundaries of a RectangleGate are defined within its dimensions.

[47]:
time_gate.dimensions
[47]:
[Dimension(id: Time), Dimension(id: FSC-A)]
[48]:
time_dim = time_gate.get_dimension('Time')
[49]:
time_dim.min, time_dim.max
[49]:
(0.029254146951133823, 2.4890625841154153)

Okay, what is going on here? Nearly all sample 008’s events are in the time gate, the time values range from around 2 to just under 70, yet the time gate bounds are between 0.02 and 2.49.

The answer is that a gate’s boundaries are defined on the processed data according to the compensation and transformation specified by the dimension. Let’s examine those.

[50]:
time_dim.compensation_ref, time_dim.transformation_ref
[50]:
('uncompensated', 'Time')
[51]:
time_xform = g_strat_8c.get_transform('Time')
[52]:
time_xform
[52]:
LinearTransform(Time, t: 72.0, a: 0.8511997311)

The gate boundary values make more sense now. The time channel is uncompensated and uses a linear transform. That transform has a top value of 72.0. This means a time event value of 72.0 would be transformed to 1.0. However, our max boundary value for the gate was around 2.49, that is 2.49 * 72.0 in the unstransformed space…way past our max time value for sample 008.

[53]:
# demonstrate the transform's top value converts to 1.0
time_xform.apply(72.0)
[53]:
1.0
[54]:
# the time dimension's max value of around 2.49 was way out there for this sample!!!
time_xform.inverse(2.4890625841154153)
[54]:
180.4799957275

We could do one of two things here: make a new Transform and Dimension instance for this sample’s time channel, or we could just use this transform to determine new boundaries for this sample. We’ll do the latter since it is a bit less code.

We can use the transform instance to translate some reasonable time gate boundaries for the 008 sample . Let’s say a good minimum is around 7.0 and a reasonable max is 65.0 in the untransformed space.

[55]:
time_xform.apply(np.array([7.0, 65.0]))
[55]:
array([0.10777036, 0.90391373])

Now that we’ve translated our new boundaries to the transformed space, we’ll copy the original time gate, get the time Dimension, and change the min/max values to our new reasonable values. Finally, we’ll add the gate variant for this sample back to the gating strategy with the sample_id option.

[56]:
time_gate_008 = copy.deepcopy(time_gate)

time_dim_008 = time_gate_008.get_dimension('Time')
time_dim_008.min = 0.10777036
time_dim_008.max = 0.90391373
[57]:
g_strat_8c.add_gate(time_gate_008, ('root',), sample_id=sample_008.id)

Now that we’ve added our custom sample gate, re-gate the sample and check the results.

[58]:
sample_008_results = g_strat_8c.gate_sample(sample_008)
sample_008_timegate_memb = sample_008_results.get_gate_membership('TimeGate')
[59]:
np.sum(~sample_008_timegate_memb)
[59]:
44839

That’s better, >44k events outside the gate. Let’s make another scatter plot of time on the x-axis and highlight these gated events.

[60]:
p = sample_008.plot_scatter(
    'Time', 'FSC-A', source='raw', subsample=False, highlight_mask=sample_008_timegate_memb
)
show(p)

Conclusion & a few words on the Session Class

The last section demonstrated how to create custom sample gates and used an example using multiple FCS files. For quick and simple analysis, the GatingStrategy class is certainly useful. However, determining gate boundaries and plotting gated events for multiple samples can become a lot to manage. The Session class was built to help manage multiple files with a gating strategy, and will be the topic of the next tutorial.

[ ]: