FlowKit Tutorial - Part 3 - The GatingStrategy & GatingResults Classes

https://flowkit.readthedocs.io/en/latest/?badge=latest

So far, we’ve seen how to load FCS files using the Sample class and perform basic pre-processing like compensation and transformation for better visualization of event data. In part 3, we will explore using FlowKit for gating Sample event data using the GatingStrategy and GatingResults classes.

If you have any questions about FlowKit, find any bugs, or feel something is missing from these tutorials, please submit an issue to the FlowKit GitHub repository.

[1]:
import bokeh
from bokeh.plotting import show
from IPython.display import Image

import flowkit as fk

bokeh.io.output_notebook()
Loading BokehJS ...
[2]:
# check version so users can verify they have the same version/API
fk.__version__
[2]:
'1.1.0'

GatingStrategy Class

A GatingStrategy object represents a collection of hierarchical gates along with the compensation and transformation information referenced by any gate Dimension objects (covered in Part 4 of the tutorial series). A GatingStrategy can be created from a valid GatingML document or built programmatically. Methods in the GatingStrategy class fall into three main categories: adding gate-related objects, retrieving those objects, and applying the gating strategy to a Sample.

The Gate ID Concept

Quite a lot of thought has been put into the design of the GatingStrategy class to support the various ways gates are used and processed in typical FCM workflows. The most important concept to understand when interacting with a GatingStrategy instance is how gate IDs are used to reference gates and their position within the gating hierarchy.

For example, gates are sometimes “re-used” in different branches of the hierarchy, like the same quadrant gate applied to each of the CD4+ and CD8+ populations. Because of this, the name of the gate is not sufficient to fully identify it. Further, simply coupling the gate name with its parent gate name can also become problematic if the nested gates are re-used.

The GatingStrategy class solves this ambiguity by defining a gate ID as a tuple combining the gate name and the full ancestor path of gate names, similar in concept to a computer file system. However, this approach can be cumbersome for the common case where gates are not re-used. Therefore, the GatingStrategy allows for referencing gates simply by their gate name string for cases where that name is not re-used within the gate hierarchy. For ambiguous cases, referencing a gate requires the full gate ID tuple of the gate name and gate path.
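The lookup rule described above can be sketched in plain Python. This is only an illustration of the concept, not FlowKit's implementation: the `gate_ids` list, the gate names, and the `resolve_gate_id` helper are all hypothetical.

```python
# Hypothetical hierarchy in which a gate named "Q1" is re-used under
# both the CD4+ and CD8+ branches. Each gate ID is (gate_name, gate_path).
gate_ids = [
    ("CD3+", ("root",)),
    ("CD4+", ("root", "CD3+")),
    ("CD8+", ("root", "CD3+")),
    ("Q1", ("root", "CD3+", "CD4+")),
    ("Q1", ("root", "CD3+", "CD8+")),
]


def resolve_gate_id(gate_name, gate_path=None):
    """Return the full gate ID tuple for a gate (hypothetical helper).

    A bare gate name is accepted only when it occurs once in the
    hierarchy; otherwise the caller must supply the gate path too.
    """
    if gate_path is not None:
        gate_id = (gate_name, tuple(gate_path))
        if gate_id not in gate_ids:
            raise KeyError(f"no gate with ID {gate_id}")
        return gate_id

    matches = [g for g in gate_ids if g[0] == gate_name]
    if not matches:
        raise KeyError(f"no gate named {gate_name!r}")
    if len(matches) > 1:
        raise ValueError(f"{gate_name!r} is ambiguous; a gate path is required")
    return matches[0]


# Unique name: the path can be omitted
print(resolve_gate_id("CD4+"))
# Re-used name: the full gate ID is needed
print(resolve_gate_id("Q1", ("root", "CD3+", "CD8+")))
```

Calling `resolve_gate_id("Q1")` without a path raises a `ValueError` in this sketch, which mirrors the ambiguous case described above.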

We will see how this works in practice later, but for now let’s create a GatingStrategy from an existing GatingML-2.0 document.

Create a GatingStrategy from GatingML Document

[3]:
gml_path = '../../data/8_color_data_set/8_color_ICS.xml'
g_strat = fk.parse_gating_xml(gml_path)
[4]:
g_strat
[4]:
GatingStrategy(6 gates, 3 transforms, 1 compensations)

The string representation reveals this GatingStrategy has 6 gates, 3 transforms, and 1 compensation (Matrix instance).

Retrieve the Gate Hierarchy

We can retrieve the gate hierarchy in a variety of formats using the get_gate_hierarchy method. Its output argument accepts the following options:

  • ascii: Generates a text-based representation of the gate tree, and is the most human-readable format for reviewing the hierarchy. This is the default option.

  • json: Generates a JSON representation of the gate tree, useful for programmatic parsing, especially outside of Python. When this option is used, all extra keywords are passed to json.dumps (e.g. indent=2 works to indent the output).

  • dict: Generates a Python dictionary representation of the gate tree, useful for programmatic parsing within Python.

[5]:
text = g_strat.get_gate_hierarchy(output='ascii')
[6]:
print(text)
root
╰── TimeGate
    ╰── Singlets
        ╰── aAmine-
            ╰── CD3-pos
                ├── CD4-pos
                ╰── CD8-pos
[7]:
gs_json = g_strat.get_gate_hierarchy(output='json', indent=2)
[8]:
print(gs_json)
{
  "name": "root",
  "children": [
    {
      "gate_type": "RectangleGate",
      "custom_gates": {},
      "name": "TimeGate",
      "children": [
        {
          "gate_type": "PolygonGate",
          "custom_gates": {},
          "name": "Singlets",
          "children": [
            {
              "gate_type": "PolygonGate",
              "custom_gates": {},
              "name": "aAmine-",
              "children": [
                {
                  "gate_type": "PolygonGate",
                  "custom_gates": {},
                  "name": "CD3-pos",
                  "children": [
                    {
                      "gate_type": "PolygonGate",
                      "custom_gates": {},
                      "name": "CD4-pos"
                    },
                    {
                      "gate_type": "PolygonGate",
                      "custom_gates": {},
                      "name": "CD8-pos"
                    }
                  ]
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}
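Because the JSON form is plain nested data, it can be parsed with nothing but the standard library. The sketch below assumes the same "name"/"children" layout shown above, uses a shortened hand-written copy of the tree rather than real FlowKit output, and introduces a hypothetical helper `iter_gate_ids` that walks the tree to rebuild (gate name, gate path) tuples.

```python
import json

# Abbreviated, hand-written stand-in for the JSON hierarchy shown above
hierarchy_json = """
{"name": "root", "children": [
  {"name": "TimeGate", "gate_type": "RectangleGate", "children": [
    {"name": "Singlets", "gate_type": "PolygonGate", "children": [
      {"name": "CD4-pos", "gate_type": "PolygonGate"},
      {"name": "CD8-pos", "gate_type": "PolygonGate"}
    ]}
  ]}
]}
"""


def iter_gate_ids(node, path=()):
    """Yield (gate_name, gate_path) for every gate below `node`, depth-first."""
    child_path = path + (node["name"],)
    # Leaf gates have no "children" key, so fall back to an empty list
    for child in node.get("children", []):
        yield child["name"], child_path
        yield from iter_gate_ids(child, child_path)


tree = json.loads(hierarchy_json)
gate_ids = list(iter_gate_ids(tree))
for gate_name, gate_path in gate_ids:
    print(gate_name, gate_path)
```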
[9]:
gs_dict = g_strat.get_gate_hierarchy(output='dict')
[10]:
gs_dict
[10]:
{'name': 'root',
 'children': [{'gate': RectangleGate(TimeGate, dims: 2),
   'gate_type': 'RectangleGate',
   'custom_gates': {},
   'name': 'TimeGate',
   'children': [{'gate': PolygonGate(Singlets, vertices: 8),
     'gate_type': 'PolygonGate',
     'custom_gates': {},
     'name': 'Singlets',
     'children': [{'gate': PolygonGate(aAmine-, vertices: 10),
       'gate_type': 'PolygonGate',
       'custom_gates': {},
       'name': 'aAmine-',
       'children': [{'gate': PolygonGate(CD3-pos, vertices: 8),
         'gate_type': 'PolygonGate',
         'custom_gates': {},
         'name': 'CD3-pos',
         'children': [{'gate': PolygonGate(CD4-pos, vertices: 12),
           'gate_type': 'PolygonGate',
           'custom_gates': {},
           'name': 'CD4-pos'},
          {'gate': PolygonGate(CD8-pos, vertices: 6),
           'gate_type': 'PolygonGate',
           'custom_gates': {},
           'name': 'CD8-pos'}]}]}]}]}]}
[11]:
# Note that graphviz must be installed to export images
g_strat.export_gate_hierarchy_image('gs.png')
[12]:
Image('gs.png')
[12]:
../_images/notebooks_flowkit-tutorial-part03-gating-strategy-and-gating-results-classes_17_0.png

Retrieve Gate IDs

Remember, a gate ID is a tuple of the gate name and the gate path. We can retrieve all the gate IDs using get_gate_ids. There are also convenience methods to get a parent gate ID or child gate IDs from a gate ID. If the gate name of a gate ID is unambiguous, the gate path can be omitted.

[13]:
g_strat.get_gate_ids()
[13]:
[('TimeGate', ('root',)),
 ('Singlets', ('root', 'TimeGate')),
 ('aAmine-', ('root', 'TimeGate', 'Singlets')),
 ('CD3-pos', ('root', 'TimeGate', 'Singlets', 'aAmine-')),
 ('CD4-pos', ('root', 'TimeGate', 'Singlets', 'aAmine-', 'CD3-pos')),
 ('CD8-pos', ('root', 'TimeGate', 'Singlets', 'aAmine-', 'CD3-pos'))]
[14]:
g_strat.get_parent_gate_id('CD3-pos')
[14]:
('aAmine-', ('root', 'TimeGate', 'Singlets'))
[15]:
g_strat.get_child_gate_ids('CD3-pos')
[15]:
[('CD4-pos', ('root', 'TimeGate', 'Singlets', 'aAmine-', 'CD3-pos')),
 ('CD8-pos', ('root', 'TimeGate', 'Singlets', 'aAmine-', 'CD3-pos'))]
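The relationship between a gate ID and its parent or children follows directly from the tuple structure. The sketch below is a plain-Python illustration using the gate IDs listed above; `parent_gate_id` and `child_gate_ids` are hypothetical helpers, not FlowKit methods, and returning None for a top-level gate is an assumption made only for this example.

```python
# Gate IDs copied from the get_gate_ids output above
gate_ids = [
    ("TimeGate", ("root",)),
    ("Singlets", ("root", "TimeGate")),
    ("aAmine-", ("root", "TimeGate", "Singlets")),
    ("CD3-pos", ("root", "TimeGate", "Singlets", "aAmine-")),
    ("CD4-pos", ("root", "TimeGate", "Singlets", "aAmine-", "CD3-pos")),
    ("CD8-pos", ("root", "TimeGate", "Singlets", "aAmine-", "CD3-pos")),
]


def parent_gate_id(gate_id):
    """Derive the parent's gate ID from a (gate_name, gate_path) tuple."""
    _, gate_path = gate_id
    if len(gate_path) < 2:
        # Assumption for this sketch: a gate directly under root has no parent gate
        return None
    # The last path element is the parent's name; the rest is the parent's path
    return gate_path[-1], gate_path[:-1]


def child_gate_ids(gate_id, all_ids):
    """Return the IDs whose path is this gate's path plus this gate's name."""
    gate_name, gate_path = gate_id
    child_path = gate_path + (gate_name,)
    return [g for g in all_ids if g[1] == child_path]


cd3_id = ("CD3-pos", ("root", "TimeGate", "Singlets", "aAmine-"))
print(parent_gate_id(cd3_id))
print(child_gate_ids(cd3_id, gate_ids))
```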

Retrieve Gate Instances

Below we demonstrate how to retrieve a Gate instance by its gate name, which works here because the name is unambiguous within this gate hierarchy.

[16]:
g_strat.get_gate('TimeGate')
[16]:
RectangleGate(TimeGate, dims: 2)

Retrieve Compensation Matrices

[17]:
g_strat.comp_matrices
[17]:
{'Acquisition-defined': Matrix(Acquisition-defined, dims: 8)}

Retrieve Transformations

[18]:
g_strat.transformations
[18]:
{'scatter-lin': LinearTransform(scatter-lin, t: 262144.0, a: 0.0),
 'logicle-default': LogicleTransform(logicle-default, t: 262144.0, w: 1.0, m: 4.418539922, a: 0.0),
 'Time': LinearTransform(Time, t: 72.0, a: 0.8511997311)}

GatingResults Class

A GatingResults instance is returned from calling the GatingStrategy gate_sample method on a Sample instance, and is never created by an end user directly. A GatingResults instance contains the results of applying the gating hierarchy on a single Sample. Let’s load a Sample and apply the previous GatingStrategy via the gate_sample method (setting verbose=True to print out each gate as it is processed).

[19]:
sample = fk.Sample("../../data/8_color_data_set/fcs_files/101_DEN084Y5_15_E01_008_clean.fcs")
[20]:
gs_results = g_strat.gate_sample(sample, verbose=True)
101_DEN084Y5_15_E01_008_clean.fcs: processing gate TimeGate
101_DEN084Y5_15_E01_008_clean.fcs: processing gate Singlets
101_DEN084Y5_15_E01_008_clean.fcs: processing gate aAmine-
101_DEN084Y5_15_E01_008_clean.fcs: processing gate CD3-pos
101_DEN084Y5_15_E01_008_clean.fcs: processing gate CD4-pos
101_DEN084Y5_15_E01_008_clean.fcs: processing gate CD8-pos
[21]:
# get the Sample ID for the GatingResults instance
gs_results.sample_id
[21]:
'101_DEN084Y5_15_E01_008_clean.fcs'

GatingResults Report

As we can see, the GatingResults class is relatively simple, and its main purpose is to provide a Pandas DataFrame of the results via the report attribute. The report contains a row for every gate and includes the following columns:

  • sample: the Sample ID of the processed Sample instance

  • gate_path: tuple of the gate path

  • gate_name: the name of the gate (or name of the Quadrant of a QuadrantGate)

  • gate_type: the class name of the gate (RectangleGate, PolygonGate, etc.)

  • quadrant_parent: Quadrant gates are a bit different because they are really a collection of gates. For a Quadrant, this field contains the QuadrantGate name, and the Quadrant name itself appears in the gate_name field.

  • parent: the gate name of the parent gate

  • count: the absolute event count for events inside the gate

  • absolute_percent: the percentage of events inside the gate relative to the total event count in the Sample

  • relative_percent: the percentage of events inside the gate relative to the number of events in the parent gate

  • level: the depth of the gate in the gate tree relative to the root of the tree

[22]:
gs_results.report
[22]:
   sample                             gate_path                                     gate_name  gate_type      quadrant_parent  parent    count   absolute_percent  relative_percent  level
0  101_DEN084Y5_15_E01_008_clean.fcs  (root,)                                       TimeGate   RectangleGate  None             root      290166  99.997932         99.997932         1
1  101_DEN084Y5_15_E01_008_clean.fcs  (root, TimeGate)                              Singlets   PolygonGate    None             TimeGate  239001  82.365287         82.366990         2
2  101_DEN084Y5_15_E01_008_clean.fcs  (root, TimeGate, Singlets)                    aAmine-    PolygonGate    None             Singlets  164655  56.743931         68.893017         3
3  101_DEN084Y5_15_E01_008_clean.fcs  (root, TimeGate, Singlets, aAmine-)           CD3-pos    PolygonGate    None             aAmine-   133670  46.065782         81.181865         4
4  101_DEN084Y5_15_E01_008_clean.fcs  (root, TimeGate, Singlets, aAmine-, CD3-pos)  CD4-pos    PolygonGate    None             CD3-pos   82484   28.425899         61.707189         5
5  101_DEN084Y5_15_E01_008_clean.fcs  (root, TimeGate, Singlets, aAmine-, CD3-pos)  CD8-pos    PolygonGate    None             CD3-pos   47165   16.254153         35.284656         5
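The two percentage columns follow from the count column as defined in the list above. The short sketch below reproduces them for the CD3-pos row. The helper `gate_percentages` is hypothetical, and the total event count of 290172 is not stated anywhere in the report; it is an assumption inferred from the TimeGate row (290166 events at 99.997932%).

```python
def gate_percentages(count, parent_count, total_count):
    """Return (absolute_percent, relative_percent) for one gate.

    absolute: events in the gate relative to all events in the Sample
    relative: events in the gate relative to events in the parent gate
    """
    absolute_percent = 100.0 * count / total_count
    relative_percent = 100.0 * count / parent_count
    return absolute_percent, relative_percent


total_events = 290172   # assumption: inferred from the TimeGate row, not reported directly
cd3_count = 133670      # count for CD3-pos in the report
amine_count = 164655    # count for its parent gate, aAmine-

abs_pct, rel_pct = gate_percentages(cd3_count, amine_count, total_events)
print(f"absolute: {abs_pct:.6f}  relative: {rel_pct:.6f}")
```

For a gate directly under root, such as TimeGate, the parent count equals the total event count, which is why its two percentages are identical in the report.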

Retrieve Gate Membership

The get_gate_membership method returns a Boolean array representing which of the Sample events are inside the specified gate.

[23]:
cd3_pos_gate_membership = gs_results.get_gate_membership('CD3-pos')
cd3_pos_gate_membership.sum()
[23]:
133670

We can then use the membership array to retrieve those events from the Sample.

Note: The events extracted here are not necessarily pre-processed the same way the gate itself would process them, because each gate references its own compensation and transformation. This holds even when using the ‘comp’ or ‘xform’ source option.

[24]:
gated_raw_events = sample.get_events(source='raw')
gated_raw_events = gated_raw_events[cd3_pos_gate_membership]
[25]:
gated_raw_events.shape
[25]:
(133670, 15)
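The indexing step above is ordinary NumPy Boolean masking. The sketch below shows the same pattern on a small synthetic array so it can be run without any FCS data; the array contents, the channel count, and the second mask are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
events = rng.random((10, 3))  # toy data: 10 events x 3 channels

# Stand-in for the array returned by get_gate_membership
membership = np.array(
    [True, False, True, True, False, False, True, False, False, True]
)

# Selecting rows with a Boolean mask keeps only events where the mask is True
gated = events[membership]
print(membership.sum(), gated.shape)

# Masks of the same length can be combined element-wise, e.g. to keep
# events that are inside the gate AND exceed a threshold in channel 0
high_ch0 = events[:, 0] > 0.5
both = membership & high_ch0
print(events[both].shape)
```

The number of True values in the mask always equals the number of rows selected, which is the same check made with cd3_pos_gate_membership.sum() and gated_raw_events.shape above.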

Manually Creating Gating Strategies

In the next tutorial, we will cover how to use the gates module to manually build a GatingStrategy.
