FlowKit Tutorial - Part 3 - The GatingStrategy & GatingResults Classes
So far, we’ve seen how to load FCS files using the Sample class and perform basic pre-processing like compensation and transformation for better visualization of event data. In Part 3, we will explore using FlowKit for gating Sample event data using the GatingStrategy and GatingResults classes.
If you have any questions about FlowKit, find any bugs, or feel something is missing from these tutorials, please submit an issue to the FlowKit GitHub repository.
[1]:
import bokeh
from bokeh.plotting import show
from IPython.display import Image
import flowkit as fk
bokeh.io.output_notebook()
[2]:
# check version so users can verify they have the same version/API
fk.__version__
[2]:
'1.3.0'
GatingStrategy Class
A GatingStrategy object represents a collection of hierarchical gates along with the compensation and transformation information referenced by any gate Dimension objects (covered in Part 4 of the tutorial series). A GatingStrategy can be created from a valid GatingML document or built programmatically. Methods in the GatingStrategy class fall into three main categories: adding or removing gate-related objects, retrieving those objects, and applying the gating strategy to a Sample.
This notebook demonstrates importing a GatingML gating strategy, applying it to an FCS file, and working with the GatingResults instance returned from that analysis. Using the GatingStrategy class to programmatically create a gate hierarchy is covered in the next tutorial, where we will need to use more tools in the FlowKit package.
The Gate ID Concept
Quite a lot of thought has gone into the design of the GatingStrategy class in order to support the various ways gates are used and processed in typical FCM workflows. The most important concept to understand when interacting with a GatingStrategy instance is how gate IDs are used to reference gates and their position within the gating hierarchy.
Gates are sometimes “re-used” in different branches of the hierarchy. For example, the same quadrant gate may be applied to each of the CD4+ and CD8+ populations. Because of this, the name of the gate is not sufficient to fully identify it. Simply coupling the gate name with its parent gate name can also be ambiguous if the nested gates are re-used.
The GatingStrategy class solves this ambiguity by defining a gate ID as a tuple combining the gate name and the full path of ancestor gate names, similar in concept to a computer file system. However, this approach can be cumbersome for the common case where gates are not re-used. Therefore, referencing gates simply by their gate name string is allowed for cases where that name is not re-used within the gate hierarchy. For ambiguous cases, referencing a gate requires the full gate ID tuple of the gate name and gate path.
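The lookup rule described above can be sketched in plain Python. This is only an illustration of the concept, not FlowKit's implementation: the resolve_gate_id helper and the small hierarchy with a re-used "Quad" gate are hypothetical.

```python
# Hypothetical sketch of the gate ID concept; not FlowKit code.
# A gate ID is a tuple: (gate_name, gate_path), where gate_path is a tuple
# of ancestor gate names starting at 'root'.

def resolve_gate_id(ref, gate_ids):
    """Resolve a gate name string or a full gate ID tuple to a gate ID."""
    if isinstance(ref, tuple):
        # A full gate ID is never ambiguous, it only has to exist.
        if ref not in gate_ids:
            raise KeyError(f"unknown gate ID: {ref!r}")
        return ref

    # A bare name is accepted only if exactly one gate uses it.
    matches = [gid for gid in gate_ids if gid[0] == ref]
    if not matches:
        raise KeyError(f"no gate named {ref!r}")
    if len(matches) > 1:
        raise ValueError(f"gate name {ref!r} is ambiguous, use the full gate ID")
    return matches[0]


# Made-up hierarchy in which the gate 'Quad' is re-used in two branches.
gate_ids = [
    ("CD4-pos", ("root",)),
    ("CD8-pos", ("root",)),
    ("Quad", ("root", "CD4-pos")),
    ("Quad", ("root", "CD8-pos")),
]

# An unambiguous name resolves on its own.
cd4_id = resolve_gate_id("CD4-pos", gate_ids)

# A re-used name needs the full gate ID tuple.
quad_id = resolve_gate_id(("Quad", ("root", "CD8-pos")), gate_ids)

try:
    resolve_gate_id("Quad", gate_ids)
    ambiguous = False
except ValueError:
    ambiguous = True
```

Here "CD4-pos" resolves by name alone, while the bare name "Quad" is rejected as ambiguous and must be given with its gate path.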
We will see how this works in practice later, but for now let’s create a GatingStrategy from an existing GatingML-2.0 document.
Create a GatingStrategy from GatingML Document
[3]:
gml_path = '../../data/8_color_data_set/8_color_ICS.xml'
g_strat = fk.parse_gating_xml(gml_path)
[4]:
g_strat
[4]:
GatingStrategy(6 gates, 3 transforms, 1 compensations)
The string representation reveals this GatingStrategy has 6 gates, 3 transforms, and 1 compensation (Matrix instance).
Retrieve the Gate Hierarchy
We can retrieve the gate hierarchy in a variety of formats using the get_gate_hierarchy method. The method takes the following output options:
ascii: Generates a text-based representation of the gate tree, and is the most human-readable format for reviewing the hierarchy. This is the default option.
json: Generates a JSON representation of the gate tree, useful for programmatic parsing, especially outside of Python. When this option is used, all extra keywords are passed to json.dumps (e.g. indent=2 works to indent the output).
dict: Generates a Python dictionary representation of the gate tree, useful for programmatic parsing within Python.
[5]:
text = g_strat.get_gate_hierarchy(output='ascii')
[6]:
print(text)
root
╰── TimeGate
╰── Singlets
╰── aAmine-
╰── CD3-pos
├── CD4-pos
╰── CD8-pos
[7]:
gs_json = g_strat.get_gate_hierarchy(output='json', indent=2)
[8]:
print(gs_json)
{
"name": "root",
"children": [
{
"gate_type": "RectangleGate",
"custom_gates": {},
"name": "TimeGate",
"children": [
{
"gate_type": "PolygonGate",
"custom_gates": {},
"name": "Singlets",
"children": [
{
"gate_type": "PolygonGate",
"custom_gates": {},
"name": "aAmine-",
"children": [
{
"gate_type": "PolygonGate",
"custom_gates": {},
"name": "CD3-pos",
"children": [
{
"gate_type": "PolygonGate",
"custom_gates": {},
"name": "CD4-pos"
},
{
"gate_type": "PolygonGate",
"custom_gates": {},
"name": "CD8-pos"
}
]
}
]
}
]
}
]
}
]
}
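As a rough sketch of the programmatic parsing mentioned above, the JSON output can be walked recursively to rebuild each gate's name and ancestor path. The JSON string below is a trimmed copy of the printed output (the custom_gates keys are omitted), and iter_gate_ids is a hypothetical helper, not a FlowKit function.

```python
import json

# Trimmed copy of the JSON hierarchy printed above.
hierarchy_json = """
{"name": "root", "children": [
  {"gate_type": "RectangleGate", "name": "TimeGate", "children": [
    {"gate_type": "PolygonGate", "name": "Singlets", "children": [
      {"gate_type": "PolygonGate", "name": "aAmine-", "children": [
        {"gate_type": "PolygonGate", "name": "CD3-pos", "children": [
          {"gate_type": "PolygonGate", "name": "CD4-pos"},
          {"gate_type": "PolygonGate", "name": "CD8-pos"}
        ]}
      ]}
    ]}
  ]}
]}
"""


def iter_gate_ids(node, path=()):
    """Yield (gate_name, gate_path) for every gate below `node`, depth first."""
    child_path = path + (node["name"],)
    # Leaf gates have no 'children' key, so fall back to an empty list.
    for child in node.get("children", []):
        yield (child["name"], child_path)
        yield from iter_gate_ids(child, child_path)


tree = json.loads(hierarchy_json)
gate_ids = list(iter_gate_ids(tree))

for gate_id in gate_ids:
    print(gate_id)
```

Each yielded tuple pairs a gate name with the tuple of its ancestors starting at 'root', which is the gate ID form described earlier.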
[9]:
gs_dict = g_strat.get_gate_hierarchy(output='dict')
[10]:
gs_dict
[10]:
{'name': 'root',
'children': [{'gate': RectangleGate(TimeGate, dims: 2),
'gate_type': 'RectangleGate',
'custom_gates': {},
'name': 'TimeGate',
'children': [{'gate': PolygonGate(Singlets, vertices: 8),
'gate_type': 'PolygonGate',
'custom_gates': {},
'name': 'Singlets',
'children': [{'gate': PolygonGate(aAmine-, vertices: 10),
'gate_type': 'PolygonGate',
'custom_gates': {},
'name': 'aAmine-',
'children': [{'gate': PolygonGate(CD3-pos, vertices: 8),
'gate_type': 'PolygonGate',
'custom_gates': {},
'name': 'CD3-pos',
'children': [{'gate': PolygonGate(CD4-pos, vertices: 12),
'gate_type': 'PolygonGate',
'custom_gates': {},
'name': 'CD4-pos'},
{'gate': PolygonGate(CD8-pos, vertices: 6),
'gate_type': 'PolygonGate',
'custom_gates': {},
'name': 'CD8-pos'}]}]}]}]}]}
Export Gate Hierarchy as Image
The gate hierarchy can also be exported as an image.
Note: The graphviz package must be installed (see https://graphviz.org/download/)
[11]:
g_strat.export_gate_hierarchy_image('gs.png')
[12]:
Image('gs.png')
[12]:
Retrieve Gate IDs
Remember, a gate ID is a tuple of the gate name and the gate path. We can retrieve all the gate IDs using get_gate_ids. There are also convenience methods to get a parent gate ID or child gate IDs from a gate ID. If the gate name of a gate ID is unambiguous, the gate path can be omitted.
[13]:
g_strat.get_gate_ids()
[13]:
[('TimeGate', ('root',)),
('Singlets', ('root', 'TimeGate')),
('aAmine-', ('root', 'TimeGate', 'Singlets')),
('CD3-pos', ('root', 'TimeGate', 'Singlets', 'aAmine-')),
('CD4-pos', ('root', 'TimeGate', 'Singlets', 'aAmine-', 'CD3-pos')),
('CD8-pos', ('root', 'TimeGate', 'Singlets', 'aAmine-', 'CD3-pos'))]
[14]:
g_strat.get_parent_gate_id('CD3-pos')
[14]:
('aAmine-', ('root', 'TimeGate', 'Singlets'))
[15]:
g_strat.get_child_gate_ids('CD3-pos')
[15]:
[('CD4-pos', ('root', 'TimeGate', 'Singlets', 'aAmine-', 'CD3-pos')),
('CD8-pos', ('root', 'TimeGate', 'Singlets', 'aAmine-', 'CD3-pos'))]
There are a few other convenience methods for programmatically traversing the gate tree to retrieve gate IDs.
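Because a gate ID carries the full ancestor path, the parent's gate ID can be derived from the tuple alone. The sketch below shows that relationship. The parent_gate_id helper is hypothetical, and returning None for a gate directly under 'root' is a choice made for this sketch rather than documented FlowKit behavior.

```python
# Hypothetical helper illustrating the structure of gate IDs; not FlowKit code.

def parent_gate_id(gate_id):
    """Derive the parent's gate ID from a (gate_name, gate_path) tuple."""
    _, gate_path = gate_id
    if len(gate_path) < 2:
        # The gate sits directly under 'root', which is not itself a gate.
        # Returning None here is an assumption made for this sketch.
        return None
    # The last path element is the parent's name and the remaining
    # elements are the parent's own path.
    return (gate_path[-1], gate_path[:-1])


cd3_id = ('CD3-pos', ('root', 'TimeGate', 'Singlets', 'aAmine-'))
parent = parent_gate_id(cd3_id)
print(parent)
```

For the CD3-pos gate ID this gives the same tuple that get_parent_gate_id returned above.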
Retrieve Gate Instances
Below we demonstrate how to retrieve a Gate instance by its gate name, which works here because the name is unambiguous within this gate hierarchy.
[16]:
g_strat.get_gate('Singlets')
[16]:
PolygonGate(Singlets, vertices: 8)
[17]:
# Get the list of gates at the root level
g_strat.get_root_gates()
[17]:
[RectangleGate(TimeGate, dims: 2)]
Before we move on, there is another feature of the GatingStrategy that is important to cover: custom sample gates. Users of cytometry software with a graphical user interface (e.g. FlowJo) know that sometimes a gate’s boundaries need tweaking for a particular sample. The GatingStrategy supports adding such custom gates by adding a gate with a Sample ID.
The next tutorial will demonstrate its usage, but let’s look at the docstring for the add_gate method to get an idea of how it works.
[18]:
help(fk.GatingStrategy.add_gate)
Help on function add_gate in module flowkit._models.gating_strategy:
add_gate(self, gate, gate_path, sample_id=None)
Add a gate to the gating strategy, see `gates` module. The gate ID and gate path
must be unique in the gating strategy. Custom sample gates may be added by specifying
an optional sample ID. Note, the gate & gate path must already exist prior to adding
custom sample gates.
:param gate: instance from a subclass of the Gate class
:param gate_path: complete ordered tuple of gate names for unique set of gate ancestors
:param sample_id: text string for specifying given gate as a custom Sample gate
:return: None
Retrieve Compensation Matrices
Compensation matrices are identified within the GatingStrategy by an ID. The comp_matrices attribute returns a dictionary where the IDs are the keys and the values are the Matrix instances.
[19]:
g_strat.comp_matrices
[19]:
{'Acquisition-defined': Matrix(dims: 8)}
Retrieve Transformations
Retrieving transformations works in a similar way to retrieving compensation matrices. The transformations attribute returns a dictionary of ID keys and transformation instances.
[20]:
g_strat.transformations
[20]:
{'scatter-lin': LinearTransform(t: 262144.0, a: 0.0),
'logicle-default': LogicleTransform(t: 262144.0, w: 1.0, m: 4.418539922, a: 0.0),
'Time': LinearTransform(t: 72.0, a: 0.8511997311)}
GatingResults Class
The gate_sample method is used to apply a GatingStrategy to an FCS file. This method returns a GatingResults instance, which contains the results of applying the gating hierarchy to a single Sample. Note that a GatingResults instance is never created by the end user directly; it is only returned from applying a GatingStrategy, Session, or Workspace to a Sample (more on the Session and Workspace classes in the upcoming tutorials).
Let’s load a Sample and apply the previous GatingStrategy via the gate_sample method (setting verbose=True to print out each gate as it is processed).
[21]:
sample = fk.Sample("../../data/8_color_data_set/fcs_files/101_DEN084Y5_15_E01_008_clean.fcs")
[22]:
gs_results = g_strat.gate_sample(sample, verbose=True)
101_DEN084Y5_15_E01_008_clean.fcs: processing gate TimeGate
101_DEN084Y5_15_E01_008_clean.fcs: processing gate Singlets
101_DEN084Y5_15_E01_008_clean.fcs: processing gate aAmine-
101_DEN084Y5_15_E01_008_clean.fcs: processing gate CD3-pos
101_DEN084Y5_15_E01_008_clean.fcs: processing gate CD4-pos
101_DEN084Y5_15_E01_008_clean.fcs: processing gate CD8-pos
[23]:
# get the Sample ID for the GatingResults instance
gs_results.sample_id
[23]:
'101_DEN084Y5_15_E01_008_clean.fcs'
GatingResults Report
As we can see, the GatingResults class is relatively simple, and its main purpose is to provide a Pandas DataFrame of the results via the report attribute. The report contains a row for every gate and includes the following columns:
sample_id: the Sample ID of the processed Sample instance
gate_path: tuple of the gate path
gate_name: the name of the gate (or name of the Quadrant of a QuadrantGate)
gate_type: The class name of the gate (RectangleGate, PolygonGate, etc.)
quadrant_parent: Quadrant gates are a bit different, they are really a collection of gates. This field would contain the QuadrantGate name, and each Quadrant name would be in the gate_name field.
parent: the gate name of the parent gate
count: the absolute event count for events inside the gate
absolute_percent: the percentage of events inside the gate relative to the total event count in the Sample
relative_percent: the percentage of events inside the gate relative to the number of events in the parent gate
level: the depth of the gate in the gate tree relative to the root of the tree (Note: root level is 0)
[24]:
gs_results.report
[24]:
| | sample_id | gate_path | gate_name | gate_type | quadrant_parent | parent | count | absolute_percent | relative_percent | level |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 101_DEN084Y5_15_E01_008_clean.fcs | (root,) | TimeGate | RectangleGate | None | root | 290166 | 99.997932 | 99.997932 | 1 |
| 1 | 101_DEN084Y5_15_E01_008_clean.fcs | (root, TimeGate) | Singlets | PolygonGate | None | TimeGate | 239001 | 82.365287 | 82.366990 | 2 |
| 2 | 101_DEN084Y5_15_E01_008_clean.fcs | (root, TimeGate, Singlets) | aAmine- | PolygonGate | None | Singlets | 164655 | 56.743931 | 68.893017 | 3 |
| 3 | 101_DEN084Y5_15_E01_008_clean.fcs | (root, TimeGate, Singlets, aAmine-) | CD3-pos | PolygonGate | None | aAmine- | 133670 | 46.065782 | 81.181865 | 4 |
| 4 | 101_DEN084Y5_15_E01_008_clean.fcs | (root, TimeGate, Singlets, aAmine-, CD3-pos) | CD4-pos | PolygonGate | None | CD3-pos | 82484 | 28.425899 | 61.707189 | 5 |
| 5 | 101_DEN084Y5_15_E01_008_clean.fcs | (root, TimeGate, Singlets, aAmine-, CD3-pos) | CD8-pos | PolygonGate | None | CD3-pos | 47165 | 16.254153 | 35.284656 | 5 |
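As a quick check on how the two percentage columns relate, the CD3-pos row can be recomputed by hand. The counts are copied from the report above. The total event count of 290172 is not a report column: it is inferred here from the TimeGate row (290166 events at 99.997932 percent).

```python
# Counts copied from the report above.
total_events = 290172    # inferred total event count for the Sample
parent_count = 164655    # 'aAmine-', the parent gate of 'CD3-pos'
cd3_count = 133670       # 'CD3-pos'

# absolute_percent: events in the gate relative to all events in the Sample
absolute_percent = cd3_count / total_events * 100

# relative_percent: events in the gate relative to events in the parent gate
relative_percent = cd3_count / parent_count * 100

print(round(absolute_percent, 6), round(relative_percent, 6))
```

Both values agree with the absolute_percent and relative_percent entries in the CD3-pos row.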
For convenience, you can get individual Sample statistics for a gate without having to filter the Pandas DataFrame.
[25]:
# Retrieve the absolute gate count.
# Remember to use the gate path if the gate name is ambiguous.
gs_results.get_gate_count('CD3-pos')
[25]:
np.int64(133670)
[26]:
# Or, get the absolute percent
gs_results.get_gate_absolute_percent('CD3-pos')
[26]:
np.float64(46.0657816743173)
[27]:
# And the same for the relative percent
gs_results.get_gate_relative_percent('CD3-pos')
[27]:
np.float64(81.18186511190063)
Retrieve Gate Membership
The get_gate_membership method returns a Boolean array representing which of the Sample events are inside the specified gate. The arguments are the same gate name and gate path to identify the gate (gate path is optional if gate name is unambiguous).
[28]:
cd3_pos_gate_membership = gs_results.get_gate_membership('CD3-pos')
[29]:
cd3_pos_gate_membership
[29]:
array([False, False, False, ..., False, False, True], shape=(290172,))
[30]:
# Summing the Boolean values in the membership NumPy array gives the absolute event count of the gate
cd3_pos_gate_membership.sum()
[30]:
np.int64(133670)
We can then use the membership array to retrieve those events from the Sample.
Note: The events extracted here are not necessarily pre-processed in the same way the gate's own instructions would pre-process them, even when using the ‘comp’ or ‘xform’ source option.
[31]:
gated_raw_events = sample.get_events(source='raw')
gated_raw_events = gated_raw_events[cd3_pos_gate_membership]
[32]:
gated_raw_events.shape
[32]:
(133670, 15)
[33]:
# Even simpler, we can use the event_mask option of the Sample.get_events method to get the same gated events
gated_raw_events = sample.get_events(source='raw', event_mask=cd3_pos_gate_membership)
[34]:
gated_raw_events.shape
[34]:
(133670, 15)
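The masking idea itself does not depend on FlowKit or NumPy. The sketch below is a plain-Python analogue using itertools.compress. The four events and their membership flags are made-up toy values, not data from the Sample above.

```python
from itertools import compress

# Made-up toy events (two channels each) and a made-up membership mask.
events = [
    (512.0, 10.5),
    (498.0, 220.1),
    (1020.0, 305.7),
    (15.0, 2.2),
]
membership = [False, True, True, False]

# True counts as 1, so summing the mask gives the number of gated events.
gate_count = sum(membership)

# Keep only the events whose mask entry is True.
gated_events = list(compress(events, membership))

print(gate_count, len(gated_events))
```

As with the NumPy version, the number of rows kept by the mask equals the sum of the mask.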
Manually Creating Gating Strategies
In the next tutorial, we will cover how to use the gates module to manually build a GatingStrategy.