FlowKit Tutorial - Part 6 - The Workspace Class

https://flowkit.readthedocs.io/en/latest/?badge=latest

In this last part of the tutorial series we will cover the Workspace class. The Workspace class imports a FlowJo 10 workspace file (with the “.wsp” file extension) and its associated FCS files.

If you have any questions about FlowKit, find any bugs, or feel something is missing from these tutorials, please submit an issue to the FlowKit GitHub repository.

[1]:
import os
import bokeh
from bokeh.plotting import show
import pandas as pd

import flowkit as fk

bokeh.io.output_notebook()
[2]:
# check version so users can verify they have the same version/API
fk.__version__
[2]:
'1.3.0'

Workspace Class

The Workspace class imports a FlowJo 10 workspace file (with the “.wsp” file extension) and its associated FCS files. Most FlowJo 10 features related to gating are supported, including retaining sample group assignment and custom sample gates.

Like the Session class, a Workspace will store Sample instances as well as the analysis results. Unlike the Session class, programmatically creating gates is not allowed and the Workspace class is essentially a “read-only” import of FlowJo 10 workspace files. However, GatingStrategy instances for individual samples can be extracted and used as the basis for a new Session instance.

Let’s have a look at the constructor:

Workspace(
    wsp_file_path,
    fcs_samples=None,
    ignore_missing_files=False,
    find_fcs_files_from_wsp=False
)

The required wsp_file_path argument is a file path to a FlowJo 10 workspace file (.wsp).

The argument fcs_samples may be a Sample instance, string or a list. If given a string, it can be a directory path or a file path. If a directory, any .fcs files in the directory will be found. If a list, then it must be a list of file paths or a list of Sample instances. Lists of mixed types are not supported. Note: Only FCS files matching the ones referenced in the .wsp file will be retained in the Workspace.
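The rules for fcs_samples can be summarized in a short sketch. The helper below, resolve_fcs_paths, is hypothetical and is not FlowKit's implementation. It handles only path inputs (no Sample instances) and filters the result down to the file names that the .wsp file references, mirroring the note above.

```python
from pathlib import Path
from typing import Iterable, List, Union

PathLike = Union[str, Path]


def resolve_fcs_paths(
    fcs_samples: Union[PathLike, List[PathLike]],
    wsp_sample_names: Iterable[str],
) -> List[Path]:
    """Hypothetical sketch of the documented fcs_samples rules (file paths only)."""
    if isinstance(fcs_samples, (str, Path)):
        path = Path(fcs_samples)
        if path.is_dir():
            # a directory: collect every .fcs file in it (case-insensitive suffix)
            candidates = sorted(p for p in path.iterdir() if p.suffix.lower() == ".fcs")
        else:
            # a single file path
            candidates = [path]
    elif isinstance(fcs_samples, list):
        # mixed lists are not supported, so this sketch accepts only paths
        if not all(isinstance(p, (str, Path)) for p in fcs_samples):
            raise TypeError("this sketch only supports lists of file paths")
        candidates = [Path(p) for p in fcs_samples]
    else:
        raise TypeError("expected a path or a list of paths")

    # keep only the files that the workspace actually references
    wanted = set(wsp_sample_names)
    return [p for p in candidates if p.name in wanted]


print(resolve_fcs_paths(["x/a.fcs", "x/c.fcs"], ["c.fcs"]))
```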

The ignore_missing_files argument controls the behavior when FCS files referenced in the WSP file were not loaded in the Workspace. When set to True, data for the missing files (i.e., gate information) is still loaded into the Workspace, and warning messages for missing files are suppressed. The default is False, which displays warnings and does not retain data for missing files.

The find_fcs_files_from_wsp option controls whether to search for FCS files based on URI parameters within the FlowJo workspace file. When True, local FCS files are loaded as Sample instances within the Workspace.
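The sketch below illustrates what searching by URI involves. It pulls `file:` URIs out of a workspace XML fragment and converts them to local paths. It assumes, for illustration only, that each sample's location is stored in a `uri` attribute on a `DataSet` element. The snippet and the fcs_paths_from_wsp helper are hypothetical, and this is not FlowKit's implementation.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import unquote, urlparse

# Hypothetical, heavily trimmed workspace fragment; a real .wsp file holds far more
WSP_SNIPPET = """
<Workspace>
  <SampleList>
    <Sample>
      <DataSet uri="file:/data/fcs%20files/sample_01.fcs" sampleID="1"/>
    </Sample>
  </SampleList>
</Workspace>
"""


def fcs_paths_from_wsp(xml_text: str) -> List[str]:
    """Return a local path for every DataSet element that carries a file: URI."""
    root = ET.fromstring(xml_text.strip())
    paths = []
    for data_set in root.iter("DataSet"):
        uri = data_set.get("uri")
        if uri and urlparse(uri).scheme == "file":
            # drop the "file:" scheme and decode escapes such as %20
            paths.append(unquote(urlparse(uri).path))
    return paths


print(fcs_paths_from_wsp(WSP_SNIPPET))
```

On Windows, a URI such as `file:/C:/data/x.fcs` yields `/C:/data/x.fcs` from urlparse, so a real implementation would need extra handling there.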

Creating a Workspace

Let’s create a Workspace starting with a FlowJo 10 workspace file and some FCS files.

[3]:
# setup some file paths for our data
base_dir = "../../data/8_color_data_set"

sample_path = os.path.join(base_dir, "fcs_files")
wsp_path = os.path.join(base_dir, "8_color_ICS.wsp")
[4]:
# Create a Workspace with the path to our WSP file and FCS files.
wsp = fk.Workspace(wsp_path, fcs_samples=sample_path)
[5]:
# look at a summary of the Workspace
wsp.summary()
[5]:
             samples  loaded_samples  gates  max_gate_depth
group_name
All Samples        3               3     14               6
DEN                3               3     14               6
GEN                0               0      0               0
G69                0               0      0               0
Lyo Cells          0               0      0               0
[6]:
# get a list of sample groups
wsp.get_sample_groups()
[6]:
['All Samples', 'DEN', 'GEN', 'G69', 'Lyo Cells']
[7]:
# From the summary, we can see all the "real" analysis is within the "DEN" group
sample_group = 'DEN'
[8]:
# get the sample IDs that are included in the group
sample_list = wsp.get_sample_ids(group_name=sample_group)
[9]:
sample_list
[9]:
['101_DEN084Y5_15_E01_008_clean.fcs',
 '101_DEN084Y5_15_E03_009_clean.fcs',
 '101_DEN084Y5_15_E05_010_clean.fcs']
[10]:
# pick a sample ID
sample_id = '101_DEN084Y5_15_E01_008_clean.fcs'
[11]:
# get a single Sample instance by its ID
sample = wsp.get_sample(sample_id)
[12]:
sample.channels
[12]:
    channel_number  pnn                   pns  pne         png  pnr
0   1               FSC-A                      (0.0, 0.0)  1.0  262144.0
1   2               FSC-H                      (0.0, 0.0)  1.0  262144.0
2   3               FSC-W                      (0.0, 0.0)  1.0  262144.0
3   4               SSC-A                      (0.0, 0.0)  1.0  262144.0
4   5               SSC-H                      (0.0, 0.0)  1.0  262144.0
5   6               SSC-W                      (0.0, 0.0)  1.0  262144.0
6   7               TNFa FITC FLR-A            (0.0, 0.0)  1.0  262144.0
7   8               CD8 PerCP-Cy55 FLR-A       (0.0, 0.0)  1.0  262144.0
8   9               IL2 BV421 FLR-A            (0.0, 0.0)  1.0  262144.0
9   10              Aqua Amine FLR-A           (0.0, 0.0)  1.0  262144.0
10  11              IFNg APC FLR-A             (0.0, 0.0)  1.0  262144.0
11  12              CD3 APC-H7 FLR-A           (0.0, 0.0)  1.0  262144.0
12  13              CD107a PE FLR-A            (0.0, 0.0)  1.0  262144.0
13  14              CD4 PE-Cy7 FLR-A           (0.0, 0.0)  1.0  262144.0
14  15              Time                       (0.0, 0.0)  1.0  262144.0

Retrieving Gate Components

Retrieving gate information from a Workspace works slightly differently than in the Session class because of the presence of sample groups and the way in which FlowJo stores information. Within a FlowJo workspace, samples in the same group are not guaranteed to have the same gate tree. Therefore, we will need to specify the sample ID when accessing gate information from the Workspace class.
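Here is a small illustration of why the sample ID matters. If gate IDs are modeled as (gate_name, gate_path) tuples, as FlowKit reports them in this tutorial, a set intersection shows which gates belong only to particular samples within a group. The sample names and the "Custom" gate below are made up.

```python
from typing import Dict, Set, Tuple

GateId = Tuple[str, Tuple[str, ...]]  # (gate_name, gate_path)

# Hypothetical gate IDs for two samples in the same FlowJo group;
# sample_B carries an extra custom gate that sample_A lacks.
gate_ids_by_sample: Dict[str, Set[GateId]] = {
    "sample_A.fcs": {("Time", ("root",)), ("Singlets", ("root", "Time"))},
    "sample_B.fcs": {
        ("Time", ("root",)),
        ("Singlets", ("root", "Time")),
        ("Custom", ("root", "Time", "Singlets")),
    },
}


def sample_specific_gates(trees: Dict[str, Set[GateId]]) -> Dict[str, Set[GateId]]:
    """For each sample, return the gate IDs not shared by every sample in the group."""
    shared = set.intersection(*trees.values())
    return {sample_id: gate_ids - shared for sample_id, gate_ids in trees.items()}


print(sample_specific_gates(gate_ids_by_sample))
```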

Let’s use a sample ID to get a gate tree and some other gate components.

[13]:
# The gating hierarchy is retrieved per sample.
# This is due to FlowJo allowing variation in the gate tree among samples.
print(wsp.get_gate_hierarchy(sample_id))
root
╰── Time
    ╰── Singlets
        ╰── aAmine-
            ╰── CD3+
                ├── CD4+
                │   ├── CD107a+
                │   ├── IFNg+
                │   ├── IL2+
                │   ╰── TNFa+
                ╰── CD8+
                    ├── CD107a+
                    ├── IFNg+
                    ├── IL2+
                    ╰── TNFa+
[14]:
# get the gate instance by gate name
wsp.get_gate(sample_id, gate_name='Time')
[14]:
RectangleGate(Time, dims: 2)
[15]:
# Retrieve the child gate IDs for a gate ID
wsp.get_child_gate_ids(sample_id, gate_name='Time')
[15]:
[('Singlets', ('root', 'Time'))]

Retrieving a compensation matrix for a sample is a bit easier in the Workspace since FlowJo restricts each sample to a single comp matrix.

[16]:
wsp.get_comp_matrix(sample_id)
[16]:
Matrix(dims: 8)
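To make the idea of a per-sample compensation matrix concrete, here is the standard linear compensation model in NumPy with made-up numbers. It illustrates the math only and is not FlowKit's Matrix class.

```python
import numpy as np

# Hypothetical 2-channel spillover matrix: row i is fluorochrome i, column j is detector j.
# 10% of fluorochrome 0 spills into detector 1, and 5% of fluorochrome 1 into detector 0.
spill = np.array([[1.00, 0.10],
                  [0.05, 1.00]])

true_signal = np.array([[1000.0, 0.0],
                        [0.0, 2000.0],
                        [500.0, 500.0]])

# Detectors record a linear mix of the true signals...
observed = true_signal @ spill
# ...so compensation multiplies by the inverse of the spillover matrix.
compensated = observed @ np.linalg.inv(spill)

print(observed)
print(np.round(compensated, 6))
```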

FlowJo workspaces also use a dedicated set of transforms for each sample parameter, and they are identified by the sample’s PnN label. We will see later how this enables easier extraction of processed gated event data.

[17]:
# a dict of transforms, keyed by channel label
wsp.get_transforms(sample_id)
[17]:
{'FSC-A': LinearTransform(t: 262144.0, a: 0.0),
 'FSC-H': LinearTransform(t: 262144.0, a: 0.0),
 'FSC-W': LinearTransform(t: 262144.0, a: 0.0),
 'SSC-A': LinearTransform(t: 262144.0, a: 0.0),
 'SSC-H': LinearTransform(t: 262144.0, a: 0.0),
 'SSC-W': LinearTransform(t: 262144.0, a: 0.0),
 'TNFa FITC FLR-A': LogicleTransform(t: 262144.0, w: 1.0, m: 4.418539922, a: 0.0),
 'CD8 PerCP-Cy55 FLR-A': LogicleTransform(t: 262144.0, w: 1.0, m: 4.418539922, a: 0.0),
 'IL2 BV421 FLR-A': LogicleTransform(t: 262144.0, w: 1.0, m: 4.418539922, a: 0.0),
 'Aqua Amine FLR-A': LogicleTransform(t: 262144.0, w: 1.0, m: 4.418539922, a: 0.0),
 'IFNg APC FLR-A': LogicleTransform(t: 262144.0, w: 1.0, m: 4.418539922, a: 0.0),
 'CD3 APC-H7 FLR-A': LogicleTransform(t: 262144.0, w: 1.0, m: 4.418539922, a: 0.0),
 'CD107a PE FLR-A': LogicleTransform(t: 262144.0, w: 1.0, m: 4.418539922, a: 0.0),
 'CD4 PE-Cy7 FLR-A': LogicleTransform(t: 262144.0, w: 1.0, m: 4.418539922, a: 0.0),
 'Time': LinearTransform(t: 69.0, a: 1.2746832452),
 'Comp-TNFa FITC FLR-A': LogicleTransform(t: 262144.0, w: 1.0, m: 4.418539922, a: 0.0),
 'Comp-CD8 PerCP-Cy55 FLR-A': LogicleTransform(t: 262144.0, w: 1.0, m: 4.418539922, a: 0.0),
 'Comp-IL2 BV421 FLR-A': LogicleTransform(t: 262144.0, w: 1.0, m: 4.418539922, a: 0.0),
 'Comp-Aqua Amine FLR-A': LogicleTransform(t: 262144.0, w: 1.0, m: 4.418539922, a: 0.0),
 'Comp-IFNg APC FLR-A': LogicleTransform(t: 262144.0, w: 1.0, m: 4.418539922, a: 0.0),
 'Comp-CD3 APC-H7 FLR-A': LogicleTransform(t: 262144.0, w: 1.0, m: 4.418539922, a: 0.0),
 'Comp-CD107a PE FLR-A': LogicleTransform(t: 262144.0, w: 1.0, m: 4.418539922, a: 0.0),
 'Comp-CD4 PE-Cy7 FLR-A': LogicleTransform(t: 262144.0, w: 1.0, m: 4.418539922, a: 0.0)}
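The LinearTransform entries map raw values onto a 0 to 1 scale. The sketch below assumes that FlowKit's LinearTransform follows the Gating-ML 2.0 flin definition, (x + a) / (t + a), and writes that scaling out directly. The name linear_scale is hypothetical and used only for this sketch.

```python
def linear_scale(x: float, t: float, a: float) -> float:
    """Gating-ML 2.0 style linear scaling (flin): maps [-a, t] onto [0, 1]."""
    return (x + a) / (t + a)


# Scatter channels above: t=262144, a=0, so the full range maps to [0, 1]
print(linear_scale(131072.0, 262144.0, 0.0))  # → 0.5

# The Time channel above: t=69.0, a=1.2746832452
print(linear_scale(69.0, 69.0, 1.2746832452))  # → 1.0
```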

Since the FlowJo workspace has a transform for each sample parameter, identified by the sample’s PnN channel label, we can get a channel’s transform using the transform ID (the channel label in this case).

[18]:
wsp.get_transform(sample_id, 'Aqua Amine FLR-A')
[18]:
LogicleTransform(t: 262144.0, w: 1.0, m: 4.418539922, a: 0.0)
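The transforms listed earlier include two entries per fluorescence channel. One is keyed by the plain PnN label and one has a "Comp-" prefix, presumably for the compensated version of that channel. The hypothetical helper transform_for_channel (not part of FlowKit) makes this choice explicit. In this sketch, strings stand in for transform objects.

```python
from typing import Dict, TypeVar

T = TypeVar("T")


def transform_for_channel(transforms: Dict[str, T], pnn_label: str, compensated: bool = False) -> T:
    """Hypothetical helper: prefer the 'Comp-' entry for compensated data when present."""
    if compensated:
        comp_key = "Comp-" + pnn_label
        if comp_key in transforms:
            return transforms[comp_key]
    # uncompensated data, or a channel (like Time) with no 'Comp-' entry
    return transforms[pnn_label]


# Strings stand in for transform objects in this sketch
transforms = {
    "Time": "LinearTransform(t: 69.0, a: 1.2746832452)",
    "Aqua Amine FLR-A": "Logicle (uncompensated)",
    "Comp-Aqua Amine FLR-A": "Logicle (compensated)",
}
print(transform_for_channel(transforms, "Aqua Amine FLR-A", compensated=True))
print(transform_for_channel(transforms, "Time", compensated=True))
```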

Analyzing Samples and Retrieving Results

Similar to the Session, a Workspace can analyze multiple samples in a sample group with a single call, and the results are stored for all samples.

Let’s run the analysis for the loaded samples in our sample group.

[19]:
# We can analyze a whole group at once (use verbose=True to see each gate as it's processed)
wsp.analyze_samples(group_name=sample_group, verbose=False, use_mp=False)

When processing multiple samples in parallel, the number of CPU cores used depends on the number of cores available, the number of samples to be run, and the size of the event arrays in the samples. After core count, the next limiting factor in processing multiple samples simultaneously is available memory. The Workspace class attempts to estimate the required memory and limit the number of parallel analyses accordingly. If you encounter memory errors when analyzing multiple samples, try setting use_mp to False.
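The trade-off between cores and memory can be sketched as a simple heuristic. The function below is hypothetical and is NOT FlowKit's actual algorithm. It assumes that each worker needs roughly the event array size times an overhead factor (for the compensated, transformed, and gated copies), then takes the smallest of the core count, the sample count, and the number of workers that fit in memory.

```python
import os
from typing import Optional


def estimate_workers(
    n_samples: int,
    max_event_count: int,
    n_channels: int,
    available_bytes: int,
    cpu_count: Optional[int] = None,
    bytes_per_value: int = 8,
    overhead_factor: float = 3.0,
) -> int:
    """Hypothetical heuristic for choosing a process count (NOT FlowKit's algorithm)."""
    cpu_count = cpu_count or os.cpu_count() or 1
    # rough per-worker footprint: the event array plus assumed working copies
    per_sample_bytes = max_event_count * n_channels * bytes_per_value * overhead_factor
    fits_in_memory = max(1, int(available_bytes // per_sample_bytes))
    return max(1, min(cpu_count, n_samples, fits_in_memory))


# 3 samples of ~290k events x 15 channels, 16 GiB free, 8 cores
print(estimate_workers(3, 290172, 15, 16 * 1024**3, cpu_count=8))
```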

There are also scenarios where you may not want to analyze all samples at once. The analyze_samples method takes an optional sample_id keyword argument for analyzing only one sample at a time. Let’s read the documentation for more information:

[20]:
help(wsp.analyze_samples)
Help on method analyze_samples in module flowkit._models.workspace:

analyze_samples(group_name=None, sample_id=None, cache_events=False, use_mp=True, verbose=False) method of flowkit._models.workspace.Workspace instance
    Process gates for samples. Samples to analyze can be filtered by group name or sample ID.
    After running, results can be retrieved using the `get_gating_results`, `get_group_report`,
    and  `get_gate_membership`, methods.

    :param group_name: optional group name, if specified only samples in this group will be processed
    :param sample_id: optional sample ID, if specified only this sample will be processed (overrides group filter)
    :param cache_events: Whether to cache pre-processed events (compensated and transformed). This can
        be useful to speed up processing of gates that share the same pre-processing instructions for
        the same channel data, but can consume significantly more memory space. See the related
        clear_cache method for additional information. Default is False.
    :param use_mp: Controls whether multiprocessing is used to gate samples (default is True).
        Multiprocessing can fail for large workloads (lots of samples & gates) due to running out of
        memory. If encountering memory errors, set use_mp to False (processing will take longer,
        but will use significantly less memory).
    :param verbose: if True, print a line for every gate processed (default is False)
    :return: None

The Workspace class retains the data from analysis, allowing convenient access to results. Next, we’ll get the report of results for all samples in a sample group. The report is a pandas DataFrame, making it easy to filter. Let’s look at the Time gate for all samples.

[21]:
# get the group's results as a pandas DataFrame and filter to the Time gate
results_report = wsp.get_analysis_report(sample_group)
results_report[results_report['gate_name'] == 'Time']
[21]:
    sample_id                          gate_path  gate_name  gate_type      quadrant_parent  parent  count   absolute_percent  relative_percent  level
0   101_DEN084Y5_15_E01_008_clean.fcs  (root,)    Time       RectangleGate  None             root    290166  99.997932         99.997932         1
14  101_DEN084Y5_15_E03_009_clean.fcs  (root,)    Time       RectangleGate  None             root    283968  99.999648         99.999648         1
28  101_DEN084Y5_15_E05_010_clean.fcs  (root,)    Time       RectangleGate  None             root    284846  99.844369         99.844369         1
[22]:
# We can also retrieve the full GatingResults object for a specific sample
sample_results = wsp.get_gating_results(sample_id)
sample_results.report.head()
[22]:
   sample_id                          gate_path                              gate_name  gate_type      quadrant_parent  parent    count   absolute_percent  relative_percent  level
0  101_DEN084Y5_15_E01_008_clean.fcs  (root,)                                Time       RectangleGate  None             root      290166  99.997932         99.997932         1
1  101_DEN084Y5_15_E01_008_clean.fcs  (root, Time)                           Singlets   PolygonGate    None             Time      239001  82.365287         82.366990         2
2  101_DEN084Y5_15_E01_008_clean.fcs  (root, Time, Singlets)                 aAmine-    PolygonGate    None             Singlets  164655  56.743931         68.893017         3
3  101_DEN084Y5_15_E01_008_clean.fcs  (root, Time, Singlets, aAmine-)        CD3+       PolygonGate    None             aAmine-   133670  46.065782         81.181865         4
4  101_DEN084Y5_15_E01_008_clean.fcs  (root, Time, Singlets, aAmine-, CD3+)  CD4+       PolygonGate    None             CD3+      82484   28.425899         61.707189         5
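The two percentage columns can be reproduced from the counts. In the report above, absolute_percent is the gate's count divided by the sample's total event count, and relative_percent is the count divided by the parent gate's count. A small pandas sketch using the rows shown above, assuming the total event count is 290,172 (the length of this sample's membership arrays in this tutorial):

```python
import pandas as pd

TOTAL_EVENTS = 290172  # assumed total event count for this sample (membership array length)

# Rows copied from the get_gating_results() report above
report = pd.DataFrame(
    {
        "gate_name": ["Time", "Singlets", "aAmine-", "CD3+", "CD4+"],
        "parent": ["root", "Time", "Singlets", "aAmine-", "CD3+"],
        "count": [290166, 239001, 164655, 133670, 82484],
    }
)

# Map each gate name to its count; the root holds every event in the sample.
# (Names are unique in this slice; in a full tree you would key on the gate path.)
count_by_gate = dict(zip(report["gate_name"], report["count"]))
count_by_gate["root"] = TOTAL_EVENTS
report["parent_count"] = report["parent"].map(count_by_gate)

report["absolute_percent"] = 100 * report["count"] / TOTAL_EVENTS
report["relative_percent"] = 100 * report["count"] / report["parent_count"]
print(report[["gate_name", "absolute_percent", "relative_percent"]].round(6))
```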

Extracting Gated Event Data

Gated event information is available in two forms: as gated event arrays (pandas DataFrame) or as Boolean arrays of gate membership. The get_gate_events method retrieves only the sample events that are within a specified gate, preserving the original indices of the sample events. The get_gate_membership method returns a 1-D Boolean array indicating which sample events are inside the gate. For either method, if the gate name is ambiguous, the gate path must also be given.

Note: Unlike in the Session class, the events returned by get_gate_events in the Workspace class are pre-processed. The reason for the difference is that a FlowJo workspace defines a compensation matrix for each sample and a transform for each channel, so the pre-processing is always fully specified. The Session class has no such guarantee and does not require transforms to be created for unused channels.

[23]:
gated_events = wsp.get_gate_events(sample_id, gate_name='CD4+')
[24]:
# Gated events is a DataFrame of only the events within the gate.
# The events WILL BE processed according to the Sample's compensation & transforms
# as specified in the WSP file. The original event indices are retained.
gated_events.head()
[24]:
    sample_id                          FSC-A     FSC-H     FSC-W     SSC-A     SSC-H     SSC-W     TNFa FITC FLR-A  CD8 PerCP-Cy55 FLR-A  IL2 BV421 FLR-A  Aqua Amine FLR-A  IFNg APC FLR-A  CD3 APC-H7 FLR-A  CD107a PE FLR-A  CD4 PE-Cy7 FLR-A  Time
6   101_DEN084Y5_15_E01_008_clean.fcs  0.632765  0.519402  0.304564  0.116014  0.111382  0.260397  0.253992         0.225618              0.253962         0.250438          0.235338        0.419341          0.276203         0.548099          0.036353
9   101_DEN084Y5_15_E01_008_clean.fcs  0.415333  0.329010  0.315593  0.200316  0.182648  0.274184  0.254298         0.329108              0.320049         0.257477          0.271226        0.500196          0.319537         0.594672          0.036452
10  101_DEN084Y5_15_E01_008_clean.fcs  0.427080  0.328156  0.325364  0.296132  0.262680  0.281837  0.260209         0.296330              0.316296         0.262380          0.266253        0.451234          0.284111         0.618065          0.036467
14  101_DEN084Y5_15_E01_008_clean.fcs  0.702111  0.588760  0.298131  0.109897  0.107925  0.254567  0.242071         0.260048              0.245019         0.241003          0.246732        0.377067          0.268328         0.516127          0.036538
18  101_DEN084Y5_15_E01_008_clean.fcs  0.519625  0.444183  0.292461  0.150212  0.142075  0.264318  0.237336         0.253679              0.290496         0.243838          0.246275        0.518013          0.261879         0.581097          0.036723

Instead of getting the data array for gated events, you can also retrieve the gate membership as a Boolean array (a True value means the event is in the gate). Below we’ll get the membership for the Singlets gate.

[25]:
wsp.get_gate_membership(sample_id, gate_name='Singlets')
[25]:
array([False, False, False, ..., False, False,  True], shape=(290172,))
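Conceptually, the membership array and the gated events are two views of the same result. Indexing the pre-processed events with the Boolean mask yields only the in-gate rows, with their original indices intact. The toy pandas illustration below uses made-up values and does not show how FlowKit computes either result internally.

```python
import numpy as np
import pandas as pd

# Hypothetical pre-processed events for a tiny 6-event sample
events = pd.DataFrame(
    {
        "FSC-A": [0.10, 0.63, 0.42, 0.05, 0.90, 0.52],
        "SSC-A": [0.02, 0.12, 0.20, 0.01, 0.95, 0.15],
    }
)
# Hypothetical gate membership: True means the event is inside the gate
membership = np.array([False, True, True, False, False, True])

# Boolean indexing keeps only in-gate rows and preserves their original index labels
gated = events[membership]
print(gated)
```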

Looking at our gate tree, we see that the cytokine gates are repeated in both the CD4+ and CD8+ branches. To get gate information for those, we must specify the gate path. Below we use the gate path to get the IL2+ gate membership for the CD4+ events.

[26]:
wsp.get_gate_membership(
    sample_id,
    gate_name="IL2+",
    gate_path=('root', 'Time', 'Singlets', 'aAmine-', 'CD3+', 'CD4+')
)
[26]:
array([False, False, False, ..., False, False, False], shape=(290172,))
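The ambiguity rule can be sketched as a lookup over (gate_name, gate_path) tuples like the ones get_gate_ids returns in this tutorial. The helper resolve_gate_id is hypothetical. It succeeds when the name alone is unique and otherwise requires a path.

```python
from typing import List, Optional, Sequence, Tuple

GateId = Tuple[str, Tuple[str, ...]]

# A subset of the gate IDs listed by get_gate_ids() in this tutorial
GATE_IDS: List[GateId] = [
    ("CD4+", ("root", "Time", "Singlets", "aAmine-", "CD3+")),
    ("IL2+", ("root", "Time", "Singlets", "aAmine-", "CD3+", "CD4+")),
    ("CD8+", ("root", "Time", "Singlets", "aAmine-", "CD3+")),
    ("IL2+", ("root", "Time", "Singlets", "aAmine-", "CD3+", "CD8+")),
]


def resolve_gate_id(
    gate_ids: Sequence[GateId],
    gate_name: str,
    gate_path: Optional[Sequence[str]] = None,
) -> GateId:
    """Hypothetical helper: find a unique gate ID, requiring a path when the name repeats."""
    matches = [
        gid for gid in gate_ids
        if gid[0] == gate_name and (gate_path is None or gid[1] == tuple(gate_path))
    ]
    if not matches:
        raise KeyError(f"no gate named {gate_name!r} at path {gate_path!r}")
    if len(matches) > 1:
        raise ValueError(f"gate name {gate_name!r} is ambiguous; a gate_path is required")
    return matches[0]


try:
    resolve_gate_id(GATE_IDS, "IL2+")
except ValueError as err:
    print(err)
```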
[27]:
# Here we'll collect the gate membership arrays for all gates for a sample
results = {}

for gate_name, gate_path in wsp.get_gate_ids(sample_id):
    result = wsp.get_gate_membership(
        sample_id,
        gate_name=gate_name,
        gate_path=gate_path
    )
    results[(gate_name, gate_path)] = result
[28]:
list(results.keys())
[28]:
[('Time', ('root',)),
 ('Singlets', ('root', 'Time')),
 ('aAmine-', ('root', 'Time', 'Singlets')),
 ('CD3+', ('root', 'Time', 'Singlets', 'aAmine-')),
 ('CD4+', ('root', 'Time', 'Singlets', 'aAmine-', 'CD3+')),
 ('CD107a+', ('root', 'Time', 'Singlets', 'aAmine-', 'CD3+', 'CD4+')),
 ('IFNg+', ('root', 'Time', 'Singlets', 'aAmine-', 'CD3+', 'CD4+')),
 ('IL2+', ('root', 'Time', 'Singlets', 'aAmine-', 'CD3+', 'CD4+')),
 ('TNFa+', ('root', 'Time', 'Singlets', 'aAmine-', 'CD3+', 'CD4+')),
 ('CD8+', ('root', 'Time', 'Singlets', 'aAmine-', 'CD3+')),
 ('CD107a+', ('root', 'Time', 'Singlets', 'aAmine-', 'CD3+', 'CD8+')),
 ('IFNg+', ('root', 'Time', 'Singlets', 'aAmine-', 'CD3+', 'CD8+')),
 ('IL2+', ('root', 'Time', 'Singlets', 'aAmine-', 'CD3+', 'CD8+')),
 ('TNFa+', ('root', 'Time', 'Singlets', 'aAmine-', 'CD3+', 'CD8+'))]
[29]:
results[('aAmine-', ('root', 'Time', 'Singlets'))]
[29]:
array([False, False, False, ..., False, False,  True], shape=(290172,))
[30]:
# combine the boolean arrays into a single DataFrame
df_results = pd.DataFrame(results)
[31]:
df_results
[31]:
Time Singlets aAmine- CD3+ CD4+ CD107a+ IFNg+ IL2+ TNFa+ CD8+ CD107a+ IFNg+ IL2+ TNFa+
(root,) (root, Time) (root, Time, Singlets) (root, Time, Singlets, aAmine-) (root, Time, Singlets, aAmine-, CD3+) (root, Time, Singlets, aAmine-, CD3+, CD4+) (root, Time, Singlets, aAmine-, CD3+, CD4+) (root, Time, Singlets, aAmine-, CD3+, CD4+) (root, Time, Singlets, aAmine-, CD3+, CD4+) (root, Time, Singlets, aAmine-, CD3+) (root, Time, Singlets, aAmine-, CD3+, CD8+) (root, Time, Singlets, aAmine-, CD3+, CD8+) (root, Time, Singlets, aAmine-, CD3+, CD8+) (root, Time, Singlets, aAmine-, CD3+, CD8+)
0 False False False False False False False False False False False False False False
1 False False False False False False False False False False False False False False
2 False False False False False False False False False False False False False False
3 False False False False False False False False False False False False False False
4 False False False False False False False False False False False False False False
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
290167 True True True True False False False False False True False False False False
290168 True False False False False False False False False False False False False False
290169 True False False False False False False False False False False False False False
290170 True False False False False False False False False False False False False False
290171 True True True True True False False False False False False False False False

290172 rows × 14 columns
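Because each gate's membership is a Boolean array over all events, two sanity checks are cheap. Summing an array gives the gate's event count, which should match the report's count column. In addition, every child gate should be a subset of its parent. The sketch below uses made-up arrays keyed like the results dict above and derives each parent ID from the gate path. The check_membership helper is hypothetical.

```python
from typing import Dict, List, Tuple

import numpy as np

GateId = Tuple[str, Tuple[str, ...]]

# Hypothetical membership arrays for a 5-event sample, keyed like the `results` dict
results: Dict[GateId, np.ndarray] = {
    ("Time", ("root",)): np.array([True, True, True, True, False]),
    ("Singlets", ("root", "Time")): np.array([True, False, True, True, False]),
    ("CD3+", ("root", "Time", "Singlets")): np.array([False, False, True, True, False]),
}


def check_membership(results: Dict[GateId, np.ndarray]) -> Tuple[Dict[GateId, int], List[GateId]]:
    """Count events per gate and list gates containing events their parent lacks."""
    counts = {gate_id: int(mask.sum()) for gate_id, mask in results.items()}
    violations = []
    for (name, path), mask in results.items():
        if path == ("root",):
            continue  # the root contains every event, so there is nothing to check
        parent_id = (path[-1], path[:-1])  # e.g. ('Singlets', ('root', 'Time'))
        if np.any(mask & ~results[parent_id]):
            violations.append((name, path))
    return counts, violations


counts, violations = check_membership(results)
print(counts)
print(violations)
```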

Plotting Gated Events

Because the Workspace class retains samples and their gating results, it can also provide better plotting tools than the GatingStrategy. Gates and their events can be plotted using the plot_gate method. This method will display the events from the parent gate along with the gate boundaries. Note that the plot methods display a subsample of events by default for performance reasons. This can be controlled by the subsample_count argument.
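Subsampling for display amounts to drawing a random set of event indices without replacement. The NumPy sketch below uses a hypothetical helper name and a fixed seed for repeatability. FlowKit's own subsampling details, such as the default count and seed handling, may differ.

```python
import numpy as np


def subsample_indices(n_events: int, subsample_count: float, seed: int = 1) -> np.ndarray:
    """Hypothetical helper: choose up to subsample_count distinct event indices."""
    count = int(subsample_count)  # accept float literals such as 5e5
    if count >= n_events:
        return np.arange(n_events)  # fewer events than requested: keep them all
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(n_events, size=count, replace=False))


idx = subsample_indices(290172, 10000)
print(len(idx), len(np.unique(idx)))
```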

[32]:
p = wsp.plot_gate(
    sample_id,
    'CD3+',
    x_min=0,
    x_max=1,
    y_min=0,
    y_max=1
)
[33]:
show(p)
[34]:
# plot the 'CD3+' gate for all samples in the 'DEN' group
for den_sample_id in wsp.get_sample_ids(group_name='DEN'):
    p = wsp.plot_gate(
        den_sample_id,
        'CD3+',
        x_min=0,
        x_max=1,
        y_min=0,
        y_max=1
    )
    show(p)

There is also the plot_scatter method, which creates a scatter plot of a gated population in any two specified dimensions. To use it, provide the channel label for each axis. The events will be pre-processed as specified in the workspace for those dimensions. To demonstrate, we’ll plot the IL2 and CD107a channels for the CD8+ events.

[35]:
x_label = 'IL2 BV421 FLR-A'
y_label = 'CD107a PE FLR-A'

p = wsp.plot_scatter(
    sample_id,
    x_label,
    y_label,
    gate_name='CD8+',
    subsample_count=50e5,
    x_min=0,
    x_max=1,
    y_min=0,
    y_max=1
)
show(p)

Or, we could take the gate membership array from our Workspace and use it to highlight events using the Sample plot_scatter method. This will highlight the CD8+ events in the context of the non-gated events.

[36]:
cd8_pos_memb = wsp.get_gate_membership(sample_id, 'CD8+')
[37]:
# keep only the transforms keyed by the Sample's own PnN labels (the 'Comp-' entries are excluded)
{t_id: t for t_id, t in wsp.get_transforms(sample_id).items() if t_id in sample.pnn_labels}
[37]:
{'FSC-A': LinearTransform(t: 262144.0, a: 0.0),
 'FSC-H': LinearTransform(t: 262144.0, a: 0.0),
 'FSC-W': LinearTransform(t: 262144.0, a: 0.0),
 'SSC-A': LinearTransform(t: 262144.0, a: 0.0),
 'SSC-H': LinearTransform(t: 262144.0, a: 0.0),
 'SSC-W': LinearTransform(t: 262144.0, a: 0.0),
 'TNFa FITC FLR-A': LogicleTransform(t: 262144.0, w: 1.0, m: 4.418539922, a: 0.0),
 'CD8 PerCP-Cy55 FLR-A': LogicleTransform(t: 262144.0, w: 1.0, m: 4.418539922, a: 0.0),
 'IL2 BV421 FLR-A': LogicleTransform(t: 262144.0, w: 1.0, m: 4.418539922, a: 0.0),
 'Aqua Amine FLR-A': LogicleTransform(t: 262144.0, w: 1.0, m: 4.418539922, a: 0.0),
 'IFNg APC FLR-A': LogicleTransform(t: 262144.0, w: 1.0, m: 4.418539922, a: 0.0),
 'CD3 APC-H7 FLR-A': LogicleTransform(t: 262144.0, w: 1.0, m: 4.418539922, a: 0.0),
 'CD107a PE FLR-A': LogicleTransform(t: 262144.0, w: 1.0, m: 4.418539922, a: 0.0),
 'CD4 PE-Cy7 FLR-A': LogicleTransform(t: 262144.0, w: 1.0, m: 4.418539922, a: 0.0),
 'Time': LinearTransform(t: 69.0, a: 1.2746832452)}
[38]:
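# transforms keyed by the Sample's PnN labels
sample_xforms = {t_id: t for t_id, t in wsp.get_transforms(sample_id).items() if t_id in sample.pnn_labels}

# pre-process the Sample using the workspace's comp matrix and transforms
sample.apply_compensation(wsp.get_comp_matrix(sample_id))
sample.apply_transform(sample_xforms)
sample.subsample_events(5e5)

# highlight the CD8+ events within the full (subsampled) event set
p = sample.plot_scatter(x_label, y_label, highlight_mask=cd8_pos_memb, x_min=0, x_max=1, y_min=0, y_max=1)
show(p)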

Extract GatingStrategy

The Workspace class is essentially a read-only parser of FlowJo 10 workspaces. It doesn’t allow for modifying or creating new gates, adding new samples, etc. The Session class is for programmatically creating gating strategies. However, you can extract a GatingStrategy instance from the Workspace to use as the starting point for an interactive Session.

[39]:
gating_strategy = wsp.get_gating_strategy(sample_id)
[40]:
new_session = fk.Session(gating_strategy)

Archive Results

While Python provides a great ecosystem for downstream bioinformatics analysis, some analyses depend on libraries in other programming languages, and some users simply prefer those languages. The Workspace class can archive analyzed gating results as Feather files, a language-agnostic format, so users can read the results and perform downstream analysis in their preferred language or library.

The Workspace method archive_results() provides this functionality. Note that analyze_samples() must be run before calling this method. It takes a sample group name and produces the following output files for that group:

  • Sample panel (CSV file)

    CSV file containing the sample channel data. It includes columns for ‘channel_number’, ‘pnn’, ‘pns’. This is a useful human-readable reference for the panel of channel data in the archived results.

  • Gate tree (text file)

    ASCII text file containing the gate tree for the sample group. This is a useful human-readable reference for the gating hierarchy.

  • Gate ID lookup table (Feather file)

    Feather file containing columns for ‘gate_id’, ‘gate_path’, and ‘gate_name’. This is useful as a programmatic reference of the specific gates used in the analysis.

  • Gating report (Feather file)

    A Feather version of the output from the get_analysis_report() method. This contains the aggregate gate statistics.

  • Gate membership (Feather file)

    A Feather version of the output from the get_gate_membership() method, with an added 'sample_id' column identifying the sample each event belongs to. The remaining columns are gate IDs, and their Boolean values indicate whether each event is in that particular gate.

  • All preprocessed events (Feather file)

    A Feather version of the output from the get_gate_events() method, with an added 'sample_id' column identifying the sample each event belongs to. The remaining columns are channel labels, holding pre-processed event values (compensated and transformed as defined in the workspace).
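Reading the archive back usually starts by relating the membership columns to the gate ID lookup table. The pandas sketch below uses small hypothetical stand-in tables. In practice they would come from pd.read_feather, which needs pyarrow. The gate_id and gate_path formats shown are assumptions, not the actual archive format. Summing the Boolean columns per sample gives event counts, and the path is combined with the name to form labels because a name like IL2+ appears twice.

```python
import pandas as pd

# Hypothetical stand-ins for two archived tables; the real gate_id and
# gate_path values written by archive_results may be formatted differently.
gate_lookup = pd.DataFrame(
    {
        "gate_id": ["g0", "g1", "g2"],
        "gate_path": [
            "root/Time/Singlets/aAmine-/CD3+",
            "root/Time/Singlets/aAmine-/CD3+/CD4+",
            "root/Time/Singlets/aAmine-/CD3+/CD8+",
        ],
        "gate_name": ["CD4+", "IL2+", "IL2+"],
    }
)
membership = pd.DataFrame(
    {
        "sample_id": ["s1", "s1", "s1", "s2", "s2"],
        "g0": [True, True, False, True, False],
        "g1": [True, False, False, False, False],
        "g2": [False, False, False, False, True],
    }
)

# Event counts per sample and gate: summing Booleans counts the True values
gate_cols = list(gate_lookup["gate_id"])
counts = membership.groupby("sample_id")[gate_cols].sum()

# Unambiguous column labels built from gate path + gate name
labels = dict(zip(gate_lookup["gate_id"], gate_lookup["gate_path"] + "/" + gate_lookup["gate_name"]))
counts = counts.rename(columns=labels)
print(counts)
```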

Finally, it can be useful to archive results with additional sample-level metadata. The archive_results() method takes an optional pandas DataFrame of Sample metadata that will be merged with the gating report, gate membership, and pre-processed event arrays. This DataFrame must be indexed by 'sample_id', with a distinct row for each Sample ID.
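The distinct-row requirement matters because each report row, membership row, or event row must match exactly one metadata row. The pandas sketch below shows that kind of many-to-one merge using a hypothetical "stim" column. It illustrates the idea and is not necessarily how archive_results performs the merge internally.

```python
import pandas as pd

sample_ids = [
    "101_DEN084Y5_15_E01_008_clean.fcs",
    "101_DEN084Y5_15_E03_009_clean.fcs",
    "101_DEN084Y5_15_E05_010_clean.fcs",
]

# Hypothetical sample metadata: one row per sample, indexed by sample_id
df_sample_metadata = pd.DataFrame(
    {"sample_id": sample_ids, "stim": ["unstim", "stim_A", "stim_B"]}
).set_index("sample_id")

# Time-gate rows from the analysis report shown earlier
report = pd.DataFrame(
    {"sample_id": sample_ids, "gate_name": ["Time"] * 3, "count": [290166, 283968, 284846]}
)

# validate="many_to_one" raises pandas.errors.MergeError if a sample_id repeats in the metadata
merged = report.merge(
    df_sample_metadata.reset_index(), on="sample_id", how="left", validate="many_to_one"
)
print(merged)
```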

Let’s take a look at the docstring for this method.

[41]:
help(wsp.archive_results)
Help on method archive_results in module flowkit._models.workspace:

archive_results(group_name, out_dir, output_prefix=None, df_sample_metadata=None, overwrite=False) method of flowkit._models.workspace.Workspace instance
    Archive group results to PyArrow Feather files for language-agnostic downstream analysis.
    Requires having previously run the `analyze_samples` method. The following archive
    files will be created:
        - Sample panel (CSV file)
        - Gate tree (text file)
        - Gate ID lookup table (Feather file)
        - Gating report (Feather file)
        - Gate membership (Feather file)
        - All preprocessed events (Feather file)

    :param group_name: Workspace group name to archive
    :param out_dir: string path for the output file directory
    :param output_prefix: Optional string prefix for the output file names
    :param df_sample_metadata: Optional pandas DataFrame of Sample metadata. Must have a
        'sample_id' column as an index, with distinct rows for each Sample ID. Sample metadata
        will be merged with the archived gating report, gate membership, and preprocessed
        events feather files.
    :param overwrite: force overwriting existing files (default=False)
    :return: None

This concludes the introductory tutorial series. I recommend perusing the FlowKit API documentation or looking at the other advanced notebooks in the notebooks/advanced directory to learn what more you can do with the FlowKit library.
