FlowKit Tutorial - Part 1 - The Sample Class


Welcome to the series of FlowKit tutorial notebooks! I hope you find these tutorials a helpful guide to using FlowKit for your FCM analysis. Part 1 covers the Sample class, the foundational class on which most of FlowKit is built. If you have any questions about FlowKit, find any bugs, or feel something is missing from these tutorials, please submit an issue to the FlowKit GitHub repository.


[1]:
import bokeh
from bokeh.plotting import show

import flowkit as fk

bokeh.io.output_notebook()
[2]:
# check version so users can verify they have the same version/API
fk.__version__
[2]:
'1.1.0'

Sample Class

A Sample instance represents a single FCS sample, and is the only point of entry for FCS event data into the FlowKit library.

A Sample object can conveniently be created from a variety of data sources:

  • A file path to an FCS file

  • A pathlib Path object to an FCS file

  • An already instantiated FlowIO FlowData object

  • A NumPy array (must provide sample_id & channel_labels)

  • A Pandas DataFrame (with channel labels as headers, must provide sample_id)

Let’s take a look at the Sample constructor method:

Sample(
    fcs_path_or_data,
    sample_id=None,
    channel_labels=None,
    compensation=None,
    null_channel_list=None,
    ignore_offset_error=False,
    ignore_offset_discrepancy=False,
    use_header_offsets=False,
    cache_original_events=False,
    subsample=10000
)
  • fcs_path_or_data: a data source for the FCS sample as described above

  • sample_id: A text string to use for the Sample’s ID. If None, the ID will be taken from the ‘fil’ keyword of the metadata. If the ‘fil’ keyword is not present, the value will be the filename if given a file. For a NumPy array or Pandas DataFrame, a text value is required.

  • channel_labels: A list of strings or a list of tuples to use for the channel labels. Required if fcs_path_or_data is a NumPy array.

  • compensation: Compensation matrix, which can be a:

    • Matrix instance

    • NumPy array

    • CSV file path

    • pathlib Path object to a CSV or TSV file

    • string of CSV text

    • None (default) for no compensation (it can be applied later via the apply_compensation method)

  • null_channel_list: List of PnN labels for acquired channels that do not contain useful data. Note, this should only be used if no fluorochromes were used to target those detectors. Null channels do not contribute to compensation and should not be included in a compensation matrix for this sample.

  • ignore_offset_error: An option to ignore data offset error (see note below for more details)

  • ignore_offset_discrepancy: option to ignore discrepancy between the HEADER and TEXT values for the DATA byte offset location, default is False

  • use_header_offsets: use the HEADER section for the data offset locations, default is False. Setting this option to True also suppresses an error in cases of an offset discrepancy.

  • cache_original_events: Original events are the unprocessed events as stored in the FCS binary, meaning they have not been scaled according to channel gain, corrected for proper lin/log display, or had the time channel scaled by the ‘timestep’ keyword value (if present). By default, these events are not retained by the Sample class as they are typically not useful. To retrieve the original events, set this to True and call get_events(source='orig').

  • subsample: The number of events to use for sub-sampling. The number of sub-sampled events can be changed after instantiation using the subsample_events method. The random seed can also be specified using that method. Sub-sampled events are used predominantly for speeding up plotting methods.
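The fallback order for sample_id can be sketched in a few lines of plain Python. resolve_sample_id below is a hypothetical helper, not part of FlowKit; it only encodes the order described above (explicit ID, then the ‘fil’ keyword, then the file name), and whether FlowKit keeps the file extension is an assumption here.

```python
from pathlib import Path


def resolve_sample_id(sample_id=None, metadata=None, source_path=None):
    """Hypothetical helper mirroring the documented sample_id fallback order."""
    # 1. An explicitly supplied ID always wins
    if sample_id is not None:
        return sample_id
    # 2. Otherwise use the 'fil' keyword from the FCS metadata, if present
    if metadata and metadata.get('fil'):
        return metadata['fil']
    # 3. Otherwise fall back to the file name, if the data came from a file
    if source_path is not None:
        return Path(source_path).name
    # 4. NumPy arrays and DataFrames have neither, so an ID is required
    raise ValueError("sample_id is required for NumPy array or DataFrame input")


print(resolve_sample_id(metadata={'fil': 'B07'}))                      # B07
print(resolve_sample_id(source_path='../../data/gate_ref/data1.fcs'))  # data1.fcs
```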

Note about FCS files with a data offset error:

Some FCS files incorrectly report the location of the last data byte as the first byte after the data section (exclusive) rather than the last byte of the data section (inclusive). Technically these are invalid FCS files, but the data itself is not corrupted. To attempt to read these files, set the ignore_offset_error option to True.
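To make the off-by-one concrete, the sketch below compares the DATA segment length implied by the begin/end offsets with the size implied by the $TOT, $PAR and $PnB keywords. It assumes every channel uses the same fixed-width integer size. The function check_data_offsets and the begin offset of 256 are hypothetical; FlowIO's actual check may differ.

```python
def check_data_offsets(begin, end, event_count, channel_count, bytes_per_value):
    """Classify reported DATA offsets against the expected data size."""
    expected_len = event_count * channel_count * bytes_per_value
    # FCS offsets are inclusive: 'end' is the position of the last data byte
    reported_len = end - begin + 1
    if reported_len == expected_len:
        return 'ok'
    if reported_len == expected_len + 1:
        # End offset points one byte past the data (the "exclusive" mistake)
        return 'off-by-one'
    return 'mismatch'


# Values from the example file's metadata: $TOT=13367, $PAR=8, $PnB=16 (2 bytes).
# The begin offset of 256 is made up for illustration.
begin = 256
correct_end = begin + 13367 * 8 * 2 - 1
print(check_data_offsets(begin, correct_end, 13367, 8, 2))      # ok
print(check_data_offsets(begin, correct_end + 1, 13367, 8, 2))  # off-by-one
```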

Note on ``ignore_offset_discrepancy`` and ``use_header_offsets``: The byte offset location of the DATA segment is defined in two places in an FCS file: the HEADER and the TEXT segments. By default, FlowIO uses the offset values found in the TEXT segment. If the HEADER values differ from the TEXT values, a DataOffsetDiscrepancyError will be raised. The ignore_offset_discrepancy option overrides this error to force loading of the FCS file. The related use_header_offsets option forces the file to be loaded using the data offset locations found in the HEADER segment rather than the TEXT segment. Setting use_header_offsets to True is equivalent to setting both options to True, meaning no error will be raised for an offset discrepancy.
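The decision described in this note can be written as a small function. Both choose_data_offsets and the DataOffsetDiscrepancy exception are hypothetical stand-ins for illustration; FlowIO's real logic lives inside its parser and may handle additional cases.

```python
class DataOffsetDiscrepancy(Exception):
    """Stand-in for FlowIO's DataOffsetDiscrepancyError (illustration only)."""


def choose_data_offsets(header_offsets, text_offsets,
                        ignore_offset_discrepancy=False, use_header_offsets=False):
    """Return the (begin, end) DATA offsets to read, following the rules above."""
    if use_header_offsets:
        # Forcing the HEADER values also means a discrepancy is never an error
        return header_offsets
    if header_offsets != text_offsets and not ignore_offset_discrepancy:
        raise DataOffsetDiscrepancy(f"HEADER {header_offsets} != TEXT {text_offsets}")
    # Default behavior: trust the TEXT segment
    return text_offsets


header, text = (256, 214127), (256, 214128)
print(choose_data_offsets(header, text, ignore_offset_discrepancy=True))  # (256, 214128)
print(choose_data_offsets(header, text, use_header_offsets=True))         # (256, 214127)
```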

Event type names in the Sample class

Several methods in the Sample class include a source argument that determines the type of events used or retrieved by the method. The options for the source argument are:

  • orig

  • raw

  • comp

  • xform

The orig option (for original events) is only available if the Sample object was instantiated with cache_original_events set to True. The original events are the events exactly as they were encoded in the FCS file, without applying the gain (from the $PnG keywords) or the time step from the FCS metadata. The original events are not typically useful for any processing.

The raw option specifies the event data after it has been correctly pre-processed according to the channel gain and time step information.
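For a rough picture of what ‘raw’ means, the sketch below applies the usual FCS scaling rules to a single channel: linear channels are divided by their $PnG gain, and log channels ($PnE = f1,f2 with f1 > 0) map a stored value x to f2 * 10**(f1 * x / $PnR). orig_to_raw is a hypothetical helper, not FlowKit's implementation, and it ignores the time step. Treating a zero f2 as 1 matches the (4.0, 1.0) values FlowKit reports for the ‘4,0’ channels in the channels table later in this notebook.

```python
import numpy as np


def orig_to_raw(values, gain=1.0, pne=(0.0, 0.0), pnr=1024):
    """Hypothetical sketch of per-channel scaling from stored ('orig') to 'raw' values."""
    values = np.asarray(values, dtype=float)
    f1, f2 = pne
    if f1 > 0:
        # Log-amplified channel: a zero f2 (e.g. "4,0") is treated as 1
        f2 = f2 if f2 > 0 else 1.0
        return f2 * 10 ** (f1 * values / pnr)
    # Linear channel: undo the amplifier gain
    return values / gain


print(round(float(orig_to_raw(323, gain=3.67)), 4))                 # 88.0109
print(round(float(orig_to_raw(220, pne=(4.0, 0.0), pnr=1024)), 4))  # 7.2339
```

With a gain of 3.67 (the $P1G value for FSC-H in this file), a stored value of 323 would scale to about 88.0109, and with $P3E = 4,0 and $P3R = 1024 a stored value of 220 would scale to about 7.2339; both are consistent with the first raw event shown later in this tutorial.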

The comp option specifies the event data as the raw data with a compensation matrix applied. These events are only available if a compensation matrix was specified in the Sample object instantiation or if the apply_compensation method has been called.

The xform option specifies transformed events. If a compensation matrix was supplied when creating the Sample instance, or if the apply_compensation method has been called, the transform is applied to the compensated events. Transformations can also be applied to a non-compensated Sample.
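The relationship between the raw, comp and xform sources can be illustrated with plain NumPy. This is a sketch under the common spillover convention (observed = true @ spillover, so compensation multiplies by the inverse), using a made-up 2x2 spillover matrix. The arcsinh transform is only a simple stand-in for transforms such as logicle, which part 2 covers.

```python
import numpy as np

# Hypothetical 2-channel spillover matrix: row i gives the fraction of
# fluorochrome i's signal detected in each channel
spill = np.array([[1.00, 0.20],
                  [0.05, 1.00]])

# "True" signals for three events, and the observed ('raw') values they produce
true_signal = np.array([[100.0, 10.0],
                        [20.0, 80.0],
                        [0.0, 0.0]])
raw = true_signal @ spill

# 'comp': undo the spillover by applying the inverse spillover matrix
comp = raw @ np.linalg.inv(spill)

# 'xform': a transform applied on top of the compensated data
# (arcsinh with a cofactor of 150 is an arbitrary stand-in here)
xform = np.arcsinh(comp / 150.0)

print(np.allclose(comp, true_signal))  # True
```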

Applying compensation and transforms is covered in part 2 of the tutorial notebook series.

Information is also available via the Python help function, along with descriptions of the Sample class methods:

help(fk.Sample)

Create a Sample Instance

As stated above in the Sample docstring, a Sample instance can be created from a variety of data sources:

  • File path to an FCS file

  • pathlib Path object to an FCS file

  • FlowIO FlowData object

  • NumPy array (must provide sample_id & channel_labels)

  • Pandas DataFrame (with channel labels as headers, must provide sample_id)

From an FCS File

Let’s create a Sample instance from a file path to an FCS file.

[3]:
fcs_path = '../../data/gate_ref/data1.fcs'
[4]:
sample = fk.Sample(fcs_path)
[5]:
sample
[5]:
Sample(v2.0, B07, 8 channels, 13367 events)

The string representation tells us this is an FCS 2.0 file with the ‘$FIL’ keyword value of ‘B07’. There are 8 channels of event data with 13,367 total events.

From a pandas DataFrame or NumPy array

A Sample can also be created from a pandas DataFrame or a NumPy array. In both cases, a sample ID must be provided. For FCS files the ID is read from the metadata, but arrays and DataFrames carry no such metadata, and without an ID other features of FlowKit have no way to reference the Sample.

Let’s get a DataFrame from the previous Sample we made. We’ll cover more on exporting events later in this tutorial, but for now we’ll just use the as_dataframe method. Then, we can use it as an example of creating a new Sample instance.

[6]:
df_events = sample.as_dataframe(source='raw')
[7]:
df_events.head()
[7]:
pnn         FSC-H       SSC-H      FL1-H      FL2-H       FL3-H  FL2-A     FL4-H                Time
pns    FSC-Height  SSC-Height   CD4 FITC   CD8 B PE   CD3 PerCP          CD8 APC  Time (102.40 sec.)
0       88.010899      27.250   7.233942  34.598917   11.039992    5.0  5.186134                 0.0
1       19.073569       5.375  36.517413   1.000000  170.007762    0.0  4.293510                 0.0
2       70.572207      26.000   2.480454  12.863969    3.023213    0.0  8.582104                 0.0
3       98.910082      31.750   5.473703  14.989296    3.751619    0.0  6.731704                 0.0
4       29.972752      34.750   2.641648   2.665516    2.641648    0.0  6.097562                 0.0
[8]:
sample_from_df = fk.Sample(df_events, sample_id='my_sample_from_dataframe')
[9]:
sample_from_df
[9]:
Sample(v3.1, my_sample_from_dataframe, 8 channels, 13367 events)

Creating a Sample from a NumPy array is similar, but we must also provide the channel names. In the case of the DataFrame above, the channel names were taken from the column names.

[10]:
np_events = sample.get_events(source='raw')
channel_labels = sample.pnn_labels
[11]:
sample_from_np = fk.Sample(np_events, channel_labels=channel_labels, sample_id='my_sample_from_numpy')
[12]:
sample_from_np
[12]:
Sample(v3.1, my_sample_from_numpy, 8 channels, 13367 events)

Metadata and Channel Information

Get the FCS version of the file (returns None if a Sample was created from a NumPy array or Pandas DataFrame)

[13]:
sample.version
[13]:
'2.0'

Retrieve all the FCS metadata:

[14]:
sample.get_metadata()
[14]:
{'byteord': '4,3,2,1',
 'datatype': 'I',
 'nextdata': '0',
 'sys': 'Macintosh System Software 9.0.4',
 'creator': 'CELLQuestª 3.3',
 'tot': '13367',
 'mode': 'L',
 'par': '8',
 'p1n': 'FSC-H',
 'p1r': '1024',
 'p1b': '16',
 'p1e': '0,0',
 'p1g': '3.67',
 'p2n': 'SSC-H',
 'p2r': '1024',
 'p2b': '16',
 'p2e': '0,0',
 'p2g': '8',
 'p3n': 'FL1-H',
 'p3r': '1024',
 'p3b': '16',
 'p3e': '4,0',
 'p4n': 'FL2-H',
 'p4r': '1024',
 'p4b': '16',
 'p4e': '4,0',
 'p5n': 'FL3-H',
 'p5r': '1024',
 'p5b': '16',
 'p5e': '4,0',
 'p1s': 'FSC-Height',
 'p2s': 'SSC-Height',
 'p3s': 'CD4 FITC',
 'p4s': 'CD8 B PE',
 'p5s': 'CD3 PerCP',
 'p6n': 'FL2-A',
 'p6r': '1024',
 'p6b': '16',
 'p6e': '0,0',
 'timeticks': '100',
 'p7n': 'FL4-H',
 'p7r': '1024',
 'p7e': '4,0',
 'p7b': '16',
 'p7s': 'CD8 APC',
 'p8n': 'Time',
 'p8r': '1024',
 'p8e': '0,0',
 'p8b': '16',
 'p8s': 'Time (102.40 sec.)',
 'sample id': 'Default Patient ID',
 'src': 'Default',
 'case number': 'Default Case Number',
 'cyt': 'FACSCalibur',
 'cytnum': 'E3820',
 'btim': '16:31:33',
 'etim': '16:31:52',
 'bdacqlibversion': '3.1',
 'bdnpar': '7',
 'bdp1n': 'FSC-H',
 'bdp2n': 'SSC-H',
 'bdp3n': 'FL1-H',
 'bdp4n': 'FL2-H',
 'bdp5n': 'FL3-H',
 'bdp6n': 'FL2-A',
 'bdp7n': 'FL4-H',
 'bdword0': '24',
 'bdword1': '394',
 'bdword2': '492',
 'bdword3': '477',
 'bdword4': '566',
 'bdword5': '397',
 'bdword6': '397',
 'bdword7': '397',
 'bdword8': '398',
 'bdword9': '397',
 'bdword10': '300',
 'bdword11': '299',
 'bdword12': '551',
 'bdword13': '4',
 'bdword14': '397',
 'bdword15': '501',
 'bdword16': '481',
 'bdword17': '586',
 'bdword18': '574',
 'bdword19': '100',
 'bdword20': '100',
 'bdword21': '100',
 'bdword22': '100',
 'bdword23': '1',
 'bdword24': '1',
 'bdword25': '0',
 'bdword26': '0',
 'bdword27': '0',
 'bdword28': '136',
 'bdword29': '52',
 'bdword30': '52',
 'bdword31': '52',
 'bdword32': '52',
 'bdword33': '52',
 'bdword34': '12',
 'bdword35': '201',
 'bdword36': '6',
 'bdword37': '138',
 'bdword38': '280',
 'bdword39': '3',
 'bdword40': '3',
 'bdword41': '100',
 'bdword42': '100',
 'bdword43': '0',
 'bdword44': '1023',
 'bdword45': '1023',
 'bdword46': '1023',
 'bdword47': '53',
 'bdword48': '550',
 'bdword49': '56',
 'bdword50': '72',
 'bdword51': '52',
 'bdword52': '0',
 'bdword53': '0',
 'bdword54': '0',
 'bdword55': '0',
 'bdword56': '0',
 'bdword57': '0',
 'bdword58': '0',
 'bdword59': '0',
 'bdword60': '0',
 'bdword61': '0',
 'bdword62': '0',
 'bdword63': '0',
 'bdlasermode': '1',
 'calibfile': 'FALSE',
 'p7thresvol': '52',
 'fil': 'B07',
 'date': '23-Aug-02',
 'number well info keywords': '3',
 '&1sample': '200',
 '&2number of washes': '1',
 '&3mixing vol': '100',
 '&4number of mixes': '2',
 '&5data file prefix part #1\\\\&6data file prefix part #2\\\\&7data file prefix part #3\\\\&8acquisition doc.': 'LYMPH SUBSET ACQ',
 '&9instr. sett. file': 'E#7 Settings #1',
 '&10patient id': ' FJ#192659',
 '&11day': '35d',
 '&12sample id': 'T-cells',
 '&13analysis doc.': ''}

Retrieve a DataFrame of channel information, including the required PnN labels and optional PnS labels for each channel. Note that Samples distinguish between channel numbers and channel indices: channel numbers start at 1, while channel indices start at 0.

[15]:
sample.channels
[15]:
   channel_number  pnn    pns                  png  pne            pnr
0               1  FSC-H  FSC-Height          3.67  (0.0, 0.0)  1024.0
1               2  SSC-H  SSC-Height          8.00  (0.0, 0.0)  1024.0
2               3  FL1-H  CD4 FITC            1.00  (4.0, 1.0)  1024.0
3               4  FL2-H  CD8 B PE            1.00  (4.0, 1.0)  1024.0
4               5  FL3-H  CD3 PerCP           1.00  (4.0, 1.0)  1024.0
5               6  FL2-A                      1.00  (0.0, 0.0)  1024.0
6               7  FL4-H  CD8 APC             1.00  (4.0, 1.0)  1024.0
7               8  Time   Time (102.40 sec.)  1.00  (0.0, 0.0)  1024.0

Get a list of only the PnN labels:

[16]:
sample.pnn_labels
[16]:
['FSC-H', 'SSC-H', 'FL1-H', 'FL2-H', 'FL3-H', 'FL2-A', 'FL4-H', 'Time']

The optional PnS labels are also available (empty values will be empty strings):

[17]:
sample.pns_labels
[17]:
['FSC-Height',
 'SSC-Height',
 'CD4 FITC',
 'CD8 B PE',
 'CD3 PerCP',
 '',
 'CD8 APC',
 'Time (102.40 sec.)']

The Sample class attempts to automatically identify fluorescence, scatter, and time channels by looking for simple sub-strings in the channel names (e.g. ‘FSC’, ‘SSC’, ‘Time’). The channel indices for these channel types are available via the following attributes:

[18]:
sample.fluoro_indices
[18]:
[2, 3, 4, 5, 6]
[19]:
sample.scatter_indices
[19]:
[0, 1]
[20]:
sample.time_index
[20]:
7
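The sub-string matching can be approximated as below. classify_channels is hypothetical; the exact sub-strings and the case-insensitive matching are assumptions, but for this sample's PnN labels the sketch reproduces the three attributes shown above.

```python
def classify_channels(pnn_labels):
    """Hypothetical sketch: sort channel indices into scatter, fluorescence and time."""
    scatter, fluoro, time_index = [], [], None
    for idx, label in enumerate(pnn_labels):
        upper = label.upper()  # case-insensitive matching (an assumption)
        if 'FSC' in upper or 'SSC' in upper:
            scatter.append(idx)
        elif 'TIME' in upper:
            time_index = idx
        else:
            # Anything that is neither scatter nor time is treated as fluorescence
            fluoro.append(idx)
    return scatter, fluoro, time_index


labels = ['FSC-H', 'SSC-H', 'FL1-H', 'FL2-H', 'FL3-H', 'FL2-A', 'FL4-H', 'Time']
print(classify_channels(labels))  # ([0, 1], [2, 3, 4, 5, 6], 7)
```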

Look up a channel index by its label:

[21]:
sample.get_channel_index('FL2-H')
[21]:
3

Or, look up a channel number by its label:

[22]:
sample.get_channel_number_by_label('FL2-H')
[22]:
4

And for completeness, get a channel index via its number:

[23]:
sample.get_channel_index(4)
[23]:
3
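Since channel numbers start at 1 and indices at 0, converting between them is an offset of one. The helpers below are hypothetical and operate on a plain list of PnN labels; they only mimic what get_channel_index and get_channel_number_by_label return above.

```python
# PnN labels of the example sample, in channel order
pnn_labels = ['FSC-H', 'SSC-H', 'FL1-H', 'FL2-H', 'FL3-H', 'FL2-A', 'FL4-H', 'Time']


def channel_index_from(label_or_number):
    """Hypothetical helper: PnN label or 1-based channel number -> 0-based index."""
    if isinstance(label_or_number, int):
        return label_or_number - 1  # channel numbers start at 1
    return pnn_labels.index(label_or_number)  # list positions start at 0


def channel_number_from(label):
    """Hypothetical helper: PnN label -> 1-based channel number."""
    return pnn_labels.index(label) + 1


print(channel_index_from('FL2-H'), channel_number_from('FL2-H'), channel_index_from(4))  # 3 4 3
```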

To get the event count:

[24]:
sample.event_count
[24]:
13367

Several other Sample attributes are available including:

  • id

  • original_filename

  • acquisition_date

  • compensation

  • transform

And a few attributes for filtering events:

  • subsample_indices

    Set by constructor or by calling the subsample_events method (see below).

  • negative_scatter_indices

    Set by calling the filter_negative_scatter method.

  • flagged_indices

    Assigned manually by the user for flagging any events (anomalous events from a QC routine, etc.)

Subsampling

FlowKit is optimized for performance (or attempts to be!). However, when dealing with high-dimensional flow cytometry data and/or data containing millions of events, it can be useful to sub-sample events to speed up processing. This is especially true when plotting events. All Sample plot methods, except for plot_histogram, use sub-sampled events by default (these methods are covered later in this tutorial).

On instantiation, the number of sub-sampled events can be specified (default is 10,000). The Sample class also provides a subsample_events method to change the number of sub-sampled events or the random seed used to generate the sub-sample. The sub-sample is drawn randomly, but reproducibly: providing the same random seed (default is 1) guarantees the same sub-sample indices when re-running an analysis. The sub-sampled indices are available via the subsample_indices attribute.

Note that sub-sampling does not delete any events; the sub-sampled indices are simply stored and used to select a subset of events. Any Sample method that plots or retrieves events has a subsample argument that takes a Boolean value specifying whether to use the sub-sampled events or all events. Any method that processes events (compensating or transforming) always uses all the events.

Retrieving sub-sampled indices

Our Sample instance was already sub-sampled by default (10,000 events) when it was created. Note in the output below that the sub-sampled indices are not only randomly selected (reproducibly, via the random seed) but also randomly shuffled. Because of the shuffling, any leading slice of the indices is itself a random subset, which makes it safe to sub-sample the sub-sample if even fewer events are needed.

[25]:
sample.subsample_indices
[25]:
array([ 4136, 12180, 11048, ...,  9661, 10709, 10093])
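A seeded, shuffled sub-sample like this can be drawn with NumPy's Generator.choice. This is a sketch, not FlowKit's implementation: the indices it produces will not match subsample_indices above, since FlowKit's random number generator and seeding details may differ. The function name subsample_indices is hypothetical.

```python
import numpy as np


def subsample_indices(event_count, subsample_count=10000, random_seed=1):
    """Hypothetical sketch: draw a reproducible, shuffled subset of event indices."""
    rng = np.random.default_rng(random_seed)
    size = min(subsample_count, event_count)
    # Sampling without replacement returns the picks in random order, so the
    # result is both a random subset and already shuffled
    return rng.choice(event_count, size=size, replace=False)


a = subsample_indices(13367)
b = subsample_indices(13367)
smaller = a[:2000]  # a leading slice is itself a random subset
print(len(a), np.array_equal(a, b))  # 10000 True
```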

Retrieving Events

Several methods are available in the Sample class for conveniently retrieving event data in a variety of forms.

Retrieve Events as NumPy Array

The Sample methods get_events and get_channel_events return event data as a NumPy array. Both methods have similar input arguments, including the already familiar source and subsample arguments for specifying the event class (‘orig’, ‘raw’, ‘comp’, or ‘xform’) and whether to return the sub-sampled events or all events.

Note: These methods return the arrays directly, not a copy of the array. Be careful if you are planning to modify the returned event data, and make a copy of the array when appropriate.
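The note above is about ordinary NumPy aliasing. A minimal illustration, using a plain array in place of the one held by a Sample:

```python
import numpy as np

# Stand-in for an array held inside an object and returned without copying
stored = np.array([[88.0, 27.25], [19.07, 5.375]])

returned = stored       # same object, no copy
returned[0, 0] = -1.0   # modifies the stored data too
print(stored[0, 0])     # -1.0

safe = stored.copy()    # independent copy
safe[1, 1] = 0.0        # the stored data is unaffected
print(stored[1, 1])     # 5.375
```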

[26]:
help(sample.get_events)
Help on method get_events in module flowkit._models.sample:

get_events(source='xform', subsample=False) method of flowkit._models.sample.Sample instance
    Returns a NumPy array of event data.

    Note: This method returns the array directly, not a copy of the array. Be careful if you
    are planning to modify returned event data, and make a copy of the array when appropriate.

    :param source: 'orig', 'raw', 'comp', 'xform' for whether the original (no gain applied),
        raw (orig + gain), compensated (raw + comp), or transformed (comp + xform) events will
        be returned
    :param subsample: Whether to return all events or just the sub-sampled
        events. Default is False (all events)
    :return: NumPy array of event data

[27]:
sample.get_events(source='raw')
[27]:
array([[ 88.01089918,  27.25      ,   7.23394163, ...,   5.        ,
          5.18613419,   0.        ],
       [ 19.07356948,   5.375     ,  36.51741273, ...,   0.        ,
          4.29351021,   0.        ],
       [ 70.57220708,  26.        ,   2.48045441, ...,   0.        ,
          8.58210354,   0.        ],
       ...,
       [ 62.1253406 ,  27.625     ,  11.75743266, ...,   0.        ,
          1.77827941, 174.        ],
       [ 36.23978202,  64.5       ,   5.42469094, ...,   0.        ,
          4.95806824, 174.        ],
       [ 66.48501362,   8.75      ,   1.43301257, ...,   0.        ,
          6.0429639 , 174.        ]])
[28]:
# The Sample method `get_channel_events` takes a `channel_index` argument.
# Note the channel index (indexed at zero) and not the channel number.
channel_idx = sample.get_channel_index('FSC-H')
sample.get_channel_events(channel_idx, source='raw')
[28]:
array([88.01089918, 19.07356948, 70.57220708, ..., 62.1253406 ,
       36.23978202, 66.48501362])

Retrieve Events as pandas DataFrame

The Sample method as_dataframe returns a pandas DataFrame of the Sample event data. This method also supports source and subsample arguments, but includes the extra arguments col_order and col_names for choosing the order of columns by PnN label and/or specifying new names for the columns in the returned DataFrame.

[29]:
sample.as_dataframe(source='raw')
[29]:
pnn         FSC-H       SSC-H      FL1-H      FL2-H       FL3-H  FL2-A     FL4-H                Time
pns    FSC-Height  SSC-Height   CD4 FITC   CD8 B PE   CD3 PerCP          CD8 APC  Time (102.40 sec.)
0       88.010899      27.250   7.233942  34.598917   11.039992    5.0  5.186134                 0.0
1       19.073569       5.375  36.517413   1.000000  170.007762    0.0  4.293510                 0.0
2       70.572207      26.000   2.480454  12.863969    3.023213    0.0  8.582104                 0.0
3       98.910082      31.750   5.473703  14.989296    3.751619    0.0  6.731704                 0.0
4       29.972752      34.750   2.641648   2.665516    2.641648    0.0  6.097562                 0.0
...           ...         ...        ...        ...         ...    ...       ...                 ...
13362   66.212534      19.625   6.915821  14.330126    2.617995    0.0  5.882084               174.0
13363   70.844687      25.500   7.986266  20.169146    3.522695    0.0  6.731704               174.0
13364   62.125341      27.625  11.757433  22.266720    2.665516    0.0  1.778279               174.0
13365   36.239782      64.500   5.424691   5.327979    7.299301    0.0  4.958068               174.0
13366   66.485014       8.750   1.433013   1.154782    1.218814    0.0  6.042964               174.0

13367 rows × 8 columns
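The col_order and col_names arguments reorder and rename columns. For comparison, the same operations in plain pandas on a small frame with the same two-level (pnn, pns) header look like this; it is an illustration only, not how as_dataframe implements those arguments.

```python
import pandas as pd

# Small frame with the same two-level (pnn, pns) column header as above
columns = pd.MultiIndex.from_tuples(
    [('FSC-H', 'FSC-Height'), ('SSC-H', 'SSC-Height'), ('FL1-H', 'CD4 FITC')],
    names=['pnn', 'pns'],
)
df = pd.DataFrame([[88.01, 27.25, 7.23], [19.07, 5.375, 36.52]], columns=columns)

flat = df.droplevel('pns', axis=1)   # keep only the PnN level as column names
subset = flat[['FL1-H', 'FSC-H']]    # choose and order columns by PnN label
renamed = subset.rename(columns={'FL1-H': 'CD4', 'FSC-H': 'FSC'})  # new names
print(renamed)
```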

Plotting Sample Events

  • Histogram

  • Channel Plot

  • Contour Plot

  • Interactive Scatter Plot

  • Interactive Scatter Plot Matrix

All plotting methods return a Bokeh figure instance. This is done so the caller can modify and/or display the plot as required.

Histogram

[30]:
help(sample.plot_histogram)
Help on method plot_histogram in module flowkit._models.sample:

plot_histogram(channel_label_or_number, source='xform', subsample=False, bins=None, data_min=None, data_max=None, x_range=None) method of flowkit._models.sample.Sample instance
    Returns a histogram plot of the specified channel events

    :param channel_label_or_number:  A channel's PnN label or number to use
        for plotting the histogram
    :param source: 'raw', 'comp', 'xform' for whether the raw, compensated
        or transformed events are used for plotting
    :param subsample: Whether to use all events for plotting or just the
        sub-sampled events. Default is False (all events).
    :param bins: Number of bins to use for the histogram or a string compatible
        with the NumPy histogram function. If None, the number of bins is
        determined by the square root rule.
    :param data_min: filter event data, removing events below specified value
    :param data_max: filter event data, removing events above specified value
    :param x_range: Tuple of lower & upper bounds of x-axis. Used for modifying
        plot view, doesn't filter event data.
    :return: Bokeh figure of the histogram plot.

[31]:
p = sample.plot_histogram('FSC-H', source='raw')
show(p)

Changing the number of bins:

[32]:
p = sample.plot_histogram('FSC-H', source='raw', bins=256)
show(p)

Change the display range:

[33]:
p = sample.plot_histogram('FSC-H', source='raw', bins=256, x_range=(10, 200))
show(p)

You can also change the data range that is used to compute the histogram, useful if there are extreme values that you would like to exclude. There are separate arguments for data_min and data_max. Let’s exclude data above 100:

[34]:
p = sample.plot_histogram('FSC-H', source='raw', bins=50, data_max=100)
show(p)
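The difference between data_min/data_max and x_range is that the former remove events before binning, while x_range only changes the view. The hypothetical histogram_counts function below sketches the filtering and the square-root bin rule with np.histogram on synthetic data; whether FlowKit's bounds are inclusive and how it rounds the square root are assumptions.

```python
import numpy as np


def histogram_counts(values, bins=None, data_min=None, data_max=None):
    """Hypothetical sketch: filter by data range, then bin the remaining events."""
    values = np.asarray(values, dtype=float)
    # data_min / data_max drop events before binning (x_range would not)
    if data_min is not None:
        values = values[values >= data_min]
    if data_max is not None:
        values = values[values <= data_max]
    if bins is None:
        # Square-root rule: about sqrt(n) bins for n events
        bins = int(np.ceil(np.sqrt(values.size)))
    return np.histogram(values, bins=bins)


rng = np.random.default_rng(0)
fsc = rng.normal(80, 30, size=13367)  # synthetic stand-in for FSC-H
counts, edges = histogram_counts(fsc, data_max=100)
print(len(counts), int(counts.sum()))
```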

Plot Channel

The plot_channel method creates a 2-D histogram of the specified channel data with the event index on the x-axis. This is similar to plotting a channel vs. Time, except the events are equally spaced along the x-axis.

[35]:
help(sample.plot_channel)
Help on method plot_channel in module flowkit._models.sample:

plot_channel(channel_label_or_number, source='xform', subsample=True, color_density=True, bin_width=4, event_mask=None, highlight_mask=None, x_min=None, x_max=None, y_min=None, y_max=None) method of flowkit._models.sample.Sample instance
    Plot a 2-D histogram of the specified channel data with the x-axis as the event index.
    This is similar to plotting a channel vs Time, except the events are equally
    distributed along the x-axis.

    :param channel_label_or_number: A channel's PnN label or number
    :param source: 'raw', 'comp', 'xform' for whether the raw, compensated
        or transformed events are used for plotting
    :param subsample: Whether to use all events for plotting or just the
        sub-sampled events. Default is True (sub-sampled events). Plotting
        sub-sampled events is much faster.
    :param color_density: Whether to color the events by density, similar
        to a heat map. Default is True.
    :param bin_width: Bin size to use for the color density, in units of
        event point size. Larger values produce smoother gradients.
        Default is 4 for a 4x4 grid size.
    :param event_mask: Boolean array of events to plot. Takes precedence
        over highlight_mask (i.e. events marked False in event_mask will
        never be plotted).
    :param highlight_mask: Boolean array of event indices to highlight
        in color. Non-highlighted events will be light grey.
    :param x_min: Lower bound of x-axis. If None, channel's min value will
        be used with some padding to keep events off the edge of the plot.
    :param x_max: Upper bound of x-axis. If None, channel's max value will
        be used with some padding to keep events off the edge of the plot.
    :param y_min: Lower bound of y-axis. If None, channel's min value will
        be used with some padding to keep events off the edge of the plot.
    :param y_max: Upper bound of y-axis. If None, channel's max value will
        be used with some padding to keep events off the edge of the plot.
    :return: A Bokeh Figure object containing the interactive channel plot.

[36]:
f = sample.plot_channel('FSC-H', source='raw')
show(f)
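A plot like this amounts to binning events by (event index, value). A NumPy sketch on synthetic data, with arbitrarily chosen bin counts; it illustrates the idea only and is not FlowKit's plotting code:

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.normal(80, 30, size=13367)  # synthetic stand-in for one channel
event_index = np.arange(values.size)     # x-axis: position of each event in the file

# 2-D histogram over (event index, channel value); the count in each cell is
# what a density-colored channel plot visualizes. The bin counts are arbitrary.
counts, x_edges, y_edges = np.histogram2d(event_index, values, bins=(100, 64))
print(counts.shape)  # (100, 64)
```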

Contour Plot

The plot_contour method uses kernel density estimation (KDE) from SciPy, which is computationally intensive, so the plots can take some time to create.

[37]:
help(sample.plot_contour)
Help on method plot_contour in module flowkit._models.sample:

plot_contour(x_label_or_number, y_label_or_number, source='xform', subsample=True, plot_events=False, fill=False, x_min=None, x_max=None, y_min=None, y_max=None) method of flowkit._models.sample.Sample instance
    Returns a contour plot of the specified channel events, available
    as raw, compensated, or transformed data.

    :param x_label_or_number:  A channel's PnN label or number for x-axis
        data
    :param y_label_or_number: A channel's PnN label or number for y-axis
        data
    :param source: 'raw', 'comp', 'xform' for whether the raw, compensated
        or transformed events are used for plotting
    :param subsample: Whether to use all events for plotting or just the
        sub-sampled events. Default is True (sub-sampled events). Running
        with all events is not recommended, as the Kernel Density
        Estimation is computationally demanding.
    :param plot_events: Whether to display the event data points in
        addition to the contours. Default is False.
    :param x_min: Lower bound of x-axis. If None, channel's min value will
        be used with some padding to keep events off the edge of the plot.
    :param x_max: Upper bound of x-axis. If None, channel's max value will
        be used with some padding to keep events off the edge of the plot.
    :param y_min: Lower bound of y-axis. If None, channel's min value will
        be used with some padding to keep events off the edge of the plot.
    :param y_max: Upper bound of y-axis. If None, channel's max value will
        be used with some padding to keep events off the edge of the plot.
    :param fill: Whether to fill in color between contour lines. D default
        is False.
    :return: A Bokeh figure of the contour plot

[38]:
# by default, plot_contour uses sub-sampled events for performance
p = sample.plot_contour('FSC-H', 'SSC-H', source='raw', fill=False, plot_events=False)
[39]:
show(p)

To specify the axis ranges:

[40]:
x_min = y_min = 0
x_max = y_max = 250

p = sample.plot_contour(
    'FSC-H', 'SSC-H', source='raw', x_min=x_min, x_max=x_max, y_min=y_min, y_max=y_max
)
show(p)

Fill contours:

[41]:
p = sample.plot_contour('FSC-H', 'SSC-H', fill=True, source='raw')
show(p)

Adding events:

[42]:
p = sample.plot_contour('FSC-H', 'SSC-H', source='raw', plot_events=True)
show(p)

Scatter Plot

[43]:
p = sample.plot_scatter(
    'FSC-H', 'SSC-H',
    source='raw', y_min=0., y_max=130, x_min=0., x_max=280, color_density=True
)
[44]:
show(p)

Change the bin width to control the color density. The bin width is in units of the event point size, and the default is 4 for a 4x4 grid. Larger values create a smoother color gradient but lose detail. Let’s set the bin width to 8 and see how it compares.

[45]:
p = sample.plot_scatter(
    'FSC-H', 'SSC-H',
    source='raw', y_min=0., y_max=130, x_min=0., x_max=280, color_density=True, bin_width=8
)
[46]:
show(p)
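One way to picture bin_width: events are grouped into square grid cells whose edge is bin_width point sizes, and each event is colored by the number of events sharing its cell. The grid_density function below is a hypothetical sketch of that idea in data units; FlowKit's actual density calculation may differ.

```python
import numpy as np


def grid_density(x, y, bin_width=4, point_size=1.0):
    """Hypothetical sketch: color value = number of events in the same grid cell."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cell = bin_width * point_size  # cell edge length (data units in this sketch)
    ix = np.floor((x - x.min()) / cell).astype(int)
    iy = np.floor((y - y.min()) / cell).astype(int)
    # Combine the two cell coordinates into one integer key per event
    keys = ix * (iy.max() + 1) + iy
    # Count events per key, then map each event back to its cell's count
    _, inverse, cell_counts = np.unique(keys, return_inverse=True, return_counts=True)
    return cell_counts[inverse]


x = np.array([0.0, 1.0, 2.0, 10.0, 11.0])
y = np.zeros_like(x)
print(grid_density(x, y, bin_width=4))   # [3 3 3 2 2]
print(grid_density(x, y, bin_width=16))  # [5 5 5 5 5]
```

Larger cells merge neighboring events into the same count, which is why a larger bin width gives smoother but less detailed coloring.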

Or, turn off the color density completely:

[47]:
p = sample.plot_scatter('FSC-H', 'SSC-H', source='raw', color_density=False)
show(p)

Apply a transform and plot fluorescent channels (raw and transformed)

Note: The ``transforms`` module will be covered in more detail in part 2 of the tutorial notebook series

[48]:
xform = fk.transforms.LogicleTransform('my_logicle', param_t=1024, param_w=0.5, param_m=4.5, param_a=0)
sample.apply_transform(xform)
[49]:
# source is 'raw' so not too useful for visualization
p = sample.plot_scatter('FL1-H', 'FL2-H', source='raw')
show(p)
[50]:
# change source to 'xform' to visualize the transformed data
p = sample.plot_scatter('FL1-H', 'FL2-H', source='xform')
show(p)

Highlight Specific Events

You can also pass a Boolean array to highlight certain events, so that only those events are colored by density. The density calculation is still based on all the events. Let’s highlight all events with transformed CD3 values above 0.65.

[51]:
# cd3 is channel index 4 (channel number 5)
cd3_xform_events = sample.get_channel_events(4, source='xform')
is_high_cd3 = cd3_xform_events > 0.65
[52]:
p = sample.plot_scatter('FL3-H', 'FSC-H', source='xform', highlight_mask=is_high_cd3)
show(p)

Now let’s show the same events on a plot of CD4 vs. CD8.

[53]:
p = sample.plot_scatter('FL1-H', 'FL2-H', source='xform', highlight_mask=is_high_cd3)
show(p)

Filter Specific Events

Or, we could omit the non-highlighted (low CD3) events completely by passing the same array as the event_mask:

[54]:
p = sample.plot_scatter('FL1-H', 'FL2-H', source='xform', event_mask=is_high_cd3)
show(p)

Or combine the options to hide some events and highlight others. Here we’ll plot only the high CD3 events from above (event_mask) and highlight those with CD8 > 0.5 (highlight_mask).

[55]:
# CD8 (FL2-H) is channel index 3 (channel number 4)
cd8_xform_events = sample.get_channel_events(3, source='xform')
is_high_cd8 = cd8_xform_events > 0.5

p = sample.plot_scatter('FL1-H', 'FL2-H', source='xform', event_mask=is_high_cd3, highlight_mask=is_high_cd8)
show(p)
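The interaction between the two masks follows the docstring: event_mask decides which events are drawn at all, and highlight_mask decides which of those are colored rather than grey. A small NumPy illustration with made-up values:

```python
import numpy as np

# Made-up transformed values for CD3 and CD8 (five events)
cd3 = np.array([0.70, 0.40, 0.90, 0.66, 0.10])
cd8 = np.array([0.60, 0.80, 0.20, 0.55, 0.70])

is_high_cd3 = cd3 > 0.65
is_high_cd8 = cd8 > 0.5

event_mask = is_high_cd3      # only these events are drawn at all
highlight_mask = is_high_cd8  # of the drawn events, these get density color

colored = event_mask & highlight_mask
grey = event_mask & ~highlight_mask
print(np.flatnonzero(colored), np.flatnonzero(grey))  # [0 3] [2]
```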

Scatterplot Matrix

Plot multiple scatter plots using the plot_scatter_matrix method. The plots on the diagonal show a histogram of each channel.

[56]:
help(sample.plot_scatter_matrix)
Help on method plot_scatter_matrix in module flowkit._models.sample:

plot_scatter_matrix(channel_labels_or_numbers=None, source='xform', subsample=True, event_mask=None, highlight_mask=None, color_density=False, plot_height=256, plot_width=256) method of flowkit._models.sample.Sample instance
    Returns an interactive scatter plot matrix for all channel combinations
    except for the Time channel.

    :param channel_labels_or_numbers: List of channel PnN labels or channel
        numbers to use for the scatter plot matrix. If None, then all
        channels will be plotted (except Time).
    :param source: 'raw', 'comp', 'xform' for whether the raw, compensated
        or transformed events are used for plotting
    :param subsample: Whether to use all events for plotting or just the
        sub-sampled events. Default is True (sub-sampled events). Plotting
        sub-sampled events is much faster.
    :param event_mask: Boolean array of events to plot. Takes precedence
        over highlight_mask (i.e. events marked False in event_mask will
        never be plotted).
    :param highlight_mask: Boolean array of event indices to highlight
        in color. Non-highlighted events will be light grey.
    :param color_density: Whether to color the events by density, similar
        to a heat map. Default is False.
    :param plot_height: Height of plot in pixels (screen units)
    :param plot_width: Width of plot in pixels (screen units)
    :return: A Bokeh Figure object containing the interactive scatter plot
        matrix.

[57]:
# For the scatter matrix, sub-sampling is usually a good idea since there are so many plots
spm = sample.plot_scatter_matrix(
    source='xform',
    channel_labels_or_numbers=['FSC-H', 'SSC-H', 'FL3-H', 'FL4-H'],
    color_density=True
)
show(spm)

Exporting Events

The export method exports the event data to either a new FCS file or a CSV file, with the format determined by filename extension (either ‘.fcs’ or ‘.csv’). Extra options are available for excluding certain events (negative scatter, flagged, subsample) from the exported file.

[58]:
help(sample.export)
Help on method export in module flowkit._models.sample:

export(filename, source='xform', exclude_neg_scatter=False, exclude_flagged=False, exclude_normal=False, subsample=False, include_metadata=False, directory=None) method of flowkit._models.sample.Sample instance
    Export Sample event data to either a new FCS file or a CSV file. Format determined by filename extension.

    :param filename: Text string to use for the exported file name. File type is determined by
        the filename extension (supported types are .fcs & .csv).
    :param source: 'orig', 'raw', 'comp', 'xform' for whether the original (no gain applied),
        raw (orig + gain), compensated (raw + comp), or transformed (comp + xform) events  are
        used for exporting
    :param exclude_neg_scatter: Whether to exclude negative scatter events. Default is False.
    :param exclude_flagged: Whether to exclude flagged events. Default is False.
    :param exclude_normal: Whether to exclude "normal" events. This is useful for retrieving all
         the "bad" events (neg scatter and/or flagged events). Default is False.
    :param subsample: Whether to export all events or just the sub-sampled events.
        Default is False (all events).
    :param include_metadata: Whether to include all key/value pairs from the metadata attribute
        in the output FCS file. Only valid for .fcs file extension. If False, only the minimum
        amount of metadata will be included in the output FCS file. Default is False.
    :param directory: Directory path where the exported file will be saved. If None, the file
        will be saved in the current working directory.
    :return: None
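The exclusion options combine as Boolean filters over the event indices. The export_indices and export_format helpers below are hypothetical sketches of that selection logic and of the extension check; how FlowKit orders or combines these filters internally, and whether it accepts upper-case extensions, are assumptions.

```python
from pathlib import Path

import numpy as np


def export_format(filename):
    """Hypothetical check: pick the output format from the file extension."""
    suffix = Path(filename).suffix.lower()
    if suffix not in ('.fcs', '.csv'):
        raise ValueError(f"unsupported export extension: {suffix!r}")
    return suffix[1:]


def export_indices(event_count, negative_scatter_indices=(), flagged_indices=(),
                   exclude_neg_scatter=False, exclude_flagged=False,
                   exclude_normal=False, subsample_indices=None):
    """Hypothetical sketch of which event indices an export would keep."""
    neg = np.zeros(event_count, dtype=bool)
    neg[np.asarray(negative_scatter_indices, dtype=int)] = True
    flagged = np.zeros(event_count, dtype=bool)
    flagged[np.asarray(flagged_indices, dtype=int)] = True

    keep = np.ones(event_count, dtype=bool)
    if exclude_neg_scatter:
        keep &= ~neg
    if exclude_flagged:
        keep &= ~flagged
    if exclude_normal:
        # "Normal" events are neither negative-scatter nor flagged
        keep &= neg | flagged
    if subsample_indices is not None:
        in_subsample = np.zeros(event_count, dtype=bool)
        in_subsample[np.asarray(subsample_indices, dtype=int)] = True
        keep &= in_subsample
    return np.flatnonzero(keep)


print(export_format('my_export.fcs'))                    # fcs
print(export_indices(6, [1], [4], exclude_normal=True))  # [1 4]
```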
