FlowKit Tutorial - Part 1 - The `Sample` Class

https://flowkit.readthedocs.io/en/latest/?badge=latest

Welcome to the series of FlowKit tutorial notebooks! I hope you find these tutorials a helpful guide to using FlowKit for your flow cytometry (FCM) analysis. Part 1 covers the Sample class, the foundational class on which most of FlowKit is built. If you have any questions about FlowKit, find any bugs, or feel something is missing from these tutorials, please submit an issue to the FlowKit GitHub repository.

Table of Contents

Sample Class
- Create a Sample Instance
  - From an FCS File
  - From a pandas DataFrame or NumPy array
- Metadata & Channel Information
- Subsampling
- Retrieving Events
  - Retrieve as NumPy array
  - Retrieve as pandas DataFrame
- Plotting Sample Events
  - Histogram
  - Plot Channel
  - Contour Plot
  - Scatter Plot
  - Scatterplot Matrix
- Exporting Events

[1]:

import bokeh
from bokeh.plotting import show

import flowkit as fk

bokeh.io.output_notebook()

[2]:

# check version so users can verify they have the same version/API
fk.__version__

[2]:

'1.1.0'

Sample Class

A Sample instance represents a single FCS sample, and is the only point of entry for FCS event data into the FlowKit library.

A Sample object can conveniently be created from a variety of data sources:

A file path to an FCS file
A pathlib Path object to an FCS file
An already instantiated FlowIO FlowData object
A NumPy array (must provide sample_id & channel_labels)
A Pandas DataFrame (with channel labels as headers, must provide sample_id)

Let’s take a look at the Sample constructor method:

Sample(
    fcs_path_or_data,
    sample_id=None,
    channel_labels=None,
    compensation=None,
    null_channel_list=None,
    ignore_offset_error=False,
    ignore_offset_discrepancy=False,
    use_header_offsets=False,
    cache_original_events=False,
    subsample=10000
)

fcs_path_or_data: a data source for the FCS sample as described above
sample_id: A text string to use for the Sample’s ID. If None, the ID will be taken from the ‘fil’ keyword of the metadata. If the ‘fil’ keyword is not present, the value will be the filename if given a file. For a NumPy array or Pandas DataFrame, a text value is required.
channel_labels: A list of strings or a list of tuples to use for the channel labels. Required if fcs_path_or_data is a NumPy array
compensation: Compensation matrix, which can be a:
- Matrix instance
- NumPy array
- CSV file path
- pathlib Path object to a CSV or TSV file
- string of CSV text
- None (default) for no compensation (it can be applied later via the apply_compensation method)
null_channel_list: List of PnN labels for acquired channels that do not contain useful data. Note, this should only be used if no fluorochromes were used to target those detectors. Null channels do not contribute to compensation and should not be included in a compensation matrix for this sample.
ignore_offset_error: An option to ignore data offset error (see note below for more details)
ignore_offset_discrepancy: option to ignore discrepancy between the HEADER and TEXT values for the DATA byte offset location, default is False
use_header_offsets: use the HEADER section for the data offset locations, default is False. Setting this option to True also suppresses an error in cases of an offset discrepancy.
cache_original_events: Original events are the unprocessed events as stored in the FCS binary, meaning they have not been scaled according to channel gain, corrected for proper lin/log display, or had the time channel scaled by the ‘timestep’ keyword value (if present). By default, these events are not retained by the Sample class as they are typically not useful. To retrieve the original events, set this to True and call get_events(source='orig').
subsample: The number of events to use for sub-sampling. The number of sub-sampled events can be changed after instantiation using the subsample_events method. The random seed can also be specified using that method. Sub-sampled events are used predominantly for speeding up plotting methods.

Note about FCS files with a data offset error:

Some FCS files incorrectly report the location of the last data byte as the last byte exclusive of the data section rather than the last byte inclusive of the data section. Technically, these are invalid FCS files but these are not corrupted data files. To attempt to read in these files, set the ignore_offset_error option to True.
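To make the off-by-one concrete, here is a minimal sketch of the arithmetic. It is not FlowIO's actual validation code. The event count (13,367) and channel widths (8 channels of 16 bits) are those reported in this tutorial's sample metadata; the begin offset of 1024 and the helper names are hypothetical.

```python
def expected_data_bytes(event_count, bits_per_channel):
    """Bytes needed to store every event, given each channel's bit width."""
    return event_count * sum(bits_per_channel) // 8


def segment_size(begin, end):
    """Size of a segment whose `end` offset is inclusive (the valid convention)."""
    return end - begin + 1


needed = expected_data_bytes(13367, [16] * 8)

begin = 1024                     # hypothetical DATA start offset
valid_end = begin + needed - 1   # last byte *inclusive* of the data
faulty_end = begin + needed      # last byte *exclusive*: one past the data

print(needed)                                     # 213872
print(segment_size(begin, valid_end) - needed)    # 0
print(segment_size(begin, faulty_end) - needed)   # 1, the offset error
```

A file with the faulty end offset still holds intact event data; only the reported boundary is one byte too far.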

Note on ignore_offset_discrepancy and use_header_offsets: The byte offset location for the DATA segment is defined in 2 places in an FCS file: the HEADER and the TEXT segments. By default, FlowIO uses the offset values found in the TEXT segment. If the HEADER values differ from the TEXT values, a DataOffsetDiscrepancyError will be raised. Setting ignore_offset_discrepancy to True overrides this error to force the loading of the FCS file. The related use_header_offsets option can be used to force loading the file using the data offset locations found in the HEADER section rather than the TEXT section. Setting use_header_offsets to True is equivalent to setting both options to True, meaning no error will be raised for an offset discrepancy.
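The decision rules in that note can be summarized in a few lines. This is only a sketch of the behavior as described here, not FlowIO's implementation: the function name, the tuple representation of offsets, and the OffsetDiscrepancy exception (a stand-in for DataOffsetDiscrepancyError) are all hypothetical.

```python
class OffsetDiscrepancy(Exception):
    """Hypothetical stand-in for the DataOffsetDiscrepancyError named above."""


def choose_data_offsets(header, text,
                        ignore_offset_discrepancy=False,
                        use_header_offsets=False):
    """Return the (begin, end) DATA offsets to read from, per the note's rules."""
    if use_header_offsets:
        # HEADER values win outright, so a discrepancy never raises
        return header
    if header != text and not ignore_offset_discrepancy:
        raise OffsetDiscrepancy(f"HEADER {header} != TEXT {text}")
    # Default: use the TEXT segment values
    return text


header, text = (256, 999), (300, 1043)   # made-up, disagreeing offsets
print(choose_data_offsets(header, text, ignore_offset_discrepancy=True))  # (300, 1043)
print(choose_data_offsets(header, text, use_header_offsets=True))         # (256, 999)
```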

Event type names in the Sample class

Several methods in the Sample class include a source argument that determines the type of events used or retrieved by the method. The options for the source argument are:

orig
raw
comp
xform

The orig option (for original events) is only available if the Sample object was instantiated with cache_original_events set to True. The original events are the events exactly as they were encoded in the FCS file, without applying the gain (from the $PnG keywords) or the time step from the FCS metadata. The original events are not typically useful for any processing.

The raw option specifies the event data that has been correctly pre-processed according to the channel gain and time step information.
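As an illustration of that pre-processing, the sketch below applies the standard FCS scaling rules to two values. It is not FlowKit's internal code. The gain of 3.67 (FSC-H) and the 4-decade log scale over a range of 1024 (FL1-H) are the values this tutorial's sample reports in its channel information; the original integer values 323 and 220 are assumptions, back-calculated from the first raw event shown later.

```python
def scale_linear(value, gain):
    """Linear channel: divide the stored value by the $PnG gain."""
    return value / gain


def scale_log(value, decades, offset, channel_range):
    """Log channel ($PnE = decades, offset): map stored value onto a log scale."""
    return offset * 10 ** (decades * value / channel_range)


fsc_raw = scale_linear(323, gain=3.67)
fl1_raw = scale_log(220, decades=4.0, offset=1.0, channel_range=1024.0)

print(f"{fsc_raw:.4f}")  # 88.0109
print(f"{fl1_raw:.4f}")  # 7.2339
```

Both results agree with the first row of raw events for FSC-H and FL1-H displayed in the DataFrame output of this tutorial.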

The comp option specifies the event data as the raw data with a compensation matrix applied. These events are only available if a compensation matrix was specified in the Sample object instantiation or if the apply_compensation method has been called.

The xform option specifies transformed events. Transformed events will be stored post-compensation if a compensation matrix was supplied when creating a Sample instance or if the apply_compensation method has been called. Transformations can also be applied to a non-compensated Sample.

Applying compensation and transforms is covered in part 2 of the tutorial notebook series.

Information is also available via the Python help function, along with descriptions of the Sample class methods:

help(fk.Sample)

Create a Sample Instance

As stated above in the Sample docstring, a Sample instance can be created from a variety of data sources:

File path to an FCS file
pathlib Path object to an FCS file
FlowIO FlowData object
NumPy array (must provide sample_id & channel_labels)
Pandas DataFrame (with channel labels as headers, must provide sample_id)

From an FCS File

Let’s create a Sample instance from a file path to an FCS file.

[3]:

fcs_path = '../../data/gate_ref/data1.fcs'

[4]:

sample = fk.Sample(fcs_path)

[5]:

sample

[5]:

Sample(v2.0, B07, 8 channels, 13367 events)

The string representation tells us this is an FCS 2.0 file with the ‘$FIL’ keyword value of ‘B07’. There are 8 channels of event data with 13,367 total events.

From a pandas DataFrame or NumPy array

A Sample can also be created from a pandas DataFrame or a NumPy array. In both cases, a sample ID must be provided. For FCS files, this ID is read from the metadata, and without an ID other features of FlowKit will have no mechanism to reference a Sample.

Let’s get a DataFrame from the previous Sample we made. We’ll cover more on exporting events later in this tutorial, but for now we’ll just use the as_dataframe method. Then, we can use it as an example of creating a new Sample instance.

[6]:

df_events = sample.as_dataframe(source='raw')

[7]:

df_events.head()

[7]:

pnn	FSC-H	SSC-H	FL1-H	FL2-H	FL3-H	FL2-A	FL4-H	Time
pns	FSC-Height	SSC-Height	CD4 FITC	CD8 B PE	CD3 PerCP		CD8 APC	Time (102.40 sec.)
0	88.010899	27.250	7.233942	34.598917	11.039992	5.0	5.186134	0.0
1	19.073569	5.375	36.517413	1.000000	170.007762	0.0	4.293510	0.0
2	70.572207	26.000	2.480454	12.863969	3.023213	0.0	8.582104	0.0
3	98.910082	31.750	5.473703	14.989296	3.751619	0.0	6.731704	0.0
4	29.972752	34.750	2.641648	2.665516	2.641648	0.0	6.097562	0.0

[8]:

sample_from_df = fk.Sample(df_events, sample_id='my_sample_from_dataframe')

[9]:

sample_from_df

[9]:

Sample(v3.1, my_sample_from_dataframe, 8 channels, 13367 events)

Creating a Sample from a NumPy array is similar, but we must also provide the channel names. In the case of the DataFrame above, the channel names were taken from the column names.

[10]:

np_events = sample.get_events(source='raw')
channel_labels = sample.pnn_labels

[11]:

sample_from_np = fk.Sample(np_events, channel_labels=channel_labels, sample_id='my_sample_from_numpy')

[12]:

sample_from_np

[12]:

Sample(v3.1, my_sample_from_numpy, 8 channels, 13367 events)

Metadata and Channel Information

Get the FCS version of the file (returns None if a Sample was created from a NumPy array or Pandas DataFrame)

[13]:

sample.version

[13]:

'2.0'

Retrieve all the FCS metadata:

[14]:

sample.get_metadata()

[14]:

{'byteord': '4,3,2,1',
 'datatype': 'I',
 'nextdata': '0',
 'sys': 'Macintosh System Software 9.0.4',
 'creator': 'CELLQuestª 3.3',
 'tot': '13367',
 'mode': 'L',
 'par': '8',
 'p1n': 'FSC-H',
 'p1r': '1024',
 'p1b': '16',
 'p1e': '0,0',
 'p1g': '3.67',
 'p2n': 'SSC-H',
 'p2r': '1024',
 'p2b': '16',
 'p2e': '0,0',
 'p2g': '8',
 'p3n': 'FL1-H',
 'p3r': '1024',
 'p3b': '16',
 'p3e': '4,0',
 'p4n': 'FL2-H',
 'p4r': '1024',
 'p4b': '16',
 'p4e': '4,0',
 'p5n': 'FL3-H',
 'p5r': '1024',
 'p5b': '16',
 'p5e': '4,0',
 'p1s': 'FSC-Height',
 'p2s': 'SSC-Height',
 'p3s': 'CD4 FITC',
 'p4s': 'CD8 B PE',
 'p5s': 'CD3 PerCP',
 'p6n': 'FL2-A',
 'p6r': '1024',
 'p6b': '16',
 'p6e': '0,0',
 'timeticks': '100',
 'p7n': 'FL4-H',
 'p7r': '1024',
 'p7e': '4,0',
 'p7b': '16',
 'p7s': 'CD8 APC',
 'p8n': 'Time',
 'p8r': '1024',
 'p8e': '0,0',
 'p8b': '16',
 'p8s': 'Time (102.40 sec.)',
 'sample id': 'Default Patient ID',
 'src': 'Default',
 'case number': 'Default Case Number',
 'cyt': 'FACSCalibur',
 'cytnum': 'E3820',
 'btim': '16:31:33',
 'etim': '16:31:52',
 'bdacqlibversion': '3.1',
 'bdnpar': '7',
 'bdp1n': 'FSC-H',
 'bdp2n': 'SSC-H',
 'bdp3n': 'FL1-H',
 'bdp4n': 'FL2-H',
 'bdp5n': 'FL3-H',
 'bdp6n': 'FL2-A',
 'bdp7n': 'FL4-H',
 'bdword0': '24',
 'bdword1': '394',
 'bdword2': '492',
 'bdword3': '477',
 'bdword4': '566',
 'bdword5': '397',
 'bdword6': '397',
 'bdword7': '397',
 'bdword8': '398',
 'bdword9': '397',
 'bdword10': '300',
 'bdword11': '299',
 'bdword12': '551',
 'bdword13': '4',
 'bdword14': '397',
 'bdword15': '501',
 'bdword16': '481',
 'bdword17': '586',
 'bdword18': '574',
 'bdword19': '100',
 'bdword20': '100',
 'bdword21': '100',
 'bdword22': '100',
 'bdword23': '1',
 'bdword24': '1',
 'bdword25': '0',
 'bdword26': '0',
 'bdword27': '0',
 'bdword28': '136',
 'bdword29': '52',
 'bdword30': '52',
 'bdword31': '52',
 'bdword32': '52',
 'bdword33': '52',
 'bdword34': '12',
 'bdword35': '201',
 'bdword36': '6',
 'bdword37': '138',
 'bdword38': '280',
 'bdword39': '3',
 'bdword40': '3',
 'bdword41': '100',
 'bdword42': '100',
 'bdword43': '0',
 'bdword44': '1023',
 'bdword45': '1023',
 'bdword46': '1023',
 'bdword47': '53',
 'bdword48': '550',
 'bdword49': '56',
 'bdword50': '72',
 'bdword51': '52',
 'bdword52': '0',
 'bdword53': '0',
 'bdword54': '0',
 'bdword55': '0',
 'bdword56': '0',
 'bdword57': '0',
 'bdword58': '0',
 'bdword59': '0',
 'bdword60': '0',
 'bdword61': '0',
 'bdword62': '0',
 'bdword63': '0',
 'bdlasermode': '1',
 'calibfile': 'FALSE',
 'p7thresvol': '52',
 'fil': 'B07',
 'date': '23-Aug-02',
 'number well info keywords': '3',
 '&1sample': '200',
 '&2number of washes': '1',
 '&3mixing vol': '100',
 '&4number of mixes': '2',
 '&5data file prefix part #1\\\\&6data file prefix part #2\\\\&7data file prefix part #3\\\\&8acquisition doc.': 'LYMPH SUBSET ACQ',
 '&9instr. sett. file': 'E#7 Settings #1',
 '&10patient id': ' FJ#192659',
 '&11day': '35d',
 '&12sample id': 'T-cells',
 '&13analysis doc.': ''}
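Because the metadata is a plain dictionary of strings, channel labels can be pulled out with ordinary dictionary access. The sketch below is a hypothetical helper (not a FlowKit function) that reads the required PnN and optional PnS keywords, using only the relevant entries copied from the output above.

```python
def channel_labels(metadata):
    """Return (channel_number, PnN, PnS) for each channel listed in the metadata."""
    rows = []
    for number in range(1, int(metadata['par']) + 1):
        pnn = metadata[f'p{number}n']           # required keyword
        pns = metadata.get(f'p{number}s', '')   # optional keyword
        rows.append((number, pnn, pns))
    return rows


# Relevant entries copied from the metadata output above
meta = {
    'par': '8',
    'p1n': 'FSC-H', 'p2n': 'SSC-H', 'p3n': 'FL1-H', 'p4n': 'FL2-H',
    'p5n': 'FL3-H', 'p6n': 'FL2-A', 'p7n': 'FL4-H', 'p8n': 'Time',
    'p1s': 'FSC-Height', 'p2s': 'SSC-Height', 'p3s': 'CD4 FITC',
    'p4s': 'CD8 B PE', 'p5s': 'CD3 PerCP', 'p7s': 'CD8 APC',
    'p8s': 'Time (102.40 sec.)',
}

labels = channel_labels(meta)
print(labels[0])  # (1, 'FSC-H', 'FSC-Height')
print(labels[5])  # (6, 'FL2-A', '')
```

Channel 6 has no 'p6s' entry in the metadata, so its PnS label comes back as an empty string.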

Retrieve a DataFrame of channel information, including the required PnN labels & optional PnS labels for each channel. Note that Samples distinguish between channel numbers and indices, with channel numbers being indexed at 1 and channel indices being indexed at 0.

[15]:

sample.channels

[15]:

	channel_number	pnn	pns	png	pne	pnr
0	1	FSC-H	FSC-Height	3.67	(0.0, 0.0)	1024.0
1	2	SSC-H	SSC-Height	8.00	(0.0, 0.0)	1024.0
2	3	FL1-H	CD4 FITC	1.00	(4.0, 1.0)	1024.0
3	4	FL2-H	CD8 B PE	1.00	(4.0, 1.0)	1024.0
4	5	FL3-H	CD3 PerCP	1.00	(4.0, 1.0)	1024.0
5	6	FL2-A		1.00	(0.0, 0.0)	1024.0
6	7	FL4-H	CD8 APC	1.00	(4.0, 1.0)	1024.0
7	8	Time	Time (102.40 sec.)	1.00	(0.0, 0.0)	1024.0

Get a list of only the PnN labels:

[16]:

sample.pnn_labels

[16]:

['FSC-H', 'SSC-H', 'FL1-H', 'FL2-H', 'FL3-H', 'FL2-A', 'FL4-H', 'Time']
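The distinction between channel numbers (indexed at 1) and channel indices (indexed at 0) is easy to trip over, so here is a small plain-Python sketch of how the two relate for the labels above. The helper names are hypothetical stand-ins, not FlowKit methods.

```python
pnn_labels = ['FSC-H', 'SSC-H', 'FL1-H', 'FL2-H', 'FL3-H', 'FL2-A', 'FL4-H', 'Time']


def index_from_label(label):
    """Channel index: zero-based position in the PnN label list."""
    return pnn_labels.index(label)


def number_from_index(index):
    """Channel number: one-based, as used by the FCS PnN keywords."""
    return index + 1


def index_from_number(number):
    """Convert a one-based channel number back to a zero-based index."""
    return number - 1


idx = index_from_label('FL2-H')
print(idx)                      # 3
print(number_from_index(idx))   # 4
print(index_from_number(4))     # 3
```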

The optional PnS labels are also available (empty values will be empty strings):

[17]:

sample.pns_labels

[17]:

['FSC-Height',
 'SSC-Height',
 'CD4 FITC',
 'CD8 B PE',
 'CD3 PerCP',
 '',
 'CD8 APC',
 'Time (102.40 sec.)']

The Sample class attempts to automatically identify fluorescent, scatter, and time channels. This is done by looking for simple sub-string values in channel names (i.e. ‘FSC’, ‘SSC’, ‘Time’). The channel indices for these channel types are available using the following attributes:

[18]:

sample.fluoro_indices

[18]:

[2, 3, 4, 5, 6]

[19]:

sample.scatter_indices

[19]:

[0, 1]
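The sub-string matching described above can be sketched as follows. This is an illustration of the idea only; the exact rules FlowKit applies may differ, and the case-insensitive comparison and the function name are assumptions.

```python
def classify_channels(pnn_labels):
    """Split channel indices into fluorescent, scatter, and time by label text."""
    fluoro, scatter, time_index = [], [], None
    for index, label in enumerate(pnn_labels):
        upper = label.upper()
        if 'FSC' in upper or 'SSC' in upper:
            scatter.append(index)
        elif 'TIME' in upper:
            time_index = index
        else:
            # Anything that is neither scatter nor time is treated as fluorescent
            fluoro.append(index)
    return fluoro, scatter, time_index


labels = ['FSC-H', 'SSC-H', 'FL1-H', 'FL2-H', 'FL3-H', 'FL2-A', 'FL4-H', 'Time']
fluoro, scatter, time_index = classify_channels(labels)
print(fluoro)      # [2, 3, 4, 5, 6]
print(scatter)     # [0, 1]
print(time_index)  # 7
```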

[20]:

sample.time_index

[20]:

7

Look up a channel index by a label string:

[21]:

sample.get_channel_index('FL2-H')

[21]:

3

Or, look up a channel number:

[22]:

sample.get_channel_number_by_label('FL2-H')

[22]:

4

And for completeness, get a channel index via its number:

[23]:

sample.get_channel_index(4)

[23]:

3

To get the event count:

[24]:

sample.event_count

[24]:

13367

Several other Sample attributes are available including:

id
original_filename
acquisition_date
compensation
transform

And a few attributes for filtering events:

subsample_indices: Set by the constructor or by calling the subsample_events method (see below).
negative_scatter_indices: Set by calling the filter_negative_scatter method.
flagged_indices: Assigned manually by the user for flagging any events (anomalous events from a QC routine, etc.).

Subsampling

FlowKit is optimized for performance (or attempts to be!). However, when dealing with high-dimensional flow cytometry data and/or data containing millions of events, it can be useful to sub-sample events to speed up processing. This is especially true when trying to plot events. All Sample plot methods, except for plot_histogram, use sub-sampled events by default (we’ll see these methods in the next section).

On instantiation, the number of sub-sampled events can be specified (default is 10,000). The Sample class also provides a subsample_events method to change the number of sub-sample events or the random seed used to generate the sub-sample. The sub-sample is drawn randomly, but in a reproducible way. You are guaranteed the same sub-sample indices when re-running analysis by providing the same random seed as an argument (default seed is 1). We can retrieve the subsampled indices using the subsample_indices attribute.

Note that sub-sampling does not delete any events; the sub-sampled indices are simply stored and used to select a subset of events. Any Sample class method that plots or retrieves events will have a subsample argument that takes a Boolean value specifying whether to use the sub-sampled events or all events. Any method that processes events (compensating or transforming) will always use all the events.

Retrieving sub-sampled indices

Our Sample instance was already sub-sampled by default (at 10,000 events) when it was created. Note in the output below that the sub-sampled indices are not only randomly selected (in a reproducible way via the random seed), but are also randomly shuffled. This makes it safer to sub-sample the sub-sample if even fewer events are needed.

[25]:

sample.subsample_indices

[25]:

array([ 4136, 12180, 11048, ...,  9661, 10709, 10093])
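The two properties described above, reproducibility from a seed and a shuffled order, can be illustrated with the standard library. This sketch uses Python's random module rather than whatever generator FlowKit uses internally, so it will not reproduce the indices shown above; the function name is hypothetical.

```python
import random


def draw_subsample_indices(event_count, subsample_count, random_seed=1):
    """Pick indices without replacement, returned in random (shuffled) order."""
    rng = random.Random(random_seed)
    # sample() draws without replacement and returns picks in selection order
    return rng.sample(range(event_count), min(subsample_count, event_count))


first = draw_subsample_indices(13367, 10000, random_seed=1)
again = draw_subsample_indices(13367, 10000, random_seed=1)

print(first == again)        # True: same seed, same indices
print(len(set(first)))       # 10000: no event is picked twice

# Because the order is already random, a leading slice is itself a fair sub-sample
smaller = first[:500]
```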

Retrieving Events

Several methods are available in the Sample class for conveniently retrieving event data in a variety of forms.

Retrieve Events as NumPy Array

The Sample methods get_events and get_channel_events return event data as a NumPy array. Both methods have similar input arguments, including the already familiar source and subsample arguments for specifying the event class (‘orig’, ‘raw’, ‘comp’, or ‘xform’) and whether to return the sub-sampled events or all events.

Note: These methods return the arrays directly, not a copy of the array. Be careful if you are planning to modify the returned event data, and make a copy of the array when appropriate.
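To see why this matters, consider the small NumPy sketch below. EventStore is a hypothetical stand-in for any object that hands out its internal array, not a FlowKit class, and the event values are made up.

```python
import numpy as np


class EventStore:
    """Hypothetical stand-in for an object that returns its internal array."""

    def __init__(self, events):
        self._events = np.asarray(events, dtype=float)

    def get_events(self):
        return self._events  # returned directly, not copied


store = EventStore([[88.0, 27.25], [19.0, 5.375]])

safe = store.get_events().copy()
safe[0, 0] = -1.0      # only the copy changes

shared = store.get_events()
shared[0, 1] = -1.0    # the stored events change too

print(store.get_events()[0, 0])  # 88.0
print(store.get_events()[0, 1])  # -1.0
```

Calling .copy() before modifying the returned array keeps the stored events intact.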

[26]:

help(sample.get_events)

Help on method get_events in module flowkit._models.sample:

get_events(source='xform', subsample=False) method of flowkit._models.sample.Sample instance
    Returns a NumPy array of event data.

    Note: This method returns the array directly, not a copy of the array. Be careful if you
    are planning to modify returned event data, and make a copy of the array when appropriate.

    :param source: 'orig', 'raw', 'comp', 'xform' for whether the original (no gain applied),
        raw (orig + gain), compensated (raw + comp), or transformed (comp + xform) events will
        be returned
    :param subsample: Whether to return all events or just the sub-sampled
        events. Default is False (all events)
    :return: NumPy array of event data

[27]:

sample.get_events(source='raw')

[27]:

array([[ 88.01089918,  27.25      ,   7.23394163, ...,   5.        ,
          5.18613419,   0.        ],
       [ 19.07356948,   5.375     ,  36.51741273, ...,   0.        ,
          4.29351021,   0.        ],
       [ 70.57220708,  26.        ,   2.48045441, ...,   0.        ,
          8.58210354,   0.        ],
       ...,
       [ 62.1253406 ,  27.625     ,  11.75743266, ...,   0.        ,
          1.77827941, 174.        ],
       [ 36.23978202,  64.5       ,   5.42469094, ...,   0.        ,
          4.95806824, 174.        ],
       [ 66.48501362,   8.75      ,   1.43301257, ...,   0.        ,
          6.0429639 , 174.        ]])

[28]:

# The Sample method `get_channel_events` takes a `channel_index` argument.
# Note this is the channel index (indexed at zero), not the channel number.
channel_idx = sample.get_channel_index('FSC-H')
sample.get_channel_events(channel_idx, source='raw')

[28]:

array([88.01089918, 19.07356948, 70.57220708, ..., 62.1253406 ,
       36.23978202, 66.48501362])

Retrieve Events as pandas DataFrame

The Sample method as_dataframe returns a pandas DataFrame of the Sample event data. This method also supports source and subsample arguments, but includes the extra arguments col_order and col_names for choosing the order of columns by PnN label and/or specifying new names for the columns in the returned DataFrame.

[29]:

sample.as_dataframe(source='raw')

[29]:

pnn	FSC-H	SSC-H	FL1-H	FL2-H	FL3-H	FL2-A	FL4-H	Time
pns	FSC-Height	SSC-Height	CD4 FITC	CD8 B PE	CD3 PerCP		CD8 APC	Time (102.40 sec.)
0	88.010899	27.250	7.233942	34.598917	11.039992	5.0	5.186134	0.0
1	19.073569	5.375	36.517413	1.000000	170.007762	0.0	4.293510	0.0
2	70.572207	26.000	2.480454	12.863969	3.023213	0.0	8.582104	0.0
3	98.910082	31.750	5.473703	14.989296	3.751619	0.0	6.731704	0.0
4	29.972752	34.750	2.641648	2.665516	2.641648	0.0	6.097562	0.0
...	...	...	...	...	...	...	...	...
13362	66.212534	19.625	6.915821	14.330126	2.617995	0.0	5.882084	174.0
13363	70.844687	25.500	7.986266	20.169146	3.522695	0.0	6.731704	174.0
13364	62.125341	27.625	11.757433	22.266720	2.665516	0.0	1.778279	174.0
13365	36.239782	64.500	5.424691	5.327979	7.299301	0.0	4.958068	174.0
13366	66.485014	8.750	1.433013	1.154782	1.218814	0.0	6.042964	174.0

13367 rows × 8 columns

Plotting Sample Events

Histogram
Contour Plot
Interactive Scatter Plot
Interactive Scatter Plot Matrix

All plotting methods return a Bokeh figure instance. This is done so the caller can modify and/or display the plot as required.

Histogram

[30]:

help(sample.plot_histogram)

Help on method plot_histogram in module flowkit._models.sample:

plot_histogram(channel_label_or_number, source='xform', subsample=False, bins=None, data_min=None, data_max=None, x_range=None) method of flowkit._models.sample.Sample instance
    Returns a histogram plot of the specified channel events

    :param channel_label_or_number:  A channel's PnN label or number to use
        for plotting the histogram
    :param source: 'raw', 'comp', 'xform' for whether the raw, compensated
        or transformed events are used for plotting
    :param subsample: Whether to use all events for plotting or just the
        sub-sampled events. Default is False (all events).
    :param bins: Number of bins to use for the histogram or a string compatible
        with the NumPy histogram function. If None, the number of bins is
        determined by the square root rule.
    :param data_min: filter event data, removing events below specified value
    :param data_max: filter event data, removing events above specified value
    :param x_range: Tuple of lower & upper bounds of x-axis. Used for modifying
        plot view, doesn't filter event data.
    :return: Bokeh figure of the histogram plot.

[31]:

p = sample.plot_histogram('FSC-H', source='raw')
show(p)

Change the number of bins:

[32]:

p = sample.plot_histogram('FSC-H', source='raw', bins=256)
show(p)

Change the display range:

[33]:

p = sample.plot_histogram('FSC-H', source='raw', bins=256, x_range=(10, 200))
show(p)

You can also change the data range that is used to compute the histogram, useful if there are extreme values that you would like to exclude. There are separate arguments for data_min and data_max. Let’s exclude data above 100:

[34]:

p = sample.plot_histogram('FSC-H', source='raw', bins=50, data_max=100)
show(p)
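Two details of the histogram options are worth spelling out: the default bin count and the difference between filtering data and changing the view. The sketch below assumes the "square root rule" means the ceiling of the square root of the event count, as in NumPy's 'sqrt' bin estimator. The helper names and the example values are hypothetical.

```python
import math


def sqrt_rule_bins(event_count):
    """Default bin count under the square root rule (assumed: ceil(sqrt(n)))."""
    return math.ceil(math.sqrt(event_count))


def filter_events(values, data_min=None, data_max=None):
    """Drop values outside [data_min, data_max] before the histogram is computed."""
    kept = []
    for value in values:
        if data_min is not None and value < data_min:
            continue
        if data_max is not None and value > data_max:
            continue
        kept.append(value)
    return kept


print(sqrt_rule_bins(13367))                                   # 116
print(filter_events([12.0, 88.0, 150.0, 240.0], data_max=100)) # [12.0, 88.0]
```

Unlike data_min and data_max, x_range leaves the event data untouched and only changes which part of the x-axis is displayed.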

Plot Channel

The plot_channel method creates a 2-D histogram of the specified channel data with the x-axis as the event index. This is similar to plotting a channel vs Time, except the events are equally distributed along the x-axis.

[35]:

help(sample.plot_channel)

Help on method plot_channel in module flowkit._models.sample:

plot_channel(channel_label_or_number, source='xform', subsample=True, color_density=True, bin_width=4, event_mask=None, highlight_mask=None, x_min=None, x_max=None, y_min=None, y_max=None) method of flowkit._models.sample.Sample instance
    Plot a 2-D histogram of the specified channel data with the x-axis as the event index.
    This is similar to plotting a channel vs Time, except the events are equally
    distributed along the x-axis.

    :param channel_label_or_number: A channel's PnN label or number
    :param source: 'raw', 'comp', 'xform' for whether the raw, compensated
        or transformed events are used for plotting
    :param subsample: Whether to use all events for plotting or just the
        sub-sampled events. Default is True (sub-sampled events). Plotting
        sub-sampled events is much faster.
    :param color_density: Whether to color the events by density, similar
        to a heat map. Default is True.
    :param bin_width: Bin size to use for the color density, in units of
        event point size. Larger values produce smoother gradients.
        Default is 4 for a 4x4 grid size.
    :param event_mask: Boolean array of events to plot. Takes precedence
        over highlight_mask (i.e. events marked False in event_mask will
        never be plotted).
    :param highlight_mask: Boolean array of event indices to highlight
        in color. Non-highlighted events will be light grey.
    :param x_min: Lower bound of x-axis. If None, channel's min value will
        be used with some padding to keep events off the edge of the plot.
    :param x_max: Upper bound of x-axis. If None, channel's max value will
        be used with some padding to keep events off the edge of the plot.
    :param y_min: Lower bound of y-axis. If None, channel's min value will
        be used with some padding to keep events off the edge of the plot.
    :param y_max: Upper bound of y-axis. If None, channel's max value will
        be used with some padding to keep events off the edge of the plot.
    :return: A Bokeh Figure object containing the interactive channel plot.

[36]:

f = sample.plot_channel('FSC-H', source='raw')
show(f)

Contour Plot

The plot_contour method uses the Kernel Density Estimate function from SciPy and is computationally intensive, so the plots can take some time to create.

[37]:

help(sample.plot_contour)

Help on method plot_contour in module flowkit._models.sample:

plot_contour(x_label_or_number, y_label_or_number, source='xform', subsample=True, plot_events=False, fill=False, x_min=None, x_max=None, y_min=None, y_max=None) method of flowkit._models.sample.Sample instance
    Returns a contour plot of the specified channel events, available
    as raw, compensated, or transformed data.

    :param x_label_or_number: A channel's PnN label or number for x-axis
        data
    :param y_label_or_number: A channel's PnN label or number for y-axis
        data
    :param source: 'raw', 'comp', 'xform' for whether the raw, compensated
        or transformed events are used for plotting
    :param subsample: Whether to use all events for plotting or just the
        sub-sampled events. Default is True (sub-sampled events). Running
        with all events is not recommended, as the Kernel Density
        Estimation is computationally demanding.
    :param plot_events: Whether to display the event data points in
        addition to the contours. Default is False.
    :param x_min: Lower bound of x-axis. If None, channel's min value will
        be used with some padding to keep events off the edge of the plot.
    :param x_max: Upper bound of x-axis. If None, channel's max value will
        be used with some padding to keep events off the edge of the plot.
    :param y_min: Lower bound of y-axis. If None, channel's min value will
        be used with some padding to keep events off the edge of the plot.
    :param y_max: Upper bound of y-axis. If None, channel's max value will
        be used with some padding to keep events off the edge of the plot.
    :param fill: Whether to fill in color between contour lines. D default
        is False.
    :return: A Bokeh figure of the contour plot

[38]:

# by default, plot_contour uses sub-sampled events for performance
p = sample.plot_contour('FSC-H', 'SSC-H', source='raw', fill=False, plot_events=False)

[39]:

show(p)

To specify the axes ranges:

[40]:

x_min = y_min = 0
x_max = y_max = 250

p = sample.plot_contour(
    'FSC-H', 'SSC-H', source='raw', x_min=x_min, x_max=x_max, y_min=y_min, y_max=y_max
)
show(p)

Fill contours:

[41]:

p = sample.plot_contour('FSC-H', 'SSC-H', fill=True, source='raw')
show(p)

Adding events:

[42]:

p = sample.plot_contour('FSC-H', 'SSC-H', source='raw', plot_events=True)
show(p)

Scatter Plot

[43]:

p = sample.plot_scatter(
    'FSC-H', 'SSC-H',
    source='raw', y_min=0., y_max=130, x_min=0., x_max=280, color_density=True
)

[44]:

show(p)

Change the bin width to control the color density. The bin width is in units of the event point size and the default is 4 for a 4x4 grid size. Larger values will create a smoother color gradient but will lose detail. Let’s set the bin width to 8 and see how it compares.

[45]:

p = sample.plot_scatter(
    'FSC-H', 'SSC-H',
    source='raw', y_min=0., y_max=130, x_min=0., x_max=280, color_density=True, bin_width=8
)

[46]:

show(p)

Or, turn off the color density completely:

[47]:

p = sample.plot_scatter('FSC-H', 'SSC-H', source='raw', color_density=False)
show(p)

Apply a transform and plot fluorescent channels (raw and transformed)

Note: The transforms module will be covered in more detail in part 2 of the tutorial notebook series.

[48]:

xform = fk.transforms.LogicleTransform('my_logicle', param_t=1024, param_w=0.5, param_m=4.5, param_a=0)
sample.apply_transform(xform)

[49]:

# source is 'raw' so not too useful for visualization
p = sample.plot_scatter('FL1-H', 'FL2-H', source='raw')
show(p)

[50]:

# change source to 'xform' to visualize the transformed data
p = sample.plot_scatter('FL1-H', 'FL2-H', source='xform')
show(p)

Highlight Specific Events

You can also highlight certain events using a Boolean array, so that the color density is applied only to those events. The density calculation is still based on all the events. Let’s highlight all events with CD3 values above 0.65.

[51]:

# cd3 is channel index 4 (channel number 5)
cd3_xform_events = sample.get_channel_events(4, source='xform')
is_high_cd3 = cd3_xform_events > 0.65

[52]:

p = sample.plot_scatter('FL3-H', 'FSC-H', source='xform', highlight_mask=is_high_cd3)
show(p)

But let’s show these events on a plot of CD4 vs CD8.

[53]:

p = sample.plot_scatter('FL1-H', 'FL2-H', source='xform', highlight_mask=is_high_cd3)
show(p)

Filter Specific Events

Or, we could omit the non-highlighted events completely by passing the same Boolean array as the event_mask:

[54]:

p = sample.plot_scatter('FL1-H', 'FL2-H', source='xform', event_mask=is_high_cd3)
show(p)

Or combine the options to hide some events and highlight others. Here we’ll highlight events with CD8 > 0.5 while plotting only the high CD3 events from above (events marked False in event_mask are never plotted).

[55]:

# cd8 (FL2-H) is channel index 3 (channel number 4)
cd8_xform_events = sample.get_channel_events(3, source='xform')
is_high_cd8 = cd8_xform_events > 0.5

p = sample.plot_scatter('FL1-H', 'FL2-H', source='xform', event_mask=is_high_cd3, highlight_mask=is_high_cd8)
show(p)
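How the two masks interact follows from the event_mask and highlight_mask descriptions in the plot_channel help output earlier: event_mask decides what is plotted at all, and highlight_mask only colors events that survive it. The NumPy sketch below illustrates that precedence with made-up transformed values for six events; it is not FlowKit's plotting code.

```python
import numpy as np

# Made-up transformed values for six events (not taken from the sample)
cd3 = np.array([0.70, 0.20, 0.90, 0.66, 0.10, 0.80])
cd8 = np.array([0.60, 0.90, 0.30, 0.55, 0.70, 0.10])

event_mask = cd3 > 0.65      # events eligible for plotting
highlight_mask = cd8 > 0.5   # events to draw in color

plotted = event_mask
colored = event_mask & highlight_mask   # plotted and highlighted
grey = event_mask & ~highlight_mask     # plotted but not highlighted

print(np.flatnonzero(colored))    # [0 3]
print(np.flatnonzero(grey))       # [2 5]
print(np.flatnonzero(~plotted))   # [1 4]
```

Events 1 and 4 have high CD8 values, yet they never appear because event_mask marks them False.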

Scatterplot Matrix

Plot multiple scatterplots using the plot_scatter_matrix method. The diagonals will plot a histogram of the channel.

[56]:

help(sample.plot_scatter_matrix)

Help on method plot_scatter_matrix in module flowkit._models.sample:

plot_scatter_matrix(channel_labels_or_numbers=None, source='xform', subsample=True, event_mask=None, highlight_mask=None, color_density=False, plot_height=256, plot_width=256) method of flowkit._models.sample.Sample instance
    Returns an interactive scatter plot matrix for all channel combinations
    except for the Time channel.

    :param channel_labels_or_numbers: List of channel PnN labels or channel
        numbers to use for the scatter plot matrix. If None, then all
        channels will be plotted (except Time).
    :param source: 'raw', 'comp', 'xform' for whether the raw, compensated
        or transformed events are used for plotting
    :param subsample: Whether to use all events for plotting or just the
        sub-sampled events. Default is True (sub-sampled events). Plotting
        sub-sampled events is much faster.
    :param event_mask: Boolean array of events to plot. Takes precedence
        over highlight_mask (i.e. events marked False in event_mask will
        never be plotted).
    :param highlight_mask: Boolean array of event indices to highlight
        in color. Non-highlighted events will be light grey.
    :param color_density: Whether to color the events by density, similar
        to a heat map. Default is False.
    :param plot_height: Height of plot in pixels (screen units)
    :param plot_width: Width of plot in pixels (screen units)
    :return: A Bokeh Figure object containing the interactive scatter plot
        matrix.

[57]:

# For the scatter matrix, sub-sampling is usually a good idea since there are so many plots
spm = sample.plot_scatter_matrix(
    source='xform',
    channel_labels_or_numbers=['FSC-H', 'SSC-H', 'FL3-H', 'FL4-H'],
    color_density=True
)
show(spm)

Exporting Events

The export method exports the event data to either a new FCS file or a CSV file, with the format determined by filename extension (either ‘.fcs’ or ‘.csv’). Extra options are available for excluding certain events (negative scatter, flagged, subsample) from the exported file.

[58]:

help(sample.export)

Help on method export in module flowkit._models.sample:

export(filename, source='xform', exclude_neg_scatter=False, exclude_flagged=False, exclude_normal=False, subsample=False, include_metadata=False, directory=None) method of flowkit._models.sample.Sample instance
    Export Sample event data to either a new FCS file or a CSV file. Format determined by filename extension.

    :param filename: Text string to use for the exported file name. File type is determined by
        the filename extension (supported types are .fcs & .csv).
    :param source: 'orig', 'raw', 'comp', 'xform' for whether the original (no gain applied),
        raw (orig + gain), compensated (raw + comp), or transformed (comp + xform) events are
        used for exporting
    :param exclude_neg_scatter: Whether to exclude negative scatter events. Default is False.
    :param exclude_flagged: Whether to exclude flagged events. Default is False.
    :param exclude_normal: Whether to exclude "normal" events. This is useful for retrieving all
        the "bad" events (neg scatter and/or flagged events). Default is False.
    :param subsample: Whether to export all events or just the sub-sampled events.
        Default is False (all events).
    :param include_metadata: Whether to include all key/value pairs from the metadata attribute
        in the output FCS file. Only valid for .fcs file extension. If False, only the minimum
        amount of metadata will be included in the output FCS file. Default is False.
    :param directory: Directory path where the exported file will be saved. If None, the file
        will be saved in the current working directory.
    :return: None
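The three exclusion flags are easiest to understand as set logic over event indices. The sketch below shows one plausible reading of the parameter descriptions above; it is not FlowKit's export code, and the function name and the example indices are hypothetical.

```python
def indices_to_export(event_count, neg_scatter, flagged,
                      exclude_neg_scatter=False,
                      exclude_flagged=False,
                      exclude_normal=False):
    """Return the event indices that would remain after applying the flags."""
    neg_scatter, flagged = set(neg_scatter), set(flagged)
    bad = neg_scatter | flagged   # everything that is not a "normal" event
    kept = []
    for index in range(event_count):
        if exclude_neg_scatter and index in neg_scatter:
            continue
        if exclude_flagged and index in flagged:
            continue
        if exclude_normal and index not in bad:
            continue
        kept.append(index)
    return kept


# Six events: index 1 has negative scatter, index 4 was flagged by the user
print(indices_to_export(6, [1], [4]))                          # [0, 1, 2, 3, 4, 5]
print(indices_to_export(6, [1], [4], exclude_neg_scatter=True,
                        exclude_flagged=True))                  # [0, 2, 3, 5]
print(indices_to_export(6, [1], [4], exclude_normal=True))      # [1, 4]
```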
