Sample Class

class flowkit.Sample(fcs_path_or_data, sample_id=None, channel_labels=None, compensation=None, null_channel_list=None, ignore_offset_error=False, ignore_offset_discrepancy=False, use_header_offsets=False, cache_original_events=False, subsample=10000)

Represents a single FCS sample from an FCS file, NumPy array or pandas DataFrame.

For Sample plot methods, pay attention to the defaults for the subsample arguments, as most will use the sub-sampled events by default for better performance. For compensation and transformation routines, all events are always processed.

Note on ignore_offset_error:

Some FCS files incorrectly report the location of the last data byte as the last byte exclusive of the data section rather than the last byte inclusive of the data section. Technically, these files are invalid, but their data is not corrupted. To attempt to read such files, set the ignore_offset_error option to True.
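
The off-by-one described in this note can be illustrated with a small, library-independent sketch. The function name and the example numbers are hypothetical, and this is not how FlowIO performs its check; it only shows why an exclusive end offset makes the DATA section appear one byte too long.

```python
def classify_data_end_offset(begin, reported_end, expected_bytes):
    """Compare a reported DATA end offset against the expected section size.

    A valid file reports the last data byte *inclusive*, so the section
    size is reported_end - begin + 1.
    """
    size_if_inclusive = reported_end - begin + 1
    if size_if_inclusive == expected_bytes:
        return "valid"
    if size_if_inclusive == expected_bytes + 1:
        # The file reported the first byte *after* the data section.
        return "off_by_one"
    return "mismatch"


# Hypothetical file: 100 events x 2 channels x 4 bytes = 800 bytes,
# with the DATA section starting at byte 1000.
print(classify_data_end_offset(1000, 1799, 800))  # valid
print(classify_data_end_offset(1000, 1800, 800))  # off_by_one
```

Files in the "off_by_one" category are the ones ignore_offset_error=True is intended to tolerate.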

Note on ignore_offset_discrepancy and use_header_offsets:

The byte offset location for the DATA segment is defined in two places in an FCS file: the HEADER and the TEXT segments. By default, FlowIO uses the offset values found in the TEXT segment. If the HEADER values differ from the TEXT values, a DataOffsetDiscrepancyError is raised. Setting ignore_offset_discrepancy to True overrides this error and forces the FCS file to load. The related use_header_offsets option forces loading the file using the data offset locations found in the HEADER segment rather than the TEXT segment. Setting use_header_offsets to True is equivalent to setting both options to True, meaning no error will be raised for an offset discrepancy.
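
The interaction between the two options can be summarized as a small decision function. This is a sketch of the behavior described in the note, not FlowIO's code: OffsetDiscrepancy is a hypothetical stand-in for DataOffsetDiscrepancyError, and the function name is an assumption.

```python
class OffsetDiscrepancy(Exception):
    """Hypothetical stand-in for DataOffsetDiscrepancyError."""


def choose_data_offset(header_offset, text_offset,
                       ignore_offset_discrepancy=False,
                       use_header_offsets=False):
    """Return the DATA offset that would be used for loading."""
    if use_header_offsets:
        # HEADER wins, and a discrepancy never raises.
        return header_offset
    if header_offset != text_offset and not ignore_offset_discrepancy:
        raise OffsetDiscrepancy(
            f"HEADER offset {header_offset} != TEXT offset {text_offset}"
        )
    # Default: the TEXT segment value is used.
    return text_offset


print(choose_data_offset(256, 300, ignore_offset_discrepancy=True))  # 300
print(choose_data_offset(256, 300, use_header_offsets=True))         # 256
```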

Parameters:
  • fcs_path_or_data

    FCS data, can be either:

    • a file path or file handle to an FCS file

    • a pathlib Path object

    • a FlowIO FlowData object

    • a NumPy array of FCS event data (must provide sample_id & channel_labels)

    • a pandas DataFrame containing FCS event data (channel labels as column labels, must provide sample_id)

  • sample_id – A text string to use for the Sample’s ID. If None, the ID will be taken from the ‘fil’ keyword of the metadata. If the ‘fil’ keyword is not present, the value will be the filename if given a file. For a NumPy array or pandas DataFrame, a text value is required.

  • channel_labels – A list of strings or a list of tuples to use for the channel labels. Required if fcs_path_or_data is a NumPy array.

  • compensation

    Compensation matrix, which can be a:

    • Matrix instance

    • NumPy array

    • CSV file path

    • pathlib Path object to a CSV or TSV file

    • string of CSV text

  • null_channel_list – List of PnN labels for acquired channels that do not contain useful data. Note, this should only be used if no fluorochromes were used to target those detectors. Null channels do not contribute to compensation and should not be included in a compensation matrix for this sample.

  • ignore_offset_error – option to ignore data offset error (see above note), default is False

  • ignore_offset_discrepancy – option to ignore discrepancy between the HEADER and TEXT values for the DATA byte offset location, default is False

  • use_header_offsets – use the HEADER section for the data offset locations, default is False. Setting this option to True also suppresses an error in cases of an offset discrepancy.

  • cache_original_events – Original events are the unprocessed events as stored in the FCS binary, meaning they have not been scaled according to channel gain, corrected for proper lin/log display, or had the time channel scaled by the ‘timestep’ keyword value (if present). By default, these events are not retained by the Sample class as they are typically not useful. To retrieve the original events, set this to True and call the get_events() method with source='orig'.

  • subsample – The number of events to use for sub-sampling. The number of sub-sampled events can be changed after instantiation using the subsample_events method. The random seed can also be specified using that method. Sub-sampled events are used predominantly for speeding up plotting methods.

Public Methods:

__init__(fcs_path_or_data[, sample_id, ...])

Create a Sample instance

__repr__()

Return repr(self).

__lt__(other)

Return self<value.

__eq__(other)

Return self==value.

filter_negative_scatter([reapply_subsample])

Determines indices of negative scatter events, optionally re-subsampling the Sample events afterward.

set_flagged_events(event_indices)

Flags the given event indices.

get_index_sorted_locations()

Retrieve well locations for index sorted data (if present in metadata).

subsample_events([subsample_count, random_seed])

Stores a set of sub-sampled indices for event data.

apply_compensation(compensation[, comp_id])

Applies given compensation matrix to Sample events.

get_metadata()

Retrieve FCS metadata.

get_events([source, subsample])

Returns a NumPy array of event data.

as_dataframe([source, subsample, col_order, ...])

Returns a pandas DataFrame of event data.

get_channel_number_by_label(label)

Returns the channel number for the given PnN label.

get_channel_index(channel_label_or_number)

Returns the channel index for the given PnN label or channel number.

get_channel_events(channel_index[, source, ...])

Returns a NumPy array of event data for the specified channel index.

apply_transform(transform[, include_scatter])

Applies given transform to Sample events, and overwrites the transform attribute.

plot_channel(channel_label_or_number[, ...])

Plot a 2-D histogram of the specified channel data with the x-axis as the event index.

plot_contour(x_label_or_number, ...[, ...])

Returns a contour plot of the specified channel events, available as raw, compensated, or transformed data.

plot_scatter(x_label_or_number, ...[, ...])

Returns an interactive scatter plot for the specified channel data.

plot_scatter_matrix([...])

Returns an interactive scatter plot matrix for all channel combinations except for the Time channel.

plot_histogram(channel_label_or_number[, ...])

Returns a histogram plot of the specified channel events.

export(filename[, source, ...])

Export Sample event data to either a new FCS file or a CSV file.

__gt__(other[, NotImplemented])

Return a > b.

__le__(other[, NotImplemented])

Return a <= b.

__ge__(other[, NotImplemented])

Return a >= b.


filter_negative_scatter(reapply_subsample=True)

Determines indices of negative scatter events, optionally re-subsampling the Sample events afterward.

Parameters:

reapply_subsample – Whether to re-subsample the Sample events after filtering. Default is True
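
As a rough illustration of what "negative scatter events" means, the sketch below collects the indices of events with a negative value in any scatter column. The function name, the column layout, and the "any scatter channel below zero" criterion are assumptions for illustration; FlowKit's exact rule may differ.

```python
def negative_scatter_indices(events, scatter_columns):
    """Return indices of events with a negative value in any scatter column."""
    return [
        i for i, row in enumerate(events)
        if any(row[c] < 0 for c in scatter_columns)
    ]


# Columns: FSC-A, SSC-A, and one fluorescence channel (hypothetical data).
events = [
    [120.0, 80.0, 5.1],
    [-3.0, 40.0, 2.2],
    [200.0, -1.5, 9.9],
    [90.0, 60.0, -0.4],  # negative fluorescence only, so not selected
]
print(negative_scatter_indices(events, scatter_columns=[0, 1]))  # [1, 2]
```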

set_flagged_events(event_indices)

Flags the given event indices. This can be useful for flagging anomalous time events for quality control or for any other purpose. Flagged indices do not affect analysis; they are only used as an option when exporting Sample event data.

Parameters:

event_indices – list of event indices to flag

Returns:

None

get_index_sorted_locations()

Retrieve well locations for index sorted data (if present in metadata).

Returns:

list of 2-D tuples

subsample_events(subsample_count=10000, random_seed=1)

Stores a set of sub-sampled indices for event data. Sub-sampled events can be accessed via the get_events method by setting the keyword argument subsample=True. The sub-sampled indices are available via the subsample_indices attribute.

Parameters:
  • subsample_count – Number of events to use as a sub-sample. If the number of events in the Sample is less than the requested sub-sample count, then the maximum number of available events is used for the sub-sample.

  • random_seed – Random seed used for sub-sampling events

Returns:

None
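
The capping and seeding behavior described above can be sketched with the standard library. This is only an illustration: the function name is hypothetical, and FlowKit's own random number generator will not produce the same indices as Python's random module.

```python
import random


def subsample_indices(event_count, subsample_count=10000, random_seed=1):
    """Pick a reproducible set of event indices without replacement."""
    # Never request more events than the sample contains.
    n = min(subsample_count, event_count)
    rng = random.Random(random_seed)
    return sorted(rng.sample(range(event_count), n))


a = subsample_indices(50, subsample_count=5, random_seed=1)
b = subsample_indices(50, subsample_count=5, random_seed=1)
print(a == b)                                        # True
print(len(subsample_indices(3, subsample_count=10)))  # 3
```

Using the same seed yields the same indices, which keeps plots comparable between runs.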

apply_compensation(compensation, comp_id='custom_spill')

Applies given compensation matrix to Sample events. If any transformation has been applied, it will be re-applied after compensation. Compensated events can be retrieved afterward by calling get_events with source='comp'. Note, if the sample specifies null channels then these must not be present in the compensation matrix.

Parameters:
  • compensation

    Compensation matrix, which can be a:

    • Matrix instance

    • NumPy array

    • CSV file path

    • pathlib Path object to a CSV or TSV file

    • string of CSV text

    If a string, both multi-line traditional CSV, and the single line FCS spill formats are supported. If a NumPy array, we assume the columns are in the same order as the channel labels.

  • comp_id – text ID for identifying compensation matrix (not used if compensation was a Matrix instance)

Returns:

None
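
For reference, the single-line FCS spill format mentioned above is laid out as the channel count, then the channel names, then the matrix values in row-major order (as defined for the $SPILLOVER keyword in FCS 3.1). The parser below is a minimal sketch of that layout with hypothetical names and values, not FlowKit's own parser.

```python
def parse_spill_string(spill):
    """Split a single-line spill string into channel labels and a matrix."""
    fields = [f.strip() for f in spill.split(",")]
    n = int(fields[0])
    labels = fields[1:n + 1]
    values = [float(v) for v in fields[n + 1:]]
    if len(values) != n * n:
        raise ValueError(f"expected {n * n} matrix values, got {len(values)}")
    # Values are stored row by row.
    matrix = [values[r * n:(r + 1) * n] for r in range(n)]
    return labels, matrix


labels, matrix = parse_spill_string("2,FITC-A,PE-A,1,0.05,0.02,1")
print(labels)  # ['FITC-A', 'PE-A']
print(matrix)  # [[1.0, 0.05], [0.02, 1.0]]
```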

get_metadata()

Retrieve FCS metadata.

Returns:

Dictionary of FCS metadata

get_events(source='xform', subsample=False)

Returns a NumPy array of event data.

Note: This method returns the array directly, not a copy of the array. Be careful if you are planning to modify returned event data, and make a copy of the array when appropriate.

Parameters:
  • source – ‘orig’, ‘raw’, ‘comp’, ‘xform’ for whether the original (no gain applied), raw (orig + gain), compensated (raw + comp), or transformed (comp + xform) events will be returned

  • subsample – Whether to return all events or just the sub-sampled events. Default is False (all events)

Returns:

NumPy array of event data
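
Because the returned array is not a copy, in-place edits would change the events held by the Sample. One defensive pattern is a small wrapper that always copies; the helper name below is hypothetical, and it relies only on the get_events signature documented here and on the returned array providing a copy() method (as NumPy arrays do).

```python
def get_events_copy(sample, source="xform", subsample=False):
    """Return an independent copy of a Sample's events.

    `sample` is any object exposing get_events(source=..., subsample=...)
    as documented for flowkit.Sample.
    """
    events = sample.get_events(source=source, subsample=subsample)
    # Copy so later in-place edits cannot alter the Sample's own data.
    return events.copy()
```

For example, get_events_copy(sample, source='comp') can then be modified freely without affecting later calls to sample.get_events(source='comp').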

as_dataframe(source='xform', subsample=False, col_order=None, col_names=None)

Returns a pandas DataFrame of event data.

Parameters:
  • source – ‘orig’, ‘raw’, ‘comp’, ‘xform’ for whether the original (no gain applied), raw (orig + gain), compensated (raw + comp), or transformed (comp + xform) events will be returned

  • subsample – Whether to return all events or just the sub-sampled events. Default is False (all events)

  • col_order – list of PnN labels. Determines the order of columns in the output DataFrame. If None, the column order will match the FCS file.

  • col_names – list of new column labels. If None (default), the DataFrame columns will be a MultiIndex of the PnN / PnS labels.

Returns:

pandas DataFrame of event data

get_channel_number_by_label(label)

Returns the channel number for the given PnN label. Note, this is the channel number, as defined in the FCS data (not the channel index), so the 1st channel’s number is 1 (not 0).

Parameters:

label – PnN channel label

Returns:

Channel number (not index)

get_channel_index(channel_label_or_number)

Returns the channel index for the given PnN label or channel number. Note, this is different from the channel number. The 1st channel’s index is 0 (not 1).

Parameters:

channel_label_or_number – A channel’s PnN label or number

Returns:

Channel index
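
The relationship between channel numbers (1-based, as in the FCS data) and channel indices (0-based) can be sketched without FlowKit. The label list and function names below are hypothetical and only mirror the convention described in the two methods above.

```python
pnn_labels = ["FSC-A", "SSC-A", "FITC-A", "Time"]


def channel_number(label, labels):
    """1-based channel number for a PnN label."""
    return labels.index(label) + 1


def channel_index(label_or_number, labels):
    """0-based channel index for a PnN label or a 1-based channel number."""
    if isinstance(label_or_number, int):
        if not 1 <= label_or_number <= len(labels):
            raise ValueError(f"channel number {label_or_number} out of range")
        return label_or_number - 1
    return labels.index(label_or_number)


print(channel_number("FITC-A", pnn_labels))  # 3
print(channel_index("FITC-A", pnn_labels))   # 2
print(channel_index(3, pnn_labels))          # 2
```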

get_channel_events(channel_index, source='xform', subsample=False)

Returns a NumPy array of event data for the specified channel index.

Note: This method returns the array directly, not a copy of the array. Be careful if you are planning to modify returned event data, and make a copy of the array when appropriate.

Parameters:
  • channel_index – channel index for which data is returned

  • source – ‘raw’, ‘comp’, ‘xform’ for whether the raw, compensated or transformed events will be returned

  • subsample – Whether to return all events or just the sub-sampled events. Default is False (all events)

Returns:

NumPy array of event data for the specified channel index

apply_transform(transform, include_scatter=False)

Applies given transform to Sample events, and overwrites the transform attribute. By default, only the fluorescent channels are transformed (null channels are excluded). For fully customized transformations per channel, the transform can be specified as a dictionary mapping PnN labels to instances of a Transform subclass. If a dictionary of transforms is specified, the include_scatter option is ignored and only the channels explicitly included in the transform dictionary will be transformed.

Parameters:
  • transform – an instance of a Transform subclass or a dictionary where the keys correspond to the PnN labels and the value is an instance of a Transform subclass.

  • include_scatter – Whether to transform the scatter channels in addition to the fluorescent channels. Default is False.
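
The channel selection rules described for apply_transform can be summarized in a short sketch. The function and argument names are hypothetical, the transform objects are placeholders, and excluding null channels from the scatter set is an assumption; the code only mirrors the rules stated above, not FlowKit's implementation.

```python
def labels_to_transform(fluoro_labels, scatter_labels, null_channels,
                        transform, include_scatter=False):
    """Return the PnN labels that would be transformed."""
    if isinstance(transform, dict):
        # Only explicitly listed channels; include_scatter is ignored.
        return list(transform.keys())
    candidates = list(fluoro_labels)
    if include_scatter:
        candidates = list(scatter_labels) + candidates
    return [label for label in candidates if label not in null_channels]


fluoro = ["FITC-A", "PE-A", "APC-A"]
scatter = ["FSC-A", "SSC-A"]
null = ["APC-A"]
xform = object()  # placeholder for a Transform subclass instance

print(labels_to_transform(fluoro, scatter, null, xform))
# ['FITC-A', 'PE-A']
print(labels_to_transform(fluoro, scatter, null, {"PE-A": xform},
                          include_scatter=True))
# ['PE-A']
```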

plot_channel(channel_label_or_number, source='xform', subsample=True, color_density=True, bin_width=4, event_mask=None, highlight_mask=None, x_min=None, x_max=None, y_min=None, y_max=None)

Plot a 2-D histogram of the specified channel data with the x-axis as the event index. This is similar to plotting a channel vs Time, except the events are equally distributed along the x-axis.

Parameters:
  • channel_label_or_number – A channel’s PnN label or number

  • source – ‘raw’, ‘comp’, ‘xform’ for whether the raw, compensated or transformed events are used for plotting

  • subsample – Whether to use all events for plotting or just the sub-sampled events. Default is True (sub-sampled events). Plotting sub-sampled events is much faster.

  • color_density – Whether to color the events by density, similar to a heat map. Default is True.

  • bin_width – Bin size to use for the color density, in units of event point size. Larger values produce smoother gradients. Default is 4 for a 4x4 grid size.

  • event_mask – Boolean array of events to plot. Takes precedence over highlight_mask (i.e. events marked False in event_mask will never be plotted).

  • highlight_mask – Boolean array of event indices to highlight in color. Non-highlighted events will be light grey.

  • x_min – Lower bound of x-axis. If None, channel’s min value will be used with some padding to keep events off the edge of the plot.

  • x_max – Upper bound of x-axis. If None, channel’s max value will be used with some padding to keep events off the edge of the plot.

  • y_min – Lower bound of y-axis. If None, channel’s min value will be used with some padding to keep events off the edge of the plot.

  • y_max – Upper bound of y-axis. If None, channel’s max value will be used with some padding to keep events off the edge of the plot.

Returns:

A Bokeh Figure object containing the interactive channel plot.
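
The precedence of event_mask over highlight_mask can be illustrated with plain Boolean lists. This is a sketch of the documented rule only; the function name is hypothetical and the real methods operate on Boolean arrays.

```python
def split_plot_events(event_mask, highlight_mask):
    """Return (highlighted, background) event indices for plotting."""
    highlighted, background = [], []
    for i, (show, highlight) in enumerate(zip(event_mask, highlight_mask)):
        if not show:
            # Masked-out events are never plotted, even if highlighted.
            continue
        (highlighted if highlight else background).append(i)
    return highlighted, background


event_mask = [True, True, False, False]
highlight_mask = [True, False, True, False]
print(split_plot_events(event_mask, highlight_mask))  # ([0], [1])
```

Event 2 is marked for highlighting but is excluded by event_mask, so it does not appear in either group.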

plot_contour(x_label_or_number, y_label_or_number, source='xform', subsample=True, plot_events=False, fill=False, x_min=None, x_max=None, y_min=None, y_max=None)

Returns a contour plot of the specified channel events, available as raw, compensated, or transformed data.

Parameters:
  • x_label_or_number – A channel’s PnN label or number for x-axis data

  • y_label_or_number – A channel’s PnN label or number for y-axis data

  • source – ‘raw’, ‘comp’, ‘xform’ for whether the raw, compensated or transformed events are used for plotting

  • subsample – Whether to use all events for plotting or just the sub-sampled events. Default is True (sub-sampled events). Running with all events is not recommended, as the Kernel Density Estimation is computationally demanding.

  • plot_events – Whether to display the event data points in addition to the contours. Default is False.

  • x_min – Lower bound of x-axis. If None, channel’s min value will be used with some padding to keep events off the edge of the plot.

  • x_max – Upper bound of x-axis. If None, channel’s max value will be used with some padding to keep events off the edge of the plot.

  • y_min – Lower bound of y-axis. If None, channel’s min value will be used with some padding to keep events off the edge of the plot.

  • y_max – Upper bound of y-axis. If None, channel’s max value will be used with some padding to keep events off the edge of the plot.

  • fill – Whether to fill in color between contour lines. Default is False.

Returns:

A Bokeh figure of the contour plot

plot_scatter(x_label_or_number, y_label_or_number, source='xform', subsample=True, color_density=True, bin_width=4, event_mask=None, highlight_mask=None, x_min=None, x_max=None, y_min=None, y_max=None)

Returns an interactive scatter plot for the specified channel data.

Parameters:
  • x_label_or_number – A channel’s PnN label or number for x-axis data

  • y_label_or_number – A channel’s PnN label or number for y-axis data

  • source – ‘raw’, ‘comp’, ‘xform’ for whether the raw, compensated or transformed events are used for plotting

  • subsample – Whether to use all events for plotting or just the sub-sampled events. Default is True (sub-sampled events). Plotting sub-sampled events is much faster.

  • color_density – Whether to color the events by density, similar to a heat map. Default is True.

  • bin_width – Bin size to use for the color density, in units of event point size. Larger values produce smoother gradients. Default is 4 for a 4x4 grid size.

  • event_mask – Boolean array of events to plot. Takes precedence over highlight_mask (i.e. events marked False in event_mask will never be plotted).

  • highlight_mask – Boolean array of event indices to highlight in color. Non-highlighted events will be light grey.

  • x_min – Lower bound of x-axis. If None, channel’s min value will be used with some padding to keep events off the edge of the plot.

  • x_max – Upper bound of x-axis. If None, channel’s max value will be used with some padding to keep events off the edge of the plot.

  • y_min – Lower bound of y-axis. If None, channel’s min value will be used with some padding to keep events off the edge of the plot.

  • y_max – Upper bound of y-axis. If None, channel’s max value will be used with some padding to keep events off the edge of the plot.

Returns:

A Bokeh Figure object containing the interactive scatter plot.

plot_scatter_matrix(channel_labels_or_numbers=None, source='xform', subsample=True, event_mask=None, highlight_mask=None, color_density=False, plot_height=256, plot_width=256)

Returns an interactive scatter plot matrix for all channel combinations except for the Time channel.

Parameters:
  • channel_labels_or_numbers – List of channel PnN labels or channel numbers to use for the scatter plot matrix. If None, then all channels will be plotted (except Time).

  • source – ‘raw’, ‘comp’, ‘xform’ for whether the raw, compensated or transformed events are used for plotting

  • subsample – Whether to use all events for plotting or just the sub-sampled events. Default is True (sub-sampled events). Plotting sub-sampled events is much faster.

  • event_mask – Boolean array of events to plot. Takes precedence over highlight_mask (i.e. events marked False in event_mask will never be plotted).

  • highlight_mask – Boolean array of event indices to highlight in color. Non-highlighted events will be light grey.

  • color_density – Whether to color the events by density, similar to a heat map. Default is False.

  • plot_height – Height of plot in pixels (screen units)

  • plot_width – Width of plot in pixels (screen units)

Returns:

A Bokeh Figure object containing the interactive scatter plot matrix.

plot_histogram(channel_label_or_number, source='xform', subsample=False, bins=None, data_min=None, data_max=None, x_range=None)

Returns a histogram plot of the specified channel events.

Parameters:
  • channel_label_or_number – A channel’s PnN label or number to use for plotting the histogram

  • source – ‘raw’, ‘comp’, ‘xform’ for whether the raw, compensated or transformed events are used for plotting

  • subsample – Whether to use all events for plotting or just the sub-sampled events. Default is False (all events).

  • bins – Number of bins to use for the histogram or a string compatible with the NumPy histogram function. If None, the number of bins is determined by the square root rule.

  • data_min – filter event data, removing events below specified value

  • data_max – filter event data, removing events above specified value

  • x_range – Tuple of lower & upper bounds of x-axis. This only changes the plot view; it does not filter event data.

Returns:

Bokeh figure of the histogram plot.
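
The difference between filtering with data_min / data_max and choosing a bin count with the square root rule can be sketched as follows. The function names are hypothetical, and both the rounding direction and the inclusive bounds are assumptions; the library may round or compare differently.

```python
import math


def filter_for_histogram(values, data_min=None, data_max=None):
    """Drop values below data_min or above data_max (bounds kept)."""
    return [
        v for v in values
        if (data_min is None or v >= data_min)
        and (data_max is None or v <= data_max)
    ]


def sqrt_rule_bin_count(event_count):
    """Square root rule: roughly sqrt(n) bins, rounded up here."""
    return max(1, math.ceil(math.sqrt(event_count)))


values = list(range(-10, 110))  # 120 hypothetical channel values
kept = filter_for_histogram(values, data_min=0, data_max=99)
print(len(kept), sqrt_rule_bin_count(len(kept)))  # 100 10
```

Unlike these filters, x_range would leave the list of values untouched and only change the visible axis limits.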

export(filename, source='xform', exclude_neg_scatter=False, exclude_flagged=False, exclude_normal=False, subsample=False, include_metadata=False, directory=None)

Export Sample event data to either a new FCS file or a CSV file. Format determined by filename extension.

Parameters:
  • filename – Text string to use for the exported file name. File type is determined by the filename extension (supported types are .fcs & .csv).

  • source – ‘orig’, ‘raw’, ‘comp’, ‘xform’ for whether the original (no gain applied), raw (orig + gain), compensated (raw + comp), or transformed (comp + xform) events are used for exporting

  • exclude_neg_scatter – Whether to exclude negative scatter events. Default is False.

  • exclude_flagged – Whether to exclude flagged events. Default is False.

  • exclude_normal – Whether to exclude “normal” events. This is useful for retrieving all the “bad” events (neg scatter and/or flagged events). Default is False.

  • subsample – Whether to export all events or just the sub-sampled events. Default is False (all events).

  • include_metadata – Whether to include all key/value pairs from the metadata attribute in the output FCS file. Only valid for .fcs file extension. If False, only the minimum amount of metadata will be included in the output FCS file. Default is False.

  • directory – Directory path where the exported file will be saved. If None, the file will be saved in the current working directory.

Returns:

None
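
How the three exclude options combine can be sketched as a selection over event indices. The function name is hypothetical, and the handling of events that are both negative-scatter and flagged (dropped if either matching exclude option is set) is an assumption, not confirmed library behavior.

```python
def events_to_export(event_count, neg_scatter, flagged,
                     exclude_neg_scatter=False, exclude_flagged=False,
                     exclude_normal=False):
    """Return the event indices that would be written to the file."""
    neg, flag = set(neg_scatter), set(flagged)
    keep = []
    for i in range(event_count):
        is_neg, is_flagged = i in neg, i in flag
        if is_neg and exclude_neg_scatter:
            continue
        if is_flagged and exclude_flagged:
            continue
        # "Normal" events are neither negative scatter nor flagged.
        if exclude_normal and not (is_neg or is_flagged):
            continue
        keep.append(i)
    return keep


# Six hypothetical events: event 1 has negative scatter, event 4 is flagged.
print(events_to_export(6, [1], [4], exclude_neg_scatter=True,
                       exclude_flagged=True))           # [0, 2, 3, 5]
print(events_to_export(6, [1], [4], exclude_normal=True))  # [1, 4]
```

The second call shows the use case mentioned for exclude_normal: retrieving only the "bad" events.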