Sample Class
- class flowkit.Sample(fcs_path_or_data, sample_id=None, filename_as_id=False, channel_labels=None, compensation=None, null_channel_list=None, ignore_offset_error=False, ignore_offset_discrepancy=False, use_header_offsets=False, preprocess=True, use_flowjo_labels=False, subsample=10000)
Represents a single FCS sample from an FCS file, NumPy array or pandas DataFrame.
For Sample plot methods, pay attention to the defaults for the subsample arguments, as most will use the subsampled events by default for better performance. For compensation and transformation routines, all events are always processed.
- Note on ignore_offset_error:
Some FCS files incorrectly report the location of the last data byte as the last byte exclusive of the data section rather than the last byte inclusive of the data section. Technically, these are invalid FCS files, but their data is not corrupted. To attempt to read such files, set the ignore_offset_error option to True.
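The off-by-one described in this note can be illustrated with plain arithmetic. This is a minimal sketch, not FlowIO code; the helper name and the example numbers (event count, channel count, bytes per value, start offset) are made up for illustration.

```python
def data_segment_size(begin_offset, end_offset):
    """Size in bytes when end_offset is the last byte *inclusive* of the segment."""
    return end_offset - begin_offset + 1


# Hypothetical file: 1000 events x 8 channels x 4 bytes per value.
expected_size = 1000 * 8 * 4
begin = 2048

correct_end = begin + expected_size - 1  # last byte inclusive (valid file)
off_by_one_end = begin + expected_size   # one past the last byte (invalid file)

# A valid file yields exactly the expected size; the invalid one is 1 byte too long.
print(data_segment_size(begin, correct_end) == expected_size)
print(data_segment_size(begin, off_by_one_end) - expected_size)
```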
- Note on ignore_offset_discrepancy and use_header_offsets:
The byte offset location for the DATA segment is defined in 2 places in an FCS file: the HEADER and the TEXT segments. By default, FlowIO uses the offset values found in the TEXT segment. If the HEADER values differ from the TEXT values, a DataOffsetDiscrepancyError will be raised. Setting ignore_offset_discrepancy to True overrides this error to force the loading of the FCS file. The related use_header_offsets option can be used to force loading the file using the data offset locations found in the HEADER section rather than the TEXT section. Setting use_header_offsets to True is equivalent to setting both options to True, meaning no error will be raised for an offset discrepancy.
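The rules in this note can be summarized as a small decision function. This is a sketch of the described behavior only, not FlowIO's implementation: the function name, the (start, end) tuple representation, and the stand-in exception class are assumptions made for illustration.

```python
class DataOffsetDiscrepancyError(Exception):
    """Stand-in for the FlowIO exception of the same name (illustration only)."""


def resolve_data_offsets(header_offsets, text_offsets,
                         ignore_offset_discrepancy=False,
                         use_header_offsets=False):
    """Pick the (start, end) DATA offsets following the documented rules.

    Hypothetical helper: both offset arguments are (start, end) byte tuples.
    """
    if use_header_offsets:
        # HEADER values win, and a discrepancy never raises.
        return header_offsets
    if header_offsets != text_offsets and not ignore_offset_discrepancy:
        raise DataOffsetDiscrepancyError(
            f"HEADER {header_offsets} differs from TEXT {text_offsets}"
        )
    # Default: use the TEXT segment values.
    return text_offsets
```

With matching offsets every option combination returns the same values; the options only matter when the two segments disagree.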
- Parameters:
fcs_path_or_data –
FCS data, can be either:
a file path or file handle to an FCS file
a pathlib Path object
a FlowIO FlowData object
a NumPy array of FCS event data (must provide sample_id & channel_labels)
a pandas DataFrame containing FCS event data (channel labels as column labels, must provide sample_id)
sample_id – A text string to use for the Sample’s ID. If None, the ID will be taken from the ‘fil’ keyword of the metadata. If the ‘fil’ keyword is not present, the value will be the filename if given a file. For a NumPy array or pandas DataFrame, a text value is required.
filename_as_id – Boolean option for using the file name (as it exists on the filesystem) for the Sample’s ID, default is False. This option is only valid for file-like objects (file paths, file handles, pathlib Paths). Note that the ‘sample_id’ kwarg takes precedence: if both are specified, the ‘filename_as_id’ option is ignored.
channel_labels – A list of strings or a list of tuples to use for the channel labels. Required if fcs_path_or_data is a NumPy array.
compensation –
Compensation matrix, which can be a:
Matrix instance
NumPy array
CSV file path
pathlib Path object to a CSV or TSV file
string of CSV text
null_channel_list – List of PnN labels for acquired channels that do not contain useful data. Note, this should only be used if no fluorochromes were used to target those detectors. Null channels do not contribute to compensation and should not be included in a compensation matrix for this sample. This option is ignored if fcs_path_or_data is a FlowData object.
ignore_offset_error – option to ignore data offset error (see above note), default is False
ignore_offset_discrepancy – option to ignore discrepancy between the HEADER and TEXT values for the DATA byte offset location, default is False
use_header_offsets – use the HEADER section for the data offset locations, default is False. Setting this option to True also suppresses an error in cases of an offset discrepancy.
preprocess – Controls whether preprocessing is applied to the ‘raw’ data (retrievable via the get_events() method with source='raw'). Binary events in an FCS file are stored unprocessed, meaning they have not been scaled according to channel gain, corrected for proper lin/log display, or had the time channel scaled by the ‘timestep’ keyword value (if present). Unprocessed event data is typically not useful for analysis, so the default is True. Preprocessing does not include compensation or transformation (e.g. biex, Logicle), which are separate operations.
use_flowjo_labels – FlowJo converts forward slashes (‘/’) in PnN labels to underscores. This option matches that behavior. Default is False.
subsample – The number of events to use for subsampling. The number of subsampled events can be changed after instantiation using the subsample_events method. The random seed can also be specified using that method. Subsampled events are used predominantly for speeding up plotting methods.
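The precedence among sample_id, filename_as_id, and the ‘fil’ metadata keyword described in the parameters above can be modeled as follows. This is a sketch of the documented rules, not FlowKit's code; the function name, its arguments, and the choice of ValueError are hypothetical.

```python
def resolve_sample_id(sample_id=None, filename_as_id=False,
                      metadata=None, filename=None):
    """Model of how the Sample ID is chosen (illustration only)."""
    metadata = metadata or {}
    if sample_id is not None:
        # An explicit sample_id always wins, even if filename_as_id is True.
        return sample_id
    if filename_as_id and filename is not None:
        return filename
    if 'fil' in metadata:
        return metadata['fil']
    if filename is None:
        # Array / DataFrame input has no file name, so an ID is required.
        raise ValueError("sample_id is required when no file is given")
    return filename
```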
Public Methods:
__init__(fcs_path_or_data[, sample_id, ...]) – Create a Sample instance.
__repr__() – Return repr(self).
__lt__(other) – Return self<value.
__eq__(other) – Return self==value.
filter_negative_scatter([reapply_subsample]) – Determines indices of negative scatter events, optionally re-subsampling the Sample events afterward.
set_flagged_events(event_indices) – Flags the given event indices.
get_index_sorted_locations() – Retrieve well locations for index-sorted data (if present in metadata).
subsample_events([subsample_count, random_seed]) – Stores a set of subsampled indices for event data.
apply_compensation(compensation) – Applies given compensation matrix to Sample events.
get_metadata() – Retrieve FCS metadata.
get_events([source, subsample, event_mask, ...]) – Returns a NumPy array of event data.
as_dataframe([source, subsample, ...]) – Returns a pandas DataFrame of event data.
get_channel_number_by_label(label) – Returns the channel number for the given PnN label.
get_channel_index(channel_label_or_number) – Returns the channel index for the given PnN label or channel number.
get_channel_events(channel_label_or_number) – Returns a NumPy array of event data for the specified channel.
rename_channel(current_label, new_label[, ...]) – Rename a channel label.
apply_transform(transform[, include_scatter]) – Applies given transform to Sample events, and overwrites the transform attribute.
plot_channel(channel_label_or_number[, ...]) – Plot a 2-D histogram of the specified channel data with the x-axis as the event index.
plot_contour(x_label_or_number, ...[, ...]) – Returns a contour plot of the specified channel events, available as raw, compensated, or transformed data.
plot_scatter(x_label_or_number, ...[, ...]) – Returns an interactive scatter plot for the specified channel data.
plot_scatter_matrix([...]) – Returns an interactive scatter plot matrix for all channel combinations except for the Time channel.
plot_histogram(channel_label_or_number[, ...]) – Returns a histogram plot of the specified channel events.
export(filename[, source, ...]) – Export Sample event data to either a new FCS file or a CSV file.
__gt__(other[, NotImplemented]) – Return a > b.
__le__(other[, NotImplemented]) – Return a <= b.
__ge__(other[, NotImplemented]) – Return a >= b.
- filter_negative_scatter(reapply_subsample=True)
Determines indices of negative scatter events, optionally re-subsampling the Sample events afterward.
- Parameters:
reapply_subsample – Whether to re-subsample the Sample events after filtering. Default is True
- set_flagged_events(event_indices)
Flags the given event indices. This can be useful for flagging anomalous time events for quality control or for any other purpose. Flagged indices do not affect analysis; they are only used as an option when exporting Sample event data.
- Parameters:
event_indices – list of event indices to flag
- Returns:
None
- get_index_sorted_locations()
Retrieve well locations for index-sorted data (if present in metadata).
- Returns:
list of 2-D tuples
- subsample_events(subsample_count=10000, random_seed=1)
Stores a set of subsampled indices for event data. Subsampled events can be accessed via the get_events method by setting the keyword argument subsample=True. The subsampled indices are available via the subsample_indices attribute.
- Parameters:
subsample_count – Number of events to use as a subsample. If the number of events in the Sample is less than the requested subsample count, then the maximum number of available events is used for the subsample.
random_seed – Random seed used for subsampling events
- Returns:
None
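The idea behind seeded subsampling, including the cap when fewer events are available than requested, can be sketched with the standard library. This is not FlowKit's implementation: the function name is hypothetical, and FlowKit's random number generator and index ordering may differ, so the actual indices will not match.

```python
import random


def sketch_subsample_indices(event_count, subsample_count=10000, random_seed=1):
    """Return reproducible, unique event indices (illustration only)."""
    # Never ask for more events than the sample contains.
    count = min(subsample_count, event_count)
    rng = random.Random(random_seed)
    # Sample without replacement, then sort to keep acquisition order.
    return sorted(rng.sample(range(event_count), count))


# The same seed always reproduces the same subsample.
first = sketch_subsample_indices(50000, subsample_count=5, random_seed=1)
second = sketch_subsample_indices(50000, subsample_count=5, random_seed=1)
print(first == second)
```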
- apply_compensation(compensation)
Applies given compensation matrix to Sample events. If any transformation has been applied, it will be re-applied after compensation. Compensated events can be retrieved afterward by calling get_events with source='comp'. Note, if the sample specifies null channels then these must not be present in the compensation matrix.
- Parameters:
compensation –
Compensation matrix, which can be a:
Matrix or SpectralMatrix instance
NumPy array
CSV file path
pathlib Path object to a CSV or TSV file
string of CSV text
If a string, both the traditional multi-line CSV format and the single-line FCS spill format are supported. If a NumPy array, the columns are assumed to be in the same order as the channel labels.
- Returns:
None
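For readers unfamiliar with the single-line spill format, the sketch below parses a string laid out like the FCS $SPILLOVER keyword value: the channel count, then that many channel names, then the matrix values in row-major order. It is an illustration of the format, not FlowKit's parser; the function name and the example channel names and values are made up.

```python
def parse_spill_string(spill):
    """Split a single-line spill string into (labels, matrix)."""
    tokens = [t.strip() for t in spill.split(',')]
    n = int(tokens[0])
    labels = tokens[1:n + 1]
    values = [float(v) for v in tokens[n + 1:]]
    if len(values) != n * n:
        raise ValueError(f"expected {n * n} matrix values, got {len(values)}")
    # Rebuild the n x n matrix row by row.
    matrix = [values[row * n:(row + 1) * n] for row in range(n)]
    return labels, matrix


labels, matrix = parse_spill_string("2,FITC-A,PE-A,1.0,0.02,0.05,1.0")
print(labels)
print(matrix)
```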
- get_metadata()
Retrieve FCS metadata.
- Returns:
Dictionary of FCS metadata
- get_events(source='xform', subsample=False, event_mask=None, col_order=None)
Returns a NumPy array of event data.
Note: This method returns the array directly, not a copy of the array. Be careful if you are planning to modify returned event data, and make a copy of the array when appropriate.
- Parameters:
source – Controls which version of event data to return. Valid values are: ‘raw’, ‘comp’, or ‘xform’. For ‘raw’, events are returned uncompensated and non-transformed. For ‘comp’, events are returned compensated according to the stored compensation matrix. For ‘xform’, events are returned transformed according to the stored transformations and will include any compensation applied beforehand. Note: In all cases, events returned will be based on whether pre-processing was applied when loading the Sample.
subsample – Whether to return all events or just the subsampled events. Default is False (all events)
event_mask – Filter Sample events by a given Boolean array (events marked True will be returned). Can be combined with the subsample option.
col_order – PnN label list for the channel columns and their order
- Returns:
NumPy array of event data
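Because get_events returns the underlying array rather than a copy, a small wrapper can make the copy explicit before any in-place edits. The helper below is a hypothetical convenience function, not part of FlowKit; it relies only on the get_events signature documented above and on the copy() method of the returned NumPy array. The file path in the usage comment is a placeholder.

```python
def get_events_copy(sample, source='xform', subsample=False):
    """Return an independent copy of a Sample's event data.

    `sample` is expected to be a flowkit.Sample; modifying the returned
    array leaves the Sample's stored events untouched.
    """
    events = sample.get_events(source=source, subsample=subsample)
    return events.copy()


# Usage, assuming FlowKit is installed and the file exists:
#   import flowkit
#   sample = flowkit.Sample("path/to/file.fcs")
#   raw = get_events_copy(sample, source='raw')
#   raw[:, 0] = 0  # safe: the Sample's own events are not modified
```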
- as_dataframe(source='xform', subsample=False, event_mask=None, col_order=None, col_names=None, col_multi_index=True)
Returns a pandas DataFrame of event data.
- Parameters:
source – ‘raw’, ‘comp’, ‘xform’ for whether the raw (uncompensated, non-transformed, optionally pre-processed), compensated (raw + comp), or transformed (comp + xform) events are returned
subsample – Whether to return all events or just the subsampled events. Default is False (all events)
event_mask – Filter Sample events by a given Boolean array (events marked True will be returned). Can be combined with the subsample option.
col_order – list of PnN labels. Determines the order of columns in the output DataFrame. If None, the column order will match the FCS file.
col_names – list of new column labels. If None (default), the DataFrame columns will be a MultiIndex of the PnN / PnS labels.
col_multi_index – Controls whether the column labels are multi-index. If False, only the PnN labels will be used for a simple column index. Default is True.
- Returns:
pandas DataFrame of event data
- get_channel_number_by_label(label)
Returns the channel number for the given PnN label. Note, this is the channel number, as defined in the FCS data (not the channel index), so the 1st channel’s number is 1 (not 0).
- Parameters:
label – PnN channel label
- Returns:
Channel number (not index)
- get_channel_index(channel_label_or_number)
Returns the channel index for the given PnN label or channel number. Note, this is different from the channel number. The 1st channel’s index is 0 (not 1).
- Parameters:
channel_label_or_number – A channel’s PnN label or number
- Returns:
Channel index
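The difference between a channel number (1-based, as in the FCS data) and a channel index (0-based, as used for array columns) can be shown with a plain list. This is an illustration only: the label list and both function names are hypothetical and do not reproduce FlowKit's lookup code.

```python
# Hypothetical PnN labels in file order.
pnn_labels = ['FSC-A', 'SSC-A', 'FITC-A', 'Time']


def sketch_channel_number(label):
    """Channel number as defined in the FCS data: the 1st channel is 1."""
    return pnn_labels.index(label) + 1


def sketch_channel_index(label_or_number):
    """Channel index for array columns: the 1st channel is 0."""
    if isinstance(label_or_number, int):
        return label_or_number - 1
    return pnn_labels.index(label_or_number)


print(sketch_channel_number('FITC-A'))  # channel number
print(sketch_channel_index('FITC-A'))   # channel index, always number - 1
```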
- get_channel_events(channel_label_or_number, source='xform', subsample=False, event_mask=None)
Returns a NumPy array of event data for the specified channel.
Note: This method returns the array directly, not a copy of the array. Be careful if you are planning to modify returned event data, and make a copy of the array when appropriate.
- Parameters:
channel_label_or_number – A channel’s PnN label or number
source – ‘raw’, ‘comp’, ‘xform’ for whether the raw, compensated or transformed events will be returned
subsample – Whether to return all events or just the subsampled events. Default is False (all events)
event_mask – Filter Sample events by a given Boolean array (events marked True will be returned). Can be combined with the subsample option.
- Returns:
NumPy array of event data for the specified channel
- rename_channel(current_label, new_label, new_pns_label=None)
Rename a channel label.
- Parameters:
current_label – PnN label of a channel
new_label – new PnN label
new_pns_label – optional new PnS label
- Returns:
None
- apply_transform(transform, include_scatter=False)
Applies given transform to Sample events, and overwrites the transform attribute. By default, only the fluorescent channels are transformed, and null channels are excluded. For fully customized transformations per channel, the transform can be specified as a dictionary mapping PnN labels to instances of a Transform subclass. If a dictionary of transforms is specified, the include_scatter option is ignored and only the channels explicitly included in the transform dictionary will be transformed.
- Parameters:
transform – an instance of a Transform subclass or a dictionary where the keys correspond to the PnN labels and the value is an instance of a Transform subclass.
include_scatter – Whether to transform the scatter channels in addition to the fluorescent channels. Default is False.
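Which channels end up transformed depends on whether a single transform or a dictionary is given. The sketch below models that selection only; it is not FlowKit's code. The function name, the separate label lists passed in, the output order, and the choice to drop null channels from the scatter list as well are all assumptions.

```python
def channels_to_transform(fluoro_labels, scatter_labels, null_labels,
                          transform, include_scatter=False):
    """Model of the channel selection described for apply_transform."""
    if isinstance(transform, dict):
        # A dictionary is explicit: include_scatter is ignored.
        return list(transform.keys())
    candidates = list(fluoro_labels)
    if include_scatter:
        candidates += list(scatter_labels)
    # Null channels are never transformed (assumed to apply to scatter too).
    return [label for label in candidates if label not in null_labels]
```

In this model any non-dictionary object stands in for a Transform instance, since only the type of the argument affects the selection.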
- plot_channel(channel_label_or_number, source='xform', subsample=True, color_density=True, bin_width=4, event_mask=None, highlight_mask=None, x_min=None, x_max=None, y_min=None, y_max=None, width=900, aspect_ratio=3)
Plot a 2-D histogram of the specified channel data with the x-axis as the event index. This is similar to plotting a channel vs Time, except the events are equally distributed along the x-axis.
- Parameters:
channel_label_or_number – A channel’s PnN label or number
source – ‘raw’, ‘comp’, ‘xform’ for whether the raw, compensated or transformed events are used for plotting
subsample – Whether to use all events for plotting or just the subsampled events. Default is True (subsampled events). Plotting subsampled events is much faster.
color_density – Whether to color the events by density, similar to a heat map. Default is True.
bin_width – Bin size to use for the color density, in units of event point size. Larger values produce smoother gradients. Default is 4 for a 4x4 grid size.
event_mask – Boolean array of events to plot. Takes precedence over highlight_mask (i.e. events marked False in event_mask will never be plotted).
highlight_mask – Boolean array of event indices to highlight in color. Non-highlighted events will be light grey.
x_min – Lower bound of x-axis. If None, channel’s min value will be used with some padding to keep events off the edge of the plot.
x_max – Upper bound of x-axis. If None, channel’s max value will be used with some padding to keep events off the edge of the plot.
y_min – Lower bound of y-axis. If None, channel’s min value will be used with some padding to keep events off the edge of the plot.
y_max – Upper bound of y-axis. If None, channel’s max value will be used with some padding to keep events off the edge of the plot.
width – Width of the plot. Default is 900. By default, the width to height ratio is 3:1 (default height of 300 pixels).
aspect_ratio – The width to height ratio of the plot. Default is 3. Set to 1 for a square plot.
- Returns:
A Bokeh Figure object containing the interactive channel plot.
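The relationship between width, aspect_ratio, and the resulting plot height is simple division. The helper below is a hypothetical illustration; truncating to whole pixels with int() is an assumption, not documented behavior.

```python
def sketch_plot_height(width=900, aspect_ratio=3):
    """Height in pixels for a given width and width-to-height ratio."""
    return int(width / aspect_ratio)


print(sketch_plot_height())        # defaults: 900 wide at 3:1
print(sketch_plot_height(600, 1))  # square plot
```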
- plot_contour(x_label_or_number, y_label_or_number, source='xform', subsample=True, plot_events=False, fill=False, x_min=None, x_max=None, y_min=None, y_max=None)
Returns a contour plot of the specified channel events, available as raw, compensated, or transformed data.
- Parameters:
x_label_or_number – A channel’s PnN label or number for x-axis data
y_label_or_number – A channel’s PnN label or number for y-axis data
source – ‘raw’, ‘comp’, ‘xform’ for whether the raw, compensated or transformed events are used for plotting
subsample – Whether to use all events for plotting or just the subsampled events. Default is True (subsampled events). Running with all events is not recommended, as the Kernel Density Estimation is computationally demanding.
plot_events – Whether to display the event data points in addition to the contours. Default is False.
x_min – Lower bound of x-axis. If None, channel’s min value will be used with some padding to keep events off the edge of the plot.
x_max – Upper bound of x-axis. If None, channel’s max value will be used with some padding to keep events off the edge of the plot.
y_min – Lower bound of y-axis. If None, channel’s min value will be used with some padding to keep events off the edge of the plot.
y_max – Upper bound of y-axis. If None, channel’s max value will be used with some padding to keep events off the edge of the plot.
fill – Whether to fill in color between contour lines. Default is False.
- Returns:
A Bokeh figure of the contour plot
- plot_scatter(x_label_or_number, y_label_or_number, source='xform', subsample=True, color_density=True, bin_width=4, event_mask=None, highlight_mask=None, x_min=None, x_max=None, y_min=None, y_max=None, height=600, width=600)
Returns an interactive scatter plot for the specified channel data.
- Parameters:
x_label_or_number – A channel’s PnN label or number for x-axis data
y_label_or_number – A channel’s PnN label or number for y-axis data
source – ‘raw’, ‘comp’, ‘xform’ for whether the raw, compensated or transformed events are used for plotting
subsample – Whether to use all events for plotting or just the subsampled events. Default is True (subsampled events). Plotting subsampled events is much faster.
color_density – Whether to color the events by density, similar to a heat map. Default is True.
bin_width – Bin size to use for the color density, in units of event point size. Larger values produce smoother gradients. Default is 4 for a 4x4 grid size.
event_mask – Boolean array of events to plot. Takes precedence over highlight_mask (i.e. events marked False in event_mask will never be plotted).
highlight_mask – Boolean array of event indices to highlight in color. Non-highlighted events will be light grey.
x_min – Lower bound of x-axis. If None, channel’s min value will be used with some padding to keep events off the edge of the plot.
x_max – Upper bound of x-axis. If None, channel’s max value will be used with some padding to keep events off the edge of the plot.
y_min – Lower bound of y-axis. If None, channel’s min value will be used with some padding to keep events off the edge of the plot.
y_max – Upper bound of y-axis. If None, channel’s max value will be used with some padding to keep events off the edge of the plot.
height – Height of plot in pixels. Default is 600.
width – Width of plot in pixels. Default is 600.
- Returns:
A Bokeh Figure object containing the interactive scatter plot.
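The precedence of event_mask over highlight_mask can be modeled per event. This is a sketch of the documented rule using plain Boolean lists, not the plotting code itself; the function name and the state names are made up.

```python
def classify_events(event_mask, highlight_mask):
    """Label each event as 'hidden', 'highlighted', or 'grey'."""
    states = []
    for shown, highlighted in zip(event_mask, highlight_mask):
        if not shown:
            # event_mask wins: a False entry is never plotted.
            states.append('hidden')
        elif highlighted:
            states.append('highlighted')
        else:
            states.append('grey')
    return states


print(classify_events([True, True, False], [True, False, True]))
```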
- plot_scatter_matrix(channel_labels_or_numbers=None, source='xform', subsample=True, event_mask=None, highlight_mask=None, color_density=False, plot_height=256, plot_width=256)
Returns an interactive scatter plot matrix for all channel combinations except for the Time channel.
- Parameters:
channel_labels_or_numbers – List of channel PnN labels or channel numbers to use for the scatter plot matrix. If None, then all channels will be plotted (except Time).
source – ‘raw’, ‘comp’, ‘xform’ for whether the raw, compensated or transformed events are used for plotting
subsample – Whether to use all events for plotting or just the subsampled events. Default is True (subsampled events). Plotting subsampled events is much faster.
event_mask – Boolean array of events to plot. Takes precedence over highlight_mask (i.e. events marked False in event_mask will never be plotted).
highlight_mask – Boolean array of event indices to highlight in color. Non-highlighted events will be light grey.
color_density – Whether to color the events by density, similar to a heat map. Default is False.
plot_height – Height of plot in pixels (screen units)
plot_width – Width of plot in pixels (screen units)
- Returns:
A Bokeh Figure object containing the interactive scatter plot matrix.
- plot_histogram(channel_label_or_number, source='xform', subsample=False, bins=None, data_min=None, data_max=None, x_range=None)
Returns a histogram plot of the specified channel events.
- Parameters:
channel_label_or_number – A channel’s PnN label or number to use for plotting the histogram
source – ‘raw’, ‘comp’, ‘xform’ for whether the raw, compensated or transformed events are used for plotting
subsample – Whether to use all events for plotting or just the subsampled events. Default is False (all events).
bins – Number of bins to use for the histogram or a string compatible with the NumPy histogram function. If None, the number of bins is determined by the square root rule.
data_min – filter event data, removing events below specified value
data_max – filter event data, removing events above specified value
x_range – Tuple of lower & upper bounds of x-axis. Used for modifying plot view, doesn’t filter event data.
- Returns:
Bokeh figure of the histogram plot.
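The square root rule mentioned for the bins parameter sets the bin count to roughly the square root of the number of events. The sketch below is an illustration; rounding up and enforcing a minimum of one bin are assumptions, and FlowKit's exact rounding may differ.

```python
import math


def sqrt_rule_bins(event_count):
    """Number of histogram bins under the square root rule."""
    return max(1, math.ceil(math.sqrt(event_count)))


print(sqrt_rule_bins(10000))
```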
- export(filename, source='xform', exclude_neg_scatter=False, exclude_flagged=False, exclude_normal=False, subsample=False, include_metadata=False, directory=None)
Export Sample event data to either a new FCS file or a CSV file. Format determined by filename extension.
- Parameters:
filename – Text string to use for the exported file name. File type is determined by the filename extension (supported types are .fcs & .csv).
source – ‘orig’, ‘raw’, ‘comp’, ‘xform’ for whether the original (no gain applied), raw (orig + gain), compensated (raw + comp), or transformed (comp + xform) events are used for exporting
exclude_neg_scatter – Whether to exclude negative scatter events. Default is False.
exclude_flagged – Whether to exclude flagged events. Default is False.
exclude_normal – Whether to exclude “normal” events. This is useful for retrieving all the “bad” events (neg scatter and/or flagged events). Default is False.
subsample – Whether to export all events or just the subsampled events. Default is False (all events).
include_metadata – Whether to include all key/value pairs from the metadata attribute in the output FCS file. Only valid for .fcs file extension. If False, only the minimum amount of metadata will be included in the output FCS file. Default is False.
directory – Directory path where the exported file will be saved. If None, the file will be saved in the current working directory.
- Returns:
None
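How the three exclude options interact can be modeled with index sets. This is a sketch of the behavior described in the parameters above, not FlowKit's export code; the function name and arguments are hypothetical, and "normal" is taken to mean events that are neither negative scatter nor flagged.

```python
def select_export_indices(event_count, neg_scatter, flagged,
                          exclude_neg_scatter=False, exclude_flagged=False,
                          exclude_normal=False):
    """Return the event indices that would be exported (illustration only)."""
    neg_scatter, flagged = set(neg_scatter), set(flagged)
    keep = []
    for i in range(event_count):
        is_neg = i in neg_scatter
        is_flagged = i in flagged
        if exclude_neg_scatter and is_neg:
            continue
        if exclude_flagged and is_flagged:
            continue
        if exclude_normal and not (is_neg or is_flagged):
            continue
        keep.append(i)
    return keep


# Retrieve only the "bad" events from a 6-event sample.
print(select_export_indices(6, neg_scatter=[0], flagged=[1, 2], exclude_normal=True))
```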