
Feature Library

Mosaic's feature library provides 30+ registered feature implementations organized by output type. Features are composable pipeline stages that read from tracks or upstream feature outputs and produce per-sequence parquet files.

Feature categories

- Per-frame kinematic: SpeedAngvel, BodyScale, OrientationRelative
- Per-frame spatial: PairEgocentric, PairPosition, PairInteractionFilter, ApproachAvoidance
- Per-frame social: NearestNeighbor, FFGroups, FFGroupsMetrics, NNDeltaResponse, NNDeltaBins
- Per-frame context: TemporalStacking, PairWavelet
- Dimensionality reduction: PairPoseDistancePCA, GlobalScaler
- Embedding & clustering: GlobalTSNE, GlobalKMeansClustering, GlobalWardClustering, WardAssign, ExtractTemplates, ExtractLabeledTemplates
- Classification: XgboostFeature, FeralFeature, KpmsFeature

Registry

feature_library

Feature library for behavior datasets.

This module provides a collection of features for behavioral analysis. Features are automatically registered on import via the @register_feature decorator.

All features are automatically loaded when the feature_library is imported, making them available in the global FEATURES registry.

Usage

from mosaic.behavior.feature_library import Inputs, Result
from mosaic.behavior.feature_library.speed_angvel import SpeedAngvel

Track-only feature (default inputs)

feat = SpeedAngvel()
dataset.run_feature(feat)

Feature consuming another feature's output

feat = SpeedAngvel(inputs=Inputs((Result(feature="nn"),)))
dataset.run_feature(feat)

List all registered features

from mosaic.behavior.feature_library.registry import FEATURES
print(list(FEATURES.keys()))

ApproachAvoidance

ApproachAvoidance(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

'approach-avoidance' — per-sequence AA event detection for all pairs.

For N animals per sequence, evaluates all N*(N-1)/2 unique unordered pairs. The output stores directional events as aa_event_12 and aa_event_21 over canonical (id1,id2), plus aa_event/label_id as non-directional union.

Parameters:

- interpolation: Interpolation settings for missing data. Default: InterpolationConfig().
- sampling: Frame rate and smoothing settings. Default: SamplingConfig().
- velocity_units: Whether speed thresholds are in "per_frame" or "per_second". Default: "per_frame".
- angle_units: Unit for heading angles: "radians", "degrees", or "auto" (detect from data range). Default: "radians".
- consecutive_frame_delta: Expected frame step between consecutive rows; used to detect gaps. Default: 1.0.
- distance_threshold: Maximum inter-animal distance (in position units) for a frame to be considered AA-eligible. Default: 200.0.
- approacher_velocity_threshold: Minimum speed of the approaching animal. Default: 5.0.
- avoider_velocity_threshold: Minimum speed of the avoiding animal. Default: 5.0.
- cos_approacher_threshold: Minimum cosine between the approacher's velocity vector and the direction toward the partner. Default: 0.8.
- cos_avoider_threshold: Minimum cosine between the avoider's velocity vector and the direction away from the partner. Default: 0.5.
- min_event_length: Minimum number of contiguous qualifying frames to form an event. Default: 10.
- min_event_count: Minimum number of qualifying frames within an event run to keep it. Default: 5.
- orientation_gate_cos: If set, require the approacher's body orientation to align with its velocity (cos threshold). Default: cos(30°) ≈ 0.866. None disables the gate.
- smooth_window_sec: If set, apply a sliding-window average (in seconds) to velocities before thresholding. Default: None (disabled; framewise behaviour).
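The thresholds above combine into a single per-frame gate. A minimal sketch of that gating logic under the documented defaults (pure Python; the function name and tuple-based signature are illustrative, not the Mosaic API):

```python
import math

def aa_frame_qualifies(
    pos_a, pos_b, vel_a, vel_b,
    distance_threshold=200.0,
    approacher_velocity_threshold=5.0,
    avoider_velocity_threshold=5.0,
    cos_approacher_threshold=0.8,
    cos_avoider_threshold=0.5,
):
    """Return True if A approaches B (and B avoids) on this frame."""
    dx, dy = pos_b[0] - pos_a[0], pos_b[1] - pos_a[1]
    dist = math.hypot(dx, dy)
    if dist == 0 or dist > distance_threshold:
        return False
    speed_a, speed_b = math.hypot(*vel_a), math.hypot(*vel_b)
    if speed_a < approacher_velocity_threshold or speed_b < avoider_velocity_threshold:
        return False
    # cosine between the approacher's velocity and the direction toward the partner
    cos_app = (vel_a[0] * dx + vel_a[1] * dy) / (speed_a * dist)
    # cosine between the avoider's velocity and the direction away from the partner
    cos_avo = (vel_b[0] * dx + vel_b[1] * dy) / (speed_b * dist)
    return cos_app >= cos_approacher_threshold and cos_avo >= cos_avoider_threshold
```

Frames passing this gate are then grouped into runs and filtered by min_event_length / min_event_count.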

extract_events staticmethod

extract_events(aa_df: DataFrame, min_duration: int = 1) -> pd.DataFrame

Convert per-frame AA output into a compact event table.

Parameters

- aa_df (DataFrame): Per-frame output with columns frame, id1, id2, aa_event, aa_event_12, aa_event_21. May span multiple sequences/groups (they are handled independently).
- min_duration (int): Minimum event length in frames. Events shorter than this are discarded.

Returns

DataFrame with columns: id1, id2, start_frame, end_frame, duration, direction ('12' if id1→id2, '21' if id2→id1, 'both'), approacher_id, avoider_id, sequence (if present), group (if present).
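The run-length collapsing that extract_events performs can be sketched for a single pair and sequence (assuming pandas; the real method additionally splits by id pair, sequence, and group, and emits the direction/approacher columns):

```python
import pandas as pd

def runs_to_events(aa_df: pd.DataFrame, min_duration: int = 1) -> pd.DataFrame:
    """Collapse a per-frame boolean aa_event column into (start, end, duration) rows."""
    df = aa_df.sort_values("frame")
    flag = df["aa_event"].astype(bool)
    # a new run starts wherever the flag flips or the frame index jumps
    run_id = ((flag != flag.shift()) | (df["frame"].diff() != 1)).cumsum()
    events = (
        df[flag]
        .groupby(run_id[flag])
        .agg(start_frame=("frame", "min"), end_frame=("frame", "max"))
        .assign(duration=lambda e: e.end_frame - e.start_frame + 1)
        .reset_index(drop=True)
    )
    return events[events.duration >= min_duration].reset_index(drop=True)
```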

ArHmmFeature

ArHmmFeature(inputs: Inputs, params: dict[str, object] | None = None)

AR-HMM behavioral syllable discovery as a pipeline feature.

Fits an autoregressive Hidden Markov Model across all input sequences and assigns per-frame syllable labels via Viterbi decoding.

Parameters:

- model: Pre-fitted ArHmmModelArtifact to load (skip fit). Default: None (fit from scratch).
- pca_dim: Number of PCA components for dimensionality reduction before fitting. None skips PCA. Default: None.
- n_states: Maximum number of HMM states (pruned after fit). Default: 50.
- n_lags: AR order (number of lagged frames used as regressors). Default: 1.
- sticky_weight: Extra pseudo-count on the diagonal of the transition matrix (encourages state persistence). Default: 100.0.
- n_iter: Maximum EM iterations per restart. Default: 200.
- tol: Convergence tolerance on relative log-likelihood change. Default: 1e-4.
- n_restarts: Number of random restarts (best log-likelihood kept). Default: 1.
- standardize: If True, z-score features before fitting. Default: True.
- downsample_rate: Temporal downsampling factor. None disables. Default: None.
- prune_threshold: Drop states with posterior mass below this fraction. Default: 0.01.
- random_state: Random seed. Default: 42.

ArtifactSpec

Bases: Result[str], Generic[L, R]

Reference to a feature artifact with load specification.

Class Type Parameters:

- L: Load spec type (NpzLoadSpec, ParquetLoadSpec, JoblibLoadSpec).
- R: Return type of from_path(). Defaults to object.

Attributes:

- load (L): How to load the matched files.
- pattern (str): Glob pattern. Auto-derived from load.kind when empty.

from_path

from_path(path: Path) -> R

Load artifact from a resolved file path.

Dispatches on load-spec type via load_from_spec(). Return type is determined by the R type parameter.

from_result classmethod

from_result(result: Result[str]) -> Self

Create from a Result, validating feature match.

Typed artifact subclasses (with a default feature) validate that result.feature matches. Base ArtifactSpec passes through.

BodyScaleFeature

BodyScaleFeature(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Per-frame body scale: median intra-animal pose distance.

Outputs per sequence parquet with columns: frame, id, scale, sequence, group. Intended to be averaged later (per sequence or dataset) to derive a single normalization constant for downstream orientation features.
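The per-frame scale value is the median of all pairwise distances among one animal's pose points; a dependency-free sketch (the helper name is illustrative):

```python
import math
from itertools import combinations

def body_scale(points):
    """Median pairwise distance between one animal's pose points on one frame."""
    dists = sorted(math.dist(p, q) for p, q in combinations(points, 2))
    n = len(dists)
    mid = n // 2
    return dists[mid] if n % 2 else 0.5 * (dists[mid - 1] + dists[mid])
```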

ExtractLabeledTemplates

ExtractLabeledTemplates(inputs: Inputs, params: dict[str, object] | None = None)

Extract labeled, split-annotated templates from upstream features.

Streams upstream feature data, aligns ground truth labels from NPZ files, assigns train/test splits by sequence, and subsamples per class. Produces a templates parquet with feature columns + label (int) + split (str).

Parameters:

- labels: GroundTruthLabelsSource specifying where to load per-frame ground-truth labels (required).
- strategy: Template selection method, "random" or "farthest_first". Default: "random".
- n_per_class: Number of templates per class. An int applies uniformly; a dict maps class -> count. Exactly one of n_per_class or n_total must be set. Default: None.
- n_total: Total number of templates across all classes (distributed proportionally). Exactly one of n_per_class or n_total must be set. Default: None.
- pool: PoolConfig controlling candidate pool size and allocation. Default: PoolConfig().
- test_fraction: Fraction of sequences held out for the test split. Default: 0.2.
- random_state: Random seed for reproducibility. Default: 42.

ExtractTemplates

ExtractTemplates(inputs: Inputs, params: dict[str, object] | None = None)

Subsample per-sequence data into a representative template matrix.

Entry point for the global feature pipeline. Streams per-sequence inputs, builds a candidate pool with proportional per-entry contribution, and selects templates using the configured strategy.

Parameters:

- strategy: Template selection method, "random" for uniform random sampling or "farthest_first" for greedy diversity maximization. Default: "random".
- n_templates: Number of templates to select (required).
- pool: PoolConfig controlling candidate pool size, allocation strategy, and per-entry caps. Default: PoolConfig().
- random_state: Random seed for reproducibility. Default: 42.
- pair_filter: Optional NNResult for nearest-neighbor pair filtering during dependency resolution. Default: None.
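A sketch of the "farthest_first" strategy over a small candidate pool (greedy max-min selection; the function is illustrative, not the Mosaic implementation):

```python
import math
import random

def farthest_first(pool, n_templates, random_state=42):
    """Repeatedly pick the candidate farthest from the already-selected set."""
    rng = random.Random(random_state)
    selected = [rng.randrange(len(pool))]
    # distance from each candidate to its nearest selected template
    min_d = [math.dist(p, pool[selected[0]]) for p in pool]
    while len(selected) < n_templates:
        nxt = max(range(len(pool)), key=lambda i: min_d[i])
        selected.append(nxt)
        min_d = [min(d, math.dist(p, pool[nxt])) for d, p in zip(min_d, pool)]
    return selected
```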

Params

Bases: Params

ExtractTemplates parameters.

Attributes:

- strategy (Literal['random', 'farthest_first']): Selection strategy. Default "random".
- n_templates (int): Number of templates to select. Required.
- pool (PoolConfig): Pool configuration. Default PoolConfig().
- random_state (int): Random seed. Default 42.

FFGroups

FFGroups(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Per-sequence fission-fusion grouping metrics.

Inputs: raw tracks (columns: x, y, id, frame/time, group, sequence).

Outputs per (frame, id):
- group_membership: connected-component label
- group_size: size of that component
- event: event id from dp.get_events_info (-1 if not in an event)

Parameters:

- distance_cutoff: Pairwise distance threshold below which two animals are considered in the same group. Default: 50.0.
- window_size: Sliding-window size (frames) for smoothing the pairwise distance matrix before thresholding. Default: 5.
- min_event_duration: Minimum number of contiguous frames for a stable subgroup to be registered as an event. Default: 1.
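Grouping by distance cutoff amounts to single-link connected components over the thresholded distance graph; a sketch on one frame (union-find; names are illustrative):

```python
import math

def group_membership(positions, distance_cutoff=50.0):
    """Label connected components where edges join animals closer than the cutoff."""
    ids = list(positions)
    parent = {i: i for i in ids}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a in ids:
        for b in ids:
            if a < b and math.dist(positions[a], positions[b]) < distance_cutoff:
                parent[find(a)] = find(b)
    return {i: find(i) for i in ids}
```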

FFGroupsMetrics

FFGroupsMetrics(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Per-sequence summary of focal-fish group metrics.

Per-frame computed (internal): distance_from_centroid, xrot_to_centroid, yrot_to_centroid, dev_speed_to_mean.

Summaries (output, one row per id within sequence): fractime_norm2, avg_duration_frame, med_duration_frame, ftime_periphery, ftime_periphery_norm.

Parameters:

- group_col: Column name that identifies group events (e.g. from FFGroups output). Default: "event".
- speed_col: Column name for speed values. Default: "speed".
- time_chunk_sec: If set, split each sequence into time-based chunks of this duration (seconds) and compute summaries per chunk. Default: None (whole sequence).
- frame_chunk: If set, split each sequence into frame-based chunks of this size and compute summaries per chunk. Default: None.
- centroid_heading_col: Column for centroid heading used in rotation calculations. Default: "centroid_heading".
- exclude_cols: List of boolean column names (e.g. "bad_frame") whose truthy rows are dropped before computation. Default: [].

Feature

Bases: Protocol

Feature protocol -- 4 attributes, 4 methods.

FeralFeature

FeralFeature(inputs: Inputs, params: dict[str, object] | None = None)

FERAL vision-transformer behavior classifier as a pipeline feature.

Supports two operating modes:

Training mode (video_dir + label_json + training): Runs the full FERAL ViT fine-tuning loop, saves checkpoints, evaluates the test split (if present), then applies to all sequences in the apply phase.

Inference mode (model_dir): Loads a pre-trained FERAL model and runs per-frame behavior classification on crop videos.

Supports two input formats for the apply phase:

  1. InteractionCropPipeline output (pair-level): One row per crop video with video_path, id_a, id_b, target_id, interaction_id, start_frame, end_frame.

  2. EgocentricCrop output (individual-level): One row per frame with target_id, frame. Videos are derived as egocentric_id{target_id}.mp4.

Params

- feral_code_dir (Path): Path to a local clone of https://github.com/Skovorp/feral.
- model_name (str): HuggingFace model name (default: V-JEPA2 ViT-L).
- predict_per_item (int): Predictions per chunk (default 64).
- chunk_length (int): Frames per video chunk (default 64).
- chunk_shift (int): Stride between chunks for overlapping inference (default 32).
- chunk_step (int): Frame sampling step within chunks (default 1).
- resize_to (int): Input resolution for the ViT (default 256).
- device (str): PyTorch device (default "cuda").
- class_names (dict | None): Class index -> name mapping. Auto-detected from model config.
- decision_threshold (float | None): Probability threshold for the positive class. None uses argmax.
- default_class (int): Fallback class when no class exceeds the threshold (default 0).
- model_dir (Path | None): Directory with model_best.pt + config.json (inference mode).
- video_dir (Path | None): Directory containing crop videos (training mode).
- label_json (Path | None): Path to FERAL-format label JSON with splits (training mode).
- training (FeralTrainingConfig | None): Training hyperparameters. None = inference-only mode.

bind_dataset

bind_dataset(ds)

Store dataset reference for resolving media paths.

fit

fit(inputs: InputStream) -> None

Train a FERAL model or verify pre-trained model is loaded.

In training mode (video_dir + label_json + training set), runs the full ViT fine-tuning loop with intermediate checkpoints. After training, evaluates the test split if present.

In inference mode (model_dir set), the model is already loaded by load_state() and this method is not called.

The inputs argument is not consumed -- FERAL reads video files directly from params.video_dir.

FeralTrainingConfig

Bases: StrictModel

Training hyperparameters for FERAL ViT fine-tuning.

These mirror the FERAL default_vjepa.yaml configuration.

GlobalIdentityModel

GlobalIdentityModel(inputs: Inputs, params: dict[str, object] | None = None)

Train a visual identity model from individual animal sequences.

Takes EgocentricCrop output as input. Each identity is specified as a mapping of identity names to lists of sequences containing that individual alone. Trains a V200 CNN classifier (T-Rex-compatible) and exports weights loadable via visual_identification_model_path.

Example::

ego_result = dataset.run_feature(ego_crop)

identity_model = GlobalIdentityModel(
    Inputs((Result(feature="egocentric-crop"),)),
    params={
        "identities": {
            "mouse_A": ["cage1/day1_mouseA_alone", "cage1/day3_mouseA_alone"],
            "mouse_B": ["cage1/day1_mouseB_alone"],
            "mouse_C": ["cage1/day2_mouseC_alone"],
            "mouse_D": ["cage1/day1_mouseD_alone"],
        },
        "image_size": (128, 128),
        "channels": 1,
    },
)
result = dataset.run_feature(identity_model)

Parameters:

- identities: Explicit identity -> sequences mapping. Keys are identity names, values are lists of "group/sequence" strings.
- group_as_identity: Convenience shortcut: treat each group name as one identity. Default False.
- image_size: Crop resize target (height, width). Default (128, 128).
- channels: Number of image channels (1=grayscale, 3=color). Default 1.
- epochs: Training epochs. Default 150.
- learning_rate: Adam learning rate. Default 0.0001.
- batch_size: Training batch size. Default 64.
- val_split: Fraction of data reserved for validation. Default 0.2.
- max_images_per_identity: Cap on images per identity to balance classes. Default 2000.
- export_trex_weights: Save a T-Rex-loadable .pth file. Default True.
- trex_weights_name: Stem of the exported .pth file. Default "identity_model".

Params

Bases: Params

Global identity model parameters.

apply

apply(df: DataFrame) -> pd.DataFrame

Passthrough -- identity predictions are applied by T-Rex, not Mosaic.

GlobalKMeansClustering

GlobalKMeansClustering(inputs: Inputs, params: dict[str, object] | None = None)

Global K-Means clustering on templates loaded via load_state. Per-sequence cluster assignment is done in apply().

Parameters:

- templates: Templates artifact to fit on (inherited from GlobalModelParams).
- model: Pre-fitted KMeansModelArtifact to load (skip fit). Default: KMeansModelArtifact().
- k: Number of clusters. Default: 100.
- random_state: Random seed for KMeans initialization. Default: 42.
- n_init: Number of KMeans initializations to run. Default: "auto".
- max_iter: Maximum iterations per KMeans run. Default: 300.
- device: Compute device, "cpu" or "cuda" (requires cuML). Default: "cpu".
- label_artifact_points: If True, assign cluster labels to the template points used for fitting. Default: True.
- pair_filter: Optional NNResult for nearest-neighbor pair filtering during dependency resolution. Default: None.
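The apply() phase reduces to nearest-centroid assignment of per-sequence rows; a dependency-free sketch (illustrative only; the real feature delegates to the fitted scikit-learn/cuML model):

```python
import math

def assign_clusters(points, centroids):
    """Label each per-sequence point with the index of its nearest fitted centroid."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]
```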

Params

Bases: GlobalModelParams[KMeansModelArtifact]

Global K-means clustering parameters.

Attributes:

- templates (ParquetArtifact | None): Templates artifact to fit on (inherited).
- model (KMeansModelArtifact | None): Pre-fitted KMeans model artifact (skip fit).
- k (int): Number of clusters. Default 100.
- random_state (int): Random seed. Default 42.
- n_init (Literal['auto'] | int): KMeans initializations. Default "auto".
- max_iter (int): Max iterations per run. Default 300.
- device (str): Compute device. Default "cpu".
- label_artifact_points (bool): Label points used for fitting. Default True.
- pair_filter (NNResult | None): Nearest-neighbor pair filter for dependency resolution. Default None.

GlobalModelParams

Bases: Params, Generic[M]

Base params for global features that fit on a templates artifact or load a pre-fitted model.

Type parameter M is the model artifact type (must extend JoblibArtifact). Exactly one of templates or model must be provided.

Both fields use default_factory so that from_overrides() merges partial dicts correctly. The _exclusive_source validator checks model_fields_set and nulls out the field that was not provided.

Attributes:

- templates (ParquetArtifact | None): Templates artifact to fit from. Mutually exclusive with model.
- model (M | None): Pre-fitted model artifact. Mutually exclusive with templates.
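The exactly-one-of constraint can be sketched without pydantic (in Mosaic it is the _exclusive_source validator over model_fields_set):

```python
def resolve_source(templates=None, model=None):
    """Enforce that exactly one of templates / model is provided."""
    if (templates is None) == (model is None):
        raise ValueError("provide exactly one of 'templates' or 'model'")
    # fit from templates, or load the pre-fitted model
    return ("fit", templates) if templates is not None else ("load", model)
```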

GlobalScaler

GlobalScaler(inputs: Inputs, params: dict[str, object] | None = None)

Fit a StandardScaler on templates and scale per-sequence data.

Consumes a templates artifact (from ExtractTemplates or any feature producing templates.parquet). Produces a scaler model bundle and scaled templates.

Parameters:

- templates: Templates artifact to fit the scaler on (inherited from GlobalModelParams).
- model: Pre-fitted ScalerModelArtifact to load (skip fit). Default: ScalerModelArtifact().

Params

Bases: GlobalModelParams[ScalerModelArtifact]

GlobalScaler parameters.

Attributes:

- templates (ParquetArtifact | None): Templates artifact to fit scaler on.
- model (ScalerModelArtifact | None): Pre-fitted scaler model artifact (skip fit).
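The fit/apply split mirrors a plain standard scaler: statistics come from the templates only, then every per-sequence row is transformed with them. A dependency-free sketch (illustrative, not the StandardScaler API):

```python
import statistics

def fit_scaler(templates):
    """Fit per-column mean/std on the template matrix."""
    cols = list(zip(*templates))
    return (
        [statistics.fmean(c) for c in cols],
        [statistics.pstdev(c) or 1.0 for c in cols],  # guard constant columns
    )

def transform(rows, mean, std):
    """Scale per-sequence rows with the template-fitted statistics."""
    return [[(v - m) / s for v, m, s in zip(r, mean, std)] for r in rows]
```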

GlobalTSNE

GlobalTSNE(inputs: Inputs, params: dict[str, object] | None = None)

Fit an openTSNE embedding on templates and map per-sequence data.

Consumes a templates artifact (from ExtractTemplates, GlobalScaler, or any feature producing templates). Produces an embedding model bundle and template coordinates.

Parameters:

- templates: Templates artifact to fit embedding on (inherited from GlobalModelParams).
- model: Pre-fitted TSNEModelArtifact to load (skip fit). Default: TSNEModelArtifact().
- random_state: Random seed. Default: 42.
- perplexity: t-SNE perplexity parameter. Default: 50.
- knn_method: kNN backend: "annoy", "faiss", or "faiss-gpu". Default: "annoy".
- n_jobs: Number of parallel jobs for openTSNE. Default: 8.
- fit: TSNEFitConfig controlling learning rate, exaggeration iterations, momentum, etc. Default: TSNEFitConfig().
- mapping: TSNEMapConfig controlling partial-embedding parameters (k, iterations, chunk_size, etc.). Default: TSNEMapConfig().

Params

Bases: GlobalModelParams[TSNEModelArtifact]

Global t-SNE parameters.

Attributes:

- templates (ParquetArtifact | None): Templates artifact to fit embedding on.
- model (TSNEModelArtifact | None): Pre-fitted embedding model artifact (skip fit).
- random_state (int): Random seed. Default 42.
- perplexity (int): t-SNE perplexity. Default 50.
- knn_method (str): kNN method ("annoy", "faiss", "faiss-gpu"). Default "annoy".
- n_jobs (int): Parallel jobs for openTSNE. Default 8.
- fit (TSNEFitConfig): Embedding fitting parameters.
- mapping (TSNEMapConfig): Partial embedding mapping parameters.

GlobalWardClustering

GlobalWardClustering(inputs: Inputs, params: dict[str, object] | None = None)

Ward hierarchical clustering on templates with per-sequence 1-NN assignment.

Parameters:

- templates: Templates artifact to cluster (inherited from GlobalModelParams).
- model: Pre-fitted WardModelArtifact to load (skip fit). Default: WardModelArtifact().
- n_clusters: Number of clusters to cut from the linkage tree. Default: 20.
- method: Linkage method passed to scipy.cluster.hierarchy.linkage. Default: "ward".
- pair_filter: Optional NNResult for nearest-neighbor pair filtering during dependency resolution. Default: None.

Params

Bases: GlobalModelParams[WardModelArtifact]

Global Ward clustering parameters.

Attributes:

- templates (ParquetArtifact | None): Templates artifact to cluster (inherited).
- model (WardModelArtifact | None): Pre-fitted Ward model artifact (skip fit).
- n_clusters (int): Number of clusters to cut. Default 20.
- method (str): Linkage method. Default "ward".
- pair_filter (NNResult | None): Nearest-neighbor pair filter. Default None.

GroundTruthLabelsSource

Bases: LabelsSource[Literal['behavior']]

Labels loaded from labels/&lt;kind&gt;/index.csv.

IdTagColumns

IdTagColumns(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Attach per-id label fields (from labels/) to each frame, so they can be merged via Inputs() and used as categories (e.g., focal/nonfocal).

Outputs per row (same granularity as input tracks/feature): frame/time/id/group/sequence + one column per requested label field.

Parameters:

- labels: LabelsSource specifying which labels directory to load. Default: LabelsSource(kind="id_tags").
- label_kind: Label subdirectory name used for dependency resolution. Default: "id_tags".
- fields: List of label field names to attach. None means all fields found in the labels file. Default: None.
- field_renames: Optional mapping of original field names to renamed column names in the output. Default: None.

Inputs

Bases: RootModel[tuple[InputItem, ...]], Generic[InputItem]

Base class for feature input collections. Mirrors Params.

Each Feature subclasses to narrow allowed input types, paralleling class Params(Params):.

Examples:

Inputs(("tracks",))
Inputs((Result(feature="speed-angvel"),))
Inputs(("tracks", Result(feature="nn", run_id="0.1-abc")))

Per-feature narrowing

class Inputs(Inputs[TrackInput]):
    pass

Features that take no pipeline inputs

class Inputs(Inputs[Result]):
    _require: ClassVar[InputRequire] = "empty"

Self-loading features that optionally accept inputs (e.g. fit + assign):

class Inputs(Inputs[Result]):
    _require: ClassVar[InputRequire] = "any"

InputsLike

Bases: Protocol

Read-only interface satisfied by any Inputs[InputItem].

KpmsFeature

KpmsFeature(inputs: Inputs, params: dict[str, object] | None = None)

Unified keypoint-MoSeq feature: fit + apply via persistent subprocess.

Parameters:

- model: Pre-fitted KpmsModelArtifact to load (skip fit). Default: None (fit from scratch).
- kpms_python: Path to a Python interpreter with keypoint-moseq installed. None uses the bundled external .venv. Default: None.
- pose: Pose keypoint configuration (indices, column prefixes). Default: PoseConfig().
- anterior_bodyparts: List of bodypart names forming the anterior reference (required, min 1 element).
- posterior_bodyparts: List of bodypart names forming the posterior reference (required, min 1 element).
- fps: Frames per second of the input data. Default: 30.
- num_iters_ar: Number of AR-only fitting iterations. Default: 50.
- num_iters_full: Number of full model fitting iterations. Default: 500.
- kappa_ar: AR transition concentration parameter. None lets keypoint-moseq choose. Default: None.
- kappa_full: Full-model transition concentration parameter. None lets keypoint-moseq choose. Default: None.
- latent_dim: Dimensionality of the latent pose space. Must satisfy latent_dim < 2 * num_keypoints. Default: 10.
- location_aware: If True, include centroid location in the model. Default: False.
- outlier_scale_factor: Scale factor for outlier detection. Default: 6.0.
- remove_outliers: If True, remove detected outlier frames before fitting. Default: True.
- mixed_map_iters: Number of mixed MAP iterations. None uses the keypoint-moseq default. Default: None.
- parallel_message_passing: Enable parallel message passing. None uses the keypoint-moseq default. Default: None.
- resume: If True, resume fitting from a previously saved checkpoint. Default: True.
- downsample_rate: Temporal downsampling factor applied before fitting. None disables downsampling. Default: None.
- save_every_n_iters: Save a checkpoint every N iterations during fit. Default: 25.
- num_iters_apply: Number of iterations when applying the model to new data. Default: 500.

LightningActionFeature

LightningActionFeature(inputs: Inputs, params: dict[str, object] | None = None)

Supervised temporal action segmentation via lightning-action.

Trains a temporal neural network classifier (DilatedTCN, RNN, or TemporalMLP head + linear classifier) on labeled templates and predicts per-frame action probabilities.

Parameters:

- model: Pre-fitted LightningActionModelArtifact to load (skip training). Default: LightningActionModelArtifact().
- head: Temporal encoder architecture: "dtcn" (dilated temporal convolution), "rnn" (LSTM/GRU), or "temporalmlp". Default: "dtcn".
- num_hid_units: Hidden units in the temporal encoder. Default: 64.
- num_layers: Number of encoder layers. Default: 2.
- num_lags: Lag/kernel size for temporal context. Default: 4.
- activation: Activation function. Default: "lrelu".
- dropout_rate: Dropout rate. Default: 0.1.
- sequence_length: Training sequence length (frames per chunk). Default: 500.
- num_epochs: Number of training epochs. Default: 200.
- batch_size: Training batch size. Default: 32.
- learning_rate: Optimizer learning rate. Default: 1e-3.
- weight_decay: Optimizer weight decay. Default: 0.0.
- optimizer: Optimizer type. Default: "Adam".
- weight_classes: If True, weight loss by inverse class frequency. Default: True.
- device: Compute device: "cpu" or "gpu". Default: "cpu".
- random_state: Random seed. Default: 42.
- decision_threshold: Probability threshold(s) for positive prediction. A float applies to all classes; a dict maps class -> threshold. None uses argmax. Default: None.
- default_class: Class label assigned when no class exceeds the decision threshold (required).
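The decision_threshold / default_class interaction can be sketched as follows (illustrative helper, not the lightning-action API: a float or per-class dict gates the classes, then argmax runs among those that pass, falling back to default_class):

```python
def pick_class(probs, decision_threshold=None, default_class=0):
    """Per-frame class decision: plain argmax, or thresholded with a fallback."""
    if decision_threshold is None:
        return max(range(len(probs)), key=probs.__getitem__)
    if isinstance(decision_threshold, dict):
        # classes without an entry effectively never pass
        passing = [c for c, p in enumerate(probs) if p >= decision_threshold.get(c, 1.0)]
    else:
        passing = [c for c, p in enumerate(probs) if p >= decision_threshold]
    if not passing:
        return default_class
    return max(passing, key=probs.__getitem__)
```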

NearestNeighbor

NearestNeighbor(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Per-sequence feature computing nearest-neighbor identity and relative kinematics.

Outputs per frame (one row per individual):
- nn_id: id of nearest neighbor (NaN if none)
- nn_delta_x / nn_delta_y: neighbor position minus focal, world frame
- nn_dist: Euclidean distance to nearest neighbor
- nn_delta_angle: neighbor heading minus focal, wrapped to [-pi, pi]
- nn_delta_x_ego / nn_delta_y_ego: neighbor offset in focal ego frame
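A sketch of how one such row could be computed from per-frame positions and headings (the ego-frame rotation sign convention here is an assumption, not taken from the Mosaic source):

```python
import math

def nearest_neighbor_row(focal_id, positions, headings):
    """One NearestNeighbor-style output row for the focal animal."""
    fx, fy = positions[focal_id]
    others = [i for i in positions if i != focal_id]
    if not others:
        return None
    nn_id = min(others, key=lambda i: math.dist(positions[i], (fx, fy)))
    dx, dy = positions[nn_id][0] - fx, positions[nn_id][1] - fy
    theta = headings[focal_id]
    # rotate the world-frame offset into the focal's ego frame
    dx_ego = math.cos(theta) * dx + math.sin(theta) * dy
    dy_ego = -math.sin(theta) * dx + math.cos(theta) * dy
    # heading difference wrapped to [-pi, pi]
    dangle = (headings[nn_id] - theta + math.pi) % (2 * math.pi) - math.pi
    return {
        "nn_id": nn_id,
        "nn_delta_x": dx, "nn_delta_y": dy,
        "nn_dist": math.hypot(dx, dy),
        "nn_delta_angle": dangle,
        "nn_delta_x_ego": dx_ego, "nn_delta_y_ego": dy_ego,
    }
```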

NearestNeighborDelta

NearestNeighborDelta(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Per-sequence feature that measures how a focal fish changes position/heading/speed over the next diff_numframes frames relative to its nearest neighbor at the current frame.

Expected inputs (via tracks or an Inputs() that merges tracks + the nearest-neighbor feature):
- position/heading/speed columns for the focal (x, y, ANGLE, speed_col)
- nearest-neighbor id column (nn_id_col, default: 'nn_id')
- neighbor offsets in ego frame (nn_delta_x_ego / nn_delta_y_ego); if missing, world offsets (nn_delta_x / nn_delta_y) are rotated using the focal heading.

Outputs per focal row (filtered to frames with a valid future sample diff_numframes ahead): frame, id, group, sequence, nn_id, neighbor_x/y (ego), neighbor_focal (if available), dx, dy, dt, dangle (wrapped; optionally scaled by fps), dspeed, plus passthrough columns like group_size/event/Focal_fish when present.

Parameters:

- sampling: Frame rate and smoothing settings. Default: SamplingConfig().
- speed_col: Column name for speed. Default: "SPEED#wcentroid".
- nn_id_col: Column name for the nearest-neighbor ID. Default: "nn_id".
- nn_dx_ego_col: Column for neighbor delta-x in ego frame. Default: "nn_delta_x_ego".
- nn_dy_ego_col: Column for neighbor delta-y in ego frame. Default: "nn_delta_y_ego".
- nn_dx_world_col: Fallback column for neighbor delta-x in world frame (used when ego columns are absent). Default: "nn_delta_x".
- nn_dy_world_col: Fallback column for neighbor delta-y in world frame. Default: "nn_delta_y".
- focal_col: Column name for the focal-animal flag. Default: "Focal_fish".
- diff_numframes: Number of frames ahead to compute the future response delta. Default: 4.
- wrap_angle: If True, wrap heading differences to [-pi, pi]. Default: True.
- divide_dangle_by_frames: If True, divide the heading change by diff_numframes. Default: True.
- scale_dangle_by_fps: If True, multiply dangle by fps to convert to radians/sec. Default: True.
- tag_cols: Additional columns to pass through to the output. Default: [].
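The dangle post-processing options compose in sequence: wrap, normalize per frame, then scale to a rate. A sketch of that chain for a single focal sample (illustrative helper, not the Mosaic API):

```python
import math

def response_delta(angle_now, angle_future, diff_numframes=4, fps=30.0,
                   wrap_angle=True, divide_dangle_by_frames=True,
                   scale_dangle_by_fps=True):
    """Heading change over the next diff_numframes frames, optionally as a rate."""
    dangle = angle_future - angle_now
    if wrap_angle:
        dangle = (dangle + math.pi) % (2 * math.pi) - math.pi
    if divide_dangle_by_frames:
        dangle /= diff_numframes  # per-frame turn
    if scale_dangle_by_fps:
        dangle *= fps  # radians per second
    return dangle
```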

NearestNeighborDeltaBins

NearestNeighborDeltaBins(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Bin nearest-neighbor response fields (dangle, dspeed) over neighbor position.

Inputs: expect outputs from nn-delta-response (neighbor_x/neighbor_y in ego frame, dangle, dspeed, group_size, and focal/neighbor category columns).

Outputs a tidy DataFrame with mean turn/speed per bin for the focal role and the neighbor role, with columns: [group, sequence, exp, trial, role, category, group_size, metric, bin_idx, value].

Parameters:

- nbins: Number of spatial bins along the binning axis. Default: 45.
- binmax: Maximum absolute value for bin edges. Default: 14.0.
- max_for_avg: Maximum neighbor distance used when computing binned-mean responses. Default: 5.0.
- antisymm: If True, use front/back antisymmetric folding for turn-force computation. Default: True.
- focal_category_col: Column name for the focal animal's category flag. Default: "Focal_fish".
- neighbor_category_col: Column name for the neighbor's category flag. Default: "neighbor_focal".
- group_size_col: Column name for group size. Default: "group_size".
- exp_col: Column name for experimental condition. Default: "Exp".
- trial_col: Column name for trial identifier. Default: "Trial".
- category_specs: List of dicts defining derived category columns (keys: source_col, new_col, quantile, op). Default: [].
- exclude_cols: List of boolean column names whose truthy rows are dropped before computation. Default: [].
- nonfocal_flag_col: Column used to flag nonfocal animals. Default: "Focal_fish".
- nonfocal_flag_value: Value in nonfocal_flag_col that marks an animal as nonfocal. Default: False.

OrientationRelativeFeature

OrientationRelativeFeature(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Orientation-aware relative features between animal pairs, order-agnostic to pose points.

For each frame and ordered pair (id_a -> id_b):
- Express B in A's body frame (using heading angle and global scale).
- Emit signed centroid deltas, heading difference, quantiles over B's points in A's frame, and nearest-k distances.

Params

Bases: Params

Orientation-relative feature parameters.

Attributes:

Name Type Description
scale BodyScaleResult

Body-scale artifact for normalization.

nearest_k int

Number of nearest pose-point distances to emit. Default 3.

quantiles list[float]

Distance distribution quantiles to compute. Default [0.25, 0.5, 0.75].

PairEgocentricFeatures

PairEgocentricFeatures(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

'pair-egocentric' -- per-sequence egocentric + kinematic features for dyads. Produces a row-wise DataFrame with columns:
- frame (if available) or time passthrough (only if it's the order col)
- perspective: 0 for A->B, 1 for B->A
- id1, id2: pair identifiers
- feature columns (e.g., A_speed, AB_dx_egoA, ...)
- (optionally) group/sequence if present in df, for convenience

This feature is stateless (no fitting). It computes features for all C(n,2) pairs per sequence, cleans/interpolates pose per animal, inner-joins by the chosen order column, and computes A->B and B->A features for each pair.

Parameters:

Name Type Description Default
interpolation

Interpolation settings for missing pose data. Default: InterpolationConfig().

required
sampling

Frame rate and smoothing settings. Default: SamplingConfig().

required
pose

Pose keypoint configuration (indices, column prefixes). Default: PoseConfig().

required
neck_idx

Index of the neck keypoint in the pose array, used to compute heading direction. Default: 3.

required
tail_base_idx

Index of the tail-base keypoint, paired with neck_idx for heading vector. Default: 6.

required
center_mode

How to compute the animal's center — "mean" averages all keypoints, other values use a specific keypoint. Default: "mean".

required

PairInteractionFilter

PairInteractionFilter(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Detect pairwise interaction segments from trajectory data.

For every unique pair of individuals in a sequence, tests per-frame distance and (optionally) angular criteria, applies morphological filtering, and extracts continuous interaction segments that meet a minimum duration.

Output columns (one row per frame per interaction segment):
- frame: frame number
- id_a, id_b: individual IDs (id_a < id_b by convention)
- interaction_id: integer label for the segment within this pair
- interaction_start: first frame of this segment
- interaction_end: last frame (exclusive) of this segment

Params

shift_dist : float
    Pixel shift along heading before the distance check (default 15). Set to 0 to use raw positions without forward shift.
max_dist : float
    Maximum shifted-position distance in pixels (default 40).
require_facing : bool
    If True (default), require individuals to face each other (inverse orientation difference < max_inv_orientation_diff_deg). Set to False for distance-only filtering.
max_inv_orientation_diff_deg : float
    Max angle (degrees) between inverse orientations (default 80). Only used when require_facing=True.
min_run_frames : int
    Minimum continuous frames for a valid interaction (default 250).
frame_padding : int
    Frames to pad before/after each segment (default 10).
morphological_structure_size : int
    Structure element length for binary close/open (default 25). Set to 0 to disable morphological filtering.
px_scale : float
    Scale factor applied to shift_dist and max_dist (default 1.0). Use to adjust for videos with different pixel resolutions.
use_pixel_coords : bool
    If True, use poseX/poseY columns (pixel coordinates) for distance calculations instead of X/Y (world coordinates). Default True since thresholds are in pixel units.
pose_head_index : int | None
    If set and use_pixel_coords is True, use this pose index as the position for distance calculations.
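One reading of the shifted-distance criterion, as a standalone sketch (assumed semantics, not the library's code): move each individual forward along its heading by `shift_dist` pixels, then threshold the distance between the shifted positions. This makes the check sensitive to individuals oriented toward each other.

```python
import numpy as np

# Hypothetical sketch of the shift_dist / max_dist / px_scale interaction.
def shifted_close(xa, ya, ha, xb, yb, hb,
                  shift_dist=15.0, max_dist=40.0, px_scale=1.0):
    sd, md = shift_dist * px_scale, max_dist * px_scale
    ax, ay = xa + sd * np.cos(ha), ya + sd * np.sin(ha)   # shift A forward
    bx, by = xb + sd * np.cos(hb), yb + sd * np.sin(hb)   # shift B forward
    return np.hypot(ax - bx, ay - by) <= md

# Two individuals 60 px apart: facing each other passes, facing away fails.
close = shifted_close(0.0, 0.0, 0.0, 60.0, 0.0, np.pi)
far = shifted_close(0.0, 0.0, np.pi, 60.0, 0.0, 0.0)
```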

PairPoseDistancePCA

PairPoseDistancePCA(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

'pair-posedistance-pca' — builds per-frame pairwise pose-distance features and fits an IncrementalPCA globally; outputs PC scores per sequence (and perspective).
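The per-frame feature construction can be illustrated with a self-contained numpy sketch (names and structure are illustrative assumptions): intra-animal distances take the unique unordered keypoint pairs, inter-animal distances take the full cross product, and everything is flattened into one vector per frame before PCA.

```python
import numpy as np

# Sketch of the pairwise pose-distance vector for one frame and one pair.
# kps_a and kps_b are (K, 2) keypoint arrays for animals A and B.
def pose_distance_vector(kps_a, kps_b):
    def intra(kps):
        diff = kps[:, None, :] - kps[None, :, :]
        d = np.linalg.norm(diff, axis=-1)
        iu = np.triu_indices(len(kps), k=1)   # unique unordered pairs
        return d[iu]
    inter = np.linalg.norm(kps_a[:, None, :] - kps_b[None, :, :], axis=-1)
    return np.concatenate([intra(kps_a), intra(kps_b), inter.ravel()])

a = np.array([[0.0, 0.0], [3.0, 4.0]])
b = np.array([[0.0, 1.0], [0.0, 2.0]])
vec = pose_distance_vector(a, b)
```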

Parameters:

Name Type Description Default
interpolation

Interpolation settings for missing pose data. Default: InterpolationConfig().

required
pose

Pose keypoint configuration (indices, column prefixes). Default: PoseConfig().

required
include_intra_A

If True, include intra-animal A pairwise keypoint distances. Default: True.

required
include_intra_B

If True, include intra-animal B pairwise keypoint distances. Default: True.

required
include_inter

If True, include inter-animal pairwise keypoint distances. Default: True.

required
duplicate_perspective

If True, output both A->B and B->A perspectives per pair. Default: True.

required
n_components

Number of PCA components to retain. Default: 6.

required
batch_size

Batch size for IncrementalPCA partial_fit. Default: 5000.

required

PairPositionFeatures

PairPositionFeatures(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

'pair-position' -- per-sequence egocentric + kinematic features for all pairs.

Unlike PairEgocentricFeatures which requires full pose keypoints, this feature works with minimal input: just (x, y, angle) per animal.

For N animals per sequence, computes features for all N*(N-1)/2 unique pairs, each with two perspectives (A->B and B->A).

Output columns (per row):
- frame: frame number
- perspective: 0 for A->B, 1 for B->A
- id1, id2: IDs of the two animals in this pair
- A_speed, A_v_para, A_v_perp, A_ang_speed: focal kinematics
- A_heading_cos, A_heading_sin: focal heading
- AB_dist: inter-animal distance
- AB_dx_egoA, AB_dy_egoA: partner position in focal's egocentric frame
- rel_heading_cos, rel_heading_sin: relative heading
- B_speed, B_v_para, B_v_perp, B_ang_speed: partner kinematics
- (optionally) group, sequence for convenience
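The egocentric transform behind AB_dx_egoA/AB_dy_egoA can be sketched as a rotation of the world-frame offset into the focal animal's body frame, so +x points along the focal heading (a minimal standalone sketch, not the library's implementation):

```python
import numpy as np

# Rotate the offset to the partner by -heading_a, so the focal's heading
# maps onto the +x axis of the egocentric frame.
def to_ego_frame(xa, ya, heading_a, xb, yb):
    dx, dy = xb - xa, yb - ya
    c, s = np.cos(heading_a), np.sin(heading_a)
    return c * dx + s * dy, -s * dx + c * dy

# Partner directly "north" of a focal that is also heading north (pi/2):
# in the ego frame the partner sits straight ahead on +x.
dx_ego, dy_ego = to_ego_frame(0.0, 0.0, np.pi / 2, 0.0, 2.0)
```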

Parameters:

Name Type Description Default
interpolation

Interpolation settings for missing position data. Default: InterpolationConfig().

required
sampling

Frame rate and smoothing settings. Default: SamplingConfig().

required

PairWavelet

PairWavelet(inputs: Inputs, params: dict[str, object] | None = None)

CWT spectrograms on PairPoseDistancePCA outputs.

Expects input df to contain columns
  • 'perspective' (0 = A->B, 1 = B->A)
  • 'frame' (preferred) or 'time' (if used as order column)
  • PC0..PC{k-1} (k = number of PCA components)
Returns a DataFrame with columns
  • frame (or time if that was the order col)
  • perspective
  • W_{col}_f{fi} (log-power, clamped, for each component x frequency) and (optionally) passthrough group/sequence if present in df.

Stateless (no fitting). FPS is inferred from a constant df['fps'] column if present, otherwise taken from fps_default. Frequencies are dyadically spaced in [f_min, f_max].
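One plausible reading of "dyadically spaced" is frequencies placed uniformly in log2 space between f_min and f_max (an assumption; the defaults 0.2 Hz, 5.0 Hz, and 25 bins come from the parameters below):

```python
import numpy as np

# Hypothetical sketch of the dyadic frequency ladder used for the CWT band.
def dyadic_freqs(f_min=0.2, f_max=5.0, n_freq=25):
    return f_min * 2.0 ** np.linspace(0.0, np.log2(f_max / f_min), n_freq)

freqs = dyadic_freqs()
```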

Parameters:

Name Type Description Default
sampling

Frame rate and smoothing settings. Default: SamplingConfig().

required
f_min

Minimum frequency in Hz for the CWT band. Default: 0.2.

required
f_max

Maximum frequency in Hz for the CWT band. Default: 5.0.

required
n_freq

Number of frequency bins (dyadically spaced between f_min and f_max). Default: 25.

required
wavelet

PyWavelets wavelet name. Default: "cmor1.5-1.0".

required
log_floor

Floor value for log-power clamping. Default: -3.0.

required
pc_prefix

Column prefix used to auto-detect PC input columns (e.g. "PC0", "PC1", ...). Default: "PC".

required
cols

Explicit list of input column names. If None, columns are auto-detected using pc_prefix. Default: None.

required

Result

Bases: StrictModel, Generic[F]

Reference to a prior feature's output as pipeline input.

Attributes:

Name Type Description
feature F

Feature name whose output to consume.

run_id str | None

Specific run ID, or None for latest finished run.

use_latest

use_latest() -> Self

Return a copy with run_id=None (resolves to latest run).

ResultColumn

Bases: Result[str]

Reference to a column in a feature's standard parquet output.

Attributes:

Name Type Description
feature str

Source feature name.

column str

Column name to extract from the parquet output.

run_id str | None

Specific run ID, or None for latest.

from_result

from_result(result: Result[str]) -> Self

Return a copy with feature and run_id set from another Result.

SpeedAngvel

SpeedAngvel(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Per-sequence feature computing translational speed and angular velocity.

Outputs (per frame):
- speed: displacement magnitude between consecutive frames divided by dt
- angvel: wrapped heading difference (rad) divided by dt
- speed_step / angvel_step: same, but using a configurable step_size (omitted if step_size is None)
- speed_smooth: Savitzky-Golay smoothed speed (polyorder=1), only present when smooth_window is set in Params

Time-delta (dt) computation: Speed and angular velocity require dividing by a time interval. The source for dt is chosen by priority:

  1. frame + fps (recommended for constant-fps video): when fps is set in Params, dt is computed as frame_diff / fps. This is immune to irregular real timestamps that some trackers embed in the time column (e.g. TRex uses wall-clock timestamps that may jitter by several milliseconds per frame). It also correctly handles frame gaps from dropped/bad frames.
  2. time column: if fps is not set but a time column exists, dt is computed from consecutive time differences.
  3. array index: last resort when neither frame+fps nor time is available — assumes each row is one step apart.

For most video-based tracking data, setting fps is strongly recommended to avoid speed artifacts from timestamp jitter.
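The frame+fps path and the core speed/angvel computation can be sketched as follows (assumed logic, not the library's exact code). Note how a dropped frame simply yields a larger dt instead of a speed spike:

```python
import numpy as np

def speed_angvel(x, y, heading, frame, fps):
    dt = np.diff(frame) / fps                   # priority 1: frame + fps
    speed = np.hypot(np.diff(x), np.diff(y)) / dt
    dh = np.diff(heading)
    dh = (dh + np.pi) % (2 * np.pi) - np.pi     # wrap to [-pi, pi]
    angvel = dh / dt
    return speed, angvel

frame = np.array([0, 1, 3])                     # frame 2 was dropped
x = np.array([0.0, 1.0, 3.0])
y = np.zeros(3)
heading = np.array([0.0, 0.1, 0.3])
speed, angvel = speed_angvel(x, y, heading, frame, fps=10.0)
```

Across the gap, twice the displacement over twice the dt still gives the same speed.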

Parameters:

Name Type Description Default
step_size

If set, also compute speed_step / angvel_step using this frame step (in addition to step=1). Default: None.

required
smooth_window

If set, apply Savitzky-Golay smoothing (polyorder=1) over this many frames to produce speed_smooth. Default: None.

required
fps

Frames per second. When set, dt is derived from frame_diff/fps instead of the time column — more robust for constant-fps data with jittery timestamps. Default: None.

required

TemporalStackingFeature

TemporalStackingFeature(inputs: Inputs, params: dict[str, object] | None = None)

Build temporal context windows over per-sequence feature data.

Parameters:

Name Type Description Default
half

Half-width of the temporal window in frames. The full window spans [-half, +half]. Default: 60.

required
skip

Step size between time offsets in the stacking window. Default: 5.

required
use_temporal_stack

If True, concatenate Gaussian-smoothed copies at each time offset. Default: True.

required
sigma_stack

Gaussian sigma (in frames) for smoothing before stacking. 0 disables smoothing. Default: 30.0.

required
add_pool

If True, append pooled statistics (e.g. mean, std) computed over a sliding Gaussian window. Default: True.

required
pool_stats

Tuple of pooled statistics to compute. Supported: "mean", "std", "variance". Default: ("mean",).

required
sigma_pool

Gaussian sigma (in frames) for the pooling window. Default: 30.0.

required
fps

Frames per second; used to convert win_sec to frames. Default: 30.0.

required
win_sec

Pooling window width in seconds. Default: 0.5.

required
pair_filter

Optional NNResult for nearest-neighbor pair filtering during dependency resolution. Default: None.

required
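The stacking step can be sketched in plain numpy (Gaussian smoothing and pooling omitted for brevity; edge handling here is an assumption): concatenate time-shifted copies of a feature series at offsets in [-half, +half] with step `skip`.

```python
import numpy as np

# Minimal sketch of temporal stacking: one column per time offset,
# clamping indices at the sequence edges.
def temporal_stack(x, half=60, skip=5):
    offsets = range(-half, half + 1, skip)
    cols = []
    for off in offsets:
        idx = np.clip(np.arange(len(x)) + off, 0, len(x) - 1)
        cols.append(x[idx])
    return np.stack(cols, axis=1)   # shape (T, n_offsets)

x = np.arange(10, dtype=float)
stacked = temporal_stack(x, half=2, skip=1)
```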

TrajectorySmooth

TrajectorySmooth(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Per-sequence feature that smooths and interpolates trajectory positions.

Pipeline (per individual):
1. Bad-frame detection: flag frames with speed > speed_threshold, expand flagged region by expand_frames in each direction.
2. Interpolation: set positions to NaN at bad frames, linearly interpolate, forward/backward fill edges. Controlled separately for centroid (interpolate_centroid) and pose (interpolate_pose).
3. Savgol smoothing: apply savgol_filter to centroid X/Y and all pose columns (always, regardless of interpolation flags).

Output is the full track DataFrame with smoothed positions replacing originals, plus a bad_frame boolean column. Downstream features can consume this via Inputs((Result(feature="trajectory-smooth"),)).
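Steps 1-2 of the pipeline can be sketched for a single coordinate with pandas (assumed logic for a single individual; the real feature also handles pose columns and fps scaling):

```python
import numpy as np
import pandas as pd

# Sketch: flag high-speed frames, expand the flagged region, NaN them out,
# then linearly interpolate with forward/backward edge fill.
def flag_and_interpolate(x, speed_threshold, expand_frames=2):
    x = pd.Series(x, dtype=float)
    speed = x.diff().abs()
    bad = speed > speed_threshold                      # NaN compares False
    w = 2 * expand_frames + 1
    bad = bad.astype(float).rolling(w, center=True, min_periods=1).max().astype(bool)
    x[bad] = np.nan
    x = x.interpolate(limit_direction="both")          # linear + edge fill
    return x.to_numpy(), bad.to_numpy()

x = [0.0, 1.0, 2.0, 50.0, 4.0, 5.0, 6.0, 7.0]
smoothed, bad = flag_and_interpolate(x, speed_threshold=10.0, expand_frames=1)
```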

Parameters:

Name Type Description Default
speed_threshold

Speed above which a frame is flagged as bad. When fps is set, interpreted as units/sec (e.g. 40 cm/s); otherwise units/frame. Default: None (no bad-frame detection).

required
fps

Frames per second. When provided, speed_threshold is converted from units/sec to units/frame internally. Default: None.

required
interpolate_centroid

If True, replace bad-frame centroid positions with linear interpolation. Default: True.

required
interpolate_pose

If True, replace bad-frame pose keypoint positions with linear interpolation. Default: False.

required
expand_frames

Number of frames to expand the bad-frame region in each direction. Default: 2.

required
savgol_window

Window length for Savitzky-Golay smoothing. Must be odd and >= savgol_polyorder + 1. None disables smoothing. Default: None.

required
savgol_polyorder

Polynomial order for Savitzky-Golay filter. Default: 2.

required

XgboostFeature

XgboostFeature(inputs: Inputs, params: dict[str, object] | None = None)

XGBoost behavior classifier as a pipeline feature.

Trains on labeled templates (from ExtractLabeledTemplates) and runs per-sequence inference. Supports multiclass and one-vs-rest strategies.

Parameters:

Name Type Description Default
model

Pre-fitted XgboostModelArtifact to load (skip training). Default: XgboostModelArtifact().

required
strategy

Classification strategy — "multiclass" trains a single multi-class model; "one_vs_rest" trains one binary classifier per class. Default: "multiclass".

required
decision_threshold

Probability threshold(s) for positive prediction. A float applies to all classes; a dict maps class -> threshold. None uses argmax. Default: None.

required
default_class

Class label assigned when no class exceeds the decision threshold (required).

required
class_weight

If "balanced", adjust sample weights inversely proportional to class frequency. Default: "balanced".

required
use_smote

If True, apply SMOTE oversampling to the training set. Default: False.

required
undersample_ratio

If set, undersample majority classes to this ratio relative to the minority class before SMOTE. Default: None.

required
n_estimators

Number of boosting rounds. Default: 100.

required
max_depth

Maximum tree depth. Default: 6.

required
learning_rate

Boosting learning rate. Default: 0.1.

required
subsample

Fraction of training samples used per tree. Default: 0.8.

required
colsample_bytree

Fraction of features used per tree. Default: 0.8.

required
random_state

Random seed for reproducibility. Default: 42.

required
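The decision_threshold / default_class semantics described above can be sketched as follows (assumed behavior; the per-class dict form of the threshold is omitted for brevity): take the argmax class only if its probability clears the threshold, otherwise fall back to default_class.

```python
import numpy as np

# Hypothetical sketch of thresholded prediction with a fallback class.
def predict_with_threshold(proba, classes, threshold, default_class):
    labels = []
    for row in proba:
        k = int(np.argmax(row))
        labels.append(classes[k] if row[k] >= threshold else default_class)
    return labels

proba = np.array([[0.7, 0.2, 0.1],
                  [0.4, 0.35, 0.25]])
labels = predict_with_threshold(proba, ["chase", "flee", "rest"], 0.5, "rest")
```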

approach_avoidance

ApproachAvoidance feature.

Detects approach-avoidance (AA) events for all C(n,2) unordered pairs per sequence.

Default decision logic follows trajognize AA
  • role-specific speed thresholds (approacher vs avoider)
  • distance threshold
  • cosine thresholds between velocity and pair direction
  • approacher forward-motion gate vs body orientation
  • minimum event continuity (min_event_count of min_event_length frames)

Optional sliding-window averaging can be enabled, but it is OFF by default to preserve trajognize-style framewise behavior.

Output columns (per frame × pair):
- frame, id1, id2 (canonical order: id1 < id2)
- label_id: primary non-directional AA label for visualization compatibility
- aa_event: 1 if either direction is active
- aa_event_12: 1 if id1 approaches and id2 avoids
- aa_event_21: 1 if id2 approaches and id1 avoids
- sequence, group (metadata pass-through)

ApproachAvoidance

ApproachAvoidance(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

'approach-avoidance' — per-sequence AA event detection for all pairs.

For N animals per sequence, evaluates all N*(N-1)/2 unique unordered pairs. The output stores directional events as aa_event_12 and aa_event_21 over canonical (id1,id2), plus aa_event/label_id as non-directional union.

Parameters:

Name Type Description Default
interpolation

Interpolation settings for missing data. Default: InterpolationConfig().

required
sampling

Frame rate and smoothing settings. Default: SamplingConfig().

required
velocity_units

Whether speed thresholds are in "per_frame" or "per_second". Default: "per_frame".

required
angle_units

Unit for heading angles — "radians", "degrees", or "auto" (detect from data range). Default: "radians".

required
consecutive_frame_delta

Expected frame step between consecutive rows; used to detect gaps. Default: 1.0.

required
distance_threshold

Maximum inter-animal distance (in position units) for a frame to be considered AA-eligible. Default: 200.0.

required
approacher_velocity_threshold

Minimum speed of the approaching animal. Default: 5.0.

required
avoider_velocity_threshold

Minimum speed of the avoiding animal. Default: 5.0.

required
cos_approacher_threshold

Minimum cosine between the approacher's velocity vector and the direction toward the partner. Default: 0.8.

required
cos_avoider_threshold

Minimum cosine between the avoider's velocity vector and the direction away from the partner. Default: 0.5.

required
min_event_length

Minimum number of contiguous qualifying frames to form an event. Default: 10.

required
min_event_count

Minimum number of qualifying frames within an event run to keep it. Default: 5.

required
orientation_gate_cos

If set, require the approacher's body orientation to align with its velocity (cos threshold). Default: cos(30°) ≈ 0.866. None disables the gate.

required
smooth_window_sec

If set, apply a sliding-window average (in seconds) to velocities before thresholding. Default: None (disabled; framewise behavior).

required
extract_events staticmethod
extract_events(aa_df: DataFrame, min_duration: int = 1) -> pd.DataFrame

Convert per-frame AA output into a compact event table.

Parameters

aa_df : DataFrame
    Per-frame output with columns: frame, id1, id2, aa_event, aa_event_12, aa_event_21. May span multiple sequences/groups (they are handled independently).
min_duration : int
    Minimum event length in frames. Events shorter than this are discarded.

Returns

DataFrame with columns: id1, id2, start_frame, end_frame, duration, direction ('12' if id1→id2, '21' if id2→id1, 'both'), approacher_id, avoider_id, sequence (if present), group (if present).
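The run-to-event conversion can be illustrated for a single pair and a single flag column (a standalone sketch of the idea, not the method itself): find contiguous runs of 1s and emit (start_frame, end_frame, duration) rows, dropping runs shorter than min_duration.

```python
import numpy as np

# Sketch: per-frame 0/1 flags -> compact event rows for one pair.
def events_from_flags(frames, flags, min_duration=1):
    flags = np.asarray(flags, dtype=bool)
    padded = np.concatenate(([False], flags, [False]))
    d = np.diff(padded.astype(int))
    starts, ends = np.flatnonzero(d == 1), np.flatnonzero(d == -1)
    rows = []
    for s, e in zip(starts, ends):          # e is end-exclusive
        if e - s >= min_duration:
            rows.append((int(frames[s]), int(frames[e - 1]), int(e - s)))
    return rows

frames = np.arange(100, 110)
flags = [0, 1, 1, 1, 0, 0, 1, 0, 0, 0]
events = events_from_flags(frames, flags, min_duration=2)
```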

arhmm

AR-HMM global feature.

Fits an autoregressive Hidden Markov Model on arbitrary upstream feature inputs and produces per-frame syllable (state) labels. This is a native mosaic implementation — no KPMS or JAX dependency.

The feature accepts any combination of upstream Result inputs. Mosaic's manifest system merges them via inner join on alignment columns, so the feature receives a single merged DataFrame whose numeric columns are the union of all input features.

ArHmmFeature

ArHmmFeature(inputs: Inputs, params: dict[str, object] | None = None)

AR-HMM behavioral syllable discovery as a pipeline feature.

Fits an autoregressive Hidden Markov Model across all input sequences and assigns per-frame syllable labels via Viterbi decoding.

Parameters:

Name Type Description Default
model

Pre-fitted ArHmmModelArtifact to load (skip fit). Default: None (fit from scratch).

required
pca_dim

Number of PCA components for dimensionality reduction before fitting. None skips PCA. Default: None.

required
n_states

Maximum number of HMM states (pruned after fit). Default: 50.

required
n_lags

AR order (number of lagged frames as regressors). Default: 1.

required
sticky_weight

Extra pseudo-count on the diagonal of the transition matrix (encourages state persistence). Default: 100.0.

required
n_iter

Maximum EM iterations per restart. Default: 200.

required
tol

Convergence tolerance on relative LL change. Default: 1e-4.

required
n_restarts

Number of random restarts (best LL kept). Default: 1.

required
standardize

If True, z-score features before fitting. Default: True.

required
downsample_rate

Temporal downsampling factor. None disables. Default: None.

required
prune_threshold

Drop states with posterior mass below this fraction. Default: 0.01.

required
random_state

Random seed. Default: 42.

required

ArHmmModelArtifact

Bases: JoblibArtifact[ArHmmModelBundle]

Fitted AR-HMM model bundle (arhmm_model.joblib).

arhmm_model

Autoregressive Hidden Markov Model (AR-HMM) with EM fitting.

A standalone implementation using numpy/scipy — no external HMM library required. Fits switching autoregressive dynamics with sticky transitions via expectation-maximization (EM) and decodes the most-likely state sequence with the Viterbi algorithm.

This module has no mosaic imports and can be tested independently.

ARHMM dataclass

ARHMM(n_states: int = 50, n_lags: int = 1, sticky_weight: float = 100.0, n_iter: int = 200, tol: float = 0.0001, n_restarts: int = 1, random_state: int | None = None, A_: ndarray | None = None, Q_: ndarray | None = None, Q_cho_: list | None = None, Q_logdet_: ndarray | None = None, log_transmat_: ndarray | None = None, log_startprob_: ndarray | None = None, n_features_: int | None = None, active_states_: ndarray | None = None)

Autoregressive Hidden Markov Model.

Each of the K discrete states owns an AR(n_lags) linear model:

x_t = A_k @ [x_{t-1}; ...; x_{t-nlags}; 1] + ε,   ε ~ N(0, Q_k)

Transitions between states are governed by a K × K matrix with a sticky prior that encourages self-transitions (controlled by sticky_weight).

Parameters

n_states : int
    Maximum number of hidden states.
n_lags : int
    AR order (number of lagged frames used as regressors).
sticky_weight : float
    Extra pseudo-count added to the diagonal of the transition matrix during M-step updates. Larger values → states persist longer.
n_iter : int
    Maximum EM iterations per restart.
tol : float
    Convergence threshold on relative change in log-likelihood.
n_restarts : int
    Number of random restarts; the best (highest LL) is kept.
random_state : int | None
    Seed for reproducibility.
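The AR emission model above can be made concrete with a small sketch (illustrative names; not the module's internals): build the stacked regressor [x_{t-1}; ...; x_{t-n_lags}; 1] for each decodable frame and apply a state's A_k matrix.

```python
import numpy as np

# Build the AR design matrix: one row [x_{t-1}; ...; x_{t-nlags}; 1] per
# decodable frame t >= n_lags.
def ar_design(X, n_lags):
    T, D = X.shape
    rows = []
    for t in range(n_lags, T):
        lags = [X[t - l] for l in range(1, n_lags + 1)]
        rows.append(np.concatenate(lags + [np.ones(1)]))
    return np.asarray(rows)          # shape (T - n_lags, D * n_lags + 1)

X = np.array([[0.0], [1.0], [2.0], [3.0]])
Phi = ar_design(X, n_lags=2)
# A_k implementing x_t = x_{t-1} + 1 (a random-walk-with-drift state):
A_k = np.array([[1.0, 0.0, 1.0]])
pred = Phi @ A_k.T                   # state-k predictions for x_2, x_3
```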

fit
fit(sequences: list[ndarray]) -> ARHMM

Fit the AR-HMM via EM on sequences.

Parameters

sequences : list of ndarray, each of shape (T_i, D)
    Feature matrices for each sequence.

Returns

self

predict
predict(X: ndarray) -> np.ndarray

Viterbi decoding → per-frame state labels.

Parameters

X : ndarray, shape (T, D)

Returns

labels : ndarray of int32, shape (T,)
    State assignments. The first n_lags frames are assigned the same state as frame n_lags (the earliest decodable frame).

prune_states
prune_states(sequences: list[ndarray], threshold: float = 0.01) -> None

Drop states whose posterior mass is below threshold.

Re-indexes the remaining states to 0..K'-1.

score
score(X: ndarray) -> float

Log-likelihood of X under the fitted model.

body_scale

BodyScaleFeature feature.

Extracted from features.py as part of feature_library modularization.

BodyScaleFeature

BodyScaleFeature(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Per-frame body scale: median intra-animal pose distance.

Outputs per sequence parquet with columns: frame, id, scale, sequence, group. Intended to be averaged later (per sequence or dataset) to derive a single normalization constant for downstream orientation features.

external

External tool runners for mosaic.

Scripts in this directory bridge mosaic with external packages that have incompatible dependencies or restrictive licenses. They are invoked via subprocess using a separate Python environment.

kpms_protocol

Shared protocol models and wire helpers for the kpms server/client.

Defines the request/response Pydantic models and the newline-delimited JSON framing used over Unix domain sockets. Importable from both the main mosaic environment (client) and the external .venv (server).

Dependencies: pydantic, numpy (available in both environments).

check_latent_dim
check_latent_dim(latent_dim: int, num_keypoints: int) -> None

Raise ValueError if latent_dim exceeds (num_keypoints - 1) * 2.

receive_message
receive_message(conn: socket) -> bytes

Read a single newline-terminated line from conn.

send_message
send_message(conn: socket, message: BaseModel) -> None

Send a newline-terminated JSON message.
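The newline-delimited JSON framing can be sketched with a socket pair (plain dicts stand in for the Pydantic request/response models; the real helpers operate on `BaseModel` instances):

```python
import json
import socket

# Minimal sketch of the wire framing: one JSON object per line.
def send_message(conn, obj):
    conn.sendall(json.dumps(obj).encode() + b"\n")

def receive_message(conn):
    buf = b""
    while not buf.endswith(b"\n"):
        chunk = conn.recv(4096)
        if not chunk:
            break
        buf += chunk
    return buf.rstrip(b"\n")

a, b = socket.socketpair()
send_message(a, {"cmd": "fit", "latent_dim": 8})
msg = json.loads(receive_message(b))
a.close(); b.close()
```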

kpms_server

Persistent subprocess server for keypoint-moseq operations.

Runs in the external .venv (keypoint-moseq environment). Imports JAX and keypoint-moseq once at startup, then serves commands over a Unix domain socket.

Commands: add_track, fit, load_model, apply, save_model, shutdown.

Wire protocol: newline-delimited JSON. Arrays are base64-encoded in the JSON with dtype and shape metadata.

Usage::

.venv/bin/python kpms_server.py /tmp/kpms.sock

prctl_set_pdeathsig
prctl_set_pdeathsig() -> None

Ask the kernel to send SIGTERM when the parent process dies.

recv_request
recv_request(conn: socket) -> Request

Read a newline-terminated JSON request.

serve
serve(server: KpmsServer, conn: socket) -> None

Read commands from conn and dispatch to server handlers.

extract_labeled_templates

ExtractLabeledTemplates

ExtractLabeledTemplates(inputs: Inputs, params: dict[str, object] | None = None)

Extract labeled, split-annotated templates from upstream features.

Streams upstream feature data, aligns ground truth labels from NPZ files, assigns train/test splits by sequence, and subsamples per class. Produces a templates parquet with feature columns + label (int) + split (str).

Parameters:

Name Type Description Default
labels

GroundTruthLabelsSource specifying where to load per-frame ground-truth labels (required).

required
strategy

Template selection method — "random" or "farthest_first". Default: "random".

required
n_per_class

Number of templates per class. An int applies uniformly; a dict maps class -> count. Exactly one of n_per_class or n_total must be set. Default: None.

required
n_total

Total number of templates across all classes (distributed proportionally). Exactly one of n_per_class or n_total must be set. Default: None.

required
pool

PoolConfig controlling candidate pool size and allocation. Default: PoolConfig().

required
test_fraction

Fraction of sequences held out for the test split. Default: 0.2.

required
random_state

Random seed for reproducibility. Default: 42.

required

LabeledProvenanceArtifact

Bases: ParquetArtifact

Per-entry template provenance (template_provenance.parquet).

LabeledTemplatesArtifact

Bases: ParquetArtifact

Labeled template feature vectors (templates.parquet).

Uses numeric_only=False because the parquet contains the str 'split' column alongside numeric feature columns and int 'label'.

extract_templates

ExtractTemplates

ExtractTemplates(inputs: Inputs, params: dict[str, object] | None = None)

Subsample per-sequence data into a representative template matrix.

Entry point for the global feature pipeline. Streams per-sequence inputs, builds a candidate pool with proportional per-entry contribution, and selects templates using the configured strategy.
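The "farthest_first" strategy mentioned below can be sketched as greedy diversity maximization (a hypothetical standalone version seeded with the first candidate for determinism; the real feature seeds via random_state):

```python
import numpy as np

# Greedy farthest-first selection: repeatedly pick the candidate whose
# minimum distance to the already-selected templates is largest.
def farthest_first(X, n_templates):
    chosen = [0]                                     # deterministic seed
    min_d = np.linalg.norm(X - X[0], axis=1)
    while len(chosen) < n_templates:
        nxt = int(np.argmax(min_d))
        chosen.append(nxt)
        min_d = np.minimum(min_d, np.linalg.norm(X - X[nxt], axis=1))
    return chosen

X = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 0.0], [0.0, 10.0]])
picked = farthest_first(X, n_templates=3)
```

The near-duplicate point at index 1 is skipped in favor of the two distant outliers.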

Parameters:

Name Type Description Default
strategy

Template selection method — "random" for uniform random sampling, "farthest_first" for greedy diversity maximization. Default: "random".

required
n_templates

Number of templates to select (required).

required
pool

PoolConfig controlling candidate pool size, allocation strategy, and per-entry caps. Default: PoolConfig().

required
random_state

Random seed for reproducibility. Default: 42.

required
pair_filter

Optional NNResult for nearest-neighbor pair filtering during dependency resolution. Default: None.

required
Params

Bases: Params

ExtractTemplates parameters.

Attributes:

Name Type Description
strategy Literal['random', 'farthest_first']

Selection strategy. Default "random".

n_templates int

Number of templates to select. Required.

pool PoolConfig

Pool configuration. Default PoolConfig().

random_state int

Random seed. Default 42.

ProvenanceArtifact

Bases: ParquetArtifact

Per-entry template provenance (template_provenance.parquet).

TemplatesArtifact

Bases: ParquetArtifact

Template feature vectors (templates.parquet).

feature_template__global

Template for a global feature (clustering, embedding, dimensionality reduction).

Copy this file, rename the class and name, and fill in your logic.

Protocol (4 attributes + 4 methods):
- name, version, parallelizable, scope_dependent
- load_state(run_root, artifact_paths, dependency_lookups) -> bool
- fit(inputs: factory returning iterator of (entry_key, DataFrame)) -> None
- save_state(run_root) -> None
- apply(df: DataFrame) -> DataFrame

Global features are stateful: fit() iterates over all sequences to build a model, save_state() persists it, and load_state() restores it to skip re-fitting. apply() then maps per-sequence data using the fitted model.

Set scope_dependent = False unless outputs change depending on which sequences are in scope (most global features are scope-independent once fitted).

See GlobalTSNE and GlobalWardClustering for real examples.

MyGlobalFeature

MyGlobalFeature(inputs: Inputs, params: dict[str, object] | None = None)

Template for a global feature.

Global features load data from prior feature outputs (via Result-based inputs), run a cross-sequence algorithm in fit(), and persist the model via save_state(). The apply() method maps per-sequence data using the fitted model.

Typical workflow
  1. load_state() checks for a cached model on disk
  2. fit() iterates over all sequences, accumulates data, runs algorithm
  3. save_state() persists the model to run_root
  4. apply() maps per-sequence data using the fitted model
Params

Bases: Params

Global feature template parameters.

Attributes:

Name Type Description
random_state int

Random seed. Default 42.

feature_template__per_sequence

Template for a per-sequence feature.

Copy this file, rename the class and name, and fill in your logic.

Protocol (4 attributes + 4 methods):
- name, version, parallelizable, scope_dependent
- load_state(run_root, artifact_paths, dependency_lookups) -> bool
- fit(inputs: factory returning iterator of (entry_key, DataFrame)) -> None
- save_state(run_root) -> None
- apply(df: DataFrame) -> DataFrame

Per-sequence features are stateless by default: load_state returns True (nothing to restore), fit/save_state are no-ops, and apply does all the work. Set scope_dependent = False unless outputs depend on which sequences are in scope.

See SpeedAngvel for a real per-sequence feature.

MyPerSequenceFeature

MyPerSequenceFeature(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Template for a per-sequence feature.

Input

A DataFrame for a single (group, sequence) from either:

- tracks (input_kind="tracks")
- another feature (input_kind="feature")
- a multi-input Inputs() tuple

Output

A DataFrame with one row per frame (or per frame x pair), with:

- frame (or time)
- group, sequence
- id1, id2 (when pair-aware)
- your feature columns

Params

Bases: Params

Per-sequence feature template parameters.

Attributes:

Name Type Description
window_size int

Sliding window size. Default 15.

apply
apply(df: DataFrame) -> pd.DataFrame

Compute features for a single (group, sequence).

For pair-aware inputs the df may contain multiple (id1, id2) pairs; process each pair independently to avoid mixing contexts.
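One way to keep pair contexts separate — a sketch of the idea, not the library's internal code — is to apply the per-pair logic to each (id1, id2) slice and re-concatenate, so rolling or diff operations never cross pair boundaries:

```python
import pandas as pd

def apply_per_pair(df: pd.DataFrame, fn) -> pd.DataFrame:
    """Apply fn to each (id1, id2) slice separately so windowed/diff
    operations never mix frames from different pairs."""
    parts = [fn(g) for _, g in df.groupby(["id1", "id2"], sort=False)]
    return pd.concat(parts, ignore_index=True)
```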

feral_feature

FeralFeature -- FERAL vision-transformer behavior classifier as a Mosaic pipeline feature.

Supports both training and inference in a single unified feature, following the same global-feature pattern as XgboostFeature and KpmsFeature.

Training mode

Provide video_dir, label_json, and a training config dict. The label_json file must contain class_names, splits (with train and optionally val/test keys), and optionally is_multilabel. Training runs the full FERAL ViT fine-tuning loop with intermediate checkpoints saved to disk for crash recovery. After training, the test split (if present) is automatically evaluated.

Inference mode

Provide model_dir pointing to a directory with model_best.pt and config.json from a previous training run.

Output follows the same pattern as XgboostFeature: per-frame rows with prob_<class> probability columns and a predicted_label column.

Requires the FERAL code directory (https://github.com/Skovorp/feral). Point feral_code_dir to a local clone of the repository.
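The two modes differ only in which params are set. A hedged sketch — the key names follow the Params list documented for this class, but the nested training keys (e.g. num_epochs) are illustrative and should be checked against FeralTrainingConfig:

```python
# Training mode: video_dir + label_json + training config
train_params = {
    "feral_code_dir": "/path/to/feral",   # local clone of github.com/Skovorp/feral
    "video_dir": "/data/crops",
    "label_json": "/data/labels.json",    # must contain class_names + splits
    "training": {"num_epochs": 10},       # hypothetical FeralTrainingConfig keys
}

# Inference mode: model_dir only
infer_params = {
    "feral_code_dir": "/path/to/feral",
    "model_dir": "/runs/feral_run/",      # holds model_best.pt + config.json
}
```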

FeralFeature

FeralFeature(inputs: Inputs, params: dict[str, object] | None = None)

FERAL vision-transformer behavior classifier as a pipeline feature.

Supports two operating modes:

Training mode (video_dir + label_json + training): Runs the full FERAL ViT fine-tuning loop, saves checkpoints, evaluates the test split (if present), then applies to all sequences in the apply phase.

Inference mode (model_dir): Loads a pre-trained FERAL model and runs per-frame behavior classification on crop videos.

Supports two input formats for the apply phase:

  1. InteractionCropPipeline output (pair-level): One row per crop video with video_path, id_a, id_b, target_id, interaction_id, start_frame, end_frame.

  2. EgocentricCrop output (individual-level): One row per frame with target_id, frame. Videos are derived as egocentric_id{target_id}.mp4.

Params

feral_code_dir : Path
    Path to a local clone of https://github.com/Skovorp/feral.
model_name : str
    HuggingFace model name (default: V-JEPA2 ViT-L).
predict_per_item : int
    Predictions per chunk (default 64).
chunk_length : int
    Frames per video chunk (default 64).
chunk_shift : int
    Stride between chunks for overlapping inference (default 32).
chunk_step : int
    Frame sampling step within chunks (default 1).
resize_to : int
    Input resolution for ViT (default 256).
device : str
    PyTorch device (default "cuda").
class_names : dict | None
    Class index -> name mapping. Auto-detected from model config.
decision_threshold : float | None
    Probability threshold for positive class. None uses argmax.
default_class : int
    Fallback class when no class exceeds threshold (default 0).
model_dir : Path | None
    Directory with model_best.pt + config.json (inference mode).
video_dir : Path | None
    Directory containing crop videos (training mode).
label_json : Path | None
    Path to FERAL-format label JSON with splits (training mode).
training : FeralTrainingConfig | None
    Training hyperparameters. None = inference-only mode.

bind_dataset
bind_dataset(ds)

Store dataset reference for resolving media paths.

fit
fit(inputs: InputStream) -> None

Train a FERAL model or verify pre-trained model is loaded.

In training mode (video_dir + label_json + training set), runs the full ViT fine-tuning loop with intermediate checkpoints. After training, evaluates the test split if present.

In inference mode (model_dir set), the model is already loaded by load_state() and this method is not called.

The inputs argument is not consumed -- FERAL reads video files directly from params.video_dir.

FeralTrainingConfig

Bases: StrictModel

Training hyperparameters for FERAL ViT fine-tuning.

These mirror the FERAL default_vjepa.yaml configuration.

ffgroups

FFGroups

FFGroups(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Per-sequence fission-fusion grouping metrics.

Inputs: raw tracks (columns: x, y, id, frame/time, group, sequence).

Outputs per (frame, id):

- group_membership (component label)
- group_size (size of that component)
- event (event id from dp.get_events_info, -1 if not in an event)

Parameters:

distance_cutoff
    Pairwise distance threshold below which two animals are considered in the same group. Default: 50.0.
window_size
    Sliding-window size (frames) for smoothing the pairwise distance matrix before thresholding. Default: 5.
min_event_duration
    Minimum number of contiguous frames for a stable subgroup to be registered as an event. Default: 1.
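The grouping step is, in essence, connected components of each frame's distance graph under distance_cutoff: two animals share a component if a chain of sub-threshold pairwise distances links them. A self-contained sketch of that idea (not the library's implementation, which additionally smooths distances over window_size):

```python
import numpy as np

def group_membership(xy: np.ndarray, distance_cutoff: float) -> np.ndarray:
    """Label each animal with a connected-component id for one frame."""
    n = len(xy)
    # full pairwise distance matrix, shape (n, n)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    adj = d < distance_cutoff
    labels = np.full(n, -1)
    comp = 0
    for seed in range(n):
        if labels[seed] >= 0:
            continue
        stack = [seed]
        labels[seed] = comp
        while stack:                      # flood fill over the adjacency graph
            i = stack.pop()
            for j in np.flatnonzero(adj[i]):
                if labels[j] < 0:
                    labels[j] = comp
                    stack.append(j)
        comp += 1
    return labels
```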

ffgroups_metrics

FFGroupsMetrics

FFGroupsMetrics(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Per-sequence summary of focal-fish group metrics.

Per-frame computed (internal): distance_from_centroid, xrot_to_centroid, yrot_to_centroid, dev_speed_to_mean.

Summaries (output: one row per id within sequence):

- fractime_norm2
- avg_duration_frame
- med_duration_frame
- ftime_periphery
- ftime_periphery_norm

Parameters:

group_col
    Column name that identifies group events (e.g. from FFGroups output). Default: "event".
speed_col
    Column name for speed values. Default: "speed".
time_chunk_sec
    If set, split each sequence into time-based chunks of this duration (seconds) and compute summaries per chunk. Default: None (whole sequence).
frame_chunk
    If set, split each sequence into frame-based chunks of this size and compute summaries per chunk. Default: None.
centroid_heading_col
    Column for centroid heading used in rotation calculations. Default: "centroid_heading".
exclude_cols
    List of boolean column names (e.g. "bad_frame") whose truthy rows are dropped before computation. Default: [].

global_kmeans

GlobalKMeansClustering feature.

Extracted from features.py as part of feature_library modularization.

GlobalKMeansClustering

GlobalKMeansClustering(inputs: Inputs, params: dict[str, object] | None = None)

Global K-Means clustering on templates loaded via load_state. Per-sequence cluster assignment is done in apply().

Parameters:

templates
    Templates artifact to fit on (inherited from GlobalModelParams).
model
    Pre-fitted KMeansModelArtifact to load (skip fit). Default: KMeansModelArtifact().
k
    Number of clusters. Default: 100.
random_state
    Random seed for KMeans initialization. Default: 42.
n_init
    Number of KMeans initializations to run. Default: "auto".
max_iter
    Maximum iterations per KMeans run. Default: 300.
device
    Compute device — "cpu" or "cuda" (requires cuML). Default: "cpu".
label_artifact_points
    If True, assign cluster labels to the template points used for fitting. Default: True.
pair_filter
    Optional NNResult for nearest-neighbor pair filtering during dependency resolution. Default: None.
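The per-sequence assignment done in apply() amounts to a nearest-centroid lookup of each row against the fitted cluster centers. A numpy sketch of that step (illustrative, not the library code):

```python
import numpy as np

def assign_clusters(rows: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Return the index of the nearest cluster center for each row."""
    # squared distances, shape (n_rows, k)
    d2 = ((rows[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)
```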
Params

Bases: GlobalModelParams[KMeansModelArtifact]

Global K-means clustering parameters.

Attributes:

Name Type Description
templates ParquetArtifact | None

Templates artifact to fit on (inherited).

model KMeansModelArtifact | None

Pre-fitted KMeans model artifact (skip fit).

k int

Number of clusters. Default 100.

random_state int

Random seed. Default 42.

n_init Literal['auto'] | int

KMeans initializations. Default "auto".

max_iter int

Max iterations per run. Default 300.

device str

Compute device. Default "cpu".

label_artifact_points bool

Label points used for fitting. Default True.

pair_filter NNResult | None

Nearest-neighbor pair filter for dependency resolution. Default None.

KMeansArtifactLabelsArtifact

Bases: NpzArtifact

Labels for the artifact points used in fitting (artifact_labels.npz).

KMeansClusterCentersArtifact

Bases: NpzArtifact

Cluster center vectors (cluster_centers.npz).

KMeansClusterSizesArtifact

Bases: ParquetArtifact

Per-cluster sample counts (cluster_sizes.parquet).

KMeansModelArtifact

Bases: JoblibArtifact[KMeansModelBundle]

KMeans model (model.joblib).

global_scaler

GlobalScaler

GlobalScaler(inputs: Inputs, params: dict[str, object] | None = None)

Fit a StandardScaler on templates and scale per-sequence data.

Consumes a templates artifact (from ExtractTemplates or any feature producing templates.parquet). Produces a scaler model bundle and scaled templates.

Parameters:

templates
    Templates artifact to fit the scaler on (inherited from GlobalModelParams).
model
    Pre-fitted ScalerModelArtifact to load (skip fit). Default: ScalerModelArtifact().
Params

Bases: GlobalModelParams[ScalerModelArtifact]

GlobalScaler parameters.

Attributes:

Name Type Description
templates ParquetArtifact | None

Templates artifact to fit scaler on.

model ScalerModelArtifact | None

Pre-fitted scaler model artifact (skip fit).

ScaledTemplatesArtifact

Bases: ParquetArtifact

Scaled template vectors (scaled_templates.parquet).

ScalerModelArtifact

Bases: JoblibArtifact[ScalerModelBundle]

Fitted scaler model bundle (scaler.joblib).

global_tsne

GlobalTSNE feature.

GlobalTSNE

GlobalTSNE(inputs: Inputs, params: dict[str, object] | None = None)

Fit an openTSNE embedding on templates and map per-sequence data.

Consumes a templates artifact (from ExtractTemplates, GlobalScaler, or any feature producing templates). Produces an embedding model bundle and template coordinates.

Parameters:

templates
    Templates artifact to fit embedding on (inherited from GlobalModelParams).
model
    Pre-fitted TSNEModelArtifact to load (skip fit). Default: TSNEModelArtifact().
random_state
    Random seed. Default: 42.
perplexity
    t-SNE perplexity parameter. Default: 50.
knn_method
    kNN backend — "annoy", "faiss", or "faiss-gpu". Default: "annoy".
n_jobs
    Number of parallel jobs for openTSNE. Default: 8.
fit
    TSNEFitConfig controlling learning rate, exaggeration iterations, momentum, etc. Default: TSNEFitConfig().
mapping
    TSNEMapConfig controlling partial-embedding parameters (k, iterations, chunk_size, etc.). Default: TSNEMapConfig().
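A parameter sketch showing how the nested fit/mapping configs compose with the top-level keys; the values shown mirror the documented defaults, and the dict shape is illustrative (in practice fit/mapping are TSNEFitConfig/TSNEMapConfig instances):

```python
# Hedged parameter sketch; values mirror the documented defaults.
tsne_params = {
    "random_state": 42,
    "perplexity": 50,
    "knn_method": "annoy",   # or "faiss" / "faiss-gpu"
    "n_jobs": 8,
    # TSNEFitConfig-style fields
    "fit": {"exaggeration_iters": 250, "exaggeration": 12, "iters": 750},
    # TSNEMapConfig-style fields
    "mapping": {"k": 25, "iters": 100, "chunk_size": 50000},
}
```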
Params

Bases: GlobalModelParams[TSNEModelArtifact]

Global t-SNE parameters.

Attributes:

Name Type Description
templates ParquetArtifact | None

Templates artifact to fit embedding on.

model TSNEModelArtifact | None

Pre-fitted embedding model artifact (skip fit).

random_state int

Random seed. Default 42.

perplexity int

t-SNE perplexity. Default 50.

knn_method str

kNN method ("annoy", "faiss", "faiss-gpu"). Default "annoy".

n_jobs int

Parallel jobs for openTSNE. Default 8.

fit TSNEFitConfig

Embedding fitting parameters.

mapping TSNEMapConfig

Partial embedding mapping parameters.

TSNECoordsArtifact

Bases: NpzArtifact

t-SNE coordinates of templates (global_tsne_templates.npz).

TSNEFitConfig

Bases: StrictModel

openTSNE fitting parameters.

Attributes:

Name Type Description
learning_rate float | str

Learning rate ("auto" lets openTSNE compute). Default "auto".

exaggeration_iters int

Early exaggeration phase iterations. Default 250.

exaggeration float

Early exaggeration factor. Default 12.

exaggeration_momentum float

Momentum during early exaggeration. Default 0.5.

iters int

Refinement phase iterations. Default 750.

momentum float

Momentum during refinement. Default 0.8.

TSNEMapConfig

Bases: StrictModel

Parameters for mapping new points into the fitted embedding.

Attributes:

Name Type Description
k int

Neighbors for partial embedding. Default 25.

iters int

Optimization iterations. Default 100.

learning_rate float

Learning rate. Default 1.0.

exaggeration float

Exaggeration factor. Default 2.0.

momentum float

Momentum. Default 0.0.

chunk_size int

Chunk size for large sequences. Default 50000.

TSNEModelArtifact

Bases: JoblibArtifact[TSNEModelBundle]

Fitted t-SNE embedding model (embedding.joblib).

global_ward

GlobalWardClustering feature.

Fits Ward hierarchical linkage on templates, cuts at n_clusters, builds centroids, and assigns per-sequence rows via 1-NN.
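After the linkage tree is cut into n_clusters, the per-sequence step is a 1-NN lookup of each row against the cluster centroids built from the labeled templates. A numpy sketch of that final assignment step (illustrative, not the library code):

```python
import numpy as np

def ward_assign(rows: np.ndarray, templates: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Assign each row the label of its nearest cluster centroid,
    where centroids are per-cluster means of the fitted templates."""
    k = labels.max() + 1
    centroids = np.stack([templates[labels == c].mean(axis=0) for c in range(k)])
    d2 = ((rows[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)
```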

GlobalWardClustering

GlobalWardClustering(inputs: Inputs, params: dict[str, object] | None = None)

Ward hierarchical clustering on templates with per-sequence 1-NN assignment.

Parameters:

templates
    Templates artifact to cluster (inherited from GlobalModelParams).
model
    Pre-fitted WardModelArtifact to load (skip fit). Default: WardModelArtifact().
n_clusters
    Number of clusters to cut from the linkage tree. Default: 20.
method
    Linkage method passed to scipy.cluster.hierarchy.linkage. Default: "ward".
pair_filter
    Optional NNResult for nearest-neighbor pair filtering during dependency resolution. Default: None.
Params

Bases: GlobalModelParams[WardModelArtifact]

Global Ward clustering parameters.

Attributes:

Name Type Description
templates ParquetArtifact | None

Templates artifact to cluster (inherited).

model WardModelArtifact | None

Pre-fitted Ward model artifact (skip fit).

n_clusters int

Number of clusters to cut. Default 20.

method str

Linkage method. Default "ward".

pair_filter NNResult | None

Nearest-neighbor pair filter. Default None.

WardModelArtifact

Bases: JoblibArtifact[WardModelBundle]

Ward linkage model (model.joblib).

helpers

Shared helper functions for feature implementations.

This module contains utility functions used across multiple features in the feature_library to avoid code duplication.

apply_exclude_cols

apply_exclude_cols(df: DataFrame, exclude_cols: list[str] | None) -> pd.DataFrame

Drop rows where any exclude_cols column is truthy.

Silently skips column names not present in df. Returns df unchanged when exclude_cols is empty/None.

clean_animal_track

clean_animal_track(g: DataFrame, data_cols: list[str], order_col: str, config: InterpolationConfig) -> pd.DataFrame

Sort, interpolate, fill, and drop rows with excessive missing data.

clean_tracks_grouped

clean_tracks_grouped(df: DataFrame, group_cols: list[str], data_cols: list[str], order_col: str, config: InterpolationConfig) -> pd.DataFrame

Clean tracks per group, preserving group columns in the result.

Pandas 3.0 excludes group columns from groupby().apply() results. This wrapper uses group_keys=True and resets the index to restore them.

ego_rotate

ego_rotate(dx: ndarray, dy: ndarray, heading: ndarray) -> tuple[np.ndarray, np.ndarray]

Rotate world-frame deltas into ego frame (heading aligned with +x).

ensure_columns

ensure_columns(df: DataFrame, required: list[str]) -> None

Raise ValueError if any required columns are missing from df.

feature_columns

feature_columns(df: DataFrame) -> list[str]

Return the sorted list of numeric feature column names in df.

Excludes standard metadata columns (COLUMNS.meta_set()) and known non-feature columns (id1, id2, entity_level, perspective, fps).

smooth_1d

smooth_1d(x: ndarray, win: int) -> np.ndarray

Moving average with reflected padding.

unwrap_diff

unwrap_diff(theta: ndarray, fps: float) -> np.ndarray

Compute angular velocity from angle array.

wrap_angle

wrap_angle(x: ndarray) -> np.ndarray

Wrap angles to [-pi, pi].
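The angle and geometry helpers above follow standard formulas; these self-contained numpy sketches are consistent with the docstrings, though not necessarily line-for-line with the library code:

```python
import numpy as np

def wrap_angle(x: np.ndarray) -> np.ndarray:
    """Wrap angles to [-pi, pi)."""
    return (x + np.pi) % (2 * np.pi) - np.pi

def unwrap_diff(theta: np.ndarray, fps: float) -> np.ndarray:
    """Angular velocity (rad/s): wrapped frame-to-frame difference times fps."""
    dtheta = wrap_angle(np.diff(theta, prepend=theta[0]))
    return dtheta * fps

def ego_rotate(dx, dy, heading):
    """Rotate world-frame deltas so the focal heading aligns with +x."""
    c, s = np.cos(heading), np.sin(heading)
    return c * dx + s * dy, -s * dx + c * dy
```

For instance, a neighbor one unit "ahead" of a focal facing +y (heading pi/2) maps to ego coordinates (1, 0).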

id_tag_columns

IdTagColumns

IdTagColumns(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Attach per-id label fields (from labels/) to each frame, so they can be merged via Inputs() and used as categories (e.g., focal/nonfocal).

Outputs per row (same granularity as input tracks/feature): frame/time/id/group/sequence + one column per requested label field.

Parameters:

labels
    LabelsSource specifying which labels directory to load. Default: LabelsSource(kind="id_tags").
label_kind
    Label subdirectory name used for dependency resolution. Default: "id_tags".
fields
    List of label field names to attach. None means all fields found in the labels file. Default: None.
field_renames
    Optional mapping of original field names to renamed column names in the output. Default: None.

identity_model

GlobalIdentityModel feature.

Trains a T-Rex-compatible visual identification model from egocentric crop images of individual animals. Uses the V200 CNN architecture to produce weights loadable via T-Rex's visual_identification_model_path setting.

GlobalIdentityModel

GlobalIdentityModel(inputs: Inputs, params: dict[str, object] | None = None)

Train a visual identity model from individual animal sequences.

Takes EgocentricCrop output as input. Each identity is specified as a mapping of identity names to lists of sequences containing that individual alone. Trains a V200 CNN classifier (T-Rex-compatible) and exports weights loadable via visual_identification_model_path.

Example::

ego_result = dataset.run_feature(ego_crop)

identity_model = GlobalIdentityModel(
    Inputs((Result(feature="egocentric-crop"),)),
    params={
        "identities": {
            "mouse_A": ["cage1/day1_mouseA_alone", "cage1/day3_mouseA_alone"],
            "mouse_B": ["cage1/day1_mouseB_alone"],
            "mouse_C": ["cage1/day2_mouseC_alone"],
            "mouse_D": ["cage1/day1_mouseD_alone"],
        },
        "image_size": (128, 128),
        "channels": 1,
    },
)
result = dataset.run_feature(identity_model)

Parameters:

identities
    Explicit identity -> sequences mapping. Keys are identity names, values are lists of "group/sequence" strings.
group_as_identity
    Convenience shortcut -- treat each group name as one identity. Default: False.
image_size
    Crop resize target (height, width). Default: (128, 128).
channels
    Number of image channels (1=grayscale, 3=color). Default: 1.
epochs
    Training epochs. Default: 150.
learning_rate
    Adam learning rate. Default: 0.0001.
batch_size
    Training batch size. Default: 64.
val_split
    Fraction of data reserved for validation. Default: 0.2.
max_images_per_identity
    Cap on images per identity to balance classes. Default: 2000.
export_trex_weights
    Save a T-Rex-loadable .pth file. Default: True.
trex_weights_name
    Stem of the exported .pth file. Default: "identity_model".
Params

Bases: Params

Global identity model parameters.

apply
apply(df: DataFrame) -> pd.DataFrame

Passthrough -- identity predictions are applied by T-Rex, not Mosaic.

kpms

Unified keypoint-MoSeq feature.

Fits an AR-HMM model and applies it to extract per-frame syllable labels, using a persistent subprocess server to avoid repeated JAX startup costs. The kpms package does NOT need to be installed in the mosaic environment -- only in a separate .venv whose interpreter path is passed via kpms_python.

KpmsFeature

KpmsFeature(inputs: Inputs, params: dict[str, object] | None = None)

Unified keypoint-MoSeq feature: fit + apply via persistent subprocess.

Parameters:

model
    Pre-fitted KpmsModelArtifact to load (skip fit). Default: None (fit from scratch).
kpms_python
    Path to a Python interpreter with keypoint-moseq installed. None uses the bundled external .venv. Default: None.
pose
    Pose keypoint configuration (indices, column prefixes). Default: PoseConfig().
anterior_bodyparts
    List of bodypart names forming the anterior reference (required, min 1 element).
posterior_bodyparts
    List of bodypart names forming the posterior reference (required, min 1 element).
fps
    Frames per second of the input data. Default: 30.
num_iters_ar
    Number of AR-only fitting iterations. Default: 50.
num_iters_full
    Number of full model fitting iterations. Default: 500.
kappa_ar
    AR transition concentration parameter. None lets keypoint-moseq choose. Default: None.
kappa_full
    Full-model transition concentration parameter. None lets keypoint-moseq choose. Default: None.
latent_dim
    Dimensionality of the latent pose space. Must satisfy latent_dim < 2 * num_keypoints. Default: 10.
location_aware
    If True, include centroid location in the model. Default: False.
outlier_scale_factor
    Scale factor for outlier detection. Default: 6.0.
remove_outliers
    If True, remove detected outlier frames before fitting. Default: True.
mixed_map_iters
    Number of mixed MAP iterations. None uses the keypoint-moseq default. Default: None.
parallel_message_passing
    Enable parallel message passing. None uses the keypoint-moseq default. Default: None.
resume
    If True, resume fitting from a previously saved checkpoint. Default: True.
downsample_rate
    Temporal downsampling factor applied before fitting. None disables downsampling. Default: None.
save_every_n_iters
    Save a checkpoint every N iterations during fit. Default: 25.
num_iters_apply
    Number of iterations when applying the model to new data. Default: 500.
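A parameter sketch for the fit-from-scratch path; the bodypart names are illustrative, and the latent_dim constraint from the parameter list is checked explicitly:

```python
# Hedged parameter sketch; keypoint names are illustrative.
keypoints = ["nose", "left_ear", "right_ear", "spine1", "spine2", "tail_base"]
kpms_params = {
    "kpms_python": "/path/to/kpms-venv/bin/python",  # interpreter with keypoint-moseq
    "anterior_bodyparts": ["nose"],                  # required, min 1 element
    "posterior_bodyparts": ["tail_base"],            # required, min 1 element
    "fps": 30,
    "num_iters_ar": 50,
    "num_iters_full": 500,
    "latent_dim": 10,
}
# documented constraint: latent_dim < 2 * num_keypoints
assert kpms_params["latent_dim"] < 2 * len(keypoints)
```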

lightning_action_feature

Lightning-action supervised temporal action segmentation feature.

Wraps the lightning-action package (Paninski lab, MIT license) as a mosaic global feature. Trains a temporal neural network classifier (DilatedTCN, RNN, or TemporalMLP) on labeled templates and predicts per-frame action probabilities with temporal context.

Requires the optional lightning-action package::

pip install lightning-action

Or install mosaic with the extra::

pip install mosaic-behavior[lightning-action]

LightningActionFeature

LightningActionFeature(inputs: Inputs, params: dict[str, object] | None = None)

Supervised temporal action segmentation via lightning-action.

Trains a temporal neural network classifier (DilatedTCN, RNN, or TemporalMLP head + linear classifier) on labeled templates and predicts per-frame action probabilities.

Parameters:

model
    Pre-fitted LightningActionModelArtifact to load (skip training). Default: LightningActionModelArtifact().
head
    Temporal encoder architecture — "dtcn" (dilated temporal convolution), "rnn" (LSTM/GRU), or "temporalmlp". Default: "dtcn".
num_hid_units
    Hidden units in the temporal encoder. Default: 64.
num_layers
    Number of encoder layers. Default: 2.
num_lags
    Lag/kernel size for temporal context. Default: 4.
activation
    Activation function. Default: "lrelu".
dropout_rate
    Dropout rate. Default: 0.1.
sequence_length
    Training sequence length (frames per chunk). Default: 500.
num_epochs
    Number of training epochs. Default: 200.
batch_size
    Training batch size. Default: 32.
learning_rate
    Optimizer learning rate. Default: 1e-3.
weight_decay
    Optimizer weight decay. Default: 0.0.
optimizer
    Optimizer type. Default: "Adam".
weight_classes
    If True, weight loss by inverse class frequency. Default: True.
device
    Compute device — "cpu" or "gpu". Default: "cpu".
random_state
    Random seed. Default: 42.
decision_threshold
    Probability threshold(s) for positive prediction. A float applies to all classes; a dict maps class -> threshold. None uses argmax. Default: None.
default_class
    Class label assigned when no class exceeds the decision threshold (required).
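A parameter sketch combining the architecture and thresholding options; the behavior class names in decision_threshold and default_class are illustrative, not part of the library:

```python
# Hedged parameter sketch; class labels are illustrative.
la_params = {
    "head": "dtcn",           # or "rnn" / "temporalmlp"
    "num_hid_units": 64,
    "num_layers": 2,
    "num_lags": 4,
    "sequence_length": 500,
    "num_epochs": 200,
    "learning_rate": 1e-3,
    "weight_classes": True,
    # dict form: per-class probability thresholds
    "decision_threshold": {"groom": 0.6, "rear": 0.5},
    "default_class": "other",  # assigned when nothing crosses its threshold
}
```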

LightningActionModelArtifact

Bases: JoblibArtifact[LightningActionModelBundle]

Fitted lightning-action model bundle.

movement

Movement library integration for mosaic.

Provides bidirectional conversion between mosaic DataFrames and movement xarray Datasets, plus mosaic features that wrap movement's smoothing, filtering, and interpolation functions.

MovementFilterInterpolate

MovementFilterInterpolate(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Filter low-confidence points and interpolate gaps using movement.

Wraps movement.filtering.filter_by_confidence and movement.filtering.interpolate_over_time.

When no confidence columns (poseP0..N) are present, the confidence filter is skipped and only interpolation of existing NaN gaps is performed.

The output is a full track DataFrame with cleaned positions replacing the originals, so downstream features can chain off the result.

MovementSmooth

MovementSmooth(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Smooth trajectory positions using the movement library.

Wraps movement.filtering.rolling_filter and movement.filtering.savgol_filter to smooth X/Y centroid and/or poseX/poseY keypoint positions.

The output is a full track DataFrame with smoothed positions replacing the originals, so downstream features can chain off the result via Inputs((Result(feature="movement-smooth"),)).

from_movement_dataset

from_movement_dataset(ds: Any, original_df: DataFrame, metadata: dict[str, Any], update_confidence: bool = False) -> pd.DataFrame

Merge a movement xarray Dataset back into a mosaic DataFrame.

Overwrites X/Y and poseX/poseY columns in a copy of original_df with the (smoothed/filtered) values from the Dataset.

Parameters

ds : xarray.Dataset movement Dataset with position and confidence data variables. original_df : pd.DataFrame The original mosaic DataFrame to merge into. metadata : dict Metadata returned by to_movement_dataset. update_confidence : bool Whether to also overwrite poseP columns from the Dataset's confidence values. Default False.

Returns

pd.DataFrame Copy of original_df with position columns replaced.

to_movement_dataset

to_movement_dataset(df: DataFrame, fps: float | None = None, keypoint_names: list[str] | None = None, include_centroid: bool = True) -> tuple[Any, dict[str, Any]]

Convert a mosaic tracks DataFrame to a movement xarray Dataset.

Parameters

df : pd.DataFrame Mosaic tracks DataFrame with columns like X, Y, poseX0..N, poseY0..N, id, frame, etc. fps : float, optional Frames per second. If None, the time dimension uses frame numbers. keypoint_names : list[str], optional Names for the pose keypoints. If None, defaults to "keypoint_0", etc. include_centroid : bool Whether to include the centroid (X, Y) as an additional keypoint named "centroid". Default True.

Returns

ds : xarray.Dataset movement poses Dataset with dimensions (time, space, keypoints, individuals). metadata : dict Metadata needed by from_movement_dataset to convert back: individual_ids, frame_index, include_centroid, pose_pairs.

convert

Bidirectional conversion between mosaic DataFrames and movement xarray Datasets.


filter_interp

Movement-based confidence filtering and interpolation feature.


smooth

Movement-based trajectory smoothing feature.


nearestneighbor

NearestNeighbor

NearestNeighbor(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Per-sequence feature computing nearest-neighbor identity and relative kinematics.

Outputs per frame (one row per individual):

- nn_id: id of nearest neighbor (NaN if none)
- nn_delta_x / nn_delta_y: neighbor position minus focal, world frame
- nn_dist: Euclidean distance to nearest neighbor
- nn_delta_angle: neighbor heading minus focal, wrapped to [-pi, pi]
- nn_delta_x_ego / nn_delta_y_ego: neighbor offset in focal ego frame
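Per frame, this computation is a nearest-neighbor search followed by the ego rotation. A numpy sketch for a single frame (illustrative, not the library code):

```python
import numpy as np

def nearest_neighbor_frame(ids, xy, heading):
    """For each individual in one frame: nearest neighbor id, distance,
    and the neighbor offset rotated into the focal ego frame."""
    n = len(ids)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # exclude self from the search
    nn = d.argmin(axis=1)
    delta = xy[nn] - xy                  # world-frame offset to the neighbor
    c, s = np.cos(heading), np.sin(heading)
    dx_ego = c * delta[:, 0] + s * delta[:, 1]
    dy_ego = -s * delta[:, 0] + c * delta[:, 1]
    return ids[nn], d[np.arange(n), nn], dx_ego, dy_ego
```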

nn_delta_bins

NearestNeighborDeltaBins

NearestNeighborDeltaBins(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Bin nearest-neighbor response fields (dangle, dspeed) over neighbor position.

Inputs: expect outputs from nn-delta-response (neighbor_x/neighbor_y in ego frame, dangle, dspeed, group_size, and focal/neighbor category columns).

Output: a tidy DataFrame with mean turn/speed per bin for the focal role and the neighbor role, with columns [group, sequence, exp, trial, role, category, group_size, metric, bin_idx, value].

Parameters:

nbins
    Number of spatial bins along the binning axis. Default: 45.
binmax
    Maximum absolute value for bin edges. Default: 14.0.
max_for_avg
    Maximum neighbor distance used when computing binned-mean responses. Default: 5.0.
antisymm
    If True, use front/back antisymmetric folding for turn-force computation. Default: True.
focal_category_col
    Column name for the focal animal's category flag. Default: "Focal_fish".
neighbor_category_col
    Column name for the neighbor's category flag. Default: "neighbor_focal".
group_size_col
    Column name for group size. Default: "group_size".
exp_col
    Column name for experimental condition. Default: "Exp".
trial_col
    Column name for trial identifier. Default: "Trial".
category_specs
    List of dicts defining derived category columns (keys: source_col, new_col, quantile, op). Default: [].
exclude_cols
    List of boolean column names whose truthy rows are dropped before computation. Default: [].
nonfocal_flag_col
    Column used to flag nonfocal animals. Default: "Focal_fish".
nonfocal_flag_value
    Value in nonfocal_flag_col that marks an animal as nonfocal. Default: False.

nn_delta_response

NearestNeighborDelta

NearestNeighborDelta(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Per-sequence feature that measures how a focal fish changes position/heading/speed over the next diff_numframes frames relative to its nearest neighbor at the current frame.

Expected inputs (via tracks or an Inputs() that merges tracks + the nearest-neighbor feature):

- position/heading/speed columns for the focal (x, y, ANGLE, speed_col)
- nearest-neighbor id column (nn_id_col, default: 'nn_id')
- neighbor offsets in ego frame (nn_delta_x_ego / nn_delta_y_ego); if missing, world offsets (nn_delta_x / nn_delta_y) are rotated using the focal heading

Outputs per focal row (filtered to frames with a valid future sample diff_numframes ahead): frame, id, group, sequence, nn_id, neighbor_x/y (ego), neighbor_focal (if available), dx, dy, dt, dangle (wrapped; optionally scaled by fps), dspeed, plus passthrough columns like group_size/event/Focal_fish when present.
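The wrapping and scaling options compose as follows; `wrap_to_pi` and `dangle` are illustrative helpers, not library functions:

```python
import numpy as np

def wrap_to_pi(a):
    # wrap angle(s) to [-pi, pi]
    return (a + np.pi) % (2 * np.pi) - np.pi

def dangle(heading_now, heading_future, diff_numframes=4, fps=30.0,
           divide_by_frames=True, scale_by_fps=True):
    """Sketch of the heading-change response described above."""
    d = wrap_to_pi(heading_future - heading_now)
    if divide_by_frames:
        d = d / diff_numframes       # per-frame turn rate
    if scale_by_fps:
        d = d * fps                  # convert to radians/sec
    return d
```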

Parameters:

- sampling: Frame rate and smoothing settings. Default: SamplingConfig().
- speed_col: Column name for speed. Default: "SPEED#wcentroid".
- nn_id_col: Column name for the nearest-neighbor ID. Default: "nn_id".
- nn_dx_ego_col: Column for neighbor delta-x in ego frame. Default: "nn_delta_x_ego".
- nn_dy_ego_col: Column for neighbor delta-y in ego frame. Default: "nn_delta_y_ego".
- nn_dx_world_col: Fallback column for neighbor delta-x in world frame (used when ego columns are absent). Default: "nn_delta_x".
- nn_dy_world_col: Fallback column for neighbor delta-y in world frame. Default: "nn_delta_y".
- focal_col: Column name for the focal-animal flag. Default: "Focal_fish".
- diff_numframes: Number of frames ahead to compute the future response delta. Default: 4.
- wrap_angle: If True, wrap heading differences to [-pi, pi]. Default: True.
- divide_dangle_by_frames: If True, divide the heading change by diff_numframes. Default: True.
- scale_dangle_by_fps: If True, multiply dangle by fps to convert to radians/sec. Default: True.
- tag_cols: Additional columns to pass through to the output. Default: [].

orientation_relative

OrientationRelativeFeature feature.

Extracted from features.py as part of feature_library modularization.

OrientationRelativeFeature

OrientationRelativeFeature(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Orientation-aware relative features between animal pairs, order-agnostic to pose points.

For each frame and ordered pair (id_a -> id_b):

- Express B in A's body frame (using heading angle and global scale).
- Emit signed centroid deltas, heading difference, quantiles over B's points in A's frame, and nearest-k distances.

Params

Bases: Params

Orientation-relative feature parameters.

Attributes:

- scale (BodyScaleResult): Body-scale artifact for normalization.
- nearest_k (int): Number of nearest pose-point distances to emit. Default: 3.
- quantiles (list[float]): Distance distribution quantiles to compute. Default: [0.25, 0.5, 0.75].

pair_egocentric

PairEgocentricFeatures feature.

Extracted from features.py as part of feature_library modularization.

PairEgocentricFeatures

PairEgocentricFeatures(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

'pair-egocentric' -- per-sequence egocentric + kinematic features for dyads. Produces a row-wise DataFrame with columns:

- frame (if available) or time passthrough (only if it is the order column)
- perspective: 0 for A->B, 1 for B->A
- id1, id2: pair identifiers
- feature columns (e.g. A_speed, AB_dx_egoA, ...)
- (optionally) group/sequence if present in df, for convenience

This feature is stateless (no fitting). It computes features for all C(n,2) pairs per sequence, cleans/interpolates pose per animal, inner-joins by the chosen order column, and computes A->B and B->A features for each pair.
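The pair enumeration can be sketched as follows (hypothetical helper; the perspective encoding is taken from the output description above):

```python
from itertools import combinations

def pair_perspectives(ids):
    """Yield all C(n,2) unordered pairs, each with two perspectives.

    Illustrative sketch of the A->B / B->A enumeration, not library code.
    """
    for a, b in combinations(sorted(ids), 2):
        yield (a, b, 0)  # perspective 0: A->B
        yield (b, a, 1)  # perspective 1: B->A
```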

Parameters:

Name Type Description Default
interpolation

Interpolation settings for missing pose data. Default: InterpolationConfig().

required
sampling

Frame rate and smoothing settings. Default: SamplingConfig().

required
pose

Pose keypoint configuration (indices, column prefixes). Default: PoseConfig().

required
neck_idx

Index of the neck keypoint in the pose array, used to compute heading direction. Default: 3.

required
tail_base_idx

Index of the tail-base keypoint, paired with neck_idx for heading vector. Default: 6.

required
center_mode

How to compute the animal's center — "mean" averages all keypoints, other values use a specific keypoint. Default: "mean".

required

pair_interaction_filter

PairInteractionFilter -- detect pairwise interaction segments from trajectories.

Identifies frames where pairs of individuals meet configurable distance and angular thresholds. Applies morphological filtering to remove noise and enforces a minimum interaction duration.

Typical use cases
  • Detecting face-to-face interactions (distance + facing criterion)
  • Proximity-based pair detection (distance only, require_facing=False)
  • Pre-filtering for expensive downstream processing (e.g. interaction crops)

All thresholds are parameterized and should be tuned per application.

PairInteractionFilter

PairInteractionFilter(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Detect pairwise interaction segments from trajectory data.

For every unique pair of individuals in a sequence, tests per-frame distance and (optionally) angular criteria, applies morphological filtering, and extracts continuous interaction segments that meet a minimum duration.

Output columns (one row per frame per interaction segment):

- frame: frame number
- id_a, id_b: individual IDs (id_a < id_b by convention)
- interaction_id: integer label for the segment within this pair
- interaction_start: first frame of this segment
- interaction_end: last frame (exclusive) of this segment

Params

- shift_dist (float): Pixel shift along heading before the distance check (default 15). Set to 0 to use raw positions without the forward shift.
- max_dist (float): Maximum shifted-position distance in pixels (default 40).
- require_facing (bool): If True (default), require individuals to face each other (inverse orientation difference < max_inv_orientation_diff_deg). Set to False for distance-only filtering.
- max_inv_orientation_diff_deg (float): Max angle (degrees) between inverse orientations (default 80). Only used when require_facing=True.
- min_run_frames (int): Minimum continuous frames for a valid interaction (default 250).
- frame_padding (int): Frames to pad before/after each segment (default 10).
- morphological_structure_size (int): Structure element length for binary close/open (default 25). Set to 0 to disable morphological filtering.
- px_scale (float): Scale factor applied to shift_dist and max_dist (default 1.0). Use to adjust for videos with different pixel resolutions.
- use_pixel_coords (bool): If True, use poseX/poseY columns (pixel coordinates) for distance calculations instead of X/Y (world coordinates). Default True, since thresholds are in pixel units.
- pose_head_index (int | None): If set and use_pixel_coords is True, use this pose index as the position for distance calculations.
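The morphological filtering and minimum-duration steps might look like this sketch, using scipy.ndimage for the binary close/open (hypothetical helper, not the library's code):

```python
import numpy as np
from scipy.ndimage import binary_closing, binary_opening

def interaction_segments(mask, min_run_frames=250, structure_size=25):
    """Turn a per-frame boolean criterion mask into [start, end) segments.

    Sketch of the morphological-filter + minimum-duration logic above.
    """
    if structure_size > 0:
        structure = np.ones(structure_size, dtype=bool)
        mask = binary_closing(mask, structure)   # bridge short gaps
        mask = binary_opening(mask, structure)   # drop short blips
    # locate rising/falling edges of True runs (end is exclusive)
    edges = np.flatnonzero(np.diff(np.r_[False, mask, False]))
    runs = edges.reshape(-1, 2)
    return [(s, e) for s, e in runs if e - s >= min_run_frames]
```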

pair_position

PairPositionFeatures - egocentric dyadic features using only (x, y, angle).

Drop-in replacement for PairEgocentricFeatures when pose keypoints are not available. Uses the ANGLE column directly for heading instead of computing from neck->tail vector.

Output columns match PairEgocentricFeatures exactly, enabling use with downstream features like PairWavelet.

PairPositionFeatures

PairPositionFeatures(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

'pair-position' -- per-sequence egocentric + kinematic features for all pairs.

Unlike PairEgocentricFeatures which requires full pose keypoints, this feature works with minimal input: just (x, y, angle) per animal.

For N animals per sequence, computes features for all N*(N-1)/2 unique pairs, each with two perspectives (A->B and B->A).

Output columns (per row):

- frame: frame number
- perspective: 0 for A->B, 1 for B->A
- id1, id2: IDs of the two animals in this pair
- A_speed, A_v_para, A_v_perp, A_ang_speed: focal kinematics
- A_heading_cos, A_heading_sin: focal heading
- AB_dist: inter-animal distance
- AB_dx_egoA, AB_dy_egoA: partner position in the focal's egocentric frame
- rel_heading_cos, rel_heading_sin: relative heading
- B_speed, B_v_para, B_v_perp, B_ang_speed: partner kinematics
- (optionally) group, sequence for convenience
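The velocity decomposition behind columns like A_v_para / A_v_perp can be sketched as follows (assumed convention: v_para along the heading, v_perp to the animal's left; not the library's implementation):

```python
import numpy as np

def velocity_in_ego_axes(vx, vy, heading):
    """Project a world-frame velocity onto the focal's body axes."""
    hx, hy = np.cos(heading), np.sin(heading)
    v_para = vx * hx + vy * hy       # component along heading
    v_perp = -vx * hy + vy * hx      # component to the left of heading
    return v_para, v_perp
```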

Parameters:

Name Type Description Default
interpolation

Interpolation settings for missing position data. Default: InterpolationConfig().

required
sampling

Frame rate and smoothing settings. Default: SamplingConfig().

required

pair_wavelet

PairWavelet feature -- CWT spectrograms on PairPoseDistancePCA outputs.

PairWavelet

PairWavelet(inputs: Inputs, params: dict[str, object] | None = None)

CWT spectrograms on PairPoseDistancePCA outputs.

Expects input df to contain columns
  • 'perspective' (0 = A->B, 1 = B->A)
  • 'frame' (preferred) or 'time' (if used as order column)
  • PC0..PC{k-1} (k = number of PCA components)
Returns a DataFrame with columns
  • frame (or time if that was the order col)
  • perspective
  • W_{col}_f{fi} (log-power, clamped, for each component x frequency)
and (optionally) passthrough group/sequence if present in df.

Stateless (no fitting). FPS is inferred from constant df['fps'] if present, otherwise from fps_default. Frequencies are dyadically spaced in [f_min, f_max].
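Assuming "dyadically spaced" means log2-spaced, the frequency grid can be built as:

```python
import numpy as np

def dyadic_freqs(f_min=0.2, f_max=5.0, n_freq=25):
    """Log2-spaced frequencies in [f_min, f_max] (sketch of the CWT band)."""
    return np.logspace(np.log2(f_min), np.log2(f_max), n_freq, base=2.0)
```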

Parameters:

- sampling: Frame rate and smoothing settings. Default: SamplingConfig().
- f_min: Minimum frequency in Hz for the CWT band. Default: 0.2.
- f_max: Maximum frequency in Hz for the CWT band. Default: 5.0.
- n_freq: Number of frequency bins (dyadically spaced between f_min and f_max). Default: 25.
- wavelet: PyWavelets wavelet name. Default: "cmor1.5-1.0".
- log_floor: Floor value for log-power clamping. Default: -3.0.
- pc_prefix: Column prefix used to auto-detect PC input columns (e.g. "PC0", "PC1", ...). Default: "PC".
- cols: Explicit list of input column names. If None, columns are auto-detected using pc_prefix. Default: None.

pairposedistancepca

PairPoseDistancePCA

PairPoseDistancePCA(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

'pair-posedistance-pca' — builds per-frame pairwise pose-distance features and fits an IncrementalPCA globally; outputs PC scores per sequence (and perspective).
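The global fitting strategy can be sketched with scikit-learn's IncrementalPCA (illustrative only; the actual batching and feature construction live inside the feature):

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

def fit_global_pca(batches, n_components=6):
    """Fit one PCA across all sequences via partial_fit on batches.

    Each batch: array of shape (rows, n_distance_features).
    """
    ipca = IncrementalPCA(n_components=n_components)
    for batch in batches:
        ipca.partial_fit(batch)   # incremental global fit
    return ipca
```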

Parameters:

- interpolation: Interpolation settings for missing pose data. Default: InterpolationConfig().
- pose: Pose keypoint configuration (indices, column prefixes). Default: PoseConfig().
- include_intra_A: If True, include intra-animal A pairwise keypoint distances. Default: True.
- include_intra_B: If True, include intra-animal B pairwise keypoint distances. Default: True.
- include_inter: If True, include inter-animal pairwise keypoint distances. Default: True.
- duplicate_perspective: If True, output both A->B and B->A perspectives per pair. Default: True.
- n_components: Number of PCA components to retain. Default: 6.
- batch_size: Batch size for IncrementalPCA partial_fit. Default: 5000.

speed_angvel

SpeedAngvel

SpeedAngvel(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Per-sequence feature computing translational speed and angular velocity.

Outputs (per frame):

- speed: displacement magnitude between consecutive frames divided by dt
- angvel: wrapped heading difference (rad) divided by dt
- speed_step / angvel_step: same, but using a configurable step_size (omitted if step_size is None)
- speed_smooth: Savitzky-Golay smoothed speed (polyorder=1), only present when smooth_window is set in Params

Time-delta (dt) computation: Speed and angular velocity require dividing by a time interval. The source for dt is chosen by priority:

  1. frame + fps (recommended for constant-fps video): when fps is set in Params, dt is computed as frame_diff / fps. This is immune to irregular real timestamps that some trackers embed in the time column (e.g. TRex uses wall-clock timestamps that may jitter by several milliseconds per frame). It also correctly handles frame gaps from dropped/bad frames.
  2. time column: if fps is not set but a time column exists, dt is computed from consecutive time differences.
  3. array index: last resort when neither frame+fps nor time is available — assumes each row is one step apart.

For most video-based tracking data, setting fps is strongly recommended to avoid speed artifacts from timestamp jitter.
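The three-step fallback can be sketched as (hypothetical helper mirroring the priority above):

```python
import numpy as np

def compute_dt(frame=None, time=None, fps=None, n=None):
    """Per-step dt following the priority: frame+fps, time column, index."""
    if fps is not None and frame is not None:
        return np.diff(frame) / fps        # 1. frame + fps (handles gaps)
    if time is not None:
        return np.diff(time)               # 2. time column
    return np.ones(n - 1)                  # 3. assume one step per row
```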

Parameters:

- step_size: If set, also compute speed_step / angvel_step using this frame step (in addition to step=1). Default: None.
- smooth_window: If set, apply Savitzky-Golay smoothing (polyorder=1) over this many frames to produce speed_smooth. Default: None.
- fps: Frames per second. When set, dt is derived from frame_diff/fps instead of the time column, which is more robust for constant-fps data with jittery timestamps. Default: None.

temporal_stacking

Temporal stacking feature.

Builds temporal context windows over per-sequence feature data by stacking Gaussian-smoothed frames at time offsets and optional pooled statistics.
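The stacking idea can be sketched as follows (illustrative only; uses np.roll for shifting, so sequence edges wrap around, whereas a real implementation would likely pad instead):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def temporal_stack(x, half=60, skip=5, sigma=30.0):
    """Stack Gaussian-smoothed copies of a (frames, features) array
    at time offsets in [-half, +half] stepped by skip."""
    sm = gaussian_filter1d(x, sigma=sigma, axis=0) if sigma > 0 else x
    cols = []
    for off in range(-half, half + 1, skip):
        cols.append(np.roll(sm, -off, axis=0))  # value at frame t + off
    return np.concatenate(cols, axis=1)
```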

TemporalStackingFeature

TemporalStackingFeature(inputs: Inputs, params: dict[str, object] | None = None)

Build temporal context windows over per-sequence feature data.

Parameters:

- half: Half-width of the temporal window in frames. The full window spans [-half, +half]. Default: 60.
- skip: Step size between time offsets in the stacking window. Default: 5.
- use_temporal_stack: If True, concatenate Gaussian-smoothed copies at each time offset. Default: True.
- sigma_stack: Gaussian sigma (in frames) for smoothing before stacking. 0 disables smoothing. Default: 30.0.
- add_pool: If True, append pooled statistics (e.g. mean, std) computed over a sliding Gaussian window. Default: True.
- pool_stats: Tuple of pooled statistics to compute. Supported: "mean", "std", "variance". Default: ("mean",).
- sigma_pool: Gaussian sigma (in frames) for the pooling window. Default: 30.0.
- fps: Frames per second; used to convert win_sec to frames. Default: 30.0.
- win_sec: Pooling window width in seconds. Default: 0.5.
- pair_filter: Optional NNResult for nearest-neighbor pair filtering during dependency resolution. Default: None.

trajectory_smooth

TrajectorySmooth

TrajectorySmooth(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Per-sequence feature that smooths and interpolates trajectory positions.

Pipeline (per individual):

1. Bad-frame detection: flag frames with speed > speed_threshold, then expand the flagged region by expand_frames in each direction.
2. Interpolation: set positions to NaN at bad frames, linearly interpolate, and forward/backward fill edges. Controlled separately for centroid (interpolate_centroid) and pose (interpolate_pose).
3. Savgol smoothing: apply savgol_filter to centroid X/Y and all pose columns (always, regardless of interpolation flags).

Output is the full track DataFrame with smoothed positions replacing originals, plus a bad_frame boolean column. Downstream features can consume this via Inputs((Result(feature="trajectory-smooth"),)).
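Steps 1 and 2 of the pipeline can be sketched with pandas (hypothetical helper; the real feature also handles pose columns, edge fills, and the fps conversion):

```python
import numpy as np
import pandas as pd

def flag_and_interp(x, y, speed_threshold, expand_frames=2):
    """Flag high-speed frames, expand the flag, NaN-out, and interpolate."""
    pos = pd.DataFrame({"X": x, "Y": y})
    speed = np.hypot(pos["X"].diff(), pos["Y"].diff())
    bad = speed > speed_threshold
    # expand the flagged region by expand_frames in each direction
    win = 2 * expand_frames + 1
    bad = bad.astype(float).rolling(win, center=True, min_periods=1).max().astype(bool)
    pos[bad] = np.nan
    pos = pos.interpolate(limit_direction="both")   # linear + edge fill
    return pos, bad
```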

Parameters:

- speed_threshold: Speed above which a frame is flagged as bad. When fps is set, interpreted as units/sec (e.g. 40 cm/s); otherwise units/frame. Default: None (no bad-frame detection).
- fps: Frames per second. When provided, speed_threshold is converted from units/sec to units/frame internally. Default: None.
- interpolate_centroid: If True, replace bad-frame centroid positions with linear interpolation. Default: True.
- interpolate_pose: If True, replace bad-frame pose keypoint positions with linear interpolation. Default: False.
- expand_frames: Number of frames to expand the bad-frame region in each direction. Default: 2.
- savgol_window: Window length for Savitzky-Golay smoothing. Must be odd and >= savgol_polyorder + 1. None disables smoothing. Default: None.
- savgol_polyorder: Polynomial order for the Savitzky-Golay filter. Default: 2.

types

InterpolationConfig

Bases: StrictModel

Interpolation parameters for missing pose/position data.

Attributes:

- linear_interp_limit (int): Max consecutive NaN frames to fill via linear interpolation. Default 10, must be >= 1.
- edge_fill_limit (int): Max frames to forward/backward fill at sequence edges. Default 3, must be >= 0.
- max_missing_fraction (float): Rows with a higher fraction of NaN columns are dropped entirely. Default 0.10, range [0, 1].

PoolConfig

Bases: StrictModel

Candidate pool configuration for template extraction.

Controls how per-entry contributions to the candidate pool are allocated before the final template selection step.

Attributes:

- size (int | None): Candidate pool size. For the "random" strategy, defaults to n_templates (pool == output). For "farthest_first", should be larger (e.g. n_templates * 3).
- allocation (Literal['reservoir', 'exact']): How per-entry quotas are computed. "reservoir": weighted reservoir sampling, single pass. "exact": two-pass; the first pass counts rows, the second samples with exact proportional quotas. Default "reservoir".
- max_entry_fraction (float | None): Cap per entry as a fraction of pool size. None means no cap (purely proportional). At runtime, the effective cap is max(max_entry_fraction, 1 / n_entries) so the pool can always be filled completely. Default None.
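The effective-cap rule for max_entry_fraction can be written out directly (illustrative helper):

```python
def effective_cap(max_entry_fraction, n_entries):
    """max(max_entry_fraction, 1 / n_entries); None means no cap at all."""
    if max_entry_fraction is None:
        return None
    return max(max_entry_fraction, 1.0 / n_entries)
```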

SamplingConfig

Bases: StrictModel

Frame rate and temporal smoothing parameters.

Attributes:

- fps_default (float): Fallback frames-per-second when the data does not carry an fps column. Default 30.0, must be > 0.
- smooth_win (int): Moving-average window size applied to pose coordinates before feature computation. 0 disables smoothing. Default 0.

xgboost_feature

XgboostFeature

XgboostFeature(inputs: Inputs, params: dict[str, object] | None = None)

XGBoost behavior classifier as a pipeline feature.

Trains on labeled templates (from ExtractLabeledTemplates) and runs per-sequence inference. Supports multiclass and one-vs-rest strategies.
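The decision_threshold / default_class rule can be sketched as follows (hypothetical helper operating on predict_proba-style output; not the library's code):

```python
import numpy as np

def predict_with_threshold(proba, classes, decision_threshold=None,
                           default_class="other"):
    """Argmax when no threshold is set; otherwise require a class to clear
    its threshold, falling back to default_class when none does."""
    if decision_threshold is None:
        return [classes[i] for i in np.argmax(proba, axis=1)]
    if isinstance(decision_threshold, dict):
        thr = np.array([decision_threshold.get(c, 0.5) for c in classes])
    else:
        thr = np.full(len(classes), decision_threshold)
    out = []
    for row in proba:
        passing = row >= thr
        if passing.any():
            # best class among those clearing their threshold
            out.append(classes[int(np.argmax(np.where(passing, row, -1.0)))])
        else:
            out.append(default_class)
    return out
```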

Parameters:

- model: Pre-fitted XgboostModelArtifact to load (skip training). Default: XgboostModelArtifact().
- strategy: Classification strategy: "multiclass" trains a single multi-class model; "one_vs_rest" trains one binary classifier per class. Default: "multiclass".
- decision_threshold: Probability threshold(s) for positive prediction. A float applies to all classes; a dict maps class -> threshold. None uses argmax. Default: None.
- default_class: Class label assigned when no class exceeds the decision threshold (required).
- class_weight: If "balanced", adjust sample weights inversely proportional to class frequency. Default: "balanced".
- use_smote: If True, apply SMOTE oversampling to the training set. Default: False.
- undersample_ratio: If set, undersample majority classes to this ratio relative to the minority class before SMOTE. Default: None.
- n_estimators: Number of boosting rounds. Default: 100.
- max_depth: Maximum tree depth. Default: 6.
- learning_rate: Boosting learning rate. Default: 0.1.
- subsample: Fraction of training samples used per tree. Default: 0.8.
- colsample_bytree: Fraction of features used per tree. Default: 0.8.
- random_state: Random seed for reproducibility. Default: 42.

XgboostModelArtifact

Bases: JoblibArtifact[XgboostModelBundle]

Fitted XGBoost model bundle (xgboost_model.joblib).