
Feature Library

Mosaic's feature library provides 30+ registered feature implementations organized by output type. Features are composable pipeline stages that read from tracks or upstream feature outputs and produce per-sequence parquet files.

Feature categories

- Per-frame kinematic: SpeedAngvel, BodyScale, OrientationRelative
- Per-frame spatial: PairEgocentric, PairPosition, PairInteractionFilter, ApproachAvoidance
- Per-frame social: NearestNeighbor, FFGroups, FFGroupsMetrics, NNDeltaResponse, NNDeltaBins
- Per-frame context: TemporalStacking, PairWavelet
- Dimensionality reduction: PairPoseDistancePCA, GlobalScaler
- Embedding & clustering: GlobalTSNE, GlobalKMeansClustering, GlobalWardClustering, WardAssign, ExtractTemplates, ExtractLabeledTemplates
- Classification: XgboostFeature, FeralFeature, KpmsFeature

Registry

feature_library

Feature library for behavior datasets.

This module provides a collection of features for behavioral analysis. Features are automatically registered on import via the @register_feature decorator.

All features are automatically loaded when the feature_library is imported, making them available in the global FEATURES registry.

Usage

from mosaic.behavior.feature_library import Inputs, Result
from mosaic.behavior.feature_library.speed_angvel import SpeedAngvel

Track-only feature (default inputs)

feat = SpeedAngvel()
dataset.run_feature(feat)

Feature consuming another feature's output

feat = SpeedAngvel(inputs=Inputs((Result(feature="nn"),)))
dataset.run_feature(feat)

List all registered features

from mosaic.behavior.feature_library.registry import FEATURES
print(list(FEATURES.keys()))

ApproachAvoidance

ApproachAvoidance(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

'approach-avoidance' — per-sequence AA event detection for all pairs.

For N animals per sequence, evaluates all N*(N-1)/2 unique unordered pairs. The output stores directional events as aa_event_12 and aa_event_21 over canonical (id1,id2), plus aa_event/label_id as non-directional union.

Parameters:

- interpolation: Interpolation settings for missing data. Default: InterpolationConfig().
- sampling: Frame rate and smoothing settings. Default: SamplingConfig().
- velocity_units: Whether speed thresholds are in "per_frame" or "per_second". Default: "per_frame".
- angle_units: Unit for heading angles: "radians", "degrees", or "auto" (detect from data range). Default: "radians".
- consecutive_frame_delta: Expected frame step between consecutive rows; used to detect gaps. Default: 1.0.
- distance_threshold: Maximum inter-animal distance (in position units) for a frame to be considered AA-eligible. Default: 200.0.
- approacher_velocity_threshold: Minimum speed of the approaching animal. Default: 5.0.
- avoider_velocity_threshold: Minimum speed of the avoiding animal. Default: 5.0.
- cos_approacher_threshold: Minimum cosine between the approacher's velocity vector and the direction toward the partner. Default: 0.8.
- cos_avoider_threshold: Minimum cosine between the avoider's velocity vector and the direction away from the partner. Default: 0.5.
- min_event_length: Minimum number of contiguous qualifying frames to form an event. Default: 10.
- min_event_count: Minimum number of qualifying frames within an event run to keep it. Default: 5.
- orientation_gate_cos: If set, require the approacher's body orientation to align with its velocity (cos threshold). Default: cos(30°) ≈ 0.866. None disables the gate.
- smooth_window_sec: If set, apply a sliding-window average (in seconds) to velocities before thresholding. Default: None (disabled; framewise behaviour).
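The thresholds above combine into a single per-frame gate. A minimal sketch of that gating logic under the documented defaults (pure Python; the function name and tuple-based signature are illustrative, not the Mosaic API):

```python
import math

def aa_frame_qualifies(
    pos_a, pos_b, vel_a, vel_b,
    distance_threshold=200.0,
    approacher_velocity_threshold=5.0,
    avoider_velocity_threshold=5.0,
    cos_approacher_threshold=0.8,
    cos_avoider_threshold=0.5,
):
    """Return True if A approaches B (and B avoids) on this frame."""
    dx, dy = pos_b[0] - pos_a[0], pos_b[1] - pos_a[1]
    dist = math.hypot(dx, dy)
    if dist == 0 or dist > distance_threshold:
        return False
    speed_a, speed_b = math.hypot(*vel_a), math.hypot(*vel_b)
    if speed_a < approacher_velocity_threshold or speed_b < avoider_velocity_threshold:
        return False
    # cosine between the approacher's velocity and the direction toward the partner
    cos_app = (vel_a[0] * dx + vel_a[1] * dy) / (speed_a * dist)
    # cosine between the avoider's velocity and the direction away from the partner
    cos_avo = (vel_b[0] * dx + vel_b[1] * dy) / (speed_b * dist)
    return cos_app >= cos_approacher_threshold and cos_avo >= cos_avoider_threshold
```

Frames passing this gate are then grouped into runs and filtered by min_event_length / min_event_count.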

extract_events staticmethod

extract_events(aa_df: DataFrame, min_duration: int = 1) -> pd.DataFrame

Convert per-frame AA output into a compact event table.

Parameters

- aa_df (DataFrame): Per-frame output with columns frame, id1, id2, aa_event, aa_event_12, aa_event_21. May span multiple sequences/groups (they are handled independently).
- min_duration (int): Minimum event length in frames. Events shorter than this are discarded.

Returns

DataFrame with columns: id1, id2, start_frame, end_frame, duration, direction ('12' if id1→id2, '21' if id2→id1, 'both'), approacher_id, avoider_id, sequence (if present), group (if present).
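The run-length collapsing that extract_events performs can be sketched for a single pair and sequence (assuming pandas; the real method additionally splits by id pair, sequence, and group, and emits the direction/approacher columns):

```python
import pandas as pd

def runs_to_events(aa_df: pd.DataFrame, min_duration: int = 1) -> pd.DataFrame:
    """Collapse a per-frame boolean aa_event column into (start, end, duration) rows."""
    df = aa_df.sort_values("frame")
    flag = df["aa_event"].astype(bool)
    # a new run starts wherever the flag flips or the frame index jumps
    run_id = ((flag != flag.shift()) | (df["frame"].diff() != 1)).cumsum()
    events = (
        df[flag]
        .groupby(run_id[flag])
        .agg(start_frame=("frame", "min"), end_frame=("frame", "max"))
        .assign(duration=lambda e: e.end_frame - e.start_frame + 1)
        .reset_index(drop=True)
    )
    return events[events.duration >= min_duration].reset_index(drop=True)
```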

ArHmmFeature

ArHmmFeature(inputs: Inputs, params: dict[str, object] | None = None)

AR-HMM behavioral syllable discovery as a pipeline feature.

Fits an autoregressive Hidden Markov Model across all input sequences and assigns per-frame syllable labels via Viterbi decoding.

Parameters:

- model: Pre-fitted ArHmmModelArtifact to load (skip fit). Default: None (fit from scratch).
- pca_dim: Number of PCA components for dimensionality reduction before fitting. None skips PCA. Default: None.
- n_states: Maximum number of HMM states (pruned after fit). Default: 50.
- n_lags: AR order (number of lagged frames used as regressors). Default: 1.
- sticky_weight: Extra pseudo-count on the diagonal of the transition matrix (encourages state persistence). Default: 100.0.
- n_iter: Maximum EM iterations per restart. Default: 200.
- tol: Convergence tolerance on relative log-likelihood change. Default: 1e-4.
- n_restarts: Number of random restarts (best log-likelihood kept). Default: 1.
- standardize: If True, z-score features before fitting. Default: True.
- downsample_rate: Temporal downsampling factor. None disables. Default: None.
- prune_threshold: Drop states with posterior mass below this fraction. Default: 0.01.
- random_state: Random seed. Default: 42.

ArtifactSpec

Bases: Result[str], Generic[L, R]

Reference to a feature artifact with load specification.

Class Type Parameters:

- L: Load spec type (NpzLoadSpec, ParquetLoadSpec, JoblibLoadSpec).
- R: Return type of from_path(). Defaults to object.

Attributes:

- load (L): How to load the matched files.
- pattern (str): Glob pattern. Auto-derived from load.kind when empty.

from_path

from_path(path: Path) -> R

Load artifact from a resolved file path.

Dispatches on load-spec type via load_from_spec(). Return type is determined by the R type parameter.

from_result classmethod

from_result(result: Result[str]) -> Self

Create from a Result, validating feature match.

Typed artifact subclasses (with a default feature) validate that result.feature matches. Base ArtifactSpec passes through.

BodyScaleFeature

BodyScaleFeature(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Per-frame body scale: median intra-animal pose distance.

Outputs per sequence parquet with columns: frame, id, scale, sequence, group. Intended to be averaged later (per sequence or dataset) to derive a single normalization constant for downstream orientation features.
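The per-frame scale value is the median of all pairwise distances among one animal's pose points; a dependency-free sketch (the helper name is illustrative):

```python
import math
from itertools import combinations

def body_scale(points):
    """Median pairwise distance between one animal's pose points on one frame."""
    dists = sorted(math.dist(p, q) for p, q in combinations(points, 2))
    n = len(dists)
    mid = n // 2
    return dists[mid] if n % 2 else 0.5 * (dists[mid - 1] + dists[mid])
```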

ExtractLabeledTemplates

ExtractLabeledTemplates(inputs: Inputs, params: dict[str, object] | None = None)

Extract labeled, split-annotated templates from upstream features.

Streams upstream feature data, aligns ground truth labels from NPZ files, assigns train/test splits by sequence, and subsamples per class. Produces a templates parquet with feature columns + label (int) + split (str).

Parameters:

- labels: GroundTruthLabelsSource specifying where to load per-frame ground-truth labels (required).
- strategy: Template selection method, "random" or "farthest_first". Default: "random".
- n_per_class: Number of templates per class. An int applies uniformly; a dict maps class -> count. Exactly one of n_per_class or n_total must be set. Default: None.
- n_total: Total number of templates across all classes (distributed proportionally). Exactly one of n_per_class or n_total must be set. Default: None.
- pool: PoolConfig controlling candidate pool size and allocation. Default: PoolConfig().
- test_fraction: Fraction of sequences held out for the test split. Default: 0.2.
- random_state: Random seed for reproducibility. Default: 42.

ExtractTemplates

ExtractTemplates(inputs: Inputs, params: dict[str, object] | None = None)

Subsample per-sequence data into a representative template matrix.

Entry point for the global feature pipeline. Streams per-sequence inputs, builds a candidate pool with proportional per-entry contribution, and selects templates using the configured strategy.

Parameters:

- strategy: Template selection method, "random" for uniform random sampling or "farthest_first" for greedy diversity maximization. Default: "random".
- n_templates: Number of templates to select (required).
- pool: PoolConfig controlling candidate pool size, allocation strategy, and per-entry caps. Default: PoolConfig().
- random_state: Random seed for reproducibility. Default: 42.
- pair_filter: Optional NNResult for nearest-neighbor pair filtering during dependency resolution. Default: None.
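A sketch of the "farthest_first" strategy over a small candidate pool (greedy max-min selection; the function is illustrative, not the Mosaic implementation):

```python
import math
import random

def farthest_first(pool, n_templates, random_state=42):
    """Repeatedly pick the candidate farthest from the already-selected set."""
    rng = random.Random(random_state)
    selected = [rng.randrange(len(pool))]
    # distance from each candidate to its nearest selected template
    min_d = [math.dist(p, pool[selected[0]]) for p in pool]
    while len(selected) < n_templates:
        nxt = max(range(len(pool)), key=lambda i: min_d[i])
        selected.append(nxt)
        min_d = [min(d, math.dist(p, pool[nxt])) for d, p in zip(min_d, pool)]
    return selected
```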

Params

Bases: Params

ExtractTemplates parameters.

Attributes:

- strategy (Literal['random', 'farthest_first']): Selection strategy. Default "random".
- n_templates (int): Number of templates to select. Required.
- pool (PoolConfig): Pool configuration. Default PoolConfig().
- random_state (int): Random seed. Default 42.

FFGroups

FFGroups(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Per-sequence fission-fusion grouping metrics.

Inputs: raw tracks (columns: x, y, id, frame/time, group, sequence).

Outputs per (frame, id):
- group_membership: connected-component label
- group_size: size of that component
- event: event id from dp.get_events_info (-1 if not in an event)

Parameters:

- distance_cutoff: Pairwise distance threshold below which two animals are considered in the same group. Default: 50.0.
- window_size: Sliding-window size (frames) for smoothing the pairwise distance matrix before thresholding. Default: 5.
- min_event_duration: Minimum number of contiguous frames for a stable subgroup to be registered as an event. Default: 1.
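Grouping by distance cutoff amounts to single-link connected components over the thresholded distance graph; a sketch on one frame (union-find; names are illustrative):

```python
import math

def group_membership(positions, distance_cutoff=50.0):
    """Label connected components where edges join animals closer than the cutoff."""
    ids = list(positions)
    parent = {i: i for i in ids}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a in ids:
        for b in ids:
            if a < b and math.dist(positions[a], positions[b]) < distance_cutoff:
                parent[find(a)] = find(b)
    return {i: find(i) for i in ids}
```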

FFGroupsMetrics

FFGroupsMetrics(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Per-sequence summary of focal-fish group metrics.

Per-frame computed (internal): distance_from_centroid, xrot_to_centroid, yrot_to_centroid, dev_speed_to_mean.

Summaries (output, one row per id within sequence): fractime_norm2, avg_duration_frame, med_duration_frame, ftime_periphery, ftime_periphery_norm.

Parameters:

- group_col: Column name that identifies group events (e.g. from FFGroups output). Default: "event".
- speed_col: Column name for speed values. Default: "speed".
- time_chunk_sec: If set, split each sequence into time-based chunks of this duration (seconds) and compute summaries per chunk. Default: None (whole sequence).
- frame_chunk: If set, split each sequence into frame-based chunks of this size and compute summaries per chunk. Default: None.
- centroid_heading_col: Column for centroid heading used in rotation calculations. Default: "centroid_heading".
- exclude_cols: List of boolean column names (e.g. "bad_frame") whose truthy rows are dropped before computation. Default: [].

Feature

Bases: Protocol

Feature protocol -- 4 attributes, 4 methods.

FeralFeature

FeralFeature(inputs: Inputs, params: dict[str, object] | None = None)

FERAL vision-transformer behavior classifier as a pipeline feature.

Supports two operating modes:

Training mode (video_dir + label_json + training): Runs the full FERAL ViT fine-tuning loop, saves checkpoints, evaluates the test split (if present), then applies to all sequences in the apply phase.

Inference mode (model_dir): Loads a pre-trained FERAL model and runs per-frame behavior classification on crop videos.

Supports two input formats for the apply phase:

  1. InteractionCropPipeline output (pair-level): One row per crop video with video_path, id_a, id_b, target_id, interaction_id, start_frame, end_frame.

  2. EgocentricCrop output (individual-level): One row per frame with target_id, frame. Videos are derived as egocentric_id{target_id}.mp4.

Params

- feral_code_dir (Path): Path to a local clone of https://github.com/Skovorp/feral.
- model_name (str): HuggingFace model name (default: V-JEPA2 ViT-L).
- predict_per_item (int): Predictions per chunk (default 64).
- chunk_length (int): Frames per video chunk (default 64).
- chunk_shift (int): Stride between chunks for overlapping inference (default 32).
- chunk_step (int): Frame sampling step within chunks (default 1).
- resize_to (int): Input resolution for the ViT (default 256).
- device (str): PyTorch device (default "cuda").
- class_names (dict | None): Class index -> name mapping. Auto-detected from model config.
- decision_threshold (float | None): Probability threshold for the positive class. None uses argmax.
- default_class (int): Fallback class when no class exceeds the threshold (default 0).
- model_dir (Path | None): Directory with model_best.pt + config.json (inference mode).
- video_dir (Path | None): Directory containing crop videos (training mode).
- label_json (Path | None): Path to FERAL-format label JSON with splits (training mode).
- training (FeralTrainingConfig | None): Training hyperparameters. None = inference-only mode.

bind_dataset

bind_dataset(ds)

Store dataset reference for resolving media paths.

fit

fit(inputs: InputStream) -> None

Train a FERAL model or verify pre-trained model is loaded.

In training mode (video_dir + label_json + training set), runs the full ViT fine-tuning loop with intermediate checkpoints. After training, evaluates the test split if present.

In inference mode (model_dir set), the model is already loaded by load_state() and this method is not called.

The inputs argument is not consumed -- FERAL reads video files directly from params.video_dir.

FeralTrainingConfig

Bases: StrictModel

Training hyperparameters for FERAL ViT fine-tuning.

These mirror the FERAL default_vjepa.yaml configuration.

GlobalIdentityModel

GlobalIdentityModel(inputs: Inputs, params: dict[str, object] | None = None)

Train a visual identity model from individual animal sequences.

Takes EgocentricCrop output as input. Each identity is specified as a mapping of identity names to lists of sequences containing that individual alone. Trains a V200 CNN classifier (T-Rex-compatible) and exports weights loadable via visual_identification_model_path.

Example::

ego_result = dataset.run_feature(ego_crop)

identity_model = GlobalIdentityModel(
    Inputs((Result(feature="egocentric-crop"),)),
    params={
        "identities": {
            "mouse_A": ["cage1/day1_mouseA_alone", "cage1/day3_mouseA_alone"],
            "mouse_B": ["cage1/day1_mouseB_alone"],
            "mouse_C": ["cage1/day2_mouseC_alone"],
            "mouse_D": ["cage1/day1_mouseD_alone"],
        },
        "image_size": (128, 128),
        "channels": 1,
    },
)
result = dataset.run_feature(identity_model)

Parameters:

- identities: Explicit identity -> sequences mapping. Keys are identity names, values are lists of "group/sequence" strings.
- group_as_identity: Convenience shortcut: treat each group name as one identity. Default False.
- image_size: Crop resize target (height, width). Default (128, 128).
- channels: Number of image channels (1=grayscale, 3=color). Default 1.
- epochs: Training epochs. Default 150.
- learning_rate: Adam learning rate. Default 0.0001.
- batch_size: Training batch size. Default 64.
- val_split: Fraction of data reserved for validation. Default 0.2.
- max_images_per_identity: Cap on images per identity to balance classes. Default 2000.
- export_trex_weights: Save a T-Rex-loadable .pth file. Default True.
- trex_weights_name: Stem of the exported .pth file. Default "identity_model".

Params

Bases: Params

Global identity model parameters.

apply

apply(df: DataFrame) -> pd.DataFrame

Passthrough -- identity predictions are applied by T-Rex, not Mosaic.

GlobalKMeansClustering

GlobalKMeansClustering(inputs: Inputs, params: dict[str, object] | None = None)

Global K-Means clustering on templates loaded via load_state. Per-sequence cluster assignment is done in apply().

Parameters:

- templates: Templates artifact to fit on (inherited from GlobalModelParams).
- model: Pre-fitted KMeansModelArtifact to load (skip fit). Default: KMeansModelArtifact().
- k: Number of clusters. Default: 100.
- random_state: Random seed for KMeans initialization. Default: 42.
- n_init: Number of KMeans initializations to run. Default: "auto".
- max_iter: Maximum iterations per KMeans run. Default: 300.
- device: Compute device, "cpu" or "cuda" (requires cuML). Default: "cpu".
- label_artifact_points: If True, assign cluster labels to the template points used for fitting. Default: True.
- pair_filter: Optional NNResult for nearest-neighbor pair filtering during dependency resolution. Default: None.
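The apply() phase reduces to nearest-centroid assignment of per-sequence rows; a dependency-free sketch (illustrative only; the real feature delegates to the fitted scikit-learn/cuML model):

```python
import math

def assign_clusters(points, centroids):
    """Label each per-sequence point with the index of its nearest fitted centroid."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]
```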

Params

Bases: GlobalModelParams[KMeansModelArtifact]

Global K-means clustering parameters.

Attributes:

- templates (ParquetArtifact | None): Templates artifact to fit on (inherited).
- model (KMeansModelArtifact | None): Pre-fitted KMeans model artifact (skip fit).
- k (int): Number of clusters. Default 100.
- random_state (int): Random seed. Default 42.
- n_init (Literal['auto'] | int): KMeans initializations. Default "auto".
- max_iter (int): Max iterations per run. Default 300.
- device (str): Compute device. Default "cpu".
- label_artifact_points (bool): Label points used for fitting. Default True.
- pair_filter (NNResult | None): Nearest-neighbor pair filter for dependency resolution. Default None.

GlobalModelParams

Bases: Params, Generic[M]

Base params for global features that fit on a templates artifact or load a pre-fitted model.

Type parameter M is the model artifact type (must extend JoblibArtifact). Exactly one of templates or model must be provided.

Both fields use default_factory so that from_overrides() merges partial dicts correctly. The _exclusive_source validator checks model_fields_set and nulls out the field that was not provided.

Attributes:

- templates (ParquetArtifact | None): Templates artifact to fit from. Mutually exclusive with model.
- model (M | None): Pre-fitted model artifact. Mutually exclusive with templates.
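The exactly-one-of constraint can be sketched without pydantic (in Mosaic it is the _exclusive_source validator over model_fields_set):

```python
def resolve_source(templates=None, model=None):
    """Enforce that exactly one of templates / model is provided."""
    if (templates is None) == (model is None):
        raise ValueError("provide exactly one of 'templates' or 'model'")
    # fit from templates, or load the pre-fitted model
    return ("fit", templates) if templates is not None else ("load", model)
```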

GlobalScaler

GlobalScaler(inputs: Inputs, params: dict[str, object] | None = None)

Fit a StandardScaler on templates and scale per-sequence data.

Consumes a templates artifact (from ExtractTemplates or any feature producing templates.parquet). Produces a scaler model bundle and scaled templates.

Parameters:

- templates: Templates artifact to fit the scaler on (inherited from GlobalModelParams).
- model: Pre-fitted ScalerModelArtifact to load (skip fit). Default: ScalerModelArtifact().

Params

Bases: GlobalModelParams[ScalerModelArtifact]

GlobalScaler parameters.

Attributes:

- templates (ParquetArtifact | None): Templates artifact to fit scaler on.
- model (ScalerModelArtifact | None): Pre-fitted scaler model artifact (skip fit).
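The fit/apply split mirrors a plain standard scaler: statistics come from the templates only, then every per-sequence row is transformed with them. A dependency-free sketch (illustrative, not the StandardScaler API):

```python
import statistics

def fit_scaler(templates):
    """Fit per-column mean/std on the template matrix."""
    cols = list(zip(*templates))
    return (
        [statistics.fmean(c) for c in cols],
        [statistics.pstdev(c) or 1.0 for c in cols],  # guard constant columns
    )

def transform(rows, mean, std):
    """Scale per-sequence rows with the template-fitted statistics."""
    return [[(v - m) / s for v, m, s in zip(r, mean, std)] for r in rows]
```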

GlobalTSNE

GlobalTSNE(inputs: Inputs, params: dict[str, object] | None = None)

Fit an openTSNE embedding on templates and map per-sequence data.

Consumes a templates artifact (from ExtractTemplates, GlobalScaler, or any feature producing templates). Produces an embedding model bundle and template coordinates.

Parameters:

- templates: Templates artifact to fit embedding on (inherited from GlobalModelParams).
- model: Pre-fitted TSNEModelArtifact to load (skip fit). Default: TSNEModelArtifact().
- random_state: Random seed. Default: 42.
- perplexity: t-SNE perplexity parameter. Default: 50.
- knn_method: kNN backend: "annoy", "faiss", or "faiss-gpu". Default: "annoy".
- n_jobs: Number of parallel jobs for openTSNE. Default: 8.
- fit: TSNEFitConfig controlling learning rate, exaggeration iterations, momentum, etc. Default: TSNEFitConfig().
- mapping: TSNEMapConfig controlling partial-embedding parameters (k, iterations, chunk_size, etc.). Default: TSNEMapConfig().

Params

Bases: GlobalModelParams[TSNEModelArtifact]

Global t-SNE parameters.

Attributes:

- templates (ParquetArtifact | None): Templates artifact to fit embedding on.
- model (TSNEModelArtifact | None): Pre-fitted embedding model artifact (skip fit).
- random_state (int): Random seed. Default 42.
- perplexity (int): t-SNE perplexity. Default 50.
- knn_method (str): kNN method ("annoy", "faiss", "faiss-gpu"). Default "annoy".
- n_jobs (int): Parallel jobs for openTSNE. Default 8.
- fit (TSNEFitConfig): Embedding fitting parameters.
- mapping (TSNEMapConfig): Partial embedding mapping parameters.

GlobalWardClustering

GlobalWardClustering(inputs: Inputs, params: dict[str, object] | None = None)

Ward hierarchical clustering on templates with per-sequence 1-NN assignment.

Parameters:

- templates: Templates artifact to cluster (inherited from GlobalModelParams).
- model: Pre-fitted WardModelArtifact to load (skip fit). Default: WardModelArtifact().
- n_clusters: Number of clusters to cut from the linkage tree. Default: 20.
- method: Linkage method passed to scipy.cluster.hierarchy.linkage. Default: "ward".
- pair_filter: Optional NNResult for nearest-neighbor pair filtering during dependency resolution. Default: None.

Params

Bases: GlobalModelParams[WardModelArtifact]

Global Ward clustering parameters.

Attributes:

- templates (ParquetArtifact | None): Templates artifact to cluster (inherited).
- model (WardModelArtifact | None): Pre-fitted Ward model artifact (skip fit).
- n_clusters (int): Number of clusters to cut. Default 20.
- method (str): Linkage method. Default "ward".
- pair_filter (NNResult | None): Nearest-neighbor pair filter. Default None.

GroundTruthLabelsSource

Bases: LabelsSource[Literal['behavior']]

Labels loaded from labels/&lt;kind&gt;/index.csv.

IdTagColumns

IdTagColumns(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Attach per-id label fields (from labels/) to each frame, so they can be merged via Inputs() and used as categories (e.g., focal/nonfocal).

Outputs per row (same granularity as input tracks/feature): frame/time/id/group/sequence + one column per requested label field.

Parameters:

- labels: LabelsSource specifying which labels directory to load. Default: LabelsSource(kind="id_tags").
- label_kind: Label subdirectory name used for dependency resolution. Default: "id_tags".
- fields: List of label field names to attach. None means all fields found in the labels file. Default: None.
- field_renames: Optional mapping of original field names to renamed column names in the output. Default: None.

Inputs

Bases: RootModel[tuple[InputItem, ...]], Generic[InputItem]

Base class for feature input collections. Mirrors Params.

Each Feature subclasses to narrow allowed input types, paralleling class Params(Params):.

Examples:

Inputs(("tracks",))
Inputs((Result(feature="speed-angvel"),))
Inputs(("tracks", Result(feature="nn", run_id="0.1-abc")))

Per-feature narrowing

class Inputs(Inputs[TrackInput]):
    pass

Features that take no pipeline inputs

class Inputs(Inputs[Result]):
    _require: ClassVar[InputRequire] = "empty"

Self-loading features that optionally accept inputs (e.g. fit + assign):

class Inputs(Inputs[Result]):
    _require: ClassVar[InputRequire] = "any"

InputsLike

Bases: Protocol

Read-only interface satisfied by any Inputs[InputItem].

KpmsFeature

KpmsFeature(inputs: Inputs, params: dict[str, object] | None = None)

Unified keypoint-MoSeq feature: fit + apply via persistent subprocess.

Parameters:

- model: Pre-fitted KpmsModelArtifact to load (skip fit). Default: None (fit from scratch).
- kpms_python: Path to a Python interpreter with keypoint-moseq installed. None uses the bundled external .venv. Default: None.
- pose: Pose keypoint configuration (indices, column prefixes). Default: PoseConfig().
- anterior_bodyparts: List of bodypart names forming the anterior reference (required, min 1 element).
- posterior_bodyparts: List of bodypart names forming the posterior reference (required, min 1 element).
- fps: Frames per second of the input data. Default: 30.
- num_iters_ar: Number of AR-only fitting iterations. Default: 50.
- num_iters_full: Number of full model fitting iterations. Default: 500.
- kappa_ar: AR transition concentration parameter. None lets keypoint-moseq choose. Default: None.
- kappa_full: Full-model transition concentration parameter. None lets keypoint-moseq choose. Default: None.
- latent_dim: Dimensionality of the latent pose space. Must satisfy latent_dim < 2 * num_keypoints. Default: 10.
- location_aware: If True, include centroid location in the model. Default: False.
- outlier_scale_factor: Scale factor for outlier detection. Default: 6.0.
- remove_outliers: If True, remove detected outlier frames before fitting. Default: True.
- mixed_map_iters: Number of mixed MAP iterations. None uses the keypoint-moseq default. Default: None.
- parallel_message_passing: Enable parallel message passing. None uses the keypoint-moseq default. Default: None.
- resume: If True, resume fitting from a previously saved checkpoint. Default: True.
- downsample_rate: Temporal downsampling factor applied before fitting. None disables downsampling. Default: None.
- save_every_n_iters: Save a checkpoint every N iterations during fit. Default: 25.
- num_iters_apply: Number of iterations when applying the model to new data. Default: 500.

LightningActionFeature

LightningActionFeature(inputs: Inputs, params: dict[str, object] | None = None)

Supervised temporal action segmentation via lightning-action.

Trains a temporal neural network classifier (DilatedTCN, RNN, or TemporalMLP head + linear classifier) on labeled templates and predicts per-frame action probabilities.

Parameters:

- model: Pre-fitted LightningActionModelArtifact to load (skip training). Default: LightningActionModelArtifact().
- head: Temporal encoder architecture: "dtcn" (dilated temporal convolution), "rnn" (LSTM/GRU), or "temporalmlp". Default: "dtcn".
- num_hid_units: Hidden units in the temporal encoder. Default: 64.
- num_layers: Number of encoder layers. Default: 2.
- num_lags: Lag/kernel size for temporal context. Default: 4.
- activation: Activation function. Default: "lrelu".
- dropout_rate: Dropout rate. Default: 0.1.
- sequence_length: Training sequence length (frames per chunk). Default: 500.
- num_epochs: Number of training epochs. Default: 200.
- batch_size: Training batch size. Default: 32.
- learning_rate: Optimizer learning rate. Default: 1e-3.
- weight_decay: Optimizer weight decay. Default: 0.0.
- optimizer: Optimizer type. Default: "Adam".
- weight_classes: If True, weight loss by inverse class frequency. Default: True.
- device: Compute device: "cpu" or "gpu". Default: "cpu".
- random_state: Random seed. Default: 42.
- decision_threshold: Probability threshold(s) for positive prediction. A float applies to all classes; a dict maps class -> threshold. None uses argmax. Default: None.
- default_class: Class label assigned when no class exceeds the decision threshold (required).
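The decision_threshold / default_class interaction can be sketched as follows (illustrative helper, not the lightning-action API: a float or per-class dict gates the classes, then argmax runs among those that pass, falling back to default_class):

```python
def pick_class(probs, decision_threshold=None, default_class=0):
    """Per-frame class decision: plain argmax, or thresholded with a fallback."""
    if decision_threshold is None:
        return max(range(len(probs)), key=probs.__getitem__)
    if isinstance(decision_threshold, dict):
        # classes without an entry effectively never pass
        passing = [c for c, p in enumerate(probs) if p >= decision_threshold.get(c, 1.0)]
    else:
        passing = [c for c, p in enumerate(probs) if p >= decision_threshold]
    if not passing:
        return default_class
    return max(passing, key=probs.__getitem__)
```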

NearestNeighbor

NearestNeighbor(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Per-sequence feature computing nearest-neighbor identity and relative kinematics.

Outputs per frame (one row per individual):
- nn_id: id of nearest neighbor (NaN if none)
- nn_delta_x / nn_delta_y: neighbor position minus focal, world frame
- nn_dist: Euclidean distance to nearest neighbor
- nn_delta_angle: neighbor heading minus focal, wrapped to [-pi, pi]
- nn_delta_x_ego / nn_delta_y_ego: neighbor offset in focal ego frame
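A sketch of how one such row could be computed from per-frame positions and headings (the ego-frame rotation sign convention here is an assumption, not taken from the Mosaic source):

```python
import math

def nearest_neighbor_row(focal_id, positions, headings):
    """One NearestNeighbor-style output row for the focal animal."""
    fx, fy = positions[focal_id]
    others = [i for i in positions if i != focal_id]
    if not others:
        return None
    nn_id = min(others, key=lambda i: math.dist(positions[i], (fx, fy)))
    dx, dy = positions[nn_id][0] - fx, positions[nn_id][1] - fy
    theta = headings[focal_id]
    # rotate the world-frame offset into the focal's ego frame
    dx_ego = math.cos(theta) * dx + math.sin(theta) * dy
    dy_ego = -math.sin(theta) * dx + math.cos(theta) * dy
    # heading difference wrapped to [-pi, pi]
    dangle = (headings[nn_id] - theta + math.pi) % (2 * math.pi) - math.pi
    return {
        "nn_id": nn_id,
        "nn_delta_x": dx, "nn_delta_y": dy,
        "nn_dist": math.hypot(dx, dy),
        "nn_delta_angle": dangle,
        "nn_delta_x_ego": dx_ego, "nn_delta_y_ego": dy_ego,
    }
```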

NearestNeighborDelta

NearestNeighborDelta(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Per-sequence feature that measures how a focal fish changes position/heading/speed over the next diff_numframes frames relative to its nearest neighbor at the current frame.

Expected inputs (via tracks or an Inputs() that merges tracks + the nearest-neighbor feature):
- position/heading/speed columns for the focal (x, y, ANGLE, speed_col)
- nearest-neighbor id column (nn_id_col, default: 'nn_id')
- neighbor offsets in ego frame (nn_delta_x_ego / nn_delta_y_ego); if missing, world offsets (nn_delta_x / nn_delta_y) are rotated using the focal heading.

Outputs per focal row (filtered to frames with a valid future sample diff_numframes ahead): frame, id, group, sequence, nn_id, neighbor_x/y (ego), neighbor_focal (if available), dx, dy, dt, dangle (wrapped; optionally scaled by fps), dspeed, plus passthrough columns like group_size/event/Focal_fish when present.

Parameters:

- sampling: Frame rate and smoothing settings. Default: SamplingConfig().
- speed_col: Column name for speed. Default: "SPEED#wcentroid".
- nn_id_col: Column name for the nearest-neighbor ID. Default: "nn_id".
- nn_dx_ego_col: Column for neighbor delta-x in ego frame. Default: "nn_delta_x_ego".
- nn_dy_ego_col: Column for neighbor delta-y in ego frame. Default: "nn_delta_y_ego".
- nn_dx_world_col: Fallback column for neighbor delta-x in world frame (used when ego columns are absent). Default: "nn_delta_x".
- nn_dy_world_col: Fallback column for neighbor delta-y in world frame. Default: "nn_delta_y".
- focal_col: Column name for the focal-animal flag. Default: "Focal_fish".
- diff_numframes: Number of frames ahead to compute the future response delta. Default: 4.
- wrap_angle: If True, wrap heading differences to [-pi, pi]. Default: True.
- divide_dangle_by_frames: If True, divide the heading change by diff_numframes. Default: True.
- scale_dangle_by_fps: If True, multiply dangle by fps to convert to radians/sec. Default: True.
- tag_cols: Additional columns to pass through to the output. Default: [].
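The dangle post-processing options compose in sequence: wrap, normalize per frame, then scale to a rate. A sketch of that chain for a single focal sample (illustrative helper, not the Mosaic API):

```python
import math

def response_delta(angle_now, angle_future, diff_numframes=4, fps=30.0,
                   wrap_angle=True, divide_dangle_by_frames=True,
                   scale_dangle_by_fps=True):
    """Heading change over the next diff_numframes frames, optionally as a rate."""
    dangle = angle_future - angle_now
    if wrap_angle:
        dangle = (dangle + math.pi) % (2 * math.pi) - math.pi
    if divide_dangle_by_frames:
        dangle /= diff_numframes  # per-frame turn
    if scale_dangle_by_fps:
        dangle *= fps  # radians per second
    return dangle
```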

NearestNeighborDeltaBins

NearestNeighborDeltaBins(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Bin nearest-neighbor response fields (dangle, dspeed) over neighbor position.

Inputs: expect outputs from nn-delta-response (neighbor_x/neighbor_y in ego frame, dangle, dspeed, group_size, and focal/neighbor category columns).

Outputs a tidy DataFrame with mean turn/speed per bin for the focal role and the neighbor role, with columns: [group, sequence, exp, trial, role, category, group_size, metric, bin_idx, value].

Parameters:

- nbins: Number of spatial bins along the binning axis. Default: 45.
- binmax: Maximum absolute value for bin edges. Default: 14.0.
- max_for_avg: Maximum neighbor distance used when computing binned-mean responses. Default: 5.0.
- antisymm: If True, use front/back antisymmetric folding for turn-force computation. Default: True.
- focal_category_col: Column name for the focal animal's category flag. Default: "Focal_fish".
- neighbor_category_col: Column name for the neighbor's category flag. Default: "neighbor_focal".
- group_size_col: Column name for group size. Default: "group_size".
- exp_col: Column name for experimental condition. Default: "Exp".
- trial_col: Column name for trial identifier. Default: "Trial".
- category_specs: List of dicts defining derived category columns (keys: source_col, new_col, quantile, op). Default: [].
- exclude_cols: List of boolean column names whose truthy rows are dropped before computation. Default: [].
- nonfocal_flag_col: Column used to flag nonfocal animals. Default: "Focal_fish".
- nonfocal_flag_value: Value in nonfocal_flag_col that marks an animal as nonfocal. Default: False.

OrientationRelativeFeature

OrientationRelativeFeature(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Orientation-aware relative features between animal pairs, order-agnostic to pose points.

For each frame and ordered pair (id_a -> id_b):
- Express B in A's body frame (using heading angle and global scale).
- Emit signed centroid deltas, heading difference, quantiles over B's points in A's frame, and nearest-k distances.

Params

Bases: Params

Orientation-relative feature parameters.

Attributes:

Name Type Description
scale BodyScaleResult

Body-scale artifact for normalization.

nearest_k int

Number of nearest pose-point distances to emit. Default 3.

quantiles list[float]

Distance distribution quantiles to compute. Default [0.25, 0.5, 0.75].

PairEgocentricFeatures

PairEgocentricFeatures(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

'pair-egocentric' -- per-sequence egocentric + kinematic features for dyads. Produces a row-wise DataFrame with columns:
- frame (if available) or time passthrough (only if it's the order col)
- perspective: 0 for A->B, 1 for B->A
- id1, id2: pair identifiers
- feature columns (e.g., A_speed, AB_dx_egoA, ...)
- (optionally) group/sequence if present in df, for convenience

This feature is stateless (no fitting). It computes features for all C(n,2) pairs per sequence, cleans/interpolates pose per animal, inner-joins by the chosen order column, and computes A->B and B->A features for each pair.

Parameters:

Name Type Description Default
interpolation

Interpolation settings for missing pose data. Default: InterpolationConfig().

required
sampling

Frame rate and smoothing settings. Default: SamplingConfig().

required
pose

Pose keypoint configuration (indices, column prefixes). Default: PoseConfig().

required
neck_idx

Index of the neck keypoint in the pose array, used to compute heading direction. Default: 3.

required
tail_base_idx

Index of the tail-base keypoint, paired with neck_idx for heading vector. Default: 6.

required
center_mode

How to compute the animal's center — "mean" averages all keypoints, other values use a specific keypoint. Default: "mean".

required

PairInteractionFilter

PairInteractionFilter(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Detect pairwise interaction segments from trajectory data.

For every unique pair of individuals in a sequence, tests per-frame distance and (optionally) angular criteria, applies morphological filtering, and extracts continuous interaction segments that meet a minimum duration.

Output columns (one row per frame per interaction segment):
- frame: frame number
- id_a, id_b: individual IDs (id_a < id_b by convention)
- interaction_id: integer label for the segment within this pair
- interaction_start: first frame of this segment
- interaction_end: last frame (exclusive) of this segment

Params

shift_dist : float
    Pixel shift along heading before the distance check (default 15). Set to 0 to use raw positions without forward shift.
max_dist : float
    Maximum shifted-position distance in pixels (default 40).
require_facing : bool
    If True (default), require individuals to face each other (inverse orientation difference < max_inv_orientation_diff_deg). Set to False for distance-only filtering.
max_inv_orientation_diff_deg : float
    Max angle (degrees) between inverse orientations (default 80). Only used when require_facing=True.
min_run_frames : int
    Minimum continuous frames for a valid interaction (default 250).
frame_padding : int
    Frames to pad before/after each segment (default 10).
morphological_structure_size : int
    Structure element length for binary close/open (default 25). Set to 0 to disable morphological filtering.
px_scale : float
    Scale factor applied to shift_dist and max_dist (default 1.0). Use to adjust for videos with different pixel resolutions.
use_pixel_coords : bool
    If True, use poseX/poseY columns (pixel coordinates) for distance calculations instead of X/Y (world coordinates). Default True since thresholds are in pixel units.
pose_head_index : int | None
    If set and use_pixel_coords is True, use this pose index as the position for distance calculations.
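One reading of the shifted-distance criterion, as a standalone sketch (assumed semantics, not the library's code): move each individual forward along its heading by `shift_dist` pixels, then threshold the distance between the shifted positions. This makes the check sensitive to individuals oriented toward each other.

```python
import numpy as np

# Hypothetical sketch of the shift_dist / max_dist / px_scale interaction.
def shifted_close(xa, ya, ha, xb, yb, hb,
                  shift_dist=15.0, max_dist=40.0, px_scale=1.0):
    sd, md = shift_dist * px_scale, max_dist * px_scale
    ax, ay = xa + sd * np.cos(ha), ya + sd * np.sin(ha)   # shift A forward
    bx, by = xb + sd * np.cos(hb), yb + sd * np.sin(hb)   # shift B forward
    return np.hypot(ax - bx, ay - by) <= md

# Two individuals 60 px apart: facing each other passes, facing away fails.
close = shifted_close(0.0, 0.0, 0.0, 60.0, 0.0, np.pi)
far = shifted_close(0.0, 0.0, np.pi, 60.0, 0.0, 0.0)
```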

PairPoseDistancePCA

PairPoseDistancePCA(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

'pair-posedistance-pca' — builds per-frame pairwise pose-distance features and fits an IncrementalPCA globally; outputs PC scores per sequence (and perspective).
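The per-frame feature construction can be illustrated with a self-contained numpy sketch (names and structure are illustrative assumptions): intra-animal distances take the unique unordered keypoint pairs, inter-animal distances take the full cross product, and everything is flattened into one vector per frame before PCA.

```python
import numpy as np

# Sketch of the pairwise pose-distance vector for one frame and one pair.
# kps_a and kps_b are (K, 2) keypoint arrays for animals A and B.
def pose_distance_vector(kps_a, kps_b):
    def intra(kps):
        diff = kps[:, None, :] - kps[None, :, :]
        d = np.linalg.norm(diff, axis=-1)
        iu = np.triu_indices(len(kps), k=1)   # unique unordered pairs
        return d[iu]
    inter = np.linalg.norm(kps_a[:, None, :] - kps_b[None, :, :], axis=-1)
    return np.concatenate([intra(kps_a), intra(kps_b), inter.ravel()])

a = np.array([[0.0, 0.0], [3.0, 4.0]])
b = np.array([[0.0, 1.0], [0.0, 2.0]])
vec = pose_distance_vector(a, b)
```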

Parameters:

Name Type Description Default
interpolation

Interpolation settings for missing pose data. Default: InterpolationConfig().

required
pose

Pose keypoint configuration (indices, column prefixes). Default: PoseConfig().

required
include_intra_A

If True, include intra-animal A pairwise keypoint distances. Default: True.

required
include_intra_B

If True, include intra-animal B pairwise keypoint distances. Default: True.

required
include_inter

If True, include inter-animal pairwise keypoint distances. Default: True.

required
duplicate_perspective

If True, output both A->B and B->A perspectives per pair. Default: True.

required
n_components

Number of PCA components to retain. Default: 6.

required
batch_size

Batch size for IncrementalPCA partial_fit. Default: 5000.

required

PairPositionFeatures

PairPositionFeatures(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

'pair-position' -- per-sequence egocentric + kinematic features for all pairs.

Unlike PairEgocentricFeatures which requires full pose keypoints, this feature works with minimal input: just (x, y, angle) per animal.

For N animals per sequence, computes features for all N*(N-1)/2 unique pairs, each with two perspectives (A->B and B->A).

Output columns (per row):
- frame: frame number
- perspective: 0 for A->B, 1 for B->A
- id1, id2: IDs of the two animals in this pair
- A_speed, A_v_para, A_v_perp, A_ang_speed: focal kinematics
- A_heading_cos, A_heading_sin: focal heading
- AB_dist: inter-animal distance
- AB_dx_egoA, AB_dy_egoA: partner position in focal's egocentric frame
- rel_heading_cos, rel_heading_sin: relative heading
- B_speed, B_v_para, B_v_perp, B_ang_speed: partner kinematics
- (optionally) group, sequence for convenience
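The egocentric transform behind AB_dx_egoA/AB_dy_egoA can be sketched as a rotation of the world-frame offset into the focal animal's body frame, so +x points along the focal heading (a minimal standalone sketch, not the library's implementation):

```python
import numpy as np

# Rotate the offset to the partner by -heading_a, so the focal's heading
# maps onto the +x axis of the egocentric frame.
def to_ego_frame(xa, ya, heading_a, xb, yb):
    dx, dy = xb - xa, yb - ya
    c, s = np.cos(heading_a), np.sin(heading_a)
    return c * dx + s * dy, -s * dx + c * dy

# Partner directly "north" of a focal that is also heading north (pi/2):
# in the ego frame the partner sits straight ahead on +x.
dx_ego, dy_ego = to_ego_frame(0.0, 0.0, np.pi / 2, 0.0, 2.0)
```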

Parameters:

Name Type Description Default
interpolation

Interpolation settings for missing position data. Default: InterpolationConfig().

required
sampling

Frame rate and smoothing settings. Default: SamplingConfig().

required

PairWavelet

PairWavelet(inputs: Inputs, params: dict[str, object] | None = None)

CWT spectrograms on PairPoseDistancePCA outputs.

Expects input df to contain columns
  • 'perspective' (0 = A->B, 1 = B->A)
  • 'frame' (preferred) or 'time' (if used as order column)
  • PC0..PC{k-1} (k = number of PCA components)
Returns a DataFrame with columns
  • frame (or time if that was the order col)
  • perspective
  • W_{col}_f{fi} (log-power, clamped, for each component x frequency) and (optionally) passthrough group/sequence if present in df.

Stateless (no fitting). FPS is inferred from a constant df['fps'] column if present, otherwise taken from fps_default. Frequencies are dyadically spaced in [f_min, f_max].
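One plausible reading of "dyadically spaced" is frequencies placed uniformly in log2 space between f_min and f_max (an assumption; the defaults 0.2 Hz, 5.0 Hz, and 25 bins come from the parameters below):

```python
import numpy as np

# Hypothetical sketch of the dyadic frequency ladder used for the CWT band.
def dyadic_freqs(f_min=0.2, f_max=5.0, n_freq=25):
    return f_min * 2.0 ** np.linspace(0.0, np.log2(f_max / f_min), n_freq)

freqs = dyadic_freqs()
```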

Parameters:

Name Type Description Default
sampling

Frame rate and smoothing settings. Default: SamplingConfig().

required
f_min

Minimum frequency in Hz for the CWT band. Default: 0.2.

required
f_max

Maximum frequency in Hz for the CWT band. Default: 5.0.

required
n_freq

Number of frequency bins (dyadically spaced between f_min and f_max). Default: 25.

required
wavelet

PyWavelets wavelet name. Default: "cmor1.5-1.0".

required
log_floor

Floor value for log-power clamping. Default: -3.0.

required
pc_prefix

Column prefix used to auto-detect PC input columns (e.g. "PC0", "PC1", ...). Default: "PC".

required
cols

Explicit list of input column names. If None, columns are auto-detected using pc_prefix. Default: None.

required

Result

Bases: StrictModel, Generic[F]

Reference to a prior feature's output as pipeline input.

Attributes:

Name Type Description
feature F

Feature name whose output to consume.

run_id str | None

Specific run ID, or None for latest finished run.

use_latest

use_latest() -> Self

Return a copy with run_id=None (resolves to latest run).

ResultColumn

Bases: Result[str]

Reference to a column in a feature's standard parquet output.

Attributes:

Name Type Description
feature str

Source feature name.

column str

Column name to extract from the parquet output.

run_id str | None

Specific run ID, or None for latest.

from_result

from_result(result: Result[str]) -> Self

Return a copy with feature and run_id set from another Result.

SpeedAngvel

SpeedAngvel(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Per-sequence feature computing translational speed and angular velocity.

Outputs (per frame):
- speed: displacement magnitude between consecutive frames divided by dt
- angvel: wrapped heading difference (rad) divided by dt
- speed_step / angvel_step: same, but using a configurable step_size (omitted if step_size is None)
- speed_smooth: Savitzky-Golay smoothed speed (polyorder=1), only present when smooth_window is set in Params

Time-delta (dt) computation: Speed and angular velocity require dividing by a time interval. The source for dt is chosen by priority:

  1. frame + fps (recommended for constant-fps video): when fps is set in Params, dt is computed as frame_diff / fps. This is immune to irregular real timestamps that some trackers embed in the time column (e.g. TRex uses wall-clock timestamps that may jitter by several milliseconds per frame). It also correctly handles frame gaps from dropped/bad frames.
  2. time column: if fps is not set but a time column exists, dt is computed from consecutive time differences.
  3. array index: last resort when neither frame+fps nor time is available — assumes each row is one step apart.

For most video-based tracking data, setting fps is strongly recommended to avoid speed artifacts from timestamp jitter.
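The frame+fps path and the core speed/angvel computation can be sketched as follows (assumed logic, not the library's exact code). Note how a dropped frame simply yields a larger dt instead of a speed spike:

```python
import numpy as np

def speed_angvel(x, y, heading, frame, fps):
    dt = np.diff(frame) / fps                   # priority 1: frame + fps
    speed = np.hypot(np.diff(x), np.diff(y)) / dt
    dh = np.diff(heading)
    dh = (dh + np.pi) % (2 * np.pi) - np.pi     # wrap to [-pi, pi]
    angvel = dh / dt
    return speed, angvel

frame = np.array([0, 1, 3])                     # frame 2 was dropped
x = np.array([0.0, 1.0, 3.0])
y = np.zeros(3)
heading = np.array([0.0, 0.1, 0.3])
speed, angvel = speed_angvel(x, y, heading, frame, fps=10.0)
```

Across the gap, twice the displacement over twice the dt still gives the same speed.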

Parameters:

Name Type Description Default
step_size

If set, also compute speed_step / angvel_step using this frame step (in addition to step=1). Default: None.

required
smooth_window

If set, apply Savitzky-Golay smoothing (polyorder=1) over this many frames to produce speed_smooth. Default: None.

required
fps

Frames per second. When set, dt is derived from frame_diff/fps instead of the time column — more robust for constant-fps data with jittery timestamps. Default: None.

required

TemporalStackingFeature

TemporalStackingFeature(inputs: Inputs, params: dict[str, object] | None = None)

Build temporal context windows over per-sequence feature data.

Parameters:

Name Type Description Default
half

Half-width of the temporal window in frames. The full window spans [-half, +half]. Default: 60.

required
skip

Step size between time offsets in the stacking window. Default: 5.

required
use_temporal_stack

If True, concatenate Gaussian-smoothed copies at each time offset. Default: True.

required
sigma_stack

Gaussian sigma (in frames) for smoothing before stacking. 0 disables smoothing. Default: 30.0.

required
add_pool

If True, append pooled statistics (e.g. mean, std) computed over a sliding Gaussian window. Default: True.

required
pool_stats

Tuple of pooled statistics to compute. Supported: "mean", "std", "variance". Default: ("mean",).

required
sigma_pool

Gaussian sigma (in frames) for the pooling window. Default: 30.0.

required
fps

Frames per second; used to convert win_sec to frames. Default: 30.0.

required
win_sec

Pooling window width in seconds. Default: 0.5.

required
pair_filter

Optional NNResult for nearest-neighbor pair filtering during dependency resolution. Default: None.

required
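The stacking step can be sketched in plain numpy (Gaussian smoothing and pooling omitted for brevity; edge handling here is an assumption): concatenate time-shifted copies of a feature series at offsets in [-half, +half] with step `skip`.

```python
import numpy as np

# Minimal sketch of temporal stacking: one column per time offset,
# clamping indices at the sequence edges.
def temporal_stack(x, half=60, skip=5):
    offsets = range(-half, half + 1, skip)
    cols = []
    for off in offsets:
        idx = np.clip(np.arange(len(x)) + off, 0, len(x) - 1)
        cols.append(x[idx])
    return np.stack(cols, axis=1)   # shape (T, n_offsets)

x = np.arange(10, dtype=float)
stacked = temporal_stack(x, half=2, skip=1)
```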

TrajectorySmooth

TrajectorySmooth(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Per-sequence feature that smooths and interpolates trajectory positions.

Pipeline (per individual):
1. Bad-frame detection: flag frames with speed > speed_threshold, expand flagged region by expand_frames in each direction.
2. Interpolation: set positions to NaN at bad frames, linearly interpolate, forward/backward fill edges. Controlled separately for centroid (interpolate_centroid) and pose (interpolate_pose).
3. Savgol smoothing: apply savgol_filter to centroid X/Y and all pose columns (always, regardless of interpolation flags).

Output is the full track DataFrame with smoothed positions replacing originals, plus a bad_frame boolean column. Downstream features can consume this via Inputs((Result(feature="trajectory-smooth"),)).
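Steps 1-2 of the pipeline can be sketched for a single coordinate with pandas (assumed logic for a single individual; the real feature also handles pose columns and fps scaling):

```python
import numpy as np
import pandas as pd

# Sketch: flag high-speed frames, expand the flagged region, NaN them out,
# then linearly interpolate with forward/backward edge fill.
def flag_and_interpolate(x, speed_threshold, expand_frames=2):
    x = pd.Series(x, dtype=float)
    speed = x.diff().abs()
    bad = speed > speed_threshold                      # NaN compares False
    w = 2 * expand_frames + 1
    bad = bad.astype(float).rolling(w, center=True, min_periods=1).max().astype(bool)
    x[bad] = np.nan
    x = x.interpolate(limit_direction="both")          # linear + edge fill
    return x.to_numpy(), bad.to_numpy()

x = [0.0, 1.0, 2.0, 50.0, 4.0, 5.0, 6.0, 7.0]
smoothed, bad = flag_and_interpolate(x, speed_threshold=10.0, expand_frames=1)
```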

Parameters:

Name Type Description Default
speed_threshold

Speed above which a frame is flagged as bad. When fps is set, interpreted as units/sec (e.g. 40 cm/s); otherwise units/frame. Default: None (no bad-frame detection).

required
fps

Frames per second. When provided, speed_threshold is converted from units/sec to units/frame internally. Default: None.

required
interpolate_centroid

If True, replace bad-frame centroid positions with linear interpolation. Default: True.

required
interpolate_pose

If True, replace bad-frame pose keypoint positions with linear interpolation. Default: False.

required
expand_frames

Number of frames to expand the bad-frame region in each direction. Default: 2.

required
savgol_window

Window length for Savitzky-Golay smoothing. Must be odd and >= savgol_polyorder + 1. None disables smoothing. Default: None.

required
savgol_polyorder

Polynomial order for Savitzky-Golay filter. Default: 2.

required

XgboostFeature

XgboostFeature(inputs: Inputs, params: dict[str, object] | None = None)

XGBoost behavior classifier as a pipeline feature.

Trains on labeled templates (from ExtractLabeledTemplates) and runs per-sequence inference. Supports multiclass and one-vs-rest strategies.

Parameters:

Name Type Description Default
model

Pre-fitted XgboostModelArtifact to load (skip training). Default: XgboostModelArtifact().

required
strategy

Classification strategy — "multiclass" trains a single multi-class model; "one_vs_rest" trains one binary classifier per class. Default: "multiclass".

required
decision_threshold

Probability threshold(s) for positive prediction. A float applies to all classes; a dict maps class -> threshold. None uses argmax. Default: None.

required
default_class

Class label assigned when no class exceeds the decision threshold (required).

required
class_weight

If "balanced", adjust sample weights inversely proportional to class frequency. Default: "balanced".

required
use_smote

If True, apply SMOTE oversampling to the training set. Default: False.

required
undersample_ratio

If set, undersample majority classes to this ratio relative to the minority class before SMOTE. Default: None.

required
n_estimators

Number of boosting rounds. Default: 100.

required
max_depth

Maximum tree depth. Default: 6.

required
learning_rate

Boosting learning rate. Default: 0.1.

required
subsample

Fraction of training samples used per tree. Default: 0.8.

required
colsample_bytree

Fraction of features used per tree. Default: 0.8.

required
random_state

Random seed for reproducibility. Default: 42.

required
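The decision_threshold / default_class semantics described above can be sketched as follows (assumed behavior; the per-class dict form of the threshold is omitted for brevity): take the argmax class only if its probability clears the threshold, otherwise fall back to default_class.

```python
import numpy as np

# Hypothetical sketch of thresholded prediction with a fallback class.
def predict_with_threshold(proba, classes, threshold, default_class):
    labels = []
    for row in proba:
        k = int(np.argmax(row))
        labels.append(classes[k] if row[k] >= threshold else default_class)
    return labels

proba = np.array([[0.7, 0.2, 0.1],
                  [0.4, 0.35, 0.25]])
labels = predict_with_threshold(proba, ["chase", "flee", "rest"], 0.5, "rest")
```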

approach_avoidance

ApproachAvoidance feature.

Detects approach-avoidance (AA) events for all C(n,2) unordered pairs per sequence.

Default decision logic follows trajognize AA
  • role-specific speed thresholds (approacher vs avoider)
  • distance threshold
  • cosine thresholds between velocity and pair direction
  • approacher forward-motion gate vs body orientation
  • minimum event continuity (min_event_count of min_event_length frames)

Optional sliding-window averaging can be enabled, but it is OFF by default to preserve trajognize-style framewise behavior.

Output columns (per frame × pair):
- frame, id1, id2 (canonical order: id1 < id2)
- label_id: primary non-directional AA label for visualization compatibility
- aa_event: 1 if either direction is active
- aa_event_12: 1 if id1 approaches and id2 avoids
- aa_event_21: 1 if id2 approaches and id1 avoids
- sequence, group (metadata pass-through)

ApproachAvoidance

ApproachAvoidance(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

'approach-avoidance' — per-sequence AA event detection for all pairs.

For N animals per sequence, evaluates all N*(N-1)/2 unique unordered pairs. The output stores directional events as aa_event_12 and aa_event_21 over canonical (id1,id2), plus aa_event/label_id as non-directional union.

Parameters:

Name Type Description Default
interpolation

Interpolation settings for missing data. Default: InterpolationConfig().

required
sampling

Frame rate and smoothing settings. Default: SamplingConfig().

required
velocity_units

Whether speed thresholds are in "per_frame" or "per_second". Default: "per_frame".

required
angle_units

Unit for heading angles — "radians", "degrees", or "auto" (detect from data range). Default: "radians".

required
consecutive_frame_delta

Expected frame step between consecutive rows; used to detect gaps. Default: 1.0.

required
distance_threshold

Maximum inter-animal distance (in position units) for a frame to be considered AA-eligible. Default: 200.0.

required
approacher_velocity_threshold

Minimum speed of the approaching animal. Default: 5.0.

required
avoider_velocity_threshold

Minimum speed of the avoiding animal. Default: 5.0.

required
cos_approacher_threshold

Minimum cosine between the approacher's velocity vector and the direction toward the partner. Default: 0.8.

required
cos_avoider_threshold

Minimum cosine between the avoider's velocity vector and the direction away from the partner. Default: 0.5.

required
min_event_length

Minimum number of contiguous qualifying frames to form an event. Default: 10.

required
min_event_count

Minimum number of qualifying frames within an event run to keep it. Default: 5.

required
orientation_gate_cos

If set, require the approacher's body orientation to align with its velocity (cos threshold). Default: cos(30°) ≈ 0.866. None disables the gate.

required
smooth_window_sec

If set, apply a sliding-window average (in seconds) to velocities before thresholding. Default: None (disabled; framewise behavior).

required
extract_events staticmethod
extract_events(aa_df: DataFrame, min_duration: int = 1) -> pd.DataFrame

Convert per-frame AA output into a compact event table.

Parameters

aa_df : DataFrame
    Per-frame output with columns: frame, id1, id2, aa_event, aa_event_12, aa_event_21. May span multiple sequences/groups (they are handled independently).
min_duration : int
    Minimum event length in frames. Events shorter than this are discarded.

Returns

DataFrame with columns: id1, id2, start_frame, end_frame, duration, direction ('12' if id1→id2, '21' if id2→id1, 'both'), approacher_id, avoider_id, sequence (if present), group (if present).
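The run-to-event conversion can be illustrated for a single pair and a single flag column (a standalone sketch of the idea, not the method itself): find contiguous runs of 1s and emit (start_frame, end_frame, duration) rows, dropping runs shorter than min_duration.

```python
import numpy as np

# Sketch: per-frame 0/1 flags -> compact event rows for one pair.
def events_from_flags(frames, flags, min_duration=1):
    flags = np.asarray(flags, dtype=bool)
    padded = np.concatenate(([False], flags, [False]))
    d = np.diff(padded.astype(int))
    starts, ends = np.flatnonzero(d == 1), np.flatnonzero(d == -1)
    rows = []
    for s, e in zip(starts, ends):          # e is end-exclusive
        if e - s >= min_duration:
            rows.append((int(frames[s]), int(frames[e - 1]), int(e - s)))
    return rows

frames = np.arange(100, 110)
flags = [0, 1, 1, 1, 0, 0, 1, 0, 0, 0]
events = events_from_flags(frames, flags, min_duration=2)
```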

arhmm

AR-HMM global feature.

Fits an autoregressive Hidden Markov Model on arbitrary upstream feature inputs and produces per-frame syllable (state) labels. This is a native mosaic implementation — no KPMS or JAX dependency.

The feature accepts any combination of upstream Result inputs. Mosaic's manifest system merges them via inner join on alignment columns, so the feature receives a single merged DataFrame whose numeric columns are the union of all input features.

ArHmmFeature

ArHmmFeature(inputs: Inputs, params: dict[str, object] | None = None)

AR-HMM behavioral syllable discovery as a pipeline feature.

Fits an autoregressive Hidden Markov Model across all input sequences and assigns per-frame syllable labels via Viterbi decoding.

Parameters:

Name Type Description Default
model

Pre-fitted ArHmmModelArtifact to load (skip fit). Default: None (fit from scratch).

required
pca_dim

Number of PCA components for dimensionality reduction before fitting. None skips PCA. Default: None.

required
n_states

Maximum number of HMM states (pruned after fit). Default: 50.

required
n_lags

AR order (number of lagged frames as regressors). Default: 1.

required
sticky_weight

Extra pseudo-count on the diagonal of the transition matrix (encourages state persistence). Default: 100.0.

required
n_iter

Maximum EM iterations per restart. Default: 200.

required
tol

Convergence tolerance on relative LL change. Default: 1e-4.

required
n_restarts

Number of random restarts (best LL kept). Default: 1.

required
standardize

If True, z-score features before fitting. Default: True.

required
downsample_rate

Temporal downsampling factor. None disables. Default: None.

required
prune_threshold

Drop states with posterior mass below this fraction. Default: 0.01.

required
random_state

Random seed. Default: 42.

required

ArHmmModelArtifact

Bases: JoblibArtifact[ArHmmModelBundle]

Fitted AR-HMM model bundle (arhmm_model.joblib).

arhmm_model

Autoregressive Hidden Markov Model (AR-HMM) with EM fitting.

A standalone implementation using numpy/scipy — no external HMM library required. Fits switching autoregressive dynamics with sticky transitions via expectation-maximization (EM) and decodes the most-likely state sequence with the Viterbi algorithm.

This module has no mosaic imports and can be tested independently.

ARHMM dataclass

ARHMM(n_states: int = 50, n_lags: int = 1, sticky_weight: float = 100.0, n_iter: int = 200, tol: float = 0.0001, n_restarts: int = 1, random_state: int | None = None, A_: ndarray | None = None, Q_: ndarray | None = None, Q_cho_: list | None = None, Q_logdet_: ndarray | None = None, log_transmat_: ndarray | None = None, log_startprob_: ndarray | None = None, n_features_: int | None = None, active_states_: ndarray | None = None)

Autoregressive Hidden Markov Model.

Each of the K discrete states owns an AR(n_lags) linear model:

x_t = A_k @ [x_{t-1}; ...; x_{t-nlags}; 1] + ε,   ε ~ N(0, Q_k)

Transitions between states are governed by a K × K matrix with a sticky prior that encourages self-transitions (controlled by sticky_weight).

Parameters

n_states : int
    Maximum number of hidden states.
n_lags : int
    AR order (number of lagged frames used as regressors).
sticky_weight : float
    Extra pseudo-count added to the diagonal of the transition matrix during M-step updates. Larger values → states persist longer.
n_iter : int
    Maximum EM iterations per restart.
tol : float
    Convergence threshold on relative change in log-likelihood.
n_restarts : int
    Number of random restarts; the best (highest LL) is kept.
random_state : int | None
    Seed for reproducibility.
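The AR emission model above can be made concrete with a small sketch (illustrative names; not the module's internals): build the stacked regressor [x_{t-1}; ...; x_{t-n_lags}; 1] for each decodable frame and apply a state's A_k matrix.

```python
import numpy as np

# Build the AR design matrix: one row [x_{t-1}; ...; x_{t-nlags}; 1] per
# decodable frame t >= n_lags.
def ar_design(X, n_lags):
    T, D = X.shape
    rows = []
    for t in range(n_lags, T):
        lags = [X[t - l] for l in range(1, n_lags + 1)]
        rows.append(np.concatenate(lags + [np.ones(1)]))
    return np.asarray(rows)          # shape (T - n_lags, D * n_lags + 1)

X = np.array([[0.0], [1.0], [2.0], [3.0]])
Phi = ar_design(X, n_lags=2)
# A_k implementing x_t = x_{t-1} + 1 (a random-walk-with-drift state):
A_k = np.array([[1.0, 0.0, 1.0]])
pred = Phi @ A_k.T                   # state-k predictions for x_2, x_3
```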

fit
fit(sequences: list[ndarray]) -> ARHMM

Fit the AR-HMM via EM on sequences.

Parameters

sequences : list of ndarray, each of shape (T_i, D)
    Feature matrices for each sequence.

Returns

self

predict
predict(X: ndarray) -> np.ndarray

Viterbi decoding → per-frame state labels.

Parameters

X : ndarray, shape (T, D)

Returns

labels : ndarray of int32, shape (T,)
    State assignments. The first n_lags frames are assigned the same state as frame n_lags (the earliest decodable frame).

prune_states
prune_states(sequences: list[ndarray], threshold: float = 0.01) -> None

Drop states whose posterior mass is below threshold.

Re-indexes the remaining states to 0..K'-1.

score
score(X: ndarray) -> float

Log-likelihood of X under the fitted model.

body_scale

BodyScaleFeature feature.

Extracted from features.py as part of feature_library modularization.

BodyScaleFeature

BodyScaleFeature(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Per-frame body scale: median intra-animal pose distance.

Outputs per sequence parquet with columns: frame, id, scale, sequence, group. Intended to be averaged later (per sequence or dataset) to derive a single normalization constant for downstream orientation features.

external

External tool runners for mosaic.

Scripts in this directory bridge mosaic with external packages that have incompatible dependencies or restrictive licenses. They are invoked via subprocess using a separate Python environment.

kpms_protocol

Shared protocol models and wire helpers for the kpms server/client.

Defines the request/response Pydantic models and the newline-delimited JSON framing used over Unix domain sockets. Importable from both the main mosaic environment (client) and the external .venv (server).

Dependencies: pydantic, numpy (available in both environments).

check_latent_dim
check_latent_dim(latent_dim: int, num_keypoints: int) -> None

Raise ValueError if latent_dim exceeds (num_keypoints - 1) * 2.

receive_message
receive_message(conn: socket) -> bytes

Read a single newline-terminated line from conn.

send_message
send_message(conn: socket, message: BaseModel) -> None

Send a newline-terminated JSON message.
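The newline-delimited JSON framing can be sketched with a socket pair (plain dicts stand in for the Pydantic request/response models; the real helpers operate on `BaseModel` instances):

```python
import json
import socket

# Minimal sketch of the wire framing: one JSON object per line.
def send_message(conn, obj):
    conn.sendall(json.dumps(obj).encode() + b"\n")

def receive_message(conn):
    buf = b""
    while not buf.endswith(b"\n"):
        chunk = conn.recv(4096)
        if not chunk:
            break
        buf += chunk
    return buf.rstrip(b"\n")

a, b = socket.socketpair()
send_message(a, {"cmd": "fit", "latent_dim": 8})
msg = json.loads(receive_message(b))
a.close(); b.close()
```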

kpms_server

Persistent subprocess server for keypoint-moseq operations.

Runs in the external .venv (keypoint-moseq environment). Imports JAX and keypoint-moseq once at startup, then serves commands over a Unix domain socket.

Commands: add_track, fit, load_model, apply, save_model, shutdown.

Wire protocol: newline-delimited JSON. Arrays are base64-encoded in the JSON with dtype and shape metadata.

Usage::

.venv/bin/python kpms_server.py /tmp/kpms.sock

prctl_set_pdeathsig
prctl_set_pdeathsig() -> None

Ask the kernel to send SIGTERM when the parent process dies.

recv_request
recv_request(conn: socket) -> Request

Read a newline-terminated JSON request.

serve
serve(server: KpmsServer, conn: socket) -> None

Read commands from conn and dispatch to server handlers.

extract_labeled_templates

ExtractLabeledTemplates

ExtractLabeledTemplates(inputs: Inputs, params: dict[str, object] | None = None)

Extract labeled, split-annotated templates from upstream features.

Streams upstream feature data, aligns ground truth labels from NPZ files, assigns train/test splits by sequence, and subsamples per class. Produces a templates parquet with feature columns + label (int) + split (str).

Parameters:

Name Type Description Default
labels

GroundTruthLabelsSource specifying where to load per-frame ground-truth labels (required).

required
strategy

Template selection method — "random" or "farthest_first". Default: "random".

required
n_per_class

Number of templates per class. An int applies uniformly; a dict maps class -> count. Exactly one of n_per_class or n_total must be set. Default: None.

required
n_total

Total number of templates across all classes (distributed proportionally). Exactly one of n_per_class or n_total must be set. Default: None.

required
pool

PoolConfig controlling candidate pool size and allocation. Default: PoolConfig().

required
test_fraction

Fraction of sequences held out for the test split. Default: 0.2.

required
random_state

Random seed for reproducibility. Default: 42.

required

LabeledProvenanceArtifact

Bases: ParquetArtifact

Per-entry template provenance (template_provenance.parquet).

LabeledTemplatesArtifact

Bases: ParquetArtifact

Labeled template feature vectors (templates.parquet).

Uses numeric_only=False because the parquet contains the str 'split' column alongside numeric feature columns and int 'label'.

extract_templates

ExtractTemplates

ExtractTemplates(inputs: Inputs, params: dict[str, object] | None = None)

Subsample per-sequence data into a representative template matrix.

Entry point for the global feature pipeline. Streams per-sequence inputs, builds a candidate pool with proportional per-entry contribution, and selects templates using the configured strategy.
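The "farthest_first" strategy mentioned below can be sketched as greedy diversity maximization (a hypothetical standalone version seeded with the first candidate for determinism; the real feature seeds via random_state):

```python
import numpy as np

# Greedy farthest-first selection: repeatedly pick the candidate whose
# minimum distance to the already-selected templates is largest.
def farthest_first(X, n_templates):
    chosen = [0]                                     # deterministic seed
    min_d = np.linalg.norm(X - X[0], axis=1)
    while len(chosen) < n_templates:
        nxt = int(np.argmax(min_d))
        chosen.append(nxt)
        min_d = np.minimum(min_d, np.linalg.norm(X - X[nxt], axis=1))
    return chosen

X = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 0.0], [0.0, 10.0]])
picked = farthest_first(X, n_templates=3)
```

The near-duplicate point at index 1 is skipped in favor of the two distant outliers.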

Parameters:

Name Type Description Default
strategy

Template selection method — "random" for uniform random sampling, "farthest_first" for greedy diversity maximization. Default: "random".

required
n_templates

Number of templates to select (required).

required
pool

PoolConfig controlling candidate pool size, allocation strategy, and per-entry caps. Default: PoolConfig().

required
random_state

Random seed for reproducibility. Default: 42.

required
pair_filter

Optional NNResult for nearest-neighbor pair filtering during dependency resolution. Default: None.

required
Params

Bases: Params

ExtractTemplates parameters.

Attributes:

Name Type Description
strategy Literal['random', 'farthest_first']

Selection strategy. Default "random".

n_templates int

Number of templates to select. Required.

pool PoolConfig

Pool configuration. Default PoolConfig().

random_state int

Random seed. Default 42.

ProvenanceArtifact

Bases: ParquetArtifact

Per-entry template provenance (template_provenance.parquet).

TemplatesArtifact

Bases: ParquetArtifact

Template feature vectors (templates.parquet).

feature_template__global

Template for a global feature (clustering, embedding, dimensionality reduction).

Copy this file, rename the class and name, and fill in your logic.

Protocol (4 attributes + 4 methods):
- name, version, parallelizable, scope_dependent
- load_state(run_root, artifact_paths, dependency_lookups) -> bool
- fit(inputs: factory returning iterator of (entry_key, DataFrame)) -> None
- save_state(run_root) -> None
- apply(df: DataFrame) -> DataFrame

Global features are stateful: fit() iterates over all sequences to build a model, save_state() persists it, and load_state() restores it to skip re-fitting. apply() then maps per-sequence data using the fitted model.

Set scope_dependent = False unless outputs change depending on which sequences are in scope (most global features are scope-independent once fitted).

See GlobalTSNE and GlobalWardClustering for real examples.

MyGlobalFeature

MyGlobalFeature(inputs: Inputs, params: dict[str, object] | None = None)

Template for a global feature.

Global features load data from prior feature outputs (via Result-based inputs), run a cross-sequence algorithm in fit(), and persist the model via save_state(). The apply() method maps per-sequence data using the fitted model.

Typical workflow
  1. load_state() checks for a cached model on disk
  2. fit() iterates over all sequences, accumulates data, runs algorithm
  3. save_state() persists the model to run_root
  4. apply() maps per-sequence data using the fitted model
Params

Bases: Params

Global feature template parameters.

Attributes:

Name Type Description
random_state int

Random seed. Default 42.

feature_template__per_sequence

Template for a per-sequence feature.

Copy this file, rename the class and name, and fill in your logic.

Protocol (4 attributes + 4 methods):
- name, version, parallelizable, scope_dependent
- load_state(run_root, artifact_paths, dependency_lookups) -> bool
- fit(inputs: factory returning iterator of (entry_key, DataFrame)) -> None
- save_state(run_root) -> None
- apply(df: DataFrame) -> DataFrame

Per-sequence features are stateless by default: load_state returns True (nothing to restore), fit/save_state are no-ops, and apply does all the work. Set scope_dependent = False unless outputs depend on which sequences are in scope.

See SpeedAngvel for a real per-sequence feature.

MyPerSequenceFeature

MyPerSequenceFeature(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Template for a per-sequence feature.

Input

A DataFrame for a single (group, sequence) from either:

- tracks (input_kind="tracks")
- another feature (input_kind="feature")
- a multi-input Inputs() tuple

Output

A DataFrame with one row per frame (or per frame x pair), with:

- frame (or time)
- group, sequence
- id1, id2 (when pair-aware)
- your feature columns

Params

Bases: Params

Per-sequence feature template parameters.

Attributes:

Name Type Description
window_size int

Sliding window size. Default 15.

apply
apply(df: DataFrame) -> pd.DataFrame

Compute features for a single (group, sequence).

For pair-aware inputs the df may contain multiple (id1, id2) pairs; process each pair independently to avoid mixing contexts.
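One way to keep pair contexts separate — a sketch of the idea, not the library's internal code — is to apply the per-pair logic to each (id1, id2) slice and re-concatenate, so rolling or diff operations never cross pair boundaries:

```python
import pandas as pd

def apply_per_pair(df: pd.DataFrame, fn) -> pd.DataFrame:
    """Apply fn to each (id1, id2) slice separately so windowed/diff
    operations never mix frames from different pairs."""
    parts = [fn(g) for _, g in df.groupby(["id1", "id2"], sort=False)]
    return pd.concat(parts, ignore_index=True)
```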

feral_feature

FeralFeature -- FERAL vision-transformer behavior classifier as a Mosaic pipeline feature.

Supports both training and inference in a single unified feature, following the same global-feature pattern as XgboostFeature and KpmsFeature.

Training mode

Provide video_dir, label_json, and a training config dict. The label_json file must contain class_names, splits (with train and optionally val/test keys), and optionally is_multilabel. Training runs the full FERAL ViT fine-tuning loop with intermediate checkpoints saved to disk for crash recovery. After training, the test split (if present) is automatically evaluated.

Inference mode

Provide model_dir pointing to a directory with model_best.pt and config.json from a previous training run.

Output follows the same pattern as XgboostFeature: per-frame rows with prob_<class> probability columns and a predicted_label column.

Requires the FERAL code directory (https://github.com/Skovorp/feral). Point feral_code_dir to a local clone of the repository.
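The two modes differ only in which params are set. A hedged sketch — the key names follow the Params list documented for this class, but the nested training keys (e.g. num_epochs) are illustrative and should be checked against FeralTrainingConfig:

```python
# Training mode: video_dir + label_json + training config
train_params = {
    "feral_code_dir": "/path/to/feral",   # local clone of github.com/Skovorp/feral
    "video_dir": "/data/crops",
    "label_json": "/data/labels.json",    # must contain class_names + splits
    "training": {"num_epochs": 10},       # hypothetical FeralTrainingConfig keys
}

# Inference mode: model_dir only
infer_params = {
    "feral_code_dir": "/path/to/feral",
    "model_dir": "/runs/feral_run/",      # holds model_best.pt + config.json
}
```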

FeralFeature

FeralFeature(inputs: Inputs, params: dict[str, object] | None = None)

FERAL vision-transformer behavior classifier as a pipeline feature.

Supports two operating modes:

Training mode (video_dir + label_json + training): Runs the full FERAL ViT fine-tuning loop, saves checkpoints, evaluates the test split (if present), then applies to all sequences in the apply phase.

Inference mode (model_dir): Loads a pre-trained FERAL model and runs per-frame behavior classification on crop videos.

Supports two input formats for the apply phase:

  1. InteractionCropPipeline output (pair-level): One row per crop video with video_path, id_a, id_b, target_id, interaction_id, start_frame, end_frame.

  2. EgocentricCrop output (individual-level): One row per frame with target_id, frame. Videos are derived as egocentric_id{target_id}.mp4.

Params

feral_code_dir : Path
    Path to a local clone of https://github.com/Skovorp/feral.
model_name : str
    HuggingFace model name (default: V-JEPA2 ViT-L).
predict_per_item : int
    Predictions per chunk (default 64).
chunk_length : int
    Frames per video chunk (default 64).
chunk_shift : int
    Stride between chunks for overlapping inference (default 32).
chunk_step : int
    Frame sampling step within chunks (default 1).
resize_to : int
    Input resolution for ViT (default 256).
device : str
    PyTorch device (default "cuda").
class_names : dict | None
    Class index -> name mapping. Auto-detected from model config.
decision_threshold : float | None
    Probability threshold for positive class. None uses argmax.
default_class : int
    Fallback class when no class exceeds threshold (default 0).
model_dir : Path | None
    Directory with model_best.pt + config.json (inference mode).
video_dir : Path | None
    Directory containing crop videos (training mode).
label_json : Path | None
    Path to FERAL-format label JSON with splits (training mode).
training : FeralTrainingConfig | None
    Training hyperparameters. None = inference-only mode.

bind_dataset
bind_dataset(ds)

Store dataset reference for resolving media paths.

fit
fit(inputs: InputStream) -> None

Train a FERAL model or verify pre-trained model is loaded.

In training mode (video_dir + label_json + training set), runs the full ViT fine-tuning loop with intermediate checkpoints. After training, evaluates the test split if present.

In inference mode (model_dir set), the model is already loaded by load_state() and this method is not called.

The inputs argument is not consumed -- FERAL reads video files directly from params.video_dir.

FeralTrainingConfig

Bases: StrictModel

Training hyperparameters for FERAL ViT fine-tuning.

These mirror the FERAL default_vjepa.yaml configuration.

ffgroups

FFGroups

FFGroups(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Per-sequence fission-fusion grouping metrics.

Inputs: raw tracks (columns: x, y, id, frame/time, group, sequence).

Outputs per (frame, id):

- group_membership (component label)
- group_size (size of that component)
- event (event id from dp.get_events_info, -1 if not in an event)

Parameters:

distance_cutoff
    Pairwise distance threshold below which two animals are considered in the same group. Default: 50.0.
window_size
    Sliding-window size (frames) for smoothing the pairwise distance matrix before thresholding. Default: 5.
min_event_duration
    Minimum number of contiguous frames for a stable subgroup to be registered as an event. Default: 1.
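The grouping step is, in essence, connected components of each frame's distance graph under distance_cutoff: two animals share a component if a chain of sub-threshold pairwise distances links them. A self-contained sketch of that idea (not the library's implementation, which additionally smooths distances over window_size):

```python
import numpy as np

def group_membership(xy: np.ndarray, distance_cutoff: float) -> np.ndarray:
    """Label each animal with a connected-component id for one frame."""
    n = len(xy)
    # full pairwise distance matrix, shape (n, n)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    adj = d < distance_cutoff
    labels = np.full(n, -1)
    comp = 0
    for seed in range(n):
        if labels[seed] >= 0:
            continue
        stack = [seed]
        labels[seed] = comp
        while stack:                      # flood fill over the adjacency graph
            i = stack.pop()
            for j in np.flatnonzero(adj[i]):
                if labels[j] < 0:
                    labels[j] = comp
                    stack.append(j)
        comp += 1
    return labels
```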

ffgroups_metrics

FFGroupsMetrics

FFGroupsMetrics(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Per-sequence summary of focal-fish group metrics.

Per-frame computed (internal): distance_from_centroid, xrot_to_centroid, yrot_to_centroid, dev_speed_to_mean.

Summaries (output: one row per id within sequence):

- fractime_norm2
- avg_duration_frame
- med_duration_frame
- ftime_periphery
- ftime_periphery_norm

Parameters:

group_col
    Column name that identifies group events (e.g. from FFGroups output). Default: "event".
speed_col
    Column name for speed values. Default: "speed".
time_chunk_sec
    If set, split each sequence into time-based chunks of this duration (seconds) and compute summaries per chunk. Default: None (whole sequence).
frame_chunk
    If set, split each sequence into frame-based chunks of this size and compute summaries per chunk. Default: None.
centroid_heading_col
    Column for centroid heading used in rotation calculations. Default: "centroid_heading".
exclude_cols
    List of boolean column names (e.g. "bad_frame") whose truthy rows are dropped before computation. Default: [].

global_kmeans

GlobalKMeansClustering feature.

Extracted from features.py as part of feature_library modularization.

GlobalKMeansClustering

GlobalKMeansClustering(inputs: Inputs, params: dict[str, object] | None = None)

Global K-Means clustering on templates loaded via load_state. Per-sequence cluster assignment is done in apply().

Parameters:

templates
    Templates artifact to fit on (inherited from GlobalModelParams).
model
    Pre-fitted KMeansModelArtifact to load (skip fit). Default: KMeansModelArtifact().
k
    Number of clusters. Default: 100.
random_state
    Random seed for KMeans initialization. Default: 42.
n_init
    Number of KMeans initializations to run. Default: "auto".
max_iter
    Maximum iterations per KMeans run. Default: 300.
device
    Compute device — "cpu" or "cuda" (requires cuML). Default: "cpu".
label_artifact_points
    If True, assign cluster labels to the template points used for fitting. Default: True.
pair_filter
    Optional NNResult for nearest-neighbor pair filtering during dependency resolution. Default: None.
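The per-sequence assignment done in apply() amounts to a nearest-centroid lookup of each row against the fitted cluster centers. A numpy sketch of that step (illustrative, not the library code):

```python
import numpy as np

def assign_clusters(rows: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Return the index of the nearest cluster center for each row."""
    # squared distances, shape (n_rows, k)
    d2 = ((rows[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)
```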
Params

Bases: GlobalModelParams[KMeansModelArtifact]

Global K-means clustering parameters.

Attributes:

Name Type Description
templates ParquetArtifact | None

Templates artifact to fit on (inherited).

model KMeansModelArtifact | None

Pre-fitted KMeans model artifact (skip fit).

k int

Number of clusters. Default 100.

random_state int

Random seed. Default 42.

n_init Literal['auto'] | int

KMeans initializations. Default "auto".

max_iter int

Max iterations per run. Default 300.

device str

Compute device. Default "cpu".

label_artifact_points bool

Label points used for fitting. Default True.

pair_filter NNResult | None

Nearest-neighbor pair filter for dependency resolution. Default None.

KMeansArtifactLabelsArtifact

Bases: NpzArtifact

Labels for the artifact points used in fitting (artifact_labels.npz).

KMeansClusterCentersArtifact

Bases: NpzArtifact

Cluster center vectors (cluster_centers.npz).

KMeansClusterSizesArtifact

Bases: ParquetArtifact

Per-cluster sample counts (cluster_sizes.parquet).

KMeansModelArtifact

Bases: JoblibArtifact[KMeansModelBundle]

KMeans model (model.joblib).

global_scaler

GlobalScaler

GlobalScaler(inputs: Inputs, params: dict[str, object] | None = None)

Fit a StandardScaler on templates and scale per-sequence data.

Consumes a templates artifact (from ExtractTemplates or any feature producing templates.parquet). Produces a scaler model bundle and scaled templates.

Parameters:

templates
    Templates artifact to fit the scaler on (inherited from GlobalModelParams).
model
    Pre-fitted ScalerModelArtifact to load (skip fit). Default: ScalerModelArtifact().
Params

Bases: GlobalModelParams[ScalerModelArtifact]

GlobalScaler parameters.

Attributes:

Name Type Description
templates ParquetArtifact | None

Templates artifact to fit scaler on.

model ScalerModelArtifact | None

Pre-fitted scaler model artifact (skip fit).

ScaledTemplatesArtifact

Bases: ParquetArtifact

Scaled template vectors (scaled_templates.parquet).

ScalerModelArtifact

Bases: JoblibArtifact[ScalerModelBundle]

Fitted scaler model bundle (scaler.joblib).

global_tsne

GlobalTSNE feature.

GlobalTSNE

GlobalTSNE(inputs: Inputs, params: dict[str, object] | None = None)

Fit an openTSNE embedding on templates and map per-sequence data.

Consumes a templates artifact (from ExtractTemplates, GlobalScaler, or any feature producing templates). Produces an embedding model bundle and template coordinates.

Parameters:

templates
    Templates artifact to fit embedding on (inherited from GlobalModelParams).
model
    Pre-fitted TSNEModelArtifact to load (skip fit). Default: TSNEModelArtifact().
random_state
    Random seed. Default: 42.
perplexity
    t-SNE perplexity parameter. Default: 50.
knn_method
    kNN backend — "annoy", "faiss", or "faiss-gpu". Default: "annoy".
n_jobs
    Number of parallel jobs for openTSNE. Default: 8.
fit
    TSNEFitConfig controlling learning rate, exaggeration iterations, momentum, etc. Default: TSNEFitConfig().
mapping
    TSNEMapConfig controlling partial-embedding parameters (k, iterations, chunk_size, etc.). Default: TSNEMapConfig().
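A parameter sketch showing how the nested fit/mapping configs compose with the top-level keys; the values shown mirror the documented defaults, and the dict shape is illustrative (in practice fit/mapping are TSNEFitConfig/TSNEMapConfig instances):

```python
# Hedged parameter sketch; values mirror the documented defaults.
tsne_params = {
    "random_state": 42,
    "perplexity": 50,
    "knn_method": "annoy",   # or "faiss" / "faiss-gpu"
    "n_jobs": 8,
    # TSNEFitConfig-style fields
    "fit": {"exaggeration_iters": 250, "exaggeration": 12, "iters": 750},
    # TSNEMapConfig-style fields
    "mapping": {"k": 25, "iters": 100, "chunk_size": 50000},
}
```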
Params

Bases: GlobalModelParams[TSNEModelArtifact]

Global t-SNE parameters.

Attributes:

Name Type Description
templates ParquetArtifact | None

Templates artifact to fit embedding on.

model TSNEModelArtifact | None

Pre-fitted embedding model artifact (skip fit).

random_state int

Random seed. Default 42.

perplexity int

t-SNE perplexity. Default 50.

knn_method str

kNN method ("annoy", "faiss", "faiss-gpu"). Default "annoy".

n_jobs int

Parallel jobs for openTSNE. Default 8.

fit TSNEFitConfig

Embedding fitting parameters.

mapping TSNEMapConfig

Partial embedding mapping parameters.

TSNECoordsArtifact

Bases: NpzArtifact

t-SNE coordinates of templates (global_tsne_templates.npz).

TSNEFitConfig

Bases: StrictModel

openTSNE fitting parameters.

Attributes:

Name Type Description
learning_rate float | str

Learning rate ("auto" lets openTSNE compute). Default "auto".

exaggeration_iters int

Early exaggeration phase iterations. Default 250.

exaggeration float

Early exaggeration factor. Default 12.

exaggeration_momentum float

Momentum during early exaggeration. Default 0.5.

iters int

Refinement phase iterations. Default 750.

momentum float

Momentum during refinement. Default 0.8.

TSNEMapConfig

Bases: StrictModel

Parameters for mapping new points into the fitted embedding.

Attributes:

Name Type Description
k int

Neighbors for partial embedding. Default 25.

iters int

Optimization iterations. Default 100.

learning_rate float

Learning rate. Default 1.0.

exaggeration float

Exaggeration factor. Default 2.0.

momentum float

Momentum. Default 0.0.

chunk_size int

Chunk size for large sequences. Default 50000.

TSNEModelArtifact

Bases: JoblibArtifact[TSNEModelBundle]

Fitted t-SNE embedding model (embedding.joblib).

global_ward

GlobalWardClustering feature.

Fits Ward hierarchical linkage on templates, cuts at n_clusters, builds centroids, and assigns per-sequence rows via 1-NN.
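After the linkage tree is cut into n_clusters, the per-sequence step is a 1-NN lookup of each row against the cluster centroids built from the labeled templates. A numpy sketch of that final assignment step (illustrative, not the library code):

```python
import numpy as np

def ward_assign(rows: np.ndarray, templates: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Assign each row the label of its nearest cluster centroid,
    where centroids are per-cluster means of the fitted templates."""
    k = labels.max() + 1
    centroids = np.stack([templates[labels == c].mean(axis=0) for c in range(k)])
    d2 = ((rows[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)
```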

GlobalWardClustering

GlobalWardClustering(inputs: Inputs, params: dict[str, object] | None = None)

Ward hierarchical clustering on templates with per-sequence 1-NN assignment.

Parameters:

templates
    Templates artifact to cluster (inherited from GlobalModelParams).
model
    Pre-fitted WardModelArtifact to load (skip fit). Default: WardModelArtifact().
n_clusters
    Number of clusters to cut from the linkage tree. Default: 20.
method
    Linkage method passed to scipy.cluster.hierarchy.linkage. Default: "ward".
pair_filter
    Optional NNResult for nearest-neighbor pair filtering during dependency resolution. Default: None.
Params

Bases: GlobalModelParams[WardModelArtifact]

Global Ward clustering parameters.

Attributes:

Name Type Description
templates ParquetArtifact | None

Templates artifact to cluster (inherited).

model WardModelArtifact | None

Pre-fitted Ward model artifact (skip fit).

n_clusters int

Number of clusters to cut. Default 20.

method str

Linkage method. Default "ward".

pair_filter NNResult | None

Nearest-neighbor pair filter. Default None.

WardModelArtifact

Bases: JoblibArtifact[WardModelBundle]

Ward linkage model (model.joblib).

helpers

Shared helper functions for feature implementations.

This module contains utility functions used across multiple features in the feature_library to avoid code duplication.

apply_exclude_cols

apply_exclude_cols(df: DataFrame, exclude_cols: list[str] | None) -> pd.DataFrame

Drop rows where any exclude_cols column is truthy.

Silently skips column names not present in df. Returns df unchanged when exclude_cols is empty/None.

clean_animal_track

clean_animal_track(g: DataFrame, data_cols: list[str], order_col: str, config: InterpolationConfig) -> pd.DataFrame

Sort, interpolate, fill, and drop rows with excessive missing data.

clean_tracks_grouped

clean_tracks_grouped(df: DataFrame, group_cols: list[str], data_cols: list[str], order_col: str, config: InterpolationConfig) -> pd.DataFrame

Clean tracks per group, preserving group columns in the result.

Pandas 3.0 excludes group columns from groupby().apply() results. This wrapper uses group_keys=True and resets the index to restore them.

ego_rotate

ego_rotate(dx: ndarray, dy: ndarray, heading: ndarray) -> tuple[np.ndarray, np.ndarray]

Rotate world-frame deltas into ego frame (heading aligned with +x).

ensure_columns

ensure_columns(df: DataFrame, required: list[str]) -> None

Raise ValueError if any required columns are missing from df.

feature_columns

feature_columns(df: DataFrame) -> list[str]

Return the sorted list of numeric feature column names in df.

Excludes standard metadata columns (COLUMNS.meta_set()) and known non-feature columns (id1, id2, entity_level, perspective, fps).

smooth_1d

smooth_1d(x: ndarray, win: int) -> np.ndarray

Moving average with reflected padding.

unwrap_diff

unwrap_diff(theta: ndarray, fps: float) -> np.ndarray

Compute angular velocity from angle array.

wrap_angle

wrap_angle(x: ndarray) -> np.ndarray

Wrap angles to [-pi, pi].
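The angle and geometry helpers above follow standard formulas; these self-contained numpy sketches are consistent with the docstrings, though not necessarily line-for-line with the library code:

```python
import numpy as np

def wrap_angle(x: np.ndarray) -> np.ndarray:
    """Wrap angles to [-pi, pi)."""
    return (x + np.pi) % (2 * np.pi) - np.pi

def unwrap_diff(theta: np.ndarray, fps: float) -> np.ndarray:
    """Angular velocity (rad/s): wrapped frame-to-frame difference times fps."""
    dtheta = wrap_angle(np.diff(theta, prepend=theta[0]))
    return dtheta * fps

def ego_rotate(dx, dy, heading):
    """Rotate world-frame deltas so the focal heading aligns with +x."""
    c, s = np.cos(heading), np.sin(heading)
    return c * dx + s * dy, -s * dx + c * dy
```

For instance, a neighbor one unit "ahead" of a focal facing +y (heading pi/2) maps to ego coordinates (1, 0).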

id_tag_columns

IdTagColumns

IdTagColumns(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Attach per-id label fields (from labels/) to each frame, so they can be merged via Inputs() and used as categories (e.g., focal/nonfocal).

Outputs per row (same granularity as input tracks/feature): frame/time/id/group/sequence + one column per requested label field.

Parameters:

labels
    LabelsSource specifying which labels directory to load. Default: LabelsSource(kind="id_tags").
label_kind
    Label subdirectory name used for dependency resolution. Default: "id_tags".
fields
    List of label field names to attach. None means all fields found in the labels file. Default: None.
field_renames
    Optional mapping of original field names to renamed column names in the output. Default: None.

identity_model

GlobalIdentityModel feature.

Trains a T-Rex-compatible visual identification model from egocentric crop images of individual animals. Uses the V200 CNN architecture to produce weights loadable via T-Rex's visual_identification_model_path setting.

GlobalIdentityModel

GlobalIdentityModel(inputs: Inputs, params: dict[str, object] | None = None)

Train a visual identity model from individual animal sequences.

Takes EgocentricCrop output as input. Each identity is specified as a mapping of identity names to lists of sequences containing that individual alone. Trains a V200 CNN classifier (T-Rex-compatible) and exports weights loadable via visual_identification_model_path.

Example::

ego_result = dataset.run_feature(ego_crop)

identity_model = GlobalIdentityModel(
    Inputs((Result(feature="egocentric-crop"),)),
    params={
        "identities": {
            "mouse_A": ["cage1/day1_mouseA_alone", "cage1/day3_mouseA_alone"],
            "mouse_B": ["cage1/day1_mouseB_alone"],
            "mouse_C": ["cage1/day2_mouseC_alone"],
            "mouse_D": ["cage1/day1_mouseD_alone"],
        },
        "image_size": (128, 128),
        "channels": 1,
    },
)
result = dataset.run_feature(identity_model)

Parameters:

identities
    Explicit identity -> sequences mapping. Keys are identity names, values are lists of "group/sequence" strings.
group_as_identity
    Convenience shortcut -- treat each group name as one identity. Default: False.
image_size
    Crop resize target (height, width). Default: (128, 128).
channels
    Number of image channels (1=grayscale, 3=color). Default: 1.
epochs
    Training epochs. Default: 150.
learning_rate
    Adam learning rate. Default: 0.0001.
batch_size
    Training batch size. Default: 64.
val_split
    Fraction of data reserved for validation. Default: 0.2.
max_images_per_identity
    Cap on images per identity to balance classes. Default: 2000.
export_trex_weights
    Save a T-Rex-loadable .pth file. Default: True.
trex_weights_name
    Stem of the exported .pth file. Default: "identity_model".
Params

Bases: Params

Global identity model parameters.

apply
apply(df: DataFrame) -> pd.DataFrame

Passthrough -- identity predictions are applied by T-Rex, not Mosaic.

kpms

Unified keypoint-MoSeq feature.

Fits an AR-HMM model and applies it to extract per-frame syllable labels, using a persistent subprocess server to avoid repeated JAX startup costs. The kpms package does NOT need to be installed in the mosaic environment -- only in a separate .venv whose interpreter path is passed via kpms_python.

KpmsFeature

KpmsFeature(inputs: Inputs, params: dict[str, object] | None = None)

Unified keypoint-MoSeq feature: fit + apply via persistent subprocess.

Parameters:

model
    Pre-fitted KpmsModelArtifact to load (skip fit). Default: None (fit from scratch).
kpms_python
    Path to a Python interpreter with keypoint-moseq installed. None uses the bundled external .venv. Default: None.
pose
    Pose keypoint configuration (indices, column prefixes). Default: PoseConfig().
anterior_bodyparts
    List of bodypart names forming the anterior reference (required, min 1 element).
posterior_bodyparts
    List of bodypart names forming the posterior reference (required, min 1 element).
fps
    Frames per second of the input data. Default: 30.
num_iters_ar
    Number of AR-only fitting iterations. Default: 50.
num_iters_full
    Number of full model fitting iterations. Default: 500.
kappa_ar
    AR transition concentration parameter. None lets keypoint-moseq choose. Default: None.
kappa_full
    Full-model transition concentration parameter. None lets keypoint-moseq choose. Default: None.
latent_dim
    Dimensionality of the latent pose space. Must satisfy latent_dim < 2 * num_keypoints. Default: 10.
location_aware
    If True, include centroid location in the model. Default: False.
outlier_scale_factor
    Scale factor for outlier detection. Default: 6.0.
remove_outliers
    If True, remove detected outlier frames before fitting. Default: True.
mixed_map_iters
    Number of mixed MAP iterations. None uses the keypoint-moseq default. Default: None.
parallel_message_passing
    Enable parallel message passing. None uses the keypoint-moseq default. Default: None.
resume
    If True, resume fitting from a previously saved checkpoint. Default: True.
downsample_rate
    Temporal downsampling factor applied before fitting. None disables downsampling. Default: None.
save_every_n_iters
    Save a checkpoint every N iterations during fit. Default: 25.
num_iters_apply
    Number of iterations when applying the model to new data. Default: 500.
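A parameter sketch for the fit-from-scratch path; the bodypart names are illustrative, and the latent_dim constraint from the parameter list is checked explicitly:

```python
# Hedged parameter sketch; keypoint names are illustrative.
keypoints = ["nose", "left_ear", "right_ear", "spine1", "spine2", "tail_base"]
kpms_params = {
    "kpms_python": "/path/to/kpms-venv/bin/python",  # interpreter with keypoint-moseq
    "anterior_bodyparts": ["nose"],                  # required, min 1 element
    "posterior_bodyparts": ["tail_base"],            # required, min 1 element
    "fps": 30,
    "num_iters_ar": 50,
    "num_iters_full": 500,
    "latent_dim": 10,
}
# documented constraint: latent_dim < 2 * num_keypoints
assert kpms_params["latent_dim"] < 2 * len(keypoints)
```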

lightning_action_feature

Lightning-action supervised temporal action segmentation feature.

Wraps the lightning-action package (Paninski lab, MIT license) as a mosaic global feature. Trains a temporal neural network classifier (DilatedTCN, RNN, or TemporalMLP) on labeled templates and predicts per-frame action probabilities with temporal context.

Requires the optional lightning-action package::

pip install lightning-action

Or install mosaic with the extra::

pip install mosaic-behavior[lightning-action]

LightningActionFeature

LightningActionFeature(inputs: Inputs, params: dict[str, object] | None = None)

Supervised temporal action segmentation via lightning-action.

Trains a temporal neural network classifier (DilatedTCN, RNN, or TemporalMLP head + linear classifier) on labeled templates and predicts per-frame action probabilities.

Parameters:

model
    Pre-fitted LightningActionModelArtifact to load (skip training). Default: LightningActionModelArtifact().
head
    Temporal encoder architecture — "dtcn" (dilated temporal convolution), "rnn" (LSTM/GRU), or "temporalmlp". Default: "dtcn".
num_hid_units
    Hidden units in the temporal encoder. Default: 64.
num_layers
    Number of encoder layers. Default: 2.
num_lags
    Lag/kernel size for temporal context. Default: 4.
activation
    Activation function. Default: "lrelu".
dropout_rate
    Dropout rate. Default: 0.1.
sequence_length
    Training sequence length (frames per chunk). Default: 500.
num_epochs
    Number of training epochs. Default: 200.
batch_size
    Training batch size. Default: 32.
learning_rate
    Optimizer learning rate. Default: 1e-3.
weight_decay
    Optimizer weight decay. Default: 0.0.
optimizer
    Optimizer type. Default: "Adam".
weight_classes
    If True, weight loss by inverse class frequency. Default: True.
device
    Compute device — "cpu" or "gpu". Default: "cpu".
random_state
    Random seed. Default: 42.
decision_threshold
    Probability threshold(s) for positive prediction. A float applies to all classes; a dict maps class -> threshold. None uses argmax. Default: None.
default_class
    Class label assigned when no class exceeds the decision threshold (required).
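A parameter sketch combining the architecture and thresholding options; the behavior class names in decision_threshold and default_class are illustrative, not part of the library:

```python
# Hedged parameter sketch; class labels are illustrative.
la_params = {
    "head": "dtcn",           # or "rnn" / "temporalmlp"
    "num_hid_units": 64,
    "num_layers": 2,
    "num_lags": 4,
    "sequence_length": 500,
    "num_epochs": 200,
    "learning_rate": 1e-3,
    "weight_classes": True,
    # dict form: per-class probability thresholds
    "decision_threshold": {"groom": 0.6, "rear": 0.5},
    "default_class": "other",  # assigned when nothing crosses its threshold
}
```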

LightningActionModelArtifact

Bases: JoblibArtifact[LightningActionModelBundle]

Fitted lightning-action model bundle.

movement

Movement library integration for mosaic.

Provides bidirectional conversion between mosaic DataFrames and movement xarray Datasets, plus mosaic features that wrap movement's smoothing, filtering, and interpolation functions.

MovementFilterInterpolate

MovementFilterInterpolate(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Filter low-confidence points and interpolate gaps using movement.

Wraps movement.filtering.filter_by_confidence and movement.filtering.interpolate_over_time.

When no confidence columns (poseP0..N) are present, the confidence filter is skipped and only interpolation of existing NaN gaps is performed.

The output is a full track DataFrame with cleaned positions replacing the originals, so downstream features can chain off the result.

MovementSmooth

MovementSmooth(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Smooth trajectory positions using the movement library.

Wraps movement.filtering.rolling_filter and movement.filtering.savgol_filter to smooth X/Y centroid and/or poseX/poseY keypoint positions.

The output is a full track DataFrame with smoothed positions replacing the originals, so downstream features can chain off the result via Inputs((Result(feature="movement-smooth"),)).

from_movement_dataset

from_movement_dataset(ds: Any, original_df: DataFrame, metadata: dict[str, Any], update_confidence: bool = False) -> pd.DataFrame

Merge a movement xarray Dataset back into a mosaic DataFrame.

Overwrites X/Y and poseX/poseY columns in a copy of original_df with the (smoothed/filtered) values from the Dataset.

Parameters

ds : xarray.Dataset movement Dataset with position and confidence data variables. original_df : pd.DataFrame The original mosaic DataFrame to merge into. metadata : dict Metadata returned by to_movement_dataset. update_confidence : bool Whether to also overwrite poseP columns from the Dataset's confidence values. Default False.

Returns

pd.DataFrame Copy of original_df with position columns replaced.

to_movement_dataset

to_movement_dataset(df: DataFrame, fps: float | None = None, keypoint_names: list[str] | None = None, include_centroid: bool = True) -> tuple[Any, dict[str, Any]]

Convert a mosaic tracks DataFrame to a movement xarray Dataset.

Parameters

df : pd.DataFrame Mosaic tracks DataFrame with columns like X, Y, poseX0..N, poseY0..N, id, frame, etc. fps : float, optional Frames per second. If None, the time dimension uses frame numbers. keypoint_names : list[str], optional Names for the pose keypoints. If None, defaults to "keypoint_0", etc. include_centroid : bool Whether to include the centroid (X, Y) as an additional keypoint named "centroid". Default True.

Returns

ds : xarray.Dataset movement poses Dataset with dimensions (time, space, keypoints, individuals). metadata : dict Metadata needed by from_movement_dataset to convert back: individual_ids, frame_index, include_centroid, pose_pairs.

convert

Bidirectional conversion between mosaic DataFrames and movement xarray Datasets.


filter_interp

Movement-based confidence filtering and interpolation feature.


smooth

Movement-based trajectory smoothing feature.


nearestneighbor

NearestNeighbor

NearestNeighbor(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Per-sequence feature computing nearest-neighbor identity and relative kinematics.

Outputs per frame (one row per individual):

- nn_id: id of nearest neighbor (NaN if none)
- nn_delta_x / nn_delta_y: neighbor position minus focal, world frame
- nn_dist: Euclidean distance to nearest neighbor
- nn_delta_angle: neighbor heading minus focal, wrapped to [-pi, pi]
- nn_delta_x_ego / nn_delta_y_ego: neighbor offset in focal ego frame
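Per frame, this computation is a nearest-neighbor search followed by the ego rotation. A numpy sketch for a single frame (illustrative, not the library code):

```python
import numpy as np

def nearest_neighbor_frame(ids, xy, heading):
    """For each individual in one frame: nearest neighbor id, distance,
    and the neighbor offset rotated into the focal ego frame."""
    n = len(ids)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # exclude self from the search
    nn = d.argmin(axis=1)
    delta = xy[nn] - xy                  # world-frame offset to the neighbor
    c, s = np.cos(heading), np.sin(heading)
    dx_ego = c * delta[:, 0] + s * delta[:, 1]
    dy_ego = -s * delta[:, 0] + c * delta[:, 1]
    return ids[nn], d[np.arange(n), nn], dx_ego, dy_ego
```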

nn_delta_bins

NearestNeighborDeltaBins

NearestNeighborDeltaBins(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Bin nearest-neighbor response fields (dangle, dspeed) over neighbor position.

Inputs: expect outputs from nn-delta-response (neighbor_x/neighbor_y in ego frame, dangle, dspeed, group_size, and focal/neighbor category columns).

Output: a tidy DataFrame with mean turn/speed per bin for the focal role and the neighbor role, with columns [group, sequence, exp, trial, role, category, group_size, metric, bin_idx, value].

Parameters:

nbins
    Number of spatial bins along the binning axis. Default: 45.
binmax
    Maximum absolute value for bin edges. Default: 14.0.
max_for_avg
    Maximum neighbor distance used when computing binned-mean responses. Default: 5.0.
antisymm
    If True, use front/back antisymmetric folding for turn-force computation. Default: True.
focal_category_col
    Column name for the focal animal's category flag. Default: "Focal_fish".
neighbor_category_col
    Column name for the neighbor's category flag. Default: "neighbor_focal".
group_size_col
    Column name for group size. Default: "group_size".
exp_col
    Column name for experimental condition. Default: "Exp".
trial_col
    Column name for trial identifier. Default: "Trial".
category_specs
    List of dicts defining derived category columns (keys: source_col, new_col, quantile, op). Default: [].
exclude_cols
    List of boolean column names whose truthy rows are dropped before computation. Default: [].
nonfocal_flag_col
    Column used to flag nonfocal animals. Default: "Focal_fish".
nonfocal_flag_value
    Value in nonfocal_flag_col that marks an animal as nonfocal. Default: False.

nn_delta_response

NearestNeighborDelta

NearestNeighborDelta(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Per-sequence feature that measures how a focal fish changes position/heading/speed over the next diff_numframes frames relative to its nearest neighbor at the current frame.

Expected inputs (via tracks or an Inputs() that merges tracks + the nearest-neighbor feature):

- position/heading/speed columns for the focal (x, y, ANGLE, speed_col)
- nearest-neighbor id column (nn_id_col, default: 'nn_id')
- neighbor offsets in ego frame (nn_delta_x_ego / nn_delta_y_ego); if missing, world offsets (nn_delta_x / nn_delta_y) are rotated using the focal heading

Outputs per focal row (filtered to frames with a valid future sample diff_numframes ahead): frame, id, group, sequence, nn_id, neighbor_x/y (ego), neighbor_focal (if available), dx, dy, dt, dangle (wrapped; optionally scaled by fps), dspeed, plus passthrough columns like group_size/event/Focal_fish when present.
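The wrapping and scaling options compose as follows; `wrap_to_pi` and `dangle` are illustrative helpers, not library functions:

```python
import numpy as np

def wrap_to_pi(a):
    # wrap angle(s) to [-pi, pi]
    return (a + np.pi) % (2 * np.pi) - np.pi

def dangle(heading_now, heading_future, diff_numframes=4, fps=30.0,
           divide_by_frames=True, scale_by_fps=True):
    """Sketch of the heading-change response described above."""
    d = wrap_to_pi(heading_future - heading_now)
    if divide_by_frames:
        d = d / diff_numframes       # per-frame turn rate
    if scale_by_fps:
        d = d * fps                  # convert to radians/sec
    return d
```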

Parameters:

- sampling: Frame rate and smoothing settings. Default: SamplingConfig().
- speed_col: Column name for speed. Default: "SPEED#wcentroid".
- nn_id_col: Column name for the nearest-neighbor ID. Default: "nn_id".
- nn_dx_ego_col: Column for neighbor delta-x in ego frame. Default: "nn_delta_x_ego".
- nn_dy_ego_col: Column for neighbor delta-y in ego frame. Default: "nn_delta_y_ego".
- nn_dx_world_col: Fallback column for neighbor delta-x in world frame (used when ego columns are absent). Default: "nn_delta_x".
- nn_dy_world_col: Fallback column for neighbor delta-y in world frame. Default: "nn_delta_y".
- focal_col: Column name for the focal-animal flag. Default: "Focal_fish".
- diff_numframes: Number of frames ahead to compute the future response delta. Default: 4.
- wrap_angle: If True, wrap heading differences to [-pi, pi]. Default: True.
- divide_dangle_by_frames: If True, divide the heading change by diff_numframes. Default: True.
- scale_dangle_by_fps: If True, multiply dangle by fps to convert to radians/sec. Default: True.
- tag_cols: Additional columns to pass through to the output. Default: [].

orientation_relative

OrientationRelativeFeature feature.

Extracted from features.py as part of feature_library modularization.

OrientationRelativeFeature

OrientationRelativeFeature(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Orientation-aware relative features between animal pairs, order-agnostic to pose points.

For each frame and ordered pair (id_a -> id_b):

- Express B in A's body frame (using heading angle and global scale).
- Emit signed centroid deltas, heading difference, quantiles over B's points in A's frame, and nearest-k distances.

Params

Bases: Params

Orientation-relative feature parameters.

Attributes:

- scale (BodyScaleResult): Body-scale artifact for normalization.
- nearest_k (int): Number of nearest pose-point distances to emit. Default: 3.
- quantiles (list[float]): Distance distribution quantiles to compute. Default: [0.25, 0.5, 0.75].

pair_egocentric

PairEgocentricFeatures feature.

Extracted from features.py as part of feature_library modularization.

PairEgocentricFeatures

PairEgocentricFeatures(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

'pair-egocentric' -- per-sequence egocentric + kinematic features for dyads. Produces a row-wise DataFrame with columns:

- frame (if available) or time passthrough (only if it is the order column)
- perspective: 0 for A->B, 1 for B->A
- id1, id2: pair identifiers
- feature columns (e.g. A_speed, AB_dx_egoA, ...)
- (optionally) group/sequence if present in df, for convenience

This feature is stateless (no fitting). It computes features for all C(n,2) pairs per sequence, cleans/interpolates pose per animal, inner-joins by the chosen order column, and computes A->B and B->A features for each pair.
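The pair enumeration can be sketched as follows (hypothetical helper; the perspective encoding is taken from the output description above):

```python
from itertools import combinations

def pair_perspectives(ids):
    """Yield all C(n,2) unordered pairs, each with two perspectives.

    Illustrative sketch of the A->B / B->A enumeration, not library code.
    """
    for a, b in combinations(sorted(ids), 2):
        yield (a, b, 0)  # perspective 0: A->B
        yield (b, a, 1)  # perspective 1: B->A
```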

Parameters:

Name Type Description Default
interpolation

Interpolation settings for missing pose data. Default: InterpolationConfig().

required
sampling

Frame rate and smoothing settings. Default: SamplingConfig().

required
pose

Pose keypoint configuration (indices, column prefixes). Default: PoseConfig().

required
neck_idx

Index of the neck keypoint in the pose array, used to compute heading direction. Default: 3.

required
tail_base_idx

Index of the tail-base keypoint, paired with neck_idx for heading vector. Default: 6.

required
center_mode

How to compute the animal's center — "mean" averages all keypoints, other values use a specific keypoint. Default: "mean".

required

pair_interaction_filter

PairInteractionFilter -- detect pairwise interaction segments from trajectories.

Identifies frames where pairs of individuals meet configurable distance and angular thresholds. Applies morphological filtering to remove noise and enforces a minimum interaction duration.

Typical use cases
  • Detecting face-to-face interactions (distance + facing criterion)
  • Proximity-based pair detection (distance only, require_facing=False)
  • Pre-filtering for expensive downstream processing (e.g. interaction crops)

All thresholds are parameterized and should be tuned per application.

PairInteractionFilter

PairInteractionFilter(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Detect pairwise interaction segments from trajectory data.

For every unique pair of individuals in a sequence, tests per-frame distance and (optionally) angular criteria, applies morphological filtering, and extracts continuous interaction segments that meet a minimum duration.

Output columns (one row per frame per interaction segment):

- frame: frame number
- id_a, id_b: individual IDs (id_a < id_b by convention)
- interaction_id: integer label for the segment within this pair
- interaction_start: first frame of this segment
- interaction_end: last frame (exclusive) of this segment

Params

- shift_dist (float): Pixel shift along heading before the distance check (default 15). Set to 0 to use raw positions without the forward shift.
- max_dist (float): Maximum shifted-position distance in pixels (default 40).
- require_facing (bool): If True (default), require individuals to face each other (inverse orientation difference < max_inv_orientation_diff_deg). Set to False for distance-only filtering.
- max_inv_orientation_diff_deg (float): Max angle (degrees) between inverse orientations (default 80). Only used when require_facing=True.
- min_run_frames (int): Minimum continuous frames for a valid interaction (default 250).
- frame_padding (int): Frames to pad before/after each segment (default 10).
- morphological_structure_size (int): Structure element length for binary close/open (default 25). Set to 0 to disable morphological filtering.
- px_scale (float): Scale factor applied to shift_dist and max_dist (default 1.0). Use to adjust for videos with different pixel resolutions.
- use_pixel_coords (bool): If True, use poseX/poseY columns (pixel coordinates) for distance calculations instead of X/Y (world coordinates). Default True, since thresholds are in pixel units.
- pose_head_index (int | None): If set and use_pixel_coords is True, use this pose index as the position for distance calculations.
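The morphological filtering and minimum-duration steps might look like this sketch, using scipy.ndimage for the binary close/open (hypothetical helper, not the library's code):

```python
import numpy as np
from scipy.ndimage import binary_closing, binary_opening

def interaction_segments(mask, min_run_frames=250, structure_size=25):
    """Turn a per-frame boolean criterion mask into [start, end) segments.

    Sketch of the morphological-filter + minimum-duration logic above.
    """
    if structure_size > 0:
        structure = np.ones(structure_size, dtype=bool)
        mask = binary_closing(mask, structure)   # bridge short gaps
        mask = binary_opening(mask, structure)   # drop short blips
    # locate rising/falling edges of True runs (end is exclusive)
    edges = np.flatnonzero(np.diff(np.r_[False, mask, False]))
    runs = edges.reshape(-1, 2)
    return [(s, e) for s, e in runs if e - s >= min_run_frames]
```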

pair_position

PairPositionFeatures - egocentric dyadic features using only (x, y, angle).

Drop-in replacement for PairEgocentricFeatures when pose keypoints are not available. Uses the ANGLE column directly for heading instead of computing from neck->tail vector.

Output columns match PairEgocentricFeatures exactly, enabling use with downstream features like PairWavelet.

PairPositionFeatures

PairPositionFeatures(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

'pair-position' -- per-sequence egocentric + kinematic features for all pairs.

Unlike PairEgocentricFeatures which requires full pose keypoints, this feature works with minimal input: just (x, y, angle) per animal.

For N animals per sequence, computes features for all N*(N-1)/2 unique pairs, each with two perspectives (A->B and B->A).

Output columns (per row):

- frame: frame number
- perspective: 0 for A->B, 1 for B->A
- id1, id2: IDs of the two animals in this pair
- A_speed, A_v_para, A_v_perp, A_ang_speed: focal kinematics
- A_heading_cos, A_heading_sin: focal heading
- AB_dist: inter-animal distance
- AB_dx_egoA, AB_dy_egoA: partner position in the focal's egocentric frame
- rel_heading_cos, rel_heading_sin: relative heading
- B_speed, B_v_para, B_v_perp, B_ang_speed: partner kinematics
- (optionally) group, sequence for convenience
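The velocity decomposition behind columns like A_v_para / A_v_perp can be sketched as follows (assumed convention: v_para along the heading, v_perp to the animal's left; not the library's implementation):

```python
import numpy as np

def velocity_in_ego_axes(vx, vy, heading):
    """Project a world-frame velocity onto the focal's body axes."""
    hx, hy = np.cos(heading), np.sin(heading)
    v_para = vx * hx + vy * hy       # component along heading
    v_perp = -vx * hy + vy * hx      # component to the left of heading
    return v_para, v_perp
```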

Parameters:

Name Type Description Default
interpolation

Interpolation settings for missing position data. Default: InterpolationConfig().

required
sampling

Frame rate and smoothing settings. Default: SamplingConfig().

required

pair_wavelet

PairWavelet feature -- CWT spectrograms on PairPoseDistancePCA outputs.

PairWavelet

PairWavelet(inputs: Inputs, params: dict[str, object] | None = None)

CWT spectrograms on PairPoseDistancePCA outputs.

Expects input df to contain columns
  • 'perspective' (0 = A->B, 1 = B->A)
  • 'frame' (preferred) or 'time' (if used as order column)
  • PC0..PC{k-1} (k = number of PCA components)
Returns a DataFrame with columns
  • frame (or time if that was the order col)
  • perspective
  • W_{col}_f{fi} (log-power, clamped, for each component x frequency)
and (optionally) passthrough group/sequence if present in df.

Stateless (no fitting). FPS is inferred from constant df['fps'] if present, otherwise from fps_default. Frequencies are dyadically spaced in [f_min, f_max].
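Assuming "dyadically spaced" means log2-spaced, the frequency grid can be built as:

```python
import numpy as np

def dyadic_freqs(f_min=0.2, f_max=5.0, n_freq=25):
    """Log2-spaced frequencies in [f_min, f_max] (sketch of the CWT band)."""
    return np.logspace(np.log2(f_min), np.log2(f_max), n_freq, base=2.0)
```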

Parameters:

- sampling: Frame rate and smoothing settings. Default: SamplingConfig().
- f_min: Minimum frequency in Hz for the CWT band. Default: 0.2.
- f_max: Maximum frequency in Hz for the CWT band. Default: 5.0.
- n_freq: Number of frequency bins (dyadically spaced between f_min and f_max). Default: 25.
- wavelet: PyWavelets wavelet name. Default: "cmor1.5-1.0".
- log_floor: Floor value for log-power clamping. Default: -3.0.
- pc_prefix: Column prefix used to auto-detect PC input columns (e.g. "PC0", "PC1", ...). Default: "PC".
- cols: Explicit list of input column names. If None, columns are auto-detected using pc_prefix. Default: None.

pairposedistancepca

PairPoseDistancePCA

PairPoseDistancePCA(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

'pair-posedistance-pca' — builds per-frame pairwise pose-distance features and fits an IncrementalPCA globally; outputs PC scores per sequence (and perspective).
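The global fitting strategy can be sketched with scikit-learn's IncrementalPCA (illustrative only; the actual batching and feature construction live inside the feature):

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

def fit_global_pca(batches, n_components=6):
    """Fit one PCA across all sequences via partial_fit on batches.

    Each batch: array of shape (rows, n_distance_features).
    """
    ipca = IncrementalPCA(n_components=n_components)
    for batch in batches:
        ipca.partial_fit(batch)   # incremental global fit
    return ipca
```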

Parameters:

- interpolation: Interpolation settings for missing pose data. Default: InterpolationConfig().
- pose: Pose keypoint configuration (indices, column prefixes). Default: PoseConfig().
- include_intra_A: If True, include intra-animal A pairwise keypoint distances. Default: True.
- include_intra_B: If True, include intra-animal B pairwise keypoint distances. Default: True.
- include_inter: If True, include inter-animal pairwise keypoint distances. Default: True.
- duplicate_perspective: If True, output both A->B and B->A perspectives per pair. Default: True.
- n_components: Number of PCA components to retain. Default: 6.
- batch_size: Batch size for IncrementalPCA partial_fit. Default: 5000.

speed_angvel

SpeedAngvel

SpeedAngvel(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Per-sequence feature computing translational speed and angular velocity.

Outputs (per frame):

- speed: displacement magnitude between consecutive frames divided by dt
- angvel: wrapped heading difference (rad) divided by dt
- speed_step / angvel_step: same, but using a configurable step_size (omitted if step_size is None)
- speed_smooth: Savitzky-Golay smoothed speed (polyorder=1), only present when smooth_window is set in Params

Time-delta (dt) computation: Speed and angular velocity require dividing by a time interval. The source for dt is chosen by priority:

  1. frame + fps (recommended for constant-fps video): when fps is set in Params, dt is computed as frame_diff / fps. This is immune to irregular real timestamps that some trackers embed in the time column (e.g. TRex uses wall-clock timestamps that may jitter by several milliseconds per frame). It also correctly handles frame gaps from dropped/bad frames.
  2. time column: if fps is not set but a time column exists, dt is computed from consecutive time differences.
  3. array index: last resort when neither frame+fps nor time is available — assumes each row is one step apart.

For most video-based tracking data, setting fps is strongly recommended to avoid speed artifacts from timestamp jitter.
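The three-step fallback can be sketched as (hypothetical helper mirroring the priority above):

```python
import numpy as np

def compute_dt(frame=None, time=None, fps=None, n=None):
    """Per-step dt following the priority: frame+fps, time column, index."""
    if fps is not None and frame is not None:
        return np.diff(frame) / fps        # 1. frame + fps (handles gaps)
    if time is not None:
        return np.diff(time)               # 2. time column
    return np.ones(n - 1)                  # 3. assume one step per row
```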

Parameters:

- step_size: If set, also compute speed_step / angvel_step using this frame step (in addition to step=1). Default: None.
- smooth_window: If set, apply Savitzky-Golay smoothing (polyorder=1) over this many frames to produce speed_smooth. Default: None.
- fps: Frames per second. When set, dt is derived from frame_diff/fps instead of the time column, which is more robust for constant-fps data with jittery timestamps. Default: None.

temporal_stacking

Temporal stacking feature.

Builds temporal context windows over per-sequence feature data by stacking Gaussian-smoothed frames at time offsets and optional pooled statistics.
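The stacking idea can be sketched as follows (illustrative only; uses np.roll for shifting, so sequence edges wrap around, whereas a real implementation would likely pad instead):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def temporal_stack(x, half=60, skip=5, sigma=30.0):
    """Stack Gaussian-smoothed copies of a (frames, features) array
    at time offsets in [-half, +half] stepped by skip."""
    sm = gaussian_filter1d(x, sigma=sigma, axis=0) if sigma > 0 else x
    cols = []
    for off in range(-half, half + 1, skip):
        cols.append(np.roll(sm, -off, axis=0))  # value at frame t + off
    return np.concatenate(cols, axis=1)
```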

TemporalStackingFeature

TemporalStackingFeature(inputs: Inputs, params: dict[str, object] | None = None)

Build temporal context windows over per-sequence feature data.

Parameters:

- half: Half-width of the temporal window in frames. The full window spans [-half, +half]. Default: 60.
- skip: Step size between time offsets in the stacking window. Default: 5.
- use_temporal_stack: If True, concatenate Gaussian-smoothed copies at each time offset. Default: True.
- sigma_stack: Gaussian sigma (in frames) for smoothing before stacking. 0 disables smoothing. Default: 30.0.
- add_pool: If True, append pooled statistics (e.g. mean, std) computed over a sliding Gaussian window. Default: True.
- pool_stats: Tuple of pooled statistics to compute. Supported: "mean", "std", "variance". Default: ("mean",).
- sigma_pool: Gaussian sigma (in frames) for the pooling window. Default: 30.0.
- fps: Frames per second; used to convert win_sec to frames. Default: 30.0.
- win_sec: Pooling window width in seconds. Default: 0.5.
- pair_filter: Optional NNResult for nearest-neighbor pair filtering during dependency resolution. Default: None.

trajectory_smooth

TrajectorySmooth

TrajectorySmooth(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)

Per-sequence feature that smooths and interpolates trajectory positions.

Pipeline (per individual):

1. Bad-frame detection: flag frames with speed > speed_threshold, then expand the flagged region by expand_frames in each direction.
2. Interpolation: set positions to NaN at bad frames, linearly interpolate, and forward/backward fill edges. Controlled separately for centroid (interpolate_centroid) and pose (interpolate_pose).
3. Savgol smoothing: apply savgol_filter to centroid X/Y and all pose columns (always, regardless of interpolation flags).

Output is the full track DataFrame with smoothed positions replacing originals, plus a bad_frame boolean column. Downstream features can consume this via Inputs((Result(feature="trajectory-smooth"),)).
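Steps 1 and 2 of the pipeline can be sketched with pandas (hypothetical helper; the real feature also handles pose columns, edge fills, and the fps conversion):

```python
import numpy as np
import pandas as pd

def flag_and_interp(x, y, speed_threshold, expand_frames=2):
    """Flag high-speed frames, expand the flag, NaN-out, and interpolate."""
    pos = pd.DataFrame({"X": x, "Y": y})
    speed = np.hypot(pos["X"].diff(), pos["Y"].diff())
    bad = speed > speed_threshold
    # expand the flagged region by expand_frames in each direction
    win = 2 * expand_frames + 1
    bad = bad.astype(float).rolling(win, center=True, min_periods=1).max().astype(bool)
    pos[bad] = np.nan
    pos = pos.interpolate(limit_direction="both")   # linear + edge fill
    return pos, bad
```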

Parameters:

- speed_threshold: Speed above which a frame is flagged as bad. When fps is set, interpreted as units/sec (e.g. 40 cm/s); otherwise units/frame. Default: None (no bad-frame detection).
- fps: Frames per second. When provided, speed_threshold is converted from units/sec to units/frame internally. Default: None.
- interpolate_centroid: If True, replace bad-frame centroid positions with linear interpolation. Default: True.
- interpolate_pose: If True, replace bad-frame pose keypoint positions with linear interpolation. Default: False.
- expand_frames: Number of frames to expand the bad-frame region in each direction. Default: 2.
- savgol_window: Window length for Savitzky-Golay smoothing. Must be odd and >= savgol_polyorder + 1. None disables smoothing. Default: None.
- savgol_polyorder: Polynomial order for the Savitzky-Golay filter. Default: 2.

types

InterpolationConfig

Bases: StrictModel

Interpolation parameters for missing pose/position data.

Attributes:

- linear_interp_limit (int): Max consecutive NaN frames to fill via linear interpolation. Default 10, must be >= 1.
- edge_fill_limit (int): Max frames to forward/backward fill at sequence edges. Default 3, must be >= 0.
- max_missing_fraction (float): Rows with a higher fraction of NaN columns are dropped entirely. Default 0.10, range [0, 1].

PoolConfig

Bases: StrictModel

Candidate pool configuration for template extraction.

Controls how per-entry contributions to the candidate pool are allocated before the final template selection step.

Attributes:

- size (int | None): Candidate pool size. For the "random" strategy, defaults to n_templates (pool == output). For "farthest_first", should be larger (e.g. n_templates * 3).
- allocation (Literal['reservoir', 'exact']): How per-entry quotas are computed. "reservoir": weighted reservoir sampling, single pass. "exact": two-pass; the first pass counts rows, the second samples with exact proportional quotas. Default "reservoir".
- max_entry_fraction (float | None): Cap per entry as a fraction of pool size. None means no cap (purely proportional). At runtime, the effective cap is max(max_entry_fraction, 1 / n_entries) so the pool can always be filled completely. Default None.
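The effective-cap rule for max_entry_fraction can be written out directly (illustrative helper):

```python
def effective_cap(max_entry_fraction, n_entries):
    """max(max_entry_fraction, 1 / n_entries); None means no cap at all."""
    if max_entry_fraction is None:
        return None
    return max(max_entry_fraction, 1.0 / n_entries)
```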

SamplingConfig

Bases: StrictModel

Frame rate and temporal smoothing parameters.

Attributes:

- fps_default (float): Fallback frames-per-second when the data does not carry an fps column. Default 30.0, must be > 0.
- smooth_win (int): Moving-average window size applied to pose coordinates before feature computation. 0 disables smoothing. Default 0.

xgboost_feature

XgboostFeature

XgboostFeature(inputs: Inputs, params: dict[str, object] | None = None)

XGBoost behavior classifier as a pipeline feature.

Trains on labeled templates (from ExtractLabeledTemplates) and runs per-sequence inference. Supports multiclass and one-vs-rest strategies.
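The decision_threshold / default_class rule can be sketched as follows (hypothetical helper operating on predict_proba-style output; not the library's code):

```python
import numpy as np

def predict_with_threshold(proba, classes, decision_threshold=None,
                           default_class="other"):
    """Argmax when no threshold is set; otherwise require a class to clear
    its threshold, falling back to default_class when none does."""
    if decision_threshold is None:
        return [classes[i] for i in np.argmax(proba, axis=1)]
    if isinstance(decision_threshold, dict):
        thr = np.array([decision_threshold.get(c, 0.5) for c in classes])
    else:
        thr = np.full(len(classes), decision_threshold)
    out = []
    for row in proba:
        passing = row >= thr
        if passing.any():
            # best class among those clearing their threshold
            out.append(classes[int(np.argmax(np.where(passing, row, -1.0)))])
        else:
            out.append(default_class)
    return out
```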

Parameters:

- model: Pre-fitted XgboostModelArtifact to load (skip training). Default: XgboostModelArtifact().
- strategy: Classification strategy: "multiclass" trains a single multi-class model; "one_vs_rest" trains one binary classifier per class. Default: "multiclass".
- decision_threshold: Probability threshold(s) for positive prediction. A float applies to all classes; a dict maps class -> threshold. None uses argmax. Default: None.
- default_class: Class label assigned when no class exceeds the decision threshold (required).
- class_weight: If "balanced", adjust sample weights inversely proportional to class frequency. Default: "balanced".
- use_smote: If True, apply SMOTE oversampling to the training set. Default: False.
- undersample_ratio: If set, undersample majority classes to this ratio relative to the minority class before SMOTE. Default: None.
- n_estimators: Number of boosting rounds. Default: 100.
- max_depth: Maximum tree depth. Default: 6.
- learning_rate: Boosting learning rate. Default: 0.1.
- subsample: Fraction of training samples used per tree. Default: 0.8.
- colsample_bytree: Fraction of features used per tree. Default: 0.8.
- random_state: Random seed for reproducibility. Default: 42.

XgboostModelArtifact

Bases: JoblibArtifact[XgboostModelBundle]

Fitted XGBoost model bundle (xgboost_model.joblib).