Feature Library¶
Mosaic's feature library provides 30+ registered feature implementations organized by output type. Features are composable pipeline stages that read from tracks or upstream feature outputs and produce per-sequence parquet files.
Feature categories¶
| Category | Features |
|---|---|
| Per-frame kinematic | SpeedAngvel, BodyScale, OrientationRelative |
| Per-frame spatial | PairEgocentric, PairPosition, PairInteractionFilter, ApproachAvoidance |
| Per-frame social | NearestNeighbor, FFGroups, FFGroupsMetrics, NNDeltaResponse, NNDeltaBins |
| Per-frame context | TemporalStacking, PairWavelet |
| Dimensionality reduction | PairPoseDistancePCA, GlobalScaler |
| Embedding & clustering | GlobalTSNE, GlobalKMeansClustering, GlobalWardClustering, WardAssign, ExtractTemplates, ExtractLabeledTemplates |
| Classification | XgboostFeature, FeralFeature, KpmsFeature |
Registry¶
feature_library ¶
Feature library for behavior datasets.
This module provides a collection of features for behavioral analysis. Features are automatically registered on import via the @register_feature decorator.
All features are automatically loaded when the feature_library is imported, making them available in the global FEATURES registry.
Usage¶
```python
from mosaic.behavior.feature_library import Inputs, Result
from mosaic.behavior.feature_library.speed_angvel import SpeedAngvel
```
Track-only feature (default inputs)¶
```python
feat = SpeedAngvel()
dataset.run_feature(feat)
```
Feature consuming another feature's output¶
```python
feat = SpeedAngvel(inputs=Inputs((Result(feature="nn"),)))
dataset.run_feature(feat)
```
List all registered features¶
```python
from mosaic.behavior.feature_library.registry import FEATURES
print(list(FEATURES.keys()))
```
ApproachAvoidance ¶
'approach-avoidance' — per-sequence AA event detection for all pairs.
For N animals per sequence, evaluates all N*(N-1)/2 unique unordered pairs. The output stores directional events as aa_event_12 and aa_event_21 over canonical (id1,id2), plus aa_event/label_id as non-directional union.
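The pair enumeration described above can be sketched in a few lines. This is an illustrative helper (the function name `canonical_pairs` is not part of the library): it produces the N*(N-1)/2 unique unordered pairs in canonical (id1, id2) order.

```python
from itertools import combinations

def canonical_pairs(ids):
    # All N*(N-1)/2 unordered pairs, each emitted with id1 < id2
    # (the canonical ordering the aa_event_12 / aa_event_21 columns use).
    return list(combinations(sorted(set(ids)), 2))
```

For three animals this yields three pairs, each evaluated in both directions.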
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| interpolation | | Interpolation settings for missing data. | InterpolationConfig() |
| sampling | | Frame rate and smoothing settings. | SamplingConfig() |
| velocity_units | | Whether speed thresholds are in "per_frame" or "per_second". | "per_frame" |
| angle_units | | Unit for heading angles — "radians", "degrees", or "auto" (detect from data range). | "radians" |
| consecutive_frame_delta | | Expected frame step between consecutive rows; used to detect gaps. | 1.0 |
| distance_threshold | | Maximum inter-animal distance (in position units) for a frame to be considered AA-eligible. | 200.0 |
| approacher_velocity_threshold | | Minimum speed of the approaching animal. | 5.0 |
| avoider_velocity_threshold | | Minimum speed of the avoiding animal. | 5.0 |
| cos_approacher_threshold | | Minimum cosine between the approacher's velocity vector and the direction toward the partner. | 0.8 |
| cos_avoider_threshold | | Minimum cosine between the avoider's velocity vector and the direction away from the partner. | 0.5 |
| min_event_length | | Minimum number of contiguous qualifying frames to form an event. | 10 |
| min_event_count | | Minimum number of qualifying frames within an event run to keep it. | 5 |
| orientation_gate_cos | | If set, require the approacher's body orientation to align with its velocity (cos threshold). None disables the gate. | cos(30°) ≈ 0.866 |
| smooth_window_sec | | If set, apply a sliding-window average (in seconds) to velocities before thresholding. None disables (framewise behaviour). | None |
extract_events (staticmethod) ¶
Convert per-frame AA output into a compact event table.
Parameters¶
aa_df : DataFrame
    Per-frame output with columns: frame, id1, id2, aa_event, aa_event_12, aa_event_21. May span multiple sequences/groups (they are handled independently).
min_duration : int
    Minimum event length in frames. Events shorter than this are discarded.
Returns¶
DataFrame with columns: id1, id2, start_frame, end_frame, duration, direction ('12' if id1→id2, '21' if id2→id1, 'both'), approacher_id, avoider_id, sequence (if present), group (if present).
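The core of converting a per-frame boolean event column into (start_frame, end_frame) rows is a run-length scan. A minimal NumPy sketch of that step (not the library's implementation, which also splits by pair and sequence):

```python
import numpy as np

def runs(mask, min_duration):
    # Contiguous True runs of length >= min_duration, as (start, end)
    # inclusive frame-index pairs -- the run-length step behind extract_events.
    m = np.asarray(mask, dtype=bool)
    # Pad with False on both sides so every run has a rising and falling edge.
    edges = np.flatnonzero(np.diff(np.r_[False, m, False]))
    starts, ends = edges[::2], edges[1::2] - 1
    keep = (ends - starts + 1) >= min_duration
    return list(zip(starts[keep].tolist(), ends[keep].tolist()))
```

For example, `runs([0, 1, 1, 1, 0, 1], 2)` keeps the three-frame run and drops the single-frame one.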
ArHmmFeature ¶
AR-HMM behavioral syllable discovery as a pipeline feature.
Fits an autoregressive Hidden Markov Model across all input sequences and assigns per-frame syllable labels via Viterbi decoding.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| model | | Pre-fitted ArHmmModelArtifact to load (skip fit). None fits from scratch. | None |
| pca_dim | | Number of PCA components for dimensionality reduction before fitting. None skips PCA. | None |
| n_states | | Maximum number of HMM states (pruned after fit). | 50 |
| n_lags | | AR order (number of lagged frames as regressors). | 1 |
| sticky_weight | | Extra pseudo-count on the diagonal of the transition matrix (encourages state persistence). | 100.0 |
| n_iter | | Maximum EM iterations per restart. | 200 |
| tol | | Convergence tolerance on relative LL change. | 1e-4 |
| n_restarts | | Number of random restarts (best LL kept). | 1 |
| standardize | | If True, z-score features before fitting. | True |
| downsample_rate | | Temporal downsampling factor. None disables. | None |
| prune_threshold | | Drop states with posterior mass below this fraction. | 0.01 |
| random_state | | Random seed. | 42 |
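The effect of sticky_weight is a pseudo-count added to the diagonal of the transition-count matrix before row normalization, biasing the HMM toward staying in its current state. A minimal sketch (illustrative, not the library's fitting code):

```python
import numpy as np

def sticky_transitions(counts, sticky_weight):
    # Add sticky_weight to the diagonal of raw transition counts,
    # then row-normalize into a stochastic transition matrix.
    c = np.asarray(counts, dtype=float) + sticky_weight * np.eye(len(counts))
    return c / c.sum(axis=1, keepdims=True)
```

With uniform counts and sticky_weight=2.0, self-transitions rise from 0.5 to 0.75.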
ArtifactSpec ¶
Bases: Result[str], Generic[L, R]
Reference to a feature artifact with load specification.
Class Type Parameters:

| Name | Bound or Constraints | Description | Default |
|---|---|---|---|
| L | | Load spec type (NpzLoadSpec, ParquetLoadSpec, JoblibLoadSpec). | required |
| R | | Return type of from_path(). | object |
Attributes:

| Name | Type | Description |
|---|---|---|
| load | L | How to load the matched files. |
| pattern | str | Glob pattern. Auto-derived from load.kind when empty. |
from_path ¶
Load artifact from a resolved file path.
Dispatches on load-spec type via load_from_spec(). Return type is determined by the R type parameter.
from_result (classmethod) ¶
Create from a Result, validating feature match.
Typed artifact subclasses (with a default feature) validate that result.feature matches. Base ArtifactSpec passes through.
BodyScaleFeature ¶
Per-frame body scale: median intra-animal pose distance.
Outputs per sequence parquet with columns: frame, id, scale, sequence, group. Intended to be averaged later (per sequence or dataset) to derive a single normalization constant for downstream orientation features.
ExtractLabeledTemplates ¶
Extract labeled, split-annotated templates from upstream features.
Streams upstream feature data, aligns ground truth labels from NPZ files, assigns train/test splits by sequence, and subsamples per class. Produces a templates parquet with feature columns + label (int) + split (str).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| labels | | GroundTruthLabelsSource specifying where to load per-frame ground-truth labels. | required |
| strategy | | Template selection method — "random" or "farthest_first". | "random" |
| n_per_class | | Number of templates per class. An int applies uniformly; a dict maps class -> count. Exactly one of n_per_class or n_total must be set. | None |
| n_total | | Total number of templates across all classes (distributed proportionally). Exactly one of n_per_class or n_total must be set. | None |
| pool | | PoolConfig controlling candidate pool size and allocation. | PoolConfig() |
| test_fraction | | Fraction of sequences held out for the test split. | 0.2 |
| random_state | | Random seed for reproducibility. | 42 |
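Splits are assigned per sequence, not per frame, so all frames of one sequence land on the same side. A plain-Python sketch of that idea (the function name `split_sequences` and its tie-breaking are illustrative, not the library's exact logic):

```python
import random

def split_sequences(sequences, test_fraction=0.2, random_state=42):
    # Hold out whole sequences for the test split; every frame of a
    # sequence inherits that sequence's split.
    rng = random.Random(random_state)
    seqs = sorted(set(sequences))
    n_test = max(1, round(len(seqs) * test_fraction))
    test = set(rng.sample(seqs, n_test))
    return {s: ("test" if s in test else "train") for s in seqs}
```

With 10 sequences and the default test_fraction, two sequences are held out.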
ExtractTemplates ¶
Subsample per-sequence data into a representative template matrix.
Entry point for the global feature pipeline. Streams per-sequence inputs, builds a candidate pool with proportional per-entry contribution, and selects templates using the configured strategy.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| strategy | | Template selection method — "random" for uniform random sampling, "farthest_first" for greedy diversity maximization. | "random" |
| n_templates | | Number of templates to select. | required |
| pool | | PoolConfig controlling candidate pool size, allocation strategy, and per-entry caps. | PoolConfig() |
| random_state | | Random seed for reproducibility. | 42 |
| pair_filter | | Optional NNResult for nearest-neighbor pair filtering during dependency resolution. | None |
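The "farthest_first" strategy greedily picks the candidate farthest from all templates chosen so far, maximizing diversity. A minimal NumPy sketch under the assumption that the first template is seeded randomly (the library's seeding may differ):

```python
import numpy as np

def farthest_first(X, n_templates, random_state=42):
    # Greedy farthest-first selection over rows of X: repeatedly take the
    # point with the largest distance to its nearest already-chosen template.
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(random_state)
    chosen = [int(rng.integers(len(X)))]
    d = np.linalg.norm(X - X[chosen[0]], axis=1)
    while len(chosen) < n_templates:
        nxt = int(np.argmax(d))
        chosen.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return chosen
```

On the corners of a unit square, three selections always hit three distinct corners.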
Params ¶
Bases: Params
ExtractTemplates parameters.
Attributes:

| Name | Type | Description |
|---|---|---|
| strategy | Literal['random', 'farthest_first'] | Selection strategy. Default "random". |
| n_templates | int | Number of templates to select. Required. |
| pool | PoolConfig | Pool configuration. Default PoolConfig(). |
| random_state | int | Random seed. Default 42. |
FFGroups ¶
Per-sequence fission-fusion grouping metrics.
Inputs: raw tracks (columns: x, y, id, frame/time, group, sequence). Outputs per (frame, id):

- group_membership (component label)
- group_size (size of that component)
- event (event id from dp.get_events_info, -1 if not in an event)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| distance_cutoff | | Pairwise distance threshold below which two animals are considered in the same group. | 50.0 |
| window_size | | Sliding-window size (frames) for smoothing the pairwise distance matrix before thresholding. | 5 |
| min_event_duration | | Minimum number of contiguous frames for a stable subgroup to be registered as an event. | 1 |
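Group membership on a single frame amounts to connected components of the graph whose edges link animals closer than distance_cutoff. A self-contained union-find sketch of that step (illustrative; the feature additionally smooths distances over window_size frames):

```python
import math

def frame_groups(positions, distance_cutoff):
    # Connected components over the pairwise-distance graph of one frame:
    # animals closer than distance_cutoff end up with the same label.
    n = len(positions)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(positions[i], positions[j]) < distance_cutoff:
                parent[find(i)] = find(j)  # union the two components
    return [find(i) for i in range(n)]
```

Two nearby animals share a label; a distant third gets its own.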
FFGroupsMetrics ¶
Per-sequence summary of focal-fish group metrics.
Per-frame computed (internal): distance_from_centroid, xrot_to_centroid, yrot_to_centroid, dev_speed_to_mean.

Summaries (output: one row per id within sequence):

- fractime_norm2
- avg_duration_frame
- med_duration_frame
- ftime_periphery
- ftime_periphery_norm
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| group_col | | Column name that identifies group events (e.g. from FFGroups output). | "event" |
| speed_col | | Column name for speed values. | "speed" |
| time_chunk_sec | | If set, split each sequence into time-based chunks of this duration (seconds) and compute summaries per chunk. None uses the whole sequence. | None |
| frame_chunk | | If set, split each sequence into frame-based chunks of this size and compute summaries per chunk. | None |
| centroid_heading_col | | Column for centroid heading used in rotation calculations. | "centroid_heading" |
| exclude_cols | | List of boolean column names (e.g. "bad_frame") whose truthy rows are dropped before computation. | [] |
Feature ¶
Bases: Protocol
Feature protocol -- 4 attributes, 4 methods.
FeralFeature ¶
FERAL vision-transformer behavior classifier as a pipeline feature.
Supports two operating modes:

- Training mode (video_dir + label_json + training): runs the full FERAL ViT fine-tuning loop, saves checkpoints, evaluates the test split (if present), then applies to all sequences in the apply phase.
- Inference mode (model_dir): loads a pre-trained FERAL model and runs per-frame behavior classification on crop videos.
Supports two input formats for the apply phase:

- InteractionCropPipeline output (pair-level): one row per crop video with video_path, id_a, id_b, target_id, interaction_id, start_frame, end_frame.
- EgocentricCrop output (individual-level): one row per frame with target_id, frame. Videos are derived as egocentric_id{target_id}.mp4.
Params¶
feral_code_dir : Path
Path to a local clone of https://github.com/Skovorp/feral.
model_name : str
HuggingFace model name (default: V-JEPA2 ViT-L).
predict_per_item : int
Predictions per chunk (default 64).
chunk_length : int
Frames per video chunk (default 64).
chunk_shift : int
Stride between chunks for overlapping inference (default 32).
chunk_step : int
Frame sampling step within chunks (default 1).
resize_to : int
Input resolution for ViT (default 256).
device : str
PyTorch device (default "cuda").
class_names : dict | None
Class index -> name mapping. Auto-detected from model config.
decision_threshold : float | None
Probability threshold for positive class. None uses argmax.
default_class : int
Fallback class when no class exceeds threshold (default 0).
model_dir : Path | None
Directory with model_best.pt + config.json (inference mode).
video_dir : Path | None
Directory containing crop videos (training mode).
label_json : Path | None
Path to FERAL-format label JSON with splits (training mode).
training : FeralTrainingConfig | None
Training hyperparameters. None = inference-only mode.
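The chunking parameters above (chunk_length, chunk_shift, chunk_step) define overlapping frame windows for inference. A small sketch of how such windows could be enumerated (the helper name `video_chunks` is illustrative, not a FERAL API):

```python
def video_chunks(n_frames, chunk_length=64, chunk_shift=32, chunk_step=1):
    # Overlapping inference windows: each chunk covers chunk_length frames,
    # sampled every chunk_step frames, and consecutive chunks start
    # chunk_shift frames apart.
    chunks = []
    start = 0
    while start + chunk_length <= n_frames:
        chunks.append(list(range(start, start + chunk_length, chunk_step)))
        start += chunk_shift
    return chunks
```

For a 128-frame video with the defaults, this yields chunks starting at frames 0, 32, and 64.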
fit ¶
Train a FERAL model or verify a pre-trained model is loaded.

In training mode (video_dir + label_json + training set), runs the full ViT fine-tuning loop with intermediate checkpoints. After training, evaluates the test split if present.

In inference mode (model_dir set), the model is already loaded by load_state() and this method is not called.

The inputs argument is not consumed -- FERAL reads video files directly from params.video_dir.
FeralTrainingConfig ¶
Bases: StrictModel
Training hyperparameters for FERAL ViT fine-tuning.
These mirror the FERAL default_vjepa.yaml configuration.
GlobalIdentityModel ¶
Train a visual identity model from individual animal sequences.
Takes EgocentricCrop output as input. Each identity is specified as a
mapping of identity names to lists of sequences containing that
individual alone. Trains a V200 CNN classifier (T-Rex-compatible)
and exports weights loadable via visual_identification_model_path.
Example:

```python
ego_result = dataset.run_feature(ego_crop)
identity_model = GlobalIdentityModel(
    Inputs((Result(feature="egocentric-crop"),)),
    params={
        "identities": {
            "mouse_A": ["cage1/day1_mouseA_alone", "cage1/day3_mouseA_alone"],
            "mouse_B": ["cage1/day1_mouseB_alone"],
            "mouse_C": ["cage1/day2_mouseC_alone"],
            "mouse_D": ["cage1/day1_mouseD_alone"],
        },
        "image_size": (128, 128),
        "channels": 1,
    },
)
```
result = dataset.run_feature(identity_model)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| identities | | Explicit identity -> sequences mapping. Keys are identity names, values are lists of "group/sequence" strings. | required |
| group_as_identity | | Convenience shortcut -- treat each group name as one identity. | False |
| image_size | | Crop resize target (height, width). | (128, 128) |
| channels | | Number of image channels (1=grayscale, 3=color). | 1 |
| epochs | | Training epochs. | 150 |
| learning_rate | | Adam learning rate. | 0.0001 |
| batch_size | | Training batch size. | 64 |
| val_split | | Fraction of data reserved for validation. | 0.2 |
| max_images_per_identity | | Cap on images per identity to balance classes. | 2000 |
| export_trex_weights | | Save a T-Rex-loadable .pth file. | True |
| trex_weights_name | | Stem of the exported .pth file. | "identity_model" |
GlobalKMeansClustering ¶
Global K-Means clustering on templates loaded via load_state. Per-sequence cluster assignment is done in apply().
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| templates | | Templates artifact to fit on (inherited from GlobalModelParams). | required |
| model | | Pre-fitted KMeansModelArtifact to load (skip fit). | KMeansModelArtifact() |
| k | | Number of clusters. | 100 |
| random_state | | Random seed for KMeans initialization. | 42 |
| n_init | | Number of KMeans initializations to run. | "auto" |
| max_iter | | Maximum iterations per KMeans run. | 300 |
| device | | Compute device — "cpu" or "cuda" (requires cuML). | "cpu" |
| label_artifact_points | | If True, assign cluster labels to the template points used for fitting. | True |
| pair_filter | | Optional NNResult for nearest-neighbor pair filtering during dependency resolution. | None |
Params ¶
Bases: GlobalModelParams[KMeansModelArtifact]
Global K-means clustering parameters.
Attributes:

| Name | Type | Description |
|---|---|---|
| templates | ParquetArtifact \| None | Templates artifact to fit on (inherited). |
| model | KMeansModelArtifact \| None | Pre-fitted KMeans model artifact (skip fit). |
| k | int | Number of clusters. Default 100. |
| random_state | int | Random seed. Default 42. |
| n_init | Literal['auto'] \| int | KMeans initializations. Default "auto". |
| max_iter | int | Max iterations per run. Default 300. |
| device | str | Compute device. Default "cpu". |
| label_artifact_points | bool | Label points used for fitting. Default True. |
| pair_filter | NNResult \| None | Nearest-neighbor pair filter for dependency resolution. Default None. |
GlobalModelParams ¶
Bases: Params, Generic[M]
Base params for global features that fit on a templates artifact or load a pre-fitted model.
Type parameter M is the model artifact type (must extend JoblibArtifact).
Exactly one of templates or model must be provided.
Both fields use default_factory so that from_overrides() merges partial dicts correctly. The _exclusive_source validator checks model_fields_set and nulls out the field that was not provided.
Attributes:

| Name | Type | Description |
|---|---|---|
| templates | ParquetArtifact \| None | Templates artifact to fit from. Mutually exclusive with model. |
| model | M \| None | Pre-fitted model artifact. Mutually exclusive with templates. |
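The "exactly one of templates or model" rule can be expressed as a simple check. This is a plain-Python sketch of the invariant, not the pydantic `_exclusive_source` validator itself:

```python
def check_exclusive_source(templates=None, model=None):
    # Exactly one of templates/model must be provided; both-set and
    # both-unset are rejected, mirroring the rule described above.
    if (templates is None) == (model is None):
        raise ValueError("provide exactly one of 'templates' or 'model'")
    return templates if templates is not None else model
```

Passing both, or neither, raises; passing one returns it.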
GlobalScaler ¶
Fit a StandardScaler on templates and scale per-sequence data.
Consumes a templates artifact (from ExtractTemplates or any feature producing templates.parquet). Produces a scaler model bundle and scaled templates.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| templates | | Templates artifact to fit the scaler on (inherited from GlobalModelParams). | required |
| model | | Pre-fitted ScalerModelArtifact to load (skip fit). | ScalerModelArtifact() |
Params ¶
Bases: GlobalModelParams[ScalerModelArtifact]
GlobalScaler parameters.
Attributes:

| Name | Type | Description |
|---|---|---|
| templates | ParquetArtifact \| None | Templates artifact to fit scaler on. |
| model | ScalerModelArtifact \| None | Pre-fitted scaler model artifact (skip fit). |
GlobalTSNE ¶
Fit an openTSNE embedding on templates and map per-sequence data.
Consumes a templates artifact (from ExtractTemplates, GlobalScaler, or any feature producing templates). Produces an embedding model bundle and template coordinates.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| templates | | Templates artifact to fit embedding on (inherited from GlobalModelParams). | required |
| model | | Pre-fitted TSNEModelArtifact to load (skip fit). | TSNEModelArtifact() |
| random_state | | Random seed. | 42 |
| perplexity | | t-SNE perplexity parameter. | 50 |
| knn_method | | kNN backend — "annoy", "faiss", or "faiss-gpu". | "annoy" |
| n_jobs | | Number of parallel jobs for openTSNE. | 8 |
| fit | | TSNEFitConfig controlling learning rate, exaggeration iterations, momentum, etc. | TSNEFitConfig() |
| mapping | | TSNEMapConfig controlling partial-embedding parameters (k, iterations, chunk_size, etc.). | TSNEMapConfig() |
Params ¶
Bases: GlobalModelParams[TSNEModelArtifact]
Global t-SNE parameters.
Attributes:

| Name | Type | Description |
|---|---|---|
| templates | ParquetArtifact \| None | Templates artifact to fit embedding on. |
| model | TSNEModelArtifact \| None | Pre-fitted embedding model artifact (skip fit). |
| random_state | int | Random seed. Default 42. |
| perplexity | int | t-SNE perplexity. Default 50. |
| knn_method | str | kNN method ("annoy", "faiss", "faiss-gpu"). Default "annoy". |
| n_jobs | int | Parallel jobs for openTSNE. Default 8. |
| fit | TSNEFitConfig | Embedding fitting parameters. |
| mapping | TSNEMapConfig | Partial embedding mapping parameters. |
GlobalWardClustering ¶
Ward hierarchical clustering on templates with per-sequence 1-NN assignment.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| templates | | Templates artifact to cluster (inherited from GlobalModelParams). | required |
| model | | Pre-fitted WardModelArtifact to load (skip fit). | WardModelArtifact() |
| n_clusters | | Number of clusters to cut from the linkage tree. | 20 |
| method | | Linkage method passed to scipy.cluster.hierarchy.linkage. | "ward" |
| pair_filter | | Optional NNResult for nearest-neighbor pair filtering during dependency resolution. | None |
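The per-sequence 1-NN assignment step maps each new data point to the cluster label of its nearest template. A vectorized NumPy sketch of that idea (illustrative; the library's implementation may chunk or use a kNN index):

```python
import numpy as np

def assign_1nn(points, templates, template_labels):
    # For each point, find the nearest template (squared Euclidean)
    # and inherit its cluster label -- the 1-NN assignment described above.
    P = np.asarray(points, float)[:, None, :]
    T = np.asarray(templates, float)[None, :, :]
    nearest = np.argmin(((P - T) ** 2).sum(axis=-1), axis=1)
    return np.asarray(template_labels)[nearest]
```

Points near a template inherit that template's label regardless of how the templates were clustered.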
Params ¶
Bases: GlobalModelParams[WardModelArtifact]
Global Ward clustering parameters.
Attributes:

| Name | Type | Description |
|---|---|---|
| templates | ParquetArtifact \| None | Templates artifact to cluster (inherited). |
| model | WardModelArtifact \| None | Pre-fitted Ward model artifact (skip fit). |
| n_clusters | int | Number of clusters to cut. Default 20. |
| method | str | Linkage method. Default "ward". |
| pair_filter | NNResult \| None | Nearest-neighbor pair filter. Default None. |
GroundTruthLabelsSource ¶
IdTagColumns ¶
Attach per-id label fields (from labels/
Outputs per row (same granularity as input tracks/feature): frame/time/id/group/sequence + one column per requested label field.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| labels | | LabelsSource specifying which labels directory to load. | LabelsSource(kind="id_tags") |
| label_kind | | Label subdirectory name used for dependency resolution. | "id_tags" |
| fields | | List of label field names to attach. None means all fields found in the labels file. | None |
| field_renames | | Optional mapping of original field names to renamed column names in the output. | None |
Inputs ¶
Bases: RootModel[tuple[InputItem, ...]], Generic[InputItem]
Base class for feature input collections. Mirrors Params.
Each Feature subclasses to narrow allowed input types, paralleling class Params(Params):.
Examples:

```python
Inputs(("tracks",))
Inputs((Result(feature="speed-angvel"),))
Inputs(("tracks", Result(feature="nn", run_id="0.1-abc")))
```
Per-feature narrowing:

```python
class Inputs(Inputs[TrackInput]):
    pass
```

Features that take no pipeline inputs:

```python
class Inputs(Inputs[Result]):
    _require: ClassVar[InputRequire] = "empty"
```

Self-loading features that optionally accept inputs (e.g. fit + assign):

```python
class Inputs(Inputs[Result]):
    _require: ClassVar[InputRequire] = "any"
```
InputsLike ¶
Bases: Protocol
Read-only interface satisfied by any Inputs[InputItem].
KpmsFeature ¶
Unified keypoint-MoSeq feature: fit + apply via persistent subprocess.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| model | | Pre-fitted KpmsModelArtifact to load (skip fit). None fits from scratch. | None |
| kpms_python | | Path to a Python interpreter with keypoint-moseq installed. None uses the bundled external .venv. | None |
| pose | | Pose keypoint configuration (indices, column prefixes). | PoseConfig() |
| anterior_bodyparts | | List of bodypart names forming the anterior reference (min 1 element). | required |
| posterior_bodyparts | | List of bodypart names forming the posterior reference (min 1 element). | required |
| fps | | Frames per second of the input data. | 30 |
| num_iters_ar | | Number of AR-only fitting iterations. | 50 |
| num_iters_full | | Number of full model fitting iterations. | 500 |
| kappa_ar | | AR transition concentration parameter. None lets keypoint-moseq choose. | None |
| kappa_full | | Full-model transition concentration parameter. None lets keypoint-moseq choose. | None |
| latent_dim | | Dimensionality of the latent pose space. Must satisfy latent_dim < 2 * num_keypoints. | 10 |
| location_aware | | If True, include centroid location in the model. | False |
| outlier_scale_factor | | Scale factor for outlier detection. | 6.0 |
| remove_outliers | | If True, remove detected outlier frames before fitting. | True |
| mixed_map_iters | | Number of mixed MAP iterations. None uses the keypoint-moseq default. | None |
| parallel_message_passing | | Enable parallel message passing. None uses the keypoint-moseq default. | None |
| resume | | If True, resume fitting from a previously saved checkpoint. | True |
| downsample_rate | | Temporal downsampling factor applied before fitting. None disables downsampling. | None |
| save_every_n_iters | | Save a checkpoint every N iterations during fit. | 25 |
| num_iters_apply | | Number of iterations when applying the model to new data. | 500 |
LightningActionFeature ¶
Supervised temporal action segmentation via lightning-action.
Trains a temporal neural network classifier (DilatedTCN, RNN, or TemporalMLP head + linear classifier) on labeled templates and predicts per-frame action probabilities.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| model | | Pre-fitted LightningActionModelArtifact to load (skip training). | LightningActionModelArtifact() |
| head | | Temporal encoder architecture — "dtcn" (dilated temporal convolution), "rnn" (LSTM/GRU), or "temporalmlp". | "dtcn" |
| num_hid_units | | Hidden units in the temporal encoder. | 64 |
| num_layers | | Number of encoder layers. | 2 |
| num_lags | | Lag/kernel size for temporal context. | 4 |
| activation | | Activation function. | "lrelu" |
| dropout_rate | | Dropout rate. | 0.1 |
| sequence_length | | Training sequence length (frames per chunk). | 500 |
| num_epochs | | Number of training epochs. | 200 |
| batch_size | | Training batch size. | 32 |
| learning_rate | | Optimizer learning rate. | 1e-3 |
| weight_decay | | Optimizer weight decay. | 0.0 |
| optimizer | | Optimizer type. | "Adam" |
| weight_classes | | If True, weight loss by inverse class frequency. | True |
| device | | Compute device — "cpu" or "gpu". | "cpu" |
| random_state | | Random seed. | 42 |
| decision_threshold | | Probability threshold(s) for positive prediction. A float applies to all classes; a dict maps class -> threshold. None uses argmax. | None |
| default_class | | Class label assigned when no class exceeds the decision threshold. | required |
NearestNeighbor ¶
Per-sequence feature computing nearest-neighbor identity and relative kinematics.
Outputs per frame (one row per individual):

- nn_id: id of nearest neighbor (NaN if none)
- nn_delta_x / nn_delta_y: neighbor position minus focal, world frame
- nn_dist: Euclidean distance to nearest neighbor
- nn_delta_angle: neighbor heading minus focal, wrapped to [-pi, pi]
- nn_delta_x_ego / nn_delta_y_ego: neighbor offset in focal ego frame
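The per-frame computation behind these columns can be sketched with NumPy. This is an illustrative re-implementation, and the ego-frame axis convention (x forward along heading, y to the left) is an assumption, not confirmed by the source:

```python
import numpy as np

def nn_ego_offsets(xy, headings):
    # For each animal in one frame: index of its nearest neighbor, plus
    # the neighbor's world-frame offset rotated into the focal's ego frame.
    xy = np.asarray(xy, float)
    d = np.linalg.norm(xy[:, None] - xy[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # an animal is not its own neighbor
    nn = d.argmin(axis=1)
    delta = xy[nn] - xy                  # nn_delta_x / nn_delta_y (world frame)
    c, s = np.cos(headings), np.sin(headings)
    ego_x = c * delta[:, 0] + s * delta[:, 1]    # assumed forward component
    ego_y = -s * delta[:, 0] + c * delta[:, 1]   # assumed leftward component
    return nn, ego_x, ego_y
```

An animal at the origin heading along +x, with a neighbor one unit ahead, gets ego offsets (1, 0).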
NearestNeighborDelta ¶
Per-sequence feature that measures how a focal fish changes position/heading/speed over the next diff_numframes frames relative to its nearest neighbor at the current frame.

Expected inputs (via tracks or an Inputs() that merges tracks + the nearest-neighbor feature):

- position/heading/speed columns for the focal (x, y, ANGLE, speed_col)
- nearest-neighbor id column (nn_id_col, default: 'nn_id')
- neighbor offsets in ego frame (nn_delta_x_ego / nn_delta_y_ego); if missing, world offsets (nn_delta_x / nn_delta_y) are rotated using the focal heading.

Outputs per focal row (filtered to frames with a valid future sample diff_numframes ahead): frame, id, group, sequence, nn_id, neighbor_x/y (ego), neighbor_focal (if available), dx, dy, dt, dangle (wrapped; optionally scaled by fps), dspeed, plus passthrough columns like group_size/event/Focal_fish when present.
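The wrapping applied to dangle when wrap_angle=True maps any heading difference back into [-pi, pi]. A minimal sketch of that standard operation:

```python
import numpy as np

def wrap_angle(a):
    # Wrap an angle (or array of angles) to the interval [-pi, pi).
    return (np.asarray(a) + np.pi) % (2 * np.pi) - np.pi
```

A heading change of 2*pi + 0.1 wraps back to 0.1 radians, so whole turns do not inflate dangle.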
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| sampling | | Frame rate and smoothing settings. | SamplingConfig() |
| speed_col | | Column name for speed. | "SPEED#wcentroid" |
| nn_id_col | | Column name for the nearest-neighbor ID. | "nn_id" |
| nn_dx_ego_col | | Column for neighbor delta-x in ego frame. | "nn_delta_x_ego" |
| nn_dy_ego_col | | Column for neighbor delta-y in ego frame. | "nn_delta_y_ego" |
| nn_dx_world_col | | Fallback column for neighbor delta-x in world frame (used when ego columns are absent). | "nn_delta_x" |
| nn_dy_world_col | | Fallback column for neighbor delta-y in world frame. | "nn_delta_y" |
| focal_col | | Column name for the focal-animal flag. | "Focal_fish" |
| diff_numframes | | Number of frames ahead to compute the future response delta. | 4 |
| wrap_angle | | If True, wrap heading differences to [-pi, pi]. | True |
| divide_dangle_by_frames | | If True, divide the heading change by diff_numframes. | True |
| scale_dangle_by_fps | | If True, multiply dangle by fps to convert to radians/sec. | True |
| tag_cols | | Additional columns to pass through to the output. | [] |
NearestNeighborDeltaBins ¶
`NearestNeighborDeltaBins(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)`
Bin nearest-neighbor response fields (dangle, dspeed) over neighbor position.
Inputs: expects output from nn-delta-response (neighbor_x/neighbor_y in ego frame, dangle, dspeed, group_size, and focal/neighbor category columns).

Outputs: a tidy DataFrame with mean turn/speed per bin for the focal role and the neighbor role, with columns: [group, sequence, exp, trial, role, category, group_size, metric, bin_idx, value]
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| nbins | | Number of spatial bins along the binning axis. Default: 45. | required |
| binmax | | Maximum absolute value for bin edges. Default: 14.0. | required |
| max_for_avg | | Maximum neighbor distance used when computing binned-mean responses. Default: 5.0. | required |
| antisymm | | If True, use front/back antisymmetric folding for turn-force computation. Default: True. | required |
| focal_category_col | | Column name for the focal animal's category flag. Default: "Focal_fish". | required |
| neighbor_category_col | | Column name for the neighbor's category flag. Default: "neighbor_focal". | required |
| group_size_col | | Column name for group size. Default: "group_size". | required |
| exp_col | | Column name for experimental condition. Default: "Exp". | required |
| trial_col | | Column name for trial identifier. Default: "Trial". | required |
| category_specs | | List of dicts defining derived category columns (keys: source_col, new_col, quantile, op). Default: []. | required |
| exclude_cols | | List of boolean column names whose truthy rows are dropped before computation. Default: []. | required |
| nonfocal_flag_col | | Column used to flag nonfocal animals. Default: "Focal_fish". | required |
| nonfocal_flag_value | | Value in nonfocal_flag_col that marks an animal as nonfocal. Default: False. | required |
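The binned-mean computation above can be sketched with plain numpy. This is an illustrative sketch, not Mosaic's implementation; the toy values and variable names are invented, but the `nbins`/`binmax` roles match the documented params:

```python
import numpy as np

# Toy ego-frame neighbor positions and per-frame turning responses.
neighbor_x = np.array([-10.0, -2.0, 1.0, 3.0, 12.0])
dangle     = np.array([  0.5,  0.1, -0.2, -0.4, 0.3])

nbins, binmax = 4, 14.0                       # roles of the nbins/binmax params
edges = np.linspace(-binmax, binmax, nbins + 1)
bin_idx = np.digitize(neighbor_x, edges) - 1  # 0-based bin index per sample

# Mean response per bin (NaN for empty bins), mirroring the tidy output rows.
mean_per_bin = np.full(nbins, np.nan)
for b in range(nbins):
    mask = bin_idx == b
    if mask.any():
        mean_per_bin[b] = dangle[mask].mean()
```

The real feature additionally splits by role, category, and group size before averaging.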
OrientationRelativeFeature ¶
OrientationRelativeFeature(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)
Orientation-aware relative features between animal pairs, order-agnostic to pose points.
For each frame and ordered pair (id_a -> id_b):

- Express B in A's body frame (using heading angle and global scale).
- Emit signed centroid deltas, heading difference, quantiles over B's points in A's frame, and nearest-k distances.
Params ¶
Bases: Params
Orientation-relative feature parameters.
Attributes:
| Name | Type | Description |
|---|---|---|
| scale | BodyScaleResult | Body-scale artifact for normalization. |
| nearest_k | int | Number of nearest pose-point distances to emit. Default 3. |
| quantiles | list[float] | Distance distribution quantiles to compute. Default [0.25, 0.5, 0.75]. |
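The nearest-k and quantile summaries can be sketched as follows. A minimal numpy sketch with invented pose values, assuming B's points have already been expressed in A's body frame:

```python
import numpy as np

# B's pose points expressed in A's body frame (toy values).
pts_B_in_A = np.array([[1.0, 0.0], [3.0, 4.0], [0.0, 2.0], [6.0, 8.0]])

dists = np.linalg.norm(pts_B_in_A, axis=1)  # distance of each point to A's origin
nearest_k = np.sort(dists)[:3]              # nearest_k = 3, the documented default
qs = np.quantile(dists, [0.25, 0.5, 0.75])  # the documented default quantiles
```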
PairEgocentricFeatures ¶
PairEgocentricFeatures(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)
'pair-egocentric' -- per-sequence egocentric + kinematic features for dyads. Produces a row-wise DataFrame with columns:

- frame (if available) or time passthrough (only if it's the order col)
- perspective: 0 for A->B, 1 for B->A
- id1, id2: pair identifiers
- feature columns (e.g., A_speed, AB_dx_egoA, ...)
- (optionally) group/sequence if present in df, for convenience
This feature is stateless (no fitting). It computes features for all C(n,2) pairs per sequence, cleans/interpolates pose per animal, inner-joins by the chosen order column, and computes A->B and B->A features for each pair.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| interpolation | | Interpolation settings for missing pose data. Default: InterpolationConfig(). | required |
| sampling | | Frame rate and smoothing settings. Default: SamplingConfig(). | required |
| pose | | Pose keypoint configuration (indices, column prefixes). Default: PoseConfig(). | required |
| neck_idx | | Index of the neck keypoint in the pose array, used to compute heading direction. Default: 3. | required |
| tail_base_idx | | Index of the tail-base keypoint, paired with neck_idx for the heading vector. Default: 6. | required |
| center_mode | | How to compute the animal's center: "mean" averages all keypoints, other values use a specific keypoint. Default: "mean". | required |
PairInteractionFilter ¶
PairInteractionFilter(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)
Detect pairwise interaction segments from trajectory data.
For every unique pair of individuals in a sequence, tests per-frame distance and (optionally) angular criteria, applies morphological filtering, and extracts continuous interaction segments that meet a minimum duration.
Output columns (one row per frame per interaction segment):

- frame: frame number
- id_a, id_b: individual IDs (id_a < id_b by convention)
- interaction_id: integer label for the segment within this pair
- interaction_start: first frame of this segment
- interaction_end: last frame (exclusive) of this segment
Params¶
shift_dist : float
Pixel shift along heading before distance check (default 15).
Set to 0 to use raw positions without forward shift.
max_dist : float
Maximum shifted-position distance in pixels (default 40).
require_facing : bool
If True (default), require individuals to face each other
(inverse orientation difference < max_inv_orientation_diff_deg).
Set to False for distance-only filtering.
max_inv_orientation_diff_deg : float
Max angle (degrees) between inverse orientations (default 80).
Only used when require_facing=True.
min_run_frames : int
Minimum continuous frames for a valid interaction (default 250).
frame_padding : int
Frames to pad before/after each segment (default 10).
morphological_structure_size : int
Structure element length for binary close/open (default 25).
Set to 0 to disable morphological filtering.
px_scale : float
Scale factor applied to shift_dist and max_dist (default 1.0).
Use to adjust for videos with different pixel resolutions.
use_pixel_coords : bool
If True, use poseX/poseY columns (pixel coordinates) for
distance calculations instead of X/Y (world coordinates).
Default True since thresholds are in pixel units.
pose_head_index : int | None
If set and use_pixel_coords is True, use this pose index
as the position for distance calculations.
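The core segment-extraction step (threshold, then keep runs at least min_run_frames long) can be sketched in numpy. This is a simplified illustration with toy distances; the real feature also applies the heading shift, facing criterion, morphological close/open, and padding:

```python
import numpy as np

# Per-frame shifted-position distances for one pair (toy values, pixel units).
dist = np.array([50, 30, 35, 40, 45, 38, 39, 41, 30, 32], dtype=float)
ok = dist <= 40.0                        # max_dist criterion

# Extract continuous True runs and keep those >= min_run_frames.
min_run_frames = 3
padded = np.concatenate(([False], ok, [False]))
edges = np.flatnonzero(padded[1:] != padded[:-1])
starts, ends = edges[::2], edges[1::2]            # [start, end) per run
segments = [(s, e) for s, e in zip(starts, ends) if e - s >= min_run_frames]
```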
PairPoseDistancePCA ¶
'pair-posedistance-pca' — builds per-frame pairwise pose-distance features and fits an IncrementalPCA globally; outputs PC scores per sequence (and perspective).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| interpolation | | Interpolation settings for missing pose data. Default: InterpolationConfig(). | required |
| pose | | Pose keypoint configuration (indices, column prefixes). Default: PoseConfig(). | required |
| include_intra_A | | If True, include intra-animal A pairwise keypoint distances. Default: True. | required |
| include_intra_B | | If True, include intra-animal B pairwise keypoint distances. Default: True. | required |
| include_inter | | If True, include inter-animal pairwise keypoint distances. Default: True. | required |
| duplicate_perspective | | If True, output both A->B and B->A perspectives per pair. Default: True. | required |
| n_components | | Number of PCA components to retain. Default: 6. | required |
| batch_size | | Batch size for IncrementalPCA partial_fit. Default: 5000. | required |
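Constructing the per-frame pairwise-distance vector (the PCA input) can be sketched in numpy. An illustrative sketch with invented keypoints; intra-B distances are omitted for brevity:

```python
import numpy as np

# Toy pose: 3 keypoints each for animals A and B in one frame.
A = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
B = np.array([[2.0, 0.0], [2.0, 1.0], [3.0, 0.0]])

# intra-A: distances between A's own keypoints (include_intra_A)
iu = np.triu_indices(len(A), k=1)
intra_A = np.linalg.norm(A[iu[0]] - A[iu[1]], axis=1)

# inter: all A-keypoint to B-keypoint distances (include_inter)
inter = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1).ravel()

features = np.concatenate([intra_A, inter])  # one PCA input row per frame
```

Rows like this, accumulated over frames, are what IncrementalPCA is fitted on in batches of batch_size.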
PairPositionFeatures ¶
'pair-position' -- per-sequence egocentric + kinematic features for all pairs.
Unlike PairEgocentricFeatures which requires full pose keypoints, this feature works with minimal input: just (x, y, angle) per animal.
For N animals per sequence, computes features for all N*(N-1)/2 unique pairs, each with two perspectives (A->B and B->A).
Output columns (per row):

- frame: frame number
- perspective: 0 for A->B, 1 for B->A
- id1, id2: IDs of the two animals in this pair
- A_speed, A_v_para, A_v_perp, A_ang_speed: focal kinematics
- A_heading_cos, A_heading_sin: focal heading
- AB_dist: inter-animal distance
- AB_dx_egoA, AB_dy_egoA: partner position in focal's egocentric frame
- rel_heading_cos, rel_heading_sin: relative heading
- B_speed, B_v_para, B_v_perp, B_ang_speed: partner kinematics
- (optionally) group, sequence for convenience
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| interpolation | | Interpolation settings for missing position data. Default: InterpolationConfig(). | required |
| sampling | | Frame rate and smoothing settings. Default: SamplingConfig(). | required |
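The egocentric transform behind AB_dx_egoA / AB_dy_egoA is a rotation of the world-frame offset into the focal animal's body frame. A minimal sketch with invented positions (the exact sign conventions in Mosaic may differ):

```python
import numpy as np

# Focal A at (ax, ay) with heading theta; partner B at (bx, by).
ax, ay, theta = 0.0, 0.0, np.pi / 2    # A faces +y
bx, by = 0.0, 3.0                      # B is 3 units directly ahead of A

dx, dy = bx - ax, by - ay
# Rotate the world-frame offset by -theta into A's body frame:
AB_dx_egoA =  np.cos(theta) * dx + np.sin(theta) * dy   # along A's heading
AB_dy_egoA = -np.sin(theta) * dx + np.cos(theta) * dy   # A's lateral axis
AB_dist = np.hypot(dx, dy)
```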
PairWavelet ¶
CWT spectrograms on PairPoseDistancePCA outputs.
Expects input df to contain columns
- 'perspective' (0 = A->B, 1 = B->A)
- 'frame' (preferred) or 'time' (if used as order column)
- PC0..PC{k-1} (k = number of PCA components)
Returns a DataFrame with columns
- frame (or time if that was the order col)
- perspective
- W_{col}_f{fi} (log-power, clamped, for each component x frequency) and (optionally) passthrough group/sequence if present in df.
Stateless (no fitting). FPS is inferred from constant df['fps'] if present, otherwise from fps_default. Frequencies are dyadically spaced in [f_min, f_max].
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sampling | | Frame rate and smoothing settings. Default: SamplingConfig(). | required |
| f_min | | Minimum frequency in Hz for the CWT band. Default: 0.2. | required |
| f_max | | Maximum frequency in Hz for the CWT band. Default: 5.0. | required |
| n_freq | | Number of frequency bins (dyadically spaced between f_min and f_max). Default: 25. | required |
| wavelet | | PyWavelets wavelet name. Default: "cmor1.5-1.0". | required |
| log_floor | | Floor value for log-power clamping. Default: -3.0. | required |
| pc_prefix | | Column prefix used to auto-detect PC input columns (e.g. "PC0", "PC1", ...). Default: "PC". | required |
| cols | | Explicit list of input column names. If None, columns are auto-detected using pc_prefix. Default: None. | required |
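The dyadic frequency spacing and the log-power clamp can be sketched in numpy. The CWT itself (via PyWavelets) is omitted; this only illustrates how the frequency grid and clamping behave under the documented defaults:

```python
import numpy as np

f_min, f_max, n_freq = 0.2, 5.0, 25    # the documented defaults
# Dyadically (log2) spaced frequencies between f_min and f_max:
freqs = f_min * 2.0 ** np.linspace(0.0, np.log2(f_max / f_min), n_freq)

# Log-power with the documented clamp floor:
power = np.array([1e-6, 0.5, 10.0])    # toy CWT power values
log_floor = -3.0
logp = np.maximum(np.log10(power), log_floor)
```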
Result ¶
Bases: StrictModel, Generic[F]
Reference to a prior feature's output as pipeline input.
Attributes:
| Name | Type | Description |
|---|---|---|
| feature | F | Feature name whose output to consume. |
| run_id | str \| None | Specific run ID, or None for latest finished run. |
ResultColumn ¶
Bases: Result[str]
Reference to a column in a feature's standard parquet output.
Attributes:
| Name | Type | Description |
|---|---|---|
| feature | str | Source feature name. |
| column | str | Column name to extract from the parquet output. |
| run_id | str \| None | Specific run ID, or None for latest. |
from_result ¶
Return a copy with feature and run_id set from another Result.
SpeedAngvel ¶
Per-sequence feature computing translational speed and angular velocity.
Outputs (per frame): - speed: displacement magnitude between consecutive frames divided by dt - angvel: wrapped heading difference (rad) divided by dt - speed_step / angvel_step: same, but using a configurable step_size (omitted if step_size is None) - speed_smooth: Savitzky-Golay smoothed speed (polyorder=1), only present when smooth_window is set in Params
Time-delta (dt) computation: speed and angular velocity require dividing by a time interval. The source for dt is chosen by priority:

- frame + fps (recommended for constant-fps video): when `fps` is set in Params, dt is computed as `frame_diff / fps`. This is immune to irregular real timestamps that some trackers embed in the `time` column (e.g. TRex uses wall-clock timestamps that may jitter by several milliseconds per frame). It also correctly handles frame gaps from dropped/bad frames.
- time column: if `fps` is not set but a `time` column exists, dt is computed from consecutive time differences.
- array index: last resort when neither frame+fps nor time is available; assumes each row is one step apart.

For most video-based tracking data, setting fps is strongly recommended to avoid speed artifacts from timestamp jitter.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| step_size | | If set, also compute speed_step / angvel_step using this frame step (in addition to step=1). Default: None. | required |
| smooth_window | | If set, apply Savitzky-Golay smoothing (polyorder=1) over this many frames to produce speed_smooth. Default: None. | required |
| fps | | Frames per second. When set, dt is derived from frame_diff/fps instead of the time column, which is more robust for constant-fps data with jittery timestamps. Default: None. | required |
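The frame+fps dt rule and the wrapped heading difference can be sketched in numpy. A simplified illustration with toy values (note the dropped frame, which frame_diff/fps handles correctly):

```python
import numpy as np

fps = 30.0
frame = np.array([0, 1, 2, 4])        # note the dropped frame 3
x = np.array([0.0, 1.0, 2.0, 4.0])
y = np.zeros(4)
heading = np.array([0.0, 0.1, 0.2, 3.1])

dt = np.diff(frame) / fps             # robust to frame gaps and jittery timestamps
speed = np.hypot(np.diff(x), np.diff(y)) / dt

# Heading difference wrapped to [-pi, pi), divided by dt:
dh = np.diff(heading)
dh = (dh + np.pi) % (2 * np.pi) - np.pi
angvel = dh / dt
```

Over the gap (frames 2 to 4) dt is 2/30 s, so the constant-velocity track still yields a constant speed.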
TemporalStackingFeature ¶
Build temporal context windows over per-sequence feature data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| half | | Half-width of the temporal window in frames. The full window spans [-half, +half]. Default: 60. | required |
| skip | | Step size between time offsets in the stacking window. Default: 5. | required |
| use_temporal_stack | | If True, concatenate Gaussian-smoothed copies at each time offset. Default: True. | required |
| sigma_stack | | Gaussian sigma (in frames) for smoothing before stacking. 0 disables smoothing. Default: 30.0. | required |
| add_pool | | If True, append pooled statistics (e.g. mean, std) computed over a sliding Gaussian window. Default: True. | required |
| pool_stats | | Tuple of pooled statistics to compute. Supported: "mean", "std", "variance". Default: ("mean",). | required |
| sigma_pool | | Gaussian sigma (in frames) for the pooling window. Default: 30.0. | required |
| fps | | Frames per second; used to convert win_sec to frames. Default: 30.0. | required |
| win_sec | | Pooling window width in seconds. Default: 0.5. | required |
| pair_filter | | Optional NNResult for nearest-neighbor pair filtering during dependency resolution. Default: None. | required |
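The offset-stacking step can be sketched with numpy. An illustrative sketch only: it stacks shifted copies of one feature column at offsets in [-half, +half] with step skip, omitting the Gaussian smoothing (sigma_stack) and pooled statistics that the real feature adds:

```python
import numpy as np

x = np.arange(10, dtype=float)      # one feature column over 10 frames
half, skip = 2, 2                   # toy window: offsets -2, 0, +2

offsets = np.arange(-half, half + 1, skip)
cols = []
for off in offsets:
    shifted = np.full_like(x, np.nan)
    lo, hi = max(0, -off), len(x) - max(0, off)  # frames where t+off is valid
    shifted[lo:hi] = x[lo + off:hi + off]        # value from frame t + off
    cols.append(shifted)

stacked = np.column_stack(cols)     # shape (n_frames, n_offsets)
```

Row t of `stacked` holds the column's values at frames t-2, t, t+2, i.e. the temporal context window around frame t.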
TrajectorySmooth ¶
Per-sequence feature that smooths and interpolates trajectory positions.
Pipeline (per individual):

1. Bad-frame detection: flag frames with speed > speed_threshold, expand the flagged region by expand_frames in each direction.
2. Interpolation: set positions to NaN at bad frames, linearly interpolate, forward/backward fill edges. Controlled separately for centroid (interpolate_centroid) and pose (interpolate_pose).
3. Savgol smoothing: apply savgol_filter to centroid X/Y and all pose columns (always, regardless of interpolation flags).
Output is the full track DataFrame with smoothed positions replacing
originals, plus a bad_frame boolean column. Downstream features
can consume this via Inputs((Result(feature="trajectory-smooth"),)).
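Steps 1 and 2 of the pipeline can be sketched in numpy. A one-dimensional illustration with an invented glitch; the Savitzky-Golay smoothing step is omitted:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 40.0, 5.0, 6.0, 7.0, 8.0])  # frame 4 is a glitch
speed = np.abs(np.diff(x, prepend=x[0]))

bad = speed > 10.0                                 # speed_threshold role
# Expand the flagged region by expand_frames = 1 in each direction:
bad = np.convolve(bad, np.ones(3), mode="same") > 0

# Set bad positions aside and linearly interpolate from good frames:
xi = x.copy()
xi[bad] = np.interp(np.flatnonzero(bad), np.flatnonzero(~bad), x[~bad])
```

The glitch (and its flanking frames, due to the expansion) is replaced by the straight-line interpolation through the surrounding good frames.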
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| speed_threshold | | Speed above which a frame is flagged as bad. | required |
| fps | | Frames per second. | required |
| interpolate_centroid | | If True, replace bad-frame centroid positions with linear interpolation. Default: True. | required |
| interpolate_pose | | If True, replace bad-frame pose keypoint positions with linear interpolation. Default: False. | required |
| expand_frames | | Number of frames to expand the bad-frame region in each direction. Default: 2. | required |
| savgol_window | | Window length for Savitzky-Golay smoothing. Must be odd and >= savgol_polyorder + 1. None disables smoothing. Default: None. | required |
| savgol_polyorder | | Polynomial order for Savitzky-Golay filter. Default: 2. | required |
XgboostFeature ¶
XGBoost behavior classifier as a pipeline feature.
Trains on labeled templates (from ExtractLabeledTemplates) and runs per-sequence inference. Supports multiclass and one-vs-rest strategies.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | | Pre-fitted XgboostModelArtifact to load (skip training). Default: XgboostModelArtifact(). | required |
| strategy | | Classification strategy: "multiclass" trains a single multi-class model; "one_vs_rest" trains one binary classifier per class. Default: "multiclass". | required |
| decision_threshold | | Probability threshold(s) for positive prediction. A float applies to all classes; a dict maps class -> threshold. None uses argmax. Default: None. | required |
| default_class | | Class label assigned when no class exceeds the decision threshold (required). | required |
| class_weight | | If "balanced", adjust sample weights inversely proportional to class frequency. Default: "balanced". | required |
| use_smote | | If True, apply SMOTE oversampling to the training set. Default: False. | required |
| undersample_ratio | | If set, undersample majority classes to this ratio relative to the minority class before SMOTE. Default: None. | required |
| n_estimators | | Number of boosting rounds. Default: 100. | required |
| max_depth | | Maximum tree depth. Default: 6. | required |
| learning_rate | | Boosting learning rate. Default: 0.1. | required |
| subsample | | Fraction of training samples used per tree. Default: 0.8. | required |
| colsample_bytree | | Fraction of features used per tree. Default: 0.8. | required |
| random_state | | Random seed for reproducibility. Default: 42. | required |
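The decision_threshold / default_class logic can be sketched in plain numpy. An illustrative sketch with invented class names and probabilities, showing the per-class dict form of the threshold:

```python
import numpy as np

classes = ["rest", "chase", "court"]
proba = np.array([[0.5, 0.3, 0.2],
                  [0.2, 0.1, 0.7],
                  [0.4, 0.35, 0.25]])

thresholds = {"rest": 0.6, "chase": 0.3, "court": 0.6}  # class -> threshold
default_class = "rest"

thr = np.array([thresholds[c] for c in classes])
passing = proba >= thr                      # which classes clear their threshold
best = np.where(passing, proba, -np.inf).argmax(axis=1)
labels = [classes[b] if passing[i, b] else default_class
          for i, b in enumerate(best)]
```

With `decision_threshold=None` the feature instead takes a plain argmax over `proba`.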
approach_avoidance ¶
ApproachAvoidance feature.
Detects approach-avoidance (AA) events for all C(n,2) unordered pairs per sequence.
Default decision logic follows trajognize AA:
- role-specific speed thresholds (approacher vs avoider)
- distance threshold
- cosine thresholds between velocity and pair direction
- approacher forward-motion gate vs body orientation
- minimum event continuity (min_event_count of min_event_length frames)
Optional sliding-window averaging can be enabled, but it is OFF by default to preserve trajognize-style framewise behavior.
Output columns (per frame × pair):

- frame, id1, id2 (canonical order: id1 < id2)
- label_id: primary non-directional AA label for visualization compatibility
- aa_event: 1 if either direction is active
- aa_event_12: 1 if id1 approaches and id2 avoids
- aa_event_21: 1 if id2 approaches and id1 avoids
- sequence, group (metadata pass-through)
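The framewise speed and cosine gates can be sketched as follows. A minimal illustration of one frame for the approacher side, with toy thresholds standing in for the Params defaults (the real logic also gates the avoider, distance, orientation, and event continuity):

```python
import numpy as np

v_app = np.array([2.0, 0.5])    # approacher velocity (units/frame, toy values)
d_ab = np.array([1.0, 0.0])     # unit vector from approacher toward avoider

speed = np.linalg.norm(v_app)
cos_toward = float(v_app @ d_ab / speed)  # alignment with partner direction

speed_ok = speed >= 1.0         # approacher_velocity_threshold role
cos_ok = cos_toward >= 0.8      # cos_approacher_threshold role
aa_candidate = bool(speed_ok and cos_ok)
```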
ApproachAvoidance ¶
'approach-avoidance' — per-sequence AA event detection for all pairs.
For N animals per sequence, evaluates all N*(N-1)/2 unique unordered pairs. The output stores directional events as aa_event_12 and aa_event_21 over canonical (id1,id2), plus aa_event/label_id as non-directional union.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| interpolation | | Interpolation settings for missing data. Default: InterpolationConfig(). | required |
| sampling | | Frame rate and smoothing settings. Default: SamplingConfig(). | required |
| velocity_units | | Whether speed thresholds are in "per_frame" or "per_second". Default: "per_frame". | required |
| angle_units | | Unit for heading angles: "radians", "degrees", or "auto" (detect from data range). Default: "radians". | required |
| consecutive_frame_delta | | Expected frame step between consecutive rows; used to detect gaps. Default: 1.0. | required |
| distance_threshold | | Maximum inter-animal distance (in position units) for a frame to be considered AA-eligible. Default: 200.0. | required |
| approacher_velocity_threshold | | Minimum speed of the approaching animal. Default: 5.0. | required |
| avoider_velocity_threshold | | Minimum speed of the avoiding animal. Default: 5.0. | required |
| cos_approacher_threshold | | Minimum cosine between the approacher's velocity vector and the direction toward the partner. Default: 0.8. | required |
| cos_avoider_threshold | | Minimum cosine between the avoider's velocity vector and the direction away from the partner. Default: 0.5. | required |
| min_event_length | | Minimum number of contiguous qualifying frames to form an event. Default: 10. | required |
| min_event_count | | Minimum number of qualifying frames within an event run to keep it. Default: 5. | required |
| orientation_gate_cos | | If set, require the approacher's body orientation to align with its velocity (cos threshold). Default: cos(30°) ≈ 0.866. None disables the gate. | required |
| smooth_window_sec | | If set, apply a sliding-window average (in seconds) to velocities before thresholding. Default: None (disabled; framewise behaviour). | required |
extract_events (staticmethod) ¶
Convert per-frame AA output into a compact event table.
Parameters¶
aa_df : DataFrame
    Per-frame output with columns: frame, id1, id2, aa_event, aa_event_12, aa_event_21. May span multiple sequences/groups (they are handled independently).
min_duration : int
    Minimum event length in frames. Events shorter than this are discarded.
Returns¶
DataFrame with columns: id1, id2, start_frame, end_frame, duration, direction ('12' if id1→id2, '21' if id2→id1, 'both'), approacher_id, avoider_id, sequence (if present), group (if present).
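The frames-to-events conversion is a run-length encoding of the per-frame signal. A minimal numpy sketch for one directional column and one pair (the real method also derives direction, approacher_id, and avoider_id, and handles sequences independently):

```python
import numpy as np

frames = np.arange(8)
aa_event_12 = np.array([0, 1, 1, 1, 0, 0, 1, 1])

# Run-length encode the directional signal into (start, end, duration) events.
padded = np.concatenate(([0], aa_event_12, [0]))
change = np.flatnonzero(np.diff(padded))
starts, ends = change[::2], change[1::2]          # [start, end) per run

min_duration = 2
events = [(frames[s], frames[e - 1], e - s)       # (start_frame, end_frame, duration)
          for s, e in zip(starts, ends) if e - s >= min_duration]
```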
arhmm ¶
AR-HMM global feature.
Fits an autoregressive Hidden Markov Model on arbitrary upstream feature inputs and produces per-frame syllable (state) labels. This is a native mosaic implementation — no KPMS or JAX dependency.
The feature accepts any combination of upstream Result inputs. Mosaic's
manifest system merges them via inner join on alignment columns, so the
feature receives a single merged DataFrame whose numeric columns are the
union of all input features.
ArHmmFeature ¶
AR-HMM behavioral syllable discovery as a pipeline feature.
Fits an autoregressive Hidden Markov Model across all input sequences and assigns per-frame syllable labels via Viterbi decoding.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | | Pre-fitted ArHmmModelArtifact to load (skip fit). Default: None (fit from scratch). | required |
| pca_dim | | Number of PCA components for dimensionality reduction before fitting. None skips PCA. Default: None. | required |
| n_states | | Maximum number of HMM states (pruned after fit). Default: 50. | required |
| n_lags | | AR order (number of lagged frames as regressors). Default: 1. | required |
| sticky_weight | | Extra pseudo-count on the diagonal of the transition matrix (encourages state persistence). Default: 100.0. | required |
| n_iter | | Maximum EM iterations per restart. Default: 200. | required |
| tol | | Convergence tolerance on relative LL change. Default: 1e-4. | required |
| n_restarts | | Number of random restarts (best LL kept). Default: 1. | required |
| standardize | | If True, z-score features before fitting. Default: True. | required |
| downsample_rate | | Temporal downsampling factor. None disables. Default: None. | required |
| prune_threshold | | Drop states with posterior mass below this fraction. Default: 0.01. | required |
| random_state | | Random seed. Default: 42. | required |
ArHmmModelArtifact ¶
Bases: JoblibArtifact[ArHmmModelBundle]
Fitted AR-HMM model bundle (arhmm_model.joblib).
arhmm_model ¶
Autoregressive Hidden Markov Model (AR-HMM) with EM fitting.
A standalone implementation using numpy/scipy — no external HMM library required. Fits switching autoregressive dynamics with sticky transitions via Expectation–Maximisation and decodes the most-likely state sequence with the Viterbi algorithm.
This module has no mosaic imports and can be tested independently.
ARHMM (dataclass) ¶
ARHMM(n_states: int = 50, n_lags: int = 1, sticky_weight: float = 100.0, n_iter: int = 200, tol: float = 0.0001, n_restarts: int = 1, random_state: int | None = None, A_: ndarray | None = None, Q_: ndarray | None = None, Q_cho_: list | None = None, Q_logdet_: ndarray | None = None, log_transmat_: ndarray | None = None, log_startprob_: ndarray | None = None, n_features_: int | None = None, active_states_: ndarray | None = None)
Autoregressive Hidden Markov Model.
Each of the K discrete states owns an AR(n_lags) linear model:
x_t = A_k @ [x_{t-1}; ...; x_{t-nlags}; 1] + ε, ε ~ N(0, Q_k)
Transitions between states are governed by a K × K matrix with a
sticky prior that encourages self-transitions (controlled by
sticky_weight).
Parameters¶
n_states : int
    Maximum number of hidden states.
n_lags : int
    AR order (number of lagged frames used as regressors).
sticky_weight : float
    Extra pseudo-count added to the diagonal of the transition matrix during M-step updates. Larger values → states persist longer.
n_iter : int
    Maximum EM iterations per restart.
tol : float
    Convergence threshold on relative change in log-likelihood.
n_restarts : int
    Number of random restarts; the best (highest LL) is kept.
random_state : int | None
    Seed for reproducibility.
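The per-state emission model above (x_t = A_k @ [x_{t-1}; ...; 1] + ε) can be sketched directly in numpy. An illustrative sketch with invented dynamics matrices for K = 2 states and n_lags = 1, with the noise term omitted for clarity:

```python
import numpy as np

K, D = 2, 2  # two states, two observation dimensions, AR order 1

# Per-state AR matrices, shape (D, D*n_lags + 1); the last column is the bias.
A = np.stack([
    np.hstack([0.9 * np.eye(D), np.ones((D, 1))]),   # state 0: slow decay + drift
    np.hstack([0.5 * np.eye(D), -np.ones((D, 1))]),  # state 1: fast decay - drift
])

def ar_step(k, x_prev):
    """One noiseless emission: x_t = A_k @ [x_{t-1}; 1]."""
    return A[k] @ np.concatenate([x_prev, [1.0]])

x = ar_step(0, np.array([1.0, 2.0]))
```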
body_scale ¶
BodyScaleFeature feature.
Extracted from features.py as part of feature_library modularization.
BodyScaleFeature ¶
Per-frame body scale: median intra-animal pose distance.
Outputs per sequence parquet with columns: frame, id, scale, sequence, group. Intended to be averaged later (per sequence or dataset) to derive a single normalization constant for downstream orientation features.
external ¶
External tool runners for mosaic.
Scripts in this directory bridge mosaic with external packages that have incompatible dependencies or restrictive licenses. They are invoked via subprocess using a separate Python environment.
kpms_protocol ¶
Shared protocol models and wire helpers for the kpms server/client.
Defines the request/response Pydantic models and the newline-delimited JSON framing used over Unix domain sockets. Importable from both the main mosaic environment (client) and the external .venv (server).
Dependencies: pydantic, numpy (available in both environments).
kpms_server ¶
Persistent subprocess server for keypoint-moseq operations.
Runs in the external .venv (keypoint-moseq environment). Imports JAX and keypoint-moseq once at startup, then serves commands over a Unix domain socket.
Commands: add_track, fit, load_model, apply, save_model, shutdown.
Wire protocol: newline-delimited JSON. Arrays are base64-encoded in the JSON with dtype and shape metadata.
Usage::
.venv/bin/python kpms_server.py /tmp/kpms.sock
extract_labeled_templates ¶
ExtractLabeledTemplates ¶
Extract labeled, split-annotated templates from upstream features.
Streams upstream feature data, aligns ground truth labels from NPZ files, assigns train/test splits by sequence, and subsamples per class. Produces a templates parquet with feature columns + label (int) + split (str).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| labels | | GroundTruthLabelsSource specifying where to load per-frame ground-truth labels (required). | required |
| strategy | | Template selection method: "random" or "farthest_first". Default: "random". | required |
| n_per_class | | Number of templates per class. An int applies uniformly; a dict maps class -> count. Exactly one of n_per_class or n_total must be set. Default: None. | required |
| n_total | | Total number of templates across all classes (distributed proportionally). Exactly one of n_per_class or n_total must be set. Default: None. | required |
| pool | | PoolConfig controlling candidate pool size and allocation. Default: PoolConfig(). | required |
| test_fraction | | Fraction of sequences held out for the test split. Default: 0.2. | required |
| random_state | | Random seed for reproducibility. Default: 42. | required |
LabeledProvenanceArtifact ¶
Bases: ParquetArtifact
Per-entry template provenance (template_provenance.parquet).
LabeledTemplatesArtifact ¶
Bases: ParquetArtifact
Labeled template feature vectors (templates.parquet).
Uses numeric_only=False because the parquet contains the str 'split' column alongside numeric feature columns and int 'label'.
extract_templates ¶
ExtractTemplates ¶
Subsample per-sequence data into a representative template matrix.
Entry point for the global feature pipeline. Streams per-sequence inputs, builds a candidate pool with proportional per-entry contribution, and selects templates using the configured strategy.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| strategy | | Template selection method: "random" for uniform random sampling, "farthest_first" for greedy diversity maximization. Default: "random". | required |
| n_templates | | Number of templates to select (required). | required |
| pool | | PoolConfig controlling candidate pool size, allocation strategy, and per-entry caps. Default: PoolConfig(). | required |
| random_state | | Random seed for reproducibility. Default: 42. | required |
| pair_filter | | Optional NNResult for nearest-neighbor pair filtering during dependency resolution. Default: None. | required |
Params ¶
Bases: Params
ExtractTemplates parameters.
Attributes:
| Name | Type | Description |
|---|---|---|
| strategy | Literal['random', 'farthest_first'] | Selection strategy. Default "random". |
| n_templates | int | Number of templates to select. Required. |
| pool | PoolConfig | Pool configuration. Default PoolConfig(). |
| random_state | int | Random seed. Default 42. |
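The "farthest_first" strategy is greedy diversity maximization over the candidate pool. A minimal sketch, deterministically seeded with the first candidate instead of a random one (not Mosaic's exact implementation):

```python
import numpy as np

def farthest_first(X, n_templates):
    """Greedily pick the candidate farthest from the already-selected set."""
    chosen = [0]                                 # seed with the first candidate
    d = np.linalg.norm(X - X[chosen[0]], axis=1) # distance to selected set
    while len(chosen) < n_templates:
        nxt = int(d.argmax())                    # farthest remaining candidate
        chosen.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return chosen

X = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 0.0], [0.0, 10.0]])
idx = farthest_first(X, 3)
```

The near-duplicate point (index 1) is skipped in favor of the two distant outliers, which is the point of the strategy.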
ProvenanceArtifact ¶
Bases: ParquetArtifact
Per-entry template provenance (template_provenance.parquet).
TemplatesArtifact ¶
Bases: ParquetArtifact
Template feature vectors (templates.parquet).
feature_template__global ¶
Template for a global feature (clustering, embedding, dimensionality reduction).
Copy this file, rename the class and name, and fill in your logic.
Protocol (4 attributes + 4 methods):

- name, version, parallelizable, scope_dependent
- load_state(run_root, artifact_paths, dependency_lookups) -> bool
- fit(inputs: factory returning iterator of (entry_key, DataFrame)) -> None
- save_state(run_root) -> None
- apply(df: DataFrame) -> DataFrame
Global features are stateful: fit() iterates over all sequences to build a model, save_state() persists it, and load_state() restores it to skip re-fitting. apply() then maps per-sequence data using the fitted model.
Set scope_dependent = False unless outputs change depending on which sequences are in scope (most global features are scope-independent once fitted).
See GlobalTSNE and GlobalWardClustering for real examples.
MyGlobalFeature ¶
Template for a global feature.
Global features load data from prior feature outputs (via Result-based inputs), run a cross-sequence algorithm in fit(), and persist the model via save_state(). The apply() method maps per-sequence data using the fitted model.
Typical workflow
- load_state() checks for a cached model on disk
- fit() iterates over all sequences, accumulates data, runs algorithm
- save_state() persists the model to run_root
- apply() maps per-sequence data using the fitted model
feature_template__per_sequence ¶
Template for a per-sequence feature.
Copy this file, rename the class and name, and fill in your logic.
Protocol (4 attributes + 4 methods):

- name, version, parallelizable, scope_dependent
- load_state(run_root, artifact_paths, dependency_lookups) -> bool
- fit(inputs: factory returning iterator of (entry_key, DataFrame)) -> None
- save_state(run_root) -> None
- apply(df: DataFrame) -> DataFrame
Per-sequence features are stateless by default: load_state returns True (nothing to restore), fit/save_state are no-ops, and apply does all the work. Set scope_dependent = False unless outputs depend on which sequences are in scope.
See SpeedAngvel for a real per-sequence feature.
MyPerSequenceFeature ¶
Template for a per-sequence feature.
Input
A DataFrame for a single (group, sequence) from either:

- tracks (input_kind="tracks")
- another feature (input_kind="feature")
- a multi-input Inputs() tuple
Output
A DataFrame with one row per frame (or per frame x pair), with:

- frame (or time)
- group, sequence
- id1, id2 (when pair-aware)
- your feature columns
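A stateless per-sequence feature following the documented protocol can be sketched as below. The class body, column names, and rolling-mean feature are illustrative, not part of the template file itself:

```python
import pandas as pd

class MyPerSequenceFeature:
    """Minimal stateless per-sequence feature (sketch of the 4+4 protocol)."""
    name = "my-per-sequence"
    version = "1"
    parallelizable = True
    scope_dependent = False

    def __init__(self, window_size: int = 15):
        self.window_size = window_size

    def load_state(self, run_root, artifact_paths, dependency_lookups) -> bool:
        return True      # stateless: nothing to restore

    def fit(self, inputs) -> None:
        pass             # stateless: no fitting

    def save_state(self, run_root) -> None:
        pass             # stateless: nothing to persist

    def apply(self, df: pd.DataFrame) -> pd.DataFrame:
        out = df[["frame", "group", "sequence"]].copy()
        # Example feature: rolling-mean x position over window_size frames.
        out["x_smooth"] = df["x"].rolling(self.window_size, min_periods=1).mean()
        return out

df = pd.DataFrame({"frame": range(5), "group": "g", "sequence": "s",
                   "x": [0.0, 2.0, 4.0, 6.0, 8.0]})
res = MyPerSequenceFeature(window_size=3).apply(df)
```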
Params ¶
Bases: Params
Per-sequence feature template parameters.
Attributes:
| Name | Type | Description |
|---|---|---|
| window_size | int | Sliding window size. Default 15. |
apply ¶
Compute features for a single (group, sequence).
For pair-aware inputs the df may contain multiple (id1, id2) pairs; process each pair independently to avoid mixing contexts.
feral_feature ¶
FeralFeature -- FERAL vision-transformer behavior classifier as a Mosaic pipeline feature.
Supports both training and inference in a single unified feature, following the same global-feature pattern as XgboostFeature and KpmsFeature.
Training mode¶
Provide video_dir, label_json, and a training config dict.
The label_json file must contain class_names, splits
(with train and optionally val/test keys), and optionally
is_multilabel. Training runs the full FERAL ViT fine-tuning loop
with intermediate checkpoints saved to disk for crash recovery.
After training, the test split (if present) is automatically evaluated.
Inference mode¶
Provide model_dir pointing to a directory with model_best.pt
and config.json from a previous training run.
Output follows the same pattern as XgboostFeature: per-frame rows with
prob_<class> probability columns and a predicted_label column.
Requires the FERAL code directory (https://github.com/Skovorp/feral).
Point feral_code_dir to a local clone of the repository.
FeralFeature ¶
FERAL vision-transformer behavior classifier as a pipeline feature.
Supports two operating modes:
Training mode (video_dir + label_json + training):
Runs the full FERAL ViT fine-tuning loop, saves checkpoints,
evaluates the test split (if present), then applies to all
sequences in the apply phase.
Inference mode (model_dir):
Loads a pre-trained FERAL model and runs per-frame behavior
classification on crop videos.
Supports two input formats for the apply phase:

- InteractionCropPipeline output (pair-level): one row per crop video with video_path, id_a, id_b, target_id, interaction_id, start_frame, end_frame.
- EgocentricCrop output (individual-level): one row per frame with target_id, frame. Videos are derived as egocentric_id{target_id}.mp4.
Params¶
feral_code_dir : Path
Path to a local clone of https://github.com/Skovorp/feral.
model_name : str
HuggingFace model name (default: V-JEPA2 ViT-L).
predict_per_item : int
Predictions per chunk (default 64).
chunk_length : int
Frames per video chunk (default 64).
chunk_shift : int
Stride between chunks for overlapping inference (default 32).
chunk_step : int
Frame sampling step within chunks (default 1).
resize_to : int
Input resolution for ViT (default 256).
device : str
PyTorch device (default "cuda").
class_names : dict | None
Class index -> name mapping. Auto-detected from model config.
decision_threshold : float | None
Probability threshold for positive class. None uses argmax.
default_class : int
Fallback class when no class exceeds threshold (default 0).
model_dir : Path | None
Directory with model_best.pt + config.json (inference mode).
video_dir : Path | None
Directory containing crop videos (training mode).
label_json : Path | None
Path to FERAL-format label JSON with splits (training mode).
training : FeralTrainingConfig | None
Training hyperparameters. None = inference-only mode.
fit ¶
Train a FERAL model or verify pre-trained model is loaded.
In training mode (video_dir + label_json + training set),
runs the full ViT fine-tuning loop with intermediate checkpoints.
After training, evaluates the test split if present.
In inference mode (model_dir set), the model is already loaded
by load_state() and this method is not called.
The inputs argument is not consumed -- FERAL reads video files
directly from params.video_dir.
FeralTrainingConfig ¶
Bases: StrictModel
Training hyperparameters for FERAL ViT fine-tuning.
These mirror the FERAL default_vjepa.yaml configuration.
ffgroups ¶
FFGroups ¶
Per-sequence fission-fusion grouping metrics.
Inputs: raw tracks (columns: x, y, id, frame/time, group, sequence).

Outputs per (frame, id):
- group_membership (component label)
- group_size (size of that component)
- event (event id from dp.get_events_info, -1 if not in an event)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| distance_cutoff | | Pairwise distance threshold below which two animals are considered in the same group. Default: 50.0. | required |
| window_size | | Sliding-window size (frames) for smoothing the pairwise distance matrix before thresholding. Default: 5. | required |
| min_event_duration | | Minimum number of contiguous frames for a stable subgroup to be registered as an event. Default: 1. | required |
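The grouping step can be illustrated standalone: threshold pairwise distances at distance_cutoff and take connected components per frame. Union-find here stands in for the actual implementation, and the window_size smoothing is omitted:

```python
import numpy as np


def frame_groups(xy: np.ndarray, distance_cutoff: float = 50.0) -> np.ndarray:
    """Connected-component labels for one frame: animals closer than
    distance_cutoff are linked, and linked chains share a group label."""
    n = len(xy)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # Full pairwise distance matrix, then union pairs under the cutoff.
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    for i in range(n):
        for j in range(i + 1, n):
            if d[i, j] < distance_cutoff:
                parent[find(i)] = find(j)

    roots = [find(i) for i in range(n)]
    _, labels = np.unique(roots, return_inverse=True)
    return labels
```

group_size then follows directly from np.bincount(labels) indexed by each animal's label.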
ffgroups_metrics ¶
FFGroupsMetrics ¶
Per-sequence summary of focal-fish group metrics.
Per-frame computed (internal): distance_from_centroid, xrot_to_centroid, yrot_to_centroid, dev_speed_to_mean.

Summaries (output: one row per id within sequence):
- fractime_norm2
- avg_duration_frame
- med_duration_frame
- ftime_periphery
- ftime_periphery_norm
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| group_col | | Column name that identifies group events (e.g. from FFGroups output). Default: "event". | required |
| speed_col | | Column name for speed values. Default: "speed". | required |
| time_chunk_sec | | If set, split each sequence into time-based chunks of this duration (seconds) and compute summaries per chunk. Default: None (whole sequence). | required |
| frame_chunk | | If set, split each sequence into frame-based chunks of this size and compute summaries per chunk. Default: None. | required |
| centroid_heading_col | | Column for centroid heading used in rotation calculations. Default: "centroid_heading". | required |
| exclude_cols | | List of boolean column names (e.g. "bad_frame") whose truthy rows are dropped before computation. Default: []. | required |
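As an illustration of the per-frame quantities, distance_from_centroid can be sketched with plain pandas (column names x/y/frame are assumed stand-ins for the mosaic track columns):

```python
import numpy as np
import pandas as pd


def distance_from_centroid(df: pd.DataFrame) -> pd.Series:
    # Per-frame group centroid, then each animal's distance to it.
    cx = df.groupby("frame")["x"].transform("mean")
    cy = df.groupby("frame")["y"].transform("mean")
    return np.hypot(df["x"] - cx, df["y"] - cy)
```

The rotation-to-centroid columns follow the same pattern, additionally rotating the (centroid - focal) vector by the centroid heading.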
global_kmeans ¶
GlobalKMeansClustering feature.
Extracted from features.py as part of feature_library modularization.
GlobalKMeansClustering ¶
Global K-Means clustering on templates loaded via load_state. Per-sequence cluster assignment is done in apply().
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| templates | | Templates artifact to fit on (inherited from GlobalModelParams). | required |
| model | | Pre-fitted KMeansModelArtifact to load (skip fit). Default: KMeansModelArtifact(). | required |
| k | | Number of clusters. Default: 100. | required |
| random_state | | Random seed for KMeans initialization. Default: 42. | required |
| n_init | | Number of KMeans initializations to run. Default: "auto". | required |
| max_iter | | Maximum iterations per KMeans run. Default: 300. | required |
| device | | Compute device -- "cpu" or "cuda" (requires cuML). Default: "cpu". | required |
| label_artifact_points | | If True, assign cluster labels to the template points used for fitting. Default: True. | required |
| pair_filter | | Optional NNResult for nearest-neighbor pair filtering during dependency resolution. Default: None. | required |
Params ¶
Bases: GlobalModelParams[KMeansModelArtifact]
Global K-means clustering parameters.
Attributes:
| Name | Type | Description |
|---|---|---|
| templates | ParquetArtifact \| None | Templates artifact to fit on (inherited). |
| model | KMeansModelArtifact \| None | Pre-fitted KMeans model artifact (skip fit). |
| k | int | Number of clusters. Default 100. |
| random_state | int | Random seed. Default 42. |
| n_init | Literal['auto'] \| int | KMeans initializations. Default "auto". |
| max_iter | int | Max iterations per run. Default 300. |
| device | str | Compute device. Default "cpu". |
| label_artifact_points | bool | Label points used for fitting. Default True. |
| pair_filter | NNResult \| None | Nearest-neighbor pair filter for dependency resolution. Default None. |
KMeansArtifactLabelsArtifact ¶
Bases: NpzArtifact
Labels for the artifact points used in fitting (artifact_labels.npz).
KMeansClusterCentersArtifact ¶
Bases: NpzArtifact
Cluster center vectors (cluster_centers.npz).
KMeansClusterSizesArtifact ¶
Bases: ParquetArtifact
Per-cluster sample counts (cluster_sizes.parquet).
KMeansModelArtifact ¶
Bases: JoblibArtifact[KMeansModelBundle]
KMeans model (model.joblib).
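The apply() step, assigning per-sequence rows to the fitted centers, reduces to nearest-centroid labeling. A plain NumPy sketch of that step (illustrative, not the KMeansModelBundle API):

```python
import numpy as np


def assign_clusters(rows: np.ndarray, centers: np.ndarray) -> np.ndarray:
    # Label each row with the index of its nearest cluster center
    # (squared Euclidean distance; the sqrt is monotone so it is skipped).
    d2 = ((rows[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)
```

For large k the broadcasted (n_rows, k, dim) intermediate can be avoided by chunking rows, which is the usual trade-off on CPU.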
global_scaler ¶
GlobalScaler ¶
Fit a StandardScaler on templates and scale per-sequence data.
Consumes a templates artifact (from ExtractTemplates or any feature producing templates.parquet). Produces a scaler model bundle and scaled templates.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| templates | | Templates artifact to fit the scaler on (inherited from GlobalModelParams). | required |
| model | | Pre-fitted ScalerModelArtifact to load (skip fit). Default: ScalerModelArtifact(). | required |
Params ¶
Bases: GlobalModelParams[ScalerModelArtifact]
GlobalScaler parameters.
Attributes:
| Name | Type | Description |
|---|---|---|
| templates | ParquetArtifact \| None | Templates artifact to fit scaler on. |
| model | ScalerModelArtifact \| None | Pre-fitted scaler model artifact (skip fit). |
ScaledTemplatesArtifact ¶
Bases: ParquetArtifact
Scaled template vectors (scaled_templates.parquet).
ScalerModelArtifact ¶
Bases: JoblibArtifact[ScalerModelBundle]
Fitted scaler model bundle (scaler.joblib).
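The fit-on-templates / transform-per-sequence split can be sketched with a minimal NumPy stand-in for scikit-learn's StandardScaler (illustrative, not the ScalerModelBundle API):

```python
import numpy as np


class TemplateScaler:
    """Fit column-wise mean/std on templates, then reuse on per-sequence data."""

    def fit(self, templates: np.ndarray) -> "TemplateScaler":
        self.mean_ = templates.mean(axis=0)
        self.scale_ = templates.std(axis=0)
        self.scale_[self.scale_ == 0] = 1.0  # avoid divide-by-zero on flat columns
        return self

    def transform(self, x: np.ndarray) -> np.ndarray:
        # Same statistics for every sequence, so scaled features are comparable
        # across sequences and with the scaled templates.
        return (x - self.mean_) / self.scale_
```

The key point is that the statistics come from the global templates, never refit per sequence.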
global_tsne ¶
GlobalTSNE feature.
GlobalTSNE ¶
Fit an openTSNE embedding on templates and map per-sequence data.
Consumes a templates artifact (from ExtractTemplates, GlobalScaler, or any feature producing templates). Produces an embedding model bundle and template coordinates.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| templates | | Templates artifact to fit embedding on (inherited from GlobalModelParams). | required |
| model | | Pre-fitted TSNEModelArtifact to load (skip fit). Default: TSNEModelArtifact(). | required |
| random_state | | Random seed. Default: 42. | required |
| perplexity | | t-SNE perplexity parameter. Default: 50. | required |
| knn_method | | kNN backend -- "annoy", "faiss", or "faiss-gpu". Default: "annoy". | required |
| n_jobs | | Number of parallel jobs for openTSNE. Default: 8. | required |
| fit | | TSNEFitConfig controlling learning rate, exaggeration iterations, momentum, etc. Default: TSNEFitConfig(). | required |
| mapping | | TSNEMapConfig controlling partial-embedding parameters (k, iterations, chunk_size, etc.). Default: TSNEMapConfig(). | required |
Params ¶
Bases: GlobalModelParams[TSNEModelArtifact]
Global t-SNE parameters.
Attributes:
| Name | Type | Description |
|---|---|---|
| templates | ParquetArtifact \| None | Templates artifact to fit embedding on. |
| model | TSNEModelArtifact \| None | Pre-fitted embedding model artifact (skip fit). |
| random_state | int | Random seed. Default 42. |
| perplexity | int | t-SNE perplexity. Default 50. |
| knn_method | str | kNN method ("annoy", "faiss", "faiss-gpu"). Default "annoy". |
| n_jobs | int | Parallel jobs for openTSNE. Default 8. |
| fit | TSNEFitConfig | Embedding fitting parameters. |
| mapping | TSNEMapConfig | Partial embedding mapping parameters. |
TSNECoordsArtifact ¶
Bases: NpzArtifact
t-SNE coordinates of templates (global_tsne_templates.npz).
TSNEFitConfig ¶
Bases: StrictModel
openTSNE fitting parameters.
Attributes:
| Name | Type | Description |
|---|---|---|
| learning_rate | float \| str | Learning rate ("auto" lets openTSNE compute). Default "auto". |
| exaggeration_iters | int | Early exaggeration phase iterations. Default 250. |
| exaggeration | float | Early exaggeration factor. Default 12. |
| exaggeration_momentum | float | Momentum during early exaggeration. Default 0.5. |
| iters | int | Refinement phase iterations. Default 750. |
| momentum | float | Momentum during refinement. Default 0.8. |
TSNEMapConfig ¶
Bases: StrictModel
Parameters for mapping new points into the fitted embedding.
Attributes:
| Name | Type | Description |
|---|---|---|
| k | int | Neighbors for partial embedding. Default 25. |
| iters | int | Optimization iterations. Default 100. |
| learning_rate | float | Learning rate. Default 1.0. |
| exaggeration | float | Exaggeration factor. Default 2.0. |
| momentum | float | Momentum. Default 0.0. |
| chunk_size | int | Chunk size for large sequences. Default 50000. |
TSNEModelArtifact ¶
Bases: JoblibArtifact[TSNEModelBundle]
Fitted t-SNE embedding model (embedding.joblib).
global_ward ¶
GlobalWardClustering feature.
Fits Ward hierarchical linkage on templates, cuts at n_clusters, builds centroids, and assigns per-sequence rows via 1-NN.
GlobalWardClustering ¶
Ward hierarchical clustering on templates with per-sequence 1-NN assignment.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| templates | | Templates artifact to cluster (inherited from GlobalModelParams). | required |
| model | | Pre-fitted WardModelArtifact to load (skip fit). Default: WardModelArtifact(). | required |
| n_clusters | | Number of clusters to cut from the linkage tree. Default: 20. | required |
| method | | Linkage method passed to scipy.cluster.hierarchy.linkage. Default: "ward". | required |
| pair_filter | | Optional NNResult for nearest-neighbor pair filtering during dependency resolution. Default: None. | required |
Params ¶
Bases: GlobalModelParams[WardModelArtifact]
Global Ward clustering parameters.
Attributes:
| Name | Type | Description |
|---|---|---|
| templates | ParquetArtifact \| None | Templates artifact to cluster (inherited). |
| model | WardModelArtifact \| None | Pre-fitted Ward model artifact (skip fit). |
| n_clusters | int | Number of clusters to cut. Default 20. |
| method | str | Linkage method. Default "ward". |
| pair_filter | NNResult \| None | Nearest-neighbor pair filter. Default None. |
WardModelArtifact ¶
Bases: JoblibArtifact[WardModelBundle]
Ward linkage model (model.joblib).
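The fit-cut-centroid-assign sequence described above can be sketched directly with SciPy (illustrative; ward_fit_assign is not part of mosaic):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def ward_fit_assign(templates: np.ndarray, rows: np.ndarray, n_clusters: int = 2):
    # 1) Fit Ward hierarchical linkage on the templates.
    Z = linkage(templates, method="ward")
    # 2) Cut the tree at n_clusters (fcluster labels are 1-based).
    labels = fcluster(Z, t=n_clusters, criterion="maxclust") - 1
    # 3) Build per-cluster centroids from the template points.
    centroids = np.stack(
        [templates[labels == c].mean(axis=0) for c in range(n_clusters)]
    )
    # 4) Assign new per-sequence rows by 1-nearest-centroid.
    d2 = ((rows[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return labels, d2.argmin(axis=1)
```

Step 4 is the same nearest-centroid assignment used by the K-means apply path; only the way the centers are obtained differs.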
helpers ¶
Shared helper functions for feature implementations.
This module contains utility functions used across multiple features in the feature_library to avoid code duplication.
apply_exclude_cols ¶
Drop rows where any exclude_cols column is truthy.
Silently skips column names not present in df. Returns df unchanged when exclude_cols is empty/None.
clean_animal_track ¶
clean_animal_track(g: DataFrame, data_cols: list[str], order_col: str, config: InterpolationConfig) -> pd.DataFrame
Sort, interpolate, fill, and drop rows with excessive missing data.
clean_tracks_grouped ¶
clean_tracks_grouped(df: DataFrame, group_cols: list[str], data_cols: list[str], order_col: str, config: InterpolationConfig) -> pd.DataFrame
Clean tracks per group, preserving group columns in the result.
Pandas 3.0 excludes group columns from groupby().apply() results.
This wrapper uses group_keys=True and resets the index to restore them.
ego_rotate ¶
Rotate world-frame deltas into ego frame (heading aligned with +x).
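The rotation is a standard change of basis by -heading. A NumPy sketch under the assumed sign convention (the focal's heading maps to +x):

```python
import numpy as np


def ego_rotate(dx: float, dy: float, heading: float) -> tuple[float, float]:
    # Rotate world-frame deltas by -heading so the focal's heading direction
    # becomes the +x axis of the ego frame.
    c, s = np.cos(heading), np.sin(heading)
    return c * dx + s * dy, -s * dx + c * dy
```

So a neighbor directly ahead of the focal always lands on the positive x axis of the ego frame, regardless of the focal's world-frame heading.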
ensure_columns ¶
Raise ValueError if any required columns are missing from df.
feature_columns ¶
Return the sorted list of numeric feature column names in df.
Excludes standard metadata columns (COLUMNS.meta_set()) and known non-feature columns (id1, id2, entity_level, perspective, fps).
unwrap_diff ¶
Compute angular velocity from angle array.
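A minimal sketch of the unwrap-then-diff approach (assuming this is the method; scaling by fps to get radians per second is a common convention):

```python
import numpy as np


def angular_velocity(angles: np.ndarray, fps: float = 1.0) -> np.ndarray:
    # np.unwrap removes the 2*pi jumps at the +/-pi boundary before
    # differencing, so a crossing from 3.1 to -3.1 rad counts as a small
    # positive turn rather than a spurious ~ -2*pi spin.
    return np.diff(np.unwrap(angles)) * fps
```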
id_tag_columns ¶
IdTagColumns ¶
Attach per-id label fields loaded from a labels directory.
Outputs per row (same granularity as input tracks/feature): frame/time/id/group/sequence + one column per requested label field.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| labels | | LabelsSource specifying which labels directory to load. Default: LabelsSource(kind="id_tags"). | required |
| label_kind | | Label subdirectory name used for dependency resolution. Default: "id_tags". | required |
| fields | | List of label field names to attach. None means all fields found in the labels file. Default: None. | required |
| field_renames | | Optional mapping of original field names to renamed column names in the output. Default: None. | required |
identity_model ¶
GlobalIdentityModel feature.
Trains a T-Rex-compatible visual identification model from egocentric crop
images of individual animals. Uses the V200 CNN architecture to produce
weights loadable via T-Rex's visual_identification_model_path setting.
GlobalIdentityModel ¶
Train a visual identity model from individual animal sequences.
Takes EgocentricCrop output as input. Each identity is specified as a
mapping of identity names to lists of sequences containing that
individual alone. Trains a V200 CNN classifier (T-Rex-compatible)
and exports weights loadable via visual_identification_model_path.
Example::
ego_result = dataset.run_feature(ego_crop)
identity_model = GlobalIdentityModel(
    Inputs((Result(feature="egocentric-crop"),)),
    params={
        "identities": {
            "mouse_A": ["cage1/day1_mouseA_alone", "cage1/day3_mouseA_alone"],
            "mouse_B": ["cage1/day1_mouseB_alone"],
            "mouse_C": ["cage1/day2_mouseC_alone"],
            "mouse_D": ["cage1/day1_mouseD_alone"],
        },
        "image_size": (128, 128),
        "channels": 1,
    },
)
result = dataset.run_feature(identity_model)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| identities | | Explicit identity -> sequences mapping. Keys are identity names, values are lists of "group/sequence" strings. | required |
| group_as_identity | | Convenience shortcut -- treat each group name as one identity. Default False. | required |
| image_size | | Crop resize target (height, width). Default (128, 128). | required |
| channels | | Number of image channels (1=grayscale, 3=color). Default 1. | required |
| epochs | | Training epochs. Default 150. | required |
| learning_rate | | Adam learning rate. Default 0.0001. | required |
| batch_size | | Training batch size. Default 64. | required |
| val_split | | Fraction of data reserved for validation. Default 0.2. | required |
| max_images_per_identity | | Cap on images per identity to balance classes. Default 2000. | required |
| export_trex_weights | | Save a T-Rex-loadable .pth file. Default True. | required |
| trex_weights_name | | Stem of the exported .pth file. Default "identity_model". | required |
kpms ¶
Unified keypoint-MoSeq feature.
Fits an AR-HMM model and applies it to extract per-frame syllable labels,
using a persistent subprocess server to avoid repeated JAX startup costs.
The kpms package does NOT need to be installed in the mosaic environment --
only in a separate .venv whose interpreter path is passed via kpms_python.
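The persistent-server pattern can be sketched with a toy JSON-lines worker. Here sys.executable stands in for kpms_python, and the protocol is purely illustrative, not the actual mosaic/kpms wire format:

```python
import json
import subprocess
import sys

# Toy worker script: read one JSON request per line, echo a JSON reply.
# In the real feature the worker imports keypoint-moseq once, so JAX
# startup is paid a single time across fit + apply calls.
WORKER = (
    "import sys, json\n"
    "for line in sys.stdin:\n"
    "    req = json.loads(line)\n"
    "    print(json.dumps({'echo': req['cmd']}), flush=True)\n"
)

proc = subprocess.Popen(
    [sys.executable, "-c", WORKER],  # real code would use the kpms_python path
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)


def call(cmd: str) -> dict:
    # One request/response round-trip over the long-lived subprocess.
    proc.stdin.write(json.dumps({"cmd": cmd}) + "\n")
    proc.stdin.flush()
    return json.loads(proc.stdout.readline())


reply = call("fit")
proc.stdin.close()
proc.wait()
```

Because the subprocess stays alive between calls, repeated fit/apply requests amortize the interpreter and JIT warm-up cost.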
KpmsFeature ¶
Unified keypoint-MoSeq feature: fit + apply via persistent subprocess.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | | Pre-fitted KpmsModelArtifact to load (skip fit). Default: None (fit from scratch). | required |
| kpms_python | | Path to a Python interpreter with keypoint-moseq installed. None uses the bundled external .venv. Default: None. | required |
| pose | | Pose keypoint configuration (indices, column prefixes). Default: PoseConfig(). | required |
| anterior_bodyparts | | List of bodypart names forming the anterior reference (required, min 1 element). | required |
| posterior_bodyparts | | List of bodypart names forming the posterior reference (required, min 1 element). | required |
| fps | | Frames per second of the input data. Default: 30. | required |
| num_iters_ar | | Number of AR-only fitting iterations. Default: 50. | required |
| num_iters_full | | Number of full model fitting iterations. Default: 500. | required |
| kappa_ar | | AR transition concentration parameter. None lets keypoint-moseq choose. Default: None. | required |
| kappa_full | | Full-model transition concentration parameter. None lets keypoint-moseq choose. Default: None. | required |
| latent_dim | | Dimensionality of the latent pose space. Must satisfy latent_dim < 2 * num_keypoints. Default: 10. | required |
| location_aware | | If True, include centroid location in the model. Default: False. | required |
| outlier_scale_factor | | Scale factor for outlier detection. Default: 6.0. | required |
| remove_outliers | | If True, remove detected outlier frames before fitting. Default: True. | required |
| mixed_map_iters | | Number of mixed MAP iterations. None uses the keypoint-moseq default. Default: None. | required |
| parallel_message_passing | | Enable parallel message passing. None uses the keypoint-moseq default. Default: None. | required |
| resume | | If True, resume fitting from a previously saved checkpoint. Default: True. | required |
| downsample_rate | | Temporal downsampling factor applied before fitting. None disables downsampling. Default: None. | required |
| save_every_n_iters | | Save a checkpoint every N iterations during fit. Default: 25. | required |
| num_iters_apply | | Number of iterations when applying the model to new data. Default: 500. | required |
lightning_action_feature ¶
Lightning-action supervised temporal action segmentation feature.
Wraps the lightning-action package (Paninski lab, MIT license) as a
mosaic global feature. Trains a temporal neural network classifier
(DilatedTCN, RNN, or TemporalMLP) on labeled templates and predicts
per-frame action probabilities with temporal context.
Requires the optional lightning-action package::
pip install lightning-action
Or install mosaic with the extra::
pip install mosaic-behavior[lightning-action]
LightningActionFeature ¶
Supervised temporal action segmentation via lightning-action.
Trains a temporal neural network classifier (DilatedTCN, RNN, or TemporalMLP head + linear classifier) on labeled templates and predicts per-frame action probabilities.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | | Pre-fitted LightningActionModelArtifact to load (skip training). Default: LightningActionModelArtifact(). | required |
| head | | Temporal encoder architecture -- "dtcn" (dilated temporal convolution), "rnn" (LSTM/GRU), or "temporalmlp". Default: "dtcn". | required |
| num_hid_units | | Hidden units in the temporal encoder. Default: 64. | required |
| num_layers | | Number of encoder layers. Default: 2. | required |
| num_lags | | Lag/kernel size for temporal context. Default: 4. | required |
| activation | | Activation function. Default: "lrelu". | required |
| dropout_rate | | Dropout rate. Default: 0.1. | required |
| sequence_length | | Training sequence length (frames per chunk). Default: 500. | required |
| num_epochs | | Number of training epochs. Default: 200. | required |
| batch_size | | Training batch size. Default: 32. | required |
| learning_rate | | Optimizer learning rate. Default: 1e-3. | required |
| weight_decay | | Optimizer weight decay. Default: 0.0. | required |
| optimizer | | Optimizer type. Default: "Adam". | required |
| weight_classes | | If True, weight loss by inverse class frequency. Default: True. | required |
| device | | Compute device -- "cpu" or "gpu". Default: "cpu". | required |
| random_state | | Random seed. Default: 42. | required |
| decision_threshold | | Probability threshold(s) for positive prediction. A float applies to all classes; a dict maps class -> threshold. None uses argmax. Default: None. | required |
| default_class | | Class label assigned when no class exceeds the decision threshold (required). | required |
LightningActionModelArtifact ¶
Bases: JoblibArtifact[LightningActionModelBundle]
Fitted lightning-action model bundle.
movement ¶
Movement library integration for mosaic.
Provides bidirectional conversion between mosaic DataFrames and movement xarray Datasets, plus mosaic features that wrap movement's smoothing, filtering, and interpolation functions.
MovementFilterInterpolate ¶
MovementFilterInterpolate(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)
Filter low-confidence points and interpolate gaps using movement.
Wraps movement.filtering.filter_by_confidence and
movement.filtering.interpolate_over_time.
When no confidence columns (poseP0..N) are present, the confidence filter is skipped and only interpolation of existing NaN gaps is performed.
The output is a full track DataFrame with cleaned positions replacing the originals, so downstream features can chain off the result.
MovementSmooth ¶
Smooth trajectory positions using the movement library.
Wraps movement.filtering.rolling_filter and
movement.filtering.savgol_filter to smooth X/Y centroid and/or
poseX/poseY keypoint positions.
The output is a full track DataFrame with smoothed positions replacing
the originals, so downstream features can chain off the result via
Inputs((Result(feature="movement-smooth"),)).
from_movement_dataset ¶
from_movement_dataset(ds: Any, original_df: DataFrame, metadata: dict[str, Any], update_confidence: bool = False) -> pd.DataFrame
Merge a movement xarray Dataset back into a mosaic DataFrame.
Overwrites X/Y and poseX/poseY columns in a copy of original_df
with the (smoothed/filtered) values from the Dataset.
Parameters¶
ds : xarray.Dataset
movement Dataset with position and confidence data variables.
original_df : pd.DataFrame
The original mosaic DataFrame to merge into.
metadata : dict
Metadata returned by to_movement_dataset.
update_confidence : bool
Whether to also overwrite poseP columns from the Dataset's
confidence values. Default False.
Returns¶
pd.DataFrame
Copy of original_df with position columns replaced.
to_movement_dataset ¶
to_movement_dataset(df: DataFrame, fps: float | None = None, keypoint_names: list[str] | None = None, include_centroid: bool = True) -> tuple[Any, dict[str, Any]]
Convert a mosaic tracks DataFrame to a movement xarray Dataset.
Parameters¶
df : pd.DataFrame
Mosaic tracks DataFrame with columns like X, Y, poseX0..N, poseY0..N, id, frame, etc.
fps : float, optional
Frames per second. If None, the time dimension uses frame numbers.
keypoint_names : list[str], optional
Names for the pose keypoints. If None, defaults to "keypoint_0", etc.
include_centroid : bool
Whether to include the centroid (X, Y) as an additional keypoint named "centroid". Default True.
Returns¶
ds : xarray.Dataset
movement poses Dataset with dimensions (time, space, keypoints, individuals).
metadata : dict
Metadata needed by from_movement_dataset to convert back:
individual_ids, frame_index, include_centroid, pose_pairs.
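Without the movement/xarray dependency, the core reshape can be sketched in NumPy: building a (time, space, keypoints, individuals) array from a mosaic-style DataFrame. The loop is kept simple for clarity, and confidence/centroid handling is omitted:

```python
import numpy as np
import pandas as pd


def positions_array(df: pd.DataFrame, n_keypoints: int) -> np.ndarray:
    # One row per (frame, id) with poseX0..N / poseY0..N columns becomes
    # a dense (time, space, keypoints, individuals) array; missing
    # (frame, id) combinations stay NaN, matching movement's convention.
    frames = np.sort(df["frame"].unique())
    ids = np.sort(df["id"].unique())
    out = np.full((len(frames), 2, n_keypoints, len(ids)), np.nan)
    f_idx = {f: i for i, f in enumerate(frames)}
    i_idx = {a: i for i, a in enumerate(ids)}
    for _, row in df.iterrows():
        t, a = f_idx[row["frame"]], i_idx[row["id"]]
        for k in range(n_keypoints):
            out[t, 0, k, a] = row[f"poseX{k}"]
            out[t, 1, k, a] = row[f"poseY{k}"]
    return out
```

The metadata dict returned by to_movement_dataset exists precisely so this reshape can be inverted losslessly by from_movement_dataset.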
convert ¶
Bidirectional conversion between mosaic DataFrames and movement xarray Datasets.
from_movement_dataset ¶
from_movement_dataset(ds: Any, original_df: DataFrame, metadata: dict[str, Any], update_confidence: bool = False) -> pd.DataFrame
Merge a movement xarray Dataset back into a mosaic DataFrame.
Overwrites X/Y and poseX/poseY columns in a copy of original_df
with the (smoothed/filtered) values from the Dataset.
Parameters¶
ds : xarray.Dataset
movement Dataset with position and confidence data variables.
original_df : pd.DataFrame
The original mosaic DataFrame to merge into.
metadata : dict
Metadata returned by to_movement_dataset.
update_confidence : bool
Whether to also overwrite poseP columns from the Dataset's
confidence values. Default False.
Returns¶
pd.DataFrame
Copy of original_df with position columns replaced.
to_movement_dataset ¶
to_movement_dataset(df: DataFrame, fps: float | None = None, keypoint_names: list[str] | None = None, include_centroid: bool = True) -> tuple[Any, dict[str, Any]]
Convert a mosaic tracks DataFrame to a movement xarray Dataset.
Parameters¶
df : pd.DataFrame
Mosaic tracks DataFrame with columns like X, Y, poseX0..N, poseY0..N, id, frame, etc.
fps : float, optional
Frames per second. If None, the time dimension uses frame numbers.
keypoint_names : list[str], optional
Names for the pose keypoints. If None, defaults to "keypoint_0", etc.
include_centroid : bool
Whether to include the centroid (X, Y) as an additional keypoint named "centroid". Default True.
Returns¶
ds : xarray.Dataset
movement poses Dataset with dimensions (time, space, keypoints, individuals).
metadata : dict
Metadata needed by from_movement_dataset to convert back:
individual_ids, frame_index, include_centroid, pose_pairs.
filter_interp ¶
Movement-based confidence filtering and interpolation feature.
MovementFilterInterpolate ¶
MovementFilterInterpolate(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)
Filter low-confidence points and interpolate gaps using movement.
Wraps movement.filtering.filter_by_confidence and
movement.filtering.interpolate_over_time.
When no confidence columns (poseP0..N) are present, the confidence filter is skipped and only interpolation of existing NaN gaps is performed.
The output is a full track DataFrame with cleaned positions replacing the originals, so downstream features can chain off the result.
smooth ¶
Movement-based trajectory smoothing feature.
MovementSmooth ¶
Smooth trajectory positions using the movement library.
Wraps movement.filtering.rolling_filter and
movement.filtering.savgol_filter to smooth X/Y centroid and/or
poseX/poseY keypoint positions.
The output is a full track DataFrame with smoothed positions replacing
the originals, so downstream features can chain off the result via
Inputs((Result(feature="movement-smooth"),)).
nearestneighbor ¶
NearestNeighbor ¶
Per-sequence feature computing nearest-neighbor identity and relative kinematics.
Outputs per frame (one row per individual):
- nn_id: id of nearest neighbor (NaN if none)
- nn_delta_x / nn_delta_y: neighbor position minus focal, world frame
- nn_dist: Euclidean distance to nearest neighbor
- nn_delta_angle: neighbor heading minus focal, wrapped to [-pi, pi]
- nn_delta_x_ego / nn_delta_y_ego: neighbor offset in focal ego frame
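The per-frame neighbor search can be sketched with NumPy for a single frame (the ego-frame columns would additionally rotate the deltas by the focal heading):

```python
import numpy as np


def nearest_neighbors(xy: np.ndarray, ids: np.ndarray):
    # For each individual: the id of and distance to its nearest neighbor
    # within one frame. Self-distances are masked out with inf.
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    j = d.argmin(axis=1)
    return ids[j], d[np.arange(len(ids)), j]
```

A single-animal frame would leave every masked distance at inf, which is where the NaN nn_id case described above comes from.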
nn_delta_bins ¶
NearestNeighborDeltaBins ¶
NearestNeighborDeltaBins(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)
Bin nearest-neighbor response fields (dangle, dspeed) over neighbor position.
Inputs: outputs from nn-delta-response (neighbor_x/neighbor_y in ego frame, dangle, dspeed, group_size, and focal/neighbor category columns).

Output: tidy DataFrame with mean turn/speed per bin for the focal role and neighbor role, with columns [group, sequence, exp, trial, role, category, group_size, metric, bin_idx, value].
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| nbins | | Number of spatial bins along the binning axis. Default: 45. | required |
| binmax | | Maximum absolute value for bin edges. Default: 14.0. | required |
| max_for_avg | | Maximum neighbor distance used when computing binned-mean responses. Default: 5.0. | required |
| antisymm | | If True, use front/back antisymmetric folding for turn-force computation. Default: True. | required |
| focal_category_col | | Column name for the focal animal's category flag. Default: "Focal_fish". | required |
| neighbor_category_col | | Column name for the neighbor's category flag. Default: "neighbor_focal". | required |
| group_size_col | | Column name for group size. Default: "group_size". | required |
| exp_col | | Column name for experimental condition. Default: "Exp". | required |
| trial_col | | Column name for trial identifier. Default: "Trial". | required |
| category_specs | | List of dicts defining derived category columns (keys: source_col, new_col, quantile, op). Default: []. | required |
| exclude_cols | | List of boolean column names whose truthy rows are dropped before computation. Default: []. | required |
| nonfocal_flag_col | | Column used to flag nonfocal animals. Default: "Focal_fish". | required |
| nonfocal_flag_value | | Value in nonfocal_flag_col that marks an animal as nonfocal. Default: False. | required |
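The core binned-mean operation, with nbins and binmax as above, can be sketched in one dimension with np.digitize (illustrative; the feature bins over 2-D neighbor position, separately per role and category):

```python
import numpy as np


def binned_mean(pos: np.ndarray, values: np.ndarray,
                nbins: int = 45, binmax: float = 14.0) -> np.ndarray:
    # Mean response per spatial bin over [-binmax, binmax]; empty bins -> NaN.
    edges = np.linspace(-binmax, binmax, nbins + 1)
    idx = np.digitize(pos, edges) - 1  # bin index per sample, 0-based
    out = np.full(nbins, np.nan)
    for b in range(nbins):
        sel = idx == b
        if sel.any():
            out[b] = values[sel].mean()
    return out
```

One such vector per (role, category, group_size, metric) combination, flattened over bin_idx, gives the tidy output layout described above.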
nn_delta_response ¶
NearestNeighborDelta ¶
Per-sequence feature that measures how a focal fish changes position/heading/speed over the next diff_numframes frames relative to its nearest neighbor at the current frame.

Expected inputs (via tracks or an Inputs() that merges tracks + the nearest-neighbor feature):
- position/heading/speed columns for the focal (x, y, ANGLE, speed_col)
- nearest-neighbor id column (nn_id_col, default: 'nn_id')
- neighbor offsets in ego frame (nn_delta_x_ego / nn_delta_y_ego); if missing, world offsets (nn_delta_x / nn_delta_y) are rotated using the focal heading

Outputs per focal row (filtered to frames with a valid future sample diff_numframes ahead): frame, id, group, sequence, nn_id, neighbor_x/y (ego), neighbor_focal (if available), dx, dy, dt, dangle (wrapped; optionally scaled by fps), dspeed, plus passthrough columns like group_size/event/Focal_fish when present.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sampling | | Frame rate and smoothing settings. Default: SamplingConfig(). | required |
| speed_col | | Column name for speed. Default: "SPEED#wcentroid". | required |
| nn_id_col | | Column name for the nearest-neighbor ID. Default: "nn_id". | required |
| nn_dx_ego_col | | Column for neighbor delta-x in ego frame. Default: "nn_delta_x_ego". | required |
| nn_dy_ego_col | | Column for neighbor delta-y in ego frame. Default: "nn_delta_y_ego". | required |
| nn_dx_world_col | | Fallback column for neighbor delta-x in world frame (used when ego columns are absent). Default: "nn_delta_x". | required |
| nn_dy_world_col | | Fallback column for neighbor delta-y in world frame. Default: "nn_delta_y". | required |
| focal_col | | Column name for the focal-animal flag. Default: "Focal_fish". | required |
| diff_numframes | | Number of frames ahead to compute the future response delta. Default: 4. | required |
| wrap_angle | | If True, wrap heading differences to [-pi, pi]. Default: True. | required |
| divide_dangle_by_frames | | If True, divide the heading change by diff_numframes. Default: True. | required |
| scale_dangle_by_fps | | If True, multiply dangle by fps to convert to radians/sec. Default: True. | required |
| tag_cols | | Additional columns to pass through to the output. Default: []. | required |
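The future-response delta, with the wrap_angle / divide_dangle_by_frames / scale_dangle_by_fps options applied, can be sketched with pandas (column names angle/speed are simplified stand-ins for the configured columns):

```python
import numpy as np
import pandas as pd


def future_response(df: pd.DataFrame, diff_numframes: int = 4,
                    fps: float = 30.0) -> pd.DataFrame:
    # Heading/speed change over the next diff_numframes frames, per id.
    out = df.sort_values(["id", "frame"]).copy()
    g = out.groupby("id")
    dangle = g["angle"].shift(-diff_numframes) - out["angle"]
    dangle = (dangle + np.pi) % (2 * np.pi) - np.pi   # wrap to [-pi, pi]
    out["dangle"] = dangle / diff_numframes * fps     # radians per second
    out["dspeed"] = g["speed"].shift(-diff_numframes) - out["speed"]
    # Keep only rows with a valid future sample diff_numframes ahead.
    return out.dropna(subset=["dangle", "dspeed"])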
orientation_relative ¶
OrientationRelativeFeature feature.
Extracted from features.py as part of feature_library modularization.
OrientationRelativeFeature ¶
OrientationRelativeFeature(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)
Orientation-aware relative features between animal pairs, order-agnostic to pose points.
For each frame and ordered pair (id_a -> id_b):
- Express B in A's body frame (using heading angle and global scale).
- Emit signed centroid deltas, heading difference, quantiles over B's points in A's frame, and nearest-k distances.
Params ¶
Bases: Params
Orientation-relative feature parameters.
Attributes:
| Name | Type | Description |
|---|---|---|
| scale | BodyScaleResult | Body-scale artifact for normalization. |
| nearest_k | int | Number of nearest pose-point distances to emit. Default 3. |
| quantiles | list[float] | Distance distribution quantiles to compute. Default [0.25, 0.5, 0.75]. |
pair_egocentric ¶
PairEgocentricFeatures feature.
Extracted from features.py as part of feature_library modularization.
PairEgocentricFeatures ¶
PairEgocentricFeatures(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)
'pair-egocentric' -- per-sequence egocentric + kinematic features for dyads. Produces a row-wise DataFrame with columns:
- frame (if available) or time passthrough (only if it is the order column)
- perspective: 0 for A->B, 1 for B->A
- id1, id2: pair identifiers
- feature columns (e.g., A_speed, AB_dx_egoA, ...)
- (optionally) group/sequence if present in df, for convenience
This feature is stateless (no fitting). It computes features for all C(n,2) pairs per sequence, cleans/interpolates pose per animal, inner-joins by the chosen order column, and computes A->B and B->A features for each pair.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| interpolation | | Interpolation settings for missing pose data. Default: InterpolationConfig(). | required |
| sampling | | Frame rate and smoothing settings. Default: SamplingConfig(). | required |
| pose | | Pose keypoint configuration (indices, column prefixes). Default: PoseConfig(). | required |
| neck_idx | | Index of the neck keypoint in the pose array, used to compute heading direction. Default: 3. | required |
| tail_base_idx | | Index of the tail-base keypoint, paired with neck_idx for the heading vector. Default: 6. | required |
| center_mode | | How to compute the animal's center: "mean" averages all keypoints; other values use a specific keypoint. Default: "mean". | required |
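The per-pair, two-perspective layout can be made concrete with a small enumeration helper. This is only a sketch of the output shape; `pair_perspectives` is a hypothetical name, and the real feature additionally inner-joins frames and computes the kinematic columns:

```python
from itertools import combinations

def pair_perspectives(animal_ids):
    """Enumerate all C(n, 2) unordered pairs, each expanded into two
    perspective rows (0 = A->B, 1 = B->A), mirroring the row layout
    described above."""
    rows = []
    for a, b in combinations(sorted(animal_ids), 2):
        for p in (0, 1):
            rows.append({"perspective": p, "id1": a, "id2": b})
    return rows
```

Three animals therefore yield 3 pairs and 6 perspective rows per frame.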
pair_interaction_filter ¶
PairInteractionFilter -- detect pairwise interaction segments from trajectories.
Identifies frames where pairs of individuals meet configurable distance and angular thresholds. Applies morphological filtering to remove noise and enforces a minimum interaction duration.
Typical use cases:
- Detecting face-to-face interactions (distance + facing criterion)
- Proximity-based pair detection (distance only, require_facing=False)
- Pre-filtering for expensive downstream processing (e.g. interaction crops)

All thresholds are parameterized and should be tuned per application.
PairInteractionFilter ¶
PairInteractionFilter(inputs: Inputs = Inputs(('tracks',)), params: dict[str, object] | None = None)
Detect pairwise interaction segments from trajectory data.
For every unique pair of individuals in a sequence, tests per-frame distance and (optionally) angular criteria, applies morphological filtering, and extracts continuous interaction segments that meet a minimum duration.
Output columns (one row per frame per interaction segment):
- frame: frame number
- id_a, id_b: individual IDs (id_a < id_b by convention)
- interaction_id: integer label for the segment within this pair
- interaction_start: first frame of this segment
- interaction_end: last frame (exclusive) of this segment
Params¶
shift_dist : float
Pixel shift along heading before distance check (default 15).
Set to 0 to use raw positions without forward shift.
max_dist : float
Maximum shifted-position distance in pixels (default 40).
require_facing : bool
If True (default), require individuals to face each other
(inverse orientation difference < max_inv_orientation_diff_deg).
Set to False for distance-only filtering.
max_inv_orientation_diff_deg : float
Max angle (degrees) between inverse orientations (default 80).
Only used when require_facing=True.
min_run_frames : int
Minimum continuous frames for a valid interaction (default 250).
frame_padding : int
Frames to pad before/after each segment (default 10).
morphological_structure_size : int
Structure element length for binary close/open (default 25).
Set to 0 to disable morphological filtering.
px_scale : float
Scale factor applied to shift_dist and max_dist (default 1.0).
Use to adjust for videos with different pixel resolutions.
use_pixel_coords : bool
If True, use poseX/poseY columns (pixel coordinates) for
distance calculations instead of X/Y (world coordinates).
Default True since thresholds are in pixel units.
pose_head_index : int | None
If set and use_pixel_coords is True, use this pose index
as the position for distance calculations.
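The final segmentation step (keep runs of criterion-passing frames that reach min_run_frames, then pad) might look like the numpy sketch below. `interaction_segments` is a hypothetical helper, the per-frame distance/facing test and the morphological close/open stage are omitted, and segments are returned as half-open `(start, end)` index ranges:

```python
import numpy as np

def interaction_segments(mask, min_run_frames, frame_padding=0):
    """Extract [start, end) runs of True from a per-frame criterion mask,
    keep those at least min_run_frames long, and pad each kept run by
    frame_padding frames on both sides (clamped to the sequence)."""
    padded = np.concatenate(([False], mask, [False]))
    edges = np.flatnonzero(np.diff(padded.astype(int)))
    starts, ends = edges[::2], edges[1::2]   # run boundaries in mask coords
    segs = []
    for s, e in zip(starts, ends):
        if e - s >= min_run_frames:
            segs.append((int(max(0, s - frame_padding)),
                         int(min(len(mask), e + frame_padding))))
    return segs
```

With the defaults above (min_run_frames=250 at typical frame rates), only interactions lasting several seconds survive.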
pair_position ¶
PairPositionFeatures - egocentric dyadic features using only (x, y, angle).
Drop-in replacement for PairEgocentricFeatures when pose keypoints are not available. Uses the ANGLE column directly for heading instead of computing from neck->tail vector.
Output columns match PairEgocentricFeatures exactly, enabling use with downstream features like PairWavelet.
PairPositionFeatures ¶
'pair-position' -- per-sequence egocentric + kinematic features for all pairs.
Unlike PairEgocentricFeatures which requires full pose keypoints, this feature works with minimal input: just (x, y, angle) per animal.
For N animals per sequence, computes features for all N*(N-1)/2 unique pairs, each with two perspectives (A->B and B->A).
Output columns (per row):
- frame: frame number
- perspective: 0 for A->B, 1 for B->A
- id1, id2: IDs of the two animals in this pair
- A_speed, A_v_para, A_v_perp, A_ang_speed: focal kinematics
- A_heading_cos, A_heading_sin: focal heading
- AB_dist: inter-animal distance
- AB_dx_egoA, AB_dy_egoA: partner position in the focal's egocentric frame
- rel_heading_cos, rel_heading_sin: relative heading
- B_speed, B_v_para, B_v_perp, B_ang_speed: partner kinematics
- (optionally) group, sequence for convenience
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| interpolation | | Interpolation settings for missing position data. Default: InterpolationConfig(). | required |
| sampling | | Frame rate and smoothing settings. Default: SamplingConfig(). | required |
pair_wavelet ¶
PairWavelet feature -- CWT spectrograms on PairPoseDistancePCA outputs.
PairWavelet ¶
CWT spectrograms on PairPoseDistancePCA outputs.
Expects the input df to contain columns:
- 'perspective' (0 = A->B, 1 = B->A)
- 'frame' (preferred) or 'time' (if used as order column)
- PC0..PC{k-1} (k = number of PCA components)

Returns a DataFrame with columns:
- frame (or time, if that was the order column)
- perspective
- W_{col}_f{fi}: log-power, clamped, for each component x frequency
- (optionally) passthrough group/sequence if present in df
Stateless (no fitting). FPS is inferred from constant df['fps'] if present, otherwise from fps_default. Frequencies are dyadically spaced in [f_min, f_max].
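The frequency grid and log-power clamping can be sketched with a numpy-only toy transform. This is a hedged illustration under stated assumptions: `dyadic_freqs` and `morlet_log_power` are hypothetical names, a hand-rolled complex Morlet kernel stands in for the PyWavelets "cmor1.5-1.0" wavelet the feature actually uses, and edge handling is naive:

```python
import numpy as np

def dyadic_freqs(f_min, f_max, n_freq):
    """Geometrically (dyadically) spaced frequencies spanning [f_min, f_max]."""
    return f_min * (f_max / f_min) ** (np.arange(n_freq) / (n_freq - 1))

def morlet_log_power(x, freqs, fps, omega0=6.0, log_floor=-3.0):
    """Toy complex-Morlet CWT returning clamped log10 power with shape
    (n_freq, n_frames)."""
    out = np.empty((len(freqs), len(x)))
    for i, f in enumerate(freqs):
        sigma_t = omega0 / (2 * np.pi * f)      # time-domain width (seconds)
        half = int(4 * sigma_t * fps)           # 4-sigma kernel support
        t = np.arange(-half, half + 1) / fps
        kernel = np.exp(2j * np.pi * f * t) * np.exp(-(t ** 2) / (2 * sigma_t ** 2))
        kernel /= np.linalg.norm(kernel)        # unit L2 norm
        coef = np.convolve(x, kernel, mode="same")
        out[i] = np.maximum(np.log10(np.abs(coef) ** 2 + 1e-12), log_floor)
    return out
```

A pure 2 Hz input signal should produce its strongest band at the frequency bin closest to 2 Hz, with power elsewhere clamped toward log_floor.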
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| sampling | | Frame rate and smoothing settings. Default: SamplingConfig(). | required |
| f_min | | Minimum frequency in Hz for the CWT band. Default: 0.2. | required |
| f_max | | Maximum frequency in Hz for the CWT band. Default: 5.0. | required |
| n_freq | | Number of frequency bins (dyadically spaced between f_min and f_max). Default: 25. | required |
| wavelet | | PyWavelets wavelet name. Default: "cmor1.5-1.0". | required |
| log_floor | | Floor value for log-power clamping. Default: -3.0. | required |
| pc_prefix | | Column prefix used to auto-detect PC input columns (e.g. "PC0", "PC1", ...). Default: "PC". | required |
| cols | | Explicit list of input column names. If None, columns are auto-detected using pc_prefix. Default: None. | required |
pairposedistancepca ¶
PairPoseDistancePCA ¶
'pair-posedistance-pca' — builds per-frame pairwise pose-distance features and fits an IncrementalPCA globally; outputs PC scores per sequence (and perspective).
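The per-frame distance-feature construction (the input to IncrementalPCA) can be sketched for a single frame and dyad. `pose_distance_vector` is a hypothetical helper illustrating the include_intra_A/include_intra_B/include_inter switches; the global PCA fit is not shown:

```python
import numpy as np
from itertools import combinations

def pose_distance_vector(pose_a, pose_b, include_intra_a=True,
                         include_intra_b=True, include_inter=True):
    """Pairwise pose-distance features for one frame of one dyad.
    pose_a / pose_b are (K, 2) keypoint arrays for animals A and B."""
    feats = []
    if include_intra_a:   # distances within animal A's own keypoints
        feats += [np.linalg.norm(pose_a[i] - pose_a[j])
                  for i, j in combinations(range(len(pose_a)), 2)]
    if include_intra_b:   # distances within animal B's own keypoints
        feats += [np.linalg.norm(pose_b[i] - pose_b[j])
                  for i, j in combinations(range(len(pose_b)), 2)]
    if include_inter:     # every A keypoint against every B keypoint
        feats += [np.linalg.norm(pose_a[i] - pose_b[j])
                  for i in range(len(pose_a)) for j in range(len(pose_b))]
    return np.array(feats)
```

With K keypoints per animal this yields K(K-1)/2 features per intra block plus K*K inter distances, e.g. 3 + 3 + 9 = 15 for K = 3.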
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| interpolation | | Interpolation settings for missing pose data. Default: InterpolationConfig(). | required |
| pose | | Pose keypoint configuration (indices, column prefixes). Default: PoseConfig(). | required |
| include_intra_A | | If True, include intra-animal A pairwise keypoint distances. Default: True. | required |
| include_intra_B | | If True, include intra-animal B pairwise keypoint distances. Default: True. | required |
| include_inter | | If True, include inter-animal pairwise keypoint distances. Default: True. | required |
| duplicate_perspective | | If True, output both A->B and B->A perspectives per pair. Default: True. | required |
| n_components | | Number of PCA components to retain. Default: 6. | required |
| batch_size | | Batch size for IncrementalPCA partial_fit. Default: 5000. | required |
speed_angvel ¶
SpeedAngvel ¶
Per-sequence feature computing translational speed and angular velocity.
Outputs (per frame):
- speed: displacement magnitude between consecutive frames divided by dt
- angvel: wrapped heading difference (rad) divided by dt
- speed_step / angvel_step: same, but using a configurable step_size (omitted if step_size is None)
- speed_smooth: Savitzky-Golay smoothed speed (polyorder=1), only present when smooth_window is set in Params
Time-delta (dt) computation: speed and angular velocity require dividing by a time interval. The source for dt is chosen by priority:

- frame + fps (recommended for constant-fps video): when fps is set in Params, dt is computed as frame_diff / fps. This is immune to irregular real timestamps that some trackers embed in the time column (e.g. TRex uses wall-clock timestamps that may jitter by several milliseconds per frame). It also correctly handles frame gaps from dropped/bad frames.
- time column: if fps is not set but a time column exists, dt is computed from consecutive time differences.
- array index: last resort when neither frame+fps nor time is available; assumes each row is one step apart.

For most video-based tracking data, setting fps is strongly recommended to avoid speed artifacts from timestamp jitter.
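Under the frame + fps rule, a minimal numpy sketch of the computation (with heading wrapped to [-pi, pi]) could look like the following; the function name and signature are illustrative, not the library's internals:

```python
import numpy as np

def speed_angvel(x, y, angle, frame, fps):
    """Per-frame speed and angular velocity using dt = frame_diff / fps,
    which stays correct across dropped frames (frame gaps > 1)."""
    dt = np.diff(frame) / fps
    speed = np.hypot(np.diff(x), np.diff(y)) / dt
    dangle = np.diff(angle)
    dangle = (dangle + np.pi) % (2 * np.pi) - np.pi   # wrap to [-pi, pi]
    return speed, dangle / dt
```

The wrap step is what keeps a heading that crosses the 0/2*pi boundary from producing a spurious near-full-turn angular velocity.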
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| step_size | | If set, also compute speed_step / angvel_step using this frame step (in addition to step=1). Default: None. | required |
| smooth_window | | If set, apply Savitzky-Golay smoothing (polyorder=1) over this many frames to produce speed_smooth. Default: None. | required |
| fps | | Frames per second. When set, dt is derived from frame_diff/fps instead of the time column; more robust for constant-fps data with jittery timestamps. Default: None. | required |
temporal_stacking ¶
Temporal stacking feature.
Builds temporal context windows over per-sequence feature data by stacking Gaussian-smoothed frames at time offsets and optional pooled statistics.
TemporalStackingFeature ¶
Build temporal context windows over per-sequence feature data.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| half | | Half-width of the temporal window in frames. The full window spans [-half, +half]. Default: 60. | required |
| skip | | Step size between time offsets in the stacking window. Default: 5. | required |
| use_temporal_stack | | If True, concatenate Gaussian-smoothed copies at each time offset. Default: True. | required |
| sigma_stack | | Gaussian sigma (in frames) for smoothing before stacking. 0 disables smoothing. Default: 30.0. | required |
| add_pool | | If True, append pooled statistics (e.g. mean, std) computed over a sliding Gaussian window. Default: True. | required |
| pool_stats | | Tuple of pooled statistics to compute. Supported: "mean", "std", "variance". Default: ("mean",). | required |
| sigma_pool | | Gaussian sigma (in frames) for the pooling window. Default: 30.0. | required |
| fps | | Frames per second; used to convert win_sec to frames. Default: 30.0. | required |
| win_sec | | Pooling window width in seconds. Default: 0.5. | required |
| pair_filter | | Optional NNResult for nearest-neighbor pair filtering during dependency resolution. Default: None. | required |
trajectory_smooth ¶
TrajectorySmooth ¶
Per-sequence feature that smooths and interpolates trajectory positions.
Pipeline (per individual):
1. Bad-frame detection: flag frames with speed > speed_threshold, expand the flagged region by expand_frames in each direction.
2. Interpolation: set positions to NaN at bad frames, linearly interpolate, forward/backward fill edges. Controlled separately for centroid (interpolate_centroid) and pose (interpolate_pose).
3. Savgol smoothing: apply savgol_filter to centroid X/Y and all pose columns (always, regardless of interpolation flags).
Output is the full track DataFrame with smoothed positions replacing originals, plus a bad_frame boolean column. Downstream features can consume this via Inputs((Result(feature="trajectory-smooth"),)).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| speed_threshold | | Speed above which a frame is flagged as bad. | required |
| fps | | Frames per second. | required |
| interpolate_centroid | | If True, replace bad-frame centroid positions with linear interpolation. Default: True. | required |
| interpolate_pose | | If True, replace bad-frame pose keypoint positions with linear interpolation. Default: False. | required |
| expand_frames | | Number of frames to expand the bad-frame region in each direction. Default: 2. | required |
| savgol_window | | Window length for Savitzky-Golay smoothing. Must be odd and >= savgol_polyorder + 1. None disables smoothing. Default: None. | required |
| savgol_polyorder | | Polynomial order for Savitzky-Golay filter. Default: 2. | required |
types ¶
InterpolationConfig ¶
Bases: StrictModel
Interpolation parameters for missing pose/position data.
Attributes:
| Name | Type | Description |
|---|---|---|
| linear_interp_limit | int | Max consecutive NaN frames to fill via linear interpolation. Default 10, must be >= 1. |
| edge_fill_limit | int | Max frames to forward/backward fill at sequence edges. Default 3, must be >= 0. |
| max_missing_fraction | float | Rows with a higher fraction of NaN columns are dropped entirely. Default 0.10, range [0, 1]. |
PoolConfig ¶
Bases: StrictModel
Candidate pool configuration for template extraction.
Controls how per-entry contributions to the candidate pool are allocated before the final template selection step.
Attributes:
| Name | Type | Description |
|---|---|---|
| size | int \| None | Candidate pool size. For "random" strategy, defaults to n_templates (pool == output). For "farthest_first", should be larger (e.g. n_templates * 3). |
| allocation | Literal['reservoir', 'exact'] | How per-entry quotas are computed. "reservoir": weighted reservoir sampling, single pass. "exact": two-pass -- first counts rows, then samples with exact proportional quotas. Default "reservoir". |
| max_entry_fraction | float \| None | Cap per entry as a fraction of pool size. None means no cap (purely proportional). At runtime, the effective cap is max(max_entry_fraction, 1 / n_entries) so the pool can always be filled completely. Default None. |
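The effective-cap rule can be made concrete with a toy allocator. `pool_quotas` is a hypothetical illustration of capped proportional quotas only; the real allocation also redistributes any shortfall and implements the reservoir strategy:

```python
def pool_quotas(entry_sizes, pool_size, max_entry_fraction=None):
    """Proportional per-entry quotas for the candidate pool, clipped to
    the effective cap max(max_entry_fraction, 1 / n_entries)."""
    n = len(entry_sizes)
    total = sum(entry_sizes)
    if max_entry_fraction is None:
        cap = pool_size                      # no cap: purely proportional
    else:
        # The 1/n floor guarantees the pool can always be filled even
        # when the configured fraction is smaller than an equal share.
        cap = int(max(max_entry_fraction, 1.0 / n) * pool_size)
    # Proportional allocation, then clip each entry to the cap.
    # (Redistribution of the clipped remainder is omitted here.)
    return [min(cap, round(pool_size * s / total)) for s in entry_sizes]
```

For three entries of sizes 100/100/200 and a pool of 40, the uncapped quotas are 10/10/20; with max_entry_fraction=0.3 the effective cap is max(0.3, 1/3) * 40 = 13, so the largest entry is clipped to 13.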
SamplingConfig ¶
Bases: StrictModel
Frame rate and temporal smoothing parameters.
Attributes:
| Name | Type | Description |
|---|---|---|
| fps_default | float | Fallback frames-per-second when the data does not carry an fps column. Default 30.0, must be > 0. |
| smooth_win | int | Moving-average window size applied to pose coordinates before feature computation. 0 disables smoothing. Default 0. |
xgboost_feature ¶
XgboostFeature ¶
XGBoost behavior classifier as a pipeline feature.
Trains on labeled templates (from ExtractLabeledTemplates) and runs per-sequence inference. Supports multiclass and one-vs-rest strategies.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| model | | Pre-fitted XgboostModelArtifact to load (skip training). Default: XgboostModelArtifact(). | required |
| strategy | | Classification strategy: "multiclass" trains a single multi-class model; "one_vs_rest" trains one binary classifier per class. Default: "multiclass". | required |
| decision_threshold | | Probability threshold(s) for positive prediction. A float applies to all classes; a dict maps class -> threshold. None uses argmax. Default: None. | required |
| default_class | | Class label assigned when no class exceeds the decision threshold (required). | required |
| class_weight | | If "balanced", adjust sample weights inversely proportional to class frequency. Default: "balanced". | required |
| use_smote | | If True, apply SMOTE oversampling to the training set. Default: False. | required |
| undersample_ratio | | If set, undersample majority classes to this ratio relative to the minority class before SMOTE. Default: None. | required |
| n_estimators | | Number of boosting rounds. Default: 100. | required |
| max_depth | | Maximum tree depth. Default: 6. | required |
| learning_rate | | Boosting learning rate. Default: 0.1. | required |
| subsample | | Fraction of training samples used per tree. Default: 0.8. | required |
| colsample_bytree | | Fraction of features used per tree. Default: 0.8. | required |
| random_state | | Random seed for reproducibility. Default: 42. | required |
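The decision_threshold and default_class semantics (float-threshold form) can be sketched as follows; `apply_decision_threshold` is an illustrative name, and the per-class dict form would simply look up the threshold for each argmax class:

```python
import numpy as np

def apply_decision_threshold(proba, classes, threshold, default_class):
    """Pick the argmax class only when its probability clears the
    threshold; otherwise fall back to default_class."""
    idx = np.argmax(proba, axis=1)
    top = proba[np.arange(len(proba)), idx]      # winning probability
    labels = np.asarray(classes)[idx]
    return np.where(top >= threshold, labels, default_class)
```

A sample whose best class scores 0.55 against a 0.6 threshold is thus assigned default_class rather than the uncertain winner.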
XgboostModelArtifact ¶
Bases: JoblibArtifact[XgboostModelBundle]
Fitted XGBoost model bundle (xgboost_model.joblib).