Capabilities & lifecycle
FFAI models declare what they can do via Capability, the user
picks what to enable at load time via LoadOptions, and the model
exposes its load progress + hot capability changes via an
AsyncStream<ModelLifecycleEvent>.
The infrastructure is in place from Phase 2; the first multi-modal
model that exercises it end-to-end (vision encoder, hot
enable(.visionIn), etc.) lands in Phase 6.
The Capability enum
Section titled “The Capability enum”public enum Capability: String, Sendable, Hashable, CaseIterable, Codable { case textIn case textOut case visionIn case audioIn case audioOut case toolCalling}| Capability | Today | When |
|---|---|---|
.textIn / .textOut | ✅ Always on for LLMs. | Phase 2 |
.visionIn | Declared on family files but no family supports it yet. | Phase 6 (Qwen 2.5/3.5-VL) |
.audioIn / .audioOut | Not declared by any family. | Phase 8+ |
.toolCalling | Not declared by any family. | Phase 8+ |
Convenience sets:
Capability.textOnly // [.textIn, .textOut]Capability.textWithTools // [.textIn, .textOut, .toolCalling]What each family declares
Section titled “What each family declares”| Family | availableCapabilities |
|---|---|
Llama.LlamaDense | [.textIn, .textOut] |
Qwen3.Qwen3Dense | [.textIn, .textOut] |
When a family adds a capability (e.g. Qwen35VL adds .visionIn),
the family file declares it and the loader allocates the
corresponding subnet only if the user opts in.
LoadOptions
Section titled “LoadOptions”let model = try await Model.load( "unsloth/Llama-3.2-1B", options: LoadOptions( capabilities: [.textIn, .textOut], kvCache: .raw, dispatchMode: .eager, prewarm: true, lazyCapabilities: true, revision: "main" ))| Field | Default | Notes |
|---|---|---|
capabilities | Capability.textOnly | What to load. textIn + textOut are always implicitly on. Disabled modalities skip weight allocation. |
kvCache | .raw | Cache compression scheme — see kv-cache.md. |
dispatchMode | .eager | Standard MTLComputeCommandEncoder per kernel. .argumentBuffers / .icb deferred. |
prewarm | true | Run one no-op forward to compile PSOs before the first user-visible decode. |
lazyCapabilities | true | Allow runtime enable(_:) / disable(_:) after load. Phase 6 wires this end-to-end. |
revision | "main" | HF branch / tag / commit. |
Inspecting a loaded model
Section titled “Inspecting a loaded model”let model = try await Model.load("mlx-community/Qwen3-4B-4bit")
print(model.availableCapabilities) // what the family supportsprint(model.enabledCapabilities) // what you opted intoprint(model.config.modelType) // "qwen3"print(model.modelDirectory) // resolved local snapshotIf you ask for a capability the family doesn’t expose, the loader
throws ModelError.capabilityNotAvailable(.visionIn).
Lifecycle states
Section titled “Lifecycle states”ModelLifecycleState: idle → downloading(Progress) → loading(LoadProgress) → loaded → ready (or failed(Error) at any stage)Model.events is an AsyncStream<ModelLifecycleEvent> that emits
each transition. The stream is multi-consumer-safe and finishes when
the Model is deinitialized.
let model = try await Model.load("unsloth/Llama-3.2-1B")
Task { for await event in model.events { switch event.state { case .downloading(let progress): print("downloading \(progress.fractionCompleted)") case .loading(let p): print("loading \(p)") case .loaded: print("weights resident") case .ready: print("ready to generate") case .failed(let err): print("failed: \(err)") default: break } }}
print(model.currentState) // sync snapshot — typically .ready by the time load() returnscurrentState is a thread-safe snapshot of the latest emitted event.
The stream is the source of truth for fine-grained progress.
Hot capability changes (Phase 6)
Section titled “Hot capability changes (Phase 6)”The API surface is in place from Phase 2; the implementation lands alongside the first VL family:
// Phase 6:try await model.enable(.visionIn) // mmaps vision weights, builds encoder, prewarms// ... use the model with images ...try await model.disable(.visionIn) // releases MTLBuffers, frees GPU residencyEach call emits per-capability lifecycle events through the same
events stream. If lazyCapabilities = false was passed at load
time, both calls throw — capabilities are then frozen at the load-time
set.
See also
Section titled “See also”- Quick start — the basic
Model.load+generateflow. - Models — what each family declares for
availableCapabilities. - Architecture — where capability-driven loading sits in the load sequence.