
# Capabilities & lifecycle

FFAI models declare what they can do via `Capability`, the user picks what to enable at load time via `LoadOptions`, and the model exposes its load progress and hot capability changes through an `AsyncStream<ModelLifecycleEvent>`.

The infrastructure has been in place since Phase 2; the first multi-modal model that exercises it end-to-end (vision encoder, hot `enable(.visionIn)`, etc.) lands in Phase 6.

```swift
public enum Capability: String, Sendable, Hashable, CaseIterable, Codable {
    case textIn
    case textOut
    case visionIn
    case audioIn
    case audioOut
    case toolCalling
}
```
| Capability | Today | When |
| --- | --- | --- |
| `.textIn` / `.textOut` | ✅ Always on for LLMs. | Phase 2 |
| `.visionIn` | Declared in family files, but no family supports it yet. | Phase 6 (Qwen 2.5/3.5-VL) |
| `.audioIn` / `.audioOut` | Not declared by any family. | Phase 8+ |
| `.toolCalling` | Not declared by any family. | Phase 8+ |

Convenience sets:

```swift
Capability.textOnly      // [.textIn, .textOut]
Capability.textWithTools // [.textIn, .textOut, .toolCalling]
```
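For orientation, a minimal sketch of how such sets could be declared; the actual declarations in the library may differ:

```swift
// Sketch only: one plausible way to declare the convenience sets.
public extension Capability {
    static let textOnly: Set<Capability> = [.textIn, .textOut]
    static let textWithTools: Set<Capability> = [.textIn, .textOut, .toolCalling]
}
```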
| Family | `availableCapabilities` |
| --- | --- |
| `Llama.LlamaDense` | `[.textIn, .textOut]` |
| `Qwen3.Qwen3Dense` | `[.textIn, .textOut]` |

When a family adds a capability (e.g. `Qwen35VL` adds `.visionIn`), the family file declares it and the loader allocates the corresponding subnet only if the user opts in.
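A hedged sketch of what that family-side declaration might look like (`Qwen35VL` and the property shape are assumptions; the Phase 6 code may differ):

```swift
// Sketch only: a family file advertising vision support, so the loader
// knows a vision subnet exists to allocate when the user opts in.
extension Qwen35VL {
    public static var availableCapabilities: Set<Capability> {
        [.textIn, .textOut, .visionIn]
    }
}
```

The user then opts in at load time through `LoadOptions`: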

```swift
let model = try await Model.load(
    "unsloth/Llama-3.2-1B",
    options: LoadOptions(
        capabilities: [.textIn, .textOut],
        kvCache: .raw,
        dispatchMode: .eager,
        prewarm: true,
        lazyCapabilities: true,
        revision: "main"
    )
)
```
| Field | Default | Notes |
| --- | --- | --- |
| `capabilities` | `Capability.textOnly` | What to load. `textIn` + `textOut` are always implicitly on; disabled modalities skip weight allocation. |
| `kvCache` | `.raw` | Cache compression scheme; see kv-cache.md. |
| `dispatchMode` | `.eager` | Standard `MTLComputeCommandEncoder` per kernel. `.argumentBuffers` / `.icb` are deferred. |
| `prewarm` | `true` | Run one no-op forward pass to compile PSOs before the first user-visible decode. |
| `lazyCapabilities` | `true` | Allow runtime `enable(_:)` / `disable(_:)` after load. Phase 6 wires this end-to-end. |
| `revision` | `"main"` | HF branch / tag / commit. |
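Read together, the table implies an option surface roughly like the sketch below. `KVCacheKind` and `DispatchMode` are assumed stand-in names, not confirmed API:

```swift
// Sketch only: stand-in enums for whatever the library actually names these.
public enum KVCacheKind: Sendable { case raw }
public enum DispatchMode: Sendable { case eager, argumentBuffers, icb }

// The option surface the table above suggests, with its documented defaults.
public struct LoadOptions: Sendable {
    public var capabilities: Set<Capability> = Capability.textOnly
    public var kvCache: KVCacheKind = .raw
    public var dispatchMode: DispatchMode = .eager
    public var prewarm: Bool = true
    public var lazyCapabilities: Bool = true
    public var revision: String = "main"
}
```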
With the defaults, loading is a one-liner, and the model reports what was actually loaded:

```swift
let model = try await Model.load("mlx-community/Qwen3-4B-4bit")
print(model.availableCapabilities) // what the family supports
print(model.enabledCapabilities)   // what you opted into
print(model.config.modelType)      // "qwen3"
print(model.modelDirectory)        // resolved local snapshot
```

If you ask for a capability the family doesn't expose, the loader throws `ModelError.capabilityNotAvailable(.visionIn)`.
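A sketch of handling that error (assumes `LoadOptions` defaults its other fields, and that the error carries the offending capability, per the signature above):

```swift
do {
    _ = try await Model.load(
        "unsloth/Llama-3.2-1B",
        options: LoadOptions(capabilities: [.textIn, .textOut, .visionIn])
    )
} catch ModelError.capabilityNotAvailable(let capability) {
    // The Llama dense family only declares .textIn / .textOut today,
    // so this branch would see .visionIn.
    print("unsupported capability: \(capability)")
}
```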

`ModelLifecycleState`:

```text
idle → downloading(Progress) → loading(LoadProgress)
     → loaded → ready
(or failed(Error) at any stage)
```

`Model.events` is an `AsyncStream<ModelLifecycleEvent>` that emits each transition. The stream is multi-consumer-safe and finishes when the `Model` is deinitialized.

let model = try await Model.load("unsloth/Llama-3.2-1B")
Task {
for await event in model.events {
switch event.state {
case .downloading(let progress): print("downloading \(progress.fractionCompleted)")
case .loading(let p): print("loading \(p)")
case .loaded: print("weights resident")
case .ready: print("ready to generate")
case .failed(let err): print("failed: \(err)")
default: break
}
}
}
print(model.currentState) // sync snapshot — typically .ready by the time load() returns

`currentState` is a thread-safe snapshot of the latest emitted event. The stream is the source of truth for fine-grained progress.
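A small sketch of that division of labor (assumes `currentState` exposes the same `ModelLifecycleState` values as the stream):

```swift
// Cheap synchronous check first; fall back to the stream for progress.
if case .ready = model.currentState {
    // Safe to generate immediately.
} else {
    for await event in model.events {
        if case .ready = event.state { break }   // proceed
        if case .failed(let err) = event.state { // give up
            print("load failed: \(err)")
            break
        }
    }
}
```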

The API surface has been in place since Phase 2; the implementation lands alongside the first VL family:

```swift
// Phase 6:
try await model.enable(.visionIn)  // mmaps vision weights, builds encoder, prewarms
// ... use the model with images ...
try await model.disable(.visionIn) // releases MTLBuffers, frees GPU residency
```

Each call emits per-capability lifecycle events through the same `events` stream. If `lazyCapabilities: false` was passed at load time, both calls throw, and capabilities stay frozen at the load-time set.
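A hedged sketch of guarding a hot enable, using only the properties shown earlier:

```swift
// Enable vision lazily, only if the family exposes it and it is not
// already resident. Assumes lazyCapabilities was left at its default
// of true; otherwise enable(_:) throws.
if model.availableCapabilities.contains(.visionIn),
   !model.enabledCapabilities.contains(.visionIn) {
    try await model.enable(.visionIn)
}
```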

See also:

  • Quick start: the basic `Model.load` + generate flow.
  • Models: what each family declares for `availableCapabilities`.
  • Architecture: where capability-driven loading sits in the load sequence.