Command Line
Overview
This page focuses on the core CLI workflow for everyday use, with reference details at the end.
Quickstart
talu get downloads a model from HuggingFace and stores it locally. The weights stay in their original safetensors format, alongside the tokenizer and chat template — nothing is converted at this stage. Once cached, the model is available offline.
talu ask sends a prompt to a model and streams the response to the terminal. The model's chat template is detected and applied automatically, so the prompt is wrapped in the correct format for that architecture.
The -S flag sets a system prompt that controls the model's behavior and tone. When omitted, the default system prompt is "You are a helpful assistant."
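A minimal first session, using the commands above, might look like this (Qwen/Qwen3-0.6B is an illustrative model name, not a requirement):

```shell
# Download a model from HuggingFace into the local cache
talu get Qwen/Qwen3-0.6B

# Ask a question; the response streams to the terminal
talu ask -m Qwen/Qwen3-0.6B "Explain a mutex in one paragraph"

# Override the default system prompt with -S
talu ask -m Qwen/Qwen3-0.6B -S "You are a terse assistant." "Explain a mutex"
```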
Working with models
talu ls shows all cached models in a table with columns for SIZE, DATE, TYPE (architecture), QUANT (quantization format), and MODEL (the HuggingFace identifier). Passing an organization name filters the list to that org; passing a full model name shows the individual weight files.
When talu get is called without a model name, it opens an interactive browser. This lets you search HuggingFace by keyword, see parameter counts and architecture details, and then select a model to download — useful when you're exploring what's available rather than fetching a specific model.
talu rm removes a model from the local cache and frees the associated disk space.
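Putting the model-management commands together (the org and model names are placeholders):

```shell
# List all cached models
talu ls

# Filter to one organization, or show the weight files of one model
talu ls Qwen
talu ls Qwen/Qwen3-0.6B

# Remove a model from the cache and free its disk space
talu rm Qwen/Qwen3-0.6B
```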
Profiles
A profile stores a default model, conversation history, and other settings in ~/.talu/, persisted across terminal sessions. By default, talu uses a profile named default. The active profile can be changed with the TALU_PROFILE environment variable, which makes it straightforward to maintain separate configurations for different projects or use cases.
talu set assigns a default model to the current profile, which means -m no longer needs to be passed on every command.
Once a default is set, every command that needs a model uses it automatically.
Piping also works with a default model: text from stdin is treated as the prompt for talu ask.
The -m flag overrides the default for a single command without changing the profile. talu set show displays the current profile configuration, and talu set without arguments opens an interactive picker that lists all cached models.
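A sketch of the profile workflow described above (model names are examples; `workproj` is a hypothetical profile name):

```shell
# Set a default model for the current profile
talu set Qwen/Qwen3-0.6B

# The default is now used automatically
talu ask "What is a B-tree?"

# Pipe stdin as the prompt
cat notes.txt | talu ask

# Override the default for a single command
talu ask -m LiquidAI/LFM2-350M "Same question, different model"

# Inspect the current profile configuration
talu set show

# Keep a separate configuration for another project
TALU_PROFILE=workproj talu ask "Hello"
```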
Conversations
By default, each call to talu ask creates a new conversation. Conversations are persisted automatically — the full transcript is stored, so it can be reviewed, continued with follow-up prompts, or resumed at a later time.
talu ask without a prompt lists all stored conversations, showing their session IDs and first messages.
Passing a session ID with --session displays the full transcript of that conversation.
Adding a prompt along with --session continues the conversation. The model receives the full history as context, so it remembers earlier exchanges.
The --delete flag removes a conversation and its transcript permanently.
--no-bucket creates an ephemeral conversation that is not persisted. This is useful for one-off questions where saving the transcript is unnecessary.
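The conversation lifecycle above can be sketched as follows, with `<id>` standing in for a real session ID from the listing:

```shell
# Start a new conversation (persisted automatically)
talu ask "Summarize the CAP theorem"

# List stored conversations with session IDs and first messages
talu ask

# View, continue, or delete a conversation by session ID
talu ask --session <id>
talu ask --session <id> "How does PACELC extend it?"
talu ask --session <id> --delete

# One-off question that is never persisted
talu ask --no-bucket "Quick sanity check"
```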
Quantization
Models from HuggingFace are typically distributed as 16-bit floating-point (F16) safetensors. Quantization compresses the weights to a smaller representation, reducing memory usage and increasing inference speed with minimal quality loss.
talu convert quantizes a model using Grouped Affine (GAF) quantization. The model name is passed as a positional argument (talu convert <model>), and a scheme can be selected with --scheme; when omitted, the default is gaf4_64.
Smaller group sizes produce higher accuracy at the cost of slightly larger files. talu convert --scheme help prints this table in the terminal.
| Scheme | Bits | Group size | Notes |
|---|---|---|---|
| gaf4_32 | 4 | 32 | Highest accuracy |
| gaf4_64 | 4 | 64 | Default — balanced quality and size |
| gaf4_128 | 4 | 128 | Smallest file size |
| gaf8_32 | 8 | 32 | Near-lossless |
| gaf8_64 | 8 | 64 | |
| gaf8_128 | 8 | 128 | |
Converted models are saved under $TALU_HOME/models/ with the scheme as a suffix (e.g. Qwen/Qwen3-0.6B-GAF4). The --output flag writes to a different directory instead. The original model is never modified — both versions remain available and talu set can switch between them.
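The conversion options described above, sketched with an example model and an assumed output directory:

```shell
# Quantize with the default gaf4_64 scheme
talu convert Qwen/Qwen3-0.6B

# Pick a scheme explicitly, or print the scheme table
talu convert Qwen/Qwen3-0.6B --scheme gaf8_32
talu convert --scheme help

# Write the converted model somewhere other than $TALU_HOME/models/
talu convert Qwen/Qwen3-0.6B --output ./quantized
```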
Tokenization
talu tokenize shows how a model splits text into tokens. Each token is displayed alongside its numeric ID, which is useful for understanding context window usage or debugging prompt formatting.
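For example (model name illustrative):

```shell
# Show how this model's tokenizer splits the text, token by token with IDs
talu tokenize Qwen/Qwen3-0.6B "Hello, world!"
```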
Scripting and CI
Most commands work in scripts and CI pipelines. The CLI separates human-readable formatting from machine-parseable output: a set of flags control what goes to stdout, while logs always go to stderr. Interactive modes still require a TTY.
Stdout contracts
| Flag | Stdout behavior |
|---|---|
| --model-uri | Prints only the model URI |
| --session-id | Prints only the session ID |
| -q | Prints response text only, no formatting |
| --format json | Prints OpenResponses JSON |
| -s | Prints nothing (exit code only) |
Model URI flows
The --model-uri flag changes stdout to print only the model URI, making it easy to capture in a shell variable and pass to subsequent commands. This is the typical pattern for chaining download, convert, and set operations in a script.
The same flag works with talu convert, outputting the URI of the newly quantized model.
And with talu set, it confirms the URI that was actually assigned to the profile.
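The chaining pattern described above can be sketched as a small script (model name is an example; error handling is minimal):

```shell
#!/bin/sh
set -e

# Each step prints only a model URI, captured into a variable
BASE=$(talu get --model-uri Qwen/Qwen3-0.6B)
QUANT=$(talu convert --model-uri "$BASE" --scheme gaf4_64)

# Assign the quantized model as the profile default
talu set --model-uri "$QUANT"
```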
Session flows
The --session-id flag prints only the session ID to stdout, allowing scripts to capture it and reference the conversation in later steps.
That session ID can then be passed back with --session to continue the same conversation, building up context across multiple script steps.
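A sketch of that flow, accumulating context across script steps:

```shell
# Create a conversation and capture its session ID
SID=$(talu ask --session-id "You are reviewing a server config.")

# Later steps reuse the same conversation, so earlier context carries over
talu ask --session "$SID" "Is the timeout value reasonable?"
talu ask --session "$SID" "Suggest a safer default."
```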
Output controls
The remaining flags cover the common output needs: -q prints the response text only, with no formatting; --format json emits OpenResponses JSON; --output <path> writes the response to a file; and -s is silent, communicating only through the exit code.
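One example per output control (prompts and filenames are illustrative):

```shell
# Response text only, no formatting
talu ask -q "Give me a haiku about disk I/O"

# Structured OpenResponses JSON on stdout
talu ask --format json "Give me a haiku" > response.json

# Write the response to a file instead of stdout
talu ask --output haiku.txt "Give me a haiku"

# Silent: succeed or fail by exit code alone
talu ask -s "ping" && echo "model responded"
```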
Reference
Environment variables
Environment variables provide an alternative to flags, useful in CI or when setting parameters for an entire shell session. Flags take precedence when both are specified.
| Variable | Description | Example |
|---|---|---|
| MODEL_URI | Per-command model override | LiquidAI/LFM2-350M |
| SESSION_ID | Target a session without --session | UUID |
| TOKENS | Max output tokens (default: 1024) | 100 |
| TEMPERATURE | Sampling temperature | 0.7 |
| TOP_K | Top-k sampling limit | 40 |
| TOP_P | Nucleus sampling cutoff (0.0 to 1.0) | 0.95 |
| SEED | Deterministic seed | 42 |
| TALU_PROFILE | Active profile name | default |
| TALU_BUCKET | Storage bucket path override | /path/to/db |
| HF_TOKEN | Auth token for private HuggingFace models | hf_xxx... |
| HF_ENDPOINT | HuggingFace endpoint override | https://huggingface.co |
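Environment variables can be scoped to a single command or exported for a whole session, as sketched below (the profile name and model are examples):

```shell
# Same effect as flags, scoped to one command
TEMPERATURE=0.2 TOKENS=100 talu ask "Be brief: what is RAID 5?"

# Or exported for an entire shell session / CI job
export TALU_PROFILE=ci
export SEED=42
talu get LiquidAI/LFM2-350M
```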
Global flags
These apply to all commands. Logs are always written to stderr.
| Flag | Description |
|---|---|
| -v | INFO-level logs |
| -vv | DEBUG-level logs |
| -vvv | TRACE-level logs |
| --log-format | human (default for TTY) or json (default for pipes) |
Common commands
This table focuses on everyday CLI workflows, plus a few advanced commands for completeness.
| Command | Purpose | Mode |
|---|---|---|
| talu get | Browse HuggingFace interactively | Interactive |
| talu get <model> | Download model to cache | Both |
| talu get --model-uri <model> | Download and print URI only | Script |
| talu ls | List models | Both |
| talu ls <model> | List files in a model | Both |
| talu rm <model> | Remove model from cache | Both |
| talu convert <model> | Quantize model (default: gaf4_64) | Both |
| talu convert <model> --scheme <s> | Quantize with specific scheme | Both |
| talu convert --model-uri <model> | Quantize and print URI only | Script |
| talu set | Set default model (interactive picker) | Interactive |
| talu set <model> | Set default model | Both |
| talu set show | Show profile configuration | Both |
| talu ask "prompt" | Ask model (creates new conversation) | Both |
| talu ask | List conversations | Both |
| talu ask --session <id> | View conversation transcript | Both |
| talu ask --session <id> "prompt" | Continue conversation | Both |
| talu ask --session <id> --delete | Delete conversation | Both |
| talu ask --session-id ["prompt"] | Create conversation and print session ID | Script |
| talu ask -q "prompt" | Response text only | Script |
| talu ask -s "prompt" | Silent (exit code only) | Script |
| talu ask --format json "prompt" | OpenResponses JSON output | Script |
| talu ask --output <path> "prompt" | Write response to file | Script |
| talu ask --no-bucket "prompt" | Ephemeral query (not persisted) | Both |
| talu tokenize <model> "text" | Show token breakdown | Both |
| talu describe <model> | Inspect model architecture/plan | Both |
| talu shell | Interactive tool-calling shell | Interactive |
| talu xray <model> | Kernel profiling for debugging | Both |
| talu serve | Start HTTP server | Both |