Command Line
Overview
This page focuses on the core CLI workflow for everyday use, with reference details at the end.
Quickstart
talu get downloads a model from HuggingFace and stores it locally. The weights stay in their original safetensors format, alongside the tokenizer and chat template — nothing is converted at this stage. Once cached, the model is available offline.
talu ask sends a prompt to a model and streams the response to the terminal. The model's chat template is detected and applied automatically, so the prompt is wrapped in the correct format for that architecture.
The -S flag sets a system prompt that controls the model's behavior and tone. When omitted, the default system prompt is "You are a helpful assistant."
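A minimal first session, using the commands above, might look like this (Qwen/Qwen3-0.6B is an illustrative model name, not a requirement):

```shell
# Download a model from HuggingFace into the local cache
talu get Qwen/Qwen3-0.6B

# Ask a question; the response streams to the terminal
talu ask -m Qwen/Qwen3-0.6B "Explain a mutex in one paragraph"

# Override the default system prompt with -S
talu ask -m Qwen/Qwen3-0.6B -S "You are a terse assistant." "Explain a mutex"
```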
Working with models
talu ls shows all cached models in a table with columns for SIZE, DATE, TYPE (architecture), QUANT (quantization format), and MODEL (the HuggingFace identifier). Passing an organization name filters the list to that org; passing a full model name shows the individual weight files.
When talu get is called without a model name, it opens an interactive browser. This lets you search HuggingFace by keyword, see parameter counts and architecture details, and then select a model to download — useful when you're exploring what's available rather than fetching a specific model.
talu rm removes a model from the local cache and frees the associated disk space.
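Putting the model-management commands together (the org and model names are placeholders):

```shell
# List all cached models
talu ls

# Filter to one organization, or show the weight files of one model
talu ls Qwen
talu ls Qwen/Qwen3-0.6B

# Remove a model from the cache and free its disk space
talu rm Qwen/Qwen3-0.6B
```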
Profiles
A profile stores a default model, conversation history, and other settings in ~/.talu/, persisted across terminal sessions. By default, talu uses a profile named default. The active profile can be changed with the TALU_PROFILE environment variable, which makes it straightforward to maintain separate configurations for different projects or use cases.
talu set assigns a default model to the current profile, which means -m no longer needs to be passed on every command.
Once a default is set, every command that needs a model uses it automatically.
Piping also works with a default model: text from stdin is treated as the prompt for talu ask.
The -m flag overrides the default for a single command without changing the profile. talu set show displays the current profile configuration, and talu set without arguments opens an interactive picker that lists all cached models.
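A sketch of the profile workflow described above (model names are examples; `workproj` is a hypothetical profile name):

```shell
# Set a default model for the current profile
talu set Qwen/Qwen3-0.6B

# The default is now used automatically
talu ask "What is a B-tree?"

# Pipe stdin as the prompt
cat notes.txt | talu ask

# Override the default for a single command
talu ask -m LiquidAI/LFM2-350M "Same question, different model"

# Inspect the current profile configuration
talu set show

# Keep a separate configuration for another project
TALU_PROFILE=workproj talu ask "Hello"
```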
Conversations
By default, each call to talu ask creates a new conversation. Conversations are persisted automatically — the full transcript is stored, so it can be reviewed, continued with follow-up prompts, or resumed at a later time.
talu ask without a prompt lists all stored conversations, showing their session IDs and first messages.
Passing a session ID with --session displays the full transcript of that conversation.
Adding a prompt along with --session continues the conversation. The model receives the full history as context, so it remembers earlier exchanges.
The --delete flag removes a conversation and its transcript permanently.
--no-bucket creates an ephemeral conversation that is not persisted. This is useful for one-off questions where saving the transcript is unnecessary.
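The conversation lifecycle above can be sketched as follows, with `<id>` standing in for a real session ID from the listing:

```shell
# Start a new conversation (persisted automatically)
talu ask "Summarize the CAP theorem"

# List stored conversations with session IDs and first messages
talu ask

# View, continue, or delete a conversation by session ID
talu ask --session <id>
talu ask --session <id> "How does PACELC extend it?"
talu ask --session <id> --delete

# One-off question that is never persisted
talu ask --no-bucket "Quick sanity check"
```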
Quantization
Models from HuggingFace are typically distributed as 16-bit floating-point (F16) safetensors. Quantization compresses the weights to a smaller representation, reducing memory usage and increasing inference speed with minimal quality loss.
talu convert quantizes a model using Grouped Affine (GAF) quantization. The model name is passed as a positional argument (talu convert <model>), and a scheme can be selected with --scheme; when omitted, the default is gaf4_64.
Smaller group sizes produce higher accuracy at the cost of slightly larger files. talu convert --scheme help prints this table in the terminal.
| Scheme | Bits | Group size | Notes |
|---|---|---|---|
| gaf4_32 | 4 | 32 | Highest accuracy |
| gaf4_64 | 4 | 64 | Default — balanced quality and size |
| gaf4_128 | 4 | 128 | Smallest file size |
| gaf8_32 | 8 | 32 | Near-lossless |
| gaf8_64 | 8 | 64 | |
| gaf8_128 | 8 | 128 | |
Converted models are saved under $TALU_HOME/models/ with the scheme as a suffix (e.g. Qwen/Qwen3-0.6B-GAF4). The --output flag writes to a different directory instead. The original model is never modified — both versions remain available and talu set can switch between them.
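The conversion options described above, sketched with an example model and an assumed output directory:

```shell
# Quantize with the default gaf4_64 scheme
talu convert Qwen/Qwen3-0.6B

# Pick a scheme explicitly, or print the scheme table
talu convert Qwen/Qwen3-0.6B --scheme gaf8_32
talu convert --scheme help

# Write the converted model somewhere other than $TALU_HOME/models/
talu convert Qwen/Qwen3-0.6B --output ./quantized
```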
Tokenization
talu tokenize shows how a model splits text into tokens. Each token is displayed alongside its numeric ID, which is useful for understanding context window usage or debugging prompt formatting.
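For example (model name illustrative):

```shell
# Show how this model's tokenizer splits the text, token by token with IDs
talu tokenize Qwen/Qwen3-0.6B "Hello, world!"
```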
Scripting and CI
Most commands work in scripts and CI pipelines. The CLI separates human-readable formatting from machine-parseable output: a set of flags control what goes to stdout, while logs always go to stderr. Interactive modes still require a TTY.
Stdout contracts
| Flag | Stdout behavior |
|---|---|
| --model-uri | Prints only the model URI |
| --session-id | Prints only the session ID |
| -q | Prints response text only, no formatting |
| --format json | Prints OpenResponses JSON |
| -s | Prints nothing (exit code only) |
Model URI flows
The --model-uri flag changes stdout to print only the model URI, making it easy to capture in a shell variable and pass to subsequent commands. This is the typical pattern for chaining download, convert, and set operations in a script.
The same flag works with talu convert, outputting the URI of the newly quantized model.
And with talu set, it confirms the URI that was actually assigned to the profile.
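The chaining pattern described above can be sketched as a small script (model name is an example; error handling is minimal):

```shell
#!/bin/sh
set -e

# Each step prints only a model URI, captured into a variable
BASE=$(talu get --model-uri Qwen/Qwen3-0.6B)
QUANT=$(talu convert --model-uri "$BASE" --scheme gaf4_64)

# Assign the quantized model as the profile default
talu set --model-uri "$QUANT"
```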
Session flows
The --session-id flag prints only the session ID to stdout, allowing scripts to capture it and reference the conversation in later steps.
That session ID can then be passed back with --session to continue the same conversation, building up context across multiple script steps.
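A sketch of that flow, accumulating context across script steps:

```shell
# Create a conversation and capture its session ID
SID=$(talu ask --session-id "You are reviewing a server config.")

# Later steps reuse the same conversation, so earlier context carries over
talu ask --session "$SID" "Is the timeout value reasonable?"
talu ask --session "$SID" "Suggest a safer default."
```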
Output controls
The remaining flags cover the common output needs: -q prints the response text only, with no formatting; --format json emits OpenResponses JSON; --output <path> writes the response to a file; and -s is silent, communicating only through the exit code.
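One example per output control (prompts and filenames are illustrative):

```shell
# Response text only, no formatting
talu ask -q "Give me a haiku about disk I/O"

# Structured OpenResponses JSON on stdout
talu ask --format json "Give me a haiku" > response.json

# Write the response to a file instead of stdout
talu ask --output haiku.txt "Give me a haiku"

# Silent: succeed or fail by exit code alone
talu ask -s "ping" && echo "model responded"
```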
Reference
Environment variables
Environment variables provide an alternative to flags, useful in CI or when setting parameters for an entire shell session. Flags take precedence when both are specified.
| Variable | Description | Example |
|---|---|---|
| MODEL_URI | Per-command model override | LiquidAI/LFM2-350M |
| SESSION_ID | Target a session without --session | UUID |
| TOKENS | Max output tokens (default: 1024) | 100 |
| TEMPERATURE | Sampling temperature | 0.7 |
| TOP_K | Top-k sampling limit | 40 |
| TOP_P | Nucleus sampling cutoff (0.0 to 1.0) | 0.95 |
| SEED | Deterministic seed | 42 |
| TALU_PROFILE | Active profile name | default |
| TALU_BUCKET | Storage bucket path override | /path/to/db |
| HF_TOKEN | Auth token for private HuggingFace models | hf_xxx... |
| HF_ENDPOINT | HuggingFace endpoint override | https://huggingface.co |
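Environment variables can be scoped to a single command or exported for a whole session, as sketched below (the profile name and model are examples):

```shell
# Same effect as flags, scoped to one command
TEMPERATURE=0.2 TOKENS=100 talu ask "Be brief: what is RAID 5?"

# Or exported for an entire shell session / CI job
export TALU_PROFILE=ci
export SEED=42
talu get LiquidAI/LFM2-350M
```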
Global flags
These apply to all commands. Logs are always written to stderr.
| Flag | Description |
|---|---|
| -v | INFO-level logs |
| -vv | DEBUG-level logs |
| -vvv | TRACE-level logs |
| --log-format | human (default for TTY) or json (default for pipes) |
Common commands
This table focuses on everyday CLI workflows, plus a few advanced commands for completeness.
| Command | Purpose | Mode |
|---|---|---|
| talu get | Browse HuggingFace interactively | Interactive |
| talu get <model> | Download model to cache | Both |
| talu get --model-uri <model> | Download and print URI only | Script |
| talu ls | List models | Both |
| talu ls <model> | List files in a model | Both |
| talu rm <model> | Remove model from cache | Both |
| talu convert <model> | Quantize model (default: gaf4_64) | Both |
| talu convert <model> --scheme <s> | Quantize with specific scheme | Both |
| talu convert --model-uri <model> | Quantize and print URI only | Script |
| talu set | Set default model (interactive picker) | Interactive |
| talu set <model> | Set default model | Both |
| talu set show | Show profile configuration | Both |
| talu ask "prompt" | Ask model (creates new conversation) | Both |
| talu ask | List conversations | Both |
| talu ask --session <id> | View conversation transcript | Both |
| talu ask --session <id> "prompt" | Continue conversation | Both |
| talu ask --session <id> --delete | Delete conversation | Both |
| talu ask --session-id ["prompt"] | Create conversation and print session ID | Script |
| talu ask -q "prompt" | Response text only | Script |
| talu ask -s "prompt" | Silent (exit code only) | Script |
| talu ask --format json "prompt" | OpenResponses JSON output | Script |
| talu ask --output <path> "prompt" | Write response to file | Script |
| talu ask --no-bucket "prompt" | Ephemeral query (not persisted) | Both |
| talu tokenize <model> "text" | Show token breakdown | Both |
| talu describe <model> | Inspect model architecture/plan | Both |
| talu shell | Interactive tool-calling shell | Interactive |
| talu xray <model> | Kernel profiling for debugging | Both |
| talu serve | Start HTTP server | Both |