Command Line

Overview

This page focuses on the core CLI workflow for everyday use, with reference details at the end.

Quickstart

talu get downloads a model from HuggingFace and stores it locally. The weights stay in their original safetensors format, alongside the tokenizer and chat template — nothing is converted at this stage. Once cached, the model is available offline.

$ talu get LiquidAI/LFM2-350M

talu ask sends a prompt to a model and streams the response to the terminal. The model's chat template is detected and applied automatically, so the prompt is wrapped in the correct format for that architecture.

$ talu ask -m LiquidAI/LFM2-350M "What is the capital of France?"

The -S flag sets a system prompt that controls the model's behavior and tone. When omitted, the default system prompt is "You are a helpful assistant."

$ talu ask -m LiquidAI/LFM2-350M -S "You are a helpful math tutor." "What is 2+2?"

Working with models

talu ls shows all cached models in a table with columns for SIZE, DATE, TYPE (architecture), QUANT (quantization format), and MODEL (the HuggingFace identifier). Passing an organization name filters the list to that org; passing a full model name shows the individual weight files.

$ talu ls
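
Filtering by organization, or listing the files inside a single model, uses the same command:

$ talu ls LiquidAI

$ talu ls LiquidAI/LFM2-350M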

When talu get is called without a model name, it opens an interactive browser. This lets you search HuggingFace by keyword, see parameter counts and architecture details, and then select a model to download — useful when you're exploring what's available rather than fetching a specific model.

$ talu get

talu rm removes a model from the local cache and frees the associated disk space.

$ talu rm LiquidAI/LFM2-350M

Profiles

A profile stores a default model, conversation history, and other settings in ~/.talu/, persisted across terminal sessions. By default, talu uses a profile named default. The active profile can be changed with the TALU_PROFILE environment variable, which makes it straightforward to maintain separate configurations for different projects or use cases.
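
For example, a one-off command can run against a separate profile by setting the variable inline (a sketch; the work profile name is illustrative):

$ TALU_PROFILE=work talu ask -m LiquidAI/LFM2-350M "What is the capital of France?"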

talu set assigns a default model to the current profile, which means -m no longer needs to be passed on every command.

$ talu set LiquidAI/LFM2-350M

Once a default is set, every command that needs a model uses it automatically.

$ talu ask "What is the capital of France?"

Piping also works with a default model. Text arriving on stdin is treated as the prompt, so bare talu behaves like talu ask.

$ echo "Summarize this" | talu

The -m flag overrides the default for a single command without changing the profile. talu set show displays the current profile configuration, and talu set without arguments opens an interactive picker that lists all cached models.
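
For example, overriding the default for one command and then inspecting the profile:

$ talu ask -m LiquidAI/LFM2-350M "Answer with this model instead of the default"

$ talu set show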

Conversations

By default, each call to talu ask creates a new conversation. Conversations are persisted automatically — the full transcript is stored, so it can be reviewed, continued with follow-up prompts, or resumed at a later time.

talu ask without a prompt lists all stored conversations, showing their session IDs and first messages.

$ talu ask

Passing a session ID with --session displays the full transcript of that conversation.

$ talu ask --session <session_id>

Adding a prompt along with --session continues the conversation. The model receives the full history as context, so it remembers earlier exchanges.

$ talu ask --session <session_id> "Tell me more about that"

The --delete flag removes a conversation and its transcript permanently.

$ talu ask --session <session_id> --delete

--no-bucket creates an ephemeral conversation that is not persisted. This is useful for one-off questions where saving the transcript is unnecessary.

$ talu ask --no-bucket "Quick check — what year was Python released?"

Quantization

Models downloaded from HuggingFace typically ship as unquantized 16-bit (F16) safetensors. Quantization compresses the weights into a smaller representation, reducing memory usage and increasing inference speed with minimal quality loss.

talu convert quantizes a model using Grouped Affine (GAF) quantization. The model name is a positional argument (talu convert <model>). When --scheme is not specified, the default scheme is gaf4_64.

$ talu convert LiquidAI/LFM2-350M

A different scheme can be selected with --scheme.

$ talu convert LiquidAI/LFM2-350M --scheme gaf4_32

Smaller group sizes produce higher accuracy at the cost of slightly larger files. talu convert --scheme help prints this table in the terminal.
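
$ talu convert --scheme help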

Scheme    Bits  Group size  Notes
gaf4_32   4     32          Highest accuracy
gaf4_64   4     64          Default; balanced quality and size
gaf4_128  4     128         Smallest file size
gaf8_32   8     32          Near-lossless
gaf8_64   8     64
gaf8_128  8     128

Converted models are saved under $TALU_HOME/models/ with the scheme as a suffix (e.g. Qwen/Qwen3-0.6B-GAF4). The --output flag writes to a different directory instead. The original model is never modified — both versions remain available and talu set can switch between them.
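
Switching back and forth is then a talu set call (a sketch; the exact suffix follows the scheme naming described above):

$ talu set LiquidAI/LFM2-350M-GAF4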

Tokenization

talu tokenize shows how a model splits text into tokens. Each token is displayed alongside its numeric ID, which is useful for understanding context window usage or debugging prompt formatting.

$ talu tokenize LiquidAI/LFM2-350M "Hello world"

Scripting and CI

Most commands work in scripts and CI pipelines. The CLI separates human-readable formatting from machine-parseable output: a set of flags controls what goes to stdout, while logs always go to stderr. Interactive modes still require a TTY.

Stdout contracts

Flag           Stdout behavior
--model-uri    Prints only the model URI
--session-id   Prints only the session ID
-q             Prints response text only, no formatting
--format json  Prints OpenResponses JSON
-s             Prints nothing (exit code only)

Model URI flows

The --model-uri flag changes stdout to print only the model URI, making it easy to capture in a shell variable and pass to subsequent commands. This is the typical pattern for chaining download, convert, and set operations in a script.

$ MODEL_URI=$(talu get --model-uri LiquidAI/LFM2-350M)

The same flag works with talu convert, outputting the URI of the newly quantized model.

$ CONVERTED_URI=$(talu convert --model-uri "$MODEL_URI")

And with talu set, it confirms the URI that was actually assigned to the profile.

$ SET_URI=$(talu set --model-uri "$CONVERTED_URI")
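
Put together, the three steps chain into a single script:

#!/bin/sh
set -e  # abort if any step fails

MODEL_URI=$(talu get --model-uri LiquidAI/LFM2-350M)
CONVERTED_URI=$(talu convert --model-uri "$MODEL_URI")
talu set --model-uri "$CONVERTED_URI"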

Session flows

The --session-id flag prints only the session ID to stdout, allowing scripts to capture it and reference the conversation in later steps.

$ SESSION_ID=$(talu ask --session-id "Start incident analysis")

That session ID can then be passed back with --session to continue the same conversation, building up context across multiple script steps.

$ talu ask --session "$SESSION_ID" "List likely causes"
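
A script can keep extending the same conversation across steps, since the model receives the full history each time:

SESSION_ID=$(talu ask --session-id "Start incident analysis")
talu ask --session "$SESSION_ID" "List likely causes"
talu ask --session "$SESSION_ID" "Summarize the most likely cause"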

Output controls

Response text only.

$ ANSWER=$(talu ask -q "What is 2+2?")

OpenResponses JSON.

$ talu ask --format json "Summarize this log block"

Write to file.

$ talu ask --output /tmp/answer.txt "Write one sentence"

Silent (exit code only).

$ talu ask -s "Run background check"
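
Because -s writes nothing to stdout, the exit code alone signals success, which makes it usable in shell conditionals:

$ talu ask -s "Run background check" && echo "model responded"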

Reference

Environment variables

Environment variables provide an alternative to flags, useful in CI or when setting parameters for an entire shell session. Flags take precedence when both are specified.

Variable      Description                                Example
MODEL_URI     Per-command model override                 LiquidAI/LFM2-350M
SESSION_ID    Target a session without --session         UUID
TOKENS        Max output tokens (default: 1024)          100
TEMPERATURE   Sampling temperature                       0.7
TOP_K         Top-k sampling limit                       40
TOP_P         Nucleus sampling cutoff (0.0 to 1.0)       0.95
SEED          Deterministic seed                         42
TALU_PROFILE  Active profile name                        default
TALU_BUCKET   Storage bucket path override               /path/to/db
HF_TOKEN      Auth token for private HuggingFace models  hf_xxx...
HF_ENDPOINT   HuggingFace endpoint override              https://huggingface.co
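
For example, sampling parameters can be scoped to a single invocation by prefixing the command (values are illustrative):

$ TEMPERATURE=0.2 TOKENS=256 talu ask "Summarize this log block"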

Global flags

These apply to all commands. Logs are always written to stderr.

Flag          Description
-v            INFO-level logs
-vv           DEBUG-level logs
-vvv          TRACE-level logs
--log-format  human (default for TTY) or json (default for pipes)
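
Because logs go to stderr and the response goes to stdout, the two streams can be captured separately:

$ talu ask -vv "What is 2+2?" 2> debug.log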

Common commands

This table focuses on everyday CLI workflows, plus a few advanced commands for completeness.

Command                            Purpose                                    Mode
talu get                           Browse HuggingFace interactively           Interactive
talu get <model>                   Download model to cache                    Both
talu get --model-uri <model>       Download and print URI only                Script
talu ls                            List models                                Both
talu ls <model>                    List files in a model                      Both
talu rm <model>                    Remove model from cache                    Both
talu convert <model>               Quantize model (default: gaf4_64)          Both
talu convert <model> --scheme <s>  Quantize with specific scheme              Both
talu convert --model-uri <model>   Quantize and print URI only                Script
talu set                           Set default model (interactive picker)     Interactive
talu set <model>                   Set default model                          Both
talu set show                      Show profile configuration                 Both
talu ask "prompt"                  Ask model (creates new conversation)       Both
talu ask                           List conversations                         Both
talu ask --session <id>            View conversation transcript               Both
talu ask --session <id> "prompt"   Continue conversation                      Both
talu ask --session <id> --delete   Delete conversation                        Both
talu ask --session-id ["prompt"]   Create conversation and print session ID   Script
talu ask -q "prompt"               Response text only                         Script
talu ask -s "prompt"               Silent (exit code only)                    Script
talu ask --format json "prompt"    OpenResponses JSON output                  Script
talu ask --output <path> "prompt"  Write response to file                     Script
talu ask --no-bucket "prompt"      Ephemeral query (not persisted)            Both
talu tokenize <model> "text"       Show token breakdown                       Both
talu describe <model>              Inspect model architecture/plan            Both
talu shell                         Interactive tool-calling shell             Interactive
talu xray <model>                  Kernel profiling for debugging             Both
talu serve                         Start HTTP server                          Both