Converter

Model Conversion API

Classes

Class Description
VerificationResult Result of model verification.
ModelInfo Model architecture and configuration information.

Functions

Function Description
convert() Convert a model to an optimized format for efficient inference.
verify() Verify a model can load and generate text.
list_schemes() List available quantization schemes with descriptions and aliases.
describe() Get model architecture and configuration information.

class VerificationResult

VerificationResult(
    self,
    success: bool,
    model_path: str,
    output: str = '',
    tokens_generated: int = 0,
    error: str | None = None
)

Result of model verification.

Attributes
success : bool

Whether verification passed.

model_path : str

Path to the verified model.

output : str

Generated output from the test prompt.

tokens_generated : int

Number of tokens successfully generated.

error : str | None

Error message if verification failed.

Example
>>> result = VerificationResult(success=True, model_path="/path/to/model")
>>> bool(result)
True
>>> result.success
True
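
As the example suggests, truthiness mirrors success, so a failed result can be handled the same way; the error string below is illustrative:

>>> failed = VerificationResult(
...     success=False,
...     model_path="/path/to/model",
...     error="missing tokenizer.json",
... )
>>> bool(failed)
False
>>> failed.error
'missing tokenizer.json'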

class ModelInfo

ModelInfo(
    self,
    vocab_size: int,
    hidden_size: int,
    num_layers: int,
    num_heads: int,
    num_kv_heads: int,
    intermediate_size: int,
    max_seq_len: int,
    head_dim: int,
    rope_theta: float,
    norm_eps: float,
    quant_bits: int,
    quant_group_size: int,
    model_type: str | None,
    architecture: str | None,
    tie_word_embeddings: bool,
    use_gelu: bool,
    num_experts: int,
    experts_per_token: int
)

Model architecture and configuration information.

This class provides a read-only view of model metadata extracted from config.json without loading the model weights. Useful for pre-flight checks before conversion and for understanding model structure.

Attributes
vocab_size : int

Vocabulary size

hidden_size : int

Hidden dimension (d_model)

num_layers : int

Number of transformer layers

num_heads : int

Number of attention heads

num_kv_heads : int

Number of key-value heads (for GQA)

intermediate_size : int

FFN intermediate dimension

max_seq_len : int

Maximum sequence length

head_dim : int

Dimension per attention head

rope_theta : float

RoPE base frequency

norm_eps : float

Layer norm epsilon

quant_bits : int

Quantization bits (4, 8, or 16)

quant_group_size : int

Quantization group size

model_type : str or None

Model type string (e.g., "qwen3", "llama")

architecture : str or None

Architecture class name

tie_word_embeddings : bool

Whether embeddings are tied

use_gelu : bool

Whether GELU activation is used

num_experts : int

Number of MoE experts (0 if not MoE)

experts_per_token : int

Experts used per token

Example
>>> from talu.converter import describe
>>> info = describe("Qwen/Qwen3-0.6B")  # doctest: +SKIP
>>> info.num_layers  # doctest: +SKIP
28
>>> info.is_quantized  # doctest: +SKIP
False

Quick Reference

Properties

Name Type
is_moe bool
is_quantized bool

Properties

is_moe: bool

Whether the model uses Mixture of Experts.

is_quantized: bool

Whether the model is quantized (not fp16).
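
These flags support quick pre-flight checks before conversion. A minimal sketch using the documented describe and convert functions (the model ID is illustrative):

>>> from talu.converter import convert, describe
>>> info = describe("Qwen/Qwen3-0.6B")  # doctest: +SKIP
>>> if not (info.is_quantized or info.is_moe):  # doctest: +SKIP
...     path = convert("Qwen/Qwen3-0.6B", scheme="gaf4_64")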


def convert

talu.converter.convert(
    model: str,
    scheme: str | None = None,
    platform: str | None = None,
    quant: str | None = None,
    output_dir: str | None = None,
    destination: str | None = None,
    force: bool = False,
    offline: bool = False,
    verify: bool = False,
    overrides: dict[str, str] | None = None,
    max_shard_size: int | str | None = None,
    dry_run: bool = False
) -> str | dict

Convert a model to an optimized format for efficient inference.

Parameters
model : str

Model to convert. Can be:

  • Model ID: "Qwen/Qwen3-0.6B", "meta-llama/Llama-3-8B"
  • Local path: "./my-model" or "/path/to/model"

scheme : str, optional

Explicit quantization scheme. Each scheme encodes all necessary parameters (method, bits, group_size). You can pass either an exact scheme name or one of the aliases below.

If not set, platform and quant are used for automatic resolution. When neither scheme nor platform/quant is specified, defaults to gaf4_64.

User-Friendly Aliases (Recommended):

  • "4bit" / "q4" / "int4": Maps to gaf4_64 (balanced 4-bit)
  • "8bit" / "q8" / "int8": Maps to gaf8_64 (near-lossless)
  • "mlx" / "mlx4" / "gaf4": Maps to gaf4_64 (Apple Silicon optimized)
  • "mlx8" / "gaf8": Maps to gaf8_64 (8-bit MLX)
  • "fp8": Maps to fp8_e4m3 (H100/vLLM inference)

Grouped Affine (MLX compatible): gaf4_32, gaf4_64, gaf4_128, gaf8_32, gaf8_64, gaf8_128

Hardware float (not yet implemented): fp8_e4m3, fp8_e5m2, mxfp4, nvfp4

platform : str, optional

Target platform for scheme resolution ("cpu", "metal", "cuda"). When set, resolves to the appropriate scheme for that platform and quant level. Ignored if scheme is explicitly set.

quant : str, optional

Quantization level ("4bit", "8bit"). Used with platform. Defaults to "4bit" if platform is set but quant is not.

output_dir : str, optional

Parent directory for auto-named output. Defaults to ~/.cache/talu/models (or $TALU_HOME/models). Ignored if destination is set.

destination : str, optional

Explicit output path (overrides output_dir).

force : bool, default False

Overwrite existing output directory.

offline : bool, default False

If True, do not use network access when resolving model URIs.

verify : bool, default False

After conversion, verify the model by loading it and generating a few tokens. Catches corruption, missing files, and basic inference failures early.

overrides : dict, optional

Reserved for future use. Not currently supported.

max_shard_size : int | str, optional

Maximum size per shard file. When set, splits large models into multiple SafeTensors files. Can be a byte count (int) or a human-readable string (e.g., "5GB", "500MB").

dry_run : bool, default False

If True, estimate conversion without writing files. Returns a dict with estimation results (total_params, estimated_size_bytes, shard_count, scheme, bits_per_param).

Returns

str | dict

When dry_run=False: absolute path to the converted model directory. When dry_run=True: dict with estimation results.

Raises
ConvertError

If conversion fails (network error, unsupported format, etc.), or if verification fails when verify=True.

ValueError

If an invalid scheme or override is provided.

Examples
>>> import talu
>>> path = talu.convert("Qwen/Qwen3-0.6B")  # Uses gaf4_64 by default
>>> path = talu.convert("Qwen/Qwen3-0.6B", scheme="gaf4_32")  # Higher quality
>>> # Platform-aware conversion
>>> path = talu.convert("Qwen/Qwen3-0.6B", platform="metal")  # → gaf4_64
>>> # Sharded output for large models
>>> path = talu.convert("meta-llama/Llama-3-70B", max_shard_size="5GB")

See Also

list_schemes : List all available quantization schemes.
verify : Verify a converted model.


def verify

talu.converter.verify(
    model_path: str,
    prompt: str | None = None,
    max_tokens: int = 5
) -> VerificationResult

Verify a model can load and generate text.

Performs a quick sanity check by loading the model and generating a few tokens. This catches corruption, missing files, and basic inference failures early.

Parameters
model_path : str

Path to the model directory to verify.

prompt : str, optional

Custom prompt to use. Defaults to "The capital of France is".

max_tokens : int, default 5

Number of tokens to generate. Keep small for speed.

Returns

VerificationResult

Result with success status, output, and any error message.

Examples

Basic verification:

>>> result = talu.verify("./models/qwen3-q4")
>>> if result:
...     print(f"OK: generated {result.tokens_generated} tokens")
... else:
...     print(f"FAILED: {result.error}")

With custom prompt:

>>> result = talu.verify(
...     "./models/qwen3-q4",
...     prompt="2 + 2 =",
...     max_tokens=3,
... )
>>> print(result.output)  # Should be " 4" or similar
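
Verifying right after a conversion (a sketch; unlike convert(..., verify=True), which raises ConvertError on failure, this returns the result object for inspection):

>>> path = talu.convert("Qwen/Qwen3-0.6B")  # doctest: +SKIP
>>> result = talu.verify(path)  # doctest: +SKIP
>>> result.success  # doctest: +SKIP
True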

def list_schemes

talu.converter.list_schemes(
    include_unimplemented: bool = False,
    category: str | None = None
) -> dict[str, dict]

List available quantization schemes with descriptions and aliases.

Returns detailed information about each scheme to help users make informed decisions about which scheme to use. Fetches live alias information from the core runtime (Zig) to ensure consistency.

Parameters
include_unimplemented : bool, default False

If True, include schemes that are not yet implemented (fp8_e4m3, fp8_e5m2, mxfp4, nvfp4).

category : str, optional

Filter by category: "gaf" or "hardware". If None, returns all categories.

Returns

dict

Dictionary mapping scheme names to their metadata:

  • category: "gaf" or "hardware"
  • bits: Bit width
  • group_size: (gaf/hardware only) Group size
  • description: What this scheme does
  • quality: Relative quality (fair/good/better/high/near-lossless/lossless)
  • size: Approximate size for a 7B model
  • status: "stable" or "not implemented"
  • mlx_compatible: (gaf only) True if compatible with MLX
  • aliases: List of user-friendly aliases (e.g., ["4bit", "q4"])

Example
>>> import talu
>>> schemes = talu.list_schemes()
>>> "gaf4_64" in schemes
True
>>> "description" in schemes["gaf4_64"]
True
>>> "4bit" in schemes["gaf4_64"]["aliases"]
True
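
A sketch that shortlists stable, MLX-compatible schemes using the documented metadata keys:

>>> for name, meta in talu.list_schemes(category="gaf").items():  # doctest: +SKIP
...     if meta["status"] == "stable" and meta["mlx_compatible"]:
...         print(name, meta["bits"], meta["quality"], meta["size"])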

See Also

convert : Convert a model using one of these schemes.


def describe

talu.converter.describe(model: str) -> ModelInfo

Get model architecture and configuration information.

Reads config.json without loading model weights. This is useful for pre-flight checks before conversion or understanding model structure.

Parameters
model : str

Path to model directory or HuggingFace model ID.

Returns

ModelInfo

Object containing model configuration details.

Raises
ModelError

If model cannot be loaded or parsed.

Example
>>> from talu.converter import describe
>>> info = describe("Qwen/Qwen3-0.6B")  # doctest: +SKIP
>>> info.num_layers > 0  # doctest: +SKIP
True
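
Derived quantities follow directly from these attributes; for example, the GQA ratio (query heads per key-value head). The printed value is illustrative:

>>> info.num_heads // info.num_kv_heads  # doctest: +SKIP
2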