# conduct architect

Multi-GPU model deployment for local hardware - serve models, plan deployments, and manage 8x B200 GPU clusters.

## Command Hierarchy

```
architect — Multi-GPU model deployment
├── Deployment
│   ├── serve          Start single model server
│   │   ├── --port     Server port (default: 8000)
│   │   └── --gpus     GPU indices to use
│   └── serve-multi    Run multiple models
├── Planning
│   ├── plan           Deployment plan preview
│   └── optimize       Suggest optimal config
├── Setup
│   ├── download       Download from HuggingFace
│   └── config         Configure settings
├── Status
│   ├── status         Running servers & GPUs
│   ├── logs           View server logs
│   └── stop           Stop server(s)
└── tree               This command hierarchy
```

## Quick Start

```shell
conduct architect serve jamba-1.7-fp8        # Start single model server
conduct architect serve-multi jamba llama    # Run multiple models
conduct architect plan mistral-small-24b     # Preview deployment plan
conduct architect tree                       # Show command hierarchy
```

## Subcommands

### Deployment

| Command | Description |
| --- | --- |
| `serve` | Start a model server on available GPUs |
| `serve-multi` | Run multiple models simultaneously across 8x B200 GPUs |
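The core job of multi-model serving is fitting several models onto eight GPUs at once. Below is a minimal sketch of one way such an allocator could work, assuming a greedy first-fit strategy; the function name, the model memory figures, and the ~180 GB usable-per-GPU number are illustrative assumptions, not the tool's actual internals:

```python
def plan_allocation(models, num_gpus=8, gpu_mem_gb=180):
    """Greedily assign each model the smallest power-of-two GPU group
    whose combined memory covers its footprint (illustrative sketch)."""
    free = list(range(num_gpus))
    plan = {}
    # Place the largest models first so they get contiguous low-index groups.
    for name, mem_gb in sorted(models.items(), key=lambda kv: -kv[1]):
        tp = 1  # tensor-parallel degree; kept to powers of two
        while tp * gpu_mem_gb < mem_gb and tp < len(free):
            tp *= 2
        if tp * gpu_mem_gb < mem_gb or tp > len(free):
            raise RuntimeError(f"not enough GPU memory for {name}")
        plan[name] = free[:tp]   # claim the first tp free GPU indices
        free = free[tp:]
    return plan

# e.g. a 200 GB model spans two GPUs, a 150 GB model fits on one:
print(plan_allocation({"jamba": 200, "llama": 150}))
# → {'jamba': [0, 1], 'llama': [2]}
```

A first-fit heuristic like this is simple but can fragment the node; a real planner would also weigh interconnect topology when picking GPU groups.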

### Planning

| Command | Description |
| --- | --- |
| `plan` | Show deployment plan (GPU allocation, memory, parallelism) |
| `optimize` | Suggest optimal configuration |
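The arithmetic behind a deployment plan is mostly weight-memory accounting: parameter count times bytes per parameter, before any headroom for KV cache and activations. A hedged sketch of that calculation (the helper name is an illustrative assumption, not part of the tool):

```python
def estimate_weight_mem_gb(params_billion: float, bytes_per_param: float) -> float:
    """Raw weight footprint in GB: parameters (in billions) * bytes per parameter."""
    return params_billion * bytes_per_param

# A 24B-parameter model such as mistral-small-24b:
print(estimate_weight_mem_gb(24, 2))  # bf16 (2 bytes/param) → 48.0 GB
print(estimate_weight_mem_gb(24, 1))  # fp8  (1 byte/param)  → 24.0 GB
```

This is why an fp8 checkpoint like `jamba-1.7-fp8` halves the GPU-memory requirement relative to bf16, which in turn can halve the tensor-parallel degree the plan needs.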

### Setup

| Command | Description |
| --- | --- |
| `download` | Download model from HuggingFace |
| `config` | Configure deployment settings |

### Status

| Command | Description |
| --- | --- |
| `status` | Show running servers and GPU usage |
| `logs` | View server logs |
| `stop` | Stop running server(s) |
| `tree` | Display command hierarchy |