Multi-GPU model deployment for local hardware - serve models, plan deployments, and manage 8x B200 GPU clusters.
Command Hierarchy
```
architect — Multi-GPU model deployment
├── Deployment
│   ├── serve          Start single model server
│   │     --port       Server port (default: 8000)
│   │     --gpus       GPU indices to use
│   └── serve-multi    Run multiple models
├── Planning
│   ├── plan           Deployment plan preview
│   └── optimize       Suggest optimal config
├── Setup
│   ├── download       Download from HuggingFace
│   └── config         Configure settings
├── Status
│   ├── status         Running servers & GPUs
│   ├── logs           View server logs
│   └── stop           Stop server(s)
└── tree               This command hierarchy
```
Quick Start
```
conduct architect serve jamba-1.7-fp8       # Start a single model server
conduct architect serve-multi jamba llama   # Run multiple models
conduct architect plan mistral-small-24b    # Preview the deployment plan
conduct architect tree                      # Show the command hierarchy
```
Subcommands
Deployment

| Command | Description |
| --- | --- |
| `serve` | Start a model server on available GPUs |
| `serve-multi` | Run multiple models simultaneously across 8x B200 GPUs |
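As a rough illustration of what `serve-multi` has to decide, the sketch below splits the 8 GPU indices into contiguous blocks, one block per model. This is a hypothetical helper, not the actual architect implementation; the function name and even-split policy are assumptions.

```python
# Hypothetical sketch of serve-multi-style GPU assignment:
# give each model a contiguous block of GPU indices on an 8-GPU node.
def partition_gpus(models, gpus_per_model, total_gpus=8):
    """Assign contiguous GPU index blocks to each model name."""
    if len(models) * gpus_per_model > total_gpus:
        raise ValueError("not enough GPUs for the requested layout")
    assignment = {}
    next_gpu = 0
    for name in models:
        assignment[name] = list(range(next_gpu, next_gpu + gpus_per_model))
        next_gpu += gpus_per_model
    return assignment

print(partition_gpus(["jamba", "llama"], gpus_per_model=4))
# → {'jamba': [0, 1, 2, 3], 'llama': [4, 5, 6, 7]}
```

A real scheduler would weigh per-model memory needs rather than splitting evenly, but contiguous blocks keep each model's tensor-parallel traffic on neighboring devices.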
Planning

| Command | Description |
| --- | --- |
| `plan` | Show deployment plan (GPU allocation, memory, parallelism) |
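A back-of-the-envelope version of the arithmetic a `plan` step must do: weight memory from parameter count and dtype, then a minimal tensor-parallel degree. The per-GPU budget of 180 GB (usable headroom on a 192 GB B200) is an illustrative assumption, and the sketch ignores KV cache and activation memory, which a real planner must also budget.

```python
# Illustrative sketch of deployment planning arithmetic (not the
# actual architect logic): weights-only memory and a minimal
# tensor-parallel degree under an assumed per-GPU budget.
import math

def plan(params_billion, bytes_per_param=2, gpu_budget_gb=180):
    """Return (weight memory in GB, minimal tensor-parallel degree)."""
    weights_gb = params_billion * bytes_per_param  # 1e9 params * bytes = GB
    tp_degree = max(1, math.ceil(weights_gb / gpu_budget_gb))
    return weights_gb, tp_degree

weights, tp = plan(24)  # e.g. a 24B model in bf16 (2 bytes/param)
print(f"{weights} GB of weights, tensor-parallel degree {tp}")
# → 48 GB of weights, tensor-parallel degree 1
```

Under these assumptions a 24B bf16 model fits on one B200, which is why a planner can pack several such models onto an 8-GPU node via `serve-multi`.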