Multi-GPU model deployment for local hardware - serve models, plan deployments, and manage 8x B200 GPU clusters.
Command Hierarchy
```
architect — Multi-GPU model deployment
├── Deployment
│   ├── serve          Start single model server
│   │     --port       Server port (default: 8000)
│   │     --gpus       GPU indices to use
│   └── serve-multi    Run multiple models
├── Planning
│   ├── plan           Deployment plan preview
│   └── optimize       Suggest optimal config
├── Setup
│   ├── download       Download from HuggingFace
│   └── config         Configure settings
├── Status
│   ├── status         Running servers & GPUs
│   ├── logs           View server logs
│   └── stop           Stop server(s)
└── tree               This command hierarchy
```
Quick Start
```
conduct architect serve jamba-1.7-fp8       # Start a single model server
conduct architect serve-multi jamba llama   # Run multiple models
conduct architect plan mistral-small-24b    # Preview the deployment plan
conduct architect tree                      # Show the command hierarchy
```
Subcommands
Deployment

| Command | Description |
| --- | --- |
| `serve` | Start a model server on available GPUs |
| `serve-multi` | Run multiple models simultaneously across 8x B200 GPUs |
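As a rough illustration of what `serve-multi` has to decide, the sketch below splits the 8 GPU indices into contiguous blocks, one block per model. This is a hypothetical helper, not the actual architect implementation; the function name and even-split policy are assumptions.

```python
# Hypothetical sketch of serve-multi-style GPU assignment:
# give each model a contiguous block of GPU indices on an 8-GPU node.
def partition_gpus(models, gpus_per_model, total_gpus=8):
    """Assign contiguous GPU index blocks to each model name."""
    if len(models) * gpus_per_model > total_gpus:
        raise ValueError("not enough GPUs for the requested layout")
    assignment = {}
    next_gpu = 0
    for name in models:
        assignment[name] = list(range(next_gpu, next_gpu + gpus_per_model))
        next_gpu += gpus_per_model
    return assignment

print(partition_gpus(["jamba", "llama"], gpus_per_model=4))
# → {'jamba': [0, 1, 2, 3], 'llama': [4, 5, 6, 7]}
```

A real scheduler would weigh per-model memory needs rather than splitting evenly, but contiguous blocks keep each model's tensor-parallel traffic on neighboring devices.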
Planning

| Command | Description |
| --- | --- |
| `plan` | Show deployment plan (GPU allocation, memory, parallelism) |
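A back-of-the-envelope version of the arithmetic a `plan` step must do: weight memory from parameter count and dtype, then a minimal tensor-parallel degree. The per-GPU budget of 180 GB (usable headroom on a 192 GB B200) is an illustrative assumption, and the sketch ignores KV cache and activation memory, which a real planner must also budget.

```python
# Illustrative sketch of deployment planning arithmetic (not the
# actual architect logic): weights-only memory and a minimal
# tensor-parallel degree under an assumed per-GPU budget.
import math

def plan(params_billion, bytes_per_param=2, gpu_budget_gb=180):
    """Return (weight memory in GB, minimal tensor-parallel degree)."""
    weights_gb = params_billion * bytes_per_param  # 1e9 params * bytes = GB
    tp_degree = max(1, math.ceil(weights_gb / gpu_budget_gb))
    return weights_gb, tp_degree

weights, tp = plan(24)  # e.g. a 24B model in bf16 (2 bytes/param)
print(f"{weights} GB of weights, tensor-parallel degree {tp}")
# → 48 GB of weights, tensor-parallel degree 1
```

Under these assumptions a 24B bf16 model fits on one B200, which is why a planner can pack several such models onto an 8-GPU node via `serve-multi`.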