Selected Work

Evaluation Infrastructure · CTC AI Operations

Hybrid Evaluation Pipeline

A self-built local-plus-cloud pipeline for evaluating AI coding, agentic and data-science systems against frozen, versioned rubrics – built for VRAM efficiency on a single 32 GB GPU, reduced judge bias, and a roughly 10× cut in token cost.

Two models in 32 GB | Judge ≠ system under test | ≈95% off repeats


Evaluation Method · CTC AI Operations

Contamination-Resistant Code Evaluation

A reproducible pipeline that synthesises evaluation tasks from a real, versioned codebase via AST analysis – so they can't have leaked into a model's training set the way public benchmarks do. Piloted with Qwen3.6-35B-A3B on the Cerberus framework, single-stream on one 32 GB GPU.

Tasks from live AST | ~3B-active MoE on 32 GB | Method-first pilot


Local Infrastructure · CTC AI Operations

Local Three-Tier Agent Workstation

A single-GPU operator setup that matches the agent framework and open-weight model to the workload – Hermes, OpenCode and OpenClaw share one 32 GB card, with only one model resident at a time. Status: design intent.

One GPU, three roles | One model resident | No cloud dependency


Research Agenda · CTC AI Operations

Multi-Agent Safety Evaluation

A research agenda for measuring the emergent risks of multi-agent systems – a failure-mode taxonomy, quantitative risk metrics and an instrumented testbed – working toward a reusable pre-deployment safety harness. It builds on the sandboxed, contamination-resistant discipline of the code-evaluation pipeline.

Risk in the edges: k(k−1) | Taxonomy · metrics · testbed | Research agenda


Product Design · CTC AI Operations

Sovereign Personal AI Assistant

A personal assistant with the polish of a modern chat app, whose inference and history stay on owned hardware and are reached from a phone over an encrypted tunnel. One architecture scales from a single GPU to a multi-GPU team host by adding hardware, not by re-architecting.

Inference and history stay local | Personal rig, paired with a smartphone | Consumer → enterprise

