CTCarenskrieger.dev
Selected Work // CTC AI Operations

Selected work

Five self-built systems for evaluating and running frontier and agentic AI on sovereign, commodity hardware – each with a deep-dive page and a working-paper preprint. Together they form the research program.

Evaluation Infrastructure · CTC AI Operations

Hybrid Evaluation Pipeline

A self-built local-plus-cloud pipeline for evaluating AI coding, agentic and data-science systems against frozen, versioned rubrics – built for VRAM efficiency on a single 32 GB GPU, reduced judge bias, and a roughly 10× cut in token cost.

Two models in 32 GB | Judge ≠ system under test | ≈95% off repeats
View project →
Evaluation Method · CTC AI Operations

Contamination-Resistant Code Evaluation

A reproducible pipeline that synthesises evaluation tasks from a real, versioned codebase via AST analysis – so they can't have leaked into a model's training set the way public benchmarks do. Piloted with Qwen3.6-35B-A3B on the Cerberus framework, single-stream on one 32 GB GPU.

Tasks from live AST | ~3B-active MoE on 32 GB | Method-first pilot
View project →
Local Infrastructure · CTC AI Operations

Local Three-Tier Agent Workstation

A single-GPU operator setup that matches agent framework and open-weight model to the workload – Hermes, OpenCode and OpenClaw across one 32 GB card, with only one model resident at a time. Status: design intent.

One GPU, three roles | One model resident | No cloud dependency
View project →
Research Agenda · CTC AI Operations

Multi-Agent Safety Evaluation

A research agenda for measuring the emergent risks of multi-agent systems, built around a failure-mode taxonomy, quantitative risk metrics and an instrumented testbed. The goal is a reusable pre-deployment safety harness that builds on the sandboxed, contamination-resistant discipline of the evaluation pipeline.

Risk in the edges: k(k−1) | Taxonomy · metrics · testbed | Research agenda
View project →
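The k(k−1) figure in the tag line is the number of ordered pairs among k agents. Read as directed agent-to-agent channels (an interpretation assumed here, not stated on this page), it gives the number of interaction edges a testbed would have to instrument. A minimal Python sketch, with the hypothetical helper `directed_edges`:

```python
from itertools import permutations


def directed_edges(agents: list[str]) -> list[tuple[str, str]]:
    """Return every ordered (sender, receiver) pair; there are k(k-1) of them for k agents."""
    return list(permutations(agents, 2))
```

For four agents this gives 4 × 3 = 12 directed edges, and each added agent contributes 2k new ones, so the surface to monitor grows quadratically while the agent count grows linearly.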
Product Design · CTC AI Operations

Sovereign Personal AI Assistant

A personal assistant with the polish of a modern chat app whose inference and history stay on owned hardware, reached from a phone over an encrypted tunnel. One architecture scales from a single GPU to a multi-GPU team host by adding hardware, not re-architecting.

Inference and history stay local | Personal rig, paired with a smartphone | Consumer → enterprise
View project →