Building in Public: from 3D NeRFs to safer AI agent harnesses

I’m Brian Mwangi, a software and AI infrastructure engineer based in Nairobi, Kenya. This blog is where I’ll share what I’m building and what I’m learning as I go: the systems behind agents, practical AI infrastructure, and the occasional deep dive into tools that feel “too magical” until you open them up and see the moving parts.

This first post is an orientation: a snapshot of the technical threads I’m currently excited about, and the kind of writing you can expect here.

The kind of work I’m drawn to

I like projects where correctness matters and “it ran once” isn’t enough. The common pattern in my work is:

Build something real, not a demo.
Put it behind guardrails.
Measure what “good” means.
Tighten the loop until the system behaves predictably.

That mindset shows up in two places I keep coming back to lately:

3D reconstruction + neural rendering (NeRFs and friends)
Agent harnessing: turning LLMs into dependable systems (not just chat)

NeRF work: the practical pipeline (COLMAP → Nerfacto → tcnn)

NeRF is easy to talk about abstractly, but the real work is the pipeline and the failure modes.

A typical “make this real” workflow looks like:

1) Data capture and calibration (COLMAP)

COLMAP is the first place where reality pushes back. If your capture is messy, everything downstream becomes harder.

Things that matter more than people admit:

Consistent exposure (avoid auto-exposure swings)
Enough overlap between frames
Avoid motion blur
Cover the object/scene with varied angles (not just a semicircle)
Watch reflective/transparent surfaces (they break assumptions)

When COLMAP returns bad camera poses, the model doesn’t “learn harder.” It just fits the wrong geometry.
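The exposure point in the list above is cheap to check before COLMAP ever runs. Here is a minimal sketch, assuming you have already computed one mean brightness value per frame with whatever image library you use. The function name, the window size, and the 15% threshold are my own placeholders, not values from any tool.

```python
from statistics import median


def flag_exposure_swings(mean_luma, window=5, max_rel_dev=0.15):
    """Return indices of frames whose brightness strays from the local median.

    mean_luma: one mean brightness value per frame, in capture order.
    window: how many neighbouring frames (including the frame itself) to compare against.
    max_rel_dev: allowed relative deviation from the local median (assumed threshold).
    """
    flagged = []
    half = window // 2
    for i, value in enumerate(mean_luma):
        # Clamp the neighbourhood at the start and end of the sequence.
        lo = max(0, i - half)
        hi = min(len(mean_luma), i + half + 1)
        local = median(mean_luma[lo:hi])
        if local > 0 and abs(value - local) / local > max_rel_dev:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Frame 3 is much brighter than its neighbours, as an auto-exposure jump would be.
    print(flag_exposure_swings([100, 101, 99, 140, 100, 102, 98]))
```

Flagged frames are candidates to drop or re-shoot. A check like this says nothing about blur or overlap, so treat it as one gate among several.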

2) Training a robust baseline (Nerfstudio / Nerfacto)

Nerfstudio makes experimentation faster. Nerfacto, its default method, gives a strong baseline for many scenes, and it’s a practical place to learn:

how pose quality affects convergence
what “floaters” look like and why they happen
why depth supervision and regularization choices matter
how to debug artifacts by looking at the data, not the loss curve
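For reference, a baseline run can be scripted with the two Nerfstudio command-line tools, `ns-process-data` and `ns-train`. The sketch below only builds the argument lists so they can be logged or reviewed before anything runs. The directory names are hypothetical, and as far as I know the `images` subcommand estimates camera poses with COLMAP by default.

```python
def nerfstudio_commands(raw_images_dir, processed_dir):
    """Build the argv lists for a basic Nerfacto baseline run (nothing is executed)."""
    # Step 1: convert raw images into Nerfstudio's dataset layout, including pose estimation.
    process = [
        "ns-process-data", "images",
        "--data", str(raw_images_dir),
        "--output-dir", str(processed_dir),
    ]
    # Step 2: train the Nerfacto method on the processed dataset.
    train = ["ns-train", "nerfacto", "--data", str(processed_dir)]
    return [process, train]


if __name__ == "__main__":
    # Hypothetical paths, printed rather than run.
    for cmd in nerfstudio_commands("captures/desk_scene", "data/desk_scene"):
        print(" ".join(cmd))
```

To actually run them, pass each list to `subprocess.run(cmd, check=True)` so that a failed pose-estimation step stops the script instead of silently feeding bad data into training.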

3) Performance engineering (tiny-cuda-nn / tcnn)

At some point you stop asking “can it render?” and start asking:

Can it render fast enough to be useful?
Can it run on constrained hardware?
Can I iterate quickly?

That’s where acceleration libraries like tiny-cuda-nn (tcnn), with its fully fused MLP kernels and hash-grid encodings, start to matter. Performance work isn’t glamorous, but it makes the difference between research output and something that could power an actual product experience.
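One way to put a number on “fast enough” is a small timing harness around whatever renders a frame. This is a generic sketch, not part of tcnn or Nerfstudio: `render_frame` is a hypothetical zero-argument callable that you supply, and the frame counts are arbitrary defaults.

```python
import time


def measure_fps(render_frame, frames=30, warmup=3):
    """Time repeated calls to render_frame and report frames per second."""
    # Warm-up calls absorb one-off costs (caches, lazy initialisation) and are not timed.
    for _ in range(warmup):
        render_frame()
    start = time.perf_counter()
    for _ in range(frames):
        render_frame()
    elapsed = time.perf_counter() - start
    fps = frames / elapsed if elapsed > 0 else float("inf")
    return {"frames": frames, "seconds": elapsed, "fps": fps}


if __name__ == "__main__":
    # Stand-in workload; replace with a real render call.
    result = measure_fps(lambda: sum(range(10_000)), frames=20)
    print(f"{result['fps']:.1f} frames/s over {result['frames']} frames")
```

If the render call launches GPU work asynchronously, synchronize inside `render_frame` (for example with `torch.cuda.synchronize()` in PyTorch), otherwise the clock mostly measures kernel launch time.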

Agent harnessing is not “prompt → response”

A lot of agent demos are essentially:

Prompt in, response out, hope for the best.

That’s fine for toy tasks. But if an agent is allowed to touch real systems—files, networks, databases—then reliability has to come from structure, not vibes.

What works better is to treat an agent as a program with:

explicit states
explicit transitions
explicit validation gates
explicit failure handling

A simple mental model: a state machine, not a chatbot

You can think of agent execution like:

Plan → Collect context → Propose action → Validate → Execute → Verify → Commit
If validation fails: branch to Repair or Ask for clarification
If execution fails: branch to Rollback or Retry with constraints
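The flow above can be sketched as a small table-driven state machine. This is a minimal illustration under my own assumptions, not a finished harness: each handler is a hypothetical callable that returns True on success, Repair loops back to Validate, Rollback loops back to Propose (my reading of “retry with constraints”), any other failure ends the run, and a step budget stops endless repair loops.

```python
from enum import Enum, auto


class State(Enum):
    PLAN = auto()
    COLLECT_CONTEXT = auto()
    PROPOSE = auto()
    VALIDATE = auto()
    EXECUTE = auto()
    VERIFY = auto()
    COMMIT = auto()
    REPAIR = auto()
    ROLLBACK = auto()
    DONE = auto()
    FAILED = auto()


# Where to go when a state's handler succeeds.
ON_SUCCESS = {
    State.PLAN: State.COLLECT_CONTEXT,
    State.COLLECT_CONTEXT: State.PROPOSE,
    State.PROPOSE: State.VALIDATE,
    State.VALIDATE: State.EXECUTE,
    State.EXECUTE: State.VERIFY,
    State.VERIFY: State.COMMIT,
    State.COMMIT: State.DONE,
    State.REPAIR: State.VALIDATE,   # assumption: re-validate after a repair
    State.ROLLBACK: State.PROPOSE,  # assumption: propose again after a rollback
}

# Where to go when a handler fails; anything not listed goes to FAILED.
ON_FAILURE = {
    State.VALIDATE: State.REPAIR,
    State.EXECUTE: State.ROLLBACK,
}

TERMINAL = {State.DONE, State.FAILED}


def run(handlers, max_steps=20):
    """Drive the machine from PLAN and return the list of visited states."""
    state = State.PLAN
    trace = [state]
    steps = 0
    while state not in TERMINAL:
        if steps >= max_steps:
            # Step budget exhausted: stop instead of looping forever.
            state = State.FAILED
            trace.append(state)
            break
        handler = handlers.get(state, lambda: True)  # states without a handler succeed
        ok = handler()
        state = ON_SUCCESS[state] if ok else ON_FAILURE.get(state, State.FAILED)
        trace.append(state)
        steps += 1
    return trace


if __name__ == "__main__":
    attempts = {"n": 0}

    def validate():
        # Fail the first validation, pass the second.
        attempts["n"] += 1
        return attempts["n"] > 1

    print(" -> ".join(s.name for s in run({State.VALIDATE: validate})))
```

Keeping the transitions in plain dictionaries means the allowed paths can be read, diffed, and tested without running a model at all.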

Memory as markdown: accumulate decisions like an engineering journal

Another thing I’m experimenting with: memory as a real artifact.

Not just “the model remembers,” but the system writes down:

what the repo’s non-negotiables are
what we tried and why it failed
what conventions are enforced
what decisions we made (and what to revisit later)

Markdown files become a lightweight knowledge base that compounds over time, especially when paired with ideas from agent research papers and practical patterns from production systems.
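As a concrete illustration, here is a minimal sketch of an append-only decision log. The file path, the entry layout, and the function name are all my own placeholders; the only real idea is that every decision lands in a plain markdown file that both humans and agents can read back later.

```python
from datetime import date
from pathlib import Path


def record_decision(path, title, decision, reason, revisit=None, today=None):
    """Append one dated decision entry to a markdown file and return the entry text."""
    today = today or date.today()
    lines = [
        f"## {today.isoformat()}: {title}",
        "",
        f"- Decision: {decision}",
        f"- Why: {reason}",
    ]
    if revisit:
        lines.append(f"- Revisit: {revisit}")
    entry = "\n".join(lines) + "\n\n"

    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode keeps earlier entries intact, like a journal.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(entry)
    return entry


if __name__ == "__main__":
    # Hypothetical entry and path, for illustration only.
    print(record_decision(
        "memory/DECISIONS.md",
        title="Validation gate before file writes",
        decision="Every proposed write goes through a validator first",
        reason="An unvalidated write is hard to undo",
        revisit="Once rollback is automated",
    ))
```

Append-only is a deliberate choice here: the history of what was tried and why it failed is often more useful than the current state alone.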

What to expect next

This blog will gradually move from “here’s how I think” to concrete deep dives:

A practical NeRF debugging checklist (COLMAP failure modes, artifact taxonomy)
What a “safe agent” architecture looks like in layers (isolation → policy → simulation → observability → credentials)
Rate limiting, hidden routes, and real-world hardening for internal tooling
Evaluations: how to measure agent reliability without fooling yourself

If you’re building similar systems, or you’re just curious, stick around.