Father · Builder · Podcaster

I build tools that give people control.

I build production software and open-source tools. Currently focused on AI infrastructure, local inference, and developer tooling. Being a father changed how I see technology. My kid is going to inherit this internet.

The Journey

The Beginning

Content Protection

Started helping creators remove stolen content, fake accounts, and doxxed information from the web. DMCA takedowns, GDPR requests, hunting down impersonators.

"Learned that the systems we build shape who has power."

The Shift

Open Source & AI

Building production tools for developers. TurboQuant for KV cache quantization, vLLM Studio for local inference, Parchi for browser AI agents. 200+ repos, 1,200+ stars.

"The best tools give you control, not dependency."

Now

Systems & Infrastructure

Running local LLMs on 8x 3090s. Contributing to the Warp, ExLlamaV3, and vLLM ecosystems. Building the boring plumbing that makes AI work in production.

"Infrastructure should be invisible."

What I Actually Do

AI Infrastructure

Contributing to TurboQuant (1,200+ stars) for KV cache quantization. Building vLLM Studio, a control panel for local inference. Running local models on 8x 3090s because I don't trust corporations with my data.

Developer Tools

Parchi (525 stars) - AI friend in your browser. Pi Brain (162 stars) - local AI memory. MoE Compress (177 stars) - model compression automation.

Systems & Research

REAP Expert Swap (122 stars) - MoE expert pruning. Claude ACP Server (91 stars) - Anthropic-compatible facade. Contributing to the Warp, ExLlamaV3, and vLLM ecosystems.

Ethers Club Podcast

57 episodes talking to builders about technology, business, and life. Guests include founders from AI coding tools, open-source projects, and systems engineering.

HuggingFace Models

Published 40+ models on HuggingFace focusing on expert pruning (REAP) and quantization for efficient inference.

REAP Expert Pruning

GLM-4.7-REAP-50-W4A16 (69 likes) - MoE expert pruning with quantization. Also Gemma-4-REAP (91 likes) and Qwen3.6-REAP models for efficient inference.

GGUF Quantized Models

Qwen3.6-GGUF-Strix (16 likes) - Optimized for AMD Strix Halo with Vulkan. Quantized in GGUF format for use with llama.cpp.

Large-Scale Models

GLM-5.1-555B (5 likes) - Large model compression with REAP pruning and GPTQ quantization. Also NVFP4 variants for Blackwell GPUs.

Research & Experimentation

DeepSeek-V3.2-REAP (10 likes) - DeepSeek compression. INTELLECT-3-REAP (4 likes) - Prime Intellect model compression. Ongoing research into MoE efficiency.

What I Believe

Build in Public

200+ repos, most of them open source

Local First

Run models locally. Own your data.

Boring Reliability

The best infrastructure is invisible

Say No

To projects that extract value without creating it

Outside the Terminal

I run local meetups for builders. Read a lot about systems thinking. Lift weights because desk work will kill you otherwise. Family comes first; everything else is noise.

US-based. Async preferred. Clear specs. Ship it.

Let's Work Together

I take on a limited number of projects — usually infrastructure work that requires deep context and careful execution. If you're building developer tools, AI systems, or production software that needs to just work, let's talk.

If you're building another extractive platform, I'm not your guy.

Start a conversation