Father · Builder · Podcaster
I build tools that give people control.
I build production software and open-source tools. Currently focused on AI infrastructure, local inference, and developer tooling. Being a father changed how I see technology. My kid is going to inherit this internet.
The Journey
The Beginning
Content Protection
Started helping creators remove stolen content, fake accounts, and doxxed information from the web. DMCA takedowns, GDPR requests, hunting down impersonators.
"Learned that the systems we build shape who has power."
The Shift
Open Source & AI
Building production tools for developers. TurboQuant for KV cache quantization, vLLM Studio for local inference, Parchi for browser AI agents. 200+ repos, 1,200+ stars.
"The best tools give you control, not dependency."
Now
Systems & Infrastructure
Running local LLMs on 8x 3090s. Contributing to the Warp, ExLlamaV3, and vLLM ecosystems. Building the boring plumbing that makes AI work in production.
"Infrastructure should be invisible."
What I Actually Do
AI Infrastructure
Contributing to TurboQuant (1,200+ stars) for KV cache quantization. Building vLLM Studio, a control panel for local inference. Running local models on 8x 3090s because I don't trust corporations with my data.
Developer Tools
Parchi (525 stars) - AI friend in your browser. Pi Brain (162 stars) - local AI memory. MoE Compress (177 stars) - model compression automation.
Systems & Research
REAP Expert Swap (122 stars) - MoE expert pruning. Claude ACP Server (91 stars) - Anthropic-compatible facade. Contributing to the Warp, ExLlamaV3, and vLLM ecosystems.
Ethers Club Podcast
57 episodes talking to builders about technology, business, and life. Guests include founders from AI coding tools, open-source projects, and systems engineering.
HuggingFace Models
Published 40+ models on HuggingFace focusing on expert pruning (REAP) and quantization for efficient inference.
REAP Expert Pruning
GLM-4.7-REAP-50-W4A16 (69 likes) - MoE expert pruning with quantization. Also Gemma-4-REAP (91 likes) and Qwen3.6-REAP models for efficient inference.
GGUF Quantized Models
Qwen3.6-GGUF-Strix (16 likes) - Optimized for AMD Strix Halo with Vulkan. GGUF format for llama.cpp integration with efficient quantization.
Large-Scale Models
GLM-5.1-555B (5 likes) - Large model compression with REAP pruning and GPTQ quantization. Also NVFP4 variants for Blackwell GPUs.
Research & Experimentation
DeepSeek-V3.2-REAP (10 likes) - DeepSeek compression. INTELLECT-3-REAP (4 likes) - Prime Intellect model compression. Continuous research in MoE efficiency.
What I Believe
Build in Public
200+ repos, most of them open source
Local First
Run models locally. Own your data.
Boring Reliability
The best infrastructure is invisible
Say No
To projects that extract value without creating it
Outside the Terminal
I run local meetups for builders. Read a lot about systems thinking. Lift weights because desk work will kill you otherwise. Family comes first; everything else is noise.
US-based. Async preferred. Clear specs. Ship it.
Let's Work Together
I take on a limited number of projects — usually infrastructure work that requires deep context and careful execution. If you're building developer tools, AI systems, or production software that needs to just work, let's talk.
If you're building another extractive platform, I'm not your guy.
Start a conversation