Managing Codex-Spark: 1000 Tokens Per Second Changes Everything
I spent a week with GPT-5.3-Codex-Spark on Cerebras hardware. The speed broke every workflow I had.
0xSero
I work to make AI more accessible, affordable, and useful for everyone, from compressing LLMs to fit on edge devices to educating people on the technology and what can be done with it.
Tools and libraries built in public.
Near-optimal KV cache quantization for LLM inference (3-bit keys, 2-bit values) with Triton kernels + vLLM integration
Your AI friend right in your browser. Browser extension for AI automation with orchestrator mode and vision
Control panel for vLLM, SGLang, llama.cpp, ExLlamaV3. Local inference management UI
Model-agnostic MoE compression automation: calibration, REAP/quantization, benchmark, publish
Local AI memory system with graph-based retrieval and MCP server support
How many experts do we need to serve a model? MoE expert pruning research
I take on complex projects at the edge of what's possible and make them boringly reliable.
AI / AGENTS
LLM infrastructure, multi-agent orchestration, tool-calling systems
DEV TOOLS
Open-source libraries, browser automation, data extraction tooling
PRODUCT / SYSTEMS
Interfaces that match the depth of the underlying system
"Sybil Solutions brings a powerful blend of talent, integrity, and genuine curiosity. When they take on a task you can relax, knowing that they'll see it through to completion to the highest standards. I recommend them without a heartbeat's hesitation."