Every AI agent project I start ends up with the same boilerplate: chunk docs, pick an embedding model, set up a vector store, write retrieval logic, wire it into a custom tool. It works, but it's plumbing, and it needs to be rebuilt for every new agent or runtime. The idea I'm explori...
Hey guys, I wanted to share a small project I've been working on to solve a personal pain point: TinyTTS. We all love our massive 70B+ LLMs, but when building local voice assistants, running a heavy TTS framework alongside them often eats up way too much precious VRAM and compute. I wanted ...
Custom FP4 CUDA Kernel – 129 Tflops on DGX Spark with Pre-Quantized Weight Cache
(forums.developer.nvidia.com)
Hey HN, I built StreamHouse, an open-source streaming platform that replaces Kafka's broker-managed storage with direct S3 writes. The goal: same semantics, fraction of the cost. How it works: Producers batch and compress records, a stateless server manages partition routing and metadata (S...