I built this because I needed to benchmark LLM inference endpoints and the existing tools required Python environments. I wanted a single binary I could grab quickly on any server.

I've also become interested in performance metrics like time to first token, inter-token latency, throughput, an...