🦞🌯 Lobster Roll

Stories by matt_d

Compiling LLMs into a MegaKernel: A path to low-latency inference (zhihaojia.medium.com)
Algorithms for Modern Processor Architectures (lemire.github.io)
Test Results for AMD Zen 5 (agner.org)
Zen 5's AVX-512 Frequency Behavior (chipsandcheese.com)
High-Performance DBMSs with io_uring: When and How to use it (arxiv.org)
Evolving the OCaml Programming Language (2025) [pdf] (kcsrk.info)
Decompiling 2024: A Year of Resurgence in Decompilation Research (mahaloz.re)
You could have invented Fenwick trees (cambridge.org)
FlashAttention-T: Towards Tensorized Attention (dl.acm.org)
DWARF as a Shared Reverse Engineering Format (lief.re)
Using obscure graph theory to solve programming languages problems (reasonablypolymorphic.com)
TorchLean: Formalizing Neural Networks in Lean (leandojo.org)
Slicing Is All You Need: Towards a Universal One-Sided Distributed MatMul (arxiv.org)
Property-Based Testing for the People (repository.upenn.edu)
Explainable Linear Programs (jeremykun.com)
Modern Minimal Perfect Hashing: A Survey (arxiv.org)
Demystifying ARM SME to Optimize General Matrix Multiplications (arxiv.org)
How to Think About GPUs (jax-ml.github.io)
Safepoints and Fil-C (fil-c.org)
Binding Application in Idris (andrevidela.com)
The Hoare Cube (johnwickerson.wordpress.com)
Gluon: a GPU programming language based on the same compiler stack as Triton (github.com)
GPEmu: A GPU emulator for rapid, low-cost deep learning prototyping [pdf] (vldb.org)
Orders of Infinity (terrytao.wordpress.com)
How to train your program verifier (risemsr.github.io)
Packed Data Support in Haskell (arthi-chaud.github.io)
The Calculated Typer (bahr.io)
Safe and efficient C++ interoperability via non-escapable types and lifetimes (forums.swift.org)
A Generalized Algebraic Theory of Directed Equality (jacobneu.phd)
Still Asking: How Good Are Query Optimizers, Really? [pdf] (vldb.org)