🦞🌯 Lobster Roll

Stories by matt_d

Algorithms for Modern Processor Architectures (lemire.github.io)
Test Results for AMD Zen 5 (agner.org)
High-Performance DBMSs with io_uring: When and How to use it (arxiv.org)
Evolving the OCaml Programming Language (2025) [pdf] (kcsrk.info)
FlashAttention-T: Towards Tensorized Attention (dl.acm.org)
TorchLean: Formalizing Neural Networks in Lean (leandojo.org)
Slicing Is All You Need: Towards a Universal One-Sided Distributed MatMul (arxiv.org)
Demystifying ARM SME to Optimize General Matrix Multiplications (arxiv.org)
How to Think About GPUs (jax-ml.github.io)
Safepoints and Fil-C (fil-c.org)
Gluon: a GPU programming language based on the same compiler stack as Triton (github.com)
How to train your program verifier (risemsr.github.io)
A Generalized Algebraic Theory of Directed Equality (jacobneu.phd)
Still Asking: How Good Are Query Optimizers, Really? [pdf] (vldb.org)
Decompiling the Synergy: Human–LLM Teaming in Reverse Engineering [pdf] (zionbasque.com)
Identity Types (bartoszmilewski.com)
Ga68, a GNU Algol 68 Compiler (fosdem.org)
Clang: -Wexperimental-lifetime-safety: Experimental C++ Lifetime Safety Analysis (github.com)
Benchmarking a Baseline Fully-in-Place Functional Language Compiler [pdf] (trendsfp.github.io)
Dependent Types: Universes, or types of types (jonmsterling.com)
Transforming recursion into iteration for LLVM loop optimizations (dspace.mit.edu)
https://dspace.mit.edu/bitstream/handle/1721.1/162684/cuevas-eliec-meng-eecs-2025-thesis.pdf?sequence=1&isAllowed=y
Cloud RAM (mikekohn.net)
IRHash: Efficient Multi-Language Compiler Caching by IR-Level Hashing (usenix.org)
Computing Sharding with Einsum (blog.ezyang.com)
Draw high dimensional tensors as a matrix of matrices (blog.ezyang.com)
A Clash Course in Solving Sudoku (Functional Pearl) [pdf] (unsafeperform.io)
Optimizing a 6502 image decoder – part II: assembly (colino.net)
Converting Binary Floating-Point Numbers to Shortest Decimal Strings (onlinelibrary.wiley.com)
Wafer-Scale AI Compute: A System Software Perspective (sigops.org)
SE Radio 708: Jens Gustedt on C in 2026 (se-radio.net)