🦞🌯 Lobster Roll

Sparsely-Gated Mixture of Experts (MoE) (eli.thegreenplace.net)

Stories related to "Sparsely-Gated Mixture of Experts (MoE)" across the full archive.

Sparsely-Gated Mixture of Experts (MoE) (eli.thegreenplace.net)
Sparsely-Gated Mixture of Experts (MoE) (eli.thegreenplace.net)
Sparsely-Gated Mixture of Experts (MoE) (eli.thegreenplace.net)
The Sparsely-Gated Mixture-of-Experts Layer (2017) [pdf] (arxiv.org)
Mixture-of-Experts with Expert Choice Routing (2022) (ai.googleblog.com)
Mixture of A Million Experts: PEER (parameter efficient expert retrieval) (arxiv.org)
Mixture of a Million Experts (web3.arxiv.org)
Mixture of Nested Experts: Adaptive Processing of Visual Tokens (arxiv.org)
Layerwise Recurrent Router for Mixture-of-Experts (arxiv.org)
A Visual Guide to Mixture of Experts (MoE) LLMs (newsletter.maartengrootendorst.com)
ARIA: An Open Multimodal Native Mixture-of-Experts Model (arxiv.org)
Mixture of Parrots: Experts improve memorization more than reasoning (arxiv.org)
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts LMs (arxiv.org)
Mixture-of-Experts (MoE) LLMs (cameronrwolfe.substack.com)
Mixture-of-Experts (MoE) LLMs (cameronrwolfe.substack.com)
Scaling a 300B Mixture-of-Experts LING LLM Without Premium GPUs (arxiv.org)
Efficient and Portable Mixture-of-Experts Communication (perplexity.ai)
NanoMoE: Mixture-of-Experts (MoE) LLMs from Scratch in PyTorch (cameronrwolfe.substack.com)
Mixture of Experts: When Does It Deliver Energy Efficiency? (neuralwatt.com)
Mixture of Tunable Experts-DeepSeek R1 Behavior Modification at Inference Time (huggingface.co)
Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity (arxiv.org)
Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model (twitter.com)
GitHub: https://github.com/MoonshotAI/Kimi-K2
Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model (github.com)
A Mixture of Experts Approach to Handle Concept Drifts (arxiv.org)
Dor awards submission: Mixture Of Experts ft. AGI [video] (youtube.com)
REAP: One-Shot Pruning for Trillion-Parameter Mixture-of-Experts Models (cerebras.ai)
REAP: One-Shot Pruning for Trillion-Parameter Mixture-of-Experts Models (cerebras.ai)
Mixture-of-Experts explained with PyTorch implementation (medium.com)
Intro to Routing: Mixture-of-Experts and Expert Choice (neelsomaniblog.com)
Sparse Mixture of Experts for Game AI: An Accidental Architecture (github.com)