Stories by cowpig
There are so many models, and so many new ones being released all the time, that I have a hard time knowing which ones to prioritize testing anecdotally. What benchmarks have you found to be especially indicative of real-world performance?

I use:

* Aider's Polyglot benchmark seems to be a d...