ML-for-DB: what to cover

Here's a non-exhaustive checklist of what I think every good ML-for-DB paper should include, especially papers on query optimization:

Tails: not just an average. Show the distribution, or at least the 90th/95th/99th percentiles.

Query performance: show me that the queries actually get faster

Overhead: how much does training and inference cost?

Optimals: if possible, how close are you to the optimal prediction / latency?

Comparisons: existing systems (+ commercial ones, when possible), other learned approaches, and naive baselines

Failures: Show me systematic failures. If you think there aren't any, look harder

Pareto analysis: show me tradeoffs. No way you're better at everything.

Interpretation: what did the model learn? Examine the weights. Examine especially successful cases. It's ok to hypothesize!
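
Two of the items above, tails and Pareto analysis, are mechanical enough to sketch in code. The snippet below is a minimal illustration, not a prescribed methodology: `tail_summary` reports the mean next to nearest-rank percentiles of per-query latencies, and `pareto_front` lists the systems that no other system beats on both overhead and latency. The function names, the choice of (overhead, latency) as the two axes, and all numbers are hypothetical and made up for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def tail_summary(latencies_ms: Sequence[float],
                 percentiles: Sequence[int] = (50, 90, 95, 99)) -> Dict[str, float]:
    """Return the mean plus nearest-rank percentiles of per-query latencies."""
    if not latencies_ms:
        raise ValueError("need at least one latency measurement")
    ordered = sorted(latencies_ms)
    n = len(ordered)
    summary = {"mean": sum(ordered) / n}
    for p in percentiles:
        # Nearest-rank percentile: rank = ceil(p * n / 100), computed with
        # integer arithmetic to avoid floating-point rounding surprises.
        rank = max(1, (p * n + 99) // 100)
        summary[f"p{p}"] = ordered[rank - 1]
    return summary


def pareto_front(points: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return names of systems not dominated on (overhead, latency).

    Lower is better on both axes. A system is dominated if some other
    system is at least as good on both axes and strictly better on one.
    """
    front = []
    for name, (cost, latency) in points.items():
        dominated = any(
            oc <= cost and ol <= latency and (oc < cost or ol < latency)
            for other, (oc, ol) in points.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)


# Synthetic workload (made-up numbers): 95 fast queries and 5 very slow ones.
latencies = [10.0] * 95 + [500.0] * 5
summary = tail_summary(latencies)

# Hypothetical systems as (training + inference overhead, workload latency).
systems = {
    "baseline": (0.0, 100.0),
    "learned_a": (5.0, 80.0),
    "learned_b": (8.0, 90.0),   # worse than learned_a on both axes
    "learned_c": (20.0, 60.0),
}
front = pareto_front(systems)
```

In this synthetic workload the mean and the 95th percentile both look unremarkable while the 99th percentile exposes the slow queries, which is the reason to report tails. Likewise, three of the four hypothetical systems sit on the Pareto front, so none of them can be called better at everything.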

None of the ML-for-systems papers at this VLDB (including my own!) check all these boxes. Let's raise the bar!

— Tweets by Ryan Marcus (@RyanMarcus)