Here's my non-exhaustive checklist of what I think every good ML-for-DB paper should include, especially for query optimization:
- Tails: not just an average. Show the distribution, or at least the 90th/95th/99th percentiles
- Query performance: show me that the queries actually get faster
- Overhead: how much does training and inference cost?
- Optimality: if possible, how close are you to the optimal prediction / latency?
- Comparisons: against existing systems (commercial ones when possible), other learned approaches, and naive baselines
- Failures: show me systematic failures. If you think there aren't any, look harder
- Pareto analysis: show me tradeoffs. No way you're better at everything.
- Interpretation: what did the model learn? Examine the weights. Examine especially successful cases. It's ok to hypothesize!
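On the tails point: reporting percentiles costs almost nothing. A minimal sketch, with made-up per-query latencies standing in for a real benchmark run:

```python
import numpy as np

# Hypothetical per-query latencies in ms (illustrative data, not real results).
latencies = np.array([12.0, 15.0, 14.0, 13.0, 220.0, 16.0, 15.0, 500.0, 14.0, 13.0])

# The mean hides the two slow outliers; the tail percentiles expose them.
mean = latencies.mean()
p50, p90, p95, p99 = np.percentile(latencies, [50, 90, 95, 99])
print(f"mean={mean:.1f}ms p50={p50:.1f}ms p90={p90:.1f}ms "
      f"p95={p95:.1f}ms p99={p99:.1f}ms")
```

Here the mean (83.2 ms) is nearly 6x the median (14.5 ms), which is exactly the kind of gap a single average would paper over.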
None of the ML-for-systems papers at this VLDB (including my own!) check all these boxes. Let's raise the bar!