Stanford WORKBank · arXiv 2506.06576
Paste one thing your agent does. We check it against the Stanford WORKBank map: can AI do it, and do people actually want it done with no human in the loop?