Researchers find a “value” direction inside a language model that tracks whether it thinks its current plan will work | arXiv News