“current LLMs cannot perform genuine logical reasoning; they replicate reasoning steps from their training data”

Link. “Adding a single clause that seems relevant to the question causes significant performance drops (up to 65%) across all state-of-the-art models”

Reassuring! Best news in months.