“for roughly a third of those incorrect responses, the model’s chain of thought showed that it knew the answer was incorrect but provided it anyway”

Link. OpenAI’s o1-preview.