“as these models get bigger and are trained on more data, they improve on individual language-related abilities and also develop new ones by combining skills in a manner that hints at understanding”

Link. “… a mathematically provable argument for how and why an LLM can develop so many abilities… when Arora and his team tested some of its predictions, they found that these models behaved almost exactly as expected”

The researchers applied techniques from random graph theory to explain how these models develop unexpected new abilities as they scale.
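
The article doesn’t reproduce the math, but a toy simulation conveys the flavor of the random-graph framing: if handling a piece of text requires a random combination of several skills, a modest improvement in per-skill competence produces a disproportionate jump in the fraction of multi-skill texts handled. A minimal sketch, not the paper’s actual model — the parameters and the independence assumption here are mine:

```python
import random

# Toy illustration: a random bipartite graph linking "text pieces" to the
# combination of "skills" each one requires. All numbers are made up.
random.seed(0)

NUM_SKILLS = 1000        # hypothetical pool of individual language skills
NUM_TEXTS = 10_000       # hypothetical text pieces
SKILLS_PER_TEXT = 4      # each text requires a random combination of skills

texts = [random.sample(range(NUM_SKILLS), SKILLS_PER_TEXT) for _ in range(NUM_TEXTS)]

def fraction_handled(per_skill_competence: float) -> float:
    """Fraction of texts whose entire skill combination is mastered,
    assuming each skill is mastered independently with the given probability."""
    mastered = {s for s in range(NUM_SKILLS) if random.random() < per_skill_competence}
    return sum(all(s in mastered for s in t) for t in texts) / NUM_TEXTS

for p in (0.5, 0.7, 0.9, 0.95):
    print(f"per-skill competence {p:.2f} -> multi-skill success {fraction_handled(p):.2f}")
```

Small gains on individual skills compound sharply when abilities have to be combined, which is roughly the shape of the emergence the article describes.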

We don’t know if this is how *we* “understand,” but I suspect it’s something similar. We are not magical.

“Many people found it a little bit eerie how much GPT-4 was better than GPT-3.5, and that happened within a year. Does that mean in another year we’ll have a similar change of that magnitude? I don’t know. Only OpenAI knows.”