“trained on the same dataset for long enough, pretty much every model with enough weights and training time converges to the same point”

Link. That’s interesting claim.