Background: Although Sora has not yet been officially released to the public, discussion of its technical details and real-world impact has never ceased. Behind these discussions lies an exploration of fundamental questions in artificial intelligence.
Sora’s generated results are indeed impressive: the videos are high-resolution, and subjects stay consistent even across multiple changes of camera angle. Does this level of generation imply that Sora is a world model? Given that it can generate realistic videos, can Sora be said to understand the physical world?
(Continued from the previous article.)
Affirmative Side:
There’s a notion that ChatGPT doesn’t understand text or language. However, OpenAI’s Chief Scientist, Ilya Sutskever, argues that next-token prediction, that is, predicting and generating the next word, constitutes understanding language.
Ilya illustrates with an example: feed a large model a mystery novel and ask it to name the culprit. If it names the culprit correctly, has it not understood the novel?
Negative Side:
The Turing test is essentially an engineering test: failing it indicates a lack of capability, but passing it doesn’t guarantee understanding. Consider an exam: failing suggests a lack of understanding, but passing doesn’t necessarily indicate comprehension, since one could pass by rote memorization alone. As evidence of understanding, then, the Turing test isn’t very persuasive.
Affirmative Side:
I strongly believe Sora passes the Turing test. The classic test compares a machine with humans through a question-and-answer format. ChatGPT fits that format, but Sora goes beyond it and faces what is akin to a movie test: viewers inspect the video for errors, and intelligence is judged by eye rather than through questions and answers.
Negative Side:
What constitutes understanding the physical world? The understanding must align with the real world. An AI might understand mystery novels without understanding the physical world. If all of Sora’s training data came from the magical world of Harry Potter, and it could predict the next frame, would it understand physics? No, it would understand magic.
Affirmative Side:
Regarding understanding the physical world, we maintain that reaching a level comparable to the average person suffices; there is no need to comprehend physics formulas. For instance, when a car approaches on the road, do you calculate its arrival time using Newton’s laws?
No, you intuitively predict its path and get out of the way, tolerating some error and bias. Furthermore, human understanding of physics is itself partial and still evolving. It would be unreasonable to demand that Sora understand Newton’s laws right away.