The real world isn't digital, and it's unforgiving.
Since generative AI is adept at manipulating digital text, voice and images, many assume this automatically means it will be adept at the entire range of human endeavor and work. But this is faulty logic. The same faulty logic leads many to assume that because a humanoid robot can jump over boxes, and a specialized robot can lay floor tiles in a giant warehouse with a perfectly flat concrete floor, robots will soon be able to do every possible kind of work.
This is a layperson's logic, based on a limited grasp of what makes tasks accessible to AI and robots. Jumping over boxes and laying floor tiles are repeatable behaviors in a narrow context. There is little ambiguity and there are few imperfect choices to make. Nor is there any need for dexterity across not one task but dozens of different tasks, none of them repeatable, over the long, complex slog of getting a real job done.
Manipulating text, voice and images is easy for one reason: they are digital, not real-world. All three can be reduced to pattern matching and probabilities derived from millions of scraped samples. The real world isn't quite so easy...