OpenAI president shares first image generated by GPT-4o
As you’ll see in the image below, shared on X by OpenAI president Greg Brockman, it is quite convincingly photorealistic: a person wearing a black T-shirt with an OpenAI logo writes chalk text on a blackboard that reads “Transfer between Modalities. Suppose we directly model P(text, pixels, sound) with one big autoregressive transformer. What are the pros and cons?”
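For context on the chalkboard’s question: modeling P(text, pixels, sound) with one big autoregressive transformer means treating all three modalities as a single token sequence and factoring the joint distribution token by token. Roughly (this is the standard autoregressive factorization, not a formula from Brockman’s post):

P(\text{text}, \text{pixels}, \text{sound}) = \prod_{t=1}^{T} P(x_t \mid x_1, \dots, x_{t-1})

where x_1, …, x_T is one interleaved sequence of text, image, and audio tokens, and the same transformer predicts each next token regardless of its modality.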
The new GPT-4o model, which debuted on Monday, improves on the prior GPT-4 family of models (GPT-4, GPT-4 Vision, and GPT-4 Turbo) by being faster, cheaper, and better at retaining information from inputs such as audio and images.
It is able to do so because OpenAI took a different approach from its prior GPT-4-class LLMs. While those chained multiple models together, converting media such as audio and images to text and back, GPT-4o was trained on multimodal tokens from the get-go, allowing it to analyze and interpret vision and audio directly, without first converting them into text.
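To make the contrast concrete, here is a minimal Python sketch of the two designs. Every function below is a hypothetical stand-in (none of these are real OpenAI APIs); the point is where the text bottleneck sits in the old pipeline versus its absence in a unified model.

# Toy contrast between a chained pipeline and a unified multimodal model.
# All functions are hypothetical placeholders, not real OpenAI code.

def speech_to_text(audio: bytes) -> str:
    # A real ASR model would go here; tone, timing, and background
    # sound are discarded when audio is flattened to a transcript.
    return "hello there"

def text_llm(prompt: str) -> str:
    # A text-only LLM sees nothing but the transcript.
    return f"reply to '{prompt}'"

def text_to_speech(text: str) -> bytes:
    # TTS must re-invent prosody the pipeline already threw away.
    return text.encode("utf-8")

def audio_tokenizer(audio: bytes) -> list[int]:
    # A unified model instead encodes audio into tokens that share
    # a vocabulary space with text and image tokens.
    return list(audio[:8])

def multimodal_llm(tokens: list[int]) -> list[int]:
    # One autoregressive transformer consumes the full token stream,
    # so non-text information survives to the output.
    return tokens[::-1]

def audio_detokenizer(tokens: list[int]) -> bytes:
    return bytes(tokens)

# Prior GPT-4 approach: three models chained through text.
def pipeline_voice_reply(audio: bytes) -> bytes:
    transcript = speech_to_text(audio)   # information lost at this boundary
    return text_to_speech(text_llm(transcript))

# GPT-4o-style approach: a single model over multimodal tokens.
def unified_voice_reply(audio: bytes) -> bytes:
    return audio_detokenizer(multimodal_llm(audio_tokenizer(audio)))

if __name__ == "__main__":
    sample = b"\x01\x02\x03\x04"
    print(pipeline_voice_reply(sample))
    print(unified_voice_reply(sample))

In the pipeline, anything the transcript cannot express is gone before the language model ever runs; in the unified design, the same transformer reads and emits tokens for every modality, which is what lets GPT-4o retain more of its input.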
Judging by the above image, the new approach is a noticeable improvement over OpenAI’s last image generation model, DALL-E 3, which debuted in September 2023. I ran a similar prompt through DALL-E 3 in ChatGPT, and here is the result.
As you can see, the image Brockman shared, created with GPT-4o, is significantly better in quality, photorealism, and accuracy of text generation.
However, GPT-4o’s native image generation capabilities are not yet publicly available, as Brockman alluded to in his X post: “Team is working hard to bring those to the world.”