Although AI exploded onto the scene through sometimes eerily clever chatbots, text-based interactions already feel old-fashioned. The announcement of OpenAI’s GPT-4 update introduced GPT-Vision (GPT-V), the latest multimodal AI marvel. That announcement has now become reality as users finally get a chance to test the full extent of its abilities.
A multimodal large language model (LLM) is one that can interact not only through the written word but also through other modes. In this case, the new GPT-V can understand images and work with them. And thanks to the new generative art tool DALL-E 3, ChatGPT can not only take images as input but also generate images as output.
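To make those two directions concrete, here is a minimal sketch of what the round trip might look like, assuming the OpenAI Python SDK. The model names, prompts, and example image URL are illustrative assumptions, not details taken from the article.

```python
# Sketch: image in (GPT-4 with vision), image out (DALL-E 3).
# Assumes the OpenAI Python SDK (openai >= 1.0) and an OPENAI_API_KEY
# in the environment; model names, prompts, and URLs are illustrative.
from openai import OpenAI

client = OpenAI()

# 1) Image as input: ask the vision-enabled model about a picture.
vision_response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # assumed vision-capable model name
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is in this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)
print(vision_response.choices[0].message.content)

# 2) Image as output: generate a picture with DALL-E 3.
image_response = client.images.generate(
    model="dall-e-3",
    prompt="A sunlit reading room in the style of a watercolor sketch",
    size="1024x1024",
    n=1,
)
print(image_response.data[0].url)  # URL of the generated image
```

The point of the sketch is that the same chat endpoint handles text-only and mixed text-plus-image messages, which is what makes the model “multimodal” in practice.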
These new capabilities have raised eyebrows across the tech space as users put them through their paces. Can they decode redacted government documents on UFO sightings? At least one user says yes. “ChatGPT-4V Multimodal decodes a redacted government document on a UFO sighting released by NASA,” the tweet raves. “Maybe the truth isn’t out there; it’s right here in GPT-V.”
ChatGPT-4V Multimodal decodes a Redacted government document on a UFO sighting released by NASA.
I have tested this on 100s of redacted documents and I can say we are in a new world. pic.twitter.com/aCKOm577TO
— Brian Roemmele (@BrianRoemmele) October 6, 2023
Trying to fill gaps in a string of text is basically what LLMs do. The user did the next best thing
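That gap-filling intuition is easy to demonstrate with open tooling. The toy sketch below uses a masked language model from Hugging Face transformers rather than GPT-V itself, an assumption made purely for illustration: BERT-style models are trained explicitly to fill in blanks, while GPT-style models do the analogous thing by predicting the next token. Here the [MASK] token plays the role of a redacted word.

```python
# Toy sketch: "filling gaps in a string of text" with a masked
# language model. Uses bert-base-uncased purely for illustration;
# the GPT-V discussed in the article is a different (next-token) model.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

# [MASK] stands in for a redacted word.
for candidate in fill("The agency released a heavily [MASK] document."):
    print(f'{candidate["token_str"]:>12}  (score: {candidate["score"]:.3f})')
```

Running this prints a ranked list of plausible fillers with confidence scores, which is the same statistical guessing game, at toy scale, that a large model plays when asked to reconstruct blacked-out text.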
Author: Jose Antonio Lanz