Significant upgrades will let ChatGPT respond to voice commands and image-based queries. Users will be able to feed photos into ChatGPT on all platforms and hold voice conversations with it on Android and iOS. OpenAI is now rolling out the features. Both will initially be available only to Plus and Enterprise users; other users will gain access later.
ChatGPT can now see, hear, and speak. Rolling out over next two weeks, Plus users will be able to have voice conversations with ChatGPT (iOS & Android) and to include images in conversations (all platforms). https://t.co/uNZjgbR5Bm pic.twitter.com/paG0hMshXb
— OpenAI (@OpenAI) September 25, 2023
To try voice conversations, first enable them in the ChatGPT app's settings (choose New Features from the Settings menu). You can then tap the microphone button to choose from five available voices.
According to OpenAI, the back-and-forth voice conversations are powered by a new text-to-speech model that can generate "human-like audio from just text and a few seconds of sample speech." The company worked with professional voice actors to create the five voices. Meanwhile, OpenAI's Whisper speech recognition system transcribes the user's spoken words into text.
The image-based features are just as interesting. According to OpenAI, you could photograph a math problem and ask the chatbot to solve it, show it a picture of your grill and ask why it won't start, or have it help you plan a dinner based on a photo of what's in your fridge. Microsoft, for its part, highlighted Copilot's aptitude for math problems during last week's Surface event.
OpenAI's image recognition capabilities are powered by GPT-3.5 and GPT-4. To use ChatGPT's image-based features, tap the photo button (on iOS or Android, tap the + button first) to take a picture or select an existing image on your smartphone. You can also ask ChatGPT about multiple images, and use a drawing tool to focus on a particular area of an image.
In the blog post introducing the updates, OpenAI acknowledged the potential for harm. Bad actors could use the voice technology to impersonate public figures (or ordinary people) and commit fraud. For this reason, OpenAI is limiting the technology to voice chat in ChatGPT and to a few restricted use cases with select partners (more on that in a bit).
On the vision side, OpenAI collaborated with Be My Eyes, a free app that connects blind and low-vision users with volunteers over video calls to help them interpret their surroundings. According to OpenAI, "Users have told us they find it valuable to have general conversations about images that just so happen to have people in the background, like if someone pops up on TV while you're figuring out your remote control settings." Even so, the company said that because ChatGPT "is not always accurate and these systems should respect individuals' privacy," it has restricted the chatbot's ability to analyze and make direct statements about people who appear in photos.
OpenAI has also published a paper on the image-based functionality, which it calls GPT-4 with vision, and its safety properties.
ChatGPT is better at interpreting English text in images than text in other languages. For now, according to OpenAI, the chatbot "performs poorly" in other languages, particularly those that use non-Roman scripts. The company therefore advises non-English speakers to avoid using ChatGPT to read text in photos for the time being.
Meanwhile, Spotify and OpenAI have partnered to put the voice technology to intriguing use. Spotify has announced a pilot of Voice Translation, a tool for podcasters that translates podcasts into other languages using the voices of the people who appear on the show. According to Spotify, the tool preserves the original speaker's speech patterns in the translated audio.
Do you dream of a world where some of the top podcasts would be spoken in your native language? Well, that’s now possible. We’re excited to pilot Voice Translation, a groundbreaking feature powered by AI that translates podcasts into additional languages—all in the podcaster’s… pic.twitter.com/7ebVwF99hD
— Spotify News (@SpotifyNews) September 25, 2023
Spotify is initially translating a few English-language shows. Spanish translations of select episodes of Armchair Expert and The Diary of a CEO with Steven Bartlett are already available, with French and German versions coming soon.