The development of artificial intelligence shows no sign of stopping. The popular text generator ChatGPT now has new functions: it can hear, speak and see. But what exactly does that mean? What do the new features look like, and what other functions are possible in the future?
The new functions
ChatGPT has been able to hear for some time thanks to APIs (application programming interfaces). An API works like a digital intermediary: it passes information from one interface, e.g. an app, to another. With the new listening function, prompts can be entered directly by voice and no longer have to be typed. Speech recognition is handled by Whisper, a reliable system for converting speech into text.
The speech function of ChatGPT is somewhat newer. It works similarly to, e.g., Google Assistant. This makes it possible to hold a conversation with the chatbot, even if such conversations still feel halting. Even so, it allows a more interactive use. When a prompt is given by voice, the answer is spoken as well. For this, OpenAI worked with professional voice actors. Users can choose from five different voices. The speech function also offers a further benefit: blind people and people with dyslexia can have texts read aloud directly and do not need an extra program for that.
The visual function means that ChatGPT can now perform image analysis. To do so, one uploads one or more pictures and, if needed, marks the area that should be analyzed. This can help children with homework, for example: they can take a picture of the exercise and get support from the artificial intelligence.
This makes life easier for parents who work from home and want to help their children at the same time. The chatbot can then explain the results and discuss them. That way, adjustments can be made faster and more easily. One can also photograph one's own fridge and upload the picture to the app, for example, if one does not know what to cook. ChatGPT then quickly comes up with a suitable recipe.
With the new features, we can expect further extensions in the future. It is conceivable that the chatbot will also be able to hear and see spoken content, e.g. YouTube videos, and then process it. This would make the corpus even larger and the results more valuable.
The conversations that can be held with ChatGPT are not yet fluent. However, one can expect this to improve quickly. We may soon be able to have real-time conversations with the chatbot, as with the voice assistants Siri from Apple and Alexa from Amazon.
Such voice assistants have been well received and have been in regular use ever since. This shows that the speech function can be worthwhile for ChatGPT as well. In the future, it could be used in sales as well as in customer service.
Beyond that, it is possible that the AI will soon recognize and assess emotions. As a result, prompts would be understood even better and texts could be generated more precisely. They would match users' expectations more closely, and emotions could be taken into account.
ChatGPT already helps us in many situations. The artificial intelligence produces texts of every kind on request. With the new speech and listening functions, using it on the go is even easier and faster.
Furthermore, it makes life easier for people with disabilities. The visual function, in turn, can be especially helpful in everyday family life. In the future, we can expect more helpful functions building on the new features. The AI program can be a huge advantage that should be used today. Especially when solutions to problems are needed quickly, the chatbot can be a good help.
Maximilian Schmidt is CEO of CPI Technologies GmbH. The company specializes in software development in the areas of artificial intelligence, blockchain and digital product development.
Statements by the author and the interviewee do not necessarily reflect the opinion of the editors and the publisher.