It analyzes images and speaks multiple languages… What is really new and amazing about GPT-4?

Last Tuesday, March 14, OpenAI announced the highly anticipated GPT-4, the latest version of the language model behind the most popular chatbot, ChatGPT.
The previous version, GPT-3.5, is a neural network that learns to generate text by identifying billions of distinct patterns in the way humans combine words, numbers, and symbols, drawing on the millions of resources available on the Internet, such as Wikipedia, articles, books, and human conversations, to produce texts and answers based on what it has learned.

To begin with, it should be noted that GPT-4 does power ChatGPT, but only in the paid tier, which costs $20 per month; unlike its predecessor, the new version is not available for free public use or testing, though it may reach the free tier in the future. GPT-4 is also understood to power the chatbot of Microsoft's Bing search engine, as the company has stated, and to have done so since its preview launch last month.

We already know how well ChatGPT can answer complex questions, write texts, and even compose poetry, all based on the older model, GPT-3.5. The whole world was therefore waiting for the announcement of the newest version, GPT-4, after the many expectations, speculations, and rumors that circulated in recent months about the power of this new model. Is there really a meaningful difference? And what does GPT-4 actually offer?

A bot that can see!

Well, the first difference is that the new GPT-4 model is multimodal, meaning it can analyze both text and images, and this is the biggest difference between it and its predecessors. When you show it an image, it can analyze the components of that image, relate them to the question you ask, and generate an answer. For example, you can show it a picture of the contents of your fridge and ask what meal you could prepare; the bot will analyze the picture, work out what those contents are, and then suggest a number of meals you can make with the ingredients it found.
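As a rough illustration of what such an image-plus-text prompt could look like in practice, here is a minimal sketch using a recent version of OpenAI's official Python SDK; the model name, image URL, and question are placeholders, and image input was not yet open to the public at the time of the announcement.

```python
# Minimal sketch: sending an image plus a question to a vision-capable GPT-4 model.
# Assumes `pip install openai` and an OPENAI_API_KEY in the environment;
# the model name and image URL below are placeholders, not guaranteed endpoints.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # placeholder for any vision-capable GPT-4 model
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What meals could I prepare with these ingredients?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/fridge.jpg"}},
            ],
        }
    ],
    max_tokens=300,
)

print(response.choices[0].message.content)  # the model's suggested meals
```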

We can expect several applications of this new feature. For example, during the launch the company demonstrated the model's ability to produce working code from a photo of a rough, handwritten sketch in an ordinary notebook. The company also pointed to the bot's ability to understand funny pictures, explain "memes" and what they mean, and identify what makes them funny.

As a practical application, OpenAI is also collaborating with the startup Be My Eyes, whose smartphone app uses a phone's camera, with object recognition or human volunteers, to help people with vision problems understand the environment around them. The app is being developed with the GPT-4 model in order to expand the capabilities of its "Virtual Volunteer", which will help users see the world around them.

Although the idea itself is not new, and other applications already offer something similar, OpenAI says the GPT-4 model can provide the same level of context and understanding that a human volunteer provides in the app, describing what it sees to the user accurately. For example, with the help of GPT-4, the app can recognize the colors of a garment, identify types of plants, explain how to use a machine at the gym, translate a label, offer a recipe, read a map, and perform a number of other tasks that show it genuinely understands the content of the image. The image-recognition feature will not be available for general use until it has been tested and proven effective within the Be My Eyes app first.

More logical!

OpenAI also says that its new model, GPT-4, is better than the current ChatGPT at tasks that require creativity or logical reasoning, such as summarizing a text or article. As a New York Times experiment showed, the new model produces an accurate and correct summary of an article, and if a random sentence is slipped into that summary and the bot is asked whether the summary is valid, it points out the extraneous sentence.

The new version can also take on different personalities, through what is known as "steerability", which refers to the bot's ability to change its behavior and the way it speaks on demand. The current version of ChatGPT usually speaks in a fixed tone and style, but in the new version the user can request a particular persona, and the bot will adopt a style and tone to match it. In addition, GPT-4 outperforms the current model in passing exams designed for humans, such as the law school admission test.
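In OpenAI's chat API, this kind of steering is exposed through the system message, which sets the persona before the conversation begins. Below is a minimal sketch assuming a recent version of the openai Python package; the persona text is only an illustration.

```python
# Minimal sketch: steering the bot's persona with a system message.
# Assumes `pip install openai` and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        # The system message sets the personality and tone the bot should adopt.
        {"role": "system", "content": "You are a patient teacher who explains things to a ten-year-old."},
        {"role": "user", "content": "What does a chatbot's 'memory' mean?"},
    ],
)

print(response.choices[0].message.content)
```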

Better handling!

As we know, these large language models are trained on millions of web pages, books, articles, and other text data, but during an actual conversation with a user there is a limit to how much the model can hold in its short-term memory. This memory is measured not in words but in tokens. For the GPT-3.5 model the limit was 4,096 tokens, which is roughly 3,000 words, or about six pages of a book, so if a conversation exceeds that limit, the bot may lose track of what was said earlier.

With the GPT-4 model, however, the maximum is more than 32,000 tokens, which translates to about 25,000 words, or roughly 50 pages of text: enough to write a short story or process an entire research paper at once. Put simply, during a conversation or while writing text the bot can keep up to about 50 pages in its short-term memory, so it will remember what you discussed on, say, the tenth page of the conversation, and when writing a long story or article it can refer back to events from 20 pages earlier.
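To get a feel for how words map to tokens, here is a small sketch using tiktoken, OpenAI's open-source tokenizer; the cl100k_base encoding is the one used by the GPT-3.5 and GPT-4 chat models, and exact counts vary with the text.

```python
# Rough sketch: counting tokens with OpenAI's open-source `tiktoken` tokenizer.
# Assumes `pip install tiktoken`; "cl100k_base" is the encoding used by the
# GPT-3.5 and GPT-4 chat models.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "ChatGPT keeps only a limited number of tokens in its short-term memory."
tokens = enc.encode(text)

print(len(text.split()), "words ->", len(tokens), "tokens")
# A token is, on average, roughly three quarters of an English word, so
# 4,096 tokens is about 3,000 words and 32,768 tokens about 25,000 words.
```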

Of course, English remains the dominant language in the world of data, especially artificial intelligence data, but the GPT-4 model has taken a step towards a chatbot that can converse in more than one language, by accurately answering about 14,000 multiple-choice questions covering 57 different subjects, translated into 26 different languages, including Arabic, Italian, Turkish, Japanese, and Korean.

This initial test of the bot's language abilities is promising, but it still falls far short of proving that the bot can truly work in multiple languages, because the test items themselves were translated from English, and multiple-choice questions do not really represent normal conversation in its natural context. Still, the bright side is that the model passed a test it was not specifically trained for, which bodes well for GPT-4 being more useful to non-native English speakers.

Despite everything ChatGPT offers today, there are tricks that can mislead it into producing illegal or harmful content and conversations. OpenAI states that its new model has been trained on the many malicious and offensive prompts it received from users over the past period. The company says it spent six months making GPT-4 safer and more accurate, and that the new model is 82% less likely than GPT-3.5 to respond to requests for prohibited content.

In addition, it is about 60% less likely to fabricate facts and information, although the problem has not disappeared. This is what is known as "hallucination": the bot gives a confident answer that has no basis in the data it was trained on, the same thing that happened with Google's Bard chatbot when it was announced last month.

Smart tools everywhere!

It is clear that large language models are beginning to find their way into many of the tools we use today. In addition to search engines such as Bing, OpenAI has announced that it is collaborating with several other companies that are integrating the new GPT-4 model into the services they offer users, such as Khan Academy, which aims to use artificial intelligence to assist students during courses and to help teachers generate lesson ideas. The popular language-learning app Duolingo has also integrated the new model into its paid service to provide a similarly interactive learning experience.

On the same day the new model was announced, March 14, Google also announced a host of AI features coming to its various productivity applications, such as Google Docs, Gmail, and Sheets. These include new ways to compose, summarize, and brainstorm text with AI in Google Docs, much as many people now use ChatGPT; the ability to draft complete email messages in Gmail from short notes the user provides; and the ability to generate images, audio, and video with artificial intelligence in the company's presentation app, similar to the features in Microsoft Designer, which is powered by the DALL-E image-generation service developed by OpenAI.

So even if you have not tried ChatGPT yet, expect to see these AI-powered tools in front of you in most of the applications you use for work or study in the coming period. That is good news either way: even if you do not follow the march of artificial intelligence, you will get a better experience in these applications than you were used to before.

sciencedz
