
Don't just criticize Google Bard, Microsoft's New Bing powered by ChatGPT also has frequent errors

WBOY
2023-05-10 11:07

On February 8 at 8:30 EST, Google held a press conference in Paris. The day before, Microsoft had officially launched New Bing, a new generation of AI-driven search engine that integrates a generative model based on ChatGPT technology into Bing. Microsoft Vice President Yusuf Mehdi gave a flawless demonstration [0], and Microsoft's market value jumped by $80 billion that day. Even in China, where OpenAI does not accept registrations, clips of Yusuf showing how the generative model enhances the Bing search engine and the Edge browser went viral in WeChat Moments and group chats. One man's honey is another man's arsenic, and everyone was waiting to see how the search giant Google would respond.

At the Google press conference, everyone was waiting for Bard, the legendary rival to New Bing, to appear. As the large language model backing the Google search engine, Bard had everyone daydreaming. However, the press conference said little about Bard, so attention turned to the Bard video Google had posted on Twitter. On close inspection, viewers discovered that Bard made a factual error when answering a question.

When asked, “What can I tell my nine-year-old kid about the new discoveries from the James Webb Telescope?” Bard responded: “The first photo of an exoplanet. It was taken by the James Webb Telescope.” In fact, that photo was taken by the European Southern Observatory's Very Large Telescope in 2004, 17 years before the James Webb Telescope was launched. This mistake triggered the plunge in Google's stock price that day.


Figure 1 Screenshot of Bard’s demonstration on the James Webb Telescope

At the Paris press conference, even though Bard's segment lasted only about four minutes, its answer about the best time to observe constellations also contained an obvious factual deviation. As shown below, Bard's answer said that the best time to observe Orion is from November to February.



Figure 2 Screenshot of Bard's demonstration about constellation observation times

Different sources give different ranges for the best time to observe Orion, but all of them clearly indicate that the best observation period starts in January each year. The edtech website BYJU'S gives January to March [1], while Wikipedia gives January to April [2].


Figure 3 The answer from BYJU'S on the best time to observe Orion

Between the gap in polish relative to the New Bing press conference and the factual errors that were dug up, Google's market value plummeted by nearly $100 billion that day, Bard was widely mocked, and the event was jokingly called the most expensive press conference in history. We can't help but wonder: are there factual errors hiding in New Bing's seemingly perfect press conference as well?

New Bing’s factual errors

We found that the content generated by New Bing contained many factual errors, covering celebrities' biographical details, financial report figures, nightclub opening hours, and more.

Classifying the factual errors of generative models

For generative models represented by the GPT series (including ChatGPT, InstructGPT, etc.) and T5, factual errors can be roughly divided into the following two categories:

  • The generated content conflicts with the referenced content. As the generated sequence grows, large language models are prone to drifting away from the reference, adding to, deleting from, or tampering with the original text.
  • The generated content has no factual basis. This kind of error is pure fabrication: without factual grounding, a model relying solely on the information stored during pre-training can easily lose its way during generation and, with high probability, produce content that contradicts the facts or is irrelevant to the question. A minimal sketch of how one might flag both error types follows this list.
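
To make the two categories concrete, here is a minimal, hypothetical Python sketch of how one might screen generated text against a reference: numbers that contradict or go beyond the reference are flagged for review. The regex heuristic and the toy strings are illustrative assumptions, not a production fact-checker, and this crude check cannot distinguish type 1 from type 2 errors on its own.

```python
import re

def extract_numbers(text: str) -> set[str]:
    """Pull numeric tokens (percentages, dollar amounts, years) out of a string."""
    return set(re.findall(r"\$?\d+(?:\.\d+)?%?", text))

def classify_claims(reference: str, generated: str) -> dict[str, set[str]]:
    """Split the numbers in the generated text into supported vs. unsupported.

    Unsupported numbers cover both error types discussed above: figures that
    contradict the reference (type 1) and figures with no factual basis (type 2).
    """
    ref_nums = extract_numbers(reference)
    gen_nums = extract_numbers(generated)
    return {
        "supported": gen_nums & ref_nums,
        "unsupported": gen_nums - ref_nums,
    }

# Toy example mirroring the Gap financial-report case discussed below.
reference = "Operating margin was 4.6%; adjusted operating margin was 3.9%."
generated = "Gap reported an adjusted operating margin of 5.9%."
print(classify_claims(reference, generated))
# {'supported': set(), 'unsupported': {'5.9%'}} -> flag for human review
```

A matcher this naive misses paraphrases and unit changes, so a real pipeline would pair it with human review or an entailment model; it is only meant to make the taxonomy tangible.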

Now let's examine the examples shown in the New Bing press conference [3] and the New Bing demos [4] to see whether they contain factual errors, and of what types. For convenience, we refer to both New Bing and the New Bing plug-in integrated into Edge simply as New Bing.

Errors in the Japanese poet example

At 29:57 of the New Bing press conference video, when New Bing was asked about well-known Japanese poets, its answer included "Eriko Kishida (1930-2004), poet, playwright, and essayist."


Figure 4 Screenshot of the poet example in the New Bing demo

However, according to Wikipedia and IMDb [5, 6, 7], Eriko Kishida was born in 1929 and died in 2011, and she was not a playwright or essayist but a poet, translator, and writer of children's stories. Kishida's family would be hard pressed to accept New Bing erasing eight years of her life. Gackt fared no better: according to Wikipedia [8], Gackt performs music, sings, composes, and acts, but has never written poetry.

Errors in financial report examples

At 35:49 of the New Bing press conference video, Yusuf demonstrated how the Edge browser with New Bing integrated can generate key takeaways from the document open in it, namely the clothing company Gap's 2022 third-quarter financial report. At first glance, New Bing's summary looks very practical, laying out the report's key points as bullets; even Buffett might be "amazed" at the sight. However, when we found Gap's 2022 third-quarter report [9] and read it carefully, we discovered that New Bing's summary was so full of errors and omissions that it was painful to look at.

Figure 5 New Bing's summary of Gap's 2022 third-quarter financial report

First of all, New Bing gave Gap's adjusted operating margin (the reported operating margin, adjusted for impairment charges and restructuring costs) as 5.9%. In the financial report, however, Gap's operating margin was 4.6%, or 3.9% after adjustment.

Figure 6 Screenshot of Gap's 2022 third-quarter financial report

New Bing then reported adjusted diluted earnings per share of $0.42 (diluted earnings per share, adjusted for impairment charges, restructuring costs, and tax impact), but the figure in the financial report is $0.71.

Figure 7 Screenshot of Gap's 2022 third-quarter financial report

New Bing even gave Gap's full-year sales guidance as "net sales growth expected in the low double digits," when in fact the report guides that fourth-quarter net sales "could decline mid-single digits." A decline rather than an increase: the difference between those two words could seriously mislead users' investment decisions, and who would bear the losses then? New Bing went further and produced full-year financial guidance out of nowhere: "operating margin of 7%, and diluted earnings per share between $1.60 and $1.75." These figures appear nowhere in Gap's third-quarter report.


Figure 8 Screenshot of Gap's 2022 third-quarter financial report

At 36:15 in the video, Yusuf demonstrated using New Bing to compare the financial reports of Gap and the athleisure brand Lululemon. This part, too, is a hotbed of misinformation.


Figure 9 New Bing's financial report comparison for Gap and Lululemon

In the table New Bing generated on the right, beyond the aforementioned errors, Gap's operating margin of 5.9% should be 4.6% (or 3.9% adjusted), and Gap's diluted earnings per share of $0.42 should be $0.77 (or $0.71 adjusted). New Bing also listed Gap's cash and cash equivalents as $1.4 billion, when the financial report says $679 million.


Figure 10 Screenshot of Lululemon's 2022 third-quarter financial report

The same problems appear in the Lululemon figures New Bing produced. Checked against Lululemon's 2022 third-quarter report [10]: New Bing gave Lululemon's gross margin as 58.7%, which should be 55.9%; its operating margin as 20.6%, which should be 19.0%; and its diluted earnings per share as $1.65, which should be $2.00.


Figure 11 Screenshot of Lululemon's 2022 third-quarter financial report

We can't help but wonder: how did New Bing manage to garble Gap's and Lululemon's financial reports so badly?

A reasonable inference is that the erroneous figures come from financial-report analyses the model saw during its pre-training stage. For large language models like ChatGPT, the longer the generated sequence grows, the easier it is to break away from the given Gap and Lululemon financial data and freewheel into irrelevant, false information; a sketch of one way to observe this drift follows.
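
One hedged way to visualize this drift is to score each sentence of a generated summary by how much of it is supported by the source text; in the demos above, fidelity tends to fall off as generation proceeds. The sketch below uses made-up strings in place of the real report and summary, and content-word overlap as a deliberately crude support signal.

```python
def support_scores(source: str, summary: str) -> list[tuple[int, float]]:
    """Score each summary sentence by the fraction of its content words found in the source."""
    source_words = set(source.lower().split())
    scores = []
    for i, sentence in enumerate(summary.split(". ")):
        # Keep only longer tokens as a rough stand-in for content words.
        words = [w for w in sentence.lower().split() if len(w) > 3]
        if words:
            supported = sum(w in source_words for w in words)
            scores.append((i, supported / len(words)))
    return scores

# Toy stand-ins for Gap's report and New Bing's summary (not the real texts).
source = "net sales declined operating margin was 4.6% adjusted margin 3.9%"
summary = "Operating margin was 4.6%. Adjusted margin reached 5.9%. Guidance raised sharply."
print(support_scores(source, summary))
# [(0, 1.0), (1, 0.5), (2, 0.0)] -> support falls off as the sequence grows
```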

Errors in the nightclub example

At 29:17 in the New Bing press conference video, New Bing once again offers visitors to Mexico City's rich nightlife some "unconstructive" advice. For several of the nightclubs it recommends, such as Primer Nivel Night Club, El Almacen, and El Marra, New Bing claims the bars have no customer reviews, no contact information, and no description. Yet all of this information can be found on Google Maps or on the stores' Facebook pages.

It seems New Bing doesn't surf the web enough. New Bing lists El Almacen's business hours as 5 pm to 11 pm, Tuesday through Sunday, but the real hours are 7 pm to 3 am, closed Mondays [11], leaving tourists who arrive for dinner at five to go hungry for two more hours. Guadalajara de Noche has the opposite problem: its actual hours are 5:30 pm to 12:30 or 1:30 am every day [12], while New Bing says it opens at 8 pm. Tourists who rely on New Bing's recommendations to find a restaurant will need luck to get a meal.

Figure 12 Screenshot of the nightclub example in the New Bing demo

Other errors

In addition to the errors above, we also found a series of factual errors scattered in every corner, such as inaccurate product prices, wrong store addresses, and wrong times.

Errors in the example demos

Since New Bing is not yet fully open, we cannot directly reproduce the press conference searches in New Bing, but Microsoft provides several example demos [13] for users to try. In the spirit of getting to the bottom of things, we also put these demos under a magnifying glass. We found that even in these carefully selected examples, there is still plenty of wrong information.

In "What art ideas can I do with my kid?", New Bing gives a lot of suggestions for making handicrafts. For each craft, New Bing summarizes the materials needed to make it. However, the summary of materials for each handicraft is incomplete. For example, New Bing summarizes from the cited website [14] that making a paper guitar requires cardboard boxes, rubber bands, paint, and glue. But the sponge brush, tape, and wooden beads mentioned in the quote were left out.


Figure 13 Screenshot of the New Bing example demo "What art ideas can I do with my kid?"


Figure 14 Screenshot of the materials required to make a paper guitar, from the cited website

There is also a very obvious and common mistake in New Bing's example demos: the cited reference links have nothing to do with the generated content. For example, in the demo for "I need a big fast car.", the 2022 Kia Telluride does not appear anywhere in the given citation 10 [15]. The "time travel" problem also shows up here: New Bing claimed that the 2022 Kia Telluride won the 2020 World Car of the Year award, when in fact it was the 2020 Kia Telluride that won that year; the 2022 World Car of the Year went to the Hyundai IONIQ 5. Citation 7 [16] is likewise an article unrelated to the "2020 World Car of the Year" award. Across all the example demos, we found as many as 21 similar errors. A rough sketch of how one might automatically flag such unrelated citations appears below.
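
A first-pass filter for unrelated citations could simply check whether the key entities in a generated claim ever appear in the cited page. Below is a hypothetical sketch: the `cited_article` string is a placeholder, and real pages would need fetching and HTML stripping first.

```python
def citation_supports(claim_entities: list[str], cited_text: str) -> dict[str, bool]:
    """Check whether each key entity from a generated claim appears in the cited source."""
    text = cited_text.lower()
    return {entity: entity.lower() in text for entity in claim_entities}

# Placeholder text standing in for citation 10 from the "big fast car" demo.
cited_article = "A buying guide for family sedans, with no mention of any SUV or award."
print(citation_supports(["Kia Telluride", "World Car of the Year"], cited_article))
# {'Kia Telluride': False, 'World Car of the Year': False} -> citation looks unrelated
```

Exact substring matching would of course miss paraphrases, so this only flags candidates for human checking rather than proving a citation wrong.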


Figure 15 Screenshot of the New Bing demo example "I need a big fast car."

Summary: Finding errors guides us forward

The analysis above shows that the answers of both New Bing and Bard are prone to factual errors. While the whole world marvels at the capabilities of large language models like ChatGPT, and while ChatGPT becomes the fastest application in history to reach 100 million users, we cheer for AI's progress on the one hand, but on the other we need to think calmly about how to solve the many problems AI still has.

Since the group of geniuses who gathered at Dartmouth College in 1956 first defined what artificial intelligence is, AI has been through several ups and downs. The past 70 years of development hold many moving stories of persistence: the immature explorations of the first generation of AI researchers; the brave attempts of expert systems; scholars such as Hinton, Bengio, and LeCun toiling away at neural networks when they were out of fashion; DeepMind bringing AI into the mainstream with AlphaGo; the commitment to open source from top research institutions such as Google, Meta, CMU, Stanford, and Tsinghua; and OpenAI withstanding the pressure and betting on the GPT route. It is the relay of generations of researchers around the world that has brought us to where we are today.

However, if we allow AI to generate large amounts of untrue information, it won't be long before the public's confidence in AI is destroyed and all kinds of false information flood the Internet. We point out the errors of large models not to criticize any company or model; on the contrary, we want to make AI better.

As the Argentine poet Borges once said: any destiny, however long and complicated, actually consists of a single moment, the moment when a man knows forever who he really is. Now that large models like ChatGPT possess writing abilities comparable to humans, we know clearly that the next step is to integrate real-world knowledge into large models more completely and accurately, so that AI models can be applied safely, reliably, and widely in people's daily lives. We have never looked forward to that moment so much, and we have never been so close to it.

