Language applications can generate quite convincing text, and chatbots are successfully used by many stores, banks, and institutions.
At first glance, it seems that natural language understanding systems have come a long way. However, experts at the Allen Institute for Artificial Intelligence say that artificial intelligence has not come close to genuinely understanding text.
To assess how well AI understands the meaning of what it reads, the Winograd Schema Challenge is used. Created in 2011, the test consists of 273 tasks, each containing two sentences that differ in a single word.
For example: “The trophy does not fit in the brown suitcase because it is too large” and “The trophy does not fit in the brown suitcase because it is too small.” The task is to determine which noun the pronoun “it” refers to. For a person, the answer is obvious: in the first case the trophy, in the second the suitcase.
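The defining property of such a pair is that the two sentences differ in exactly one word, and that word flips the correct referent. A minimal sketch of how this could be checked programmatically (the function name and representation are illustrative, not part of the actual benchmark):

```python
def special_words(s1, s2):
    """Return the word pairs on which two schema sentences differ."""
    return [(a, b) for a, b in zip(s1.split(), s2.split()) if a != b]

pair = special_words(
    "The trophy does not fit in the brown suitcase because it is too large.",
    "The trophy does not fit in the brown suitcase because it is too small.",
)
print(pair)  # -> [("large.", "small.")]
```

Swapping that single word changes the answer from "trophy" to "suitcase", which is what makes the test hard for systems that rely on surface statistics rather than meaning.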
For AI, this is a difficult task, long believed to be impossible to solve without understanding meaning. Yet modern programs handle it with about 90% accuracy. But does this mean unprecedented progress has been made in natural language understanding?
Researchers therefore created a new test, WinoGrande, containing 44,000 questions, all written by Amazon Mechanical Turk workers. Only tasks that at least two-thirds of human annotators solved correctly were kept, so that the answers were unambiguous, and tasks whose pronouns could be resolved through simple word associations were filtered out.
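The agreement filter described above can be sketched in a few lines. This is an illustrative reconstruction, not the actual WinoGrande pipeline; the task structure and field names are assumptions:

```python
def filter_by_agreement(tasks, threshold=2 / 3):
    """Keep only tasks that enough human annotators answered correctly."""
    kept = []
    for task in tasks:
        votes = task["annotator_answers"]  # hypothetical field names
        correct = sum(1 for v in votes if v == task["answer"])
        if correct / len(votes) >= threshold:
            kept.append(task)
    return kept

tasks = [
    {"answer": "trophy", "annotator_answers": ["trophy", "trophy", "suitcase"]},
    {"answer": "trophy", "annotator_answers": ["suitcase", "trophy", "suitcase"]},
]
print(len(filter_by_agreement(tasks)))  # -> 1: only the first task survives
```

Filtering this way trades dataset size for answer quality: a task that confuses a third of humans cannot serve as a clean test of machine understanding.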
Humans answered with 94% accuracy, while programs scored between 60% and 80%. A harder test, in other words, set AI far back.
Another experiment shows that natural language systems are easy to fool. The TextFooler program replaces meaningful words in a sentence with synonyms.
Example: “Characters who find themselves in incredibly contrived situations are completely alienated from reality” versus “Characters who find themselves in incredibly artificial circumstances are alienated from reality.” There is little difference for a human reader, but there is for the program: the accuracy of Google’s BERT neural network immediately dropped by a factor of five to seven.
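The core idea behind such attacks can be sketched in miniature: try synonym swaps and keep any swap that flips the model's prediction. The real TextFooler ranks words by importance and picks synonyms via word embeddings; the synonym table and classifier below are stand-ins for illustration only:

```python
# Hypothetical synonym table; TextFooler derives candidates from embeddings.
SYNONYMS = {"contrived": "artificial", "situations": "circumstances"}

def naive_classifier(text):
    # Stand-in for a model like BERT: keys on a single surface-level cue.
    return "negative" if "contrived" in text else "positive"

def attack(sentence, classifier):
    """Return an adversarial variant of the sentence, or None if none found."""
    original = classifier(sentence)
    for word, synonym in SYNONYMS.items():
        candidate = sentence.replace(word, synonym)
        if classifier(candidate) != original:
            return candidate  # prediction flipped: adversarial example found
    return None

adv = attack("Characters in contrived situations are alienated from reality.",
             naive_classifier)
print(adv)  # -> "Characters in artificial situations are alienated from reality."
```

The toy classifier fails for the same structural reason BERT does in the article's example: it leans on particular surface words rather than the sentence's meaning, so a meaning-preserving substitution changes its verdict.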
Why does this matter? By changing just a few pixels, you can fool image recognition systems and, for example, bypass verification on a website. TextFooler shows that you can likewise outsmart the AI behind assistants like Siri, Alexa, and Google Home, as well as language classifiers, hate speech detectors, and spam filters. At the same time, the researchers believe that by exposing these shortcomings, TextFooler will help train existing programs better.