We’re interacting with artificial intelligence (AI) online not only more than ever before, but more than we realize. To find out whether people can tell the difference, researchers asked participants to talk to four agents: one human and three different AI systems.
The “Turing test,” first proposed as an “imitation game” by computer scientist Alan Turing in 1950, assesses whether a machine’s conversational behavior is indistinguishable from that of a human. To pass the Turing test, a machine must be able to talk to someone and fool them into thinking it is human.
The scientists revisited this test, asking 500 people to talk to four respondents, including a human and the 1960s-era ELIZA program, as well as GPT-3.5 and GPT-4, the AI models that power ChatGPT. The conversations lasted five minutes, after which participants had to say whether they believed they had been talking to a human or an AI. In the study, published May 9 on the arXiv preprint server, the researchers found that participants judged GPT-4 to be human 54% of the time. ELIZA, a system pre-programmed with responses but lacking a large language model (LLM) or neural network architecture, was judged to be human just 22% of the time. GPT-3.5 scored 50%, while the human participant scored 67%.
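To make the scoring concrete, here is a minimal sketch, in Python, of how pass rates like these can be tallied from individual verdicts. It is illustrative only, not the study’s own code: the pass_rates helper and the sample records are hypothetical, while the percentages cited above (54%, 50%, 22%, 67%) are the figures the study reported.

```python
# Minimal sketch (hypothetical, not from the study): each record is one
# participant's verdict on a five-minute conversation with one witness,
# True meaning "judged to be human".
from collections import defaultdict

def pass_rates(judgments):
    """judgments: iterable of (witness_name, judged_human: bool) pairs."""
    counts = defaultdict(lambda: [0, 0])  # witness -> [judged_human, total]
    for witness, judged_human in judgments:
        counts[witness][0] += judged_human  # bool counts as 0 or 1
        counts[witness][1] += 1
    return {w: judged / total for w, (judged, total) in counts.items()}

# Hypothetical sample records; across 500 participants the study reported
# GPT-4 at 54%, GPT-3.5 at 50%, ELIZA at 22% and the human at 67%.
sample = [("GPT-4", True), ("GPT-4", False), ("ELIZA", False), ("Human", True)]
print(pass_rates(sample))  # {'GPT-4': 0.5, 'ELIZA': 0.0, 'Human': 1.0}
```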
“Machines can confabulate, mixing up plausible ex-post-facto justifications for things, just like humans do,” Nell Watson, an AI researcher at the Institute of Electrical and Electronics Engineers (IEEE), told Live Science.
“They can be cognitively biased, misled and manipulated, and become increasingly deceptive. All of these elements mean that human frailties and quirks are expressed in AI systems, making them more human-like than previous approaches that had little more than a list of ready-made answers.”
The study, which builds on decades of attempts to get AI agents to pass the Turing test, echoed common concerns that AI systems deemed human would have “widespread social and economic consequences.”
The scientists also argue that there are valid criticisms of the Turing test as too simplistic in its approach, noting that “stylistic and socio-emotional factors play a larger role in passing the Turing test than traditional notions of intelligence.” This suggests that we have been looking for machine intelligence in the wrong place.
“Raw intellect only goes so far. What really matters is being intelligent enough to understand a situation and the skills of others, and having the empathy to tie these elements together. Capabilities are only a small part of AI’s value; its ability to understand the values, preferences and boundaries of others is also essential. It is these qualities that will enable AI to serve as a faithful and reliable gatekeeper to our lives,” Watson said.
Watson added that the study poses a challenge for future human-machine interaction, and that we will become increasingly paranoid about the true nature of our interactions, especially on sensitive topics. She also said the study highlights how much AI has changed in the GPT era.
“ELIZA was limited to canned responses, which greatly restricted its capabilities. It could fool someone for five minutes, but the limitations soon became clear,” she said. “Language models are endlessly flexible, able to synthesize responses on a broad range of topics, speak in particular languages or sociolects, and present themselves with character-driven personality and values. It’s an enormous step up from something hand-programmed by a human being, no matter how cleverly and carefully it was done.”