A New AI Milestone: Passing an 8th Grade Science Exam

September 23, 2019 - 8 minute read

In 2015, 700 artificial intelligence (AI) developers competed to win $80,000 in prize money. Their objective? To build an AI system capable of passing an eighth-grade science exam.

Unfortunately, all submitted systems failed to accomplish this, with the best contender answering only 60% of questions on the test correctly. At the time, AI was no match for the language and logic prowess of your typical student entering high school.

Fast forward four years, and it seems that AI has “hit the books” hard. A few weeks ago, the Allen Institute for Artificial Intelligence unveiled a new AI system that can correctly answer more than 90% of questions on an eighth-grade science exam and more than 80% of questions on a 12th-grade test.

A Digital Aristotle

Founded in 2013 by Microsoft co-founder Paul Allen, the Seattle-based Allen Institute for Artificial Intelligence immediately set out to build a “digital Aristotle.” That project, now known as Aristo, and its recent accomplishments are a testament to the breakneck speed at which AI is advancing.

Over just the past several months, the world’s top research labs have rapidly improved AI’s ability to comprehend natural language and to mimic decision-making that was previously found only in humans. As a result, machines can now analyze documents, identify key information, and answer questions drastically better than before.

From the start, Aristo was designed to master multiple-choice tests. To hone the system’s skills, researchers had it take exams written for students in New York State. Any questions that included pictures or diagrams were removed, since answering them would require extra capabilities such as computer vision.

Still, what Aristo has achieved is nothing to scoff at. While some questions could be answered through information retrieval alone, others required “connecting the right dots” through logic, like this one (the correct answer is 4):

Which change would most likely cause an increase in the number of squirrels living in an area?

(1) a decrease in available food

(2) an increase in competition between the squirrels

(3) an increase in the number of predators

(4) a decrease in the number of forest fires

A More Meaningful Benchmark for AI?

At this point, you’re probably wondering why the team at the Allen Institute decided to build Aristo in the first place. Basically, they saw standardized science tests as a more meaningful AI benchmark than alternatives such as chess, backgammon, or Go. Science tests can’t be mastered just by learning rules; they require logic as well.

Aristo’s achievements have attracted worldwide attention. But they’ve also drawn skepticism from many researchers who think AI is still a long way from mastering language or replicating general intelligence. Microsoft AI researcher Jingjing Liu says, “We can’t compare this technology to real human students and their ability to reason.”

The CEO of the Allen Institute, Oren Etzioni, PhD, sees things a little differently. He believes Aristo’s progress has the potential to improve a host of products and services, such as medical record-keeping systems and Internet search engines.

“This has significant business consequences,” explains Etzioni. “What I can say — with complete confidence — is you are going to see a whole new generation of products, some from start-ups, some from the big companies.”

Tempering Expectations

Rather than expressing unbridled optimism or cautious skepticism, Jeremy Howard, a founding researcher at Fast.ai, an AI lab in San Francisco, emphasizes that it’s still too early to say what will become of these recent AI advancements. “We are at the very early stage of this. We are so far away from the potential that I cannot say where it will end up.”

The media has a habit of hyping any progress in the world of AI. Case in point: when a London-based AI lab created a system that beat the world’s best Go players, headlines hailed it as a new era for the technology. But behind the scenes, people like Dr. Etzioni knew this was far from the case. After all, the eighth-grade science test competition that the Allen Institute had held shortly before stumped every AI system that entered it.

Things began to turn around when AI developers started doubling down on neural networks that could learn the intricacies of language by analyzing written works such as articles and books. Neural networks are complex mathematical systems that learn how to accomplish tasks by analyzing immense amounts of data. For example, if you show a neural network thousands of dog photos, it will begin to identify patterns and learn how to recognize dogs in future pictures.
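To make the idea of “learning from data” a little more concrete, here is a minimal Python sketch of a single artificial neuron (a perceptron), the simplest building block of a neural network. It is a heavy simplification and an assumption on our part: instead of photos, each example is just two hypothetical numbers (invented here as “ear floppiness” and “snout length” on a 0 to 1 scale), and the tiny dataset is made up for illustration.

```python
# Hypothetical toy data: (ear_floppiness, snout_length) -> 1 for "dog", 0 for "not dog".
# These features and values are invented; real systems learn from raw pixels.
EXAMPLES = [
    ((0.9, 0.8), 1),
    ((0.1, 0.2), 0),
    ((0.8, 0.9), 1),
    ((0.2, 0.1), 0),
    ((0.7, 0.7), 1),
    ((0.3, 0.2), 0),
]


def predict(weights, bias, features):
    """Fire (return 1) if the weighted sum of the inputs plus the bias is positive."""
    activation = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(examples, epochs=20, learning_rate=0.1):
    """Classic perceptron rule: after each wrong guess, shift the weights toward the right answer."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            error = label - predict(weights, bias, features)  # -1, 0, or +1
            weights = [w + learning_rate * error * x for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias


if __name__ == "__main__":
    weights, bias = train_perceptron(EXAMPLES)
    # A new, unseen "dog-like" example.
    print(predict(weights, bias, (0.85, 0.75)))  # -> 1
```

Real image classifiers stack millions of such units and learn their own features from the data; this sketch only shows the core loop of nudging weights whenever a prediction turns out to be wrong.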

One of the biggest breakthroughs in natural language analysis came from Bert, a system built by Google researchers that analyzed the text of English Wikipedia along with thousands of books spanning genres such as science fiction and romance. Eventually, Bert learned how to guess the missing word in a sentence, and in the process it picked up insights into how language is fundamentally constructed.
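The fill-in-the-blank idea can be illustrated with a toy Python sketch. To be clear, this is not how Bert works internally: Bert is a large neural network that reads the whole sentence, whereas the hypothetical `train` and `guess` helpers below simply count which word tends to appear between a given pair of neighbors. What the sketch shares with Bert’s training is the setup of hiding a word and asking the model to recover it. The `[MASK]` placeholder mirrors the token Bert uses for hidden words; the three-sentence corpus is invented.

```python
from collections import Counter, defaultdict

MASK = "[MASK]"  # placeholder token standing in for the hidden word

# Invented three-sentence "training corpus".
CORPUS = [
    "The cat sat on the mat",
    "The dog sat on the rug",
    "A cat sat on the sofa",
]


def masked_examples(sentence):
    """Yield one (masked_words, hidden_word) pair per position in the sentence.

    Bert's training hides a random ~15% of tokens instead of every position
    in turn; this simplification keeps the output predictable.
    """
    words = sentence.lower().split()
    for i, word in enumerate(words):
        yield words[:i] + [MASK] + words[i + 1:], word


def neighbors(words, i):
    """Return the words on either side of position i, with sentence-boundary markers."""
    left = words[i - 1] if i > 0 else "<s>"
    right = words[i + 1] if i + 1 < len(words) else "</s>"
    return left, right


def train(corpus):
    """Count which hidden word appears between each (left, right) pair of neighbors."""
    table = defaultdict(Counter)
    for sentence in corpus:
        for masked, hidden in masked_examples(sentence):
            table[neighbors(masked, masked.index(MASK))][hidden] += 1
    return table


def guess(table, masked_sentence):
    """Return the most frequent word seen in the mask's context, or None if the context is new."""
    words = [w if w == MASK else w.lower() for w in masked_sentence.split()]
    candidates = table.get(neighbors(words, words.index(MASK)))
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


if __name__ == "__main__":
    table = train(CORPUS)
    print(guess(table, "My cat [MASK] on a chair"))  # -> sat
```

Because this counter only looks at the two neighboring words, it has nothing to say about any context it has never seen. Bert, by contrast, uses the full sentence on both sides of the blank and can generalize to wording it was never shown.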

AI’s Still at an Early Stage

The Allen Institute used Bert’s technology as the foundation for Aristo. By feeding Bert thousands of questions and answers, Aristo began to learn how to answer similar questions. Language models like Bert are now catalyzing a number of research projects around the world such as tools to spot fake news and conversational chatbots.

Today, you can see the effects of these advancements all around you in consumer AI applications. But AI still has a long way to go. Liu and her team of Microsoft researchers tried building a system capable of tackling the Graduate Record Examinations (GRE). Thanks to progress in natural language processing, the language section of the test was doable. But the math section remained beyond the system’s abilities because of the reasoning it required.

While they are an obstacle now, questions like these will probably stump AI for only a few more years. Still, how that will happen is anyone’s guess. For now, it’s best to reserve judgment and see what unfolds. What do you think the near future of AI looks like? What will this technology be able to accomplish? Let us know your thoughts in the comments below!