GPT-4 Surprises Experts by Scoring a B on Quantum Computing Final Exam
UT Austin Professor Scott Aaronson recently put OpenAI’s GPT-4 AI language model to the test by having it take the 2019 final exam for his Introduction to Quantum Information Science course. The impressive performance of GPT-4 comes hot on the heels of the AI model scoring an A on economist Bryan Caplan’s Labor Economics midterm. Caplan, initially skeptical of AI, has since become a believer in GPT technology, stating that “AI enthusiasts have cried wolf for decades. GPT-4 is the wolf. I’ve seen it with my own eyes.”
The AI model’s achievement is particularly noteworthy considering GPT-4 took the exam without any prior exposure to the specific course materials or experiences that human students would have had access to, such as weekly problem sets, recitation sections, office hours, a midterm, and a practice final. Despite this apparent disadvantage, GPT-4 managed to score 73 out of 100 on the final exam, just slightly below the student average of 74.4. Importantly, the exam had not been previously posted online, ensuring that GPT-4 had no chance of encountering it during its training.
To conduct the test, Professor Aaronson collaborated with his PhD student Justin Yirka, who graded the exam as he would for any other student. They provided GPT-4 with the LaTeX source code of the exam, which the AI model skillfully processed and understood. Impressively, GPT-4 was even able to handle quantum circuits using the qcircuit package or, alternatively, provide English descriptions of the circuits.
GPT-4 showcased its strengths in true/false questions and conceptual questions—areas where many students usually encounter difficulties. However, the AI model did not perform as well in calculation questions, often recognizing the type of calculation required but failing to execute it correctly. The researchers did not use the new WolframAlpha interface for this test, which could potentially have improved GPT-4’s performance on calculation questions.
While GPT-4’s performance does not map directly to a letter grade, Aaronson suggests that it corresponds to a solid B. This achievement demonstrates the rapid advancements being made in AI capabilities, particularly in specialized fields like quantum computing. Furthermore, GPT-4’s impressive performance may have implications for the future of AI-assisted learning and research in various disciplines.
It is worth noting that the students who took the exam had the advantage of completing course-related activities that provided them with a clearer understanding of what to expect in the final exam. GPT-4, on the other hand, was “flying blind,” relying only on its extensive knowledge of the public internet, which presumably included other people’s quantum computing homework sets and exams. Aaronson and Yirka believe that GPT-4’s performance could potentially improve with fine-tuning or few-shot prompting using other exams or lecture notes from the course.
For anyone interested in replicating the test, Aaronson used the GPT-4 chat model in playground with a temperature of 0.2 and a max length of 1930 tokens. The successful completion of this test by GPT-4 raises questions about the future potential of AI in academic settings and the extent to which AI can advance our understanding of complex topics like quantum computing.
{
"seed": 9144425,
"used_random_seed": true,
"negative_prompt": "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts,signature, watermark, username, blurry, artist name",
"num_outputs": 1,
"num_inference_steps": 175,
"guidance_scale": 7.5,
"width": 512,
"height": 512,
"vram_usage_level": "high",
"sampler_name": "euler",
"use_stable_diffusion_model": "neverendingDreamNED_bakedVae",
"use_vae_model": "vae-ft-mse-840000-ema-pruned",
"stream_progress_updates": true,
"stream_image_progress": false,
"show_only_filtered_image": true,
"block_nsfw": false,
"output_format": "jpeg",
"output_quality": 75,
"metadata_output_format": "json",
"original_prompt": "GPT-4 AI Model Scored a B on Quantum Computing Final Exam, deepleaps.com, 4k, 8k, ultra highres, raw photo in hdr, sharp focus, intricate texture",
"active_tags": [],
"inactive_tags": [],
"use_upscale": "RealESRGAN_x4plus",
"upscale_amount": "4",
"use_lora_model": "",
"prompt": "GPT-4 AI Model Scored a B on Quantum Computing Final Exam, deepleaps.com, 4k, 8k, ultra highres, raw photo in hdr, sharp focus, intricate texture",
"use_cpu": false
}