In a recent study, Christian Terwiesch, the Andrew M. Heller Professor at the Wharton School of the University of Pennsylvania and Chair of the Operations, Information, and Decisions department, evaluated the performance of OpenAI’s Chat GPT3 on the final exam of a typical MBA core course, Operations Management. Professor Terwiesch is also co-director of Penn’s Mack Institute for Innovation Management and holds a faculty appointment in Penn’s Perelman School of Medicine. His research on Operations Management and on Innovation Management appears in many of the leading academic journals, and he is an award-winning teacher with extensive experience in MBA teaching and executive education. He is also the co-author of the widely used textbook “Matching Supply with Demand,” now in its third edition, and launched the first Massive Open Online Course (MOOC) in business on Coursera, which has been taken by more than half a million students.
The results of the study show that Chat GPT3 does well on basic operations management and process analysis questions, including those based on case studies, but struggles with more advanced process analysis. On the basic questions, the answers were not only correct, but the explanations were also judged excellent. However, the study also found that Chat GPT3 made surprising mistakes in relatively simple calculations at the level of 6th grade math. These mistakes, according to the study, can be massive in magnitude.
Furthermore, the present version of Chat GPT3 is not capable of handling more advanced process analysis questions, even when they are based on fairly standard templates. This includes process flows with multiple products and problems with stochastic effects such as demand variability. In these instances, the model struggled to match the problem with the appropriate solution method.
The study also found that Chat GPT3 is remarkably good at modifying its answers in response to human hints. In other words, when the model initially struggled with a problem, it was able to correct itself after receiving an appropriate hint from a human expert.
Based on its performance on the Operations Management exam, the study estimates that Chat GPT3 would have received a B to B- grade. This has important implications for business school education, among them the need for exam policies, curriculum design that focuses on collaboration between humans and AI, opportunities to simulate real-world decision-making processes, the need to teach creative problem-solving, and improved teaching productivity.
In addition to the above implications, the study highlights the need for MBA programs and faculty to be mindful of what Chat GPT3 can and cannot do, to continue to teach the foundations, and to deal with cheating when testing foundational knowledge. Many educators are concerned that their students might be using Chat GPT3 and similar technologies to cheat on homework assignments and final exams, a concern that needs to be addressed.