CAN AI PASS THE U.S. ARMY WAR COLLEGE COMPREHENSIVE EXAM?

Can AI pass military exams? Kevin Boyce & John Nagl join host Tom Spahr to discuss testing ChatGPT, Gemini, Claude & Grok on Army War College comps. All passed, but limits caused AI to degrade under pressure, proving human judgment remains indispensable. You can read the article Can AI Pass the U.S. Army War College? by Kevin Boyce, John Nagl and Kris Wheaton here https://publications.armywarcollege.edu/News/Display/Article/4472536/can-ai-pass-the-us-army-war-college/ You can find the manuscript Responsibly Pursuing Generative Artificial Intelligence (GenAI) for the War Fighter by Blair Wilcox and Anthony Pfaff here https://press.armywarcollege.edu/cgi/viewcontent.cgi?article=3365&context=parameters https://warroom.armywarcollege.edu/podcasts/ai-comps

Editor’s Note: In this episode, the participants discuss the performance of four commercially available large language models. This discussion and the testing of these models do not constitute an endorsement by the U.S. Army War College, the U.S. Army, or the Department of War. All products were obtained commercially, and no company or government agency received special consideration.

In February 2026, U.S. Army War College faculty conducted a groundbreaking experiment, administering rigorous oral comprehensive exams to four prominent AI models instead of students. By testing ChatGPT, Google Gemini, Anthropic’s Claude, and xAI’s Grok, the team benchmarked how advanced artificial intelligence handles complex strategic thinking.

Kevin Boyce and John Nagl join host Tom Spahr to discuss their methodology and remarkable findings. While all four commercial models passed, one comfortably stood out. However, the researchers discovered a critical flaw: all of these digital “students” degraded during extended questioning due to technical computing limits, with their responses growing repetitive and lazy.

This project highlights that while AI is a powerful tool for historical recall, human judgment remains indispensable for high-pressure decisions. Senior leaders should treat AI like a capable but imperfect staff officer—asking the right questions while carefully verifying the output.

You can read the article Can AI Pass the U.S. Army War College? by Kevin Boyce, John Nagl and Kris Wheaton here.

You can find the manuscript Responsibly Pursuing Generative Artificial Intelligence (GenAI) for the War Fighter by Blair Wilcox and Anthony Pfaff here.

Professor Kris Wheaton said about two years ago… that AI is a mediocre staff officer. I would argue now that it’s a good staff officer, but still hallucinations, probabilistic thinking, it’s gonna make mistakes.


Kevin Boyce is Director of the Futures Lab and Assistant Professor of Futures and Emerging Technology at the U.S. Army War College. A retired Marine Corps Aviation Command and Control Officer with 25 years of service, his research focuses on emerging technologies, AI benchmarking, and their application in senior military education.

John Nagl is Professor of Warfighting Studies at the U.S. Army War College. He is the author of Learning to Eat Soup with a Knife: Counterinsurgency Lessons from Malaya and Vietnam.

Thomas W. Spahr is the DeSerio Chair of Strategic and Theater Intelligence at the U.S. Army War College. He is a retired colonel in the U.S. Army and holds a Ph.D. in History from The Ohio State University. He teaches courses at the Army War College on Military Campaigning and Intelligence.

The views expressed in this presentation are those of the speakers and do not necessarily reflect those of the U.S. Army War College, U.S. Army, or Department of War.

Photo Description: (L-R) Dr. Alexandra Meise, Dr. Jadwiga Biskupska, Mr. Kevin Boyce and Dr. Kris Wheaton administering oral comprehensive exams to multiple large language models.

Photo Credit: Courtesy of the U.S. Army War College

Tags: AI joint PME LLM oral comprehensives

4 thoughts on “CAN AI PASS THE U.S. ARMY WAR COLLEGE COMPREHENSIVE EXAM?”

Given that the AIs, as “staff officers,” would seem able to answer a commander’s questions immediately, fairly reliably, and without, for example, 10 months’ worth of US War College schooling, how can the AI not be seen as the better staff officer than the human staff officer, who may need days or even weeks of research time to answer the commander’s questions with equal quality, reliability, and detail?

B.C. says:

June 10, 2026 at 12:05 pm

As to the suggestion I make above (an AI “staff officer” may be able to answer a commander’s questions much more quickly than a human staff officer, just as reliably, and with at least equal quality and detail), note that, from that perspective, a single AI “staff officer” might, sometime in the near future, be able to replace ALL the human staff officers.

Thus, in a single AI “staff officer,” you might be able to obtain some of the knowledge, abilities, capabilities, and assistance that you currently require and depend upon from (a) your manpower or personnel staff officer, (b) your intelligence, security, and information operations staff officer, (c) your operations staff officer, (d) your logistics staff officer, and so on.

Such reliance on a single AI staff officer could, for example, (a) massively reduce the size of your “headquarters” and thus (b) massively improve how rapidly you can move, reposition, and/or hide that headquarters.

BC’s observations are rather insightful. They compel consideration.
