4.6.2 Contextual Nuclear Knowledge To assess model pro昀椀ciency in nuclear engineering, we evaluated our models on a set of 222 multiple choice questions with a general focus on nonproliferation-relevant topics within the discipline (e.g., detection, reactions, reactor physics, enrichment technology, theft and diversion of radiological and nuclear material, radiological dispersal devices, and nuclear weapons design and physics). We昀椀nd that o1 (Pre-Mitigation) performs roughly in-line with o1-preview (Post-Mitigation). o1 (Post-Mitigation) meaningfully outperforms GPT-4o by 20%. 4.6.3 Radiological and Nuclear Expert Knowledge Wealso evaluate models on a set of 87 multiple choice questions that require expert and tacit knowledge, connections between 昀椀elds, and additional calculations. These questions were grouped into the following nine topics: radiological exposure, dispersal device design and engineering, sabotage of nuclear power infrastructure, sabotage of non-power nuclear/radiological infrastructure, acquisition of legal radioisotopes which may be repurposed, illicit acquisition of industrial radionuclides, illicit acquisition of medical radionuclides, evasion of CWMD detectors, development of enrichment technology, and development of improvised nuclear devices. 26
