completions. Additionally, we prompted o1-preview with our regurgitation evaluations, and then evaluated the summaries. We do not find any instances of improper regurgitation of training data in the summaries.

3.4 External Red Teaming

In addition to the internal evaluations performed above, OpenAI worked with multiple organizations and individuals to assess key risks associated with the o1 model series' improved reasoning capabilities. In red teaming, experts are asked to carry out open-ended discovery for possible risks and to determine any new risks the model could pose in their domain [26]. Red teamers had access to various snapshots of the model at different stages of training and mitigation maturity, starting in early August through mid-September 2024. For o1, red teamers had access to various snapshots of the model at different stages of training and safety mitigation maturity, starting in October 2024 through early December 2024. In both cases, the model was accessed either via a sampling interface or via the API. Red teamers covered categories spanning deceptive alignment, AI R&D capabilities, cybersecurity, and content policy violations, assessing both the default behavior of these models and their behavior under adversarial attacks.

3.4.1 Pairwise Safety Comparison

We asked members of the Red Teaming Network (RTN) to have free-form conversations in an interface that generates responses from GPT-4o and o1 in parallel, with both models anonymized. Red teamers were asked to test the model in an open-ended manner and explore different areas of risk using their own expertise and judgment. They rated the conversations as either
