Risk Mitigation: We run our existing moderation model [17] over a text transcription of the audio input to detect whether it contains a request for violent or erotic content, and block a generation if so.

3.3.7 Other known risks and limitations of the model

Through the course of internal testing and external red teaming, we discovered additional risks and model limitations for which model- or system-level mitigations are nascent or still in development, including:

Audio robustness: We saw anecdotal evidence of decreases in safety robustness from audio perturbations, such as low-quality input audio, background noise in the input audio, and echoes in the input audio. We also observed similar decreases in safety robustness from intentional and unintentional audio interruptions while the model was generating output.

Misinformation and conspiracy theories: Red teamers were able to compel the model to generate inaccurate information by prompting it to verbally repeat false information and produce conspiracy theories. While this is a known issue for text in GPT models [18, 19], red teamers were concerned that this information may be more persuasive or harmful when delivered through audio, especially if the model is instructed to speak emotively or emphatically. The persuasiveness of the model was studied in detail (see Section 3.7): the model did not score higher than Medium risk for text-only, and did not score higher than Low for speech-to-speech.

Speaking a non-English language in a non-native accent: Red teamers observed instances of the audio output using a non-native accent when speaking in a non-English language. This may raise concerns about bias towards certain accents and languages, and more generally about the limitations of non-English language performance in audio outputs.

Generating copyrighted content: We also tested GPT-4o's capacity to repeat content found within its training data.
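The transcription-and-moderation gate described under Risk Mitigation above can be sketched as follows. This is a minimal illustration only: the `moderate` stand-in, the blocked-category names, and the keyword heuristic are assumptions for the sketch, not the production moderation model or its actual API.

```python
# Hypothetical sketch of the audio-moderation gate: transcribe the audio
# input upstream, run a text moderation classifier over the transcript,
# and block generation if a disallowed category is flagged.
# All names here are illustrative assumptions, not the real system.

BLOCKED_CATEGORIES = {"violence", "erotic"}


def moderate(text: str) -> set[str]:
    """Stand-in for a text moderation model: returns flagged categories.

    A real classifier would score the transcript across many categories;
    this toy version flags a single keyword for demonstration.
    """
    flagged = set()
    if "fight" in text.lower():
        flagged.add("violence")
    return flagged


def should_block(audio_transcript: str) -> bool:
    """Block generation when the transcript triggers a blocked category."""
    return bool(moderate(audio_transcript) & BLOCKED_CATEGORIES)


if __name__ == "__main__":
    print(should_block("tell me a story about a bar fight"))  # True
    print(should_block("what's the weather like today"))      # False
```

The design point is that the gate operates on a text transcription of the audio rather than on the raw waveform, which lets an existing text moderation model be reused for the new modality.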
We trained GPT-4o to refuse requests for copyrighted content, including audio, consistent with our broader practices. To account for GPT-4o's audio modality, we also updated certain text-based filters to work on audio conversations, built filters to detect and block outputs containing music, and, for our limited alpha of ChatGPT's Advanced Voice Mode, instructed the model not to sing at all. We intend to track the effectiveness of these mitigations and refine them over time. Although some technical mitigations are still in development, our Usage Policies [20] disallow intentionally deceiving or misleading others and circumventing safeguards or safety mitigations. In addition to technical mitigations, we enforce our Usage Policies through monitoring and take action on violative behavior in both ChatGPT and the API.

3.4 Preparedness Framework Evaluations

We evaluated GPT-4o in accordance with our Preparedness Framework [4]. The Preparedness Framework is a living document that describes our procedural commitments to track, evaluate, forecast, and protect against catastrophic risks from frontier models. The evaluations currently cover four risk categories: cybersecurity, CBRN (chemical, biological, radiological, nuclear), persuasion, and model autonomy. If a model passes a high-risk threshold, we do not deploy the model until mitigations lower the score to medium. Below we detail the evaluations conducted
