GPT-4o
This report outlines the safety work carried out prior to releasing GPT-4o, including external red teaming, frontier risk evaluations according to our Preparedness Framework, and an overview of the mitigations we built in to address key risk areas.
GPT-4o System Card
OpenAI
August 8, 2024

1 Introduction

GPT-4o[1] is an autoregressive omni model, which accepts as input any combination of text, audio, image, and video and generates any combination of text, audio, and image outputs. It is trained end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network. GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time[2] in a conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API. GPT-4o is especially strong at vision and audio understanding compared to existing models.

In line with our commitment to building AI safely and consistent with our voluntary commitments to the White House[3], we are sharing the GPT-4o System Card, which includes our Preparedness Framework[4] evaluations. In this System Card, we provide a detailed look at GPT-4o's capabilities, limitations, and safety evaluations across multiple categories, with a focus on speech-to-speech (voice)¹ while also evaluating text and image capabilities, and the measures we've implemented to ensure the model is safe and aligned. We also include third party assessments on dangerous capabilities, as well as discussion of potential societal impacts of GPT-4o's text and vision capabilities.

2 Model data and training

GPT-4o's text and voice capabilities were pre-trained using data up to October 2023, sourced from a wide variety of materials including:

• Select publicly available data, mostly collected from industry-standard machine learning datasets and web crawls.

• Proprietary data from data partnerships. We form partnerships to access non-publicly available data, such as pay-walled content, archives, and metadata.
  For example, we partnered with Shutterstock[5] on building and delivering AI-generated images.

¹ Some evaluations, in particular the majority of the Preparedness evaluations, third party assessments, and some of the societal impacts, focus on the text and vision capabilities of GPT-4o, depending on the risk assessed. This is indicated accordingly throughout the System Card.
