Figure 3: Evaluating GPT-4V + Refusal System against screenshots of a text refusal dataset finds that the combination of model-level mitigations and our refusal system enabled us to reach our internal target of a 100% refusal rate.

intended to test risks associated with the multimodal (vision) functionality of GPT-4, and builds upon the work in the GPT-4 system card. We focus this analysis on 6² key risk areas we received especially useful red teamer feedback in:

• Scientific proficiency
• Medical advice
• Stereotyping and ungrounded inferences
• Disinformation risks
• Hateful Content
• Visual vulnerabilities

2.3.1 Scientific proficiency

Red teamers tested GPT-4V's capabilities and limitations in scientific domains. In terms of capabilities, red teamers noted the model's ability to capture complex information in images, including very specialized imagery extracted from scientific publications, and diagrams with text and detailed components. Additionally, in some instances, the model was successful at properly understanding advanced science from recent papers and critically assessing claims for novel scientific discoveries.

However, the model exhibited some key limitations. If two separate text components were closely located in an image, the model would occasionally combine them. For instance, it may merge
