GPT-4V(ision) (12/18)

Figure 11: Examples of visual vulnerabilities GPT-4V exhibits. This example demonstrates that model generations can be sensitive to the order in which images are given to the model.

2.3.6 Visual vulnerabilities

Red teaming found some limitations that are specifically associated with the ways that images could be used or presented. For example, the ordering of the images used as input may influence the recommendation made. In the example in Figure 11, asking which state to move to, based on the flags inputted, favors the first flag inputted when red teamers tested both possible orderings of the flags.

This example represents challenges with robustness and reliability that the model still faces. We anticipate there to be many more such vulnerabilities in the model that we discover through its broad usage, and we will be working on improving model performance for future iterations to be robust to them.

2.4 Mitigations

2.4.1 Transfer benefits from existing safety work

GPT-4V inherits several transfer benefits from model-level and system-level safety mitigations already deployed in GPT-4.[7] In a similar vein, some of our safety measures implemented for DALL·E [6, 30, 31] proved beneficial in addressing potential multi-modal risk in GPT-4V.

Internal evaluations show that performance of refusals of text content against our existing policies is equivalent to our base language model for GPT-4V. At the system level, our existing moderation classifiers continue to inform our monitoring and enforcement pipelines for post-hoc enforcement of text inputs and outputs. GPT-4V mirrors [6] our existing moderation efforts deployed in DALL·E to detect explicit image uploads by users.

These transfer benefits from our prior safety work enable us to focus on novel risks introduced by this multimodal model. This includes areas where, in isolation, the text or image content is benign, but in concert they create a harmful prompt or generation; images with people in them; and common multimodal jailbreaks such as adversarial images with text.
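
As an illustration of the ordering check described in Section 2.3.6, the following is a minimal sketch of how one might test whether a model's answer depends on the order of its image inputs. It is not the procedure the red teamers used. The names `order_sensitivity`, `is_order_invariant`, and `first_image_biased_model` are hypothetical, and `query_model` stands in for whatever function actually sends images and a prompt to a model.

```python
from itertools import permutations
from typing import Callable, Sequence


def order_sensitivity(
    images: Sequence[str],
    prompt: str,
    query_model: Callable[[Sequence[str], str], str],
) -> dict[tuple[str, ...], str]:
    """Query the model once per ordering of `images` and collect the answers.

    `query_model` is an assumed callable (images, prompt) -> answer text.
    """
    return {
        tuple(order): query_model(list(order), prompt)
        for order in permutations(images)
    }


def is_order_invariant(answers: dict[tuple[str, ...], str]) -> bool:
    """Return True when every ordering produced the same answer."""
    return len(set(answers.values())) <= 1


def first_image_biased_model(images: Sequence[str], prompt: str) -> str:
    # Hypothetical stand-in for a real model call: always favors the first image.
    return images[0]


if __name__ == "__main__":
    answers = order_sensitivity(
        ["flag_a.png", "flag_b.png"],
        "Which state should I move to?",
        first_image_biased_model,
    )
    for order, answer in answers.items():
        print(order, "->", answer)
    print("order invariant:", is_order_invariant(answers))
```

With two images this reproduces the "both possible orderings" test from the text; the number of orderings grows factorially, so for larger inputs one would likely sample orderings rather than enumerate them.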
