Figure 7: Examples of GPT-4V’s unreliable performance for medical purposes.

radiographic imaging. Misdiagnosing the laterality of any number of conditions is very dangerous. Given the model’s imperfect performance in this domain and the risks associated with inaccuracies, we do not consider the current version of GPT-4V to be fit for performing any medical function or for substituting for professional medical advice, diagnosis, treatment, or judgment.

2.3.3 Stereotyping and ungrounded inferences

Using GPT-4V for some tasks might generate unwanted or harmful assumptions that are not grounded in the information provided to the model (the image or the text prompt). Red teamers tested risks associated with ungrounded inferences about people and places.

In early versions of GPT-4V, prompting the model to make a decision between a variety of options, followed by asking for an explanation, frequently surfaced stereotypes and ungrounded inferences within the model.

Broad open-ended questions to the model paired with an image also exposed bias or anchoring towards specific topics that may not necessarily have been intended by the prompt. For example, when prompted to advise the woman in the image, the model focuses on subjects of body weight and body positivity (see Figure 8).

We have added mitigations for risks associated with ungrounded inferences by having the model refuse such requests relating to people. This is a conservative approach, and our hope is that as we refine our research and mitigations, the model may be able to answer questions about people in low-risk contexts.

2.3.4 Disinformation risks

As noted in the GPT-4 system card, the model can be used to generate plausible, realistic, and targeted text content. When paired with vision capabilities, image and text content can pose increased risks with disinformation since the model can create text content tailored to an image input.
Previous work has shown that people are more likely to believe both true and false statements when they’re presented alongside an image, and have false recall of made-up headlines when they are accompanied by a photo. It is also known that engagement with content increases when it is associated with an image.[28][29]

3 All images with people in them used here are synthetically generated.
