GPT-4 (20/60) — OpenAI

on the model may hinder the development of new skills or even lead to the loss of important skills. Overreliance is a failure mode that likely increases with model capability and reach. As mistakes become harder for the average human user to detect and general trust in the model grows, users are less likely to challenge or verify the model’s responses.[96] Our existing mitigations across all of these axes include documentation and hedging language within the model. However, mitigating overreliance requires multiple defenses, and especially depends on downstream interventions by developers. We recommend that developers using our tools provide end users with detailed documentation on their systems’ capabilities and limitations, as well as guidance on how to get the best performance from the system. To prevent dependency, we urge developers to be cautious in how they refer to the model/system, and to generally avoid misleading claims or implications—including that it is human—and to consider the potential impact of changes to the model’s style, tone, or perceived personality on users. We also suggest that developers communicate to users the importance of critically evaluating model outputs. At the model-level we’ve also made changes to address the risks of both overreliance and underreliance. Weve found that GPT-4 exhibits enhanced steerability which allows it to better infer users intentions without extensive prompt tuning. To tackle overreliance, we’ve reﬁned the model’s refusal behavior, making it more stringent in rejecting requests that go against our content policy, while being more open to requests it can safely fulﬁll. One objective here is to discourage users from disregarding the model’s refusals. However, it’s worth noting that GPT-4 still displays a tendency to hedge in its responses. Some of our early studies suggest that this epistemic humility may inadvertently foster overreliance, as users develop trust in the model’s cautious approach. It’s crucial to recognize that the model isn’t always accurate in admitting its limitations, as evidenced by its tendency to hallucinate. Additionally, users might grow less attentive to the model’s hedging and refusal cues over time, further complicating the issue of overreliance. 60

GPT-4 Page 19 Page 21