Figure 11: Results on IF evaluations across GPT3.5, GPT3.5-Turbo, GPT-4-launch 98 GPT-4 Page 57 Page 59