I was curious to read this recent preprint on arXiv (https://doi.org/10.48550/arXiv.2410.02820) that describes the results of an experimental investigation into whether the GPT-4o model is risk averse and biased in ways similar to human subjects. The authors write that “GPT-4o consistently navigates conjunction fallacy and certain aspects of probability neglect with statistically robust responses. However, the model frequently falls prey to biases such as the resemblance heuristic and framing effects.”
So the model does better than an untrained human in some situations and shows similar biases in others. While this experiment is limited to the kinds of language processing that GPT-4o can reasonably be expected to handle, these models are increasingly employed to give advice, so being aware of their biases, and developing methods to mitigate them, will be an important area of research going forward.
If anyone is interested in setting up an experiment with GPT-4o modeled on some of our previous engineering-student experiments on risk-related decisions (like https://doi.org/10.1115/IMECE2022-95484 or https://doi.org/10.1115/1.4055156), let me know. I would be interested in running some head-to-head comparisons.
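To give a sense of how simple such a probe can be, here is a minimal sketch of one way to pose a conjunction-fallacy item to GPT-4o and tally its answers. This is not the preprint's protocol or stimuli; it assumes the `openai` Python client and an API key in the environment, and the prompt wording (a version of the classic Tversky and Kahneman "Linda" problem) is only illustrative.

```python
# Rough sketch of a conjunction-fallacy probe; not the preprint's actual protocol.
# Assumes the `openai` Python package and an OPENAI_API_KEY in the environment.
from collections import Counter

from openai import OpenAI

client = OpenAI()

# A version of the classic "Linda" item; wording here is illustrative only.
PROMPT = (
    "Linda is 31, single, outspoken, and very bright. She majored in philosophy "
    "and was deeply concerned with issues of discrimination and social justice.\n"
    "Which is more probable?\n"
    "(A) Linda is a bank teller.\n"
    "(B) Linda is a bank teller and is active in the feminist movement.\n"
    "Answer with a single letter, A or B."
)

def ask_once() -> str:
    """Send the item once and return the model's one-letter answer."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": PROMPT}],
        temperature=1.0,  # sample typical behavior rather than forcing determinism
        max_tokens=5,
    )
    return response.choices[0].message.content.strip().upper()[:1]

if __name__ == "__main__":
    tally = Counter(ask_once() for _ in range(30))
    print(tally)  # a preponderance of "B" would suggest the conjunction fallacy
```

A human-subjects version of the same item could be scored the same way, which is roughly what a head-to-head comparison would look like.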