Why We Think
Special thanks to John Schulman for a lot of super valuable feedback and direct edits on this post.
Test-time compute (Graves et al. 2016, Ling et al. 2017, Cobbe et al. 2021) and chain-of-thought (CoT) (Wei et al. 2022, Nye et al. 2021) have led to significant improvements in model performance while raising many research questions. This post reviews recent developments in how to use test-time compute (i.e., “thinking time”) effectively and why it helps.
Motivation
Enabling models to think for longer can be motivated in a few different ways.
Analogy to Psychology
The core idea is deeply connected to how humans think. We humans cannot immediately provide the answer to "What's 12345 times 56789?". Rather, it is natural to spend time pondering and analyzing before getting to the result, especially for complex problems. In Thinking, Fast and Slow (Kahneman, 2013), Daniel Kahneman divides human thinking into two modes, through the lens of the dual process theory:
- Fast thinking (System 1) operates quickly and automatically, driven by intuition and emotion while requiring little to no effort.
- Slow thinking (System 2) demands deliberate, logical thought and significant cognitive effort. This mode of thinking consumes more mental energy and requires intentional engagement.
Because System 1 thinking is fast and easy, it often ends up being the main decision driver, at the cost of accuracy and logic. It naturally relies on our brain’s mental shortcuts (i.e., heuristics) and can lead to errors and biase...