Lambda caluclus 是一种描述语言机制的 core language。除此之外,还有其他 calculi:
pi-calculus: a popular core language for defining the semantics of message-based concurrent languages object calculus: distills the core features of object-oriented languages 传统的 Lambda Calculus 是一个 core language,可以给它添加其他特性构成新的语言。
the pure untyped lambda-calculus:λ the lambda-calculus extended with booleans and arithmetic operations:λNB Basics 在 λ 演算中一切都是函数,包括函数的输入值和返回值都可以用函数表示。
λ 演算的中的 term 指的是任意的 term,而其中以 λ 开头的 term 被称为 lambda-abstractions 。
在 λ 演算中,其语法由三种形式组成:
变量:x
抽象:一个 term t
中抽象出一个变量 x
,记作 λ x . t \lambda x.t λ x . t 应用:将一个 term t2
作为参数传入另一个 term t1
,记作 t1 t2
t ∷ = ( terms ) x ( variable ) λ x . t ( abstraction ) t t ( application ) \begin{aligned}
t \Coloneqq & & (\text{terms}) \\
& x & (\text{variable}) \\
& \lambda x.t & (\text{abstraction}) \\
& t\ t & (\text{application}) \\
\end{aligned} t : : = x λ x . t t t ( terms ) ( variable ) ( abstraction ) ( application )
Abstract and Concrete Syntax 编程语言的语法有两种方式可以表示:一种是通常看到的字符串形式的代码,称为 concrete syntax(surface syntax);另一种是编译器或者解释器中用树结构(AST)表示的语法,称为 abstract syntax。后者有利于对代码的结构进行操作,因此常在编译器和解释器中使用。
将 concrete syntax 转换为 abstract syntax 的过程共分为两步:
通过 lexer 将字符串解析成一个 token 序列,token 表示最小的语法单元,删除空格、注释等内容,并进行简单的转换(如数字数制、字符串等) 通过 parser 将 token 序列转换成 AST,这个阶段要处理运算符的结合性等问题 除此之外,一些语言还有其他的解析过程。例如将语言的一些特性定义成更基本的特性的 derived forms。然后定义 internal language(IL)表示一门仅包含了必要的 derived forms 的 core language,同时定义 external language(EL)表示包含了所有 derived forms 的 full language。并且在 parse 的阶段后,将 EL 转换成 IL,从而使得语言的核心更简单。
为了减少冗余括号,本书书写 λ 表达式时会遵从两个规则:
application 采用左结合,即 s t u
表示 (s t) u
abstraction 的 body 部分尽可能得长,如 λ x . λ y . x y x \lambda x. \lambda y. x y x λ x . λ y . x y x 表示 ( λ x . ( λ y . ( x y ) x ) ) (\lambda x. (\lambda y. (x y) x)) ( λ x . ( λ y . ( x y ) x )) 出于习惯,t
表示任意的 term,xyz
表示任意的 metavariables。
Scope 当变量 x
出现在 abstraction λ x . t \lambda x.t λ x . t 的 body 部分时,称其是被约束的(bound) ,λ x \lambda x λ x 被称为 binder 。反之,如果变量不被约束,在称为是自由的(free) 。例如 ( λ x . x ) x (\lambda x.x)x ( λ x . x ) x 中第一次出现的 x
表示一个 binder,第二次出现的 x
是 bound 的,第三次出现的 x
是 free 的。
如果一个 term 没有自由变量,则称为封闭的 (closed)。封闭的 term 又被称为组合子(combinators) ,例如 identify function:
i d = λ x . x ;
\mathtt{id} = \lambda x.x;
id = λ x . x ;
Static-scoping and Dynamic-scoping 在编程语言中,scope 一般分为 static-scoping 和 dynamic-scoping:
在 static-scoping 中,一个变量的作用域是由它在源代码中的位置决定的。因此编译器或解释器可以仅通过分析代码结构(而不是实际执行代码)来确定变量的作用域。 与之相对的是 dynamic-scoping,变量的作用域由程序的运行时调用顺序决定。 下面是一个例子:
let x = 10 in
let f = ( lambda y . x + y ) in
let x = 20 in
f 5
在 static-scoping 中,这个表达式的值是 15,因为 f
的作用域是整个 let 表达式,所以 f
中的 x
是第一个 x
,而不是第二个;而在 dynamic-scoping 中,这个表达式的值是 25,因为 f
中的 x
是第二个 x
。
Operational Semantics 在 pure lambda calculus 中不包含任何数字、运算符等,唯一的运算只有 application。
Application 的步骤为将左侧的 abstraction 中的约束变量替换成右侧的组件。记作
( λ x . t 12 ) t 2 → [ x ↦ t 2 ] t 12
(\lambda x.t_{12}) t_2 \rightarrow [x \mapsto t_2] t_{12}
( λ x . t 12 ) t 2 → [ x ↦ t 2 ] t 12
其中 [ x ↦ t 2 ] t 12 [x \mapsto t_2] t_{12} [ x ↦ t 2 ] t 12 表示将 t 12 t_{12} t 12 中所有的自由出现的 x x x 替换成 t 2 t_2 t 2 。例如 ( λ x . x ) = y (\lambda x.x) = y ( λ x . x ) = y ,( λ x . x ( λ x . x ) ) ( u r ) = > u r ( λ x . x ) (\lambda x.x(\lambda x.x))(u r) => u r (λx.x) ( λ x . x ( λ x . x )) ( u r ) => u r ( λ x . x ) 。
类似于 ( λ x . t 12 ) t 2 (\lambda x.t_{12}) t_2 ( λ x . t 12 ) t 2 的表达式被称为 redex (reducible expressions)。Redex 可以用 beta-reduction 进行重写。
Evaluation strategies 例子:
( λ x . x ) ( ( λ x . x ) ) ( λ z . ( λ x . x ) ) z ) ) = i d ( i d ( λ z . i d z ) )
(\lambda x.x)\ ((\lambda x. x))\ (\lambda z. (\lambda x.x))\ z)) = \mathtt{id}\ (\mathtt{id}\ (\lambda z. \mathtt{id}\ z))
( λ x . x ) (( λ x . x )) ( λ z . ( λ x . x )) z )) = id ( id ( λ z . id z ))
Full beta-reduction :随便选一个 redex 进行 reduce
i d ( i d ( λ z . i d z ‾ ) ) → i d ( i d ( λ z . z ) ‾ ) → i d ( λ z . z ) → λ z . z \begin{aligned}
& \mathtt{id}\ (\mathtt{id}\ (\lambda z. \underline{\mathtt{id}\ z})) \\
\rightarrow {}& \mathtt{id}\ (\underline{\mathtt{id}\ (\lambda z. z)}) \\
\rightarrow {}& \mathtt{id}\ (\lambda z.z) \\
\rightarrow {}& \lambda z.z
\end{aligned} → → → id ( id ( λ z . id z )) id ( id ( λ z . z ) ) id ( λ z . z ) λ z . z
由 Church-Rosser property,λ 演算在 full beta-reduction 下是 confluent 的。(求值顺序不影响结果)
Normal order strategy:先 reduce 最外面、最左边的 redex
i d ( i d ( λ z . i d z ) ) ‾ → i d ( λ z . i d z ) ‾ → λ z . i d z ‾ → λ z . z \begin{aligned}
& \underline{\mathtt{id}\ (\mathtt{id}\ (\lambda z. \mathtt{id}\ z))} \\
\rightarrow {}& \underline{\mathtt{id}\ (\lambda z. \mathtt{id}\ z)} \\
\rightarrow {}& \lambda z.\ \underline{\mathtt{id}\ z} \\
\rightarrow {}& \lambda z.z
\end{aligned} → → → id ( id ( λ z . id z )) id ( λ z . id z ) λ z . id z λ z . z
Call by name strategy:call by name 和 normal order strategy 类似,但是它不允许在 abstraction 内部进行 reduces
Call by name 在调用的时候不计算值,而是直接传入对应的位置,用到的时候再调用
i d ( i d ( λ z . i d z ) ) ‾ → i d ( λ z . i d z ) ‾ → λ z . i d z ‾ \begin{aligned}
& \underline{\mathtt{id}\ (\mathtt{id}\ (\lambda z. \mathtt{id}\ z))} \\
\rightarrow {}& \underline{\mathtt{id}\ (\lambda z. \mathtt{id}\ z)} \\
\rightarrow {}& \lambda z.\ \underline{\mathtt{id}\ z} \\
\end{aligned} → → id ( id ( λ z . id z )) id ( λ z . id z ) λ z . id z
Call-by-name 被很多语言都实现了,比如 Algol60 和 Haskell。其中 Haskell 的更加特殊,使用了一个优化过的形式 call by need :在第一次求值后会记录下结果,后面进行复用,这是因为 Haskell 的 pure 的。
Call by value strategy (Applicative-order):最常用的 redex 策略。reduce 外层,且一个 redex 会被 reduce 仅当它的参数已经是一个 value。value 即一个不能被 reduce 的形式,包括 lambda abstractions,numbers,booleans 等。
i d ( i d ( λ z . i d z ) ) ‾ → i d ( λ z . i d z ) ‾ → λ z . i d z ‾ \begin{aligned}
& \mathtt{id}\ \underline{(\mathtt{id}\ (\lambda z. \mathtt{id}\ z))} \\
\rightarrow {}& \underline{\mathtt{id}\ (\lambda z. \mathtt{id}\ z)} \\
\rightarrow {}& \lambda z.\ \underline{\mathtt{id}\ z} \\
\end{aligned} → → id ( id ( λ z . id z )) id ( λ z . id z ) λ z . id z
其中,normal order strategy
和 call by name
都是 partial evaluation。它们在 reduce 的时候可能函数还没有被 apply。
Call by value
是 strict 的,即无论参数有没有用到,都会被 evaluate;反之 call by name
和 call by need
则只有在用到某个参数的时候才计算。
本书后面都使用 call by value
。因为这样实现 exceptions 和 reference 会更简单。
Programming in the Lambda-Calculus Multiple Arguments λ 演算中的多参数函数是通过高阶函数(higher-order functions)实现的。
假设 s s s 是一个包含自由变量 x
、y
的 term,f
是一个参数为 x
、y
的函数:
f = λ x . λ y . s
f = \lambda x. \lambda y. s
f = λ x . λ y . s
f v w = ( f v ) w → ( λ y . [ x ↦ v ] s ) w → [ y ↦ w ] [ x ↦ v ] s \begin{aligned}
f v w & = (f\ v) w \\
& \rightarrow (\lambda y.[x \mapsto v]s)\ w \\
& \rightarrow [y \mapsto w][x \mapsto v]s
\end{aligned} f v w = ( f v ) w → ( λ y . [ x ↦ v ] s ) w → [ y ↦ w ] [ x ↦ v ] s
这种参数一个个被 apply 的过程称为 currying。
Church Boolean λ 演算中的 boolean 也可以用 λ 表达式表示。其中 true
和 false
分别是一个接受两个参数的函数,true
返回第一个参数,false
返回第二个参数。这种表示可以看作是 testing the truth of a boolean value。
t r u = λ t . λ f . t ; f l s = λ t . λ f . f ; \begin{aligned}
\mathtt{tru} &= \lambda t. \lambda f. t; \\
\mathtt{fls} &= \lambda t. \lambda f. f;
\end{aligned} tru fls = λ t . λ f . t ; = λ t . λ f . f ;
下面定义一个类似 if
的 combinator test
。在 test b v w
中,当 b
为 true
时返回 v
,反之返回 w
。
t e s t = λ l . λ m . λ n . l m n ;
\mathtt{test} = \lambda l. \lambda m. \lambda n. l\ m\ n;
test = λ l . λm . λn . l m n ;
t e s t t r u v w = ( λ l . λ m . λ n . l m n ) t r u ‾ v w → ( λ m . λ n . t r u m n ) v ‾ w → ( λ n . t r u v n ) ‾ w → t r u v w = ( λ t . λ f . t ) v ‾ w → ( λ f . v ) w ‾ → v \begin{aligned}
&\mathtt{test}\ \mathtt{tru}\ v\ w \\
= {}& \underline{(\lambda l. \lambda m. \lambda n. l\ m\ n)\ \mathtt{tru}}\ v\ w \\
\rightarrow {}& \underline{(\lambda m. \lambda n. \mathtt{tru}\ m\ n)\ v}\ w \\
\rightarrow {}& \underline{(\lambda n. \mathtt{tru}\ v\ n)}\ w \\
\rightarrow {}& \mathtt{tru}\ v\ w \\
= {}& \underline{(\lambda t. \lambda f. t)\ v}\ w \\
\rightarrow {}& \underline{(\lambda f. v)\ w} \\
\rightarrow {}& v
\end{aligned} = → → → = → → test tru v w ( λ l . λm . λn . l m n ) tru v w ( λm . λn . tru m n ) v w ( λn . tru v n ) w tru v w ( λ t . λ f . t ) v w ( λ f . v ) w v
类似的,还可以定义 booleans 相关的逻辑运算:
and
:如果第一个数是 tru
,则看第二个数;否则直接返回 fls
a n d = λ b . λ c . b c f l s ; a n d 2 = λ b . λ c . b c b ; \begin{alignat*}{2}
&\mathtt{and} && = \lambda b.\lambda c.b\ c\ \mathtt{fls}; \\
&\mathtt{and2} && = \lambda b.\lambda c.b\ c\ b;
\end{alignat*} and and2 = λb . λ c . b c fls ; = λb . λ c . b c b ;
or
:如果第一个数是 tru
,则返回 tru
;否则看第二个数
o r = λ b . λ c . b t r u c ; o r 2 = λ b . λ c . b b c ; \begin{alignat*}{2}
&\mathtt{or} &&= \lambda b.\lambda c.b\ \mathtt{tru}\ c; \\
&\mathtt{or2} &&= \lambda b.\lambda c.b\ b\ c;
\end{alignat*} or or2 = λb . λ c . b tru c ; = λb . λ c . b b c ;
not
:
n o t = λ b . b f l s t r u
\mathtt{not} = \lambda b.b\ \mathtt{fls}\ \mathtt{tru}
not = λb . b fls tru
示例:
a n d t r u t r u = ( λ b . λ c . b c f l s ) t r u t r u ‾ → ∗ t r u t r u f l s = ( λ t . λ f . t ) t r u f l s ‾ → ∗ t r u \begin{aligned}
& \mathtt{and}\ \mathtt{tru}\ \mathtt{tru} \\
= {}& \underline{(\lambda b. \lambda c.b\ c\ \mathtt{fls})\ \mathtt{tru}\ \mathtt{tru}} \\
\rightarrow^* & \mathtt{tru}\ \mathtt{tru}\ \mathtt{fls} \\
= {}& \underline{(\lambda t. \lambda f.t)\ \mathtt{tru}\ \mathtt{fls}} \\
\rightarrow^* & \mathtt{tru}
\end{aligned} = → ∗ = → ∗ and tru tru ( λb . λ c . b c fls ) tru tru tru tru fls ( λ t . λ f . t ) tru fls tru
Pair p a i r = λ f . λ s . λ b . b f s ; f s t = λ p . p t r u ; s n d = λ p . p f l s ; \begin{alignat*}{2}
&\mathtt{pair} &&= \lambda f. \lambda s. \lambda b.b\ f\ s; \\
&\mathtt{fst} &&= \lambda p.p\ \mathtt{tru}; \\
&\mathtt{snd} &&= \lambda p.p\ \mathtt{fls};
\end{alignat*} pair fst snd = λ f . λ s . λb . b f s ; = λ p . p tru ; = λ p . p fls ;
示例:
f s t ( p a i r v w ) = f s t ( λ b . b v w ) = ( λ p . p t r u ) ( λ b . b v w ) → ( λ b . b v w ) t r u → t r u v w → ∗ v \begin{aligned}
&\mathtt{fst}\ (\mathtt{pair}\ v\ w) \\
= {}& \mathtt{fst}\ (\lambda b.b\ v\ w) \\
= {}& (\lambda p.\ p\ \mathtt{tru})(\lambda b.b\ v\ w) \\
\rightarrow {}& (\lambda b.b\ v\ w)\ \mathtt{tru} \\
\rightarrow {}& \mathtt{tru}\ v\ w \\
\rightarrow^* & v
\end{aligned} = = → → → ∗ fst ( pair v w ) fst ( λb . b v w ) ( λ p . p tru ) ( λb . b v w ) ( λb . b v w ) tru tru v w v
Church Numerals λ 演算中,自然数用 combinator 表示。其中,s
和 z
分别代表 succ
和 zero
。 其意义为递归对于 z
调用 n
次 s
,即 s n ( z ) s^n(z) s n ( z ) 。(The number n
is represented by a function that does something n
times)
个人感觉在 λ 演算中,对于数据强调的不是如何存储,而是如何去使用它们。所以 tru
和 fls
对应了程序的选择结构;自然数对应了程序的归纳结构(类似于循环)。
c 0 = λ s . λ z . z ; c 1 = λ s . λ z . s z ; c 2 = λ s . λ z . s ( s z ) ; c 3 = λ s . λ z . s ( s ( s z ) ) ; \begin{aligned}
\mathrm{c}_{0} &= \lambda s.\lambda z.\mathrm{z}; \\
\mathrm{c}_{1} &= \lambda s.\lambda z.\mathrm{s}\ \mathrm{z}; \\
\mathrm{c}_{2} &= \lambda s.\lambda z.\mathrm{s}\ (\mathrm{s}\ \mathrm{z}); \\
\mathrm{c}_{3} &= \lambda s.\lambda z.\mathrm{s}\ (\mathrm{s}\ (\mathrm{s} \mathrm{z}));
\end{aligned} c 0 c 1 c 2 c 3 = λ s . λ z . z ; = λ s . λ z . s z ; = λ s . λ z . s ( s z ) ; = λ s . λ z . s ( s ( sz )) ;
不难发现,C 0 C_0 C 0 和 f l s \mathtt{fls} fls 的表示形式相同!
求后继数:直接套上一层 s
(由于是 currying 的形式,所以结果还是 λ s . λ z . t \lambda s.\lambda z.t λ s . λ z . t )
s c c = λ n . λ s . λ z . s ( n s z ) ; s c c 2 = λ n . λ s . λ z . n s ( s z ) ; \begin{alignat*}{2}
& \mathtt{scc} &&= \lambda n.\lambda s.\lambda z.s\ (n\ s\ z); \\
& \mathtt{scc2} &&= \lambda n.\lambda s.\lambda z.\ n\ s\ (s\ z);
\end{alignat*} scc scc2 = λn . λ s . λ z . s ( n s z ) ; = λn . λ s . λ z . n s ( s z ) ;
求和:m
的 s
不变,z
变成 n
,意为在 n
上应用 m
次,即 s n + m ( z ) = s n ( s m ( z ) ) s^{n+m}(z) = s^n(s^m(z)) s n + m ( z ) = s n ( s m ( z ))
p l u s = λ m . λ n . λ s . λ z . m s ( n s z ) ;
\mathtt{plus} = \lambda m.\lambda n.\lambda s.\lambda z. m\ s\ (n\ s\ z);
plus = λm . λn . λ s . λ z . m s ( n s z ) ;
乘法:第一个数字的 s
变成 plus n
,意为在 z
上调用 m
次 plus n
,即 s n m ( z ) = ( s n ) m ( z ) s^{nm}(z) = (s^n)^m(z) s nm ( z ) = ( s n ) m ( z )
t i m e s = λ m . λ n . m ( p l u s n ) c 0 ; t i m e s 2 = λ m . λ n . λ s . λ z . λ . m ( n s ) z ; t i m e s 3 = λ m . λ n . λ s . m ( n s ) ; \begin{alignat*}{2}
& \mathtt{times} &&= \lambda m.\lambda n.m\ (\mathtt{plus}\ n)\ c_0; \\
& \mathtt{times2} &&= \lambda m.\lambda n.\lambda s.\lambda z.\lambda.m\ (n\ s)\ z; \\
& \mathtt{times3} &&= \lambda m.\lambda n.\lambda s.m\ (n\ s);
\end{alignat*} times times2 times3 = λm . λn . m ( plus n ) c 0 ; = λm . λn . λ s . λ z . λ . m ( n s ) z ; = λm . λn . λ s . m ( n s ) ;
其中 times2
比较有意思。其中 n s
的基数部分(z
)接受的是上一次加法的结果,这样调用 m
次,即执行 m
次加法。times3
是 times2
的化简形式。
幂次:
p o w e r = λ m . λ n . λ s . n ( t i m e s m ) c 1 ; p o w e r 2 = λ m . λ n . λ s . λ z . n ( λ f . m f s ) s z ; p o w e r 3 = λ m . λ n . n m ; \begin{alignat*}{2}
& \mathtt{power} &&= \lambda m.\lambda n.\lambda s.n\ (\mathtt{times}\ m)\ c_1; \\
& \mathtt{power2} &&= \lambda m.\lambda n.\lambda s.\lambda z.n\ (\lambda f.m\ f\ s)\ s\ z; \\
& \mathtt{power3} &&= \lambda m.\lambda n.n\ m;
\end{alignat*} power power2 power3 = λm . λn . λ s . n ( times m ) c 1 ; = λm . λn . λ s . λ z . n ( λ f . m f s ) s z ; = λm . λn . n m ;
其中有意思的是 power2
,可以从 power
化简,也可以这么理解:
考虑现在已经有了
g i = λ f ’ . λ z . f ’ ( f ’ ( ⋯ f ’ ( z ) ⋯ ) ) ⏟ m i times ; m = λ f . λ z . f ( f ( ⋯ f ( z ) ⋯ ) ) ⏟ m i times ; \begin{aligned}
g_i &= \lambda f’. \lambda z.\underbrace{f’ (f’ ( \cdots f’ (z) \cdots ))}_{m^i\ \text{times}}; \\
m &= \lambda f. \lambda z. \underbrace{f (f ( \cdots f (z) \cdots ))}_{m^i\ \text{times}};
\end{aligned} g i m = λ f ’. λ z . m i times f ’ ( f ’ ( ⋯ f ’ ( z ) ⋯ )) ; = λ f . λ z . m i times f ( f ( ⋯ f ( z ) ⋯ )) ;
令 m m m 中的每一个 f f f 都变成 λ z . g i s z \lambda z.g_i\ s\ z λ z . g i s z ,则得到
g i + 1 = λ s . λ z . m ( λ z ’ . g i s z ’ ) z ;
g_{i+1} = \lambda s. \lambda z. m\ (\lambda z’.g_i\ s\ z’)\ z;
g i + 1 = λ s . λ z . m ( λ z ’. g i s z ’ ) z ;
则
g n = λ s . λ z . m ( λ z ’ . g n − 1 s z ’ ) z → λ s . m ( g n − 1 s ) = λ s . m ( ( λ s . m ( g n − 2 s ) ) s ) → λ s . m ( m ( g n − 2 s ) s ) = λ s . m ( m ( … s ) s ) ⏟ n times = λ s . n ( λ f . m f s ) \begin{aligned}
g_n = {}& \lambda s. \lambda z. m\ (\lambda z’.g_{n-1}\ s\ z’)\ z \\
\rightarrow {}& \lambda s. m\ (g_{n-1}\ s) \\
= {}& \lambda s. m\ ((\lambda s.m\ (g_{n-2}\ s))\ s) \\
\rightarrow {}& \lambda s. m\ (m\ (g_{n-2}\ s)\ s) \\
= {}& \lambda s. \underbrace{m\ (m\ (\dots\ s)\ s)}_{n\ \text{times}} \\
= {}& \lambda s. n\ (\lambda f.m\ f\ s)
\end{aligned} g n = → = → = = λ s . λ z . m ( λ z ’. g n − 1 s z ’ ) z λ s . m ( g n − 1 s ) λ s . m (( λ s . m ( g n − 2 s )) s ) λ s . m ( m ( g n − 2 s ) s ) λ s . n times m ( m ( … s ) s ) λ s . n ( λ f . m f s )
iszro
:对于 λ s . λ z . z \lambda s. \lambda z. z λ s . λ z . z 返回 t r u \mathtt{tru} tru ;对于 λ s . λ z . s z \lambda s. \lambda z. s\ z λ s . λ z . s z 返回 f l s \mathtt{fls} fls 。直接令 z
返回 tru
,s
返回 fls
。
i s z r o = λ m . m ( λ x . f l s ) t r u ;
\mathtt{iszro} = \lambda m. m\ (\lambda x. \mathtt{fls})\ \mathtt{tru};
iszro = λm . m ( λ x . fls ) tru ;
prd
:求前置,思路比较巧妙
z z = p a i r c 0 c 0 ; s s = λ p . p a i r ( s n d p ) ( p l u s c 1 ( s n d p ) ) ; p r d = λ m . f s t ( m s s z z ) ; \begin{alignat*}{2}
&\mathtt{zz} &&= \mathtt{pair}\ \mathrm{c}_{0}\ \mathrm{c}_{0}; \\
&\mathtt{ss} &&= \lambda p. \mathtt{pair}\ (\mathtt{snd}\ p)\ (\mathtt{plus}\ \mathtt{c}_1\ (\mathtt{snd}\ p)); \\
&\mathtt{prd} &&= \lambda m. \mathtt{fst}\ (m\ ss\ zz);
\end{alignat*} zz ss prd = pair c 0 c 0 ; = λ p . pair ( snd p ) ( plus c 1 ( snd p )) ; = λm . fst ( m ss zz ) ;
构造序列:z z = ( 0 , 0 ) → s s ( 0 , 1 ) → s s ( 1 , 2 ) → s s ⋯ → s s ⏟ n times ( n − 1 , n ) \mathtt{zz} = (0,0) \underbrace{\xrightarrow{\mathtt{ss}} (0,1) \xrightarrow{\mathtt{ss}} (1,2) \xrightarrow{ss} \cdots \xrightarrow{ss}}_{n\ \text{times}}\ (n-1,n) zz = ( 0 , 0 ) n times ss ( 0 , 1 ) ss ( 1 , 2 ) ss ⋯ ss ( n − 1 , n ) ,恰好执行了 n n n 次,此时求一个 f s t \mathtt{fst} fst 即可。
除了用 pair
外,还可以有另外一种实现:
p r d 2 = λ n . λ s . λ z . n ( λ g . λ h . h ( g s ) ) ( λ u . z ) ( λ u . u )
\mathtt{prd2} = \lambda n. \lambda s.\lambda z. n\ (\lambda g. \lambda h. h\ (g\ s))\ (\lambda u. z)\ (\lambda u.u)
prd2 = λn . λ s . λ z . n ( λ g . λh . h ( g s )) ( λ u . z ) ( λ u . u )
令 c o n s t = ( λ u . z ) \mathtt{const} = (\lambda u. z) const = ( λ u . z ) ,i n c = λ g . λ h . h ( g s ) \mathtt{inc} = \lambda g. \lambda h. h\ (g\ s) inc = λ g . λh . h ( g s ) 则有
c o n s t = z i n c c o n s t = λ h . h z i n c i n c c o n s t = λ h . h s z i n c i n c i n c c o n s t = λ h . h s s z \begin{aligned}
\mathtt{const} &= z \\
\mathtt{inc}\ \mathtt{const} &= \lambda h. h\ z \\
\mathtt{inc}\ \mathtt{inc}\ \mathtt{const} &= \lambda h. h\ s\ z \\
\mathtt{inc}\ \mathtt{inc}\ \mathtt{inc}\ \mathtt{const} &= \lambda h. h\ s\ s\ z
\end{aligned} const inc const inc inc const inc inc inc const = z = λh . h z = λh . h s z = λh . h s s z
p r d 2 = λ n . λ s . λ z . n ( λ g . λ h . h ( g s ) ) ( λ u . z ) ( λ u . u ) = λ n . λ s . λ z . i n c ⋯ i n c ⏟ n times ( λ u . z ) ( λ u . u ) = λ n . λ s . λ z . ( λ h . h ( s s ⋯ s ⏟ n − 1 times z ) ) ( λ u . u ) → λ n . λ s . λ z . s s ⋯ s ⏟ n − 1 times z \begin{aligned}
\mathtt{prd2} = {}& \lambda n. \lambda s.\lambda z. n\ (\lambda g. \lambda h. h\ (g\ s))\ (\lambda u. z)\ (\lambda u.u) \\
= {}& \lambda n. \lambda s. \lambda z. \underbrace{\mathtt{inc} \cdots \mathtt{inc}}_{n\ \text{times}}\ (\lambda u.z)\ (\lambda u.u) \\
= {}& \lambda n. \lambda s. \lambda z. (\lambda h. h\ (\underbrace{s\ s \cdots\ s}_{n-1\ \text{times}}\ z)) \ (\lambda u.u) \\
\rightarrow {}& \lambda n. \lambda s. \lambda z. \underbrace{s\ s \cdots\ s}_{n-1\ \text{times}}\ z\\
\end{aligned} prd2 = = = → λn . λ s . λ z . n ( λ g . λh . h ( g s )) ( λ u . z ) ( λ u . u ) λn . λ s . λ z . n times inc ⋯ inc ( λ u . z ) ( λ u . u ) λn . λ s . λ z . ( λh . h ( n − 1 times s s ⋯ s z )) ( λ u . u ) λn . λ s . λ z . n − 1 times s s ⋯ s z
复杂度均为 O ( n ) O(n) O ( n ) 。
减法:利用 prd
实现
s u b t r a c t 1 = λ m . λ n . n p r d m ;
\mathtt{subtract1} = \lambda m. \lambda n. n\ \mathtt{prd}\ m;
subtract1 = λm . λn . n prd m ;
相等判断
e q u a l = λ m . λ n . a n d ( i s z r o p r d m n ) ( i s z r o p r d n m ) ;
\mathtt{equal} = \lambda m. \lambda n. \mathtt{and}\ (\mathtt{iszro}\ \mathtt{prd}\ m\ n)\ (\mathtt{iszro}\ \mathtt{prd}\ n\ m);
equal = λm . λn . and ( iszro prd m n ) ( iszro prd n m ) ;
列表:不难发现列表和自然数其实是同构的,因为它们都是递归定义的。其中 cons
对应了 succ
,nil
对应了 zero
。
列表可以看作一个嵌套的 p a i r \mathtt{pair} pair ,即 ( c x ( c y ( c z n ) ) ) (c\ x\ (c\ y\ (c\ z\ n))) ( c x ( c y ( c z n ))) 。其中 c
对应了 fold
函数,类似于 s
,但是它接受两个参数。
n i l = λ c . λ n . n ; c o n s = λ h . λ t . λ c . λ n . c h ( t c n ) ; h e a d = λ l . l ( λ h . λ t . h ) f l s = λ l . l t r u f l s ; i s n i l = λ l . ( λ h . λ t . f l s ) t r u ; t a i l = λ l . f s t ( l ( λ x . λ p . p a i r ( s n d p ) ( c o n s x ( s n d p ) ) ) ( p a i r n i l n i l ) ) ; \begin{alignat*}{2}
&\mathtt{nil} &&= \lambda c. \lambda n. n; \\
&\mathtt{cons} &&= \lambda h. \lambda t. \lambda c. \lambda n. c\ h\ (t\ c\ n); \\
&\mathtt{head} &&= \lambda l. l\ (\lambda h. \lambda t. h)\ \mathtt{fls} = \lambda l. l\ \mathtt{tru}\ \mathtt{fls}; \\
&\mathtt{isnil} &&= \lambda l. (\lambda h. \lambda t. \mathtt{fls})\ \mathtt{tru}; \\
&\mathtt{tail} &&= \lambda l. \mathtt{fst}\ (l\ (\lambda x. \lambda p. \mathtt{pair}\ (\mathtt{snd}\ p)\ (\mathtt{cons}\ x\ (\mathtt{snd}\ p)))\ (\mathtt{pair}\ \mathtt{nil}\ \mathtt{nil}));
\end{alignat*} nil cons head isnil tail = λ c . λn . n ; = λh . λ t . λ c . λn . c h ( t c n ) ; = λ l . l ( λh . λ t . h ) fls = λ l . l tru fls ; = λ l . ( λh . λ t . fls ) tru ; = λ l . fst ( l ( λ x . λ p . pair ( snd p ) ( cons x ( snd p ))) ( pair nil nil )) ;
tail
的思路类似于 prd
:
( n i l , n i l ) → ( n i l , c o n s a n i l ) → ( c o n s a n i l , c o n s b ( c o n s a n i l ) ) → ⋯ → ( t a i l e , l i s t r e v e r s e d )
(\mathtt{nil}, \mathtt{nil}) \rightarrow (\mathtt{nil}, \mathtt{cons}\ a\ \mathtt{nil}) \rightarrow (\mathtt{cons}\ a\ \mathtt{nil}, \mathtt{cons}\ b\ (\mathtt{cons}\ a\ \mathtt{nil})) \rightarrow \dots \rightarrow \ (\mathtt{tail_e}, \mathtt{list_{reversed}})
( nil , nil ) → ( nil , cons a nil ) → ( cons a nil , cons b ( cons a nil )) → ⋯ → ( tai l e , lis t reversed )
除此之外,还有另一种构建方法:
n i l = p a i r t r u t r u ; c o n s = λ h . λ t . p a i r f l s ( p a i r h t ) ; h e a d = λ z . f s t ( s n d z ) ; t a i l = λ z . s n d ( s n d z ) ; i s n i l = n i l ; \begin{alignat*}{2}
&\mathtt{nil} &&= \mathtt{pair}\ \mathtt{tru}\ \mathtt{tru}; \\
&\mathtt{cons} &&= \lambda h. \lambda t. \mathtt{pair}\ \mathtt{fls}\ (\mathtt{pair}\ h\ t); \\
&\mathtt{head} &&= \lambda z.\mathtt{fst}\ (\mathtt{snd}\ z); \\
&\mathtt{tail} &&= \lambda z.\mathtt{snd}\ (\mathtt{snd}\ z); \\
&\mathtt{isnil} &&= \mathtt{nil};
\end{alignat*} nil cons head tail isnil = pair tru tru ; = λh . λ t . pair fls ( pair h t ) ; = λ z . fst ( snd z ) ; = λ z . snd ( snd z ) ; = nil ;
Lists List 可以用类似 church numerals 的方式编码到 lambda calculus 中,其对应的运算为 fold-right
(或称为 reduce
),可以看成一棵右斜树。
例如 [a, b]
可以被编码为 λ c . λ n . c a ( c b n ) \lambda c.\lambda n.c\ a\ (c\ b\ n) λ c . λn . c a ( c b n ) 。
类似的可以定义下面的辅助函数:
n i l = λ c . λ n . n = f l s ; \operatorname{\mathtt{nil}} = \lambda c.\lambda n.n = \operatorname{\mathtt{fls}}; nil = λ c . λn . n = fls ;
c o n s = λ h . λ t . λ c . λ n . c h ( t c n ) ; \operatorname{\mathtt{cons}} = \lambda h.\lambda t.\lambda c.\lambda n.c\ h\ (t\ c\ n); cons = λh . λ t . λ c . λn . c h ( t c n ) ;
h e a d = λ l . l t r u n i l ; \operatorname{\mathtt{head}} = \lambda l.l\ \operatorname{\mathtt{tru}}\ \operatorname{\mathtt{nil}}; head = λ l . l tru nil ;
i s n i l = λ l . l ( λ h . λ t . f l s ) t r u ; \operatorname{\mathtt{isnil}} = \lambda l.l\ (\lambda h.\lambda t.\operatorname{\mathtt{fls}})\ \operatorname{\mathtt{tru}}; isnil = λ l . l ( λh . λ t . fls ) tru ;
这里比较复杂的是 tail
函数,它既要能够返回余下的列表,又要保持参数 c c c 不被替换:
t a i l = λ l . f s t ( l ( λ h . λ t . p a i r ( s n d t ) ( c o n s h ( s n d t ) ) ( p a i r n i l n i l ) ) ; \begin{aligned}
\operatorname{\mathtt{tail}} = \lambda l. \operatorname{\mathtt{fst}} &\\
(l\ &(\lambda h. \lambda t. \operatorname{\mathtt{pair}}\ (\operatorname{\mathtt{snd}}\ t)\ (\operatorname{\mathtt{cons}}\ h\ (\operatorname{\mathtt{snd}}\ t)) \\
& (\operatorname{\mathtt{pair}}\ \operatorname{\mathtt{nil}}\ \operatorname{\mathtt{nil}}));
\end{aligned} tail = λ l . fst ( l ( λh . λ t . pair ( snd t ) ( cons h ( snd t )) ( pair nil nil )) ;
考虑 l = λ c . λ n . c a 1 ( c a 2 ( … ( c a n n ) ) ) l = \lambda c. \lambda n. c\ a₁\ (c\ a₂\ (\dots (c\ aₙ\ n))) l = λ c . λn . c a 1 ( c a 2 ( … ( c a n n ))) 则
a n s = f s t t 1 = [ a 2 a 3 … a n ] = ( c o n s a 2 a 2 … a n ) t 1 = ( p a i r ( s n d t 2 ) [ a 1 ( s n d t 2 ) ] ) = ( p a i r [ a 2 a 3 … a n ] [ a 1 a 2 … a n ] ) t 2 = ( p a i r ( s n d t 3 ) [ a 2 ( s n d t 3 ) ] ) = ( p a i r [ a 3 a 4 … a n ] [ a 2 a 3 … a n ] ) … t n − 1 = ( p a i r ( s n d t n ) [ a n − 1 ( s n d t n ) ] ) = ( p a i r [ a n ] [ a n − 1 a n ] ) t n = ( p a i r ( s n d t n + 1 ) [ a n ( s n d t n + 1 ) ] ) = ( p a i r n i l [ a n ] ) t n + 1 = ( p a i r n i l n i l ) \begin{aligned}
& \operatorname{\mathtt{ans}} = \operatorname{\mathtt{fst}}\ t_1 = [a₂\ a₃ \dots aₙ] = (\operatorname{\mathtt{cons}}\ a₂\ a₂ \dots aₙ) \\
& t_1 = (\operatorname{\mathtt{pair}}\ (\operatorname{\mathtt{snd}}\ t₂)\ [a₁ \ (\operatorname{\mathtt{snd}}\ t₂)]) = (\operatorname{\mathtt{pair}}\ [a₂\ a₃ \dots aₙ]\ [a₁\ a₂ \dots aₙ]) \\
& t₂ = (\operatorname{\mathtt{pair}}\ (\operatorname{\mathtt{snd}}\ t₃)\ [a₂ \ (\operatorname{\mathtt{snd}}\ t₃)]) = (\operatorname{\mathtt{pair}}\ [a₃\ a₄ \dots aₙ]\ [a₂\ a₃ \dots aₙ]) \\
& \dots \\
& t_{n-1} = (\operatorname{\mathtt{pair}}\ (\operatorname{\mathtt{snd}}\ tₙ)\ [a_{n-1} \ (\operatorname{\mathtt{snd}}\ tₙ)]) = (\operatorname{\mathtt{pair}}\ [aₙ]\ [a_{n-1}\ aₙ]) \\
& tₙ = (\operatorname{\mathtt{pair}}\ (\operatorname{\mathtt{snd}}\ t_{n+1})\ [aₙ \ (\operatorname{\mathtt{snd}}\ t_{n+1})]) = (\operatorname{\mathtt{pair}}\ \operatorname{\mathtt{nil}}\ [aₙ]) \\
& t_{n+1} = (\operatorname{\mathtt{pair}}\ \operatorname{\mathtt{nil}}\ \operatorname{\mathtt{nil}})
\end{aligned} ans = fst t 1 = [ a 2 a 3 … a n ] = ( cons a 2 a 2 … a n ) t 1 = ( pair ( snd t 2 ) [ a 1 ( snd t 2 )]) = ( pair [ a 2 a 3 … a n ] [ a 1 a 2 … a n ]) t 2 = ( pair ( snd t 3 ) [ a 2 ( snd t 3 )]) = ( pair [ a 3 a 4 … a n ] [ a 2 a 3 … a n ]) … t n − 1 = ( pair ( snd t n ) [ a n − 1 ( snd t n )]) = ( pair [ a n ] [ a n − 1 a n ]) t n = ( pair ( snd t n + 1 ) [ a n ( snd t n + 1 )]) = ( pair nil [ a n ]) t n + 1 = ( pair nil nil )
Enriching the Calculus 前面在 λ 演算中定义了布尔型和自然数,理论上已经可以构建出所有的程序了。但是为了简洁,这里开始会使用 λNB 作为系统表述,即将前面 untyped arithmetic expression 的内容加进来,将其看作 primitive 的存在。二者可以轻松地进行转换:
r e a l b o o l = λ b . t r u e f a l s e ; ⇔ c h u r c h b o o l = λ b . i f b t h e n t r u e l s e f l s ; r e a l n a t = λ m . m ( λ x . s u c c x ) 0 ; r e a l e q = λ m . λ n . ( e q u a l m n t r u e f a l s e ) ; \begin{alignat*}{3}
&\mathtt{realbool} &&= \lambda b. \mathtt{true}\ \mathtt{false}; \\
\Leftrightarrow {}&\mathtt{churchbool} &&= \lambda b. \mathtt{if}\ b\ \mathtt{then}\ \mathtt{tru}\ \mathtt{else}\ \mathtt{fls}; \\
&\mathtt{realnat} &&= \lambda m. m\ (\lambda x. \mathtt{succ}\ x)\ 0; \\
&\mathtt{realeq} &&= \lambda m. \lambda n. (\mathtt{equal}\ m\ n\ \mathtt{true}\ \mathtt{false});
\end{alignat*} ⇔ realbool churchbool realnat realeq = λb . true false ; = λb . if b then tru else fls ; = λm . m ( λ x . succ x ) 0 ; = λm . λn . ( equal m n true false ) ;
注意 succ
本身的语法结构,不能对 church numerals 使用。
使用 λNB 的一个原因是 Church Numerals 的表示和运算太繁杂了,尽管结果和普通的运算等价,但是中间过程却很复杂,并且会影响到求值顺序。如果采用 call-by-value 的方法,那么对于 church numerals 来说不能提前化简数字(因为没有 apply s
和 z
),此时 scc c1
和 c2
的形式有很大差别。
Representation 首先定义所谓的 ordinary numbers(有很多等价定义,这里选取 untyped arithmetic expressions 中的定义):
a constant 0 an operation iszero
mapping numbers to booleans, and two operations, succ
and pred
, mapping numbers to numbers. 同样,church numerals 也可以做这些事:
The term c 0 c_0 c 0 represents the number 0(包括 non-canonical representations,例如 λ s . λ z . ( λ x . x ) z \lambda s. \lambda z. (\lambda x. x)\ z λ s . λ z . ( λ x . x ) z ) The terms scc
and prd
represent the arithmetic operations succ
and pred
The term iszro
represents the operation iszero
(严格来说 iszro
返回的是 tru
,但是这里先将其看作等价) 假设有一个程序对数字进行复杂的运算,并返回一个 boolean(或者其他非数字),那么如果将其中所有的 real numbers 和 arithmetic operations 换成对应的 lambda-terms,并且求解程序,则可以得到和原来完全相同的答案。即如果把程序看成一个黑盒,那么 real numbers 和 Church-numerals representation 没有任何区别。
β \beta β -equivalance上面在推导的时候用的都是 → \rightarrow → ,因为到目前为止的 = = = 都是按照定义展开。这里也可以定义 β \beta β -equivalance,定义 beta-reduction 下的等价关系:
(b e t a beta b e t a -equivalance)
等价关系 ≡ β \equiv_{\beta} ≡ β 的定义如下:
M → β M ’ ⇔ M ≡ β M ’ M \rightarrow_{\beta} M’ \Leftrightarrow M \equiv_{\beta} M’ M → β M ’ ⇔ M ≡ β M ’ ∀ M , M ≡ β M ’ \forall M, M \equiv_{\beta} M’ ∀ M , M ≡ β M ’ M ≡ β M ’ ⇔ M ’ ≡ β M M \equiv_{\beta} M’ \Leftrightarrow M’ \equiv_{\beta} M M ≡ β M ’ ⇔ M ’ ≡ β M M ≡ β M ’ ∧ M ’ ≡ β M ’’ ⇔ M ≡ β M ’’ M \equiv_{\beta} M’ \wedge M’ \equiv_{\beta} M’’ \Leftrightarrow M \equiv_{\beta} M’’ M ≡ β M ’ ∧ M ’ ≡ β M ’’ ⇔ M ≡ β M ’’ Recursion omega
前面提到 normal forms 指的是无法继续化简的式子,但是有些 term 是没有 normal form 的,被称为 diverge 。
omega 是一个 divergent combinator:
o m e g a = ( λ x . x x ) ( λ x . x x ) ;
\mathtt{omega} = (\lambda x.x\ x)\ (\lambda x.x\ x);
omega = ( λ x . x x ) ( λ x . x x ) ;
虽然它只有一个 redex,但是进行 reduce 后又得到了一个和原式相同的 omega
。
fix
omega
有一个 generalized 的形式,被称为 fixed-point combinator ,也叫 call-by-value Y-combinator 或者 Z :
f i x = λ f . ( λ x . f ( λ y . x x y ) ) ( λ x . f ( λ y . x x y ) ) ;
\mathtt{fix} = \lambda f. (\lambda x. f\ (\lambda y. x\ x\ y))\ (\lambda x. f\ (\lambda y. x\ x\ y));
fix = λ f . ( λ x . f ( λ y . x x y )) ( λ x . f ( λ y . x x y )) ;
f i x f = f ( λ y . ( f i x f ) y ) ;
\mathtt{fix}\ f = f\ (\lambda y. (\mathtt{fix}\ f)\ y);
fix f = f ( λ y . ( fix f ) y ) ;
使用方法:
h = ⟨ b o d y c o n t a i n i n g h ⟩ → g = λ f . ⟨ b o d y c o n t a i n i n g f ⟩ h = f i x g \begin{alignat*}{2}
&h &&= \langle \mathrm{body\ containing}\ h \rangle \\
\rightarrow {}& g &&= \lambda f. \langle \mathrm{body\ containing}\ f \rangle \\
&h &&= \mathtt{fix}\ g
\end{alignat*} → h g h = ⟨ body containing h ⟩ = λ f . ⟨ body containing f ⟩ = fix g
例如求 church numerals 的阶乘:
f a c = λ n . i f ( r e a l e q n c 0 ) t h e n c 1 e l s e t i m e s n ( f a c ( p r d n ) ) ; → g = λ f . λ n . i f ( r e a l e q n c 0 ) t h e n c 1 e l s e t i m e s n ( f ( p r d n ) ) ; f a c t o r i a l = f i x g ; \begin{aligned}
& \mathtt{fac} = \lambda n. \mathtt{if}\ (\mathtt{realeq}\ n\ \mathtt{c}_0)\ \mathtt{then}\ \mathtt{c}_1\ \mathtt{else}\ \mathtt{times}\ n\ (\mathtt{fac}\ (\mathtt{prd}\ n)); \\
\rightarrow\ & g = \lambda f. \lambda n. \mathtt{if}\ (\mathtt{realeq}\ n\ \mathtt{c}_0)\ \mathtt{then}\ \mathtt{c}_1\ \mathtt{else}\ \mathtt{times}\ n\ (f\ (\mathtt{prd}\ n)); \\
& \mathtt{factorial} = \mathtt{fix}\ g;
\end{aligned} → fac = λn . if ( realeq n c 0 ) then c 1 else times n ( fac ( prd n )) ; g = λ f . λn . if ( realeq n c 0 ) then c 1 else times n ( f ( prd n )) ; factorial = fix g ;
这里 factorial
使用的是 if
而不是 test
,是因为在 call-by-value 下,如果要对 test
进行 evaluate,则必须要求出其两个分支的内容后才能进一步 reduce,而这样会导致 diverge。比如要算出 f a c t o r i a l c 0 \mathtt{factorial\ c_0} factorial c 0 ,那么就必须要求出第二个分支中的 t i m e s n ( f ( p r d n ) ) \mathtt{times}\ n\ (\mathtt{f}\ (\mathtt{prd}\ n)) times n ( f ( prd n )) ,即 t i m e s n ( f c 0 ) \mathtt{times}\ n\ (\mathtt{f}\ c_0) times n ( f c 0 ) ,就套娃了。
如果要用 test
,那么可以将两个 branch 包裹在 dummy lambda-abstraction 下。因为 abstractions 也是 values,所以 call-by-value 可以在进行求值的情况下使用 test
。此时 test
得到的还是一个 lambda-abstraction,所以要对其进行强制求值,在其后面随便 apply 一个 dummy argument 即可。
g ’ = λ f . λ n . t e s t ( i s z e r o n ) ( λ x . c 1 ) ( λ x . ( t i m e s n ( f ( p r d n ) ) ) ) c 0 ; f a c t o r i a l ’ = f i x g ’ ; \begin{aligned}
& g’ = \lambda f. \lambda n. \mathtt{test}\ (\mathtt{iszero}\ n)\ (\lambda x. c_1)\ (\lambda x. (\mathtt{times}\ n\ (f\ (\mathtt{prd}\ n))))\ c_0; \\
& \mathtt{factorial’} = \mathtt{fix}\ g’;
\end{aligned} g ’ = λ f . λn . test ( iszero n ) ( λ x . c 1 ) ( λ x . ( times n ( f ( prd n )))) c 0 ; factorial’ = fix g ’ ;
Y
combinator除此之外,fix
还有一种更简单的形式:
Y = λ f . ( λ x . f ( x x ) ) ( λ x . f ( x x ) ) ;
Y = \lambda f. (\lambda x. f\ (x\ x))\ (\lambda x. f\ (x\ x));
Y = λ f . ( λ x . f ( x x )) ( λ x . f ( x x )) ;
Y f = f ( Y f ) ;
Y\ f = f\ (Y\ f);
Y f = f ( Y f ) ;
但是它无法在 call-by-value 中使用,因为 ( x x ) (x\ x) ( x x ) 不是一个 value,所以会 diverge。但是 fix
中的 ( λ y . x x y ) (\lambda y. x\ x\ y) ( λ y . x x y ) 是一个 value(lambda abstractions 也是 value)。
例子 churchnat
:将 primitive natural numbers 转换成 church numerals
c n = λ f . λ n . i f ( i s z e r o n ) t h e n c 0 e l s e s c c ( f ( p r e d n ) ) ; c h u r c h n a t = f i x c n ; \begin{aligned}
& \mathtt{cn} = \lambda f. \lambda n. \mathtt{if}\ (\mathtt{iszero}\ n)\ \mathtt{then}\ c_0\ \mathtt{else}\ \mathtt{scc}\ (f\ (\mathtt{pred}\ n)); \\
& \mathtt{churchnat} = \mathtt{fix}\ \mathtt{cn};
\end{aligned} cn = λ f . λn . if ( iszero n ) then c 0 else scc ( f ( pred n )) ; churchnat = fix cn ;
sumlist
:对 church numerals 的列表求和(这里的 test
可以改成 if
,这样可以去掉 dummy abstractions)
f ’ = λ f . λ l . t e s t ( i s n i l l ) ( λ x . c 0 ) ( λ x . ( p l u s ( h e a d l ) ( f ( t a i l l ) ) ) ) c 0 ) ; s u m l i s t = f i x f ’ \begin{aligned}
& f’ = \lambda f. \lambda l. \mathtt{test}\ (\mathtt{isnil}\ l)\ (\lambda x. c_0)\ (\lambda x. (\mathtt{plus}\ (\mathtt{head}\ l)\ (f\ (\mathtt{tail}\ l))))\ c_0); \\
& \mathtt{sumlist} = \mathtt{fix}\ f’ \\
\end{aligned} f ’ = λ f . λ l . test ( isnil l ) ( λ x . c 0 ) ( λ x . ( plus ( head l ) ( f ( tail l )))) c 0 ) ; sumlist = fix f ’
除了用 fix
的写法外,还可以不用 fix
实现。因为 List 本身就是一个归纳定义的结构,所以让 c
变成一个加号即可,而起点是 c 0 c_0 c 0 :
s u m l i s t ’ = λ l . l p l u s c 0 ;
\mathtt{sumlist’} = \lambda l. l\ \mathtt{plus}\ c_0;
sumlist’ = λ l . l plus c 0 ;
求解 fix
举例:factorial
f a c t o r i a l c 3 = f i x g c 3 → ( λ x . g ( λ y . x x y ) ) ( λ x . g ( λ y . x x y ) ) ‾ c 3 → g ( λ x . g ( λ y . x x y ) ) ( λ x . g ( λ y . x x y ) ) c 3 = g f c t c 3 where f c t = λ y . ( λ x . g ( λ y . x x y ) ) ( λ x . g ( λ y . x x y ) ) y → ∗ ( λ n . i f ( r e a l e q n c 0 ) t h e n c 1 e l s e t i m e s c 3 ( f c t ( p r d c 3 ) ) ) c 3 (By def of g ) → i f ( r e a l e q c 3 c 0 ) t h e n c 1 e l s e t i m e s c 3 ( f c t ( p r d c 3 ) ) → ∗ t i m e s c 3 ( f c t ( p r d c 3 ) ) → ∗ t i m e s c 3 ( f c t c 2 ’ ) note: not valid in call-by-value, p o w e r c 3 should be reduced first → ∗ t i m e s c 3 ( g f c t c 2 ’ ) → ∗ … → ∗ t i m e s c 3 ( t i m e s c 2 ’ ( c 1 ’ c 1 ) ) → ∗ c 6 ′ \begin{aligned}
& \mathtt{factorial}\ \mathtt{c}_3 \\
= {}& \mathtt{fix}\ g\ \mathtt{c}_3 \\
\rightarrow {}& \underline{(\lambda x. g\ (\lambda y. x\ x\ y))\ (\lambda x. g\ (\lambda y. x\ x\ y))}\ \mathtt{c}_3\ \\
\rightarrow {}& g\ (\lambda x. g\ (\lambda y. x\ x\ y))\ (\lambda x. g\ (\lambda y. x\ x\ y))\ \mathtt{c}_3\ \\
= {}& g\ \mathtt{fct}\ \mathtt{c}_3 \\
& \text{where} \quad \mathtt{fct} = \lambda y. (\lambda x. g\ (\lambda y. x\ x\ y))\ (\lambda x. g\ (\lambda y. x\ x\ y))\ y \\
\rightarrow^* & (\lambda n. \mathtt{if}\ (\mathtt{realeq}\ n\ \mathtt{c}_0)\ \mathtt{then}\ \mathtt{c}_1\ \mathtt{else}\ \mathtt{times}\ \mathtt{c}_3\ (\mathtt{fct}\ (\mathtt{prd}\ \mathtt{c}_3)))\ \mathtt{c}_3 \quad \text{(By def of $g$)}\\
\rightarrow {}& \mathtt{if}\ (\mathtt{realeq}\ \mathtt{c}_3\ \mathtt{c}_0)\ \mathtt{then}\ \mathtt{c}_1\ \mathtt{else}\ \mathtt{times}\ \mathtt{c}_3\ (\mathtt{fct}\ (\mathtt{prd}\ \mathtt{c}_3)) \\
\rightarrow^*& \mathtt{times}\ \mathtt{c}_3\ (\mathtt{fct}\ (\mathtt{prd}\ \mathtt{c}_3)) \\
\rightarrow^*& \mathtt{times}\ \mathtt{c}_3\ (\mathtt{fct}\ \mathtt{c}_2’) \\
& \text{note: not valid in call-by-value, $\mathtt{power}\ \mathtt{c}_3$ should be reduced first} \\
\rightarrow^*& \mathtt{times}\ \mathtt{c}_3\ (g\ \mathtt{fct}\ \mathtt{c}_2’) \\
\rightarrow^*& \dots \\
\rightarrow^*& \mathtt{times}\ \mathtt{c}_3\ (\mathtt{times}\ \mathtt{c}_2’\ (\mathtt{c_1}’\ \mathtt{c}_1)) \\
\rightarrow^*& \mathtt{c}_6'
\end{aligned} = → → = → ∗ → → ∗ → ∗ → ∗ → ∗ → ∗ → ∗ factorial c 3 fix g c 3 ( λ x . g ( λ y . x x y )) ( λ x . g ( λ y . x x y )) c 3 g ( λ x . g ( λ y . x x y )) ( λ x . g ( λ y . x x y )) c 3 g fct c 3 where fct = λ y . ( λ x . g ( λ y . x x y )) ( λ x . g ( λ y . x x y )) y ( λn . if ( realeq n c 0 ) then c 1 else times c 3 ( fct ( prd c 3 ))) c 3 (By def of g ) if ( realeq c 3 c 0 ) then c 1 else times c 3 ( fct ( prd c 3 )) times c 3 ( fct ( prd c 3 )) times c 3 ( fct c 2 ’ ) note: not valid in call-by-value, power c 3 should be reduced first times c 3 ( g fct c 2 ’ ) … times c 3 ( times c 2 ’ ( c 1 ’ c 1 )) c 6 ′
上面的 c n ’ \mathtt{c}_n’ c n ’ 是 behavior equivalent 的 c n c_n c n 。虽然 pred
求出来的结果和 c n − 1 c_{n-1} c n − 1 形式并不相同,但是行为相同。
观察化简过程,不难发现重点在于 f c t n → ∗ g f c t n \mathtt{fct}\ n \rightarrow^*\ g\ \mathtt{fct}\ n fct n → ∗ g fct n 。fct
是一种 self-replicator,可以复制自身,并将自己作为参数传递给 g
(when applied to an argument, supplies itself and n as arguments to g
)。而 g
就可以选择要不要继续用 fct
,用了就能继续递归。
Syntax (Lambda Terms)
Let V \mathcal{V} V be a countable set of variable names. The set of terms is the smallest set T \mathcal{T} T such that
x ∈ T x \in \mathcal{T} x ∈ T for every x ∈ V x \in \mathcal{V} x ∈ V if t 1 ∈ T t_1 \in \mathcal{T} t 1 ∈ T and x ∈ V x \in \mathcal{V} x ∈ V , then λ x . t 1 ∈ T \lambda x.t_1 \in \mathcal{T} λ x . t 1 ∈ T if t 1 ∈ T t_1 \in \mathcal{T} t 1 ∈ T and t 2 ∈ T t_2 \in \mathcal{T} t 2 ∈ T , then t 1 t 2 ∈ T t_1\ t_2 \in \mathcal{T} t 1 t 2 ∈ T 仿照之前的做法可以定义 size \operatorname{size} size 等函数。
(Free Variables)
The set of free variables of a term t, written F V ( t ) FV(t) F V ( t ) , is defined as follows:
F V ( x ) = x F V ( λ x . t 1 ) = F V ( t 1 ) ∖ x F V ( t 1 t 2 ) = F V ( t 1 ) ∪ F V ( t 2 ) \begin{alignat*}{2}
& FV(x) &&= {x} \\
& FV(\lambda x. t_1) &&= FV(t_1) \setminus {x} \\
& FV(t_1\ t_2) &&= FV(t_1) \cup FV(t_2)
\end{alignat*} F V ( x ) F V ( λ x . t 1 ) F V ( t 1 t 2 ) = x = F V ( t 1 ) ∖ x = F V ( t 1 ) ∪ F V ( t 2 )
∣ F V ( t ) ∣ ≤ size ( t ) \vert FV(t) \vert \le \operatorname{size}(t) ∣ F V ( t ) ∣ ≤ size ( t )
Substitution Substitution 是一个比较 tricky 的操作。这里会介绍一种比较直观的做法,可以用数学定义和证明。在 Chapter 6 会介绍一种 heavier 的做法,其依赖于 de Bruijn presentation
,但是更容易用 ML 来实现。
The names of bound variables do not matter
下面是三个比较特殊的 substitution 的例子:
[ x ↦ y ] ( λ x . x ) = λ x . y λ x . x ( y ≠ x in rule 3) [ x ↦ z ] ( λ z . x ) = λ z . z λ z . x ( y ∉ F V ( s ) in rule 3) [ x ↦ y z ] ( λ y . x y ) = λ y . y z y λ w . y z w (alpha-conversion) \begin{alignat*}{3}
& [x \mapsto y](\lambda x. x) &&= \cancel{\lambda x. y} \quad \lambda x.x & \text{($y \neq x$ in rule 3)} \\
& [x \mapsto z](\lambda z. x) &&= \cancel{\lambda z.z} \quad \lambda z. x & \text{($y \notin FV(s)$ in rule 3)} \\
& [x \mapsto y\ z](\lambda y. x\ y) &&= \cancel{\lambda y. y\ z\ y} \quad \lambda w. y\ z\ w & \text{(alpha-conversion)} &
\end{alignat*} [ x ↦ y ] ( λ x . x ) [ x ↦ z ] ( λ z . x ) [ x ↦ y z ] ( λ y . x y ) = λ x . y λ x . x = λ z . z λ z . x = λ y . y z y λ w . y z w ( y = x in rule 3) ( y ∈ / F V ( s ) in rule 3) (alpha-conversion)
其中第二种错误称为 variable capture ,而避免了这种错误的 substitution 称为 capture-avoiding substitution 。
第三种错误导致 substitution 失败,需要通过 alpha-conversion 解决:
[ x ↦ y z ] ( λ y . x y ) = [ x ↦ y z ] ( λ w . x w ) (alpha-conversion) = λ w . y z w \begin{aligned}
& [x \mapsto y\ z](\lambda y. x\ y) \\
= {}& [x \mapsto y\ z](\lambda w. x\ w) & \text{(alpha-conversion)} \\
= {}& \lambda w. y\ z\ w
\end{aligned} = = [ x ↦ y z ] ( λ y . x y ) [ x ↦ y z ] ( λ w . x w ) λ w . y z w (alpha-conversion)
(Convention)
Terms that differ only in the names of bound variables are interchangeable in all contexts.
(Substitution)
[ x ↦ s ] x = s [ x ↦ s ] y = y if y ≠ x [ x ↦ s ] ( λ y . t 1 ) = λ y . [ x ↦ s ] t 1 if y ≠ x and y ∉ F V ( s ) [ x ↦ s ] ( t 1 t 2 ) = [ x ↦ s ] t 1 [ x ↦ s ] t 2 \begin{alignat*}{3}
& [x \mapsto s] x &&= s \\
& [x \mapsto s] y &&=y && \text{if}\ y \neq x \\
& [x \mapsto s] (\lambda y. t_1) &&= \lambda y. [x \mapsto s] t_1 && \text{if}\ y \neq x\ \text{and}\ y \notin FV(s) \\
& [x \mapsto s] (t_1\ t_2) &&= [x \mapsto s]t_1\ [x \mapsto s]t_2
\end{alignat*} [ x ↦ s ] x [ x ↦ s ] y [ x ↦ s ] ( λ y . t 1 ) [ x ↦ s ] ( t 1 t 2 ) = s = y = λ y . [ x ↦ s ] t 1 = [ x ↦ s ] t 1 [ x ↦ s ] t 2 if y = x if y = x and y ∈ / F V ( s )
注解 :需要适当使用 alpha-conversion
Operational Semantics Figure 1: Untyped lambda-calculus
Untyped lambda-calculus 的 evaluation 有两类规则:
E-App1
、E-App2
:the congruence rulesE-AppAbs
:the computation rules这个规则仅仅是对 call by values 使用的。观察 evaluation relations 可以发现,一般先用 E-App1
化简 t 1 t_1 t 1 ,接着用 E-App2
化简 t 2 t_2 t 2 ,最后使用 E-AppAbs
进行 reduce。
由于 pure lambda calculus 中的 values 只有 lambda abstractions,所以化简得到的结果一定也是 lambda abstractions。
Rules for other evaluation strategies full beta-reduction
t 1 → t 1 ’ t 1 t 2 → t 1 ’ t 2 (E-App1)
\dfrac{t_1 \rightarrow t_1’}{t_1\ t_2 \rightarrow t_1’\ t_2} \tag{E-App1}
t 1 t 2 → t 1 ’ t 2 t 1 → t 1 ’ ( E-App1 )
t 2 → t 2 ’ t 1 t 2 → t 1 t 2 ’ (E-App1)
\dfrac{t_2 \rightarrow t_2’}{t_1\ t_2 \rightarrow t_1\ t_2’} \tag{E-App1}
t 1 t 2 → t 1 t 2 ’ t 2 → t 2 ’ ( E-App1 )
( λ x . t 12 ) t 2 → [ x ↦ t 2 ] t 12 (E-AppAbs)
(\lambda x. t_{12})\ t_2 \rightarrow [x \mapsto t_2] t_{12} \tag{E-AppAbs}
( λ x . t 12 ) t 2 → [ x ↦ t 2 ] t 12 ( E-AppAbs )
注意这里没有用到 value
normal-order strategy
n a 1 → n a 1 ’ n a 1 t 2 → n a 1 ’ t 2 (E-App1)
\dfrac{\operatorname{\mathrm{na}}_1 \rightarrow \operatorname{\mathrm{na}}_1’}{\operatorname{\mathrm{na}}_1\ t_2 \rightarrow \operatorname{\mathrm{na}}_1’\ t_2} \tag{E-App1}
na 1 t 2 → na 1 ’ t 2 na 1 → na 1 ’ ( E-App1 )
t 2 → t 2 ’ n a n f 1 t 2 → n a n f 1 t 2 ’ (E-App2)
\dfrac{t_2 \rightarrow t_2’}{\operatorname{\mathrm{nanf}}_1\ t_2 \rightarrow \operatorname{\mathrm{nanf}}_1\ t_2’} \tag{E-App2}
nanf 1 t 2 → nanf 1 t 2 ’ t 2 → t 2 ’ ( E-App2 )
t 1 → t 1 ’ λ x . t 1 → λ x . t 1 ’ (E-Abs)
\dfrac{t_1 \rightarrow t_1’}{\lambda x.t_1 \rightarrow \lambda x.t_1’} \tag{E-Abs}
λ x . t 1 → λ x . t 1 ’ t 1 → t 1 ’ ( E-Abs )
( λ x . t 12 ) t 2 → [ x ↦ t 2 ] t 12 (E-AppAbs)
(\lambda x. t_{12})\ t_2 \rightarrow [x \mapsto t_2] t_{12} \tag{E-AppAbs}
( λ x . t 12 ) t 2 → [ x ↦ t 2 ] t 12 ( E-AppAbs )
其中用到的三种 term 定义如下:
n f ∷ = ( normal forms ) λ x . n f n a n f n a n f ∷ = ( non-abstraction normal forms ) x n a n f n f n a ∷ = ( non-abstraction ) x t 1 t 2 \begin{aligned}
\operatorname{\mathrm{nf}} \Coloneqq && (\text{normal forms}) \\
& \lambda x.\operatorname{\mathrm{nf}} \\
& \operatorname{\mathrm{nanf}} \\
\operatorname{\mathrm{nanf}} \Coloneqq && (\text{non-abstraction normal forms}) \\
& x \\
& \operatorname{\mathrm{nanf}}\ \operatorname{\mathrm{nf}} \\
\operatorname{\mathrm{na}} \Coloneqq && (\text{non-abstraction}) \\
& x \\
& t_1\ t_2 \\
\end{aligned} nf : : = nanf : : = na : : = λ x . nf nanf x nanf nf x t 1 t 2 ( normal forms ) ( non-abstraction normal forms ) ( non-abstraction )
dynamic-scoping
Dynamic-scoping 需要在运行期进行替换,因此需要在 contexts(Γ \Gamma Γ )中记录变量对应的值:
v ∷ = ( values ) λ x . e \begin{aligned}
\operatorname{\mathrm{v}} \Coloneqq && (\text{values}) \\
& \lambda x. e
\end{aligned} v : : = λ x . e ( values )
Γ ∷ = ( contexts ) ∅ Γ , x = t \begin{aligned}
\Gamma \Coloneqq && (\text{contexts}) \\
& \emptyset \\
& \Gamma, x = t
\end{aligned} Γ : : = ∅ Γ , x = t ( contexts )
x = t ∈ Γ Γ ⊢ x → t (D-Var) \dfrac{x = t \in \Gamma}{\Gamma \vdash x \rightarrow t} \tag{D-Var} Γ ⊢ x → t x = t ∈ Γ ( D-Var )
e b o d y → e b o d y ’ e b o d y e a r g → e b o d y ’ e a r g (D-App-Lam) \dfrac{e_{\operatorname{\mathrm{body}}} \rightarrow e_{\operatorname{\mathrm{body}}}’}{e_{\operatorname{\mathrm{body}}}\ e_{\operatorname{\mathrm{arg}}} \rightarrow e_{\operatorname{\mathrm{body}}}’\ e_{\operatorname{\mathrm{arg}}}} \tag{D-App-Lam} e body e arg → e body ’ e arg e body → e body ’ ( D-App-Lam )
Γ , x = e a r g ⊢ e b o d y → e b o d y ’ Γ ⊢ ( λ x . e b o d y ) e a r g → ( λ x . e b o d y ’ ) e a r g (D-App-Body) \dfrac{\Gamma, x = e_{\operatorname{\mathrm{arg}}} \vdash e_{\operatorname{\mathrm{body}}} \rightarrow e_{\operatorname{\mathrm{body}}}’}{\Gamma \vdash (\lambda x. e_{\operatorname{\mathrm{body}}})\ e_{\operatorname{\mathrm{arg}}} \rightarrow (\lambda x. e_{\operatorname{\mathrm{body}}}’)\ e_{\operatorname{\mathrm{arg}}}} \tag{D-App-Body} Γ ⊢ ( λ x . e body ) e arg → ( λ x . e body ’ ) e arg Γ , x = e arg ⊢ e body → e body ’ ( D-App-Body )
Γ ⊢ ( λ x . v b o d y ) e a r g → v b o d y (D-App-Done) \dfrac{}{\Gamma \vdash (\lambda x. v_{\operatorname{\mathrm{body}}})\ e_{\operatorname{\mathrm{arg}}} \rightarrow v_{\operatorname{\mathrm{body}}}} \tag{D-App-Done} Γ ⊢ ( λ x . v body ) e arg → v body ( D-App-Done )
这里的关键在于 D-App-Body
,它在运行时会将参数加入到 runtime contexts 中,然后对 body 进行 reduce;完成后使用 D-App-Done
去掉参数。
lazy strategy
和 call by name 几乎完全一样,但是是 lazy 的。
t 1 → t 1 ’ t 1 t 2 → t 1 ’ t 2 (E-App1)
\dfrac{t_1 \rightarrow t_1’}{t_1\ t_2 \rightarrow t_1’\ t_2} \tag{E-App1}
t 1 t 2 → t 1 ’ t 2 t 1 → t 1 ’ ( E-App1 )
( λ x . t 12 ) t 2 → [ x ↦ t 2 ] t 12 (E-AppAbs)
(\lambda x. t_{12})\ t_2 \rightarrow [x \mapsto t_2] t_{12} \tag{E-AppAbs}
( λ x . t 12 ) t 2 → [ x ↦ t 2 ] t 12 ( E-AppAbs )
Big-step style relations λ x . t ⇓ λ x . t (B-Value)
\lambda x. t \Downarrow \lambda x.t \tag{B-Value}
λ x . t ⇓ λ x . t ( B-Value )
t 1 ⇓ λ x . t 12 t 2 ⇓ v 2 [ x ↦ v 2 ] t 12 ⇓ v t 1 t 2 ⇓ v (B-AppAbs)
\dfrac{
t_1 \Downarrow \lambda x. t_{12} \qquad
t_2 \Downarrow v_2 \qquad
[x \mapsto v_2] t_{12} \Downarrow v
} {
t_1\ t_2 \Downarrow v
} \tag{B-AppAbs}
t 1 t 2 ⇓ v t 1 ⇓ λ x . t 12 t 2 ⇓ v 2 [ x ↦ v 2 ] t 12 ⇓ v ( B-AppAbs )