If you want to write a lisp from scratch, Crafting Interpreters
is an excellent resource. It explains all the steps required to build the Lox language, first by building a
tree-walk interpreter, then by building a compiler and bytecode virtual machine (VM). It even covers closures
(and object-oriented programming), which is not trivial for an inexperienced language implementer to write. Lox
is not a lisp though, so modifications to the parser need to be made in order to support parenthesized prefix
notation and the book omits macros entirely.
I wrote a lisp called duck-lisp that features unhygienic
lexically scoped macros. All this means is that macros return literal code and are represented internally as
closures. This is not to be confused with Scheme's macros that provide syntactic closures.
Here's a sample macro from an assembler. Quasiquote looks weird because it's not a reader macro, but it acts in
the same way as Common Lisp's quasiquote.
(defmacro declare-label (name)
(` (var (, name) (make-label))))
Duck-lisp has a strict split between the compiler and the VM. The compiler has access to its own VM at
compile-time and can pass data to the runtime VM, but only the compile-time VM has access to the
compiler. Because of this there are two variants of the language: the runtime language, and the compile-time
language. Code is part of the runtime language by default, but compile-time execution can be done by wrapping
the code in the comptime
form. Lexically scoped variables defined in that form persist until the
end of compilation.
(var x 5)
(comptime (var x 6))
(println x) ; ⇒ 5
(comptime (println x)) ; ⇒ 6
All macros are is functions defined in the compile-time language but also declared in the runtime
language. Since functions are lexically scoped, macros can capture compile-time variables, and more importantly,
execute compile-time functions.
(comptime
;; Don't worry about the body. It is irrelevant.
(defun list* (&rest args)
(if (null? (rest args))
(first args)
(cons (first args) (apply self (rest args)))))
;; Return nil since comptime is not permitted to return closures
())
(defmacro to (variable form)
;; Capture and call the `list*' function
(list (quote setq) variable (list* (car form) variable (cdr form))))
(var x 4)
(to x (+ 1))
(println x) ; ⇒ 5
The definition of to
could be written in a lower-level pseudocode like this instead:
(comptime
(defun to (variable form)
(list (quote setq) variable (list* (car form) variable (cdr form))))
())
(declare-macro to (variable form))
Conceptually, defmacro
can decompose into comptime
, defun
,
and declare-macro
which is a pseudo-keyword that allows code in the runtime language to call
it. declare-macro
is trivial to implement. If you have already implemented defun
, then
you already have most of the code to implement declare-macro
. The real difficulty is
implementing comptime
. comptime
returns its last body form as an object which is then
converted to its AST form. This is the easy part as it is much like quotation, and once you have
written quote
, most of the work is done.
I am a little fuzzy on proper compiler terminology, so it's likely that I'm abusing the term “environment” in
the paragraph below. From what I have read, it sounds like an environment is the runtime state of the
language. So the upvalues that belong to the currently executing closure would be part of the
environment. Global variables would also be part of the environment. In the following paragraph I use the term
to refer to the scopes and variables that are known at compile-time. I am not referring to the VM's state at
compile-time or runtime. I am referring to the compiler's data structures. The “compile-time environment” refers
to the compile-time data structures that keep track of the variables that exist in the compile-time VM. The
“runtime environment” refers to the compile-time data structures that keep track of the variables that will
exist in the runtime VM.
The next part is annoying to add if the compiler is not designed with macros in mind, but if not it is still
doable to add. The compiler and VM must have the ability to compile code in the compile-time language
environment (not the runtime language environment), execute it, then yield back to the compiler to continue
compilation. The state of the stack and heap must persist between runs. In duck-lisp I added a special "yield"
instruction that acts like "return" but doesn't pop anything off the stack. Adding a compile-time environment
was a little harder. Originally I implemented macros by creating a new instance of the compiler each time a
macro was defined and used. This ruined lexical scope because the compile-time environment could not be retained
across macro definitions and expansions. How it works now is that there is an extra environment dedicated to the
compile-time language. The compiler starts by compiling code in the runtime environment, but when it encounters
a comptime
or defmacro
form it switches to the compile-time environment. Once the form
is fully compiled, the current environment is restored to the last used environment. Be sure to account for
nested comptime
forms.
Once you have comptime
and defmacro
, macro calls aren't hard. In duck-lisp, (to
x (+ 1))
is compiled as if it were (comptime (to (quote x) (quote (+
1))))
. comptime
isn't actually used to convert the returned object to AST, but the real
operation is very similar.