Mar 17, 2023 | gga

VMs — Parsing and Syntax

The Indu virtual machine has three phases: a parser that produces an abstract syntax tree (AST), a compiler that takes the AST and produces byte-code, and a virtual machine that executes that byte-code. The VM is currently implemented in Clojure and runs on the server. There are two options for getting it to run in the browser: re-implement the VM in JavaScript, or compile the byte-code to JavaScript. Each has its advantages; more on that in a later post.
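
To make the last two phases concrete, here is a minimal sketch. A compiler flattens an AST into stack byte-code, and a loop executes that byte-code. This is not Indu's actual compiler or VM. The instruction names (:push, :add, :mul) and the functions compile-expr and run are hypothetical. The AST is written by hand in the nested-vector shape the parser produces, so instaparse is not involved.

```clojure
;; Sketch of a compile-then-execute pipeline for a tiny expression language.
;; The instruction set (:push, :add, :mul) is hypothetical, not Indu's.

(def example-ast
  ;; 3 + 4 * 5, written by hand in the parser's nested-vector shape
  [:addition [:expr [:integer 3]]
             [:expr [:multiply [:expr [:integer 4]] [:expr [:integer 5]]]]])

(defn compile-expr
  "Compile a nested-vector AST into a flat vector of stack instructions.
  Operands are emitted before their operator (post-order)."
  [[tag & args :as node]]
  (case tag
    :expr     (compile-expr (first args))
    :integer  [[:push (first args)]]
    :addition (-> (compile-expr (first args))
                  (into (compile-expr (second args)))
                  (conj [:add]))
    :multiply (-> (compile-expr (first args))
                  (into (compile-expr (second args)))
                  (conj [:mul]))
    (throw (ex-info "Unknown construct" {:node node}))))

(defn- apply-binary
  "Pop the top two values, apply f (second-from-top first), push the result."
  [stack f]
  (let [b (peek stack)
        a (peek (pop stack))]
    (conj (pop (pop stack)) (f a b))))

(defn run
  "Execute stack byte-code and return the value left on top of the stack."
  [code]
  (peek
    (reduce (fn [stack [op arg]]
              (case op
                :push (conj stack arg)
                :add  (apply-binary stack +)
                :mul  (apply-binary stack *)
                (throw (ex-info "Unknown instruction" {:op op}))))
            []
            code)))

(compile-expr example-ast)
;; => [[:push 3] [:push 4] [:push 5] [:mul] [:add]]

(run (compile-expr example-ast))
;; => 23
```

A stack machine is used here only because it is the simplest byte-code to illustrate. The post does not say what kind of byte-code Indu's compiler emits.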

As mentioned previously, the parser is built with instaparse directly from an EBNF specification of the grammar. The abstract syntax tree that the parser produces is represented as a Clojure vector. The first element of the vector is always a keyword identifying the construct. The following elements are either terminals or nested vectors representing other constructs. For example, the simple expression 3 + 4 * 5 is represented as: [:addition [:expr [:integer 3]] [:expr [:multiply [:expr [:integer 4]] [:expr [:integer 5]]]]].

There are a couple of other possible representations. :addition and :multiply are both binary operations, so they could be represented as a single :binary-expression construct with a parameter identifying the function to be applied. I've never tried that, but I suspect it would make rewriting trees easier. Another possibility is that addition and multiplication need not be binary at all: they could take a list of parameters instead. In that case you'd probably represent every construct as a list with the function to apply at the head, followed by each parameter.
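
Here is a small sketch of the two alternatives, starting from the AST shape shown above. The function ->binary-expression rewrites :addition and :multiply into a single :binary-expression construct that carries its operator as data. The function ->prefix-list goes all the way to Lisp-style lists and merges nested applications of the same operator into one n-ary call. Both function names and the keyword-to-symbol table are hypothetical. For brevity, both functions also drop the :expr and :integer wrappers.

```clojure
;; Sketch only: hypothetical helpers showing two alternative AST shapes.

(def example-ast
  [:addition [:expr [:integer 3]]
             [:expr [:multiply [:expr [:integer 4]] [:expr [:integer 5]]]]])

;; Hypothetical table from construct keyword to operator symbol.
(def binary-ops {:addition '+, :multiply '*})

(defn operator-for [tag]
  (or (get binary-ops tag)
      (throw (ex-info "Unknown construct" {:tag tag}))))

(defn ->binary-expression
  "One :binary-expression construct for every binary operator, with the
  operator carried as data instead of encoded in the keyword."
  [[tag & args]]
  (case tag
    :expr    (->binary-expression (first args))
    :integer (first args)
    (let [[left right] args]
      [:binary-expression (operator-for tag)
       (->binary-expression left)
       (->binary-expression right)])))

(defn ->prefix-list
  "Lisp-style lists with the function at the head. Nested calls to the same
  operator are merged, so addition becomes n-ary rather than binary."
  [[tag & args]]
  (case tag
    :expr    (->prefix-list (first args))
    :integer (first args)
    (let [op       (operator-for tag)
          operands (map ->prefix-list args)]
      (apply list op
             (mapcat (fn [x]
                       (if (and (seq? x) (= op (first x)))
                         (rest x)  ; splice (+ a b) into the outer (+ ...)
                         [x]))
                     operands)))))

(->binary-expression example-ast)
;; => [:binary-expression + 3 [:binary-expression * 4 5]]

(->prefix-list example-ast)
;; => (+ 3 (* 4 5))

;; (1 + 2) + 3: the nested additions collapse into one n-ary call.
(->prefix-list [:addition [:expr [:addition [:expr [:integer 1]] [:expr [:integer 2]]]]
                          [:expr [:integer 3]]])
;; => (+ 1 2 3)
```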

Congratulations, you’ve just re-invented Lisp.

Re-inventing Lisp is interesting. After all, Indu and Pandita are written in a Lisp dialect. The argument for the Lisp approach is that as programs become more and more complex, it becomes more and more effective to have code that writes code. One approach is a sophisticated type system (though, of course, type systems do other things as well). Another is to make code easy to manipulate by having the syntax tree be the same as the code.
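
The "code that writes code" argument is easy to see in Clojure itself. A quoted form is an ordinary list, so a program can rewrite it with generic data functions and then evaluate the result. The constant-folding pass below is a hypothetical illustration of that general point. It is not something Indu or Pandita is said to do. It uses clojure.walk/postwalk from Clojure's standard library.

```clojure
(require '[clojure.walk :as walk])

;; Operators this toy pass knows how to fold (an assumption for the example).
(def foldable {'+ +, '* *})

(defn fold-constants
  "Replace every (op n1 n2 ...) whose arguments are all numbers with its
  value. postwalk visits children before parents, so folding propagates
  upwards through nested calls."
  [form]
  (walk/postwalk
    (fn [x]
      (if (and (seq? x)
               (contains? foldable (first x))
               (every? number? (rest x)))
        (apply (get foldable (first x)) (rest x))
        x))
    form))

;; The program is just data: a list containing symbols and numbers.
(def program '(+ x (* 4 5)))

(fold-constants program)
;; => (+ x 20)

;; The rewritten code can be wrapped in more code and evaluated.
(eval (list 'let '[x 3] (fold-constants program)))
;; => 23
```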

I buy this argument. However, it’s an argument that applies most strongly when you’re writing quite complicated programs. That is not the goal for Indu. Indu’s goal is to be a language that you can write programs in without being or becoming a professional programmer. One step on this path is to have syntax that reads like a human language.

If Indu's syntax will never resemble its syntax tree, then a flexible representation just creates a whole set of cases that should never occur but still have to be handled in the parser and compiler.
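
Here is a hypothetical illustration of those extra cases. Suppose addition were an n-ary list like (+ a b c) rather than a fixed binary :addition node. A compiler would then have to decide what (+) with no operands and (+ x) with one operand mean, even though Indu's grammar never produces either. The stack instructions (:push, :add, :mul) and the function names are assumptions made for the example. They are not Indu's compiler.

```clojure
;; Hypothetical compiler for Lisp-style n-ary forms, for illustration only.
(declare compile-form)

(defn compile-n-ary
  "Compile the operands of an n-ary call into stack instructions."
  [instruction operands identity-value]
  (case (count operands)
    ;; Cases a binary grammar never produces, but this shape must handle:
    0 [[:push identity-value]]          ; (+) => 0, (*) => 1
    1 (compile-form (first operands))   ; (+ x) => x
    ;; General case: push the first operand, then for each further operand
    ;; push it and combine the top two stack values.
    (reduce (fn [code operand]
              (-> code
                  (into (compile-form operand))
                  (conj [instruction])))
            (compile-form (first operands))
            (rest operands))))

(defn compile-form [form]
  (cond
    (number? form) [[:push form]]
    (seq? form)    (let [[head & operands] form]
                     (case head
                       + (compile-n-ary :add operands 0)
                       * (compile-n-ary :mul operands 1)
                       (throw (ex-info "Unknown operator" {:form form}))))
    :else          (throw (ex-info "Unknown form" {:form form}))))

(compile-form '(+))        ;; => [[:push 0]]
(compile-form '(+ 7))      ;; => [[:push 7]]
(compile-form '(+ 1 2 3))  ;; => [[:push 1] [:push 2] [:add] [:push 3] [:add]]
```

With a fixed binary :addition construct, only the two-operand path would exist. That is the simplification the paragraph above argues for.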

The other possibility was to use a syntax representation that distinguished between the construct and the form: :binary-expression instead of :addition, for example. At the moment there are no plans to allow Indu programs to write Indu programs. My suspicion is that a combination of objects that participate in lexical scoping and first-class functions will provide enough abstraction while keeping the language small. Unless those plans change, the internal representation will tend towards the simplest, clearest form.

Even so, I think this alternative is worth exploring in place of the Lisp approach.

This piece originally appeared on Giles Edwards-Alexander's personal blog, as VMs — Parsing and Syntax
