Phil Hassey » Blog Archive » 64k tinypy

64k tinypy – parsing woes

I’ve been out of town for the last few days, so I haven’t done much to tinypy. However, I have realized that my current “dumbparse” module is pretty dumb. Two things are wrong with it:

It’s slow. Noticeably so with 8k modules, etc.
It’s dumb. It doesn’t give “useful” parsing errors.

I’ve read up a bit on LR, LL, top-down, and a whole bunch of kinds of parsers and it all sounds a bit magical to me. I’d be glad for some suggestions or ideas on where I could go from here. My requirements are:

must be pretty compact (I want my parser to be < 12k in size)
must not get exponentially slower on larger files (but it doesn’t have to be blindingly fast, just decent)
must be able to notice where a syntax error is (indicated by token – I’ve got a good tokenizer already)
must parse “basic” python and be written in that as well (see my current files for examples)

Thanks! svn://www.imitationpickles.org/tinypy/trunk – for the brave

This entry was posted on Tuesday, January 22nd, 2008 at 10:03 am and is filed under crazy, development, python, tinypy. You can follow any responses to this entry through the RSS 2.0 feed. Both comments and pings are currently closed.

8 Responses to “64k tinypy – parsing woes”

Allefant Says:
January 22nd, 2008 at 11:33 am
I have no idea about LR/LL/whatever either, but the way I did a python-like parser some time ago sort of works like this: Simply look at the first token, then depending on what it is, call a function to handle it. E.g. I see an “if”, then I call parse_if(). That now looks again at the first token, and so on. As a special case, I sometimes look ahead. For example:

if 1 + 2 > 3:

I get to the “1” like: parse_program() -> parse_block() -> parse_statement() -> parse_if() -> parse_expression(). Here, I look ahead to the > before I know that I can already handle the “1 + 2”. Anway, no idea if this is an algorithm with a name, to me it’s just basically doing it all “manually”.

My code can be seen here, but I would advice against looking at it, as it’s not commented and written in my own pyplus derived language 😛 http://lland.svn.sourceforge.net/viewvc/lland/landmagic/src/
Anyway, the relevant files would be pyrser.py and expression.py.
Ben Says:
January 22nd, 2008 at 1:41 pm
Have a look at this (it’s a PDF of about 0.5 MB):

_Compiler Construction_

http://www.google.com/search?client=safari&rls=en-us&q=compiler+construction,+niklaus+wirth&ie=UTF-8&oe=UTF-8

Niklaus Wirth is a master of compact, fast and simple recursive decent parsers. His book also has the great virtue of brevity.

Sources available too:

http://www.cs.inf.ethz.ch/~wirth/books/CompilerConstruction/
philhassey Says:
January 22nd, 2008 at 1:52 pm
Ben – sounds interesting, I’ll take a look at it 🙂
Kumar McMillan Says:
January 22nd, 2008 at 4:17 pm
I know it looks like a scary edu tool but I saw a presentation on the aperiot module once and it looked surprisingly easy to use. IIRC it includes a grammar for Python. As far as speed and error reporting, I saw demos and benchmarks that made it look very attractive. It may not be as compact as you like, I don’t know about this.

http://moncs.cs.mcgill.ca/people/eposse/projects/aperiot/atglance.dtml
philhassey Says:
January 23rd, 2008 at 10:07 am
http://javascript.crockford.com/tdop/index.html

was also a helpful resource.
rh Says:
January 25th, 2008 at 8:42 pm
Sounds really cool, but is there any chance of seeing a non-svn download? Even a snapshot or something, for those of us stuck behind firewalls where the svn protocol is unavailable.
philhassey Says:
January 25th, 2008 at 9:29 pm
rh – I’ll try and do that next time 🙂
howard Says:
January 26th, 2008 at 4:41 am
Check out PyParsing
http://pyparsing.wikispaces.com/

Phil Hassey - game dev blog
	"Look, stores don't sell costumes like this to just anyone."