Phil Hassey - game dev blog
Phil Hassey as Snidely Whiplash
"You can't buy awesomeness.
You're born that way."

64k tinypy – parsing woes

I’ve been out of town for the last few days, so I haven’t done much to tinypy. However, I have realized that my current “dumbparse” module is pretty dumb. Two things are wrong with it:

  • It’s slow. Noticeably so with 8k modules, etc.
  • It’s dumb. It doesn’t give “useful” parsing errors.

I’ve read up a bit on LR, LL, top-down, and a whole bunch of kinds of parsers and it all sounds a bit magical to me. I’d be glad for some suggestions or ideas on where I could go from here. My requirements are:

  • must be pretty compact (I want my parser to be < 12k in size)
  • must not get exponentially slower on larger files (but it doesn’t have to be blindingly fast, just decent)
  • must be able to notice where a syntax error is (indicated by token – I’ve got a good tokenizer already)
  • must parse “basic” python and be written in that as well (see my current files for examples)

Thanks! svn://www.imitationpickles.org/tinypy/trunk – for the brave

8 Responses to “64k tinypy – parsing woes”

  1. Allefant Says:

    I have no idea about LR/LL/whatever either, but the way I did a python-like parser some time ago sort of works like this: Simply look at the first token, then depending on what it is, call a function to handle it. E.g. I see an “if”, then I call parse_if(). That now looks again at the first token, and so on. As a special case, I sometimes look ahead. For example:

    if 1 + 2 > 3:

    I get to the “1” like: parse_program() -> parse_block() -> parse_statement() -> parse_if() -> parse_expression(). Here, I look ahead to the > before I know that I can already handle the “1 + 2”. Anway, no idea if this is an algorithm with a name, to me it’s just basically doing it all “manually”.

    My code can be seen here, but I would advice against looking at it, as it’s not commented and written in my own pyplus derived language 😛 http://lland.svn.sourceforge.net/viewvc/lland/landmagic/src/
    Anyway, the relevant files would be pyrser.py and expression.py.

  2. Ben Says:

    Have a look at this (it’s a PDF of about 0.5 MB):

    _Compiler Construction_

    http://www.google.com/search?client=safari&rls=en-us&q=compiler+construction,+niklaus+wirth&ie=UTF-8&oe=UTF-8

    Niklaus Wirth is a master of compact, fast and simple recursive decent parsers. His book also has the great virtue of brevity.

    Sources available too:

    http://www.cs.inf.ethz.ch/~wirth/books/CompilerConstruction/

  3. philhassey Says:

    Ben – sounds interesting, I’ll take a look at it 🙂

  4. Kumar McMillan Says:

    I know it looks like a scary edu tool but I saw a presentation on the aperiot module once and it looked surprisingly easy to use. IIRC it includes a grammar for Python. As far as speed and error reporting, I saw demos and benchmarks that made it look very attractive. It may not be as compact as you like, I don’t know about this.

    http://moncs.cs.mcgill.ca/people/eposse/projects/aperiot/atglance.dtml

  5. philhassey Says:

    http://javascript.crockford.com/tdop/index.html

    was also a helpful resource.

  6. rh Says:

    Sounds really cool, but is there any chance of seeing a non-svn download? Even a snapshot or something, for those of us stuck behind firewalls where the svn protocol is unavailable.

  7. philhassey Says:

    rh – I’ll try and do that next time 🙂

  8. howard Says:

    Check out PyParsing
    http://pyparsing.wikispaces.com/