64k tinypy – parsing woes
I’ve been out of town for the last few days, so I haven’t done much to tinypy. However, I have realized that my current “dumbparse” module is pretty dumb. Two things are wrong with it:
- It’s slow. Noticeably so with 8k modules, etc.
- It’s dumb. It doesn’t give “useful” parsing errors.
I’ve read up a bit on LR, LL, top-down, and a whole bunch of kinds of parsers and it all sounds a bit magical to me. I’d be glad for some suggestions or ideas on where I could go from here. My requirements are:
- must be pretty compact (I want my parser to be < 12k in size)
- must not get exponentially slower on larger files (but it doesn’t have to be blindingly fast, just decent)
- must be able to notice where a syntax error is (indicated by token – I’ve got a good tokenizer already)
- must parse “basic” python and be written in that as well (see my current files for examples)
Thanks! svn://www.imitationpickles.org/tinypy/trunk – for the brave
January 22nd, 2008 at 11:33 am
I have no idea about LR/LL/whatever either, but the way I did a python-like parser some time ago sort of works like this: Simply look at the first token, then depending on what it is, call a function to handle it. E.g. I see an “if”, then I call parse_if(). That now looks again at the first token, and so on. As a special case, I sometimes look ahead. For example:
if 1 + 2 > 3:
I get to the “1” like: parse_program() -> parse_block() -> parse_statement() -> parse_if() -> parse_expression(). Here, I look ahead to the > before I know that I can already handle the “1 + 2”. Anway, no idea if this is an algorithm with a name, to me it’s just basically doing it all “manually”.
My code can be seen here, but I would advice against looking at it, as it’s not commented and written in my own pyplus derived language 😛 http://lland.svn.sourceforge.net/viewvc/lland/landmagic/src/
Anyway, the relevant files would be pyrser.py and expression.py.
January 22nd, 2008 at 1:41 pm
Have a look at this (it’s a PDF of about 0.5 MB):
_Compiler Construction_
http://www.google.com/search?client=safari&rls=en-us&q=compiler+construction,+niklaus+wirth&ie=UTF-8&oe=UTF-8
Niklaus Wirth is a master of compact, fast and simple recursive decent parsers. His book also has the great virtue of brevity.
Sources available too:
http://www.cs.inf.ethz.ch/~wirth/books/CompilerConstruction/
January 22nd, 2008 at 1:52 pm
Ben – sounds interesting, I’ll take a look at it 🙂
January 22nd, 2008 at 4:17 pm
I know it looks like a scary edu tool but I saw a presentation on the aperiot module once and it looked surprisingly easy to use. IIRC it includes a grammar for Python. As far as speed and error reporting, I saw demos and benchmarks that made it look very attractive. It may not be as compact as you like, I don’t know about this.
http://moncs.cs.mcgill.ca/people/eposse/projects/aperiot/atglance.dtml
January 23rd, 2008 at 10:07 am
http://javascript.crockford.com/tdop/index.html
was also a helpful resource.
January 25th, 2008 at 8:42 pm
Sounds really cool, but is there any chance of seeing a non-svn download? Even a snapshot or something, for those of us stuck behind firewalls where the svn protocol is unavailable.
January 25th, 2008 at 9:29 pm
rh – I’ll try and do that next time 🙂
January 26th, 2008 at 4:41 am
Check out PyParsing
http://pyparsing.wikispaces.com/