tinypy 64k – bootstrapped!
So... hey, it’s done. Basically. tinypy is a 64k implementation of a subset of Python. It can bootstrap itself into a single executable that can compile Python files to bytecode and run them on a VM. Thanks to everyone who has given feedback on this project so far. Double thanks to allefant, who listened to me blab about it endlessly on IRC for the last month 🙂
I found all the stuff people told me about parsing to be a huge help. This article http://javascript.crockford.com/tdop/tdop.html is what I ended up basing it on. It’s almost like magic, but it makes for a really simple, easy-to-follow parser. The VM is based on stuff I read about the Lua VM.
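For readers new to the technique in Crockford’s article, here is a minimal sketch of top-down operator precedence (Pratt) parsing in Python. It handles only integer literals, parentheses, unary minus and the binary operators + - * / **. The token format, the binding-power numbers and the names tokenize, Parser and parse are illustrative assumptions, not tinypy’s actual parser. Each token has a "nud" (how it starts an expression) and, if it is an infix operator, a "led" plus a left binding power; the loop in expression() keeps absorbing operators for as long as they bind tighter than the caller’s right binding power.

```python
import re

# Left binding powers for infix operators: higher binds tighter.
# The numbers are arbitrary; only their relative order matters.
BINARY_LBP = {"+": 10, "-": 10, "*": 20, "/": 20, "**": 30}
RIGHT_ASSOC = {"**"}
UNARY_MINUS_BP = 25  # tighter than * but looser than **, as in Python

TOKEN_RE = re.compile(r"\s*(?:(\d+)|(\*\*|[-+*/()]))")


def tokenize(src):
    """Return ('num', int) / ('op', str) tokens followed by ('end', None)."""
    tokens, pos, src = [], 0, src.rstrip()
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if not m:
            raise SyntaxError("bad character at offset %d" % pos)
        pos = m.end()
        if m.group(1) is not None:
            tokens.append(("num", int(m.group(1))))
        else:
            tokens.append(("op", m.group(2)))
    tokens.append(("end", None))
    return tokens


class Parser:
    def __init__(self, src):
        self.tokens = tokenize(src)
        self.pos = 0
        self.kind, self.value = self.tokens[0]

    def advance(self):
        # Never move past the final 'end' token.
        if self.pos < len(self.tokens) - 1:
            self.pos += 1
        self.kind, self.value = self.tokens[self.pos]

    def lbp(self):
        """Left binding power of the current token; 0 stops expression()."""
        if self.kind == "op":
            return BINARY_LBP.get(self.value, 0)
        return 0

    def nud(self, kind, value):
        """Null denotation: how a token begins an expression."""
        if kind == "num":
            return value
        if value == "-":
            return ("neg", self.expression(UNARY_MINUS_BP))
        if value == "(":
            inner = self.expression(0)
            if self.value != ")":
                raise SyntaxError("expected ')'")
            self.advance()
            return inner
        raise SyntaxError("unexpected token %r" % (value,))

    def led(self, op, left):
        """Left denotation: how an infix operator continues an expression."""
        # Lowering the right binding power by one makes ** right-associative.
        rbp = BINARY_LBP[op] - (1 if op in RIGHT_ASSOC else 0)
        return (op, left, self.expression(rbp))

    def expression(self, rbp=0):
        kind, value = self.kind, self.value
        self.advance()
        left = self.nud(kind, value)
        while rbp < self.lbp():
            op = self.value
            self.advance()
            left = self.led(op, left)
        return left


def parse(src):
    p = Parser(src)
    tree = p.expression(0)
    if p.kind != "end":
        raise SyntaxError("trailing input")
    return tree
```

For example, parse("1 + 2 * 3") returns ("+", 1, ("*", 2, 3)): precedence falls out of the binding-power comparison alone, with no grammar rule per precedence level.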
So what’s next? I need to let it sit around for a week and then I’ll do a “release” I guess. I’ve gotta pick a license or something for it (probably MIT? I’m open to suggestions.) I’m also mulling over possible names. Maybe “tinypy” .. or “wedge” .. or “cupcake” .. or “garter”. Hmmn.
Anyway – I’m sure I’ll be tweaking it a bit over time, but I’m pretty happy to have it to this point now. I probably won’t do much with it until I try making a game with it. Right now it depends on libgc for garbage collection. If someone clever out there can implement a garbage collector for it that works in like 2-4k, that’d be better. My brain is pretty spent.
For the brave: svn://www.imitationpickles.org/tinypy/trunk or tinypy.zip. The following is only tested under Linux, but I bet it would work in any bash environment. Maybe.
$ python boot.py
Will run the 3-phase testing + bootstrapping process. The first phase uses Python to generate the .tpc files for the compiler. The second phase uses the VM to generate those same files. The third phase uses the bootstrapped tinypy executable to “re-bootstrap” tinypy to get the final version. The -nopos option strips out debug info from the .tpc files.
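A common way to confirm that a self-hosting bootstrap has converged is to check that two successive stages produce byte-identical output. The sketch below compares the .tpc files in two directories; the directory layout and the function name are hypothetical, since the post does not say where each phase writes its files.

```python
import filecmp
import os


def compare_stage_outputs(dir_a, dir_b, suffix=".tpc"):
    """Return (missing, different) file-name lists for two stage directories.

    dir_a and dir_b are hypothetical paths holding the bytecode produced by
    two bootstrap phases; a converged bootstrap yields two empty lists.
    """
    names_a = {n for n in os.listdir(dir_a) if n.endswith(suffix)}
    names_b = {n for n in os.listdir(dir_b) if n.endswith(suffix)}
    missing = sorted(names_a ^ names_b)  # present in only one directory
    different = sorted(
        n
        for n in names_a & names_b
        # shallow=False forces a byte-by-byte comparison of the contents
        if not filecmp.cmp(os.path.join(dir_a, n), os.path.join(dir_b, n),
                           shallow=False)
    )
    return missing, different
```

If the phase-two and phase-three outputs match, the compiler reproduces itself exactly, which is the usual evidence that the bootstrapped executable is behaving like the original.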
$ ./tinypy julia.py
Run the julia demo without dependence on *anything* but the tinypy executable.
$ ./tinypy your_own_code.py
Will do something! Probably print out a pretty traceback about how you tried to use a python feature / module that tinypy doesn’t support 🙂 “batteries not included”
January 31st, 2008 at 2:01 am
petite – pytite
January 31st, 2008 at 3:35 am
Pequeño is Spanish for small, and it’s pronounced peh-KEHN-yo, so maybe Pyqueno or Pyquenio.
January 31st, 2008 at 4:03 am
A simple two-space copying collector should be easy to implement. For a 1 MB heap, allocate two memory spaces of that size. One is from-space, the other is to-space. Allocate new objects into from-space.
When it’s full, copy the roots into to-space contiguously. In each root’s old location in from-space, store a marker (e.g. all-zeros or all-ones) in the first word and the address you moved it to in the second word. Then walk to-space from beginning to end, looking at each word; each word points into from-space. If the pointed-to word is the marker, rewrite the word with the word just after it (i.e. the “where you moved them” part above). Otherwise, copy the object to the end of to-space, mark it the same way you did with the roots, and rewrite the word to point at the new copy. When you have scanned all of to-space, discard from-space and swap the two spaces.
See “Cheney’s algorithm” on Wikipedia.
January 31st, 2008 at 4:09 am
Looking at the code, the marker could be a TP_COPIED node. It shouldn’t be more than 200 lines of code.
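Here is a minimal Python model of the two-space copying collector described above (Cheney’s algorithm). It is a simulation for illustration, not C code for tinypy: objects are Python lists, a hypothetical Ref class stands in for pointers, and a hypothetical Forward marker plays the role of the suggested TP_COPIED node. The to-space list doubles as Cheney’s breadth-first queue, so no recursion or explicit stack is needed.

```python
class Ref:
    """A pointer to the object at index `addr` in the current space."""
    __slots__ = ("addr",)

    def __init__(self, addr):
        self.addr = addr

    def __eq__(self, other):
        return isinstance(other, Ref) and other.addr == self.addr

    def __repr__(self):
        return "Ref(%d)" % self.addr


class Forward:
    """Marker left in from-space once an object has been copied."""
    __slots__ = ("new_addr",)

    def __init__(self, new_addr):
        self.new_addr = new_addr


def collect(from_space, roots):
    """Copy everything reachable from `roots` into a fresh to-space.

    from_space: list of objects, each a list of fields (Ref or plain value).
    roots: list of Ref into from_space. Returns (to_space, new_roots).
    from_space is overwritten with Forward markers and should be discarded.
    """
    to_space = []

    def forward(ref):
        obj = from_space[ref.addr]
        if isinstance(obj, Forward):      # already copied: reuse new address
            return Ref(obj.new_addr)
        new_addr = len(to_space)          # bump-allocate at end of to-space
        to_space.append(list(obj))        # shallow copy; fields fixed up later
        from_space[ref.addr] = Forward(new_addr)
        return Ref(new_addr)

    new_roots = [forward(r) for r in roots]

    scan = 0                              # Cheney's scan pointer
    while scan < len(to_space):           # to_space grows as objects are copied
        obj = to_space[scan]
        for i, field in enumerate(obj):
            if isinstance(field, Ref):
                obj[i] = forward(field)
        scan += 1
    return to_space, new_roots
```

Cycles are handled for free: the second visit to an object finds its Forward marker and just reuses the new address, and unreachable objects are never copied at all.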
January 31st, 2008 at 4:52 am
You are following in the footsteps of pyvm. That’s good, because pyvm doesn’t seem to have had any releases since 2006. I’m wondering what your plans are with respect to Python 3000.
I’ve just downloaded tinypy.zip and am now going to try it…
January 31st, 2008 at 6:25 am
worm – that is my suggestion. good work!!!!
January 31st, 2008 at 7:47 am
greetings!
could you provide us with a list of (tested) working build dependencies? a default ubuntu feisty (with build-essential) gives me a bunch of cast warnings after running `python boot.py`:
warning: initialization makes pointer from integer without a cast
ending up with an AssertionError.
this looks intriguing at the least, though!
January 31st, 2008 at 8:33 am
It doesn’t work for me: you apparently have hex values in the Python source, which tinypy can’t handle. When I change them to decimal it compiles, but it then tries to open /dev/fb0, which doesn’t exist in my Ubuntu installation.
Impressive job otherwise. I am definitely going to study it.
January 31st, 2008 at 8:42 am
The framebuffer thing was (obviously, in hindsight) because I was running it on a headless server. The hex thing still stands AFAIK, though.
January 31st, 2008 at 9:10 am
Installed libgc-dev in Debian. And it worked just fine 🙂 This was a good idea.
January 31st, 2008 at 9:45 am
I’ve just tried to compile it on Ubuntu without help. I installed libgc-dev and have enough of a toolchain installed to compile GTK+ stuff and daemons like Apache and others, so please provide a list of dependencies for the build process and more people will happily take a look.
January 31st, 2008 at 10:50 am
Deps are Python (for bootstrapping), libgc, libsdl, and gcc. I’ve only tried it on my one Linux box. I guess I should put up a bug tracker or something sometime, but for now, if you want, please post the error message in the comments.
hylje – can you post the error here?
Poromenos – what hex values? Can you post the error here?
Paolo – thanks for the tips, I’ll read up on that some more.
Nymius – well, the plans are to speed up the dict implementation, make bug fixes, and maybe add my own GC. Other than that, it’s done. With a 64k limit, I really can’t add anything.
January 31st, 2008 at 11:32 am
with my Python (2.5.1c1) the tests complained. This is a hack-patch that fixes it, but I still get a mysterious exception later:
$ svn diff dump2vm.py
Index: dump2vm.py
===================================================================
--- dump2vm.py    (revision 370)
+++ dump2vm.py    (working copy)
@@ -59,7 +59,10 @@
 def do_number(t,r=None):
     r = get_tmp(r)
     code("NUMBER",r,0,0)
-    write(fpack(float(t['value'])))
+    val = t['value']
+    if type(val) is str and val.startswith('0x'):
+        val = int(val[2:], 16)
+    write(fpack(float(val)))
     return r

 def get_tag():
@@ -555,13 +558,15 @@
     'string':do_string,'get':do_get, 'call':do_call, 'reg':do_reg,
 }
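The patch above works around the fact that float() on Python 2.5 rejects hex strings, while Python 2.4’s float() accepted them on the author’s machine. A slightly more general sketch of the same idea, assuming the tokenizer hands literals over as strings: use int(..., 16) when the literal carries a 0x or 0X prefix and float() otherwise. The helper name is hypothetical and not part of dump2vm.py.

```python
def parse_number_literal(text):
    """Convert a numeric literal to a float, accepting 0x/0X hex prefixes.

    Hypothetical helper in the spirit of the dump2vm.py patch: float() alone
    rejects hex strings on Python 2.5 and later.
    """
    if not isinstance(text, str):
        return float(text)                # already numeric
    lowered = text.lower()
    if lowered.startswith("0x"):
        return float(int(lowered[2:], 16))
    return float(text)
```

Lowercasing first means uppercase prefixes such as 0XFF are handled too, which the original startswith('0x') check would miss.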
January 31st, 2008 at 11:38 am
worm, pytite and garter are all good names…
My preference for license is usually BSD (revised or new). It allows people to use/modify/redistribute/relicense, so long as they keep the original license in place with their distribution. (MIT removes this restriction I think and is slightly ‘freer’.)
January 31st, 2008 at 11:40 am
John – ah, thanks. I’m using Python 2.4, where the float() builtin accepts hex strings. Thanks for the patch, I’ll work something like that in for my next release and test it against Python 2.5.
January 31st, 2008 at 12:06 pm
John – I’ve patched svn so that it works with python 2.5 now. Please tell me if the hex issue persists for you. Thanks for tracking that down!
January 31st, 2008 at 12:21 pm
Nice work. For the license, I suggest the Python license. It will keep things simple, and everyone who works with Python will know what to expect from the license.
January 31st, 2008 at 12:53 pm
I’ve updated the .zip to include the python 2.5 patch.
January 31st, 2008 at 1:01 pm
My suggestion:
pygmie or pygmy
interesting project :)
All the best
January 31st, 2008 at 2:09 pm
Very nice work. At a previous job, we ended up choosing Lua over Python for an embedded device strictly because of size issues. This would have made things interesting.
Quick note, though: tests.py is failing for me on an x86_64 machine running Python 2.5.1 on Ubuntu 7.10:
./tinypy tests.py
File "?", line 0, in ?
File "tests.py", line 378, in ?
,"OK")
File "tests.py", line 169, in t_render
if exact: assert(res == ex)
Exception:
assert failed
January 31st, 2008 at 2:11 pm
Awesome!
January 31st, 2008 at 2:17 pm
Tim – the test sent its output to tmp.txt – can you paste that to me? That will show where the error came from.
January 31st, 2008 at 2:40 pm
Very cool, man. As a bytecode language snob, I’m really impressed by this kind of project. A 64k runtime could make this suitable for a LOAD of cool embedded apps. Hell, you could port it to BREW and write cellphone apps in Python. I would also like to add my vote for “cupcake” as a name.
January 31st, 2008 at 4:14 pm
@rahul
The Python license is specifically *not* suitable for use with other projects. Largely because Python has a rich and varied history and the license has gained all sorts of cruft as a result.
Both the BSD and MIT licenses are simple, widely recognised and widely understood.
January 31st, 2008 at 4:18 pm
Earthworm should be a nice name.
Or “Jim” 😉
January 31st, 2008 at 6:07 pm
tmp.txt:
File "tmp1.tpc", line 5, in ?
C("OK").print()
File "tmp1.tpc", line 4, in C_print
def print(self): print(self.data)
Exception:
tp_get: KeyError: data
January 31st, 2008 at 9:33 pm
Tim – this appears to have something to do with the -O3 option I’m passing to gcc in the final bootstrapping phase. Not quite sure what the deal is, but I’ll poke around a bit and see what I can figure out. (Anyone got any tips on debugging stuff like that? It works fine when I don’t use any -O options.)
January 31st, 2008 at 10:35 pm
-O3 isn’t safe… and unfortunately some silly people use it for Python compilation, which stuffs up Python extensions and causes weird Python bugs.
Best to use -O2 or lower. I think you can debug with -O2 these days too?
pygame has some distutils hacks to change gcc options if you want to look there.
January 31st, 2008 at 10:40 pm
Okay – svn has been updated to fix the odd-ball issue. I’ve got this _vm_raise() function that uses longjmp, and the optimizer, not seeing a return after that call, was doing something screwy. Anyway, I adjusted my vm_raise macro to include a return right after the call to _vm_raise(), so now the optimizer knows not to do... whatever it was doing. I also switched back to -O2 and enabled -Wall, both of which helped.
January 31st, 2008 at 11:17 pm
Pretty cool!
I suggest the Boost License http://boost.org/more/license_info.html
It’s like MIT and BSD, but I find that it’s easier to apply to new projects.
I particularly like the short form of the license:
// Copyright Joe Coder 2004 – 2006.
// Distributed under the Boost Software License, Version 1.0.
// (See accompanying file LICENSE_1_0.txt or copy at
// http://www.boost.org/LICENSE_1_0.txt)
January 31st, 2008 at 11:18 pm
That SVN update didn’t fix the problem for me. (Linux 64bit too)
gcc -Wall -O2 tinypy.c `sdl-config --cflags --libs` -lm -lgc -o tinypy
In file included from tinypy.c:2:
vm.c: In function ‘vm_signal’:
vm.c:329: warning: implicit declaration of function ‘strsignal’
./tinypy tests.py
File "?", line 0, in ?
File "tests.py", line 385, in ?
,"OK")
File "tests.py", line 178, in t_render
if exact: assert(res == ex)
Exception:
assert failed
[curtis@XXXXX trunk]$ cat tmp.txt
Exception:
tp_get: KeyError: data
January 31st, 2008 at 11:28 pm
Curtis – if you edit line 82 of boot.py and remove the -O2, does it work?
January 31st, 2008 at 11:42 pm
Removing -O2 succeeds.
Replacing -O2 with -O1 fails.
Replacing -O2 with -O0 succeeds.
February 1st, 2008 at 12:02 am
Hmn. I’m mystified. If you can track down the issue, I’d sure appreciate it. For what it’s worth, when the problem was happening to me, tinypy would emit different bytecode depending on whether it was compiled with -O2 or not. But it isn’t happening here anymore.
February 1st, 2008 at 12:03 am
expanding the -O1 option (specifying the optimizations individually) succeeds
gcc -Wall -fdefer-pop -fguess-branch-probability -fcprop-registers -floop-optimize -fif-conversion -fif-conversion2 -ftree-ccp -ftree-dce -ftree-dominator-opts -ftree-dse -ftree-ter -ftree-lrs -ftree-sra -ftree-copyrename -ftree-fre -ftree-ch -fmerge-constants tinypy.c `sdl-config --cflags --libs` -lm -lgc -o tinypy
Maybe it’s a compiler bug?
gcc version 4.0.0 20050519 (Red Hat 4.0.0-8)
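One caveat on the expanded command: GCC’s manual notes that most optimizations are only enabled when an -O level is set, even if individual -f flags are given, so a command with no -O option may effectively have been an unoptimized build. A more reliable way to hunt for the culprit is to keep -O1 and bisect over -fno-<flag> switches. The sketch below is generic; is_good is a hypothetical callback (for example, one that rebuilds tinypy with -O1 plus the returned -fno- options and runs tests.py), and the search assumes exactly one flag is responsible.

```python
def find_culprit_flag(flags, is_good):
    """Bisect `flags` for the single optimization whose removal fixes the build.

    is_good(disabled) is a hypothetical callback: it should build with -O1
    plus -fno-<flag> for every flag in `disabled` and return True if the
    tests pass. Assumes exactly one flag is responsible.
    """
    candidates = list(flags)
    if is_good([]):
        raise ValueError("build already passes with nothing disabled")
    if not is_good(candidates):
        raise ValueError("disabling every candidate does not fix the build")
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if is_good(half):
            candidates = half                      # culprit is in the first half
        else:
            candidates = candidates[len(half):]    # otherwise in the second half
    return candidates[0]


def no_flag_options(disabled):
    """Turn flag names like 'tree-dce' into '-fno-tree-dce' options."""
    return ["-fno-" + name for name in disabled]
```

With the 17 flags listed above, this needs about five rebuilds instead of trying each flag in turn.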
February 1st, 2008 at 12:08 am
Maybe .. I’ve got:
gcc (GCC) 4.1.1 20060724 (prerelease) (4.1.1-3mdk)
and I had the same issue for a while before I did some cleanup on the code.
February 1st, 2008 at 3:24 am
Curtis – can you give it another svn update and try again? I did a bit more tweaking on stuff which looked suspicious. Thanks!
February 1st, 2008 at 8:20 am
[…] Linkage 2007.02.012008-02-01 09:20:45 by mike in links (no comments) permalink Tinypy 64k […]
February 1st, 2008 at 9:25 am
latest svn trunk:
gcc -Wall -O2 tinypy.c `sdl-config --cflags --libs` -lm -lgc -o tinypy
In file included from tp.c:6,
from vm.c:1,
from tinypy.c:2:
builtins.c: In function ‘tp_round’:
builtins.c:180: warning: implicit declaration of function ‘roundf’
builtins.c:180: warning: incompatible implicit declaration of built-in function ‘roundf’
In file included from tinypy.c:2:
vm.c: In function ‘vm_signal’:
vm.c:326: warning: implicit declaration of function ‘strsignal’
./tinypy tests.py
File "?", line 0, in ?
File "tests.py", line 385, in ?
,"OK")
File "tests.py", line 178, in t_render
if exact: assert(res == ex)
Exception:
assert failed
$ cat tmp.txt
File "tmp1.tpc", line 5, in ?
C("OK").print()
File "tmp1.tpc", line 4, in C_print
def print(self): print(self.data)
Exception:
tp_get: KeyError: data
February 1st, 2008 at 9:45 am
Maybe you can use tinyscheme’s garbage collector (mark & sweep IIRC).
See http://tinyscheme.sourceforge.net/home.html
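For comparison with the copying collector, here is a minimal Python model of mark-and-sweep, the scheme the commenter recalls tinyscheme using (the "IIRC" applies; tinyscheme’s exact implementation is not checked here). Objects stay in place: a mark phase flags everything reachable from the roots, and a sweep phase returns unmarked slots to a free list. Obj, Heap and their methods are illustrative names, not tinyscheme’s or tinypy’s API.

```python
class Obj:
    """A heap cell; `refs` holds the slot numbers of cells it points to."""

    def __init__(self, refs=()):
        self.refs = list(refs)
        self.marked = False


class Heap:
    def __init__(self, size):
        self.cells = [None] * size                  # None means the slot is free
        self.free = list(range(size - 1, -1, -1))   # pop() hands out slot 0 first

    def alloc(self, refs=()):
        if not self.free:
            raise MemoryError("heap full; run collect() first")
        slot = self.free.pop()
        self.cells[slot] = Obj(refs)
        return slot

    def collect(self, roots):
        """Mark everything reachable from `roots`, sweep the rest; return count freed."""
        stack = list(roots)                          # explicit stack avoids deep recursion
        while stack:
            slot = stack.pop()
            obj = self.cells[slot]
            if obj is None or obj.marked:
                continue
            obj.marked = True
            stack.extend(obj.refs)
        freed = 0
        for slot, obj in enumerate(self.cells):
            if obj is None:
                continue
            if obj.marked:
                obj.marked = False                   # reset for the next collection
            else:
                self.cells[slot] = None              # unreachable: reclaim the slot
                self.free.append(slot)
                freed += 1
        return freed
```

Unlike the copying collector, this never moves objects, so raw C pointers into the heap stay valid, at the cost of a free list and possible fragmentation.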
February 1st, 2008 at 4:39 pm
Does hello world work?
print 'hello world'
Might be a better demo for those of us running headless computers 🙂 ooh… or the ASCII SDL driver.
Unfortunately your code breaks with the SDL ASCII-art backend (at least on my system).
export SDL_VIDEODRIVER=aalib
320 240
[New Thread -1213109872 (LWP 8951)]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1213109872 (LWP 8951)]
0x08048ffd in real_set_pixel ()
(gdb) where
#0 0x08048ffd in real_set_pixel ()
#1 0x0804d625 in set_pixel ()
#2 0x08048f5c in _dcall ()
#3 0x0804c0ba in _tcall ()
#4 0x0804f5a7 in _vm_call ()
#5 0x08051ab5 in tp_step ()
#6 0x08053158 in vm_run_1 ()
#7 0x080531e9 in vm_call ()
#8 0x080538e2 in main ()
February 1st, 2008 at 4:42 pm
ah, your tinypy seems to be py3k compatible already 😉
Since print 'hello world' doesn’t work, but print ('hello world') does!
Sweet.
February 2nd, 2008 at 12:06 am
Rene – I totally designed this with the future in mind. Actually – since py3k strips out some of the “crufty” syntax of python, I figured I might as well go that way – makes things easier for me.
February 6th, 2008 at 10:14 am
Phil: Way to go! This looks very cool.
Don’t you want “-Os” instead of “-O2”?
Here’s a permissive licence that I created by compressing the MIT licence:
“Permission is hereby granted to any person obtaining a copy of this work to deal in this work without restriction (including the rights to use, modify, distribute, sublicense, and/or sell copies).”
https://zooko.com/simple_permissive_licence.html
March 27th, 2008 at 9:22 am
About simple permissive license, check , linked from .
November 11th, 2008 at 6:25 pm
Hi, great work!
Works fine here with FreeBSD 7.1 and Python 2.5.2
Vote for BSD Licence!
Question: What disadvantages does this Python implementation have?
January 18th, 2009 at 10:29 am
[…] on a small python derivative called > tinypy. Sorry, I meant to include a relevant link: http://www.philhassey.com/blog/2008/…-bootstrapped/ […]