Wednesday, May 03, 2006

Profiling Pydev

The first thing I profiled was the grammar. Since it is used in many places, I believed that speeding it up would have a good impact overall.

One of the things we end up doing when testing is embedding some file -- say, an image -- into a .py file. Some of those files can reach 3-4 MB, and pydev would take a long time to parse them, so my first target was to speed that up.

Making 'general' changes and hoping things will get better is usually a mistake, as you need a way to measure their impact. So I started with a file containing a couple of statements and a single 'huge' multiline string, and made parsing that file faster my target.
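To make that kind of measurement concrete, here is a minimal sketch of such a benchmark. It is not the file or the parser I used: the helper names (`make_benchmark_source`, `time_parse`) are made up for illustration, and Python's standard `ast.parse` stands in for the pydev parser, so the numbers it prints say nothing about pydev itself.

```python
import ast
import time


def make_benchmark_source(payload_bytes: int) -> str:
    """Build Python source with a few statements and one huge multiline string.

    Hypothetical helper: the payload is filler text, standing in for an
    embedded file such as an image.
    """
    line = "x" * 79 + "\n"
    lines_needed = payload_bytes // len(line) + 1
    payload = line * lines_needed
    return (
        "import sys\n"
        "a = 1\n"
        'DATA = """\n' + payload + '"""\n'
        "print(len(DATA))\n"
    )


def time_parse(source: str) -> float:
    """Return the wall-clock seconds needed to parse the source once."""
    start = time.perf_counter()
    ast.parse(source)  # stand-in parser, NOT the pydev grammar
    return time.perf_counter() - start


if __name__ == "__main__":
    # If the time grows much faster than the file size, something in the
    # pipeline is not linear.
    for size_mb in (1, 2, 4):
        src = make_benchmark_source(size_mb * 1024 * 1024)
        print(f"{size_mb} MB: {time_parse(src):.3f} s")
```

Timing the same input at a few sizes is the useful part: it shows how the cost scales, not just how large it is.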

After playing around a little in the parser, I discovered that the actual speed loss was not in the grammar itself, but in the Reader that feeds the chars to the tokenizer. Looking at its code, you could see that it allocated lots of memory in the process, so I decided to write another reader from scratch, with the help of some unit-tests. The results were pretty impressive (for big files):

Parsing a huge .py file (3 MB) used to take about 4-5 minutes... now it takes only 2-3 seconds (yeah, the previous approach had an 'exponential' behaviour that depended only on the size of the file, not to mention that it made the garbage collector work a lot more).
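The old reader's code is not shown here, so the following is only an illustration of one common way a char buffer ends up with this kind of size-dependent slowdown, not a description of what that reader actually did. It assumes a buffer that is reallocated and fully copied whenever it fills up, and counts the chars copied under two hypothetical growth policies (`fixed_step` and `doubling` are names made up for this sketch).

```python
def chars_copied(total_chars: int, grow) -> int:
    """Count chars copied while appending total_chars one at a time to a
    buffer that is reallocated (and fully copied) whenever it is full."""
    capacity = 16
    size = 0
    copied = 0
    for _ in range(total_chars):
        if size == capacity:
            copied += size  # a reallocation copies the existing contents
            capacity = grow(capacity)
        size += 1
    return copied


def fixed_step(capacity: int) -> int:
    """Grow by a constant amount: total copying is quadratic in the size."""
    return capacity + 2048


def doubling(capacity: int) -> int:
    """Grow geometrically: total copying stays linear in the size."""
    return capacity * 2


if __name__ == "__main__":
    for n in (250_000, 500_000, 1_000_000):
        print(n, chars_copied(n, fixed_step), chars_copied(n, doubling))
```

With the fixed step, doubling the input roughly quadruples the copying, and every discarded buffer is more work for the garbage collector. With geometric growth the total copied stays below twice the input size.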

This will be available in 1.0.7 -- but before I release it, I'm still looking for other 'hotspots' to optimize ;-)

Fabio

5 comments:

Kevin Menard said...

Fabio,

Given all the work you put into the indentation engine, is it likely we'll see pretty print support of existing code any time soon? While not the most grandiose of features, being able to uniformly format a source file that several people have touched would be a very welcome addition.

Thanks,
Kevin

Fabio Zadrozny said...

Hi Kevin,

Actually, internally those features are not very alike.

The indentation engine just 'positions' things and works on the source-code to know things, whereas a pretty-printer would probably work on the AST (Abstract Syntax Tree).

Now, having said that, I have already done quite a bit of work on a pretty-printer that works at the AST level (there is already a code-format utility that works at the source-code level, but it is kind of limited: Ctrl+Shift+F).

So, yes, it should not take long before it is available, but it will be added only after the 1.1 release (I'm currently working on profiling and bug-fixing for that release).

-- Fabio

Kevin Menard said...

Sounds good. The recent PyDev releases have been pretty solid (especially with the recent debugger fixes), so I don't have much to complain about.

joram said...

Hi Fabio,

Going to the definition of a function (F3) also takes a lot of time in my projects.
Maybe some profiling in that area could speed it up.

Thanks,
Joram

Fabio Zadrozny said...

Hi Joram,

Actually, if you're using pydev and not pydev extensions, it uses BRM (Bicycle Repair Man), and BRM does have a number of deficiencies (and I just integrate it -- I don't really support it).

I do support the go to definition feature of pydev extensions (http://pydev.sf.net) -- which should be pretty fast and much less error prone.

Cheers,

Fabio