Dynamic change of clock speed
This guy did some interesting work on the subject; you may want to search through his blog for tips and tricks:
http://news.jeelabs.org/2009/07/05/ligthy-power-save/
The general idea is that you can change the clock prescaler value at runtime: boost the CPU when you need it, and switch to a lower frequency when idle. He got pretty good results in terms of power consumption. This approach may even make recompiling the bootloader unnecessary in your case: you can still run the chip at 8 or 16 MHz and, as soon as the bootloader hands control to the main program, switch to a lower frequency.
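To make the prescaler switch concrete, here is a minimal sketch, assuming an ATmega-class part and avr-libc, whose `<avr/power.h>` provides `clock_prescale_set()` and the `clock_div_*` constants. The helper names `cpu_slow`, `cpu_fast` and `effective_hz` are made up for illustration and are not taken from the linked blog, and the `#else` branch is only a host-side stand-in so the snippet compiles off-target.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/power.h>  /* avr-libc: clock_prescale_set() and clock_div_* */
#else
/* Host-side stand-in (assumption) so the sketch also compiles off-target.
   It only records the requested divider; values mirror the CLKPS encoding. */
typedef enum {
    clock_div_1 = 0, clock_div_2 = 1, clock_div_4 = 2, clock_div_8 = 3,
    clock_div_16 = 4, clock_div_32 = 5, clock_div_64 = 6,
    clock_div_128 = 7, clock_div_256 = 8
} clock_div_t;
static clock_div_t requested_div = clock_div_1;
static void clock_prescale_set(clock_div_t d) { requested_div = d; }
#endif

/* Hypothetical helper: drop to 1/8 of the oscillator frequency while idle. */
void cpu_slow(void) { clock_prescale_set(clock_div_8); }

/* Hypothetical helper: return to full speed when there is work to do. */
void cpu_fast(void) { clock_prescale_set(clock_div_1); }

/* Effective core clock: the divider is 2^div, so shift right by div.
   Assumes the clock_div_* values equal log2 of the divider, as above. */
uint32_t effective_hz(uint32_t base_hz, clock_div_t div)
{
    return base_hz >> (uint8_t)div;
}
```

With a 16 MHz crystal, calling `cpu_slow()` first thing in the main program would leave the core at 2 MHz without touching the bootloader. Keep in mind that anything derived from the CPU clock (delay loops, timers, UART baud rate) scales with the prescaler, so those need adjusting or a temporary `cpu_fast()` around them.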