We recently had the good fortune to upgrade our build machine to something a little more powerful. Although the quoted CPU clock speed is actually slightly lower than that of our previous machine, the new one has two quad-core Core i7 processors with hyperthreading, whereas the old one had a single dual-core processor. The new one also supports Turbo Boost, which provides a limited, Intel-sanctioned overclocking mechanism.
I eventually got Debian GNU/Linux 5.0 (Lenny) installed, after some trouble that I'm putting down to out-of-date mirrors, and set about configuring the machine. After a bit of fiddling I discovered that the modules required for Turbo Boost (and SpeedStep) weren't loaded by default. I added acpi-cpufreq and cpufreq_ondemand to /etc/modules and, by inspecting the files in /sys/devices/system/cpu/cpu*/cpufreq/, confirmed that the frequencies did seem to change when the system was under load. Mission accomplished, I carried on configuring the machine and we started using it for real work.
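For reference, the change boils down to something like this (the final command just confirms that the ondemand governor really is scaling the frequency):

# Load the frequency-scaling modules now...
modprobe acpi-cpufreq
modprobe cpufreq_ondemand
# ...and ensure they are loaded on every boot
echo acpi-cpufreq >> /etc/modules
echo cpufreq_ondemand >> /etc/modules
# The reported frequency should now move around under load
grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq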
But, a few days later I started noticing processes taking far longer than they should have done to complete. They were always stuck consuming 100% CPU. C++ compilations that usually took at most a couple of seconds were taking over twelve minutes!
My dabblings with strace(1) and time(1) led me to believe that the programs were spending most of their time stuck in the kernel. Initially all the afflicted processes were running in a 32-bit chroot, so I suspected that the problem was related to that.
It was when my VNC session slowed to a complete crawl that I knew the problem wasn't related to the chroot and needed solving. My usual investigation techniques didn't yield anything useful, so I decided to try running a more modern kernel. Lenny uses a v2.6.26 kernel, which was released before the Core i7 (although of course it may have been patched since by Debian). The version in backports, v2.6.30, is much newer. Initial tests failed to reproduce the problem, but it took a couple of weeks of good behaviour before I believed that the problem was solved.
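For anyone wanting to try the same thing, the backports kernel is only a couple of commands away once backports.org is in sources.list (the exact package name below is from memory and may differ):

# Add the Lenny backports archive
echo 'deb http://www.backports.org/debian lenny-backports main' >> /etc/apt/sources.list
apt-get update
# Install the newer kernel from backports
apt-get -t lenny-backports install linux-image-2.6-amd64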
Wednesday, 2 September 2009
Friday, 28 August 2009
Keeping old branches building when the world has moved on
I recently upgraded the system used to build our software to Debian GNU/Linux x86-64. The main software is cross-compiled for an embedded target, but there are various host programs and unit tests that are compiled natively. Parts of the software are quite old and don't compile successfully for 64-bit. Luckily -m32 mostly solved this problem, so everything keeps on compiling. The cross compiler itself is compiled as 64-bit so it can take advantage of the extra registers, hopefully improving performance.
Unfortunately -m32 didn't solve everything. The software required many libraries, and not all of these were included in the various 32-bit library compatibility packages provided by Debian. The best solution was to modify the build system to build these libraries too. This makes us less reliant on the operating system's versions and let me use the same versions as we do with buildroot for the target, which reaped other benefits too. One day I'd like to move to a world where the only part of the host system that is used is the compiler that compiles our own host and cross compilers.
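For each such library the build system essentially just forces 32-bit output; a minimal sketch, with a made-up library version and staging prefix, looks like this:

# Build a bundled third-party library as 32-bit for the host tools
cd zlib-1.2.3
CC="gcc -m32" ./configure --prefix="${HOST_STAGING_DIR}"
make && make install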
Eventually the trunk is all fine and everything builds happily. But what about older branches? The changes were quite invasive, and applying them to supposedly stable branches didn't give me a warm fuzzy feeling. The logical solution was to build them in an entirely 32-bit environment. I considered using something like VirtualBox or QEMU, but using a completely separate host would have caused confusion for all the developers.
In the end I discovered schroot and followed the instructions. It was very quick and easy to set up. I added various bind mounts to /etc/fstab to ensure that /home was mounted, and it was then easy to set a 32-bit build going with just a single command. The instructions in the link are clear so I won't go over them again. I made use of linux32 to ensure that uname gave the expected result.
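For reference, the whole arrangement amounts to a schroot entry, a bind mount or two and one command; the names below are illustrative and the exact schroot.conf keys vary a little between schroot versions:

# /etc/schroot/schroot.conf
[lenny32]
description=32-bit Lenny build environment
type=directory
location=/srv/chroot/lenny32
users=build
groups=users

# /etc/fstab - bind mount /home into the chroot
/home  /srv/chroot/lenny32/home  none  bind  0  0

# Run a build inside the chroot with a 32-bit personality
schroot -c lenny32 -p -d `pwd` -- linux32 make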
But, I thought it could be even simpler.
Each branch knows which build environments it supports. This can easily be indicated via a file in the top-level directory. If the file doesn't exist then some sensible defaults can be used. Once this information exists it is easy to write a script which reads it, checks against the current native build environment, and either runs the build directly or inside the chroot. Something like this:
#!/bin/bash
# Find the top of the source tree by looking for Make.rules
top=.
while [ ! / -ef "${top}" -a ! -f "${top}/Make.rules" ]; do
    top=${top}/..
done
if [ / -ef "${top}" ]; then
    echo "Not in source tree."
    exit 1
fi

# Defaults, overridden by the branch's BuildInfo file if present
HOST_BUILD_ENVIRONMENTS=i686
if [ -f "${top}/BuildInfo" ]; then
    source "${top}/BuildInfo"
fi

this_env="`uname -m`"
use_chroot=
for e in ${HOST_BUILD_ENVIRONMENTS}; do
    if [ "${e}" = "${this_env}" ]; then
        # The branch supports the native environment: build directly
        use_chroot=
        break
    elif [ "${e}" = "i686" ]; then
        # chroot_i686 names the 32-bit schroot; it is set elsewhere
        use_chroot="${chroot_i686}"
    fi
done

if [ -n "${use_chroot}" ]; then
    echo "Compiling in ${use_chroot} chroot"
    schroot -p -q -c "${use_chroot}" -d "`pwd`" -- linux32 make "$@"
else
    make "$@"
fi
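A branch's BuildInfo file is then just a short shell fragment that the script sources, for example:

# BuildInfo for a branch that builds natively on both 64-bit and 32-bit hosts
HOST_BUILD_ENVIRONMENTS="x86_64 i686"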
The actual script is more complex than this because it contains better error handling. It also deals with make -C dir and deciding whether -jN can be passed too.
This can easily be extended to support building under different chroots for older operating system versions for example.
Thursday, 30 April 2009
FAT alternative
An interesting opinion has appeared in relation to TomTom's settlement with Microsoft regarding the viability of file systems other than FAT.
Jim Zemlin wrote in his response to the settlement:
The technology at the heart of this settlement is the FAT filesystem. As acknowledged by Microsoft in the press release, this file system is easily replaced with multiple technology alternatives.
This was also quoted again by Groklaw in their article. But is this actually true? The major benefit that FAT brings that other file systems do not is its ubiquity. It's supported without the need to install third-party code on Windows, MacOSX and Linux, along with countless other devices and operating systems. No other single file system has such cross-platform support.
If you're developing an embedded device with internal storage (e.g. a PTP camera or MTP media player with built-in flash memory) then you can get away with using whichever file system you like (and I've worked on products which used ext2 and reiserfs in this situation). Unfortunately, as soon as you start using removable storage, or need to make your built-in storage available as a block device over USB mass storage class or similar, you need to be interoperable. Being interoperable means using FAT or, if you are very lucky and have a knowledgeable user base, a file system such as ext3, which can be supported on Windows and MacOSX with a little work.
FAT's simplicity makes it even more useful for interoperability. Its lack of support for ownership and ACLs means that you can just plug a USB key in and start reading and writing to it immediately. A more advanced file system such as ext3 just gets in the way if your UID doesn't match between different machines or you give the key to someone else. This is less of a worry for an embedded device, which may just run as root anyway or can be hacked to work around it. On the desktop there may be a case for supporting mounts in a special “ignore permissions” or “everybody squash” mode to solve this problem.
This topic has become important to me because recently I've been looking into alternatives to FAT for a completely different reason: resilience. FAT, and in particular FAT as implemented on Linux, is highly prone to corruption if someone pulls the USB key or the power cable. Other file systems such as ext3 are much more capable of dealing with this.
Friday, 10 April 2009
Fixing gnome-volume-manager photo import after upgrading from Etch to Lenny
I was somewhat confused by the fact that gthumb no longer popped up automatically when I plugged my camera in after upgrading from Debian GNU/Linux 4.0 (Etch) to 5.0 (Lenny). Google searches offered no hints so I was forced to dig into it myself.
It appears that under Etch I'd ticked the "Always Perform This Action" box when selecting "Import Photos" from the popup. This caused the /desktop/gnome/volume_manager/prompts/camera_import_photos gconf key to contain the value 5. It seems that this value causes the arrival of the camera to have no visible effect in the Lenny version of gnome-volume-manager.
The fix is to run gconf-editor and reset the aforementioned value to zero so that the popup appears once more. Plugging the camera in again and ticking the box again then results in the value 6 being written back.
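Equivalently, the key can be reset from the command line with gconftool-2:

gconftool-2 --type int --set /desktop/gnome/volume_manager/prompts/camera_import_photos 0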
Wednesday, 21 January 2009
Regaining Control Over Object Creation Through Constructor Hiding
In a step up from blogging random bits of useless information I've written an article containing useless information entitled “Regaining Control Over Object Creation Through Constructor Hiding.” The article can be found in the January 2009 edition of the ACCU journal CVu.
NFS mount yields: RPC: failed to contact portmap
I do most of my embedded software development whilst running from an NFS mounted root directory. I was therefore rather confused when I was unable to mount a different path on the same server. The following just appeared very slowly:
~# mount server:/path /mnt
portmap: server localhost not responding, timed out
RPC: failed to contact portmap (errno -5).
portmap: server localhost not responding, timed out
RPC: failed to contact portmap (errno -5).
lockd_up: makesock failed, error=-5
portmap: server localhost not responding, timed out
RPC: failed to contact portmap (errno -5).
mount: Mounting server:/path on /mnt failed: Input/output error
The cause is simple. My device isn't running a port mapper daemon. But why should I run such a daemon? The kernel can mount my root filesystem without one!
In order to mount the filesystem, the kernel is trying to talk to the local port mapper daemon in order to find the local lock daemon - I don't have one of those either. The problem is easily fixed by passing “-o nolock”:
~# mount -o nolock server:/path /mnt
Friday, 14 November 2008
Why you shouldn't use the tool chain supplied by your embedded Linux vendor
Linux is understandably popular as an operating system for embedded systems. SoC vendors in particular like to supply a Linux "distribution" that is tailored for their platform. This "distribution" usually includes the binaries for an appropriate toolchain. In my opinion you should stop using this toolchain long before you are ready to release your product.
The binary toolchain supplied by your vendor is easy: it might even be statically linked, making it even easier to use. Just plonk it in the right place and everything magically works.
But,
1. One day you're going to have a bug. That bug will either require some debugging in files that are supplied as part of the toolchain (ld.so, glibc) or require you to make some modifications to the toolchain in order to help investigate the problem. You might even find a toolchain bug that you need to fix. This sort of problem is almost guaranteed to occur at a point when it is deemed too dangerous to switch to a self-compiled toolchain. That's if you're lucky and you have the source code for the toolchain and it actually compiles for you.
2. In order to comply with the GPL you need to release working source code for certain bits of the tool chain anyway. How can you be sure that this stuff actually compiles unless you have done so yourself?
3. You need to support your product long after your vendor has moved on to another generation of chips. Will their toolchain still work on whatever host operating system you are using then?
So, at the very least you should get the source code for the toolchain from your vendor and then compile it yourself. Use the version you compiled yourself. This leaves you in a much better position when the unexpected occurs. If your vendor won't give you the source for a toolchain they've given you in binary form then find another vendor that understands software licensing.
Of course you could just compile your own toolchain from scratch and use that, but creating cross-compilation toolchains is certainly not easy - perhaps that subject is worthy of a future post.