Wednesday, 14 October 2009

Running unit tests under gdb: Part II

In part one I described a script which could be used to easily run unit tests under gdb. In describing the script I wrote:

It's not ideal — it corrupts quoting of its arguments and arbitrarily limits the number of arguments but it does work.

I am indebted to a loyal reader for pointing out the rather obvious --args gdb parameter. I'm not really sure how I'd missed it. This makes fixing the script to remove those flaws trivial. If you're running only one test that you know is going to fail then it may be sufficient to just use RUN_TEST="gdb --args". However, if the problem is intermittent or you're running a whole gaggle of tests then the fixed script is still useful.

This version of the script also ensures that the temporary commands file is cleaned up too.

cmds=`tempfile -p rug`
trap 'rm -f "$cmds"; exit 1' SIGINT SIGHUP SIGTERM
echo 'run' > $cmds
echo 'if $_exitcode == 0' >> $cmds
echo ' quit' >> $cmds
echo 'end' >> $cmds
gdb --return-child-result --quiet -x $cmds --args "$@"
result=$?
rm -f $cmds
exit $result
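As an aside, tempfile(1) is Debian-specific. The same create-and-clean-up pattern can be sketched with the more portable mktemp(1), with the commands file contents as in the script above:

```shell
# Create the gdb commands file and guarantee its removal however we exit.
cmds=$(mktemp) || exit 1
trap 'rm -f "$cmds"' EXIT INT HUP TERM
printf '%s\n' 'run' 'if $_exitcode == 0' ' quit' 'end' > "$cmds"
cat "$cmds"   # the file exists here; the EXIT trap removes it afterwards
```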

Tuesday, 13 October 2009

Running unit tests under gdb: Part I

Our build system usually runs all the unit tests under valgrind. It does this by preceding the invocation of the compiled unit test with $(RUN_TEST) as in this simplified version:
my_unittest.pass : my_unittest
	$(RUN_TEST) my_unittest
	touch $@
This means that it's easy to run the test without valgrind by invoking
make RUN_TEST= my_unittest.pass
This can be useful when you know the test is going to fail and you don't want the huge amount of output that valgrind will spew when you abort() from the depths.

Sometimes it would be handy to run the unit tests under gdb but unfortunately invoking a program under gdb is not the same as invoking it normally so RUN_TEST=gdb doesn't work. Compare:
valgrind myprog myargs...
gdb myprog
run myargs...
So, what's needed is a gdb wrapper script that turns the former method of invocation into the latter. It turns out that it's possible to do even better than that. The following script runs the program under gdb and if it succeeds just exits normally so that the next test can be tried. If the program breaks due to an abort then the if statement unfortunately produces an error message but it does have the intended effect — you're left at the gdb prompt.

Here's my run-under-gdb script:
prog="$1"
shift
cmds=`tempfile -p rug`
echo "run $1 $2 $3 $4 $5 $6 $7 $8 $9" > $cmds
echo 'if $_exitcode == 0' >> $cmds
echo ' quit' >> $cmds
echo 'end' >> $cmds
if gdb -x $cmds "$prog"; then
    rm -f $cmds
    exit 0
fi
rm -f $cmds
exit 1

It's not ideal — it corrupts quoting of its arguments and arbitrarily limits the number of arguments but it does work. Part two contains an improved script that solves these problems and provides other fixes.

Monday, 5 October 2009

A better way to view man pages in a different window

In Episode 13 of Season 2 of the Ubuntu UK Podcast the “Command Line Lurve” segment was supplied by Alan Bell. He found reading man pages in a terminal painful because the terminal was where he wanted to be typing the command options. His script ran gnome-help in the background to view the man page in a different window.

Alan Bell's script prompted me to write an improved version and send it in along with this email:

I listened to Alan Bell's command line luurve segment in your podcast
this evening. Whilst his script works it inhibits much of man(1)'s
functionality. In particular it does not support specifying the manual
section (compare "man open" and "man 2 open" for example.)

Here's my alternative that maintains this functionality and
automatically falls back to standard man if any options are supplied:

case "$1" in
"") x= ;;
-*) x= ;;
*) x=1 ;;
esac
if [ -n "$x" ]; then
    for i in "$@"; do
        case "$i" in
        [0-9]) section="($i)" ;;
        *) gnome-help "man:$i$section" >/dev/null 2>&1 &
           ;;
        esac
    done
else
    exec man "$@"
fi

The script also makes specifying multiple pages at once more useful
than it is with man(1).
It can be aliased to man if required as Alan described.

They were nice enough to read out my email in Episode 14 but the script didn't appear in the show notes. So here it is.

Wednesday, 2 September 2009

Running Debian GNU/Linux 5.0 (Lenny) on Core i7

We recently had the good fortune to upgrade our build machine to something a little more powerful. Although the quoted CPU clock speed is actually slightly lower than our previous machine the new one has two quad-core Core i7 processors with hyperthreading whereas the old one had one dual-core processor. The new one also supports Turbo Boost which provides a limited, sanctioned-by-Intel overclocking mechanism.

I eventually got Debian GNU/Linux 5.0 (Lenny) installed after some trouble that I'm putting down to out-of-date mirrors and set about configuring the machine. After a bit of fiddling I discovered that by default the modules required for Turbo Boost (and SpeedStep) weren't loaded. I added acpi-cpufreq and cpufreq_ondemand to /etc/modules and noted, by inspecting the files in /sys/devices/system/cpu/cpu*/cpufreq/, that the frequencies did seem to change when the system was under load. Mission accomplished, I carried on configuring the machine and we started using it for real work.
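For reference, the changes amounted to something like this (module names as above; the sysfs path is the standard cpufreq location):

```shell
# /etc/modules additions: load the cpufreq driver and governor at boot.
echo acpi-cpufreq >> /etc/modules
echo cpufreq_ondemand >> /etc/modules

# Once loaded, the current frequencies can be watched while under load:
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq
```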

But, a few days later I started noticing processes taking far longer than they should have done to complete. They were always stuck consuming 100% CPU. C++ compilations that usually took at most a couple of seconds were taking over twelve minutes!

My dabblings with strace(1) and time(1) led me to believe that the programs were spending most of their time stuck in the kernel. Initially all the afflicted processes were running in a 32-bit chroot so I suspected that the problem was related to that.

It was when my VNC slowed to a complete crawl I knew that the problem wasn't related to the chroot and that it needed solving. My usual techniques for investigation didn't yield anything useful so I decided to try running a more modern kernel. Lenny uses a v2.6.26 kernel which was released before Core i7 (although of course it may have been patched since by Debian.) The version in backports is v2.6.30 which is much newer. Initial tests failed to reproduce the problem but it took a couple of weeks of good behaviour before I believed that the problem was solved.

Friday, 28 August 2009

Keeping old branches building when the world has moved on

I recently upgraded the system used to build our software to Debian GNU/Linux x86-64. The main software is cross-compiled for an embedded target but there are various host programs and unit tests that are compiled natively. Parts of the software are quite old and don't compile successfully for 64-bit. Luckily -m32 mostly solved this problem so everything keeps on compiling. The cross compiler is compiled as 64-bit so it can take advantage of the extra registers to hopefully improve performance.

Unfortunately -m32 didn't solve everything. The software required many libraries and not all of these were included in the various 32-bit library compatibility packages provided by Debian. The best solution was to modify the build system to build these libraries too. This makes us less reliant on the operating system's versions and lets me use the same versions as we do with buildroot for the target, which reaped other benefits too. One day I'd like to move to a world where the only part of the host system that is used is the compiler that compiles our own host and cross compilers.

Eventually the trunk was all fine and everything built happily. But what about older branches? The changes were quite invasive and applying them to supposedly stable branches didn't give me a warm fuzzy feeling. The logical solution was to build them in an entirely 32-bit environment. I considered using something like VirtualBox or QEMU but using a completely separate host would have caused confusion for all the developers.

In the end I discovered schroot and followed the instructions. It was very quick and easy to set up. I added various bind mounts to /etc/fstab to ensure that /home was mounted and it was then easy to set a 32-bit environment build going with just a single command. The instructions in the link are clear so I won't go over them again. I made use of linux32 to ensure that uname gave the expected result.

But, I thought it could be even simpler.

Each branch knows what build environments it supports. This can easily be indicated via a file in the top level directory. If the file doesn't exist then some sensible defaults can be used. Once this information exists it is easy to write a script which reads this, checks against the current native build environment and either runs the build directly or inside the chroot. Something like this:
top=`pwd`
while [ ! / -ef "${top}" -a ! -f "${top}/Make.rules" ]; do
    top=`dirname "${top}"`
done

if [ / -ef "${top}" ]; then
    echo "Not in source tree."
    exit 1
fi

# Defaults
build_envs="`uname -m`"

if [ -f "${top}/BuildInfo" ]; then
    source "${top}/BuildInfo"
fi

this_env="`uname -m`"
use_chroot=
for e in ${build_envs}; do
    if [ "${e}" = "${this_env}" ]; then
        use_chroot=
        break
    elif [ "${e}" = "i686" ]; then
        use_chroot=lenny32   # name of the 32-bit chroot; adjust to suit
    fi
done

if [ -n "${use_chroot}" ]; then
    echo "Compiling in ${use_chroot} chroot"
    schroot -p -q -c "${use_chroot}" -d "`pwd`" -- linux32 make "$@"
else
    make "$@"
fi

The actual script is more complex than this because it contains better error handling. It also deals with make -C dir and deciding whether -jN can be passed too.

This can easily be extended to support building under different chroots for older operating system versions for example.
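A hypothetical BuildInfo for a branch that must be built as 32-bit could be as small as this (the variable and its values are illustrative):

```shell
# BuildInfo: sourced by the build wrapper from the top-level directory
build_envs="i686"
```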

Thursday, 30 April 2009

FAT alternative

An interesting opinion has appeared in relation to TomTom's settlement with Microsoft regarding the viability of file systems other than FAT.
Jim Zemlin wrote in his response to the settlement:

The technology at the heart of this settlement is the FAT filesystem. As acknowledged by Microsoft in the press release, this file system is easily replaced with multiple technology alternatives.

This was also quoted again by Groklaw in their article. But is this actually true? The major benefit that FAT brings that other file systems do not is its ubiquity. It's supported without the need to install third-party code on Windows, Mac OS X and Linux, along with countless other devices and operating systems. No other single file system has such cross-platform support.
If you're developing an embedded device with internal storage (e.g. a PTP camera or MTP media player with built-in flash memory) then you can get away with using whichever file system you like (and I've worked on products which used ext2 and reiserfs in this situation.) Unfortunately, as soon as you start using removable storage, or need to make your built-in storage available as a block device over USB mass storage class or similar, you need to be interoperable. Being interoperable means using FAT or, if you are very lucky and have a knowledgeable user base, a file system such as ext3 which can be supported on Windows and Mac OS X with a little work.

FAT's simplicity makes it even more useful for interoperability. Its lack of support for ownership and ACLs means that you can just plug a USB key in and start reading and writing to it immediately. A more advanced file system such as ext3 just gets in the way if your UID doesn't match between different machines or you give the key to someone else. This problem is less of a worry for an embedded device which may just run as root anyway or can be hacked to work around this. On the desktop there may be a case here for supporting mounts in a special “ignore permissions” or “everybody squash” mode to solve this problem.
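Linux's vfat driver in fact already behaves this way for FAT itself: since FAT stores no ownership, it is synthesized at mount time (the device name and ids below are illustrative):

```shell
# Make every file on the key appear owned by the local user.
mount -t vfat -o uid=1000,gid=1000,umask=022 /dev/sdb1 /mnt/key
```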

This topic has become important to me because recently I've been looking into alternatives to FAT for a completely different reason: resilience. FAT, and in particular FAT as implemented on Linux, is highly prone to corruption if someone pulls the USB key or the power cable. Other file systems such as ext3 are much more capable of dealing with this.

Friday, 10 April 2009

Fixing gnome-volume-manager photo import after upgrading from Etch to Lenny

I was somewhat confused by the fact that gthumb no longer popped up automatically when I plugged my camera in after upgrading from Debian GNU/Linux 4.0 (Etch) to 5.0 (Lenny). Google searches offered no hints so I was forced to dig into it myself.

It appears that under Etch I'd ticked the "Always Perform This Action" box when selecting "Import Photos" from the popup. This caused the /desktop/gnome/volume_manager/prompts/camera_import_photos gconf key to contain the value 5. It seems that this value causes the arrival of the camera to have no visible effect in the Lenny version of gnome-volume-manager.

The fix is to run gconf-editor and reset the aforementioned value to zero so that the popup appears once more. Plugging the camera in again and ticking the box again then results in the value 6 being written back.
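The same fix can be applied from a terminal with gconftool-2 (the key is as above):

```shell
# Reset the prompt behaviour so the "Import Photos" popup appears again.
gconftool-2 --set --type int \
    /desktop/gnome/volume_manager/prompts/camera_import_photos 0
```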

Wednesday, 21 January 2009

Regaining Control Over Object Creation Through Constructor Hiding

In a step up from blogging random bits of useless information I've written an article containing useless information entitled “Regaining Control Over Object Creation Through Constructor Hiding.” The article can be found in the January 2009 edition of the ACCU journal CVu.

NFS mount yields: RPC: failed to contact portmap

I do most of my embedded software development whilst running from an NFS mounted root directory. I was therefore rather confused when I was unable to mount a different path on the same server. The following just appeared very slowly:

~# mount server:/path /mnt
portmap: server localhost not responding, timed out
RPC: failed to contact portmap (errno -5).
portmap: server localhost not responding, timed out
RPC: failed to contact portmap (errno -5).
lockd_up: makesock failed, error=-5
portmap: server localhost not responding, timed out
RPC: failed to contact portmap (errno -5).
mount: Mounting server:/path on /mnt failed: Input/output error

The cause is simple. My device isn't running a port mapper daemon. But why should I run such a daemon? The kernel can mount my root filesystem without one!

In order to mount the filesystem the kernel is trying to talk to the local port mapper daemon in order to find the local lock daemon; I don't have one of those either. The problem can easily be fixed by passing “-o nolock”:

~# mount -o nolock server:/path /mnt