
Monday, 23 September 2024

My Thirty Years of Linux


I first installed Linux in the final week of September 1994 on my 486SX 25MHz computer, which for reasons that would take too long to explain had 20 megabytes of memory. This was a huge amount at a time when four megabytes was normal for such machines. During the preceding summer I had been developing 16-bit Windows 3.1 software using Visual C++ and Visual Basic.

I know I installed Slackware from a magazine cover CDROM (as they were known at the time), but I’ve been unable to find out which one. I’m reasonably sure that it was Slackware 1.2 from March 1994 - at least I think that the kernel version was 1.0.8 and that version would probably match the lead times for the CDROMs that I would have received by then.

I didn’t have access to the Internet at that point so all I could do was play around with the software included in the distribution. My editor of choice then was Microemacs, which I had first encountered on my Sinclair QL. On that machine it was rather too slow to actually use as an editor so I stuck with QD, though I do remember being very impressed that I could split the frame showing a single buffer into two windows and watch text typed in one lazily appear in the other. It was a simpler time! Once I got the aforementioned 486SX PC I started using Microemacs under DOS as my primary editor, even though I was mostly running it full screen under Windows 3.1. I believe that I took the source code for Microemacs 3.9 from my QL C68 discs across to my new Linux installation to compile it there, which would mean that I never compiled Microemacs from source on DOS myself and only used binaries downloaded for me from CIX. Then again, I may actually have had the source for a later DOS version and compiled that. To my great surprise it compiled and ran on Linux quite happily!

I played around for a while. I remember trying to run XFree86 but, not knowing about startx, all I got when I ran the X binary directly was a blank screen with an X mouse pointer moving around.

My solo investigations of Linux were curtailed by my going away to university the following weekend; I couldn't take my computer with me.

When I got to university I was faced with the university computer services department's X terminals and remember being amazed in my first few days that I could use Mosaic to view web pages about Linux for free! I discovered the Linux Documentation Project and made use of the computer services line printer to print documentation that I could take back to my room and read at my leisure. I started learning a lot about Unix.

I believe that Microemacs was available on the university Computing Services Unix machines. Either that or I was able to download its source code using FTP and compile it myself quite quickly. (I remember that the editor recommended by lecturers in the Computer Science department at that point was Jove - another Emacs-like editor that I didn't use much.) I think that I moved to GNU Emacs relatively quickly, but the default incremental search confused me greatly and I rebound C-s to non-incremental search for a good while! It wasn't long before I started using XEmacs like the other cool kids (which continued until I noticed, in about 2006, that Emacs had improved greatly.)

Being on a joint Maths and Computer Science degree course meant that I was mostly doing Maths with only a small amount of Computer Science. This meant spending a lot of time up Gibbet Hill in the Maths department, which had a small computer room with about six Sun SPARCstation 2s in it. Maths students in higher years hanging out in that room helped me to improve my configuration, and I started using vtwm rather than the default twm X window manager and tinkered with its configuration myself. I later moved to fvwm 1 and then fvwm2. I also realised that there wasn't enough Computer Science in my course and switched to the straight Computer Science course from the second year.

I remember returning home one weekend during the term and working on a program that would eject the CD in the CD drive connected to my SoundBlaster 16. I think that I also messed around with SVGATextMode to use the 132x30 text-mode resolution on my Trident graphics card that I had previously used in Windows full-screen DOS sessions.

I continued tinkering with Linux over the Christmas holiday and when I went back to university I was able to take my computer with me to use in my university accommodation. Halls didn’t provide Internet connections back then. I remember trying to persuade the university that this would be a good idea in the hope that they would do so in time for my third year when I stood a chance of being back on campus. I failed in that. Having access to the Internet through the university meant that I could download source code to build and install on my Linux box but I had to carry it on 3.5 inch floppy disks back and forth to my Linux PC. I had to learn how to split tarballs across multiple floppies and cope with the common occurrence of being unable to read files written on the Sun floppy drives on my PC. This was a common problem for others too and no-one ever had a good explanation for it.

At some point during the first year I upgraded to newer Slackware versions and started compiling my own 1.1.x development kernels. By the second year I had started running the 1.3.x development kernels and remember being excited as the 2.0 kernel approached. I reinstalled Slackware again from scratch to migrate from a.out to ELF format binaries.

Access to the Internet meant access to the comp.os.linux.* USENET hierarchy, which I participated in eagerly. Borrowing the O'Reilly books from the library, I learnt how to run BIND for DNS and INN for news, and how to tweak my XF86Config to get the best resolution at a sensible refresh rate out of my monitor. In the second year my housemate and I set up an IP network and managed to share his dial-up Internet connection to a certain extent. It was set to dial up automatically in the early morning, and if I left my computer on overnight email would arrive at it and the beep of xbiff would wake me up! This was a time when you could still forward email through another machine by using % in the local part!

Over those three years at university I learnt an awful lot about Linux, but when I got back home I knew that my Slackware install, onto which I'd installed newer versions of INN, the kernel and various other packages, wasn't really sustainable. I had heard about Debian and decided to reinstall completely using that. I must have ordered a CD containing Debian 1.3 “Bo” from The Linux Emporium. I think that I upgraded to Debian 2.0 “Hamm” slightly by accident, because by that point my Linux PC was connected to the Internet permanently, though rather slowly. By then I was working on the empeg Car MP3 player and Linux was fully entrenched in my life, both in the products I was working on and in the operating system I was using to build them.

I had no idea when I first installed Linux thirty years ago that it would influence my career from that point onwards. I’m glad I did!

Tuesday, 13 November 2012

ssh-and-tmux: part three

On many of the hosts I connect to using my ssh-and-tmux script I want to be able to use git to connect to repositories via ssh. Initially this is as simple as ensuring that ssh-agent forwarding is enabled by passing -A to ssh. The problem comes when I disconnect and reconnect from somewhere else. This kills the old ssh session, the socket used for the agent is removed, and SSH_AUTH_SOCK ends up pointing to a file that doesn't exist.

The solution is to use a fixed name for the authentication socket and symlink that name to the real one as it changes.

#!/bin/sh
set -e

if [ -z "$SSH_AUTH_SOCK" ]; then
    echo No ssh agent found. Starting one.
    eval `ssh-agent`
fi

if [ -n "$SSH_AUTH_SOCK" ]; then
    export SSH_AUTH_SOCK_OLD=$SSH_AUTH_SOCK
    export SSH_AUTH_SOCK=$HOME/.ssh/.tmux-agent
    ln -sf "$SSH_AUTH_SOCK_OLD" $SSH_AUTH_SOCK
else
    echo Failed to start ssh agent

    # Set the socket anyway in case someone reattaches later with a
    # valid agent.
    export SSH_AUTH_SOCK=$HOME/.ssh/.tmux-agent
fi

# Use a specific name for the tmux/screen session so
# that we can tell that it's one that uses this scheme.
# We have to detach other clients with tmux so that we
# don't end up with a stale agent in them when we
# return to them later.
if which tmux >/dev/null 2>/dev/null; then
    tmux -2 -L ssh-agent "$@" attach -d || tmux -2 -L ssh-agent "$@"
else
    exec screen -S ssh-agent "$@" -x -RR
fi

Now the original script just needs to run this script on the remote host rather than executing tmux directly.
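
For example, assuming the script above is installed as tmux-agent somewhere in the PATH on the remote hosts (that name is my choice, not anything standard), the remote command in the earlier scripts shrinks from the tmux attach incantation to just:

ssh -A -t "$dest" tmux-agent

The -A is still needed, of course: the wrapper can only symlink an agent socket that has actually been forwarded.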

I've been using this final solution for quite a while now and it works really well.

Monday, 12 November 2012

ssh-and-tmux: part two

Many of the hosts I want to reach with my ssh-and-tmux script are behind firewalls, so I can't connect to them directly. I find it more reliable to ssh through a gateway host than to use any VPN that might be available. This can easily be bolted on to the ssh-and-tmux script provided a good guess can be made as to which network you're actually on:

#!/bin/sh
if [ -n "$TMUX" ]; then
    echo Already in tmux
    exit 1
fi

if [ -n "$STY" ]; then
    echo Already in screen
    exit 1
fi

while true; do
    # First work out where we are based on our IP address.
    # We do this every time round the loop in case we've
    # moved network since last time.
    addrs=`ip --family inet --oneline addr`

    work_via=gateway.example.com
    home_via=gateway.randombitsofuselessinformation.blogspot.com

    case "$addrs" in
    *192.168.1.*)
        # We're on the home network
        home_via=
        ;;
    *172.16.*)
        # We're on the work network
        work_via=
        ;;
    esac

    extra=
    via=
    case "$1" in
    work-host1)
        via=$work_via
        # Add a port forward for this host too
        extra=-L8080:work-host1:8080
        dest=work-host1
        ;;
    home-host1)
        via=$home_via
        dest=home-host1
        ;;
    home-host2)
        via=$home_via
        dest=home-host2
        ;;
    *.*)
        # Hosts with dots are assumed to be on the Internet at large
        via=
        dest="$1"
        ;;
    *)
        # All other hosts are assumed to be on the work network
        via=$work_via
        dest="$1"
        ;;
    esac

    if [ -n "$via" ]; then
        ssh $extra -A -t "$via" \
            ssh -t "$dest" "tmux -2 -L netbook attach || tmux -2 -L netbook"
    else
        ssh $extra -t "$dest" "tmux -2 -L netbook attach || tmux -2 -L netbook"
    fi

    stty sane
    echo "Dropped, press Enter to reconnect."
    if read x; then
        echo "Reconnecting..."
    else
        # Something bad happened to our tty. We'd better exit.
        exit 1
    fi
done

So now I can connect to work-host1 from home via the gateway, shut the laptop, travel to work, open it again, hit Enter and reconnect directly to work-host1 without losing any state. What's not to like? Well, there are still more things we can do.

Sunday, 11 November 2012

ssh-and-tmux: part one

So, I finally got round to buying myself a netbook. Unlike my old laptop the netbook has a battery that actually works so I can pick it up, use it for a few minutes and then close the lid so it suspends. Later on I can pick it up again and carry on where I left off. At least that's the theory. The problem is that quite a lot of what I do ends up involving ssh connections to remote hosts and whilst those connections survive a brief period of suspension they don't last for too long.

I've used Screen on and off for over fifteen years and I thought that this would help to solve the problem. Not long after implementing the solution I switched to tmux.

Step one is to initiate tmux over the ssh connection:

ssh -t destination "tmux -2 -L netbook attach || tmux -2 -L netbook"

The -2 forces tmux to assume that the terminal supports 256 colours. All of the ones I use seem to.

The -L netbook option makes tmux use a separate server socket named netbook, so this session won't get mixed up with a manually launched tmux session.
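
If you ever need to poke at that separate server by hand, the same -L flag works with the other tmux commands, for example:

tmux -L netbook list-sessions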

First I try and attach to an existing session but if that fails I create a new one.

I put this in a script named ssh-and-tmux.

Step two is to reconnect when the connection is dropped. I decided not to make this fully automatic because that would keep throwing off other connections from elsewhere, and leaving multiple connections active at the same time would mean that the usable window size might be limited too. Instead I simply wait for the Enter key to be pressed before trying to connect again, and exit if anything goes wrong.

So, the step two script is:

#!/bin/sh
if [ -n "$TMUX" ]; then
    echo Already in tmux
    exit 1
fi

if [ -n "$STY" ]; then
    echo Already in screen
    exit 1
fi

dest="$1"
while true; do
    ssh -t "$dest" "tmux -2 -L netbook attach || tmux -2 -L netbook"

    stty sane
    echo "Dropped, press Enter to reconnect."
    if read x; then
        echo "Reconnecting..."
    else
        # Something bad happened to our tty. We'd better exit.
        exit 1
    fi
done

Sometimes it can take ssh a while to notice that the connection has been dropped. In that case I can simply type the escape sequence ~. (at the start of a line) to kill the ssh connection and reconnect.

This works well but there's more to come in the next post.

Monday, 5 October 2009

A better way to view man pages in a different window

In Episode 13 of Season 2 of the Ubuntu UK Podcast the “Command Line Lurve” segment was supplied by Alan Bell. He found reading man pages in a terminal painful because the terminal was where he wanted to be typing the command options. His script ran gnome-help in the background to view the man page in a different window.

Alan Bell's script prompted me to write an improved version and send it in along with this email:

I listened to Alan Bell's command line luurve segment in your podcast
this evening. Whilst his script works it inhibits much of man(1)'s
functionality. In particular it does not support specifying the manual
section (compare "man open" and "man 2 open" for example.)

Here's my alternative that maintains this functionality and
automatically falls back to standard man if any options are supplied:

#!/bin/sh
x="$DISPLAY"
case "$1" in
"") x= ;;
-*) x= ;;
esac
if [ -n "$x" ]; then
    section=
    for i in "$@"; do
        case "$i" in
        [0-9]) section="($i)" ;;
        *) gnome-help "man:$i$section" >/dev/null 2>&1 & ;;
        esac
    done
else
    exec man "$@"
fi

The script also makes specifying multiple pages at once more useful
than it is with man(1).
It can be aliased to man if required as Alan described.



They were nice enough to read out my email in Episode 14 but the script didn't appear in the show notes. So here it is.
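
If you want the alias Alan described, then assuming you've saved the script as gman somewhere in your PATH (both the name and the location are my choice), something like this in your shell start-up file should do:

alias man=gman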

Thursday, 30 April 2009

FAT alternative

An interesting opinion has appeared in relation to TomTom's settlement with Microsoft regarding the viability of file systems other than FAT.
Jim Zemlin wrote in his response to the settlement:

The technology at the heart of this settlement is the FAT filesystem. As acknowledged by Microsoft in the press release, this file system is easily replaced with multiple technology alternatives.

This was also quoted by Groklaw in their article. But is this actually true? The major benefit that FAT brings that other file systems do not is its ubiquity. It's supported without the need to install third party code on Windows, MacOSX and Linux along with countless other devices and operating systems. No other single file system has such cross platform support.

If you're developing an embedded device with internal storage (e.g. a PTP camera or MTP media player with built-in flash memory) then you can get away with using whichever file system you like (and I've worked on products which used ext2 and reiserfs in this situation.) Unfortunately as soon as you start using removable storage, or need to make your built-in storage available as a block device over USB mass storage class or similar, you need to be interoperable. Being interoperable means using FAT or, if you are very lucky and have a knowledgeable user base, a file system such as ext3 which can be supported on Windows and MacOSX with a little work.

FAT's simplicity makes it even more useful for interoperability. Its lack of support for ownership and ACLs means that you can just plug a USB key in and start reading and writing to it immediately. A more advanced file system such as ext3 just gets in the way if your UID doesn't match between different machines or you give the key to someone else. This problem is less of a worry for an embedded device, which may just run as root anyway or can be hacked to work around it. On the desktop there may be a case for supporting mounts in a special “ignore permissions” or “everybody squash” mode to solve this problem.

This topic has become important to me because recently I've been looking into alternatives to FAT for a completely different reason: resilience. FAT, and in particular FAT as implemented on Linux, is highly prone to corruption if someone pulls the USB key or the power cable. Other file systems such as ext3 are much more capable of dealing with this.

Friday, 10 April 2009

Fixing gnome-volume-manager photo import after upgrading from Etch to Lenny

I was somewhat confused by the fact that gthumb no longer popped up automatically when I plugged my camera in after upgrading from Debian GNU/Linux 4.0 (Etch) to 5.0 (Lenny). Google searches offered no hints so I was forced to dig into it myself.

It appears that under Etch I'd ticked the "Always Perform This Action" box when selecting "Import Photos" from the popup. This caused the /desktop/gnome/volume_manager/prompts/camera_import_photos gconf key to contain the value 5. It seems that this value causes the arrival of the camera to have no visible effect in the Lenny version of gnome-volume-manager.

The fix is to run gconf-editor and reset the aforementioned value to zero so that the popup appears once more. Plugging the camera in again and ticking the box again then results in the value 6 being written back.
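
If you'd rather avoid clicking through gconf-editor, I believe the same reset can be done from a terminal with gconftool-2 (untested here, but this is its usual way of setting an integer key):

gconftool-2 --type int --set /desktop/gnome/volume_manager/prompts/camera_import_photos 0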

Wednesday, 21 January 2009

NFS mount yields: RPC: failed to contact portmap

I do most of my embedded software development whilst running from an NFS mounted root directory. I was therefore rather confused when I was unable to mount a different path on the same server. The following just appeared very slowly:

~# mount server:/path /mnt
portmap: server localhost not responding, timed out
RPC: failed to contact portmap (errno -5).
portmap: server localhost not responding, timed out
RPC: failed to contact portmap (errno -5).
lockd_up: makesock failed, error=-5
portmap: server localhost not responding, timed out
RPC: failed to contact portmap (errno -5).
mount: Mounting server:/path on /mnt failed: Input/output error

The cause is simple. My device isn't running a port mapper daemon. But why should I run such a daemon? The kernel can mount my root filesystem without one!

In order to mount the filesystem the kernel is trying to talk to the local port mapper daemon in order to find the local lock daemon - I don't have one of those either. The problem can easily be fixed by passing “-o nolock”:

~# mount -o nolock server:/path /mnt
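
If the mount lives in /etc/fstab instead then the same option goes in the options column, along the lines of:

server:/path  /mnt  nfs  nolock  0  0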

Friday, 14 November 2008

Why you shouldn't use the tool chain supplied by your embedded Linux vendor

Linux is understandably popular as an operating system for embedded systems. SoC vendors in particular like to supply a Linux "distribution" that is tailored for their platform. This "distribution" usually includes the binaries for an appropriate toolchain. In my opinion you should stop using this toolchain long before you are ready to release your product.

The binary toolchain supplied by your vendor is easy; it might even be statically linked, making it even easier to use. Just plonk it in the right place and everything magically works.

But,

1. One day you're going to have a bug. That bug will either require some debugging in files that are supplied as part of the toolchain (ld.so, glibc) or require you to make some modifications to the toolchain in order to help investigate the problem. You might even find a toolchain bug that you need to fix. This sort of problem is almost guaranteed to occur at a point when it is deemed too dangerous to switch to a self-compiled toolchain. That's if you're lucky and you have the source code for the toolchain and it actually compiles for you.

2. In order to comply with the GPL you need to release working source code for certain bits of the toolchain anyway. How can you be sure that this stuff actually compiles unless you have done so yourself?

3. You need to support your product long after your vendor has moved on to another generation of chips. Will their toolchain still work on whatever host operating system you are using then?

So, at the very least you should get the source code for the toolchain from your vendor and then compile it yourself. Use the version you compiled yourself. This leaves you in a much better position when the unexpected occurs. If your vendor won't give you the source for a toolchain they've given you in binary form then find another vendor that understands software licensing.

Of course you could just compile your own toolchain from scratch and use that, but creating cross-compilation toolchains is certainly not easy - perhaps that subject is worthy of a future post.

Tuesday, 5 August 2008

mipsel-linux-strip: Not enough room for program headers, try linking with -N

I was rather confused when I started getting this error while attempting to strip a Linux MIPS shared library after moving to a new toolchain that used binutils-2.18:

BFD: st4lu6Am: Not enough room for program headers, try linking with -N
mipsel-linux-strip: st4lu6Am: Bad value

Other shared libraries could be successfully stripped.

The clue was that these shared libraries were generated with an earlier toolchain that used an older version of binutils.

It turns out that this is caused by binutils-2.18 wanting to add a NULL segment even when the binary didn't originally have one. Applying the fix makes the problem go away.
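
If you want to see the difference for yourself, readelf will list a shared library's program headers (the library name here is just a placeholder):

mipsel-linux-readelf -l libexample.so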

I only really mention this because the fix description doesn't contain the error message I saw thus making it hard to Google for a solution.

Sunday, 20 April 2008

Accessing older Rio MP3 players as an unprivileged user

By default, unknown USB devices seem to be owned by user root and group root on Ubuntu Gutsy and Hardy. This is inconvenient when the device is an MP3 player that you'd rather access as a normal user.

I still use my Rio S50 flash player regularly. After having its space boosted a little with an SD card it's perfect for listening to MP3s of radio programmes and podcasts: it's small enough that it's easy to keep track of what is on it, and the AA battery lasts forever.

Anyway, I use the rioutil tool for downloading content to the player. It uses libusb to talk to the device without requiring a kernel driver.

With Ubuntu Hardy the device nodes that libusb uses seem to have changed, which broke my old rules. After a little bit of strace I was able to come up with the following udev rule, which I placed in /etc/udev/rules.d/45-rio.rules:

SUBSYSTEM=="usb", ENV{DEVTYPE}=="usb_device", \
    SYSFS{idVendor}=="045a", SYSFS{idProduct}=="5006", GROUP="plugdev"

This rule may well work on Gutsy too.

Of course your user must be a member of the plugdev group, or you can specify a different group if you wish.
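
On Ubuntu, adding a user to the group looks something like this (substitute the real username; it takes effect at the next login):

sudo adduser myuser plugdev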

If you want to make other Rio flash portables work then just repeat the rule specifying all product numbers from 5001 (Rio600) to 500f (Rio Cali).
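
For instance, a rule for the Rio600 would presumably look like this (I've only tested the S50 rule above):

SUBSYSTEM=="usb", ENV{DEVTYPE}=="usb_device", \
    SYSFS{idVendor}=="045a", SYSFS{idProduct}=="5001", GROUP="plugdev"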

Friday, 18 April 2008

Cross-compiling boost 1.34.x and 1.35.0

There seem to be lots of people asking how to cross-compile boost and very few answers. One of the better answers works for v1.33.x but breaks with v1.34.0.

After digging around for a while trying to make it work I was finally given the answer by the esteemed Peter Hartley who had managed to cross-compile boost 1.34 as part of his Just The Linux distribution. His solution seems to work for v1.35.0 too.

The trick that had eluded me until that point was to tell both the user-config.jam file and bjam about the cross compiler.

Something like:

echo "using gcc : : nicearch-linux-g++ ;" > user-config.jam
make BJAM_CONFIG="-sGXX=nicearch-linux-g++" install

If, like me, you want to only generate static libraries and support multiple builds in the same tree then you might need a bit more cleverness:

build=/tmp/nicearch/build
staging=/tmp/nicearch/staging
CXX=nicearch-linux-g++
CC=nicearch-linux-gcc
mkdir -p $build $staging
echo "using gcc : : $CXX ;" > $build/user-config.jam
bjam --toolset=gcc -sGXX=$CXX -sGCC=$CC \
    --prefix=$staging --build-dir=$build \
    --user-config=$build/user-config.jam --without-python \
    variant=release link=static threading=multi

This should be relatively easy to turn into a buildroot package file but I'm no longer using buildroot to build boost so I didn't need to.

Wednesday, 16 January 2008

A 100% Linux household

It was at LinuxConf Europe 2007 back in September that I made the decision to really try and habitually run Linux day-to-day on my laptop. I've always had Linux installed on my laptop: initially Debian, but when I was forced to reinstall the machine I decided to give Ubuntu a try, and I was impressed enough that stuff just worked on my not particularly Linux-friendly laptop that I stuck with it.

Don't get me wrong: I've been a daily Linux user since 1994. I'd just not spent that much time running it as my desktop OS since leaving university. When I entered the world of work I found that I needed both Windows and Linux and got fed up with rebooting between them. I found that having one Linux machine running a VNC server and using a Windows box as a client was infinitely more usable than the reverse so I worked that way round. I used Linux via VNC for embedded software development and Windows for Windows software development. For much of the time my Windows box was effectively just used as a thin client. Often the Linux box was actually rather powerful and shared by many users.

So when I was in the position of having independent home server machines and desktop machines I ran Debian Linux on the server and Windows on the desktop. The Linux machine was the one that stayed on all the time. It was there I ran (and continue to run) mutt(1) to read my personal email and slrn(1) to read Usenet news. The Windows box was switched off or put into standby when I wasn't using it. When the desktop became a laptop the situation was the same except because the laptop was portable I installed Linux on it too so that I'd have access to Linux when I was away from home. I didn't really run Linux on it much but occasionally it proved useful.

But as I was sat at the conference I noticed that it seemed to mostly be the “suits” that dared to run Windows on their laptops at a Linux Conference. I wasn't a suit so I chose to always boot into Linux. I did the few things I needed to do easily and quickly enough. The conference left me feeling so positive about Linux in general that I decided that I needed to bite the bullet and abandon Windows at home. Windows was becoming very slow and annoying on the machine anyway so I had an added incentive to do so. Unfortunately Linux was rather slow too when I started using it in anger. I resorted to adding more memory and this helped greatly.

So, since the beginning of September I've only rebooted into Windows for two reasons. Once was to watch an episode of something that the Tivo missed using the BBC iPlayer (this was last year when Linux wasn't supported). The other was to satisfy my immediate desire to play with the Lego Mindstorms set I received for Christmas. I shouldn't need to do the first again and I've now tired of the visual programming language used by Lego Mindstorms and will investigate NXC.

I've managed to do everything else I needed to do under Linux. Some things are easier, some things are a little harder, most are faster but a few are slower. Thanks to user switching even my wife uses it for reading her email and web access. Some bugs continue to annoy me but nowhere near as much as the Windows task bar locking up for several minutes every so often just because it feels like it.

So, I've taken the plunge and I don't see myself going back. The next step is to work out how I can lose the Windows box at work too!

Thursday, 10 January 2008

Dealing with SIGINT in spawned processes

I'm writing a Linux command line application that has the ability to spawn processes of the user's choosing when they want it to. My application waits for the process to finish and then continues. But this raised a problem: if the launched process takes a while to run and the user presses Ctrl-C then not only does the spawned process get killed, so does my process! In this regard I'd prefer to work much more like a shell and regain control after the spawned process has terminated.

In order to solve these problems I was forced to revisit stuff that I'd read about long ago but not fully understood the implications of at the time. Thanks are due to Zefram for pointing me in the right direction.

Both processes die because they are in the same process group. When the user hits Ctrl-C a SIGINT signal is sent to all processes in the active process group. The signal is not sent to the shell that started my application because the shell arranged for me to be in a new process group (by a means not dissimilar to that below).

Process groups have a group leader - in fact it is the process ID of the group leader that is used as the process group ID.

So, step one is to make sure that the spawned process runs in its own process group (which will also contain any processes it starts unless it takes specific action to the contrary). This is done by calling setpgid(2).

But unfortunately that is insufficient. When pressing Ctrl-C the SIGINT is still sent to the process group that contains my application; therefore I exit, leaving the spawned process still running.

In order to explain this properly I need to briefly mention sessions. For the purposes of this explanation you can think of a session as representing a terminal. Each session can have a number of process groups. One of these process groups will be the foreground process group and there may be background process groups. The above behaviour resulted because although I'd placed the spawned process in a different process group that process group was in the background (rather like running it from a shell in the background with &.)

I needed to resolve this problem by moving the spawned process group to the foreground. This can be done with tcsetpgrp(3) but it's not quite as simple as that. By default, background processes that try to write to the terminal will be sent the SIGTTOU signal. The default action for this signal is to stop the process (just as it is when you hit Ctrl-Z to suspend a process). tcsetpgrp counts as terminal output so my newly created child process just stopped as soon as it called it. In order to stop this happening I needed to arrange to ignore that signal for the duration of the call.

After the spawned process is complete I needed to put my process group back into the foreground again. Again I had to protect myself against being stopped by SIGTTOU.

The following program shows all this at work. The error handling is not too hot.

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <signal.h>
#include <sys/wait.h>
#include <sys/types.h>

int execfgvp(const char *file, char const * const argv[])
{
    pid_t child_pid = fork();
    if (child_pid == 0) // We're the child
    {
        // Create a process group for us
        if (setpgid(0, 0) < 0)
            exit(126); // Failed to setpgrp

        // Become the active process group
        signal(SIGTTOU, SIG_IGN);
        tcsetpgrp(0, getpid());
        signal(SIGTTOU, SIG_DFL);

        execvp(file, (char * const *)argv);

        // Failed to spawn process
        exit(127);
    }
    else if (child_pid > 0) // We're the parent
    {
        int status;
        if (waitpid(child_pid, &status, 0) < 0)
            return -1; // Failed to wait. Pass errno on.

        // Make us the foreground process group again.
        signal(SIGTTOU, SIG_IGN);
        tcsetpgrp(0, getpid());
        signal(SIGTTOU, SIG_DFL);

        if (WIFEXITED(status))
            return WEXITSTATUS(status);
        return -1;
    }
    else
        return -1; // Fork failed. Pass errno on.
}

int main()
{
    const char *argv[] = { "ping", "localhost", NULL };
    if (execfgvp(argv[0], argv) < 0)
    {
        fprintf(stderr, "Failed to start process: %m\n");
        return 1;
    }

    printf("Process finished. Returned to foreground.\n");
    printf("Press a key to exit.\n");
    getchar();

    return 0;
}
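
To try it out, save it and build it with something like the following (the file name is my choice), then press Ctrl-C while ping is running: only ping should die, and the final messages should still appear.

cc -o execfg execfg.c
./execfg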

Edit: 2008/01/11 Fixed angle brackets in source code.

Friday, 21 December 2007

DBUS: "Could not get password database information for UID of current process"

Recently I've been cross compiling DBUS and HAL to make them work on an embedded device with buildroot. It's been a bit of a journey with only sparse documentation available. There are lots of dependencies on things that often aren't present on embedded devices and it's not always easy to work out what they are. So, I'm going to try and document a few of the things I've come across here in the hope that someone finds it useful.

There are various reports of people getting this message but never any responses:

Starting system message bus: Could not get password database information for UID of current process: User "???" unknown or no memory to allocate password entry

This message appears despite /etc/passwd and /etc/group existing and looking correct.
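
If getent is available on the device it provides a quick way to confirm that glibc itself cannot resolve users, independent of DBUS (here looking up root by UID):

~# getent passwd 0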

The clue is in the message. It can't read this information because it is using glibc to do it. Glibc provides this information from various sources of which the passwd and group files are only one. It is necessary to enable the glibc back-end that reads the files (libnss_files) along with the library containing the generic functions (libnsl). This can be done in buildroot by enabling the BR2_GLIBC_NSL and BR2_GLIBC_NSS_FILES configuration options. Even better they can automatically be enabled when DBUS support is selected in buildroot by modifying the start of packages/dbus/Config.in to look something like:
config BR2_PACKAGE_DBUS
    bool "dbus"
    default n
    select BR2_PACKAGE_EXPAT
    select BR2_GLIBC_NSL
    select BR2_GLIBC_NSS_FILES

I'm not sure why the lack of libnsl didn't manifest itself as an error from ld.so - if it had then it would have been far more obvious what the cause was. Presumably this is because glibc loads the NSS back-ends (and their supporting libraries) with dlopen at lookup time rather than linking against them directly, so a missing module surfaces as a failed lookup rather than a load-time error.