Random bits of useless information: January 2008

Thursday, 24 January 2008

Why I Hate Perforce: 3. It's hard to find files that need adding

This is part of a series of articles explaining why I hate Perforce. Please see "Why I Hate Perforce: 1. The Background" first.

When adding new files to a working tree it is of paramount importance that these files get checked into the revision control system at the correct point. The difficult part is finding the files that need adding - once that's been done, adding them is easy.

Finding new files is made difficult because working trees usually contain a lot of other files that shouldn't be added to revision control: editor backup files, files generated during compilation, backup modified versions of files that are under revision control that you want to keep around as reminders, temporary files that haven't been cleaned up properly. It's difficult to separate the wheat from the chaff.

Other revision control systems solve this problem by allowing such files to be added to ignore lists. Usually there's a global ignore list for files that are almost always ignored such as object files and editor backup files. In addition there's a specific ignore list for each directory; this is useful for generated header files and patterns that would otherwise be too broad. CVS uses a file named .cvsignore and Git uses the similar .gitignore. Subversion uses a directory property named svn:ignore.

Perforce has no equivalent to this functionality. The Eclipse Perforce plugin seems to have invented the concept of a .p4ignore file out of necessity but I haven't tried it.

My current, suboptimal workaround for this is a script that runs find(1) and passes the results to p4 fstat to identify files that aren't under revision control, then weeds out common files that should be ignored. I've got parts of an improved Ruby version of this script working but haven't yet polished it enough for release.
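The weeding-out step can be sketched in C with fnmatch(3). This is only an illustration of the idea, not the script itself: the function name is_ignored and the pattern list are assumptions made for the example, and a real tool would read its patterns from an ignore file.

```c
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical ignore patterns, chosen for illustration only. */
static const char *const ignore_patterns[] = { "*.o", "*~", "*.bak", "#*#" };

/* Return 1 if name matches any ignore pattern, 0 otherwise. */
int is_ignored(const char *name)
{
    size_t count = sizeof ignore_patterns / sizeof ignore_patterns[0];
    for (size_t i = 0; i < count; i++) {
        /* fnmatch returns 0 when the name matches the pattern. */
        if (fnmatch(ignore_patterns[i], name, 0) == 0)
            return 1;
    }
    return 0;
}
```

Each candidate file name that is not under revision control would be passed through is_ignored, and only those for which it returns 0 would be reported as needing to be added.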

Now read Part Four.

Tuesday, 22 January 2008

Why I Hate Perforce: 2. Working copy state is stored on the server

This is part of a series of articles explaining why I hate Perforce. Please see "Why I Hate Perforce: 1. The Background" first.

A working copy (client in Perforce terminology or check-out in CVS terminology) contains absolutely nothing but the files that you instructed Perforce to place there from the depot (using your client specification) and files that you caused to be placed there yourself (e.g. object files, new source files, .p4config files etc.). Perforce itself keeps no state information in your working tree (although you may choose to keep some there yourself with .p4config files).

From some points of view this can seem like quite a good idea. Tools such as find(1) and grep(1) can't accidentally look at such data. There are no extra directories (hidden or otherwise) to confuse the uninitiated. But this information must be stored somewhere and Perforce chooses to keep it all on the server. This has a number of consequences.

The most obvious implication of keeping all the state information on the server is that if the server is down or inaccessible then you cannot perform any operations that need that state. Perforce normally marks all files as read-only until an explicit request is made to edit them. Doing this requires a connection to the server. If such a connection is unavailable then it is necessary to resort to chmod(1) or attrib to make the file writable and then remember to run p4 diff -se when the server is available again to find the files that were changed so that they can be correctly opened for edit. Editor plug-ins that provide version information automatically for version controlled files may block for a while until they discover that the server is unavailable.

Another annoyance with keeping the state information outside the working copy is that the working copy cannot easily be moved or copied elsewhere. This might be useful due to disk space constraints, wanting to shelve some work in progress or wanting to divide current work in two. I'll come back to this topic in a later article.

The alternative is to keep the state information locally. CVS keeps all working copy state in the working copy itself. Subversion keeps that along with pristine copies of source files which allows it to only send changes when submitting files and allows diff operations without contacting the server. This means that it is possible to copy and move around CVS and Subversion working copies and the state is copied or moved at the same time. SVK keeps information in a local per-user location but does allow moves as long as you keep it informed. Distributed version control systems keep so much information that the server is only required when new changes are to be pulled from it or pushed to it.

Now read Part Three.

Monday, 21 January 2008

Why I Hate Perforce: 1. The Background

I'm about to post a few articles explaining why Perforce and I just don't get on in many ways. But before I do I feel it is important that I make the background for these criticisms clear.

I've been using revision control systems for well over ten years. Initially I had brief outings with Microsoft Delta and then a longer and more painful experience with Microsoft SourceSafe on Windows even when sharing code among only three developers. Perforce is definitely a big improvement over these!

Once I learnt about CVS I started using that. Initially just for me on UNIX and Linux but later on shared projects that needed to compile on Windows too. I was forced to learn about tagging, vendor branches (and later why they suck in CVS) and merging. CVS wasn't perfect but it did work. I understood how it worked fundamentally, even to the point of fiddling around by hand in the repository when it became absolutely necessary.

I keenly watched the development of Subversion and periodically tried to import our CVS repository into it. I recommended to others starting new projects that they should choose Subversion rather than CVS.

I started a new job where everything was kept in a Perforce depot. I was used to the CVS workflow and initially felt a little like a fish out of water. It gradually dawned on me that many issues I had with Perforce were impeding or adding risk to my work. In the end I decided that some of these issues were fundamental in the Perforce design.

Of course Perforce has some very good features. It is certainly better than CVS in many ways. Perhaps I'll write articles about these too in the interest of fairness.

Because I come from the world of CVS it is quite likely that I'll accidentally use CVS terminology rather than Perforce terminology in these articles but I'll try not to!

Of course I may have missed features in Perforce or alternative techniques that invalidate some of my points. If I have then please let me know via the comments.

Some of the scripts I use to work around the shortcomings I see in Perforce are available via my web page.

I should probably also note that I've played with various other systems such as Bitkeeper, Clearcase, Arch, Bazaar, Darcs, Mercurial, SVK and Git. Of these the one I've tried to use most is Git and I would like to use it more given the chance.

Now read Part Two.

Wednesday, 16 January 2008

A 100% Linux household

It was at LinuxConf Europe 2007 back in September that I made the decision to really try and habitually run Linux day-to-day on my laptop. I've always had Linux installed on my laptop, initially Debian, but when I was forced to reinstall the machine I decided to give Ubuntu a try. I was impressed enough that stuff just worked on my not particularly Linux-friendly laptop that I stuck with it.

Don't get me wrong: I've been a daily Linux user since 1994. I'd just not spent that much time running it as my desktop OS since leaving university. When I entered the world of work I found that I needed both Windows and Linux and got fed up with rebooting between them. I found that having one Linux machine running a VNC server and using a Windows box as a client was infinitely more usable than the reverse so I worked that way round. I used Linux via VNC for embedded software development and Windows for Windows software development. For much of the time my Windows box was effectively just used as a thin client. Often the Linux box was actually rather powerful and shared by many users.

So when I was in the position of having independent home server machines and desktop machines I ran Debian Linux on the server and Windows on the desktop. The Linux machine was the one that stayed on all the time. It was there I ran (and continue to run) mutt(1) to read my personal email and slrn(1) to read Usenet news. The Windows box was switched off or put into standby when I wasn't using it. When the desktop became a laptop the situation was the same except because the laptop was portable I installed Linux on it too so that I'd have access to Linux when I was away from home. I didn't really run Linux on it much but occasionally it proved useful.

But as I was sat at the conference I noticed that it seemed to mostly be the “suits” that dared to run Windows on their laptops at a Linux Conference. I wasn't a suit so I chose to always boot into Linux. I did the few things I needed to do easily and quickly enough. The conference left me feeling so positive about Linux in general that I decided that I needed to bite the bullet and abandon Windows at home. Windows was becoming very slow and annoying on the machine anyway so I had an added incentive to do so. Unfortunately Linux was rather slow too when I started using it in anger. I resorted to adding more memory and this helped greatly.

So, since the beginning of September I've only rebooted into Windows for two reasons. One was to watch an episode of something that the Tivo missed using the BBC iPlayer (this was last year when Linux wasn't supported). The other was to satisfy my immediate desire to play with the Lego Mindstorms set I received for Christmas. I shouldn't need to do the first again and I've now tired of the visual programming language used by Lego Mindstorms and will investigate NXC.

I've managed to do everything else I needed to do under Linux. Some things are easier, some things are a little harder, most are faster but a few are slower. Thanks to user switching even my wife uses it for reading her email and web access. Some bugs continue to annoy me but nowhere near as much as the Windows task bar locking up for several minutes every so often just because it feels like it.

So, I've taken the plunge and I don't see myself going back. The next step is to work out how I can lose the Windows box at work too!

Thursday, 10 January 2008

Dealing with SIGINT in spawned processes

I'm writing a Linux command line application that has the ability to spawn processes of the user's choosing when they want it to. My application waits for the process to finish and then continues. But this raised a problem: if the launched process takes a while to run and the user presses Ctrl-C then not only does the spawned process get killed, so does my process! In this regard I'd prefer to work much more like a shell and regain control after the spawned process has terminated.

In order to solve these problems I was forced to revisit stuff that I'd read about long ago but not fully understood the implications of at the time. Thanks are due to Zefram for pointing me in the right direction.

Both processes die because they are in the same process group. When the user hits Ctrl-C a SIGINT signal is sent to all processes in the active process group. The signal is not sent to the shell that started my application because the shell arranged for me to be in a new process group (by a means not dissimilar to that below).

Process groups have a group leader - in fact it is the process ID of the group leader that is used as the process group ID.

So, step one is to make sure that the spawned process runs in its own process group (which will also contain any processes it starts unless it takes specific action to the contrary). This is done by calling setpgid(2).
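As a small illustration of the last two paragraphs, the sketch below forks a child, has it call setpgid(0, 0), and reports whether the child's process group ID then equals its own process ID. The function name child_becomes_group_leader is made up for this example and the exit codes used to pass the result back are arbitrary.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that moves itself into a new process group.
   Returns 1 if the child's process group ID equalled its own PID
   (it became the group leader), 0 if not, and -1 on error. */
int child_becomes_group_leader(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: create a new process group led by this process. */
        if (setpgid(0, 0) < 0)
            _exit(2);
        _exit(getpgrp() == getpid() ? 0 : 1);
    }

    /* Parent: collect the child's verdict from its exit status. */
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) == 0)
        return 1;
    if (WEXITSTATUS(status) == 1)
        return 0;
    return -1;
}
```

Calling child_becomes_group_leader() should return 1, because after setpgid(0, 0) the child is the leader of a group whose ID is the child's own process ID.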

But unfortunately that is insufficient. When pressing Ctrl-C the SIGINT is still sent to the process group that contains my application; therefore I exit leaving the spawned process still running.

In order to explain this properly I need to briefly mention sessions. For the purposes of this explanation you can think of a session as representing a terminal. Each session can have a number of process groups. One of these process groups will be the foreground process group and there may be background process groups. The above behaviour resulted because although I'd placed the spawned process in a different process group that process group was in the background (rather like running it from a shell in the background with &).
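Whether a process is currently in the foreground can be checked by comparing its own process group with the terminal's foreground process group, which tcgetpgrp(3) reports. The following is a minimal sketch; the helper name in_foreground is an assumption made for this example.

```c
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if the calling process is in the foreground process group of
   the terminal open on fd, 0 if it is in a background process group, and
   -1 if the foreground group cannot be determined (for example because
   fd does not refer to a terminal). */
int in_foreground(int fd)
{
    pid_t foreground = tcgetpgrp(fd);
    if (foreground < 0)
        return -1;
    return foreground == getpgrp();
}
```

A child that has just been moved into its own process group with setpgid would be expected to see in_foreground(0) return 0 when standard input is the terminal, which is exactly the situation described above.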

I needed to resolve this problem by moving the spawned process group to the foreground. This can be done with tcsetpgrp(3) but it's not quite as simple as that. By default background processes that try to write to the terminal will be sent the SIGTTOU signal. The default action for this signal is to stop the process (just as it is when you hit Ctrl-Z to suspend a process). tcsetpgrp counts as terminal output so my newly created child process just stopped as soon as I called it. In order to stop this happening I needed to arrange to ignore that signal for the duration of the call.

After the spawned process is complete I needed to put my process group back into the foreground again. Again I had to protect myself against being stopped by SIGTTOU.

The following program shows all this at work. The error handling is not too hot.


#include <stdio.h>
#include <unistd.h> 
#include <stdlib.h> 
#include <signal.h> 
#include <sys/wait.h>
#include <sys/types.h>
 
int execfgvp(const char *file, char const * const argv[]) 
{ 
    pid_t child_pid = fork(); 
    if (child_pid == 0) // We're the child
    { 
        // Create a process group for us
        if (setpgid(0, 0) < 0)
            _exit(126); // Failed to setpgid
         
        // Become the active process group 
        signal(SIGTTOU, SIG_IGN);  
        tcsetpgrp(0, getpid());
        signal(SIGTTOU, SIG_DFL); 
 
        execvp(file, (char * const *)argv); 
 
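        // execvp only returns on failure. Use _exit so that stdio buffers
        // inherited from the parent aren't flushed a second time.
        _exit(127);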
    } 
    else if (child_pid > 0) // We're the parent
    { 
        int status; 
        if (waitpid(child_pid, &status, 0) < 0) 
            return -1; // Failed to wait. Pass errno on.
 
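        // Make our process group the foreground process group again.
        // getpgrp() is used rather than getpid() because we may not be
        // the leader of our group (e.g. when run as part of a pipeline).
        signal(SIGTTOU, SIG_IGN);
        tcsetpgrp(0, getpgrp());
        signal(SIGTTOU, SIG_DFL);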
     
        if (WIFEXITED(status)) 
            return WEXITSTATUS(status); 
        return -1; 
    } 
    else 
        return -1; // Fork failed. Pass errno on.
} 
 
int main() 
{ 
    const char *argv[] = { "ping", "localhost", NULL }; 
    if (execfgvp(argv[0], argv) < 0) 
    { 
        fprintf(stderr, "Failed to start process: %m\n"); 
        return 1; 
    } 
 
    printf("Process finished. Returned to foreground.\n"); 
    printf("Press a key to exit.\n"); 
    getchar(); 
 
    return 0; 
}
