Copyright © 2003 by James D. Allen

Introduction to Expert Programming (in C)

Lesson 9: Control Mechanisms

Programming loses much of its fun when one's programs all look the same. We can't cover all possible programming formulae, but just hope to kindle the reader's imagination with a few examples.

9.1Learn the basic paradigms.
Example: Backtracking
9.2Macros can be valuable ... and fun !
Example: Backtracking with longjmp()
9.3Don't become dogmatic !
Example: Backtracking with goto
9.4Be generous and accept the generosity of others.
Example: Prioritized monitors
 Make up your own mantras.
Omit final two examples on first reading.
9.8'my druther's I'd've gone into forensic programming.
Example: Breakpoint evasion
9.9Love means never having to say you're sorry.
Example: Kluges

L e a r n     t h e     b a s i c     p a r a d i g m s   .

Example 9.1: Backtracking

Backtracking is one of the most common techniques in programs that seek a solution or optimum. Anyone who has solved a child's maze, or gotten disoriented in an unfamiliar town or forest, will be familiar with the technique: you retrace your path (``backtrack'') when you make a wrong turn or reach a dead-end.

While it's true that most computer programs do not need any run-time search for a solution or optimum, those that do are generally the most interesting and complex programs.

There are a variety of algorithms to search for a solution or optimum -- it would be the subject of a book rather than a chapter, but in many cases depth-first tree search is the simplest. It should come as no surprise that the simplest approach is often the best as long as it's fast enough. Depth-first search has the further advantage that it's exhaustive and is guaranteed to find a solution (if one exists). The big disadvantage is that its exponential cost makes it useless for big problems.

When a problem requires a solution method more powerful than depth-first search, there will still often be depth-first search(es) used in the lower subroutines. Breadth-first tree search is a simple and important improvement over depth-first search, which has much similarity to depth-first search. (One way to implement breadth-first search is to perform many depth-first searches in parallel.) We mention a very simple form of breadth-first search in a discussion below.

Draw a tree now if you wish, in black ink, and add green lines to connect the forward pointers and (explicit or implicit) reverse pointers in the order they will be fetched during traversal. You will see why depth-first search is called ``backtracking.''

(A `tree' is simply a digraph in which each node has in-valence 1 except one (the `root') with in-valence zero. Study a book on digraphs if you can't quickly prove that trees are cycle-free.)

Actually building the complete tree structure is often inconvenient, but unnecessary for depth-first search. Instead a stack is often used, and one would need to look at a ``moving picture'' of the stack to ``see'' the full 2-D tree. The stack can be programmed as an array or list, but is often built implicitly with the `stack-frames' of a reentered procedure.

In the simplest cases, depth-first search of a depth-N tree degenerates simply into N levels of nesting in a nested loop, but the nested loop becomes inconvenient when N is large or variable.

One way to accomplish depth-first search is to use the unique programming language Prolog, which was designed specifically to express backtracking problems. A Prolog interpreter is sometimes described as a theorem-prover, but this is misleading because deterministic backtracker (which gives Prolog its beautiful simplicity) is quite different from a top-notch theorem-prover with evolving heuristics. (Such a smart prover can be written in Prolog.)

I do not attempt an exhaustive study of techniques for depth-first search or backtracking, but in the next two sections give two interesting alternatives which use C rather than Prolog.

M a c r o s     c a n     b e     v a l u a b l e         . . .     a n d     f u n   !

Example 9.2: Backtracking with longjmp()

Although the code example `Backtracking with longjmp()' may be of specific use to you, what I really want to show is the general idea of using macros to create expressive power.

Many computer languages have expressive power or macro facilities far exceeding C's. Have fun! You may have false starts before you build a macro worthy of posterity, but
  ``It's better to have loved and lost than never to have loved at all !''

I don't claim these macros are worthy of posterity. Doing it over, I'd probably drop most of the "{", "}", and ";" in the macro definitions, letting the macro-caller provide this (trivial) C syntax. The reader may be able to suggest further improvement.

The problems that Prolog solves nicely often do not allow simple solutions in C. (Recursion requires homogeneous stages to work well.) Sorry; I won't offer a general formula here, but here is a program I wrote long ago to solve Dell logic puzzles. It shouldn't be hard to adapt it to other search problems where C would otherwise be at a disadvantage to Prolog.

I'm sure there are other nice ways to solve logic puzzles, but frankly I stopped looking when I saw this approach with longjmp(). But please mail me your alternative solution, if it is elegant. I'll post the best offering here.

In a Usenet discussion, someone objected to this program because it needs to be recompiled for every problem. My response was: ``Hunh? The compilation takes less than one second, and if you don't have a C compiler, the Free Software Foundation will sell you an excellent one for ... Free!'' In a discussion about my reusable hash table manager, I offer the dictum:

Use the Source, Luke!

If you do work on this project, please note that my solution is coded so that the puzzle-specific code is all located contiguously and can be ``pasted'' in easily with an #include statement. This useful property led to otherwise peculiar choices in the coding details.

D o n ' t     b e c o m e     d o g m a t i c     ! !

Example 9.3: Backtracking with goto

Few backtracking problems will use the weird control-flow structure in my Double-dummy Solver for Bridge, but it seems worth a look anyway.

It recalls ``inside-out'' design, discussed in an earlier chapter, in that if the programmer starts by figuring what variable settings and structure he needs when about to play a card in the middle of the deal, and works ``outwards'' the code will soon fall into place.

This code makes me smile, but you're entitled to frown if you prefer. I first posted this software on the Internet about 1990, with the challenge to replicate the virtues of this code, without using goto. I've received several responses over the years; some admitted my goto was superior, but most claimed there was a simple alternative that they were too lazy to demonstrate. Reading the brief descriptions of the ``simple alternative,'' one found that they usually misunderstood the problem!

Finally one programmer, Tim Rentsch, accepted the challenge -- though even he submitted only pseudocode. Click here to compare his code with mine. I still prefer my code, but follow the link and then make your own judgement.

If you think the `goto' can be eliminated, but feel there's a better way than how Tim Rentsch did it, code away! It should be easy to do (if it can be done at all) since you may be able to rearrange and reuse much of my code. I'll be happy to post your submission if it's better than Tim's. I'm afraid I'll have to enforce a rule Tim didn't follow: Your submission must be written in Gnu-compatible C and, like mine, be pret-a-porter. (For Francophobes, that means ``ready to compile and run.'')

My message is not ``Go write a GOTO tomorrow for the Gipper'' but just to write code that gives you pride and pleasure. (Remember the subtitle of this book: ``Programming as Poetry''.)

B e     g e n e r o u s     a n d     a c c e p t     t h e     g e n e r o s i t y     o f     o t h e r s   .

Example 9.4: Prioritized Monitors

While the last example was almost whimsical, the method described in this section is used in some of the best real-time controller designs. (The motif cliché for 9.4 ``Be generous and accept others' generosity'' is, amazingly, a meaningful metaphor for the shared memory model implied in this example. But that's not the mantra's origin. It's a thank you to Mr. David Banks who described the mechanism to me.)

Many readers will be familiar with prioritized I/O (or hardware) interrupts, where a priority/25 I/O event can usurp control from a priority/24 event, but not vice versa. (In this section, we adopt the convention that higher priorities have higher priority numbers. The routines that handle different interrupts are referred to as parts of different monitors, but the monitor may also include some of the related routines (setup, scheduling, initiation) which manipulate the same data structures.) The main advantage of prioritization is that urgent work can be performed before less urgent work. A secondary advantage is that prioritization rules clarify or allow certain shared data update protocols.

Any multi-task operating system will need some policy for execution priorities, but when a strict ordering is observed among software monitors, as with interrupt handlers, certain efficacies and simplicities arise. I don't know if such systems are much discussed in textbooks, and have invented the term ``Prioritized Monitors.'' The technique is old, but many commercial ``real-time kernels'' don't support it properly, and frankly I think it may not be well understand. Perhaps readers will point me to a good on-line discussion.

The approach seems most relevant for a software system interacting with two or more external processes, as for example I/O control in operating systems or controllers, but the model might be useful in any on-line control system, or even in a database manager that allows concurrent queries. Even in a complicated batch (``offline'') system (like a simulater), adopting the philosophy may lead to simplifications in scheduling or mutex (exclusionary lock) handling.

No sample code; the key here is not the (straightforward) implementation of the control-flow, but rather the elegance of the shared data-handling rules that result. Here are some of the rules:

  1. A N-level monitor may raise the priority level to any M > N, temporarily. For example, a maximum priority level (called the ``All Interrupts Disabled'' state) will be available to facilitate otherwise difficult data manipulations.
  2. The code at level N may be interrupted by a level M monitor (M > N), independently, for example as result of hardware interrupt at some level F > M.
  3. The level-N monitor will not exit until its input-work FIFO becomes empty, whereupon it will exit through a standard scheduling apparatus (usually just `Return' or `Return from Interrupt'). (Of course a monitor may retain non-empty delayed-work queues.)
  4. The N-level monitor must not lower the priority level to any P < N, except by the exit just described. The N-level monitor cannot invoke an entry-point of the P-level monitor, which may be in some semi-critical but interrupted state. It can invoke a M-level entry, when M > N, since under the rules the M-level monitor will be in the exited (idle) state whenever the N-level monitor is running.
  5. Similarly the N-monitor can modify the M-level data structures, although it will first raise its priority to (at least) M. (Raising priority means setting a hardware register for hardware priorities, or atomically updating a mutex-like scheduling register for software priorities.)
  6. The N-monitor will not normally modify P-level data, when P < N, but since it ``knows'' P can't interrupt it, it may be able to do some manipulations. In particular, adding a new element to P's input FIFO should be straightforward with proper design, though a locking mechanism is needed -- setting priority to M (or even ``All Interrupts Disabled'') temporarily is good enough -- where the M-monitor (M > N) is the highest priority monitor which also updates P's input FIFO.
  7. When the N-monitor lowers the priority level (e.g. from K to N), it may be appropriate for it to check if higher-priority work is pending. (A priority-F interrupt may have scheduled work for the M-monitor where F > K > M > N.)
  8. There's probably one or two other interesting relevant rules, I'm overlooking as I write.

Do you get the idea? Each data structure is only updated when the processor is running at one particular priority level; by keeping strict prioritization many problems of exclusionary access are avoided.

Once the concept is in place, you may find a controller benefits from many priority levels. Logically, routines run in a disk controller have many different urgencies. Some of them, from highest to lowest urgency, are:

  1. Processor bus signalling
  2. Disk bus signalling
  3. Disk data handling
  4. On-track processing
  5. Seek initiation
  6. Processor data handling
  7. Seek scheduling
  8. Output status handling
  9. Input command handling
  10. Schedule optimizations

Prioritized Monitors is not a rare or exotic technique, but I'll bet it's been overlooked in many real-time controllers which would benefit from this technique.

Many texts on real-time programming emphasize the rule
  ``Response before deadline must be guaranteed.''
but this emphasis is often misplaced. For many applications, a rule that leads to higher performance and more robust reliability would be
  ``Missed deadlines should be rare and handled well.''
This should recall the discussion of ``unreliable'' routines (like UDP) in Lesson 3.

For example, if you don't ``slip a rev'' every now and then in your disk controller, your heuristics to improve performance with gratuitous reads may not be aggressive enough. A simple solution is to keep track of responsiveness and adjust parameters accordingly.

' m y     d r u t h e r ' s     I ' d ' v e     g o n e     i n t o     f o r e n s i c     p r o g r a m m i n g   .

Example 9.8: Breakpoint evasion

Here's a bizarre `Goto' I encountered many years ago. hide_out() does not return but it doesn't stay in an infinite loop either.

                short a = 1512, b = 12569;
                while (1) { /* Loop executes 6666 times */
                        b += 13 + b / a;
                        a += b + 6545;

I don't remember the 20-year old details (and it was disassembled object code I examined, not commented C source), but this C code depicts the concept.

Some copy-protection schemes seek to prevent users from running debug tools on their code. Among several I studied several years ago, Jazz for Macintosh circa 1985 had the most elaborate such mechanisms. (Working in a country not signatory to the Pan American Copyright Treaty, I still made it copiable within one day.) The cracker will try to discover the flow by using single-step or breakpoint facilities, but the copy-protected program counters by overwriting the interrupt vectors for single-stepping and breakpointing, or using them as data, e.g. for checksumming.

A program like Jazz or ProLock will often use abnormal control-flow transfers to make itself opaque to someone disassembling the code. An interesting ``duel'' may result between Jazz's programmer and the disassembling ``cracker.'' (Or rather a conceptual duel, as Jazz's programmer is sound asleep on another continent while the cracker is working.) The cracker counters the anti-debug measures by hand-crafting his own breakpoint mechanisms, and walking the code carefully through critical sections like checksumming. The Jazz programmer responds by combining anti-debug tricks with obfuscated control-flow, thus making it hard for the cracker to understand the code or identify useful breakpoints.

The cracker may end up in the role of a sleuth's watcher, tailing a suspect who is running in and out of doors trying to elude the watcher. The hide_out() routine above was used by Jazz in an attempt, like the watched suspect, to be alone for a moment so it could do something unseen (in this case, run its hidden valid-copy integrity algorithm).

The Jazz hide_out() needn't stop the cracker for long. (I think I borrowed another machine and wrote a C program to determine where the pseudorandom sequence would Fault.) Jazz could have made things much harder, as I did when I copied their idea in copy-protection schemes I sold to my client. (Do also pseudorandom update in (code) text during the faulting loop. That transformed text area should contain subroutine(s) called before its pseudorandom transformation, and subroutine(s) called afterword. For best results, areas used before and after should overlap!)

L o v e     m e a n s     n e v e r     h a v i n g     t o     s a y     y o u ' r e     s o r r y   .

Example 9.9: Kluges

Kluges are rightly frowned on but it would be hypocritical for me to denounce them altogether.

David Levy asked me for some remarks to include in his book on Game-Playing Software, about the software that first solved Connect-Four, but I realize now I overlooked the most interesting and perhaps most remarkable thing about that software.

When I wrote the Connect-Four solver, I didn't intend to solve the game, I just wanted a quick subroutine to evaluate endings. I wrote a straightforward depth-first search, with ``alpha-beta pruning,'' a transposition cache, and a killer move heuristic. As I exercised it, I estimated it would take roughly 20 years to solve the empty-board position.

I was not unfamiliar with game solving software and in fact had run problems like this before.
  TicTacToe? (estimated solution time = 0.2 seconds -- ``Hey, let's not waste the carpal activity on alpha-beta'')
  Go? (estimated solution time = 80 quadrillion quadrillion quadrillion centuries -- ``Hey, maybe we should try truncated search'')
... but 20 years? This was something new.

The empty-board could be solved if I borrowed a few dozen of the many idle computers in the building where I found myself, but I decided I would use only a single personal machine. Although I didn't have time to rewrite the Solver (this was just a part-time fancy as I worked full-time on production firmware for I/O systems) I became obsessed with the idea that Connect-Four was solvable. By tuning the caching and killer heuristics, the estimated completion time was whittled down to 15 months, but that was still too long.

Then a brainstorm hit me....

I knew that breadth-first searching was the key to improved game solution, though it might seem useless without accompanying heuristics. The coding is much more involved than simple depth-first, and it seemed that much of the existing code would have to be discarded.

Unlike ordinary depth-first, a breadth-first search must manage a memory with many pending positions, but I already had a transposition cache and it could play that role automatically. Replacement algorithms made the retention of prior positions ``unreliable'' (see Chapter 3) but aging algorithms meant redundant searches were rare, and the simple search routines already written should repeat searches for missing nodes automatically.

Briefly, the kluge I adopted in the Connect-Four Solver was to occasionally precede a depth-first search with a restricted-depth search, in which a fourth state (Unknown) was added to the three-state enum for Position Value (Win, Lose, & Draw). When a position's value was unknown to the Restricted Search, I went ahead and did Full Depth-first search (although restricted searches would be reattempted, occasionally, as the recursive search continued), but sometimes the Restricted Search would solve a position (e.g. if a player has a Mate-in-Three, though more often as a result of encountering a won ending previously solved and placed in the transposition cache).

The only advantage of this approach, arguably very klugey, and very poorly designed, was nevertheless the decisive one: I'd be able to use the existing code as is, adding only a few lines!! I'd have spent 20-40 hours to build a more appropriate breadth-first search if I had a paying client, but the Connect-Four Solver was just a detour on a detour for me.

In other words, I simply replaced

        game_val = dfsearch();
with something like

        if (! TryRestrict_Criterion)
                goto ordinary;
        game_val = dfsearch(maxdep = 7 + (randbits() & 7));
        if (game_val == Unsolved)
                game_val = dfsearch(maxdep = 99);

(I probably wouldn't actually use goto here -- there are several other ways to express the logic -- but goto does use the minimal syntax here, and it's a way to repeat the message: Don't become Dogmatic!. I didn't adopt the randbits() idea of stochastically varying search depth, but it might have been a good idea.)

This simple change implemented a (highly degenerate) form of breadth-first search, arguably crude to the point of crassity, but I knew that even a crude attempt at a useful feature could be much better than no attempt at all. With the change, the program zipped through its difficult analyses about 10 or 20 times faster, so I could leave it to rip for 3 or 4 weeks; alert the NSA a few weeks later that Red won Connect-Four, and got on with my life.

Laugh at me all you wish, reduce me to tears of shame if you must, but grant me one point. That little Sun-3 workstation, with the world's most trivial, kluged-up ``breadth-first'' search was the first workstation in the solar system to solve Connect-Four.

Copyright © 2003 by James D. Allen

Please send me some e-mail.

Please visit my family tree.