This code is a highly abbreviated version of code I developed in connection with my genealogical database. It is an easy, almost trivial, programming problem, but I include it because of two instructive points.

If you've gotten this far, you're a confident or aspiring programmer,
and all good programmers should have a basic knowledge of graphs
and their nomenclature.
If not, I'll summarize briefly: Given a set of tokens (or nodes)
{X, Y, ...}, a
*digraph* is a set of *ordered pairs* (X,Y),....
If (X,Y) is in the digraph, we'll say ``X is a parent of Y'' or
equivalently ``Y is a child of X.''
If, for example, (X,Y1), (Y1,Y2), (Y2,Y3), (Y3,Z) are all in the digraph,
they constitute a ``path of length 4'' from X to Z, and
Z is a ``descendant'' of X.
If no node is a descendant of itself,
then the *digraph* is *cycle-free*.

The *trees* of games in the Nim and Tic-tac-toe families
are cycle-free digraphs.
The trees of chess and checkers, however, have cycles.
The links in a genealogical database form a cycle-free digraph
since, at least until man invents a time machine, two people
cannot be each others' ancestors.

After I built my genealogical database I wanted to know the answers to questions like

- How many times does Charlemagne appear in my pedigree?
- What is the shortest path from Charlemagne to me?
- What is the longest path from Charlemagne to me?
- What path from Charlemagne to me uses the most females?

Since this is just a programming example, instead of tracing paths in a real genealogy, let's just look at the ways to get from X to Y when the only legal moves are to subtract 4, 5, or 12.

As I said, there are two instructive things to note about the software.
The first is that the essential flow is the **same**
(similarity makes things *simple*) for all
aggregation operations (counting, finding the minimum, the maximum)
even though those operations seem quite different.
(Replace **OP_COUNT** with **OP_MINLENG** or **OP_MAXLENG**
in the *#define* statement to do a different aggregating operation.)

If you wrote the three routines independently, without realizing
their close similarity, it wouldn't much matter: these short routines
can be written and modified quickly. But the *concept* of
recognizing and exploiting similarities in data structure or processing
method becomes increasingly important as problems become harder.

Some other aggregation-like operators may be handled with the same
template (as I have illustrated with a **OP_MOSTFIVE** option).
There are harder aggregate operators that can not quite be handled
this way, for example finding the *sum* of all path lengths, or
the *average* path length.

When you have mastered the original problem, modify it to find the sum and average of path lengths. The first ``suggested answer'' turns out to be a bad model here!

I will show two solutions to the exercise.
The first is unable to solve the extended problems (sum of and average
path-length) --
one would need to rip it apart, (performing code replications)
and extend the data structure slightly to support these slightly
harder ops.
For this and other reasons the first solution is a bad solution,
but I include it because it demonstrates the *simplicity* most clearly.

It may even be easier to code the ``bad but simple solution,'' debug, study and understand it, and then throw it away and write the improved version, than to write the improved version by itself. (In Lesson 1 we encouraged you to learn to type quickly and accurately; the convenience of practice throwaway code is one of the reasons.)

/* * Code to print a statistic about paths from X to Z * in a (cycle-free) digraph. The same code is used * for finding the length of the minimum path, length * of maximum path, and the total number of paths. * To make this a simple self-contained demo, we let the nodes * be non-negative integers, and just provide the arbitrary links * (x, x+4), (x, x+5), and (x, x+12). In other words we * generate a trivial (3-parent) pedigree, to make the code * and its demonstration self-contained. */ /* * The following `-2' is just an arbitrary `magic number' token * that a more structured code might reflect with an enum or * independent boolean variable. One purpose of this example * is to show that, although a tiny bit of magic is required, * (e.g. `-2' following) my simple approach is more concise, * and more easily written or studied than would usually arise * in a high-level language like C++. */ #define UNINITIALIZED -2 #define OP_COUNT /* replace this for Minleng or Maxleng */ #ifdef OP_COUNT #define PRIMRESULT 1 #define INITRESULT 0 #define STEP(res) (res) #define UPDATE(old,new) ((old) + (new)) #define RESNAME "number of distinct paths" #endif #ifdef OP_MINLENG #define PRIMRESULT 0 #define INITRESULT 99999999L #define STEP(res) ((res) + ((res) != INITRESULT)) #define UPDATE(old,new) ((new) < (old) ? (new) : (old)) #define RESNAME "length of the shortest path" #endif #ifdef OP_MAXLENG #define PRIMRESULT 0 #define INITRESULT (-1) #define STEP(res) ((res) + ((res) != INITRESULT)) #define UPDATE(old,new) ((new) > (old) ? (new) : (old)) #define RESNAME "length of the longest path" #endif #ifdef OP_MOSTFIVE #define PRIMRESULT 0 #define INITRESULT (-1) #define STEP(res) ((res) + ((res) != INITRESULT && chix == 1)) #define UPDATE(old,new) ((new) > (old) ? (new) : (old)) #define RESNAME "maximal number of 5-steps among all paths" #endif #define NUMNODES 250 typedef long long Bigint; struct node { Bigint memo; struct node *child[3]; } Node[NUMNODES]; Bigint aggreg_op(xp, zp) struct node *xp, *zp; { int chix; struct node *yp; Bigint subres, result = INITRESULT; if (xp == zp) result = PRIMRESULT; else for (chix = 0; chix < 3; chix++) { if (yp = xp->child[chix]) { if (yp->memo == UNINITIALIZED) yp->memo = aggreg_op(yp, zp); subres = STEP(yp->memo); result = UPDATE(result, subres); } } return result; } /* * Similarly, it is silly to have so much different functionality in * the `main()' routine as I show here. I tend to do it though, at * least in simple code examples, to keep things ... simple. */ main(argc, argv) char **argv; { int x, z; Bigint result; struct node *p; for (x = 0; x < NUMNODES; x++) { p = Node + x; p->memo = UNINITIALIZED; if (x >= 4) p->child[0] = p - 4; if (x >= 5) p->child[1] = p - 5; if (x >= 12) p->child[2] = p - 12; } if (argc != 3) { printf("Usage: a.out X Z\n\tnumbers in range 0 ... %d\n", NUMNODES - 1); exit(1); } x = atoi(argv[1]); z = atoi(argv[2]); if (0 <= x && x < NUMNODES && 0 <= z && z < NUMNODES) { result = aggreg_op(Node + x, Node + z); if (result == INITRESULT) printf("There are no paths from %d to %d\n", x, z); else printf("The %s from %d to %d is %Ld\n", RESNAME, x, z, result); exit(0); } }

For the record, Charlemagne appears in my fully expanded pedigree 523,116 times (that's just using the known links which appear in my database); in the shortest path he's my 31-great grandfather; in the longest path he's my 53-great grandfather.

Adam, the First Man, appears 968,292,531,344 times in my fully expanded pedigree; that's almost a trillion times! You can see why I used the special "Bigint" facility of the GNU compiler.

(The preceding was written when my database was only one-third the size it is now. The database now has over 5 million paths from Charlemagne to me, and the number of paths from Adam exceeds the capacity of Bigint.)

In the toy problem defined by the source code, the largest number that
will be printed (by *a.out 249 0*) is 8229672585400633201;
that's over 8 * quintillion*.
It doesn't take long, though, to count up to that preposterous number;
in fact the built-in timing facility prints "0.000 secs" --- less than
one millisecond to get all the way to 8 quintillion.

This may seem like no surprise to those familiar with the power of multiplication; I mentioned earlier that the 100 paths from Charlemagne to Lackland and 110 paths from Lackland to me generate 11,000 paths without further ado. But here comes the cute and/or odd surprise I promised earlier, or at least it seems cute to me:

**In the path-counting source code
there are no multiplications whatsoever.**
In fact the only operations involved in generating the total count
are (a) assigning 0 or 1 as the answer in the trivial cases,
(b) remembering an earlier answer, (c) adding two answers together.
Multiplication is fun, but those darned additions can add up pretty
fast too!

Perhaps I'm mentally deficient to think this is odd or interesting, but if you approached this problem inferring, from the Charlemagne --> Lackland --> James example, that multiplications were essential, you'd get lost quickly. On the other hand, if you just did the easy parts of the coding first, top-down and/or bottom-up, you'd soon gasp in surprise: ``Hey, I'm done! How'd that multiplication disappear?''

As promised, here's a better, more practical, version of these
functions.
**
**

/* * This is a slightly different (and more practical) version * of the aggregating-function code example, with * most of the same comments applying. */ #define UNINITIALIZED -2 typedef long long Bigint; Bigint Initres; /* * I try to keep my ``arrangement of white-space'' * consistent ... but *don't become dogmatic!* * * The following are closely related functions, all one-liners, * so use two lines per function instead of five lines, if you * think it will help the reader grasp the overall idea more quickly. */ Bigint af_identity(res, spare) Bigint res; { return res; } Bigint af_grow(res, spare) Bigint res; { return res + (res != Initres); } Bigint af_sum(old, new) Bigint old, new; { return old + new; } Bigint af_min(old, new) Bigint old, new; { return old < new ? old : new; } Bigint af_max(old, new) Bigint old, new; { return old < new ? new : old; } struct aggop { char *ag_resultname; long ag_primres, ag_initres; Bigint (*ag_step)(), (*ag_upda)(); } Aggop[] = { #define AGG_CNT 0 "number of distinct paths", 1, 0, af_identity, af_sum, #define AGG_MIN 1 "length of the shortest path", 0, 99999999L, af_grow, af_min, #define AGG_MAX 2 "length of the longest path", 0, -1, af_grow, af_max, #define AGG_TOT 3 "total of path lengths", 0, 0, af_sum, af_sum, #define AGG_NUMOPS 4 }; #define NUMNODES 250 struct node { Bigint memo[AGG_NUMOPS];; int numc; struct node *child[3]; } Node[NUMNODES]; /* NB: Must initialize for AGG_CNT before running AGG_TOT */ Bigint aggreg_op(xp, zp, agix) struct node *xp, *zp; { int chix; struct node *yp; Bigint subres, result; struct aggop *agp = Aggop + agix; if (xp == zp) { result = agp->ag_primres; } else { Initres = result = agp->ag_initres; for (chix = 0; chix < xp->numc; chix++) if (yp = xp->child[chix]) { subres = yp->memo[agix]; if (subres == UNINITIALIZED) subres = yp->memo[agix] = aggreg_op(yp, zp, agix); subres = (agp->ag_step)(subres, yp->memo[AGG_CNT]); result = (agp->ag_upda)(result, subres); } } return result; } /* * In this case, it would be easy to get rid of this function * if that seems SIMPLER, but it is usually desirable to have * ``cover'' entry-points like this. * In fact, eliminating ``covers'' like this for a *false simplicity* * is a vice not a virtue (though a vice I personally tend to suffer). */ Bigint do_aggop(x, xdest, agix) { return aggreg_op(Node + x, Node + xdest, agix); } main(argc, argv) char **argv; { int x, agix, xdest; Bigint result; struct node *p; struct aggop *agp; Bigint numpaths, totleng; for (x = 0; x < NUMNODES; x++) { p = Node + x; if (x >= 4) p->child[p->numc++] = p - 4; if (x >= 5) p->child[p->numc++] = p - 5; if (x >= 12) p->child[p->numc++] = p - 12; for (agix = 0; agix < AGG_NUMOPS; agix++) Node[x].memo[agix] = UNINITIALIZED; } if (argc != 3) { x = xdest = -1; } else { x = atoi(argv[1]); xdest = atoi(argv[2]); } if (0 > x || x >= NUMNODES || 0 > xdest || xdest >= NUMNODES) { printf("Usage: a.out X Z\n\tnumbers in range 0 ... %d\n", NUMNODES - 1); exit(1); } agp = Aggop + AGG_CNT; numpaths = do_aggop(x, xdest, AGG_CNT); if (numpaths == agp->ag_initres) { printf("There are no paths from %d to %d\n", x, xdest); return; } printf("The %s from %d to %d is %Ld\n", agp->ag_resultname, x, xdest, numpaths); agp = Aggop + AGG_MIN; printf("The %s from %d to %d is %Ld\n", agp->ag_resultname, x, xdest, do_aggop(x, xdest, AGG_MIN)); agp = Aggop + AGG_MAX; printf("The %s from %d to %d is %Ld\n", agp->ag_resultname, x, xdest, do_aggop(x, xdest, AGG_MAX)); agp = Aggop + AGG_TOT; totleng = do_aggop(x, xdest, AGG_TOT); printf("The %s from %d to %d is %Ld\n", agp->ag_resultname, x, xdest, totleng); printf("The average path-length from %d to %d is %f\n", x, xdest, totleng / (double)numpaths); exit(0); }

Go back to my home page.