A Hash Table Manager

This is the same program as the 25 Knights solver, a

Programming Example inspired by a Usenet problem

... but this version has been cleaned up slightly so that the hash-table handling is self-contained and can be cut-and-pasted into another application. Refer to the above link for more information on the example application incorporated in this program.

As the ``teacher'' of this on-line Programming Class, I don't want you to borrow my hash-table handler. I want you to write your own; then perhaps compare with mine.

You will want to write a test program, and also embed statistic gatherers and validity checkers into the handler to help you understand and debug it. (When satisfied, you may remove the debug logic, or leave it in but with "ifdef DEBUG" lines to disable it.)


Hash Tables

(Hash tables aren't defined or explained here. They are discussed a little in Appendix M, though I'm afraid they're not defined or explained there either.)

Hash table methods may include up to six parts:

The example code here provides all six. (The application doesn't do deletions, but I've supported them in the source code anyway.) Multi-Compartmentalization is an important way to save space in some applications. I've never seen it discussed in textbooks; maybe I should patent it.   :-)

When two elements in the table have the same hash function value H, there is a ``collision.'' There are several ways to deal with collisions in a hash table:

The simplest linear reprobe is often ``good enough'' if you're hacking out a quick-and-dirty program from scratch. Hashed linear, and quadratic methods give better performance but more-or-less require hash-table size to be prime.

In ``Hashed reprobe,'' J may be a second hash function (as in Chuck Falconer's library) or the first hash function itself (as in GNU's hsearch()). In either case, J = 0 must be avoided. Quadratic reprobe will probe ((P+1)/2) elements when table size P is prime, and then must fallback to simple linear reprobe.

My code uses quadratic reprobe. (Hashed reprobe with a second hash function needs slightly fewer probes, but will be slower overall because of the need to compute a second hash function.)

Hash functions themselves are used to ``mash'' data into a pseudo-random form. They are beyond our scope here. We just use a simple masher     ((s)+((s)>>1)-((s)>>3))     in the sample application, and such simple mashers are often good enough.


Software Reusability

Maybe nobody will ever use this hash-table handler except me, but I've tried to write it so it will be easy to reuse.

Sometimes the flexibility needed for reusability conflicts with efficiency (high speed). GNU's hsearch() uses table entries which are always exactly 2 pointers (usually 8 bytes) in size. What if you prefer a bigger entry? Or a smaller one? Hsearch() always invokes strcmp() to test for key equality, but what if you need a different routine? Or a faster one? Or in-line comparison for maximum speed?

One might pass the entry-size and a pointer to a compare function as parameters (as bsearch() and qsort() do), but that makes the interface clumsier and slightly slower.

Hsearch() may be good enough for most purposes, but in seeking generality, flexibility, and efficiency, it achieves none of them fully. The C++ programming language attempts to address this problem, but this is a class in C programming. Some of us feel the cure is worse than the disease with C++; if people know how to use C properly they can have ``the best of both worlds.''

The goal of being both flexible and efficient is certainly admirable. I have a different approach than that taken by either C++ or hsearch(). The key idea involves a saying from a major movie:

Use the Source, Luke!

Often library routines, whether source is provided or not, are treated as stable compiled object modules, but that's not how the following Hash-Table Handler is intended to be used. Use the source and modify as necessary. (You can probably get away with just changing the part called ``Hash particulars for this application.'')


Zip archive

I've included the source for 25 Knights below, with some discussion and the hashing material all concatenated into one source file. You may prefer to download a ZIP'ped archive of three source files: the hash handling module, the sample application, and a header file which relates the two together. After downloading, use the following commands to unpack, compile and run the program:


        unzip knihsh.zip
        cc -O -o kniprog kniprog.c hashtab.c
        kniprog -full

... Or just peruse the source code here:




/*
 * NB:  Although this application does not do Deletes in its
 *       hash tables, we have supported that in the code to make
 *       it of more general use.  But we ifdef it out when not
 *       using it since the various checks would slow down this
 *       code by almost 10%.
 *      When deletions are enabled, one way to speed up slightly 
 *       would be to prepare a streamlined euqivalent of hashfind()
 *       for use when caller does not intend to insert.
 *
 * The typedefs, macros, etc. are organized into groups.
 * There are two fundamental data types to consider:
 *      (1) position -- this consists of 30 (NBITS_P) bits, but code above
 *              "Problem-specific stuff" marker doesn't really care about
 *              the meaning of these bits.  If desperate, there is
 *              straightforward way to reduce it to 29 bits.
 *      (2) table-entry -- this consists of 16 bits (bits per short int)
 *              broken into two parts: info and search-key.  Info is
 *              2 bits, so search-key is 14 bits.  If desperate there is
 *              a way to get by with 1 bit of info.  Also, one could
 *              increase table-entry to 24 bits or 32 or whatever.
 * In principle, the search-key in the table-entry *is* a position.
 * Since this leaves 16
 *              16 == (NBITS_P - bits_per_short + #info_bits)
 * position bits unavailable in search-key, we make them
 * compartment-select bits.
 *
 * This idea -- using multiple compartments with part of the key for
 * compartment selection -- seems to be an important straightforward
 * technique but I don't recall seeing it in any textbook or discussion.
 *
 * Although everything here is lumped into a single C file, it is
 * modularized as shown by BEGIN and END.
 *
 * I prepared a simple test driver which I do not show, but this
 *  module was compiled with -DTESTING to make it usable for that test.
 */

/* Macros for positions are below.  Here's size and even it isn't referenced */
#define NBITS_P         30

/* BEGIN -- Hash particulars for this application.
 *
 * Stuff for table entry.
 * It will seem perverse that we have a structure of one
 *      field, both having typedefs.  The reason is to make
 *      it easy to reuse this code with minimal modification
 *      on another project which might have more than one
 *      field in Tab_entry.
 * Two attribute bits are defined; the rest are search-key.  The two
 *      attribute bits will have one of three values:
 *              EPEND   -- pending position of even generation
 *              OPEND   -- pending position of odd generation
 *              0       -- position process already.
 * The generation sense can be determined from MT_CELL.  As an
 *      exercise, rewrite the code to use this fact and make
 *      do with a single attribute bit.
 *
 * In specific problem here, zero happens not to be a possible
 *      search key, so we use it for Empty and (with a flag set) Deleted.
 *      If zero were valid, we might have set up the attribute bits
 *      so as to use all-zeros for empty/deleted.
 * The three predicates DELETED(s), MTSLOT(s), VALID(s) must be
 *      defined so that for any s, *exactly one* will succeed.
 */

typedef unsigned short Te_ktype;

typedef struct {
        Te_ktype        te_key;
} Tab_entry;

#define EPEND                   0x4000
#define OPEND                   0x8000
#define KEY(tekey)              ((tekey) & ~(EPEND | OPEND))
#define KEYEQ(a, b)             (!(KEY(a ^ b)))
#define HASHER(s)               ((s) + ((s) >> 1) - ((s) >> 3))
#define MTCODE                  (0)
#define MTSLOT(s)               ((s) == MTCODE)
#define	MTSET(s)		((s) = MTCODE)

#define VALID(tekey)            (tekey)
#define DELETED(s)              (0)
#define DELSET(s)

/* END -- Hash particulars for this application. */

/* From here on is more-or-less independent of application specifics */
struct hashtab {
        Tab_entry       *httab;
        int     htsziz;
        int     htsize;
        int     htnumi; /* includes deleted items also */
        int     htmaxalloc;
        int     htpmoved;
};

Tab_entry *hashfind();

/*
 * Two parameters determine how full tables get.
 * Here are two example settings.
 */
#if     0
        /* Memory is scarce: Following parms reorg from 88% to 69% */
#define TPARM1  3
#define TPARM3  2
#else
        /* Moderate: Following parms reorg from 75% to 37% */
#define TPARM1  2
#define TPARM3  6
#endif

/*
 * BEGIN -- Hash routines.
 *  Following code should work as is for a completely different hash project.
 *  Although the source needn't change, it can't be made a ".o" binary
 *      as it is dependent on the application-specific particulars.
 */

/* Caller must allocate his own struct hashtab's and pass them in.
 * It is sufficent to provide an all-zero such structure
 * and start doing lookups and insertions.  The tables
 * will spring into existence as needed.
 */

/*
 * We support several sizes,
 *      ranging from tiny (or zero) up to almost 60 million
 * Each entry in psize[] (except first few)
 *      is about 1.125 times preceding entry.
 * NUMSIZE must be one more than power-of-two; first size should be 0.
 * Each entry should be prime, though code will work (in a degraded
 *  fashion) as long as table size is odd.
 */

#define NUMSIZE 129
static long     psize[NUMSIZE] = {
 0,
 3, 7, 11, 17, 23, 29, 37, 47,
 53, 59, 67, 73, 79, 89, 101, 113,
 127, 139, 157, 179, 199, 223, 251, 283,
 317, 359, 401, 449, 503, 569, 641, 719,
 809, 911, 1021, 1151, 1297, 1459, 1637, 1847,
 2081, 2341, 2633, 2963, 3331, 3739, 4211, 4733,
 5323, 5987, 6733, 7573, 8521, 9587, 10781, 12119,
 13633, 15331, 17239, 19391, 21817, 24547, 27617, 31069,
 34949, 39317, 44221, 49747, 55967, 62969, 70841, 79697,
 89659, 100853, 113467, 127649, 143609, 161561, 181757, 204481,
 230047, 258803, 291143, 327529, 368471, 414539, 466357, 524669,
 590251, 664043, 747049, 840439, 945481, 1063661, 1196609, 1346183,
 1514459, 1703773, 1916741, 2156339, 2425889, 2729119, 3070273, 3454081,
 3885841, 4371569, 4918019, 5532763, 6224359, 7002409, 7877711, 8862421,
 9970223, 11216501, 12618569, 14195887, 15970369, 17966659, 20212483, 22739033,
 25581407, 28779073, 32376469, 36423533, 40976471, 46098527, 51860839, 58343459,
};

static setsiz(siz)
        long    siz;
{
        int     cumi, inci;

        cumi = 0;
        for (inci = NUMSIZE / 2; inci; inci >>= 1)
                if (psize[cumi + inci] < siz)
                        cumi += inci;
        return cumi + 1;
}

/* Caller must inspect return code (rc) to determine result:
 *      rc  == NULL          -->  not found, table unbuilt
 *      MTSLOT(rc->te_key)   -->  not found, can insert at rc
 *      DELETED(rc->te_key)  -->  not found, can insert at rc
 *      VALID(rc->te_key)    -->  found, at rc
 */
Tab_entry *hashfind(hashval, key, tabp)
        unsigned int    hashval;
        Te_ktype        key;
        struct hashtab *tabp;
{
        int     hk, siz, incr;
        Tab_entry *base, *hp, *delp;
        Te_ktype kp;

        siz = tabp->htsize;
        base = tabp->httab;
        if (base == 0)
                return 0;
        hk = hashval % siz;
        hp = base + hk;
        kp = hp->te_key;
        if (MTSLOT(kp) || KEYEQ(kp, key))
                return hp;
        incr = 1;
        while (1) {
                hk += incr;
                if (incr != 2)
                        incr += 2;
                if (hk >= siz) {
                        hk -= siz;
                        if (incr >= siz)
                                incr = 2;
                }
                hp = base + hk;
                kp = hp->te_key;
                if (MTSLOT(kp) || KEYEQ(kp, key))
                        return hp;
        }
}

hashreorg(tabp, siz)
        struct hashtab *tabp;
{
        int     i, oldsiz;
        Tab_entry       *oohp, *ohp, *nhp;
        Te_ktype        key;
        void    *malloc();

        oldsiz = tabp->htsize;
        ohp = tabp->httab;
        tabp->htsize = siz;
        tabp->htmaxalloc = siz - (siz >> TPARM1) - 1;
        tabp->htpmoved = 1;
        nhp = tabp->httab = malloc(siz * sizeof(Tab_entry));
        if (!nhp) {
                printf("Malloc(%d) failed\n", siz * sizeof(Tab_entry));
                /* Yes, i know msg "should" go to stderr */
                exit(1);
        }
        for (i = 0; i < siz; i++, nhp++) {
                MTSET(nhp->te_key);
        }
        if (oohp = ohp) {
                for (i = 0; i < oldsiz; i++, ohp++) {
                        key = ohp->te_key;
                        if (VALID(key))
                                *(hashfind(HASHER(KEY(key)), key, tabp)) = *ohp;
                }
                free(oohp);
        }
}

hashmreorg(tabp)
        struct hashtab *tabp;
{
        int     siz;

        if ((siz = tabp->htsziz) >= NUMSIZE - 1) {
                printf(" (maxsize) ... abort\n");
                exit(99);
        }
        siz += TPARM3;
        if (siz >= NUMSIZE)
                siz = NUMSIZE - 1;
        tabp->htsziz = siz;
        siz = psize[siz];
        hashreorg(tabp, siz);
}

/*
 * Sets a pointer pointed to by argument `tepp' to a valid Table entry
 *  for argument `key', creating it if necessary.
 * Return 1 if the entry is new, 0 otherwise.
 */
hashinsert(key, tabp, tepp)
        Te_ktype key;
        struct hashtab *tabp;
        Tab_entry **tepp;
{
        register Tab_entry      *hp;

        *tepp = hp = hashfind(HASHER(key), key, tabp);
        if (hp && VALID(hp->te_key))
                return 0;
        if (tabp->htnumi >= tabp->htmaxalloc) {
                hashmreorg(tabp);
                *tepp = hp = hashfind(HASHER(key), key, tabp);
        }
        tabp->htnumi += 1;
        hp->te_key = key;
        return 1;
}

/* END -- Hash routines. */
#ifndef TESTING

/*
 * BEGIN -- Compartment assignment for specific application
 * Next number must be at least (NBITS_P - 14)
 *  where NBITS_P is the problem-specific key size,
 *  and 14 (== bits_in_short minus 2_info_bits)
 *  is key bits in Te_ktype.
 */
#define LOG_NC          16
#define NUMCOMPART      (1 << LOG_NC)
#define P2COMP(pos)             ((pos) & (1 << LOG_NC) - 1)
#define P2RESID(pos)            ((pos) >> LOG_NC)
#define CR2POS(comp, resid)     ((resid) << LOG_NC | (comp))

/* END -- Compartment assignment for specific application  */

/* BEGIN -- Multi-compartment hashing */

struct hashtab Htab[2][NUMCOMPART];

/* Return pointer if found, else NULL */
Tab_entry *te_find(tno, pos)
{
        Te_ktype        resid;
        Tab_entry       *hp;

        resid = P2RESID(pos);
        hp = hashfind(HASHER(resid), resid, Htab[tno] + P2COMP(pos));
        return hp && VALID(hp->te_key) ? hp : 0;
}

/* Leave existing entry unchanged,
 *  but create, set flags and return 1 for new entry.
 */
te_insert(tno, pos, flags)
{
        Tab_entry *hp;

        if (hashinsert(P2RESID(pos), Htab[tno] + P2COMP(pos), &hp)) {
                hp->te_key |= flags;
                return 1;
        } else {
                return 0;
        }
}

/* END -- Multi-compartment hashing
 *
 * If you came here just to ``steal'' my hash-table handling code,
 * Then throw away everything after here.
 */

/*
 * BEGIN -- Main driver for the application.
 * Usage:
 * knight #a #b -- find the midpoint of the quickest path from a to b.
 *      (a position is denoted numerically by its internal representation.)
 * knight #a    -- same but b defaults to the color-reversal of a.
 * knight       -- same but a defaults to the position posted on Usenet.
 */
main(argc, argv)
        char    **argv;
{
        int     i, mov, npos, depth, opos, cno, kw, side;
        int     Startpos, Endpos;
        int     fullflag;
        struct hashtab *htp;
        Tab_entry       *hp, *tp;

        if (argc > 1 && !strcmp(argv[1], "-full")) {
                fullflag = 1;
                argc--;
                argv++;
        } else
                fullflag = 0;
        if (argc > 1)
                Startpos = atoi(argv[1]);
        else
                Startpos = 1072447500;  /* The position from Usenet */
        /*
         * When the goal is color-reversal of start position,
         *  there is a simple heuristic which saves much time.
         * User can signal this on command-line by omitting final arg,
         *  and we signal this to ourselves below by setting Endpos to zero.
         * However, *because we choose to recognize the case even
         *  when user doesn't*, it is slightly simpler to temporarily
         *  set Endpos to its actual value.
         */
        if (argc == 3)
                Endpos = atoi(argv[2]);
        else
                Endpos = reversec(Startpos);
        if (fullflag) {
                gen_solve(Startpos, Endpos, argv[-1]);
                showpos(Endpos);
                exit(0);
        }
        if (Endpos == reversec(Startpos))
                Endpos = 0;
        else
                te_insert(1, Endpos, OPEND);
        te_insert(0, Startpos, OPEND);
        mk_movemap();
        printf("Start pos = %d Target = %d\n", Startpos,
                        Endpos ? Endpos : reversec(Startpos));
        /* We have 9 levels of nesting; so let's not indent the
         * source code except on the levels where we actually use
         * squiggly braces.
         */
        /* Level 1 -- Iterate through generations */
        for (depth = 1; ; depth++)
        /* Level 2 -- Alternate work on Startpos and Endpos */
        for (side = 0; side < 1 + (!!Endpos); side++)
        /* Level 3 -- Iterate compartments looking for pending positions */
        for (cno = 0; cno < NUMCOMPART; cno++)
        /* Level 4 -- Iterate within compartment */
        for (htp = &Htab[side][cno], tp = htp->httab, htp->htpmoved = 0;
                        tp < htp->httab + htp->htsize;
                        tp++)
        /* Level 5 -- Process only positions of the proper generation */
        if ((kw = tp->te_key) & (depth & 1 ? OPEND : EPEND)) {
                tp->te_key = KEY(kw);   /* Clear Pending bits */
                opos = CR2POS(cno, tp->te_key);
                /* Level 6 -- Iterate through 8 knight's move directions */
                for (mov = 0; mov < 8; mov++)
                /* Level 7 -- Ignore off-board moves */
                if (npos = makemove(opos, mov))
                /* Level 8 -- Ignore positions already in database */
                if (te_insert(side, npos, depth & 1 ? EPEND : OPEND)) {
                        if (Endpos)
                                hp = te_find(! side, npos);
                        else
                                hp = te_find(side, reversec(npos));
                        /* Level 9 -- Detect success */
                        if (hp) {
                                printf("!!! Solved !!!\n");
                                printf("%d\n", npos);
                                showpos(npos);
                                exit(0);
                        }
                }
                if (htp->htpmoved) {
                        /* Fall back if `tp' might be obsolete */
                        --cno;
                        break;
                }
        }
}

/* END -- Main driver for the application */

/* BEGIN -- Problem-specific stuff
 *
 * Code above here knows little about problem details.
 * Let me rephrase that.  The Main Driver knows, for example,
 *      that there is a way to reverse-color a position;
 *      it just doesn't know *how* to do it.
 */

#define MT_CELL(pos)    (0x1f & (pos))  /* eMpTy cell */
#define KCOL(x)         (1 << 5 + (x))  /* Knight COLor */

reversec(pos)
{
        return ((pos) ^ 0x3fffffe0 ^ KCOL(MT_CELL(pos)));
}

int     momap[25][8];

makemove(pos, mov)
{
        int     hop = MT_CELL(pos);
        int     nhop = momap[hop][mov];
        return
                nhop == -1 ? 0
                : (pos & 0x3fffffe0 | nhop)
                ^ (pos & KCOL(nhop) ? KCOL(nhop) | KCOL(hop) : 0);
}

int     mpars[8][2] = {
        1, 2, 2, 1, 2, -1, 1, -2, -1, -2, -2, -1, -2, 1, -1, 2,
};

mk_movemap()
{
        int     x, y, m, nx, ny;

        for (x = 0; x < 5; x++)
        for (y = 0; y < 5; y++)
        for (m = 0; m < 8; m++) {
                nx = x + mpars[m][0];
                ny = y + mpars[m][1];
                if (nx < 0 || nx > 4 || ny < 0 || ny > 4)
                        momap[5*x + y][m] = -1;
                else
                        momap[5*x + y][m] = 5*nx + ny;
        }
}

showpos(npos)
{
        int     x, y;
        static  cnt = 0;

        printf("\n(%d) Position %d\n", cnt, npos);
        ++cnt;
        for (x = 0; x < 5; x++)
        for (y = 0; y < 6; y++)
                printf("%c", y == 5
                        ? '\n' : npos & KCOL(x*5 + y)
                        ?       'X' : x*5 + y == MT_CELL(npos)
                        ?       '.' : 'O');
}
/* END -- Problem-specific stuff */

/* BEGIN -- Driver to iterate program and print entire solution
 *
 * The next line includes <stdio.h> but Mozilla throws
 *  away part of the line.
 */
#include        

gen_solve(pos_start, pos_this, cmdname)
        char    *cmdname;
{
        int     pos_goal, subp[30];
        int     numi;
        char    command[1000], resp[100];
        FILE    *thepipe;
        int     cnt;

        showpos(pos_start);
        for (pos_goal = numi = 0; pos_this != pos_goal; numi++) {
                subp[numi] = pos_goal = pos_this;
                sprintf(command, "%s %d %d | head -3 | tail -1",
                                cmdname, pos_start, pos_this);
                thepipe = popen(command, "r");
                fread(resp, 1, 99, thepipe);
                pclose(thepipe);
                pos_this = atoi(resp);
        }
        while (--numi)
                gen_solve(subp[numi], subp[numi - 1], cmdname);
}

/* END -- Driver to iterate program */
#endif

 

Please send me some e-mail.
Go back to my home page.