DSP Math Library
Version 1.0
Copyright 1994  All Rights Reserved  Alle Rechte Vorbehalten
James D. Yegerlehner

This archive contains a library for doing DSP-accelerated matrix
multiplications. Ultimately it may grow into a more complete library
of math routines. I picked matrix multiplication because it is compute
intensive, occurs very commonly in many kinds of software, and is done
very well by the DSP. If there is a function you would like to see
supported, let me know.

You are welcome to copy and distribute this archive so long as you
distribute the whole thing and you do not charge for it. You may use
this library in developing software so long as you do not sell the
software. If you have a commercial application, get ahold of me and I
will be happy to license it to you. It probably needs a little
customization to be commercial anyway. Use this library at your own
risk. It appears to work, but I've done very little testing.

You should find the following files in the archive:

dspmath.o
    dspmath.o is the Turbo C 2.03 object file containing the functions
    to link with your program. I set the standard object file format
    compiler option, so it's supposed to be a standard DRI-format
    object file. I'm not sure about compatibility with other linkers
    like Lattice, GNU, Sozobon, etc. In any case, it will only work
    with floating point libraries that use the IEEE single precision
    floating point format. If folks in the know about these things can
    let me know which compilers use that format, I'll pass it along.

dspmath.h
    Header file with function prototypes for dspmath.o. This describes
    exactly how to use the library functions.

dspmath.lod
    This is the DSP program used by the library; the initialization
    function in dspmath.o loads this automatically.

example.*
    This is an example program to illustrate how calls are made to the
    functions in dspmath.o. It was used to generate the benchmark
    numbers below.

I have a version that works with the floating point format used by the
floating point library that comes with Heat and Serve Sozobon, and I
could probably be cajoled into cleaning it up and getting it out, if
there is interest.

I calculated the following times using example.c. The table shows the
time in seconds required to complete 100 multiplies of an n x n matrix
by an n x n matrix.

                                 Technique
          --------------------------------------------------------
           68030        68030              dspmath library
 size n    math         math +        ----------------------------
           (no FPU)     6888X FPU     matmlflt()      matmlfxp()
 ------   ----------   -----------    ----------     ------------
    3        3 sec          ?           0.3 sec         0.1 sec
   40     1473 sec          ?            17 sec           2 sec

Note that the overhead of converting between fixed and floating point
grows with n*n (the number of matrix elements), whereas the number of
multiplies and adds needed to compute the result grows with n*n*n
(for n = 40 that is a few thousand element conversions against
40*40*40 = 64000 multiply-adds). As the matrix gets larger, the
advantage of using the DSP grows.

Caveat: the underlying arithmetic used by both matmlflt() and
matmlfxp() is 24-bit (with 56-bit intermediate results) fixed point
2's complement fractional arithmetic. Fixed point is not as flexible
as floating point, so some problems are not appropriately handled with
fixed point math. In rough and vague terms, the numbers within a
single matrix should be fairly close to the same order of magnitude.
If, for instance, one matrix element were 10^13 while all the others
were on the order of 10^3, the smaller numbers would all end up
looking like zeroes to the library, since they are so many orders of
magnitude smaller.
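To see concretely what that caveat means, here is a small stand-alone
C sketch. It is only an illustration, not code from dspmath.o: the
scale-by-largest-element scheme and the function through_24bit() are
made up for the example and may not match exactly how the library
converts values. It rounds scaled values to signed 24-bit fractions,
the precision of the DSP's fractional format, and shows that an
element ten orders of magnitude below the largest one comes back as
zero.

/* roundoff.c -- illustration only, not part of dspmath.o.
 * Mimics the effect of squeezing values into 24-bit 2's complement
 * fractions (range -1.0 to just under +1.0) after scaling the whole
 * matrix by its largest magnitude.
 */
#include <stdio.h>
#include <math.h>

/* Scale x by 'scale', round to a signed 24-bit fraction, convert back. */
static double through_24bit(double x, double scale)
{
    long frac = (long)floor(x / scale * 8388608.0 + 0.5);  /* 2^23 */
    if (frac >  8388607L) frac =  8388607L;   /* clip to the 24-bit range */
    if (frac < -8388608L) frac = -8388608L;
    return (double)frac / 8388608.0 * scale;
}

int main(void)
{
    double big   = 1.0e13;  /* one huge element...              */
    double small = 1.0e3;   /* ...and a typical one             */
    double scale = big;     /* everything scaled by the largest */

    printf("%g comes back as %g\n", big,   through_24bit(big, scale));
    printf("%g comes back as %g\n", small, through_24bit(small, scale));
    /* The 1e3 element sits about 33 bits below the 1e13 one, so after
     * rounding to 24 bits it comes back as exactly 0. */
    return 0;
}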
If you have any doubt, it behooves you to learn a bit about fixed
point arithmetic to see if it is appropriate for your problem.

Known bugs
-----------
Neither matrix multiplication routine will tell you when the matrices
are too big to fit in the DSP's local memory. It probably should, and
if there's interest I'll add this. In any case, it's easy to figure
out: the DSP has two memory spaces, each with 16K x 24-bit words, and
each matrix value takes up one 24-bit word. For C = AB, A and C must
fit together in one 16K space, while B must fit in the other. (A small
sketch of this check follows at the end of this file.)

Jim Yegerlehner
GEnie:    J.YEGERLEHNE
Internet: j.yegerlehne@genie.geis.com
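The size check from the Known bugs section, written out as a small
stand-alone C sketch. This is only an illustration, not code from
dspmath.o, and the helper name matrices_fit() is made up for the
example; it simply restates the rule above: A and C share one memory
space of 16K 24-bit words, B takes the other, one word per matrix
element.

/* fitcheck.c -- illustration only, not part of dspmath.o.
 * Checks whether the matrices of C = A*B fit in the DSP's two
 * memory spaces of 16K x 24-bit words each, one element per word:
 * A and C must fit together in one space, B in the other.
 */
#include <stdio.h>

#define DSP_SPACE_WORDS 16384L   /* 16K 24-bit words per memory space */

/* A is rows_a x cols_a, B is cols_a x cols_b, so C is rows_a x cols_b. */
static int matrices_fit(long rows_a, long cols_a, long cols_b)
{
    long words_a_and_c = rows_a * cols_a + rows_a * cols_b;  /* A + C   */
    long words_b       = cols_a * cols_b;                    /* B alone */
    return words_a_and_c <= DSP_SPACE_WORDS && words_b <= DSP_SPACE_WORDS;
}

int main(void)
{
    /* The 40 x 40 benchmark case fits easily:
     * A + C = 3200 words, B = 1600 words. */
    printf("40x40:   %s\n", matrices_fit(40L, 40L, 40L) ? "fits" : "too big");

    /* 100 x 100 square matrices do not: A + C = 20000 words > 16384. */
    printf("100x100: %s\n", matrices_fit(100L, 100L, 100L) ? "fits" : "too big");

    return 0;
}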