DSP Math Library
Version 1.0
Copyright 1994  All Rights Reserved  Alle Rechte Vorbehalten
James D. Yegerlehner

This archive contains a library for doing DSP-accelerated matrix
multiplications. Ultimately it may grow into a more complete library
of math routines. I picked matrix multiplication because it is compute
intensive, occurs very commonly in many kinds of software, and is done
very well by the DSP. If there is a function you would like to see
supported, let me know.

You are welcome to copy and distribute this archive so long as you
distribute the whole thing and you do not charge for it. You may use
this library in developing software so long as you do not sell the
software. If you have a commercial application, get ahold of me and I
will be happy to license it to you. It probably needs a little
customization to be commercial anyway. Use this library at your own
risk. It appears to work, but I've done very little testing.

You should find the following files in the archive:

dspmath.o
    dspmath.o is the Turbo C 2.03 object file containing the functions
    to link with your program. I set the standard object file format
    compiler option, so it's supposed to be a standard DRI-format
    object file. I'm not sure about compatibility with other linkers
    like Lattice, GNU, Sozobon, etc. In any case, it will only work
    with floating point libraries that use the IEEE single precision
    floating point format. If folks in the know about these things can
    let me know which compilers use that format, I'll pass it along.

dspmath.h
    Header file with function prototypes for dspmath.o. This describes
    exactly how to use the library functions.

dspmath.lod
    This is the DSP program used by the library; the initialization
    function in dspmath.o loads this automatically.

example.*
    This is an example program to illustrate how calls are made to the
    functions in dspmath.o. It was used to generate the benchmark
    numbers below.

I have a version that works with the floating point format used by the
floating point library that comes with Heat and Serve Sozobon, and I
could probably be cajoled into cleaning it up and getting it out, if
there is interest.

I calculated the following times using example.c. The table shows the
time in seconds required to complete 100 multiplies of an n x n matrix
by an n x n matrix.

                                 Technique
          --------------------------------------------------------
           68030        68030              dspmath library
 size n    math         math +        ----------------------------
           (no FPU)     6888X FPU     matmlflt()      matmlfxp()
 ------   ----------   -----------    ----------     ------------
    3        3 sec          ?           0.3 sec         0.1 sec
   40     1473 sec          ?            17 sec           2 sec

Note that the overhead of converting between fixed and floating point
grows with n*n (the number of matrix elements), whereas the number of
multiplies and adds needed to compute the result grows with n*n*n
(for n = 40 that is a few thousand element conversions against
40*40*40 = 64000 multiply-adds). As the matrix gets larger, the
advantage of using the DSP grows.

Caveat: the underlying arithmetic used by both matmlflt() and
matmlfxp() is 24-bit (with 56-bit intermediate results) fixed point
2's complement fractional arithmetic. Fixed point is not as flexible
as floating point, so some problems are not appropriately handled with
fixed point math. In rough and vague terms, the numbers within a
single matrix should be fairly close to the same order of magnitude.
If, for instance, one matrix element were 10^13 while all the others
were on the order of 10^3, the smaller numbers would all end up
looking like zeroes to the library, since they are so many orders of
magnitude smaller.
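To see concretely what that caveat means, here is a small stand-alone
C sketch. It is only an illustration, not code from dspmath.o: the
scale-by-largest-element scheme and the function through_24bit() are
made up for the example and may not match exactly how the library
converts values. It rounds scaled values to signed 24-bit fractions,
the precision of the DSP's fractional format, and shows that an
element ten orders of magnitude below the largest one comes back as
zero.

/* roundoff.c -- illustration only, not part of dspmath.o.
 * Mimics the effect of squeezing values into 24-bit 2's complement
 * fractions (range -1.0 to just under +1.0) after scaling the whole
 * matrix by its largest magnitude.
 */
#include <stdio.h>
#include <math.h>

/* Scale x by 'scale', round to a signed 24-bit fraction, convert back. */
static double through_24bit(double x, double scale)
{
    long frac = (long)floor(x / scale * 8388608.0 + 0.5);  /* 2^23 */
    if (frac >  8388607L) frac =  8388607L;   /* clip to the 24-bit range */
    if (frac < -8388608L) frac = -8388608L;
    return (double)frac / 8388608.0 * scale;
}

int main(void)
{
    double big   = 1.0e13;  /* one huge element...              */
    double small = 1.0e3;   /* ...and a typical one             */
    double scale = big;     /* everything scaled by the largest */

    printf("%g comes back as %g\n", big,   through_24bit(big, scale));
    printf("%g comes back as %g\n", small, through_24bit(small, scale));
    /* The 1e3 element sits about 33 bits below the 1e13 one, so after
     * rounding to 24 bits it comes back as exactly 0. */
    return 0;
}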
If you have any doubt, it behooves you to learn a bit about fixed
point arithmetic to see if it is appropriate for your problem.

Known bugs
-----------
Neither matrix multiplication routine will tell you when the matrices
are too big to fit in the DSP's local memory. It probably should, and
if there's interest I'll add this. In any case, it's easy to figure
out: the DSP has two memory spaces, each with 16K x 24-bit words, and
each matrix value takes up one 24-bit word. For C = AB, A and C must
fit together in one 16K space, while B must fit in the other. (A small
sketch of this check follows at the end of this file.)

Jim Yegerlehner
GEnie:    J.YEGERLEHNE
Internet: j.yegerlehne@genie.geis.com
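The size check from the Known bugs section, written out as a small
stand-alone C sketch. This is only an illustration, not code from
dspmath.o, and the helper name matrices_fit() is made up for the
example; it simply restates the rule above: A and C share one memory
space of 16K 24-bit words, B takes the other, one word per matrix
element.

/* fitcheck.c -- illustration only, not part of dspmath.o.
 * Checks whether the matrices of C = A*B fit in the DSP's two
 * memory spaces of 16K x 24-bit words each, one element per word:
 * A and C must fit together in one space, B in the other.
 */
#include <stdio.h>

#define DSP_SPACE_WORDS 16384L   /* 16K 24-bit words per memory space */

/* A is rows_a x cols_a, B is cols_a x cols_b, so C is rows_a x cols_b. */
static int matrices_fit(long rows_a, long cols_a, long cols_b)
{
    long words_a_and_c = rows_a * cols_a + rows_a * cols_b;  /* A + C   */
    long words_b       = cols_a * cols_b;                    /* B alone */
    return words_a_and_c <= DSP_SPACE_WORDS && words_b <= DSP_SPACE_WORDS;
}

int main(void)
{
    /* The 40 x 40 benchmark case fits easily:
     * A + C = 3200 words, B = 1600 words. */
    printf("40x40:   %s\n", matrices_fit(40L, 40L, 40L) ? "fits" : "too big");

    /* 100 x 100 square matrices do not: A + C = 20000 words > 16384. */
    printf("100x100: %s\n", matrices_fit(100L, 100L, 100L) ? "fits" : "too big");

    return 0;
}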