Spell/ST User Guide v3.2
Murray Langton and David Tilley
April, 1991

A versatile English spelling checker and dictionary maintainer

Text enclosed in square brackets and the Appendices may be skipped
at a first reading.

0 INTRODUCTION                  2 USING MAKEDICT.PRG
0.0 Distribution                2.0 Introduction
0.1 System Requirements         2.1 The 'Desk' Menu
0.2 Files You Need              2.2 The 'Dictionaries' Menu
0.3 Overview
0.4 Limitations
0.5 Your Comments               APPENDICES

1 USING SPELL.PRG               A What is a Word?
1.0 Introduction                B Dictionary Format
1.1 The 'Desk' Menu             C Fatal Error Messages
1.2 The 'File' Menu             D Background Information
1.3 The 'Options' Menu          E Known Bugs
1.4 The 'Dictionaries' Menu     F Improvements


0 INTRODUCTION

0.0 Distribution

Spell/ST and MakeDict are Copyright 1991 Murray Langton and David
Tilley. Spell/ST and MakeDict are in the public domain. Commercial
use of all or any part of this software or its dictionaries is
forbidden.

Spell/ST's dictionaries were accumulated from many sources over the
years, too many to acknowledge individually. However, the following
deserve special mention:

   Users and staff of the University of London Computer Centre
   Jeff Horne's 'Codebreaker' disc of UK telephone exchange codes

Mistakes in Spell/ST's dictionaries remain our fault, although we
accept no liability for them. We also accept no liability for use
of the Spell/ST and MakeDict programs.

All manufacturers' trademarks are acknowledged.

0.1 System Requirements

You need an Atari ST with at least 400 kilobytes of free memory to
run Spell/ST. On a 520 ST or a 1040 ST, you may have to remove one
or more of your desk accessories before Spell/ST will work.
MakeDict requires at least 360 kilobytes of free memory.

Neither program has been tested on an STE or with any version of
TOS later than 1.0 (1985); we would appreciate reports on how they
behave with such systems.

Both Spell/ST and MakeDict work in high resolution (monochrome
monitor) or medium resolution, but not in low resolution.
They have not been tested with large screens.

Spell/ST and MakeDict are designed to be run from hard disc or RAM
disc; they are very slow when run from diskette. Because MakeDict
needs access to the '.WRD' files and enough disc space to create
the MASTER.DIC and MASTER.IND files, it cannot run satisfactorily
on a system with only one single-sided diskette drive.

0.2 Files You Need

Before using Spell/ST, make sure that you have the following six
files:

   File name     Description
   MASTER.DIC    Spell/ST's master dictionary (binary)
   MASTER.IND    Index to Spell/ST's master dictionary (binary)
   READ.ME       Information not in SPELL.ASC
   SPELL.ASC     This document
   SPELL.PRG     Executable Spell/ST program
   SPELL.RSC     Spell/ST's resource file

All should be placed in the same folder as SPELL.PRG or in the root
directory of the drive containing it.

Before using MakeDict, make sure that you have at least the
following eight files:

   File name     Description
   COMPUTER.WRD  Dictionary of computerese terms (text)
   MAIN.WRD      Dictionary of English words (text)
   MAKEDICT.ASC  This document (same as SPELL.ASC)
   MAKEDICT.PRG  Executable MakeDict program
   MAKEDICT.RSC  MakeDict's resource file
   MAKEDICT.SET  Default location of '.WRD', '.DIC' and '.IND'
                 files (not to be edited)
   NAMES.WRD     Dictionary of names and places (text)
   READ.ME       Information not in MAKEDICT.ASC

All should be placed in the same folder as MAKEDICT.PRG or in the
root directory of the drive containing it.

You do not need MakeDict to run Spell/ST, and your BBS or archive
system may supply the two programs separately.

0.3 Overview

Spell/ST reads text from a file, breaks that text up into words,
looks up each word in a dictionary, and displays the words it could
not find. It has some knowledge of plurals, suffixes and prefixes.
Spell/ST ignores some text-formatting commands and works on 1st
Word Plus '.DOC' files. It also detects consecutive duplicate words
and most common split infinitives.
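The basic cycle just described (break text into words, look each one up, report the failures and any consecutive duplicates) can be sketched in a few lines of Python. This is a minimal illustration, not Spell/ST's own code: the function name check, the dictionary held as a Python set, and the much-simplified word pattern are all assumptions (Appendix A gives Spell/ST's real word rules). The previous word is carried across line breaks because a duplicate may straddle two lines.

```python
import re

# Simplified word pattern (an assumption, far looser than Appendix A):
# runs of letters and digits, optionally joined by a single . - or '.
WORD = re.compile(r"[a-z0-9]+(?:[.\-'][a-z0-9]+)*")


def check(lines, dictionary):
    """Return (unrecognised words in alphabetical order,
    list of (line number, word) for consecutive duplicates)."""
    unknown = set()
    duplicates = []
    previous = None  # carried across lines: a duplicate may straddle a line break
    for number, line in enumerate(lines, start=1):
        for word in WORD.findall(line.lower()):
            if not re.search(r"[a-z]", word):
                continue  # skip pure numbers such as 1991
            if word == previous:
                duplicates.append((number, word))
            previous = word
            if len(word) > 1 and word not in dictionary:  # one-letter words ignored
                unknown.add(word)
    return sorted(unknown), duplicates


if __name__ == "__main__":
    words = {"the", "cat", "sat", "on", "mat", "with", "error"}
    text = ["the cat sat on the the", "mat with a speling error"]
    print(check(text, words))  # (['speling'], [(1, 'the')])
```

Sorting the unrecognised words only at the end mirrors why the 'Alphabetical order' report takes longer to start: nothing can be printed until the whole document has been scanned.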
Various optional facilities are available:

   o four formats for the display of unrecognised words
   o ignoring words containing only upper-case letters, digits and
     special characters
   o suggestions for the correct spelling of unrecognised words
   o up to three personal dictionaries to supplement the standard
     dictionaries
   o a personal 'reject' dictionary containing words which are to
     be treated as unrecognised, even if they appear in another
     dictionary
   o a choice of which dictionaries, if any, are used
   o creation of a dictionary
   o logging of unrecognised words
   o adding Spell/ST's report to a file

Many words can be spelt with either 'ise' or 'ize' at the end:
recognise or recognize, for example. For such words, Spell/ST
accepts either form (and most variants), but reports words which
are spelt inconsistently within a document.

The master dictionary used by Spell/ST is constructed from three
smaller dictionaries for ease of maintenance. The 'main' dictionary
contains English words (but not words valid only in American
English), the 'names' dictionary contains names of people and
places, and the 'computer' dictionary contains computing terms. A
few words may appear in more than one dictionary. The master
dictionary may optionally also contain a reject dictionary and
three other dictionaries constructed by you.

As far as Spell/ST is concerned, all dictionaries are contained in
one file, MASTER.DIC. The MakeDict program constructs a MASTER.DIC
file from the various '.WRD' files supplied with Spell/ST or
constructed by you. MakeDict is described in Section 2.

0.4 Limitations

Please note that Spell/ST merely reports words from a document
which were not found in a dictionary; these are _potential_
spelling mistakes rather than actual ones. In practice, a report
will contain some spelling mistakes, some technical terms, some
abbreviations, and some words which are correctly spelt but which
are not (yet) in a dictionary.
Spell/ST cannot locate mistakes which produce some other correctly
spelt word, nor does it check context, grammar or punctuation. It
has no facilities for interactive spelling correction.

Areas where Spell/ST is known to be weak include the following:

   o No distinction is made between upper- and lower-case letters
   o No attempt is made to check that 's is used correctly at the
     end of a word, though other abbreviations using ' are checked
   o Hyphenated words are not always recognised, especially if
     they are split at the end of a line

Please see Appendices D, E and F for further discussion.

0.5 Your Comments

Comments on how Spell/ST or MakeDict may be improved should be
e-mailed to David Tilley at:

   DRT10@UK.AC.CAM.PHX                    on JANET
   DRT10%PHX.CAM.AC.UK@CUNYVM.CUNY.EDU    on Internet

or telephoned to +44 (0)81-399 8372; but please read Section 0.4
and Appendices D, E and F before doing so.

Please send suggestions for additions or corrections to Spell/ST's
dictionaries by e-mail; other people could benefit from them in a
later release.


1 USING SPELL.PRG

1.0 Introduction

Double-click on the SPELL.PRG icon. You will then see the titles
of five drop-down menus, whose functions are described below.

1.1 The 'Desk' Menu

The 'Desk' menu looks something like the following:

    ----
   |Desk|
   |-------------------
   | About Spell/ST... |
    -------------------

Our copyright notice and Spell/ST's version number are displayed
when you click on the 'About Spell/ST...' item.

1.2 The 'File' Menu

The 'File' menu looks something like the following:

    ----
   |File|
   |-------------------
   | Scan document...  |
   | Report to disc    |
   |-------------------|
   | Quit Spell/ST     |
    -------------------

Scan document:
   Use this item to bring up the usual (or your preferred) file
   selector. Choose the name of the file containing the text whose
   spelling is to be checked.
   The first time you use this item, Spell/ST loads its master
   dictionary; this takes about four seconds from hard disc and
   about fifty seconds from diskette. You can use this item to
   check as many documents in turn as you wish.

   The time taken to produce a complete report depends on the size
   of the document and on your equipment. For example, Spell/ST
   took about one minute to scan a copy of this guide held on hard
   disc, and about one-and-a-quarter minutes for a copy on
   diskette.

Report to disc:
   Spell/ST normally sends its reports only to a scrolling window.
   Clicking on this item causes each report also to be added to a
   file, SPELL.REP, in the folder from which SPELL.PRG was run.

The report produced by Spell/ST contains:

   o the location of split infinitives and consecutive duplicate
     words
   o the number of lines and words in the file being checked
   o the time taken to check the file
   o how many unrecognised words were found
   o a list, in alphabetical order, of unrecognised words

By default, only the 'main' English dictionary is used. You can
stop a report by pressing 'q' or 'Q' at the 'More...' prompt. You
can use the 'Options' and 'Dictionaries' menus, described below,
to alter the format and content of the report and to select
additional dictionaries.

Quit Spell/ST:
   When you have finished, click on this item to return to the
   desktop.

1.3 The 'Options' Menu

The 'Options' menu looks something like the following:

    -------
   |Options|
   |----------------------
   | Alphabetical order   |  This item will have a tick mark...
   | Order of occurrence  |
   | Words in context     |
   |----------------------|
   | Duplicates           |  ...so will this...
   | Split infinitives    |  ...and this...
   | Make suggestions     |
   | Ignore u/c words     |
   |----------------------|
   | Statistics           |  ...and this
   | Frequency counts     |
   |----------------------|
   | Log unknown words    |
    ----------------------

Items marked with ticks are Spell/ST's default options.
Alphabetical order:
   Produce an alphabetical list of all unrecognised words. Note
   that a count of how many times each word appeared is not
   included; see the 'Frequency counts' option below.

   [Because Spell/ST has to sort all unrecognised words into
   alphabetical order, it takes longer to start producing a report
   than with the 'Order of occurrence' and 'Words in context'
   options (see below).]

Order of occurrence:
   List all unrecognised words in order of occurrence, each
   preceded by the number of the line in which it occurs. Note that
   every occurrence of an unrecognised word is displayed.

Words in context:
   Display every line containing an unrecognised word, preceded by
   its line number, and underline each unrecognised word with
   carets (^).

Duplicates:
   Report consecutive duplicate words and their line number.

   [A common mistake, especially after a document has been edited
   several times, is to have two consecutive words the same.
   Genuine duplicated words are rare in English, so Spell/ST
   reports any duplicated word and the relevant line number. One of
   the duplicates may be at the end of the previous line. Since
   blank lines are not significant to Spell/ST, a repetition can be
   reported unnecessarily between a section heading and the text
   which follows it, for example.]

Split infinitives:
   Report split infinitives and their line number.

Make suggestions:
   Ask Spell/ST to suggest the correct spelling of unrecognised
   words. Unrecognised words for which no plausible suggestions can
   be made are then listed in alphabetical order. We recommend that
   you use the 'Ignore u/c words' option (see below) together with
   'Make suggestions'; this could save Spell/ST time spent trying
   to find correct spellings for file names and computerese, for
   example.

Ignore u/c words:
   Make Spell/ST ignore words containing only upper-case letters,
   digits and special characters.
   This could help Spell/ST avoid checking non-existent words such
   as file names.

Statistics:
   Produce statistical information on the file whose spelling is
   being checked.

Frequency counts:
   Add, against each unrecognised word reported, the number of
   times it occurred in the document. This is useful only with the
   'Alphabetical order' option.

Log unknown words:
   Make Spell/ST add unrecognised words to a file, NEWWORDS.LOG, in
   the folder from which SPELL.PRG was run. Note that no words are
   added unless the main, names and computerese dictionaries are
   all selected.

   [Unrecognised words are added to the log in alphabetical order,
   and the name of the source document is not recorded. You can
   periodically add valid words from NEWWORDS.LOG to your
   dictionaries.]

1.4 The 'Dictionaries' Menu

The 'Dictionaries' menu looks something like the following:

    ------------
   |Dictionaries|
   |--------------
   | Main         |  This item will have a tick mark
   | Names        |
   | Computerese  |
   | Reject       |
   |--------------|
   | User 1       |
   | User 2       |
   | User 3       |
   |--------------|
   | All          |  or 'None'
   |--------------|
   | Create...    |
    --------------

A tick indicates Spell/ST's default dictionary.

Main:
   Select the dictionary of English words. Of the three
   dictionaries supplied with Spell/ST, we recommend that you use
   only the English dictionary when you first check a document.

Names:
   Select the dictionary of names and places. If you select the
   extensive 'names' dictionary without first scanning your
   document without it, Spell/ST could fail to detect a
   misspelling. For example, 'bangor' could be accepted as valid
   when you really meant 'banger'.

Computerese:
   Select the dictionary of computerese words. If you select the
   extensive 'computerese' dictionary without first scanning your
   document without it, Spell/ST could fail to detect a
   misspelling. For example, 'pascal' could be accepted as valid
   when you really meant 'rascal'.
There are many technical words which would not appear in a normal
dictionary. Besides checking its own dictionaries, Spell/ST can be
instructed to check words against personal dictionaries supplied
by you. Select the following four options if you wish such
dictionaries to be used. Please note that, to use your own
dictionaries, you will have to construct a new master dictionary
from them with the MakeDict program.

Reject:
   Select your dictionary of words to be rejected.

   [You may often mistype a word as some other valid word: 'my' as
   'mu' or 'trial' as 'trail', for example. Rather than changing
   the standard dictionaries to cope with this, you may construct a
   dictionary of words which are always to be treated as
   unrecognised, regardless of whether they appear in another
   dictionary.

   A reject dictionary could also be used to reject English words
   which are invalid in American English; this could be combined
   with a user dictionary (see below) containing words which are
   valid in American English but not in British English. You could
   also construct Spell/ST dictionaries for another language, the
   only restriction being that its alphabet should be reasonably
   represented by ASCII.]

User 1:
   Select your first dictionary.

User 2:
   Select your second dictionary.

User 3:
   Select your third dictionary.

All:
   Select all the above dictionaries that are present.

None:
   De-select all the above dictionaries.

   [When all dictionaries are de-selected, Spell/ST recognises no
   words in your document. 'None' may be used with the
   'Alphabetical order' option (see above) to produce an
   alphabetical list of all the words in your document.]

The following menu item will interest those who wish to maintain
Spell/ST dictionaries.

[Create:
   Use this item to prepare Spell/ST to create a dictionary. Fill
   in the file selector with the name of a '.WRD' file.
   When you next use 'Scan document', the unrecognised words in the
   document will be written to the '.WRD' file you specified. That
   file will be suitable for adding to the master dictionary with
   MakeDict.

   It is a good idea to select all Spell/ST's standard
   dictionaries, and all your own as well, when you use 'Create'.
   If you selected a 'User 1' dictionary, it will be extended into
   the '.WRD' file with the unknown words from your document. Each
   added word is marked on the right with '<', to help you locate
   and check the new words.

   Please note that there is a limit of about 700 on the number of
   unknown words which may be added with 'Create'.]


2 USING MAKEDICT.PRG

2.0 Introduction

The MakeDict program is a utility for maintaining a Spell/ST master
dictionary. It reads some or all of the '.WRD' files supplied with
Spell/ST (or amended or provided by you) and makes from them two
files, MASTER.DIC and MASTER.IND. The idea is to add to the '.WRD'
files regularly and to apply MakeDict to them occasionally.

[Those who wish to maintain Spell/ST dictionaries should consult
Appendices A and B. It is often easier to maintain your personal
dictionaries than those supplied with Spell/ST. Incidentally, we
recommend Tempus 2 for editing large '.WRD' files.]

A complete run of MakeDict takes a long time: about
three-and-a-quarter minutes from hard disc and about ten-and-a-half
minutes from diskette, so it is not something you will want to do
often. However, it is worthwhile if you use Spell/ST regularly and
wish its dictionaries to reflect your needs accurately. You will
have to use MakeDict if you wish Spell/ST to use any of the four
personal dictionaries.

2.1 The 'Desk' Menu

MakeDict's 'Desk' menu looks something like the following:

    ----
   |Desk|
   |-------------------
   | About MakeDict... |
    -------------------

Our copyright notice and MakeDict's version number are displayed
when you click on the 'About MakeDict...' item.
2.2 The 'Dictionaries' Menu

The 'Dictionaries' menu looks something like the following:

    ------------
   |Dictionaries|
   |----------------
   | Select...      |
   | Save setup     |  This item is disabled on first entry...
   | Make           |  ...as is this...
   | Information... |  ...and this
   |----------------|
   | Quit MakeDict  |
    ----------------

The disabled items are activated later.

Select:
   When you click on this item, a multiple file selector is
   displayed. Fill it in with the names of the '.WRD' files to be
   included in your master dictionary; you must specify at least
   one, and you must supply their path. The selector is also used
   to locate your master dictionary and its index, whose path may
   differ from that of the '.WRD' files. Click on the 'Okay' button
   when you have made your selection. MakeDict will complain if you
   have typed incorrect path names or the names of non-existent
   '.WRD' files.

Save setup:
   Once you have decided which '.WRD' files to use, click on this
   item to save your selections for next time.

Make:
   Click on this item to generate a master dictionary. This can
   take some time; see Section 2.0.

Information:
   Once generation of the master dictionary is complete, click on
   this item to obtain the following statistics on the
   dictionaries you have used:

   o the time taken to read or write each dictionary
   o the number of words each dictionary contains
   o the number of 'variants' (see Appendix B) each dictionary
     contains
   o the number of bytes occupied by each dictionary

Quit MakeDict:
   When you have finished, click on this item to return to the
   desktop.


APPENDIX A   What is a Word?

Spell/ST usually considers each line of the source file
separately. The exception is when a word appears to be hyphenated
at the end of a line, in which case that line and the following
line are effectively joined and the hyphen removed.
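As an illustration only, the line-joining step might look like the following Python sketch. The test used for "appears to be hyphenated" (a letter immediately followed by a hyphen at the very end of the line) and the name join_hyphenated are assumptions, not Spell/ST's actual rule.

```python
import re

# Assumed test for "appears to be hyphenated": a letter immediately
# followed by a hyphen at the very end of the line.
TRAILING_HYPHEN = re.compile(r"[A-Za-z]-$")


def join_hyphenated(lines):
    """Join each line ending in 'letter-' to the next line, dropping the hyphen."""
    result = []
    pending = ""  # first half of a word split across lines
    for raw in lines:
        line = raw.rstrip()
        if pending:
            line = pending + line.lstrip()  # glue the two halves together
            pending = ""
        if TRAILING_HYPHEN.search(line):
            pending = line[:-1]  # drop the hyphen and wait for the next line
        else:
            result.append(line)
    if pending:
        result.append(pending + "-")  # the last line ended in a hyphen: keep it
    return result


if __name__ == "__main__":
    print(join_hyphenated(["a spell-", "  ing checker", "done"]))
    # ['a spelling checker', 'done']
```

Because the hyphen is simply dropped, a genuine compound such as 'well-' at the end of one line followed by 'known' at the start of the next becomes 'wellknown', which illustrates the weakness with end-of-line hyphens noted in Section 0.4.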
Lines and/or words are ignored when:

   o a line starts with a full stop
   o a word is preceded by '\' or '|' (text in round brackets
     after such a word is also ignored)
   o a letter is preceded by '$'
   o the first two characters in a line are '//' or '/*'

The first three rules skip the text-formatting commands of GCAL,
LaTeX, PROFF, TeX and Tidy, whilst the last skips MVS JCL.

Within a line, a word is a sequence of characters satisfying the
following conditions:

   o each word is as long as possible
   o a word starts with a letter, or a digit followed by a letter,
     or the digit 1 followed by a digit followed by a letter, and
     may contain letters and digits
   o a word may also contain full stops, hyphens and apostrophes,
     provided that each such special character is preceded and
     followed by a letter or digit

Note that upper-case letters are effectively converted to lower
case, and words are truncated to a maximum of 26 characters.
One-letter words, and the two-character sequence 's at the end of
a word, are ignored.


APPENDIX B   Dictionary Format

Within a Spell/ST dictionary (a file whose name ends in '.WRD'),
words are ordered first by length (shortest words first) and then
alphabetically, one word per line, with no leading spaces. All
letters are treated as if they were in lower case.

A '.WRD' dictionary may contain comment lines beginning with an
exclamation mark. If you use Spell/ST's 'Create' facility to
extend a '.WRD' file, the comments it contains will not appear in
its extension.

Spell/ST's master dictionary, held in MASTER.DIC and constructed by
MakeDict from the various '.WRD' files, is a binary file and is not
human-readable. MakeDict issues a warning if the size of a master
dictionary grows to within one per cent of the maximum allowed,
currently 290,000 bytes. The master dictionary supplied with
Spell/ST has room for about a further 275 eight-letter words,
excluding variants.
Versions of Spell/ST and MakeDict with larger or smaller dictionary
capacities are available on request; see Section 0.5.

Alphabetical order is as follows:

   . - ' abcdefghijklmnopqrstuvwxyz 0123456789

The size of the main dictionary has been reduced by a factor of
about 2.5 by attaching affix flags to word stems. A backslash '\'
separates the word from its affix flags. Each letter after the
backslash represents a rule for deriving affixed forms, as shown in
the table which follows these examples:

   Word\flags       Represents
   stare\abc        stare, stares, staring, stared
   pose\bcstw       pose, posing, posed, unposed, repose, reposing,
                    reposed, dispose, disposing, disposed
   divide\abcfhims  divide, divides, dividing, divided, divider,
                    division, dividers, divisions, undivided
   affect\abchlmqs  affect, affects, affecting, affected, affection,
                    affectation, affections, affectations,
                    unaffected

The affix flags listed below are ordered so that prefix and suffix
flags appear in separate groups, each arranged in order of
frequency of occurrence.
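To show how a flag string expands, the first two examples above can be reproduced with the following minimal Python sketch. It covers only suffix flags a, b and c and prefix flags s, t, u, v and w, applies each rule literally as worded in the flag table, omits the rule j part of flag s, and silently ignores all other flags; the function names are hypothetical and this is not MakeDict's own code.

```python
# Minimal sketch: only suffix flags a, b, c and prefix flags s, t, u, v, w.
# Each rule is applied literally as worded in the flag table; other flags
# (d-r, x) are silently ignored here.

def add_s(stem):
    """Flag a: add -s."""
    return stem + "s"


def add_ing(stem):
    """Flag b: -ee -> -eeing, else -e -> -ing, else add -ing."""
    if stem.endswith("ee"):
        return stem + "ing"
    if stem.endswith("e"):
        return stem[:-1] + "ing"
    return stem + "ing"


def add_ed(stem):
    """Flag c: -y -> -ied, else -e -> -ed, else add -ed."""
    if stem.endswith("y"):
        return stem[:-1] + "ied"
    if stem.endswith("e"):
        return stem[:-1] + "ed"
    return stem + "ed"


SUFFIX_RULES = {"a": add_s, "b": add_ing, "c": add_ed}
# Prefix flags that re-apply the entry's own suffix flags to the prefixed stem.
PREFIX_RULES = {"t": "re", "u": "un", "v": "in", "w": "dis"}


def expand(entry):
    """Expand a 'stem\\flags' dictionary entry into the words it represents."""
    stem, _, flags = entry.partition("\\")
    forms = [stem] + [SUFFIX_RULES[f](stem) for f in flags if f in SUFFIX_RULES]
    words = list(forms)
    for flag in flags:
        if flag == "s" and "c" in flags:
            # Flag s: un- with the -ed form only (rule j is not sketched).
            words.append("un" + add_ed(stem))
        elif flag in PREFIX_RULES:
            words.extend(PREFIX_RULES[flag] + form for form in forms)
    return words


if __name__ == "__main__":
    print(expand("pose\\bcstw"))
    # ['pose', 'posing', 'posed', 'unposed', 'repose', 'reposing',
    #  'reposed', 'dispose', 'disposing', 'disposed']
```

Applied to 'pose\bcstw', the sketch yields the ten words listed in the examples, in the same order. Read literally, rule c would also turn 'stay' into 'stied', so the real rules presumably carry conditions that the table does not spell out.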
   Flag  Action/condition

   Suffix flags
   a     add -s
   b     replace -ee by -eeing, else replace -e by -ing,
         else add -ing
   c     replace -y by -ied, else replace -e by -ed, else add -ed
   d     replace -ic by -ically, else replace -y by -ily,
         else add -ly
   e     replace -y by -ies, else replace -f by -ves,
         else replace -fe by -ves, else add -es
   f     replace -y by -ier, else replace -e by -er, else add -er
   g     replace -ate by -acy, else replace -ant by -ancy,
         else replace -ent by -ency, else replace -e by -y,
         else add -y
   h     replace -mit by -mission, else replace -ibe by -iption,
         else replace -ume by -umption, else replace -de by -sion,
         else replace -e by -ion, else add -ion
   i     apply rule f, then add -s
   j     double the last letter, then add -ed or -ing
   k     replace -y by -iness, else add -ness
   l     replace -e by -ation, else replace -y by -ication,
         else add -ation
   m     apply rule h, then add -s
   n     replace -ble by -bility, else replace -acious by -acity,
         else replace -ous by -ity, else replace -e by -ity,
         else add -ity
   o     replace -y by -iable, else replace -e by -able,
         else add -able
   p     replace -e by -est, else replace -y by -iest,
         else add -est
   q     apply rule l, then add -s
   r     apply rule b, then add -s

   Prefix flags
   s     add prefix un-; apply rules c and j (-ed) only
   t     add prefix re-; apply rules a-r and x
   u     add prefix un-; apply rules a-r and x
   v     add prefix in-; apply rules a-r and x
   w     add prefix dis-; apply rules a-r and x

   Other
   x     check consistent use of -ise and -ize


APPENDIX C   Fatal Error Messages

Spell/ST

"A line is longer than 256 characters."
   The document being scanned has at least one line containing
   more than 256 characters. Solution: shorten the offending
   line(s).

"Can't find !"
   The missing file is MASTER.DIC, MASTER.IND or SPELL.RSC.
   Solution: read Section 0.2.

"Error at address {message}"
   If this occurs, please supply us with full details.

"The master dictionary is too big."
   MASTER.DIC is larger than 290,000 bytes. Solution: apply
   MakeDict to fewer and/or smaller '.WRD' files.

"There are too many unknown words."
   Possible solutions: select more dictionaries, use the 'Ignore
   u/c words' option, or correct some spelling mistakes.

"Spell doesn't run at low resolution."
   Solution: select medium resolution or use a monochrome monitor.

MakeDict

"A dictionary word is too short."
   Solution: remove the one-letter word from the offending '.WRD'
   file.

"Can't find !"
   The missing file is MAKEDICT.RSC or MAKEDICT.SET. Solution:
   read Section 0.2.

"Dictionary not ordered by size around line ."
   Solution: move the word to its correct position in the
   offending '.WRD' file (words are ordered by length; see
   Appendix B).

"MakeDict doesn't run at low resolution."
   Solution: select medium resolution or use a monochrome monitor.

"The master dictionary will be too big."
   Solution: select fewer and/or smaller '.WRD' files.


APPENDIX D   Background Information

SPELL, implemented on mainframes under MVS and VM by Murray
Langton, does not have to worry about how much memory is used. Its
dictionaries are held in human-readable form and converted to
internal form each time SPELL is used.

The Atari ST implementation, by David Tilley, has been split into
two parts: MakeDict takes the human-readable dictionaries and
converts them to internal form (including index construction), and
Spell/ST can then read the converted dictionaries more quickly.
Split infinitive detection has also been added.

The problems with SPELL on a microcomputer are the size of the
dictionaries and indices (some 300 kilobytes) and the time taken to
read them, especially from diskette. We are considering ways of
alleviating these problems. In the meantime, we offer you a
versatile, if rather cumbersome, package.


APPENDIX E   Known Bugs

Spell/ST and MakeDict: window actions are sluggish immediately
after exit. They soon recover, however.
Spell/ST's stack was increased from 4 to 32 kilobytes to allow a
very large number of unknown words to be detected. This may still
prove insufficient for some documents, in which case the message
"There are too many unknown words." appears.

If you try to use 'Create' to add more than about 700 words,
Spell/ST can crash, after which the machine hangs.

In medium resolution only, if you click on the 'About Spell/ST...'
menu item _after_ scanning a document, the resulting form is
corrupted and Spell/ST exits. This annoying bug is proving
difficult to track down but, fortunately, the machine does not
hang.

Please report bugs by e-mail (see Section 0.5).


APPENDIX F   Improvements

The following improvements to Spell/ST come to mind:

   Reduce size of master dictionary.  )
   Speed up document scan.            )  see Appendix D
   Save on memory.                    )

   Provide a Spell/ST desk accessory. There is no point until the
   above improvements are made.

   Allow a dot in a MakeDict folder name.

   Highlight unknown words in context output instead of using ^.

   Improve treatment of end-of-line hyphens.

   Provide interactive correction.

*** End of document ***