SORT User's Guide -- Preliminary (5/1/81) Page 1 Introduction SORT is a program which accepts an input file of records and produces an output file, containing the records of the input file ordered as specified. Up to sixteen sort keys may be specified, with each key independently being of ASCII or numeric type, and ordered in ascending or descending sequence. Key types may be intermixed and ASCII keys may be of any practical length (65,535 byte limit). The order of the sorted output is insensitive to the position and order of the sort keys within the record; that is, a primary sort key may physically follow a secondary sort key, within the record. Variable length ASCII records (each terminated by a carriage return character), are supported. Variable length records which are too short to accomodate all of the sort keys, are logically extended with zeroes, so that all sort keys are defined. The zeroes are not written to the output file. The size of the input file is limited by the amount of disk space available for the intermediate and output files. The intermediate and output files may reside on disks other than the input file. The size of each intermediate file is a function of the userspace available to SORT and the number and length of the sort keys used in ordering the input file. The number of intermediate files is a function of the number of records in the input file. Copyright (c) 1981 Industrial Software SORT User's Guide -- Preliminary (5/1/81) Page 2 Definitions Some Definitions Consider a telephone directory. The directory has numerous listings, each of which is independent of any other listing. Each listing contains the same type of information, which is ordered the same for all listings. The information in a telephone directory would be stored in a FILE, and each listing in the directory would be a RECORD. The individual pieces of information comprising each record are called FIELDs, and they are of the same length, type, and order for each record within the file. Let's take a closer look at a directory listing. The listing contains a name, an address, and a telephone number. Directories are commonly ordered by last name, then first name; so, the name field is actually two fields. If the directory is to be used for mailings, the mail routing code portion of the address is of interest; so, the address field is also two fields: the exact address, and the mail routing code. In summary, then, we can describe a record in a directory file with this picture: ------------ ----------- --------- --------- --------- | first name | last name | address | routing | phone # | ------------ ----------- --------- --------- --------- 14 bytes 12 bytes 18 bytes 5 bytes 16 bytes A sort KEY is that portion of a field which is used to direct the ordering of records during the sorting process. For instance, in ordering the directory by last and first name, two sort keys would be used. The first key would be the last name field, while the second key would be the first ten bytes of the first name field (usually more than enough to uniquely identify 100 Smith's or Brown's). Copyright (c) 1981 Industrial Software SORT User's Guide -- Preliminary (5/1/81) Page 3 Examples SORT is directed to sort the file of listings with the following language [1] SORT USING phonelist.dat RECORD LENGTH 65; [2] ON ASCENDING KEY (ASCII,15,12); last name [3] ON ASCENDING KEY (ASCII,1,10); first name [4] GIVING phonelist.srt [Example 1] [Text in square brackets is not part of the example, but used in talking about the example. In addition, any text following a semicolon on a SORT command line, is treated as commentary, and does not otherwise influence the SORT program.] The file of directory listings is called phonelist.dat (line 1), and the sorted output file is to be called phonelist.srt (line 4). The primary sort key (line 2) is to be ordered in ascending sequence, is an ASCII string of characters, begins with the 15th byte of the record, and is 12 bytes in length. The secondary sort key (of importance when two primary keys are identical) is also to be ordered in ascending sequence, is also an ASCII string of characters, begins with byte 1 of the record (where bytes are numbered starting at 1), and is 10 bytes in length. Each record is fixed in length, at 65 bytes, as noted on line 1. Examples 1a and 1b, below, are sample listings of a phonelist.dat file and a phonelist.srt file, respectively. In these examples, the primary sort key is contained within the second field of the record, rather than the first field. Note that in ordering 'Alex' and 'Alex K', a space is considered earlier in the 'alphabet' than 'K'. Note also that 'Alexandria P' and 'Alexandria A' are not ordered as you would expect: this is because the 'P' and 'A' are not within the specified 10 byte sort key. (When two records have identical sort keys, the ordering of the records in the output file is undefined.) Alex K Brown 9876 Figaro Ave. 40596 413/999-2298 John Smith 123 Main 92667 555-1234 Fred Smith 100W 200N 81234 805/555-9123 Alex Brown 67 Chestnut 09876 617/555-0154 Alexandria P Brown 4563 S. Gatefield 95123 853-1234 Alexandria A Brown 25 Bristol S. 92664 750-1119 Matthew Crittendon 976 Alcove 23970 M Zibisk 1 Query Ln. 00000 C Apple 12987 Silicon Vly.95196 415/555-4321 [Example 1a] C Apple 12987 Silicon Vly.95196 415/555-4321 Alex Brown 67 Chestnut 09876 617/555-0154 Copyright (c) 1981 Industrial Software SORT User's Guide -- Preliminary (5/1/81) Page 4 Examples Alex K Brown 9876 Figaro Ave. 40596 413/999-2298 Alexandria P Brown 4563 S. Gatefield 95123 853-1234 Alexandria A Brown 25 Bristol S. 92664 750-1119 Matthew Crittendon 976 Alcove 23970 Fred Smith 100W 200N 81234 805/555-9123 John Smith 123 Main 92667 555-1234 M Zibisk 1 Query Ln. 00000 [Example 1b] The same file of directory listings may be used to produce a file, ordered by mail routing code (a chore which can often expedite bulk mailings), by using the following: [1] SORT USING phonelist.dat RECORD LENGTH 65; [2] ON ASCENDING KEY (ASCII,45,5); [3] GIVING maillist.srt [Example 2] Since all of the directory file is ASCII data, it is reasonable to enter it using a line editor, and list it to a hardcopy device, using the LIST command of SDOS. However, if this is done, each record will terminate with a carriage return, and SORT must be told this: [1] SORT USING phonelist.dat RECORD LENGTH VARIABLE; [2] ON ASCENDING KEY (ASCII,15,12); last name [3] ON ASCENDING KEY (ASCII,1,10); first name [4] GIVING phonelist.srt [Example 3] Note that the only difference between Example 1 and Example 3 is that RECORD LENGTH is now declared VARIABLE (line 1), indicating that each record is terminated with a carriage return. The variable record length feature is particularly useful when each record is truly of variable length. Say, for example, that you have a file of English words with brief definitions, which you wish to order in ascending sequence. Since the English word can be found in the first 10 bytes of the record (which is of unknown length), the file can be easily sorted into a glossary arranged in alphabetical order. Copyright (c) 1981 Industrial Software SORT User's Guide -- Preliminary (5/1/81) Page 5 Examples [1] SORT USING words.dat RECORD LENGTH VARIABLE; [2] ON ASCENDING KEY (ASCII,1,10); [3] GIVING glossary.dat [Example 4] SORT is by no means limited to ASCII sort keys-- the keys may also be of NUMERIC type. NUMERIC sort keys are implicitly 6 bytes in length and correspond to numbers placed in a file using the WRITE statement of Software Dynamic's Structured BASIC. Records containing any non-ASCII fields must be fixed in length, as a portion of the data may otherwise be mistaken for a carriage return. NUMERIC sort keys may be either positive or negative, integer or floating point numbers. Given a file of random numbers, the following will order the file in descending sequence [1] SORT USING random.dat RECORD LENGTH 6; [2] ON DESCENDING KEY (NUMERIC,1); [3] GIVING ordered.dat [Example 5] As a more ambitious example, consider a file of bookkeeping journal records. Each record contains a transaction date, an account number, an amount, and a memorandum ------ ----------- -------- ------------ | date | account # | amount | memorandum | ------ ----------- -------- ------------ 6 6 6 30 <--- bytes Suppose that you want to order the file by account, then date within account, then amount within date (with amount being in descending order), and then by memorandum within amount. The following accomplishes this [1] SORT USING journal.dat RECORD LENGTH 48; [2] ON ASCENDING KEY (NUMERIC,7); account [3] ON ASCENDING KEY (NUMERIC,1); date [4] ON DESCENDING KEY (NUMERIC,13); amount [5] ON ASCENDING KEY (ASCII,19,30); memorandum [6] GIVING report.dat [Example 6] Note that while the memorandum field of each record may be considered as logically variable in length, that field must Copyright (c) 1981 Industrial Software SORT User's Guide -- Preliminary (5/1/81) Page 6 Examples actually be fixed in length, since the other fields of the record are numeric. Sometimes it is desirable to retain the logging information which SORT produces (or merely suppress it from listing to the console). Example 7 (line 5), below, a variation on Example 3, illustrates this by directing the logging information to the file phonelist.log (see Appendix B) [1] SORT USING phonelist.dat RECORD LENGTH VARIABLE; [2] ON ASCENDING KEY (ASCII,15,12); last name [3] ON ASCENDING KEY (ASCII,1,10); first name [4] GIVING phonelist.srt; [5] LOGGING phonelist.log [Example 7] When large files are sorted, temporary intermediate working files are written to disk. If calculation (see Appendix D), or experience indicates that the work files will not fit on the default disk (disk:), then another disk may be specified for the use of the work files. Example 8 (line 6), below, illustrates how to specify a working disk. [1] SORT USING phonelist.dat RECORD LENGTH VARIABLE; [2] ON ASCENDING KEY (ASCII,15,12); last name [3] ON ASCENDING KEY (ASCII,1,10); first name [4] GIVING phonelist.srt; [5] LOGGING phonelist.log; [6] WORKING wd0: [Example 8] Note that the WORKING option pertains only to the sort work files, and not to the USING or GIVING files. Furthermore, the work files will be automatically deleted upon normal termination of SORT. In the event that SORT completes abnormally, files of the name SORTWORKnn may exist on the specified (or default) working disk: they may be deleted. In special cases, several programs may be grouped to form an applications package. One program in the group may, for instance, create a data file, such as the file phonelist.dat, below, and chain to SORT to sort the file. Another program in the group might format the sorted file in a particular manner, so that it is more easily used by people. To get SORT to return to a program other than the SDOS command interpreter, the RETURNING option (below), must be used. (Note that if SORT abnormally terminates, return will be effected to the SDOS command interpreter, regardless of the RETURNING option.) Example 9 Copyright (c) 1981 Industrial Software SORT User's Guide -- Preliminary (5/1/81) Page 7 Examples shows how the RETURNING option is specified. If the applications package does not wish the SORT options to be supplied from the console, it may place the options in a file named SORTPARAMETERS, which must reside on the default disk (disk:). If the file SORTPARAMETERS exists, the SORT language is taken from that file, rather than from the console. Example 9 (line 7) lists the contents of an instance of SORTPARAMETERS. [1] SORT USING phonelist.dat RECORD LENGTH VARIABLE; [2] ON ASCENDING KEY (ASCII,15,12); last name [3] ON ASCENDING KEY (ASCII,1,10); first name [4] GIVING phonelist.srt; [5] LOGGING phonelist.log; [6] WORKING wd0:; [7] RETURNING phonelist [Example 9] Copyright (c) 1981 Industrial Software SORT User's Guide -- Preliminary (5/1/81) Page 8 Error Codes Appendix A Error Codes ____ _______^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H Code Meaning 400 Illegal sort parameter 401 Illegal sort direction 402 Missing ON sub-parameter 403 Illegal sort key type 404 Illegal ON sub-parameter 405 More than 16 sort keys 410 Unexpected EOF in sort parameter file 411 ()'s don't match 412 Illegal sort key descriptor 420 At least one sort key is required 421 USING or GIVING file name missing 422 Record length must be fixed if any key type is not ASCII 423 Sort key length error 424 Record lengths conflict 430 Internal sort error: bad key type 431 EOF on sort intermediate files 432 Internal sort error: linkage error 433 Extract phase: wrong length record 1001 Normal EOF in parameter file Copyright (c) 1981 Industrial Software SORT User's Guide -- Preliminary (5/1/81) Page 9 Logging Appendix B Logging Format Below is the contents of the file phonelist.log, created in Example 7 (line 5) SORT Version 1.0h3, Copyright (C) 1980 Industrial Software Sort Parameters: Input file: phonelist.dat Record length variable Maximum record length 200 bytes Keys per record 2 Output file: phonelist.srt Logging to: phonelist.log Working disk: disk: Working space available 14182 bytes extract keys: 0.53 seconds records read 9 sort final file: 0.28 seconds output final file: 0.73 seconds entire sort: 1.95 seconds records written 9 sort time per record 216.6 mSec Note that the 'Working space available' figure will vary according to the size of the userspace. Note also that all timings are only approximate, and they will vary (although, not by much) from instance to instance of the same sorting task. Copyright (c) 1981 Industrial Software SORT User's Guide -- Preliminary (5/1/81) Page 10 Logging Below is the contents of the file phonelist.log, created in Example 8 (line 5) SORT Version 1.0h3, Copyright (C) 1980 Industrial Software Sort Parameters: Input file: phonelist.dat Record length variable Maximum record length 200 bytes Keys per record 2 Output file: phonelist.srt Logging to: phonelist.log Working disk: wd0: Working space available 14166 bytes extract keys: 0.58 seconds records read 9 sort final file: 0.28 seconds output final file: 0.75 seconds entire sort: 1.93 seconds records written 9 sort time per record 214.8 mSec Note that a 'Working disk' has been specified, overriding the default of 'disk:'. Copyright (c) 1981 Industrial Software SORT User's Guide -- Preliminary (5/1/81) Page 11 Logging Below is the contents of the file phonelist.log, created in Example 9 (line 5) SORT Version 1.0h3, Copyright (C) 1980 Industrial Software Sort Parameters: Input file: phonelist.dat Record length variable Maximum record length 200 bytes Keys per record 2 Output file: phonelist.srt Logging to: phonelist.log Working disk: wd0: Returning to: phonelist Working space available 14166 bytes extract keys: 0.51 seconds records read 9 sort final file: 0.28 seconds output final file: 0.76 seconds entire sort: 1.88 seconds records written 9 sort time per record 209.2 mSec Note that a 'Returning to' program has been specified, overriding the standard return to the SDOS command interpreter. Copyright (c) 1981 Industrial Software SORT User's Guide -- Preliminary (5/1/81) Page 12 BNF of SORT Language Appendix C BNF for SORT commands This appendix contains a definition of the commands which SORT can be expected to accept. The information presented here is for completeness only; skipping this section is recommended, unless the reader is of stout heart. The formal notation used here is called BNF, short for Backus-Nauer Form (see "Programming Languages: History and Fundamentals", Jean E. Sammet, Prentice-Hall, 1969). Briefly, the notation is uppercase text denotes a required word lowercase text within < >'s denotes a syntactic unit ::= facilitates definition of a syntactic unit | indicates choice (logical OR) note that a syntactic unit may be defined in terms of itself, in which case, the definition is arbitrarily long juxtaposition denotes concatenation of the two elements The first line, below, is read A is defined as either the word SORT followed by an instance of , or by an instance of , standing alone. The second group of lines is read A is defined as either an instance of followed by the word AND followed by another instance of ; an instance of followed by another instance of ; or an instance of , by itself. Since two of the possible forms of are defined in terms of , an actual may be arbitrarily long, terminating when the third choice (, by itself) is selected. Note that not all syntactic units are defined (, for example); these units have an intuitive definition. Note, also, that while the language definition below, yields an arbitrarily long , the actual accepted by SORT is somewhat constrained. Copyright (c) 1981 Industrial Software SORT User's Guide -- Preliminary (5/1/81) Page 13 BNF of SORT Language ::= SORT | ::= AND | | ::= | | | | | | ::= USING ::= ::= RECORD LENGTH ::= | ::= VARIABLE | VARIABLE ::= MAX | WITH MAX ::= GIVING ::= ::= LOGGING TO | LOGGING ::= ::= WORKING WITH | WORKING ::= ::= RETURNING TO | RETURNING ::= ON KEY ::= ASCENDING | DESCENDING ::= ( , ) ::= ASCII | NUMERIC ::= BINARY ::= , | Copyright (c) 1981 Industrial Software SORT User's Guide -- Preliminary (5/1/81) Page 14 BNF of SORT Language A TOKEN is anything bounded by tab, space, end of line, or semi-colon. Leading tabs and spaces are removed from the front of the input line until the input line is exhausted, or a non- tab/space is encountered. A token may be the null string (indicating the end of the last command line), or it may be a semi-colon (indicating the end of a command line which is not the last command line). A token may also contain SUBTOKENs, in which case the token is enclosed in parentheses, and the subtokens are separated by commas. Note that tokens containing subtokens must occur completely on one command line, and may contain no tabs or spaces. Processing of command lines proceeds as follows: Remove and return the next token from the input line. If the next token is a semi-colon, repeatedly fetch the next line until the next token is not a semi-colon. If the next token is the null string, return an END OF FILE condition. If there are no more input lines following the line containing a semi-colon, then an unexpected END OF FILE error is returned. Copyright (c) 1981 Industrial Software SORT User's Guide -- Preliminary (5/1/81) Page 15 BNF of SORT Language Appendix D Sorting: Method, Timings, and Calculations N.B.: this section is still incomplete Copyright (c) 1981 Industrial Software