| Section | Number |
| :---: | :---: |
| Introduction | 1 |
| Military Products Division | 2 |
| PROM | 3 |
| PLE ${ }^{\text {TM }}$ | 4 |
| PAL®/HAL® Circuits | 5 |
| System Building Blocks/HMSI ${ }^{\text {TM }}$ | 6 |
| FIFO | 7 |
| Memory Support | 8 |
| Arithmetic Elements and Logic | 9 |
| Multipliers/Dividers | 10 |
| 8-Bit Interface | 11 |
| Double-Density PLUS ${ }^{\text {TM }}$ Interface | 12 |
| ECL10KH | 13 |
| General Information | 14 |
| Advanced Information | 15 |
| Package Drawings | 16 |
| Representatives/Distributors | 17 |

## Table of Contents MULTIPLIERS/DIVIDERS

- Contents for Section 10
- Multiplier/Divider Selection Guide
- Five New Ways to Go Forth and Multiply
- SN54/74S508 8x8 Multiplier/Divider
- SN74S516 16x16 Multiplier/Divider
- SN54/74S556 Flow-Thru ${ }^{\text {TM }}$ Multiplier Slice
- SN54/74S557 8x8 High Speed Schottky Multipliers
- SN54/74S558 8x8 High Speed Schottky Multipliers
- Die Configuration

## Multiplier/Divider Selection Guide

## Co-Processor Multiplier/Divider with Accumulator

|  | PART NUMBER | MAX MULTIPLICATION TIME/ <br> MAX DIVISION TIME | PINS |
| :---: | :---: | :---: | :---: |
| 8 Bits | SN74S508 <br> SN54S508 | $0.8 \mu \mathrm{~s} / 2.2 \mu \mathrm{~s}$ | 24 |
| 16 Bits | SN74S516 | $1.5 \mu \mathrm{~s} / 3.5 \mu \mathrm{~s}$ | 24 |

## Cray Multipliers

| DESCRIPTION | PART NUMBER | MAX DELAY | PINS |
| :---: | :---: | :---: | :---: |
| $8 \times 8$ Multiplier (latched) | SN74S557 | $60 \mathrm{~ns}\left(\mathrm{X}_{\mathrm{i}}, \mathrm{Y}_{\mathrm{i}}\right.$, to $\left.\mathrm{S}_{15}\right)$ | 40 |
| $8 \times 8$ Multiplier (latched) | SN54S557 | 60 ns | 40 |
| $8 \times 8$ Multiplier | SN74S558 | 60 ns | 40 |
| $8 \times 8$ Multiplier | SN54S558 | 60 ns | 40 |
| $16 \times 16$ Multipliers | SN74S556 | 90 ns | 84 |

# Five New Ways to Go Forth and Multiply 

## Chuck Hastings

## Our Multiplier Population Explosion

Recently it has seemed as if every time you turned around Monolithic Memories was announcing another new multiplier. Want to catch your breath, and find out where each of these fits into the overall scheme of things? Read on.

Actually, there have been five new multipliers in all within the last three years, plus two which had previously been available for several years. In time order of introduction, these are:

| Part No. | Description ${ }^{\text {A }}$ |
| :---: | :---: |
| 57/67558 | 150-nsec $8 \times 8$ Flow-Through Cray Multiplier |
| 57/67558-1 | 125-nsec $8 \times 8$ Flow-Through Cray Multiplier ${ }^{\text {B }}$ |
| 54/74S508 | 8-Bit Bus-Oriented Sequential Multiplier/Divider |
| 54/74S558 | 60-nsec $8 \times 8$ Flow-Through Cray Multiplier |
| 54/74S557 | 60-nsec $8 \times 8$ Flow-Through Cray Multiplier with Transparent Output Latches |
| 54/74S516 | 16-Bit Bus-Oriented Sequential Multiplier/Divider |
| 54/74S556 | 90-nsec $16 \times 16$ Flow-Through Cray Multiplier with Transparent Input and Output Latches |

NOTES: A. Times are worst-case times for commercial-temperature-range parts.
B. Obsolete. 54/74S558 replaces these in both new and existing designs.

You will notice that the above parts fall into two categories: flow-through Cray multipliers, and bus-oriented sequential multiplier/dividers. Although all of these parts get referred to rather casually as "multipliers," there are major differences between the two general types; see Table 1 below.

## The Cray Multipliers

The essential idea of a Cray multiplier, as originally put together by Seymour Cray in the late 1950s with discrete logic at Control Data Corporation, is to wire up an array of full adders in the form of a binary-arithmetic-multiplication pencil-and-paper example. ${ }^{3}$ That is, everywhere that there is a " 1 " or a " 0 " in a longhand binary-multiplication example, the Cray type of multiplication uses a full adder. One may visualize a Cray multiplier functionally as a "diamond," as follows:


Figure 1. Pencil-and-Paper Analogy to Cray-Multiplier Operation
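
The pencil-and-paper analogy can be made concrete with a short software model. The sketch below is only an illustration of the idea: it handles unsigned operands, ripples its carries row by row, and uses the hypothetical names `full_adder` and `array_multiply`. It is not a description of the 'S558 internals, which also rely on Booth multiplication and Wallace-tree addition.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder: returns (sum_bit, carry_out)."""
    total = a + b + carry_in
    return total & 1, total >> 1


def array_multiply(x, y, width=8):
    """Unsigned multiply built only from AND gates and full adders.

    Each row corresponds to one line of the longhand example: the
    multiplicand ANDed with one multiplier bit, shifted left by the
    bit position, and added into the running sum digit by digit.
    """
    acc = [0] * (2 * width)          # running sum, least-significant bit first
    for j in range(width):           # one row per multiplier bit
        y_bit = (y >> j) & 1
        carry = 0
        for i in range(width):       # one full adder per digit of the row
            partial = ((x >> i) & 1) & y_bit
            acc[i + j], carry = full_adder(acc[i + j], partial, carry)
        acc[j + width] = carry       # carry out of the row's top adder
    return sum(bit << k for k, bit in enumerate(acc))
```

For an $8 \times 8$ multiplication this model uses 64 full-adder positions, one for every digit that would appear in the longhand example.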

|  | Flow-Through Cray Multiplier | Bus-Oriented Sequential Multiplier/Divider |
| :---: | :---: | :---: |
| Role in System | Building-block role - as many as 34 parts used in one superminicomputer (NORD-500 from Norsk Data ${ }^{1}$). | Co-processor role - one, or occasionally two, parts used in one microcomputer ${ }^{2}$. |
| Internal Operation | Static arithmetic-logic network; multiplies without being clocked, ${ }^{3}$ using eight bits of the multiplier at a time. | State machine; requires clocking to operate; contains edge-triggered registers; sequenced by a state counter; multiplies using two bits of the multiplier at a time ${ }^{4}$. |
| External Control | Controlled by several mode-control input signals. | Controlled by sequences of micro-opcodes which come from a microprocessor, a registered PAL, or some other sequential-control device. |
| Package | 40-pin DIP ('S557/8); 84-pin LCC or 88-pin PGA ('S556) | 24-pin DIP. |
| Operations Performed | Can only perform multiplication. | Can perform multiplication, division, and multiplication-with-accumulation. |
| Storage Capabilities | Either no storage capabilities ('558 types), or optional storage for the double-length product only ('557 type), or full product and input storage ('556 type). | Four full-length registers; capable of storing both input operands and the double-length product. |
| Second Sources | $8 \times 8$: multiple-sourced (AMD, Fairchild, Monolithic Memories). | Sole-sourced; only bipolar dividers on the market. |
| Where Used | Initial usage has been in high-end minicomputers, array processors, and signal processors. | Initial usage has been in industrial-control microcomputers, digital modems, military avionics, CRT graphic systems, video games, and cartographic analysis systems. |
| Future Prospects | Potential large market today since these parts are now low-cost and multiple-sourced, and should be used in all new minicomputer designs! | Potential huge world-wide market for enhancement of microprocessor, bit-slice processor, and microcomputer capabilities, and for small-scale signal processing! |

Table 1. A comparison of the two types of Monolithic Memories Multipliers

Our 57/67558, introduced in the mid-1970s, was the original single-chip Cray multiplier. To achieve what was for that time very high performance for a Schottky-TTL-technology part, the internal design of the 57/67558 also exploited other speed-freak multiplication techniques such as Booth multiplication ${ }^{4}$ and Wallace-Tree addition ${ }^{5}$. All of these techniques achieve increased speed through extensive parallelism, and can be used at the system level as well as within LSI components. Subsequently, process improvements made it possible to offer a faster final-test option, the 57/67558-1, which attained a sales-volume level essentially equal to that of the original part.

About five years ago, AMD paid us the sincere compliment of second-sourcing these parts with the 75-nsec 25S558. Three years ago, we returned the compliment with the 60-nsec 54/74S558. All of these '558 parts, and the 70-nsec 54/74F558 announced by Fairchild, are fully compatible drop-in equivalents except for the variations in logic delay.


When AMD introduced the 25S558, they introduced along with it the 80-nsec 25S557, a "metal option" of the same basic design with "transparent" output latches to hold the double-length product. "Transparent" means that the latches go away when you don't want them there; a latch-control line like that of the 54/74S373 controls whether these output latches store information, or simply behave as output buffers. Anyway, when we introduced our 54/74S558, we followed it within a few weeks with the 60-nsec 54/74S557, which is a much faster drop-in replacement for AMD's part. And subsequently, Fairchild has announced a 70-nsec 54/74F557.

Because AMD's 'S557 has the output latches implemented in TTL technology after the ECL-to-TTL converters, whereas our 'S557 has them implemented in ECL technology before the conversion, the latches operate much faster in ours. Our 'S557 is typically only about a nanosecond slower than our 'S558, whereas the logic-delay difference between AMD's two parts is considerably greater. Consequently, our margin of superiority over AMD for the 'S557 is even greater than for the 'S558.

More recently, we introduced the 90-nsec 'S556, which is a $16 \times 16$ direct size-upgrade of the 'S557/8 architecture, with the addition of input latches. In a "pipelined" mode, an 'S556 can produce a new 32-bit product every 75 nsec.

'S557/8 Cray multipliers come in a 40-pin dual-inline package, either ceramic or plastic. Worst-case power-supply current is 280 mA. The 'S556 comes in your choice of an 84-pin LCC (Leadless Chip Carrier) or an 88-pin PGA (Pin-Grid Array) package. Worst-case power-supply current is 800 mA (900 mA over military temperature range). The data-bus outputs can sink up to 8 mA $\mathrm{I}_{\mathrm{OL}}$ for all of these multipliers.

References 5 and 6 discuss technical approaches to using Cray multipliers in high-performance minicomputers. The 'S558, together with PROMs organized in a "Wallace-tree" configuration, can sail right along at the rate of four $56 \times 56$ multiplications every microsecond, on the basis of fixed-point arithmetic with no renormalization. (See table 7 on page 16 of reference 5; the multiplication time is 238 nsec for a "division step," which is a fixed-point multiplication, and 319 nsec for a floating-point multiplication where extra time is required for renormalization and correction of the exponent of the product.) 34 'S558s or 'S557s are required to perform this multiplication if the computer system architecture does not call for the computation of the least-significant half of the double-length product; 49 are required if it does.
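
The two part counts can be checked with a little arithmetic. In the sketch below, each 56-bit operand is cut into seven 8-bit slices, so a full product needs one $8 \times 8$ multiplier per pair of slices. The rule used to discard low-order partial products (`CUTOFF_DIAGONAL`) is an assumption chosen because it reproduces the figure of 34; reference 5 describes the arrangement actually used.

```python
SLICE_BITS = 8
OPERAND_BITS = 56
slices = OPERAND_BITS // SLICE_BITS          # 7 eight-bit slices per operand

# Every pair (i, j) of slices needs one 8x8 multiplier; its 16-bit
# partial product has weight 2**(8 * (i + j)).
all_products = [(i, j) for i in range(slices) for j in range(slices)]

# Assumption: a partial product is kept if it reaches into the upper
# 56 bits, or sits directly below them so that its carries are not lost.
CUTOFF_DIAGONAL = 5
kept = [(i, j) for (i, j) in all_products if i + j >= CUTOFF_DIAGONAL]

print(len(all_products))  # 49
print(len(kept))          # 34
```

Under this assumption the 15 partial products on the five lowest-weight diagonals are the ones left out when only the most-significant half of the product is computed.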


The "local" architecture of the multiplier section of a digital system can take two rather different forms. A minicomputer, which executes an unpredictable mixture of arithmetic and logical instructions one after the other, typically needs to be able to get the complete multiplication over and done with before going on to the next program step - which is probably not another multiplication. An array processor or digital correlator, however, tends to do very regular iterative computations; and the performance of such a system can often be greatly increased by a technique called "pipelining," in which the arithmetic unit consists of stages with registers or latches in between each stage, and partial computational results move from one stage to the next on each clock.

The "flow-through" architecture of the 'S558 works equally well in synchronous or asynchronous pipelined systems, but registers or latches must be provided externally. The 'S557, however, is actually a superset of the 'S558, and the added internal-output-latch feature adapts it particularly well to pipelined systems. The 'S556 provides latches at both ends.


Even a smaller-scale system can make effective use of these parts. To return to the case of $56 \times 56$ multiplication, which corresponds to the word-length needed for multiplying mantissas in several popular floating-point-number formats, an iterative clocked scheme using just seven $8 \times 8$ multipliers, some adders, and an accumulator register can form the entire 112-bit double-length product in just seven multiply/add cycles. A number of mid-range minicomputers today multiply in this manner. The multipliers are configured as suggested by the following block diagram:


Figure 2. 8x56 Cray Multiplier In Diamond Representation

There is even an occasional 8-bit or 16-bit microprocessor-based system with a need for very fast multiplication, where 'S557/8s or 'S556s may get used as microprocessor peripherals ${ }^{7,8}$. Digital-video systems, in particular electronic games, with "vector graphic" capabilities are one example.
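
To return once more to the seven-cycle $56 \times 56$ scheme of Figure 2, its data flow can be sketched in software. This is a minimal model under stated assumptions: operands are treated as unsigned, the name `multiply_56x56` is hypothetical, and the adders and accumulator register are represented by ordinary integer addition.

```python
MASK8 = 0xFF


def multiply_56x56(a, b):
    """Form the 112-bit product of two unsigned 56-bit values in seven passes."""
    a_slices = [(a >> (8 * k)) & MASK8 for k in range(7)]
    accumulator = 0
    for cycle in range(7):                       # one multiply/add cycle per byte of b
        b_byte = (b >> (8 * cycle)) & MASK8
        # Seven 8x8 multipliers work side by side, giving an 8x56 product.
        row = sum((a_slices[k] * b_byte) << (8 * k) for k in range(7))
        accumulator += row << (8 * cycle)        # adders plus accumulator register
    return accumulator
```

Each pass consumes one byte of the second operand, which is why seven multiply/add cycles are enough for the whole double-length product.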

The world of 'S556/7/8 applications has turned out to include all sizes of minicomputers, digital video systems, and signal processors - FFT (Fast Fourier Transform) processors, voice recognition equipment, radar systems, digital correlators and filters, electronic seismographs, brain and body scanners, and so forth. And there are many unexpected off-beat applications, such as real-time data-rescaling circuits in instruments, altogether too numerous to list here. After all, an 'S556 can multiply two 16-bit numbers together and output their entire 32-bit product in 90 nsec worst case... less time than it would take a speeding bullet to move the distance equal to the thickness of this piece of paper. How's that for Supermultiplier?

## The Multiplier/Dividers

The Monolithic Memories 'S516 and 'S508 are state-of-the-art TTL-compatible intelligent peripherals for microprocessors, somewhere between arithmetic sequential circuits and specialized bipolar microprocessors. The 'S516 and 'S508 each can perform any of 28 different multiply and multiply-and-accumulate instructions, plus any of 13 different divide instructions, at bipolar speeds under the control of an internal state counter. (See Figure 2 of the 'S516 data sheet.) The state counter's sequence is in turn guided by 3-bit instruction codes which are external inputs to the 'S516/508. The 'S516 computes with 16-bit binary numbers, and the 'S508 computes with 8-bit binary numbers, as the part numbers none-too-subtly imply.

A 16-bit bidirectional data bus connects the 'S516 with the outside world for bringing in multipliers, multiplicands, dividends, and divisors; and returning products, quotients and remainders. It also has clock (CK) and run/wait (GO) inputs, and an overflow indication (OVR) output.

The 'S508 has all of the above inputs and outputs also, except that it has only an 8-bit bidirectional data bus. Since it comes in the same 24-pin package as the 'S516, it obviously has eight more pins available for other purposes. Four of these are used to bring out the internal-state-counter value; one each is used for a completion (DONE) status output, an output-enable control (OE) input, and a master-reset ($\overline{\mathrm{MR}}$) control input; and one is not used at all.

A simple, general interfacing scheme can be used to team an 'S516 with any of the currently popular 16-bit microprocessors, or an 'S508 with any 8-bit microprocessor. (See Figure 7 of the 'S516 data sheet.) With a couple of extra interface circuits, an 'S516 can also be interfaced to an 8-bit microprocessor. Particularly if the system software is written in a highly-structured language such as PASCAL or FORTH, an 'S516/508 can be retrofitted into an existing system with a large gain in performance and very little impact on either hardware or software - calls to the previous software-implemented one-step-at-a-time multiply and divide subroutines are simply rerouted to substitute a command from the microprocessor to the 'S516/508 to accept an operand and start its operation sequence.

The 'S516 and 'S508 are in fact two different "metal options" of one basic design; the 'S516 has twice as many data bits in each internal register. The 'S516 and 'S508 both have a worst-case clock rate of 6 MHz (commercial) or 5 MHz (military); the typical rate is 8 MHz. The simplest complete twos-complement multiplication instruction can be performed in nine clock cycles by an 'S516 ($16 \times 16$), or in five by an 'S508 ($8 \times 8$), since 2-bits-at-a-time Booth multiplication is used; ${ }^{4}$ thus, the worst-case time required by the 'S516 to multiply in this mode is $1.5 \mu \mathrm{sec}$ for a commercial part, and for an 'S508 it is 833 nsec. On the same basis, 32/16 division can be done in 21 clock cycles, or $3.5 \mu \mathrm{sec}$ worst-case, by an 'S516; and 16/8 division can be done in 13 clock cycles, or $2.2 \mu \mathrm{sec}$ worst-case, by an 'S508.
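
The worst-case times quoted above follow directly from the cycle counts and the 6-MHz commercial clock rate. The short calculation below reproduces them; the helper name `op_time_ns` is a hypothetical one used only for this illustration.

```python
def op_time_ns(clock_cycles, clock_mhz):
    """Time for an operation, in nanoseconds, at the given clock rate."""
    return clock_cycles * 1000.0 / clock_mhz


WORST_CASE_MHZ = 6.0   # worst-case clock rate, commercial parts

# Cycle counts as stated in the text.
cases = {
    "'S516 16x16 multiply": 9,
    "'S508 8x8 multiply": 5,
    "'S516 32/16 divide": 21,
    "'S508 16/8 divide": 13,
}

for name, cycles in cases.items():
    print(f"{name}: {op_time_ns(cycles, WORST_CASE_MHZ):.0f} ns")
```

The 'S508 division works out to about 2167 nsec, which the text rounds to $2.2 \mu \mathrm{sec}$.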

An 'S516/508 can perform either positive or negative multiplication or multiply-accumulation, and many of the instructions provide for "chaining" of successive computations to eliminate extra operand transfers on the bus; these features further enhance the computational speed of the 'S516/508 in particular applications. Arithmetic can be either integer or fractional with respect to positioning of the results.

An 'S516 can powerfully enhance the capabilities of any present-day 16-bit or 8-bit microprocessor in a compute-bound application. In fact, it can be used in any digital system where there is a need to multiply and divide on a bus. An 'S508 can likewise enhance the capabilities of any 8-bit microprocessor.


The 'S516 comes in an industry-standard 600-mil 24-pin dual-inline package, modified to include an integral aluminum heatsink which does not add appreciably to the package height. It requires only +5 V and ground power connections, and draws a worst-case power-supply current of 450 mA (commercial) or 500 mA (military). Power consumption is greatest at cold temperatures, and decreases substantially as operating temperature increases. The 16 databus inputs require at most 0.25 mA input current; the other inputs require at most 1 mA. The 16 databus outputs can sink up to 8 mA $\mathrm{I}_{\mathrm{OL}}$. The 'S508 also fits the above description, except that its worst-case power-supply current is 380 mA (commercial) or 400 mA (military), and it has only 8 databus inputs and outputs.

In describing applications of these parts, it is difficult to know where to start - they can be used in almost any design where a microprocessor can be used, and you know how many places that is today. So, perhaps a good starting point is to see what uses customers have thought up all by themselves. One customer even used two 'S516s in "ping-pong" mode on a single 16-bit bus! So, rather than merely speculating as to what these parts might be good for, here's a list of what Monolithic Memories' customers have already proven they are good for:

- Real-time control of heavy machinery ${ }^{9}$
- Low-cost, high-performance digital modems
- CRT graphics, including video games
- Military avionics
- Cartographic analysis

As it happens, the above are 'S516 applications, except that digital modem designs have been done with both the 'S516 and the 'S508. Several of the 'S516 designs are already in production. In each of these applications, the microprocessor could have coped all right with the computational complexity, albeit at its own less-than-tremendous speed, but an 'S516 used together with the microprocessor can provide extra muscle for handling formidable problems.


Competition? Well, since there are no second sources for the 'S516, and no competitor at present has a similar fast part capable of performing division as well as multiplication, right now the 'S516 has no direct competition. Indirectly, there are some competing parts which perform only multiplication, and would have to perform division by Newton-Raphson iteration to be usable for any application where division is required. However, the 'S516 is (as far as we know) by far the lowest-priced bipolar 16-bit multiplier, and the other microprocessor peripheral chips which can perform division as well as multiplication are relatively-slow MOS devices. In one case, an 8-bit cascadable CMOS part requires a 50% reduction in clock rate to do 16-bit arithmetic. And considerable numerical-analysis and programming sophistication are required to implement Newton-Raphson division with fixed-point operands. (It's easier with floating-point operands.) In contrast, the 'S516/508 can be easily interfaced to almost any microprocessor using one or two PALs®, and can perform either multiplication or division on command.
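
For readers unfamiliar with Newton-Raphson division, the sketch below shows the general idea: a reciprocal is refined using only multiplications and subtractions, and the quotient is then one more multiplication. It is an illustration under stated assumptions, not a description of any competing part. It uses the easier floating-point case, the function name `newton_raphson_divide` is hypothetical, and the linear first guess is a commonly used textbook choice rather than anything taken from this article.

```python
import math


def newton_raphson_divide(numerator, divisor, iterations=4):
    """Approximate numerator / divisor using multiplies and subtracts only."""
    if divisor == 0:
        raise ZeroDivisionError("divisor is zero")
    sign = -1.0 if (numerator < 0) != (divisor < 0) else 1.0
    # abs(divisor) = mantissa * 2**exponent, with 0.5 <= mantissa < 1
    mantissa, exponent = math.frexp(abs(divisor))
    # Linear first guess at 1 / mantissa (constants are precomputed).
    x = 48.0 / 17.0 - (32.0 / 17.0) * mantissa
    for _ in range(iterations):
        x = x * (2.0 - mantissa * x)     # relative error roughly squares each pass
    # Undo the normalization and apply the sign.
    return sign * abs(numerator) * math.ldexp(x, -exponent)
```

The normalization step is what makes the floating-point case comparatively easy; with fixed-point operands the designer has to manage that scaling, and the resulting precision, by hand.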

The 'S516 is so much faster than the competing MOS chips that it can even take them on for floating-point computations (which some of them are designed to do) and win. A conference paper ${ }^{10}$ describes the design of an 'S516-based S-100-bus card capable of beating an Intel 8087 2:1 on floating-point arithmetic.

Some competing parts, in particular the AMI 2811 and Nippon Electric $\mu$ PD7720, include an on-board ROM which must be mask-programmed at the factory, which makes life difficult for small companies (or even larger ones) which are trying to get a microprocessor-based product to market quickly. Also, some competing parts require sequencing by external TTL jellybeans.

And, as for using AMD/TRW 64-pin 16x16 Cray multiplier chips as microprocessor peripherals, these cost much more than the 'S516, occupy about three times the circuit-board space, multiply faster, don't divide at all except by Newton-Raphson iteration, and also require one or two "overhead" microprocessor instructions to interface for a given arithmetic operation. From a system viewpoint, when this overhead time is reckoned with, these chips provide little actual gain in multiply performance over the 'S516 at lots of extra cost, and an actual loss in divide performance: the 'S516 is much more cost-effective overall.

'S516s potentially fit into many, many places in commercial, industrial, and military electronics, particularly into small-scale real-time systems. The part is fast enough to enhance the performance of a 16-bit Motorola 68000, Zilog Z8000, or Intel 8086, as well as that of any 8-bit microprocessor. It is also fast enough to considerably improve the multiplication and division performance of 16-bit 2901-based "bit-slice" bipolar microcomputers, which are often used as processors in desktop graphics CRT terminals.

It is worth bringing the 'S516 to the attention of any designer who is developing:

- A personal computer or small business computer.
- A word processor, or a more grandiose "office automation system."
- A cruise missile, or any other "smart weapon."
- A digital modem.
- A small-scale speech-processing system. (These are very multiplication-intensive. We have one magazine article on the 'S516 in such an application. ${ }^{11}$ )
- A smart instrument, which does data conversion.
- An industrial control system, particularly one which must do many coordinate transformations.
- An all-digital studio-quality high-fidelity system.
- A cost-reduced computerized medical scanning system.
- A multiprocessor system for scientific computations. ${ }^{12}$

If an 'S516/508 is introduced into a system configured around an older microprocessor as a "co-processor" or helpmate for the microprocessor, and the application is arithmetic-intensive, the end effect can be a major upgrading of performance at the system level. ${ }^{2,7}$ Consequently, a major reason for designing these parts in is microprocessor life-cycle enhancement. In particular, many MOS microprocessors have single-length and double-length add and subtract instructions; but either they have no multiply or divide instructions at all, or else they perform their multiply and divide instructions so slowly as to jeopardize the ability of the entire system to handle its computing load in real time.

So picture, if you will, the entrepreneur or chief engineer of a firm making a successful microprocessor-based widget which has been on the market for a few months, which uses an older 8-bit microprocessor such as a 6800 or 8085 or Z80. Just when his/her sales are really taking off, here comes a new start-up competitor with a similar system, using a Motorola 68000, with added features and faster performance made possible by the 68000's 16-bit word length and multiply/divide capabilities. The 'S516 can, in this instance, serve as a "great equalizer" - it can be retrofitted into the older system as previously described, and provides even higher-speed multiplication and division than the 68000. (Enough so, actually, that there are designers using the 'S516 with the 68000.) Thus, the 'S516 can dramatically extend the life cycle of existing microcomputer systems based on microprocessors which either don't have multiplication and division instructions, or perform these operations relatively slowly.

'S508s are somewhat easier to control from a logic-design viewpoint than 'S516s, purely because they have more control inputs and outputs. However, the shorter 'S508 word length makes the part naturally fit into smaller-scale systems than those which might use an 'S516. Essentially, the 'S508 is optimized for small-scale systems.

Now that you know what these parts are, can't you think of at least half a dozen prime uses for them right in your own back yard?

## References (all available from Monolithic Memories)

1. "Combinatorial Floating Point Processor as an Integral Part of the Computer," Tor Undheim, Electro/80 Professional Program Session Record, Session 14 reprint, paper 14/1.
2. "SN54/74S516 Co-Processor Supercharges 68000 arithmetic," Richard Wm. Blasco, Vincent Coli, Chuck Hastings and Suneel Rajpal, Monolithic Memories Application Note AN-114.
3. "How to Design Superspeed Cray Multipliers with 558s," Chuck Hastings, included within the SN54/74S557/8 data sheet.
4. "Doing Your Own Thing in High-Speed Digital Arithmetic," Chuck Hastings, Monolithic Memories Conference Proceedings reprint CP-102.
5. "Big, Fast, and Simple - Algorithms, Architecture, and Components for High-End Superminis," Ehud Gordon and Chuck Hastings, Monolithic Memories Application Note AN-111.
6. "Fast $64 \times 64$ Multiplication using $16 \times 16$ Flow-Through Multipliers and Wallace Trees," Marvin Fox, Chuck Hastings and Suneel Rajpal, Monolithic Memories Conference Proceedings reprint CP-111.
7. "An $8 \times 8$ Multiplier and 8-bit $\mu$P Perform $16 \times 16$ Bit Multiplication," Shai Mor, EDN, November 5, 1979. Monolithic Memories Article Reprint AR-109.
8. "Using a $16 \times 16$ Cray Multiplier as a 16-Bit Microprocessor Peripheral to Perform 32-Bit Multiplication and Division," Chuck Hastings, Monolithic Memories Conference Proceedings reprint CP-140.
9. "The Design and Application of a High-Speed Multiply/Divide Board for the STD Bus," Michael Linse, Gary Oliver, Kirk Bailey, and Michael Alan Baxter, Monolithic Memories Application Note AN-115.
10. "Minimum Chip-Count Number Cruncher Uses Bipolar Co-Processor," C. Hastings, E. Gordon, and R. Blasco. Monolithic Memories Conference Proceedings reprint CP-109.
11. "Medium-speed Multipliers Trim Cost, Shrink Bandwidth in Speech Transmission," Shlomo Waser and Allen Peterson, Electronic Design, February 1, 1979; pages 58-65. Monolithic Memories Article Reprint AR-107.
12. "A Synchronous Multi-Microprocessor System for Implementing Digital Signal Processing Algorithms," T.P. Barnwell, III and C.J.M. Hodges, Southcon/82 Professional Program Session Record, Session 21 reprint, paper 21/4.

## 8x8 Multiplier/Divider SN54/74S508

## Features/Benefits

- Co-processor for enhancing the arithmetic speed of all present 8-bit microprocessors
- Bus-oriented organization
- 24-pin package
- 8/8 or 16/8 division in less than $2.2 \mu \mathrm{sec}$
- $8 \times 8$ multiplication in less than $0.8 \mu \mathrm{sec}$
- 28 different multiplication instructions such as "fractional multiply and accumulate"
- 13 different divide instructions
- Self-contained and microprogrammable


## Description

The SN54/74S508 ('S508) is a bus-organized $8 \times 8$ Multiplier/Divider. The device provides both multiplication and division of 2s-complement 8-bit numbers at high speed. There are 28 different multiply options, including: positive and negative multiply, positive and negative accumulation, multiplication by a constant, and both single-length and double-length addition in conjunction with multiplication. Thirteen different divide options allow single-length or double-length division, division of a previously-generated result, division by a constant, and continued division of a remainder or quotient.

The 'S508 is a time-sequenced device requiring a single clock. It loads operands from, and presents results to, a bidirectional 8-bit bus. Loading of the operands, reading of the results, and sequential control of the device are performed by a 3-bit instruction field.

The 'S508 has the additional feature that operands and results can be either integers or fractions; when it deals with fractions, automatic scaling occurs. Results can be rounded if required, and an Overflow output indicates whenever a result is outside the normally-accepted number range.

For a simple multiplication of two operands and reading of the double-length result, the device takes five clock periods - one for initialization, and four for the actual multiplication. A typical clock period is 125 ns, which gives a multiplication time of 500 ns typical for $8 \times 8$ multiplication, plus 125 ns additionally for initialization, or 625 ns in all. More complex multiplications will take additional clock periods for loading the additional operands. A simple division operation requires $8+4=12$ clock periods for a typical time of $1.5 \mu \mathrm{s}$ (16 bits / 8 bits), also plus 125 ns for initialization, or $1.625 \mu \mathrm{s}$ in all.
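
The four multiplication periods for an 8-bit multiplier are consistent with retiring two multiplier bits per clock. The sketch below shows 2-bits-at-a-time (radix-4) Booth multiplication in general terms; it is a software illustration with hypothetical names (`to_signed`, `booth_radix4_multiply`), not a description of the 'S508's internal datapath.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a twos-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def booth_radix4_multiply(x, y, bits=8):
    """Multiply two signed `bits`-bit integers, two multiplier bits per step.

    Returns (product, steps).
    """
    x = to_signed(x, bits)
    y_bits = y & ((1 << bits) - 1)         # twos-complement bit pattern of y
    product = 0
    previous_bit = 0                       # implied bit to the right of y's LSB
    steps = 0
    for position in range(0, bits, 2):
        pair = (y_bits >> position) & 0b11
        # Booth digit in {-2, -1, 0, +1, +2}
        digit = (pair & 1) + previous_bit - 2 * (pair >> 1)
        product += (digit * x) << position
        previous_bit = pair >> 1
        steps += 1
    return product, steps
```

With `bits=8` the loop runs four times, and with `bits=16` it runs eight times, so adding one initialization period gives the five-cycle and nine-cycle figures quoted for the 'S508 and 'S516.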

## Ordering Information

| PART NUMBER | PACKAGE | TEMPERATURE |
| :---: | :---: | :---: |
| SN54S508 | D24 | Military |
| SN74S508 | D24 | Commercial |

## Logic Symbol



## Pin Configuration




## NOTES:

1. X, Y are input multiplier and multiplicand.
2. X1 is the previous contents of the first rank of the X register (either the old X or a new X).
3. Fractional or integer arithmetic is specified by having the next-to-the-last operand loaded using a 5 or 6 instruction respectively. All rows beginning with "5/6" in effect represent two instructions. 5 does fractional arithmetic and 6 does integer arithmetic.
4. Z, W is a double-precision number. Z is the most significant half. Z, W represents addend upon input, and product (or accumulated sum) after multiplication.
5. $\mathrm{K}_{\mathrm{Z}}, \mathrm{K}_{\mathrm{W}}$ represents previous accumulator contents. $\mathrm{K}_{\mathrm{Z}}$ is the most-significant half.
6. $\mathrm{W}_{\text {sign }}$ is a single-length signed number, with sign extension.
7. Maximum clock cycle = 167 ns for a 6-MHz clock.
8. If $n$ instruction codes are shown at the left under "instruction sequences," the number of clock cycles at the right is $n+4$ for multiplication and $n+12$ for division.
9. The code "5/6666" represents an incomplete operation since it leaves the 'S508 in state 1 rather than in state 0, 8, or 10.

|  | SUMMARY OF SIGNALS/PINS |
| :--- | :--- |
| $\mathrm{B}_{7}-\mathrm{B}_{0}$ | Bidirectional data bus inputs/outputs |
| $\mathrm{I}_{2}-\mathrm{I}_{0}$ | Instruction (sequential control) input |
| $\mathrm{A}, \mathrm{B}, \mathrm{C}, \mathrm{D}$ | Internal-state-counter outputs |
| CK | Clock pulse input |
| $\overline{\mathrm{GO}}$ | Chip activation input |
| OE | Output enable input |
| $\overline{\mathrm{MR}}$ | Master reset input |
| OVR | Arithmetic overflow output |
| $\overline{\mathrm{DONE}}$ | Arithmetic-operation completion output |

## Description (continued)

The 'S508 device uses standard low-power Schottky technology, requires a single +5 V power supply, and is fully TTL compatible. Bus inputs require at most $250 \mu \mathrm{A}$ input current, and control and clock inputs require at most 1 mA input current. Bus outputs are three-state, and are capable of sinking 8 mA at the low logic level. The 'S508 is available in both commercial-temperature and military-temperature ranges, in a 600-mil 24-pin dual-in-line ceramic package.

## Device Operation

The 'S508 contains four 8-bit working registers. $Y$ is the multiplier register; X is the multiplicand and divisor register; W is the least-significant half of a double-length accumulator, and holds the least-significant half of the product after a multiplication operation, or the remainder after a division operation; and $Z$ is the most-significant half of this same accumulator. In addition to these registers, there is a high-speed arithmetic unit which performs addition, subtraction, and shifting steps in order to accomplish the various arithmetic operations; a loading sequencer; and a PLA control network.

Operands are loaded into the working registers in time sequence at each clock period, under the control of this sequencer. The chip-activation signal $\overline{\mathrm{GO}}$ must be LOW in order to begin the loading process and continue to the next step in the loading operation. If $\overline{\mathrm{GO}}$ is continually held HIGH, the 'S508 remains in a wait state with its outputs held in their high-impedance states, so that the other devices attached to the bus may drive it. In this condition, the 'S508 does not respond to any codes on its instruction inputs; in effect, it does not "wake up" until $\overline{\mathrm{GO}}$ goes LOW. Also, $\overline{\mathrm{GO}}$ may change only when the clock input CK is HIGH. After all of the operands are loaded, the 'S508 jumps to the multiply routine, or to the divide routine, and performs the required operations as indicated in Figure 1. After 5 clock periods for a simple multiply or 13 clock periods for a simple divide, for example, the device is ready to place the result on the bus in time sequence.

Figure 1. 'S508 Instruction Set (Partial List)


KEY:
The numbers inside the circles indicate the state of the 'S508 multiplier/divider. These states are represented by a four-bit state counter, where A is the least-significant bit of this state counter and D is the most-significant bit. These four bits are available externally on the 'S508.
The next state of the 'S508 is a function of the present state and the instruction lines. For example, if the 'S508 is at state 0 and the instruction is 0, 1, 2, or 3, then the next state is state 4 (multiply instruction); if the instruction is 4, the next state is state 5 (divide instruction); and so forth. The instructions which take the 'S508 from one state to another are indicated by the numbers written next to the state-transition path lines. "0123," for instance, implies that any of instructions 0, 1, 2, or 3 will take the 'S508 along the path marked "0123."
"X" next to a path implies that the path will be followed regardless of the value of the instruction inputs at that time. In other words, for the purpose of state transitions, X means "don't care." There are cases, however, where the particular instruction used may affect when the contents of the registers are available on the bus; see Figures 9 and 10 for contrasting examples of how this effect operates.

Figure 2. Transition Diagram for the 'S508 Multiplier/Divider

Three instruction inputs $I_{2}, I_{1}, I_{0}$, which may change only when the clock input CK is HIGH, select the required function and drive the sequencer from state to state. Thus, the action of the multiplier/divider at any clock period is a function of the machine state and the state of the control inputs. Figure 2 shows the multiply/divide state table, and all possible operations. After a Read or Round operation, the machine is driven back to state 0, and a new sequence of arithmetic operations is assumed. If a chain operation is being performed, such as accumulation of products, state 0 is bypassed, and loading of an operand or jumping to the next arithmetic operation occurs at the end of the previous arithmetic operation - at state 8 for a multiplication instruction, or at state 10 for a division instruction.

Register X is a dual-rank register, which allows the loading of an operand $X$ during the multiplication or division process. If the machine enters the loading sequence and a new $X$ operand has not been loaded, then the machine proceeds with the previously-loaded X, denoted in this text as "X1." This loading-while-processing capability allows a cycle to be saved during "chained" calculations, and also allows multiplication and division by a constant. (See Figure 13.)

Figures 3 and 4 show the codes and durations for the 41 different possible arithmetic operations. These operations can be concatenated in strings to perform complicated 2s-complement arithmetic operations at high speed. Rounding and reading of results can be performed after any operation.
Figure 5 is a block diagram of the 'S508 $8 \times 8$ Multiplier/Divider.

| OPERATION |  | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| $X1 \cdot Y$ | INS CODE <br> BUS | 0 <br> Y | MULTIPLY |  |  |  |  |  |  |
| $-X1 \cdot Y$ | INS CODE <br> BUS | 1 <br> Y | MULTIPLY |  |  |  |  |  |  |
| $X1 \cdot Y+K_{Z}, K_{W}$ | INS CODE <br> BUS | 2 <br> Y | MULTIPLY |  |  |  |  |  |  |
| $-X1 \cdot Y+K_{Z}, K_{W}$ | INS CODE <br> BUS | 3 <br> Y | MULTIPLY |  |  |  |  |  |  |
| $X \cdot Y$ | INS CODE <br> BUS | 5/6 <br> X | 0 <br> Y | MULTIPLY |  |  |  |  |  |
| $-X \cdot Y$ | INS CODE <br> BUS | 5/6 <br> X | 1 <br> Y | MULTIPLY |  |  |  |  |  |
| $X \cdot Y+K_{Z}, K_{W}$ | INS CODE <br> BUS | 5/6 <br> X | 2 <br> Y | MULTIPLY |  |  |  |  |  |
| $-X \cdot Y+K_{Z}, K_{W}$ | INS CODE <br> BUS | 5/6 <br> X | 3 <br> Y | MULTIPLY |  |  |  |  |  |
| $X \cdot Y+Z$ | INS CODE <br> BUS | 5/6 <br> X | 6 <br> Z | 0 <br> Y | MULTIPLY |  |  |  |  |
| $-X \cdot Y+Z$ | INS CODE <br> BUS | 5/6 <br> X | 6 <br> Z | 1 <br> Y | MULTIPLY |  |  |  |  |
| $X \cdot Y+K_{Z} \cdot 2^{-8}$ | INS CODE <br> BUS | 5/6 <br> X | 6 <br> - | 2 <br> Y | MULTIPLY |  |  |  |  |
| $-X \cdot Y+K_{Z} \cdot 2^{-8}$ | INS CODE <br> BUS | 5/6 <br> X | 6 <br> - | 3 <br> Y | MULTIPLY |  |  |  |  |
| $X \cdot Y+Z, W$ | INS CODE <br> BUS | 5/6 <br> X | 6 <br> Z | 6 <br> W | 0 <br> Y | MULTIPLY |  |  |  |
| $-X \cdot Y+Z, W$ | INS CODE <br> BUS | 5/6 <br> X | 6 <br> Z | 6 <br> W | 1 <br> Y | MULTIPLY |  |  |  |
| $X \cdot Y+W_{\text{sign}}$ | INS CODE <br> BUS | 5/6 <br> X | 6 <br> - | 6 <br> W | 2 <br> Y | MULTIPLY |  |  |  |
| $-X \cdot Y+W_{\text{sign}}$ | INS CODE <br> BUS | 5/6 <br> X | 6 <br> - | 6 <br> W | 3 <br> Y | MULTIPLY |  |  |  |

NOTES: 1) $X 1$ is the previous contents of the first rank of the $X$ register (either old $X$ or a new $X$ ).
2) $K_{Z} \cdot 2^{-8}$ is a single-length signed number comprising the most-significant half of the previous double-length product and here gets added in at the least-significant end of the new result.
3) $\mathrm{W}_{\text {sign }}$ is a single-length signed number, with sign-extension as needed.
4) Fractional or integer arithmetic is specified by having the next-to-last operand loaded using a 5 or 6 instruction respectively. All rows beginning with " $5 / 6$ " in effect represent two instructions. 5 does fractional arithmetic and 6 does integer arithmetic.

Figure 3. Multiplication Codes and Times for $\mathbf{8 \times 8}$ Multiplication in the 'S508

TIME-SLOT

| OPERATION |  | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| $K_{Z}, K_{W} / X1$ | INS CODE <br> BUS | 4 | DIVIDE |  |  |  |  |  |  |  |  |  |  | 1 |  |  |  |
| $K_{W} / X$ | INS CODE <br> BUS | 5/6 <br> X | 4 <br> - | DIVIDE |  |  |  |  |  |  |  |  |  |  | 1 |  |  |
| $K_{Z} / X$ | INS CODE <br> BUS | 5/6 <br> X | 5 <br> - | DIVIDE |  |  |  |  |  |  |  |  |  |  | 1 |  |  |
| $Z, W / X$ | INS CODE <br> BUS | 5/6 <br> X | 6 <br> Z | 4 <br> W | DIVIDE |  |  |  |  |  |  |  |  |  |  | 1 |  |
| $Z / X$ | INS CODE <br> BUS | 5/6 <br> X | 6 <br> Z | 5 | DIVIDE |  |  |  |  |  |  |  |  |  |  | 1 |  |
| $W / X$ | INS CODE <br> BUS | 5/6 <br> X | 6 | 6 <br> W | 4 | DIVIDE |  |  |  |  |  |  |  |  |  |  | 1 |
| $W_{\text{sign}} / X$ | INS CODE <br> BUS | 5/6 <br> X | 6 <br> 0 | 6 <br> W | 5 <br> - | DIVIDE |  |  |  |  |  |  |  |  |  |  | 1 |

NOTES: 1) $X 1$ is the previous contents of the first rank of the $X$ register (either old $X$ or a new $X$ ).
2) Fractional division divides a 16-bit 2s-complement number in 1 clock period less than integer division.
3) $W_{\text {sign }}$ is a single-length signed number, with sign-extension as needed.
4) Division operation $W_{\text {sign }} / X$ requires that the $Z$ register be initialized with all-zero contents at the time $Z$ is loaded.
5) Fractional or integer arithmetic is specified by having the operand loaded using a 5 or 6 instruction respectively. All rows beginning with " $5 / 6$ " in effect represent two instructions, one of which does fractional arithmetic and one of which does integer arithmetic.

Figure 4. Division Codes and Time for 16/8 Division in 'S508


Figure 5. Internal Architecture of the 'S508

## Multiplication

The 'S508 provides 2s-complement 8-bit multiplication, and can also accumulate previously-generated double-length products. No time penalty is incurred for accumulation, since the machine accumulates while the multiplication operation is proceeding. In addition to accumulation, the device can add into a product either a single-length or a double-length number. It can also use a previously-loaded operand as a constant, so that constant multiplication and accumulation is possible.

One key feature is the ability to perform both positive multiplications and negative multiplications, again without any speed penalty. This feature allows complex-arithmetic multiplications to be programmed with very little overhead. Another important feature is the ability to work with either fractions or integers.
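As an illustration of why positive and negative multiplications help with complex arithmetic, the sketch below forms $(a+jb)(c+jd)$ from two chains of multiply-accumulate steps. It models only the arithmetic, not the device; the helper names `mac` and `complex_multiply` are hypothetical and do not correspond to any 'S508 instruction mnemonics.

```python
def mac(acc, x, y, negate=False):
    """One multiply-accumulate step: acc + x*y, or acc - x*y when negate is set."""
    return acc - x * y if negate else acc + x * y


def complex_multiply(a, b, c, d):
    """Return (real, imag) of (a + jb)(c + jd) using four multiplications."""
    real = mac(mac(0, a, c), b, d, negate=True)  # a*c - b*d (negative multiply)
    imag = mac(mac(0, a, d), b, c)               # a*d + b*c
    return real, imag


print(complex_multiply(3, 2, 1, 4))  # (3 + 2j)(1 + 4j)
```

The real part needs one subtraction of a product, which is where a negative multiply-accumulate avoids a separate negation step.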

## Division

The 'S508 also provides a range of division operations. A double-length number in $Z, W$ is divided by $X$; the result $Q$ is stored in $Z$, and the remainder $R$ in $W$. Again, all numbers are in the 2s-complement number representation, with the most-significant bit of an operand (whether single-length or double-length) having a negative weight. In order to facilitate repeated division, with the multiple-length quotient always keeping the same sign, the remainder is always the same sign as the dividend. Fractional or integer operation is possible, and division and multiplication operations can be concatenated. For example, the operations $(A \times B)/C$ and $(A+B)/C$ can easily be performed. The dividend can be any previously-generated result (product, quotient, or remainder), or it may be a double-length or single-length signed operand.
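The sign convention for the remainder can be illustrated with a short integer sketch. It assumes the quotient is truncated toward zero, which is one way to obtain a remainder with the same sign as the dividend; the text does not spell out the device's internal algorithm, and the function name `divide_truncating` is hypothetical.

```python
def divide_truncating(dividend, divisor):
    """Return (quotient, remainder) with the remainder taking the dividend's sign.

    Assumption: the quotient is truncated toward zero.
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    quotient = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient
    remainder = dividend - quotient * divisor
    return quotient, remainder


for n, d in [(7, 2), (-7, 2), (7, -2), (-7, -2)]:
    print(n, d, divide_truncating(n, d))
```

In every case `dividend == quotient * divisor + remainder`, and a nonzero remainder is negative exactly when the dividend is negative.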

## Reading Results

The result of an arithmetic operation, or of a string of operations, can be read onto the 8-bit bus if the machine is at the end of an operation or at the start of a new sequence. The read operation requires that the $\overline{GO}$ signal be held LOW so that the information is read out onto the bidirectional bus, when code 7 is specified. (See Figure 6.) Since there is a double-length accumulator $Z, W$, reading can take two cycles. First, register $Z$ is read. After another clock has been received, if code 7 is still present, the least-significant half of the product from the W register is placed on the bus, or likewise the remainder if a division operation had been performed.

If the 'S508 is instructed to perform a read operation during the loading sequence, then the sequence is broken and the machine is forced back to state 0, ready to start the sequence again. Continual read operations at state 0 just swap the contents of registers $Z$ and $W$.

The 'S508 has a direct master reset input $\overline{MR}$. Alternatively, initialization of the 'S508 can also easily be performed by continually presenting instruction code 7, which after a maximum of 13 clock periods forces the machine back to state 0.

## Integer and Fractional Arithmetic

The 'S508 can work with either fractional or integer number representations. When working with integers, all numbers are scaled from the least-significant end and the least-significant bit is assumed to have a weight of $2^{0}$. For integer multiplication, accumulation, and division, all numbers are scaled from this least-significant weight, and results are correct if interpreted in this manner. The double-length register $\mathrm{Z}, \mathrm{W}$ can therefore hold numbers in the range $-2^{15}$ to $+2^{15}-1$; the operands $X$ and $Y$, and single-length results, are in the range $-2^{7}$ to $+2^{7}-1$.

When working with fractions, the machine automatically performs scaling so that input operands and results have a consistent format. All numbers in the fractional representation are scaled from the most-significant end, which has a weight of $-2^{0}$ (negative). The binary point is one place to the right of this most-significant bit, so that the next bit has a weight of $2^{-1}$. The double-length register $Z, W$ therefore holds numbers in the range $-1$ to $+1-2^{-15}$, and the operands $X$ and $Y$ and single-length results are in the range $-1$ to $+1-2^{-7}$. Since automatic scaling occurs, the product of two numbers always has the least-significant bit as a 0, unless an accumulation is performed with the least-significant bit being a 1.
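The two number formats can be made concrete with a small sketch that interprets the same bit pattern either as an integer or as a fraction. The product helper assumes that the automatic scaling amounts to a one-bit left shift of the integer product, which is consistent with the always-zero least-significant bit described above but is not stated explicitly in the text. All function names are hypothetical.

```python
def to_signed(bits, width=8):
    """Interpret a bit pattern as a 2s-complement integer of the given width."""
    bits &= (1 << width) - 1
    return bits - (1 << width) if bits & (1 << (width - 1)) else bits


def as_fraction(bits, width=8):
    """Fractional view: the MSB weighs -2^0, the next bit 2^-1, and so on."""
    return to_signed(bits, width) / (1 << (width - 1))


def fractional_product(x_bits, y_bits):
    """16-bit pattern of a fractional product.

    Assumption: automatic scaling is a one-bit left shift of the integer product.
    """
    product = to_signed(x_bits) * to_signed(y_bits)
    return (product << 1) & 0xFFFF


print(as_fraction(0x80), as_fraction(0x7F))   # extremes of the 8-bit range
p = fractional_product(0x40, 0x20)            # 0.5 * 0.25
print(hex(p), as_fraction(p, 16), p & 1)
```

Under this assumption the low bit of every product is 0, because the shift leaves it vacant.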

During a chain operation with the partial results not being read onto the bus, the 'S508 will stay in either the fractional or integer mode. At the start of a sequence of operations, fractional or integer operation is designated by loading operands using instruction code 5 or instruction code 6 respectively.

Mixed fractional and integer arithmetic is also possible, by redefining the weight of the least-significant or most-significant bits. However, care must be exercised, due to the automatic scaling feature, when fractional arithmetic is programmed.

## Rounding

Rounding can be performed on the result of a multiplication or division. Generally rounding would only be called out during fractional operation, but nothing in the 'S508 precludes forming a rounded result during integer arithmetic.

Rounding for multiplication provides the best single-length most-significant half of the product. Rounding occurs at the end of a multiplication, and is performed instead of a Load or Read operation when a code 5 is specified, instead of a code 7, to get from state 8 or state 10 back to state 0. (See Figure 2; also, note that this mode of operation precludes "stealing" a cycle according to the method illustrated in Figure 9.) The 'S508 looks at the most-significant bit of the least-significant half of the product, $W_{7}$, and adds 1 to the most-significant half of the product at the least-significant end if $W_{7}$ is a 1. After the operation, the 'S508 is in state 0, so that the rounded product can be read, and the $W$ register is clear.
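A minimal sketch of this rounding rule, operating on 8-bit register patterns, is shown below. The function name `round_product` is hypothetical, and wrapping the incremented $Z$ to 8 bits is an assumption: the text does not say how a carry out of $Z$ is handled.

```python
def round_product(z, w):
    """Round a double-length product (z, w) to single length.

    Adds 1 to z when bit 7 of w is set, then clears w.
    Assumption: z simply wraps to 8 bits if the increment carries out.
    """
    if w & 0x80:
        z = (z + 1) & 0xFF
    return z, 0x00


print(round_product(0x12, 0x80))  # W7 = 1: Z is incremented
print(round_product(0x12, 0x7F))  # W7 = 0: Z is unchanged
```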

Rounding for division is performed by forcing the least-significant bit of the quotient in $Z$ to a 1 unless the division is exact (remainder is zero). This method of rounding causes a slightly higher variance in the result than having an additional iterative division operation, but is considerably easier to perform. Again, after rounding the 'S508 goes to state 0, so that a read operation can be performed, and the W register is clear.
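The division rounding rule can be sketched in the same style. The function name `round_quotient` is hypothetical; the sketch only mirrors the rule as described (force the quotient's low bit to 1 when the remainder is nonzero, then clear $W$).

```python
def round_quotient(z, w):
    """Round a division result held as quotient z and remainder w (8-bit patterns).

    Forces bit 0 of the quotient to 1 unless the remainder is zero, then clears w.
    """
    if (w & 0xFF) != 0:
        z |= 0x01
    return z & 0xFF, 0x00


print(round_quotient(0x10, 0x03))  # inexact: low bit forced to 1
print(round_quotient(0x10, 0x00))  # exact: quotient unchanged
```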

## Overflow

The 'S508 has an overflow output OVR which is cleared prior to each operation, and is set during an operation if the product or quotient goes outside the normally-accepted range.

For multiplication, overflow can only occur if the most negative number in the operand range is used: $(-1) \times (-1)=+1$, which cannot be held in the 'S508's internal registers. Overflow can more easily occur during either positive or negative accumulation of products. For fractional arithmetic, if the product or accumulation goes outside the range of $-1$ to $+1-2^{-15}$, then the overflow flip-flop will be set.
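The overflow condition for fractional multiply-accumulate can be checked numerically. The sketch below works in units of $2^{-15}$ and assumes, as a modeling choice not stated in the text, that a fractional product is the integer product shifted left by one bit. The names `to_signed8` and `fractional_mac` are hypothetical.

```python
def to_signed8(bits):
    """Interpret an 8-bit pattern as a 2s-complement integer."""
    bits &= 0xFF
    return bits - 0x100 if bits & 0x80 else bits


def fractional_mac(x_bits, y_bits, acc=0):
    """Return (total, overflow) for acc + x*y in fractional format.

    acc and total are signed integers in units of 2^-15.
    Assumption: fractional scaling is a one-bit left shift of the product.
    """
    total = acc + ((to_signed8(x_bits) * to_signed8(y_bits)) << 1)
    overflow = not (-(1 << 15) <= total <= (1 << 15) - 1)
    return total, overflow


print(fractional_mac(0x80, 0x80))             # (-1) * (-1) = +1: overflow
print(fractional_mac(0x80, 0x7F))             # (-1) * (1 - 2^-7): in range
print(fractional_mac(0x40, 0x40, acc=30000))  # accumulation pushes past +1
```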

Overflow may also occur during division if the quotient goes outside the generally-accepted number range of -1 to $+1-2^{-7}$ during fractional operation. This would occur if the divisor is less than the dividend, or equal to the dividend if a positive quotient is being generated. For integer arithmetic the numbers must be scaled by $2^{7}$.


Figure 6. 'S508 Internal Circuitry of "GO" Line and Three-State-Enable
During states 0, 1, 2, 3, 8, 10, and 11, if the "GO" line ($\overline{\mathrm{GO}}$) is held at logic HIGH, then the machine will be in a wait state until $\overline{\mathrm{GO}}$ goes to logic LOW.


Figure 7. Interfacing the 'S508 to an 8-bit Microprocessor

Figure 7 shows the block diagram of a minimum 8-bit microprocessor system with its arithmetic capabilities enhanced by the use of a 'S508 $8 \times 8$ multiplier/divider. The relatively small number of instruction lines (only 3) of the 'S508 provides a unique way to control the multiplier/divider. As may be seen from Figure 7, these three instruction lines are assigned to the three least-significant bits (LSBs) of the address bus, while the remaining address bits are decoded by a Programmable Array Logic (PAL®) circuit to determine when the multiplier/divider is selected. For example, suppose the 'S508 is assigned address 100; then any address in the range of 100-107 will enable the 'S508 (i.e., the $\overline{GO}$ line is LOW). Thus, if the address is 100 the 'S508 instruction is 0; if the address is 106 the 'S508 instruction is 6; and so forth.
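The address-mapping scheme can be sketched as follows. The radix of "100" is not given in the text; the sketch assumes a hexadecimal base address of 0x100 (any base aligned to a multiple of 8 works the same way). The names `BASE_ADDRESS` and `decode` are hypothetical and stand in for the PAL decoding logic.

```python
BASE_ADDRESS = 0x100  # assumed base; must be a multiple of 8


def decode(address, base=BASE_ADDRESS):
    """Return (go_n, instruction) for a given address.

    go_n is 0 (LOW, device selected) when the upper address bits match the base.
    instruction is the three least-significant address bits.
    """
    selected = (address & ~0x7) == base
    go_n = 0 if selected else 1
    instruction = address & 0x7
    return go_n, instruction


print(decode(0x100))  # selected, instruction 0
print(decode(0x106))  # selected, instruction 6
print(decode(0x108))  # outside the 8-address window: not selected
```

With this mapping, an ordinary memory read or write to one of the eight addresses both selects the device and supplies its instruction code.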

## Absolute Maximum Ratings

| PARAMETER | RATING |
| :--- | :---: |
| Supply voltage $\mathrm{V}_{\mathrm{CC}}$ | 7.0 V |
| Input voltage | 7.0 V |
| Off-state output voltage | 5.5 V |
| Storage temperature | $-65^{\circ}$ to $+150^{\circ} \mathrm{C}$ |

## Operating Conditions

| SYMBOL | PARAMETER | FIGURE | MILITARY |  |  | COMMERCIAL |  |  | UNIT |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  |  |  | MIN | TYP | MAX | MIN | TYP | MAX |  |
| $\mathrm{V}_{\mathrm{CC}}$ | Supply voltage |  | 4.5 | 5 | 5.5 | 4.75 | 5 | 5.25 | V |
| $\mathrm{T}_{\text {A }}$ | Operating free-air temperature |  | -55 |  | $125 \dagger$ | 0 |  | 75 | ${ }^{\circ} \mathrm{C}$ |
| $f_{\text{MAX}}$ | Clock frequency | 8 | 5 |  |  | 6 |  |  | MHz |
| $t_{\text{CWP}}$ | Positive clock pulse width | 8 | 90 |  |  | 70 |  |  | ns |
| $t_{\text{CWN}}$ | Negative clock pulse width | 8 | 60 |  |  | 50 |  |  | ns |
| $t_{\text{BS}}$ | Bus setup time for inputting data * | 8 | 60 |  |  | 50 |  |  | ns |
| $t_{\text{BH}}$ | Bus hold time for inputting data * | 8 | 45 |  |  | 35 |  |  | ns |
| $t_{\text{INSS}}$ | Instruction, $\overline{\mathrm{GO}}$ setup time | 8 | 10 |  |  | 10 |  |  | ns |
| $t_{\text{INSH}}$ | Instruction, $\overline{\mathrm{GO}}$ hold time | 8 | 20 |  |  | 20 |  |  | ns |

* During operations when the bus is being used to input data.
$\dagger$ Case temperature.


## Electrical Characteristics Over Operating Conditions

| SYMBOL | PARAMETER | TEST CONDITIONS |  | MIN | TYP | MAX | UNIT |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| $\mathrm{V}_{\text {IL }}$ | Low-level input voltage |  |  |  |  | 0.8 | V |
| $\mathrm{V}_{\mathrm{IH}}$ | High-level input voltage |  |  | 2 |  |  | V |
| $\mathrm{V}_{\mathrm{IC}}$ | Input clamp voltage | $\mathrm{V}_{\mathrm{CC}}=\mathrm{MIN} \quad \mathrm{I}_{\mathrm{I}}=-18 \mathrm{~mA}$ |  |  |  | -1.5 | V |
| $\mathrm{I}_{\mathrm{IL}}$ | Low-level input current | $\mathrm{V}_{\mathrm{CC}}=\mathrm{MAX} \quad \mathrm{V}_{\mathrm{I}}=0.5 \mathrm{~V}$ | $\mathrm{B}_{7}-\mathrm{B}_{0}$ |  |  | -250 | $\mu \mathrm{A}$ |
|  |  |  | All other inputs |  |  | -1 | mA |
| $\mathrm{I}_{\mathrm{IH}}$ | High-level input current | $\mathrm{V}_{\mathrm{CC}}=\mathrm{MAX} \quad \mathrm{V}_{\mathrm{I}}=2.4 \mathrm{~V}$ |  |  |  | 250 | $\mu \mathrm{A}$ |
| $\mathrm{I}_{\mathrm{I}}$ | Maximum input current | $\mathrm{V}_{\mathrm{CC}}=\mathrm{MAX} \quad \mathrm{V}_{\mathrm{I}}=5.5 \mathrm{~V}$ |  |  |  | 1 | mA |
| $\mathrm{V}_{\mathrm{OL}}$ | Low-level output voltage | $\mathrm{V}_{\mathrm{CC}}=\mathrm{MIN} \quad \mathrm{I}_{\mathrm{OL}}=8 \mathrm{~mA}$ |  |  | 0.3 | 0.5 | V |
| $\mathrm{V}_{\mathrm{OH}}$ | High-level output voltage | $\mathrm{V}_{\mathrm{CC}}=\mathrm{MIN} \quad \mathrm{I}_{\mathrm{OH}}=-2 \mathrm{~mA}$ |  | 2.4 |  |  | V |
| $\mathrm{I}_{\mathrm{OS}}$ | Output short-circuit current* | $\mathrm{V}_{\mathrm{CC}}=\mathrm{MAX} \quad \mathrm{V}_{\mathrm{O}}=0 \mathrm{~V}$ |  | -10 |  | -90 | mA |
| $\mathrm{I}_{\mathrm{CC}}$ | Supply current | $\mathrm{V}_{\mathrm{CC}}=\mathrm{MAX}$ | SN54S508 |  | 300 | 400 | mA |
|  |  |  | SN74S508 |  | 300 | 380 |  |

* Not more than one output should be shorted at a time, and the duration of the short-circuit should not exceed one second.


## Switching Characteristics Over Operating Conditions

| SYMBOL | PARAMETER |  | FIGURE | MILITARY |  |  | COMMERCIAL |  |  | UNIT |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  |  |  |  | MIN | TYP | MAX | MIN | TYP | MAX |  |
| $t_{\text{BO}}$ | Bus output delay for outputting data* |  | 8 |  | 70 | 120 |  | 70 | 95 | ns |
| $t_{\text{PXZ}}$ | Output disable delay | From $\mathrm{I}_{2}-\mathrm{I}_{0}$ to bus |  |  | 40 | 70 |  | 40 | 65 | ns |
|  |  | From OE, $\overline{\mathrm{GO}}$ to bus |  |  | 20 | 50 |  | 20 | 40 |  |
| $t_{\text{PZX}}$ | Output enable delay | From $\mathrm{I}_{2}-\mathrm{I}_{0}$ to bus |  |  | 45 | 90 |  | 45 | 80 | ns |
|  |  | From OE, $\overline{\mathrm{GO}}$ to bus |  |  | 25 | 55 |  | 25 | 45 |  |
| $t_{\text{OVR}}$ | Overflow output delay from CK |  | 8 |  | 70 | 120 |  | 70 | 95 | ns |
| $t_{\text{DN}}$ | $\overline{\text{DONE}}$ output delay |  | 8 |  | 30 | 90 |  | 30 | 70 | ns |

* During operations when the bus is being used to output data.


## AC Test Conditions

Inputs $0 \mathrm{~V}_{\text{LOW}}$, $3 \mathrm{~V}_{\text{HIGH}}$. Rise and fall time 1-3 ns from 1 V to 2 V. Measurements are made from $1.5 \mathrm{~V}_{\text{IN}}$ to $1.5 \mathrm{~V}_{\text{OUT}}$, except that $t_{\text{PXZ}}$ is measured by a delta in the outputs of 0.5 V from $\mathrm{V}_{\mathrm{OL}}$ or $\mathrm{V}_{\mathrm{OH}}$ respectively.

## Timing

Timing waveforms are shown in Figure 8. Specific instruction timing examples are shown in Figures 9 through 13.

## Test Load



* The "TEST POINT" is driven by the output under test, and observed by instrumentation.


NOTE: $\overline{\mathrm{GO}}$ and $\mathrm{I}_{2}-\mathrm{I}_{0}$ can change only when CK is high.

Figure 8. Timing Diagram of the 'S508


NOTES: Register $Z$ is read at the same time that the "DONE" signal is set. If the instruction remains at code 7 after time-slot 7, the contents of registers $Z$ and $W$ are swapped each cycle.
\# "Any code" means code 0 through 7. However, code 6 will load a new value of $X$, and code 7 will cause the 'S508 to attempt to drive the data bus.

Figure 9. Instruction Timing Example No. 1: Load X, Load Y, Multiply, Read W. By Presenting Code 7 on the Instruction Lines During the Last Multiply Cycle (State 8), the Results May Be Read During Time Slots 6 and 7


NOTES: The instruction lines may be changed only when CK is high.
\#"Any code" means code 0 through code 7.
Code 6 may be used here since a new X explicitly gets loaded for the next multiply operation. However, code 7 will cause the 'S508 to attempt to drive the data bus.

Figure 10. Instruction Timing Example No. 2: Repeat: "Load X, Load Y, Multiply, Read Z, Read W"


NOTE: If code 7 is given (instead of code 0 through 6), the first data that is read from the bus after the $\overline{\text{DONE}}$ signal is set (time-slot 7) is $W$ and not $Z$. However, $Z$ is read at time-slot 8.
\#"Any code" means code 0 through code 7.

Figure 11. Instruction Timing Example No. 3: Load X, Load Y, Multiply, Read Z, Read W. This timing diagram corresponds to Table 1. The result is read only after the DONE signal is set (after four clock pulses of the operation cycles): $Z$ during time-slot 7, and $W$ during time-slot 8


NOTE:
"Any code" means code 0 through code 7. Code 6 or code 7 may be used here. Since $\overline{\mathrm{GO}}$ is HIGH, no new X can be loaded and the 'S508 cannot attempt to drive the bus.

Figure 12. Instruction Timing Example No. 4: Load X, Load Y, Multiply, Wait, Read Z, Read W


NOTES: This sequence of operations is suitable for use when reading is to be done only at the very end of the operation sequence. The new $X$ value is loaded during the time that the previous multiplication is being performed. See Programming Example \#3 for $\sum_{i=1}^{N} X_{i} \cdot Y_{i}$.

\#"Any code" means code 0 through code 7.
$\dagger$ Code 6 allows loading of a new $X$ value in state 12 and it takes the 'S508 to state 8. In state 8, $Y$ is loaded via instruction 2 and the multiply-accumulate operation is initiated.

Figure 13. Instruction Timing Example No. 5: Sum of Products

## Programming Examples

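In the following examples, assume that each line with a separate instruction corresponds to one clock pulse. Instruction codes are 0, 1, 2, 3, 4, 5, 6, 7, and X according to the usage explained in the key to Figure 2.

## Programming Example 1

Calculating $X \cdot Y$ $(A \cdot B)$

| INSTRUCTION | OPERATION |
| :---: | :--- |
| INST 6 | $X \leftarrow A$ |
| INST 0 | $Y \leftarrow B$ |
| INST X | MULT |
| INST X | MULT |
| INST X | MULT |
| INST 7 | MULT and READ $Z=8$ MSB of $(A \cdot B)$ |
| INST 7 | READ $W=8$ LSB of $(A \cdot B)$ |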

## Programming Example 2

Calculating $X1 \cdot Y$ $(A \cdot C)$

X1 is a previous multiplier value. It was previously loaded (in Example 1) with A.

| INSTRUCTION | OPERATION |
| :---: | :--- |
| INST 0 | $Y \leftarrow C$ |
| INST X | MULT |
| INST X | MULT |
| INST X | MULT |
| INST 7 | MULT and READ $Z=8$ MSB of $(A \cdot C)$ |
| INST 7 | READ $W=8$ LSB of $(A \cdot C)$ |

## Programming Example 3

Calculating $\sum_{i=1}^{N} X_{i} \cdot Y_{i}(A \cdot B+C \cdot D+E \cdot F+\ldots)$
In this case we read only after N multiplications. A new $X_{i+1}$ is loaded during the multiplication process for $X_{i} \cdot Y_{i}$.
Assume $\mathrm{N}=3$.
The sequence of instructions and operations for calculating

$$
\sum_{i=1}^{3} X_{i} \cdot Y_{i} \text { is: }(A \cdot B+C \cdot D+E \cdot F)
$$

$N=1 \quad\left\{\begin{array}{l}\text { INST } 6 \\ \text { INST } 0 \\ Y-A \\ \text { INST } X \\ \text { MULT } \\ \text { INST } X \\ \text { MULT } \\ \text { INST } X\end{array}\right.$ MULT $\}$ INST $A$ Perform $A$


## Programming Example 4

Multiplication plus a constant ($A \cdot B+$ Constant (16 bits))

Assume that the constant is a 16-bit 2s-complement number.

| INSTRUCTION | OPERATION |
| :---: | :--- |
| INST 6 | $X \leftarrow A$ |
| INST 6 | $Z \leftarrow C$ LOAD 8 MSB of constant |
| INST 6 | $W \leftarrow D$ LOAD 8 LSB of constant |
| INST 0 | $Y \leftarrow B$ |
| INST X | MULT |
| INST X | MULT (Perform $A \cdot B+(Z, W)$) |
| INST X | MULT |
| INST 7 | MULT and READ $Z=8$ MSB of $(A \cdot B+(C, D))$ |
| INST 7 | READ $W=8$ LSB of $(A \cdot B+(C, D))$ |

## Programming Example 5

Dividing a 16 -bit number by an 8 -bit number ( $(B, C) / A)$

| INSTRUCTION | OPERATION |
| :---: | :--- |
| INST 6 | $X \leftarrow A$ |
| INST 6 | $Z \leftarrow B$ |
| INST 4 | $W \leftarrow C$ |
| INST X |  |
| INST X |  |
| INST X |  |
| INST X |  |
| INST X |  |
| INST X | Perform division $\frac{(Z, W)}{X}$ |
| INST X |  |
| INST X |  |
| INST X |  |
| INST X |  |
| INST X |  |
| INST 7 | DIVIDE and READ the quotient $Z=\frac{(B, C)}{A}$ |
| INST 7 | READ the remainder $W$ of $\frac{(B, C)}{A}$ |

## 16x16 Multiplier/Divider SN74S516

## Features/Benefits

- Co-processor for enhancing the arithmetic speed of all present 16-bit and 8-bit microprocessors
- Bus-oriented organization
- 24-pin package
- 16/16 or 32/16 division in less than $3.5 \mu \mathrm{sec}$
- 16x16 multiplication in less than $1.5 \mu \mathrm{sec}$
- 28 different multiplication instructions such as "fractional multiply and accumulate"
- 13 different divide instructions
- Self-contained and microprogrammable


## Description

The SN74S516 ('S516) is a bus-organized $16 \times 16$ Multiplier/Divider. The device provides both multiplication and division of 2s-complement 16-bit numbers at high speed. There are 28 different multiply options, including: positive and negative multiply, positive and negative accumulation, multiplication by a constant, and both single-length and double-length addition in conjunction with multiplication. 13 different divide options allow single-length or double-length division, division of a previously-generated result, division by a constant, and continued division of a remainder or quotient.

The 'S516 is a time-sequenced device requiring a single clock. It loads operands from, and presents results to, a bidirectional 16-bit bus. Loading of the operands, reading of the results, and sequential control of the device is performed by a 3-bit instruction field.

The 'S516 has the additional feature that operands and results can be either integers or fractions; when it deals with fractions, automatic scaling occurs. Results can be rounded if required, and an Overflow output indicates whenever a result is outside the normally-accepted number range.

For a simple multiplication of two operands the device takes nine clock periods: one for initialization, and eight for the actual multiplication. A realistic clock period is 167 ns, which gives a multiplication time of 1333 ns typical for $16 \times 16$ multiplication, plus 167 ns additionally for initialization, or 1500 ns in all. More complex multiplications will take additional clock periods for loading the additional operands. A simple division operation requires $16+4=20$ clock periods for a typical time of 3333 ns (32 bits/16 bits), also plus 167 ns for initialization, or 3500 ns in all.
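The timing arithmetic above can be reproduced with a short calculation. The sketch uses the cycle rule given in the notes to the instruction table ($n+8$ cycles for multiplication and $n+20$ for division, where $n$ is the number of instruction codes) and assumes a 6 MHz clock; the function names are hypothetical.

```python
EXTRA_CYCLES = {"multiply": 8, "divide": 20}  # 'S516 cycles beyond the n instruction codes


def clock_cycles(n_codes, operation):
    """Total clock cycles for an operation specified by n_codes instruction codes."""
    return n_codes + EXTRA_CYCLES[operation]


def duration_ns(n_codes, operation, clock_mhz=6.0):
    """Operation time in nanoseconds at the given clock frequency (assumed 6 MHz)."""
    return clock_cycles(n_codes, operation) * 1000.0 / clock_mhz


print(clock_cycles(1, "multiply"), duration_ns(1, "multiply"))  # simple multiply
print(clock_cycles(1, "divide"), duration_ns(1, "divide"))      # simple divide
```

Operations that load extra operands simply use a larger `n_codes`, adding one clock period per additional instruction code.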

## Ordering Information

| PART NUMBER | PACKAGE | TEMPERATURE |
| :---: | :---: | :---: |
| SN74S516 | 24 T | Commercial |

## Logic Symbol



## Pin Configuration



| INSTRUCTION SEQUENCE | OPERATION | CLOCK CYCLES |
| :---: | :---: | :---: |
| ARITHMETIC OPERATIONS |  |  |
| 0 | $X1 \cdot Y$ | 9 |
| 1 | $-X1 \cdot Y$ | 9 |
| 2 | $X1 \cdot Y+K_{Z}, K_{W}$ | 9 |
| 3 | $-X1 \cdot Y+K_{Z}, K_{W}$ | 9 |
| 4 | $K_{Z}, K_{W} / X1$ | 21 |
| 5/6 0 | $X \cdot Y$ | 10 |
| 5/6 1 | $-X \cdot Y$ | 10 |
| 5/6 2 | $X \cdot Y+K_{Z}, K_{W}$ | 10 |
| 5/6 3 | $-X \cdot Y+K_{Z}, K_{W}$ | 10 |
| 5/6 4 | $K_{W} / X$ | 22 |
| 5/6 5 | $K_{Z} / X$ | 22 |
| 5/6 6 0 | $X \cdot Y+Z$ | 11 |
| 5/6 6 1 | $-X \cdot Y+Z$ | 11 |
| 5/6 6 2 | $X \cdot Y+K_{Z} \cdot 2^{-16}$ | 11 |
| 5/6 6 3 | $-X \cdot Y+K_{Z} \cdot 2^{-16}$ | 11 |
| 5/6 6 4 | $Z, W / X$ | 23 |
| 5/6 6 5 | $Z / X$ | 23 |
| 5/6 6 6 0 | $X \cdot Y+Z, W$ | 12 |
| 5/6 6 6 1 | $-X \cdot Y+Z, W$ | 12 |
| 5/6 6 6 2 | $X \cdot Y+W_{\text{sign}}$ | 12 |
| 5/6 6 6 3 | $-X \cdot Y+W_{\text{sign}}$ | 12 |
| 5/6 6 6 4 | $W / X$ | 24 |
| 5/6 6 6 5 | $W_{\text{sign}} / X$ | 24 |
| 5/6 6 6 6 | (See Note 9 below.) | - |
| 5/6 6 6 7 | Load X, Load Z, Load W, Clear Z | 4 |
| 5/6 6 7 | Load X, Load Z, Read Z | 3 |
| READING OPERATIONS |  |  |
| 7 | Read Z | 1 |
| 7 7 | Read Z, W | 2 |
| 7 7 7 | Read Z, W, Z | 3 |
| 7 7 7 7 | Read Z, W, Z, W | 4 |
| 5 7 | Round, then Read Z | 2 |
| 5 7 7 | Round, then Read Z, W | 3 |

NOTES:

1. $X, Y$ are the input multiplicand and multiplier.
2. X1 is the previous contents of the first rank of the $X$ register (either the old $X$ or a new $X$).
3. Fractional or integer arithmetic is specified by having the next-to-the-last operand loaded using a 5 or 6 instruction respectively. All rows beginning with "5/6" in effect represent two instructions. 5 does fractional arithmetic and 6 does integer arithmetic.
4. $Z, W$ is a double-precision number. $Z$ is the most-significant half. $Z, W$ represents the addend upon input, and the product (or accumulated sum) after multiplication.
5. $K_{Z}, K_{W}$ represents the previous accumulator contents. $K_{Z}$ is the most-significant half.
6. $W_{\text{sign}}$ is a single-length signed number, with sign extension.
7. Maximum clock cycle $=167 \mathrm{~ns}$ for a 6-MHz clock.
8. If n instruction codes are shown at the left under "instruction sequence," the number of clock cycles at the right is $\mathrm{n}+8$ for multiplication and $\mathrm{n}+20$ for division.
9. The code "5/6 6 6 6" represents an incomplete operation since it leaves the 'S516 in state 1 rather than in state 0, 8, or 10.

Figure 1. 'S516 Instruction Set (Partial List)
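The rule in Note 8 can be expressed as a small helper. This is an illustrative sketch only: the function name and the list-of-strings representation of an instruction sequence are assumptions, and the sequences used in the example are taken from Figure 3 and from the state-0 divide code described with Figure 2.

```python
# Sketch of Note 8: clock cycles = n + 8 (multiply) or n + 20 (divide),
# where n is the number of instruction codes in the complete sequence.
FIXED_CYCLES = {"multiply": 8, "divide": 20}


def clock_cycles(instruction_codes, operation):
    """Return the cycle count for a complete instruction sequence.

    `instruction_codes` lists every code presented, including the final
    code that starts the multiply or divide (hypothetical representation).
    """
    return len(instruction_codes) + FIXED_CYCLES[operation]


# X1 * Y: a single code 0 that loads Y and starts the multiply
simple_multiply = clock_cycles(["0"], "multiply")
# X * Y + Z: load X (5 or 6), load Z (6), load Y and multiply (0)
multiply_add = clock_cycles(["5/6", "6", "0"], "multiply")
# Simple divide started by code 4
simple_divide = clock_cycles(["4"], "divide")

print(simple_multiply, multiply_add, simple_divide)
```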

| SUMMARY OF SIGNALS/PINS |  |
| :--- | :--- |
| $\mathrm{B}_{15}-\mathrm{B}_{0}$ | Bidirectional data bus inputs/outputs |
| $\mathrm{I}_{2}-\mathrm{I}_{0}$ | Instruction (sequential control) input |
| CK | Clock pulse input |
| $\overline{\mathrm{GO}}$ | Chip activation input |
| OVR | Arithmetic overflow output |

## Description (continued)

The 'S516 device uses standard low-power Schottky technology, requires a single +5 V power supply, and is fully TTL compatible. Bus inputs require at most $250 \mu \mathrm{~A}$ input current, and control and clock inputs require at most 1 mA input current. Bus outputs are three-state, and are capable of sinking 8 mA at the low logic level. The 'S516 is available in both commercial-temperature and military-temperature ranges, in a 600-mil 24-pin dual-in-line ceramic package.

## Device Operation

The 'S516 contains four 16-bit working registers. $Y$ is the multiplier register; X is the multiplicand and divisor register; W is the least-significant half of a double-length accumulator, and holds the least-significant half of the product after a multiplication operation, or the remainder after a division operation; and $Z$ is the most-significant half of this same accumulator. In addition to these registers, there is a high-speed arithmetic unit which performs addition, subtraction, and shifting steps in order to accomplish the various arithmetic operations; a loading sequencer; and a PLA control network.

Operands are loaded into the working registers in time sequence at each clock period, under the control of this sequencer. The chip-activation signal $\overline{\mathrm{GO}}$ must be LOW in order to begin the loading process and continue to the next step in the loading operation. If $\overline{\mathrm{GO}}$ is continually held HIGH, the 'S516 remains in a wait state with its outputs held in their high-impedance states, so that the other devices attached to the bus may drive it. In this condition, the 'S516 does not respond to any codes on its instruction inputs; in effect, it does not "wake up" until $\overline{\mathrm{GO}}$ goes LOW. Also, $\overline{\mathrm{GO}}$ may change only when the clock input CK is HIGH. After all of the operands are loaded, the 'S516 jumps to the multiply routine, or to the divide routine, and performs the required operations as indicated in Figure 1. After 9 clock periods for a simple multiply or 21 clock periods for a simple divide, for example, the result is placed on the bus in time sequence.


KEY:
The numbers inside the circles indicate the state of the 'S516 multiplier/divider. These states are represented by a four-bit state counter, where A is the least-significant bit of this state counter and D is the most-significant bit. (These four bits are not available externally on the 'S516.)

The next state of the 'S516 is a function of the present state and the instruction lines. For example, if the 'S516 is at state 0 and the instruction is 0, 1, 2, or 3, then the next state is state 4 (multiply instruction); if the instruction is 4, the next state is state 5 (divide instruction); and so forth. The instructions which take the 'S516 from one state to another are indicated by the numbers written next to the state-transition path lines. "0123," for instance, implies that any of instructions 0, 1, 2, or 3 will take the 'S516 along the path marked "0123."

"X" next to a path implies that the path will be followed regardless of the value of the instruction inputs at that time. In other words, for the purpose of state transitions, X means "don't care." There are cases, however, where the particular instruction used may affect when the contents of the registers are available on the bus; see Figures 9 and 10 for contrasting examples of how this effect operates.

Figure 2. Transition Diagram for the 'S516 Multiplier/Divider

Three instruction inputs $\mathrm{I}_{2}, \mathrm{I}_{1}, \mathrm{I}_{0}$, which may change only when the clock input CK is HIGH, select the required function and drive the sequencer from state to state. Thus, the action of the multiplier/divider at any clock period is a function of the machine state and the state of the control inputs. Figure 2 shows the multiply/divide state table, and all possible operations. After a Read or Round operation, the machine is driven back to state 0, and a new sequence of arithmetic operations is assumed. If a chain operation is being performed, such as accumulation of products, state 0 is bypassed, and loading of an operand or jumping to the next arithmetic operation occurs at the end of the previous arithmetic operation: at state 8 for a multiplication instruction, or at state 10 for a division instruction.

Register X is a dual-rank register, which allows the loading of an operand $X$ during the multiplication or division process. If the machine enters the loading sequence and a new $X$ operand has not been loaded, then the machine proceeds with the previously-loaded X, denoted in this text as "X1." This loading-while-processing capability allows a cycle to be saved during "chained" calculations, and also allows multiplication and division by a constant. (See Figure 13.)

Figures 3 and 4 show the codes and durations for the 41 different possible arithmetic operations. These operations can be concatenated in strings to perform complicated 2s-complement arithmetic operations at high speed. Rounding and reading of results can be performed after any operation.

Figure 5 is a block diagram of the 'S516 16×16 Multiplier/Divider.

TIME-SLOT

| OPERATION |  | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| $X1 \cdot Y$ | INS CODE <br> BUS | 0 <br> Y | MULTIPLY |  |  |  |  |  |  |  |  |  |  |
| $-X1 \cdot Y$ | INS CODE <br> BUS | 1 <br> Y | MULTIPLY |  |  |  |  |  |  |  |  |  |  |
| $X1 \cdot Y+K_{Z}, K_{W}$ | INS CODE <br> BUS | 2 <br> Y | MULTIPLY |  |  |  |  |  |  |  |  |  |  |
| $-X1 \cdot Y+K_{Z}, K_{W}$ | INS CODE <br> BUS | 3 <br> Y | MULTIPLY |  |  |  |  |  |  |  |  |  |  |
| $X \cdot Y$ | INS CODE <br> BUS | 5/6 <br> X | 0 <br> Y | MULTIPLY |  |  |  |  |  |  |  |  |  |
| $-X \cdot Y$ | INS CODE <br> BUS | 5/6 <br> X | 1 <br> Y | MULTIPLY |  |  |  |  |  |  |  |  |  |
| $X \cdot Y+K_{Z}, K_{W}$ | INS CODE <br> BUS | 5/6 <br> X | 2 <br> Y | MULTIPLY |  |  |  |  |  |  |  |  |  |
| $-X \cdot Y+K_{Z}, K_{W}$ | INS CODE <br> BUS | 5/6 <br> X | 3 <br> Y | MULTIPLY |  |  |  |  |  |  |  |  |  |
| $X \cdot Y+Z$ | INS CODE <br> BUS | 5/6 <br> X | 6 <br> Z | 0 <br> Y | MULTIPLY |  |  |  |  |  |  |  |  |
| $-X \cdot Y+Z$ | INS CODE <br> BUS | 5/6 <br> X | 6 <br> Z | 1 <br> Y | MULTIPLY |  |  |  |  |  |  |  |  |
| $X \cdot Y+K_{Z} \cdot 2^{-16}$ | INS CODE <br> BUS | 5/6 <br> X | 6 | 2 <br> Y | MULTIPLY |  |  |  |  |  |  |  |  |
| $-X \cdot Y+K_{Z} \cdot 2^{-16}$ | INS CODE <br> BUS | 5/6 <br> X | 6 | 3 <br> Y | MULTIPLY |  |  |  |  |  |  |  |  |
| $X \cdot Y+Z, W$ | INS CODE <br> BUS | 5/6 <br> X | 6 <br> Z | 6 <br> W | 0 <br> Y | MULTIPLY |  |  |  |  |  |  |  |
| $-X \cdot Y+Z, W$ | INS CODE <br> BUS | 5/6 <br> X | 6 <br> Z | 6 <br> W | 1 <br> Y | MULTIPLY |  |  |  |  |  |  |  |
| $X \cdot Y+W_{\text{sign}}$ | INS CODE <br> BUS | 5/6 <br> X | 6 | 6 <br> W | 2 <br> Y | MULTIPLY |  |  |  |  |  |  |  |
| $-X \cdot Y+W_{\text{sign}}$ | INS CODE <br> BUS | 5/6 <br> X | 6 | 6 <br> W | 3 <br> Y | MULTIPLY |  |  |  |  |  |  |  |

NOTES: 1) $X 1$ is the previous contents of the first rank of the $X$ register (either old $X$ or a new $X$ ).
2) $K_{Z} \cdot 2^{-16}$ is a single-length signed number comprising the most-significant half of the previous double-length product and here gets added in at the least-significant end of the new result.
3) $W_{\text {sign }}$ is a single-length signed number, with sign-extension as needed.
4) Fractional or integer arithmetic is specified by having the next-to-the-last operand loaded using a 5 or 6 instruction respectively. All rows beginning with " $5 / 6$ " in effect represent two instructions. 5 does fractional arithmetic and 6 does integer arithmetic.

Figure 3. Multiplication Codes and Times for 16x16 Multiplication in the 'S516

TIME-SLOT


NOTES: 1) $X 1$ is the previous contents of the first rank of the $X$ register (either old $X$ or a new $X$ ).
2) Fractional division divides a 32-bit 2s-complement number in 1 clock period less than integer division.
3) $W_{\text {sign }}$ is a single-length signed number, with sign-extension as needed.
4) Division operation $W_{\text {sign }} / X$ requires that the $Z$ register be initialized with all-zero contents at the time $Z$ is loaded.
5) Fractional or integer arithmetic is specified by having the next-to-the-last operand loaded using a 5 or 6 instruction respectively. All rows beginning with " $5 / 6$ " in effect represent two instructions, one of which does fractional arithmetic and one of which does integer arithmetic.

Figure 4. Division Codes and Times for 32/16 Division in 'S516


Figure 5. Internal Architecture of the 'S516

## Initialization

The 'S516 has no direct master reset input. However, initialization of the 'S516 can easily be performed by continually presenting instruction code 7, which after a maximum of 21 clock periods forces the machine back to state 0.

## Multiplication

The 'S516 provides 2s-complement 16-bit multiplication, and can also accumulate previously-generated double-length products. No time penalty is incurred for accumulation, since the machine accumulates while the multiplication operation is proceeding. In addition to accumulation, the device can add into a product either a single-length or a double-length number. It can also use a previously-loaded operand as a constant, so that constant multiplication and accumulation is possible.

One key feature is the ability to perform both positive multiplications and negative multiplications, again without any speed penalty. This feature allows complex-arithmetic multiplications to be programmed with very little overhead. Another important feature is the ability to work with either fractions or integers.
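As an illustration of why positive and negative multiply-accumulate steps suit complex arithmetic, the product $(a+jb)(c+jd)=(ac-bd)+j(ad+bc)$ can be formed from four such steps. The sketch below models the arithmetic only, with unbounded Python integers; the function names are hypothetical and no register widths, timing, or overflow behavior of the 'S516 are modeled.

```python
# Sketch: complex multiply built from signed multiply-accumulate steps.
def multiply_step(x, y, negate=False, accumulator=0):
    """Model one step: (+x*y or -x*y) added to the previous accumulator."""
    product = -(x * y) if negate else x * y
    return product + accumulator


def complex_multiply(a, b, c, d):
    """Return (real, imag) of (a + jb) * (c + jd)."""
    real = multiply_step(a, c)                                   # a*c
    real = multiply_step(b, d, negate=True, accumulator=real)    # -b*d + previous
    imag = multiply_step(a, d)                                   # a*d
    imag = multiply_step(b, c, accumulator=imag)                 # b*c + previous
    return real, imag


print(complex_multiply(3, 4, 5, -2))
```

Each half of the result needs one plain multiply followed by one accumulate step, which corresponds to the chained operations listed in Figure 1.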

## Division

The 'S516 also provides a range of division operations. A double-length number in $Z, W$ is divided by X; the result Q is stored in Z, and the remainder R in W. Again all numbers are in the 2s-complement number representation, with the most-significant bit of an operand (whether single-length or double-length) having a negative weight. In order to facilitate repeated division, with the multiple-length quotient always keeping the same sign, the remainder is always the same sign as the dividend. Fractional or integer operation is possible, and division and multiplication operations can be concatenated. For example, the operations $(A \times B) / C$ and $(A+B) / C$ can easily be performed. The dividend can be any previously-generated result (product, quotient, or remainder), or it may be a double-length or single-length signed operand.
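A remainder that always takes the sign of the dividend is what results when the quotient is truncated toward zero. The following sketch shows that convention for integers; it assumes truncation toward zero (the text states only the sign rule), uses a hypothetical function name, and does not model register widths or overflow.

```python
# Sketch: integer division whose remainder has the sign of the dividend.
def divide_truncating(dividend: int, divisor: int):
    """Return (quotient, remainder) with the quotient truncated toward zero."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    quotient = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient
    # dividend = quotient * divisor + remainder
    remainder = dividend - quotient * divisor
    return quotient, remainder


print(divide_truncating(-7, 2), divide_truncating(7, -2))
```

Note that Python's own `//` and `%` operators round toward negative infinity, so they give a remainder with the sign of the divisor instead.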

## Reading Results

The result of an arithmetic operation, or of a string of operations, can be read onto the 16-bit bus if the machine is at the end of an operation or at the start of a new sequence. The read operation requires that the $\overline{\mathrm{GO}}$ signal be held LOW so that the information is read out onto the bidirectional bus when code 7 is specified. (See Figure 6.) Since there is a double-length accumulator $Z, W$, reading can take two cycles. First, register $Z$ is read. After another clock has been received, if code 7 is still present, the least-significant half of the product from the $W$ register is placed on the bus, or likewise the remainder if a division operation had been performed.

If the 'S516 is instructed to perform a read operation during the loading sequence, then the sequence is broken and the machine is forced back to state 0, ready to start the sequence again. Repeated read operations at state 0 simply swap the contents of registers Z and W.
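Once both halves have been read, the host has to reassemble them into one double-length value. A minimal sketch of that step follows, assuming integer mode and that the two 16-bit bus words have already been captured; the function name is hypothetical.

```python
# Sketch: combine the Z (most-significant) and W (least-significant) words
# read from the bus into one signed 32-bit 2s-complement integer.
def combine_zw(z_word: int, w_word: int) -> int:
    """Return the signed 32-bit value held in Z,W (integer interpretation)."""
    value = ((z_word & 0xFFFF) << 16) | (w_word & 0xFFFF)
    if value & 0x8000_0000:      # the sign bit has negative weight
        value -= 1 << 32
    return value


print(combine_zw(0x0001, 0x0000), combine_zw(0xFFFF, 0xFFFE))
```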

## Integer and Fractional Arithmetic

The 'S516 can work with either fractional or integer number representations. When working with integers, all numbers are scaled from the least-significant end, and the least-significant bit is assumed to have a weight of $2^{0}$. For integer multiplication, accumulation, and division, all numbers are scaled from this least-significant weight, and results are correct if interpreted in this manner. The double-length register $Z, W$ can therefore hold numbers in the range $-2^{31}$ to $+2^{31}-1$; the operands $X$ and $Y$, and single-length results, are in the range $-2^{15}$ to $+2^{15}-1$.

When working with fractions, the machine automatically performs scaling so that input operands and results have a consistent format. All numbers in the fractional representation are scaled from the most-significant end, which has a weight of $-2^{0}$ (negative). The binary point is one place to the right of this most-significant bit, so that the next bit has a weight of $2^{-1}$. The double-length register $Z, W$ therefore holds numbers in the range $-1$ to $+1-2^{-31}$, and the operands $X$ and $Y$ and single-length results are in the range $-1$ to $+1-2^{-15}$. Since automatic scaling occurs, the product of two numbers always has the least-significant bit as a 0, unless an accumulation is performed with the least-significant bit being a 1.

During a chain operation with the partial results not being read onto the bus, the 'S516 will stay in either the fractional or integer mode. At the start of a sequence of operations, fractional or integer operation is designated by loading operands using instruction code 5 or instruction code 6 respectively.

Mixed fractional and integer arithmetic is also possible, by redefining the weight of the least-significant or most-significant bits. However, care must be exercised, due to the automatic scaling feature, when fractional arithmetic is programmed.
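The two number formats can be compared with a short sketch. It assumes that the automatic scaling amounts to a one-place left shift of the integer product, which is consistent with the least-significant bit of a fractional product always being 0; the function names are hypothetical and overflow is not modeled.

```python
from fractions import Fraction


def to_signed16(word: int) -> int:
    """Interpret a 16-bit word as a 2s-complement integer."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def as_fraction(word: int) -> Fraction:
    """Interpret a 16-bit word as a fraction in the range -1 to +1 - 2**-15."""
    return Fraction(to_signed16(word), 1 << 15)


def fractional_product_zw(x_word: int, y_word: int) -> int:
    """Return the 32-bit Z,W bit pattern of a fractional product.

    Assumption: the integer product is shifted left one place so that the
    result keeps the binary point one place right of the sign bit.
    """
    product = to_signed16(x_word) * to_signed16(y_word)
    return (product << 1) & 0xFFFF_FFFF


half = 0x4000                                   # 0.5 in fractional format
zw = fractional_product_zw(half, half)          # 0.25 in the 32-bit format
print(as_fraction(half), hex(zw))
```

In this model the same bit pattern `0x4000` reads as 16384 in integer mode and as 1/2 in fractional mode; only the placement of the product in $Z, W$ differs.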

## Rounding

Rounding can be performed on the result of a multiplication or division. Generally rounding would only be called out during fractional operation, but nothing in the 'S516 precludes forming a rounded result during integer arithmetic.

Rounding for multiplication provides the best single-length most-significant half of the product. Rounding occurs at the end of a multiplication, and is performed instead of a Load or Read operation when a code 5 is specified, instead of a code 7, to get from state 8 or state 10 back to state 0. (See Figure 2; also, note that this mode of operation precludes "stealing" a cycle according to the method illustrated in Figure 9.) The 'S516 looks at the most-significant bit of the least-significant half of the product, $W_{15}$, and adds 1 to the most-significant half of the product at the least-significant end if $W_{15}$ is a 1. After the operation, the 'S516 is in state 0, so that the rounded product can be read, and the $W$ register is cleared.

Rounding for division is performed by forcing the least-significant bit of the quotient in $Z$ to a 1 unless the division is exact (remainder is zero). This method of rounding causes a slightly higher variance in the result than having an additional iterative division operation, but is considerably easier to perform. Again, after rounding the 'S516 goes to state 0, so that a read operation can be performed, and the W register is cleared.
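Both rounding rules are simple bit operations on the $Z$ and $W$ words. The sketch below models them on 16-bit values; the function names are hypothetical, and wrapping $Z$ to 16 bits is an assumption, since the effect of rounding on the overflow output is not modeled here.

```python
# Sketch of the two rounding rules applied to 16-bit Z and W words.
def round_product(z_word: int, w_word: int):
    """Add 1 to Z if W15 is 1; W is cleared afterwards."""
    if w_word & 0x8000:
        z_word = (z_word + 1) & 0xFFFF   # assumption: wrap to 16 bits
    return z_word & 0xFFFF, 0


def round_quotient(z_word: int, w_word: int):
    """Force the LSB of the quotient to 1 unless the remainder is zero."""
    if w_word & 0xFFFF:
        z_word |= 1
    return z_word & 0xFFFF, 0


print(round_product(0x1234, 0x8000), round_quotient(0x0004, 0x0003))
```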

## Overflow

The 'S516 has an overflow output OVR which is cleared prior to each operation, and is set during an operation if the product or quotient goes outside the normally-accepted range.

For multiplication, overflow can only occur if the most negative number in the operand range is used: $(-1) \times(-1)=+1$, which cannot be held in the 'S516's internal registers. Overflow can more easily occur during either positive or negative accumulation of products. For fractional arithmetic, if the product or accumulation goes outside the range of $-1$ to $+1-2^{-31}$, then the overflow flip-flop will be set.

The overflow flip-flop is enabled in state 8 for a multiply operation or in state 10 for a divide operation. It is reset only on a transition to state 0 from state 0, 3, 8, 10, or 11 while instruction 7 is being presented to the 'S516.

Overflow may also occur during division if the quotient goes outside the generally-accepted number range of -1 to $+1-2^{-15}$ during fractional operation. This would occur if the divisor is less than the dividend, or equal to the dividend if a positive quotient is being generated. For integer arithmetic the numbers must be scaled by $2^{15}$.
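The range condition for a fractional quotient can be stated as a predicate. This is a sketch of the arithmetic condition only, using exact rational numbers; the function and constant names are hypothetical, the divisor is assumed to be nonzero, and the way the 'S516 actually detects the condition is not modeled.

```python
from fractions import Fraction

FRACTION_MIN = Fraction(-1)
FRACTION_MAX = 1 - Fraction(1, 1 << 15)   # +1 - 2**-15


def fractional_quotient_overflows(dividend: Fraction, divisor: Fraction) -> bool:
    """Return True if dividend / divisor lies outside -1 to +1 - 2**-15."""
    quotient = dividend / divisor          # divisor assumed nonzero
    return not (FRACTION_MIN <= quotient <= FRACTION_MAX)


half, quarter = Fraction(1, 2), Fraction(1, 4)
print(fractional_quotient_overflows(half, quarter),    # divisor smaller
      fractional_quotient_overflows(half, half),       # equal, positive quotient
      fractional_quotient_overflows(half, -half))      # equal, negative quotient
```

The last case shows why equality only matters for a positive quotient: a quotient of exactly $-1$ is still representable.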


Figure 6. 'S516 Internal Circuitry of "GO" Line and Three-State-Enable

During states 0, 1, 3, 8, 10, and 11, if the "GO" line ($\overline{\mathrm{GO}}$) is held at logic HIGH then the machine will be in a wait state until $\overline{\mathrm{GO}}$ goes to logic LOW.


Figure 7. Interfacing the 'S516 to a Microprocessor

Figure 7 shows the block diagram of a microprocessor system with its arithmetic capabilities enhanced by the use of a 'S516 $16 \times 16$ multiplier/divider. The relatively small number of instruction lines (only 3) of the 'S516 provides a unique way to control the multiplier/divider. As may be seen from Figure 7, these three instruction lines are assigned to the three least-significant bits (LSBs) of the address bus, while the remaining address bits are decoded by a Programmable Array Logic (PAL®) circuit to determine when the multiplier/divider is selected. For example, suppose the 'S516 is assigned address 100; then any address in the range of 100-107 will enable the 'S516 (i.e., the $\overline{\mathrm{GO}}$ line is LOW). Thus, if the address is 100 the 'S516 instruction is 0; if the address is 106 the 'S516 instruction is 6; and so forth.
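The address-mapping scheme can be sketched in software terms as follows. The addresses in the example are assumed to be octal, since a block of eight addresses starting at 100 must begin on a multiple of 8; the function and constant names are hypothetical, and the real decoding is done in hardware by the PAL circuit.

```python
# Sketch: the three address LSBs select the instruction, and the remaining
# address bits select the device (drive GO LOW).
BASE_ADDRESS = 0o100   # assumption: the "100" in the text is octal


def decode(address: int):
    """Return (selected, instruction) for a given bus address.

    `selected` is True when the upper address bits match the base address,
    i.e. when the decoder would drive GO LOW.
    """
    selected = (address >> 3) == (BASE_ADDRESS >> 3)
    instruction = address & 0b111
    return selected, instruction


print(decode(0o100), decode(0o106), decode(0o110))
```

With this mapping, a processor write to address 106 both enables the device and presents instruction code 6, so no separate control register is needed.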

## Data Formats

## Fractional Multiply

$X_{i}, Y_{i}$ - Input Multiplicand, Multiplier

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Sign | $2^{-1}$ | $2^{-2}$ | $2^{-3}$ | $2^{-4}$ | $2^{-5}$ | $2^{-6}$ | $2^{-7}$ | $2^{-8}$ | $2^{-9}$ | $2^{-10}$ | $2^{-11}$ | $2^{-12}$ | $2^{-13}$ | $2^{-14}$ | $2^{-15}$ |

$Z_{i}$ - MS Half Output Product

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Sign | $2^{-1}$ | $2^{-2}$ | $2^{-3}$ | $2^{-4}$ | $2^{-5}$ | $2^{-6}$ | $2^{-7}$ | $2^{-8}$ | $2^{-9}$ | $2^{-10}$ | $2^{-11}$ | $2^{-12}$ | $2^{-13}$ | $2^{-14}$ | $2^{-15}$ |

$W_{i}$ - LS Half Output Product*

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| $2^{-16}$ | $2^{-17}$ | $2^{-18}$ | $2^{-19}$ | $2^{-20}$ | $2^{-21}$ | $2^{-22}$ | $2^{-23}$ | $2^{-24}$ | $2^{-25}$ | $2^{-26}$ | $2^{-27}$ | $2^{-28}$ | $2^{-29}$ | $2^{-30}$ | "0" |

* The least-significant bit of $W_{i}$ is always a binary 0 due to normalization. Note that $(-1) \times(-1)$ yields an overflow in fractional multiply.


## Integer Multiply

$X_{i}, Y_{i}$ - Input Multiplicand, Multiplier

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Sign | $2^{14}$ | $2^{13}$ | $2^{12}$ | $2^{11}$ | $2^{10}$ | $2^{9}$ | $2^{8}$ | $2^{7}$ | $2^{6}$ | $2^{5}$ | $2^{4}$ | $2^{3}$ | $2^{2}$ | $2^{1}$ | $2^{0}$ |

## $\mathbf{Z}_{\mathbf{i}}$ - MS Half Output Product

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Sign | $2^{30}$ | $2^{29}$ | $2^{28}$ | $2^{27}$ | $2^{26}$ | $2^{25}$ | $2^{24}$ | $2^{23}$ | $2^{22}$ | $2^{21}$ | $2^{20}$ | $2^{19}$ | $2^{18}$ | $2^{17}$ | $2^{16}$ |

$\mathbf{W}_{\mathbf{i}}$ - LS Half Output Product**

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| $2^{15}$ | $2^{14}$ | $2^{13}$ | $2^{12}$ | $2^{11}$ | $2^{10}$ | $2^{9}$ | $2^{8}$ | $2^{7}$ | $2^{6}$ | $2^{5}$ | $2^{4}$ | $2^{3}$ | $2^{2}$ | $2^{1}$ | $2^{0}$ |

** The least-significant bit of $W_{i}$ is a valid data bit. Note that $\left(-2^{15}\right) \times\left(-2^{15}\right)$ yields $+2^{30}$, which can be represented in the output bits without overflowing.

## Fractional Divide

## $Z_{i}$ - Input Dividend

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Sign | $2^{-1}$ | $2^{-2}$ | $2^{-3}$ | $2^{-4}$ | $2^{-5}$ | $2^{-6}$ | $2^{-7}$ | $2^{-8}$ | $2^{-9}$ | $2^{-10}$ | $2^{-11}$ | $2^{-12}$ | $2^{-13}$ | $2^{-14}$ | $2^{-15}$ |

## X - Input Divisor

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Sign | $2^{-1}$ | $2^{-2}$ | $2^{-3}$ | $2^{-4}$ | $2^{-5}$ | $2^{-6}$ | $2^{-7}$ | $2^{-8}$ | $2^{-9}$ | $2^{-10}$ | $2^{-11}$ | $2^{-12}$ | $2^{-13}$ | $2^{-14}$ | $2^{-15}$ |

## $Z_{i}$ - Output Quotient

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Sign | $2^{-1}$ | $2^{-2}$ | $2^{-3}$ | $2^{-4}$ | $2^{-5}$ | $2^{-6}$ | $2^{-7}$ | $2^{-8}$ | $2^{-9}$ | $2^{-10}$ | $2^{-11}$ | $2^{-12}$ | $2^{-13}$ | $2^{-14}$ | $2^{-15}$ |

$W_{i}$ - Output Partial Remainder $\dagger$

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Sign | $2^{-1}$ | $2^{-2}$ | $2^{-3}$ | $2^{-4}$ | $2^{-5}$ | $2^{-6}$ | $2^{-7}$ | $2^{-8}$ | $2^{-9}$ | $2^{-10}$ | $2^{-11}$ | $2^{-12}$ | $2^{-13}$ | $2^{-14}$ | $2^{-15}$ |

$\dagger$ Note that the partial remainder $\mathrm{R}=2^{-15}(\mathrm{W})$


## Integer Divide Example (Z, W)/X

## $Z_{i}$ - MSB Input Dividend

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Sign | $2^{30}$ | $2^{29}$ | $2^{28}$ | $2^{27}$ | $2^{26}$ | $2^{25}$ | $2^{24}$ | $2^{23}$ | $2^{22}$ | $2^{21}$ | $2^{20}$ | $2^{19}$ | $2^{18}$ | $2^{17}$ | $2^{16}$ |

## $\mathbf{W}_{\mathrm{i}}$ - LSB Input Dividend

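| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| $2^{15}$ | $2^{14}$ | $2^{13}$ | $2^{12}$ | $2^{11}$ | $2^{10}$ | $2^{9}$ | $2^{8}$ | $2^{7}$ | $2^{6}$ | $2^{5}$ | $2^{4}$ | $2^{3}$ | $2^{2}$ | $2^{1}$ | $2^{0}$ |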

## X - Input Divisor

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Sign | $2^{14}$ | $2^{13}$ | $2^{12}$ | $2^{11}$ | $2^{10}$ | $2^{9}$ | $2^{8}$ | $2^{7}$ | $2^{6}$ | $2^{5}$ | $2^{4}$ | $2^{3}$ | $2^{2}$ | $2^{1}$ | $2^{0}$ |

## $Z_{i}$ - Output Quotient

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Sign | $2^{14}$ | $2^{13}$ | $2^{12}$ | $2^{11}$ | $2^{10}$ | $2^{9}$ | $2^{8}$ | $2^{7}$ | $2^{6}$ | $2^{5}$ | $2^{4}$ | $2^{3}$ | $2^{2}$ | $2^{1}$ | $2^{0}$ |

## $\mathbf{W}_{\mathbf{i}}$ - Output Partial Remainder

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| Sign | $2^{14}$ | $2^{13}$ | $2^{12}$ | $2^{11}$ | $2^{10}$ | $2^{9}$ | $2^{8}$ | $2^{7}$ | $2^{6}$ | $2^{5}$ | $2^{4}$ | $2^{3}$ | $2^{2}$ | $2^{1}$ | $2^{0}$ |

## Absolute Maximum Ratings



Off-state output voltage ..... 5.5 V

Storage temperature ..... $-65^{\circ}$ to $+150^{\circ} \mathrm{C}$

## Operating Conditions

| SYMBOL | PARAMETER | FIGURE | COMMERCIAL <br> MIN | COMMERCIAL <br> TYP | COMMERCIAL <br> MAX | UNIT |
| :--- | :--- | :--- | :---: | :---: | :---: | :---: |

* During operations when the bus is being used to input data.


## Electrical Characteristics Over Operating Conditions

| SYMBOL | PARAMETER | TEST CONDITIONS |  | MIN | TYP | MAX | UNIT |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| $\mathrm{V}_{\mathrm{IL}}$ | Low-level input voltage |  |  |  |  | 0.8 | V |
| $\mathrm{V}_{\mathrm{IH}}$ | High-level input voltage |  |  | 2 |  |  | V |
| $\mathrm{V}_{\mathrm{IC}}$ | Input clamp voltage | $\mathrm{V}_{\mathrm{CC}}=\mathrm{MIN}$, $\mathrm{I}_{\mathrm{I}}=-18 \mathrm{~mA}$ |  |  |  | -1.5 | V |
| $\mathrm{I}_{\mathrm{IL}}$ | Low-level input current | $\mathrm{V}_{\mathrm{CC}}=\mathrm{MAX}$, $\mathrm{V}_{\mathrm{I}}=0.5 \mathrm{~V}$ | $\mathrm{B}_{15}-\mathrm{B}_{0}$ |  |  | -250 | $\mu \mathrm{A}$ |
|  |  |  | All other inputs |  |  | -1 | mA |
| $\mathrm{I}_{\mathrm{IH}}$ | High-level input current | $\mathrm{V}_{\mathrm{CC}}=\mathrm{MAX}$, $\mathrm{V}_{\mathrm{I}}=2.4 \mathrm{~V}$ |  |  |  | 250 | $\mu \mathrm{A}$ |
| $\mathrm{I}_{\mathrm{I}}$ | Maximum input current | $\mathrm{V}_{\mathrm{CC}}=\mathrm{MAX}$, $\mathrm{V}_{\mathrm{I}}=5.5 \mathrm{~V}$ |  |  |  | 1 | mA |
| $\mathrm{V}_{\mathrm{OL}}$ | Low-level output voltage | $\mathrm{V}_{\mathrm{CC}}=\mathrm{MIN}$, $\mathrm{I}_{\mathrm{OL}}=8 \mathrm{~mA}$ |  |  | 0.3 | 0.5 | V |
| $\mathrm{V}_{\mathrm{OH}}$ | High-level output voltage | $\mathrm{V}_{\mathrm{CC}}=\mathrm{MIN}$, $\mathrm{I}_{\mathrm{OH}}=-2 \mathrm{~mA}$ |  | 2.4 |  |  | V |
| $\mathrm{I}_{\mathrm{OS}}$ | Output short-circuit current* | $\mathrm{V}_{\mathrm{CC}}=\mathrm{MAX}$, $\mathrm{V}_{\mathrm{O}}=0 \mathrm{~V}$ |  | -10 |  | -90 | mA |
| $\mathrm{I}_{\mathrm{CC}}$ | Supply current | $\mathrm{V}_{\mathrm{CC}}=\mathrm{MAX}$ |  |  | 370 | $450 \dagger$ | mA |

* Not more than one output should be shorted at a time, and the duration of the short-circuit should not exceed one second.
$\dagger$ At cold temperatures see the "ICC vs Temperature" curves on the next page for more complete information. The typical values shown here are at 5.0 V .


## Switching Characteristics Over Operating Conditions

| SYMBOL | PARAMETER |  | FIGURE | COMMERCIAL <br> MIN | COMMERCIAL <br> TYP | COMMERCIAL <br> MAX | UNIT |
| :--- | :--- | :--- | :--- | :---: | :---: | :---: | :---: |

$\mathrm{I}_{\mathrm{CC}}$ vs. Temperature


## AC Test Conditions

Inputs: 0 V LOW, 3 V HIGH. Rise and fall times: 1-3 ns from 1 V to 2 V. Measurements are made from 1.5 V IN to 1.5 V OUT, except that $\mathrm{t}_{\mathrm{PXZ}}$ is measured by a delta in the outputs of 0.5 V from $\mathrm{V}_{\mathrm{OL}}$ or $\mathrm{V}_{\mathrm{OH}}$ respectively.

## Timing

Timing waveforms are shown in Figure 8. Specific instruction timing examples are shown in Figures 9 through 13.

## Test Waveforms

| TEST | $\mathrm{V}_{\mathrm{X}}$* |  | OUTPUT WAVEFORM - MEAS. LEVEL |
| :---: | :---: | :---: | :---: |
| All $t_{PD}$ | 5.0 V |  |  |
| $t_{PXZ}$ | $t_{PHZ}$: 0.0 V | $t_{PLZ}$ |  |
| $t_{PZX}$ | $t_{PZH}$ | $t_{PZL}$: 5.0 V |  |

*At diode; see "Test Circuit" figure below.

## Test Load



* The "TEST POINT" is driven by the output under test, and observed by instrumentation.


NOTE: $\overline{\mathrm{GO}}$ and $\mathrm{I}_{2}-\mathrm{I}_{0}$ can change only when CK is high.
Figure 8. Timing Diagram of the 'S516


NOTES: Register Z is read at the same time that the overflow signal (if present) is set. If the instruction remains at code 7 after time-slot 11, the contents of registers Z and W are swapped each cycle.
$\dagger$ "Any code" means any of code 0 through code 7. However, code 6 will load a new value of X, and code 7 will cause the 'S516 to attempt to drive the data bus.
*Not available externally on the 'S516.

Figure 9. Instruction Timing Example No. 1: Load X, Load Y, Multiply, Read Z, Read W. By Presenting Code 7 on the Instruction Lines During the Last Multiply Cycle (State 8), the Results May Be Read During Time-Slots 10 and 11


NOTES: The instruction lines may be changed only when CK is high.
$\dagger$ "Any code" means any of code 0 through code 7 . Code 6 may be used here since a new $X$ explicitly gets loaded for the next multiply operation. However, code 7 will cause the 'S516 to attempt to drive the data bus.
*Not available externally on the 'S516.

Figure 10. Instruction Timing Example No. 2: Repeat: "Load X, Load Y, Multiply, Read Z, Read W"


NOTES: Code 7 is given in time-slot 9, but has no effect until time-slot 10 since $\overline{\mathrm{GO}}$ is HIGH. After $\overline{\mathrm{GO}}$ goes LOW in time-slot 10, Z may be read.
$\dagger$ "Any code" means any of code 0 through code 7.
*Not available externally on the 'S516.

Figure 11. Instruction Timing Example No. 3: Load X, Load Y, Multiply, Read Z, Read W. This Timing Diagram Corresponds to Table 1. Only After Eight Clock Pulses of the Operation Cycle, the Result Is Read - Z During Time-Slot 10 and W During Time-Slot 11


NOTES: $\dagger$ "Any code" means any of code 0 through code 7 . Code 6 or code 7 may be used here; since $\overline{\mathrm{GO}}$ is HIGH , no new X can be loaded, and the 'S516 cannot attempt to drive the data bus.
*Not available externally on the 'S516.

Figure 12. Instruction Timing Example No. 4: Load X, Load Y, Multiply, Wait, Read Z, Read W


NOTES: This sequence of operations is suitable for use when reading is to be done only at the very end of the operation sequence. The new X value is loaded during the time that the previous multiplication is being performed. See Programming Example \#3 for $\sum_{i=1}^{N} X_{i} \cdot Y_{i}$.
$\dagger$ "Any code" means any of code 0 through code 7. However, code 7 will cause the 'S516 to attempt to drive the data bus.
*Not available externally on the 'S516.
$\dagger \dagger$ Code 6 allows loading of a new $X$ in State 12 and it takes the 'S516 State Counter to State 8. In State 8, Y is loaded via instruction 2 and the next multiply-accumulate cycle is initiated.

Figure 13. Instruction Timing Example No. 5: Sum of Products

## Die Configuration



Die Size: $210 \times 234$ mil$^{2}$

## Programming Examples

In the following examples assume that each line with a separate instruction corresponds to one clock pulse. Instruction codes are $0,1,2,3,4,5,6,7$ and $x$ according to the usage explained in the key to Figure 2.

## Programming Example 1

| Calculating $X \cdot Y(A \cdot B)$ |  |
| :--- | :--- |
| INST 6 | $X \leftarrow A$ |
| INST 0 | $Y \leftarrow B$ |
| INST $X$ | MULT |
| INST $X$ | MULT |
| INST $X$ | MULT |
| INST $X$ | MULT |
| INST $X$ | MULT |
| INST $X$ | MULT |
| INST $X$ | MULT |
| INST 7 | MULT AND READ $Z=16$ MSB OF (A•B) |
| INST 7 | READ W $=16$ LSB OF (A•B) |
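
As a cross-check on the Z and W values that this sequence returns, the arithmetic can be modelled in a few lines of Python. This is only a behavioral sketch, not a description of the 'S516 internals: it assumes twos-complement integer operands, and `to_signed16` and `multiply_16x16` are hypothetical helper names introduced here.

```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a twos-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def multiply_16x16(a, b):
    """Return (Z, W): the 16 MSBs and the 16 LSBs of the 32-bit product A*B."""
    product = to_signed16(a) * to_signed16(b)
    product &= 0xFFFFFFFF            # keep the 32-bit twos-complement pattern
    return (product >> 16) & 0xFFFF, product & 0xFFFF


# 3 * (-2) = -6, which is FFFF FFFA as a 32-bit twos-complement word.
z, w = multiply_16x16(0x0003, 0xFFFE)
print(f"Z = {z:04X}, W = {w:04X}")
```

The two return values correspond to the two INST 7 reads at the end of the sequence: Z first, then W.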

## Programming Example 2

Calculating $\mathrm{X1} \cdot \mathrm{Y}$ $(\mathrm{A} \cdot \mathrm{C})$
X1 is a previous multiplier value. It was previously loaded (in Example 1) with A.

| INST 0 | $Y \leftarrow C$ |
| :--- | :--- |
| INST $X$ | MULT |
| INST $X$ | MULT |
| INST $X$ | MULT |
| INST $X$ | MULT |
| INST $X$ | MULT |
| INST $X$ | MULT |
| INST $X$ | MULT |
| INST 7 | MULT and READ $Z=16$ MSB OF (A•C) |
| INST 7 | READ $W=16$ LSB OF (A•C) |

## Programming Example 3

Calculating $\sum_{i=1}^{N} X_{i} \cdot Y_{i} \quad(A \cdot B+C \cdot D+E \cdot F+\ldots)$
In this case we read only after N multiplications. A new $X_{i+1}$ is loaded during the multiplication process for $X_{i} \cdot Y_{i}$.
Assume $N=3$.
The sequence of instructions and operations for calculating $\sum_{i=1}^{3} X_{i} \cdot Y_{i}=(A \cdot B+C \cdot D+E \cdot F)$ is:


| INST 2 | $Y \leftarrow D$ ($N=2$) |
| :--- | :--- |
| INST X | MULT |
| INST X | MULT |
| INST X | MULT |
| INST X | MULT \} Perform $C \cdot D+\left(\mathrm{K}_{\mathrm{z}}, \mathrm{K}_{\mathrm{w}}\right)$ |
| INST X | MULT |
| INST X | MULT |
| INST X | MULT |
| INST 6 | MULT and LOAD $X \leftarrow E$; $Z \leftarrow 16$ MSB of $(C \cdot D+A \cdot B)$, $W \leftarrow 16$ LSB of $(C \cdot D+A \cdot B)$ |
| INST 2 | $Y \leftarrow F$ |
| INST X | MULT |
| INST X | MULT |
| INST X | MULT |
| INST X | MULT \} Perform $E \cdot F+\left(\mathrm{K}_{\mathrm{z}}, \mathrm{K}_{\mathrm{w}}\right)$ |
| INST X | MULT |
| INST X | MULT |
| INST X | MULT |
| INST 7 | MULT and READ $Z=16$ MSB of $(E \cdot F+C \cdot D+A \cdot B)$ |
| INST 7 | READ $W=16$ LSB of $(E \cdot F+C \cdot D+A \cdot B)$ |
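
The accumulation and the clock budget of a sum-of-products run can be sketched as follows. This is a simplified model under stated assumptions, not the device's actual state machine: it assumes twos-complement integer operands, a first product loaded as in Programming Example 1 (one clock for X, one for Y), eight multiply clocks per product with the last one shared with the next X load or the Z read, and a 32-bit accumulator that simply wraps (the overflow signal is not modelled). `sum_of_products` is a hypothetical name.

```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a twos-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def sum_of_products(pairs):
    """Return (Z, W, clocks) for the accumulated sum of X_i * Y_i."""
    acc = 0
    clocks = 0
    for index, (x, y) in enumerate(pairs):
        if index == 0:
            clocks += 1   # INST 6: explicit load of the first X
        clocks += 1       # INST 0 or INST 2: load Y
        clocks += 7       # seven INST X multiply cycles
        clocks += 1       # last multiply cycle: INST 6 (next X) or INST 7 (read Z)
        acc = (acc + to_signed16(x) * to_signed16(y)) & 0xFFFFFFFF
    clocks += 1           # final INST 7: read W
    return (acc >> 16) & 0xFFFF, acc & 0xFFFF, clocks


# A*B + C*D + E*F with small positive operands: 2*3 + 4*5 + 6*7 = 68.
print(sum_of_products([(2, 3), (4, 5), (6, 7)]))
```

Under these assumptions an N-term sum costs 9N + 2 clocks, which reduces to the 11 clocks of Programming Example 1 when N = 1.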

## Programming Example 4

Multiplication plus a constant ($A \cdot B+$ Constant)
Assume that the constant is a 32-bit 2s-complement number.

| INST 6 | $X \leftarrow A$ |
| :--- | :--- |
| INST 6 | $Z \leftarrow C$ LOAD 16 MSB of constant |
| INST 6 | $W \leftarrow D$ LOAD 16 LSB of constant |
| INST 0 | $Y \leftarrow B$ |
| INST X | MULT |
| INST X | MULT |
| INST X | MULT |
| INST X | MULT \} Perform $A \cdot B+(Z, W)$ |
| INST X | MULT |
| INST X | MULT |
| INST X | MULT |
| INST 7 | MULT and READ $Z=16$ MSB of $(A \cdot B+(C, D))$ |
| INST 7 | READ $W=16$ LSB of $(A \cdot B+(C, D))$ |

## Programming Example 5

Dividing a 32-bit number by a 16-bit number ($(B, C) / A$)

| INST 6 | $X \leftarrow A$ |
| :--- | :--- |
| INST 6 | $Z \leftarrow B$ |
| INST 4 | $W \leftarrow C$ |
| INST X |  |
| INST X |  |
| INST X |  |
| INST X |  |
| INST X |  |
| INST X |  |
| INST X |  |
| INST X |  |
| INST X |  |
| INST X | \} Perform division $\frac{(Z, W)}{X}$ |
| INST X |  |
| INST X |  |
| INST X |  |
| INST X |  |
| INST X |  |
| INST X |  |
| INST X |  |
| INST X |  |
| INST X |  |
| INST 7 | DIVIDE and READ the quotient $Z=\frac{(B, C)}{A}$ |
| INST 7 | READ the remainder $W$ of $\frac{(B, C)}{A}$ |
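
The relationship between the loaded registers and the two values read back can be illustrated with a short Python sketch. It is deliberately limited: it assumes nonnegative operands and a quotient that fits in 16 bits, and it does not model how the part treats signs or flags overflow. `divide_32_by_16` is a hypothetical name.

```python
def divide_32_by_16(b, c, a):
    """Divide the 32-bit value (B, C) by the 16-bit value A.

    Returns (quotient, remainder), i.e. the values read from Z and W.
    Only nonnegative operands are handled in this sketch.
    """
    if a == 0:
        raise ZeroDivisionError("divisor A must be nonzero")
    dividend = ((b & 0xFFFF) << 16) | (c & 0xFFFF)   # Z holds the MSBs, W the LSBs
    quotient, remainder = divmod(dividend, a & 0xFFFF)
    if quotient > 0xFFFF:
        raise OverflowError("quotient does not fit in 16 bits")
    return quotient, remainder


# (0x0001, 0x0005) is 65541; 65541 / 16 = 4096 remainder 5.
print(divide_32_by_16(0x0001, 0x0005, 0x0010))
```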

## $16 \times 16$ Flow-Thru ${ }^{\text {TM }}$ Multiplier Slice 54/74S556

## Features/Benefits

- Twos-complement, unsigned, or mixed operands
- Full 32-bit product immediately available on each cycle
- High-speed 16x16 parallel multiplier
- Latched or transparent inputs/outputs
- Three-state output controls, independent for each half of the product
- Single +5 V supply (via multiple pins)
- Available in 84-terminal Leadless-Chip Carrier and 88-Pin-Grid-Array packages


## Description

The 'S556 is a high-speed $16 \times 16$ combinatorial multiplier which can multiply two 16 -bit unsigned or signed twos-complement numbers on every cycle. Each operand $X$ and $Y$ has an associated mode-control line, XM and YM respectively. When a mode-control line is at a LOW logic level, the operand is treated as an unsigned 16 -bit number; when the mode-control line is at a HIGH logic level, the operand is treated as a 16 -bit signed twos-complement number. Additional inputs RS and RU allow the addition of a bit into the multiplier array at the appropriate bit positions for rounding. The entire 32-bit double-length product is available at the outputs at one time.

## Ordering Information

| PART NUMBER | PACKAGE | TEMPERATURE |
| :---: | :---: | :---: |
| 54S556 | P88, L84 | Military |
| 74S556 | P88, L84* | Commercial |

P88 is an 88-Pin-Grid-Array Package.
L84 is an 84-terminal Leadless-Chip Carrier Package.

* The 84-terminal leadless chip carrier, L84, and its socket, L84-2, are in development; contact the factory for further details.

The most-significant product bit, $\mathrm{S}_{31}$, is available in both true and complemented form to simplify longer-wordlength multiplications. The product outputs are three-state, controlled by assertive-low enables. The MSP outputs are controlled by the TRIM ($\overline{\mathrm{OEM}}$) control input, while the LSP outputs are controlled by the TRIL ($\overline{\mathrm{OEL}}$) control input. This allows one or more multipliers to be connected to a parallel bus or to be used in a pipelined system.

All inputs and outputs have transparent latches. A latch becomes transparent when its corresponding gate control line (GX, GY, GM, or GL) is HIGH. If latches are not required, these control inputs may be tied HIGH, leaving the multiplier fully transparent for combinatorial cascading. The device uses a single +5 V power supply, and is available both in an 84-terminal leadless chip carrier (LCC) package and in an 88-pin-grid-array package.


## 'S556 Logic Diagram



| SUMMARY OF SIGNALS/PINS |  |
| :---: | :--- |
| $\mathrm{X}_{15-0}$ | Multiplicand 16-bit data inputs |
| $\mathrm{Y}_{15-0}$ | Multiplier 16-bit data inputs |
| $\mathrm{XM}, \mathrm{YM}$ | Mode-control inputs for each data word; LOW for unsigned data and HIGH for twos-complement data |
| $\mathrm{S}_{31-0}$ | Product 32-bit output |
| $\overline{\mathrm{S}}_{31}$ | Inverted MS product bit (for expansion) |
| RS, RU | Rounding inputs for signed and unsigned data, respectively |
| GX | Gate control for $\mathrm{X}_{\mathrm{i}}$, RS, RU |
| GY | Gate control for $\mathrm{Y}_{\mathrm{i}}$ |
| GL | Gate control for least-significant half of product |
| GM | Gate control for most-significant half of product |
| TRIL ($\overline{\mathrm{OEL}}$) | Three-state control for least-significant half of product |
| TRIM ($\overline{\mathrm{OEM}}$) | Three-state control for most-significant half of product |

## Rounding Inputs

| INPUT RU | INPUT RS | ADDS $2^{15}$ | ADDS $2^{14}$ | USUALLY USED WITH XM | USUALLY USED WITH YM |
| :---: | :---: | :---: | :---: | :---: | :---: |
| L | L | NO | NO | X | X |
| L | H | NO | YES | H$\dagger$ | H$\dagger$ |
| H | L | YES | NO | L | L |
| H | H | YES | YES | * | * |

$\dagger$ In mixed mode, one of these could be low but not both.

* Usually a nonsense operation.


## Mode-Control Inputs

| OPERATING MODE | INPUT DATA $\mathrm{X}_{15-0}$ | INPUT DATA $\mathrm{Y}_{15-0}$ | MODE-CONTROL INPUT XM | MODE-CONTROL INPUT YM |
| :---: | :---: | :---: | :---: | :---: |
| Unsigned | Unsigned | Unsigned | L | L |
| Mixed | Unsigned | Twos-Comp. | L | H |
| Mixed | Twos-Comp. | Unsigned | H | L |
| Signed | Twos-Comp. | Twos-Comp. | H | H |
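
The effect of the mode-control inputs on the 32-bit product can be illustrated with a small behavioral model. This is a sketch of the arithmetic only (no latches, rounding inputs, or timing); `s556_product` and `to_signed16` are hypothetical names, with `xm` and `ym` set to `True` for a HIGH mode-control line.

```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a twos-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def s556_product(x, y, xm, ym):
    """Return the 32-bit product pattern S31-S0 for the given mode controls.

    xm / ym True  -> operand treated as 16-bit twos-complement (line HIGH)
    xm / ym False -> operand treated as 16-bit unsigned (line LOW)
    """
    x_val = to_signed16(x) if xm else x & 0xFFFF
    y_val = to_signed16(y) if ym else y & 0xFFFF
    return (x_val * y_val) & 0xFFFFFFFF


# The same bit patterns give different products in each operating mode.
for xm, ym in [(False, False), (False, True), (True, True)]:
    print(xm, ym, f"{s556_product(0xFFFF, 0xFFFF, xm, ym):08X}")
```

With both operands equal to FFFF, the unsigned mode yields FFFE0001 (65535 squared), the mixed mode yields FFFF0001 (65535 times -1), and the signed mode yields 00000001 (-1 times -1).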

## 84-Terminal Leadless Chip Carrier Pinout



All $V_{C C}$ and GND pins must be connected to the respective $V_{C C}$ and GND connections on the board and should not be used for daisychaining through the IC.

## Operating Conditions

| SYMBOL | PARAMETER |  | FIGURE | MILITARY MIN | TYP | MAX | COMMERCIAL MIN | TYP | MAX | UNIT |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| $\mathrm{V}_{\mathrm{CC}}$ | Supply voltage |  |  | 4.5 | 5 | 5.5 | 4.75 | 5 | 5.25 | V |
| $\mathrm{T}_{\mathrm{A}}$ | Operating free-air temperature |  |  | -55 |  | 125* | 0 |  | 75 | ${ }^{\circ} \mathrm{C}$ |
| $t_{S1}$ | Setup time ($\mathrm{X}_{\mathrm{i}}, \mathrm{R}_{\mathrm{i}}$)/$\mathrm{Y}_{\mathrm{i}}$ to GX/GY |  | 2a, 2b | 12 |  |  | 10 |  |  | ns |
| $t_{S2}$ | Setup time $\mathrm{X}_{\mathrm{i}}, \mathrm{Y}_{\mathrm{i}}, \mathrm{R}_{\mathrm{i}}$ to GM, GL | $t_{S2L}$ | 3a, 3b | 65 |  |  | 60 |  |  | ns |
|  |  | $t_{S2M}$ |  | 82 |  |  | 74 |  |  | ns |
| $t_{S3}$ | Setup time GX, GY to GL, GM | $t_{S3L}$ | 4a, 4b, 4c, 4d, 4e, 4f | 65 |  |  | 60 |  |  | ns |
|  |  | $t_{S3M}$ |  | 85 |  |  | 75 |  |  | ns |
| $t_{H1}$ | Hold time ($\mathrm{X}_{\mathrm{i}}, \mathrm{R}_{\mathrm{i}}$)/$\mathrm{Y}_{\mathrm{i}}$ to GX/GY |  | 2a, 2b | 8 |  |  | 8 |  |  | ns |
| $t_{H2}$ | Hold time $\mathrm{X}_{\mathrm{i}}, \mathrm{Y}_{\mathrm{i}}, \mathrm{R}_{\mathrm{i}}$ to GM, GL | $t_{H2L}, t_{H2M}$ | 3a, 3b | 3 |  |  | 3 |  |  | ns |
| $t_{H3}$ | Hold time GX, GY to GM, GL | $t_{H3L}, t_{H3M}$ | 4a, 4b, 4c, 4d, 4e, 4f | 0 |  |  | 0 |  |  | ns |
| $t_{W}$ | Latch enable pulse width |  | 6 | 14 |  |  | 12 |  |  | ns |

* Indicates case temperature.

## Electrical Characteristics Over Operating Conditions

| SYMBOL | PARAMETER | TEST CONDITIONS |  | MIN | TYP$\dagger$ | MAX | UNIT |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| $\mathrm{V}_{\mathrm{IL}}$ | Low-level input voltage** |  |  |  |  | 0.8 | V |
| $\mathrm{V}_{\mathrm{IH}}$ | High-level input voltage** |  |  | 2 |  |  | V |
| $\mathrm{V}_{\mathrm{IC}}$ | Input clamp voltage | $\mathrm{V}_{\mathrm{CC}}=\mathrm{MIN}$ | $\mathrm{I}_{\mathrm{I}}=-18 \mathrm{~mA}$ |  |  | -1.5 | V |
| $\mathrm{I}_{\mathrm{IL}}$ | Low-level input current | $\mathrm{V}_{\mathrm{CC}}=\mathrm{MAX}$ | $\mathrm{V}_{\mathrm{I}}=0.4 \mathrm{~V}$ |  |  | -0.4 | mA |
| $\mathrm{I}_{\mathrm{IH}}$ | High-level input current | $\mathrm{V}_{\mathrm{CC}}=\mathrm{MAX}$ | $\mathrm{V}_{\mathrm{I}}=2.4 \mathrm{~V}$ |  |  | 75 | $\mu \mathrm{A}$ |
| $\mathrm{I}_{\mathrm{I}}$ | Maximum input current | $\mathrm{V}_{\mathrm{CC}}=\mathrm{MAX}$ | $\mathrm{V}_{\mathrm{I}}=5.5 \mathrm{~V}$ |  |  | 1 | mA |
| $\mathrm{V}_{\mathrm{OL}}$ | Low-level output voltage | $\mathrm{V}_{\mathrm{CC}}=\mathrm{MIN}$ | $\mathrm{I}_{\mathrm{OL}}=8 \mathrm{~mA}$ |  |  | 0.5 | V |
| $\mathrm{V}_{\mathrm{OH}}$ | High-level output voltage | $\mathrm{V}_{\mathrm{CC}}=\mathrm{MIN}$ | $\mathrm{I}_{\mathrm{OH}}=-2 \mathrm{~mA}$ | 2.4 |  |  | V |
| $\mathrm{I}_{\mathrm{OZL}}$ | Off-state output current | $\mathrm{V}_{\mathrm{CC}}=\mathrm{MAX}$ | $\mathrm{V}_{\mathrm{O}}=0.5 \mathrm{~V}$ |  |  | -100 | $\mu \mathrm{A}$ |
| $\mathrm{I}_{\mathrm{OZH}}$ |  |  | $\mathrm{V}_{\mathrm{O}}=2.4 \mathrm{~V}$ |  |  | 100 | $\mu \mathrm{A}$ |
| $\mathrm{I}_{\mathrm{OS}}$ | Output short-circuit current* | $\mathrm{V}_{\mathrm{CC}}=\mathrm{MAX}$ | $\mathrm{V}_{\mathrm{O}}=0 \mathrm{~V}$ | -20 |  | -90 | mA |
| $\mathrm{I}_{\mathrm{CC}}$ | Supply current | $\mathrm{V}_{\mathrm{CC}}=\mathrm{MAX}$ | Commercial 74S556 |  | 600 | 800 | mA |
|  |  |  | Military 54S556 |  | 600 | 900 | mA |
| $\mathrm{I}_{\mathrm{CC}}$ | Supply current at hot temperature limit | $\mathrm{V}_{\mathrm{CC}}=5.25 \mathrm{~V}$ | $\mathrm{T}_{\mathrm{A}}=75^{\circ} \mathrm{C}$ |  |  | 700 | mA |
|  |  | $\mathrm{V}_{\mathrm{CC}}=5.5 \mathrm{~V}$ | $\mathrm{T}_{\mathrm{C}}=125^{\circ} \mathrm{C}$ |  |  | 800 | mA |

$\dagger$ Typical at 5.0 V and $25^{\circ} \mathrm{C} \mathrm{T}_{\mathrm{A}}$.

* Not more than one output should be shorted at a time and the duration of the short-circuit should not exceed one second.
** These are absolute voltages with respect to the ground pins and include all overshoots due to system and/or tester noise. Do not attempt to test these values without suitable equipment.


## Switching Characteristics Over Operating Conditions

| SYMBOL | PARAMETER |  | TEST CONDITIONS | 54S556 MILITARY MIN | TYP | MAX | 74S556 COMMERCIAL MIN | TYP | MAX | UNIT |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| $t_{DTL}$ | Transparent Multiply, GX, GY, GM, GL = H | $\mathrm{X}_{\mathrm{i}}, \mathrm{Y}_{\mathrm{i}}, \mathrm{R}_{\mathrm{i}}$ to $\mathrm{S}_{15-0}$ <br> Figs. 1, 2c, 3b, 4c, 4f | $\mathrm{C}_{\mathrm{L}}=30 \mathrm{pF}$, $\mathrm{R}_{\mathrm{L}}=560 \Omega$ <br> See Figure 7 |  |  | 84 |  | 50 | 76 | ns |
| $t_{DTM}$ |  | $\mathrm{X}_{\mathrm{i}}, \mathrm{Y}_{\mathrm{i}}, \mathrm{R}_{\mathrm{i}}$ to $\mathrm{S}_{31-16}$, $\overline{\mathrm{S}}_{31}$ <br> Figs. 1, 2c, 3b, 4c, 4f |  |  |  | 100 |  | 60 | 90 | ns |
| $t_{D1L}$ | Transparent Output Multiply, GM, GL = H | GX, GY to $\mathrm{S}_{15-0}$ <br> Figs. 2a, 2b, 4d, 4e |  |  |  | 88 |  |  | 80 | ns |
| $t_{D1M}$ |  | GX, GY to $\mathrm{S}_{31-16}$, $\overline{\mathrm{S}}_{31}$ <br> Figs. 2a, 2b, 4d, 4e |  |  |  | 102 |  |  | 92 | ns |
| $t_{D2}$ | Transparent Input Multiply, GX, GY = H | GM, GL to $\mathrm{S}_{\mathrm{i}}$ <br> Figs. 3a, 4a, 4b |  |  |  | 40 |  |  | 35 | ns |
| $t_{PXZ}$ | Three-State Disable Timing | TRIL ($\overline{\mathrm{OEL}}$), TRIM ($\overline{\mathrm{OEM}}$) to $\mathrm{S}_{\mathrm{i}}$ <br> Fig. 5 |  |  |  | 40 |  |  | 30 | ns |
| $t_{PZX}$ | Three-State Enable Timing | TRIL ($\overline{\mathrm{OEL}}$), TRIM ($\overline{\mathrm{OEM}}$) to $\mathrm{S}_{\mathrm{i}}$ <br> Fig. 5 |  |  |  | 40 |  |  | 30 | ns |

## Transparent Multiply - Flowthrough Operation


Figure 1.

The transparent multiply is a flowthrough operation of the 'S556. Both the input and output latches are made transparent by keeping GX, GY, GM, and GL at a HIGH level. The operands are presented to the X, Y, and R inputs; the results are available $t_{DTL}$ and $t_{DTM}$ later, for the least and most significant halves of the product respectively.


* With this particular timing, set-up time $t_{S1}$ will automatically be met.

Figure 2a.


Figure 2b.
Figure 2c.

By tying the GL and GM lines HIGH, the 'S556 can perform transparent output (or pipelined input) multiplies. Data present is latched at the inputs using the GX and GY control signals. The time at which the result S is present at the outputs depends on when the rising edges of GX and GY occur. If the rising edges of GX and GY occur after the operand inputs change, then Figure 2a applies; the result will be available at the outputs $t_{D1L}$ and $t_{D1M}$* after the rising edges of GX and GY. If the rising edges of GX and GY occur less than ($t_{W\,min} - t_{S1\,min}$) before the operand inputs change, then Figure 2b applies; in this case the result will also be available at the outputs $t_{D1L}$ and $t_{D1M}$* after the rising edges of GX and GY. However, if the rising edges of GX and GY occur more than ($t_{W\,min} - t_{S1\,min}$) before the operand inputs change, then Figure 2c applies; the result will appear at the outputs $t_{DTL}$ and $t_{DTM}$* after the operand inputs change.

* For the least and most significant halves of the product, respectively.
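
The three transparent-output cases can be written as a small decision function. This is a sketch under stated assumptions: it uses the commercial ('74S556) limits quoted in the tables of this section, it treats a lead of exactly ($t_{W\,min} - t_{S1\,min}$) as the Figure 2b case, and `transparent_output_case` is a hypothetical name. All times are in nanoseconds on an arbitrary common time axis.

```python
# Commercial ('74S556) limits from the tables in this section, in ns.
T_W_MIN = 12          # latch enable pulse width, minimum
T_S1_MIN = 10         # setup time, operands to GX/GY, minimum
T_D1 = (80, 92)       # tD1L, tD1M maximum
T_DT = (76, 90)       # tDTL, tDTM maximum


def transparent_output_case(t_xyr, t_gxy):
    """Return (figure, (ls_ready, ms_ready)) with GL and GM tied HIGH.

    t_xyr: time at which the operand inputs change
    t_gxy: time of the rising edges of GX and GY
    """
    lead = t_xyr - t_gxy        # > 0 when GX/GY rise before the operands change
    if lead <= T_W_MIN - T_S1_MIN:
        figure = "2a" if lead <= 0 else "2b"
        return figure, tuple(t_gxy + delay for delay in T_D1)
    return "2c", tuple(t_xyr + delay for delay in T_DT)


print(transparent_output_case(0, 5))    # GX/GY rise after the operands change
print(transparent_output_case(20, 0))   # GX/GY rise well before the operands change
```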


## Transparent Input Multiply — Pipelined Output



Figure 3a.
Figure 3b.

By tying the GX and GY lines HIGH, the 'S556 can perform transparent input (or pipelined output) multiplies. Data is presented at the inputs, and $t_{S2}$ after X, Y and R change, the results can be latched. The time at which the result S is present at the outputs depends upon when the rising edges of GL and GM occur. If they occur at or after ($t_{S2\,min} - t_{W\,min}$) from the inputs changing, then Figure 3a applies; the result appears at the outputs $t_{D2}$ after the rising edges of GL and GM. If the rising edges of GL and GM occur before ($t_{S2\,min} - t_{W\,min}$) from the inputs changing, then Figure 3b applies; the result appears at the outputs $t_{DTL}$ and $t_{DTM}$* after the operand inputs change.

* For the least and most significant halves of the product, respectively.
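
The transparent-input rule can be sketched the same way. The assumptions here are that the commercial ('74S556) limits from this section apply, and that the threshold is evaluated separately for each half of the product using $t_{S2L}$ or $t_{S2M}$; the text above states only a single $t_{S2\,min}$, so the per-half split is an interpretation. `transparent_input_case` is a hypothetical name, and times are in nanoseconds.

```python
# Commercial ('74S556) limits from the tables in this section, in ns.
T_W_MIN = 12                       # latch enable pulse width, minimum
T_D2 = 35                          # GM, GL to S_i, maximum
T_S2_MIN = {"L": 60, "M": 74}      # tS2L, tS2M minimum
T_DT = {"L": 76, "M": 90}          # tDTL, tDTM maximum


def transparent_input_case(t_xyr, t_glm, half):
    """Return (figure, ready_time) for one product half with GX and GY tied HIGH.

    t_xyr: time at which the operand inputs change
    t_glm: time of the rising edge of GL (half "L") or GM (half "M")
    """
    if t_glm - t_xyr >= T_S2_MIN[half] - T_W_MIN:
        return "3a", t_glm + T_D2          # output follows the latch enable
    return "3b", t_xyr + T_DT[half]        # output follows the operand change


print(transparent_input_case(0, 50, "L"))
print(transparent_input_case(0, 50, "M"))
```

With the latch enables rising 50 ns after the operands change, the least-significant half falls under Figure 3a while the most-significant half is still in the Figure 3b regime.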


## Gated Multiply - Pipelined Input and Output



Figure 4a.


Figure 4b.


Figure 4c.


Figure 4d.


* With this particular timing, setup time $t_{S1}$ will be automatically met.

Figure 4e.


Figure 4f.

The gated multiply represents the pipelined input and output operation. The latch enable lines GX, GY, GL, GM are used to store incoming operands and outgoing results. The particular set-up times that must be met and the time the result takes to reach the outputs depend on two timing relationships. The first is when the rising edges of GX and GY occur with respect to the operand inputs changing, and the second is when the rising edges of GL and GM occur with respect to the rising edges of GX and GY. On the above timing diagrams, denote the absolute time that the operand inputs change as $T_{XYR}$, the absolute time that the rising edges of GX and GY occur as $T_{GXY}$, and the absolute time that the rising edges of GL and GM occur as $T_{GLM}$. Thus, the two delays of concern can be explicitly stated as ($T_{GXY} - T_{XYR}$) and ($T_{GLM} - T_{GXY}$). Notice that either of these quantities can be positive or negative depending on which event occurs first. Timing for gated multiplies can then be summarized in the following table:

| $T_{GXY} - T_{XYR}$ | $T_{GLM} - T_{GXY}$ | FIGURE | WHICH SET-UP TIMES MUST BE MET | WHEN RESULT IS PRESENT AT OUTPUTS |
| :---: | :---: | :---: | :---: | :---: |
| $T_{GXY} - T_{XYR} \geq 0$ | $T_{GLM} - T_{GXY} \geq t_{S3\,min} - t_{W\,min}$ | 4a | $t_{S3}$ | $T_{GLM} + t_{D2}$ |
| $0 < T_{XYR} - T_{GXY} \leq t_{W\,min} - t_{S1\,min}$ | $T_{GLM} - T_{GXY} \geq t_{S3\,min} - t_{W\,min}$ | 4b | $t_{S1}, t_{S2}, t_{S3}$ | $T_{GLM} + t_{D2}$ |
| $T_{XYR} - T_{GXY} > t_{W\,min} - t_{S1\,min}$ | $T_{GLM} - T_{GXY} \geq t_{S3\,min} - t_{W\,min}$ | 4c |  | $T_{XYR} + (t_{DTL}, t_{DTM})$* |
| $T_{GXY} - T_{XYR} \geq 0$ | $T_{GLM} - T_{GXY} < t_{S3\,min} - t_{W\,min}$ | 4d | $t_{S3}$ | $T_{GXY} + (t_{D1L}, t_{D1M})$* |
| $0 < T_{XYR} - T_{GXY} \leq t_{W\,min} - t_{S1\,min}$ | $T_{GLM} - T_{GXY} < t_{S3\,min} - t_{W\,min}$ | 4e |  | $T_{GXY} + (t_{D1L}, t_{D1M})$* |
| $T_{XYR} - T_{GXY} > t_{W\,min} - t_{S1\,min}$ | $T_{GLM} - T_{GXY} < t_{S3\,min} - t_{W\,min}$ | 4f | $t_{S1}, t_{S2}$ | $T_{XYR} + (t_{DTL}, t_{DTM})$* |

* For the least and most significant halves of the product respectively.

NOTE: TXYR represents the absolute time when the operand inputs change.
$T_{G X Y}$ and $T_{G L M}$ represent the absolute times when the rising edges of the latch controls occur.

## Three State Timing



Figure 5.

## Test Waveforms

| TEST | $\mathrm{V}_{\mathrm{X}}$ |  | OUTPUT WAVEFORM - MEAS. LEVEL |
| :---: | :---: | :---: | :---: |
| All $t_{PD}$ | 5.0 V |  |  |
| $t_{PXZ}$ | $t_{PHZ}$, $t_{PLZ}$ | 0 5 |  |
| $t_{PZX}$ | $t_{PZH}$, $t_{PZL}$ | 0 |  |

## Latch Enable Pulse Width (GL, GM, GX, GY)



Figure 6.

## Load Test Circuit



Figure 7.

## Recommended Bypass Capacitors

The switching currents when the outputs change can be fairly high, and bypass capacitors are recommended to adequately decouple the VCC and GND connections.
For example, on the 84-terminal LCC package, pins 21 and 22 are VCC2 supplies and should be decoupled with pin 33, a GND input, using a $0.1 \mu \mathrm{F}$ monolithic ceramic disk capacitor. The capacitor must have good high-frequency characteristics. Also pins 64 and 65, VCC1 and VCC2, should be decoupled with pin 74, a GND input, with a similar capacitor arrangement.
For the 88-pin-grid-array package pins 21 and 22 are VCC2 supplies and should be decoupled with pin 35 , the GND pin. Pins 66 and 67, VCC1 and VCC2, should be decoupled with pin 77, the GND pin.

## Decoupling Capacitors Shown with the 84-Terminal LCC Package



Typical Supply Current Over Temperature Range


## 88 Pin-Grid-Array

## Pin Locations (Bottom View)

Pin-location diagram of the 88-pin-grid-array package, bottom view, with an identifier marking pin 1.

## Pin-Guide For Pin Grid Array

| Pin No. | Pin Name | Pin No. | Pin Name | Pin No. | Pin Name | Pin No. | Pin Name |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | X9 | 23 | N/C* | 45 | S25 | 67 | VCC2 $\dagger$ |
| 2 | X10 | 24 | Y8 | 46 | S24 | 68 | N/C* |
| 3 | X11 | 25 | Y9 | 47 | S23 | 69 | S7 |
| 4 | X12 | 26 | Y10 | 48 | S22 | 70 | S6 |
| 5 | X13 | 27 | Y11 | 49 | S21 | 71 | S5 |
| 6 | X14 | 28 | Y12 | 50 | S20 | 72 | S4 |
| 7 | X15 | 29 | Y13 | 51 | S19 | 73 | S3 |
| 8 | XM | 30 | Y14 | 52 | S18 | 74 | S2 |
| 9 | GX | 31 | Y15 | 53 | S17 | 75 | S1 |
| 10 | RS | 32 | YM | 54 | S16 | 76 | S0 |
| 11 | RU | 33 | GY | 55 | GND | 77 | GND |
| 12 | GND | 34 | N/C* | 56 | TRIL ( $\overline{\mathrm{OEL}}$ ) | 78 | N/C* |
| 13 | Y0 | 35 | GND | 57 | GL | 79 | GND |
| 14 | Y1 | 36 | TRIM ( $\overline{O E M}$ ) | 58 | S15 | 80 | X0 |
| 15 | Y2 | 37 | GM | 59 | S14 | 81 | X1 |
| 16 | Y3 | 38 | $\overline{\text { S31 }}$ | 60 | S13 | 82 | X2 |
| 17 | Y4 | 39 | S31 | 61 | S12 | 83 | X3 |
| 18 | Y5 | 40 | S30 | 62 | S11 | 84 | X4 |
| 19 | Y6 | 41 | S29 | 63 | S10 | 85 | X5 |
| 20 | Y7 | 42 | S28 | 64 | S9 | 86 | X6 |
| 21 | VCC2 $\dagger$ | 43 | S27 | 65 | S8 | 87 | X7 |
| 22 | VCC2 $\dagger$ | 44 | S26 | 66 | VCC1 $\dagger \dagger$ | 88 | X8 |

* Not connected. $\dagger$ VCC2 = Logic $\mathrm{V}_{\mathrm{CC}}$. $\dagger \dagger$ VCC1 = Output buffer $\mathrm{V}_{\mathrm{CC}}$.


## Rounding

Multiplication of two n-bit operands results in a $2 n$-bit product $\dagger$. Therefore, in a pure n-bit system it is necessary to convert the double-length product into a single-length product. This can be accomplished by truncating or rounding. The following examples illustrate the difference between the two conversion techniques in decimal arithmetic:

$$
\left.\begin{array}{l}
39.2 \rightarrow 39 \\
39.6 \rightarrow 39
\end{array}\right\} \text { Truncating } \qquad
\left.\begin{array}{l}
39.2+0.5=39.7 \rightarrow 39 \\
39.6+0.5=40.1 \rightarrow 40
\end{array}\right\} \text { Rounding }
$$

Rounding maintains more precision than truncating, but it may take one more step to implement. The additional step involves adding one-half of the weight of the single-length LSB to the MSB of the discarded part; e.g., in decimal arithmetic, rounding 39.28 to one decimal place is accomplished by adding 0.05 to the number and truncating the LSB:

$$
39.28+0.05=39.33 \rightarrow 39.3
$$

The situation in binary arithmetic is quite similar, but two cases need to be considered: signed and unsigned data representation. In signed multiplication, the two MSBs of the result are identical, except when both operands are -1; therefore, the best single-length product is shifted one position to the right with respect to the unsigned multiplication. Figure 8 illustrates these two cases for the $16 \times 16$ multiplier. In the signed case, adding one-half of the $\mathrm{S}_{15}$ weight is accomplished by adding 1 in bit position 14, and in the unsigned case by adding 1 in bit position 15. Therefore, the 'S556 multiplier has two rounding inputs, RS and RU. To get a rounded single-length result, the appropriate R input is tied to $\mathrm{V}_{\mathrm{CC}}$ (logic HIGH) and the other R input is grounded. If a double-length result is desired, both R inputs are grounded.
†In general multiplication of an $M$-bit operand by an $N$-bit operand results in an $(M+N)$-bit product.
(a) SIGNED MULTIPLY (OMIT $\mathrm{S}_{31}$ as $\mathrm{S}_{30}=\mathrm{S}_{31}=$ sign of result)

(b) UNSIGNED MULTIPLY ($\bullet$ = BINARY POINT)

$$
\begin{array}{rll}
X & \bullet\; X_{15}\; X_{14}\; X_{13} \cdots X_{1}\; X_{0} & \\
Y & \bullet\; Y_{15}\; Y_{14}\; Y_{13} \cdots Y_{1}\; Y_{0} & \\
\hline
 & \bullet\; S_{31}\; S_{30} \cdots S_{16}\; S_{15}\; S_{14}\; S_{13} \cdots & \text{FULL 32-BIT PRODUCT} \\
+ & \bullet\; 0 \;\;\; 0 \;\; \cdots \;\; 0 \;\;\; 1 \;\;\; 0 \;\;\; 0 \;\; \cdots & \text{ADD 1/2 THE MSB WEIGHT OF THE DISCARDED PART} \\
\hline
 & \bullet\; S_{31}\; S_{30} \cdots S_{16} & \text{BEST 16-BIT PRODUCT}
\end{array}
$$

NOTES:
(a) In signed (twos-complement) notation, the MSB of each operand is the sign bit, and the binary point is to the right of the MSB. The resulting product has a redundant sign bit and the binary point is to the right of the second MSB of the product. The best 16 -bit product is from $\mathrm{S}_{30}$ through $\mathrm{S}_{15}$, and rounding is performed by adding " 1 " to bit position $\mathrm{S}_{14}$.
(b) In unsigned notation the best 16 -bit product is the most significant half of the product and is corrected by adding "1" to bit position $\mathrm{S}_{15}$.

Figure 8. Rounding the Result of Binary Fractional Multiplication.
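
The two rounding cases can be checked numerically with a short sketch. It assumes the behavior described in the notes above: RS adds 1 at bit position 14 and the best signed result is $\mathrm{S}_{30}$ through $\mathrm{S}_{15}$, while RU adds 1 at bit position 15 and the best unsigned result is $\mathrm{S}_{31}$ through $\mathrm{S}_{16}$. `best_16_bit_product` and `to_signed16` are hypothetical names, and the special case where both signed operands are -1 is not treated.

```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a twos-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def best_16_bit_product(x, y, signed):
    """Return the rounded single-length product as a 16-bit pattern.

    signed=True  models XM = YM = HIGH with RS = HIGH (adds 2**14, keeps S30-S15)
    signed=False models XM = YM = LOW  with RU = HIGH (adds 2**15, keeps S31-S16)
    """
    if signed:
        product = (to_signed16(x) * to_signed16(y) + (1 << 14)) & 0xFFFFFFFF
        return (product >> 15) & 0xFFFF
    product = ((x & 0xFFFF) * (y & 0xFFFF) + (1 << 15)) & 0xFFFFFFFF
    return (product >> 16) & 0xFFFF


# Unsigned: FFFF * 0001 = 0000FFFF; truncation would give 0000, rounding gives 0001.
print(f"{best_16_bit_product(0xFFFF, 0x0001, signed=False):04X}")
# Signed fractions: 0.5 * 0.5 = 0.25, i.e. 4000 * 4000 -> 2000.
print(f"{best_16_bit_product(0x4000, 0x4000, signed=True):04X}")
```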

## Using the 'S556 in a Pipelined Positive-Edge Triggered Clock System

The 'S556 has internal latches which can be used effectively in systems where events occur on positive-going clock edges. This application is an extension of the gated multiply mode shown in Figures 4a through 4f, in which a 32-bit product can be latched every $t_{S3}$ ns in the 'S556.
If the signals GX, GY, GM and GL can be derived from the system clock, then the latches have almost the same effect as a register. The basic philosophy behind the recommended timing is that the input latches are closed when the output latches are open; the outputs are then closed (and have latched results) and new data is presented to the input latches, which are opened. This is shown by the relation between GX, GY and GL, GM in Figure 9. The set-up time $t_{S3}$ is shown as one value but, strictly speaking, it is split into $t_{S3L}$ and $t_{S3M}$ for the least significant and most significant half of the product respectively. The value of $t_{S3L}$ is less than $t_{S3M}$, which helps applications requiring the least significant bits of the result as fast as possible.
One note of caution is that a design must always meet the set-up and hold times for $\mathrm{X}_{\mathrm{i}}, \mathrm{R}_{\mathrm{i}}$ with respect to GX and for $\mathrm{Y}_{\mathrm{i}}$ with respect to GY.
The result $\mathrm{S}_{\mathrm{i}}$ is available $t_{D2}$ after the rising edge of GM and GL.


Figure 9.

## Totally Parallel 32x32 Multiplier



Figure 10. Partial Products for a $\mathbf{3 2 \times 3 2}$ Multiplication

A twos-complement $32 \times 32$ multiplication can be performed within 220 nsec using 4 'S556s, 20 'S381s, and 7 'S182s. This $32 \times 32$ multiply operation involves adding up four partial products as shown in Figure 10. These four partial products are generated in four multipliers; the outputs are $XA * YA$, $XA * YB$, $XB * YA$, and $XB * YB$, where $X_{31-16}=XB$, $X_{15-0}=XA$, $Y_{31-16}=YB$, and $Y_{15-0}=YA$.
The implementation of this twos-complement $32 \times 32$ multiplier is shown in Figure 11. The outputs of the $16 \times 16$ multipliers are connected to two levels of adders to give a 64-bit product. The first level of adders is needed to add the two central partial products of Figure 10, $XA * YB$ and $XB * YA$. Notice the technique which is used to generate the "sign extension", or the most-significant sum bit of the first level of adders. The 'S556 provides, as a direct output, the complement of the most-significant product bit; having this signal immediately speeds up the sign-extension computation, and reduces the external parts count.


[^1]TOTAL MULTIPLY TIME = MULTIPLIER DELAY + ADDER LEVEL 1 DELAY + ADDER LEVEL 2 DELAY = 90 + 65 + 65 = 220 nsec

Figure 11. Implementation of the $32 \times 32$ Multiplier

For example, the inputs to the adder in the most significant position are the $\overline{\mathrm{S} 31}$ outputs from the two central multipliers. The sign extension of the addition of $X A^{*} Y B$ and $X B^{*} Y A$ is defined as
SIGN EXT $=\overline{\bar{A} \cdot \bar{B}+\bar{A} \cdot C+\bar{B} \cdot C}$, where

- $A$ is the most-significant bit of the term $XA * YB$;
- $B$ is the most-significant bit of the term $XB * YA$; and
- $C$ is the carry-in to the most-significant bits of $XA * YB$ and $XB * YA$, in the adder.

The sign extension can be computed as the negation of the carry-out term of three terms, $\bar{A}, \bar{B}$, and $C$. This term corresponds to the negative of the carry-out of the bit position just one place to the right of the most-significant bit position of the first level of adders. The negative of the carry-out can be generated by presenting a carry-out and a binary "one" to the most significant bit of the adder. The generated sum bit then corresponds to the negation of the carry-out of the previous stage, which is the sign extension required to be added to the 16 most-significant bits of the $XB * YB$ partial product term.

The second level of adders, which performs a 48-bit add function, is fairly straightforward. These adders can be implemented using 'S381 four-bit ALUs and 'S182 carry-bypasses ("carry-lookahead generators") which are available from Monolithic Memories Inc. and from other vendors.

Other configurations such as $48 \times 48$ and $64 \times 64$ multipliers can be designed using the same methodology (see reference 1).

## References

1. "Fast $64 \times 64$ Multiplication using $16 \times 16$ Flow-through Multiplier and Wallace Trees," Marvin Fox, Chuck Hastings, and Suneel Rajpal, Monolithic Memories System Design Handbook, pages 4-77 to 4-84.

## Die Configuration



Die Size $=183 \times 243 \mathrm{mil}^{2}$

## 8x8 High Speed Schottky Multipliers SN54/74S557 SN54/74S558

## Features/Benefits

- Industry-standard $8 \times 8$ multiplier
- Multiplies two 8-bit numbers; gives 16-bit result
- Cascadable; $56 \times 56$ fully-parallel multiplication uses only 34 multipliers for the most-significant half of the product
- Full $8 \times 8$ multiply in 60 ns worst case
- Three-state outputs for bus operation
- Transparent 16-bit latch in 'S557
- Plug-in compatible with original Monolithic Memories' 67558


## Description

The 'S557/'S558 is a high-speed $8 \times 8$ combinatorial multiplier which can multiply two eight-bit unsigned or signed twos-complement numbers and generate the sixteen-bit unsigned or signed product. Each input operand $X$ and $Y$ has an associated Mode control line, $X_{M}$ and $Y_{M}$ respectively. When a Mode control line is at a Low logic level, the operand is treated as an unsigned eight-bit number; whereas, if the Mode control is at a High logic level, the operand is treated as an eight-bit signed twos-complement number. Additional inputs, $R_{S}$ and $R_{U}$ (R in the 'S557), allow the addition of a bit into the multiplier array at the appropriate bit positions for rounding signed or unsigned fractional numbers.

The 'S557 internally develops proper rounding for either signed or unsigned numbers by combining the rounding input $R$ with $X_{M}, Y_{M}, \overline{X_{M}}$, and $\overline{Y_{M}}$ as follows:
$R_{U}=\overline{X_{M}} \cdot \overline{Y_{M}} \cdot R=$ Unsigned rounding input to $2^{7}$ adder.
$R_{S}=\left(X_{M}+Y_{M}\right) R=$ Signed rounding input to $2^{6}$ adder.
Since the 'S558 has no latches, it does not require the use of pin 11 for the latch enable input G, so $R_{S}$ and $R_{U}$ are brought out separately.
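
The two rounding equations above can be checked with a small software model. The following Python sketch is only an illustration of the logic, not a description of the chip's internal circuit; the function name `s557_rounding_inputs` is an assumption made for this example.

```python
def s557_rounding_inputs(x_m: bool, y_m: bool, r: bool) -> tuple[bool, bool]:
    """Return (R_U, R_S) as derived from X_M, Y_M and R by the equations above."""
    # R_U = NOT X_M . NOT Y_M . R : unsigned rounding, adds 1 at the 2^7 position
    r_u = bool((not x_m) and (not y_m) and r)
    # R_S = (X_M + Y_M) . R : signed or mixed rounding, adds 1 at the 2^6 position
    r_s = bool((x_m or y_m) and r)
    return r_u, r_s


if __name__ == "__main__":
    # Print every input combination as L/H levels.
    level = {False: "L", True: "H"}
    for x_m in (False, True):
        for y_m in (False, True):
            for r in (False, True):
                r_u, r_s = s557_rounding_inputs(x_m, y_m, r)
                print(level[x_m], level[y_m], level[r], "->", level[r_u], level[r_s])
```

With R Low neither rounding bit is added; with R High exactly one of the two is added, depending on whether either operand is signed.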

The most-significant product bit is available in both true and complemented form to assist in expansion to larger signed multipliers. The product outputs are three-state, controlled by an assertive-low Output Enable which allows several multipliers to be connected to a parallel bus or be used in a pipelined system. The device uses a single +5 V power supply and is packaged in a standard 40-pin DIP.

## Ordering Information

| PART NUMBER | PACKAGE | TEMPERATURE |
| :---: | :---: | :---: |
| 54S557, 54S558 | J, (44), (L) | Military |
| 74S557, 74S558 | N, J | Commercial |

## Logic Symbol



Pin Configuration

†For 54/74S557 Pin 9 is R and Pin 11 is G.

## Logic Diagram


†For 54/74S557 Pin 9 is R and Pin 11 is G.

## Die Configurations

## 'S557



Die Size: $144 \times 130$ mil$^{2}$

## 'S558


Die Size: $144 \times 130$ mil$^{2}$

## Absolute Maximum Ratings



Off-state output voltage ..... 5.5 V
Storage temperature ..... $-65^{\circ}$ to $+150^{\circ} \mathrm{C}$

## Operating Conditions

| SYMBOL | PARAMETER | DEVICE | MILITARY |  |  | COMMERCIAL |  |  | UNITS |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  |  |  | MIN | TYP | MAX | MIN | TYP | MAX |  |
| $\mathrm{V}_{\mathrm{CC}}$ | Supply voltage | all | 4.5 | 5 | 5.5 | 4.75 | 5 | 5.25 | V |
| $\mathrm{T}_{\text {A }}$ | Operating free-air temperature | all | -55 |  | 125* | 0 |  | 75 | ${ }^{\circ} \mathrm{C}$ |
| $t_{su}$ | $X_{i}, Y_{i}$ to $G$ setup time | 'S557 | 50 |  |  | 40 |  |  | ns |
| $t_{h}$ | $X_{i}, Y_{i}$ to $G$ hold time | 'S557 | 0 |  |  | 0 |  |  | ns |
| $t_{w}$ | Latch enable pulse width | 'S557 | 20 |  |  | 15 |  |  | ns |

* Case temperature

## Electrical Characteristics Over Operating Conditions


* Not more than one output should be shorted at a time and duration of the short-circuit should not exceed one second.
$\dagger$ Typicals at $5.0 \mathrm{~V} \mathrm{~V}_{\mathrm{CC}}$ and $25^{\circ} \mathrm{C} \mathrm{T}_{\mathrm{A}}$


## Switching Characteristics Over Operating Conditions

| SYMBOL | PARAMETER | DEVICE | TEST CONDITIONS | MILITARY |  |  | COMMERCIAL |  |  | UNIT |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  |  |  |  | MIN | TYP $\dagger$ | MAX | MIN | TYP $\dagger$ | MAX |  |
| $t_{PD1}$ | $X_{i}, Y_{i}$ to $S_{7-0}$ | All | $C_{L}=30$ pF, $R_{L}=560\ \Omega$; see test figures |  | 40 | 60 |  | 40 | 50 | ns |
| $t_{PD2}$ | $X_{i}, Y_{i}$ to $S_{15-8}$ | All |  |  | 45 | 70 |  | 45 | 60 | ns |
| $t_{PD3}$ | $X_{i}, Y_{i}$ to $\bar{S}_{15}$ | All |  |  | 50 | 75 |  | 50 | 65 | ns |
| $t_{PD4}$ | $G$ to $S_{i}$ | 'S557 |  |  | 20 | 40 |  | 20 | 35 | ns |
| $t_{PXZ}$ | $\overline{\mathrm{OE}}$ to $\mathrm{S}_{\mathrm{i}}$ | All |  |  | 20 | 40 |  | 20 | 30 | ns |
| $t_{PZX}$ | $\overline{\mathrm{OE}}$ to $\mathrm{S}_{\mathrm{i}}$ | All |  |  | 15 | 40 |  | 15 | 30 | ns |

## Timing Waveforms

Setup and Hold Times ('S557)


NOTE: If the rising edge of G occurs before ($t_{SU\,MIN} - t_{W\,MIN}$) from the inputs changing, then the applicable propagation delays are $t_{PD1}$, $t_{PD2}$ and $t_{PD3}$ (and not $t_{PD4}$). In this case the time at which the results arrive at the outputs depends on when the inputs change instead of when the rising edge of G occurs.

## Propagation Delay



Latch-Enable Pulse Width ('S557)


## Test Waveforms

| TEST | $V_{X}$ |  | OUTPUT WAVEFORM - MEAS. LEVEL |
| :---: | :---: | :---: | :---: |
| All $t_{PD}$ | 5.0 V |  |  |
| $t_{PXZ}$ | for $t_{PHZ}$ | for $t_{PLZ}$ |  |
| $t_{PZX}$ | for $t_{PZH}$ | for $t_{PZL}$: 5.0 V |  |

## Test Load



* The "TEST POINT" is driven by the output under test, and observed by instrumentation.


## Definition of Timing Diagram

| WAVEFORM | INPUTS | OUTPUTS |
| :---: | :---: | :---: |

| SUMMARY OF SIGNALS/PINS |  |
| :---: | :--- |
| $\mathrm{X}_{7}-\mathrm{X}_{0}$ | Multiplicand 8-bit data inputs |
| $\mathrm{Y}_{7}-\mathrm{Y}_{0}$ | Multiplier 8-bit data inputs |
| $\mathrm{X}_{\mathrm{M}}, \mathrm{Y}_{\mathrm{M}}$ | Mode control inputs for each data word; LOW for unsigned data and HIGH for twos-complement data |
| $\mathrm{S}_{15}-\mathrm{S}_{0}$ | Product 16-bit output |
| $\bar{S}_{15}$ | Inverted MSB for expansion |
| $\mathrm{R}_{\mathrm{S}}, \mathrm{R}_{\mathrm{U}}$ | Rounding inputs for signed and unsigned data, respectively ('S558 only) |
| G | Transparent latch enable ('S557 only) |
| $\overline{\mathrm{OE}}$ | Three-state enable for $\mathrm{S}_{15}-\mathrm{S}_{0}$ and $\bar{S}_{15}$ outputs |
| R | Rounding input for signed or unsigned data; combined internally with $X_{M}, Y_{M}$ ('S557 only) |

ROUNDING INPUTS
'S557

| INPUTS |  |  | ADDS |  |
| :---: | :---: | :---: | :---: | :---: |
| $X_{M}$ | $Y_{M}$ | $R$ | $2^{7}$ | $2^{6}$ |
| L | L | H | YES | NO |
| L | H | H | NO | YES |
| H | L | H | NO | YES |
| H | H | H | NO | YES |
| X | X | L | NO | NO |

'S558

| INPUTS |  | ADDS |  | USUALLY USED WITH |  |
| :---: | :---: | :---: | :---: | :---: | :---: |
| $R_{U}$ | $R_{S}$ | $2^{7}$ | $2^{6}$ | $X_{M}$ | $Y_{M}$ |
| L | L | NO | NO | X | X |
| L | H | NO | YES | H† | H† |
| H | L | YES | NO | L | L |
| H | H | YES | YES | * | * |

†In mixed mode, one of these could be Low but not both.
*Usually a nonsense operation. See applications section of data sheet.

MODE CONTROL INPUTS

| OPERATING MODE | INPUT DATA |  | MODE CONTROL INPUTS |  |
| :---: | :---: | :---: | :---: | :---: |
|  | $X_{7}-X_{0}$ | $Y_{7}-Y_{0}$ | $X_{M}$ | $Y_{M}$ |
| Unsigned | Unsigned | Unsigned | L | L |
| Mixed | Unsigned | Twos-Comp. | L | H |
| Mixed | Twos-Comp. | Unsigned | H | L |
| Signed | Twos-Comp. | Twos-Comp. | H | H |

*Identical with product result passing through latch.

## Functional Description

The 'S557 and 'S558 multipliers are $8 \times 8$ full-adder Cray arrays capable of multiplying numbers in unsigned, signed twos-complement, or mixed notation. Each 8-bit input operand $X$ and $Y$ has associated with it a mode control which determines whether the array treats this number as signed or unsigned. If the mode control is at a High logic level, then the operand is treated as a twos-complement number with the most-significant bit having a negative weight; whereas, if the mode control is at a Low logic level, then the operand is treated as an unsigned number.

The multiplier provides all 16 product bits generated by the multiplication. For expansion during signed or mixed multiplication the most-significant product bit is available in both true and complemented form. This allows an adder to be used as a subtractor in many applications and eliminates the need for certain SSI circuits.

Two additional inputs to the array, $R_{S}$ and $R_{U}$, allow the addition of a bit at the appropriate bit position so as to provide rounding to the best signed or unsigned fractional eight-bit result. These inputs can also be used for rounding in larger multipliers. In the 'S557, these two inputs are generated internally from the mode controls and a single R input.

The product outputs of the multiplier are controlled by an assertive-low Output Enable control. When this control is at a Low logic level the multiplier outputs are active, while if the control is at a High logic level then the outputs are placed in a high-impedance state. This three-state capability allows several multipliers to drive a common bus, and also allows pipelining of multiplication for higher-speed systems.

## Rounding

Multiplication of two n-bit operands results in a $2 n$-bit product $\dagger$. Therefore, in an n-bit system it is necessary to convert the double-length product into a single-length product. This can be accomplished by truncating or rounding. The following examples illustrate the difference between the two conversion techniques in decimal arithmetic:

$$
\begin{aligned}
& \left.\begin{array}{l}
39.2 \rightarrow 39 \\
39.6 \rightarrow 39
\end{array}\right\} \text { Truncating } \\
& \left.\begin{array}{l}
39.2+0.5=39.7 \rightarrow 39 \\
39.6+0.5=40.1 \rightarrow 40
\end{array}\right\} \quad \text { Rounding }
\end{aligned}
$$

Obviously, rounding maintains more precision than truncating, but it may take one more step to implement. The additional step involves adding one-half of the weight of the single-length LSB to the MSB of the discarded part; e.g., in decimal arithmetic rounding 39.28 to one decimal point is accomplished by adding 0.05 to the number and truncating the LSB:

$$
39.28+0.05=39.33 \rightarrow 39.3
$$

The situation in binary arithmetic is quite similar, but two cases need to be considered: signed and unsigned data representation. In signed multiplication, the two MSBs of the result are identical, except when both operands are $-1$; therefore, the best single-length product is shifted one position to the right with respect to the unsigned multiplications. Figure 1 illustrates these two cases for the $8 \times 8$ multiplier. In the signed case, adding one-half of the $S_{7}$ weight is accomplished by adding 1 in bit position 6, and in the unsigned case 1 is added to bit position 7. Therefore, the 'S558 multiplier has two rounding inputs, $R_{S}$ and $R_{U}$. Thus, to get a rounded single-length result, the appropriate $R$ input is tied to $\mathrm{V}_{\mathrm{CC}}$ (logic High) and the other $R$ input is grounded. If a double-length result is desired, both $R$ inputs are grounded for the 'S558, and the single $R$ input is grounded for the 'S557.
†In general: multiplication of an M-bit operand by an N-bit operand results in an (M+N)-bit product.


NOTES:
(a) In signed (twos-complement) notation, the MSB of each operand is the sign bit, and the binary point is to the right of the MSB. The resulting product has a redundant sign bit and the binary point is to the right of the second MSB of the product. The best eight-bit product is from $\mathrm{S}_{14}$ through $\mathrm{S}_{7}$, and rounding is performed by adding "1" to bit position $\mathrm{S}_{6}$.
(b) In unsigned notation the best 8-bit product is the most significant half of the product and is corrected by adding "1" to bit position $\mathrm{S}_{7}$.

Figure 1. Rounding the Result of Binary Fractional Multiplication
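
The two rounding cases of Figure 1 can be sketched in Python. This is a minimal arithmetic model, assuming the 16-bit product has already been computed with the rounding inputs grounded; the function name `round_product_to_8_bits` is hypothetical and chosen only for this illustration.

```python
def round_product_to_8_bits(product16: int, signed: bool) -> int:
    """Round a 16-bit fractional product to its best 8-bit part.

    Unsigned: add 1 at bit position 7, then keep S15..S8.
    Signed:   add 1 at bit position 6, then keep S14..S7
              (S15 is the redundant sign bit).
    """
    product16 &= 0xFFFF                      # work on the raw 16-bit pattern
    if signed:
        return ((product16 + (1 << 6)) >> 7) & 0xFF
    return ((product16 + (1 << 7)) >> 8) & 0xFF


if __name__ == "__main__":
    # Unsigned: 200 * 101 = 0x4EE8; truncation gives 0x4E, rounding gives 0x4F.
    print(hex(round_product_to_8_bits(200 * 101, signed=False)))
    # Signed: (-0.5) * 0.75 = -0.375, i.e. 0xD0 as a 7-bits-plus-sign fraction.
    print(hex(round_product_to_8_bits(-64 * 96, signed=True)))
```

The signed case with both operands equal to $-1$ (the exception noted in the text) overflows the single-length result and is not handled specially in this sketch.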

## Signed Expansion

The most-significant product bit has both true and complement outputs available. When building larger signed multipliers, the partial products (except at the lower stages) are signed numbers. These unsigned and signed partial products must be added together to give the correct signed product. Having both the true and complemented form of the most-significant product bit available assists in this addition. For example, say that two signed partial products must be added and MSI adders are used; we then have the situation of adding together the carry from the previous adder stage plus the addition of the two negative most-significant partial-product bits. The result of adding these variables must be a positive sum and a negative carry (borrow). The equations for this are:

$$
\begin{aligned}
& S=A \oplus B \oplus C \\
& C_{OUT}=A B+B \bar{C}+\bar{C} A
\end{aligned}
$$

where $C$ is the carry-in and $A$ and $B$ are the sign bits of the two partial products.
Now an adder produces the equations:

$$
\begin{aligned}
& S=A \oplus B \oplus C \\
& C_{O U T}=A B+B C+C A
\end{aligned}
$$

Examining these equations, it can be seen that, if the inversions of $A$ and $B$ are used, then the most significant sum bit of the adder is the sign extension bit:

$$
\text{Sign ext}=A B+B \bar{C}+\bar{C} A=\overline{\bar{A} \bar{B}+\bar{B} C+C \bar{A}},
$$

and the sum remains the same.
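
The identity can be confirmed by exhaustive check over the eight input combinations. The Python sketch below is only a verification aid; the helper names `sign_extension` and `adder_with_inverted_inputs` are assumptions introduced here. $A$ and $B$ carry negative weight and $C$ carries positive weight, as in the text.

```python
from itertools import product


def sign_extension(a: int, b: int, c: int) -> int:
    """Negative carry (borrow): AB + B(not C) + (not C)A."""
    return (a & b) | (b & (1 - c)) | ((1 - c) & a)


def adder_with_inverted_inputs(a: int, b: int, c: int) -> tuple[int, int]:
    """Ordinary full adder fed with (not A), (not B) and C; returns (sum, carry)."""
    total = (1 - a) + (1 - b) + c
    return total & 1, total >> 1


for a, b, c in product((0, 1), repeat=3):
    s, carry = adder_with_inverted_inputs(a, b, c)
    ext = sign_extension(a, b, c)
    assert s == a ^ b ^ c              # the sum bit is unchanged by the inversions
    assert ext == 1 - carry            # sign extension = complement of the carry
    assert c - a - b == s - 2 * ext    # positive sum, negative carry (borrow)
```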

## 16x16 Twos-Complement Multiplication

The 16-bit $X$ operand is broken into two 8-bit operands ($X_{7}-X_{0}$ and $X_{15}-X_{8}$), as is the $Y$ operand. Since the situation is that of a cross-product, four partial products are generated as follows:

$$
\begin{aligned}
& A=X_{L} * Y_{L} \\
& B=X_{L} * Y_{H} \\
& C=X_{H} * Y_{L} \\
& D=X_{H} * Y_{H}
\end{aligned}
$$

where the subscript $L$ stands for bits 7-0 (the "low" or least-significant half), and the subscript $H$ stands for bits 15-8.
Expanded twos-complement multiplication requires a sign extension of the B and C partial products. Thus, $\mathrm{B}_{15}$ and $\mathrm{C}_{15}$ need to be extended eight positions to the left (to align with $\mathrm{D}_{15}$ ). In this approach two more adders are required. But the complement of the MSB ( $\bar{S}_{15}$ ) on the 'S557/8 can be used to save these two adders. Figure 2 shows the implementation of $16 \times 16$ signed twos-complement multiplication in this manner.


* THESE ARE ADDER BLOCKS USING THE 'S381, A 4-BIT ALU FUNCTION GENERATOR, TO PERFORM A HIGH-SPEED ADD OPERATION. THE 'S182 IS A LOOKAHEAD CARRY GENERATOR AND REDUCES THE PROPAGATION DELAY. ALL OF THE ABOVE PARTS ARE AVAILABLE FROM MONOLITHIC MEMORIES INCORPORATED.

TOTAL MULTIPLY TIME = MULTIPLIER DELAY + ADDER LEVEL 1 DELAY + ADDER LEVEL 2 DELAY = 60 + 44 + 64 = 168 nsec

Figure 2. 16x16 Twos-Complement Signed Multiplication
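
The arithmetic behind Figure 2 can be sketched in Python. This is a behavioral model only, under the assumption that each 8x8 multiplier is driven with the mode controls implied by the operand halves (low halves unsigned, high halves twos-complement); the function names are hypothetical, and Python's unbounded integers stand in for the sign extension that the adders perform in hardware.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a twos-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mul8x8(x: int, y: int, x_m: bool, y_m: bool) -> int:
    """Model of one 8x8 multiplier; a High mode control marks that operand as signed."""
    xv = to_signed(x, 8) if x_m else x & 0xFF
    yv = to_signed(y, 8) if y_m else y & 0xFF
    return xv * yv


def mul16x16_signed(x: int, y: int) -> int:
    """Twos-complement 16x16 product assembled from four 8x8 partial products."""
    x_l, x_h = x & 0xFF, (x >> 8) & 0xFF
    y_l, y_h = y & 0xFF, (y >> 8) & 0xFF
    a = mul8x8(x_l, y_l, False, False)   # A = XL * YL (unsigned)
    b = mul8x8(x_l, y_h, False, True)    # B = XL * YH (mixed)
    c = mul8x8(x_h, y_l, True, False)    # C = XH * YL (mixed)
    d = mul8x8(x_h, y_h, True, True)     # D = XH * YH (signed)
    # B and C are weighted by 2^8, D by 2^16, and everything is summed.
    return a + ((b + c) << 8) + (d << 16)


if __name__ == "__main__":
    print(mul16x16_signed(-12345, 6789), -12345 * 6789)
```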


Figure 3. Unsigned Expansions of the $8 \times 8$ Multiplier to $16 \times 16$ Multiplication

## Applications:

## How to Design Superspeed Cray Multipliers with '558s

by Chuck Hastings

Multiplication, as most of us think of it, is performed by repeated addition and shifting. When we multiply using pencil and paper, according to the familiar elementary-school method, we first write down the multiplicand, and then write down the multiplier immediately under it and underline the multiplier. Then we take the least-significant digit of the multiplier, multiply that digit by the entire multiplicand, and record the answer in the top row of our workspace, underneath the line. Then we repeat, using now the second-least-significant multiplier digit, and record that answer below the first one, pushed one digit position (that is, "shifted") to the left. This process continues until we run out of multiplier digits (or out of patience), at which point we add up the contents of the whole diamond-shaped workspace and record at the bottom an answer which consists of either $m+n-1$ digits or $m+n$ digits, where there are $m$ digits in the multiplier and $n$ digits in the multiplicand. An example, voilà:

| 125 | (multiplicand) |
| :--- | :--- |
| $\times 107$ | (multiplier) |
| 875 | $(7 \times 125)$ |
| 000 | $(0 \times 125$, shifted left one digit position) |
| 125 | $(1 \times 125$, shifted left two digit positions) |
| 13375 | (sum of the above) |

Figure 4. Decimal Multiplication

The decimal number system has no monopoly on truth; our ancestors simply happened to have ten fingers at the time when someone came up with the idea of counting. Binary numbers, as you know, are more copacetic than are decimal numbers with digital-logic elements, which like to settle comfortably into one voltage state ("High") or another ("Low"), rather than into one of ten different states. So we can repeat the above example using binary numbers, right? First, we convert our multiplicand and multiplier to binary:

$$
\begin{aligned}
& 125_{10}=01111101_{2} \\
& 107_{10}=01101011_{2}
\end{aligned}
$$

The subscripts 10 and 2 refer to the "base" or "radix" of the number system, 10 for decimal and 2 for binary. (Remember your New Math?) For sneaky reasons to be revealed soon, I've used 8-bit binary numbers, which is one bit more than necessary for my example, and added a leading zero. So, we multiply:

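$$
\begin{array}{rl}
01111101 & {}_{2}=125_{10} \\
\times\ 01101011 & {}_{2}=107_{10} \\
\hline
01111101 & \\
01111101\phantom{0} & \\
00000000\phantom{00} & \\
01111101\phantom{000} & \\
00000000\phantom{0000} & \\
01111101\phantom{00000} & \\
01111101\phantom{000000} & \\
00000000\phantom{0000000} & \\
\hline
0011010000111111 & {}_{2}=13375_{10}
\end{array}
$$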

Figure 5. Binary Multiplication

I've left off the remarks this time, but they're just like the remarks in the decimal example, at least in principle. Just in case you doubt this answer, I'll convert it back:

| Bit | Contribution | (Weight of a 0 bit) |
| ---: | ---: | ---: |
| 1 | 1 |  |
| 1 | 2 |  |
| 1 | 4 |  |
| 1 | 8 |  |
| 1 | 16 |  |
| 1 | 32 |  |
| 0 | 0 | (64) |
| 0 | 0 | (128) |
| 0 | 0 | (256) |
| 0 | 0 | (512) |
| 1 | 1024 |  |
| 0 | 0 | (2048) |
| 1 | 4096 |  |
| 1 | 8192 |  |
| 0 | 0 | (16384) |
| 0 | 0 | (32768) |
|  | 13375 |  |

Figure 6. Binary-to-Decimal Conversion

Now look carefully at the diamond-shaped array of numbers in the workspace in Figure 5. Each row is either the multiplicand 01111101, or else all zeroes. The 01111101 rows correspond to "1" digits in the multiplier, and the all-zero rows to "0" digits in the multiplier. Life does get simpler in some ways when we switch to binary numbers: "multiplying a multiplier digit by the multiplicand" now means just gating a copy of the multiplicand into that position if the digit is "1," and not doing so if the digit is "0."
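
The gating idea can be written out as a few lines of Python. This is a software sketch of the pencil-and-paper procedure of Figure 5, not a model of any particular chip; the name `shift_and_add` is chosen here for illustration.

```python
def shift_and_add(multiplicand: int, multiplier: int, bits: int = 8) -> int:
    """Unsigned multiply: gate in a shifted copy of the multiplicand per '1' bit."""
    product = 0
    for position in range(bits):
        if (multiplier >> position) & 1:
            # This is one non-zero row of the diamond-shaped workspace.
            product += multiplicand << position
    return product


result = shift_and_add(0b01111101, 0b01101011)
print(f"{result:016b} = {result}")   # prints 0011010000111111 = 13375
```

A Cray multiplier performs all of these gated additions at once in a static array of adders instead of looping over the multiplier bits one at a time.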

Seymour Cray, the master computer designer from Chippewa Falls, Wisconsin, whose career has spanned three companies (Univac, Control Data, and now Cray Research) and many inventions, first observed some time in the late 1950s that computers also could actually multiply this way, if one merely provided enough components. This last qualifying remark, in those days when even transistors (let alone integrated circuits) in computers were still a novelty, was by no means a trivial one! To prove his point (and satisfy a government contract), Cray designed, and Control Data built, a $48 \times 48$ multiplier which operated in one microsecond, about 1960. This multiplier was part of a special-purpose array processor for a classified application, and was so big that a CDC 1604 (then considered a large-scale processor) served as its input/output controller. In principle, such a multiplier at that time would have had to consist of 48 48-bit full adders or "mills," each of which received one input 48-bit number from the outputs of the mill immediately above it in the array, and the other 48-bit number from a gate which either allowed the multiplicand to pass through, or else supplied an all-zero 48-bit number. Actually, these mills have to be somewhat longer than 48 bits. Anyway, that is at least 2304 full adders, and in 1960 a full-adder circuit normally occupied one small plug-in circuit card.

A later version of this multiplier, in the CDC 7600 supercomputer, could produce one $48 \times 48$ product out every 275 nanoseconds on a pipelined basis. The pipelining was asynchronous, and the entire humungous array of adders and gating logic could have up to three different products rippling down it at a given instant!

Back to the 1980s. Monolithic Memories has for several years produced an $8 \times 8$ Cray multiplier, the 57/67558, as a single 600-mil 40-pin DIP. After we invented this part, AMD second-sourced it, and by now it has become an industry standard. We now also have faster pin-compatible parts, the 54/74S558 and 54/74S557. Like other West Coast companies 2,000 miles from Wisconsin and Minnesota where Seymour Cray does his inventing, Monolithic Memories previously used the term "combinatorial multiplier" instead of "Cray multiplier" for this type of part. However, "combinatorial multiplier" has nine extra letters and five extra syllables, and also inadvertently implies that the technique involves combinatorial logic rather than arithmetic circuits. Some West Coast designs, including our 67558, use a modified internal array with only half as many full-adder circuits and slightly different interconnections, based on the two-bit "Booth-multiplication" algorithm (see reference 1), plus the "Wallace-tree" or "carry-save adder" technique (see references 2 and 3). Conceptually, however, the entire chip or system continues to operate as a Cray multiplier.

The '558, in particular, can be thought of as a static logic network which fits exactly the binary multiplication example of Figure 5. (See now why I insisted on using 8-bit binary numbers?) There are no flipflops or latches whatever in the '558 - it is a "flowthrough" device. Its 40 pins are used up as follows:

| Use of Pins | Input, Output, or Voltage | Number of Pins |
| :--- | :---: | :---: |
| Multiplier | I | 8 |
| Multiplicand | I | 8 |
| Double-Length Product | O | 16 |
| Complement of Most-Significant Bit of Double-Length Product | O | 1 |
| 3-State Output Enable | I | 1 |
| Number-Interpretation-Mode Control | I | 2 |
| Rounding Control for Product | I | 2 |
| Power and Ground | V | 2 |
| Total |  | 40 |

Table 1. Use of Pins in the '558

The two number-interpretation-mode control pins, one for the multiplier and one for the multiplicand, allow the format for each of these two 8-bit input numbers to be chosen independently, as follows:

| Control Input | Interpretation of 8-bit Input Number |
| :---: | :--- |
| L | 8-bit unsigned |
| H | 7-bit plus a sign bit |

Table 2. Mode Control Input Encoding

The two rounding control pins allow either integer (right-justified) or fractional (left-justified) interpretation of the 14-bits-plus-sign double-length product of two 7-bits-plus-sign numbers for internal rounding of the double-length result to the most accurate 8-bit number. The control encoding is:

| $R_{S}$ Input | $R_{U}$ Input | Effect |
| :---: | :---: | :--- |
| L | L | Disable Rounding |
| L | H | Round Unsigned |
| H | L | Round Signed |
| H | H | Nonsense (see below) |

Table 3. Rounding Control Input Encoding

Rounding is normally disabled if the entire 16-bit double-length product output is to be used. If only an 8-bit subset of this product is to be used, this subset can be either bits 15-8 for unsigned rounding as shown in Figure 7, or bits 14-7 for signed rounding as shown in Figure 8. In either case, a "1" is forced into the '558's internal adder network at the bit position indicated by the arrow; adding a "1" into the bit position below the least-significant bit of the final answer has the effect of rounding, as you can see after a little thought. Obviously, forcing a "1" into both of these adder positions at the same time is a nonsense operation for most applications - it adds a "3" into the middle of the double-length result.


Figure 7. Unsigned Rounding


Figure 8. Signed Rounding

By now you probably have a fairly good idea of what a '558 is, and would like a few hints as to how to use it, right? First of all, there is an occasional application in things like video games for very fast multiplication, either $8 \times 8$ or $16 \times 16$, controlled by an 8-bit microprocessor, where there would be one '558 per system (see reference 4). More typically, however, the '558 is a building block, and several of them are used within one system; in fact, maybe more than several - "many." In the usual Silicon-Valley jargon, we can cascade a number of '558 (8x8) Cray-multiplier chips to create larger Cray multipliers at the systems level.

For the sake of concreteness, I'll discuss the case of $56 \times 56$ multipliers, which are appropriate in floating-point units which deal with "IBM-long-format" numbers which have a 56-bit mantissa. Any computer which emulates, or uses the same floating-point format as, any of the following computers can use such a multiplier:

- IBM 360/370
- Amdahl 470
- Data General Eclipse
- Gould/System Engineering SEL 32
- Norsk Data 500 (different format)

There are two basic approaches: serial-parallel, and fully parallel. The serial-parallel approach uses seven '558s, and requires seven full multiply-and-add cycles. On the first cycle, the least-significant eight bits of the multiplier are multiplied by the entire multiplicand, and this partial product is saved. On the second cycle, the next-least-significant eight bits of the multiplier are multiplied by the multiplicand, and that product (shifted eight bit positions to the left) is added into the first partial product to form the new partial product. And so forth, for five more cycles. It's almost like our decimal-multiplication example of Figure 4, except that instead of base-10 decimal digits we now have base-256 superdigits.
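
The serial-parallel scheme can be sketched in Python as a loop over base-256 superdigits. This is an arithmetic illustration under the assumption of unsigned 56-bit mantissas; the names `MASK56` and `serial_parallel_56x56` are hypothetical, and the inner loop stands in for the seven 8x8 multipliers plus the mill that sums their outputs.

```python
MASK56 = (1 << 56) - 1


def serial_parallel_56x56(multiplicand: int, multiplier: int) -> int:
    """Unsigned 56x56 multiply in seven cycles, one base-256 multiplier digit each."""
    multiplicand &= MASK56
    multiplier &= MASK56
    partial = 0
    for cycle in range(7):
        digit = (multiplier >> (8 * cycle)) & 0xFF   # next-least-significant 8 bits
        # One rank of seven 8x8 multipliers forms the 8x56 product for this digit.
        rank = 0
        for k in range(7):
            chunk = (multiplicand >> (8 * k)) & 0xFF
            rank += (digit * chunk) << (8 * k)
        partial += rank << (8 * cycle)               # shift left and accumulate
    return partial


if __name__ == "__main__":
    a, b = 0x123456789ABCDE, 0xFEDCBA98765432
    print(serial_parallel_56x56(a, b) == a * b)
```

In the fully-parallel arrangement the seven passes of the outer loop become seven ranks of hardware operating at the same time.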

The fully-parallel approach totally applies Cray's usual design philosophy (sometimes characterized as "big, fast, and simple") at the systems level. It uses 49 '558s, in seven ranks; the 'i'th rank performs an operation corresponding to that done during the 'i'th cycle in the serial-parallel implementation. In principle, a complete mill is used to add the outputs of one rank of '558s to those of the rank above it. Or, alternatively, these mills can be laid out in a "tree" arrangement, such as:


Figure 9. "Tree" Summing Arrangement of Mills for a $56 \times 56$ Cray Multiplier

Each letter stands for one rank of '558s, and each "+" stands for a mill of the indicated length. More involved "Wallace-tree" techniques are usually preferable. (See reference 3). If the least-significant half of the double-length product is never needed, only 34 'S558s are required. There is one subtlety which needs to be mentioned. If, conceptually, a '558 looks like a diamond -


Figure 10. A Single '558 in "Diamond" Notation

then the $8 \times 56$ multiplier for the serial-parallel configuration (which is also one rank of the fully-parallel configuration, which has seven such ranks) looks like this:

[Figure: diagram labels "8-bit portion of multiplier" and "Product"]

Figure 11. $8 \times 56$ Cray Multiplier in "Diamond" Notation

As you may discover after a moment's thought, each slanted double line in Figure 11 calls for addition of the outputs of two '558s - the eight most-significant bits of one, and the eight least-significant bits of the next one to the left. There must also be an extra adder (or at least a "half adder") to propagate the carries from this addition all the way over to the left end of the result. The upshot is that an extra 56-bit mill is needed, in addition to the '558s. The eight least-significant bits of the least-significant '558 do not have to go through this mill, since they do not get added to anything else.

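To see why exactly one extra 56-bit mill does the job, here is a small sketch of one rank, again assuming unsigned operands. The names `multiply_8x8`, `rank_8x56`, and `fully_parallel_56x56` are hypothetical; `multiply_8x8` merely stands in for a single '558, and plain integer addition stands in for the mills.

```python
def multiply_8x8(x: int, y: int) -> int:
    """Stand-in for one '558: unsigned 8-bit by 8-bit, 16-bit product."""
    assert 0 <= x < 256 and 0 <= y < 256
    return x * y


def rank_8x56(multiplier_byte: int, multiplicand: int) -> int:
    """One rank: seven 8x8 products merged by a single 56-bit addition."""
    products = [
        multiply_8x8(multiplier_byte, (multiplicand >> (8 * k)) & 0xFF)
        for k in range(7)
    ]
    # The low byte of the least-significant product bypasses the mill.
    bypass = products[0] & 0xFF
    # First mill operand: the seven high bytes, packed side by side (56 bits).
    high_bytes = sum((p >> 8) << (8 * k) for k, p in enumerate(products))
    # Second mill operand: the low bytes of the other six products (48 bits).
    low_bytes = sum((p & 0xFF) << (8 * (k - 1))
                    for k, p in enumerate(products) if k >= 1)
    # The one extra mill: a 56-bit add that also ripples carries leftward.
    mill_sum = high_bytes + low_bytes
    return (mill_sum << 8) | bypass


def fully_parallel_56x56(multiplicand: int, multiplier: int) -> int:
    """Seven ranks (49 stand-in '558s) summed with proper byte shifts."""
    return sum(
        rank_8x56((multiplier >> (8 * i)) & 0xFF, multiplicand) << (8 * i)
        for i in range(7)
    )
```

Within each operand the bytes occupy separate positions, so packing them is just wiring; the only real arithmetic inside a rank is the one addition that forms `mill_sum`. The final `sum` in `fully_parallel_56x56` plays the part of the rank-to-rank mills, in whatever chain or tree arrangement is chosen.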
One final note: building up a large Cray-multiplier configuration out of '558s requires a lot of full adders, or else a lot of something else equivalent to them. Monolithic Memories also makes the 54/74S381 (a 4-bit "ALU" or "Arithmetic Logic Unit") and the 54/74S182 (a carry-bypass circuit which works well with the '381); and two faster ALUs, the 54/74F381 and the 54/74F382, are in design. These ALUs and bypasses are excellent building blocks from which to assemble the mills used for summation within a rank of '558s, and also the mills used for tree-summation of the outputs of all ranks. For how to put together one of these mills using '381s, '382s, and '182s, see reference 1. For how to use PROMs as Wallace trees, see reference 3.

Now you can go ahead, design your Cray multiplier out of '558s, and start multiplying full-length numbers together in a fraction of a microsecond. Sound like fun?

## References

1. "Doing Your Own Thing in High-Speed Digital Arithmetic," Chuck Hastings, Monolithic Memories Conference Proceedings Reprint CP-102.
2. "Real-Time Processing Gains Ground with Fast Digital Multiplier," Shlomo Waser and Allen Peterson, Electronics, September 29, 1977.
3. "Big, Fast and Simple - Algorithms, Architecture, and Components for High-End Superminis," Ehud "Udi" Gordon and Chuck Hastings, 1982 Southcon Professional Program, Orlando, Florida, March 23-25, 1982, paper no. 21/3.
4. "An $8 \times 8$ Multiplier and 8-bit $\mu$P Perform $16 \times 16$-bit Multiplication," Shai Mor, EDN, November 5, 1979, Monolithic Memories Article Reprint AR-109.

NOTE: All of these references are available as application notes from Monolithic Memories Inc.




[^0]:    * During operations when the bus is being used to output data.

[^1]:    * THESE ARE ADDER BLOCKS USING THE 'S381, A 4-BIT ALU FUNCTION GENERATOR, TO PERFORM A HIGH SPEED ADD OPERATION. THE 'S182 IS A LOOK-AHEAD CARRY GENERATOR WHICH REDUCES THE PROPAGATION DELAY. ALL THE ABOVE PARTS ARE AVAILABLE FROM MONOLITHIC MEMORIES INCORPORATED.

