Application-Specific Integrated Circuits: Smith PDF download

The slave latch will then keep this value of M at the output Q, despite any changes at the D input while the clock is low (Figure 2.).

When the clock goes high again, the slave latch will store the captured value of M and we are back where we started our explanation. The combination of the master and slave latches acts to capture or sample the D input at the negative clock edge, the active clock edge. This type of flip-flop is a negative-edge-triggered flip-flop and its behavior is quite different from that of a latch. A bubble on the clock input of the symbol shows that the cell is sensitive to the negative clock edge. To build a positive-edge-triggered flip-flop we invert the polarity of all the clocks, as we did for a latch.
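
To make the master-slave operation concrete, here is a small behavioral Verilog sketch (not the book's own code; the module and node names are illustrative) that models the flip-flop as a master latch followed by a slave latch, with M as the internal node described above.

  // Negative-edge-triggered D flip-flop modeled as two level-sensitive latches.
  module dff_neg_ms (input D, CLK, output reg Q);
    reg M;                                // internal master output, node M
    always @(CLK or D) if (CLK)  M <= D;  // master latch: transparent while CLK is high
    always @(CLK or M) if (!CLK) Q <= M;  // slave latch: transparent while CLK is low,
                                          // so Q captures M at the negative clock edge
  endmodule

Simulating this model shows Q changing only just after a falling clock edge, while changes at D during the low phase of the clock do not reach Q, matching the waveform description above.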

The waveforms in Figure 2. show this behavior. We must keep the data stable (a fixed logic '1' or '0') for a setup time t_SU prior to the active clock edge, and stable for a hold time t_H after the active clock edge, during the decision window shown. We say the trip point is 50 percent, or 0.5; other trip points are also common. The flip-flop in Figure 2. Some people use the term register to mean an array (more than one) of flip-flops or latches (on a data bus, for example), but others use register to mean a single flip-flop or a latch.

This is confusing, since flip-flops and latches are quite different in their behavior. When I am talking about logic cells, I use the term register to mean more than one flip-flop. To add an asynchronous set (Q to '1') or an asynchronous reset (Q to '0') to the flip-flop of Figure 2. we modify the latch inverters; for both set and reset we replace all four inverters: I2, I3, I6, and I7. An input that forces Q to '1' is sometimes also called preset. An input that forces Q to '0' is often also called clear.
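
A behavioral sketch of a negative-edge-triggered flip-flop with asynchronous set and reset is shown below. This is a functional model only (it does not reproduce the inverter replacement just described), and the active-low pin names SETN and RSTN are illustrative rather than taken from any particular library.

  module dff_neg_sr (input D, CLK, SETN, RSTN, output reg Q);
    // SETN and RSTN are active-low asynchronous set (preset) and reset (clear).
    always @(negedge CLK or negedge SETN or negedge RSTN)
      if      (!RSTN) Q <= 1'b0;   // asynchronous reset: force Q to '0'
      else if (!SETN) Q <= 1'b1;   // asynchronous set: force Q to '1'
      else            Q <= D;      // otherwise capture D at the negative clock edge
  endmodule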

The arrows in Figure 2. We can break the connection between the inverter cells and use the circuit of Figure 2. The symbol for the clocked inverter shown in Figure 2.

We can use the clocked inverter to replace the inverter-TG pairs in latches and flip-flops. For example, we can replace one or both of the inverters I1 and I3, together with the TGs that follow them, in Figure 2. We cannot replace inverter I6 because it is not directly connected to a TG. We can replace the TG attached to node M with a clocked inverter, and this will invert the sense of the output Q, which thus becomes QN. If we wish to build a flip-flop with a fast clock-to-QN delay, it may be better to build it using clocked inverters; for a flip-flop with a fast clock-to-Q delay we would use inverters with TGs.

In fact, since we do not always use both the Q and QN outputs of a flip-flop, some libraries include Q-only or QN-only flip-flops that are slightly smaller than those with both polarity outputs. It is slightly easier to lay out clocked inverters than an inverter plus a TG, so flip-flops in commercial libraries include a mixture of clocked-inverter and TG implementations.

Suppose we wish to build an adder; we can do so using a datapath structure. The carry out, COUT, uses the 2-of-3 majority function ('1' if the majority of the three inputs are '1'). In the 4-bit adder shown in Figure 2. , the A inputs, B inputs, and S outputs all use m1 interconnect running in the horizontal direction; we call these data signals. We can also use m1 for control and m2 for data, but we normally do not mix these approaches in the same structure. Control signals are typically clocks and other signals common to the datapath elements.

For example, in Figure 2. To build a 4-bit adder we stack four ADD cells, creating the array structure shown in Figure 2. In this case the A and B data bus inputs enter from the left and bus S, the sum, exits at the right, but we can connect A, B, and S to either side if we want. The layout of buswide logic that operates on data signals in this fashion is called a datapath. The module ADD is a datapath cell or datapath element. Just as we do for standard cells, we make all the datapath cells in a library the same height so we can abut other datapath cells on either side of the adder to create a more complex datapath.
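
As a sketch of the logic inside such an ADD cell (the cell and port names below are illustrative, not the book's library cell), the sum is the parity of the three inputs and the carry out is the 2-of-3 majority mentioned above:

  module add_cell (input A, B, CIN, output S, COUT);
    assign S    = A ^ B ^ CIN;                      // sum: parity of the three inputs
    assign COUT = (A & B) | (A & CIN) | (B & CIN);  // carry out: 2-of-3 majority
  endmodule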

When people talk about a datapath they always assume that it is oriented so that increasing the size in bits makes the datapath grow in height, upwards in the vertical direction, and adding different datapath elements to increase the function makes the datapath grow in width, in the horizontal direction—but we can rotate and position a completed datapath in any direction we want on a chip.

In this example the wiring is completed outside the cell; it is also possible to design the datapath cells to contain the wiring. Using three levels of metal, it is possible to wire over the top of the datapath cells. What is the difference between using a datapath, standard cells, or gate arrays? Cells are placed together in rows on a CBIC or an MGA, but there is generally no regularity to the arrangement of the cells within the rows; we let software arrange the cells and complete the interconnect.

Datapath cell design can be harder than designing gate-array macros or standard cells. There are some newer standard-cell and gate-array tools that can take advantage of regularity in a design and position cells carefully. The problem is in finding the regularity if it is not specified. Using a datapath is one way to specify regularity to ASIC design tools.

I use heavy lines (they are 1. ) with a stroke to denote a data bus. At the risk of adding confusion where there is none, this stroke indicating a data bus has nothing to do with mixed-logic conventions. Also, for example, A[4] is the fifth bit on the bus (counting from the LSB). Some schematic datapath symbols include only data signals and omit the control signals, but we must not forget them.

Why might we need both of these control signals? Table 2. We need to be careful because C[0] might represent either the carry in or the carry out of the LSB stage. For an adder we set the carry in to the first stage (stage zero), C[-1] or CIN[0], to '0'. Do not confuse the two different methods; both are used in the equations.

The delay of an n -bit RCA is proportional to n and is limited by the propagation of the carry signal through all of the stages. Equations 2.
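
A parameterized ripple-carry adder can be sketched in Verilog as follows (illustrative names, not the book's code); the single carry chain C[0] to C[N] is the path that makes the delay grow linearly with the number of bits:

  module rca #(parameter N = 4) (
    input  [N-1:0] A, B,
    input          CIN,          // carry in to stage zero
    output [N-1:0] S,
    output         COUT
  );
    wire [N:0] C;
    assign C[0] = CIN;
    genvar i;
    generate
      for (i = 0; i < N; i = i + 1) begin : stage
        assign S[i]   = A[i] ^ B[i] ^ C[i];                            // parity
        assign C[i+1] = (A[i] & B[i]) | (A[i] & C[i]) | (B[i] & C[i]); // majority
      end
    endgenerate
    assign COUT = C[N];
  endmodule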

We can use the RCA of Figure 2. Instead of propagating the carries through each stage of an RCA, Figure 2. shows a carry-save adder (CSA). The input, CIN, is the carry from stage i - 1. The carry in, CIN, is connected directly to the output bus S1, indicated by the schematic symbol (Figure 2.).

There is thus no carry propagation and the delay of a CSA is constant. At the output of a CSA we still need to add the S1 bus (all the saved carries) and the S2 bus (all the sums) to get an n-bit result, using a final stage that is not shown in Figure 2. We might regard the n-bit sum as being encoded in the two buses, S1 and S2, in the form of the parity and majority functions.
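
A behavioral sketch of one CSA stage is shown below (one common form; the exact wiring of the book's figure may differ, and the module name is illustrative). Each bit position is an independent full adder, so there is no carry chain; S2 collects the sums, S1 collects the saved carries, and a later carry-propagate addition of S2 + (S1 << 1) produces the final result.

  module csa #(parameter N = 4) (
    input  [N-1:0] A, B, CIN,   // three input buses; CIN is a whole bus, not a single bit
    output [N-1:0] S1,          // saved carries (each has twice the weight of its bit position)
    output [N-1:0] S2           // sums
  );
    assign S2 = A ^ B ^ CIN;                      // per-bit parity, no rippling
    assign S1 = (A & B) | (A & CIN) | (B & CIN);  // per-bit majority, no rippling
  endmodule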

We can use a CSA to add multiple inputs; as an example, an adder with four 4-bit inputs is shown in Figure 2. The last stage sums two input buses using a carry-propagate adder (CPA). Notice in Figure 2. We can register the CSA stages by adding vectors of flip-flops, as shown in Figure 2. This reduces the adder delay to that of the slowest adder stage, usually the CPA. By using registers between stages of combinational logic we use pipelining to increase the speed, paying a price of increased area for the registers and introducing latency.

It takes a few clock cycles (the latency, equal to n clock cycles for an n-stage pipeline) to fill the pipeline, but once it is filled, the answers emerge every clock cycle.

Ferris wheels work much the same way. When the fair opens it takes a while (the latency) to fill the wheel, but once it is full the people can get on and off every few seconds. We can also pipeline the RCA of Figure 2.
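
As a sketch of the registered carry-save scheme described above (illustrative names; a simplification of the figure, not the book's code), the module below adds four buses with two CSA stages and completes the sum in a registered carry-propagate stage. The pipeline register between the CSA tree and the CPA means each clock period only has to cover one stage, at the cost of one extra cycle of latency.

  module csa4_pipe #(parameter N = 4, W = N + 2) (
    input              clk,
    input  [N-1:0]     I0, I1, I2, I3,
    output reg [W-1:0] SUM                 // wide enough to hold the sum of four N-bit inputs
  );
    wire [W-1:0] a = I0, b = I1, c = I2, d = I3;   // zero-extend the operands

    // Stage 1: compress three operands into sums s1 and saved carries c1.
    wire [W-1:0] s1 = a ^ b ^ c;
    wire [W-1:0] c1 = (a & b) | (a & c) | (b & c);

    // Stage 2: add the fourth operand (saved carries have weight 2, hence the shift).
    wire [W-1:0] s2 = s1 ^ (c1 << 1) ^ d;
    wire [W-1:0] c2 = (s1 & (c1 << 1)) | (s1 & d) | ((c1 << 1) & d);

    reg [W-1:0] s2_r, c2_r;                // pipeline register between CSA tree and CPA
    always @(posedge clk) begin
      s2_r <= s2;
      c2_r <= c2;
      SUM  <= s2_r + (c2_r << 1);          // final carry-propagate addition
    end
  endmodule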

We add i registers on the A and B inputs before ADD[i] and add n - i registers after the output S[i], with a single register before each C[i]. If we examine the propagate signals we can bypass this critical path (the carry chain).

Large, custom adders employ Manchester-carry chains to compute the carries and the bypass operation using TGs or just pass transistors [Weste and Eshraghian]. Instead of checking the propagate signals we can check the inputs. Carry-bypass and carry-skip adders may include redundant logic, since the carry is computed in two different ways; we just take the first signal to arrive. We must be careful that the redundant logic is not optimized away during logic synthesis.

If we evaluate the carry-lookahead equations and continue expanding them, the terms grow with each stage. Datapath layout must fit in a bit slice, so the physical and logical structure of each bit must be similar. In a standard cell or gate array we are not so concerned about a regular physical structure, but a regular logical structure simplifies design. The Brent-Kung adder reduces the delay and increases the regularity of the carry-lookahead scheme [Brent and Kung]. Cell L4 is added to calculate C[2], which is lost in the translation.
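
To see how the expansion works, here is a flat 4-bit carry-lookahead sketch written out explicitly (illustrative code, not the Brent-Kung arrangement of the figure). Each carry is computed directly from generate (g) and propagate (p) terms, and each successive carry equation gains one more product term; this is exactly the growth that the Brent-Kung tree tames.

  module cla4 (input [3:0] A, B, input CIN, output [3:0] S, output COUT);
    wire [3:0] g = A & B;   // generate: this bit creates a carry
    wire [3:0] p = A ^ B;   // propagate: this bit passes a carry along
    wire [4:0] c;
    assign c[0] = CIN;
    assign c[1] = g[0] | (p[0] & c[0]);
    assign c[2] = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c[0]);
    assign c[3] = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c[0]);
    assign c[4] = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
                       | (p[3] & p[2] & p[1] & p[0] & c[0]);
    assign S    = p ^ c[3:0];
    assign COUT = c[4];
  endmodule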

The inputs, 0-7, are the propagate and carry terms formed from the inputs to the adder. The outputs of the lookahead logic are the carry bits that, together with the inputs, form the sum. One advantage of this adder is that the delays from the inputs to the outputs are more nearly equal than in other adders. This tends to reduce the number of unwanted and unnecessary switching events and thus reduces power dissipation. A carry-select adder is often used as the fast adder in a datapath library because its layout is regular.

We can use the carry-select, carry-bypass, and carry-skip architectures to split a 12-bit adder, for example, into three blocks. The delay of the adder is then partly dependent on the delays of the MUX between each block. Suppose the delay due to one bit in an adder block (we shall call this a bit delay) is approximately equal to the MUX delay. In this case it may be faster to make the blocks 3, 4, and 5 bits long instead of making them equal in size.

Now the delays into the final MUX are equal: 3 bit-delays plus 2 MUX delays for the carry signal from bits 0-6, and 5 bit-delays for the carry from bits 7-11. Adjusting the block size reduces the delay of large adders (more than 16 bits). We can extend the idea behind a carry-select adder as follows. Suppose we have an n-bit adder that generates two sums: one sum assumes a carry-in condition of '0', the other sum assumes a carry-in condition of '1'.

Both of the smaller adders generate two conditional sums as well as true and complement carry signals. This is a conditional-sum adder (also often abbreviated to CSA) [Sklansky]. We can recursively apply this technique.
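
A minimal carry-select sketch shows the idea that the conditional-sum adder applies recursively: the upper block computes both possible results and the true carry from the lower block picks one with a MUX. The 8-bit width, the block sizes, and the names below are illustrative assumptions, not the book's example.

  module csel8 (input [7:0] A, B, input CIN, output [7:0] S, output COUT);
    // Lower block: an ordinary 4-bit add.
    wire [4:0] lo  = A[3:0] + B[3:0] + CIN;
    // Upper block: two conditional sums, one for each possible carry in.
    wire [4:0] hi0 = A[7:4] + B[7:4] + 1'b0;
    wire [4:0] hi1 = A[7:4] + B[7:4] + 1'b1;
    // The lower block's carry out selects the correct upper result.
    assign S[3:0]         = lo[3:0];
    assign {COUT, S[7:4]} = lo[4] ? hi1 : hi0;
  endmodule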

The conditional-sum adder is usually the fastest of all the adders we have discussed; it is the fastest when logic-cell delay increases with the number of inputs, which is true for all ASICs except FPGAs. What does an adder design look like? We may use predesigned cells from a library or build the elements ourselves from logic cells, using a schematic or a design language.

This Verilog implementation uses the same structure as Figure 2. A basic logic cell in certain Xilinx FPGAs, for example, can implement two equations of the same four variables or one equation with five variables. The equations shown in Table 2. The data in Figure 2. We can combine the different adder techniques, but the adders then lose regularity and become less suited to a datapath implementation.

This data is from a series of submicron datapath libraries. For example, a ripple-carry adder (RCA) has a delay of approximately 30 ns in a 0. process. The spread in delay is due to variation in delays between different inputs and outputs. An n-bit RCA has a delay proportional to n.

The delay of an n-bit carry-select adder is approximately proportional to log2 n. The carry-save adder delay is constant, but a carry-propagate adder is required to complete an addition.

There are other adders that are not used in datapaths but are occasionally useful in ASIC design. A serial adder is smaller but slower than the parallel adders we have described [Denyer and Renshaw]. The carry-completion adder is a variable-delay adder and is rarely used in synchronous designs [Sklansky]. An n-bit array multiplier has a delay proportional to n plus the delay of the CPA (adders b6-f6 in Figure 2. ). There are two items we can attack to improve the performance of a multiplier: the number of partial products and the addition of the partial products.

A 6-bit array multiplier using a final carry-propagate adder (full-adder cells a6-f6, a ripple-carry adder). Apart from the generation of the summands, this multiplier uses the same structure as the carry-save adder of Figure 2. Suppose we wish to multiply 15 (the multiplicand) by 19 (the multiplier) mentally. We say B has a weight of 4 and D has a weight of 3. CSD vectors are useful to represent fixed coefficients in digital filters, for example.

We can recode the multiplier B using a radix other than 2. We pad B with a zero, B[n] B[n-1] ... B[1] B[0] 0, to match the first term in the recoding equation. If B has an odd number of bits, then we extend the sign: B[n] B[n] B[n-1] ... B[1] B[0] 0. This is called Booth encoding; it reduces the number of partial products by a factor of two and thus considerably reduces the area, as well as increasing the speed, of our multiplier [Booth]. Next we turn our attention to improving the speed of addition in the CSA array.
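
Before turning to the CSA array, here is a sketch of the radix-4 (modified) Booth recoding just described, for an 8-bit by 8-bit signed multiply. It is a behavioral illustration with made-up names, not the book's implementation: each group of three multiplier bits selects a partial product of 0, +A, -A, +2A, or -2A, so only four partial products are needed instead of eight.

  module booth_mult8 (input signed [7:0] A, B, output signed [15:0] P);
    // Select one partial product from three adjacent multiplier bits.
    function signed [15:0] booth_pp;
      input signed [7:0] a;
      input [2:0] b3;                              // bits B[i+1], B[i], B[i-1]
      case (b3)
        3'b000, 3'b111: booth_pp =  16'sd0;        // digit  0
        3'b001, 3'b010: booth_pp =  a;             // digit +1
        3'b011:         booth_pp =  a <<< 1;       // digit +2
        3'b100:         booth_pp = -(a <<< 1);     // digit -2
        3'b101, 3'b110: booth_pp = -a;             // digit -1
      endcase
    endfunction

    wire [8:0] Bx = {B, 1'b0};                     // pad B with a zero, B[-1] = 0
    // Four partial products, each weighted by a power of 4.
    wire signed [15:0] pp0 = booth_pp(A, Bx[2:0]);
    wire signed [15:0] pp1 = booth_pp(A, Bx[4:2]) <<< 2;
    wire signed [15:0] pp2 = booth_pp(A, Bx[6:4]) <<< 4;
    wire signed [15:0] pp3 = booth_pp(A, Bx[8:6]) <<< 6;
    assign P = pp0 + pp1 + pp2 + pp3;
  endmodule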

We can collapse the chain of adders a0-f5 (5 adder delays) to the Wallace tree consisting of adders 5. Each link corresponds to an adder. The holes or dots are the outputs of one stage and the inputs of the next. At each stage we have the following three choices: (1) sum three outputs using a full adder (denoted by a box enclosing three dots); (2) sum two outputs using a half adder (a box with two dots); (3) pass the outputs directly to the next stage.

The two outputs of an adder are joined by a diagonal line (full adders use black dots, half adders white dots). The object of the game is to choose (1), (2), or (3) at each stage to maximize the performance of the multiplier. In tree-based multipliers there are two ways to do this: working forward and working backward.

In a Wallace-tree multiplier we work forward from the multiplier inputs, compressing the number of signals to be added at each stage [Wallace]. We can view an FA as a compressor or (3, 2) counter: it counts the number of '1's on its inputs. Thus, for example, an input with two '1's results in an output of '10' (2). A half adder is a (2, 2) counter. To form P[5] in Figure 2. we add the partial products in stages 1-7, compressing the number of signals at each stage. Notice that we wait until stage 5 to add the last carry from column P[4], and this means we expand (rather than compress) the number of signals from 2 to 3 between stages 3 and 5.

The maximum delay through the CSA array of Figure 2. is 6 adder delays. To this we must add the delay of the 4-bit (9-input) CPA (stage 7). There are 26 adders (6 half adders) plus the 4 adders in the CPA. The carry-save adder (CSA) requires 26 adders (cells 1-26, six of which are half adders). The final carry-propagate adder (CPA) consists of 4 adder cells (27-30). The delay of the CSA is 6 adders. The delay of the CPA is 4 adders.

In a Dadda multiplier (Figure 2. ) we work backward. Each stage has a maximum height of 2, 3, 4, 6, 9, 13, 19, and so on. Thus, for example, in Figure 2. A Dadda multiplier is usually faster and smaller than a Wallace-tree multiplier. The carry-save adder (CSA) requires 20 adders (cells 1-20, four of which are half adders). The overall speed of this implementation is approximately the same as the Wallace-tree multiplier of Figure 2.

The Ferrari-Stefanelli multiplier (Figure 2. ) is yet another approach. There are several issues in deciding between parallel multiplier architectures. First, since it is easier to fold triangles rather than trapezoids into squares, a Wallace-tree multiplier is more suited to full-custom layout, but is slightly larger, than a Dadda multiplier; both are less regular than an array multiplier.

The overall multiplier speed does depend on the size and architecture of the final CPA, but this may be optimized independently of the CSA array. This means a Dadda multiplier is always at least as fast as the Wallace-tree version.

The low-order bits of any parallel multiplier settle first and can be added in the CPA before the remaining bits settle. This allows multiplication and the final addition to be overlapped in time. Any of the parallel multiplier architectures may be pipelined. We may also use a variably pipelined approach that tailors the register locations to the size of the multiplier.

Using (4, 2), (5, 3), (7, 3), or (15, 4) counters increases the stage compression and permits the size of the stages to be tuned. Some ASIC cell libraries contain a (7, 3) counter (a 2-bit full adder). A (15, 4) counter is a 3-bit full adder. There is a trade-off in using these counters between the speed and size of the logic cells and the delay, as well as the area, of the interconnect. Power dissipation is reduced by the tree-based structures. The simplified carry-save logic produces fewer signal transitions and the tree structures produce fewer glitches than a chain.
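
As an illustration of the counters mentioned above, a (7, 3) counter simply outputs, in binary, how many of its seven inputs are '1'. A behavioral sketch is shown below (illustrative name, leaving the gate-level design to the library).

  module counter_7_3 (input [6:0] IN, output [2:0] COUNT);
    // Population count of seven bits: the result 0 to 7 fits in three output bits.
    assign COUNT = IN[0] + IN[1] + IN[2] + IN[3] + IN[4] + IN[5] + IN[6];
  endmodule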

None of the multiplier structures we have discussed take into account the possibility of staggered arrival times for different bits of the multiplicand or the multiplier. Optimization then requires a logic-synthesis tool.

Addition of numbers using redundant binary encoding avoids carry propagation and is thus potentially very fast. A given decimal number can be represented by several equivalent redundant binary vectors as well as by a CSD vector. Each n-bit redundant binary number requires a rather wasteful 2n-bit binary number for storage, using a sign-magnitude code for each digit, for example.

The other disadvantage of redundant binary arithmetic is the need to convert to and from binary representation. Residue arithmetic is another alternative: we add, subtract, or multiply residue numbers using the modulus of each digit position, without any carry. The most useful choices for the moduli are relative primes (such as 3 and 5). The combinational datapath cells (NAND, NOR, and so on) and the sequential datapath cells (flip-flops and latches) have standard-cell equivalents and function identically.

I use a bold outline (1 point) for datapath cells instead of the regular, thinner outline. We call a set of identical cells a vector of datapath elements, in the same way that a bold symbol, A, represents a vector and A represents a scalar.

We can build a ripple-borrow subtracter (a type of borrow-propagate subtracter), a borrow-save subtracter, and a borrow-select subtracter in the same way we built these adder architectures. A barrel shifter rotates or shifts an input bus by a specified amount. For example, if we have an eight-input barrel shifter and we specify a shift of 3 (coded by bit position), the 8-bit output is the input rotated right by three positions.
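
A behavioral sketch of an 8-bit barrel rotator with the shift amount coded by bit position (one-hot), as in the example above, might look like this; the names are illustrative, and a library cell would be built from layers of MUXes or pass transistors rather than a loop.

  module barrel8 (
    input      [7:0] DIN,
    input      [7:0] SEL,            // one-hot shift amount, coded by bit position
    output reg [7:0] DOUT
  );
    wire [15:0] pair = {DIN, DIN};   // doubling the input turns a shift into a rotate
    integer i;
    always @* begin
      DOUT = 8'b0;
      for (i = 0; i < 8; i = i + 1)
        if (SEL[i]) DOUT = pair[i +: 8];   // rotate right by i when SEL[i] is set
    end
  endmodule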

A barrel shifter may rotate left or right or switch between the two under a separate control. A barrel shifter may also have an output width that is smaller than the input. To use a simple example, we may have an 8-bit input and a 4-bit output.

This situation is equivalent to having a barrel shifter with two 4-bit inputs and a 4-bit output. A leading-one detector is used with a normalizing left-shift barrel shifter to align mantissas in floating-point numbers.

The input is an n-bit bus, A; the output is an n-bit bus, S, with a single '1' in the bit position corresponding to the most significant '1' in the input.

If we feed the output, S, of the leading-one detector to the shift select input of a normalizing left-shift barrel shifter, the shifter will normalize the input A.

The output of a priority encoder is the binary-encoded position of the leading one in an input. This second, reversed, encoding scheme is useful in floating-point arithmetic. If A is a mantissa and we normalize A by shifting it left five positions, we have to subtract 5 from the exponent; this exponent correction is equal to the output of the priority encoder.
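
Behavioral sketches of a leading-one detector and a priority encoder can be combined in one module (illustrative names and an assumed 8-bit width): S is the one-hot output that could drive the shift select of a normalizing barrel shifter, and POS is the binary-encoded position of the leading one, counted from the LSB, so the exponent correction described above corresponds to the distance of that bit from the MSB.

  module lead_one8 (
    input      [7:0] A,
    output reg [7:0] S,     // one-hot: marks the most significant '1' in A
    output reg [2:0] POS    // binary position of that '1', counted from the LSB
  );
    integer i;
    always @* begin
      S   = 8'b0;
      POS = 3'b0;
      for (i = 0; i < 8; i = i + 1)   // the highest set bit is examined last and wins
        if (A[i]) begin
          S   = 8'b1 << i;
          POS = i[2:0];
        end
    end
  endmodule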

Sometimes these are combined with a multiplier to form a multiplier-accumulator (MAC). The carry-in control input, CIN[0], thus acts as an enable: if it is set to '0' the output is the same as the input. The implementation of arithmetic cells is often a little more complicated than we have explained.

This inverts COUT, so that in the following stage we must invert it again. In many datapath implementations all odd-bit cells operate on inverted carry signals, and thus the odd-bit and even-bit datapath elements are different.

In fact, all the adder and subtracter datapath elements we have described may use this technique. Normally this is completely hidden from the designer in the datapath assembly and any output control signals are inverted, if necessary, by inserting buffers.

The implementation may invert the odd carry signals, with CIN[0] again acting as an enable. This has the effect of selecting either the increment or decrement function.

Normally these register files are the densest logic and the hardest to fit in a datapath. For large register files it may be more appropriate to use a multiport memory. The reason for this (see Section 2. ) is that without intelligent placement software we do not know where a standard cell or a gate-array macro will be placed on a chip. We also have no idea of the condition of the clock signal coming into a sequential cell. The ability to place the clock buffers outside the sequential cells in a datapath gives us more flexibility and saves space.

For example, we can place the clock buffers for all the clocked elements at the top of the datapath, together with the buffers for the control signals, and river route (in river routing the interconnect lines all flow in the same direction on the same layer) the connections to the clock lines.

This saves space and allows us to guarantee the clock skew and timing. It may mean, however, that there is a fixed overhead associated with a datapath. For example, it might make no sense to build a 4-bit datapath if the clock and control buffers take up twice the space of the datapath logic.

Some tools allow us to design logic using a portable netlist. After we complete the design we can decide whether to implement the portable netlist in a datapath, standard cells, or even a gate array, based on area, speed, or power considerations. When the output enable, OE, is low, the output transistors (or drivers), M1 and M2, are disconnected. This allows multiple drivers to be connected on a bus. It is up to the designer to make sure that a bus never has two drivers active at the same time, a problem known as contention.

In order to prevent the problem opposite to contention (a bus floating to an intermediate voltage when there are no bus drivers) we can use a bus keeper or bus-hold cell (TI calls this Bus-Friendly logic). A bus keeper normally acts like two weak (low drive-strength) cross-coupled inverters that form a latch to retain the last logic state on the bus, but the latch is weak enough that it may be driven easily to the opposite state. Even though bus keepers act like latches, and will simulate like latches, they should not be used as latches, since their drive strength is weak.
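
A simulation-level sketch of a three-state driver with such a bus keeper is shown below; the keeper is modeled as the text describes, as two weak cross-coupled inverters, so any strong driver on the bus overrides it, while it holds the last value when all drivers are turned off. The names are illustrative and a real pad cell would be a transistor-level design.

  module tristate_keeper (input D, OE, inout BUS);
    assign BUS = OE ? D : 1'bz;        // three-state driver: high impedance when OE is '0'
    // Bus keeper: two weak cross-coupled inverters. They hold the last value
    // on BUS but are easily overdriven by any strong driver.
    wire keep_n;
    not (weak1, weak0) k1 (keep_n, BUS);
    not (weak1, weak0) k2 (BUS, keep_n);
  endmodule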

Transistors M1 and M2 in Figure 2. switch large currents. Such large currents flowing in the output transistors must also flow in the power-supply bus and can cause problems. There is always some inductance in series with the power supply, between the point at which the supply enters the ASIC package and the point at which it reaches the power bus on the chip.

The inductance is due to the bond wire, lead frame, and package pin. Of course, it is not necessary to have all these features on every pad: we can build output-only or input-only pads. When OE is '0' the output buffer is placed in a high-impedance state. We can also use many of these output-cell features for input cells that have to drive large on-chip loads (a clock pad cell, for example).

Some gate arrays simply turn an output buffer around to drive a grid of interconnect, with a typical interconnect capacitance of 0. , that supplies a clock signal internally. We can also omit one of the driver transistors, M1 or M2, to form open-drain outputs that require an external pull-up or pull-down resistor.

We may also add input hysteresis (using a Schmitt trigger) to the input buffer, I1 in Figure 2. Electrostatic discharge (ESD) arises when we or machines handle the package leads (like the shock I sometimes get when I touch a doorknob after walking across the carpet at work). Sometimes this problem is called electrical overstress (EOS), since most ESD-related failures are caused not by gate-oxide breakdown but by the thermal stress (melting) that occurs when the n-channel transistor in an output driver overheats due to the large current that can flow in the drain diffusion connected to a pad during an ESD event.

Typical machine-model (MM) parameters use a capacitor, typically charged to a few hundred volts, discharged through a 25 Ω resistor, corresponding to a peak initial current of nearly 10 A. The charge-device model (CDM), also called device charge-discharge, represents the problem when an IC package is charged (in a shipping tube, for example) and then grounded. If the maximum charge on a package is 3 nC (a typical measured figure) and the package capacitance to ground is in the picofarad range, the package is charged to a high voltage and produces a fast, high-current discharge when grounded. A separate failure mode is called latch-up. Latch-up can occur if the pn-diodes on a chip become forward-biased and inject minority carriers (electrons in p-type material, holes in n-type material) into the substrate.

The source-substrate and drain-substrate diodes can become forward-biased due to power-supply bounce, output undershoot (the cell outputs fall below V_SS), or overshoot (outputs rise to greater than V_DD), for example.

An application-specific integrated circuit (ASIC) must not only provide the required functionality at the desired speed, but it must also be economical.

ASIC design, using commercial tools and pre-designed cell libraries, is the fastest, most cost-effective, and least error-prone method of IC design. The book covers both semicustom and programmable ASIC types. After describing the fundamentals of digital logic design and the physical features of each ASIC type, the book turns to ASIC logic design - design entry, logic synthesis, simulation, and test - and then to physical design - partitioning, floorplanning, placement, and routing.

You will find here, in practical well-explained detail, everything you need to know to understand the design of an ASIC, and everything you must do to begin and to complete your own design. Examples throughout the book have been checked with a wide range of commercial tools to ensure their accuracy and utility. As in other landmark VLSI books published by Addison-Wesley - from Mead and Conway to Weste and Eshraghian - the author's teaching expertise and industry experience illuminate the presentation of useful design methods.

Any engineer, manager, or student who is working with ASICs in a design project, or who is simply interested in knowing more about the different ASIC types and design styles, will find this book to be an invaluable resource, reference, and guide.
