In modern high-speed rendering environments, many low-level calculations are offloaded from the CPU onto external application-specific graphics chips. Moving these low-level, repetitive, relatively simple calculations onto an external chip can yield large speedups in total rendering time.
While rendering polygons onto the screen, it is often necessary to determine which polygons are in front of others. Using a BSP tree or another space-partitioning structure, the relative Z-order of the polygons can often be determined in linear time. For a given scanline (a single row of pixels on the screen), each polygon can then be drawn in back-to-front order, with polygons in front overwriting those behind them.
Translucency is an easy addition to this model. If a polygon being inserted into the scene is 50% translucent, each resulting pixel value is 50% the value of the polygon and 50% the value of the polygons behind it.
In our scanline buffer, each inserted pixel has one of 8 possible degrees of translucency (0% to 87.5%, in steps of 12.5%) and 8 bits of color information.
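The blending and back-to-front drawing described above can be sketched behaviorally as follows. This is a minimal model, not the averager circuit itself: it assumes that a 3-bit translucency level t (0 to 7) lets t/8 of the stored value show through, so the result is ((8 - t)·VAL + t·old) / 8, truncated to 8 bits. The helper `draw_scanline` and its span format are hypothetical, and the spans are assumed to arrive already sorted back to front (for example, from a BSP traversal).

```python
def blend(old: int, val: int, tran: int) -> int:
    """Blend an 8-bit incoming value over the stored 8-bit value.

    Assumption: translucency level tran (0..7) lets tran/8 of the stored
    value show through, so tran = 0 is opaque and tran = 7 is 87.5% translucent.
    """
    if not (0 <= old <= 255 and 0 <= val <= 255 and 0 <= tran <= 7):
        raise ValueError("old/val must be 8-bit and tran must be 3-bit")
    # Weighted sum of the two values; dividing by 8 keeps the result in 8 bits.
    return ((8 - tran) * val + tran * old) >> 3


def draw_scanline(background: int, spans, width: int) -> list:
    """Painter's algorithm on one scanline.

    spans: iterable of (start, end, value, tran), sorted back to front;
    end is exclusive. Later spans are blended over earlier ones.
    """
    line = [background] * width
    for start, end, value, tran in spans:
        for x in range(max(start, 0), min(end, width)):
            line[x] = blend(line[x], value, tran)
    return line


# An opaque span, then a 50% translucent span partly covering it.
pixels = draw_scanline(0, [(0, 4, 200, 0), (2, 6, 100, 4)], 8)
```

With t = 4 the formula reduces to the 50%/50% case above, and t = 0 simply overwrites the pixel, which matches opaque back-to-front drawing.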
Externally, this chip consists of 19 input pins, 9 output pins, 1 pin for ground, and 1 pin for VDD.
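As a quick consistency check on the pin budget, the signal widths mentioned elsewhere in this report add up as shown below. The 3-bit TRAN width is an assumption inferred from the 8 translucency levels, and the dictionary keys are only labels for the signal groups.

```python
# Widths taken from the text: 4 address lines, 8 value lines, 8 output lines.
# TRAN = 3 bits is inferred from the 8 translucency levels (an assumption).
INPUT_PINS = {"Phi1": 1, "Phi2": 1, "CS": 1, "RW": 1, "ADDR": 4, "VAL": 8, "TRAN": 3}
OUTPUT_PINS = {"OUT": 8, "BUSY": 1}
SUPPLY_PINS = {"GND": 1, "VDD": 1}


def total(pins: dict) -> int:
    """Sum the widths of a group of signals."""
    return sum(pins.values())


print(total(INPUT_PINS), total(OUTPUT_PINS),
      total(INPUT_PINS) + total(OUTPUT_PINS) + total(SUPPLY_PINS))
```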
The 19 input pins consist of:
The 9 output pins consist of:
Phi1 and Phi2 should be driven by two non-overlapping, complementary square waves (the standard two-phase clocking scheme).
The chip takes two clock cycles to perform every operation; we therefore refer to cycle #1 and cycle #2 of each operation.
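A non-overlapping two-phase clock can be sketched as a sampled waveform. The tick counts `high` and `gap` are hypothetical parameters chosen for illustration; the property that matters is that Phi1 and Phi2 are never high at the same time. One chip operation spans two of these periods (cycle #1 and cycle #2).

```python
def two_phase_clock(cycles: int, gap: int = 1, high: int = 3) -> list:
    """Sample a non-overlapping two-phase clock as (phi1, phi2) pairs.

    Each clock cycle is: Phi1 high for `high` ticks, `gap` ticks with both
    low, Phi2 high for `high` ticks, then another `gap` ticks with both low.
    """
    period = []
    period += [(1, 0)] * high   # Phi1 phase
    period += [(0, 0)] * gap    # dead time so the phases never overlap
    period += [(0, 1)] * high   # Phi2 phase
    period += [(0, 0)] * gap    # dead time before the next Phi1
    return period * cycles


samples = two_phase_clock(cycles=2)                 # one two-cycle chip operation
assert all(not (p1 and p2) for p1, p2 in samples)  # never both high
```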
For a Write:
To perform a write into the buffer, raise CS (enabling the chip) and raise RW (indicating a write) before the beginning of cycle #1. Also place the address of the location to be written on the ADDR pins, the value to be written on the VAL pins, and the desired translucency value on the TRAN pins. All of this must be done before Phi1 goes high during cycle #1.
After Phi2 goes high during cycle #1, CS, RW, and ADDR have been latched and can be changed. If you do not want to write again, set CS low.
After Phi1 goes low during cycle #2, VAL and TRAN can be changed.
For a Read:
To initiate a read, raise CS and lower RW before the beginning of cycle #1. This starts the internal output/clear cycle, and the BUSY signal stays high for as long as that cycle runs. In each two-cycle operation, the current value is output on the OUT pins after Phi2 rises during cycle #1 and before Phi2 rises during cycle #2. The cycle repeats until every value in the array has been output and cleared. After the last value has been written, BUSY goes low once Phi2 rises during cycle #2. As soon as BUSY is low, CS and RW can be changed.
Internally, the chip consists of three main components: the RAM array, the averager, and the control logic.
For a Write:
During the first clock cycle of the two-cycle phase, the address bits are latched and fed into the address decoder of the SRAM array. The SRAM outputs the current value at that location, which is fed into the averager. Based on the translucency value and the 8-bit input value, the averager produces the new averaged value, which is latched just before the second cycle starts.
During the second cycle, the WMEM signal goes high and the latched value is written back into memory.
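The write path can be modeled behaviorally as a read-modify-write split across the two cycles. The class and method names below are hypothetical, the latch timing is abstracted away, and the averaging formula repeats the assumption that translucency level t lets t/8 of the stored value show through.

```python
class ScanlineBuffer:
    """Behavioral sketch of the write path (two clock cycles per write)."""

    def __init__(self, words: int = 16, background: int = 0):
        self.mem = [background] * words   # 16x8 SRAM contents
        self.addr_latch = 0               # address latched in cycle #1
        self.data_latch = 0               # averager output latched before cycle #2

    def cycle1(self, addr: int, val: int, tran: int) -> None:
        # Cycle #1: latch the address, read the stored value, and latch the
        # averager's output (assumed blend: tran/8 of the old value shows through).
        self.addr_latch = addr
        old = self.mem[addr]
        self.data_latch = ((8 - tran) * val + tran * old) >> 3

    def cycle2(self) -> None:
        # Cycle #2: WMEM goes high and the latched value is written back.
        self.mem[self.addr_latch] = self.data_latch

    def write(self, addr: int, val: int, tran: int) -> None:
        self.cycle1(addr, val, tran)
        self.cycle2()


buf = ScanlineBuffer()
buf.write(3, 200, 0)   # opaque write
buf.write(3, 100, 4)   # 50% translucent write over it
```

Splitting the model into `cycle1` and `cycle2` mirrors the two-cycle structure: nothing is written to the array until the second cycle, so the value read in cycle #1 is always the old one.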
For a Read:
During a read, each two-cycle phase proceeds as follows. During the first cycle, the counter drives its current value into the address decoder, and the SRAM array passes the referenced value through the tristate buffer and out to the pads. During the second cycle, the background color is written into the referenced SRAM location and the counter increments. When the counter reaches its maximum value, the BUSY signal goes low and the read cycle ends.
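The read/clear sweep can be sketched the same way. `read_and_clear` is a hypothetical helper; it assumes the sweep visits exactly the array's locations in counter order and treats "the counter reaches its maximum value" as reaching the last array address.

```python
def read_and_clear(mem: list, background: int = 0) -> list:
    """Sketch of the read sweep: output every word, then clear it.

    Returns the values that would appear on OUT, in counter order.
    """
    out = []
    counter = 0
    busy = True
    while busy:
        # Cycle #1: the counter drives the address decoder; the word goes to OUT.
        out.append(mem[counter])
        # Cycle #2: the background color is written back and the counter advances.
        mem[counter] = background
        if counter == len(mem) - 1:
            busy = False          # counter at its maximum: BUSY drops
        else:
            counter += 1
    return out


line = [1, 2, 3, 4]
values = read_and_clear(line, background=9)
```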
The memory module consisted of a 16x8 SRAM array, a 16x1 array of address line decoders, and a collection of glue logic to tie it all together. It had 8 input lines for the value to be written, 8 output lines for the value to be read, and another 4 input lines for the address.
The SRAM array was originally going to hold 32 addressable 8-bit locations. Unfortunately, there was not quite enough space for a full 32x8 array, so we scaled down to a 16x8 array; the advantages of an array size that was not a power of two were negligible.
The array was arranged with the 8 data lines running vertically and the RAM cell enable lines running horizontally, so that each horizontal slice of the array held one 8-bit value.
The address decoder was positioned along the side of the array and connected to the enable lines for the horizontal slices.
[See attached ram16x8.gif]
The individual RAM cell consisted of the standard two cross-coupled inverters, a dual-pass gate controlled by the enable line, and the data line itself. To save space, we used a single data line (C) rather than the standard complementary pair C and Cbar. Extensive testing in hspice showed that the circuit still functioned well with only the single data line, and hspice also allowed us to fine-tune the transistor sizes of the inverters and the pass gates.
[See attached ram.gif]
The RAM decoder consisted of a 5-input Nand gate with 10 input lines in Metal 2 running vertically across the gate. These 10 lines carried the 5 address lines and their inverses. (Since we later decided to use only a 16x8 SRAM array, the high address line was always tied low.) To decode a particular address, 5 of the 10 lines, one per address bit, were connected to the terminals of the Nand gate, chosen so that all of the gate's inputs are high (and its output low) only for that address. For example, to decode 0010 we would connect A4bar, A3bar, A2bar, A1, and A0bar. For each of the 16 decoder cells, these connections had to be wired by hand.
[See attached ram_decoder.gif]
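The hand-wiring rule for the decoder cells can be expressed as a short sketch: for each address bit, tap the true line when the bit is 1 and the inverted line when it is 0, so the Nand's inputs are all high only for that one address. `decoder_taps` and `nand_output` are hypothetical helpers, and the line names follow the A0..A4 / A0bar..A4bar convention used above.

```python
def decoder_taps(address: int, bits: int = 5) -> list:
    """Return the address lines to wire into one decoder cell's Nand gate.

    A bit that is 1 in the address taps the true line (Ai); a bit that is 0
    taps the inverted line (Aibar). Listed from the high bit down.
    """
    return [f"A{i}" if (address >> i) & 1 else f"A{i}bar"
            for i in reversed(range(bits))]


def nand_output(taps: list, address: int) -> int:
    """Evaluate a decoder cell's Nand gate for an applied address."""
    def level(tap: str) -> int:
        i = int(tap[1:].removesuffix("bar"))   # requires Python 3.9+
        bit = (address >> i) & 1
        return 1 - bit if tap.endswith("bar") else bit
    # Nand: output is low only when every connected line is high.
    return 0 if all(level(t) for t in taps) else 1


cell_2 = decoder_taps(0b0010)   # taps for the cell that decodes address 0010
```

Because the high address line is tied low, A4bar is high for every address in use, so all 16 cells tap A4bar.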
The 10 inputs to the decoder array were generated using a collection of 5 inverters. These 5 inverters were positioned at the top of the decoder array and were connected to the 5 address lines coming out of the 5x2 MUX.
[See attached decoder_cap.gif]
The signal generated by the RAM decoder also needed to be inverted so that both Enable and EnableBar could be sent to the dual pass transistor. A 16x1 array of these inverters was placed between the decoders and the SRAM array.
[See attached ram_cap.gif]
The chip was controlled by several important signals. These consisted of:
The brain of the chip consisted of an FSM with 3 inputs, 1 bit of internal state, and 5 outputs.
The 3 inputs were:
The internal state bit was CD (Cycle Divider). The 5 outputs were those described above (WMEM, CNT, OV, OC, BUSY).
The following state transition table describes the chip's behavior.
CS  RW  D  CD  |  CD'  WMEM  CNT  OV  OC  BUSY
----------------------------------------------
1   *   *  0   |  1    0     0    0   0   0
1   *   *  1   |  0    0     0    0   0   0
0   1   *  0   |  1    0     0    0   0   0
0   1   *  1   |  0    1     0    0   0   0
0   0   0  0   |  1    0     0    1   1   1
0   0   0  1   |  0    1     1    0   1   1
0   0   1  0   |  1    0     0    1   1   1
0   0   1  1   |  0    1     1    0   1   0
(* = don't care)
These state transitions correspond to the following logical relations:
These logical relations were implemented using combinations of standard Nand, Nor, Inverter, and Flipflop cells.
[See attached brain.gif]
[See attached nand.gif]
[See attached nor.gif]
[See attached inv.gif]
[See attached flipflop.gif]
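The state transition table can also be read off directly as sum-of-products equations. The sketch below encodes the table (with `None` as a don't-care) and checks one derived set of equations against every input combination; these equations are a derivation from the table, not necessarily the exact gate-level form used in the layout. Note that in the table an operation proceeds when CS = 0, so the sketch treats the FSM's CS input as active low.

```python
from itertools import product

# (CS, RW, D, CD) -> (CD', WMEM, CNT, OV, OC, BUSY); None marks a don't-care input.
TABLE = [
    ((1, None, None, 0), (1, 0, 0, 0, 0, 0)),
    ((1, None, None, 1), (0, 0, 0, 0, 0, 0)),
    ((0, 1, None, 0), (1, 0, 0, 0, 0, 0)),
    ((0, 1, None, 1), (0, 1, 0, 0, 0, 0)),
    ((0, 0, 0, 0), (1, 0, 0, 1, 1, 1)),
    ((0, 0, 0, 1), (0, 1, 1, 0, 1, 1)),
    ((0, 0, 1, 0), (1, 0, 0, 1, 1, 1)),
    ((0, 0, 1, 1), (0, 1, 1, 0, 1, 0)),
]


def brain(cs: int, rw: int, d: int, cd: int) -> tuple:
    """Next state and outputs, derived from the table above."""
    sel = 1 - cs               # the table proceeds when CS = 0
    read = sel & (1 - rw)      # selected and RW low
    return (
        1 - cd,                # CD': toggles every clock cycle
        sel & cd,              # WMEM: cycle #2 of any selected operation
        read & cd,             # CNT: cycle #2 of a read
        read & (1 - cd),       # OV: cycle #1 of a read
        read,                  # OC: both cycles of a read
        read & (1 - (d & cd)), # BUSY: drops in cycle #2 once D is set
    )


def matches(pattern: tuple, inputs: tuple) -> bool:
    return all(p is None or p == x for p, x in zip(pattern, inputs))


# Every input combination matches exactly one row, and the equations agree with it.
for inputs in product((0, 1), repeat=4):
    rows = [out for pat, out in TABLE if matches(pat, inputs)]
    assert len(rows) == 1 and brain(*inputs) == rows[0]
```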
The counter was a 5-bit counter implemented as a cascading series of Flipflops, Nands, Inverters, and Xors.
It had a single enable line, which was ANDed with the two clock signals, and 5 output lines.
[See attached counter.gif]
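The counter's next-state logic can be sketched as a synchronous toggle chain. The exact cascade used in the layout is not spelled out here, so this assumes the common structure in which each bit is XORed with a carry formed by ANDing the enable with all lower bits (an AND being a Nand followed by an Inverter). The `enable` argument stands in for the enable line ANDed with the clocks.

```python
def counter_step(q: int, enable: int, bits: int = 5) -> int:
    """One clock of a binary counter built from toggle (XOR) stages.

    Bit i flips when the enable is high and every lower bit is 1; the
    running AND of the lower bits forms the carry chain.
    """
    carry = enable
    nxt = 0
    for i in range(bits):
        bit = (q >> i) & 1
        nxt |= (bit ^ carry) << i   # Xor stage: toggle if the carry is high
        carry &= bit                # Nand + Inverter: carry into the next stage
    return nxt


# Count through all 32 states from 0; the counter wraps back to 0.
q, seen = 0, []
for _ in range(32):
    seen.append(q)
    q = counter_step(q, enable=1)
```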
The Input Buffer was responsible for latching the input values during the first clock cycle of each two-cycle phase. The values that were latched included the 4 address lines, CS, and RW. It was unnecessary to latch the 8-bit input value, since it would eventually be latched in the register at the other end of the averager.
The buffer consisted of 7 flipflops and was enabled by (!CD) so that it would always latch in the first clock cycle.
[See attached input_buffer.gif]
The 1x8 register was implemented as a series of connected flipflops.
[See attached reg8.gif]
The TriState cell consisted of a dual pass transistor controlled by an Enable line and an EnableBar line. These cells were connected to form a 1x8 tristate buffer, with an inverter at one end to generate EnableBar from the Enable signal.
[See attached tri1.gif]
[See attached tri8.gif]
[See attached mux5x2.gif]