Lab Sheet 5#

This lab sheet contains the following lab activities:

Lab 1: Ripple-Carry Adder vs. Carry-Select Adder
Lab 2: Accumulator
Lab 3: Linear Feedback Shift Register (LFSR)
Lab 4: On-chip RAM Testing
Lab 5: RAM-based FIFO Testing
Lab 6: Altera FIFO IP Core Testing
Lab 7: Signed Multiplier

Hardware / Software Required#

Intel DE10-Lite FPGA board (with a USB cable)
Computer with USB ports + Intel Quartus Prime Lite + VHDL Simulator
Digital Oscilloscope or Logic Analyzer
Jumper Wires / Dupont Wires

Lab 1: Ripple-Carry Adder vs. Carry-Select Adder#

Objective#

Learn how to implement different types of binary adders using FPGA logic.

In digital logic design, there are different ways to implement N-bit binary adders. In this lab, we will explore how to implement two types of $N$ -bit binary adders, namely the Ripple-Carry Adder (RCA) and the Carry-Select Adder (CSA), and compare the synthesis results when mapped to FPGA logic resources. Their pseudo-code is given below.

Pseudo-code for Ripple-Carry Adder:

C[0] = CIN
for i=0 to N-1:
  SUM[i] = A[i] xor B[i] xor C[i]
  C[i+1] = (A[i] AND B[i]) OR (C[i] AND (A[i] XOR B[i]))
COUT = C[N]

Note

Each $i$ -th stage of the ripple-carry adder is a full adder (FA).
CIN and COUT are the carry-in and carry-out of the $N$ -bit adder, respectively.

Pseudo-code for Carry-Select Adder (1 bit per step):

C[0] = CIN
for i=0 to N-1:
     // if C[i] = 0
     sum0   = (A[i] XOR B[i])
     c_out0 = (A[i] AND B[i])

     // if C[i] = 1
     sum1   = NOT(A[i] XOR B[i])
     c_out1 = (A[i] AND B[i]) OR (A[i] XOR B[i])

     // carry select
     if C[i] == 0:
         SUM[i] = sum0
         C[i+1] = c_out0
     else:
         SUM[i] = sum1
         C[i+1] = c_out1
COUT = C[N]

Lab Procedure#

Write VHDL code to implement an N-bit adder, where N is defined as a generic (positive integer). Use the provided VHDL entity.
Write VHDL for two different architectural implementations:
- Option 1: Ripple-Carry Adder
- Option 2: Carry-Select Adder
Write your own VHDL testbench (or use the provided code) to simulate and verify your design. Capture the simulation waveforms for your lab report.
Use the Quartus Prime Lite software to synthesize both versions of the adder and compare the synthesis results.

VHDL entity for the adder

ENTITY adder IS
    GENERIC (
        N : INTEGER := 8
    );
    PORT (
        CIN  : IN  STD_LOGIC; -- carry-in
        A, B : IN  STD_LOGIC_VECTOR(N - 1 DOWNTO 0); -- operands
        SUM  : OUT STD_LOGIC_VECTOR(N - 1 DOWNTO 0); -- result
        COUT : OUT STD_LOGIC -- carry-out
    );
END ENTITY adder;

VHDL testbench for the adder

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

ENTITY tb_adder IS
END ENTITY tb_adder;

ARCHITECTURE sim OF tb_adder IS
    CONSTANT N  : INTEGER := 4;

    SIGNAL CIN  : STD_LOGIC := '0';
    SIGNAL A, B : STD_LOGIC_VECTOR(N - 1 DOWNTO 0);
    SIGNAL SUM  : STD_LOGIC_VECTOR(N - 1 DOWNTO 0);
    SIGNAL COUT : STD_LOGIC;

    -- DUT: Try RCA or CSA here
    COMPONENT ADDER IS
        GENERIC (N : INTEGER := 8);
        PORT (
            CIN  : IN  STD_LOGIC;
            A, B : IN  STD_LOGIC_VECTOR(N - 1 DOWNTO 0);
            SUM  : OUT STD_LOGIC_VECTOR(N - 1 DOWNTO 0);
            COUT : OUT STD_LOGIC
        );
    END COMPONENT;

BEGIN
    -- Instantiate an adder
    uut : ENTITY work.adder(RCA) -- use RCA or CSA
        GENERIC MAP(N => N)
        PORT MAP(
            CIN => CIN, A => A, B => B,
            SUM => SUM, COUT => COUT
        );

    -- Test process
    stim_proc : PROCESS
        VARIABLE expected : UNSIGNED(N DOWNTO 0); -- N bits + carry
    BEGIN
        FOR intA IN 0 TO 2**N - 1 LOOP
            FOR intB IN 0 TO 2**N - 1 LOOP
                -- apply inputs
                A <= STD_LOGIC_VECTOR(to_unsigned(intA, N));
                B <= STD_LOGIC_VECTOR(to_unsigned(intB, N));

                CIN <= '0';
                WAIT FOR 10 ns;

                expected := to_unsigned(intA + intB, N + 1);
                ASSERT (unsigned(SUM) = expected(N - 1 DOWNTO 0) AND
                COUT = STD_LOGIC(expected(N)))
                REPORT "Mismatch CIN=0: A=" & INTEGER'IMAGE(intA) &
                    " B=" & INTEGER'IMAGE(intB) &
                    " SUM=" & INTEGER'IMAGE(to_integer(unsigned(SUM))) &
                    " COUT=" & STD_LOGIC'IMAGE(COUT)
                    SEVERITY ERROR;

                CIN <= '1';
                WAIT FOR 10 ns;

                expected := to_unsigned(intA + intB + 1, N + 1);
                ASSERT (unsigned(SUM) = expected(N - 1 DOWNTO 0) AND
                COUT = STD_LOGIC(expected(N)))
                REPORT "Mismatch CIN=0: A=" & INTEGER'IMAGE(intA) &
                    " B=" & INTEGER'IMAGE(intB) &
                    " SUM=" & INTEGER'IMAGE(to_integer(unsigned(SUM))) &
                    " COUT=" & STD_LOGIC'IMAGE(COUT)
                    SEVERITY ERROR;

            END LOOP;
        END LOOP;
        WAIT; -- stop simulation
    END PROCESS;

END ARCHITECTURE sim;

Questions

How do the two adder implementations differ in terms of logic resource usage?
Compare the critical paths of both designs (for example, the signal path from CIN to COUT in the circuit). Use Quartus Prime's synthesis & static timing analysis (STA) report to identify the critical paths.

Lab 2: Accumulator#

Objective#

Learn how to implement an accumulator in VHDL and test its function using an FPGA board.

An accumulator is a basic sequential circuit widely used in digital systems. Its main function is to add input data to its current stored value.

Two main components of the accumulator are:

Adder: this adds the input value to the current contents of the register.
Register: this stores the accumulated sum and feeds it back into the adder on the next clock cycle.

Lab Procedure#

Write VHDL code to implement an accumulator. Use an N-bit adder and an M-bit register (where M > N, e.g., N=8, M=12) to implement the accumulator. Make the design modular by creating separate components and then instantiating them in the top-level design (acc_demo).
Connect the I/O of the accumulator as follows:
- The integer (unsigned) value from the external N-bit slide switches is internally zero-extended to M bits and applied as one input of the adder.
- The output of the accumulator register is fed back into the other M-bit input of the adder.
- The M-bit result is stored in the accumulator register, decoded into BCD, and displayed on the external 4-digit seven-segment display.
- The accumulator register is updated on the rising edge of the push-button signal (after debouncing and edge detection).
Write a VHDL testbench to simulate and test your design to verify its correctness.
Complete the FPGA design process and test your design using the FPGA board (Terasic DE10-Lite) provided in the lab.
- Change the positions of the slide switches before pressing the button, and observe the resulting change in the accumulator value.
Demonstrate your FPGA design to the lab instructor.

VHDL entity for the accumulator demo


ENTITY acc_demo IS
    GENERIC (
        N : INTEGER := 8;
        M : INTEGER := 12
    );
    PORT (
        RST_N : IN STD_LOGIC; -- global async. active-low reset
        CLK   : IN STD_LOGIC; -- clock
        BTN   : IN STD_LOGIC; -- push button
        SW    : IN STD_LOGIC_VECTOR(N - 1 DOWNTO 0); -- N-bit input
        -- 7seg digits 3~0 (output)
        HEX0  : OUT STD_LOGIC_VECTOR(7 DOWNTO 0);
        HEX1  : OUT STD_LOGIC_VECTOR(7 DOWNTO 0);  
        HEX2  : OUT STD_LOGIC_VECTOR(7 DOWNTO 0);
        HEX3  : OUT STD_LOGIC_VECTOR(7 DOWNTO 0)   
    );
END ENTITY acc_demo;

Lab 3: Linear Feedback Shift Register (LFSR)#

Objective#

Learn how to implement an LFSR in VHDL.
Learn how to LFSRs to produce pseudo-random numbers for testing purposes.

In this task, we will learn about the LFSR, which is a special type of shift register that can be used to implement a pseudo-random number generator (not a true-random number generator) for FPGA designs.

An LFSR consists of a register (flip-flops) and a feedback function. The feedback function is typically the XOR (exclusive OR) of selected bits (called taps) from the register.

On each clock cycle, all bits of the register are shifted by one position to the right (or left). The new input bit is computed from the XOR of tapped positions. The initial state of the LFRST is called seed: The seed must not be all zeros, otherwise it will stay at all 0's forever.

An $n$ -bit LFSR can generate at most ( $2^n - 1$ ) states before repeating. If the XOR taps are chosen correctly, the LFSR is called a maximum-length LFSR.

Mathematically, the tap selection depends on the so-called primitive polynomials over GF(2), where GF(2) denotes the Galois Field with two elements: 0 and 1.

Write VHDL code to implement a circuit (with the VHDL entity name lfsr_demo) that instantiates the provided the LFSR module (lfsr) in VHDL. The function of the circuit is as follows:
- When the reset input (RST_N) is asserted ('0', active-low) or at power-up or when the button (BTN) is clicked, the initial value from the slide switches (SW) is loaded into both the LFSR and a register that stores the initial value (seed).
- Otherwise, at every rising edge of the clock, the LFSR is updated by shifting the newly generated bit into the register.
- The PULSE signal is generated by comparing the current LFSR value with the initial-value register.
- The FB signal is the feedback bit from the LFSR, and it is used as an output of the circuit.
- A pre-defined primitive polynomial is selected based on the N_BITS generic. The provided primitive polynomials for the LFSR are used to generate a maximum-length sequence.
- A clock divider is used to generate a lower-frequency internal clock signal. The LFSR unit uses this low-frequency clock.
Complete the design project and test your design on the FPGA board.
Use a digital oscilloscope with two probes to observe the FB and PULSE signals from the FPGA.
Measure the period of the PULSE signal.
Repeat the process and test your design with at least three different values of N_BITS.

VHDL entity for the design

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

ENTITY lfsr_demo IS
    GENERIC (
        N_BITS  : INTEGER := 4;
        CLK_DIV : INTEGER := 50000
    );
    PORT (
        RST_N : IN STD_LOGIC; -- async. active-low reset
        CLK   : IN STD_LOGIC; -- system clock
        BTN   : IN STD_LOGIC; -- load seed into LFSR
        SW    : IN STD_LOGIC_VECTOR(N_BITS - 1 DOWNTO 0); -- seed value
        FB    : OUT STD_LOGIC; -- feedback bit
        PULSE : OUT STD_LOGIC -- single-cycle pulse
    );
END ENTITY lfsr_demo;

VHDL for the LSFR

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

ENTITY lfsr IS
    GENERIC (
        N_BITS : INTEGER := 8 -- LFSR length (4..16 supported)
    );
    PORT (
        RST_N : IN STD_LOGIC; -- active low
        CLK   : IN STD_LOGIC;
        CE    : IN STD_LOGIC;
        LOAD  : IN STD_LOGIC;
        SEED  : IN STD_LOGIC_VECTOR(N_BITS - 1 DOWNTO 0);
        Q     : OUT STD_LOGIC_VECTOR(N_BITS - 1 DOWNTO 0);
        FB    : OUT STD_LOGIC
    );
END ENTITY lfsr;

ARCHITECTURE rtl OF lfsr IS
    SIGNAL reg : STD_LOGIC_VECTOR(N_BITS - 1 DOWNTO 0) := (OTHERS => '1');
    SIGNAL fb_bit : STD_LOGIC := '0';
BEGIN
    -- valid range
    ASSERT (N_BITS >= 4 AND N_BITS <= 16)
    REPORT "LFSR: N_BITS must be between 4 and 16"
        SEVERITY FAILURE;

    feedback_proc : PROCESS (reg)
    BEGIN
        CASE N_BITS IS
                -- n = 4:  x^4 + x^3 + 1 
            WHEN 4 =>
                fb_bit <= reg(N_BITS - 1) XOR reg(2);
                -- n = 5:  x^5 + x^3 + 1
            WHEN 5 =>
                fb_bit <= reg(N_BITS - 1) XOR reg(2);
                -- n = 6:  x^6 + x^5 + 1
            WHEN 6 =>
                fb_bit <= reg(N_BITS - 1) XOR reg(4);
                -- n = 7:  x^7 + x^6 + 1 
            WHEN 7 =>
                fb_bit <= reg(N_BITS - 1) XOR reg(5);
                -- n = 8:  x^8 + x^6 + x^5 + x^4 + 1
            WHEN 8 =>
                fb_bit <= reg(N_BITS - 1) XOR reg(5) XOR reg(4) XOR reg(3);
                -- n = 9:  x^9 + x^5 + 1
            WHEN 9 =>
                fb_bit <= reg(N_BITS - 1) XOR reg(4);
                -- n = 10: x^10 + x^7 + 1 
            WHEN 10 =>
                fb_bit <= reg(N_BITS - 1) XOR reg(6);
                -- n = 11: x^11 + x^9 + 1
            WHEN 11 =>
                fb_bit <= reg(N_BITS - 1) XOR reg(8);
                -- n = 12: x^12 + x^6 + x^4 + x + 1
            WHEN 12 =>
                fb_bit <= reg(N_BITS - 1) XOR reg(5) XOR reg(3) XOR reg(0);
                -- n = 13: x^13 + x^4 + x^3 + x + 1 
            WHEN 13 =>
                fb_bit <= reg(N_BITS - 1) XOR reg(3) XOR reg(2) XOR reg(0);
                -- n = 14: x^14 + x^5 + x^3 + x + 1 
            WHEN 14 =>
                fb_bit <= reg(N_BITS - 1) XOR reg(4) XOR reg(2) XOR reg(0);
                -- n = 15: x^15 + x^14 + 1 
            WHEN 15 =>
                fb_bit <= reg(N_BITS - 1) XOR reg(13);
                -- n = 16: x^16 + x^15 + x^13 + x^4 + 1
            WHEN 16 =>
                fb_bit <= reg(N_BITS - 1) XOR reg(14) XOR reg(12) XOR reg(3);
            WHEN OTHERS =>
                -- fallback: simple tap (not maximal-length)
                fb_bit <= reg(N_BITS - 1) XOR reg(N_BITS - 2);
        END CASE;
    END PROCESS;

    ----------------------------------------------------------------------------
    -- Shift register
    ----------------------------------------------------------------------------
    PROCESS (CLK, RST_N)
    BEGIN
        IF RST_N = '0' THEN
            reg <= SEED; -- async load on reset (active low)
        ELSIF rising_edge(CLK) THEN
            IF CE = '1' THEN
                IF LOAD = '1' THEN
                    reg <= SEED; -- synchronous load
                ELSE
                    -- shift left (MSB at N_BITS-1), new bit becomes LSB
                    reg <= reg(N_BITS - 2 DOWNTO 0) & fb_bit;
                END IF;
            END IF;
        END IF;
    END PROCESS;

    Q <= reg;
    FB <= fb_bit;

END ARCHITECTURE;

Figure: Example of waveforms of an 4-bit LFSR measured by a digital oscilloscope

Questions

What happens if the initial value is set to all zeros?
How is the length (N_BITS) of the LFSR related to the period of the pulse signal?

Lab 4: On-chip RAM Testing#

Objective#

Learn how to use on-chip RAM of the MAX10 FPGA chip.

Embedded Memory Blocks

Modern FPGA architectures typically include different types of storage components such as flip-flops, LUT-RAMs (also called distributed RAMs), and embedded memory blocks (also called Block RAMs). Embedded memory blocks on an FPGA have a fixed storage capacity (typically a few kilobits), but their configuration—such as memory depth and data width—is flexible. In addition, they support both single- and dual-port read/write memory configurations, with single or dual clocks.

In this lab, we will learn how to write VHDL code to infer the use of embedded memory blocks available on the MAX10 FPGA to implement a synchronous single-port RAM.

Lab Procedure#

Write VHDL code that implements a simple on-chip RAM tester (with the VHDL entity name ram_tester).
The function of the RAM tester is as follows:
- It utilizes the provided VHDL model of a single-port synchronous RAM, which the Quartus software can infer to implement using the FPGA’s on-chip RAM resources.
- The RAM has N data bits and M address bits, and should infer FPGA block RAM when the size of the memory ( $2^M \times N$ ) is large enough to fit into the FPGA’s dedicated memory blocks rather than being mapped to distributed RAM logic.
- An LFSR is used to generate test patterns for writing data into the RAM.
- An FSM controls the RAM write and read operations for testing purposes.
- On reset, the LFSR is seeded with an initial value, and the FSM waits for a START pulse.
- When START is asserted by pressing the button, the FSM writes to all RAM addresses one by one, starting from address zero, with data generated by the LFSR.
- After writing completes, the FSM reloads the same LFSR with the same seed, then reads back the RAM contents in the same address sequence.
- Each read value is compared with the expected value from the LFSR for every memory address.
- Any mismatch sets a error flag and the LED_ERROR output is high, otherwise it is low.
- When the RAM test process is complete, the LED_DONE output is high, otherwise it is low.
Demonstrate your FPGA design to the lab instructor.

VHDL entity for the RAM tester

ENTITY ram_tester IS
    GENERIC (
        DATA_WIDTH : INTEGER := 8;
        ADDR_WIDTH : INTEGER := 16
    );
    PORT (
        RST_N     : IN STD_LOGIC; -- active-low reset
        CLK       : IN STD_LOGIC; -- system clock
        START     : IN STD_LOGIC; -- start pulse
        SW        : IN STD_LOGIC_VECTOR(DATA_WIDTH - 1 DOWNTO 0); -- SEED
        LED_DONE  : OUT STD_LOGIC;
        LED_ERROR : OUT STD_LOGIC
    );
END ENTITY ram_tester;

VHDL code for a single-port synchronous RAM

-- single-port synchronous RAM
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

ENTITY RAM_SP IS
    GENERIC (
        DATA_WIDTH : INTEGER := 8;
        ADDR_WIDTH : INTEGER := 10
    );
    PORT (
        CLK  : IN  STD_LOGIC; -- system clock
        WE   : IN  STD_LOGIC; -- write enable
        ADDR : IN  UNSIGNED(ADDR_WIDTH - 1 DOWNTO 0);
        DIN  : IN  STD_LOGIC_VECTOR(DATA_WIDTH - 1 DOWNTO 0);
        DOUT : OUT STD_LOGIC_VECTOR(DATA_WIDTH - 1 DOWNTO 0)
    );
END ENTITY RAM_SP;

ARCHITECTURE rtl OF RAM_SP IS
    TYPE ram_t IS ARRAY (0 TO 2**ADDR_WIDTH - 1) 
         OF STD_LOGIC_VECTOR(DATA_WIDTH - 1 DOWNTO 0);

    FUNCTION init_ram
        RETURN ram_t IS
        VARIABLE tmp : ram_t := (OTHERS => (OTHERS => '0'));
    BEGIN
        FOR addr_pos IN 0 TO 2**ADDR_WIDTH - 1 LOOP
            -- Initialize each address with the address itself
            tmp(addr_pos) := STD_LOGIC_VECTOR(to_unsigned(addr_pos, DATA_WIDTH));
        END LOOP;
        RETURN tmp;
    END init_ram;

    -- RAM declaration with initialization (use this for simulation only!)
    --signal ram : ram_t := init_ram;

    -- RAM declaration without initialization (use this for synthesis!)
    SIGNAL ram : ram_t;

    SIGNAL addr_reg : NATURAL RANGE 0 TO 2**ADDR_WIDTH - 1 := 0;

BEGIN
    PROCESS (CLK)
    BEGIN
        IF rising_edge(CLK) THEN
            IF WE = '1' THEN
                ram(to_integer(ADDR)) <= DIN; -- synchronous write operation
            END IF;
            addr_reg <= to_integer(ADDR);
        END IF;
    END PROCESS;
    DOUT <= ram(addr_reg); -- synchronous read operation

END ARCHITECTURE;

Figure: Example of simulation waveforms (using GHDL and GTKwave for VHDL simulation)

Questions

For a data width of 8 bits, what is the maximum memory depth that can be used and still fit on the target FPGA?
For the FPGA device on the board used in this lab session, what is the maximum size of a single block RAM, and how many such blocks are available?
How does the width of the data bus affect the number of block RAMs required to implement a given memory?
How can it be checked whether memory inference in VHDL will use block RAM rather than distributed logic?

Lab 5: RAM-based FIFO Testing#

Objective#

Learn how to use on-chip RAM of the MAX10 FPGA chip to implement a FIFO.
Understand the principles of FIFO (First-In-First-Out) memory.

A FIFO (First-In-First-Out) is a special type of memory that stores data in the order it is written and guarantees that the first data written is the first data read. It is commonly used for buffering and data streaming.

In this lab, we will use the FPGA’s embedded memory blocks to implement a FIFO with two independent, synchronous read and write ports.

Lab Procedure#

Write VHDL testbench (tb_fifo_sc) to simulate the operation of the the provided FIFO module (fifo_sc), which requires a single-clock, dual-port RAM module (ram_dp_sc).
Write VHDL code that implements a simple FIFO tester (with the VHDL entity name fifo_tester).
The function of the FIFO tester is as follows:
- It uses the provided FIFO module (fifo_sc).
- Two onboard push buttons are used for the RST_N signal and the TEST signal. Both input signals are active-low.
- Two onboard slide switches are used to select the FIFO put and get operations.
- Two onboard LEDs are used to indicate the FIFO full and empty status.
- The TEST input must be processed through debounce logic.
- If TEST is held ('0') and PUT = '1', the tester puts the next value from an 8-bit counter into the FIFO every 100 ms, provided the FIFO is not full.
- If TEST is held ('0') and GET = '1', the tester gets data from the FIFO every 100 ms, provided the FIFO is not empty.
- Data pushed comes from an 8-bit counter that increments each time. Initially, the counter starts with the value 0xA0 (hex).
- The input data to the FIFO is displayed in HEX format on two 7-segment displays.
- The output data from the FIFO is displayed in HEX format on two 7-segment displays.
Complete the design project and test your design on the FPGA board.
- Synthesize, implement, and download the design to the FPGA board.
- Apply test inputs using the switches and observe the results.
- Verify that the sequence of input data matches the sequence of output data according to FIFO operation.
- Check the FIFO full and empty indicators using the LEDs.
- Test the design with two different FIFO depth settings (e.g., memory depths of 16 and 256). This step requires changing the memory depth and recompiling the design.

Figure: Example waveform of FIFO simulation (FIFO depth = 16)

VHDL entity for the FIFO tester (fifo_tester)

ENTITY fifo_tester IS
    PORT (
        RST_N : IN STD_LOGIC;  -- active-low reset
        CLK   : IN STD_LOGIC;  -- 50 MHz clock
        TEST  : IN STD_LOGIC;  -- test button
        PUT   : IN STD_LOGIC;  -- FIFO put operation
        GET   : IN STD_LOGIC;  -- FIFO get operation
        FULL  : OUT STD_LOGIC; -- FIFO full flag (LED)
        EMPTY : OUT STD_LOGIC; -- FIFO empty flag (LED)
        HEX3  : OUT STD_LOGIC_VECTOR(7 DOWNTO 0); -- for data output
        HEX2  : OUT STD_LOGIC_VECTOR(7 DOWNTO 0);
        HEX1  : OUT STD_LOGIC_VECTOR(7 DOWNTO 0); -- for data input
        HEX0  : OUT STD_LOGIC_VECTOR(7 DOWNTO 0)
    );
END ENTITY;

VHDL code for dual-port, single-clock synchronous RAM

-- Simple dual-port RAM with different read/write addresses 
-- but single read/write clock

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY ram_dp_sc IS
    GENERIC (
        DATA_WIDTH : NATURAL := 8;
        ADDR_WIDTH : NATURAL := 6
    );
    PORT (
        clk   : IN STD_LOGIC;         -- clock
        we    : IN STD_LOGIC := '0';  -- write enable
        raddr : IN NATURAL RANGE 0 TO 2**ADDR_WIDTH - 1; -- read address
        waddr : IN NATURAL RANGE 0 TO 2**ADDR_WIDTH - 1; -- write address
        data  : IN  STD_LOGIC_VECTOR((DATA_WIDTH - 1) DOWNTO 0); -- data in
        q     : OUT STD_LOGIC_VECTOR((DATA_WIDTH - 1) DOWNTO 0)  -- data out
    );

END ENTITY;

ARCHITECTURE rtl OF ram_dp_sc IS
    -- Build a 2-D array type for the RAM
    SUBTYPE word_t IS STD_LOGIC_VECTOR((DATA_WIDTH - 1) DOWNTO 0);
    TYPE memory_t IS ARRAY(2**ADDR_WIDTH - 1 DOWNTO 0) OF word_t;
    -- Declare the RAM signal. 
    SIGNAL ram : memory_t;

BEGIN
    PROCESS (clk)
    BEGIN
        IF (rising_edge(clk)) THEN
            IF (we = '1') THEN
                ram(waddr) <= data; -- write the data
            END IF;
            q <= ram(raddr); -- read the data
        END IF;
    END PROCESS;
END rtl;

VHDL code for single-clock FIFO

-- Single-clock FIFO (Altera-like behavior, show-ahead)
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

ENTITY fifo_sc IS
    GENERIC (
        DATA_WIDTH : NATURAL := 8;
        ADDR_WIDTH : NATURAL := 10 -- FIFO depth = 2^ADDR_WIDTH
    );
    PORT (
        clk   : IN STD_LOGIC;
        rst_n : IN STD_LOGIC;
        wr_en : IN STD_LOGIC;
        rd_en : IN STD_LOGIC;
        din   : IN STD_LOGIC_VECTOR(DATA_WIDTH - 1 DOWNTO 0);
        dout  : OUT STD_LOGIC_VECTOR(DATA_WIDTH - 1 DOWNTO 0);
        full  : OUT STD_LOGIC;
        empty : OUT STD_LOGIC
    );
END ENTITY;

ARCHITECTURE rtl OF fifo_sc IS
    CONSTANT MEM_DEPTH : NATURAL := 2 ** ADDR_WIDTH;

    SIGNAL wr_ptr, rd_ptr : NATURAL RANGE 0 TO MEM_DEPTH - 1 := 0;
    SIGNAL fifo_count : NATURAL RANGE 0 TO MEM_DEPTH := 0;

    SIGNAL ram_in   : STD_LOGIC_VECTOR(DATA_WIDTH - 1 DOWNTO 0);
    SIGNAL ram_out  : STD_LOGIC_VECTOR(DATA_WIDTH - 1 DOWNTO 0);
    SIGNAL dout_reg : STD_LOGIC_VECTOR(DATA_WIDTH - 1 DOWNTO 0) := (OTHERS => '0');
    SIGNAL ram_we   : STD_LOGIC;
    SIGNAL is_full, is_empty : STD_LOGIC;

BEGIN
    ram_in <= din;
    ram_we <= (wr_en AND NOT is_full);

    -- dual-port RAM
    u_ram : ENTITY work.ram_dp_sc
        GENERIC MAP(
            DATA_WIDTH => DATA_WIDTH,
            ADDR_WIDTH => ADDR_WIDTH
        )
        PORT MAP(
            clk => clk,
            raddr => rd_ptr,
            waddr => wr_ptr,
            data => ram_in,
            we => ram_we,
            q => ram_out
        );

    fifo_wr_proc : PROCESS (clk, rst_n)
    BEGIN
        IF rst_n = '0' THEN
            wr_ptr <= 0;
        ELSIF rising_edge(clk) THEN
            IF (wr_en = '1' AND is_full = '0') THEN
                IF wr_ptr = MEM_DEPTH - 1 THEN
                    wr_ptr <= 0;
                ELSE
                    wr_ptr <= wr_ptr + 1;
                END IF;
            END IF;
        END IF;
    END PROCESS;

    fifo_rd_proc : PROCESS (clk, rst_n)
    BEGIN
        IF rst_n = '0' THEN
            rd_ptr <= 0;
            dout_reg <= (OTHERS => '0');
        ELSIF rising_edge(clk) THEN
            -- Update read pointer when valid read
            IF (rd_en = '1' AND is_empty = '0') THEN
                IF rd_ptr = MEM_DEPTH - 1 THEN
                    rd_ptr <= 0;
                ELSE
                    rd_ptr <= rd_ptr + 1;
                END IF;
            END IF;
            -- Output logic
            IF (rd_en = '1' AND is_empty = '0') THEN
                -- Normal read
                dout_reg <= ram_out;
            END IF;
        END IF;
    END PROCESS;

    fifo_cnt_proc : PROCESS (clk, rst_n)
        VARIABLE wr_rd_en : STD_LOGIC_VECTOR(1 DOWNTO 0);
    BEGIN
        IF rst_n = '0' THEN
            fifo_count <= 0;
        ELSIF rising_edge(clk) THEN
            wr_rd_en := wr_en & rd_en;
            CASE wr_rd_en IS
                WHEN "10" => -- write only
                    IF is_full = '0' THEN
                        fifo_count <= fifo_count + 1;
                    END IF;
                WHEN "01" => -- read only
                    IF is_empty = '0' THEN
                        fifo_count <= fifo_count - 1;
                    END IF;
                WHEN OTHERS =>
                    NULL; -- write+read (no net change) or idle
            END CASE;
        END IF;
    END PROCESS;

    -- Status signals
    is_full <= '1' WHEN fifo_count = MEM_DEPTH ELSE
        '0';
    is_empty <= '1' WHEN fifo_count = 0 ELSE
        '0';

    full <= is_full;
    empty <= is_empty;
    dout <= dout_reg;
END ARCHITECTURE;

Questions

What are the initial logic states of the EMPTY and FULL signals?
If the FIFO is not empty, does the data output of the FIFO give the first data to be read?
Explain the behavior of the FIFO by observing din and dout on the 7-segment display when the FIFO is empty, and both the PUT and GET inputs are '1', while TEST is clicked multiple times.
According to the Quartus synthesis report, how many bits of on-chip memory are used to implement the FIFO for FIFO depths of 16 and 256, respectively?

Lab 6: Altera FIFO IP Core Testing#

Objective#

Learn how to use an FIFO generated by Intel/Altera IP Catalog.

Lab Procedure#

Create a new project in Quartus Prime Lite first, then open the IP Catalog:
- Select FIFO: Tools > IP Catalog > Basic Functions > On-Chip Memory > FIFO
- Set the IP variation filename to fifo_ip (use VHDL file type).
- Configure the FIFO IP core:
  - Use single common clock for both FIFO read and write operations.
  - Select the data with (e.g. 8 bits) and the FIFO depth (e.g. 256).
- Make sure that the FIFO IP core will have the following interface as shown in the figure below.
- Generate the IP core and insert the .qip file into the project.
Reuse your code from Lab 5, but replace the FIFO with the one generated by the IP Catalog.
Complete the design project and test your design on the FPGA board.

Figure: Setting the IP variation filename for the FIFO

Figure: FIFO Configuration (single-clock, synchronous FIFO mode)

Figure: FIFO Configuration (with FIFO full, empty status and asynchronous clear)

Figure: FIFO File Generation

VHDL entity of the FIFO generated by the IP Catalog

LIBRARY ieee;
USE ieee.std_logic_1164.all;

LIBRARY altera_mf;
USE altera_mf.all;

ENTITY fifo_ip IS
    PORT (
        aclr   : IN STD_LOGIC ;  -- async. clear
        clock  : IN STD_LOGIC ;  -- clock
        data   : IN STD_LOGIC_VECTOR (7 DOWNTO 0); -- data in
        rdreq  : IN STD_LOGIC ;  -- FIFO read request
        wrreq  : IN STD_LOGIC ;  -- FIFO write request
        empty  : OUT STD_LOGIC ; -- FIFO empty status
        full   : OUT STD_LOGIC ; -- FIFO full status
        q      : OUT STD_LOGIC_VECTOR (7 DOWNTO 0) -- data out
    );
END fifo_ip;

Questions

Explain the behavior of the FIFO by observing din and dout on the 7-segment display when the FIFO is empty, and both the PUT and GET inputs are '1', while TEST is clicked multiple times.
Does the behavior of the FIFO in Lab 6 differ from that in Lab 5? If so, describe the difference.

Lab 7: Signed Multiplier#

Objective#

Learn how to implement a signed multiplier in VHDL.

Multiplication is an arithmetic operation that can be implemented in digital logic circuits using various methods. In this lab, we will implement an array multiplier for two signed integers in 2’s complement form: $p = a \cdot b$ , where $a$ and $b$ (the multiplicand and multiplier) are $N$ -bit signed integers, and $p$ is their $2N$ -bit product.

$\begin{align} a &= -a_{N-1}\cdot 2^{N-1} + \sum_{i=0}^{N-2} a_{i} \cdot 2^{i}, \, a_i \in \{0,1\} \\ b &= -b_{N-1}\cdot 2^{N-1} + \sum_{i=0}^{N-2} b_{i} \cdot 2^{i}, \, b_i \in \{0,1\} \\ p &= -p_{2N-1}\cdot 2^{2N-1} + \sum_{i=0}^{2N-2} p_{i} \cdot 2^{i}, \, p_i \in \{0,1\} \\ &= a \cdot b = a \cdot ( -b_{N-1}\cdot 2^{N-1} + \sum_{i=0}^{N-2} b_{i} \cdot 2^{i} ) \\ &= -b_{N-1} (a \ll (N-1)) + \sum_{i=0}^{N-2} b_{i} (a \ll i) \\ \end{align}$

Note that multiplication by $2^i$ is equivalent to shifting $a$ left by $i$ bit positions: $a \cdot 2^i = a \ll i$ . Before shifting, $a$ must be signed extended to $2N$ bits.

Lab Procedure#

Write a VHDL testbench to simulate a signed array multiplier module. Its VHDL code (mult_signed) is provided below.
- There are two different implementation styles of the signed multiplier (rtl_infer and rtl_array) in the VHDL code.
Write VHDL code (mult_tester) to test the signed multiplier.
The multiplier tester has the following functions:
- The VHDL entity of the design is given below.
- It uses the provided multiplier module (mult_signed).
- It takes input from 10-bit slide switches on the FPGA board to provide two 5-bit signed integers (2's complement representation) to the multiplier.
- The output of the multiplier is a 10-bit signed integer, which is displayed on four 7-segment display units. If the number is negative, the leftmost 7-segment unit must show a minus sign.
- For example, the slide switch input 1000001111 (bin) represents the multiplier and multiplicand of 10000 = -16 and 01111 = 15, respectively and the multiplication result is -240.
- The signed multiplier (mult_signed) is combinational, meaning it computes the full multiplication result in a single clock cycle.
- The circuit has a system clock of 50 MHz and an asynchronous reset. The input signals from the slide switches must be registered before being fed to the multiplier. The result from the multiplier must also be registered before being used by the 7-segment display.
Complete the design project and test your design on the FPGA board. Compare the synthesis results for both multiplier implementation styles.
Reimplement the signed-multiplier module as a sequential, multi-cycle multiplier that produces the same result. The multiplication will take several clock cycles rather than being completed in a single cycle. Revise the VHDL code of the top-level module to match the sequential version of the signed-multiplier module.

Questions:

Based on the provided VHDL code for a signed array multiplier, describe how the array multiplier builds the product step by step: from generating partial products, to combining them, and finally to handling the signed case.
According to the Quartus synthesis report, which implementation style of the signed multiplier uses fewer FPGA logic resources?
According to the Static Timing Analysis report, which implementation style of the signed multiplier has the maximum clock frequency (Fmax)?

Figure: Example of simulation waveforms

The VHDL entity of the multiplier (top-level design)

ENTITY multiplier IS
    PORT (
        CLK   : IN STD_LOGIC; -- system clock
        RST_N : IN STD_LOGIC; -- async active-low reset
        SW    : IN  STD_LOGIC_VECTOR(9 DOWNTO 0); -- multiplier input
        HEX0  : OUT STD_LOGIC_VECTOR(7 DOWNTO 0); -- 7-seg outputs
        HEX1  : OUT STD_LOGIC_VECTOR(7 DOWNTO 0);
        HEX2  : OUT STD_LOGIC_VECTOR(7 DOWNTO 0);
        HEX3  : OUT STD_LOGIC_VECTOR(7 DOWNTO 0)
    );
END ENTITY;

VHDL code for the signed array multiplier

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

ENTITY mult_signed IS
  GENERIC (
    N : INTEGER := 8 -- operand size
  );
  PORT (
    a : IN signed(N - 1 DOWNTO 0);
    b : IN signed(N - 1 DOWNTO 0);
    p : OUT signed(2 * N - 1 DOWNTO 0)
  );
END ENTITY;

-- Inferred multiplier
ARCHITECTURE rtl_infer OF mult_signed IS
BEGIN
  p <= a * b;
END rtl_infer;

-- array multiplier 
ARCHITECTURE rtl_array OF mult_signed IS
BEGIN
  PROCESS (a, b)
    VARIABLE product : signed(2 * N - 1 DOWNTO 0);
    VARIABLE pp : signed(2 * N - 1 DOWNTO 0); -- partial product
  BEGIN
    product := (OTHERS => '0');
    -- Generate partial products
    FOR i IN 0 TO N - 2 LOOP
      IF b(i) = '1' THEN
        pp := resize(a, 2 * N) SLL i; -- shift left by i
        product := product + pp;
      END IF;
    END LOOP;
    -- Handle negative numbers (2's complement)
    IF b(N - 1) = '1' THEN
      product := product - (resize(a, 2 * N) SLL (N - 1));
    END IF;
    p <= product;
  END PROCESS;
END ARCHITECTURE rtl_array;

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Created: 2025-09-03 | Last Updated: 2025-09-09

Keys	Action
`?`	Open this help
`n`	Next page
`p`	Previous page
`s`	Search