Extras: Advanced Topics
Extra Module 00: Structural Verilog Coding
Up to now, you’ve been prototyping quickly with a single always @(posedge clk) that reads like your algorithm. That’s great for small/medium blocks. As designs grow, teams often shift to a structural style that mirrors real digital methodology: each register has a single point of ownership (its own clocked block), with tiny combinational helpers to express “next” intent. This improves readability, timing clarity, and scalability, without abandoning your algorithmic flow.
Design Problem: Stream 8 Samples, Report Sum/Min/Max
Design a synchronous module that ingests exactly eight 8-bit samples from a stream (gated by a valid signal), then produces their sum, minimum, and maximum, along with a one-cycle done pulse indicating the results are ready. The design must be clocked and reset synchronously.
We will implement it in two styles:
- Monolithic: one always @(posedge clk) block with an FSM that owns all registers.
- Structural: split responsibilities so that each register has a single point of ownership (one clocked block per register), with small combinational helpers as needed.
Inputs
- clk: Clock. All sequential logic triggers on the rising edge.
- rst: Synchronous, active-high reset. When asserted, all internal state and outputs must return to defined reset values.
- start: A (typically one-cycle) pulse signaling the module to begin collecting a new batch of 8 samples.
  - If asserted while the module is already busy, it must be ignored (no effect).
- sample_in[7:0]: The current sample value on the input stream.
- sample_valid: When 1, indicates sample_in holds a new sample that should be accepted this cycle if the module is currently busy collecting.
Outputs
- busy: When 1, the module is in the middle of collecting samples (i.e., after a valid start until the 8th sample is accepted).
- done: A one-cycle pulse asserted exactly when the final results are captured/updated.
- sum_out[10:0]: The sum of the 8 accepted samples (8 × 255 = 2040 → 11 bits).
- min_out[7:0]: The minimum of the 8 accepted samples.
- max_out[7:0]: The maximum of the 8 accepted samples.
Outputs (sum_out, min_out, max_out) must update once per batch, on the same cycle that done is asserted, and remain unchanged until the next batch completes.
Monolithic style (single always@ block)
A single always @(posedge clk) owns the state machine, counters, accumulators, outputs, and pulses. This is close to the “algorithm-in-one-block” style presented in the previous modules. (It is also affectionately referred to as Adel-style coding, reflecting his preferred way of describing algorithms, which enables very rapid, “in-just-one-sitting” implementations.)

module stats8_monolithic (
input clk,
input rst, // sync, active-high
input start, // pulse to begin a new batch
input [7:0] sample_in,
input sample_valid,
output reg busy,
output reg done, // 1-cycle pulse when outputs valid
output reg [10:0] sum_out, // sum of 8 bytes (max 2040 -> 11 bits)
output reg [7:0] min_out,
output reg [7:0] max_out
);
localparam [1:0] S_IDLE=2'd0, S_RUN=2'd1, S_LATCH=2'd2, S_DONE=2'd3;
reg [1:0] state;
reg [10:0] sum_reg;
reg [7:0] min_reg, max_reg;
reg [2:0] count_reg; // 0..7 accepted samples
always @(posedge clk) begin
if (rst) begin
state <= S_IDLE;
busy <= 1'b0;
done <= 1'b0;
sum_reg <= 11'd0;
min_reg <= 8'hFF;
max_reg <= 8'h00;
count_reg <= 3'd0;
sum_out <= 11'd0;
min_out <= 8'd0;
max_out <= 8'd0;
end else begin
case (state)
S_IDLE: begin
if (start) begin
busy <= 1'b1;
sum_reg <= 11'd0;
min_reg <= 8'hFF;
max_reg <= 8'h00;
count_reg <= 3'd0;
state <= S_RUN;
end
end
S_RUN: begin
if (sample_valid) begin
sum_reg <= sum_reg + sample_in;
min_reg <= (sample_in < min_reg) ? sample_in : min_reg;
max_reg <= (sample_in > max_reg) ? sample_in : max_reg;
count_reg <= count_reg + 3'd1;
if (count_reg == 3'd7)
state <= S_LATCH; // 8th sample just accepted
end
end
S_LATCH: begin
sum_out <= sum_reg;
min_out <= min_reg;
max_out <= max_reg;
done <= 1'b1;
busy <= 1'b0;
state <= S_DONE;
end
S_DONE: begin
done <= 1'b0;
state <= S_IDLE;
end
default: state <= S_IDLE;
endcase
end
end
endmodule
✅Monolithic coding style advantages: fast to write; reads like the algorithm; people with programming backgrounds can easily follow the sequence of events (given a relatively small FSM/code length).
❌Limits: as logic grows, a single block can get dense; ownership of each register can become less obvious; reasoning about per-register timing intent may require more careful reading.
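Before refactoring, it helps to pin down the expected behavior with a small self-checking testbench. The sketch below is not part of the original module set: the testbench name tb_stats8 and the stimulus values are assumptions, and it expects stats8_monolithic (as listed above) to be compiled alongside it.

```verilog
`timescale 1ns/1ps
// Hypothetical self-checking testbench (illustrative, not from the course files).
// Assumes stats8_monolithic is compiled together with this file.
module tb_stats8;
    reg         clk = 1'b0;
    reg         rst = 1'b1;
    reg         start = 1'b0;
    reg  [7:0]  sample_in = 8'd0;
    reg         sample_valid = 1'b0;
    wire        busy, done;
    wire [10:0] sum_out;
    wire [7:0]  min_out, max_out;

    stats8_monolithic dut (
        .clk(clk), .rst(rst), .start(start),
        .sample_in(sample_in), .sample_valid(sample_valid),
        .busy(busy), .done(done),
        .sum_out(sum_out), .min_out(min_out), .max_out(max_out)
    );

    always #5 clk = ~clk;   // 10 ns clock period

    integer i;
    initial begin
        // Drive stimulus on falling edges so it is stable at each rising edge.
        repeat (2) @(negedge clk);
        rst = 1'b0;

        @(negedge clk) start = 1'b1;   // one-cycle start pulse
        @(negedge clk) start = 1'b0;

        // Samples 10, 20, ..., 80: sum = 360, min = 10, max = 80
        for (i = 1; i <= 8; i = i + 1) begin
            sample_in    = 10 * i;
            sample_valid = 1'b1;
            @(negedge clk);
        end
        sample_valid = 1'b0;

        wait (done);        // outputs are latched on the edge that raises done
        @(negedge clk);
        if (sum_out == 11'd360 && min_out == 8'd10 && max_out == 8'd80)
            $display("PASS: sum=%0d min=%0d max=%0d", sum_out, min_out, max_out);
        else
            $display("FAIL: sum=%0d min=%0d max=%0d", sum_out, min_out, max_out);
        $finish;
    end
endmodule
```

Because both implementations share the same port list, swapping the instantiated module name for the structured version should let the same testbench check that a refactor preserves behavior.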
Structured style (separate always@ blocks)
Here, each register has its own always @(posedge clk); small wires express “next” intent cleanly. This clarifies who owns what, and makes timing/dataflow explicit. (This, on the other hand, is referred to as Fred-style coding, reflecting his preferred way of describing digital systems like a true digital designer.)

// stats8_structured
// Each register gets its own always block (single point of ownership).
// Small combinational helpers express intent and timing clearly.
module stats8_structured (
input clk,
input rst, // sync, active-high
input start, // pulse to begin a new batch
input [7:0] sample_in,
input sample_valid,
output reg busy,
output reg done, // 1-cycle pulse when outputs valid
output reg [10:0] sum_out, // sum of 8 bytes (max 2040 -> 11 bits)
output reg [7:0] min_out,
output reg [7:0] max_out
);
localparam [1:0] S_IDLE=2'd0, S_RUN=2'd1, S_LATCH=2'd2, S_DONE=2'd3;
reg [1:0] state;
reg [2:0] count_reg;
// ----------------------------------------------------------------
// State
// ----------------------------------------------------------------
always @(posedge clk) begin
if (rst) begin
state <= S_IDLE;
end else begin
case (state)
S_IDLE: begin
if (start) begin
state <= S_RUN;
end
end
S_RUN: begin
if ((sample_valid) && (count_reg == 3'd7)) begin
state <= S_LATCH; // 8th sample just accepted
end
end
S_LATCH: begin
state <= S_DONE;
end
S_DONE: begin
state <= S_IDLE;
end
default: state <= S_IDLE;
endcase
end
end
// ----------------------------------------------------------------
// Combinational helpers (control signals)
// ----------------------------------------------------------------
wire start_new = (state == S_IDLE) & start;
wire take_sample = (state == S_RUN) & sample_valid;
wire last_flag = (state == S_LATCH);
// ----------------------------------------------------------------
// COUNTER: how many samples have been accepted
// ----------------------------------------------------------------
always @(posedge clk) begin
if (rst) begin
count_reg <= 3'd0;
end else begin
if (start_new) begin
count_reg <= 3'd0;
end else if (take_sample) begin
count_reg <= count_reg + 3'd1;
end
end
end
// ----------------------------------------------------------------
// SUM accumulator (11 bits)
// MIN tracker
// MAX tracker
// ----------------------------------------------------------------
reg [10:0] sum_reg;
reg [7:0] min_reg;
reg [7:0] max_reg;
always @(posedge clk) begin
if (rst) begin
sum_reg <= 11'd0;
min_reg <= 8'hFF;
max_reg <= 8'h00;
end else begin
if (start_new) begin
sum_reg <= 11'd0;
min_reg <= 8'hFF;
max_reg <= 8'h00;
end else if (take_sample) begin
sum_reg <= sum_reg + sample_in;
if (sample_in < min_reg) min_reg <= sample_in;
if (sample_in > max_reg) max_reg <= sample_in;
end
end
end
// ----------------------------------------------------------------
// OUTPUT LATCHES
// ----------------------------------------------------------------
always @(posedge clk) begin
if (rst) begin
sum_out <= 11'd0;
min_out <= 8'd0;
max_out <= 8'd0;
end else begin
if (last_flag) begin
sum_out <= sum_reg;
min_out <= min_reg;
max_out <= max_reg;
end
end
end
// ----------------------------------------------------------------
// BUSY flag: asserted when a batch starts, deasserted when results are latched
// ----------------------------------------------------------------
always @(posedge clk) begin
if (rst) begin
busy <= 1'b0;
end else begin
if (start_new) begin
busy <= 1'b1;
end else if (last_flag) begin
busy <= 1'b0;
end
end
end
// ----------------------------------------------------------------
// DONE pulse (1 cycle) exactly when outputs are captured
// ----------------------------------------------------------------
always @(posedge clk) begin
if (rst) begin
done <= 1'b0;
end else begin
if (last_flag) begin
done <= 1'b1;
end else begin
done <= 1'b0;
end
end
end
endmodule
✅Structured coding style advantages: every register has one owner; intent/timing are explicit; easier to scale, review, and pipeline; this mirrors how large digital systems are partitioned.
❌Limits: as logic grows, the code can get very long; people coming from a programming background may struggle to follow the intended algorithm because it is spread across separate blocks.
How to refactor from monolithic → structured (recipe)
1. List the registers you see in the monolithic block (busy, count_reg, sum_reg, min_reg, max_reg, and output latches).
2. Create tiny helper wires that describe when and what (e.g., start_new, take_sample, last_flag). Keep them combinational.
3. For each register, make a dedicated always @(posedge clk) that:
   - Handles reset and batch start defaults,
   - Updates only on its enabling condition (e.g., take_sample),
   - Owns no other registers.
4. Latch outputs where appropriate (often when a final condition becomes true).
5. Keep pulses (like done) in their own register block to avoid accidental stretching.
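The recipe can be tried on something much smaller than stats8. The following is a minimal sketch under stated assumptions: the module count_to_four and its signal names are hypothetical and do not appear elsewhere in this material. It has two registers, so it gets two owner blocks plus two helper wires.

```verilog
// Hypothetical example (not from the text above): counts accepted events
// and pulses `hit` for one cycle after every fourth event.
module count_to_four (
    input            clk,
    input            rst,        // sync, active-high
    input            event_in,   // one event per cycle while high
    output reg [1:0] count_reg,
    output reg       hit         // 1-cycle pulse
);
    // Helper wires: purely combinational, they only say *when* things happen.
    wire take_event = event_in;
    wire last_event = take_event & (count_reg == 2'd3);

    // Owner of count_reg: reset default, then update only on its enable.
    always @(posedge clk) begin
        if (rst)             count_reg <= 2'd0;
        else if (take_event) count_reg <= count_reg + 2'd1;  // wraps 3 -> 0
    end

    // Owner of hit: the pulse has its own block, so it returns to 0
    // on every cycle where last_event is not asserted.
    always @(posedge clk) begin
        if (rst) hit <= 1'b0;
        else     hit <= last_event;
    end
endmodule
```

Note that hit is driven from last_event on every cycle rather than only set under a condition; that is one simple way to guarantee the pulse cannot stretch.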
Benefits of the structural approach (becoming a true digital designer)
- ✅Single point of ownership per register → eliminates accidental double drivers and clarifies responsibility.
- ✅Timing intent is explicit: readers see exactly when and why each register updates; easier to reason about one-cycle “old/new” behavior.
- ✅Scales to bigger blocks: you can add features by adding another owner block and a few helper wires, instead of growing a monolith.
- ✅Closer to real methodology: mirrors how pipelines, datapaths, and control/status registers are partitioned in industry.
- ✅Refactoring is local: changing min_reg logic doesn’t risk unintended edits to busy or sum_reg.
- ✅Review & verification friendly: easier code reviews; formal or assertion checks can be tied to specific owners/signals.
- ✅Local debugging/reasoning: debugging “sum looks wrong” means looking at the sum block, not a 200-line state machine.
Pitfalls & guardrails (structural style)
- Don’t multi-drive a reg: one owner block per reg, period.
- Keep helpers combinational (simple assign or wire expressions). Don’t hide state in helpers.
- Reset semantics must match across owners (use the same reset polarity/priority in the module).
- Mind the one-cycle rule: cross-block dependencies still use old registered values until the next edge; design helpers like last_flag to make that explicit.
- Name consistently: *_reg for owned regs, *_out for outputs, *_next only if you introduce next-value combinational signals.
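The naming guardrail and the one-cycle rule are easier to see together in a tiny block. This is an illustrative sketch only: sat_counter, inc, level_reg, and level_next are hypothetical names chosen to show the *_reg / *_next convention.

```verilog
// Hypothetical module illustrating the *_reg / *_next naming convention.
// A 4-bit counter that saturates at 15 instead of wrapping.
module sat_counter (
    input            clk,
    input            rst,        // sync, active-high
    input            inc,        // request to increment
    output reg [3:0] level_reg
);
    // Combinational next-value signal: it holds no state of its own.
    wire [3:0] level_next = (level_reg == 4'hF) ? 4'hF : level_reg + 4'd1;

    // Sole owner of level_reg. level_next is computed from the OLD level_reg;
    // the new value becomes visible to other blocks only after this edge.
    always @(posedge clk) begin
        if (rst)      level_reg <= 4'd0;
        else if (inc) level_reg <= level_next;
    end
endmodule
```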
Extra Module 01: Advanced Verilog Syntax
This module gives you higher-level tools for building bigger, more reusable blocks without losing the “algorithm → HDL” mindset. We’ll cover parameters, arrays, for loops in clocked blocks, file-driven initialization (simulation), generate/genvar, and conditional compilation. These topics complement the “fast prototype → verify” flow you’ve been using and build on ideas already introduced.
Parameterize widths/depths for reusable components
Why: One module, many sizes. Parameters let you change bus widths, depths, and timing constants from the top module without editing the internals.
Example: Parameterized counter (width + terminal count)
module flex_counter #(
parameter WIDTH = 8, // number of bits in counter
parameter MAX = 8'd200 // terminal count value (sized to WIDTH)
)(
input clk,
input reset,
input enable,
output reg [WIDTH-1:0] count,
output reg tick // 1-cycle pulse when count == MAX
);
always @(posedge clk) begin
if (reset) begin
count <= {WIDTH{1'b0}}; // use replication syntax to set a defined number of bits (determined by WIDTH) to be 0
tick <= 1'b0;
end else if (enable) begin
if (count == MAX) begin
count <= {WIDTH{1'b0}};
tick <= 1'b1;
end else begin
count <= count + {{(WIDTH-1){1'b0}},1'b1};
tick <= 1'b0;
end
end else begin
tick <= 1'b0;
end
end
endmodule
Top-level overrides (redefine from the instantiation site):
module top (
input clk, input rst, input en, output [15:0] c16,
output t16,
output [3:0] c4,
output t4);
// 16-bit counter to 50000
flex_counter #(.WIDTH(16), .MAX(16'd50000)) u_cnt16 (
.clk(clk), .reset(rst), .enable(en), .count(c16), .tick(t16)
);
// 4-bit counter to 9
flex_counter #(.WIDTH(4), .MAX(4'd9)) u_cnt4 (
.clk(clk), .reset(rst), .enable(en), .count(c4), .tick(t4)
);
endmodule
Parameters let the top-level design make sizing decisions (bus widths, memory depths, terminal counts) at each instantiation site without touching the module's internal code.
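Parameters can also feed localparam values that the module derives for itself, so the instantiating code states intent (frequencies) rather than bit widths. The sketch below is an assumption-laden example: the module ms_tick and its parameter names are hypothetical, $clog2 requires Verilog-2005 or later, and the division ratio is assumed to be at least 2.

```verilog
// Hypothetical module: names and default values are illustrative only.
// $clog2 requires Verilog-2005 or later. Assumes CLK_HZ / TICK_HZ >= 2.
module ms_tick #(
    parameter CLK_HZ  = 50_000_000,  // assumed clock frequency
    parameter TICK_HZ = 1_000        // desired tick rate
)(
    input      clk,
    input      reset,
    output reg tick                  // 1-cycle pulse, TICK_HZ times per second
);
    localparam DIV   = CLK_HZ / TICK_HZ;  // clock cycles per tick
    localparam WIDTH = $clog2(DIV);       // bits needed to count 0..DIV-1

    reg [WIDTH-1:0] count;

    always @(posedge clk) begin
        if (reset) begin
            count <= {WIDTH{1'b0}};
            tick  <= 1'b0;
        end else if (count == DIV - 1) begin
            count <= {WIDTH{1'b0}};
            tick  <= 1'b1;
        end else begin
            count <= count + 1'b1;
            tick  <= 1'b0;
        end
    end
endmodule
```

Unlike a parameter, a localparam cannot be overridden from outside, which keeps derived values such as WIDTH consistent with the parameters they come from.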
Multi-dimensional arrays (register files, tiles, small memories)
Why: Natural way to model matrices, register files, or multi-lane buffers. Start with declaring and indexing; we’ll add resets/initialization with for later.
Example: 3×5 byte matrix (rows=3, cols=5). Indexing: [row][col]
module matrix_3x5(
input clk,
input reset,
input [1:0] row, // 0..2
input [2:0] col, // 0..4
input we, // write enable
input [7:0] data_in,
output reg [7:0] data_out
);
// 3 rows (0..2), 5 cols (0..4); each cell is 8 bits
reg [7:0] mat [0:2][0:4];
always @(posedge clk) begin
if (reset) begin
data_out <= 8'd0;
// (No mass reset here yet—covered in the next section)
end else begin
if (we) mat[row][col] <= data_in; // write one cell
data_out <= mat[row][col]; // read the same addressed cell
end
end
endmodule
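Tip: size each index to the array bounds (2 bits cover 0..3, 3 bits cover 0..7) so synthesis knows the range. In the 3×5 example, row = 3 and col = 5..7 still fit in the index widths but fall outside the array, so the surrounding logic must never drive those values.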
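When the array dimensions are not powers of two, one option is to make the bounds check explicit. This is a sketch of that idea, not the course's reference design: the module name matrix_3x5_guarded, the in_range helper, and the "read back zero when out of range" policy are all assumptions.

```verilog
// Hypothetical variant of matrix_3x5 with explicit bounds checks.
module matrix_3x5_guarded (
    input            clk,
    input            reset,
    input      [1:0] row,       // 0..2 valid, 3 is out of range
    input      [2:0] col,       // 0..4 valid, 5..7 are out of range
    input            we,        // write enable
    input      [7:0] data_in,
    output reg [7:0] data_out
);
    reg [7:0] mat [0:2][0:4];

    // High only when both indices address a real cell.
    wire in_range = (row < 2'd3) && (col < 3'd5);

    always @(posedge clk) begin
        if (reset) begin
            data_out <= 8'd0;
        end else begin
            if (we && in_range) mat[row][col] <= data_in;  // ignore stray writes
            data_out <= in_range ? mat[row][col] : 8'd0;   // assumed policy: read 0
        end
    end
endmodule
```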
for loops in clocked blocks to reset/initialize arrays
Why: When resetting or clearing arrays, a for loop describes repeated structure; synthesis unrolls it into parallel hardware, so every iteration takes effect on the same clock edge. Use it for resets/initial fills inside @(posedge clk); it’s clean and scalable.
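To make the unrolling concrete, here is a small sketch of a loop in a clocked block that is not a reset. The module delay_line4 is a hypothetical example; it has no reset, so the stages hold unknown values in simulation until four samples have been shifted in.

```verilog
// Hypothetical example: a 4-stage, 8-bit delay line written with a for loop.
module delay_line4 (
    input        clk,
    input  [7:0] din,
    output [7:0] dout    // din delayed by four clock cycles
);
    reg [7:0] stage [0:3];
    integer i;

    always @(posedge clk) begin
        stage[0] <= din;
        // The loop unrolls into three parallel register-to-register moves.
        // With nonblocking assignments each stage reads its neighbor's OLD
        // value, so data advances exactly one stage per clock.
        for (i = 1; i < 4; i = i + 1)
            stage[i] <= stage[i-1];
    end

    assign dout = stage[3];
endmodule
```

The loop bound is a constant, which is what allows synthesis to unroll it; the loop does not iterate over time.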
Example A: Zeroing a 4×8 on reset (two nested for loops)
module matrix_clear_4x8(
input clk, reset, load,
input [1:0] row, input [2:0] col,
input [7:0] di,
output reg [7:0] dout // named dout because "do" is a reserved word in SystemVerilog
);
reg [7:0] mem [0:3][0:7]; // named mem because "buf" is a reserved Verilog keyword (gate primitive)
integer i, j;
always @(posedge clk) begin
if (reset) begin
// Loop unrolls in hardware; intent is a mass clear
for (i = 0; i < 4; i = i + 1)
for (j = 0; j < 8; j = j + 1)
mem[i][j] <= 8'd0;
dout <= 8'd0;
end else begin
if (load) mem[row][col] <= di;
dout <= mem[row][col];
end
end
endmodule
Example B: Initialize from a file (simulation-time) using $fopen/$fscanf
Note: File I/O is simulation-only (testbench or non-synthesizable code). It’s great to preload memories for verification. We’ll show a TB snippet that writes into the DUT over cycles.
Testbench snippet driving a DUT’s write port from a file:
`timescale 1ns/1ps
module tb_init_from_file;
reg clk=0, reset=1, we=0;
reg [1:0] row;
reg [2:0] col;
reg [7:0] data_in;
wire [7:0] data_out;
matrix_3x5 dut(
.clk(clk), .reset(reset),
.row(row), .col(col),
.we(we), .data_in(data_in), .data_out(data_out)
);
always #5 clk = ~clk;
integer fd, status;
initial begin
// Release reset
#20 reset = 0;
// Open file with triplets: row col value (e.g., "0 0 42")
fd = $fopen("init_3x5.txt", "r");
if (fd == 0) begin
$display("ERROR: cannot open init_3x5.txt");
$finish;
end
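while (!$feof(fd)) begin
// Change stimulus on the falling edge so it is stable when the DUT samples it
@(negedge clk);
status = $fscanf(fd, "%d %d %d\n", row, col, data_in);
we = (status == 3); // write only when a full triplet was parsed
end
@(negedge clk); // the last triplet was written on the preceding rising edge
we = 0;
$fclose(fd);
// Read back one entry as a demo
row = 0; col = 0;
@(posedge clk); // data_out captures mat[0][0] on this edge
@(negedge clk); // sample after the registered output has updated
$display("mat[0][0] readback = %0d", data_out);
#50 $finish;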
end
endmodule
Alternative: $readmemh/$readmemb can bulk-load 1-D memories (classic ROM/RAM initialization). For 2-D, many flows flatten to 1-D or stream values via a TB like above. (Keep general file I/O such as $fopen/$fscanf in TBs; it’s non-synthesizable.)
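For comparison with the $fscanf approach, a 1-D memory preload with $readmemh can be sketched as follows. The module name rom16x8 and the file name rom_init.hex are assumptions; the file is expected to hold one hexadecimal word per line. Some FPGA synthesis tools also accept this initial-block pattern for memory initialization, but that is tool dependent and should be checked against your tool's documentation.

```verilog
// Hypothetical ROM: the module name and "rom_init.hex" are illustrative.
module rom16x8 (
    input            clk,
    input      [3:0] addr,
    output reg [7:0] data
);
    reg [7:0] mem [0:15];   // 1-D memory: 16 words of 8 bits

    // One hex word per line (e.g. "2A"), loaded into mem[0], mem[1], ...
    initial $readmemh("rom_init.hex", mem);

    always @(posedge clk)
        data <= mem[addr];  // synchronous read
endmodule
```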
generate/genvar to replicate logic or modules at scale
Why: When you need N repeated lanes/slices, generate keeps code compact and consistent. Combine with parameters for configurable replication.
Example A: Replicate N identical lanes of an adder slice
module lane_adder #(parameter WIDTH=8)(
input [WIDTH-1:0] a, b,
output [WIDTH-1:0] y
);
assign y = a + b;
endmodule
module vector_add #(
parameter N = 4,
parameter WIDTH = 8
)(
// Array ports are SystemVerilog-only, so each vector is packed into one flat bus:
// lane k occupies bits [k*WIDTH +: WIDTH].
input [N*WIDTH-1:0] a,
input [N*WIDTH-1:0] b,
output [N*WIDTH-1:0] y
);
genvar k;
generate
for (k = 0; k < N; k = k + 1) begin : gen_lane
lane_adder #(.WIDTH(WIDTH)) u_add (
.a(a[k*WIDTH +: WIDTH]), .b(b[k*WIDTH +: WIDTH]), .y(y[k*WIDTH +: WIDTH])
);
end
endgenerate
endmodule
Example B: Generate replicated register slices
module reg_slice #(parameter WIDTH=8)(
input clk, reset, load,
input [WIDTH-1:0] din,
output reg [WIDTH-1:0] dout
);
always @(posedge clk) begin
if (reset) dout <= {WIDTH{1'b0}};
else if (load) dout <= din;
end
endmodule
module reg_bank #(
parameter N = 8,
parameter WIDTH = 8
)(
input clk, reset, load_all,
// Flat buses instead of SystemVerilog array ports: slice i is bits [i*WIDTH +: WIDTH]
input [N*WIDTH-1:0] din,
output [N*WIDTH-1:0] dout
);
genvar i;
generate
for (i = 0; i < N; i = i + 1) begin : gen_bank
reg_slice #(.WIDTH(WIDTH)) u_rs (
.clk(clk), .reset(reset), .load(load_all),
.din(din[i*WIDTH +: WIDTH]), .dout(dout[i*WIDTH +: WIDTH])
);
end
endgenerate
endmodule
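Besides for loops, a generate block can also choose between alternative implementations with if, driven by a parameter. The following is a minimal sketch; the module opt_reg and the REGISTERED parameter are hypothetical names used only to show the construct.

```verilog
// Hypothetical module showing a conditional generate; names are illustrative.
module opt_reg #(
    parameter WIDTH      = 8,
    parameter REGISTERED = 1   // 1: add an output register, 0: pass through
)(
    input              clk,
    input  [WIDTH-1:0] din,
    output [WIDTH-1:0] dout
);
    generate
        if (REGISTERED) begin : gen_reg
            // Only this branch is elaborated when REGISTERED != 0.
            reg [WIDTH-1:0] q;
            always @(posedge clk) q <= din;
            assign dout = q;
        end else begin : gen_wire
            // Purely combinational pass-through; clk is unused in this case.
            assign dout = din;
        end
    endgenerate
endmodule
```

Only the selected branch exists in the elaborated design, so the unselected logic costs no hardware.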
ifdef/ifndef for conditional compilation
Why: Keep one codebase that can enable/disable features, change widths, or insert debug logic depending on a define. Great for feature flags and debug prints.
Define a macro at compile time: +define+DEBUG (simulator) or -DDEBUG (toolchain dependent).
Example A: Optional debug printing (simulation)
module datapath(
input clk, reset, en,
input [7:0] a, b,
output reg [7:0] y
);
always @(posedge clk) begin
if (reset) y <= 8'd0;
else if (en) y <= a + b;
`ifdef DEBUG
// Simulation-only prints when DEBUG is defined.
// Note: y shows the value from before this edge (nonblocking update not yet applied).
if (en) $display("DBG: a=%0d b=%0d y=%0d @%0t", a, b, y, $time);
`endif
end
endmodule
Example B: Feature flag changes width and adds a port
// If WIDE_MODE is defined, we use 16-bit datapath; else 8-bit.
`ifdef WIDE_MODE
`define DW 16
`else
`define DW 8
`endif
module flex_datapath(
input clk, reset, en,
input [`DW-1:0] a, b,
output reg [`DW-1:0] y
`ifdef HAS_SATURATE
, input saturate_en // extra port only when feature exists
`endif
);
always @(posedge clk) begin
if (reset) y <= {`DW{1'b0}};
else if (en) begin
y <= a + b;
`ifdef HAS_SATURATE
// (Sketch) Example of conditional feature body
// if (saturate_en && y overflowed) y <= MAX_VALUE;
`endif
end
end
endmodule
Use ifndef to provide defaults when a macro isn’t set: