# Test Strategy: Streaming 5-Point Stencil Engine

## Scope

The testbench should treat the DUT as a black-box synchronous stencil engine
that accepts a raster-scanned 8x8 tile of signed samples and emits one
11-bit signed stencil result for each interior point of that tile exactly one
cycle after the neighborhood becomes fully available.

The benchmark is intentionally focused on stream-to-grid reconstruction and
exact stencil semantics: correct tile-position tracking on accepted input
cycles only, correct identification of the 6x6 interior points, exact
signed 5-point arithmetic, fixed one-cycle output timing, support for idle
gaps and back-to-back tiles, and reset flushing.

Python is useful here because cycle-by-cycle expectations become tedious to
derive by hand once idle cycles, resets, and multiple tiles are mixed together.
Any Python helper must run only offline to generate expected outputs. The final
Verilog testbench must hardcode those expected values and must not execute
Python at runtime.

## Golden Model Plan

I plan to use a small offline Python model before writing `testbench.v`.

The helper will model only the spec-visible behavior:

1. Track whether the DUT is currently inside a tile, the accepted-sample index
   within that tile, the reconstructed 8x8 tile values seen so far, and any
   output scheduled for the next cycle.
2. On each simulated cycle, apply reset first. If `rst=1`, clear the partial
   tile state and discard any pending output.
3. If `rst=0` and `in_valid=1`, use the accepted-sample index to determine the
   row and column of `sample_in` within the current tile. Require
   `tile_start=1` only for the first accepted sample of a tile.
4. After accepting `x[r][c]`, if `r >= 2` and `c >= 2`, compute
   `y[r-1][c-1] = x[r-2][c-1] + x[r][c-1] + x[r-1][c-2] + x[r-1][c] - 4*x[r-1][c-1]`
   with normal Python integers and schedule that exact value for the next
   cycle.
5. Emit either the scheduled `(out_valid=1, stencil_out=value)` pair or the
   idle value `(out_valid=0, stencil_out=0)` for each cycle.
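
The five steps above can be sketched as a single per-cycle loop. This is a minimal illustration rather than the final helper: the name `simulate`, the tuple layout, and the choice to raise an error on stimulus that violates the `tile_start` protocol are assumptions made for the sketch, not requirements taken from the spec.

```python
from typing import List, Optional, Tuple

TILE = 8  # 8x8 raster-scanned tile

# One stimulus cycle: (rst, in_valid, tile_start, sample_or_none)
Cycle = Tuple[int, int, int, Optional[int]]


def simulate(cycles: List[Cycle]) -> List[Tuple[int, int]]:
    """Return one (out_valid, stencil_out) pair per stimulus cycle."""
    tile = [[0] * TILE for _ in range(TILE)]
    index = 0        # accepted-sample index; 0 means no tile is in progress
    pending = None   # result scheduled by the previous cycle, if any
    outputs = []
    for rst, in_valid, tile_start, sample in cycles:
        # Emit the result scheduled one cycle earlier, unless reset forces idle.
        outputs.append((0, 0) if rst or pending is None else (1, pending))
        pending = None
        if rst:
            index = 0  # drop the partial tile; stale tile[] entries are never read
            continue
        if not in_valid:
            continue   # idle cycle: tile position does not advance
        # Assumption: reject stimulus that breaks the tile_start protocol
        # instead of guessing what the DUT would do with it.
        if bool(tile_start) != (index == 0):
            raise ValueError("tile_start must mark exactly the first sample of a tile")
        r, c = divmod(index, TILE)
        tile[r][c] = sample
        if r >= 2 and c >= 2:
            # north + south + west + east - 4 * center, centered on (r-1, c-1)
            pending = (tile[r - 2][c - 1] + tile[r][c - 1]
                       + tile[r - 1][c - 2] + tile[r - 1][c]
                       - 4 * tile[r - 1][c - 1])
        index = (index + 1) % (TILE * TILE)
    return outputs


# Usage: one constant tile, then a single idle cycle to drain the last result.
stimulus = [(0, 1, int(i == 0), 23) for i in range(TILE * TILE)] + [(0, 0, 0, None)]
expected = simulate(stimulus)
print(expected[18], expected[19], sum(valid for valid, _ in expected))
# (0, 0) (1, 0) 36
```

Sample `x[2][2]` has accepted-sample index 18, so with no resets or bubbles the first valid result lands on cycle 19, and all 36 interior results of the constant tile are zero.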

The final Verilog testbench should copy the generated expectations as literal
cycle/value tables or initializer arrays.

## Coverage Goals

- Reset outputs: confirm `out_valid=0` and `stencil_out=0` while `rst=1`.
- Reset input ignore: confirm cycles with `rst=1` do not accept samples even if
  `in_valid=1` and `tile_start=1`.
- Startup timing: confirm no valid result is produced until the neighborhood
  for the first interior point exists.
- First-result timing: confirm the result for tile point `(1,1)` appears
  exactly one cycle after accepting input sample `x[2][2]`.
- Exact one-cycle latency: confirm every interior-point result appears on the
  cycle immediately following the accepted input that completes its
  neighborhood, never earlier and never later.
- Idle-cycle handling: confirm cycles with `in_valid=0` do not advance tile
  position and therefore delay later outputs by exactly the number of idle
  cycles inserted.
- Output idle value: confirm `out_valid=0` and `stencil_out=0` on cycles with
  no scheduled result.
- Signed arithmetic: confirm negative samples are interpreted as signed
  two's-complement values in all additions and in the `-4*center` term.
- Width correctness: confirm results at and near both ends of the
  `[-1020, 1020]` range are represented exactly with no truncation,
  saturation, or wraparound.
- Neighbor mapping: confirm north, south, west, and east positions are used
  correctly and not confused with diagonal samples.
- Border handling: confirm no outputs are produced for border centers outside
  the 6x6 interior region.
- Tile restart: confirm `tile_start=1` begins a fresh 8x8 tile after the prior
  tile has completed.
- Back-to-back tiles: confirm the DUT can accept the first sample of a new tile
  while producing the final result of the previous tile.
- Reset flush: confirm asserting reset mid-tile or while a result is pending
  discards all pre-reset partial state and pending outputs.
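
The `[-1020, 1020]` bound in the width-correctness goal can be checked with a few lines of arithmetic. The sketch below assumes 8-bit signed samples, which is inferred from the `127` and `-128` extremes used elsewhere in this plan rather than stated directly; the helper names are illustrative.

```python
# Assumption: samples are 8-bit signed, inferred from the 127 / -128 extremes.
SAMPLE_MIN, SAMPLE_MAX = -128, 127
OUT_BITS = 11


def stencil(north, south, west, east, center):
    """Signed 5-point stencil: four cardinal neighbors minus 4x the center."""
    return north + south + west + east - 4 * center


# Largest result: all neighbors at max, center at min (and vice versa).
largest = stencil(SAMPLE_MAX, SAMPLE_MAX, SAMPLE_MAX, SAMPLE_MAX, SAMPLE_MIN)
smallest = stencil(SAMPLE_MIN, SAMPLE_MIN, SAMPLE_MIN, SAMPLE_MIN, SAMPLE_MAX)


def fits(bits):
    """True if [smallest, largest] fits a two's-complement field of this width."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return lo <= smallest and largest <= hi


print(smallest, largest)                   # -1020 1020
print(fits(OUT_BITS), fits(OUT_BITS - 1))  # True False
```

Under that assumption, 11 bits is the narrowest signed width that holds every result, so any 10-bit truncation in the DUT would show up as wraparound at the extremes.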

## Planned Directed Scenarios

1. Reset and ignored input:
   Hold reset high for several cycles, including a cycle with `in_valid=1` and
   `tile_start=1`, then deassert reset and confirm that ignored stimulus does
   not create any later output.

2. Constant tile:
   Feed an all-constant tile such as all `23`. Every interior result should be
   exactly `0`, which checks that the five stencil weights sum to zero and
   catches stale-data leakage whenever the stale value differs from `23`.

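3. Affine plane:
   Feed a tile defined by `x[r][c] = 3*r - 2*c + 5`. Every interior result
   should again be `0`, because opposing neighbors cancel around the center.
   This catches asymmetric mapping errors such as an off-by-one row or column
   offset. It cannot catch symmetric errors such as using diagonal neighbors,
   which also cancel on a plane, so the impulse scenarios must cover those.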

4. Single positive impulse:
   Feed an otherwise zero tile with one non-zero interior sample, for example
   `x[3][4] = 7`. The resulting output pattern should show `-28` at the impulse
   center and `+7` at its four cardinal neighbors only.

5. Single negative impulse:
   Repeat the previous case with a negative value such as `x[2][2] = -9` to
   verify signed arithmetic and two's-complement output encoding.

6. Boundary-adjacent activity:
   Place a non-zero sample on a border position, such as `x[0][4]` or `x[7][1]`,
   and verify that only the one interior point cardinally adjacent to it,
   `(1,4)` or `(6,1)` respectively, is affected and that no border-centered
   outputs are emitted.

7. Idle bubbles inside a tile:
   Insert deterministic `in_valid=0` gaps at several positions within one tile.
   Verify that tile coordinates advance only on accepted samples and that the
   output schedule shifts accordingly.

8. Back-to-back distinct tiles:
   Stream two tiles with no gap, for example one constant tile followed by one
   impulse tile. Verify that the last result of tile 0 is emitted on the same
   cycle that accepts the first sample of tile 1, without state contamination.

9. Reset mid-tile:
   Accept part of a tile, assert reset, then start a new tile. Cover both a
   reset before the first valid result and a reset in the middle of the
   valid-output region. Confirm all pre-reset partial state and pending
   outputs are discarded.

10. Extreme-value tile:
    Use a deterministic pattern containing `127` and `-128` around interior
    points so that expected results exercise values near both ends of the
    allowed output range.
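
Several of the directed tiles above have results that can be worked out directly on the full 8x8 grid, independent of any streaming logic. The sketch below builds a few of them and prints their non-zero interior results. The helper names, the border value `5`, and the checkerboard used as an extreme-value pattern are illustrative choices, not requirements.

```python
TILE = 8


def make_tile(fn):
    """Build an 8x8 tile from a function of (row, col)."""
    return [[fn(r, c) for c in range(TILE)] for r in range(TILE)]


def interior_results(x):
    """Map each interior center (i, j), 1 <= i, j <= 6, to its stencil value."""
    return {(i, j): x[i - 1][j] + x[i + 1][j] + x[i][j - 1] + x[i][j + 1] - 4 * x[i][j]
            for i in range(1, TILE - 1) for j in range(1, TILE - 1)}


def nonzero(results):
    """Keep only the interior points whose result is not zero."""
    return {point: value for point, value in results.items() if value != 0}


constant = make_tile(lambda r, c: 23)
affine = make_tile(lambda r, c: 3 * r - 2 * c + 5)
impulse = make_tile(lambda r, c: 7 if (r, c) == (3, 4) else 0)
border = make_tile(lambda r, c: 5 if (r, c) == (0, 4) else 0)      # value 5 is arbitrary
checker = make_tile(lambda r, c: 127 if (r + c) % 2 == 0 else -128)  # one possible extreme pattern

print(nonzero(interior_results(constant)))  # {}
print(nonzero(interior_results(affine)))    # {}
print(nonzero(interior_results(impulse)))
# {(2, 4): 7, (3, 3): 7, (3, 4): -28, (3, 5): 7, (4, 4): 7}
print(nonzero(interior_results(border)))    # {(1, 4): 5}
print(sorted(set(interior_results(checker).values())))  # [-1020, 1020]
```

The checkerboard is convenient because every interior center sees four neighbors of the opposite extreme, so a single tile exercises both ends of the output range at all 36 interior points.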

## Checking Method

- Maintain a cycle counter and an expected-output schedule keyed by cycle.
- On each cycle, compare observed `out_valid` against whether an output is
  expected for that cycle.
- When an output is expected, compare the exact signed 11-bit `stencil_out`
  value.
- When no output is expected, require both `out_valid=0` and `stencil_out=0`.
- Include failure messages with the cycle number, tile id, interior point
  coordinates when relevant, and both expected and observed values.
- Prefer a compact set of hand-auditable directed tiles first, then add one
  longer deterministic pseudo-random tile sequence for broader signed-value and
  bubble-placement coverage.
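
The comparison rule in the list above is small enough to pin down in a few lines. The actual checker belongs in `testbench.v`; the Python sketch below only illustrates the rule and the failure-message contents. The function name `check_trace` and the layout of the expected-output schedule are assumptions made for the sketch.

```python
from typing import Dict, List, Tuple

# Hypothetical schedule layout: cycle -> (expected value, tile id, (row, col))
Schedule = Dict[int, Tuple[int, int, Tuple[int, int]]]


def check_trace(observed: List[Tuple[int, int]], schedule: Schedule) -> List[str]:
    """Compare per-cycle (out_valid, stencil_out) pairs against the schedule."""
    failures = []
    for cycle, (out_valid, stencil_out) in enumerate(observed):
        if cycle in schedule:
            value, tile_id, (row, col) = schedule[cycle]
            # A scheduled cycle must be valid and carry the exact signed value.
            if (out_valid, stencil_out) != (1, value):
                failures.append(
                    f"cycle {cycle}: tile {tile_id} point ({row},{col}): "
                    f"expected valid=1 value={value}, "
                    f"observed valid={out_valid} value={stencil_out}")
        elif (out_valid, stencil_out) != (0, 0):
            # An unscheduled cycle must show the idle value on both signals.
            failures.append(
                f"cycle {cycle}: expected idle valid=0 value=0, "
                f"observed valid={out_valid} value={stencil_out}")
    return failures


# Usage: cycle 1 carries the wrong value, cycle 2 is not idle when it should be.
schedule = {1: (-28, 0, (3, 4))}
for message in check_trace([(0, 0), (1, 7), (0, 3)], schedule):
    print(message)
```

Checking `stencil_out` on idle cycles as well as valid ones is what lets the testbench catch a DUT that deasserts `out_valid` but leaves a stale value on the data output.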

## Python Use

The helper script is used only offline to generate golden expectations for the
directed and pseudo-random scenarios above.

Planned workflow:

1. Encode each cycle as `(rst, in_valid, tile_start, sample_or_none, tag)`.
2. Simulate the spec, not the RTL:
   - If `rst=1`, clear all stored state and emit the idle output.
   - If `rst=0` and `in_valid=1`, place the sample into the current tile using
     the accepted-sample index.
   - Whenever an accepted sample completes the neighborhood for one interior
     point, compute the exact stencil value with plain Python integers and
     schedule it for the next cycle.
   - On each cycle, emit either the scheduled output or the idle output.
3. Print the resulting expected cycle/value table or Verilog initializer lines.
4. Copy those literal expectations into `testbench.v`. The runnable benchmark
   must remain pure Verilog and must not call Python.
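
Step 3 of the workflow could look roughly like the sketch below, which turns a per-cycle expectation list into Verilog assignment lines. The array names `exp_valid` and `exp_value` are hypothetical placeholders for whatever storage `testbench.v` ends up declaring. Values are written as 11-bit two's-complement hex so negative results need no signed-literal handling.

```python
OUT_BITS = 11
MASK = (1 << OUT_BITS) - 1


def encode(value):
    """Return the 11-bit two's-complement bit pattern of a signed result."""
    if not -(1 << (OUT_BITS - 1)) <= value < (1 << (OUT_BITS - 1)):
        raise ValueError(f"{value} does not fit in {OUT_BITS} signed bits")
    return value & MASK


def verilog_lines(expected):
    """Format (out_valid, stencil_out) pairs as literal Verilog assignments.

    exp_valid / exp_value are hypothetical testbench array names.
    """
    lines = []
    for cycle, (out_valid, value) in enumerate(expected):
        lines.append(
            f"exp_valid[{cycle}] = 1'b{out_valid}; "
            f"exp_value[{cycle}] = {OUT_BITS}'h{encode(value):03X}; // {value}")
    return lines


for line in verilog_lines([(0, 0), (1, -28), (1, 7)]):
    print(line)
# exp_valid[0] = 1'b0; exp_value[0] = 11'h000; // 0
# exp_valid[1] = 1'b1; exp_value[1] = 11'h7E4; // -28
# exp_valid[2] = 1'b1; exp_value[2] = 11'h007; // 7
```

The trailing comment keeps the signed decimal value next to each hex pattern, which makes the pasted table easier to audit by hand.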

## Golden-Data Confidence

The offline Python helper should stay simple and auditable:

- Use plain integer arithmetic rather than mirroring any RTL structure.
- Add a few self-checks against hand-worked cases such as the constant tile and
  the single-impulse tile before freezing any vectors.
- Seed any pseudo-random tile generation so the expected outputs are fully
  deterministic and reproducible.

This keeps the eventual testbench self-contained while making the expected
results easy to regenerate and review.
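
Seeded stimulus generation for the pseudo-random sequence might be sketched as follows. The function name, the seed `1234`, the bubble probability, and the 8-bit sample range are all illustrative assumptions. Each cycle uses the `(rst, in_valid, tile_start, sample_or_none, tag)` layout from the planned workflow.

```python
import random

SAMPLES_PER_TILE = 64


def random_stimulus(seed, num_tiles, bubble_prob=0.25):
    """Generate back-to-back tiles with random idle bubbles, reproducibly."""
    rng = random.Random(seed)  # private generator, so the seed fully determines output
    cycles = []
    for tile_id in range(num_tiles):
        accepted = 0
        while accepted < SAMPLES_PER_TILE:
            if rng.random() < bubble_prob:
                # Idle cycle: no sample, tile position must not advance.
                cycles.append((0, 0, 0, None, f"tile{tile_id}-bubble"))
            else:
                # Assumption: 8-bit signed samples.
                sample = rng.randint(-128, 127)
                cycles.append((0, 1, int(accepted == 0), sample,
                               f"tile{tile_id}-sample{accepted}"))
                accepted += 1
    return cycles


stimulus = random_stimulus(seed=1234, num_tiles=2)
print(len(stimulus), sum(cycle[1] for cycle in stimulus))
```

Because the generated vectors are frozen into `testbench.v`, reproducibility only matters when regenerating them; recording the seed and the Python version alongside the vectors keeps that step repeatable.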