Generate ONLY the Verilog module code for the following specification.

## Problem Description

Implement a synchronous accelerator that stores one signed 2x2 weight matrix and computes one signed 2-element output vector for each accepted signed 2-element input vector.

WEIGHT LOAD:
- The stored weights are updated on each rising edge of `clk` for which `rst=0` and `weight_load=1`.
- `weight_data` packs the four signed 8-bit weights in row-major order:
  - `w00 = weight_data[7:0]`
  - `w01 = weight_data[15:8]`
  - `w10 = weight_data[23:16]`
  - `w11 = weight_data[31:24]`
- The loaded weights remain active for all later input vectors until a new weight load occurs.
- If `weight_load=1`, no input element is accepted on that cycle, even if `in_valid=1`.
- The evaluator will only assert `weight_load` when no partial input vector is in progress and no output result from an earlier vector is still pending.

INPUT VECTOR FORMAT:
- Each input vector consists of exactly two accepted signed 8-bit elements.
- An element is accepted on each rising edge for which `rst=0`, `weight_load=0`, and `in_valid=1`.
- The first accepted element of a vector is `x0`, and the second accepted element of that same vector is `x1`.
- `in_data` is a signed 8-bit two's-complement value.
- The evaluator will present each vector as two consecutive cycles with `in_valid=1`. It will not insert a bubble between `x0` and `x1`.
- The module must support back-to-back vectors. If one vector uses cycles N and N+1 for `x0` and `x1`, the next vector may use cycles N+2 and N+3.

COMPUTATION:
- For each accepted vector `(x0, x1)` and the currently stored weight matrix, compute:
  - `y0 = w00*x0 + w01*x1`
  - `y1 = w10*x0 + w11*x1`
- Multiplication and addition must use signed arithmetic.
- Use exact arithmetic. Do not saturate, wrap, truncate, or round before the final 17-bit output representation.
- Each output value must be represented as a signed 17-bit two's-complement result.
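
The weight packing and arithmetic rules above can be cross-checked in software. The following Python sketch is a hypothetical reference helper, not part of the required Verilog output, and the function names `to_signed8`, `unpack_weights`, `compute_rows`, and `to_bits17` are illustrative assumptions. It unpacks `weight_data` in the row-major byte order given above, computes `(y0, y1)` with Python's unbounded integers, and shows the 17-bit pattern `out_data` would carry. With signed 8-bit operands, every exact result lies between -32512 and +32768, so 17 signed bits always suffice.

```python
def to_signed8(value: int) -> int:
    """Interpret the low 8 bits of value as a two's-complement integer."""
    value &= 0xFF
    return value - 0x100 if value & 0x80 else value


def unpack_weights(weight_data: int) -> tuple:
    """Split weight_data into (w00, w01, w10, w11); w00 comes from weight_data[7:0]."""
    return tuple(to_signed8(weight_data >> (8 * i)) for i in range(4))


def compute_rows(weights: tuple, x0: int, x1: int) -> tuple:
    """Exact signed matrix-vector product: returns (y0, y1) with no wrapping."""
    w00, w01, w10, w11 = weights
    return w00 * x0 + w01 * x1, w10 * x0 + w11 * x1


def to_bits17(y: int) -> int:
    """Return the 17-bit two's-complement pattern that out_data would carry."""
    assert -(1 << 16) <= y < (1 << 16), "value does not fit in signed 17 bits"
    return y & 0x1FFFF


# Row-major packing: w00=1, w01=2, w10=3, w11=4.
print(compute_rows(unpack_weights(0x04030201), 5, -6))  # (-7, -9)

# Extreme case: every weight and input is -128, so each row is 2 * 16384 = 32768,
# one more than the largest signed 16-bit value; hence the 17-bit output.
print(compute_rows(unpack_weights(0x80808080), -128, -128))  # (32768, 32768)
print(hex(to_bits17(-7)))  # 0x1fff9
```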

OUTPUT SEQUENCE:
- Each accepted input vector must produce exactly two output cycles with `out_valid=1`.
- If `x1` of a vector is accepted on cycle N, then:
  - on cycle N+1, `out_valid` must be 1 and `out_data` must be `y0`;
  - on cycle N+2, `out_valid` must be 1 and `out_data` must be `y1`.
- The outputs for a vector must appear in row order: first row 0, then row 1.
- Those outputs must not appear earlier than the specified cycles and must not be delayed beyond them.
- The module must still accept input elements on cycles where it is producing output for an earlier vector.
- Whenever no output is scheduled for a cycle, `out_valid` must be 0 and `out_data` must be 0.

RESET:
- `rst` is synchronous and active-high.
- While `rst=1`, clear the stored weights to zero, discard any partial input vector, cancel any pending output sequence, and drive `out_valid=0` and `out_data=0`.
- A cycle with `rst=1` accepts neither a weight load nor an input element.
- After reset is deasserted, the stored weights are zero until the next weight load cycle.
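
The output timing and reset rules can likewise be sketched as a cycle-level Python model. This is an illustrative assumption, not a definitive implementation: the class `RefModel`, its `step` method, and the convention that each `step` call consumes the inputs sampled at one rising edge and returns the `(out_valid, out_data)` pair driven during the following cycle are all hypothetical. Under that convention, the call that accepts `x1` on cycle N returns `y0` (visible on cycle N+1), and the next call returns `y1` (cycle N+2).

```python
def _s8(v: int) -> int:
    """Two's-complement value of the low 8 bits of v."""
    return ((v & 0xFF) ^ 0x80) - 0x80


class RefModel:
    """Hypothetical cycle-level model of systolic_mac2x2, for checking timing only."""

    def __init__(self) -> None:
        self._clear()

    def _clear(self) -> None:
        self.w = [0, 0, 0, 0]      # w00, w01, w10, w11 (zero after reset)
        self.x0 = None             # held first element of a partial vector
        self.y1 = None             # row-1 result scheduled for the next cycle

    def step(self, rst=0, weight_load=0, weight_data=0, in_valid=0, in_data=0):
        """Apply one rising edge; return (out_valid, out_data) for the following cycle."""
        if rst:                    # synchronous reset: clear everything, accept nothing
            self._clear()
            return 0, 0
        out = (0, 0)
        if self.y1 is not None:    # second output cycle of the previous vector
            out = (1, self.y1 & 0x1FFFF)
            self.y1 = None
        if weight_load:            # load cycle: no input element is accepted
            self.w = [_s8(weight_data >> (8 * i)) for i in range(4)]
        elif in_valid:
            x = _s8(in_data)
            if self.x0 is None:    # first element of a vector is x0
                self.x0 = x
            else:                  # second element is x1: compute both rows now
                w00, w01, w10, w11 = self.w
                y0 = w00 * self.x0 + w01 * x
                self.y1 = w10 * self.x0 + w11 * x
                self.x0 = None
                out = (1, y0 & 0x1FFFF)
        return out


m = RefModel()
m.step(rst=1)
m.step(weight_load=1, weight_data=0x04030201)  # w00=1, w01=2, w10=3, w11=4
for d in (5, 0xFA, 1, 2, None, None):          # vector (5, -6), then (1, 2) back-to-back
    valid, data = m.step(in_valid=d is not None, in_data=d or 0)
    print(valid, data - (1 << 17) if data & 0x10000 else data)
# prints: 0 0 / 1 -7 / 1 -9 / 1 5 / 1 11 / 0 0
```

Because the next vector's `x1` cannot be accepted until two edges after the previous `x1`, the pending `y1` has always been emitted by then. Two results therefore never compete for the same output cycle, and back-to-back vectors produce a continuous run of `out_valid=1` cycles.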

## Interface Specification

Module Name: systolic_mac2x2

Ports:
- input 1 clk // System clock
- input 1 rst // Synchronous active-high reset
- input 1 weight_load // Load a new packed 2x2 signed weight matrix on this cycle
- input 32 weight_data // Packed row-major signed 8-bit weights: w00, w01, w10, w11
- input 1 in_valid // High when `in_data` contains the next signed 8-bit input-vector element to be accepted this cycle
- input 8 in_data // Signed 8-bit input-vector element stream, carrying x0 then x1 for each vector
- output 1 out_valid // High when `out_data` contains a valid output row result
- output 17 out_data // Signed 17-bit output value, emitting y0 then y1 for each accepted vector

## Requirements

- Generate ONLY the Verilog module code
- Do NOT output any reasoning, analysis, scratchpad, or tags
- Start directly with `module systolic_mac2x2` as the first line of your response
- Do NOT include any testbenches
- Do NOT include any explanations or comments outside the code
- End with `endmodule`
- Ensure the code is correct and synthesizable