Generate ONLY the Verilog module code for the following specification.

## Problem Description

Design an efficient 3x3 convolution module for a CNN layer on FPGA.

Input: 28x28 grayscale image (1 input channel, 8-bit unsigned pixels).

Apply 4 output filters with 3x3 kernels (8-bit signed weights, hardcoded):

- Filter 0 (horizontal edge): [[-1,-2,-1], [0,0,0], [1,2,1]]
- Filter 1 (vertical edge): [[-1,0,1], [-2,0,2], [-1,0,1]]
- Filter 2 (blur): [[1,2,1], [2,4,2], [1,2,1]]
- Filter 3 (sharpen): [[0,-1,0], [-1,5,-1], [0,-1,0]]

Biases per filter: [0, 0, 8, 10] (8-bit signed).

Use stride 1 and zero-padding 1 to produce output feature maps of 28x28x4. Apply bias addition and ReLU activation (clamp negatives to 0, saturate at 255).

Output: 4 x 8-bit unsigned packed into 32-bit per output position. Row-major input/output streaming. Module asserts 'done' for one cycle when complete.

## Interface Specification

Module Name: cnn_conv3x3

Ports:

- input 1 clk
- input 1 rst
- input 1 start
- input 8 pixel_in
- input 1 pixel_valid
- output 1 pixel_ready
- output 32 out_pixel
- output 1 out_valid
- input 1 out_ready
- output 1 done

## Requirements

- Generate ONLY the Verilog module code
- Do NOT output any reasoning, analysis, scratchpad, or tags
- Start directly with `module cnn_conv3x3` as the first line of your response
- Do NOT include any testbenches
- Do NOT include any explanations or comments outside the code
- End with `endmodule`
- Ensure the code is correct and synthesizable
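
For cross-checking the arithmetic in the Problem Description, a minimal software reference model might look like the sketch below. It is not part of the requested Verilog output. Two points are assumptions, because the specification does not state them: the kernels are applied as cross-correlation (no kernel flip, the usual CNN convention), and filter k is packed into bits [8k+7 : 8k] of the 32-bit word. The function names `conv_pixel`, `pack`, and `conv_image` are illustrative only.

```python
# Software reference model for the 3x3 convolution described above.
# ASSUMPTION: cross-correlation (kernel not flipped), as is usual for CNN layers.
# ASSUMPTION: filter k occupies bits [8*k+7 : 8*k] of the packed 32-bit word.

KERNELS = [
    [[-1, -2, -1], [0, 0, 0], [1, 2, 1]],    # filter 0: horizontal edge
    [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],    # filter 1: vertical edge
    [[1, 2, 1], [2, 4, 2], [1, 2, 1]],       # filter 2: blur
    [[0, -1, 0], [-1, 5, -1], [0, -1, 0]],   # filter 3: sharpen
]
BIASES = [0, 0, 8, 10]
SIZE = 28


def conv_pixel(image, row, col):
    """Return the four saturated 8-bit filter outputs at (row, col)."""
    outputs = []
    for kernel, bias in zip(KERNELS, BIASES):
        acc = bias
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                r, c = row + dr, col + dc
                # Zero-padding 1: taps outside the image contribute 0.
                if 0 <= r < SIZE and 0 <= c < SIZE:
                    acc += kernel[dr + 1][dc + 1] * image[r][c]
        # ReLU (clamp negatives to 0), then saturate at 255.
        outputs.append(max(0, min(255, acc)))
    return outputs


def pack(outputs):
    """Pack four bytes into one 32-bit word (assumed: filter 0 in the low byte)."""
    word = 0
    for k, value in enumerate(outputs):
        word |= (value & 0xFF) << (8 * k)
    return word


def conv_image(image):
    """Return the 28*28 packed output words in row-major order."""
    return [pack(conv_pixel(image, r, c))
            for r in range(SIZE) for c in range(SIZE)]
```

As a quick sanity check, a constant image with every pixel equal to 10 gives `[0, 0, 168, 20]` at any interior position: the two edge filters sum to zero, the blur kernel sums to 16 (160 plus bias 8), and the sharpen kernel sums to 1 (10 plus bias 10).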