Generate ONLY the Verilog module code for the following specification.

## Problem Description

Design a hardware accelerator for a 4x4 matrix multiplication (C = A x B). Input matrices consist of 16-bit signed integers. The module should take two 4x4 matrices as input and provide the resulting 4x4 matrix. Use valid/ready handshake signals for flow control.

## Interface Specification

Module Name: matrix_mult_4x4

Ports:
- input 1 clk
- input 1 rst
- input [255:0] a_in
- input [255:0] b_in
- output [511:0] c_out
- input 1 in_valid
- output 1 in_ready
- output 1 out_valid
- input 1 out_ready

## Requirements

- Generate ONLY the Verilog module code
- Do NOT output any reasoning, analysis, scratchpad, or tags
- Start directly with `module matrix_mult_4x4` as the first line of your response
- Do NOT include any testbenches
- Do NOT include any explanations or comments outside the code
- End with `endmodule`
- Ensure the code is correct and synthesizable
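
For checking a candidate module against expected values, the arithmetic in the problem description can be sketched as a small software reference model. This is only an illustration under assumptions the specification does not state: elements are packed row-major with element (0,0) in the least significant bits, each result occupies 32 bits of `c_out` (512 / 16), and sums that exceed 32 bits wrap around. The helper names `unpack`, `pack`, and `matmul_4x4` are hypothetical.

```python
def unpack(word, count=16, width=16):
    """Split a packed integer into `count` signed `width`-bit elements.

    Assumption: element 0 sits in the least significant bits.
    """
    mask = (1 << width) - 1
    out = []
    for i in range(count):
        raw = (word >> (i * width)) & mask
        # Interpret the raw field as two's complement.
        out.append(raw - (1 << width) if raw >> (width - 1) else raw)
    return out


def pack(values, width=32):
    """Pack signed integers into one word, `width` bits each, element 0 in the LSBs."""
    mask = (1 << width) - 1
    word = 0
    for i, v in enumerate(values):
        # Masking a negative Python int yields its two's-complement field
        # and wraps any value that does not fit in `width` bits.
        word |= (v & mask) << (i * width)
    return word


def matmul_4x4(a_in, b_in):
    """Return the packed 512-bit product for packed 256-bit inputs (row-major assumed)."""
    a = unpack(a_in)
    b = unpack(b_in)
    c = [
        sum(a[4 * row + k] * b[4 * k + col] for k in range(4))
        for row in range(4)
        for col in range(4)
    ]
    return pack(c, width=32)
```

Multiplying by the identity matrix should return B with each element sign-extended to 32 bits, which gives a quick sanity check of the packing convention chosen here.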