See https://github.com/onnx/onnx/blob/main/docs/Operators.md#QLinearConv.
- qy = saturate((y / qy_scale) + qy_zero)
- Bias must be quantized using scale = qx_scale * qw_scale and zero_point = 0
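The bias rule above can be sketched as follows (hypothetical helper name; assumes the quantized bias is stored as int32, as in ONNX QLinearConv):

```python
def quantize_bias(bias, qx_scale, qw_scale):
    # Bias scale is fixed to qx_scale * qw_scale, zero_point is 0.
    q = round(bias / (qx_scale * qw_scale))
    # Saturate to the int32 range used for the quantized bias.
    return max(-2**31, min(2**31 - 1, q))
```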
Computation
- x = (qx - qx_zero) * qx_scale
- bias = qbias * qx_scale * qw_scale
- y = x*w + bias =>
- (qy - qy_zero)*qy_scale = (qx - qx_zero)*qx_scale * (qw - qw_zero)*qw_scale + qbias*qx_scale*qw_scale
-                         = [(qx - qx_zero)*(qw - qw_zero) + qbias] * qx_scale*qw_scale
- qy = qy_zero + [(qx-qx_zero)*(qw-qw_zero) + qbias] * qx_scale*qw_scale/qy_scale
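The final formula can be checked numerically with a float reference (hypothetical names; `acc` stands for the bracketed sum over the receptive field):

```python
def requantize(acc, qbias, qx_scale, qw_scale, qy_scale, qy_zero):
    # acc = sum over the receptive field of (qx - qx_zero) * (qw - qw_zero)
    y_real = (acc + qbias) * qx_scale * qw_scale / qy_scale
    # Round, add the output zero point, then saturate to int8.
    qy = qy_zero + round(y_real)
    return max(-128, min(127, qy))
```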
Implementation
- only the term -qx_zero*(qw - qw_zero) is precomputed (folded into the bias); rounding is done before adding qy_zero
- int32 bias: holds qbias plus the folded -qx_zero*(qw - qw_zero) term; a sum of k int8*int8 products can exceed 16 bits, so 32 bits are needed
- int8 shifted qy_zero: the qy_zero shift is added directly into the accumulator
- int16 scale: the combined scale qx_scale*qw_scale/qy_scale is applied as an int16 fixed-point multiplier; the result is saturated to 8 bits
- for kernels that allow GROUP != 1, each of the M kernels of shape (1, C_PER_M, K, K) is applied to an input slice of shape (1, C_PER_M, H, W)
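Putting the Implementation bullets together, a minimal sketch of one output point (hypothetical names; `scale_q15` is an assumed Q15 fixed-point encoding of qx_scale*qw_scale/qy_scale, and the patch/kernel are flattened lists):

```python
def conv_point_int(qx_patch, qw_kernel, qbias, qx_zero, qw_zero,
                   scale_q15, qy_zero):
    # Runtime int32 accumulation of qx * (qw - qw_zero); the remaining
    # cross term is handled by the precomputed correction below.
    acc = sum(x * (w - qw_zero) for x, w in zip(qx_patch, qw_kernel))
    # Precomputable per output channel: -qx_zero * sum(qw - qw_zero),
    # folded into the int32 bias at init time.
    acc += -qx_zero * sum(w - qw_zero for w in qw_kernel) + qbias
    # Fixed-point rescale by the combined scale, round to nearest,
    # then add the output zero point.
    qy = ((acc * scale_q15 + (1 << 14)) >> 15) + qy_zero
    # Saturate to int8.
    return max(-128, min(127, qy))
```

The rounding constant `1 << 14` implements round-to-nearest for the arithmetic right shift by 15, matching "rounding is done before adding qy_zero".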