onnx2versal
Vector implementation for Hx4 QLinearConv.
Requires data to be arranged as [a,b,c,d,e,f,g,h,i] -> [a,b,c,0, d,e,f,0, g,h,i,0, 0,0,0,0].
Requires bias to be shifted, i.e. tbias - tw.reshape(M,-1).sum(1) * X_zero_point.
Requires KW<=4, INP_W%16==0, OUT_W_PAD%16==0, STEP_H is 1 or 2, STEP_W is 1 or 2.
QLinearConvHx4Stream<28,32,28,32,1,1,1,1,8,3,3,1> total = 2723 (output_window slightly faster, ~0.85x time).
QLinearConvHx4Stream<26,32,13,16,2,2,1,1,8,3,3,1> total = 1930.
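The two preprocessing requirements above can be sketched on the host side as follows. This is a minimal illustration, not code from onnx2versal: the helper names `pad_kernel_hx4` and `shift_bias` are hypothetical, the element types (`int8_t` data, `int32_t` bias) are assumed, and rounding the padded kernel up to a multiple of 16 values is an assumption generalized from the single 3x3 example in the description. The bias shift relies on the identity sum(w * (x - zp)) = sum(w * x) - zp * sum(w), so the zero-point term can be folded into the bias ahead of time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: pad each row of a KH x KW kernel to 4 values, then
// zero-pad the result up to a multiple of 16 values (assumed from the 3x3
// example: 9 values -> 12 row-padded values -> 16 values).
std::vector<std::int8_t> pad_kernel_hx4(const std::vector<std::int8_t>& w,
                                        std::size_t kh, std::size_t kw) {
  assert(kw <= 4);              // documented requirement: KW<=4
  assert(w.size() == kh * kw);  // one flattened kernel
  const std::size_t total = (kh * 4 + 15) / 16 * 16;
  std::vector<std::int8_t> out(total, 0);
  for (std::size_t r = 0; r < kh; ++r) {
    for (std::size_t c = 0; c < kw; ++c) {
      out[r * 4 + c] = w[r * kw + c];
    }
  }
  return out;
}

// Hypothetical helper mirroring
//   tbias - tw.reshape(M,-1).sum(1) * X_zero_point
// where `weights` holds M output channels stored back to back.
std::vector<std::int32_t> shift_bias(const std::vector<std::int32_t>& bias,
                                     const std::vector<std::int8_t>& weights,
                                     std::size_t m,
                                     std::int32_t x_zero_point) {
  assert(m > 0 && bias.size() == m && weights.size() % m == 0);
  const std::size_t per_channel = weights.size() / m;
  std::vector<std::int32_t> out(m);
  for (std::size_t i = 0; i < m; ++i) {
    std::int32_t sum = 0;  // sum of all weights of output channel i
    for (std::size_t j = 0; j < per_channel; ++j) {
      sum += weights[i * per_channel + j];
    }
    out[i] = bias[i] - sum * x_zero_point;
  }
  return out;
}
```

Calling `pad_kernel_hx4` on nine values with `kh = 3, kw = 3` reproduces the 16-value layout shown in the description. Zero padding does not change the per-channel weight sum, so `shift_bias` gives the same result whether it is applied before or after padding.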
#include <qlinearconv.h>
Static Public Member Functions
static void registerKernelClass ()