onnx2versal
QLinearSoftmaxKernels

Classes

class  QLinearSoftmaxScalar< TT, INP_H, INP_W, INP_W_PAD >
 Scalar implementation. QLinearSoftmaxScalar<10,20,32> takes 517922 cycles with expf and 164026 cycles with fastexp2.
 
class  QLinearSoftmaxFloatmul< TT, INP_H, INP_W, INP_W_PAD >
 Vector implementation using the fastexp2 method, with float multiplication for the exp estimation. QLinearSoftmaxFloatmul<10,10,16> takes 6410 cycles. Requires INP_W_PAD%16=0.
 
class  QLinearSoftmaxSingleaxis< TT, INP_H, INP_W, INP_W_PAD >
 Vector implementation using the fastexp2 method for a single axis. QLinearSoftmaxSingleaxis<10,10,16> takes 5185 cycles. Requires INP_W_PAD%16=0. Slightly less accurate due to the srs after each multiplication.
 

Detailed Description

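The exponential is approximated as exp(x) ~= (1 + x/256)**256. The maximum is subtracted from the input for numerical stability; the resulting shift of the exponent is redundant in the final result because softmax divides by the sum of the exponentials. Writing xmax for the maximum in the real domain and qxmax for the maximum in the quantized domain, we have x1 = 1 + (x - xmax)/256 = 1 + (qx - qxmax)*qx_scale/256. Then x' = x1*x1 is applied 8 times, which raises x1 to the power 2**8 = 256.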
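
The repeated-squaring estimate can be sketched in scalar C++ as follows. This is only an illustration of the formula, not the kernels' code: the name `fastexp_sketch`, the precondition check and the clamp at zero are assumptions of the sketch.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper, not the library's kernel: approximates exp(x) for
// x <= 0 as (1 + x/256)^256, using 8 squarings because 2^8 = 256.
inline float fastexp_sketch(float x) {
  // Assumption of this sketch: the maximum was already subtracted, so x <= 0.
  assert(x <= 0.0f);
  // The clamp is an addition of this sketch: for x < -256 the base would be
  // negative and squaring it would produce a large positive value.
  float y = std::max(0.0f, 1.0f + x / 256.0f);
  for (int i = 0; i < 8; ++i) {
    y *= y;  // x' = x1 * x1, repeated 8 times
  }
  return y;
}
```

For x = -1 the estimate is roughly 0.3672, compared with exp(-1) of about 0.3679, so the approximation error is in the third decimal place for inputs of that size.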

For y = exp(x)/div, where div is the sum of the exponentials along the softmax axis, the quantized output is qy = exp(x)/div/qy_scale + qy_zero.
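
Putting both formulas together, one row of the quantized softmax might look like the scalar sketch below. It is a reference-style illustration under stated assumptions, not the kernels' implementation: the function name `qlinear_softmax_row_sketch`, the int8 data type, round-to-nearest and saturation to [-128, 127] are all assumptions, and padding (INP_W_PAD) is ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar reference for one softmax row (assumes int8 data).
std::vector<int8_t> qlinear_softmax_row_sketch(const std::vector<int8_t>& qx,
                                               float qx_scale, float qy_scale,
                                               int qy_zero) {
  assert(!qx.empty());
  const int qxmax = *std::max_element(qx.begin(), qx.end());

  std::vector<float> e(qx.size());
  float div = 0.0f;  // sum of the exp estimates; >= 1 because the max maps to 1
  for (std::size_t i = 0; i < qx.size(); ++i) {
    // x1 = 1 + (qx - qxmax)*qx_scale/256; the clamp at 0 is this sketch's addition.
    float y = std::max(0.0f, 1.0f + (qx[i] - qxmax) * qx_scale / 256.0f);
    for (int k = 0; k < 8; ++k) y *= y;  // raise x1 to the 256th power
    e[i] = y;
    div += y;
  }

  std::vector<int8_t> qy(qx.size());
  for (std::size_t i = 0; i < qx.size(); ++i) {
    // qy = exp(x)/div/qy_scale + qy_zero, rounded and saturated (assumed).
    float q = std::round(e[i] / div / qy_scale) + static_cast<float>(qy_zero);
    qy[i] = static_cast<int8_t>(std::min(127.0f, std::max(-128.0f, q)));
  }
  return qy;
}
```

With two equal inputs, qy_scale = 1/256 and qy_zero = -128, each output is 0.5/(1/256) - 128 = 0, which matches a softmax probability of 0.5 under that output quantization.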