[Figure: collaboration diagram for QLinearSoftmaxKernels]

Classes
class	QLinearSoftmaxScalar< TT, INP_H, INP_W, INP_W_PAD >
	Scalar implementation. QLinearSoftmaxScalar<10,20,32> takes 517922 cycles with expf and 164026 cycles with fastexp2.

class	QLinearSoftmaxFloatmul< TT, INP_H, INP_W, INP_W_PAD >
	Vector implementation using the fastexp2 method, with float multiplication for the exp estimation. QLinearSoftmaxFloatmul<10,10,16> takes 6410 cycles. Requires INP_W_PAD%16=0.

class	QLinearSoftmaxSingleaxis< TT, INP_H, INP_W, INP_W_PAD >
	Vector implementation using the fastexp2 method for a single axis. QLinearSoftmaxSingleaxis<10,10,16> takes 5185 cycles. Requires INP_W_PAD%16=0. Slightly less accurate due to the srs (shift-round-saturate) after each multiplication.

Detailed Description

Quantization: qy = saturate((y / y_scale) + y_zero_point)
Dequantization (the inverse): x = (qx - qx_zero) * qx_scale
Softmax(input, axis) = Exp(input) / ReduceSum(Exp(input), axis=axis, keepdims=1)
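
The two definitions above can be written as a small host-side reference in plain C++. This is only a sketch, not the kernel code: the int8_t element type, round-to-nearest before saturation, and the helper names dequantize, quantize and softmax_row are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Dequantize: x = (qx - qx_zero) * qx_scale
inline float dequantize(int8_t qx, float qx_scale, int8_t qx_zero) {
    return (static_cast<int>(qx) - static_cast<int>(qx_zero)) * qx_scale;
}

// Quantize: qy = saturate((y / y_scale) + y_zero_point).
// Assumption: round to nearest, saturate to the int8 range [-128, 127].
inline int8_t quantize(float y, float y_scale, int8_t y_zero_point) {
    long q = std::lround(y / y_scale) + y_zero_point;
    q = std::max(-128L, std::min(127L, q));
    return static_cast<int8_t>(q);
}

// Float reference: Softmax(x) = Exp(x) / ReduceSum(Exp(x)) over one row.
// The row maximum is subtracted before exponentiation; this does not
// change the result because the common factor cancels in the division.
inline std::vector<float> softmax_row(const std::vector<float>& x) {
    std::vector<float> y(x.size());
    if (x.empty()) return y;
    const float xmax = *std::max_element(x.begin(), x.end());
    float sum = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = std::exp(x[i] - xmax);
        sum += y[i];
    }
    for (float& v : y) v /= sum;
    return y;
}
```

A reference like this can be useful for checking a kernel's output element by element against a float baseline.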

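exp(x) is approximated as exp(x) ~= (1 + x/256)**256. A constant shift of the exponent is redundant because softmax divides it out, so the maximum is subtracted for numerical stability. This gives x1 = 1 + (x - xmax)/256 = 1 + (qx - qx_max)*qx_scale/256, where the zero point cancels in the difference. x1 is then squared 8 times (x' = x1*x1), which raises it to the 256th power.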
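
As a scalar illustration of the approximation above, the eight squarings can be sketched in plain C++. The function name fastexp_squaring is hypothetical, and this is not the vectorized kernel code.

```cpp
#include <cmath>

// Approximates exp(x) as (1 + x/256)^256.
// Eight successive squarings raise the base to the power 2^8 = 256.
inline float fastexp_squaring(float x) {
    float v = 1.0f + x / 256.0f;
    for (int i = 0; i < 8; ++i) {
        v *= v;  // v becomes v^2, v^4, ..., v^256
    }
    return v;
}
```

After the maximum is subtracted the argument is never positive, which is where this approximation behaves best. For arguments below -256 the base 1 + x/256 becomes negative and the squared result is no longer meaningful, so the sketch assumes inputs stay above that bound.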

For y = exp(x)/div, where div is the ReduceSum of the exponentials along the axis, the quantized output is qy = exp(x)/div/qy_scale + qy_zero
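
Putting the steps together, one row of a quantized softmax might be sketched as follows. This is a scalar illustration under assumptions, not the kernel implementation: it assumes int8_t data, round-to-nearest with saturation to [-128, 127], and a hypothetical function name qlinear_softmax_row. The input zero point does not appear because it cancels when the row maximum is subtracted.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

inline std::vector<int8_t> qlinear_softmax_row(const std::vector<int8_t>& qx,
                                               float qx_scale,
                                               float qy_scale,
                                               int8_t qy_zero) {
    std::vector<int8_t> qy(qx.size());
    if (qx.empty()) return qy;

    // Subtract the row maximum in the quantized domain.
    const int qx_max = *std::max_element(qx.begin(), qx.end());

    std::vector<float> e(qx.size());
    float div = 0.0f;  // ReduceSum of the exponentials
    for (std::size_t i = 0; i < qx.size(); ++i) {
        // x1 = 1 + (qx - qx_max) * qx_scale / 256, then squared 8 times
        float v = 1.0f + (qx[i] - qx_max) * qx_scale / 256.0f;
        for (int k = 0; k < 8; ++k) v *= v;
        e[i] = v;
        div += v;
    }

    // qy = exp(x) / div / qy_scale + qy_zero; both divisions are folded
    // into one reciprocal so the inner loop only multiplies.
    const float inv = 1.0f / (div * qy_scale);
    for (std::size_t i = 0; i < qx.size(); ++i) {
        long q = std::lround(e[i] * inv) + qy_zero;
        q = std::max(-128L, std::min(127L, q));  // saturate (assumed int8)
        qy[i] = static_cast<int8_t>(q);
    }
    return qy;
}
```

For example, with two equal inputs, qy_scale = 1/256 and qy_zero = -128, each output probability is 0.5, which maps to 128 - 128 = 0.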
