The qgemm operator consumes a quantized input tensor, a quantized weight tensor, and a bias, together with the scale and zero point of the input, the weight, and the output, and computes the quantized output. It computes xA^T + b as per torch.nn.Linear, applying a general matrix multiply: output(MxN) = input(MxK) * weights(KxN) + bias(N).
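The quantized arithmetic above can be sketched as a reference model in plain Python. This is a hedged illustration, not the library's API: the function name, the uint8 output clamping, and the real-valued bias are assumptions; the actual kernel operates on AIE hardware with its own requantization path.

```python
def qgemm_ref(x_q, w_q, bias, s_x, zp_x, s_w, zp_w, s_y, zp_y):
    """Illustrative reference for output(MxN) = input(MxK) * weights(KxN) + bias(N),
    assuming affine quantization real = scale * (q - zero_point) and uint8 output."""
    M, K = len(x_q), len(x_q[0])
    N = len(w_q[0])
    out = []
    for i in range(M):
        row = []
        for j in range(N):
            # integer accumulation of (x_q - zp_x) * (w_q - zp_w)
            acc = sum((x_q[i][k] - zp_x) * (w_q[k][j] - zp_w) for k in range(K))
            # rescale the accumulator into the output scale and fold in the bias
            y = (s_x * s_w / s_y) * acc + bias[j] / s_y
            q = round(y) + zp_y
            row.append(max(0, min(255, q)))  # clamp to the uint8 range
        out.append(row)
    return out
```

For example, a 1x2 input [[2, 4]] (scale 0.5, zero point 0) times a 2x1 weight [[1], [3]] (scale 1.0, zero point 1) with bias [0.5] and output scale 0.5, zero point 10 yields the quantized output [[19]].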
|
| class | QgemmGraph< QGEMM, TT, TTPARAM, M, K, N > |
| | Single-instance graph that stores weights and biases.
|
| |
| class | QgemmStreamGraph< QGEMM, TT, TTPARAM, M, K, N > |
| | Single-instance graph that stores weights and biases.
|
| |
| class | QgemmChunkNGraph< QGEMM, CONCAT, NCHUNK, TT, TTPARAM, M, K, N > |
| | Multi-instance graph for an MxK times KxN matmul that stores weights and biases. Requires KxN_RND weights, NCHUNK%8 == 0, and N%4 == 0. Chunks the KxN weights along the N dimension into NCHUNK chunks. Each instance's weights and biases have a maximum size of 16384 and 4096 bytes respectively. Places at most 3x3 tiles: 8 conv tiles surrounding the concat tile (max AIE DMA inputs = 8).
|
| |
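The N-dimension chunking that QgemmChunkNGraph performs on its weights can be illustrated with a small, hypothetical helper. The function name and the nested-list representation are assumptions for clarity; the real graph distributes these column slices across AIE tiles.

```python
def chunk_weights_by_n(w, nchunk):
    """Split a KxN weight matrix into NCHUNK column chunks of shape Kx(N/nchunk).

    Illustrative sketch only: it enforces the basic divisibility requirement
    rather than the library's full constraints (NCHUNK%8 == 0, N%4 == 0).
    """
    K, N = len(w), len(w[0])
    assert N % nchunk == 0, "N must divide evenly into NCHUNK chunks"
    step = N // nchunk
    # chunk c takes columns [c*step, (c+1)*step) from every row
    return [[row[c * step:(c + 1) * step] for row in w] for c in range(nchunk)]
```

For instance, splitting a 2x4 weight matrix into 2 chunks yields two 2x2 column slices.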
- Template Parameters
-
| QGEMM | QGemm Kernel |
| TT | input/output dtype, int8_t or uint8_t |
| TTPARAM | weight dtype, int8_t or uint8_t |
| M | number of rows of input matrix |
| K | number of cols of input matrix / number of rows of weight matrix |
| N | number of cols of weight matrix / size of bias vector |