onnx2versal
ConcatKernels
Classes

class  ConcatScalar< TT, LCNT, H, INP_W, OUT_W >
 Scalar implementation. ConcatScalar<f,5,4,32,144> takes 5858 cycles (~850 for output window).
 
class  ConcatFloat< TT, LCNT, H, INP_W, OUT_W >
 Vector implementation, requires INP_W%4=0, OUT_W%4=0. ConcatFloat<f,5,4,32,144> takes 715 cycles (~300 for output window).
 
class  ConcatInt8< TT, LCNT, H, INP_W, OUT_W >
 Vector implementation for int8_t, requires INP_W%16=0, OUT_W%16=0. ConcatInt8<f,5,4,32,144> takes 283 cycles (about the same for output window).
 
class  ConcatFloatStream< TT, H, INP_W1, INP_W2, OUT_W >
 Scalar implementation for stream concat. ConcatFloatStream<f,4,32,32,64> takes ~1000 cycles.
 
class  ConcatFloatStreamWithStall< TT, H, INP_W1, INP_W2, OUT_W >
 Scalar implementation for stream concat.
 
class  ConcatFloatStreamSequentially< TT >
 Scalar implementation for stream concat. ConcatFloatStreamSequentially<f,4,32,32,64> takes ~1000 cycles.
 
class  ConcatFloatPktStream< TT, LCNT, H, INP_W, OUT_W >
 Scalar implementation for stream concat, ConcatFloatPktStream<f,4,32,32,64> takes cycles.
 
class  ConcatInt8Stream< TT, H, INP_W1, INP_W2, OUT_W >
 Vector implementation for stream concat with int8, ConcatInt8Stream<f,4,32,32,64> takes cycles.
 
class  ConcatTwo32bitStreams< TT, LCNT, H, INP_W, OUT_W >
 Scalar implementation for concatenating 2 chunked streams. ConcatTwo32bitStreams<f,4,32,32,64> takes ~1000 cycles.
 

Detailed Description
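
The kernels listed above share the template parameters TT, LCNT, H, INP_W and OUT_W. As a rough illustration of what a row-wise concat with these parameters could compute, here is a minimal host-side sketch in standard C++. It is not the onnx2versal implementation and does not use the AI Engine API. The function name concat_reference is hypothetical, and the parameter meanings are assumptions inferred from the names and from the example instantiation <f,5,4,32,144>: LCNT inputs of H rows by INP_W columns each, concatenated along the row into H rows of OUT_W columns, with the last input truncated when OUT_W < LCNT*INP_W.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical reference helper, NOT part of onnx2versal.
// Assumed meaning of the template parameters (inferred from their names):
//   TT    - element type
//   LCNT  - number of input buffers
//   H     - rows per buffer
//   INP_W - row width of each input buffer
//   OUT_W - row width of the output, at most LCNT * INP_W
template <typename TT, int LCNT, int H, int INP_W, int OUT_W>
std::vector<TT> concat_reference(const std::array<std::vector<TT>, LCNT>& inputs) {
    static_assert(OUT_W <= LCNT * INP_W,
                  "output row cannot be wider than all input rows combined");
    std::vector<TT> out(static_cast<std::size_t>(H) * OUT_W);
    for (int h = 0; h < H; ++h) {
        for (int w = 0; w < OUT_W; ++w) {
            const int lane = w / INP_W;  // which input this output column comes from
            const int col  = w % INP_W;  // column inside that input's row
            out[h * OUT_W + w] = inputs[lane][h * INP_W + col];
        }
    }
    return out;
}
```

Under these assumptions, concat_reference<float,5,4,32,144> would copy four full 32-wide inputs and the first 16 columns of the fifth into each 144-wide output row. A scalar element-by-element loop like this one corresponds in spirit to the scalar variants; the vector variants' INP_W%4=0 and INP_W%16=0 requirements suggest they move several elements per operation instead.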