On September 8, 2022 7:34:32 PM GMT+01:00, Dmitry Selyutin wrote:
> This is a tricky part. svshape2 shares some of its bits with svshape;
> we reserve 0b1000 and 0b1001 values from svshape for svshape2.

(background): in hardware all 16 combinations are sent to the exact same unit. the slightly different Form (SVM2 vs SVM, because the opcode args are different bitwidths) is for the convenience of the assembly writer.

(further background): svshape is insanely powerful, the sort of thing i revered in supercomputers from the 90s. it provides Matrix "Structure Packing" schedules, DCT/FFT, Parallel Reduction, it's pretty mental. all in-place: no more loop-unrolling, no more transpose copying of regs.

l.
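
(illustration): a minimal sketch of the split described above, assuming the shared bits form a single 4-bit field. the names `SVSHAPE2_RESERVED` and `assembler_form` are hypothetical, not taken from the spec or the toolchain. it only shows which of the 16 values an assembler would present as svshape2 rather than svshape; it says nothing about what the hardware unit does with them.

```python
# Hypothetical sketch: names and the single-4-bit-field layout are assumptions.
# The two reserved values come from the quoted text: 0b1000 and 0b1001.
SVSHAPE2_RESERVED = frozenset({0b1000, 0b1001})


def assembler_form(field4: int) -> str:
    """Return the assembler-level name ("svshape" or "svshape2") for a 4-bit value."""
    if not 0 <= field4 <= 0b1111:
        raise ValueError("expected a 4-bit value")
    return "svshape2" if field4 in SVSHAPE2_RESERVED else "svshape"


if __name__ == "__main__":
    # All 16 combinations go to the same unit; only the mnemonic differs.
    for value in range(16):
        print(f"{value:#06b} -> {assembler_form(value)}")
```

the point of keeping this as a plain lookup is that the distinction lives only at the assembler level: the same 16-value space is split by name, not by execution path.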