cascadelake
The plots show the relative difference in runtime, (LoopVectorization.jl - libxsmm) / libxsmm, for every (m, n, k) triplet. Negative (red) values mean LoopVectorization.jl was faster; positive (blue) values mean libxsmm was faster. The quartiles Q₁, Q₂ (the median), and Q₃ of these relative differences are listed for each plot.
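As a rough illustration of how such a summary could be computed, the sketch below derives the relative difference per (m, n, k) triplet and then takes its quartiles. The timings are made-up placeholder values, not benchmark results, and the names `relative_difference` and `summarize` are hypothetical. The quartile method used for the figures here is not stated, so the "inclusive" interpolation is an assumption.

```python
from statistics import quantiles


def relative_difference(t_lv, t_xsmm):
    """Return (t_lv - t_xsmm) / t_xsmm; negative means LoopVectorization.jl was faster."""
    return (t_lv - t_xsmm) / t_xsmm


def summarize(timings):
    """Return (Q1, Q2, Q3) of the relative differences over all triplets.

    `timings` maps (m, n, k) -> (LoopVectorization.jl runtime, libxsmm runtime).
    The "inclusive" method is an assumption; other quartile definitions
    give slightly different values on small samples.
    """
    diffs = [relative_difference(lv, xsmm) for lv, xsmm in timings.values()]
    q1, q2, q3 = quantiles(diffs, n=4, method="inclusive")
    return q1, q2, q3


# Hypothetical runtimes (arbitrary units), for illustration only.
example_timings = {
    (2, 2, 2): (25.0, 100.0),
    (4, 4, 4): (50.0, 100.0),
    (8, 8, 8): (75.0, 100.0),
    (16, 16, 16): (100.0, 100.0),
    (32, 32, 32): (125.0, 100.0),
}

if __name__ == "__main__":
    q1, q2, q3 = summarize(example_timings)
    print(f"Q1 = {q1:.3f}. Q2 = {q2:.3f}. Q3 = {q3:.3f}")
```

Read this way, a median of Q₂ = -0.320 would mean that for half of the triplets the LoopVectorization.jl runtime was at least 32% lower than the libxsmm runtime.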

Q₁ = -0.580. Q₂ = -0.320. Q₃ = 0.154

Q₁ = -0.709. Q₂ = -0.504. Q₃ = -0.350

Q₁ = -0.701. Q₂ = -0.533. Q₃ = -0.346

Q₁ = -0.620. Q₂ = -0.377. Q₃ = -0.177

Q₁ = -0.709. Q₂ = -0.513. Q₃ = -0.330