public inbox for gcc-bugs@sourceware.org
* [Bug target/105556] New: RA assigns an MMA vector input operand to vs0-vs31 causing an MMA accumulator to be spilled
@ 2022-05-10 20:37 bergner at gcc dot gnu.org
  2022-05-10 20:39 ` [Bug target/105556] " bergner at gcc dot gnu.org
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: bergner at gcc dot gnu.org @ 2022-05-10 20:37 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105556

            Bug ID: 105556
           Summary: RA assigns an MMA vector input operand to vs0-vs31
                    causing an MMA accumulator to be spilled
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: bergner at gcc dot gnu.org
  Target Milestone: ---

With current trunk and GCC 12, the MMA-optimized dgemm kernel in OpenBLAS
shows a performance regression compared to GCC 11 and GCC 10.  The core loop
in dgemm uses 8 accumulator variables, which want all 8 accumulator
registers.  Because the accumulators overlap the vs0 through vs31 vector
registers, using all 8 of them means those registers should not be used for
the MMA instructions' normal vector input operands.  However, with trunk and
GCC 12, the register allocator assigns one vector input to one of the
vs0-vs31 registers, which forces one of the accumulators to be spilled and
causes a significant performance loss.
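
The overlap behind this restriction can be sketched in C.  The sketch assumes
the Power ISA 3.1 layout, in which accumulator n is backed by the four
registers vs(4n) through vs(4n+3).  The function names are hypothetical and
exist only for illustration; they are not part of GCC.

```c
#include <assert.h>
#include <stdbool.h>

enum { NUM_ACCS = 8, VSRS_PER_ACC = 4, NUM_VSRS = 64 };

/* First VSR backing accumulator `acc` (assumed layout: ACCn = vs4n..vs4n+3). */
static int acc_first_vsr(int acc)
{
    assert(acc >= 0 && acc < NUM_ACCS);
    return acc * VSRS_PER_ACC;
}

/* Accumulator that overlaps `vsr`, or -1 for vs32-vs63, which overlap none. */
static int acc_overlapping_vsr(int vsr)
{
    assert(vsr >= 0 && vsr < NUM_VSRS);
    return vsr < NUM_ACCS * VSRS_PER_ACC ? vsr / VSRS_PER_ACC : -1;
}

/* True when `vsr` can hold a vector input while all 8 accumulators are live. */
static bool vsr_free_with_all_accs_live(int vsr)
{
    return acc_overlapping_vsr(vsr) == -1;
}
```

Under this layout, any input placed in vs0-vs31 lands on top of some
accumulator, so with all 8 accumulators live only vs32-vs63 remain available.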

The trunk and GCC 12 asm for the core loop looks like:

.L5:
        lxvp 0,0(10)
        lxv 40,0(9)
        addi 10,10,64
        addi 9,9,64
        lxv 41,-48(9)
        lxv 42,-32(9)
        lxv 43,-16(9)
        lxvp 2,32(1)
        lxvp 32,-32(10)
        xvf64gerpp 4,0,40
        xvf64gerpp 6,0,41
        xvf64gerpp 3,0,42
        xvf64gerpp 2,0,43
        lxvp 0,64(1)
        xvf64gerpp 5,32,40
        xvf64gerpp 7,32,41
        xvf64gerpp 1,32,42
        xxmtacc 0
        xvf64gerpp 0,32,43
        xxmfacc 0
        stxvp 2,32(1)
        stxvp 0,64(1)
        bdnz .L5

Note the use of vs0 in the MMA instructions, which forces the spilling of ACC0.
The "better" GCC 11 and GCC 10 code looks like:

.L5:
        lxvp 44,0(10)
        lxvp 32,32(10)
        addi 9,9,64
        addi 10,10,64
        lxv 39,-64(9)
        lxv 40,-48(9)
        lxv 41,-32(9)
        lxv 42,-16(9)
        xvf64gerpp 4,44,39
        xvf64gerpp 5,32,39
        xvf64gerpp 6,44,40
        xvf64gerpp 7,32,40
        xvf64gerpp 3,44,41
        xvf64gerpp 1,32,41
        xvf64gerpp 2,44,42
        xvf64gerpp 0,32,42
        bdnz .L5
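
As a rough check of the two listings above, the following C sketch
transcribes the operands of the eight xvf64gerpp instructions from each loop
and counts how many take a vector input from vs0-vs31.  The struct, table,
and function names are made up for this illustration; only the operand
numbers come from the listings.

```c
#include <assert.h>
#include <stddef.h>

/* One xvf64gerpp: target accumulator, pair input (XA), vector input (XB). */
struct ger_insn { int acc, xa, xb; };

/* Operands transcribed from the trunk / GCC 12 loop. */
static const struct ger_insn gcc12_loop[] = {
    {4, 0, 40}, {6, 0, 41}, {3, 0, 42}, {2, 0, 43},
    {5, 32, 40}, {7, 32, 41}, {1, 32, 42}, {0, 32, 43},
};

/* Operands transcribed from the GCC 11 / GCC 10 loop. */
static const struct ger_insn gcc11_loop[] = {
    {4, 44, 39}, {5, 32, 39}, {6, 44, 40}, {7, 32, 40},
    {3, 44, 41}, {1, 32, 41}, {2, 44, 42}, {0, 32, 42},
};

/* Count instructions whose XA or XB input lies in vs0-vs31.  XA names an
   even/odd register pair, so testing its first register is sufficient. */
static int count_low_vsr_inputs(const struct ger_insn *insns, size_t n)
{
    int count = 0;
    for (size_t i = 0; i < n; i++) {
        assert(insns[i].acc >= 0 && insns[i].acc < 8);
        if (insns[i].xa < 32 || insns[i].xb < 32)
            count++;
    }
    return count;
}
```

With the tables as transcribed, the GCC 12 loop has four such instructions
(the ones using vs0 as the pair input) and the GCC 11 loop has none.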

* [Bug target/105556] RA assigns an MMA vector input operand to vs0-vs31 causing an MMA accumulator to be spilled
From: bergner at gcc dot gnu.org @ 2022-05-10 20:39 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105556

Peter Bergner <bergner at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2022-05-10
     Ever confirmed|0                           |1
      Known to fail|                            |12.0, 13.0
      Known to work|                            |10.0, 11.0
                 CC|                            |dje at gcc dot gnu.org,
                   |                            |segher at gcc dot gnu.org
             Status|UNCONFIRMED                 |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |bergner at gcc dot gnu.org
             Target|                            |powerpc*-*-*

--- Comment #1 from Peter Bergner <bergner at gcc dot gnu.org> ---
Mine.  I have a patch.

* [Bug target/105556] RA assigns an MMA vector input operand to vs0-vs31 causing an MMA accumulator to be spilled
From: bergner at gcc dot gnu.org @ 2022-05-10 20:48 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105556

Peter Bergner <bergner at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                URL|                            |https://gcc.gnu.org/pipermail/gcc-patches/2022-May/594481.html

--- Comment #2 from Peter Bergner <bergner at gcc dot gnu.org> ---
Patch submitted.

* [Bug target/105556] RA assigns an MMA vector input operand to vs0-vs31 causing an MMA accumulator to be spilled
From: cvs-commit at gcc dot gnu.org @ 2022-05-18  2:33 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105556

--- Comment #3 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Peter Bergner <bergner@gcc.gnu.org>:

https://gcc.gnu.org/g:c6e36f05fbb081abb068958d8900ad34b303a70b

commit r13-579-gc6e36f05fbb081abb068958d8900ad34b303a70b
Author: Peter Bergner <bergner@linux.ibm.com>
Date:   Tue May 17 21:09:29 2022 -0500

    rs6000: Prefer assigning the MMA vector operands to altivec registers [PR105556]

    When optimizing the DGEMM kernel in OpenBLAS to use MMA, the MMA code
    uses all 8 accumulators, which overlap all vs0-vs31 vector registers.
    Current trunk assigns one of the normal vector inputs of the MMA
    instructions to one of the vs0-vs31 registers, which forces us to spill
    one of the accumulators to memory,
    leading to poor performance.  The solution here is to replace the "wa"
    constraints for the vector input operands in the MMA instruction patterns
    with "v,?wa" so that we prefer using the altivec registers vs32-vs63
    over the vs0-vs31 registers.

    2022-05-17  Peter Bergner  <bergner@linux.ibm.com>
                Segher Boessenkool  <segher@kernel.crashing.org>

    gcc/
            PR target/105556
            * config/rs6000/mma.md (mma_<vv>, mma_<avv>, mma_<pv>, mma_<apv>,
            mma_<vvi4i4i8>, mma_<avvi4i4i8>, mma_<vvi4i4i2>, mma_<avvi4i4i2>,
            mma_<vvi4i4>, mma_<avvi4i4>, mma_<pvi4i2>, mma_<apvi4i2>,
            mma_<vvi4i4i4>, mma_<avvi4i4i4>): Replace "wa" constraints with
            "v,?wa".  Update other operands accordingly.
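
The effect of the "v,?wa" constraint described in the commit message can be
illustrated with a toy model in C.  This is not GCC's register allocator; it
only mimics the documented rule that each '?' makes an alternative one unit
more costly.  It assumes, as the commit message states, that "v" means
vs32-vs63, and it assumes "wa" means any of vs0-vs63.  All function names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Count the '?' characters in one constraint alternative, e.g. "?wa" -> 1. */
static int disparagement(const char *alt)
{
    int n = 0;
    for (; *alt; alt++)
        if (*alt == '?')
            n++;
    return n;
}

/* Toy register classes (assumption): "v" = vs32-vs63, "wa" = vs0-vs63. */
static bool alt_allows_vsr(const char *alt, int vsr)
{
    assert(vsr >= 0 && vsr < 64);
    while (*alt == '?')          /* skip the disparagement prefix */
        alt++;
    if (strcmp(alt, "v") == 0)
        return vsr >= 32;
    if (strcmp(alt, "wa") == 0)
        return true;
    return false;                /* unknown constraint in this toy model */
}

/* Return the index of the cheapest alternative that admits at least one
   free register (bit i of free_mask set means vs<i> is free), or -1. */
static int pick_alternative(const char *const alts[], int n_alts,
                            unsigned long long free_mask)
{
    int best = -1, best_cost = 0;
    for (int i = 0; i < n_alts; i++) {
        bool usable = false;
        for (int vsr = 0; vsr < 64 && !usable; vsr++)
            if (((free_mask >> vsr) & 1ULL) && alt_allows_vsr(alts[i], vsr))
                usable = true;
        if (!usable)
            continue;
        int cost = disparagement(alts[i]);
        if (best < 0 || cost < best_cost) {
            best = i;
            best_cost = cost;
        }
    }
    return best;
}
```

In this model, the "v" alternative wins whenever any of vs32-vs63 is free,
and "?wa" is chosen only as a fallback, which is the preference the patch
aims for.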

* [Bug target/105556] RA assigns an MMA vector input operand to vs0-vs31 causing an MMA accumulator to be spilled
From: bergner at gcc dot gnu.org @ 2022-05-18 14:49 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105556

Peter Bergner <bergner at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|13.0                        |12.1.0

--- Comment #4 from Peter Bergner <bergner at gcc dot gnu.org> ---
Fixed on trunk.  I'll wait a few days before backporting to GCC 12.

* [Bug target/105556] RA assigns an MMA vector input operand to vs0-vs31 causing an MMA accumulator to be spilled
From: cvs-commit at gcc dot gnu.org @ 2022-05-20 23:00 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105556

--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-12 branch has been updated by Peter Bergner
<bergner@gcc.gnu.org>:

https://gcc.gnu.org/g:c83d78585078d6918853fbe0f74a3a78e88e3e32

commit r12-8406-gc83d78585078d6918853fbe0f74a3a78e88e3e32
Author: Peter Bergner <bergner@linux.ibm.com>
Date:   Tue May 17 21:09:29 2022 -0500

    rs6000: Prefer assigning the MMA vector operands to altivec registers [PR105556]

    When optimizing the DGEMM kernel in OpenBLAS to use MMA, the MMA code
    uses all 8 accumulators, which overlap all vs0-vs31 vector registers.
    Current trunk assigns one of the normal vector inputs of the MMA
    instructions to one of the vs0-vs31 registers, which forces us to spill
    one of the accumulators to memory,
    leading to poor performance.  The solution here is to replace the "wa"
    constraints for the vector input operands in the MMA instruction patterns
    with "v,?wa" so that we prefer using the altivec registers vs32-vs63
    over the vs0-vs31 registers.

    2022-05-17  Peter Bergner  <bergner@linux.ibm.com>
                Segher Boessenkool  <segher@kernel.crashing.org>

    gcc/
            PR target/105556
            * config/rs6000/mma.md (mma_<vv>, mma_<avv>, mma_<pv>, mma_<apv>,
            mma_<vvi4i4i8>, mma_<avvi4i4i8>, mma_<vvi4i4i2>, mma_<avvi4i4i2>,
            mma_<vvi4i4>, mma_<avvi4i4>, mma_<pvi4i2>, mma_<apvi4i2>,
            mma_<vvi4i4i4>, mma_<avvi4i4i4>): Replace "wa" constraints with
            "v,?wa".  Update other operands accordingly.

    (cherry picked from commit c6e36f05fbb081abb068958d8900ad34b303a70b)

* [Bug target/105556] RA assigns an MMA vector input operand to vs0-vs31 causing an MMA accumulator to be spilled
From: bergner at gcc dot gnu.org @ 2022-05-20 23:01 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105556

Peter Bergner <bergner at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #6 from Peter Bergner <bergner at gcc dot gnu.org> ---
Fixed on trunk and GCC 12.  At the moment, no need for GCC 11 or earlier
backports.
