From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
 id 0BA473AA9415; Thu,  4 Mar 2021 12:14:22 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 0BA473AA9415
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is
 slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af
Date: Thu, 04 Mar 2021 12:14:22 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 11.0
X-Bugzilla-Keywords: missed-optimization, ra
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 11.0
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: cc keywords
Message-ID: <bug-98856-4-H0yX72NteN@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-98856-4@http.gcc.gnu.org/bugzilla/>
References: <bug-98856-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list <gcc-bugs.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Thu, 04 Mar 2021 12:14:23 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vmakarov at gcc dot gnu.org
           Keywords|                            |ra
--- Comment #17 from Richard Biener <rguenth at gcc dot gnu.org> ---
Coming back to this: we're presenting RA with quite a hard problem, given we
have

(insn 7 4 8 2 (set (reg:TI 84 [ _9 ])
        (mem:TI (reg:DI 101) [0 MEM <__int128 unsigned> [(char *
{ref-all})in_8(D)]+0 S16 A8])) 73 {*movti_internal}
     (expr_list:REG_DEAD (reg:DI 101)
        (nil)))
(insn 8 7 9 2 (parallel [
            (set (reg:DI 95)
                (lshiftrt:DI (subreg:DI (reg:TI 84 [ _9 ]) 8)
                    (const_int 63 [0x3f])))
            (clobber (reg:CC 17 flags))
        ]) "t.c":7:26 703 {*lshrdi3_1}
     (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil)))
..
(insn 10 9 11 2 (parallel [
            (set (reg:DI 97)
                (lshiftrt:DI (subreg:DI (reg:TI 84 [ _9 ]) 0)
                    (const_int 63 [0x3f])))
            (clobber (reg:CC 17 flags))
        ]) "t.c":8:30 703 {*lshrdi3_1}
     (expr_list:REG_UNUSED (reg:CC 17 flags)
..
(insn 12 11 13 2 (set (reg:V2DI 98 [ vect__5.3 ])
        (ashift:V2DI (subreg:V2DI (reg:TI 84 [ _9 ]) 0)
            (const_int 1 [0x1]))) "t.c":9:16 3611 {ashlv2di3}
     (expr_list:REG_DEAD (reg:TI 84 [ _9 ])
        (nil)))

where I wonder why we keep the (subreg:DI (reg:TI 84 ...) 8) around
for so long.  Probably the subreg pass gives up because of the V2DImode
subreg of that reg.
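
For orientation, source of roughly the following shape would give rise to
insns like the ones above: one 16-byte load, a 63-bit right shift of each
64-bit half, and a left shift by one of both halves that the vectorizer can
turn into a single V2DI shift. This is only a sketch under assumptions, not
the testcase from the bug: the function name, the constant 135 (0x87, the
usual XTS reduction constant) and the mapping of statements to insns are
guesses based on the RTL shown.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical reconstruction (name and constant are assumptions):
   double a 128-bit value held as two little-endian 64-bit halves,
   as done for the XTS tweak.  */
void poly_double_le(unsigned char *out, const unsigned char *in)
{
    uint64_t w[2];

    /* One unaligned 16-byte load, cf. the TImode load in insn 7.  */
    memcpy(w, in, 16);

    /* Top bit of the high half, cf. (subreg:DI (reg:TI 84) 8) >> 63.  */
    uint64_t carry = (w[1] >> 63) * 135;

    /* Top bit of the low half, cf. (subreg:DI (reg:TI 84) 0) >> 63.  */
    w[1] = (w[1] << 1) ^ (w[0] >> 63);

    /* The two "<< 1" operations are what can become one V2DI shift.  */
    w[0] = (w[0] << 1) ^ carry;

    memcpy(out, w, 16);
}
```

The point of the sketch is that the same 128-bit value is needed both as two
scalar 64-bit halves and as one two-element vector, which is the mix of
subregs on reg 84 described above.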

That said, RA chooses xmm for reg:84 but then spills it immediately
to fulfil the subregs, even though movq and pextrq could be used,
or the reload could use the original mem.  That we reload
even the xmm use is another odd thing.

Vlad, I'm not sure about the possibilities LRA has here but maybe
you can have a look at the testcase in comment#6 (use -O3 -march=znver2
or -march=core-avx2).  For one I expected

        vmovdqu (%rsi), %xmm2
        vmovdqa %xmm2, -24(%rsp)
        movq    -16(%rsp), %rax   (2a)
        vmovdqa -24(%rsp), %xmm4  (1)
...
        movq    -24(%rsp), %rdx   (2b)

(1) not to be there (not sure how that even survives postreload
optimizations...)
(2a/b) to be 'inherited' by loading from 8(%rsi) and (%rsi) instead, which
is maybe asking too much because it requires aliasing considerations
That is, even if we don't consider using

   pextrq $1, %xmm2, %rax (2a)
   movq %xmm2, %rdx (2b)

I expected us to not spill.