public inbox for gcc-bugs@sourceware.org
* [Bug target/91598] [8/9 regression] 60% speed drop on neon intrinsic loop
From: marxin at gcc dot gnu.org @ 2020-03-11 12:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91598

Martin Liška <marxin at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |marxin at gcc dot gnu.org

--- Comment #6 from Martin Liška <marxin at gcc dot gnu.org> ---
commit r10-7073-g0b8393221177617f19e7c5c5c692b8c59f85fffb
Author: Wilco Dijkstra <wdijkstr@arm.com>
Date:   Fri Mar 6 18:29:02 2020 +0000

    [AArch64] Use intrinsics for widening multiplies (PR91598)

    Inline assembler instructions don't have latency info and the scheduler
    does not attempt to schedule them at all - it does not even honor latencies
    of asm source operands.  As a result, SIMD intrinsics which are implemented
    using inline assembler perform very poorly, particularly on in-order cores.
    Add new patterns and intrinsics for widening multiplies, which results in a
    63% speedup for the example in the PR, thus fixing the reported regression.

        gcc/
            PR target/91598
            * config/aarch64/aarch64-builtins.c (TYPES_TERNOPU_LANE): Add define.
            * config/aarch64/aarch64-simd.md
            (aarch64_vec_<su>mult_lane<Qlane>): Add new insn for widening lane mul.
            (aarch64_vec_<su>mlal_lane<Qlane>): Likewise.
            * config/aarch64/aarch64-simd-builtins.def: Add intrinsics.
            * config/aarch64/arm_neon.h:
            (vmlal_lane_s16): Expand using intrinsics rather than inline asm.
            (vmlal_lane_u16): Likewise.
            (vmlal_lane_s32): Likewise.
            (vmlal_lane_u32): Likewise.
            (vmlal_laneq_s16): Likewise.
            (vmlal_laneq_u16): Likewise.
            (vmlal_laneq_s32): Likewise.
            (vmlal_laneq_u32): Likewise.
            (vmull_lane_s16): Likewise.
            (vmull_lane_u16): Likewise.
            (vmull_lane_s32): Likewise.
            (vmull_lane_u32): Likewise.
            (vmull_laneq_s16): Likewise.
            (vmull_laneq_u16): Likewise.
            (vmull_laneq_s32): Likewise.
            (vmull_laneq_u32): Likewise.
            * config/aarch64/iterators.md (Vcondtype): New iterator for lane mul.
            (Qlane): Likewise.
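
For illustration, the kind of loop affected by this change can be sketched as
follows. This is a minimal example, not the testcase from the PR: the helper
name mla_lane1_s16 and its signature are hypothetical, and n is assumed to be
a multiple of 4. On AArch64 it uses vmlal_lane_s16, one of the intrinsics
listed above; elsewhere it falls back to an equivalent scalar loop so the
sketch still compiles.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: acc[i] += (int32_t)src[i] * coeff[1].
   Assumes n is a multiple of 4.  */
void mla_lane1_s16(int32_t *acc, const int16_t *src,
                   const int16_t coeff[4], size_t n)
{
#if defined(__aarch64__)
    int16x4_t c = vld1_s16(coeff);          /* load the 4 coefficients once */
    for (size_t i = 0; i < n; i += 4) {
        int32x4_t a = vld1q_s32(acc + i);   /* 4 x 32-bit accumulators */
        int16x4_t s = vld1_s16(src + i);    /* 4 x 16-bit inputs */
        a = vmlal_lane_s16(a, s, c, 1);     /* widening multiply-accumulate by lane 1 */
        vst1q_s32(acc + i, a);
    }
#else
    /* Scalar fallback with the same result, for non-AArch64 hosts.  */
    for (size_t i = 0; i < n; i++)
        acc[i] += (int32_t)src[i] * (int32_t)coeff[1];
#endif
}
```

According to the commit message above, when such an intrinsic is implemented
with inline asm the scheduler does not schedule it, which is what hurts
in-order cores.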


* [Bug target/91598] [8/9 regression] 60% speed drop on neon intrinsic loop
From: wilco at gcc dot gnu.org @ 2021-01-11 20:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91598

Wilco <wilco at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wilco at gcc dot gnu.org

--- Comment #7 from Wilco <wilco at gcc dot gnu.org> ---
Fixed in GCC10 and trunk. Given the regression is so large, it is worth
backporting.

The other issue is that assembler statements (and likely any RTL instruction
without specified latencies) are badly scheduled. Honoring all known latencies
and assuming a reasonable default latency otherwise would be a better approach.
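
The suggestion above can be sketched with a toy model. This is not GCC's
scheduler code: the struct, the function names, and the default value of 2
cycles are all assumptions made for illustration. Each instruction has either
a known result latency or LATENCY_UNKNOWN, and a consumer becomes ready once
its producer's (known or default) latency has elapsed.

```c
#include <stddef.h>

#define LATENCY_UNKNOWN (-1)
#define DEFAULT_LATENCY 2   /* assumed "reasonable default", not a GCC value */

/* Toy instruction: hypothetical type for this sketch only.  */
struct toy_insn {
    int latency;  /* result latency in cycles, or LATENCY_UNKNOWN */
    int dep;      /* index of an earlier producer insn, or -1 for none */
};

/* Use the known latency if there is one, otherwise fall back to the default.  */
static int effective_latency(const struct toy_insn *insn)
{
    return insn->latency == LATENCY_UNKNOWN ? DEFAULT_LATENCY : insn->latency;
}

/* Fill ready[i] with the earliest cycle at which insn i's input is available.
   Insns are in program order, so every dep index must be smaller than i.  */
void toy_ready_cycles(const struct toy_insn *insns, size_t n, int *ready)
{
    for (size_t i = 0; i < n; i++) {
        int dep = insns[i].dep;
        ready[i] = dep < 0 ? 0 : ready[dep] + effective_latency(&insns[dep]);
    }
}
```

In this model an asm statement would simply be an insn with LATENCY_UNKNOWN:
its consumers are delayed by the default rather than treated as immediately
ready.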


* [Bug target/91598] [8/9 regression] 60% speed drop on neon intrinsic loop
From: clyon at gcc dot gnu.org @ 2021-05-05 12:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91598

--- Comment #8 from Christophe Lyon <clyon at gcc dot gnu.org> ---
All intrinsics have been re-implemented, and I can see no asm version in
arm_neon.h as of r12-513.


* [Bug target/91598] [9 regression] 60% speed drop on neon intrinsic loop
From: jakub at gcc dot gnu.org @ 2021-05-14  9:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91598

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|8.5                         |9.4

--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 8 branch is being closed.


* [Bug target/91598] [9 regression] 60% speed drop on neon intrinsic loop
From: rguenth at gcc dot gnu.org @ 2021-06-01  8:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91598

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|9.4                         |9.5

--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 9.4 is being released, retargeting bugs to GCC 9.5.


* [Bug target/91598] [9 regression] 60% speed drop on neon intrinsic loop
From: tnfchris at gcc dot gnu.org @ 2021-08-12  7:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91598

Tamar Christina <tnfchris at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tnfchris at gcc dot gnu.org

--- Comment #11 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
Can this issue be closed? All inline assembly has been removed from arm_neon.h,
but backporting these changes is extremely unlikely (some are intrusive).


* [Bug target/91598] [9 regression] 60% speed drop on neon intrinsic loop
From: mkuvyrkov at gcc dot gnu.org @ 2021-08-12  7:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91598

--- Comment #12 from Maxim Kuvyrkov <mkuvyrkov at gcc dot gnu.org> ---
(In reply to Tamar Christina from comment #11)
> Can this issue be closed? All inline assembly has been removed from
> arm_neon.h, but backporting these changes is extremely unlikely (some are
> intrusive).

Hi Tamar,

Looking at this I now remember that I have a couple of minor patches approved,
but which I forgot to commit.

Let me retest and commit those, and we can close this.


* [Bug target/91598] [9 regression] 60% speed drop on neon intrinsic loop
From: tnfchris at gcc dot gnu.org @ 2021-08-12  7:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91598

--- Comment #13 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to Maxim Kuvyrkov from comment #12)
> (In reply to Tamar Christina from comment #11)
> > Can this issue be closed? All inline assembly has been removed from
> > arm_neon.h, but backporting these changes is extremely unlikely (some are
> > intrusive).
> 
> Hi Tamar,
> 
> Looking at this I now remember that I have a couple of minor patches
> approved, but which I forgot to commit.
> 
> Let me retest and commit those, and we can close this.

Hi Maxim, Thanks!


* [Bug target/91598] [9 regression] 60% speed drop on neon intrinsic loop
From: cvs-commit at gcc dot gnu.org @ 2021-08-17 10:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91598

--- Comment #14 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The trunk branch has been updated by Maxim Kuvyrkov <mkuvyrkov@gcc.gnu.org>:

https://gcc.gnu.org/g:6d527883072ce96a33169036fca7740172223b52

commit r12-2946-g6d527883072ce96a33169036fca7740172223b52
Author: Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
Date:   Thu Aug 29 15:21:36 2019 +0000

    Improve autoprefetcher heuristic (partly fix regression in PR91598)

            PR rtl-optimization/91598
            * haifa-sched.c (autopref_rank_for_schedule): Prioritize "irrelevant"
            insns after memory reads and before memory writes.
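
The ordering named in the ChangeLog entry can be sketched as a comparator.
This is a toy illustration and not the code in haifa-sched.c: the enum, the
struct, and the function names are hypothetical, and the real heuristic works
on the scheduler's ready list with more context than is modelled here. The
sketch only shows the stated priority: memory reads first, then "irrelevant"
insns, then memory writes.

```c
#include <stdlib.h>

/* Hypothetical classification, ordered by the priority described above.  */
enum toy_kind { TOY_MEM_READ = 0, TOY_IRRELEVANT = 1, TOY_MEM_WRITE = 2 };

struct toy_sched_insn {
    int id;               /* original position, used as a tie-break */
    enum toy_kind kind;
};

/* qsort comparator: reads before irrelevant insns before writes;
   insns of the same kind keep their original relative order.  */
static int toy_rank_cmp(const void *pa, const void *pb)
{
    const struct toy_sched_insn *a = pa;
    const struct toy_sched_insn *b = pb;
    if (a->kind != b->kind)
        return (int)a->kind - (int)b->kind;
    return a->id - b->id;
}

/* Sort a candidate list in place according to the toy ranking.  */
void toy_rank(struct toy_sched_insn *insns, size_t n)
{
    qsort(insns, n, sizeof *insns, toy_rank_cmp);
}
```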


* [Bug target/91598] [9 regression] 60% speed drop on neon intrinsic loop
From: mkuvyrkov at gcc dot gnu.org @ 2021-08-17 10:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91598

Maxim Kuvyrkov <mkuvyrkov at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #15 from Maxim Kuvyrkov <mkuvyrkov at gcc dot gnu.org> ---
Closing.
