public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/114741] New: [14 regression] aarch64 sve: unnecessary fmov for scalar int bit operations
@ 2024-04-16 12:34 nsz at gcc dot gnu.org
2024-04-16 12:55 ` [Bug target/114741] " pinskia at gcc dot gnu.org
` (11 more replies)
0 siblings, 12 replies; 13+ messages in thread
From: nsz at gcc dot gnu.org @ 2024-04-16 12:34 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114741
Bug ID: 114741
Summary: [14 regression] aarch64 sve: unnecessary fmov for
scalar int bit operations
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: nsz at gcc dot gnu.org
Target Milestone: ---
void foo(unsigned i, unsigned *p)
{
*p = i & 1;
}
with gcc -march=armv8-a+sve -O2 compiles to
foo:
fmov s31, w0
and z31.s, z31.s, #1
str s31, [x1]
ret
instead of
foo:
and w0, w0, 1
str w0, [x1]
ret
it is wrong with -mcpu=generic but good e.g. with -mcpu=neoverse-v1
^ permalink raw reply [flat|nested] 13+ messages in thread
* [Bug target/114741] [14 regression] aarch64 sve: unnecessary fmov for scalar int bit operations
2024-04-16 12:34 [Bug target/114741] New: [14 regression] aarch64 sve: unnecessary fmov for scalar int bit operations nsz at gcc dot gnu.org
@ 2024-04-16 12:55 ` pinskia at gcc dot gnu.org
2024-04-16 14:55 ` wilco at gcc dot gnu.org
` (10 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-04-16 12:55 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114741
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |missed-optimization
Target Milestone|--- |14.0
^ permalink raw reply [flat|nested] 13+ messages in thread
* [Bug target/114741] [14 regression] aarch64 sve: unnecessary fmov for scalar int bit operations
2024-04-16 12:34 [Bug target/114741] New: [14 regression] aarch64 sve: unnecessary fmov for scalar int bit operations nsz at gcc dot gnu.org
2024-04-16 12:55 ` [Bug target/114741] " pinskia at gcc dot gnu.org
@ 2024-04-16 14:55 ` wilco at gcc dot gnu.org
2024-04-16 15:45 ` wilco at gcc dot gnu.org
` (9 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: wilco at gcc dot gnu.org @ 2024-04-16 14:55 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114741
Wilco <wilco at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |wilco at gcc dot gnu.org
--- Comment #1 from Wilco <wilco at gcc dot gnu.org> ---
This example always goes wrong:
void foo2(unsigned *p)
{
*p &= 1;
}
Eg. with -mcpu=neoverse-v1:
ldr s31, [x0]
and z31.s, z31.s, #1
str s31, [x0]
ret
This doesn't make any sense since there are usually fewer vector units than
integer ALUs, and the typically have higher latency.
^ permalink raw reply [flat|nested] 13+ messages in thread
* [Bug target/114741] [14 regression] aarch64 sve: unnecessary fmov for scalar int bit operations
2024-04-16 12:34 [Bug target/114741] New: [14 regression] aarch64 sve: unnecessary fmov for scalar int bit operations nsz at gcc dot gnu.org
2024-04-16 12:55 ` [Bug target/114741] " pinskia at gcc dot gnu.org
2024-04-16 14:55 ` wilco at gcc dot gnu.org
@ 2024-04-16 15:45 ` wilco at gcc dot gnu.org
2024-04-16 16:42 ` pinskia at gcc dot gnu.org
` (8 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: wilco at gcc dot gnu.org @ 2024-04-16 15:45 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114741
--- Comment #2 from Wilco <wilco at gcc dot gnu.org> ---
It looks like the underlying bug is '^' being incorrectly treated like '?' in
record_reg_classes (which is never used during reload). Fixing that results in
the expected code being generated in all cases. It looks this issue was
introduced in the original commit d1457701461d5a49ca6b5d8a6d1c83a37a6dc771
^ permalink raw reply [flat|nested] 13+ messages in thread
* [Bug target/114741] [14 regression] aarch64 sve: unnecessary fmov for scalar int bit operations
2024-04-16 12:34 [Bug target/114741] New: [14 regression] aarch64 sve: unnecessary fmov for scalar int bit operations nsz at gcc dot gnu.org
` (2 preceding siblings ...)
2024-04-16 15:45 ` wilco at gcc dot gnu.org
@ 2024-04-16 16:42 ` pinskia at gcc dot gnu.org
2024-04-16 18:03 ` pinskia at gcc dot gnu.org
` (7 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-04-16 16:42 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114741
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Last reconfirmed| |2024-04-16
Status|UNCONFIRMED |NEW
CC| |pinskia at gcc dot gnu.org
Ever confirmed|0 |1
--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed.
^ permalink raw reply [flat|nested] 13+ messages in thread
* [Bug target/114741] [14 regression] aarch64 sve: unnecessary fmov for scalar int bit operations
2024-04-16 12:34 [Bug target/114741] New: [14 regression] aarch64 sve: unnecessary fmov for scalar int bit operations nsz at gcc dot gnu.org
` (3 preceding siblings ...)
2024-04-16 16:42 ` pinskia at gcc dot gnu.org
@ 2024-04-16 18:03 ` pinskia at gcc dot gnu.org
2024-04-16 19:18 ` tnfchris at gcc dot gnu.org
` (6 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-04-16 18:03 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114741
--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Wilco from comment #2)
> It looks like the underlying bug is '^' being incorrectly treated like '?'
> in record_reg_classes (which is never used during reload). Fixing that
> results in the expected code being generated in all cases. It looks this
> issue was introduced in the original commit
> d1457701461d5a49ca6b5d8a6d1c83a37a6dc771
static const struct cpu_regmove_cost generic_armv9_a_regmove_cost =
{
1, /* GP2GP */
/* Spilling to int<->fp instead of memory is recommended so set
realistic costs compared to memmov_cost. */
3, /* GP2FP */
2, /* FP2GP */
2 /* FP2FP */
};
Note these costs are broken.
TARGET_REGISTER_MOVE_COST has this to say about the special value 2:
```
If reload sees an insn consisting of a single set between two hard registers,
and if TARGET_REGISTER_MOVE_COST applied to their classes returns a value of 2,
reload does not check to ensure that the constraints of the insn are met.
Setting a cost of other than 2 will allow reload to verify that the constraints
are met. You should do this if the ‘movm’ pattern’s constraints do not allow
such copying.
```
The way I implemented this for thunderx was have GP2GP being cost of 2 and then
be relative from there. That gave better code generation in general and even
less spilling. I know I wrote about this before too.
^ permalink raw reply [flat|nested] 13+ messages in thread
* [Bug target/114741] [14 regression] aarch64 sve: unnecessary fmov for scalar int bit operations
2024-04-16 12:34 [Bug target/114741] New: [14 regression] aarch64 sve: unnecessary fmov for scalar int bit operations nsz at gcc dot gnu.org
` (4 preceding siblings ...)
2024-04-16 18:03 ` pinskia at gcc dot gnu.org
@ 2024-04-16 19:18 ` tnfchris at gcc dot gnu.org
2024-04-16 19:20 ` tnfchris at gcc dot gnu.org
` (5 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2024-04-16 19:18 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114741
Tamar Christina <tnfchris at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |tnfchris at gcc dot gnu.org,
| |vmakarov at gcc dot gnu.org
--- Comment #5 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #4)
> (In reply to Wilco from comment #2)
> > It looks like the underlying bug is '^' being incorrectly treated like '?'
> > in record_reg_classes (which is never used during reload). Fixing that
> > results in the expected code being generated in all cases. It looks this
> > issue was introduced in the original commit
> > d1457701461d5a49ca6b5d8a6d1c83a37a6dc771
>
> static const struct cpu_regmove_cost generic_armv9_a_regmove_cost =
> {
> 1, /* GP2GP */
> /* Spilling to int<->fp instead of memory is recommended so set
> realistic costs compared to memmov_cost. */
> 3, /* GP2FP */
> 2, /* FP2GP */
> 2 /* FP2FP */
> };
>
>
> Note these costs are broken.
> TARGET_REGISTER_MOVE_COST has this to say about the special value 2:
> ```
> If reload sees an insn consisting of a single set between two hard
> registers, and if TARGET_REGISTER_MOVE_COST applied to their classes returns
> a value of 2, reload does not check to ensure that the constraints of the
> insn are met. Setting a cost of other than 2 will allow reload to verify
> that the constraints are met. You should do this if the ‘movm’ pattern’s
> constraints do not allow such copying.
> ```
>
> The way I implemented this for thunderx was have GP2GP being cost of 2 and
> then be relative from there. That gave better code generation in general and
> even less spilling. I know I wrote about this before too.
I don't think this is related to this at all
the old generic costs, which armv8 was taken from doesn't use 2, and is broken
https://github.com/gcc-mirror/gcc/blob/master/gcc/config/aarch64/tuning_models/generic.h#L42
generic-armv8-a doesn't use 2 and is broken
https://github.com/gcc-mirror/gcc/blob/master/gcc/config/aarch64/tuning_models/generic_armv8_a.h#L43
neoverse-n2 uses 2 and isn't broken
https://github.com/gcc-mirror/gcc/blob/master/gcc/config/aarch64/tuning_models/neoversen2.h
I think the issue is what Wilco mentioned before.
The GCC docs claim that ^ shouldn't affect initial costing/IRA, but it clearly
does.
it's penalizing the r->r alternative during initial costing and not just during
lra if a reload is needed.
so the question to Vlad is it correct that ^ is treated the same way as ?
during initial costing? i.e. it's penalizing the alternative.
^ permalink raw reply [flat|nested] 13+ messages in thread
* [Bug target/114741] [14 regression] aarch64 sve: unnecessary fmov for scalar int bit operations
2024-04-16 12:34 [Bug target/114741] New: [14 regression] aarch64 sve: unnecessary fmov for scalar int bit operations nsz at gcc dot gnu.org
` (5 preceding siblings ...)
2024-04-16 19:18 ` tnfchris at gcc dot gnu.org
@ 2024-04-16 19:20 ` tnfchris at gcc dot gnu.org
2024-04-17 13:33 ` law at gcc dot gnu.org
` (4 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2024-04-16 19:20 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114741
--- Comment #6 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
and the exact armv9-a cost model you quoted, also does the right codegen.
https://godbolt.org/z/obafoT6cj
There is just an inexplicable penalty being applied to the r->r alternative.
^ permalink raw reply [flat|nested] 13+ messages in thread
* [Bug target/114741] [14 regression] aarch64 sve: unnecessary fmov for scalar int bit operations
2024-04-16 12:34 [Bug target/114741] New: [14 regression] aarch64 sve: unnecessary fmov for scalar int bit operations nsz at gcc dot gnu.org
` (6 preceding siblings ...)
2024-04-16 19:20 ` tnfchris at gcc dot gnu.org
@ 2024-04-17 13:33 ` law at gcc dot gnu.org
2024-04-17 13:47 ` wilco at gcc dot gnu.org
` (3 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: law at gcc dot gnu.org @ 2024-04-17 13:33 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114741
Jeffrey A. Law <law at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |law at gcc dot gnu.org
Priority|P3 |P2
^ permalink raw reply [flat|nested] 13+ messages in thread
* [Bug target/114741] [14 regression] aarch64 sve: unnecessary fmov for scalar int bit operations
2024-04-16 12:34 [Bug target/114741] New: [14 regression] aarch64 sve: unnecessary fmov for scalar int bit operations nsz at gcc dot gnu.org
` (7 preceding siblings ...)
2024-04-17 13:33 ` law at gcc dot gnu.org
@ 2024-04-17 13:47 ` wilco at gcc dot gnu.org
2024-04-17 13:49 ` tnfchris at gcc dot gnu.org
` (2 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: wilco at gcc dot gnu.org @ 2024-04-17 13:47 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114741
--- Comment #7 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Tamar Christina from comment #6)
> and the exact armv9-a cost model you quoted, also does the right codegen.
> https://godbolt.org/z/obafoT6cj
>
> There is just an inexplicable penalty being applied to the r->r alternative.
Indeed it is not related to cost model - building SPEC shows a significant
regression (~1%) with -mcpu=neoverse-v1 due to AND immediate being quite common
in scalar code. The '^' incorrectly forces many cases to use the SVE
alternative.
^ permalink raw reply [flat|nested] 13+ messages in thread
* [Bug target/114741] [14 regression] aarch64 sve: unnecessary fmov for scalar int bit operations
2024-04-16 12:34 [Bug target/114741] New: [14 regression] aarch64 sve: unnecessary fmov for scalar int bit operations nsz at gcc dot gnu.org
` (8 preceding siblings ...)
2024-04-17 13:47 ` wilco at gcc dot gnu.org
@ 2024-04-17 13:49 ` tnfchris at gcc dot gnu.org
2024-04-18 10:49 ` cvs-commit at gcc dot gnu.org
2024-04-18 10:51 ` tnfchris at gcc dot gnu.org
11 siblings, 0 replies; 13+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2024-04-17 13:49 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114741
Tamar Christina <tnfchris at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Assignee|unassigned at gcc dot gnu.org |tnfchris at gcc dot gnu.org
Status|NEW |ASSIGNED
--- Comment #8 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
Indeed, Regtesting a patch that fixes it, so mine...
It looks like ^ doesn't work well when there's a choice of multiple register
files.
^ permalink raw reply [flat|nested] 13+ messages in thread
* [Bug target/114741] [14 regression] aarch64 sve: unnecessary fmov for scalar int bit operations
2024-04-16 12:34 [Bug target/114741] New: [14 regression] aarch64 sve: unnecessary fmov for scalar int bit operations nsz at gcc dot gnu.org
` (9 preceding siblings ...)
2024-04-17 13:49 ` tnfchris at gcc dot gnu.org
@ 2024-04-18 10:49 ` cvs-commit at gcc dot gnu.org
2024-04-18 10:51 ` tnfchris at gcc dot gnu.org
11 siblings, 0 replies; 13+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-04-18 10:49 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114741
--- Comment #9 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tamar Christina <tnfchris@gcc.gnu.org>:
https://gcc.gnu.org/g:a2f4be3dae04fa8606d1cc8451f0b9d450f7e6e6
commit r14-10014-ga2f4be3dae04fa8606d1cc8451f0b9d450f7e6e6
Author: Tamar Christina <tamar.christina@arm.com>
Date: Thu Apr 18 11:47:42 2024 +0100
AArch64: remove reliance on register allocator for simd/gpreg costing.
[PR114741]
In PR114741 we see that we have a regression in codegen when SVE is enable
where
the simple testcase:
void foo(unsigned v, unsigned *p)
{
*p = v & 1;
}
generates
foo:
fmov s31, w0
and z31.s, z31.s, #1
str s31, [x1]
ret
instead of:
foo:
and w0, w0, 1
str w0, [x1]
ret
This causes an impact it not just codesize but also performance. This is
caused
by the use of the ^ constraint modifier in the pattern <optab><mode>3.
The documentation states that this modifier should only have an effect on
the
alternative costing in that a particular alternative is to be preferred
unless
a non-psuedo reload is needed.
The pattern was trying to convey that whenever both r and w are required,
that
it should prefer r unless a reload is needed. This is because if a reload
is
needed then we can construct the constants more flexibly on the SIMD side.
We were using this so simplify the implementation and to get generic cases
such
as:
double negabs (double x)
{
unsigned long long y;
memcpy (&y, &x, sizeof(double));
y = y | (1UL << 63);
memcpy (&x, &y, sizeof(double));
return x;
}
which don't go through an expander.
However the implementation of ^ in the register allocator is not according
to
the documentation in that it also has an effect during coloring. During
initial
register class selection it applies a penalty to a class, similar to how ?
does.
In this example the penalty makes the use of GP regs expensive enough that
it no
longer considers them:
r106: preferred FP_REGS, alternative NO_REGS, allocno FP_REGS
;; 3--> b 0: i 9 r106=r105&0x1
:cortex_a53_slot_any:GENERAL_REGS+0(-1)FP_REGS+1(1)PR_LO_REGS+0(0)
PR_HI_REGS+0(0):model 4
which is not the expected behavior. For GCC 14 this is a conservative fix.
1. we remove the ^ modifier from the logical optabs.
2. In order not to regress copysign we then move the copysign expansion to
directly use the SIMD variant. Since copysign only supports floating
point
modes this is fine and no longer relies on the register allocator to
select
the right alternative.
It once again regresses the general case, but this case wasn't optimized in
earlier GCCs either so it's not a regression in GCC 14. This change gives
strict better codegen than earlier GCCs and still optimizes the important
cases.
gcc/ChangeLog:
PR target/114741
* config/aarch64/aarch64.md (<optab><mode>3): Remove ^ from alt 2.
(copysign<GPF:mode>3): Use SIMD version of IOR directly.
gcc/testsuite/ChangeLog:
PR target/114741
* gcc.target/aarch64/fneg-abs_2.c: Update codegen.
* gcc.target/aarch64/fneg-abs_4.c: xfail for now.
* gcc.target/aarch64/pr114741.c: New test.
^ permalink raw reply [flat|nested] 13+ messages in thread
* [Bug target/114741] [14 regression] aarch64 sve: unnecessary fmov for scalar int bit operations
2024-04-16 12:34 [Bug target/114741] New: [14 regression] aarch64 sve: unnecessary fmov for scalar int bit operations nsz at gcc dot gnu.org
` (10 preceding siblings ...)
2024-04-18 10:49 ` cvs-commit at gcc dot gnu.org
@ 2024-04-18 10:51 ` tnfchris at gcc dot gnu.org
11 siblings, 0 replies; 13+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2024-04-18 10:51 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114741
Tamar Christina <tnfchris at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|ASSIGNED |RESOLVED
--- Comment #10 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
Fixed, thanks for the report.
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2024-04-18 10:51 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-16 12:34 [Bug target/114741] New: [14 regression] aarch64 sve: unnecessary fmov for scalar int bit operations nsz at gcc dot gnu.org
2024-04-16 12:55 ` [Bug target/114741] " pinskia at gcc dot gnu.org
2024-04-16 14:55 ` wilco at gcc dot gnu.org
2024-04-16 15:45 ` wilco at gcc dot gnu.org
2024-04-16 16:42 ` pinskia at gcc dot gnu.org
2024-04-16 18:03 ` pinskia at gcc dot gnu.org
2024-04-16 19:18 ` tnfchris at gcc dot gnu.org
2024-04-16 19:20 ` tnfchris at gcc dot gnu.org
2024-04-17 13:33 ` law at gcc dot gnu.org
2024-04-17 13:47 ` wilco at gcc dot gnu.org
2024-04-17 13:49 ` tnfchris at gcc dot gnu.org
2024-04-18 10:49 ` cvs-commit at gcc dot gnu.org
2024-04-18 10:51 ` tnfchris at gcc dot gnu.org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).