[Bug rtl-optimization/108583] New: [13 Regression] wrong code with vector division by uint16 at -O2

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug rtl-optimization/108583] New: [13 Regression] wrong code with vector division by uint16 at -O2
@ 2023-01-28 14:00 zsojka at seznam dot cz
  2023-01-30  4:19 ` [Bug target/108583] " pinskia at gcc dot gnu.org
                   ` (31 more replies)
  0 siblings, 32 replies; 33+ messages in thread
From: zsojka at seznam dot cz @ 2023-01-28 14:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583

            Bug ID: 108583
           Summary: [13 Regression] wrong code with vector division by
                    uint16 at -O2
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Keywords: wrong-code
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: zsojka at seznam dot cz
  Target Milestone: ---
              Host: x86_64-pc-linux-gnu
            Target: aarch64-unknown-linux-gnu

Created attachment 54366
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54366&action=edit
reduced testcase

Output:
$ aarch64-unknown-linux-gnu-gcc -O2 testcase.c -static
$ qemu-aarch64 -- ./a.out 
qemu: uncaught target signal 6 (Aborted) - core dumped
Aborted

$ aarch64-unknown-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=/repo/gcc-trunk/binary-latest-aarch64/bin/aarch64-unknown-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/repo/gcc-trunk/binary-trunk-r13-5461-20230127201120-g815e5740162-checking-yes-rtl-df-extra-aarch64/bin/../libexec/gcc/aarch64-unknown-linux-gnu/13.0.1/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++
--enable-valgrind-annotations --disable-nls --enable-checking=yes,rtl,df,extra
--with-cloog --with-ppl --with-isl
--with-sysroot=/usr/aarch64-unknown-linux-gnu --build=x86_64-pc-linux-gnu
--host=x86_64-pc-linux-gnu --target=aarch64-unknown-linux-gnu
--with-ld=/usr/bin/aarch64-unknown-linux-gnu-ld
--with-as=/usr/bin/aarch64-unknown-linux-gnu-as --disable-libstdcxx-pch
--prefix=/repo/gcc-trunk//binary-trunk-r13-5461-20230127201120-g815e5740162-checking-yes-rtl-df-extra-aarch64
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.0.1 20230127 (experimental) (GCC)

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug target/108583] [13 Regression] wrong code with vector division by uint16 at -O2
  2023-01-28 14:00 [Bug rtl-optimization/108583] New: [13 Regression] wrong code with vector division by uint16 at -O2 zsojka at seznam dot cz
@ 2023-01-30  4:19 ` pinskia at gcc dot gnu.org
  2023-01-30  4:22 ` pinskia at gcc dot gnu.org
                   ` (30 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-01-30  4:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2023-01-30
          Component|rtl-optimization            |target
             Status|UNCONFIRMED                 |NEW
   Target Milestone|---                         |13.0
     Ever confirmed|0                           |1

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed. The problem shows up in the expansion of:
  v_2 = v_1(D) / { 65535, 65535, 65535, 65535 };

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug target/108583] [13 Regression] wrong code with vector division by uint16 at -O2
  2023-01-28 14:00 [Bug rtl-optimization/108583] New: [13 Regression] wrong code with vector division by uint16 at -O2 zsojka at seznam dot cz
  2023-01-30  4:19 ` [Bug target/108583] " pinskia at gcc dot gnu.org
@ 2023-01-30  4:22 ` pinskia at gcc dot gnu.org
  2023-01-30  8:31 ` rguenth at gcc dot gnu.org
                   ` (29 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-01-30  4:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tnfchris at gcc dot gnu.org

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Most likely introduced by r13-4026-gc98aabc1427a4d .

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug target/108583] [13 Regression] wrong code with vector division by uint16 at -O2
  2023-01-28 14:00 [Bug rtl-optimization/108583] New: [13 Regression] wrong code with vector division by uint16 at -O2 zsojka at seznam dot cz
  2023-01-30  4:19 ` [Bug target/108583] " pinskia at gcc dot gnu.org
  2023-01-30  4:22 ` pinskia at gcc dot gnu.org
@ 2023-01-30  8:31 ` rguenth at gcc dot gnu.org
  2023-01-30 14:20 ` tnfchris at gcc dot gnu.org
                   ` (28 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-01-30  8:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P1

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug target/108583] [13 Regression] wrong code with vector division by uint16 at -O2
  2023-01-28 14:00 [Bug rtl-optimization/108583] New: [13 Regression] wrong code with vector division by uint16 at -O2 zsojka at seznam dot cz
                   ` (2 preceding siblings ...)
  2023-01-30  8:31 ` rguenth at gcc dot gnu.org
@ 2023-01-30 14:20 ` tnfchris at gcc dot gnu.org
  2023-01-30 14:52 ` rguenther at suse dot de
                   ` (27 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2023-01-30 14:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583

Tamar Christina <tnfchris at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org,
                   |                            |rsandifo at gcc dot gnu.org

--- Comment #3 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
Right, so this is because in the expansion we don't have enough context to
decide how to optimize the division.

This optimization is only possible when the input is widened because you need
an additional free bit so that the second addition can't overflow.

The vectorizer has this context but since we didn't want a new IFN the context
should instead be derivable in targetm.vectorize.can_special_div_by_const hook.

So my proposal to fix this and keep targetm.vectorize.can_special_div_by_const
as a general divide optimization hook is to pass the actual `tree` operand0 as
well to the hook such that the hook has a bit more context.

I was hoping to use get_nonzero_bits to get what the actual range of the
operand is.  But it looks like for widening operations it still reports -1.

So my suggestion is the backend should just check if the operand is a
non-constant and the type of the operation being performed is matches the
precision we require.

Is this ok with you Richards? since I'll need both of you to approve.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug target/108583] [13 Regression] wrong code with vector division by uint16 at -O2
  2023-01-28 14:00 [Bug rtl-optimization/108583] New: [13 Regression] wrong code with vector division by uint16 at -O2 zsojka at seznam dot cz
                   ` (3 preceding siblings ...)
  2023-01-30 14:20 ` tnfchris at gcc dot gnu.org
@ 2023-01-30 14:52 ` rguenther at suse dot de
  2023-01-30 15:01 ` tnfchris at gcc dot gnu.org
                   ` (26 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: rguenther at suse dot de @ 2023-01-30 14:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583

--- Comment #4 from rguenther at suse dot de <rguenther at suse dot de> ---
On Mon, 30 Jan 2023, tnfchris at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583
> 
> Tamar Christina <tnfchris at gcc dot gnu.org> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |rguenth at gcc dot gnu.org,
>                    |                            |rsandifo at gcc dot gnu.org
> 
> --- Comment #3 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
> Right, so this is because in the expansion we don't have enough context to
> decide how to optimize the division.
> 
> This optimization is only possible when the input is widened because you need
> an additional free bit so that the second addition can't overflow.
> 
> The vectorizer has this context but since we didn't want a new IFN the 
> context should instead be derivable in 
> targetm.vectorize.can_special_div_by_const hook.

The vectorizer doesn't check for a widened operand zero.  In fact
"can_special_div_by_const" doesn't suggest that widening is required.
If the vectorizer checks that then why do we need another operand?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug target/108583] [13 Regression] wrong code with vector division by uint16 at -O2
  2023-01-28 14:00 [Bug rtl-optimization/108583] New: [13 Regression] wrong code with vector division by uint16 at -O2 zsojka at seznam dot cz
                   ` (4 preceding siblings ...)
  2023-01-30 14:52 ` rguenther at suse dot de
@ 2023-01-30 15:01 ` tnfchris at gcc dot gnu.org
  2023-01-30 16:52 ` rsandifo at gcc dot gnu.org
                   ` (25 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2023-01-30 15:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583

--- Comment #5 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
> > The vectorizer has this context but since we didn't want a new IFN the 
> > context should instead be derivable in 
> > targetm.vectorize.can_special_div_by_const hook.
> 
> The vectorizer doesn't check for a widened operand zero.  In fact
> "can_special_div_by_const" doesn't suggest that widening is required.

Correct, at the moment the generalized so far as to be a generic div by
constant operation.  But the original sequence wanted to optimize a sequence
that needs more context than what was given to it.

> If the vectorizer checks that then why do we need another operand?

Because the expansion gets triggered by intrinsics code as well:

typedef unsigned __attribute__((__vector_size__ (16))) V;

static __attribute__((__noinline__)) __attribute__((__noclone__)) V
foo (V v, unsigned short i)
{
  v /= i;
  return v;
}

It's invalid to do this particular decomposition we do in AArch64, but the hook
doesn't know this.  Since the new IFN wasn't like we're just leaving the `/`
alone in the vectorizer.  But this means the expansion code, or the backend
needs to be able to check the context in which we are expanding.

We don't have an integer vector division operation. So the operation needs to
be decomposed.  And if we decompose it, then the sequence becomes too long for
combine to match and we're back where we started.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug target/108583] [13 Regression] wrong code with vector division by uint16 at -O2
  2023-01-28 14:00 [Bug rtl-optimization/108583] New: [13 Regression] wrong code with vector division by uint16 at -O2 zsojka at seznam dot cz
                   ` (5 preceding siblings ...)
  2023-01-30 15:01 ` tnfchris at gcc dot gnu.org
@ 2023-01-30 16:52 ` rsandifo at gcc dot gnu.org
  2023-01-30 17:04 ` tnfchris at gcc dot gnu.org
                   ` (24 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: rsandifo at gcc dot gnu.org @ 2023-01-30 16:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583

--- Comment #6 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
(In reply to Tamar Christina from comment #3)
> The vectorizer has this context but since we didn't want a new IFN the
> context should instead be derivable in
> targetm.vectorize.can_special_div_by_const hook.
I probably got lost in the threading, but could you point to where
the idea of using an ifn was rejected?  I know there was pushback
against hard-coding a specific constant, but that doesn't prevent
the use of ifns in itself.  (E.g. the gather/scatter IFNs have
constant arguments that are checked against what the target allows.)

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug target/108583] [13 Regression] wrong code with vector division by uint16 at -O2
  2023-01-28 14:00 [Bug rtl-optimization/108583] New: [13 Regression] wrong code with vector division by uint16 at -O2 zsojka at seznam dot cz
                   ` (6 preceding siblings ...)
  2023-01-30 16:52 ` rsandifo at gcc dot gnu.org
@ 2023-01-30 17:04 ` tnfchris at gcc dot gnu.org
  2023-01-31 10:31 ` rguenth at gcc dot gnu.org
                   ` (23 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2023-01-30 17:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583

--- Comment #7 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to rsandifo@gcc.gnu.org from comment #6)
> (In reply to Tamar Christina from comment #3)
> > The vectorizer has this context but since we didn't want a new IFN the
> > context should instead be derivable in
> > targetm.vectorize.can_special_div_by_const hook.
> I probably got lost in the threading, but could you point to where
> the idea of using an ifn was rejected?  I know there was pushback
> against hard-coding a specific constant, but that doesn't prevent
> the use of ifns in itself.  (E.g. the gather/scatter IFNs have
> constant arguments that are checked against what the target allows.)

https://patchwork.sourceware.org/project/gcc/patch/patch-15779-tamar@arm.com/#124788
is where the original patch used an IFN.
https://patchwork.sourceware.org/project/gcc/patch/patch-15779-tamar@arm.com/#124863
where Richi wouldn't ACK a new optab as it didn't have a physical instruction
backing it on any ISA, and in fairness, no one else did either effectively
stranding the change.

https://patchwork.sourceware.org/project/gcc/patch/patch-15779-tamar@arm.com/#125144
where I pinged and got no response to the ping. After which I went on IRC and
asked Richi how he'd like me to proceed.  In reply to this I was instructed he
would like to proceed the same way vector permutes are currently handled with
`can_perm` etc.  and that's where the patch thread picks off back on the ML.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug target/108583] [13 Regression] wrong code with vector division by uint16 at -O2
  2023-01-28 14:00 [Bug rtl-optimization/108583] New: [13 Regression] wrong code with vector division by uint16 at -O2 zsojka at seznam dot cz
                   ` (7 preceding siblings ...)
  2023-01-30 17:04 ` tnfchris at gcc dot gnu.org
@ 2023-01-31 10:31 ` rguenth at gcc dot gnu.org
  2023-01-31 11:01 ` rsandifo at gcc dot gnu.org
                   ` (22 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-01-31 10:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Tamar Christina from comment #7)
> (In reply to rsandifo@gcc.gnu.org from comment #6)
> > (In reply to Tamar Christina from comment #3)
> > > The vectorizer has this context but since we didn't want a new IFN the
> > > context should instead be derivable in
> > > targetm.vectorize.can_special_div_by_const hook.
> > I probably got lost in the threading, but could you point to where
> > the idea of using an ifn was rejected?  I know there was pushback
> > against hard-coding a specific constant, but that doesn't prevent
> > the use of ifns in itself.  (E.g. the gather/scatter IFNs have
> > constant arguments that are checked against what the target allows.)
> 
> https://patchwork.sourceware.org/project/gcc/patch/patch-15779-tamar@arm.com/
> #124788 is where the original patch used an IFN.
> https://patchwork.sourceware.org/project/gcc/patch/patch-15779-tamar@arm.com/
> #124863 where Richi wouldn't ACK a new optab as it didn't have a physical
> instruction backing it on any ISA, and in fairness, no one else did either
> effectively stranding the change.

Yep.  The other option would have to expose whatever special instruction
is used in the end as IFN and rewrite the division during vectorization
"properly", either via some pattern or via doing the expansion when
emitting the vectorized division instruction.

In hindsight that might have been the better option, but then ...

> https://patchwork.sourceware.org/project/gcc/patch/patch-15779-tamar@arm.com/
> #125144 where I pinged and got no response to the ping. After which I went
> on IRC and asked Richi how he'd like me to proceed.  In reply to this I was
> instructed he would like to proceed the same way vector permutes are
> currently handled with `can_perm` etc.  and that's where the patch thread
> picks off back on the ML.

... this looked simpler and also reasonable.  Only that it breaks now.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug target/108583] [13 Regression] wrong code with vector division by uint16 at -O2
  2023-01-28 14:00 [Bug rtl-optimization/108583] New: [13 Regression] wrong code with vector division by uint16 at -O2 zsojka at seznam dot cz
                   ` (8 preceding siblings ...)
  2023-01-31 10:31 ` rguenth at gcc dot gnu.org
@ 2023-01-31 11:01 ` rsandifo at gcc dot gnu.org
  2023-01-31 11:39 ` tnfchris at gcc dot gnu.org
                   ` (21 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: rsandifo at gcc dot gnu.org @ 2023-01-31 11:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583

--- Comment #9 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
Are we sure this is a vectoriser vs. C vectors thing?
The equivalent vectoriser test:

void __attribute__((noipa))
f (unsigned *v)
{
  for (int i = 0; i < 4; ++i)
    v[i] /= 0xffff;
}

int
main (void)
{
  unsigned v[] = { 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff };
  f (v);
  for (unsigned i = 0; i < sizeof (v) / sizeof (v[0]); i++)
    if (v[i] != 0x00010001)
      __builtin_abort ();
  return 0;
}

fails in the same way for me (compiled at -O3).

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug target/108583] [13 Regression] wrong code with vector division by uint16 at -O2
  2023-01-28 14:00 [Bug rtl-optimization/108583] New: [13 Regression] wrong code with vector division by uint16 at -O2 zsojka at seznam dot cz
                   ` (9 preceding siblings ...)
  2023-01-31 11:01 ` rsandifo at gcc dot gnu.org
@ 2023-01-31 11:39 ` tnfchris at gcc dot gnu.org
  2023-01-31 11:44 ` rguenther at suse dot de
                   ` (20 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2023-01-31 11:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583

--- Comment #10 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to rsandifo@gcc.gnu.org from comment #9)
> Are we sure this is a vectoriser vs. C vectors thing?

it's not, the issue we're debating is how to fix it.

As Richi pointed out https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583#c4 we
can fix the vectorizer case by adding a guard around the recog code in the
vectorizer.

But that only fixes the vectorizer, and Richi asked why that wasn't enough, to
which I pointed out it's because intrinsics code hits it as well.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583#c5

This is why I mentioned that there are two ways to fix this:

1. We make the hook less general, and add checks in every place it is called.
2. We keep the hook general and pass enough context to the hook for it to
decide how to optimize the division.

it's not intrinsics only, but a vectorizer only fix won't fix intrinsics.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug target/108583] [13 Regression] wrong code with vector division by uint16 at -O2
  2023-01-28 14:00 [Bug rtl-optimization/108583] New: [13 Regression] wrong code with vector division by uint16 at -O2 zsojka at seznam dot cz
                   ` (10 preceding siblings ...)
  2023-01-31 11:39 ` tnfchris at gcc dot gnu.org
@ 2023-01-31 11:44 ` rguenther at suse dot de
  2023-01-31 11:58 ` tnfchris at gcc dot gnu.org
                   ` (19 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: rguenther at suse dot de @ 2023-01-31 11:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583

--- Comment #11 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 31 Jan 2023, tnfchris at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583
> 
> --- Comment #10 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
> (In reply to rsandifo@gcc.gnu.org from comment #9)
> > Are we sure this is a vectoriser vs. C vectors thing?
> 
> it's not, the issue we're debating is how to fix it.
> 
> As Richi pointed out https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583#c4 we
> can fix the vectorizer case by adding a guard around the recog code in the
> vectorizer.
> 
> But that only fixes the vectorizer, and Richi asked why that wasn't enough, to
> which I pointed out it's because intrinsics code hits it as well.
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583#c5
> 
> This is why I mentioned that there are two ways to fix this:
> 
> 1. We make the hook less general, and add checks in every place it is called.
> 2. We keep the hook general and pass enough context to the hook for it to
> decide how to optimize the division.

I don't think passing in for example the tree operand 0 helps, the
target appearantly wants to not specialize by the constant divided by
but instead also by the number divided (of we already know its type).

Do you plan to do SSA def pattern matching in the hook for this at
expand time?

As said, recognizing the special case and open-coding the full divison
in the vectorizer (or possibly even from RTL expansion) would be
another option here.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug target/108583] [13 Regression] wrong code with vector division by uint16 at -O2
  2023-01-28 14:00 [Bug rtl-optimization/108583] New: [13 Regression] wrong code with vector division by uint16 at -O2 zsojka at seznam dot cz
                   ` (11 preceding siblings ...)
  2023-01-31 11:44 ` rguenther at suse dot de
@ 2023-01-31 11:58 ` tnfchris at gcc dot gnu.org
  2023-01-31 12:03 ` rsandifo at gcc dot gnu.org
                   ` (18 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2023-01-31 11:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583

--- Comment #12 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to rguenther@suse.de from comment #11)
> On Tue, 31 Jan 2023, tnfchris at gcc dot gnu.org wrote:
> 
> 
> I don't think passing in for example the tree operand 0 helps, the
> target appearantly wants to not specialize by the constant divided by
> but instead also by the number divided (of we already know its type).
> 

The target wants to specialize division by a specific constant.  It just
happens that there is an additional specialization when certain bits are known
to be free in operand0.

> Do you plan to do SSA def pattern matching in the hook for this at
> expand time?
> 

My intention was to only look if the DEF_STMT is a gimple assign, and then the
operands to it are half the precision of the result.  This catches *w, *>> and
any other case.  It doesn't need to know further up because it's harder to
track non-zero bits across different math operations.  and this is enough for
the use case we're interested in.

> As said, recognizing the special case and open-coding the full divison
> in the vectorizer (or possibly even from RTL expansion) would be
> another option here.

open-coding in the vectorizer will require 4 new IFNs and 4 new optabs as it
requires 4 different instructions. 2 for neon, 2 for SVE, which are
semantically different as permutes work different across SVE and NEON.  So if
you're happy with that I will do that.

> (or possibly even from RTL expansion) would be another option here.

I don't really follow this one. The open coding at expand time is what I was
trying to do before.  The issue is there's no longer a single statement if we
don't stop the vectorizer from decomposing it into shifts and multiplies.

As long as the vectorizer does this decomposition there's not much to do at
expand time.  Additionally the number of widening operations after
vectorization is large. So to semantically match them with the decomposed shift
seems impossible..

But those are only my thoughts :) ultimately I'll do the approach you two want.
 If you want me to revert to using IFNs I can do that.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug target/108583] [13 Regression] wrong code with vector division by uint16 at -O2
  2023-01-28 14:00 [Bug rtl-optimization/108583] New: [13 Regression] wrong code with vector division by uint16 at -O2 zsojka at seznam dot cz
                   ` (12 preceding siblings ...)
  2023-01-31 11:58 ` tnfchris at gcc dot gnu.org
@ 2023-01-31 12:03 ` rsandifo at gcc dot gnu.org
  2023-01-31 12:19 ` rguenther at suse dot de
                   ` (17 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: rsandifo at gcc dot gnu.org @ 2023-01-31 12:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583

--- Comment #13 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
OK, hopefully I understand now.  Sorry for being slow.

But what specific constraints do we want to apply to the optimisation?

(In reply to Tamar Christina from comment #3)
> Right, so this is because in the expansion we don't have enough context to
> decide how to optimize the division.
> 
> This optimization is only possible when the input is widened because you
> need an additional free bit so that the second addition can't overflow.
> 
> The vectorizer has this context but since we didn't want a new IFN the
> context should instead be derivable in
> targetm.vectorize.can_special_div_by_const hook.
> 
> So my proposal to fix this and keep
> targetm.vectorize.can_special_div_by_const as a general divide optimization
> hook is to pass the actual `tree` operand0 as well to the hook such that the
> hook has a bit more context.
> 
> I was hoping to use get_nonzero_bits to get what the actual range of the
> operand is.  But it looks like for widening operations it still reports -1.
The original motivating example was:

void draw_bitmap1(uint8_t* restrict pixel, uint8_t level, int n)
{
  for (int i = 0; i < (n & -16); i+=1)
    pixel[i] = (pixel[i] * level) / 0xff;
}

where we then do a 16-bit division.  But since level and pixel[i] are
unconstrained, the maximum value of pixel[i] * level is 0xfe01.
There's no free bit in that sense.  It seems more that range of the
dividend is [0, N*N] for a divisor of N.

If that's the condition we want to test for, it seems like something
we need to check in the vectoriser rather than the hook.  And it's
not something we can easily do in the vector form, since we don't
track ranges for vectors (AFAIK).

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug target/108583] [13 Regression] wrong code with vector division by uint16 at -O2
  2023-01-28 14:00 [Bug rtl-optimization/108583] New: [13 Regression] wrong code with vector division by uint16 at -O2 zsojka at seznam dot cz
                   ` (13 preceding siblings ...)
  2023-01-31 12:03 ` rsandifo at gcc dot gnu.org
@ 2023-01-31 12:19 ` rguenther at suse dot de
  2023-01-31 13:35 ` tnfchris at gcc dot gnu.org
                   ` (16 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: rguenther at suse dot de @ 2023-01-31 12:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583

--- Comment #14 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 31 Jan 2023, tnfchris at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583
> 
> --- Comment #12 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
> (In reply to rguenther@suse.de from comment #11)
...
> > As said, recognizing the special case and open-coding the full divison
> > in the vectorizer (or possibly even from RTL expansion) would be
> > another option here.
> 
> open-coding in the vectorizer will require 4 new IFNs and 4 new optabs as it
> requires 4 different instructions. 2 for neon, 2 for SVE, which are
> semantically different as permutes work different across SVE and NEON.  So if
> you're happy with that I will do that.

You probably did so elsewhere some time ago, but what exactly are those
four instructions?  (pointers to specifications appreciated)

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug target/108583] [13 Regression] wrong code with vector division by uint16 at -O2
  2023-01-28 14:00 [Bug rtl-optimization/108583] New: [13 Regression] wrong code with vector division by uint16 at -O2 zsojka at seznam dot cz
                   ` (14 preceding siblings ...)
  2023-01-31 12:19 ` rguenther at suse dot de
@ 2023-01-31 13:35 ` tnfchris at gcc dot gnu.org
  2023-01-31 14:33 ` rguenther at suse dot de
                   ` (15 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2023-01-31 13:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583

--- Comment #15 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
> OK, hopefully I understand now.  Sorry for being slow.

Not at all, Sorry if it came across a bit cranky, it wasn't meant that way!

> If that's the condition we want to test for, it seems like something
> we need to check in the vectoriser rather than the hook.  And it's
> not something we can easily do in the vector form, since we don't
> track ranges for vectors (AFAIK).

Ack, that also tracks with what I tried before, we don't indeed track ranges
for vector ops. The general case can still be handled slightly better (I think)
but it doesn't become as clear of a win as this one.

> You probably did so elsewhere some time ago, but what exactly are those
> four instructions?  (pointers to specifications appreciated)

For NEON we use:
https://developer.arm.com/documentation/ddi0596/2021-12/SIMD-FP-Instructions/ADDHN--ADDHN2--Add-returning-High-Narrow-
https://developer.arm.com/documentation/ddi0596/2021-12/SIMD-FP-Instructions/UADDW--UADDW2--Unsigned-Add-Wide-

In that order, and for SVE we use two
https://developer.arm.com/documentation/ddi0602/2022-12/SVE-Instructions/ADDHNB--Add-narrow-high-part--bottom--

The difference between the SVE and the NEON versions is that the SVE version
does even/odd and the NEON the typical first N, last N element wise operation.

At the moment the vectorizer doesn't know anything at all about SVE's even and
odd instructions. But if we create B we should create T as well.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug target/108583] [13 Regression] wrong code with vector division by uint16 at -O2
  2023-01-28 14:00 [Bug rtl-optimization/108583] New: [13 Regression] wrong code with vector division by uint16 at -O2 zsojka at seznam dot cz
                   ` (15 preceding siblings ...)
  2023-01-31 13:35 ` tnfchris at gcc dot gnu.org
@ 2023-01-31 14:33 ` rguenther at suse dot de
  2023-01-31 14:45 ` rguenther at suse dot de
                   ` (14 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: rguenther at suse dot de @ 2023-01-31 14:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583

--- Comment #16 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 31 Jan 2023, tnfchris at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583
> 
> --- Comment #15 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
> > OK, hopefully I understand now.  Sorry for being slow.
> 
> Not at all, Sorry if it came across a bit cranky, it wasn't meant that way!
> 
> > If that's the condition we want to test for, it seems like something
> > we need to check in the vectoriser rather than the hook.  And it's
> > not something we can easily do in the vector form, since we don't
> > track ranges for vectors (AFAIK).
> 
> Ack, that also tracks with what I tried before, we don't indeed track ranges
> for vector ops. The general case can still be handled slightly better (I think)
> but it doesn't become as clear of a win as this one.
> 
> > You probably did so elsewhere some time ago, but what exactly are those
> > four instructions?  (pointers to specifications appreciated)
> 
> For NEON we use:
> https://developer.arm.com/documentation/ddi0596/2021-12/SIMD-FP-Instructions/ADDHN--ADDHN2--Add-returning-High-Narrow-

so thats a add + pack high

> https://developer.arm.com/documentation/ddi0596/2021-12/SIMD-FP-Instructions/UADDW--UADDW2--Unsigned-Add-Wide-

and that unpacks (zero-extends) the high/low part of one operand of an add

I wonder if we'd open-code the pack / unpack and use regular add whether
combine can synthesize uaddw and addhn?  The pack and unpack would be
vec_perms on GIMPLE (plus V_C_E).

> In that order, and for SVE we use two
> https://developer.arm.com/documentation/ddi0602/2022-12/SVE-Instructions/ADDHNB--Add-narrow-high-part--bottom--

probably similar.

So the difficulty here will be to decide whether that's in the end
better than what the pattern handling code does now, right?  Because
I think most targets will be able to do the above but lacking the
special adds it will be slower because of the extra packing/unpacking?

That said, can we possibly do just that costing (would be a first in
the pattern code I guess) with a target hook?  Or add optabs for
the addh operations so we can query support?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug target/108583] [13 Regression] wrong code with vector division by uint16 at -O2
  2023-01-28 14:00 [Bug rtl-optimization/108583] New: [13 Regression] wrong code with vector division by uint16 at -O2 zsojka at seznam dot cz
                   ` (16 preceding siblings ...)
  2023-01-31 14:33 ` rguenther at suse dot de
@ 2023-01-31 14:45 ` rguenther at suse dot de
  2023-01-31 15:01 ` tnfchris at gcc dot gnu.org
                   ` (13 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: rguenther at suse dot de @ 2023-01-31 14:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583

--- Comment #17 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 31 Jan 2023, tnfchris at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583
> 
> --- Comment #15 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
> > OK, hopefully I understand now.  Sorry for being slow.
> 
> Not at all, Sorry if it came across a bit cranky, it wasn't meant that way!
> 
> > If that's the condition we want to test for, it seems like something
> > we need to check in the vectoriser rather than the hook.  And it's
> > not something we can easily do in the vector form, since we don't
> > track ranges for vectors (AFAIK).
> 
> Ack, that also tracks with what I tried before, we don't indeed track ranges
> for vector ops. The general case can still be handled slightly better (I think)
> but it doesn't become as clear of a win as this one.
> 
> > You probably did so elsewhere some time ago, but what exactly are those
> > four instructions?  (pointers to specifications appreciated)
> 
> For NEON we use:
> https://developer.arm.com/documentation/ddi0596/2021-12/SIMD-FP-Instructions/ADDHN--ADDHN2--Add-returning-High-Narrow-

so thats a add + pack high

> https://developer.arm.com/documentation/ddi0596/2021-12/SIMD-FP-Instructions/UADDW--UADDW2--Unsigned-Add-Wide-

and that unpacks (zero-extends) the high/low part of one operand of an add

I wonder if we'd open-code the pack / unpack and use regular add whether
combine can synthesize uaddw and addhn?  The pack and unpack would be
vec_perms on GIMPLE (plus V_C_E).

> In that order, and for SVE we use two
> https://developer.arm.com/documentation/ddi0602/2022-12/SVE-Instructions/ADDHNB--Add-narrow-high-part--bottom--

probably similar.

So the difficulty here will be to decide whether that's in the end
better than what the pattern handling code does now, right?  Because
I think most targets will be able to do the above but lacking the
special adds it will be slower because of the extra packing/unpacking?

That said, can we possibly do just that costing (would be a first in
the pattern code I guess) with a target hook?  Or add optabs for
the addh operations so we can query support?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug target/108583] [13 Regression] wrong code with vector division by uint16 at -O2
  2023-01-28 14:00 [Bug rtl-optimization/108583] New: [13 Regression] wrong code with vector division by uint16 at -O2 zsojka at seznam dot cz
                   ` (17 preceding siblings ...)
  2023-01-31 14:45 ` rguenther at suse dot de
@ 2023-01-31 15:01 ` tnfchris at gcc dot gnu.org
  2023-02-01  7:29 ` rguenther at suse dot de
                   ` (12 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2023-01-31 15:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583

--- Comment #18 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
> > 
> > Ack, that also tracks with what I tried before, we don't indeed track ranges
> > for vector ops. The general case can still be handled slightly better (I think)
> > but it doesn't become as clear of a win as this one.
> > 
> > > You probably did so elsewhere some time ago, but what exactly are those
> > > four instructions?  (pointers to specifications appreciated)
> > 
> > For NEON we use:
> > https://developer.arm.com/documentation/ddi0596/2021-12/SIMD-FP-Instructions/ADDHN--ADDHN2--Add-returning-High-Narrow-
> 
> so thats a add + pack high
> 

Yes, though with no overflow, the addition is done in twice the precision of
the original type. So it's more a widening add + pack high which narrows it
back and zero extends.

> > https://developer.arm.com/documentation/ddi0596/2021-12/SIMD-FP-Instructions/UADDW--UADDW2--Unsigned-Add-Wide-
> 
> and that unpacks (zero-extends) the high/low part of one operand of an add
> 
> I wonder if we'd open-code the pack / unpack and use regular add whether
> combine can synthesize uaddw and addhn?  The pack and unpack would be
> vec_perms on GIMPLE (plus V_C_E).

I don't think so for addhn, because it wouldn't truncate the top bits, it
truncates the bottom bits.

The instruction does
    element1 = Elem[operand1, e, 2*esize];
    element2 = Elem[operand2, e, 2*esize];

So it widens on input. 

> 
> So the difficulty here will be to decide whether that's in the end
> better than what the pattern handling code does now, right?  Because
> I think most targets will be able to do the above but lacking the
> special adds it will be slower because of the extra packing/unpacking?
> 
> That said, can we possibly do just that costing (would be a first in
> the pattern code I guess) with a target hook?  Or add optabs for
> the addh operations so we can query support?

We could, the alternative wouldn't be correct for costing I think.. if we
generate *+ , vec_perm that's gonna be more expensive.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug target/108583] [13 Regression] wrong code with vector division by uint16 at -O2
  2023-01-28 14:00 [Bug rtl-optimization/108583] New: [13 Regression] wrong code with vector division by uint16 at -O2 zsojka at seznam dot cz
                   ` (18 preceding siblings ...)
  2023-01-31 15:01 ` tnfchris at gcc dot gnu.org
@ 2023-02-01  7:29 ` rguenther at suse dot de
  2023-02-01 16:22 ` tnfchris at gcc dot gnu.org
                   ` (11 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: rguenther at suse dot de @ 2023-02-01  7:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583

--- Comment #19 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 31 Jan 2023, tnfchris at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583
> 
> --- Comment #18 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
> > > 
> > > Ack, that also tracks with what I tried before, we don't indeed track ranges
> > > for vector ops. The general case can still be handled slightly better (I think)
> > > but it doesn't become as clear of a win as this one.
> > > 
> > > > You probably did so elsewhere some time ago, but what exactly are those
> > > > four instructions?  (pointers to specifications appreciated)
> > > 
> > > For NEON we use:
> > > https://developer.arm.com/documentation/ddi0596/2021-12/SIMD-FP-Instructions/ADDHN--ADDHN2--Add-returning-High-Narrow-
> > 
> > so thats a add + pack high
> > 
> 
> Yes, though with no overflow, the addition is done in twice the precision of
> the original type. So it's more a widening add + pack high which narrows it
> back and zero extends.
> 
> > > https://developer.arm.com/documentation/ddi0596/2021-12/SIMD-FP-Instructions/UADDW--UADDW2--Unsigned-Add-Wide-
> > 
> > and that unpacks (zero-extends) the high/low part of one operand of an add
> > 
> > I wonder if we'd open-code the pack / unpack and use regular add whether
> > combine can synthesize uaddw and addhn?  The pack and unpack would be
> > vec_perms on GIMPLE (plus V_C_E).
> 
> I don't think so for addhn, because it wouldn't truncate the top bits, it
> truncates the bottom bits.
> 
> The instruction does
>     element1 = Elem[operand1, e, 2*esize];
>     element2 = Elem[operand2, e, 2*esize];
> 
> So it widens on input. 

OK, so that's an ADD_HIGHPART_EXPR then?  Though the highpart of an
add is only a single bit, isn't it?  For scalar you'd use the
carry bit here and instructions like adc to consume it.  Is addhn
to do such thing on vectors?

When writing generic vector code is combine able to synthesize addhn
from widen, plus, pack-high?

As said in the old discussion I'm not opposed to adding new IFNs,
but I'd like to see useful building blocks (that ideally map to ISAs)
instead of IFN-for-complex-pattern-X

The alternative way was to improve division expansion in general
which is the can_div_special_by_const_p thing, but we do not seem
to be able to capture the requirements correctly here.

> > So the difficulty here will be to decide whether that's in the end
> > better than what the pattern handling code does now, right?  Because
> > I think most targets will be able to do the above but lacking the
> > special adds it will be slower because of the extra packing/unpacking?
> > 
> > That said, can we possibly do just that costing (would be a first in
> > the pattern code I guess) with a target hook?  Or add optabs for
> > the addh operations so we can query support?
> 
> We could, the alternative wouldn't be correct for costing I think.. if we
> generate *+ , vec_perm that's gonna be more expensive.

Well, the target cost model can always detect such patterns ...

But sure, using the actual ISA is preferable for costing and also to
avoid "breaking" the combination by later "optimization".  OTOH at least
some basic constant folding for all such ISA IFNs is required to
avoid regressing cases where complete unrolling later allows constant
evaluation but first vectorizing breaking that.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug target/108583] [13 Regression] wrong code with vector division by uint16 at -O2
  2023-01-28 14:00 [Bug rtl-optimization/108583] New: [13 Regression] wrong code with vector division by uint16 at -O2 zsojka at seznam dot cz
                   ` (19 preceding siblings ...)
  2023-02-01  7:29 ` rguenther at suse dot de
@ 2023-02-01 16:22 ` tnfchris at gcc dot gnu.org
  2023-02-02  8:03 ` tnfchris at gcc dot gnu.org
                   ` (10 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2023-02-01 16:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583

--- Comment #20 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
> > I don't think so for addhn, because it wouldn't truncate the top bits, it
> > truncates the bottom bits.
> > 
> > The instruction does
> >     element1 = Elem[operand1, e, 2*esize];
> >     element2 = Elem[operand2, e, 2*esize];
> > 
> > So it widens on input. 
> 
> OK, so that's an ADD_HIGHPART_EXPR then?  Though the highpart of an
> add is only a single bit, isn't it?  For scalar you'd use the
> carry bit here and instructions like adc to consume it.  Is addhn
> to do such thing on vectors?

Sorry, it's not but that's cause I read the docs wrong. This instruction
doesn't widen on addition but narrows on result.  The esize for the
instruction is determined output not the input.

To unconfuse everyone lets just explain what this is doing:

The optimization is to try to optimize unsigned division by a mask found by
setting all bit in half the size of the input.

i.e. for a divide of a short, the division needs to be by 0xff.

This works by decomposing the unsigned division of x / 0xff into
(x + ((x + 257) >> 8)) >> 8.  This is correct for all ranges as long
as the operations are done in at least double the precision of x.

If the operations are done in the same precision as x, then the only
valid ranged for this are those where x + 257 does not cause an overflow.

This is why the maximum range of a widened operation is fine, because
0xFE01 + 0x101 = 0xFF02 and so does not overflow.  In this case we can avoid
the widening and do the operation in a smaller precision.

The two vector instructions we're using does this, i.e.
addhn == (x + 257) >> 8) and uaddw == addhn_res + x.  Finally we emit a >> 8
in the end.  This shift later gets removed because a widening operation
will do the same sequence for the _lo and _hi pairs.  since we are doing two
logical right shifts we can just permute the values out.

Hope this clear it up to you both.

> 
> When writing generic vector code is combine able to synthesize addhn
> from widen, plus, pack-high?

No, it's currently unspecs. We could recognize add, shift and narrow as
it though.

> 
> But sure, using the actual ISA is preferable for costing and also to
> avoid "breaking" the combination by later "optimization".  OTOH at least
> some basic constant folding for all such ISA IFNs is required to
> avoid regressing cases where complete unrolling later allows constant
> evaluation but first vectorizing breaking that.

So I suppose one way to open code this in the vectorizer, would be to use
the hook to tell the vectorizer to decompose the operation that's within
the valid range to (x + ((x + 257) >> 8)) >> 8 (without any intermediate
promotions), and fudge the costing for it and use combine to recognize the
sequence? it's 4 instruction sequence so should be able to.

I also spent the day wondering if I could come up with a sequence that's
correct in general and that I can use combine to optimize further in case
a widening operation is found. Unfortunately while I can create a sequence:

uint32x4_t
foo (uint32x4_t v)
{
  uint32x4_t cst = vreinterpretq_u32_u16 (vdupq_n_u16 (0x1));
  uint64x2_t l1 = vaddl_u32 (vget_low_u32 (cst), vget_low_u32 (v));
  uint64x2_t l2 = vaddl_high_u32 (cst, v);
  uint32x2_t t1 = vshrn_n_u64 (l1, 16);
  uint32x4_t t2 = vshrn_high_n_u64 (t1, l2, 16);
  uint64x2_t l3 = vaddl_u32 (vget_low_u32 (t2), vget_low_u32 (v));
  uint64x2_t l4 = vaddl_high_u32 (t2, v);
  uint32x2_t r1 = vshrn_n_u64 (l3, 16);
  uint32x4_t r2 = vshrn_high_n_u64 (r1, l4, 16);
  return r2;
}

But this isn't more efficient than what's there before, though it is easier to
match in combine.  Part of the problem is that if the addition produces 33 bits
of significance then you do need a shift and can't remove them with permutes.

So the core of the optimization relies on the range of the inputs, so it does
need
to be done pre-vectorization.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug target/108583] [13 Regression] wrong code with vector division by uint16 at -O2
  2023-01-28 14:00 [Bug rtl-optimization/108583] New: [13 Regression] wrong code with vector division by uint16 at -O2 zsojka at seznam dot cz
                   ` (20 preceding siblings ...)
  2023-02-01 16:22 ` tnfchris at gcc dot gnu.org
@ 2023-02-02  8:03 ` tnfchris at gcc dot gnu.org
  2023-02-02  8:50 ` rguenther at suse dot de
                   ` (9 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2023-02-02  8:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583

--- Comment #21 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
> 
> OK, so that's an ADD_HIGHPART_EXPR then?  Though the highpart of an
> add is only a single bit, isn't it?  For scalar you'd use the
> carry bit here and instructions like adc to consume it.  Is addhn
> to do such thing on vectors?
> 

So I think this is the only new IFN we'd need. basically we only need one
representing (a + b) >> n, for certain values of n we have a single instruction
for others we reject it.

So I think we can just do two of these back to back and that should work.  If
this is ok I can implement it today.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug target/108583] [13 Regression] wrong code with vector division by uint16 at -O2
  2023-01-28 14:00 [Bug rtl-optimization/108583] New: [13 Regression] wrong code with vector division by uint16 at -O2 zsojka at seznam dot cz
                   ` (21 preceding siblings ...)
  2023-02-02  8:03 ` tnfchris at gcc dot gnu.org
@ 2023-02-02  8:50 ` rguenther at suse dot de
  2023-02-02  8:55 ` tnfchris at gcc dot gnu.org
                   ` (8 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: rguenther at suse dot de @ 2023-02-02  8:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583

--- Comment #22 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 2 Feb 2023, tnfchris at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583
> 
> --- Comment #21 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
> > 
> > OK, so that's an ADD_HIGHPART_EXPR then?  Though the highpart of an
> > add is only a single bit, isn't it?  For scalar you'd use the
> > carry bit here and instructions like adc to consume it.  Is addhn
> > to do such thing on vectors?
> > 
> 
> So I think this is the only new IFN we'd need. basically we only need one
> representing (a + b) >> n, for certain values of n we have a single instruction
> for others we reject it.

But the ISA only implements n == width(a), so a true "highpart add".
So I'd add

OPTAB_D (sadd_highpart_optab, "sadd$a3_highpart")
OPTAB_D (uadd_highpart_optab, "uadd$a3_highpart")

only?  That is it's QImode + QImode -> QImode with the carry bit in
the result (sign or zero-extended)?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug target/108583] [13 Regression] wrong code with vector division by uint16 at -O2
  2023-01-28 14:00 [Bug rtl-optimization/108583] New: [13 Regression] wrong code with vector division by uint16 at -O2 zsojka at seznam dot cz
                   ` (22 preceding siblings ...)
  2023-02-02  8:50 ` rguenther at suse dot de
@ 2023-02-02  8:55 ` tnfchris at gcc dot gnu.org
  2023-02-08 13:57 ` tnfchris at gcc dot gnu.org
                   ` (7 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2023-02-02  8:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583

--- Comment #23 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to rguenther@suse.de from comment #22)
> On Thu, 2 Feb 2023, tnfchris at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583
> > 
> > --- Comment #21 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
> > > 
> > > OK, so that's an ADD_HIGHPART_EXPR then?  Though the highpart of an
> > > add is only a single bit, isn't it?  For scalar you'd use the
> > > carry bit here and instructions like adc to consume it.  Is addhn
> > > to do such thing on vectors?
> > > 
> > 
> > So I think this is the only new IFN we'd need. basically we only need one
> > representing (a + b) >> n, for certain values of n we have a single instruction
> > for others we reject it.
> 
> But the ISA only implements n == width(a), so a true "highpart add".
> So I'd add
> 
> OPTAB_D (sadd_highpart_optab, "sadd$a3_highpart")
> OPTAB_D (uadd_highpart_optab, "uadd$a3_highpart")
> 
> only?  That is it's QImode + QImode -> QImode with the carry bit in
> the result (sign or zero-extended)?

Sure that works I think, I'll do that then.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug target/108583] [13 Regression] wrong code with vector division by uint16 at -O2
  2023-01-28 14:00 [Bug rtl-optimization/108583] New: [13 Regression] wrong code with vector division by uint16 at -O2 zsojka at seznam dot cz
                   ` (23 preceding siblings ...)
  2023-02-02  8:55 ` tnfchris at gcc dot gnu.org
@ 2023-02-08 13:57 ` tnfchris at gcc dot gnu.org
  2023-02-09  7:41 ` rguenther at suse dot de
                   ` (6 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2023-02-08 13:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583

--- Comment #24 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
> Sure that works I think, I'll do that then.

Just to check, I'm regtesting the patch, I assume you want me to revert the
hook as well right? Since nothing will be using it.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug target/108583] [13 Regression] wrong code with vector division by uint16 at -O2
  2023-01-28 14:00 [Bug rtl-optimization/108583] New: [13 Regression] wrong code with vector division by uint16 at -O2 zsojka at seznam dot cz
                   ` (24 preceding siblings ...)
  2023-02-08 13:57 ` tnfchris at gcc dot gnu.org
@ 2023-02-09  7:41 ` rguenther at suse dot de
  2023-03-12 18:43 ` cvs-commit at gcc dot gnu.org
                   ` (5 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: rguenther at suse dot de @ 2023-02-09  7:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583

--- Comment #25 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 8 Feb 2023, tnfchris at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583
> 
> --- Comment #24 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
> > Sure that works I think, I'll do that then.
> 
> Just to check, I'm regtesting the patch, I assume you want me to revert the
> hook as well right? Since nothing will be using it.

Sure

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug target/108583] [13 Regression] wrong code with vector division by uint16 at -O2
  2023-01-28 14:00 [Bug rtl-optimization/108583] New: [13 Regression] wrong code with vector division by uint16 at -O2 zsojka at seznam dot cz
                   ` (25 preceding siblings ...)
  2023-02-09  7:41 ` rguenther at suse dot de
@ 2023-03-12 18:43 ` cvs-commit at gcc dot gnu.org
  2023-03-12 18:43 ` cvs-commit at gcc dot gnu.org
                   ` (4 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-03-12 18:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583

--- Comment #26 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tamar Christina <tnfchris@gcc.gnu.org>:

https://gcc.gnu.org/g:2246d576f922bae3629da0fe1dbfcc6ff06769ad

commit r13-6616-g2246d576f922bae3629da0fe1dbfcc6ff06769ad
Author: Tamar Christina <tamar.christina@arm.com>
Date:   Sun Mar 12 18:39:33 2023 +0000

    middle-end: Revert can_special_div_by_const changes [PR108583]

    This reverts the changes for the CAN_SPECIAL_DIV_BY_CONST hook.

    gcc/ChangeLog:

            PR target/108583
            * doc/tm.texi (TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST): Remove.
            * doc/tm.texi.in: Likewise.
            * explow.cc (round_push, align_dynamic_address): Revert previous
patch.
            * expmed.cc (expand_divmod): Likewise.
            * expmed.h (expand_divmod): Likewise.
            * expr.cc (force_operand, expand_expr_divmod): Likewise.
            * optabs.cc (expand_doubleword_mod, expand_doubleword_divmod):
Likewise.
            * target.def (can_special_div_by_const): Remove.
            * target.h: Remove tree-core.h include
            * targhooks.cc (default_can_special_div_by_const): Remove.
            * targhooks.h (default_can_special_div_by_const): Remove.
            * tree-vect-generic.cc (expand_vector_operation): Remove hook.
            * tree-vect-patterns.cc (vect_recog_divmod_pattern): Remove hook.
            * tree-vect-stmts.cc (vectorizable_operation): Remove hook.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug target/108583] [13 Regression] wrong code with vector division by uint16 at -O2
  2023-01-28 14:00 [Bug rtl-optimization/108583] New: [13 Regression] wrong code with vector division by uint16 at -O2 zsojka at seznam dot cz
                   ` (26 preceding siblings ...)
  2023-03-12 18:43 ` cvs-commit at gcc dot gnu.org
@ 2023-03-12 18:43 ` cvs-commit at gcc dot gnu.org
  2023-03-12 18:43 ` cvs-commit at gcc dot gnu.org
                   ` (3 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-03-12 18:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583

--- Comment #27 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tamar Christina <tnfchris@gcc.gnu.org>:

https://gcc.gnu.org/g:03c6ba86757f0684c5419c90651106900f5ecb5a

commit r13-6617-g03c6ba86757f0684c5419c90651106900f5ecb5a
Author: Tamar Christina <tamar.christina@arm.com>
Date:   Sun Mar 12 18:40:12 2023 +0000

    ranger: Add range-ops for widen addition and widen multiplication
[PR108583]

    This adds range-ops for widening addition and widening multiplication.

    I couldn't figure out how to write a test for this.  It looks like there
are
    self tests but not a way to write standalone ones?  I did create testcases
in
    the patch 3/4 which tests the end result.

    gcc/ChangeLog:

            PR target/108583
            * gimple-range-op.h (gimple_range_op_handler): Add
maybe_non_standard.
            * gimple-range-op.cc
(gimple_range_op_handler::gimple_range_op_handler):
            Use it.
            (gimple_range_op_handler::maybe_non_standard): New.
            * range-op.cc (class operator_widen_plus_signed,
            operator_widen_plus_signed::wi_fold, class
operator_widen_plus_unsigned,
            operator_widen_plus_unsigned::wi_fold, class
operator_widen_mult_signed,
            operator_widen_mult_signed::wi_fold, class
operator_widen_mult_unsigned,
            operator_widen_mult_unsigned::wi_fold,
            ptr_op_widen_mult_signed, ptr_op_widen_mult_unsigned,
            ptr_op_widen_plus_signed, ptr_op_widen_plus_unsigned): New.
            * range-op.h (ptr_op_widen_mult_signed, ptr_op_widen_mult_unsigned,
            ptr_op_widen_plus_signed, ptr_op_widen_plus_unsigned): New

    Co-Authored-By: Andrew MacLeod <amacleod@redhat.com>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug target/108583] [13 Regression] wrong code with vector division by uint16 at -O2
  2023-01-28 14:00 [Bug rtl-optimization/108583] New: [13 Regression] wrong code with vector division by uint16 at -O2 zsojka at seznam dot cz
                   ` (27 preceding siblings ...)
  2023-03-12 18:43 ` cvs-commit at gcc dot gnu.org
@ 2023-03-12 18:43 ` cvs-commit at gcc dot gnu.org
  2023-03-12 18:44 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-03-12 18:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583

--- Comment #28 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tamar Christina <tnfchris@gcc.gnu.org>:

https://gcc.gnu.org/g:0b3c630fcc44063a61f6131af48a4171b1de2b37

commit r13-6618-g0b3c630fcc44063a61f6131af48a4171b1de2b37
Author: Tamar Christina <tamar.christina@arm.com>
Date:   Sun Mar 12 18:40:50 2023 +0000

    middle-end: don't form FMAs when multiplication is not single use.
[PR108583]

    The testcase

    typedef unsigned int vec __attribute__((vector_size(32)));
    vec
    f3 (vec a, vec b, vec c)
    {
      vec d = a * b;
      return d + ((c + d) >> 1);
    }

    shows a case where we don't want to form an FMA due to the MUL not being
single
    use.  In this case to form an FMA we have to redo the MUL as well as we no
    longer have it to share.

    As such making an FMA here would be a de-optimization.

    gcc/ChangeLog:

            PR target/108583
            * tree-ssa-math-opts.cc (convert_mult_to_fma): Inhibit FMA in case
not
            single use.

    gcc/testsuite/ChangeLog:

            PR target/108583
            * gcc.dg/mla_1.c: New test.

    Co-Authored-By: Richard Sandiford <richard.sandiford@arm.com>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug target/108583] [13 Regression] wrong code with vector division by uint16 at -O2
  2023-01-28 14:00 [Bug rtl-optimization/108583] New: [13 Regression] wrong code with vector division by uint16 at -O2 zsojka at seznam dot cz
                   ` (28 preceding siblings ...)
  2023-03-12 18:43 ` cvs-commit at gcc dot gnu.org
@ 2023-03-12 18:44 ` cvs-commit at gcc dot gnu.org
  2023-03-12 18:44 ` cvs-commit at gcc dot gnu.org
  2023-03-12 18:45 ` tnfchris at gcc dot gnu.org
  31 siblings, 0 replies; 33+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-03-12 18:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583

--- Comment #29 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tamar Christina <tnfchris@gcc.gnu.org>:

https://gcc.gnu.org/g:81fd62d1378b7ddc1fa0967cbddcdcdcdd2d8d8c

commit r13-6619-g81fd62d1378b7ddc1fa0967cbddcdcdcdd2d8d8c
Author: Tamar Christina <tamar.christina@arm.com>
Date:   Sun Mar 12 18:42:04 2023 +0000

    middle-end: Implement preferred_div_as_shifts_over_mult [PR108583]

    This now implements a hook
    preferred_div_as_shifts_over_mult that indicates whether a target prefers
that
    the vectorizer decomposes division as shifts rather than multiplication
when
    possible.

    In order to be able to use this we need to check whether the current
precision
    has enough bits to do the operation without any of the additions
overflowing.

    We use range information to determine this and only do the operation if
we're
    sure am overflow won't occur. This now uses ranger to do this range check.

    This seems to work better than vect_get_range_info which uses range_query,
but I
    have not switched the interface of vect_get_range_info over in this PR fix.

    As Andy said before initializing a ranger instance is cheap but not free,
and if
    the intention is to call it often during a pass it should be instantiated
at
    pass startup and passed along to the places that need it.  This is a big
    refactoring and doesn't seem right to do in this PR.  But we should in GCC
14.

    Currently we only instantiate it after a long series of much cheaper
checks.

    gcc/ChangeLog:

            PR target/108583
            * target.def (preferred_div_as_shifts_over_mult): New.
            * doc/tm.texi.in: Document it.
            * doc/tm.texi: Regenerate.
            * targhooks.cc (default_preferred_div_as_shifts_over_mult): New.
            * targhooks.h (default_preferred_div_as_shifts_over_mult): New.
            * tree-vect-patterns.cc (vect_recog_divmod_pattern): Use it.

    gcc/testsuite/ChangeLog:

            PR target/108583
            * gcc.dg/vect/vect-div-bitmask-4.c: New test.
            * gcc.dg/vect/vect-div-bitmask-5.c: New test.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug target/108583] [13 Regression] wrong code with vector division by uint16 at -O2
  2023-01-28 14:00 [Bug rtl-optimization/108583] New: [13 Regression] wrong code with vector division by uint16 at -O2 zsojka at seznam dot cz
                   ` (29 preceding siblings ...)
  2023-03-12 18:44 ` cvs-commit at gcc dot gnu.org
@ 2023-03-12 18:44 ` cvs-commit at gcc dot gnu.org
  2023-03-12 18:45 ` tnfchris at gcc dot gnu.org
  31 siblings, 0 replies; 33+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-03-12 18:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583

--- Comment #30 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tamar Christina <tnfchris@gcc.gnu.org>:

https://gcc.gnu.org/g:f23dc726875c26f2c38dfded453aa9beba0b9be9

commit r13-6620-gf23dc726875c26f2c38dfded453aa9beba0b9be9
Author: Tamar Christina <tamar.christina@arm.com>
Date:   Sun Mar 12 18:42:59 2023 +0000

    AArch64: Update div-bitmask to implement new optab instead of target hook
[PR108583]

    This replaces the custom division hook with just an implementation through
    add_highpart.  For NEON we implement the add highpart (Addition +
extraction of
    the upper highpart of the register in the same precision) as ADD + LSR.

    This representation allows us to easily optimize the sequence using
existing
    sequences. This gets us a pretty decent sequence using SRA:

            umull   v1.8h, v0.8b, v3.8b
            umull2  v0.8h, v0.16b, v3.16b
            add     v5.8h, v1.8h, v2.8h
            add     v4.8h, v0.8h, v2.8h
            usra    v1.8h, v5.8h, 8
            usra    v0.8h, v4.8h, 8
            uzp2    v1.16b, v1.16b, v0.16b

    To get the most optimal sequence however we match (a + ((b + c) >> n))
where n
    is half the precision of the mode of the operation into addhn + uaddw which
is
    a general good optimization on its own and gets us back to:

    .L4:
            ldr     q0, [x3]
            umull   v1.8h, v0.8b, v5.8b
            umull2  v0.8h, v0.16b, v5.16b
            addhn   v3.8b, v1.8h, v4.8h
            addhn   v2.8b, v0.8h, v4.8h
            uaddw   v1.8h, v1.8h, v3.8b
            uaddw   v0.8h, v0.8h, v2.8b
            uzp2    v1.16b, v1.16b, v0.16b
            str     q1, [x3], 16
            cmp     x3, x4
            bne     .L4

    For SVE2 we optimize the initial sequence to the same ADD + LSR which gets
us:

    .L3:
            ld1b    z0.h, p0/z, [x0, x3]
            mul     z0.h, p1/m, z0.h, z2.h
            add     z1.h, z0.h, z3.h
            usra    z0.h, z1.h, #8
            lsr     z0.h, z0.h, #8
            st1b    z0.h, p0, [x0, x3]
            inch    x3
            whilelo p0.h, w3, w2
            b.any   .L3
    .L1:
            ret

    and to get the most optimal sequence I match (a + b) >> n (same constraint
on n)
    to addhnb which gets us to:

    .L3:
            ld1b    z0.h, p0/z, [x0, x3]
            mul     z0.h, p1/m, z0.h, z2.h
            addhnb  z1.b, z0.h, z3.h
            addhnb  z0.b, z0.h, z1.h
            st1b    z0.h, p0, [x0, x3]
            inch    x3
            whilelo p0.h, w3, w2
            b.any   .L3

    There are multiple RTL representations possible for these optimizations, I
did
    not represent them using a zero_extend because we seem very inconsistent in
this
    in the backend.  Since they are unspecs we won't match them from vector ops
    anyway. I figured maintainers would prefer this, but my maintainer ouija
board
    is still out for repairs :)

    There are no new test as new correctness tests were added to the mid-end
and
    the existing codegen tests for this already exist.

    gcc/ChangeLog:

            PR target/108583
            * config/aarch64/aarch64-simd.md (@aarch64_bitmask_udiv<mode>3):
Remove.
            (*bitmask_shift_plus<mode>): New.
            * config/aarch64/aarch64-sve2.md (*bitmask_shift_plus<mode>): New.
            (@aarch64_bitmask_udiv<mode>3): Remove.
            * config/aarch64/aarch64.cc
            (aarch64_vectorize_can_special_div_by_constant,
            TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST): Removed.
            (TARGET_VECTORIZE_PREFERRED_DIV_AS_SHIFTS_OVER_MULT,
            aarch64_vectorize_preferred_div_as_shifts_over_mult): New.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug target/108583] [13 Regression] wrong code with vector division by uint16 at -O2
  2023-01-28 14:00 [Bug rtl-optimization/108583] New: [13 Regression] wrong code with vector division by uint16 at -O2 zsojka at seznam dot cz
                   ` (30 preceding siblings ...)
  2023-03-12 18:44 ` cvs-commit at gcc dot gnu.org
@ 2023-03-12 18:45 ` tnfchris at gcc dot gnu.org
  31 siblings, 0 replies; 33+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2023-03-12 18:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583

Tamar Christina <tnfchris at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #31 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
Fixed, thanks for the report.

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2023-03-12 18:45 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-28 14:00 [Bug rtl-optimization/108583] New: [13 Regression] wrong code with vector division by uint16 at -O2 zsojka at seznam dot cz
2023-01-30  4:19 ` [Bug target/108583] " pinskia at gcc dot gnu.org
2023-01-30  4:22 ` pinskia at gcc dot gnu.org
2023-01-30  8:31 ` rguenth at gcc dot gnu.org
2023-01-30 14:20 ` tnfchris at gcc dot gnu.org
2023-01-30 14:52 ` rguenther at suse dot de
2023-01-30 15:01 ` tnfchris at gcc dot gnu.org
2023-01-30 16:52 ` rsandifo at gcc dot gnu.org
2023-01-30 17:04 ` tnfchris at gcc dot gnu.org
2023-01-31 10:31 ` rguenth at gcc dot gnu.org
2023-01-31 11:01 ` rsandifo at gcc dot gnu.org
2023-01-31 11:39 ` tnfchris at gcc dot gnu.org
2023-01-31 11:44 ` rguenther at suse dot de
2023-01-31 11:58 ` tnfchris at gcc dot gnu.org
2023-01-31 12:03 ` rsandifo at gcc dot gnu.org
2023-01-31 12:19 ` rguenther at suse dot de
2023-01-31 13:35 ` tnfchris at gcc dot gnu.org
2023-01-31 14:33 ` rguenther at suse dot de
2023-01-31 14:45 ` rguenther at suse dot de
2023-01-31 15:01 ` tnfchris at gcc dot gnu.org
2023-02-01  7:29 ` rguenther at suse dot de
2023-02-01 16:22 ` tnfchris at gcc dot gnu.org
2023-02-02  8:03 ` tnfchris at gcc dot gnu.org
2023-02-02  8:50 ` rguenther at suse dot de
2023-02-02  8:55 ` tnfchris at gcc dot gnu.org
2023-02-08 13:57 ` tnfchris at gcc dot gnu.org
2023-02-09  7:41 ` rguenther at suse dot de
2023-03-12 18:43 ` cvs-commit at gcc dot gnu.org
2023-03-12 18:43 ` cvs-commit at gcc dot gnu.org
2023-03-12 18:43 ` cvs-commit at gcc dot gnu.org
2023-03-12 18:44 ` cvs-commit at gcc dot gnu.org
2023-03-12 18:44 ` cvs-commit at gcc dot gnu.org
2023-03-12 18:45 ` tnfchris at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).