[Bug target/85730] complex code for modifying lowest byte in a 4-byte vector

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug target/85730] complex code for modifying lowest byte in a 4-byte vector
       [not found] <bug-85730-4@http.gcc.gnu.org/bugzilla/>
@ 2021-10-04  3:02 ` gabravier at gmail dot com
  2021-10-04  8:47 ` zsojka at seznam dot cz
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 9+ messages in thread
From: gabravier at gmail dot com @ 2021-10-04  3:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85730

Gabriel Ravier <gabravier at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |gabravier at gmail dot com

--- Comment #2 from Gabriel Ravier <gabravier at gmail dot com> ---
Seems like they've all got identical code generation over here since GCC 7, and
the GCC 6 code generation is just very bad for bar (although GCC 7 also changed
bar and baz's code generation, which previously was as bar's is in the
description)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug target/85730] complex code for modifying lowest byte in a 4-byte vector
       [not found] <bug-85730-4@http.gcc.gnu.org/bugzilla/>
  2021-10-04  3:02 ` [Bug target/85730] complex code for modifying lowest byte in a 4-byte vector gabravier at gmail dot com
@ 2021-10-04  8:47 ` zsojka at seznam dot cz
  2021-10-04  9:04 ` gabravier at gmail dot com
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 9+ messages in thread
From: zsojka at seznam dot cz @ 2021-10-04  8:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85730

--- Comment #3 from Zdenek Sojka <zsojka at seznam dot cz> ---
(In reply to Gabriel Ravier from comment #2)
> Seems like they've all got identical code generation over here since GCC 7,
> and the GCC 6 code generation is just very bad for bar (although GCC 7 also
> changed bar and baz's code generation, which previously was as bar's is in
> the description)

I've got a different observation (using the branch HEADs, at -O3):
gcc-6 - ICEs
gcc-7, gcc-8, gcc-9 - the same code generated as in comment #1
gcc-10, gcc-11, gcc-12 - all functions are the same, bar() now has the same
assembly as foo() and baz() - but is that the optimal form?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug target/85730] complex code for modifying lowest byte in a 4-byte vector
       [not found] <bug-85730-4@http.gcc.gnu.org/bugzilla/>
  2021-10-04  3:02 ` [Bug target/85730] complex code for modifying lowest byte in a 4-byte vector gabravier at gmail dot com
  2021-10-04  8:47 ` zsojka at seznam dot cz
@ 2021-10-04  9:04 ` gabravier at gmail dot com
  2021-10-04  9:54 ` rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 9+ messages in thread
From: gabravier at gmail dot com @ 2021-10-04  9:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85730

--- Comment #4 from Gabriel Ravier <gabravier at gmail dot com> ---
That's a bit odd, really - I'm just using the latest released sub-versions of
each of these (except for GCC 6 since I only have access to it through Godbolt
which doesn't have GCC 6.5), i.e. GCC 6.4, 7.5, 8.5, 9.4, 10.3, 11.2 and trunk,
I wouldn't expect it to impact this stuff that much.

Though I do realize now I had messed up my comment slightly: when saying "GCC 7
also changed bar and baz's code generation" I meant "foo and baz's code
generation".

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug target/85730] complex code for modifying lowest byte in a 4-byte vector
       [not found] <bug-85730-4@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2021-10-04  9:04 ` gabravier at gmail dot com
@ 2021-10-04  9:54 ` rguenth at gcc dot gnu.org
  2021-10-04 11:31 ` ubizjak at gmail dot com
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-10-04  9:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85730

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |segher at gcc dot gnu.org

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
The GIMPLE IL is now using BIT_INSERT_EXPRs consistently for all cases and
combine does

Trying 8 -> 11:
    8: {r90:SI=r89:SI<<0x1;clobber flags:CC;}
      REG_DEAD r89:SI
      REG_UNUSED flags:CC
   11: strict_low_part(r92:V4QI#0)=r90:SI#0
      REG_DEAD r90:SI
Failed to match this instruction:
(set (strict_low_part (subreg:QI (reg:V4QI 92 [ v ]) 0))
    (ashift:QI (subreg:QI (reg:SI 89 [ v ]) 0)
        (const_int 1 [0x1])))

where it fails to try (add:QI (subreg...) (subreg...)) instead of the shift by
1.

Not sure whether targets should have a special-case pattern here or whether
that's for combine to un-canonicalize it?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug target/85730] complex code for modifying lowest byte in a 4-byte vector
       [not found] <bug-85730-4@http.gcc.gnu.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2021-10-04  9:54 ` rguenth at gcc dot gnu.org
@ 2021-10-04 11:31 ` ubizjak at gmail dot com
  2021-10-04 16:13 ` segher at gcc dot gnu.org
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 9+ messages in thread
From: ubizjak at gmail dot com @ 2021-10-04 11:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85730

--- Comment #6 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Richard Biener from comment #5)
> The GIMPLE IL is now using BIT_INSERT_EXPRs consistently for all cases and
> combine does
> 
> Trying 8 -> 11:
>     8: {r90:SI=r89:SI<<0x1;clobber flags:CC;}
>       REG_DEAD r89:SI
>       REG_UNUSED flags:CC
>    11: strict_low_part(r92:V4QI#0)=r90:SI#0
>       REG_DEAD r90:SI
> Failed to match this instruction:
> (set (strict_low_part (subreg:QI (reg:V4QI 92 [ v ]) 0))
>     (ashift:QI (subreg:QI (reg:SI 89 [ v ]) 0)
>         (const_int 1 [0x1])))
> 
> where it fails to try (add:QI (subreg...) (subreg...)) instead of the shift
> by 1.
> 
> Not sure whether targets should have a special-case pattern here or whether
> that's for combine to un-canonicalize it?

We do have this pattern in i386.md, but please see the FIXME:

(define_insn "*ashl<mode>3_1_slp"
  [(set (strict_low_part (match_operand:SWI12 0 "register_operand" "+<r>"))
        (ashift:SWI12 (match_operand:SWI12 1 "register_operand" "0")
                      (match_operand:QI 2 "nonmemory_operand" "cI")))
   (clobber (reg:CC FLAGS_REG))]
  "(!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun))
   /* FIXME: without this LRA can't reload this pattern, see PR82524.  */
   && rtx_equal_p (operands[0], operands[1])"

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug target/85730] complex code for modifying lowest byte in a 4-byte vector
       [not found] <bug-85730-4@http.gcc.gnu.org/bugzilla/>
                   ` (4 preceding siblings ...)
  2021-10-04 11:31 ` ubizjak at gmail dot com
@ 2021-10-04 16:13 ` segher at gcc dot gnu.org
  2021-10-07 11:51 ` ubizjak at gmail dot com
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 9+ messages in thread
From: segher at gcc dot gnu.org @ 2021-10-04 16:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85730

--- Comment #7 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #5)
> Not sure whether targets should have a special-case pattern here or whether
> that's for combine to un-canonicalize it?

Is the shift defined anywhere as the canonical form?

We already get a shift at expand time, for e.g.

  long f(long a) { return a+a; }

I cannot find the code that does this easily, it is quite a maze :-)  There
is code for changing a multiplication by a power of two (which we have in
Gimple already) into a shift, but then there should be something changing
the addition with self into a multiplication, and I cannot find that either.

Combine should absolutely not un-canonicalise it.  It could try multiple
ways of writing this, but is it important enough to allow this, to justify
the (potential) combinatorial explosion this causes?

If we want combine to try many ways of writing something (not a bad idea an
sich btw, I support it), we need ways to battle such an explosion, and esp.
to make the amount of garbage RTL created manageable (it is not, currently).
Currently combine has to create GC'ed RTL to recog() it.  Maybe we could
have some "GC stuff created between points X and Y" interface?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug target/85730] complex code for modifying lowest byte in a 4-byte vector
       [not found] <bug-85730-4@http.gcc.gnu.org/bugzilla/>
                   ` (5 preceding siblings ...)
  2021-10-04 16:13 ` segher at gcc dot gnu.org
@ 2021-10-07 11:51 ` ubizjak at gmail dot com
  2021-10-08  9:36 ` ubizjak at gmail dot com
  2021-10-12 16:21 ` cvs-commit at gcc dot gnu.org
  8 siblings, 0 replies; 9+ messages in thread
From: ubizjak at gmail dot com @ 2021-10-07 11:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85730

--- Comment #8 from Uroš Bizjak <ubizjak at gmail dot com> ---
Created attachment 51564
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51564&action=edit
Prototype patch

Attached patch works around reload problem and creates:

        movl    %edi, %eax
        movb    %dil, %al
        addb    %al, %al
        ret

for all cases. Please note that some follow-up pass is needed to remove
"movb %dil, %al", since preceding instruction already moves full value from
%edi to %eax, so partial move is not needed.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug target/85730] complex code for modifying lowest byte in a 4-byte vector
       [not found] <bug-85730-4@http.gcc.gnu.org/bugzilla/>
                   ` (6 preceding siblings ...)
  2021-10-07 11:51 ` ubizjak at gmail dot com
@ 2021-10-08  9:36 ` ubizjak at gmail dot com
  2021-10-12 16:21 ` cvs-commit at gcc dot gnu.org
  8 siblings, 0 replies; 9+ messages in thread
From: ubizjak at gmail dot com @ 2021-10-08  9:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85730

--- Comment #9 from Uroš Bizjak <ubizjak at gmail dot com> ---
An interesting observation with the following testcase:

--cut here--
typedef char V __attribute__((vector_size(4)));

struct S
{
  char val;
  char pad1;
  char pad2;
  char pad3;
};

V
foo (V v)
{
  v[0] <<= 3;

  return v;
}

struct S
bar (struct S a)
{
  a.val <<= 3;

  return a;
}
--cut here--

gcc -O2:

foo:
        movsbl  %dil, %edx
        movl    %edi, %eax
        sall    $3, %edx
        movb    %dl, %al
        ret

bar:
        movl    %edi, %eax
        salb    $3, %al
        ret

So, the compiler is able to produce optimal code with equivalent struct
description, but something (in middle-end?) prevents the same optimization with
a vector (a.k.a array).

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug target/85730] complex code for modifying lowest byte in a 4-byte vector
       [not found] <bug-85730-4@http.gcc.gnu.org/bugzilla/>
                   ` (7 preceding siblings ...)
  2021-10-08  9:36 ` ubizjak at gmail dot com
@ 2021-10-12 16:21 ` cvs-commit at gcc dot gnu.org
  8 siblings, 0 replies; 9+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-10-12 16:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85730

--- Comment #10 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Uros Bizjak <uros@gcc.gnu.org>:

https://gcc.gnu.org/g:b37351e3279d192d5d4682f002abe5b2e133bba6

commit r12-4359-gb37351e3279d192d5d4682f002abe5b2e133bba6
Author: Uros Bizjak <ubizjak@gmail.com>
Date:   Tue Oct 12 18:20:38 2021 +0200

    i386: Improve workaround for PR82524 LRA limitation [PR85730]

    As explained in PR82524, LRA is not able to reload strict_low_part inout
    operand with matched input operand. The patch introduces a workaround,
    where we allow LRA to generate an instruction with non-matched input
operand
    which is split post reload to an instruction that inserts non-matched input
    operand to an inout operand and the instruction that uses matched operand.

    The generated code improves from:

            movsbl  %dil, %edx
            movl    %edi, %eax
            sall    $3, %edx
            movb    %dl, %al

    to:

            movl    %edi, %eax
            movb    %dil, %al
            salb    $3, %al

    which is still not optimal, but the code is one instruction shorter and
    does not use a temporary register.

    2021-10-12  UroÅ¡ Bizjak  <ubizjak@gmail.com>

    gcc/
            PR target/85730
            PR target/82524
            * config/i386/i386.md (*add<mode>_1_slp): Rewrite as
            define_insn_and_split pattern.  Add alternative 1 and split it
            post reload to insert operand 1 into the low part of operand 0.
            (*sub<mode>_1_slp): Ditto.
            (*and<mode>_1_slp): Ditto.
            (*<any_or:code><mode>_1_slp): Ditto.
            (*ashl<mode>3_1_slp): Ditto.
            (*<any_shiftrt:insn><mode>3_1_slp): Ditto.
            (*<any_rotate:insn><mode>3_1_slp): Ditto.
            (*neg<mode>_1_slp): New insn_and_split pattern.
            (*one_cmpl<mode>_1_slp): Ditto.

    gcc/testsuite/
            PR target/85730
            PR target/82524
            * gcc.target/i386/pr85730.c: New test.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-10-12 16:21 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <bug-85730-4@http.gcc.gnu.org/bugzilla/>
2021-10-04  3:02 ` [Bug target/85730] complex code for modifying lowest byte in a 4-byte vector gabravier at gmail dot com
2021-10-04  8:47 ` zsojka at seznam dot cz
2021-10-04  9:04 ` gabravier at gmail dot com
2021-10-04  9:54 ` rguenth at gcc dot gnu.org
2021-10-04 11:31 ` ubizjak at gmail dot com
2021-10-04 16:13 ` segher at gcc dot gnu.org
2021-10-07 11:51 ` ubizjak at gmail dot com
2021-10-08  9:36 ` ubizjak at gmail dot com
2021-10-12 16:21 ` cvs-commit at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).