public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug middle-end/78103] Failure to optimize with __builtin_clzl
       [not found] <bug-78103-4@http.gcc.gnu.org/bugzilla/>
@ 2021-07-24  6:20 ` pinskia at gcc dot gnu.org
  2021-07-24 12:28 ` jakub at gcc dot gnu.org
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 18+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-07-24  6:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78103

--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Part of the problem is __builtin_clzl returns a signed integer :).

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug middle-end/78103] Failure to optimize with __builtin_clzl
       [not found] <bug-78103-4@http.gcc.gnu.org/bugzilla/>
  2021-07-24  6:20 ` [Bug middle-end/78103] Failure to optimize with __builtin_clzl pinskia at gcc dot gnu.org
@ 2021-07-24 12:28 ` jakub at gcc dot gnu.org
  2021-07-26 10:57 ` jakub at gcc dot gnu.org
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 18+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-07-24 12:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78103

--- Comment #8 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I actually think this should be doable with combiner splitters.
For the
unsigned int findLastSet3(unsigned long x) {
  return x ? 8 * sizeof(unsigned long) - __builtin_clzl(x) : 0;
}
or
unsigned int findLastSet4(unsigned int x) {
  return x ? 8 * sizeof(unsigned int) - __builtin_clz(x) : 0;
}
the combiner sees everything together, in the first case
(parallel [
        (set (reg:SI 84 [ <retval> ])
            (minus:SI (const_int 64 [0x40])
                (xor:SI (minus:SI (const_int 63 [0x3f])
                        (subreg:SI (clz:DI (reg/v:DI 85 [ x ])) 0))
                    (const_int 63 [0x3f]))))
        (clobber (reg:CC 17 flags))
    ])
and in the latter case
(parallel [
        (set (reg/v:SI 85 [ x ])
            (minus:SI (const_int 32 [0x20])
                (xor:SI (minus:SI (const_int 31 [0x1f])
                        (clz:SI (reg/v:SI 85 [ x ])))
                    (const_int 31 [0x1f]))))
        (clobber (reg:CC 17 flags))
    ])
so we just need to split it - into bsr and add 1 (of course only if
!TARGET_LZCNT, otherwise it is defined even for 0 and we don't know that it
will be UB on zero).
For the #c0 findLastSet we don't know that, but again we should know that it is
UB at zero when !TARGET_LZCNT and so we could first split
(set (reg:DI 87 [ _1 ])
    (xor:DI (sign_extend:DI (minus:SI (const_int 63 [0x3f])
                (subreg:SI (clz:DI (reg/v:DI 85 [ x ])) 0)))
        (const_int 63 [0x3f])))
into bsr and xor (i.e. get rid of the sign extension, if it is UB at zero, then
we know clz will be in range 0 to 63 and sign extension or zero extension are
the same and performed already when using the bs{q,l} instruction.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug middle-end/78103] Failure to optimize with __builtin_clzl
       [not found] <bug-78103-4@http.gcc.gnu.org/bugzilla/>
  2021-07-24  6:20 ` [Bug middle-end/78103] Failure to optimize with __builtin_clzl pinskia at gcc dot gnu.org
  2021-07-24 12:28 ` jakub at gcc dot gnu.org
@ 2021-07-26 10:57 ` jakub at gcc dot gnu.org
  2021-07-26 11:14 ` jakub at gcc dot gnu.org
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 18+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-07-26 10:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78103

--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 51204
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51204&action=edit
gcc12-pr78103.patch

Untested partial fix.
The patch does 2 things, one is (shown on the first testcase) optimize sign or
zero extension of __builtin_clz{,l,ll} from 32-bits to 64-bit - no need to emit
sign extension instruction.  The other change is to match 32-bit subtraction of
a constant and __builtin_clz, can be optimized into bsr and addition.
All this only with -mno-lzcnt, with -mlzcnt a very different insn is emitted
that handles it fine.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug middle-end/78103] Failure to optimize with __builtin_clzl
       [not found] <bug-78103-4@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2021-07-26 10:57 ` jakub at gcc dot gnu.org
@ 2021-07-26 11:14 ` jakub at gcc dot gnu.org
  2021-07-26 11:22 ` jakub at gcc dot gnu.org
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 18+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-07-26 11:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78103

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |segher at gcc dot gnu.org

--- Comment #10 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Unfortunately, it doesn't work for the #c0 testcase, after the combiner
splitter kicks in, the combiner doesn't even try that 4 insn combination. 
Haven't debugged it, but it looks like if LOG_LINKS isn't added between the 2
insns that replaced the 3 insns.
There is the successful:
Trying 9, 10 -> 12:
    9: {flags:CCZ=cmp(r85:DI,0);r86:DI=0x3f-clz(r85:DI);}
      REG_DEAD r85:DI
      REG_UNUSED flags:CCZ
   10: {r86:DI=r86:DI^0x3f;clobber flags:CC;}
      REG_UNUSED flags:CC
   12: r87:DI=sign_extend(r86:DI#0)
      REG_DEAD r86:DI
Failed to match this instruction:
(set (reg:DI 87 [ _1 ])
    (xor:DI (sign_extend:DI (minus:SI (const_int 63 [0x3f])
                (subreg:SI (clz:DI (reg/v:DI 85 [ x ])) 0)))
        (const_int 63 [0x3f])))
Splitting with gen_split_440 (i386.md:14804)
Successfully matched this instruction:
(set (reg:DI 93)
    (minus:DI (const_int 63 [0x3f])
        (clz:DI (reg/v:DI 85 [ x ]))))
Successfully matched this instruction:
(set (reg:DI 87 [ _1 ])
    (zero_extend:DI (xor:SI (subreg:SI (reg:DI 93) 0)
            (const_int 63 [0x3f]))))
allowing combination of insns 9, 10 and 12
original costs 8 + 4 + 4 = 16
replacement costs 8 + 5 = 13
deferring deletion of insn with uid = 9.
modifying insn i2    10: {r93:DI=0x3f-clz(r85:DI);clobber flags:CC;}
      REG_UNUSED flags:CC
      REG_DEAD r85:DI
deferring rescan insn with uid = 10.
modifying insn i3    12: {r87:DI=zero_extend(r93:DI#0^0x3f);clobber flags:CC;}
      REG_UNUSED flags:CC
deferring rescan insn with uid = 12.

and I was hoping that the combiner would soon try 10, 12, 13 -> 14 combination
where 10 and 12 are the above 2 new insns, 13 is r88:DI=0x40
and 14 is {r85:DI=r88:DI-r87:DI;clobber flags:CC;}.
But all it tries is 12 -> 14, 13 -> 14, 13, 12 -> 14 and all those fail.

Segher, any ideas?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug middle-end/78103] Failure to optimize with __builtin_clzl
       [not found] <bug-78103-4@http.gcc.gnu.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2021-07-26 11:14 ` jakub at gcc dot gnu.org
@ 2021-07-26 11:22 ` jakub at gcc dot gnu.org
  2021-07-26 12:11 ` jakub at gcc dot gnu.org
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 18+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-07-26 11:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78103

--- Comment #11 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Or is it because of the new pseudo that is created for the temporary?
Seems when the combiner is splitting the insns itself using find_split_point,
it tries to reuse i2dest if possible for the middle temporary, but I guess the
splitters in the md files can't do that, they don't have that information.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug middle-end/78103] Failure to optimize with __builtin_clzl
       [not found] <bug-78103-4@http.gcc.gnu.org/bugzilla/>
                   ` (4 preceding siblings ...)
  2021-07-26 11:22 ` jakub at gcc dot gnu.org
@ 2021-07-26 12:11 ` jakub at gcc dot gnu.org
  2021-07-26 21:01 ` segher at gcc dot gnu.org
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 18+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-07-26 12:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78103

--- Comment #12 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
If reusing i2dest for the find_split_case is fine (I must say I'm a little bit
worried about uses of the pseudo in debug insns, whether if we reuse it for
something else we properly adjust those insns (or reset them)), what the code
does in the:
          rtx newdest = i2dest;
...
          /* Get NEWDEST as a register in the proper mode.  We have already
             validated that we can do this.  */
          if (GET_MODE (i2dest) != split_mode && split_mode != VOIDmode)
            {
              if (REGNO (i2dest) < FIRST_PSEUDO_REGISTER)
                newdest = gen_rtx_REG (split_mode, REGNO (i2dest));
              else
                {
                  SUBST_MODE (regno_reg_rtx[REGNO (i2dest)], split_mode);
                  newdest = regno_reg_rtx[REGNO (i2dest)];
                }
            }
code if the if condition is false, can't it also try to reuse i2dest in the
      else if (m_split_insn && NEXT_INSN (NEXT_INSN (m_split_insn)) == NULL_RTX
               && (next_nonnote_nondebug_insn (i2) == i3
                   || !modified_between_p (PATTERN (m_split_insn), i2, i3)))
case (if it verifies i2dest doesn't appear anywhere in i2set and i3set) by
rewriting the SET_DEST of the i2set and all uses of it in i3set with i2dest if
the mode matches?
On the other side the
             Since nowadays we can create registers during combine just fine,
             we should just create a new one here, not reuse i2dest.  */
comment says we should prefer creating new regs.  If so, fine, but then we
shouldn't just distribute the preexisting i3links, i2links etc., but should try
to create new ones.  I see the code does:
    LOG_LINKS (i3) = NULL;
    REG_NOTES (i3) = 0;
    LOG_LINKS (i2) = NULL;
    REG_NOTES (i2) = 0;
and then somewhat later
    distribute_links (i3links);
    distribute_links (i2links);
    distribute_links (i1links);
    distribute_links (i0links);
so distributes preexisting log links, but I don't see anything that would
similarly handle newly added pseudos from the combine_split_insns, create new
log link for that, record_value_for_reg etc.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug middle-end/78103] Failure to optimize with __builtin_clzl
       [not found] <bug-78103-4@http.gcc.gnu.org/bugzilla/>
                   ` (5 preceding siblings ...)
  2021-07-26 12:11 ` jakub at gcc dot gnu.org
@ 2021-07-26 21:01 ` segher at gcc dot gnu.org
  2021-07-26 21:08 ` jakub at gcc dot gnu.org
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 18+ messages in thread
From: segher at gcc dot gnu.org @ 2021-07-26 21:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78103

--- Comment #13 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #10)
> Unfortunately, it doesn't work for the #c0 testcase, after the combiner
> splitter kicks in, the combiner doesn't even try that 4 insn combination. 

It does for me?


Trying 10, 13, 12 -> 14:
   10: {r86:DI=r86:DI^0x3f;clobber flags:CC;}
      REG_UNUSED flags:CC
   13: r88:DI=0x40
   12: r87:DI=sign_extend(r86:DI#0)
      REG_DEAD r86:DI
   14: {r85:DI=r88:DI-r87:DI;clobber flags:CC;}
      REG_DEAD r88:DI
      REG_DEAD r87:DI
      REG_UNUSED flags:CC
      REG_EQUAL 0x40-r87:DI
Failed to match this instruction:
(parallel [
        (set (reg/v:DI 85 [ x ])
            (minus:DI (const_int 64 [0x40])
                (xor:DI (sign_extend:DI (subreg:SI (reg:DI 86) 0))
                    (const_int 63 [0x3f]))))
        (clobber (reg:CC 17 flags))
    ])
Failed to match this instruction:
(set (reg/v:DI 85 [ x ])
    (minus:DI (const_int 64 [0x40])
        (xor:DI (sign_extend:DI (subreg:SI (reg:DI 86) 0))
            (const_int 63 [0x3f]))))
Successfully matched this instruction:
(set (reg:DI 88)
    (sign_extend:DI (subreg:SI (reg:DI 86) 0)))
Failed to match this instruction:
(set (reg/v:DI 85 [ x ]) 
    (minus:DI (const_int 64 [0x40])
        (xor:DI (reg:DI 88)
            (const_int 63 [0x3f]))))

(Because 13 is a move from constant, 4-insn is allowed here).

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug middle-end/78103] Failure to optimize with __builtin_clzl
       [not found] <bug-78103-4@http.gcc.gnu.org/bugzilla/>
                   ` (6 preceding siblings ...)
  2021-07-26 21:01 ` segher at gcc dot gnu.org
@ 2021-07-26 21:08 ` jakub at gcc dot gnu.org
  2021-07-26 21:26 ` segher at gcc dot gnu.org
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 18+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-07-26 21:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78103

--- Comment #14 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Segher Boessenkool from comment #13)
> (In reply to Jakub Jelinek from comment #10)
> > Unfortunately, it doesn't work for the #c0 testcase, after the combiner
> > splitter kicks in, the combiner doesn't even try that 4 insn combination. 
> 
> It does for me?

But only in the unpatched gcc, no?
For #c0 findLastSet I actually need to combine 5 original instructions, and
what I was hoping for is to first combine first 3 instructions into 2,
9, 10 -> 12 to get rid of the useless sign-extension, the value is known to be
0..63, so zero extension is fine, into 10 (bsr) and 12 (xor with zero extend),
which is what the #c9 patch does.
And then I was hoping 10, 12, 13 -> 14 would be attempted to be combined
because 13 is mov of a constant.  But that doesn't happen because the 9, 10 ->
12 combination with the #c9 patch throws away the 12 -> 10 LOG_LINKS and
doesn't add a new one, even when 10 is a setter of a fresh new pseudo and 12 is
the only use of that pseudo.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug middle-end/78103] Failure to optimize with __builtin_clzl
       [not found] <bug-78103-4@http.gcc.gnu.org/bugzilla/>
                   ` (7 preceding siblings ...)
  2021-07-26 21:08 ` jakub at gcc dot gnu.org
@ 2021-07-26 21:26 ` segher at gcc dot gnu.org
  2021-07-27  8:52 ` jakub at gcc dot gnu.org
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 18+ messages in thread
From: segher at gcc dot gnu.org @ 2021-07-26 21:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78103

--- Comment #15 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #14)
> (In reply to Segher Boessenkool from comment #13)
> > (In reply to Jakub Jelinek from comment #10)
> > > Unfortunately, it doesn't work for the #c0 testcase, after the combiner
> > > splitter kicks in, the combiner doesn't even try that 4 insn combination. 
> > 
> > It does for me?
> 
> But only in the unpatched gcc, no?

Yes, of course.

> For #c0 findLastSet I actually need to combine 5 original instructions,

[...]

That is not something we want to ever implement: 4 insns already is too
expensive unless we try only the simplest, and/or only very specific
combinations.

> and
> what I was hoping for is to first combine first 3 instructions into 2,
> 9, 10 -> 12 to get rid of the useless sign-extension,

You should be able to combine only 10 and 12 even, to a SImode xor followed
by the sign extension (may not work out wrt costs, but it isn't even tried).
Or, why is r86 DImode anyway?

> the value is known to
> be 0..63, so zero extension is fine, into 10 (bsr) and 12 (xor with zero
> extend), which is what the #c9 patch does.
> And then I was hoping 10, 12, 13 -> 14 would be attempted to be combined
> because 13 is mov of a constant.  But that doesn't happen because the 9, 10
> -> 12 combination with the #c9 patch throws away the 12 -> 10 LOG_LINKS and
> doesn't add a new one, even when 10 is a setter of a fresh new pseudo and 12
> is the only use of that pseudo.

This is only safe if it *is* a new pseudo, and even then, you need to prevent
getting stuck somehow.

insn 10 is the most problematic things here btw, having the same pseudo as
input and as output (it is not the unique setter either).  This happens in
expand already, probably a machine pattern that forgets to create new
registers where it should?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug middle-end/78103] Failure to optimize with __builtin_clzl
       [not found] <bug-78103-4@http.gcc.gnu.org/bugzilla/>
                   ` (8 preceding siblings ...)
  2021-07-26 21:26 ` segher at gcc dot gnu.org
@ 2021-07-27  8:52 ` jakub at gcc dot gnu.org
  2021-07-27 12:35 ` segher at gcc dot gnu.org
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 18+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-07-27  8:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78103

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #51204|0                           |1
        is obsolete|                            |

--- Comment #16 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 51209
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51209&action=edit
gcc12-pr78103.patch

Updated patch.  This one fixes the reuse of a pseudo you've mentioned above,
and fixes gcc.target/i386/pr101175.c regression the patch reintroduced by
adding !TARGET_LZCNT conditions to the two new define_insns.

Nothing changes for combine though, I think it really would be nice if it could
either change newly added pseudos from combine_split_insns by i2dest if
possible, or better handle new pseudos from both combine_split_insns and when
for find_split_point i2dest can't be resued with creating LOG_LINKS.

I guess I can work around this by using define_insn_and_split instead of a
combiner splitter, but combine splitters are cleaner...

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug middle-end/78103] Failure to optimize with __builtin_clzl
       [not found] <bug-78103-4@http.gcc.gnu.org/bugzilla/>
                   ` (9 preceding siblings ...)
  2021-07-27  8:52 ` jakub at gcc dot gnu.org
@ 2021-07-27 12:35 ` segher at gcc dot gnu.org
  2021-07-27 12:41 ` jakub at gcc dot gnu.org
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 18+ messages in thread
From: segher at gcc dot gnu.org @ 2021-07-27 12:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78103

--- Comment #17 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #16)
> Created attachment 51209 [details]
> gcc12-pr78103.patch
> 
> Updated patch.  This one fixes the reuse of a pseudo you've mentioned above,
> and fixes gcc.target/i386/pr101175.c regression the patch reintroduced by
> adding !TARGET_LZCNT conditions to the two new define_insns.

Great :-)

> Nothing changes for combine though, I think it really would be nice if it
> could either change newly added pseudos from combine_split_insns by i2dest
> if possible,

Is that ever safe to do?

> or better handle new pseudos from both combine_split_insns and
> when
> for find_split_point i2dest can't be resued with creating LOG_LINKS.

You cannot safely attempt to combine the two insns split to again, neither
alone nor in combination with other insns.  Since pretty much by definition
that new reg will be set in the new i2 and used in (only) in the new i3,
no combination with that new reg is valid ever?

> I guess I can work around this by using define_insn_and_split instead of a
> combiner splitter, but combine splitters are cleaner...

Sometimes it helps to pretend you have an instruction you do not.  Both pure
splitters and define_insn_and_split have their place :-)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug middle-end/78103] Failure to optimize with __builtin_clzl
       [not found] <bug-78103-4@http.gcc.gnu.org/bugzilla/>
                   ` (10 preceding siblings ...)
  2021-07-27 12:35 ` segher at gcc dot gnu.org
@ 2021-07-27 12:41 ` jakub at gcc dot gnu.org
  2021-07-27 13:53 ` jakub at gcc dot gnu.org
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 18+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-07-27 12:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78103

--- Comment #18 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Segher Boessenkool from comment #17)
> > Nothing changes for combine though, I think it really would be nice if it
> > could either change newly added pseudos from combine_split_insns by i2dest
> > if possible,
> 
> Is that ever safe to do?

Except for debug insns if the mode is the same and i2dest doesn't appear in
i2set nor i3set nor anything in between, why not?
As for debug insns, as I said earlier, the current code is reusing i2dest for
the splitting found by find_split_point.

> > or better handle new pseudos from both combine_split_insns and
> > when
> > for find_split_point i2dest can't be resued with creating LOG_LINKS.
> 
> You cannot safely attempt to combine the two insns split to again, neither
> alone nor in combination with other insns.  Since pretty much by definition
> that new reg will be set in the new i2 and used in (only) in the new i3,
> no combination with that new reg is valid ever?

Sure, combining those 2 insns can't be sucessful, otherwise it would have
matched earlier.  But they can be combined with other insns, say the first of
those 2 split insns being i0 and second i1 combined into some other i3 and i2,
or perhaps just split insns i1 and i2 combined into some other i3.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug middle-end/78103] Failure to optimize with __builtin_clzl
       [not found] <bug-78103-4@http.gcc.gnu.org/bugzilla/>
                   ` (11 preceding siblings ...)
  2021-07-27 12:41 ` jakub at gcc dot gnu.org
@ 2021-07-27 13:53 ` jakub at gcc dot gnu.org
  2021-07-27 15:25 ` segher at gcc dot gnu.org
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 18+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-07-27 13:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78103

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |jakub at gcc dot gnu.org

--- Comment #19 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 51211
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51211&action=edit
gcc12-pr78103.patch

Updated untested patch that uses define_insn_and_split in 2 cases instead of
the combine splitters so that it can work around the lack of LOG_LINKS.
The patch doesn't do anything with -mlzcnt, even when bsr or bsr + lea would be
shorter than mov + lzcnt + sub, but I'm afraid we can't do anything about that,
we've lost information whether it is UB on zero or not with -mlzcnt and for
zero input all this is quite unsafe.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug middle-end/78103] Failure to optimize with __builtin_clzl
       [not found] <bug-78103-4@http.gcc.gnu.org/bugzilla/>
                   ` (12 preceding siblings ...)
  2021-07-27 13:53 ` jakub at gcc dot gnu.org
@ 2021-07-27 15:25 ` segher at gcc dot gnu.org
  2021-07-27 15:37 ` segher at gcc dot gnu.org
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 18+ messages in thread
From: segher at gcc dot gnu.org @ 2021-07-27 15:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78103

--- Comment #20 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #18)
> (In reply to Segher Boessenkool from comment #17)
> > > Nothing changes for combine though, I think it really would be nice if it
> > > could either change newly added pseudos from combine_split_insns by i2dest
> > > if possible,
> > 
> > Is that ever safe to do?
> 
> Except for debug insns if the mode is the same and i2dest doesn't appear in
> i2set nor i3set nor anything in between, why not?

You need to do the same twenty not-very-trivial-and-not-cheap-either checks
as we have to do in similar cases.  And try to get that right :-/

You also need to check i2dest does no longer appear in the new i2 and i3,
for example.

And for what?  What is the benefit?

> > > or better handle new pseudos from both combine_split_insns and
> > > when
> > > for find_split_point i2dest can't be resued with creating LOG_LINKS.
> > 
> > You cannot safely attempt to combine the two insns split to again, neither
> > alone nor in combination with other insns.  Since pretty much by definition
> > that new reg will be set in the new i2 and used in (only) in the new i3,
> > no combination with that new reg is valid ever?
> 
> Sure, combining those 2 insns can't be sucessful, otherwise it would have
> matched earlier.  But they can be combined with other insns, say the first
> of those 2 split insns being i0 and second i1 combined into some other i3
> and i2,
> or perhaps just split insns i1 and i2 combined into some other i3.

But there is nothing checking anything of the sort, and that would be really
awkward to fit in with the existing mechanics, if it is possible at all.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug middle-end/78103] Failure to optimize with __builtin_clzl
       [not found] <bug-78103-4@http.gcc.gnu.org/bugzilla/>
                   ` (13 preceding siblings ...)
  2021-07-27 15:25 ` segher at gcc dot gnu.org
@ 2021-07-27 15:37 ` segher at gcc dot gnu.org
  2021-07-31  7:21 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 18+ messages in thread
From: segher at gcc dot gnu.org @ 2021-07-27 15:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78103

--- Comment #21 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #19)
> Created attachment 51211 [details]
> gcc12-pr78103.patch
> 
> Updated untested patch that uses define_insn_and_split in 2 cases instead of
> the combine splitters so that it can work around the lack of LOG_LINKS.
> The patch doesn't do anything with -mlzcnt, even when bsr or bsr + lea would
> be shorter than mov + lzcnt + sub, but I'm afraid we can't do anything about
> that,
> we've lost information whether it is UB on zero or not with -mlzcnt and for
> zero input all this is quite unsafe.

That patch looks good to me.  This is another argument for having some
canonical form for stuff with extends, btw (you have a xor of a sign extension
with a
number that is zero in all bits that are sign copies (incl. the original sign)
in the extension, so the extension could be pulled to the outside.  But the way
things are you really have to handle both xor(sext(x),c) and sext(xor(x,c)) in
the machine description, both formulations are allowed, whether or not we teach
simplify-rtx about this (if it doesn't already know :-) )

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug middle-end/78103] Failure to optimize with __builtin_clzl
       [not found] <bug-78103-4@http.gcc.gnu.org/bugzilla/>
                   ` (14 preceding siblings ...)
  2021-07-27 15:37 ` segher at gcc dot gnu.org
@ 2021-07-31  7:21 ` cvs-commit at gcc dot gnu.org
  2021-07-31  8:24 ` jakub at gcc dot gnu.org
  2021-08-01 20:33 ` cvs-commit at gcc dot gnu.org
  17 siblings, 0 replies; 18+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-07-31  7:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78103

--- Comment #22 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:91425e2adecd00091d7443104ecb367686e88663

commit r12-2649-g91425e2adecd00091d7443104ecb367686e88663
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Sat Jul 31 09:19:32 2021 +0200

    i386: Improve extensions of __builtin_clz and constant - __builtin_clz for
-mno-lzcnt [PR78103]

    This patch improves emitted code for the non-TARGET_LZCNT case.
    As __builtin_clz* is UB on 0 argument and for !TARGET_LZCNT
    CLZ_VALUE_DEFINED_AT_ZERO is 0, it is UB even at RTL time and so we
    can take advantage of that and assume the result will be 0 to 31 or
    0 to 63.
    Given that, sign or zero extension of that result are the same and
    are actually already performed by bsrl or xorl instructions.
    And constant - __builtin_clz* can be simplified into
    bsr + constant - bitmask.
    For TARGET_LZCNT, a lot of this is already fine as is (e.g. the sign or
    zero extensions), and other optimizations are IMHO not possible
    (if we have lzcnt, we've lost information on whether it is UB at
    zero or not and so can't transform it into bsr even when that is
    1-2 insns shorter).
    The changes on the 3 testcases between unpatched and patched gcc
    are for -m64:
    pr78103-1.s:
            bsrq    %rdi, %rax
    -       xorq    $63, %rax
    -       cltq
    +       xorl    $63, %eax
    ...
            bsrq    %rdi, %rax
    -       xorq    $63, %rax
    -       cltq
    +       xorl    $63, %eax
    ...
            bsrl    %edi, %eax
            xorl    $31, %eax
    -       cltq
    ...
            bsrl    %edi, %eax
            xorl    $31, %eax
    -       cltq
    pr78103-2.s:
            bsrl    %edi, %edi
    -       movl    $32, %eax
    -       xorl    $31, %edi
    -       subl    %edi, %eax
    +       leal    1(%rdi), %eax
    ...
    -       bsrl    %edi, %edi
    -       movl    $31, %eax
    -       xorl    $31, %edi
    -       subl    %edi, %eax
    +       bsrl    %edi, %eax
    ...
            bsrq    %rdi, %rdi
    -       movl    $64, %eax
    -       xorq    $63, %rdi
    -       subl    %edi, %eax
    +       leal    1(%rdi), %eax
    ...
    -       bsrq    %rdi, %rdi
    -       movl    $63, %eax
    -       xorq    $63, %rdi
    -       subl    %edi, %eax
    +       bsrq    %rdi, %rax
    pr78103-3.s:
            bsrl    %edi, %edi
    -       movl    $32, %eax
    -       xorl    $31, %edi
    -       movslq  %edi, %rdi
    -       subq    %rdi, %rax
    +       leaq    1(%rdi), %rax
    ...
    -       bsrl    %edi, %edi
    -       movl    $31, %eax
    -       xorl    $31, %edi
    -       movslq  %edi, %rdi
    -       subq    %rdi, %rax
    +       bsrl    %edi, %eax
    ...
            bsrq    %rdi, %rdi
    -       movl    $64, %eax
    -       xorq    $63, %rdi
    -       movslq  %edi, %rdi
    -       subq    %rdi, %rax
    +       leaq    1(%rdi), %rax
    ...
    -       bsrq    %rdi, %rdi
    -       movl    $63, %eax
    -       xorq    $63, %rdi
    -       movslq  %edi, %rdi
    -       subq    %rdi, %rax
    +       bsrq    %rdi, %rax

    Most of the changes are done with combine splitters, but for
    *bsr_rex64_2 and *bsr_2 I had to use define_insn_and_split, because
    as mentioned in the PR the combiner unfortunately doesn't create LOG_LINKS
    in between the two insns created by combine splitter, so it can't be
    combined further with following instructions.

    2021-07-31  Jakub Jelinek  <jakub@redhat.com>

            PR target/78103
            * config/i386/i386.md (bsr_rex64_1, bsr_1, bsr_zext_1): New
            define_insn patterns.
            (*bsr_rex64_2, *bsr_2): New define_insn_and_split patterns.
            Add combine splitters for constant - clz.
            (clz<mode>2): Use a temporary pseudo for bsr result.

            * gcc.target/i386/pr78103-1.c: New test.
            * gcc.target/i386/pr78103-2.c: New test.
            * gcc.target/i386/pr78103-3.c: New test.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug middle-end/78103] Failure to optimize with __builtin_clzl
       [not found] <bug-78103-4@http.gcc.gnu.org/bugzilla/>
                   ` (15 preceding siblings ...)
  2021-07-31  7:21 ` cvs-commit at gcc dot gnu.org
@ 2021-07-31  8:24 ` jakub at gcc dot gnu.org
  2021-08-01 20:33 ` cvs-commit at gcc dot gnu.org
  17 siblings, 0 replies; 18+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-07-31  8:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78103

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #23 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Fixed for GCC 12+.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug middle-end/78103] Failure to optimize with __builtin_clzl
       [not found] <bug-78103-4@http.gcc.gnu.org/bugzilla/>
                   ` (16 preceding siblings ...)
  2021-07-31  8:24 ` jakub at gcc dot gnu.org
@ 2021-08-01 20:33 ` cvs-commit at gcc dot gnu.org
  17 siblings, 0 replies; 18+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-08-01 20:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78103

--- Comment #24 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by H.J. Lu <hjl@gcc.gnu.org>:

https://gcc.gnu.org/g:6f0c43e97825ee54e3779afbedcd0def12443001

commit r12-2656-g6f0c43e97825ee54e3779afbedcd0def12443001
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sun Aug 1 09:55:33 2021 -0700

    i386: Improve SImode constant - __builtin_clzll for -mno-lzcnt

    Add a zero_extend patten for bsr_rex64_1 and use it to split SImode
    constant - __builtin_clzll to avoid unncessary zero_extend.

    gcc/

            PR target/78103
            * config/i386/i386.md (bsr_rex64_1_zext): New.
            (combine splitter for constant - clzll): Replace gen_bsr_rex64_1
            with gen_bsr_rex64_1_zext.

    gcc/testsuite/

            PR target/78103
            * gcc.target/i386/pr78103-2.c: Also scan incl.
            * gcc.target/i386/pr78103-3.c: Scan leal|addl|incl for x32.  Also
            scan incq.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2021-08-01 20:33 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <bug-78103-4@http.gcc.gnu.org/bugzilla/>
2021-07-24  6:20 ` [Bug middle-end/78103] Failure to optimize with __builtin_clzl pinskia at gcc dot gnu.org
2021-07-24 12:28 ` jakub at gcc dot gnu.org
2021-07-26 10:57 ` jakub at gcc dot gnu.org
2021-07-26 11:14 ` jakub at gcc dot gnu.org
2021-07-26 11:22 ` jakub at gcc dot gnu.org
2021-07-26 12:11 ` jakub at gcc dot gnu.org
2021-07-26 21:01 ` segher at gcc dot gnu.org
2021-07-26 21:08 ` jakub at gcc dot gnu.org
2021-07-26 21:26 ` segher at gcc dot gnu.org
2021-07-27  8:52 ` jakub at gcc dot gnu.org
2021-07-27 12:35 ` segher at gcc dot gnu.org
2021-07-27 12:41 ` jakub at gcc dot gnu.org
2021-07-27 13:53 ` jakub at gcc dot gnu.org
2021-07-27 15:25 ` segher at gcc dot gnu.org
2021-07-27 15:37 ` segher at gcc dot gnu.org
2021-07-31  7:21 ` cvs-commit at gcc dot gnu.org
2021-07-31  8:24 ` jakub at gcc dot gnu.org
2021-08-01 20:33 ` cvs-commit at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).