From: Christophe Lyon <christophe.lyon@linaro.org>
To: Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH][3/3][RTL ifcvt] PR middle-end/37780: Conditional expression with __builtin_clz() should be optimized out
Date: Tue, 07 Jun 2016 19:35:00 -0000 [thread overview]
Message-ID: <CAKdteOZ1rzAAW+M1u0yH++erPf7P7pP7RCs83LT1LxXo7mafLA@mail.gmail.com> (raw)
In-Reply-To: <5746C78B.3030006@foss.arm.com>
On 26 May 2016 at 11:53, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> wrote:
> Hi all,
>
> In this PR we want to optimise:
> int foo (int i)
> {
> return (i == 0) ? N : __builtin_clz (i);
> }
>
> on targets where CLZ is defined at zero to the constant 'N'.
> This is determined at the RTL level through the CLZ_DEFINED_VALUE_AT_ZERO
> macro.
> The obvious place to implement this would be in combine through simplify-rtx
> where we'd
> recognise an IF_THEN_ELSE of the form:
> (set (reg:SI r1)
> (if_then_else:SI (ne (reg:SI r2)
> (const_int 0 [0]))
> (clz:SI (reg:SI r2))
> (const_int 32)))
>
> and if CLZ_DEFINED_VALUE_AT_ZERO is defined to 32 for SImode we'd simplify
> it into
> just (clz:SI (reg:SI r2)).
> However, I found this doesn't quite happen for a couple of reasons:
> 1) This depends on ifcvt or some other pass to have created a conditional
> move of the
> two branches that provide the IF_THEN_ELSE to propagate the const_int and
> clz operation into.
>
> 2) Combine will refuse to propagate r2 from the above example into both the
> condition and the
> CLZ at the same time, so the most we see is:
> (set (reg:SI r1)
> (if_then_else:SI (ne (reg:CC cc)
> (const_int 0))
> (clz:SI (reg:SI r2))
> (const_int 32)))
>
> which is not enough information to perform the simplification.
>
> This patch implements the optimisation in ce1 using the noce ifcvt
> framework.
> During ifcvt noce_process_if_block can see that we're trying to optimise
> something
> of the form (x == 0 ? const_int : CLZ (x)) and so it has visibility of all
> the information
> needed to perform the transformation.
>
> The transformation is performed by adding a new noce_try* function that
> tries to put the
> condition and the 'then' and 'else' arms into an IF_THEN_ELSE rtx and try to
> simplify that
> using the simplify-rtx machinery. That way, we can implement the
> simplification logic in
> simplify-rtx.c where it belongs.
>
> A similar transformation for CTZ is implemented as well.
> So for code:
> int foo (int i)
> {
> return (i == 0) ? 32 : __builtin_clz (i);
> }
>
> On aarch64 we now emit:
> foo:
> clz w0, w0
> ret
>
> instead of:
> foo:
> mov w1, 32
> clz w2, w0
> cmp w0, 0
> csel w0, w2, w1, ne
> ret
>
> and for arm similarly we generate:
> foo:
> clz r0, r0
> bx lr
>
> instead of:
> foo:
> cmp r0, #0
> clzne r0, r0
> moveq r0, #32
> bx lr
>
>
> and for x86_64 with -O2 -mlzcnt we generate:
> foo:
> xorl %eax, %eax
> lzcntl %edi, %eax
> ret
>
> instead of:
> foo:
> xorl %eax, %eax
> movl $32, %edx
> lzcntl %edi, %eax
> testl %edi, %edi
> cmove %edx, %eax
> ret
>
>
> I tried getting this to work on other targets as well, but encountered
> difficulties.
> For example on powerpc the two arms of the condition seen during ifcvt are:
>
> (insn 4 22 11 4 (set (reg:DI 156 [ <retval> ])
> (const_int 32 [0x20])) clz.c:3 434 {*movdi_internal64}
> (nil))
> and
> (insn 10 9 23 3 (set (subreg/s/u:SI (reg:DI 156 [ <retval> ]) 0)
> (clz:SI (subreg/u:SI (reg/v:DI 157 [ i ]) 0))) clz.c:3 132 {clzsi2}
> (expr_list:REG_DEAD (reg/v:DI 157 [ i ])
> (nil)))
>
> So the setup code in noce_process_if_block sees that the set destination is
> not the same
> ((reg:DI 156 [ <retval> ]) and (subreg/s/u:SI (reg:DI 156 [ <retval> ]) 0))
> so it bails out on the rtx_interchangeable_p (x, SET_DEST (set_b)) check.
> I suppose that's a consequence of how SImode operations are represented in
> early RTL
> on powerpc, I don't know what to do there. Perhaps that part of ivcvt can be
> taught to handle
> destinations that are subregs of one another, but that would be a separate
> patch.
>
> Anyway, is this patch ok for trunk?
>
> Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu,
> x86_64-pc-linux-gnu.
>
> Thanks,
> Kyrill
>
> 2016-05-26 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
> PR middle-end/37780
> * ifcvt.c (noce_try_ifelse_collapse): New function.
> Declare prototype.
> (noce_process_if_block): Call noce_try_ifelse_collapse.
> * simplify-rtx.c (simplify_cond_clz_ctz): New function.
> (simplify_ternary_operation): Use the above to simplify
> conditional CLZ/CTZ expressions.
>
> 2016-05-26 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
> PR middle-end/37780
> * gcc.c-torture/execute/pr37780.c: New test.
> * gcc.target/aarch64/pr37780_1.c: Likewise.
> * gcc.target/arm/pr37780_1.c: Likewise.
Hi Kyrylo,
I've noticed that gcc.target/arm/pr37780_1.c fails on
arm if arch < v6.
I first tried to fix the effective-target guard (IIRC, the doc
says clz is available starting with v5t), but that isn't sufficient.
When compiling for armv5-t, the scan-assembler directives
fail. It seems to work with v6t2, so I am wondering whether
it's just a matter of increasing the effective-target arch version,
or if you really intended to make the test pass on these old
architectures?
Thanks,
Christophe.
next prev parent reply other threads:[~2016-06-07 19:35 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-05-26 13:55 Kyrill Tkachov
2016-06-06 14:36 ` Kyrill Tkachov
2016-06-06 14:44 ` Bernd Schmidt
2016-06-07 7:37 ` Andreas Schwab
2016-06-07 16:47 ` Kyrill Tkachov
2016-06-07 16:51 ` Kyrill Tkachov
2016-06-07 19:35 ` Christophe Lyon [this message]
2016-06-08 16:40 ` Kyrill Tkachov
2016-06-09 12:15 ` Christophe Lyon
2016-06-10 8:38 ` Kyrill Tkachov
2016-06-10 13:39 ` Christophe Lyon
2016-06-10 13:41 ` Kyrill Tkachov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAKdteOZ1rzAAW+M1u0yH++erPf7P7pP7RCs83LT1LxXo7mafLA@mail.gmail.com \
--to=christophe.lyon@linaro.org \
--cc=gcc-patches@gcc.gnu.org \
--cc=kyrylo.tkachov@foss.arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).