From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 13389 invoked by alias); 7 Jun 2016 19:35:09 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 13376 invoked by uid 89); 7 Jun 2016 19:35:08 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.3 required=5.0 tests=AWL,BAYES_00,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2 spammy= X-HELO: mail-qg0-f47.google.com Received: from mail-qg0-f47.google.com (HELO mail-qg0-f47.google.com) (209.85.192.47) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Tue, 07 Jun 2016 19:34:58 +0000 Received: by mail-qg0-f47.google.com with SMTP id p34so63803400qgp.1 for ; Tue, 07 Jun 2016 12:34:58 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=zRQjq33WGKvUIR4gRugzGBGB4Cabnj61enJ/9ut7GH4=; b=ZEch47azH4KOVJLsq1bIAZjiKk4j6wqlXWha6PUufHrPKrAYg4KWFtMl2FQv81pyds DPzCdTm19kRg/88yCTNGOXv/4YqfqiVY4O/eRTIk/BawXPiqCeAbz/cjO3SeOYuq5S42 1NJa6DT0pt8pjAPp3ZZLnggUtnu6qWTgH1e1CX/o5GyHu76j8B1O0ZT+BaDxdEo8MVXt Xvc2mGUEQ8voEGlUHCs7p3pUs16YIhBJ+66V373vk4PHSdPWpfpVw7VzomTQ1MDnwAbb I+OyTgSQf8t7LOOG1SQejkEbf7Xq+/23YSRsBf1JrAt/2wJBL/N2+l+YzBhwnbalIiWg TgDg== X-Gm-Message-State: ALyK8tITohKkBK8j/DUT66EskomSnivM4rDoyH8/zqsISkxyNYOJLRss8xz0hmYM+GNvbULowKaLkmeKu5PVI1AJ X-Received: by 10.140.99.109 with SMTP id p100mr1104645qge.97.1465328096408; Tue, 07 Jun 2016 12:34:56 -0700 (PDT) MIME-Version: 1.0 Received: by 10.140.39.47 with HTTP; Tue, 7 Jun 2016 12:34:55 -0700 (PDT) In-Reply-To: <5746C78B.3030006@foss.arm.com> References: <5746C78B.3030006@foss.arm.com> From: Christophe Lyon Date: Tue, 07 Jun 2016 19:35:00 -0000 Message-ID: Subject: Re: [PATCH][3/3][RTL ifcvt] PR middle-end/37780: Conditional expression with __builtin_clz() should be optimized out To: Kyrill Tkachov Cc: GCC Patches Content-Type: text/plain; charset=UTF-8 X-IsSubscribed: yes X-SW-Source: 2016-06/txt/msg00513.txt.bz2 On 26 May 2016 at 11:53, Kyrill Tkachov wrote: > Hi all, > > In this PR we want to optimise: > int foo (int i) > { > return (i == 0) ? N : __builtin_clz (i); > } > > on targets where CLZ is defined at zero to the constant 'N'. > This is determined at the RTL level through the CLZ_DEFINED_VALUE_AT_ZERO > macro. > The obvious place to implement this would be in combine through simplify-rtx > where we'd > recognise an IF_THEN_ELSE of the form: > (set (reg:SI r1) > (if_then_else:SI (ne (reg:SI r2) > (const_int 0 [0])) > (clz:SI (reg:SI r2)) > (const_int 32))) > > and if CLZ_DEFINED_VALUE_AT_ZERO is defined to 32 for SImode we'd simplify > it into > just (clz:SI (reg:SI r2)). > However, I found this doesn't quite happen for a couple of reasons: > 1) This depends on ifcvt or some other pass to have created a conditional > move of the > two branches that provide the IF_THEN_ELSE to propagate the const_int and > clz operation into. > > 2) Combine will refuse to propagate r2 from the above example into both the > condition and the > CLZ at the same time, so the most we see is: > (set (reg:SI r1) > (if_then_else:SI (ne (reg:CC cc) > (const_int 0)) > (clz:SI (reg:SI r2)) > (const_int 32))) > > which is not enough information to perform the simplification. > > This patch implements the optimisation in ce1 using the noce ifcvt > framework. > During ifcvt noce_process_if_block can see that we're trying to optimise > something > of the form (x == 0 ? const_int : CLZ (x)) and so it has visibility of all > the information > needed to perform the transformation. > > The transformation is performed by adding a new noce_try* function that > tries to put the > condition and the 'then' and 'else' arms into an IF_THEN_ELSE rtx and try to > simplify that > using the simplify-rtx machinery. That way, we can implement the > simplification logic in > simplify-rtx.c where it belongs. > > A similar transformation for CTZ is implemented as well. > So for code: > int foo (int i) > { > return (i == 0) ? 32 : __builtin_clz (i); > } > > On aarch64 we now emit: > foo: > clz w0, w0 > ret > > instead of: > foo: > mov w1, 32 > clz w2, w0 > cmp w0, 0 > csel w0, w2, w1, ne > ret > > and for arm similarly we generate: > foo: > clz r0, r0 > bx lr > > instead of: > foo: > cmp r0, #0 > clzne r0, r0 > moveq r0, #32 > bx lr > > > and for x86_64 with -O2 -mlzcnt we generate: > foo: > xorl %eax, %eax > lzcntl %edi, %eax > ret > > instead of: > foo: > xorl %eax, %eax > movl $32, %edx > lzcntl %edi, %eax > testl %edi, %edi > cmove %edx, %eax > ret > > > I tried getting this to work on other targets as well, but encountered > difficulties. > For example on powerpc the two arms of the condition seen during ifcvt are: > > (insn 4 22 11 4 (set (reg:DI 156 [ ]) > (const_int 32 [0x20])) clz.c:3 434 {*movdi_internal64} > (nil)) > and > (insn 10 9 23 3 (set (subreg/s/u:SI (reg:DI 156 [ ]) 0) > (clz:SI (subreg/u:SI (reg/v:DI 157 [ i ]) 0))) clz.c:3 132 {clzsi2} > (expr_list:REG_DEAD (reg/v:DI 157 [ i ]) > (nil))) > > So the setup code in noce_process_if_block sees that the set destination is > not the same > ((reg:DI 156 [ ]) and (subreg/s/u:SI (reg:DI 156 [ ]) 0)) > so it bails out on the rtx_interchangeable_p (x, SET_DEST (set_b)) check. > I suppose that's a consequence of how SImode operations are represented in > early RTL > on powerpc, I don't know what to do there. Perhaps that part of ivcvt can be > taught to handle > destinations that are subregs of one another, but that would be a separate > patch. > > Anyway, is this patch ok for trunk? > > Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu, > x86_64-pc-linux-gnu. > > Thanks, > Kyrill > > 2016-05-26 Kyrylo Tkachov > > PR middle-end/37780 > * ifcvt.c (noce_try_ifelse_collapse): New function. > Declare prototype. > (noce_process_if_block): Call noce_try_ifelse_collapse. > * simplify-rtx.c (simplify_cond_clz_ctz): New function. > (simplify_ternary_operation): Use the above to simplify > conditional CLZ/CTZ expressions. > > 2016-05-26 Kyrylo Tkachov > > PR middle-end/37780 > * gcc.c-torture/execute/pr37780.c: New test. > * gcc.target/aarch64/pr37780_1.c: Likewise. > * gcc.target/arm/pr37780_1.c: Likewise. Hi Kyrylo, I've noticed that gcc.target/arm/pr37780_1.c fails on arm if arch < v6. I first tried to fix the effective-target guard (IIRC, the doc says clz is available starting with v5t), but that isn't sufficient. When compiling for armv5-t, the scan-assembler directives fail. It seems to work with v6t2, so I am wondering whether it's just a matter of increasing the effective-target arch version, or if you really intended to make the test pass on these old architectures? Thanks, Christophe.