From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 28680 invoked by alias); 6 Aug 2012 22:40:59 -0000 Received: (qmail 28671 invoked by uid 22791); 6 Aug 2012 22:40:57 -0000 X-SWARE-Spam-Status: No, hits=-7.0 required=5.0 tests=AWL,BAYES_00,KHOP_RCVD_UNTRUST,KHOP_THREADED,RCVD_IN_DNSWL_HI,RCVD_IN_HOSTKARMA_W,SPF_HELO_PASS,T_RP_MATCHES_RCVD X-Spam-Check-By: sourceware.org Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Mon, 06 Aug 2012 22:40:40 +0000 Received: from int-mx09.intmail.prod.int.phx2.redhat.com (int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id q76Meb89029997 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Mon, 6 Aug 2012 18:40:38 -0400 Received: from anchor.twiddle.home (vpn-9-86.rdu.redhat.com [10.11.9.86]) by int-mx09.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id q76MeaJK027075; Mon, 6 Aug 2012 18:40:37 -0400 Message-ID: <502047E4.50304@redhat.com> Date: Mon, 06 Aug 2012 22:40:00 -0000 From: Richard Henderson User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:14.0) Gecko/20120717 Thunderbird/14.0 MIME-Version: 1.0 To: Ulrich Weigand CC: gcc-patches@gcc.gnu.org Subject: s390: Avoid CAS boolean output inefficiency References: <201208061834.q76IY8HS013445@d06av02.portsmouth.uk.ibm.com> In-Reply-To: <201208061834.q76IY8HS013445@d06av02.portsmouth.uk.ibm.com> Content-Type: multipart/mixed; boundary="------------000401050502010407080203" X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org X-SW-Source: 2012-08/txt/msg00325.txt.bz2 This is a multi-part message in MIME format. --------------000401050502010407080203 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-length: 2618 On 08/06/2012 11:34 AM, Ulrich Weigand wrote: > There is one particular inefficiency I have noticed. This code: > > if (!__atomic_compare_exchange_n (&v, &expected, max, 0 , 0, 0)) > abort (); > > from atomic-compare-exchange-3.c gets compiled into: > > l %r3,0(%r2) > larl %r1,v > cs %r3,%r4,0(%r1) > ipm %r1 > sra %r1,28 > st %r3,0(%r2) > ltr %r1,%r1 > jne .L3 > > which is extremely inefficient; it converts the condition code into > an integer using the slow ipm, sra sequence, just so that it can > convert the integer back into a condition code via ltr and branch > on it ... This was caused (or perhaps abetted by) the representation of EQ as NE ^ 1. With the subsequent truncation and zero-extend, I think combine reached its insn limit of 3 before seeing everything it needed to see. I'm able to fix this problem by representing EQ as EQ before reload. For extimm targets this results in identical code; for older targets it requires avoidance of the constant pool, i.e. LHI+XR instead of X. l %r2,0(%r3) larl %r1,v cs %r2,%r5,0(%r1) st %r2,0(%r3) jne .L3 That fixed, we see the second CAS in that file: .loc 1 27 0 cs %r2,%r2,0(%r1) ipm %r5 sll %r5,28 lhi %r0,1 xr %r5,%r0 st %r2,0(%r3) ltr %r5,%r5 je .L20 This happens because CSE notices the cbranch vs 0, and sets r116 to zero along the path to 32 if (!__atomic_compare_exchange_n (&v, &expected, 0, STRONG, __ATOMIC_RELEASE, __ATOMIC_ACQUIRE)) at which point CSE decides that it would be cheaper to "re-use" the zero already in r116 instead of load another constant 0 here. After that, combine is ham-strung because r116 is not dead. I'm not quite sure the best way to fix this, since rtx_costs already has all constants cost 0. CSE ought not believe that r116 is better than a plain constant. CSE also shouldn't be extending the life of pseudos this way. A short-term possibility is to have the CAS insns accept general_operand, so that the 0 gets merged. With reload inheritance and post-reload cse, that might produce code that is "good enough". Certainly it's effective for the atomic-compare-exchange-3.c testcase. I'm less than happy with that, since the non-optimization of CAS depends on following code that is totally unrelated. This patch ought to be independent of any other patch so far. r~ --------------000401050502010407080203 Content-Type: text/plain; charset=UTF-8; name="z" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="z" Content-length: 2305 diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md index 0e43e51..bed6b79 100644 --- a/gcc/config/s390/s390.md +++ b/gcc/config/s390/s390.md @@ -5325,12 +5325,15 @@ (match_operand 3 "const0_operand")])) (clobber (reg:CC CC_REGNUM))])] "" - "emit_insn (gen_sne (operands[0], operands[2])); - if (GET_CODE (operands[1]) == EQ) - emit_insn (gen_xorsi3 (operands[0], operands[0], const1_rtx)); - DONE;") +{ + if (!TARGET_EXTIMM && GET_CODE (operands[1]) == EQ) + { + emit_insn (gen_seq_neimm (operands[0], operands[2])); + DONE; + } +}) -(define_insn_and_split "sne" +(define_insn_and_split "*sne" [(set (match_operand:SI 0 "register_operand" "=d") (ne:SI (match_operand:CCZ1 1 "register_operand" "0") (const_int 0))) @@ -5342,6 +5345,48 @@ [(set (match_dup 0) (ashiftrt:SI (match_dup 0) (const_int 28))) (clobber (reg:CC CC_REGNUM))])]) +(define_insn_and_split "*seq" + [(set (match_operand:SI 0 "register_operand" "=d") + (eq:SI (match_operand:CCZ1 1 "register_operand" "0") + (const_int 0))) + (clobber (reg:CC CC_REGNUM))] + "TARGET_EXTIMM" + "#" + "&& reload_completed" + [(const_int 0)] +{ + rtx op0 = operands[0]; + emit_insn (gen_lshrsi3 (op0, op0, GEN_INT (28))); + emit_insn (gen_xorsi3 (op0, op0, const1_rtx)); + DONE; +}) + +;; ??? Ideally we'd USE a const1_rtx, properly reloaded, but that makes +;; things more difficult for combine (which can only insert clobbers). +;; But perhaps it would be better still to have simply used a branch around +;; constant load instead of beginning with the IPM? +;; +;; What about LOCR for Z196? That's a more general question about cstore +;; being decomposed into movcc... + +(define_insn_and_split "seq_neimm" + [(set (match_operand:SI 0 "register_operand" "=d") + (eq:SI (match_operand:CCZ1 1 "register_operand" "0") + (const_int 0))) + (clobber (match_scratch:SI 2 "=&d")) + (clobber (reg:CC CC_REGNUM))] + "!TARGET_EXTIMM" + "#" + "&& reload_completed" + [(const_int 0)] +{ + rtx op0 = operands[0]; + rtx op2 = operands[2]; + emit_insn (gen_ashlsi3 (op0, op0, GEN_INT (28))); + emit_move_insn (op2, const1_rtx); + emit_insn (gen_xorsi3 (op0, op0, op2)); + DONE; +}) ;; ;; - Conditional move instructions (introduced with z196) --------------000401050502010407080203--