From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 24128 invoked by alias); 7 Aug 2012 17:02:41 -0000 Received: (qmail 24110 invoked by uid 22791); 7 Aug 2012 17:02:37 -0000 X-SWARE-Spam-Status: No, hits=-4.1 required=5.0 tests=AWL,BAYES_00,KHOP_RCVD_UNTRUST,KHOP_THREADED,MSGID_FROM_MTA_HEADER,RCVD_IN_HOSTKARMA_W,RCVD_IN_HOSTKARMA_WL,T_RP_MATCHES_RCVD X-Spam-Check-By: sourceware.org Received: from e06smtp17.uk.ibm.com (HELO e06smtp17.uk.ibm.com) (195.75.94.113) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Tue, 07 Aug 2012 17:02:20 +0000 Received: from /spool/local by e06smtp17.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Tue, 7 Aug 2012 18:02:18 +0100 Received: from b06cxnps3074.portsmouth.uk.ibm.com (9.149.109.194) by e06smtp17.uk.ibm.com (192.168.101.147) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; Tue, 7 Aug 2012 18:02:16 +0100 Received: from d06av02.portsmouth.uk.ibm.com (d06av02.portsmouth.uk.ibm.com [9.149.37.228]) by b06cxnps3074.portsmouth.uk.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id q77H2AOk19398740 for ; Tue, 7 Aug 2012 17:02:10 GMT Received: from d06av02.portsmouth.uk.ibm.com (loopback [127.0.0.1]) by d06av02.portsmouth.uk.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id q77H2Fje025597 for ; Tue, 7 Aug 2012 11:02:15 -0600 Received: from tuxmaker.boeblingen.de.ibm.com (tuxmaker.boeblingen.de.ibm.com [9.152.85.9]) by d06av02.portsmouth.uk.ibm.com (8.14.4/8.13.1/NCO v10.0 AVin) with SMTP id q77H2DFl025511; Tue, 7 Aug 2012 11:02:13 -0600 Message-Id: <201208071702.q77H2DFl025511@d06av02.portsmouth.uk.ibm.com> Received: by tuxmaker.boeblingen.de.ibm.com (sSMTP sendmail emulation); Tue, 07 Aug 2012 19:02:13 +0200 Subject: Re: s390: Avoid CAS boolean output inefficiency To: rth@redhat.com (Richard Henderson) Date: Tue, 07 Aug 2012 17:02:00 -0000 From: "Ulrich Weigand" Cc: gcc-patches@gcc.gnu.org In-Reply-To: <502047E4.50304@redhat.com> from "Richard Henderson" at Aug 06, 2012 03:40:36 PM MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit x-cbid: 12080717-0542-0000-0000-000002ACE5C3 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org X-SW-Source: 2012-08/txt/msg00370.txt.bz2 Richard Henderson wrote: > On 08/06/2012 11:34 AM, Ulrich Weigand wrote: > > There is one particular inefficiency I have noticed. This code: > > > > if (!__atomic_compare_exchange_n (&v, &expected, max, 0 , 0, 0)) > > abort (); > > > > from atomic-compare-exchange-3.c gets compiled into: > > > > l %r3,0(%r2) > > larl %r1,v > > cs %r3,%r4,0(%r1) > > ipm %r1 > > sra %r1,28 > > st %r3,0(%r2) > > ltr %r1,%r1 > > jne .L3 > > > > which is extremely inefficient; it converts the condition code into > > an integer using the slow ipm, sra sequence, just so that it can > > convert the integer back into a condition code via ltr and branch > > on it ... > > This was caused (or perhaps abetted by) the representation of EQ > as NE ^ 1. With the subsequent truncation and zero-extend, I > think combine reached its insn limit of 3 before seeing everything > it needed to see. Yes, combine isn't able to handle everything in one go, but it finds an intermediate insn. With the old __sync_compare_and_swap, it is then able to optimize everything away in a second step; the only reason this doesn't work any more is the intervening store. The following patch changes the builtin expander to pass a MEM oldval as-is to the back-end expander, so that the back-end can move the store to before the CC operation. With that patch I'm also seeing all the IPMs disappear. [ Well, all except for this one: > This happens because CSE notices the cbranch vs 0, and sets r116 > to zero along the path to > > 32 if (!__atomic_compare_exchange_n (&v, &expected, 0, STRONG, > __ATOMIC_RELEASE, __ATOMIC_ACQUIRE)) > > at which point CSE decides that it would be cheaper to "re-use" > the zero already in r116 instead of load another constant 0 here. > After that, combine is ham-strung because r116 is not dead. As you point out, this does indeed fix this problem as well: > A short-term possibility is to have the CAS insns accept general_operand, > so that the 0 gets merged. ] What do you think about this solution? It has the advantage that we still get the same xor code if we actually do need the ipm ... Bye, Ulrich diff -ur gcc/builtins.c gcc.new/builtins.c --- gcc/builtins.c 2012-08-07 16:04:45.054348099 +0200 +++ gcc.new/builtins.c 2012-08-07 15:44:01.304349225 +0200 @@ -5376,6 +5376,7 @@ expect = expand_normal (CALL_EXPR_ARG (exp, 1)); expect = convert_memory_address (Pmode, expect); + expect = gen_rtx_MEM (mode, expect); desired = expand_expr_force_mode (CALL_EXPR_ARG (exp, 2), mode); weak = CALL_EXPR_ARG (exp, 3); @@ -5383,14 +5384,15 @@ if (host_integerp (weak, 0) && tree_low_cst (weak, 0) != 0) is_weak = true; - oldval = copy_to_reg (gen_rtx_MEM (mode, expect)); - + oldval = expect; if (!expand_atomic_compare_and_swap ((target == const0_rtx ? NULL : &target), &oldval, mem, oldval, desired, is_weak, success, failure)) return NULL_RTX; - emit_move_insn (gen_rtx_MEM (mode, expect), oldval); + if (oldval != expect) + emit_move_insn (expect, oldval); + return target; } diff -ur gcc/config/s390/s390.md gcc.new/config/s390/s390.md --- gcc/config/s390/s390.md 2012-08-07 16:04:54.204348621 +0200 +++ gcc.new/config/s390/s390.md 2012-08-07 16:00:21.934348628 +0200 @@ -8870,7 +8870,7 @@ (define_expand "atomic_compare_and_swap" [(match_operand:SI 0 "register_operand") ;; bool success output - (match_operand:DGPR 1 "register_operand") ;; oldval output + (match_operand:DGPR 1 "nonimmediate_operand");; oldval output (match_operand:DGPR 2 "memory_operand") ;; memory (match_operand:DGPR 3 "register_operand") ;; expected intput (match_operand:DGPR 4 "register_operand") ;; newval intput @@ -8879,9 +8879,17 @@ (match_operand:SI 7 "const_int_operand")] ;; failure model "" { - rtx cc, cmp; + rtx cc, cmp, output = operands[1]; + + if (!register_operand (output, mode)) + output = gen_reg_rtx (mode); + emit_insn (gen_atomic_compare_and_swap_internal - (operands[1], operands[2], operands[3], operands[4])); + (output, operands[2], operands[3], operands[4])); + + if (output != operands[1]) + emit_move_insn (operands[1], output); + cc = gen_rtx_REG (CCZ1mode, CC_REGNUM); cmp = gen_rtx_EQ (SImode, cc, const0_rtx); emit_insn (gen_cstorecc4 (operands[0], cmp, cc, const0_rtx)); -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE Ulrich.Weigand@de.ibm.com