Message-Id: <201208061834.q76IY8HS013445@d06av02.portsmouth.uk.ibm.com>
Subject: Re: [PATCH 2/2] s390: Convert from sync to atomic optabs
From: "Ulrich Weigand"
To: rth@redhat.com (Richard Henderson)
Cc: gcc-patches@gcc.gnu.org
Date: Mon, 06 Aug 2012 18:34:00 -0000
In-Reply-To: <1343687574-3244-3-git-send-email-rth@redhat.com> from "Richard Henderson" at Jul 30, 2012 03:32:54 PM

Richard Henderson wrote:

Some more comments on this patch.

> +; Different from movdi_31 in that we have no splitters.
> +(define_insn "atomic_loaddi_1"
> +  [(set (match_operand:DI 0 "register_operand" "=d,d,!*f,!*f")
> +	(unspec:DI [(match_operand:DI 1 "s_operand" "Q,S,Q,m")]

The constraint line doesn't look right.  ld(y) accept base+index, so
they should get R and T constraints, respectively.

In fact, I wouldn't use "s_operand" as a predicate, since this excludes
valid addresses for those cases, and since in general I've found it to
be preferable to have reload fix up addresses.  So I'd just use
"memory_operand" here, and actually in all the other patterns as
well ...  (This also simplifies the expander.)

> +		   UNSPEC_MOVA))]
> +  "!TARGET_ZARCH"
> +  "@
> +   lm\t%0,%M0,%S1
> +   lmy\t%0,%M0,%S1
> +   ld\t%0,%1
> +   ldy\t%0,%1"
> +  [(set_attr "op_type" "RS,RSY,RS,RSY")
> +   (set_attr "type" "lm,lm,floaddf,floaddf")])
> +
> +(define_insn "atomic_loadti_1"
> +  [(set (match_operand:TI 0 "register_operand" "=r")
> +	(unspec:TI [(match_operand:TI 1 "memory_operand" "m")]

"m" is wrong here (and basically everywhere), since it includes
SYMBOL_REFs for lrl etc. on z10.  For lpq we need "RT".

> +  "lpq\t%0,%1"
> +  [(set_attr "op_type" "RXY")])

LPQ and STPQ probably should get a "type" of "other".  This may not
model the pipeline exactly, but it's better than just assuming
"integer" by default.  These instructions are special (and slow).
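To make that concrete: with "memory_operand" as the predicate and R/T
for the ld/ldy alternatives, the operand line would look something like
the following.  (An untested sketch of the direction I mean, not a
drop-in replacement.)

```
(define_insn "atomic_loaddi_1"
  [(set (match_operand:DI 0 "register_operand" "=d,d,!*f,!*f")
	(unspec:DI [(match_operand:DI 1 "memory_operand" "Q,S,R,T")]
		   UNSPEC_MOVA))]
  "!TARGET_ZARCH"
  ...)
```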
> +(define_expand "atomic_compare_and_swap<mode>"
> +  [(match_operand:SI 0 "register_operand")	;; bool success output
> +   (match_operand:DGPR 1 "register_operand")	;; oldval output
> +   (match_operand:DGPR 2 "s_operand")		;; memory

memory_operand probably better again.

> +   (match_operand:DGPR 3 "register_operand")	;; expected input
> +   (match_operand:DGPR 4 "register_operand")	;; newval input
> +   (match_operand:SI 5 "const_int_operand")	;; is_weak
> +   (match_operand:SI 6 "const_int_operand")	;; success model
> +   (match_operand:SI 7 "const_int_operand")]	;; failure model
> +  ""
> +{
> +  rtx cc, cmp;
> +  emit_insn (gen_atomic_compare_and_swap<mode>_internal
> +	     (operands[1], operands[2], operands[3], operands[4]));
> +  cc = gen_rtx_REG (CCZ1mode, CC_REGNUM);
> +  cmp = gen_rtx_NE (SImode, cc, const0_rtx);

As already pointed out, this needs to be EQ.

> -  ""
> -  "cds\t%0,%3,%S1"
> -  [(set_attr "op_type" "RS")
> +  "TARGET_ZARCH"
> +  "csg\t%0,%3,%S1"
> +  [(set_attr "op_type" "RXY")

Should be "RSY".

> +  "@
> +   cds\t%0,%3,%S1
> +   cdsy\t%0,%3,%S1"
> +  [(set_attr "op_type" "RX,RXY")

Should be "RS,RSY".

> +  "@
> +   cs\t%0,%3,%S1
> +   csy\t%0,%3,%S1"
> +  [(set_attr "op_type" "RX,RXY")

Likewise.

> +(define_insn "atomic_fetch__iaf"
> +  [(set (match_operand:GPR 0 "register_operand" "=d")
> +	(match_operand:GPR 1 "s_operand" "+S"))
> +   (set (match_dup 1)
> +	(unspec_volatile:GPR
> +	  [(ATOMIC_Z196:GPR (match_dup 1)
> +			    (match_operand:GPR 2 "general_operand" "d"))]
> +	  UNSPECV_ATOMIC_OP))
> +   (clobber (reg:CC CC_REGNUM))]
>    "TARGET_Z196"
> -  "la\t%0,%2,%1")
> +  "la\t%0,%2,%1"
> +  [(set_attr "op_type" "RXY")

Again RSY.

There is one particular inefficiency I have noticed.
This code:

  if (!__atomic_compare_exchange_n (&v, &expected, max, 0, 0, 0))
    abort ();

from atomic-compare-exchange-3.c gets compiled into:

        l       %r3,0(%r2)
        larl    %r1,v
        cs      %r3,%r4,0(%r1)
        ipm     %r1
        sra     %r1,28
        st      %r3,0(%r2)
        ltr     %r1,%r1
        jne     .L3

which is extremely inefficient; it converts the condition code into an
integer using the slow ipm, sra sequence, just so that it can convert
the integer back into a condition code via ltr and branch on it ...

Now, with the old code this sequence used to be optimized away by
combine and we'd just have the cs followed by a branch.  This is not
done any more because we have the store to "expected" in between.
This is because of this code in combine:

      /* Make sure that the value that is to be substituted for the register
	 does not use any registers whose values alter in between.  However,
	 If the insns are adjacent, a use can't cross a set even though we
	 think it might (this can happen for a sequence of insns each setting
	 the same destination; last_set of that register might point to
	 a NOTE).  If INSN has a REG_EQUIV note, the register is always
	 equivalent to the memory so the substitution is valid even if there
	 are intervening stores.  Also, don't move a volatile asm or
	 UNSPEC_VOLATILE across any other insns.  */
      || (! all_adjacent
	  && (((!MEM_P (src)
		|| ! find_reg_note (insn, REG_EQUIV, src))
	       && use_crosses_set_p (src, DF_INSN_LUID (insn)))
	      || (GET_CODE (src) == ASM_OPERANDS && MEM_VOLATILE_P (src))
	      || GET_CODE (src) == UNSPEC_VOLATILE))

Note that we have exactly the case mentioned, where a series of
instructions to be combined all set the same (CC) register.  If they're
all adjacent, this still gets optimized away -- but they no longer are
due to the store.

Is there a way of structuring the atomic optabs/expander so that the
store gets done either before or after all the CC operations?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com