From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 14514 invoked by alias); 31 Jul 2012 09:09:56 -0000 Received: (qmail 14428 invoked by uid 22791); 31 Jul 2012 09:09:53 -0000 X-SWARE-Spam-Status: No, hits=-6.0 required=5.0 tests=AWL,BAYES_00,KHOP_RCVD_UNTRUST,KHOP_THREADED,RCVD_IN_DNSWL_HI,TW_LH,TW_SR,T_RP_MATCHES_RCVD X-Spam-Check-By: sourceware.org Received: from cantor2.suse.de (HELO mx2.suse.de) (195.135.220.15) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Tue, 31 Jul 2012 09:09:39 +0000 Received: from relay2.suse.de (unknown [195.135.220.254]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx2.suse.de (Postfix) with ESMTP id 694B1A329C; Tue, 31 Jul 2012 11:09:38 +0200 (CEST) Date: Tue, 31 Jul 2012 09:11:00 -0000 From: Richard Guenther To: Richard Henderson Cc: uweigand@de.ibm.com, gcc-patches@gcc.gnu.org Subject: Re: [PATCH 0/2] Convert s390 to atomic optabs, v2 In-Reply-To: <1343687574-3244-1-git-send-email-rth@redhat.com> Message-ID: References: <5016C81E.5020709@redhat.com> <1343687574-3244-1-git-send-email-rth@redhat.com> User-Agent: Alpine 2.00 (LNX 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org X-SW-Source: 2012-07/txt/msg01537.txt.bz2 On Mon, 30 Jul 2012, Richard Henderson wrote: > The atomic_load/storedi_1 patterns are fixed to use LM, STM. > > I've had a go at generating better code in the HQImode CAS > loop for aligned memory, but I don't know that I'd call it > the most efficient thing ever. Some of this is due to > deficiencies in other parts of the compiler (including the > s390 backend): > > (1) MEM_ALIGN can't pass down full align+ofs data that we had > during cfgexpand. This means the opportunities for using > the "aligned" path are less than they ought to be. > > (2) In get_pointer_alignment (used by get_builtin_sync_mem), > we don't consider an ADDR_EXPR to return the full alignment > that the type is due. I'm sure this is to work around some > other sort of usage via the builtins, but it's > less-than-handy in this case. > > I wonder if in get_builtin_sync_mem we ought to be using > get_object_alignment (build_fold_indirect_ref (addr)) instead? > > Consider > > struct S { int x; unsigned short y; } g_s; > unsigned short o, n; > void good() { > __builtin_compare_exchange (&g_s.y, &o, n, 0, 0, 0); > } > void bad(S *p_s) { > __builtin_compare_exchange (&p_s->y, &o, n, 0, 0, 0); > } > > where GOOD produces the aligned MEM that we need, and BAD doesn't. You cannot generally use get_object_alignment here. Once we have an address in the middle-end we treat it as generic pointer, happily not caring about the actual pointer types. But it seems we do /* The alignment needs to be at least according to that of the mode. */ set_mem_align (mem, MAX (GET_MODE_ALIGNMENT (mode), get_pointer_alignment (loc))); anyway? What do we expect __builtin_compare_exchange to do for unaligned inputs? Like typedef int uint __attribute__((aligned((8)))); unsigned short o, n; void very_bad (uint *p) { __builtin_compare_exchange (p, &o, n, 0, 0, 0); } ? Doesn't the above set a wrong alignment? Back to using get_object_alignment - you cannot blindly use build_fold_indirect_ref at least, you could use get_object_alignment on the operand of an ADDR_EXPR address but I am sure we can construct a testcase where that would give a wrong answer, too (maybe not easily without violating the C standards rule that pointers have to be aligned according to their type ... but nobody in practice follows this and the middle-end does not require this either). Thus, the bad news is that it's hard for the middle-end to recover alignment of a memory access that is represented as a builtin function call that takes addresses as parameters (which also makes them address-taken and thus possibly aliased). Didn't Andrew have some patches to introduce a GIMPLE_ATOMIC eventually side-stepping this issue (maybe that used addresses, too)? Richard. > (3) Support for IC, and ICM via the insv pattern is lacking. > I've added a tiny bit of support here, in the form of using > the existing strict_low_part patterns, but most definitely we > could do better. > > (4) The *sethighpartsi and *sethighpartdi_64 patterns ought to be > more different. As is, we can't insert into bits 48-56 of a > DImode quantity, because we don't generate ICM for DImode, > only ICMH. > > (5) Missing support for RISBGZ in the form of an extv/z expander. > The existing *extv/z splitters probably ought to be conditionalized > on !Z10. > > (6) The strict_low_part patterns should allow registers for at > least Z10. The SImode strict_low_part can use LR everywhere. > > (7) RISBGZ could be used for a 3-address constant lshrsi3 before > srlk is available. > > For the GOOD function above, and this patch set, for -O3 -march=z10: > > larl %r3,s+4 > lhrl %r0,o > lhi %r2,1 > l %r1,0(%r3) > nilh %r1,0 > .L2: > lr %r5,%r1 > larl %r12,n > lr %r4,%r1 > risbg %r4,%r0,32,47,16 > icm %r5,3,0(%r12) > cs %r4,%r5,0(%r3) > je .L3 > lr %r5,%r4 > nilh %r5,0 > cr %r5,%r1 > lr %r1,%r5 > jne .L2 > lhi %r2,0 > .L3: > srl %r4,16 > sthrl %r4,o > > Odd things: > > * O is forced into a register before reaching the expander, so we > get the RISBG for that. N is left in a memory and so we commit > to using ICM for that. Further, because of how strict_low_part > is implemented we're committed to leaving that in memory. > > * We don't optimize the loop and hoist the LARL of N outside the loop. > > * Given that we're having to zap the mask in %r1 for the second > compare anyway, I wonder if RISBG is really beneficial over OR. > Is RISBG (or ICM for that matter) any faster (or even smaller)? > > > r~ > > > Richard Henderson (2): > s390: Reorg s390_expand_insv > s390: Convert from sync to atomic optabs > > gcc/config/s390/s390-protos.h | 3 +- > gcc/config/s390/s390.c | 270 ++++++++++++++++++---------- > gcc/config/s390/s390.md | 401 +++++++++++++++++++++++++++++------------ > 3 files changed, 465 insertions(+), 209 deletions(-) > > -- Richard Guenther SUSE / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 GF: Jeff Hawn, Jennifer Guild, Felix Imend