From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 14083 invoked by alias); 15 Jan 2015 08:29:50 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 14061 invoked by uid 89); 15 Jan 2015 08:29:49 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-3.2 required=5.0 tests=AWL,BAYES_00,T_RP_MATCHES_RCVD autolearn=ham version=3.3.2 X-HELO: mx2.suse.de Received: from cantor2.suse.de (HELO mx2.suse.de) (195.135.220.15) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (CAMELLIA256-SHA encrypted) ESMTPS; Thu, 15 Jan 2015 08:29:47 +0000 Received: from relay1.suse.de (charybdis-ext.suse.de [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id 3CD26AB09; Thu, 15 Jan 2015 08:29:43 +0000 (UTC) Date: Thu, 15 Jan 2015 08:43:00 -0000 From: Richard Biener To: Jakub Jelinek cc: Richard Biener , Jeff Law , Segher Boessenkool , Eric Botcazou , gcc-patches@gcc.gnu.org Subject: Re: [PATCH] Reenable CSE of non-volatile inline asm (PR rtl-optimization/63637) In-Reply-To: <20150115081330.GB1405@tucnak.redhat.com> Message-ID: References: <20150113161819.GD1405@tucnak.redhat.com> <20150113163840.GA4183@gate.crashing.org> <54B575D7.8030107@redhat.com> <20150113201322.GJ1405@tucnak.redhat.com> <54B59964.7070707@redhat.com> <20150114000315.GA32710@gate.crashing.org> <54B609A9.9090800@redhat.com> <20150114151906.GA21784@gate.crashing.org> <54B74AD9.4010905@redhat.com> <20150115081330.GB1405@tucnak.redhat.com> User-Agent: Alpine 2.11 (LSU 23 2013-08-11) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-SW-Source: 2015-01/txt/msg01161.txt.bz2 On Thu, 15 Jan 2015, Jakub Jelinek wrote: > On Thu, Jan 15, 2015 at 07:46:18AM +0100, Richard Biener wrote: > > >> That last line means the compiler is free to delete a non-volatile > > >> asm with a memory clobber if that asm is not needed for dataflow. Or > > >> that is how I read it; it is trying to indicate that if you want to > > >> prevent the memory clobber from being deleted (together with the rest > > >> of the asm), you need to make the asm volatile. > > >> > > >> So as far as I can see the compiler can CSE two identical > > >non-volatile > > >> asms with memory clobber just fine. Older GCC (I tried 4.7.2) does > > >do > > >> this; current mainline doesn't. I think it should. > > >No, it should not CSE those two cases. That's simply wrong and if an > > >older version did that optimization, that's a bug. > > > > I think segher has a point here. If the asm with memory clobber would store to random memory and the point would be to preserve that then the whole distinction with volatile doesn't make much sense (after all without volatile we happily DCE such asm if the regular outputs are not needed). > > > > This doesn't mean 'memory' is a well-designed thing, of course. Just its > > effects are effectively limited to reads without volatile(?) > > Segher's mails talk about "memory" being a write but not read. > If we even can't agree on what non-volatile "memory" means, I think > we should treat it more conservatively, because every user (and there are > lots of people using non-volatile asm with "memory" in the wild) treats it > differently. Just trying to grep for a few: > glibc: > ./sysdeps/alpha/bits/atomic.h:# define atomic_full_barrier() __asm ("mb" : : : "memory"); > ./sysdeps/alpha/bits/atomic.h:# define atomic_read_barrier() __asm ("mb" : : : "memory"); > ./sysdeps/alpha/bits/atomic.h:# define atomic_write_barrier() __asm ("wmb" : : : "memory"); > ./sysdeps/sparc/sparc32/bits/atomic.h:# define atomic_full_barrier() __asm ("" ::: "memory") > ./sysdeps/powerpc/powerpc32/bits/atomic.h:# define atomic_read_barrier() __asm ("lwsync" ::: "memory") > ./sysdeps/powerpc/powerpc32/bits/atomic.h:# define atomic_read_barrier() __asm ("sync" ::: "memory") > ./sysdeps/powerpc/powerpc64/bits/atomic.h:#define atomic_read_barrier() __asm ("lwsync" ::: "memory") > ./sysdeps/powerpc/bits/atomic.h:#define atomic_full_barrier() __asm ("sync" ::: "memory") > ./sysdeps/powerpc/bits/atomic.h:#define atomic_write_barrier() __asm ("eieio" ::: "memory") > ./sysdeps/generic/malloc-machine.h:# define atomic_full_barrier() __asm ("" ::: "memory") > ./elf/tst-tlsmod3.c: asm ("" ::: "memory"); > ./elf/tst-tlsmod4.c: asm ("" ::: "memory"); > ./elf/tst-tlsmod1.c: asm ("" ::: "memory"); > ./elf/tst-tlsmod2.c: asm ("" ::: "memory"); > ./include/atomic.h:# define atomic_full_barrier() __asm ("" ::: "memory") > linux kernel: > ./arch/arm/mach-omap2/pm24xx.c: asm("mcr p15, 0, %0, c7, c0, 4" : : "r" (zero) : "memory", "cc"); > ./arch/arm/include/asm/irqflags.h:#define local_fiq_enable() __asm__("cpsie f @ __stf" : : : "memory", "cc") > ./arch/arm/include/asm/irqflags.h:#define local_fiq_disable() __asm__("cpsid f @ __clf" : : : "memory", "cc") > ./arch/x86/include/asm/uaccess_64.h: asm("":::"memory"); > ./arch/x86/include/asm/uaccess_64.h: asm("":::"memory"); > ./arch/x86/include/asm/segment.h: asm("mov %%" #seg ",%0":"=r" (value) : : "memory") > ./arch/x86/include/asm/stackprotector.h: asm("mov %0, %%gs" : : "r" (__KERNEL_STACK_CANARY) : "memory"); > ./arch/arm64/include/asm/irqflags.h:#define local_fiq_enable() asm("msr daifclr, #1" : : : "memory") > ./arch/arm64/include/asm/irqflags.h:#define local_fiq_disable() asm("msr daifset, #1" : : : "memory") > ./arch/arm64/include/asm/irqflags.h:#define local_async_enable() asm("msr daifclr, #4" : : : "memory") > ./arch/arm64/include/asm/irqflags.h:#define local_async_disable() asm("msr daifset, #4" : : : "memory") > ./arch/tile/lib/memcpy_64.c: __asm__ ("" : : : "memory"); > ./arch/tile/lib/memcpy_64.c: __asm__ ("" : : : "memory"); > ./arch/tile/include/hv/netio_intf.h: __asm__("" : : : "memory"); > ./drivers/net/ethernet/sgi/ioc3-eth.c: __asm__("sync" ::: "memory") > ./lib/sha1.c: #define setW(x, val) do { W(x) = (val); __asm__("":::"memory"); } while (0) > > The glibc barriers are supposedly something that can be CSEd (one barrier instead of > two consecutive barriers is enough), but certainly not moved across any loads/stores > in between. In the kernel case, the enable/disable probably wouldn't allow even CSE. > > So I'm with Jeff that we should treat "memory" at least as unspecified read and write, > and whether we can CSE them if there are no memory loads/stores in between them can > be discussed (most likely the kernel would be safe even in that case, because those > usually don't nest and appear in pairs, or act as barriers (like the glibc case), > or read from segment registers (guess again ok to be CSEd with no intervening loads/stores). > > In 4.9 backport I'd prefer not to CSE them at all though, stay conservative. Sure - I also requested "memory" to be not CSEd just to be conservative, not because I fully understood its meaning. Richard. -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Jennifer Guild, Dilip Upmanyu, Graham Norton HRB 21284 (AG Nuernberg)