From: NightStrike
Date: Tue, 16 Aug 2016 16:43:00 -0000
Subject: Re: option
 -mprfchw on 2 different Opteron cpus
To: "Kumar, Venkataramanan"
Cc: "Uros Bizjak (ubizjak@gmail.com)", "lopezibanez@gmail.com", Jan Hubicka, Jakub Jelinek, "gcc@gcc.gnu.org"

On Tue, May 3, 2016 at 12:40 AM, Kumar, Venkataramanan wrote:
> Hi
>
>> -----Original Message-----
>> From: NightStrike [mailto:nightstrike@gmail.com]
>> Sent: Monday, May 2, 2016 10:31 PM
>> To: Kumar, Venkataramanan
>> Cc: Uros Bizjak (ubizjak@gmail.com); lopezibanez@gmail.com;
>> Jan Hubicka; Jakub Jelinek; gcc@gcc.gnu.org
>> Subject: Re: option -mprfchw on 2 different Opteron cpus
>>
>> On Mon, May 2, 2016 at 5:55 AM, Kumar, Venkataramanan wrote:
>> >> If I compile on a k8 Opteron 248 with -march=native, I do not see
>> >> -mprfchw listed in the options in -fverbose-asm.  In the assembly,
>> >> I see this:
>> >>
>> >> prefetcht0 (%rax) # ivtmp.1160
>> >> prefetcht0 304(%rcx) #
>> >> prefetcht0 (%rax) # ivtmp.1160
>> >
>> > In AMD processors the -mprfchw flag is used to enable "3dnowprefetch"
>> > ISA support.
>> >
>> > (Snip)
>> > CPUID Fn8000_0001_ECX Feature Identifiers Bit 8
>> > 3DNowPrefetch: PREFETCH and PREFETCHW instruction support. See
>> > "PREFETCH" and "PREFETCHW" in APM3
>> > Ref: http://support.amd.com/TechDocs/25481.pdf
>> > (Snip)
>> >
>> > Can you please confirm what this CPUID flag returns on your k8 machine?
>> > I believe this ISA is not available on the k8 machine, so when
>> > -march=native is added you don't see -mprfchw in verbose.
>>
>> Looks like zero?  This was generated with the cpuid program from
>> http://www.etallen.com/cpuid.html
>>
>> 3DNow! instruction extensions = true
>> 3DNow! instructions = true
>
> It has 3DNow support. "prefetchw" is available with 3DNow.
>
>> misaligned SSE mode = false
>> 3DNow! PREFETCH/PREFETCHW instructions = false
>
> It does not have 3DNowPrefetch, so enabling the ISA flag -mprfchw is not
> correct for -march=k8.
>
>> OS visible workaround = false
>> instruction based sampling = false
>> >> If I compile on a bdver2 Opteron 6386 SE with -march=k8 (thus trying
>> >> to target the older system), I do see it listed in the options in
>> >> -fverbose-asm.  In the assembly, I see this:
>> >
>> > K8 has 3dnow support and there is a patch that replaced 3dnow with
>> > prefetchw (3DNowPrefetch).
>> > https://gcc.gnu.org/ml/gcc-patches/2013-05/msg00866.html
>> > So when you add -march=k8 you see -mprfchw getting listed in verbose.
>> >
>> >>
>> >> prefetcht0 (%rax) # ivtmp.1160
>> >> prefetcht0 304(%rcx) #
>> >> prefetchw (%rax) # ivtmp.1160
>> >>
>> >> (The third line is the only difference)
>> >>
>> >
>> > This is my guess without seeing the test case: when write prefetching
>> > is requested, "prefetchw" is generated.
>> > The 3dnow (TARGET_3DNOW) ISA has support for it.
>> >
>> > (Snip)
>> > Support for the PREFETCH and PREFETCHW instructions is indicated by
>> > CPUID Fn8000_0001_ECX[3DNowPrefetch] OR Fn8000_0001_EDX[LM] OR
>> > Fn8000_0001_EDX[3DNow] = 1.
>> > (Snip)
>> > Ref: http://developer.amd.com/wordpress/media/2008/10/24594_APM_v3.pdf
>> >
>> >> In both cases, I'm using gcc 4.9.3.  Which is correct for a k8
>> >> Opteron 248?
>> >>
>> >> Also, FWIW:
>> >>
>> >> 1) The march=native version that uses prefetcht0 is very repeatably
>> >> faster by about 15% in the particular test case I'm looking at.
>> >>
>> >> 2) The compilers in both instances are not just the same version,
>> >> they are the same compiler binary installed on an NFS mount and
>> >> shared to both computers.
>> >
>> > As per GCC4.9.3 source.
>> >
>> > (Snip)
>> > (define_expand "prefetch"
>> >   [(prefetch (match_operand 0 "address_operand")
>> >              (match_operand:SI 1 "const_int_operand")
>> >              (match_operand:SI 2 "const_int_operand"))]
>> >   "TARGET_PREFETCH_SSE || TARGET_PRFCHW || TARGET_PREFETCHWT1"
>> > {
>> >   bool write = INTVAL (operands[1]) != 0;
>> >   int locality = INTVAL (operands[2]);
>> >
>> >   gcc_assert (IN_RANGE (locality, 0, 3));
>> >
>> >   /* Use 3dNOW prefetch in case we are asking for write prefetch not
>> >      supported by SSE counterpart or the SSE prefetch is not available
>> >      (K6 machines).  Otherwise use SSE prefetch as it allows specifying
>> >      of locality.  */
>> >   if (TARGET_PREFETCHWT1 && write && locality <= 2)
>> >     operands[2] = const2_rtx;
>> >   else if (TARGET_PRFCHW && (write || !TARGET_PREFETCH_SSE))
>> >     operands[2] = GEN_INT (3);
>> >   else
>> >     operands[1] = const0_rtx;
>> > })
>> > (Snip)
>> >
>> > Write prefetch may be requested (either by the auto prefetcher or by
>> > builtins), but with -march=native the check below could have become false:
>> >   else if (TARGET_PRFCHW && (write || !TARGET_PREFETCH_SSE))
>> > TARGET_PRFCHW is off on native.
>> >
>> > So there are two issues here.
>> >
>> > (1) The ISA flags enabled with -march=k8 are different from those
>> > enabled with -march=native on a k8 machine.
>
> I think we need to file a bug for this.  Need to check with Uros why the
> flag -mprfchw is shared with 3dnow.
> To work around this issue you can use -mno-prfchw when building with
> -march=k8.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77270

>> > (2) Need to check why the GCC middle end requested a write prefetch
>> > for the test case with -march=k8.

> On "prefetchw" generation, it may be the case that the GCC auto
> prefetcher requests write prefetches.
> AFAIK a write prefetch brings data from memory, marks the cache line
> modified, and expects a write to happen next.
> If a read happens to that cache line instead, then the data will be
> written back to memory before the read, which is unnecessary.
> Hard to answer without a test case, and I don't have a ready k8 machine
> with me.

Should I file another bug for this if I can get a reduced test case, or
is PR77270 enough, or is this not a bug?
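
For anyone following the thread, here is a minimal standalone sketch in C of the operand rewriting performed by the quoted "prefetch" expander. It is only a model, not GCC code: the names `target_flags`, `prefetch_operands`, and `expand_prefetch` are hypothetical, and the three booleans stand in for TARGET_PREFETCH_SSE, TARGET_PRFCHW, and TARGET_PREFETCHWT1. Only the three-way condition is taken from the quoted source.

```c
/* Model of the operand rewriting in the quoted "prefetch" expander.
   All names below are hypothetical; only the if/else chain comes from
   the GCC 4.9.3 snippet quoted in this thread.  */
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for TARGET_PREFETCH_SSE, TARGET_PRFCHW, TARGET_PREFETCHWT1.  */
struct target_flags {
  bool prefetch_sse;
  bool prfchw;
  bool prefetchwt1;
};

/* The (write, locality) operand pair after the expander has run.  */
struct prefetch_operands {
  bool write;
  int locality;
};

struct prefetch_operands
expand_prefetch (struct target_flags t, bool write, int locality)
{
  struct prefetch_operands out = { write, locality };

  /* Mirrors gcc_assert (IN_RANGE (locality, 0, 3)).  */
  assert (locality >= 0 && locality <= 3);

  if (t.prefetchwt1 && write && locality <= 2)
    out.locality = 2;        /* operands[2] = const2_rtx */
  else if (t.prfchw && (write || !t.prefetch_sse))
    out.locality = 3;        /* operands[2] = GEN_INT (3) */
  else
    out.write = false;       /* operands[1] = const0_rtx: fall back to a read prefetch */

  return out;
}
```

With `prfchw` set (as reported for -march=k8), a write request keeps `write == true`, which is consistent with the prefetchw in the second assembly listing. With `prfchw` clear (as reported for -march=native on the Opteron 248), the same request falls through to the last branch and becomes a read prefetch, consistent with prefetcht0. The mapping from the rewritten operands to a specific mnemonic is inferred from the listings in this thread, not from the quoted snippet.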