RE: option -mprfchw on 2 different Opteron cpus

public inbox for gcc@gcc.gnu.org

From: "Kumar, Venkataramanan" <Venkataramanan.Kumar@amd.com>
To: NightStrike <nightstrike@gmail.com>
Cc: "Uros Bizjak (ubizjak@gmail.com)" <ubizjak@gmail.com>,
	"lopezibanez@gmail.com" <lopezibanez@gmail.com>,
	Jan Hubicka	<hubicka@ucw.cz>, Jakub Jelinek <jakub@redhat.com>,
	"gcc@gcc.gnu.org"	<gcc@gcc.gnu.org>
Subject: RE: option -mprfchw on 2 different Opteron cpus
Date: Tue, 03 May 2016 04:40:00 -0000
Message-ID: <CY1PR1201MB10986BAADD7DF937BAD1B98F8F7A0@CY1PR1201MB1098.namprd12.prod.outlook.com>
In-Reply-To: <CAF1jjLvYE5p+sDcdhyMtQ5PzBC_K_Sv+rc-5Zzd=kiYwTG2bjA@mail.gmail.com>

Hi,

> -----Original Message-----
> From: NightStrike [mailto:nightstrike@gmail.com]
> Sent: Monday, May 2, 2016 10:31 PM
> To: Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com>
> Cc: Uros Bizjak (ubizjak@gmail.com) <ubizjak@gmail.com>;
> lopezibanez@gmail.com; Jan Hubicka <hubicka@ucw.cz>; Jakub Jelinek
> <jakub@redhat.com>; gcc@gcc.gnu.org
> Subject: Re: option -mprfchw on 2 different Opteron cpus
> 
> On Mon, May 2, 2016 at 5:55 AM, Kumar, Venkataramanan
> <Venkataramanan.Kumar@amd.com> wrote:
> >> If I compile on a k8 Opteron 248 with -march=native, I do not see
> >> -mprfchw listed in the options in -fverbose-asm.  In the assembly, I see
> this:
> >>
> >> prefetcht0      (%rax)  # ivtmp.1160
> >> prefetcht0      304(%rcx)       #
> >> prefetcht0      (%rax)  # ivtmp.1160
> >
> > In AMD processors -mprfchw flag  is used to enable "3dnowprefetch" ISA
> support.
> >
> > (Snip)
> > CPUID Fn8000_0001_ECX Feature Identifiers Bit 8
> > 3DNowPrefetch: PREFETCH and PREFETCHW instruction support. See
> > “PREFETCH” and “PREFETCHW” in APM3
> > Ref: http://support.amd.com/TechDocs/25481.pdf
> > (Snip)
> >
> > Can you please confirm what this CPUID flag returns on your k8 machine ?.
> > I believe this ISA is not available on k8 machine so when -march=native is
> added you don’t see  -mprfchw in verbose.
> 
> Looks like zero?  This was generated with the cpuid program from
> http://www.etallen.com/cpuid.html
> 
>       3DNow! instruction extensions         = true
>       3DNow! instructions                   = true

The CPU has 3DNow! support, and 3DNow! includes "prefetchw".
 
>       misaligned SSE mode                    = false
>       3DNow! PREFETCH/PREFETCHW instructions = false

It does not have 3DNowPrefetch, so enabling the ISA flag -mprfchw is not correct for -march=k8.
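
For anyone who wants to check these bits without the cpuid tool, here is a minimal C sketch. It decodes CPUID Fn8000_0001 using the bit positions from the AMD documents cited in this thread (ECX bit 8 for 3DNowPrefetch, EDX bit 29 for LM, EDX bit 31 for 3DNow!). The function and macro names are my own and are not taken from GCC, and the k8 case assumes LM is set, as it should be on any 64-bit CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>   /* GCC/Clang helper header providing __get_cpuid */
#endif

/* Bit positions in CPUID Fn8000_0001 (names are illustrative). */
#define ECX_3DNOWPREFETCH (UINT32_C(1) << 8)   /* ECX bit 8  */
#define EDX_LM            (UINT32_C(1) << 29)  /* EDX bit 29 */
#define EDX_3DNOW         (UINT32_C(1) << 31)  /* EDX bit 31 */

/* True when the dedicated 3DNowPrefetch feature bit is set. */
static bool has_3dnowprefetch_bit(uint32_t ecx)
{
    return (ecx & ECX_3DNOWPREFETCH) != 0;
}

/* True when PREFETCH/PREFETCHW are usable per the APM rule quoted in
   this thread: 3DNowPrefetch OR LM OR 3DNow. */
static bool prefetchw_usable(uint32_t ecx, uint32_t edx)
{
    return has_3dnowprefetch_bit(ecx)
        || (edx & EDX_LM) != 0
        || (edx & EDX_3DNOW) != 0;
}

/* Read ECX/EDX of leaf 0x80000001 on the running machine.
   Returns false off x86 or when the leaf is not available. */
static bool read_ext_features(uint32_t *ecx, uint32_t *edx)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    unsigned int a, b, c, d;
    if (!__get_cpuid(0x80000001u, &a, &b, &c, &d))
        return false;
    *ecx = c;
    *edx = d;
    return true;
#else
    (void)ecx;
    (void)edx;
    return false;
#endif
}

/* The case from this thread: 3DNow! = true, 3DNowPrefetch = false
   (LM assumed set). The instruction is usable, but the feature bit
   that -mprfchw corresponds to is clear. */
static void check_k8_case(void)
{
    uint32_t ecx = 0, edx = EDX_3DNOW | EDX_LM;
    assert(!has_3dnowprefetch_bit(ecx));
    assert(prefetchw_usable(ecx, edx));
}
```

Calling read_ext_features() from a small main() and printing both predicates should show the same split on the Opteron 248: the instruction is usable, yet the 3DNowPrefetch bit is clear.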

>       OS visible workaround                  = false
>       instruction based sampling             = false
> >> If I compile on a bdver2 Opteron 6386 SE with -march=k8 (thus trying
> >> to target the older system), I do see it listed in the options in
> >> -fverbose-asm.  In the assembly, I see this:
> >
> > K8 has 3dnow support and there is a patch that replaced 3dnow with
> prefetchw (3DNowPrefetch).
> > https://gcc.gnu.org/ml/gcc-patches/2013-05/msg00866.html
> > So when you add -march=k8 you see -mprfchw  getting listed in verbose.
> >
> >>
> >> prefetcht0      (%rax)  # ivtmp.1160
> >> prefetcht0      304(%rcx)       #
> >> prefetchw       (%rax)  # ivtmp.1160
> >>
> >> (The third line is the only difference)
> >>
> >
> > This is my guess without seeing the test case, when write  prefetching is
> requested "prefetchw" is generated.
> > 3dnow (TARGET_3DNOW) ISA has support for it.
> >
> > (Snip)
> > Support for the PREFETCH and PREFETCHW instructions is indicated by
> > CPUID Fn8000_0001_ECX[3DNowPrefetch] OR Fn8000_0001_EDX[LM] OR
> > Fn8000_0001_EDX[3DNow] = 1.
> > (Snip)
> > Ref:
> http://developer.amd.com/wordpress/media/2008/10/24594_APM_v3.pdf
> >
> >> In both cases, I'm using gcc 4.9.3.  Which is correct for a k8 Opteron 248?
> >>
> >> Also, FWIW:
> >>
> >> 1) The march=native version that uses prefetcht0 is very repeatably
> >> faster by about 15% in the particular test case I'm looking at.
> >>
> >> 2) The compilers in both instances are not just the same version,
> >> they are the same compiler binary installed on an NFS mount and
> >> shared to both computers.
> >
> > As per GCC4.9.3 source.
> >
> > (Snip)
> > (define_expand "prefetch"
> >   [(prefetch (match_operand 0 "address_operand")
> >              (match_operand:SI 1 "const_int_operand")
> >              (match_operand:SI 2 "const_int_operand"))]
> >   "TARGET_PREFETCH_SSE || TARGET_PRFCHW || TARGET_PREFETCHWT1"
> > {
> >   bool write = INTVAL (operands[1]) != 0;
> >   int locality = INTVAL (operands[2]);
> >
> >   gcc_assert (IN_RANGE (locality, 0, 3));
> >
> >   /* Use 3dNOW prefetch in case we are asking for write prefetch not
> >      supported by SSE counterpart or the SSE prefetch is not available
> >      (K6 machines).  Otherwise use SSE prefetch as it allows specifying
> >      of locality.  */
> >   if (TARGET_PREFETCHWT1 && write && locality <= 2)
> >     operands[2] = const2_rtx;
> >   else if (TARGET_PRFCHW && (write || !TARGET_PREFETCH_SSE))
> >     operands[2] = GEN_INT (3);
> >   else
> >     operands[1] = const0_rtx;
> > })
> > (Snip)
> >
> > Write prefetch may be requested (either by auto prefetcher or builtins) but
> on -march=native, the below check could have become false.
> >    else if (TARGET_PRFCHW && (write || !TARGET_PREFETCH_SSE))
> > TARGET_PRFCHW is off on native.
> >
> > So there are two issues here.
> >
> > (1) ISA flags enabled with -march=k8 is different from -march=native on k8
> machine.

I think we need to file a bug for this. I need to check with Uros why the -mprfchw flag is shared with 3DNow! support.
To work around this issue, you can use -mno-prfchw when building with -march=k8.
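
To show why the workaround should help, here is a rough C model of the branch selection in the "prefetch" expander quoted above. It is only a sketch: the struct and enum are hypothetical stand-ins for the TARGET_* macros and for the instruction forms, and it models which branch is taken, not the actual RTL.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for TARGET_PREFETCH_SSE, TARGET_PRFCHW and
   TARGET_PREFETCHWT1. */
struct isa_flags {
    bool prefetch_sse;
    bool prfchw;
    bool prefetchwt1;
};

/* Which family of prefetch instruction the expander settles on. */
enum prefetch_form {
    FORM_SSE,          /* prefetcht0/t1/t2/nta; write request dropped */
    FORM_3DNOW,        /* 3DNow!-style form; prefetchw for a write */
    FORM_PREFETCHWT1
};

/* Mirrors the if / else if / else chain of the quoted define_expand. */
static enum prefetch_form select_prefetch(struct isa_flags f,
                                          bool write, int locality)
{
    assert(locality >= 0 && locality <= 3);

    if (f.prefetchwt1 && write && locality <= 2)
        return FORM_PREFETCHWT1;
    if (f.prfchw && (write || !f.prefetch_sse))
        return FORM_3DNOW;
    return FORM_SSE;
}
```

With prefetch_sse and prfchw both set (what -march=k8 appears to enable), a write request selects FORM_3DNOW. With prfchw cleared (-march=native on the k8 machine, or -march=k8 -mno-prfchw), the same request falls through to FORM_SSE, which matches the prefetcht0 seen in the first listing.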

> > (2) Need to check why GCC middle end requested write prefetch for the
> test case with -march=k8 .
On "prefetchw" generation: it may be that the GCC auto-prefetcher is requesting write prefetches.
AFAIK a write prefetch brings the data in from memory, marks the cache line as modified, and expects a write to follow.
If a read hits that cache line instead, the data will be written back to memory before the read, which is unnecessary.
It is hard to say more without a test case, and I don't have a k8 machine at hand.
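
In the meantime, a small stand-in test case along these lines might help separate the two kinds of request. It uses the GCC builtin __builtin_prefetch(addr, rw, locality), where rw = 0 asks for a read prefetch and rw = 1 for a write prefetch. The function names and the prefetch distance are arbitrary choices of mine, and this is not the code from the original report.

```c
#include <assert.h>
#include <stddef.h>

/* Arbitrary look-ahead distance in elements (an assumption). */
#define PREFETCH_AHEAD 16

/* Read-only loop: request read prefetches (rw = 0). */
static long sum_read_prefetch(const long *a, size_t n)
{
    long s = 0;
    assert(a != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0, 3);
        s += a[i];
    }
    return s;
}

/* Read-modify-write loop: request write prefetches (rw = 1), the kind
   of request that may end up as prefetchw when -mprfchw is enabled. */
static void scale_write_prefetch(long *a, size_t n, long k)
{
    assert(a != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 1, 3);
        a[i] *= k;
    }
}
```

Compiling this with -O2 -S -fverbose-asm under -march=k8 and again under -march=k8 -mno-prfchw, then comparing the prefetch instructions in the two listings, should show whether the write request is what produces prefetchw.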

> >
> > Regards,
> > Venkat.
