From: Eric Botcazou <ebotcazou@adacore.com>
To: "Jose E. Marchesi" <jose.marchesi@oracle.com>
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] Support for SPARC M7 and VIS 4.0
Date: Wed, 01 Jun 2016 09:27:00 -0000 [thread overview]
Message-ID: <31167664.u1IHI6vaI0@polaris> (raw)
In-Reply-To: <87d1o245xu.fsf@oracle.com>
> This patch adds support for -mcpu=niagara7, corresponding to the SPARC
> M7 CPU as documented in the Oracle SPARC Architecture 2015 and the M7
> Processor Supplement. The patch also includes intrinsics support for
> all the VIS 4.0 instructions.
Thanks for contributing this.
> This patch has been tested in sparc64-*-linux-gnu, sparcv9-*-linux-gnu
> and sparc-sun-solaris2.11 targets.
>
> The support for the new instructions/registers/isas/etc of the M7 is
> already committed upstream in binutils.
For a while, so I think that we should put the patch on the 6 branch too.
> The niagara7 pipeline description models the V3 pipe using a bypass
> with latency 3, from-to any instruction executing in the V3 pipe. The
> instructions are identified by mean of a new instruction attribute
> "v3pipe", that has been added to the proper define_insns in sparc.md.
>
> However, I am not sure how well this will scale ni the future, as in
> future cpu models the subset of instruction executing in the V3 pipe
> will be different than in the M7. So we need a way to define that
> instruction class that is processor-specific: using v3pipe, v3pipe_m8,
> v3pipe_m9, etc seems a bit messy to me. Any idea?
I guess it depends on whether the set of instructions executing in the V3 pipe
is (or will become) kind of arbitrary or not. The usual way to support a new
scheduling model is to define new (sub-)types of instructions and assign the
appropriate (sub-)types to the scheduling units but, of course, if affected
instructions are selected randomly, the model falls apart.
> Note that the reason why the empirically observed latencies in the T4
> were different than the documented ones in the T4 supplement (as Dave
> found and fixed in
> https://gcc.gnu.org/ml/gcc-patches/2012-10/msg00934.html) was that the
> HW chaps didn't feel it necessary to document the complexity in the
> PRM, and just assigned a latency of 11 to the VIS instructions.
Very interesting insight. It would be nice to add a blurb about that in
config/sparc/niagara4.md then.
> - Changes in the cache parameters for niagara processors
>
> The Oracle SPARC Architecture (previously the UltraSPARC Architecture)
> specification states that when a PREFETCH[A] instruction is executed
> an implementation-specific amount of data is prefetched, and that it
> is at least 64 bytes long (aligned to at least 64 bytes).
>
> However, this is not correct. The M7 (and implementations prior to
> that) does not guarantee a 64B prefetch into a cache if the line size
> is smaller. A single cache line is all that is ever prefetched. So
> for the M7, where the L1D$ has 32B lines and the L2D$ and L3 have 64B
> lines, a prefetch will prefetch 64B into the L2 and L3, but only 32B
> are brought into the L1D$. (Assuming it is a read_n prefetch, which is
> the only type which allocates to the L1.)
Adding a comment to sparc_option_override about that would also be nice.
> Another change is setting PARAM_SIMULTANEOUS_PREFETCHES to 32, as
> opposed to its previous value of 2 for niagara processors. Unlike in
> the UltraSPARC III, which featured a documented prefetch queue with a
> size of 8, in niagara processors prefetches are handled much like
> regular loads. The L1 miss buffer is 32 entries, but prefetches start
> getting affected when 30 entries become occupied. That occupation
> could be a mix of regular loads and prefetches though. And that
> buffer is shared by all threads. Once the threshold is reached, if
> the core is running a single thread the prefetch will retry. If more
> than one thread is running, the prefetch will be dropped. So, as you
> can see, all this makes it very difficult to determine how many
> simultaneous prefetches can be issued simultaneously, even in a
> single-threaded program. However, 2 is clearly too low, and
> experimental results show that setting this parameter to 32 works well
> when the number of threads is not high. (Note that I didn't change
> this parameter for niagara2, niagara3 and niagara4, but we probably
> want to do that.)
Fine with me, let's do that, but with a comment too.
> All above together makes a difference when running STREAM in a M7 with
> OMP_NUM_THREADS=2:
>
> With -O3 -mcpu=niagara4 -fprefetch-loop-arrays:
>
> Function Best Rate MB/s Avg time Min time Max time
> Copy: 6336.6573 5.4323 5.4224 5.4455
> Scale: 5796.6113 5.9289 5.9276 5.9309
> Add: 7517.6836 6.8760 6.8558 6.8927
> Triad: 7781.0785 6.6364 6.6237 6.6549
>
> With -O3 -mcpu=niagara7 -fprefetch-loop-arrays:
>
> Function Best Rate MB/s Avg time Min time Max time
> Copy: 10743.8074 3.2052 3.1981 3.2132
> Scale: 10763.5906 3.1995 3.1922 3.2078
> Add: 11866.4764 4.3537 4.3433 4.3688
> Triad: 12856.0593 4.0129 4.0090 4.0178
>
> (The impact on performance quickly decreases as more threads are used
> in the benchmark, as the prefetches start getting dropped.)
Impressive indeed.
The patch looks good to me, but here are a few nits:
> 2016-05-31 Jose E. Marchesi <jose.marchesi@oracle.com>
>
> * config/sparc/sparc.md (cpu): Added niagara7 cpu type.
> Include the M7 SPARC DFA scheduler.
> New attribute v3pipe.
> Annotate insns with v3pipe where appropriate.
> Define cpu_feature vis4.
"Add lzd instruction type and set it on clzdi_sp64 and clzsi_sp64."
> Added (V8QI "8") to vbits.
> Added insns {add,sub}v8qi3
> Added insns ss{add,sub}v8qi3
> Added insns us{add,sub}{v8qi,v4hi}3
> Added insns {min,max}{v8qi,v4hi,v2si}3
> Added insns {minu,maxu}{v8qi,v4hi,v2si}3
> Added insns fpcmp{le,gt,ule,ug,ule,ugt}{8,16,32}_vis.
Only present tense in ChangeLog: "Add ..."
> * config/sparc/niagara7.md: New file.
> * configure.ac (HAVE_AS_SPARC5_VIS4): Define if the assembler
> supports SPARC5 and VIS 4.0 instructions.
You need to add:
* configure: Regenerate.
* config.in: Likewise.
IOW the number of '*' must be equal to the number of added/modified files.
> * config.gcc: niagara7 is a supported cpu in sparc*-*-* targets.
> * config/sparc/sol2.h (ASM_CPU32_DEFAUILT_SPEC): Set for
> TARGET_CPU_niagara7.
> (ASM_CPU64_DEFAULT_SPEC): Likewise.
> (CPP_CPU_SPEC): Handle niagara7.
> (ASM_CPU_SPEC): Likewise.
> * config/sparc/sparc-opts.h (processor_type): Add
> PROCESSOR_NIAGARA7.
> (mvis4): Added.
> * config/sparc/sparc.h (TARGET_CPU_niagara7): Defined.
> (AS_NIAGARA7_FLAG): Defined.
> (ASM_CPU64_DEFAULT_SPEC): Set for niagara7.
> (CPP_CPU64_DEFAULT_SPEC): Likewise.
> (CPP_CPU_SPEC): Handle niagara7.
> (ASM_CPU_SPEC): Likewise.
Likewise for the present tense: "Add", "Define".
> * config/sparc/sparc.c (niagara7_costs): Defined.
> (sparc_option_override): Set L2 cache size for niagara*.
> (sparc_option_override): Ditto for the L1 cache size.
> (sparc_option_override): Handle niagara7.
> (sparc32_initialize_trampoline): Likewise.
> (sparc_use_sched_lookahead): Likewise.
> (sparc_issue_rate): Likewise.
> (sparc_register_move_cost): Likewise.
> (dump_target_flag_bits): Support VIS4.
> (sparc_option_override): Likewise.
> (sparc_vis_init_builtins): Likewise.
> (sparc_builtins): Likewise.
Only one entry per function (sparc_option_override).
> @@ -1583,10 +1626,29 @@ sparc_option_override (void)
>
> || sparc_cpu == PROCESSOR_NIAGARA
> || sparc_cpu == PROCESSOR_NIAGARA2
> || sparc_cpu == PROCESSOR_NIAGARA3
>
> - || sparc_cpu == PROCESSOR_NIAGARA4)
> - ? 64 : 32),
> + || sparc_cpu == PROCESSOR_NIAGARA4
> + || sparc_cpu == PROCESSOR_NIAGARA7)
> + ? 32 : 64),
> + global_options.x_param_values,
> + global_options_set.x_param_values);
> + maybe_set_param_value (PARAM_L1_CACHE_SIZE,
> + ((sparc_cpu == PROCESSOR_ULTRASPARC
> + || sparc_cpu == PROCESSOR_ULTRASPARC3
> + || sparc_cpu == PROCESSOR_NIAGARA
> + || sparc_cpu == PROCESSOR_NIAGARA2
> + || sparc_cpu == PROCESSOR_NIAGARA3
> + || sparc_cpu == PROCESSOR_NIAGARA4
> + || sparc_cpu == PROCESSOR_NIAGARA7)
> + ? 16 : 64),
> global_options.x_param_values,
> global_options_set.x_param_values);
> + maybe_set_param_value (PARAM_L2_CACHE_SIZE,
> + (sparc_cpu == PROCESSOR_NIAGARA4
> + ? 128 : (sparc_cpu == PROCESSOR_NIAGARA7
> + ? 256 : 512)),
> + global_options.x_param_values,
> + global_options_set.x_param_values);
> +
Please add a blank line between the statements. Why swapping 32 and 64 for
PARAM_L1_CACHE_LINE_SIZE? If the 32 default is universally OK, then let's
just remove the statement.
> /* Disable save slot sharing for call-clobbered registers by default.
> The IRA sharing algorithm works on single registers only and this
> @@ -9178,7 +9240,8 @@ sparc32_initialize_trampoline (rtx m_tramp, rtx
> fnaddr, rtx cxt) && sparc_cpu != PROCESSOR_NIAGARA
> && sparc_cpu != PROCESSOR_NIAGARA2
> && sparc_cpu != PROCESSOR_NIAGARA3
> - && sparc_cpu != PROCESSOR_NIAGARA4)
> + && sparc_cpu != PROCESSOR_NIAGARA4
> + && sparc_cpu != PROCESSOR_NIAGARA7) /* XXX */
> emit_insn (gen_flushsi (validize_mem (adjust_address (m_tramp, SImode,
> 8))));
What does the "XXX" mean?
> /* Call __enable_execute_stack after writing onto the stack to make sure
> @@ -9419,7 +9483,8 @@ sparc_use_sched_lookahead (void)
>
> || sparc_cpu == PROCESSOR_NIAGARA2
> || sparc_cpu == PROCESSOR_NIAGARA3)
>
> return 0;
> - if (sparc_cpu == PROCESSOR_NIAGARA4)
> + if (sparc_cpu == PROCESSOR_NIAGARA4
> + || sparc_cpu == PROCESSOR_NIAGARA7) /* XXX */
> return 2;
> if (sparc_cpu == PROCESSOR_ULTRASPARC
>
> || sparc_cpu == PROCESSOR_ULTRASPARC3)
Likewise.
> @@ -9442,6 +9507,7 @@ sparc_issue_rate (void)
> default:
> return 1;
> case PROCESSOR_NIAGARA4:
> + case PROCESSOR_NIAGARA7:
> case PROCESSOR_V9:
> /* Assume V9 processors are capable of at least dual-issue. */
> return 2;
Trailing spaces.
> diff --git a/gcc/config/sparc/sparc.h b/gcc/config/sparc/sparc.h
> index ebfe87d..d91496a 100644
> --- a/gcc/config/sparc/sparc.h
> +++ b/gcc/config/sparc/sparc.h
> @@ -142,6 +142,7 @@ extern enum cmodel sparc_cmodel;
> #define TARGET_CPU_niagara2 14
> #define TARGET_CPU_niagara3 15
> #define TARGET_CPU_niagara4 16
> +#define TARGET_CPU_niagara7 19
Any contribution to plug the hole is of course welcome. :-)
> @@ -8628,6 +8678,14 @@
> "fp<plusminus_insn>64\t%1, %2, %0"
> [(set_attr "type" "fga")])
>
> +(define_insn "<plusminus_insn>v8qi3"
> + [(set (match_operand:V8QI 0 "register_operand" "=e")
> + (plusminus:V8QI (match_operand:V8QI 1 "register_operand" "e")
> + (match_operand:V8QI 2 "register_operand" "e")))]
> + "TARGET_VIS4"
> + "fp<plusminus_insn>8\t%1, %2, %0"
> + [(set_attr "type" "fga")])
> +
> (define_mode_iterator VASS [V4HI V2SI V2HI V1SI])
> (define_code_iterator vis3_addsub_ss [ss_plus ss_minus])
> (define_code_attr vis3_addsub_ss_insn
Trailing spaces.
> +(define_insn "<vis3_addsub_ss_patname>v8qi3"
> + [(set (match_operand:V8QI 0 "register_operand" "=e")
> + (vis3_addsub_ss:V8QI (match_operand:V8QI 1 "register_operand" "e")
> + (match_operand:V8QI 2 "register_operand"
> "e")))] + "TARGET_VIS4"
> + "<vis3_addsub_ss_insn>8\t%1, %2, %0"
> + [(set_attr "type" "fga")])
If the mix of VIS4 and vis3_ is correct, then this requires a comment.
--
Eric Botcazou
next prev parent reply other threads:[~2016-06-01 9:27 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-05-31 16:15 Jose E. Marchesi
2016-06-01 9:27 ` Eric Botcazou [this message]
2016-06-01 9:55 ` Jose E. Marchesi
2016-06-02 10:33 ` Eric Botcazou
2016-06-02 10:44 ` Jose E. Marchesi
2016-06-02 10:21 ` [PATCH v1] " Jose E. Marchesi
2016-06-03 8:15 ` Eric Botcazou
2016-06-06 11:42 ` Jose E. Marchesi
2016-06-08 13:55 ` Eric Botcazou
2016-06-08 14:31 ` Gerald Pfeifer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=31167664.u1IHI6vaI0@polaris \
--to=ebotcazou@adacore.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=jose.marchesi@oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).