Date: Fri, 14 Nov 2014 20:47:00 -0000
From: Michael Meissner
To: Michael Meissner, gcc-patches@gcc.gnu.org, dje.gcc@gmail.com,
    joseph@codesourcery.com, macro@codesourcery.com, pattyo.lists@gmail.com,
    segher@kernel.crashing.org, hainque@adacore.com, dmalcolm@redhat.com
Subject: Re: PATCH [8 of 8], rs6000, add support for scalar floating point in Altivec registers
Message-ID: <20141114201634.GA6247@ibm-tiger.the-meissners.org>
References: <20141112002113.GA1489@ibm-tiger.the-meissners.org>
In-Reply-To: <20141112002113.GA1489@ibm-tiger.the-meissners.org>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="J2SCkAp4GZ/dPZZf"

--J2SCkAp4GZ/dPZZf
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

I tracked down the regression in the two SPEC benchmarks.  It was caused by
turning off pre-increment/pre-decrement addressing for floating point values,
which these benchmarks use quite a bit.  My secondary reload handlers are
capable of adding in the pre-increment/pre-decrement if such an operation is
attempted on an Altivec register, so this patch allows these addressing forms
on floating point types again.

I am also including a fix that makes the compiler work with -ffast-math.
With -ffast-math, the easy_fp_constant predicate says that all floating point
constants are easy, so that the reciprocal approximation instructions can be
used for division.  I added a define_split that moves these constants to the
constant pool after the reciprocal approximation work has been done, but
before reload starts.  I had this split in my tree during development, but
I thought it was no longer needed when I made up the patches; perhaps recent
changes to the register allocator need it again.

I added an option (-mupper-regs) to simplify setting both -mupper-regs-sf and
-mupper-regs-df.  It only sets the options that the particular machine
supports: -mupper-regs-df needs VSX, and -mupper-regs-sf needs the ISA 2.07
(power8) vector instructions.  An explicit -mupper-regs-sf or -mupper-regs-df
still takes precedence over -mupper-regs.

Finally, I changed the defaults to turn on -mupper-regs-df on power7/power8
systems, and -mupper-regs-sf on power8 systems.

I have run the regression test suite with these options on, and there were
no regressions.  Once all of the other patches go in, can I check in these
patches?  If you would prefer that GCC 5.0 not enable the upper register
support by default, let me know, and I can remove the lines in
rs6000-cpus.def that set the default.

2014-11-14  Michael Meissner

	* config/rs6000/predicates.md (memory_fp_constant): New predicate
	to return true if the operand is a floating point constant that
	must be put into the constant pool before register allocation
	occurs.

	* config/rs6000/rs6000-cpus.def (ISA_2_6_MASKS_SERVER): Enable
	-mupper-regs-df by default.
	(ISA_2_7_MASKS_SERVER): Enable -mupper-regs-sf by default.
	(POWERPC_MASKS): Add -mupper-regs-{sf,df} as options set by the
	various -mcpu=... options.
	(power7 cpu): Enable -mupper-regs-df by default.

	* config/rs6000/rs6000.opt (-mupper-regs): New combination option
	that sets -mupper-regs-sf and -mupper-regs-df if the cpu supports
	the instructions.

	* config/rs6000/rs6000.c (rs6000_setup_reg_addr_masks): Allow
	pre-increment and pre-decrement on floating point, even if the
	-mupper-regs-{sf,df} options were used.
	(rs6000_option_override_internal): If -mupper-regs, set both
	-mupper-regs-sf and -mupper-regs-df, depending on the underlying
	cpu.

	* config/rs6000/rs6000.md (DFmode splitter): Add a define_split to
	move floating point constants to the constant pool before register
	allocation.  Normally constants are put into the pool immediately,
	but -ffast-math delays putting them into the constant pool for the
	reciprocal approximation support.
	(SFmode splitter): Likewise.

	* doc/invoke.texi (RS/6000 and PowerPC Options): Document
	-mupper-regs.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

--J2SCkAp4GZ/dPZZf
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="gcc-power8.patch132i"

Index: gcc/config/rs6000/predicates.md
===================================================================
--- gcc/config/rs6000/predicates.md	(revision 217448)
+++ gcc/config/rs6000/predicates.md	(working copy)
@@ -521,6 +521,27 @@ (define_predicate "easy_fp_constant"
     }
 })
 
+;; Return 1 if the operand must be loaded from memory.  This is used by a
+;; define_split to ensure constants get pushed to the constant pool before
+;; reload.  If -ffast-math is used, easy_fp_constant will allow move insns to
+;; have constants in order not to interfere with reciprocal estimation.
+;; However, with -mupper-regs support, these constants must be moved to the
+;; constant pool before register allocation.
+
+(define_predicate "memory_fp_constant"
+  (match_code "const_double")
+{
+  if (TARGET_VSX && op == CONST0_RTX (mode))
+    return 0;
+
+  if (!TARGET_HARD_FLOAT || !TARGET_FPRS
+      || (mode == SFmode && !TARGET_SINGLE_FLOAT)
+      || (mode == DFmode && !TARGET_DOUBLE_FLOAT))
+    return 0;
+
+  return 1;
+})
+
 ;; Return 1 if the operand is a CONST_VECTOR and can be loaded into a
 ;; vector register without using memory.
 (define_predicate "easy_vector_constant"
Index: gcc/config/rs6000/rs6000-cpus.def
===================================================================
--- gcc/config/rs6000/rs6000-cpus.def	(revision 217448)
+++ gcc/config/rs6000/rs6000-cpus.def	(working copy)
@@ -44,7 +44,8 @@
 #define ISA_2_6_MASKS_SERVER	(ISA_2_5_MASKS_SERVER			\
 				 | OPTION_MASK_POPCNTD			\
 				 | OPTION_MASK_ALTIVEC			\
-				 | OPTION_MASK_VSX)
+				 | OPTION_MASK_VSX			\
+				 | OPTION_MASK_UPPER_REGS_DF)
 
 /* For now, don't provide an embedded version of ISA 2.07.  */
 #define ISA_2_7_MASKS_SERVER	(ISA_2_6_MASKS_SERVER			\
@@ -54,7 +55,8 @@
 				 | OPTION_MASK_DIRECT_MOVE		\
 				 | OPTION_MASK_HTM			\
 				 | OPTION_MASK_QUAD_MEMORY		\
-				 | OPTION_MASK_QUAD_MEMORY_ATOMIC)
+				 | OPTION_MASK_QUAD_MEMORY_ATOMIC	\
+				 | OPTION_MASK_UPPER_REGS_SF)
 
 #define POWERPC_7400_MASK	(OPTION_MASK_PPC_GFXOPT | OPTION_MASK_ALTIVEC)
 
@@ -94,6 +96,8 @@
 				 | OPTION_MASK_RECIP_PRECISION		\
 				 | OPTION_MASK_SOFT_FLOAT		\
 				 | OPTION_MASK_STRICT_ALIGN_OPTIONAL	\
+				 | OPTION_MASK_UPPER_REGS_DF		\
+				 | OPTION_MASK_UPPER_REGS_SF		\
 				 | OPTION_MASK_VSX			\
 				 | OPTION_MASK_VSX_TIMODE)
 
@@ -184,7 +188,7 @@ RS6000_CPU ("power6x", PROCESSOR_POWER6,
 RS6000_CPU ("power7", PROCESSOR_POWER7,  /* Don't add MASK_ISEL by default */
 	    POWERPC_7400_MASK | MASK_POWERPC64 | MASK_PPC_GPOPT | MASK_MFCRF
 	    | MASK_POPCNTB | MASK_FPRND | MASK_CMPB | MASK_DFP | MASK_POPCNTD
-	    | MASK_VSX | MASK_RECIP_PRECISION)
+	    | MASK_VSX | MASK_RECIP_PRECISION | OPTION_MASK_UPPER_REGS_DF)
 RS6000_CPU ("power8", PROCESSOR_POWER8, MASK_POWERPC64 | ISA_2_7_MASKS_SERVER)
 RS6000_CPU ("powerpc", PROCESSOR_POWERPC, 0)
 RS6000_CPU ("powerpc64", PROCESSOR_POWERPC64, MASK_PPC_GFXOPT | MASK_POWERPC64)
Index: gcc/config/rs6000/rs6000.opt
===================================================================
--- gcc/config/rs6000/rs6000.opt	(revision 217448)
+++ gcc/config/rs6000/rs6000.opt	(working copy)
@@ -589,6 +589,10 @@ mupper-regs-sf
 Target Report Mask(UPPER_REGS_SF) Var(rs6000_isa_flags)
 Allow float variables in upper registers with -mcpu=power8 or -mpower8-vector
 
+mupper-regs
+Target Report Var(TARGET_UPPER_REGS) Init(-1) Save
+Allow float/double variables in upper registers if cpu allows it
+
 moptimize-swaps
 Target Undocumented Var(rs6000_optimize_swaps) Init(1) Save
 Analyze and remove doubleword swaps from VSX computations.
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 217448)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -2462,9 +2462,7 @@ rs6000_setup_reg_addr_masks (void)
 	  /* Figure out if we can do PRE_INC, PRE_DEC, or PRE_MODIFY
 	     addressing.  Restrict addressing on SPE for 64-bit types
 	     because of the SUBREG hackery used to address 64-bit floats in
-	     '32-bit' GPRs.  To simplify secondary reload, don't allow
-	     update forms on scalar floating point types that can go in the
-	     upper registers.  */
+	     '32-bit' GPRs.  */
 
 	  if (TARGET_UPDATE
 	      && (rc == RELOAD_REG_GPR || rc == RELOAD_REG_FPR)
@@ -2472,8 +2470,7 @@ rs6000_setup_reg_addr_masks (void)
 	      && !VECTOR_MODE_P (m2)
 	      && !COMPLEX_MODE_P (m2)
 	      && !indexed_only_p
-	      && !(TARGET_E500_DOUBLE && GET_MODE_SIZE (m2) == 8)
-	      && !reg_addr[m2].scalar_in_vmx_p)
+	      && !(TARGET_E500_DOUBLE && GET_MODE_SIZE (m2) == 8))
 	    {
 	      addr_mask |= RELOAD_REG_PRE_INCDEC;
 
@@ -3509,6 +3506,40 @@ rs6000_option_override_internal (bool gl
 	rs6000_isa_flags &= ~OPTION_MASK_DFP;
     }
 
+  /* Allow an explicit -mupper-regs to set both -mupper-regs-df and
+     -mupper-regs-sf, depending on the cpu, unless the user explicitly also set
+     the individual option.  */
+  if (TARGET_UPPER_REGS > 0)
+    {
+      if (TARGET_VSX
+	  && !(rs6000_isa_flags_explicit & OPTION_MASK_UPPER_REGS_DF))
+	{
+	  rs6000_isa_flags |= OPTION_MASK_UPPER_REGS_DF;
+	  rs6000_isa_flags_explicit |= OPTION_MASK_UPPER_REGS_DF;
+	}
+      if (TARGET_P8_VECTOR
+	  && !(rs6000_isa_flags_explicit & OPTION_MASK_UPPER_REGS_SF))
+	{
+	  rs6000_isa_flags |= OPTION_MASK_UPPER_REGS_SF;
+	  rs6000_isa_flags_explicit |= OPTION_MASK_UPPER_REGS_SF;
+	}
+    }
+  else if (TARGET_UPPER_REGS == 0)
+    {
+      if (TARGET_VSX
+	  && !(rs6000_isa_flags_explicit & OPTION_MASK_UPPER_REGS_DF))
+	{
+	  rs6000_isa_flags &= ~OPTION_MASK_UPPER_REGS_DF;
+	  rs6000_isa_flags_explicit |= OPTION_MASK_UPPER_REGS_DF;
+	}
+      if (TARGET_P8_VECTOR
+	  && !(rs6000_isa_flags_explicit & OPTION_MASK_UPPER_REGS_SF))
+	{
+	  rs6000_isa_flags &= ~OPTION_MASK_UPPER_REGS_SF;
+	  rs6000_isa_flags_explicit |= OPTION_MASK_UPPER_REGS_SF;
+	}
+    }
+
   if (TARGET_UPPER_REGS_DF && !TARGET_VSX)
     {
       if (rs6000_isa_flags_explicit & OPTION_MASK_UPPER_REGS_DF)
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 217448)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -8137,6 +8137,21 @@ (define_insn_and_split "*mov<mode>_softf
 { rs6000_split_multireg_move (operands[0], operands[1]); DONE; }
   [(set_attr "length" "20,20,16")])
 
+;; If we are using -ffast-math, easy_fp_constant assumes all constants are
+;; 'easy' in order to allow for reciprocal estimation.  Make sure the constant
+;; is in the constant pool before reload occurs.  This simplifies accessing
+;; scalars in the traditional Altivec registers.
+
+(define_split
+  [(set (match_operand:SFDF 0 "register_operand" "")
+	(match_operand:SFDF 1 "memory_fp_constant" ""))]
+  "TARGET_<MODE>_FPR && flag_unsafe_math_optimizations
+   && !reload_in_progress && !reload_completed && !lra_in_progress"
+  [(set (match_dup 0) (match_dup 2))]
+{
+  operands[2] = validize_mem (force_const_mem (<MODE>mode, operands[1]));
+})
+
 (define_expand "extenddftf2"
   [(set (match_operand:TF 0 "nonimmediate_operand" "")
 	(float_extend:TF (match_operand:DF 1 "input_operand" "")))]
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 217448)
+++ gcc/doc/invoke.texi	(working copy)
@@ -940,7 +940,8 @@ See RS/6000 and PowerPC Options.
 -mquad-memory  -mno-quad-memory @gol
 -mquad-memory-atomic -mno-quad-memory-atomic @gol
 -mcompat-align-parm -mno-compat-align-parm @gol
--mupper-regs-df -mno-upper-regs-df -mupper-regs-sf -mno-upper-regs-sf}
+-mupper-regs-df -mno-upper-regs-df -mupper-regs-sf -mno-upper-regs-sf @gol
+-mupper-regs -mno-upper-regs}
 
 @emph{RX Options}
 @gccoptlist{-m64bit-doubles  -m32bit-doubles  -fpu  -nofpu@gol
@@ -19691,10 +19692,9 @@ instructions.  The @option{-mquad-memory
 Generate code that uses (does not use) the scalar double precision
 instructions that target all 64 registers in the vector/scalar
 floating point register set that were added in version 2.06 of the
-PowerPC ISA.  If @option{-mupper-regs-df} is not set, the traditional
-floating instructions will be generated that target the first 32
-registers.  This option requires the @option{-mvsx},
-@option{-mcpu=power7}, or @option{-mcpu=power8} options to be set.
+PowerPC ISA.  The @option{-mupper-regs-df} option is turned on by default if
+you use any of the @option{-mcpu=power7}, @option{-mcpu=power8}, or
+@option{-mvsx} options.
 
 @item -mupper-regs-sf
 @itemx -mno-upper-regs-sf
@@ -19703,10 +19703,20 @@ registers.  This option requires the @op
 Generate code that uses (does not use) the scalar single precision
 instructions that target all 64 registers in the vector/scalar
 floating point register set that were added in version 2.07 of the
-PowerPC ISA.  If @option{-mupper-regs-sf} is not set, the traditional
-floating instructions will be generated that target the first 32
-registers.  This option requires the @option{-mpower8-vector},
-@option{-mcpu=power7}, or @option{-mcpu=power8} options to be set.
+PowerPC ISA.  The @option{-mupper-regs-sf} option is turned on by default if
+you use either the @option{-mcpu=power8} or the @option{-mpower8-vector}
+option.
+
+@item -mupper-regs
+@itemx -mno-upper-regs
+@opindex mupper-regs
+@opindex mno-upper-regs
+Generate code that uses (does not use) the scalar
+instructions that target all 64 registers in the vector/scalar
+floating point register set, depending on the model of the machine.
+
+If the @option{-mno-upper-regs} option is used, it turns off both the
+@option{-mupper-regs-sf} and @option{-mupper-regs-df} options.
 
 @item -mfloat-gprs=@var{yes/single/double/no}
 @itemx -mfloat-gprs
--J2SCkAp4GZ/dPZZf--