public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH, rs6000] Fix PR83926, ICE using __builtin_vsx_{div,udiv,mul}_2di builtins
@ 2018-02-06  0:07 Peter Bergner
  0 siblings, 0 replies; 17+ messages in thread
From: Peter Bergner @ 2018-02-06  0:07 UTC (permalink / raw)
  To: GCC Patches; +Cc: Segher Boessenkool, Bill Schmidt, will_schmidt

PR83926 shows a problem in expanding the _builtin_vsx_{div,udiv,mul}_2di
builtins in 32-bit mode.  The problem is that the splitters for the
patterns explicitly call gen_{div,udiv,mul}di3 patterns, even in 32-bit
mode and those patterns assume the associated 64-bit HW instructions exist
when they do not.  The "fix" implemented here is to modify gen_{div,udiv,mul}
to catch the case when we have DImode operands in 32-bit mode (and not using
-mpowerpc64) and do the right thing.  In the case of gen_{div,udiv}di3, that
means calling their lib functions and for gen_muldi3, we call expand_mult()
which emits code that does the 64-bit multiply.

This passes bootstrap and regtesting on powerpc64le-linux, as well as on
powerpc64-linux (running the testsuite in both 32-bit and 64-bit modes).
Ok for trunk?

Peter

gcc/
	PR target/83926
	* config/rs6000/rs6000.md (*mul<mode>3): Rename to this...
	(mul<mode>3): ...from this.  Declare new define_expand.
	(*udiv<mode>3): Rename to this...
	(udiv<mode>3): ...from this.  Declare new define_expand.
	(div<mode>3): Handle DImode operands in 32-bit mode.

gcc/testsuite/
	PR target/83926
	* gcc.target/powerpc/pr83926.c: New test.

Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 257390)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -2872,8 +2872,21 @@
   DONE;
 }")
 
+(define_expand "mul<mode>3"
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "")
+	(mult:GPR (match_operand:GPR 1 "gpc_reg_operand" "")
+		  (match_operand:GPR 2 "reg_or_short_operand" "")))]
+  ""
+{
+  if (<MODE>mode == DImode && !TARGET_POWERPC64)
+    {
+      rtx ret = expand_mult (DImode, operands[1], operands[2], NULL, 0, false);
+      emit_move_insn (operands[0], ret);    
+      DONE;
+    }
+})
 
-(define_insn "mul<mode>3"
+(define_insn "*mul<mode>3"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=r,r")
 	(mult:GPR (match_operand:GPR 1 "gpc_reg_operand" "%r,r")
 		  (match_operand:GPR 2 "reg_or_short_operand" "r,I")))]
@@ -3040,7 +3053,25 @@
   "maddld %0,%1,%2,%3"
   [(set_attr "type" "mul")])
 
-(define_insn "udiv<mode>3"
+(define_expand "udiv<mode>3"
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "")
+	(udiv:GPR (match_operand:GPR 1 "gpc_reg_operand" "")
+		  (match_operand:GPR 2 "gpc_reg_operand" "")))]
+  ""
+{
+  if (<MODE>mode == DImode && !TARGET_POWERPC64)
+    {
+      rtx libfunc = optab_libfunc (udiv_optab, <MODE>mode);
+      rtx target = emit_library_call_value (libfunc,
+					    operands[0], LCT_NORMAL, <MODE>mode,
+					    operands[1], <MODE>mode,
+					    operands[2], <MODE>mode);
+      emit_move_insn (operands[0], target);    
+      DONE;
+    }
+})
+
+(define_insn "*udiv<mode>3"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
         (udiv:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")
 		  (match_operand:GPR 2 "gpc_reg_operand" "r")))]
@@ -3066,7 +3097,16 @@
       emit_insn (gen_div<mode>3_sra (operands[0], operands[1], operands[2]));
       DONE;
     }
-
+  else if (<MODE>mode == DImode && !TARGET_POWERPC64)
+    {
+      rtx libfunc = optab_libfunc (sdiv_optab, <MODE>mode);
+      rtx target = emit_library_call_value (libfunc,
+					    operands[0], LCT_NORMAL, <MODE>mode,
+					    operands[1], <MODE>mode,
+					    operands[2], <MODE>mode);
+      emit_move_insn (operands[0], target);    
+      DONE;
+    }
   operands[2] = force_reg (<MODE>mode, operands[2]);
 })
 
Index: gcc/testsuite/gcc.target/powerpc/pr83926.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr83926.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr83926.c	(working copy)
@@ -0,0 +1,22 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
+/* { dg-options "-O2 -mcpu=power8 -mno-fold-gimple" } */
+
+__attribute__ ((altivec(vector__))) long long
+sdiv (__attribute__ ((altivec(vector__))) long long a,
+      __attribute__ ((altivec(vector__))) long long b)
+{
+  return __builtin_vsx_div_2di (a, b);
+}
+__attribute__ ((altivec(vector__))) unsigned long long
+udiv (__attribute__ ((altivec(vector__))) unsigned long long a,
+      __attribute__ ((altivec(vector__))) unsigned long long b)
+{
+  return __builtin_vsx_udiv_2di (a, b);
+}
+__attribute__ ((altivec(vector__))) long long
+smul (__attribute__ ((altivec(vector__))) long long a,
+      __attribute__ ((altivec(vector__))) long long b)
+{
+  return __builtin_vsx_mul_2di (a, b);
+}

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH, rs6000] Fix PR83926, ICE using __builtin_vsx_{div,udiv,mul}_2di builtins
  2018-02-09 16:51                 ` Segher Boessenkool
@ 2018-02-09 17:09                   ` Peter Bergner
  0 siblings, 0 replies; 17+ messages in thread
From: Peter Bergner @ 2018-02-09 17:09 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: David Edelsohn, GCC Patches, William J. Schmidt, Will Schmidt

On 2/9/18 10:51 AM, Segher Boessenkool wrote:
>> +/* { dg-final { scan-assembler-times {\mvclzb\M} 2 } } */
>> +/* { dg-final { scan-assembler-times {\mvclzb\M} 2 } } */

This was a typo in the original test case I didn't notice.  Looking closer,
they meant one of these to be vclzd.  I'll fix that, thanks.

Peter


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH, rs6000] Fix PR83926, ICE using __builtin_vsx_{div,udiv,mul}_2di builtins
  2018-02-08 18:28               ` Peter Bergner
@ 2018-02-09 16:51                 ` Segher Boessenkool
  2018-02-09 17:09                   ` Peter Bergner
  0 siblings, 1 reply; 17+ messages in thread
From: Segher Boessenkool @ 2018-02-09 16:51 UTC (permalink / raw)
  To: Peter Bergner
  Cc: David Edelsohn, GCC Patches, William J. Schmidt, Will Schmidt

Hi!

On Thu, Feb 08, 2018 at 12:28:05PM -0600, Peter Bergner wrote:
> On 2/8/18 10:38 AM, Peter Bergner wrote:
> > 	* gcc.target/powerpc/builtins-1-be.c: Filter out gimple folding disabled
> > 	message.  Fix test for running in 32-bit mode.
> 
> As we talked about offline, here's a bigger change to builtins-1-be.c that
> cleans up the test a little more, since we generate xxlor in more cases
> than just the __builtin_vec_or() call, so this change adds the -dp option
> and we match the pattern name to verify we are getting as many as we expect
> from that and that alone.  This also splits the xxland and xxlandc into
> their own matches, which match the source test cases use of vec_and() and
> vec_andc().

> +/* { dg-final { scan-assembler-times {\mvclzb\M} 2 } } */
> +/* { dg-final { scan-assembler-times {\mvclzb\M} 2 } } */

Duplicate line.

Fine otherwise, thanks!


Segher

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH, rs6000] Fix PR83926, ICE using __builtin_vsx_{div,udiv,mul}_2di builtins
  2018-02-09 16:17               ` Segher Boessenkool
@ 2018-02-09 16:39                 ` Peter Bergner
  0 siblings, 0 replies; 17+ messages in thread
From: Peter Bergner @ 2018-02-09 16:39 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: David Edelsohn, GCC Patches, William J. Schmidt, Will Schmidt

On 2/9/18 10:17 AM, Segher Boessenkool wrote:
> On Thu, Feb 08, 2018 at 10:38:09AM -0600, Peter Bergner wrote:
>> -/* { dg-final { scan-assembler-times "divd" 4 } } */
> 
>> +/* { dg-final { scan-assembler-times {\mdivd\M} 2 { target lp64 } } } */
> 
>> +/* { dg-final { scan-assembler-times {\mbl __divdi3\M} 2 { target ilp32 } } } */
> 
> Why does divd change from 4 to 2 times?  Is it changed to muls?

No, it's because "divd" was matching both divd and divdu insns, so it was
overcounting before.  Now that we use \m and \M word delimiters, we don't.
There are only 2 vec_div() and 2 vec_udiv() calls, so we should only expect
2 for both.


> Okay for trunk.  Thanks!

Thanks, committed.

Peter


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH, rs6000] Fix PR83926, ICE using __builtin_vsx_{div,udiv,mul}_2di builtins
  2018-02-08 16:38             ` Peter Bergner
  2018-02-08 18:28               ` Peter Bergner
@ 2018-02-09 16:17               ` Segher Boessenkool
  2018-02-09 16:39                 ` Peter Bergner
  1 sibling, 1 reply; 17+ messages in thread
From: Segher Boessenkool @ 2018-02-09 16:17 UTC (permalink / raw)
  To: Peter Bergner
  Cc: David Edelsohn, GCC Patches, William J. Schmidt, Will Schmidt

On Thu, Feb 08, 2018 at 10:38:09AM -0600, Peter Bergner wrote:
> On 2/6/18 10:36 AM, Peter Bergner wrote:
> > On 2/6/18 10:20 AM, David Edelsohn wrote:
> >> Do the gen_XXXdi3 calls work if you use SDI iterator instead of GPR
> >> iterator, as Segher suggested?
> > 
> > Well it works _if_ we use the first patch that changes the gen_*
> > patterns.  If we go this route, I agree we should use the SDI
> > iterator instead of GPR.
> 
> Actually, my bad.  While bootstrapping this on a BE system, we get an
> error when we attempt a 64-bit multiply in 32-bit mode.  In this case,
> the gen_muldi3() pattern calls expand_mult(DImode, ...) and the automatic
> expand machinery notices the gen_muldi3() now allows DImode in the
> !TARGET_POWERPC64 case and then calls gen_muldi3() to emit the multiply
> and we go into infinite recursion.  We don't have that problem in the
> div/udiv case, because we call out to the lib routines, so no recursion.
> Given this, I think we should probably go with the patch that modifies
> vsx.md and guards the calls to gen_{div,udiv,mul}di3() with a TARGET_POWERPC64
> test.

Okay, let's go with this, at least for now.  Thanks for investigating.

> -/* { dg-final { scan-assembler-times "divd" 4 } } */

> +/* { dg-final { scan-assembler-times {\mdivd\M} 2 { target lp64 } } } */

> +/* { dg-final { scan-assembler-times {\mbl __divdi3\M} 2 { target ilp32 } } } */

Why does divd change from 4 to 2 times?  Is it changed to muls?

Okay for trunk.  Thanks!


Segher

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH, rs6000] Fix PR83926, ICE using __builtin_vsx_{div,udiv,mul}_2di builtins
  2018-02-08 16:38             ` Peter Bergner
@ 2018-02-08 18:28               ` Peter Bergner
  2018-02-09 16:51                 ` Segher Boessenkool
  2018-02-09 16:17               ` Segher Boessenkool
  1 sibling, 1 reply; 17+ messages in thread
From: Peter Bergner @ 2018-02-08 18:28 UTC (permalink / raw)
  To: David Edelsohn, Segher Boessenkool
  Cc: GCC Patches, William J. Schmidt, Will Schmidt

On 2/8/18 10:38 AM, Peter Bergner wrote:
> 	* gcc.target/powerpc/builtins-1-be.c: Filter out gimple folding disabled
> 	message.  Fix test for running in 32-bit mode.

As we talked about offline, here's a bigger change to builtins-1-be.c that
cleans up the test a little more, since we generate xxlor in more cases
than just the __builtin_vec_or() call, so this change adds the -dp option
and we match the pattern name to verify we are getting as many as we expect
from that and that alone.  This also splits the xxland and xxlandc into
their own matches, which match the source test cases use of vec_and() and
vec_andc().

Peter


Index: gcc/testsuite/gcc.target/powerpc/builtins-1-be.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/builtins-1-be.c	(revision 257390)
+++ gcc/testsuite/gcc.target/powerpc/builtins-1-be.c	(working copy)
@@ -1,6 +1,7 @@
 /* { dg-do compile { target { powerpc64-*-* } } } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
-/* { dg-options "-mcpu=power8 -O0 -mno-fold-gimple" } */
+/* { dg-options "-mcpu=power8 -O0 -mno-fold-gimple -dp" } */
+/* { dg-prune-output "gimple folding of rs6000 builtins has been disabled." } */
 
 /* Test that a number of newly added builtin overloads are accepted
    by the compiler.  */
@@ -22,10 +23,10 @@
    vec_ctf    xvmuldp 
    vec_cts xvcvdpsxds, vctsxs
    vec_ctu   xvcvdpuxds, vctuxs
-   vec_div   divd, divdu
+   vec_div   divd, divdu | __divdi3(), __udivdi3()
    vec_mergel vmrghb, vmrghh, xxmrghw
    vec_mergeh  xxmrglw, vmrglh
-   vec_mul mulld
+   vec_mul mulld | mullw, mulhwu
    vec_nor xxlnor
    vec_or xxlor
    vec_packsu vpksdus
@@ -36,34 +37,39 @@
    vec_rsqrt  xvrsqrtesp
    vec_rsqrte xvrsqrtesp  */
 
-/* { dg-final { scan-assembler-times "vcmpequd." 4 } } */
-/* { dg-final { scan-assembler-times "vcmpgtud." 8 } } */
-/* { dg-final { scan-assembler-times "xxland" 29 } } */
-/* { dg-final { scan-assembler-times "vclzb" 2 } } */
-/* { dg-final { scan-assembler-times "vclzb" 2 } } */
-/* { dg-final { scan-assembler-times "vclzw" 2 } } */
-/* { dg-final { scan-assembler-times "vclzh" 2 } } */
-/* { dg-final { scan-assembler-times "xvcpsgnsp" 1 } } */
-/* { dg-final { scan-assembler-times "xvmuldp" 6 } } */
-/* { dg-final { scan-assembler-times "xvcvdpsxds" 1 } } */
-/* { dg-final { scan-assembler-times "vctsxs" 1 } } */
-/* { dg-final { scan-assembler-times "xvcvdpuxds" 1 } } */
-/* { dg-final { scan-assembler-times "vctuxs" 1 } } */
-/* { dg-final { scan-assembler-times "divd" 4 } } */
-/* { dg-final { scan-assembler-times "divdu" 2 } } */
-/* { dg-final { scan-assembler-times "vmrghb" 0 } } */
-/* { dg-final { scan-assembler-times "vmrghh" 3 } } */
-/* { dg-final { scan-assembler-times "xxmrghw" 1 } } */
-/* { dg-final { scan-assembler-times "xxmrglw" 4 } } */
-/* { dg-final { scan-assembler-times "vmrglh" 4 } } */
-/* { dg-final { scan-assembler-times "mulld" 4 } } */
-/* { dg-final { scan-assembler-times "xxlnor" 19 } } */
-/* { dg-final { scan-assembler-times "xxlor" 14 } } */
-/* { dg-final { scan-assembler-times "vpksdus" 1 } } */
-/* { dg-final { scan-assembler-times "vperm" 2 } } */
-/* { dg-final { scan-assembler-times "xvrdpi" 1 } } */
-/* { dg-final { scan-assembler-times "xxsel" 6 } } */
-/* { dg-final { scan-assembler-times "xxlxor" 6 } } */
+/* { dg-final { scan-assembler-times {\mvcmpequd\M\.} 4 } } */
+/* { dg-final { scan-assembler-times {\mvcmpgtud\M\.} 8 } } */
+/* { dg-final { scan-assembler-times {\mxxland\M} 16 } } */
+/* { dg-final { scan-assembler-times {\mxxlandc\M} 13 } } */
+/* { dg-final { scan-assembler-times {\mvclzb\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvclzb\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvclzw\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvclzh\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxvcpsgnsp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvmuldp\M} 6 } } */
+/* { dg-final { scan-assembler-times {\mxvcvdpsxds\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvctsxs\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvcvdpuxds\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvctuxs\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmrghb\M} 0 } } */
+/* { dg-final { scan-assembler-times {\mvmrghh\M} 3 } } */
+/* { dg-final { scan-assembler-times {\mxxmrghw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxxmrglw\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mvmrglh\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mxxlnor\M} 6 } } */
+/* { dg-final { scan-assembler-times {\mxxlor\M[^\n]*\mboolv4si3_internal\M} 6 } } */
+/* { dg-final { scan-assembler-times {\mvpksdus\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvperm\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxvrdpi\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxxsel\M} 6 } } */
+/* { dg-final { scan-assembler-times {\mxxlxor\M} 6 } } */
+/* { dg-final { scan-assembler-times {\mdivd\M} 2 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {\mdivdu\M} 2 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {\mmulld\M} 4 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {\mbl __divdi3\M} 2 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times {\mbl __udivdi3\M} 2 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times {\mmullw\M} 12 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times {\mmulhwu\M} 4 { target ilp32 } } } */
 
 /* The source code for the test is in builtins-1.h.  */
 #include "builtins-1.h"

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH, rs6000] Fix PR83926, ICE using __builtin_vsx_{div,udiv,mul}_2di builtins
  2018-02-06 16:36           ` Peter Bergner
  2018-02-06 17:40             ` Segher Boessenkool
@ 2018-02-08 16:38             ` Peter Bergner
  2018-02-08 18:28               ` Peter Bergner
  2018-02-09 16:17               ` Segher Boessenkool
  1 sibling, 2 replies; 17+ messages in thread
From: Peter Bergner @ 2018-02-08 16:38 UTC (permalink / raw)
  To: David Edelsohn, Segher Boessenkool
  Cc: GCC Patches, William J. Schmidt, Will Schmidt

On 2/6/18 10:36 AM, Peter Bergner wrote:
> On 2/6/18 10:20 AM, David Edelsohn wrote:
>> Do the gen_XXXdi3 calls work if you use SDI iterator instead of GPR
>> iterator, as Segher suggested?
> 
> Well it works _if_ we use the first patch that changes the gen_*
> patterns.  If we go this route, I agree we should use the SDI
> iterator instead of GPR.

Actually, my bad.  While bootstrapping this on a BE system, we get an
error when we attempt a 64-bit multiply in 32-bit mode.  In this case,
the gen_muldi3() pattern calls expand_mult(DImode, ...) and the automatic
expand machinery notices the gen_muldi3() now allows DImode in the
!TARGET_POWERPC64 case and then calls gen_muldi3() to emit the multiply
and we go into infinite recursion.  We don't have that problem in the
div/udiv case, because we call out to the lib routines, so no recursion.
Given this, I think we should probably go with the patch that modifies
vsx.md and guards the calls to gen_{div,udiv,mul}di3() with a TARGET_POWERPC64
test.

For completeness, that patch again is below with one testsuite addition.
The builtins-1-be.c test case must never have been tested in 32-bit mode,
since it was always ICEing from the beginning.  I've fixed it to run in
both 32-bit and 64-bit modes and in 32-bit mode, it now correctly scans
for the 64-bit div/udiv/mul cases this patch generates.

Again, this passed bootstrap and regtesting on powerpc64le-linux as well
as on powerpc64-linux and running the testsuite in both 32-bit and 64-bit
modes.  Ok for trunk?

Peter


gcc/
	PR target/83926
	* config/rs6000/vsx.md (vsx_mul_v2di): Handle generating a 64-bit
	multiply in 32-bit mode.
	(vsx_div_v2di): Handle generating a 64-bit signed divide in 32-bit mode.
	(vsx_udiv_v2di): Handle generating a 64-bit unsigned divide in 32-bit
	mode.

gcc/testsuite/
	PR target/83926
	* gcc.target/powerpc/pr83926.c: New test.
	* gcc.target/powerpc/builtins-1-be.c: Filter out gimple folding disabled
	message.  Fix test for running in 32-bit mode.

Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 257390)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -1650,10 +1650,22 @@
   rtx op5 = gen_reg_rtx (DImode);
   emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
   emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
-  emit_insn (gen_muldi3 (op5, op3, op4));
+  if (TARGET_POWERPC64)
+    emit_insn (gen_muldi3 (op5, op3, op4));
+  else
+    {
+      rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
+      emit_move_insn (op5, ret);
+    }
   emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
   emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
-  emit_insn (gen_muldi3 (op3, op3, op4));
+  if (TARGET_POWERPC64)
+    emit_insn (gen_muldi3 (op3, op3, op4));
+  else
+    {
+      rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
+      emit_move_insn (op3, ret);
+    }
   emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
   DONE;
 }"
@@ -1688,10 +1700,30 @@
   rtx op5 = gen_reg_rtx (DImode);
   emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
   emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
-  emit_insn (gen_divdi3 (op5, op3, op4));
+  if (TARGET_POWERPC64)
+    emit_insn (gen_divdi3 (op5, op3, op4));
+  else
+    {
+      rtx libfunc = optab_libfunc (sdiv_optab, DImode);
+      rtx target = emit_library_call_value (libfunc,
+					    op5, LCT_NORMAL, DImode,
+					    op3, DImode,
+					    op4, DImode);
+      emit_move_insn (op5, target);
+    }
   emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
   emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
-  emit_insn (gen_divdi3 (op3, op3, op4));
+  if (TARGET_POWERPC64)
+    emit_insn (gen_divdi3 (op3, op3, op4));
+  else
+    {
+      rtx libfunc = optab_libfunc (sdiv_optab, DImode);
+      rtx target = emit_library_call_value (libfunc,
+					    op3, LCT_NORMAL, DImode,
+					    op3, DImode,
+					    op4, DImode);
+      emit_move_insn (op3, target);
+    }
   emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
   DONE;
 }"
@@ -1716,10 +1748,30 @@
   rtx op5 = gen_reg_rtx (DImode);
   emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
   emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
-  emit_insn (gen_udivdi3 (op5, op3, op4));
+  if (TARGET_POWERPC64)
+    emit_insn (gen_udivdi3 (op5, op3, op4));
+  else
+    {
+      rtx libfunc = optab_libfunc (udiv_optab, DImode);
+      rtx target = emit_library_call_value (libfunc,
+					    op5, LCT_NORMAL, DImode,
+					    op3, DImode,
+					    op4, DImode);
+      emit_move_insn (op5, target);
+    }
   emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
   emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
-  emit_insn (gen_udivdi3 (op3, op3, op4));
+  if (TARGET_POWERPC64)
+    emit_insn (gen_udivdi3 (op3, op3, op4));
+  else
+    {
+      rtx libfunc = optab_libfunc (udiv_optab, DImode);
+      rtx target = emit_library_call_value (libfunc,
+					    op3, LCT_NORMAL, DImode,
+					    op3, DImode,
+					    op4, DImode);
+      emit_move_insn (op3, target);
+    }
   emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
   DONE;
 }"
Index: gcc/testsuite/gcc.target/powerpc/pr83926.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr83926.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr83926.c	(working copy)
@@ -0,0 +1,22 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
+/* { dg-options "-O2 -mcpu=power8 -mno-fold-gimple" } */
+
+__attribute__ ((altivec(vector__))) long long
+sdiv (__attribute__ ((altivec(vector__))) long long a,
+      __attribute__ ((altivec(vector__))) long long b)
+{
+  return __builtin_vsx_div_2di (a, b);
+}
+__attribute__ ((altivec(vector__))) unsigned long long
+udiv (__attribute__ ((altivec(vector__))) unsigned long long a,
+      __attribute__ ((altivec(vector__))) unsigned long long b)
+{
+  return __builtin_vsx_udiv_2di (a, b);
+}
+__attribute__ ((altivec(vector__))) long long
+smul (__attribute__ ((altivec(vector__))) long long a,
+      __attribute__ ((altivec(vector__))) long long b)
+{
+  return __builtin_vsx_mul_2di (a, b);
+}
Index: gcc/testsuite/gcc.target/powerpc/builtins-1-be.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/builtins-1-be.c	(revision 257390)
+++ gcc/testsuite/gcc.target/powerpc/builtins-1-be.c	(working copy)
@@ -1,6 +1,7 @@
 /* { dg-do compile { target { powerpc64-*-* } } } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
 /* { dg-options "-mcpu=power8 -O0 -mno-fold-gimple" } */
+/* { dg-prune-output "gimple folding of rs6000 builtins has been disabled." } */
 
 /* Test that a number of newly added builtin overloads are accepted
    by the compiler.  */
@@ -22,10 +23,10 @@
    vec_ctf    xvmuldp 
    vec_cts xvcvdpsxds, vctsxs
    vec_ctu   xvcvdpuxds, vctuxs
-   vec_div   divd, divdu
+   vec_div   divd, divdu | __divdi3(), __udivdi3()
    vec_mergel vmrghb, vmrghh, xxmrghw
    vec_mergeh  xxmrglw, vmrglh
-   vec_mul mulld
+   vec_mul mulld | mullw, mulhwu
    vec_nor xxlnor
    vec_or xxlor
    vec_packsu vpksdus
@@ -49,21 +50,26 @@
 /* { dg-final { scan-assembler-times "vctsxs" 1 } } */
 /* { dg-final { scan-assembler-times "xvcvdpuxds" 1 } } */
 /* { dg-final { scan-assembler-times "vctuxs" 1 } } */
-/* { dg-final { scan-assembler-times "divd" 4 } } */
-/* { dg-final { scan-assembler-times "divdu" 2 } } */
 /* { dg-final { scan-assembler-times "vmrghb" 0 } } */
 /* { dg-final { scan-assembler-times "vmrghh" 3 } } */
 /* { dg-final { scan-assembler-times "xxmrghw" 1 } } */
 /* { dg-final { scan-assembler-times "xxmrglw" 4 } } */
 /* { dg-final { scan-assembler-times "vmrglh" 4 } } */
-/* { dg-final { scan-assembler-times "mulld" 4 } } */
-/* { dg-final { scan-assembler-times "xxlnor" 19 } } */
-/* { dg-final { scan-assembler-times "xxlor" 14 } } */
+/* { dg-final { scan-assembler-times "xxlnor" 6 } } */
+/* { dg-final { scan-assembler-times {\mxxlor\M} 8 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {\mxxlor\M} 11 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "vpksdus" 1 } } */
 /* { dg-final { scan-assembler-times "vperm" 2 } } */
 /* { dg-final { scan-assembler-times "xvrdpi" 1 } } */
 /* { dg-final { scan-assembler-times "xxsel" 6 } } */
 /* { dg-final { scan-assembler-times "xxlxor" 6 } } */
+/* { dg-final { scan-assembler-times {\mdivd\M} 2 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {\mdivdu\M} 2 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {\mmulld\M} 4 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {\mbl __divdi3\M} 2 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times {\mbl __udivdi3\M} 2 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times {\mmullw\M} 12 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times {\mmulhwu\M} 4 { target ilp32 } } } */
 
 /* The source code for the test is in builtins-1.h.  */
 #include "builtins-1.h"

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH, rs6000] Fix PR83926, ICE using __builtin_vsx_{div,udiv,mul}_2di builtins
  2018-02-06 17:40             ` Segher Boessenkool
@ 2018-02-06 19:55               ` Peter Bergner
  0 siblings, 0 replies; 17+ messages in thread
From: Peter Bergner @ 2018-02-06 19:55 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: David Edelsohn, GCC Patches, William J. Schmidt, Will Schmidt

On 2/6/18 11:39 AM, Segher Boessenkool wrote:
> Yes, if you make [u]divdi3 work for SDI (i.e. also for -m32) we'll never
> have to special-case it again (also [u]moddi3?)

Maybe, but there are no explicit calls to those, so I left them
as is, since if we touch those, then we probably need to look at
almost all logical/arithmetic patterns...and that is a lot of work
not needed to fix this specific issue.


> Yes, if you make [u]divdi3 work for SDI (i.e. also for -m32) we'll never
> have to special-case it again (also [u]moddi3?)

Heh, looking at "mod<mode>3", I see it calls gen_div<mode>3(), but since
there are no explicit calls to gen_moddi3(), we don't really have a problem.


> But this then also will be used by expand, when generation a DImode
> divide.  Does it generate code at least as good as what the generic code
> generates?  It probably will, but check please.

The change from GPR to SDI iterator and my change to explicitly call
the __divdi3() lib function for DImode divides ends up generating the
exact same code generated for the following test case that we get with
the non patched compiler:

  long long
  foo (long long a, long long b)
  {
    return a / b;
  }

Peter

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH, rs6000] Fix PR83926, ICE using __builtin_vsx_{div,udiv,mul}_2di builtins
  2018-02-06 16:36           ` Peter Bergner
@ 2018-02-06 17:40             ` Segher Boessenkool
  2018-02-06 19:55               ` Peter Bergner
  2018-02-08 16:38             ` Peter Bergner
  1 sibling, 1 reply; 17+ messages in thread
From: Segher Boessenkool @ 2018-02-06 17:40 UTC (permalink / raw)
  To: Peter Bergner
  Cc: David Edelsohn, GCC Patches, William J. Schmidt, Will Schmidt

On Tue, Feb 06, 2018 at 10:36:41AM -0600, Peter Bergner wrote:
> On 2/6/18 10:20 AM, David Edelsohn wrote:
> > Do the gen_XXXdi3 calls work if you use SDI iterator instead of GPR
> > iterator, as Segher suggested?
> 
> Well it works _if_ we use the first patch that changes the gen_*
> patterns.  If we go this route, I agree we should use the SDI
> iterator instead of GPR.
> 
> 
> > Otherwise, this seems like the more correct approach to not conflict
> > with the semantics expected by the patterns.
> 
> It's up to you and Segher which patch you think is cleaner/more preferable.
> The benefit of the gen_* patch is that if any code added in the future
> calls those gen_* routines, then they'll work with no changes.  Otherwise,
> the new code would have to do something similar to this latest patch.
> Kind of a "six of one, half dozen of the other" sort of thing.
> I'm fine either way.

Yes, if you make [u]divdi3 work for SDI (i.e. also for -m32) we'll never
have to special-case it again (also [u]moddi3?)

But this then also will be used by expand, when generation a DImode
divide.  Does it generate code at least as good as what the generic code
generates?  It probably will, but check please.


Segher

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH, rs6000] Fix PR83926, ICE using __builtin_vsx_{div,udiv,mul}_2di builtins
  2018-02-06 16:20         ` David Edelsohn
@ 2018-02-06 16:36           ` Peter Bergner
  2018-02-06 17:40             ` Segher Boessenkool
  2018-02-08 16:38             ` Peter Bergner
  0 siblings, 2 replies; 17+ messages in thread
From: Peter Bergner @ 2018-02-06 16:36 UTC (permalink / raw)
  To: David Edelsohn
  Cc: GCC Patches, Segher Boessenkool, William J. Schmidt, Will Schmidt

On 2/6/18 10:20 AM, David Edelsohn wrote:
> Do the gen_XXXdi3 calls work if you use SDI iterator instead of GPR
> iterator, as Segher suggested?

Well it works _if_ we use the first patch that changes the gen_*
patterns.  If we go this route, I agree we should use the SDI
iterator instead of GPR.


> Otherwise, this seems like the more correct approach to not conflict
> with the semantics expected by the patterns.

It's up to you and Segher which patch you think is cleaner/more preferable.
The benefit of the gen_* patch is that if any code added in the future
calls those gen_* routines, then they'll work with no changes.  Otherwise,
the new code would have to do something similar to this latest patch.
Kind of a "six of one, half dozen of the other" sort of thing.
I'm fine either way.

Peter

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH, rs6000] Fix PR83926, ICE using __builtin_vsx_{div,udiv,mul}_2di builtins
  2018-02-06 15:43       ` Peter Bergner
@ 2018-02-06 16:20         ` David Edelsohn
  2018-02-06 16:36           ` Peter Bergner
  0 siblings, 1 reply; 17+ messages in thread
From: David Edelsohn @ 2018-02-06 16:20 UTC (permalink / raw)
  To: Peter Bergner
  Cc: GCC Patches, Segher Boessenkool, William J. Schmidt, Will Schmidt

On Tue, Feb 6, 2018 at 10:43 AM, Peter Bergner <bergner@vnet.ibm.com> wrote:
> On 2/6/18 6:42 AM, Peter Bergner wrote:
>> On 2/5/18 10:48 PM, David Edelsohn wrote:
>>> On Mon, Feb 5, 2018 at 9:43 PM, Peter Bergner <bergner@vnet.ibm.com> wrote:
>>>> I did also try calling expand_divmod() here which did generate correct
>>>> code, the problem was that it wasn't as clean/optimized as the change
>>>> to gen_divdi3.
>>>
>>> Why not fix it at the site of the call to gen_divdi3 instead of the
>>> divdi3 pattern?
>>
>> Well as I said above, I did try that and we got worse code.  That said,
>> I unconditionally called expand_divmod() instead of calling gen_divdi3()
>> when we can (TARGET_POWERPC64).  Let me retry with that change and see
>> what kind of code gen we get.
>
> Ok, calling expand_divmod() in the !TARGET_POWERPC64 case still generates
> worse code.  However, if I instead explicitly generate the call to the
> div/udiv lib functions, then I seem to get the same code as the gen_*
> patch gave...at least for my unit tests I've been using.  Let me
> bootstrap/regtest the following, which I assume you're more happy with?

Do the gen_XXXdi3 calls work if you use SDI iterator instead of GPR
iterator, as Segher suggested?

Otherwise, this seems like the more correct approach to not conflict
with the semantics expected by the patterns.

We'll see what Segher has to say.

Thanks, David

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH, rs6000] Fix PR83926, ICE using __builtin_vsx_{div,udiv,mul}_2di builtins
  2018-02-06 12:42     ` Peter Bergner
  2018-02-06 13:21       ` Segher Boessenkool
@ 2018-02-06 15:43       ` Peter Bergner
  2018-02-06 16:20         ` David Edelsohn
  1 sibling, 1 reply; 17+ messages in thread
From: Peter Bergner @ 2018-02-06 15:43 UTC (permalink / raw)
  To: David Edelsohn
  Cc: GCC Patches, Segher Boessenkool, William J. Schmidt, Will Schmidt

On 2/6/18 6:42 AM, Peter Bergner wrote:
> On 2/5/18 10:48 PM, David Edelsohn wrote:
>> On Mon, Feb 5, 2018 at 9:43 PM, Peter Bergner <bergner@vnet.ibm.com> wrote:
>>> I did also try calling expand_divmod() here which did generate correct
>>> code, the problem was that it wasn't as clean/optimized as the change
>>> to gen_divdi3.
>>
>> Why not fix it at the site of the call to gen_divdi3 instead of the
>> divdi3 pattern?
> 
> Well as I said above, I did try that and we got worse code.  That said,
> I unconditionally called expand_divmod() instead of calling gen_divdi3()
> when we can (TARGET_POWERPC64).  Let me retry with that change and see
> what kind of code gen we get.

Ok, calling expand_divmod() in the !TARGET_POWERPC64 case still generates
worse code.  However, if I instead explicitly generate the call to the
div/udiv lib functions, then I seem to get the same code as the gen_*
patch gave...at least for my unit tests I've been using.  Let me
bootstrap/regtest the following, which I assume you're more happy with?

Peter

gcc/
        PR target/83926
        * config/rs6000/vsx.md (vsx_mul_v2di): Handle generating a 64-bit
	multiply in 32-bit mode.
	(vsx_div_v2di): Handle generating a 64-bit signed divide in 32-bit mode.
	(vsx_udiv_v2di): Handle generating a 64-bit unsigned divide in 32-bit
	mode.

gcc/testsuite/
        PR target/83926
        * gcc.target/powerpc/pr83926.c: New test.

Index: vsx.md
===================================================================
--- vsx.md	(revision 257390)
+++ vsx.md	(working copy)
@@ -1650,10 +1650,22 @@
   rtx op5 = gen_reg_rtx (DImode);
   emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
   emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
-  emit_insn (gen_muldi3 (op5, op3, op4));
+  if (TARGET_POWERPC64)
+    emit_insn (gen_muldi3 (op5, op3, op4));
+  else
+    {
+      rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
+      emit_move_insn (op5, ret);
+    }
   emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
   emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
-  emit_insn (gen_muldi3 (op3, op3, op4));
+  if (TARGET_POWERPC64)
+    emit_insn (gen_muldi3 (op3, op3, op4));
+  else
+    {
+      rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
+      emit_move_insn (op3, ret);
+    }
   emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
   DONE;
 }"
@@ -1688,10 +1700,30 @@
   rtx op5 = gen_reg_rtx (DImode);
   emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
   emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
-  emit_insn (gen_divdi3 (op5, op3, op4));
+  if (TARGET_POWERPC64)
+    emit_insn (gen_divdi3 (op5, op3, op4));
+  else
+    {
+      rtx libfunc = optab_libfunc (sdiv_optab, DImode);
+      rtx target = emit_library_call_value (libfunc,
+					    op5, LCT_NORMAL, DImode,
+					    op3, DImode,
+					    op4, DImode);
+      emit_move_insn (op5, target);
+    }
   emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
   emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
-  emit_insn (gen_divdi3 (op3, op3, op4));
+  if (TARGET_POWERPC64)
+    emit_insn (gen_divdi3 (op3, op3, op4));
+  else
+    {
+      rtx libfunc = optab_libfunc (sdiv_optab, DImode);
+      rtx target = emit_library_call_value (libfunc,
+					    op3, LCT_NORMAL, DImode,
+					    op3, DImode,
+					    op4, DImode);
+      emit_move_insn (op3, target);
+    }
   emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
   DONE;
 }"
@@ -1716,10 +1748,30 @@
   rtx op5 = gen_reg_rtx (DImode);
   emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
   emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
-  emit_insn (gen_udivdi3 (op5, op3, op4));
+  if (TARGET_POWERPC64)
+    emit_insn (gen_udivdi3 (op5, op3, op4));
+  else
+    {
+      rtx libfunc = optab_libfunc (udiv_optab, DImode);
+      rtx target = emit_library_call_value (libfunc,
+					    op5, LCT_NORMAL, DImode,
+					    op3, DImode,
+					    op4, DImode);
+      emit_move_insn (op5, target);
+    }
   emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
   emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
-  emit_insn (gen_udivdi3 (op3, op3, op4));
+  if (TARGET_POWERPC64)
+    emit_insn (gen_udivdi3 (op3, op3, op4));
+  else
+    {
+      rtx libfunc = optab_libfunc (udiv_optab, DImode);
+      rtx target = emit_library_call_value (libfunc,
+					    op3, LCT_NORMAL, DImode,
+					    op3, DImode,
+					    op4, DImode);
+      emit_move_insn (op3, target);
+    }
   emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
   DONE;
 }"

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH, rs6000] Fix PR83926, ICE using __builtin_vsx_{div,udiv,mul}_2di builtins
  2018-02-06 12:42     ` Peter Bergner
@ 2018-02-06 13:21       ` Segher Boessenkool
  2018-02-06 15:43       ` Peter Bergner
  1 sibling, 0 replies; 17+ messages in thread
From: Segher Boessenkool @ 2018-02-06 13:21 UTC (permalink / raw)
  To: Peter Bergner
  Cc: David Edelsohn, GCC Patches, William J. Schmidt, Will Schmidt

On Tue, Feb 06, 2018 at 06:42:06AM -0600, Peter Bergner wrote:
> On 2/5/18 10:48 PM, David Edelsohn wrote:
> > On Mon, Feb 5, 2018 at 9:43 PM, Peter Bergner <bergner@vnet.ibm.com> wrote:
> >> I did also try calling expand_divmod() here which did generate correct
> >> code, the problem was that it wasn't as clean/optimized as the change
> >> to gen_divdi3.
> > 
> > Why not fix it at the site of the call to gen_divdi3 instead of the
> > divdi3 pattern?
> 
> Well as I said above, I did try that and we got worse code.  That said,
> I unconditionally called expand_divmod() instead of calling gen_divdi3()
> when we can (TARGET_POWERPC64).  Let me retry with that change and see
> what kind of code gen we get.

You can make the div<mode>3 define_expand for SDI instead of for GPR
(like we do for many other arithmetic and logical operations already)
if that is nicer.  (And udiv<mode>3, which doesn't have an expander
yet).


Segher

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH, rs6000] Fix PR83926, ICE using __builtin_vsx_{div,udiv,mul}_2di builtins
  2018-02-06  4:48   ` David Edelsohn
@ 2018-02-06 12:42     ` Peter Bergner
  2018-02-06 13:21       ` Segher Boessenkool
  2018-02-06 15:43       ` Peter Bergner
  0 siblings, 2 replies; 17+ messages in thread
From: Peter Bergner @ 2018-02-06 12:42 UTC (permalink / raw)
  To: David Edelsohn
  Cc: GCC Patches, Segher Boessenkool, William J. Schmidt, Will Schmidt

On 2/5/18 10:48 PM, David Edelsohn wrote:
> On Mon, Feb 5, 2018 at 9:43 PM, Peter Bergner <bergner@vnet.ibm.com> wrote:
>> I did also try calling expand_divmod() here which did generate correct
>> code, the problem was that it wasn't as clean/optimized as the change
>> to gen_divdi3.
> 
> Why not fix it at the site of the call to gen_divdi3 instead of the
> divdi3 pattern?

Well as I said above, I did try that and we got worse code.  That said,
I unconditionally called expand_divmod() instead of calling gen_divdi3()
when we can (TARGET_POWERPC64).  Let me retry with that change and see
what kind of code gen we get.

Peter

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH, rs6000] Fix PR83926, ICE using __builtin_vsx_{div,udiv,mul}_2di builtins
  2018-02-06  2:43 ` Peter Bergner
@ 2018-02-06  4:48   ` David Edelsohn
  2018-02-06 12:42     ` Peter Bergner
  0 siblings, 1 reply; 17+ messages in thread
From: David Edelsohn @ 2018-02-06  4:48 UTC (permalink / raw)
  To: Peter Bergner
  Cc: GCC Patches, Segher Boessenkool, William J. Schmidt, Will Schmidt

On Mon, Feb 5, 2018 at 9:43 PM, Peter Bergner <bergner@vnet.ibm.com> wrote:
> On 2/5/18 7:32 PM, David Edelsohn wrote:
>> Peter,
>>
>> Why can't you place the tests into the final condition of the pattern
>> so that the pattern fails and the normal GCC fallback machinery is
>> used instead of manually implementing the fallback emulation?
>
> You mean something like the following which I already tried?
>
> (define_expand "div<mode>3"
>   [(set (match_operand:GPR 0 "gpc_reg_operand" "")
>         (div:GPR (match_operand:GPR 1 "gpc_reg_operand" "")
>                  (match_operand:GPR 2 "reg_or_cint_operand" "")))]
>   "<MODE>mode != DImode || TARGET_POWERPC64"
> {
>   if (CONST_INT_P (operands[2])
>       && INTVAL (operands[2]) > 0
>       && exact_log2 (INTVAL (operands[2])) >= 0)
>     {
>       emit_insn (gen_div<mode>3_sra (operands[0], operands[1], operands[2]));
>       DONE;
>     }
>   operands[2] = force_reg (<MODE>mode, operands[2]);
> })
>
>
> The problem with the above is that the condition doesn't seem to be able
> to stop explicit calls to the gen_divdi3() function as long as there is
> some conditions in which the conditional test is true.  Since the condition
> is true for -m64, we create the gen_divdi3() function in insn-emit.c and
> then it seems any explicit calls to that function are fair game, even
> when they come from -m32 compiles.  In this case, the explicit call is
> coming from:
>
> ; Emulate vector with scalar for vec_div in V2DImode
> (define_insn_and_split "vsx_div_v2di"
>   [(set (match_operand:V2DI 0 "vsx_register_operand" "=wa")
>         (unspec:V2DI [(match_operand:V2DI 1 "vsx_register_operand" "wa")
>                       (match_operand:V2DI 2 "vsx_register_operand" "wa")]
>                      UNSPEC_VSX_DIVSD))]
>   "VECTOR_MEM_VSX_P (V2DImode)"
>   "#"
>   "VECTOR_MEM_VSX_P (V2DImode) && !reload_completed"
>   [(const_int 0)]
>   "
> {
>   rtx op0 = operands[0];
>   rtx op1 = operands[1];
>   rtx op2 = operands[2];
>   rtx op3 = gen_reg_rtx (DImode);
>   rtx op4 = gen_reg_rtx (DImode);
>   rtx op5 = gen_reg_rtx (DImode);
>   emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
>   emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
>   emit_insn (gen_divdi3 (op5, op3, op4));                       <-- Here
>   emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
>   emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
>   emit_insn (gen_divdi3 (op3, op3, op4));                       <-- Here
>   emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
>   DONE;
> }"
>   [(set_attr "type" "div")])
>
> I did also try calling expand_divmod() here which did generate correct
> code, the problem was that it wasn't as clean/optimized as the change
> to gen_divdi3.

Why not fix it at the site of the call to gen_divdi3 instead of the
divdi3 pattern?

Your patch is trying to clean up invocations of methods under
circumstances that they should not be called. This should be avoided
at the call site.  vsx_div_v2di is violating the API when it makes the
call.  vsx_div_v2di knows whether TARGET_POWERPC64 is in effect.
vsx_div_v2di can choose the emulation path.

Your patch seems to be fixing the symptom, not the problem.

Thanks, David

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH, rs6000] Fix PR83926, ICE using __builtin_vsx_{div,udiv,mul}_2di builtins
  2018-02-06  1:32 David Edelsohn
@ 2018-02-06  2:43 ` Peter Bergner
  2018-02-06  4:48   ` David Edelsohn
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Bergner @ 2018-02-06  2:43 UTC (permalink / raw)
  To: David Edelsohn
  Cc: GCC Patches, Segher Boessenkool, William J. Schmidt, Will Schmidt

On 2/5/18 7:32 PM, David Edelsohn wrote:
> Peter,
> 
> Why can't you place the tests into the final condition of the pattern
> so that the pattern fails and the normal GCC fallback machinery is
> used instead of manually implementing the fallback emulation?

You mean something like the following which I already tried?

(define_expand "div<mode>3"
  [(set (match_operand:GPR 0 "gpc_reg_operand" "")
        (div:GPR (match_operand:GPR 1 "gpc_reg_operand" "")
                 (match_operand:GPR 2 "reg_or_cint_operand" "")))]
  "<MODE>mode != DImode || TARGET_POWERPC64"
{
  if (CONST_INT_P (operands[2])
      && INTVAL (operands[2]) > 0
      && exact_log2 (INTVAL (operands[2])) >= 0)
    {
      emit_insn (gen_div<mode>3_sra (operands[0], operands[1], operands[2]));
      DONE;
    }
  operands[2] = force_reg (<MODE>mode, operands[2]);
})


The problem with the above is that the condition doesn't seem to be able
to stop explicit calls to the gen_divdi3() function as long as there is
some conditions in which the conditional test is true.  Since the condition
is true for -m64, we create the gen_divdi3() function in insn-emit.c and
then it seems any explicit calls to that function are fair game, even
when they come from -m32 compiles.  In this case, the explicit call is
coming from:

; Emulate vector with scalar for vec_div in V2DImode
(define_insn_and_split "vsx_div_v2di"
  [(set (match_operand:V2DI 0 "vsx_register_operand" "=wa")
        (unspec:V2DI [(match_operand:V2DI 1 "vsx_register_operand" "wa")
                      (match_operand:V2DI 2 "vsx_register_operand" "wa")]
                     UNSPEC_VSX_DIVSD))]
  "VECTOR_MEM_VSX_P (V2DImode)"
  "#"
  "VECTOR_MEM_VSX_P (V2DImode) && !reload_completed"
  [(const_int 0)]
  "
{
  rtx op0 = operands[0];
  rtx op1 = operands[1];
  rtx op2 = operands[2];
  rtx op3 = gen_reg_rtx (DImode);
  rtx op4 = gen_reg_rtx (DImode);
  rtx op5 = gen_reg_rtx (DImode);
  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
  emit_insn (gen_divdi3 (op5, op3, op4));			<-- Here
  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
  emit_insn (gen_divdi3 (op3, op3, op4));			<-- Here
  emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
  DONE;
}"
  [(set_attr "type" "div")])

I did also try calling expand_divmod() here which did generate correct
code, the problem was that it wasn't as clean/optimized as the change
to gen_divdi3.

Peter


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH, rs6000] Fix PR83926, ICE using __builtin_vsx_{div,udiv,mul}_2di builtins
@ 2018-02-06  1:32 David Edelsohn
  2018-02-06  2:43 ` Peter Bergner
  0 siblings, 1 reply; 17+ messages in thread
From: David Edelsohn @ 2018-02-06  1:32 UTC (permalink / raw)
  To: Peter Bergner
  Cc: GCC Patches, Segher Boessenkool, William J. Schmidt, Will Schmidt

Peter,

Why can't you place the tests into the final condition of the pattern
so that the pattern fails and the normal GCC fallback machinery is
used instead of manually implementing the fallback emulation?

The GPR iterator only defines DI for TARGET_POWERPC64, so how does GCC
get into the muldi3 pattern, for example, and also satisfy both

(define_mode_iterator GPR [SI (DI "TARGET_POWERPC64")])

<MODE>mode == DImode && !TARGET_POWERPC64

This seems contradictory.

Thanks, David

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2018-02-09 17:09 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-06  0:07 [PATCH, rs6000] Fix PR83926, ICE using __builtin_vsx_{div,udiv,mul}_2di builtins Peter Bergner
2018-02-06  1:32 David Edelsohn
2018-02-06  2:43 ` Peter Bergner
2018-02-06  4:48   ` David Edelsohn
2018-02-06 12:42     ` Peter Bergner
2018-02-06 13:21       ` Segher Boessenkool
2018-02-06 15:43       ` Peter Bergner
2018-02-06 16:20         ` David Edelsohn
2018-02-06 16:36           ` Peter Bergner
2018-02-06 17:40             ` Segher Boessenkool
2018-02-06 19:55               ` Peter Bergner
2018-02-08 16:38             ` Peter Bergner
2018-02-08 18:28               ` Peter Bergner
2018-02-09 16:51                 ` Segher Boessenkool
2018-02-09 17:09                   ` Peter Bergner
2018-02-09 16:17               ` Segher Boessenkool
2018-02-09 16:39                 ` Peter Bergner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).