public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH, middle-end]: Fix mode-switching MODE_EXIT check with __builtin_apply/__builtin_return
@ 2012-11-04 13:52 Uros Bizjak
  2012-11-04 14:58 ` Oleg Endo
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Uros Bizjak @ 2012-11-04 13:52 UTC (permalink / raw)
  To: gcc-patches; +Cc: vbyakovl23, Kaz Kojima

Hello!

Vzeroupper placement patch uses MODE_EXIT to determine if vzeroupper
has to be placed before function exit. However, when compiling
following test

--cut here--
typedef struct objc_class *Class;
typedef struct objc_object
{
  Class class_pointer;
} *id;

typedef const struct objc_selector *SEL;
typedef void * retval_t;
typedef void * arglist_t;

extern retval_t __objc_forward (id object, SEL sel, arglist_t args);

double
__objc_double_forward (id rcv, SEL op, ...)
{
  void *args, *res;

  args = __builtin_apply_args ();
  res = __objc_forward (rcv, op, args);
  __builtin_return (res);
}
--cut here--

__builtin_return emits a sequence of loads from "argument area" to all
possible function return registers in their widest mode (together with
corresponding USE RTXes), creating:

(insn 31 30 33 2 (set (reg:DI 0 ax)
        (mem:DI (reg/v/f:DI 60 [ res ]) [0 S8 A8])) avx-vzeroupper-27.c:23 -1
     (nil))
(insn 33 31 35 2 (set (reg:XF 8 st)
        (mem:XF (plus:DI (reg/v/f:DI 60 [ res ])
                (const_int 16 [0x10])) [0 S16 A8])) avx-vzeroupper-27.c:23 -1
     (nil))
(insn 35 33 32 2 (set (reg:OI 21 xmm0)
        (mem:OI (plus:DI (reg/v/f:DI 60 [ res ])
                (const_int 32 [0x20])) [0 S32 A8])) avx-vzeroupper-27.c:23 -1
     (nil))
(insn 32 35 34 2 (use (reg:DI 0 ax)) avx-vzeroupper-27.c:23 -1
     (nil))
(insn 34 32 36 2 (use (reg:XF 8 st)) avx-vzeroupper-27.c:23 -1
     (nil))
(insn 36 34 44 2 (use (reg:OI 21 xmm0)) avx-vzeroupper-27.c:23 -1
     (nil))

This sequence breaks assumption in mode-switching.c, that final
function return register load operates in MODE_EXIT mode and triggere
following code:

		    for (j = n_entities - 1; j >= 0; j--)
		      {
			int e = entity_map[j];
			int mode = MODE_NEEDED (e, return_copy);

			if (mode != num_modes[e] && mode != MODE_EXIT (e))
			  break;
		      }

As discussed above, modes of loads, generated from __builtin_apply
have no connection to function return mode. mode-switching.c does
detect __builtin_apply situation and raises maybe_builtin_apply flag,
but doesn't use it to short-circuit wrong check. In proposed patch, we
detect this situation and raise force_late_switch in the same way, as
SH4 does for its "late" fpscr emission. For the testcase above,
following RTX sequence is generated:

(insn 31 30 33 2 (set (reg:DI 0 ax)
        (mem:DI (reg/v/f:DI 60 [ res ]) [0 S8 A8]))
avx-vzeroupper-27.c:23 63 {*movdi_internal_rex64}
     (nil))
(insn 33 31 35 2 (set (reg:XF 8 st)
        (mem:XF (plus:DI (reg/v/f:DI 60 [ res ])
                (const_int 16 [0x10])) [0 S16 A8]))
avx-vzeroupper-27.c:23 106 {*movxf_internal}
     (nil))
(insn 35 33 52 2 (set (reg:OI 21 xmm0)
        (mem:OI (plus:DI (reg/v/f:DI 60 [ res ])
                (const_int 32 [0x20])) [0 S32 A8]))
avx-vzeroupper-27.c:23 60 {*movoi_internal_avx}
     (expr_list:REG_DEAD (reg/v/f:DI 60 [ res ])
        (nil)))
(insn 52 35 49 2 (unspec_volatile [
            (const_int 9 [0x9])
        ] UNSPECV_VZEROUPPER) -1
     (nil))
(note 49 52 32 2 NOTE_INSN_DELETED)
(insn 32 49 34 2 (use (reg:DI 0 ax)) avx-vzeroupper-27.c:23 -1
     (expr_list:REG_DEAD (reg:DI 0 ax)
        (nil)))
(insn 34 32 36 2 (use (reg:XF 8 st)) avx-vzeroupper-27.c:23 -1
     (expr_list:REG_DEAD (reg:XF 8 st)
        (nil)))
(insn 36 34 44 2 (use (reg:OI 21 xmm0)) avx-vzeroupper-27.c:23 -1
     (nil))
(insn 44 36 0 2 (use (reg/i:DF 21 xmm0)) avx-vzeroupper-27.c:24 -1

where mode switching correctly placed vzeroupper insn, based on
MODE_EXIT mode, determined from final use in (insn 44).

2012-11-04  Vladimir Yakovlev  <vladimir.b.yakovlev@intel.com>
	    Uros Bizjak  <ubizjak@gmail.com>

        * mode-switching.c (create_pre_exit): Added code for
maybe_builtin_apply case.

Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu,
with vzeroupper patch [1] applied.

I have added SH4 maintainer for possible comments.

OK for mainline SVN?

[1] http://gcc.gnu.org/ml/gcc-patches/2012-11/msg00292.html

Uros.

--
Index: mode-switching.c
===================================================================
--- mode-switching.c    (revision 193133)
+++ mode-switching.c    (working copy)
@@ -342,6 +342,16 @@ create_pre_exit (int n_entities, int *entity_map,
                      }
                    if (j >= 0)
                      {
+                       /* __builtin_return emits a sequence of loads to all
+                          function value registers in their widest mode,
+                          which breaks the assumption on the mode of the
+                          return register load. Allow this situation, so the
+                          final mode switch will be emitted after the load.  */
+                       if (maybe_builtin_apply
+                           && targetm.calls.function_value_regno_p
+                               (copy_start))
+                         forced_late_switch = 1;
+
                        /* For the SH4, floating point loads depend on fpscr,
                           thus we might need to put the final mode switch
                           after the return value copy.  That is still OK,

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH, middle-end]: Fix mode-switching MODE_EXIT check with __builtin_apply/__builtin_return
  2012-11-04 13:52 [PATCH, middle-end]: Fix mode-switching MODE_EXIT check with __builtin_apply/__builtin_return Uros Bizjak
@ 2012-11-04 14:58 ` Oleg Endo
  2012-11-04 15:03   ` Uros Bizjak
  2012-11-05  9:10 ` Kaz Kojima
  2012-11-05 18:07 ` Eric Botcazou
  2 siblings, 1 reply; 12+ messages in thread
From: Oleg Endo @ 2012-11-04 14:58 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, vbyakovl23, Kaz Kojima

On Sun, 2012-11-04 at 14:52 +0100, Uros Bizjak wrote:
> Hello!
> 
> Vzeroupper placement patch uses MODE_EXIT to determine if vzeroupper
> has to be placed before function exit. However, when compiling
> following test
> 
> --cut here--
> typedef struct objc_class *Class;
> typedef struct objc_object
> {
>   Class class_pointer;
> } *id;
> 
> typedef const struct objc_selector *SEL;
> typedef void * retval_t;
> typedef void * arglist_t;
> 
> extern retval_t __objc_forward (id object, SEL sel, arglist_t args);
> 
> double
> __objc_double_forward (id rcv, SEL op, ...)
> {
>   void *args, *res;
> 
>   args = __builtin_apply_args ();
>   res = __objc_forward (rcv, op, args);
>   __builtin_return (res);
> }
> --cut here--
> 
> __builtin_return emits a sequence of loads from "argument area" to all
> possible function return registers in their widest mode (together with
> corresponding USE RTXes), creating:
> 
> (insn 31 30 33 2 (set (reg:DI 0 ax)
>         (mem:DI (reg/v/f:DI 60 [ res ]) [0 S8 A8])) avx-vzeroupper-27.c:23 -1
>      (nil))
> (insn 33 31 35 2 (set (reg:XF 8 st)
>         (mem:XF (plus:DI (reg/v/f:DI 60 [ res ])
>                 (const_int 16 [0x10])) [0 S16 A8])) avx-vzeroupper-27.c:23 -1
>      (nil))
> (insn 35 33 32 2 (set (reg:OI 21 xmm0)
>         (mem:OI (plus:DI (reg/v/f:DI 60 [ res ])
>                 (const_int 32 [0x20])) [0 S32 A8])) avx-vzeroupper-27.c:23 -1
>      (nil))
> (insn 32 35 34 2 (use (reg:DI 0 ax)) avx-vzeroupper-27.c:23 -1
>      (nil))
> (insn 34 32 36 2 (use (reg:XF 8 st)) avx-vzeroupper-27.c:23 -1
>      (nil))
> (insn 36 34 44 2 (use (reg:OI 21 xmm0)) avx-vzeroupper-27.c:23 -1
>      (nil))
> 
> This sequence breaks assumption in mode-switching.c, that final
> function return register load operates in MODE_EXIT mode and triggere
> following code:
> 
> 		    for (j = n_entities - 1; j >= 0; j--)
> 		      {
> 			int e = entity_map[j];
> 			int mode = MODE_NEEDED (e, return_copy);
> 
> 			if (mode != num_modes[e] && mode != MODE_EXIT (e))
> 			  break;
> 		      }
> 
> As discussed above, modes of loads, generated from __builtin_apply
> have no connection to function return mode. mode-switching.c does
> detect __builtin_apply situation and raises maybe_builtin_apply flag,
> but doesn't use it to short-circuit wrong check. In proposed patch, we
> detect this situation and raise force_late_switch in the same way, as
> SH4 does for its "late" fpscr emission. For the testcase above,
> following RTX sequence is generated:
> 
> (insn 31 30 33 2 (set (reg:DI 0 ax)
>         (mem:DI (reg/v/f:DI 60 [ res ]) [0 S8 A8]))
> avx-vzeroupper-27.c:23 63 {*movdi_internal_rex64}
>      (nil))
> (insn 33 31 35 2 (set (reg:XF 8 st)
>         (mem:XF (plus:DI (reg/v/f:DI 60 [ res ])
>                 (const_int 16 [0x10])) [0 S16 A8]))
> avx-vzeroupper-27.c:23 106 {*movxf_internal}
>      (nil))
> (insn 35 33 52 2 (set (reg:OI 21 xmm0)
>         (mem:OI (plus:DI (reg/v/f:DI 60 [ res ])
>                 (const_int 32 [0x20])) [0 S32 A8]))
> avx-vzeroupper-27.c:23 60 {*movoi_internal_avx}
>      (expr_list:REG_DEAD (reg/v/f:DI 60 [ res ])
>         (nil)))
> (insn 52 35 49 2 (unspec_volatile [
>             (const_int 9 [0x9])
>         ] UNSPECV_VZEROUPPER) -1
>      (nil))
> (note 49 52 32 2 NOTE_INSN_DELETED)
> (insn 32 49 34 2 (use (reg:DI 0 ax)) avx-vzeroupper-27.c:23 -1
>      (expr_list:REG_DEAD (reg:DI 0 ax)
>         (nil)))
> (insn 34 32 36 2 (use (reg:XF 8 st)) avx-vzeroupper-27.c:23 -1
>      (expr_list:REG_DEAD (reg:XF 8 st)
>         (nil)))
> (insn 36 34 44 2 (use (reg:OI 21 xmm0)) avx-vzeroupper-27.c:23 -1
>      (nil))
> (insn 44 36 0 2 (use (reg/i:DF 21 xmm0)) avx-vzeroupper-27.c:24 -1
> 
> where mode switching correctly placed vzeroupper insn, based on
> MODE_EXIT mode, determined from final use in (insn 44).
> 
> 2012-11-04  Vladimir Yakovlev  <vladimir.b.yakovlev@intel.com>
> 	    Uros Bizjak  <ubizjak@gmail.com>
> 
>         * mode-switching.c (create_pre_exit): Added code for
> maybe_builtin_apply case.
> 
> Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu,
> with vzeroupper patch [1] applied.
> 
> I have added SH4 maintainer for possible comments.

BTW, there are at least two mode-switching SH PRs: 41933 and 49220.
I've tried those test cases with the vzeroupper patch [1] applied.
Unfortunately, it doesn't change the situation of the two PRs.

Cheers,
Oleg

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH, middle-end]: Fix mode-switching MODE_EXIT check with __builtin_apply/__builtin_return
  2012-11-04 14:58 ` Oleg Endo
@ 2012-11-04 15:03   ` Uros Bizjak
  2012-11-04 15:23     ` Uros Bizjak
  0 siblings, 1 reply; 12+ messages in thread
From: Uros Bizjak @ 2012-11-04 15:03 UTC (permalink / raw)
  To: Oleg Endo; +Cc: gcc-patches, vbyakovl23, Kaz Kojima

On Sun, Nov 4, 2012 at 3:58 PM, Oleg Endo <oleg.endo@t-online.de> wrote:

>> Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu,
>> with vzeroupper patch [1] applied.
>>
>> I have added SH4 maintainer for possible comments.
>
> BTW, there are at least two mode-switching SH PRs: 41933 and 49220.
> I've tried those test cases with the vzeroupper patch [1] applied.
> Unfortunately, it doesn't change the situation of the two PRs.

True, this patch fixes very specific case involving __builtin_apply
only.  Hopefully it doesn't _break_ something on SH.

Uros.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH, middle-end]: Fix mode-switching MODE_EXIT check with __builtin_apply/__builtin_return
  2012-11-04 15:03   ` Uros Bizjak
@ 2012-11-04 15:23     ` Uros Bizjak
  2012-11-04 18:06       ` Uros Bizjak
  0 siblings, 1 reply; 12+ messages in thread
From: Uros Bizjak @ 2012-11-04 15:23 UTC (permalink / raw)
  To: Oleg Endo; +Cc: gcc-patches, vbyakovl23, Kaz Kojima

On Sun, Nov 4, 2012 at 4:02 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Sun, Nov 4, 2012 at 3:58 PM, Oleg Endo <oleg.endo@t-online.de> wrote:
>
>>> Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu,
>>> with vzeroupper patch [1] applied.
>>>
>>> I have added SH4 maintainer for possible comments.
>>
>> BTW, there are at least two mode-switching SH PRs: 41933 and 49220.
>> I've tried those test cases with the vzeroupper patch [1] applied.
>> Unfortunately, it doesn't change the situation of the two PRs.
>
> True, this patch fixes very specific case involving __builtin_apply
> only.  Hopefully it doesn't _break_ something on SH.

FYI, the testcase from PR41993 involving __builtin_return inserts
vzeroupper at correct place, even with "-O0 -mavx -vzeroupper", so the
ICE seems SH specific. The testcase from PR49220 works OK as well for
all compile flags I have tried, although YMM registers are not used,
and consequently no vzerouppers are inserted.

Uros.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH, middle-end]: Fix mode-switching MODE_EXIT check with __builtin_apply/__builtin_return
  2012-11-04 15:23     ` Uros Bizjak
@ 2012-11-04 18:06       ` Uros Bizjak
  0 siblings, 0 replies; 12+ messages in thread
From: Uros Bizjak @ 2012-11-04 18:06 UTC (permalink / raw)
  To: Oleg Endo; +Cc: gcc-patches, vbyakovl23, Kaz Kojima

On Sun, Nov 4, 2012 at 4:23 PM, Uros Bizjak <ubizjak@gmail.com> wrote:

>>>> Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu,
>>>> with vzeroupper patch [1] applied.
>>>>
>>>> I have added SH4 maintainer for possible comments.
>>>
>>> BTW, there are at least two mode-switching SH PRs: 41933 and 49220.
>>> I've tried those test cases with the vzeroupper patch [1] applied.
>>> Unfortunately, it doesn't change the situation of the two PRs.
>>
>> True, this patch fixes very specific case involving __builtin_apply
>> only.  Hopefully it doesn't _break_ something on SH.
>
> FYI, the testcase from PR41993 involving __builtin_return inserts
> vzeroupper at correct place, even with "-O0 -mavx -vzeroupper", so the
> ICE seems SH specific. The testcase from PR49220 works OK as well for
> all compile flags I have tried, although YMM registers are not used,
> and consequently no vzerouppers are inserted.

Please see PR41993 for follow-up discussion and possible fix.

Uros.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH, middle-end]: Fix mode-switching MODE_EXIT check with __builtin_apply/__builtin_return
  2012-11-04 13:52 [PATCH, middle-end]: Fix mode-switching MODE_EXIT check with __builtin_apply/__builtin_return Uros Bizjak
  2012-11-04 14:58 ` Oleg Endo
@ 2012-11-05  9:10 ` Kaz Kojima
  2012-11-05  9:55   ` Vladimir Yakovlev
  2012-11-05 18:07 ` Eric Botcazou
  2 siblings, 1 reply; 12+ messages in thread
From: Kaz Kojima @ 2012-11-05  9:10 UTC (permalink / raw)
  To: ubizjak; +Cc: gcc-patches, vbyakovl23

Uros Bizjak <ubizjak@gmail.com> wrote:
> 2012-11-04  Vladimir Yakovlev  <vladimir.b.yakovlev@intel.com>
> 	    Uros Bizjak  <ubizjak@gmail.com>
> 
>         * mode-switching.c (create_pre_exit): Added code for
> maybe_builtin_apply case.
> 
> Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu,
> with vzeroupper patch [1] applied.
> 
> I have added SH4 maintainer for possible comments.

I've confirmed that there are no new failures with the patch
on sh4-unknown-linux-gnu.
BTW, it looks that the copyright year of mode-switching.c
should be updated.


Regards,
	kaz

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH, middle-end]: Fix mode-switching MODE_EXIT check with __builtin_apply/__builtin_return
  2012-11-05  9:10 ` Kaz Kojima
@ 2012-11-05  9:55   ` Vladimir Yakovlev
  0 siblings, 0 replies; 12+ messages in thread
From: Vladimir Yakovlev @ 2012-11-05  9:55 UTC (permalink / raw)
  To: Kaz Kojima; +Cc: ubizjak, gcc-patches

Hellow, Kaz

I've updated copyright. Is it Ok?

Thanks,
Vladimir

--- a/gcc/mode-switching.c
+++ b/gcc/mode-switching.c
@@ -1,6 +1,6 @@
 /* CPU mode switching
    Copyright (C) 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2007, 2008,
-   2009, 2010 Free Software Foundation, Inc.
+   2009, 2010, 2011, 2012 Free Software Foundation, Inc.

 This file is part of GCC.


2012/11/5 Kaz Kojima <kkojima@rr.iij4u.or.jp>:
> Uros Bizjak <ubizjak@gmail.com> wrote:
>> 2012-11-04  Vladimir Yakovlev  <vladimir.b.yakovlev@intel.com>
>>           Uros Bizjak  <ubizjak@gmail.com>
>>
>>         * mode-switching.c (create_pre_exit): Added code for
>> maybe_builtin_apply case.
>>
>> Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu,
>> with vzeroupper patch [1] applied.
>>
>> I have added SH4 maintainer for possible comments.
>
> I've confirmed that there are no new failures with the patch
> on sh4-unknown-linux-gnu.
> BTW, it looks that the copyright year of mode-switching.c
> should be updated.
>
>
> Regards,
>         kaz

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH, middle-end]: Fix mode-switching MODE_EXIT check with __builtin_apply/__builtin_return
  2012-11-04 13:52 [PATCH, middle-end]: Fix mode-switching MODE_EXIT check with __builtin_apply/__builtin_return Uros Bizjak
  2012-11-04 14:58 ` Oleg Endo
  2012-11-05  9:10 ` Kaz Kojima
@ 2012-11-05 18:07 ` Eric Botcazou
  2012-11-05 18:34   ` Uros Bizjak
  2 siblings, 1 reply; 12+ messages in thread
From: Eric Botcazou @ 2012-11-05 18:07 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, vbyakovl23, Kaz Kojima

> This sequence breaks assumption in mode-switching.c, that final
> function return register load operates in MODE_EXIT mode and triggere
> following code:
> 
> 		    for (j = n_entities - 1; j >= 0; j--)
> 		      {
> 			int e = entity_map[j];
> 			int mode = MODE_NEEDED (e, return_copy);
> 
> 			if (mode != num_modes[e] && mode != MODE_EXIT (e))
> 			  break;
> 		      }
> 
> As discussed above, modes of loads, generated from __builtin_apply
> have no connection to function return mode. mode-switching.c does
> detect __builtin_apply situation and raises maybe_builtin_apply flag,
> but doesn't use it to short-circuit wrong check. In proposed patch, we
> detect this situation and raise force_late_switch in the same way, as
> SH4 does for its "late" fpscr emission.

If I understand correctly, we need to insert the vzeroupper because the 
function returns double in SSE registers but we generate an OImode load 
instead of a DFmode load because of the __builtin_return.  So we're in the 
forced_late_switch case but we fail to recognize the tweaked return value load 
since the number of registers doesn't match.

If so, I'd rather add another special case, like for the SH4, instead of a 
generic bypass for maybe_builtin_apply, something along the lines of:

	/* For the x86 with AVX, we might be using a larger load for a value
	   returned in SSE registers and we want to put the final mode switch
	   after this return value copy.  */
	if (copy_start == ret_start
	    && nregs == hard_regno_nregs[ret_start][GET_MODE (ret_reg)]
	    && copy_num >= nregs
	    && OBJECT_P (SET_SRC (return_copy_pat)))
	  forced_late_switch = 1;

> 2012-11-04  Vladimir Yakovlev  <vladimir.b.yakovlev@intel.com>
> 	    Uros Bizjak  <ubizjak@gmail.com>
> 
>         * mode-switching.c (create_pre_exit): Added code for
> maybe_builtin_apply case.

Present tense in ChangeLog.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH, middle-end]: Fix mode-switching MODE_EXIT check with __builtin_apply/__builtin_return
  2012-11-05 18:07 ` Eric Botcazou
@ 2012-11-05 18:34   ` Uros Bizjak
  2012-11-05 18:43     ` Uros Bizjak
  0 siblings, 1 reply; 12+ messages in thread
From: Uros Bizjak @ 2012-11-05 18:34 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc-patches, vbyakovl23, Kaz Kojima

[-- Attachment #1: Type: text/plain, Size: 2347 bytes --]

On Mon, Nov 5, 2012 at 7:05 PM, Eric Botcazou <ebotcazou@adacore.com> wrote:
>> This sequence breaks assumption in mode-switching.c, that final
>> function return register load operates in MODE_EXIT mode and triggere
>> following code:
>>
>>                   for (j = n_entities - 1; j >= 0; j--)
>>                     {
>>                       int e = entity_map[j];
>>                       int mode = MODE_NEEDED (e, return_copy);
>>
>>                       if (mode != num_modes[e] && mode != MODE_EXIT (e))
>>                         break;
>>                     }
>>
>> As discussed above, modes of loads, generated from __builtin_apply
>> have no connection to function return mode. mode-switching.c does
>> detect __builtin_apply situation and raises maybe_builtin_apply flag,
>> but doesn't use it to short-circuit wrong check. In proposed patch, we
>> detect this situation and raise force_late_switch in the same way, as
>> SH4 does for its "late" fpscr emission.
>
> If I understand correctly, we need to insert the vzeroupper because the
> function returns double in SSE registers but we generate an OImode load
> instead of a DFmode load because of the __builtin_return.  So we're in the
> forced_late_switch case but we fail to recognize the tweaked return value load
> since the number of registers doesn't match.
>
> If so, I'd rather add another special case, like for the SH4, instead of a
> generic bypass for maybe_builtin_apply, something along the lines of:
>
>         /* For the x86 with AVX, we might be using a larger load for a value
>            returned in SSE registers and we want to put the final mode switch
>            after this return value copy.  */
>         if (copy_start == ret_start
>             && nregs == hard_regno_nregs[ret_start][GET_MODE (ret_reg)]
>             && copy_num >= nregs
>             && OBJECT_P (SET_SRC (return_copy_pat)))
>           forced_late_switch = 1;

Yes, this approach also works.

I assume it is OK to commit attached patch?

2012-11-05  Eric Botcazou  <ebotcazou@adacore.com>
	    Uros Bizjak  <ubizjak@gmail.com>

	* mode-switching.c (create_pre_exit): Force late switch for
	__builtin_return case, when value, returned in SSE register,
	was loaded using OImode load.

Tested on x86_64-pc-linux-gnu, also with to-be-committed avx-vzeroupper-27.c

Thanks,
Uros.

[-- Attachment #2: p.diff.txt --]
[-- Type: text/plain, Size: 1110 bytes --]

Index: mode-switching.c
===================================================================
--- mode-switching.c	(revision 193174)
+++ mode-switching.c	(working copy)
@@ -1,6 +1,6 @@
 /* CPU mode switching
    Copyright (C) 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2007, 2008,
-   2009, 2010 Free Software Foundation, Inc.
+   2009, 2010, 2012  Free Software Foundation, Inc.
 
 This file is part of GCC.
 
@@ -342,6 +342,17 @@ create_pre_exit (int n_entities, int *entity_map,
 		      }
 		    if (j >= 0)
 		      {
+			/* For the x86 with AVX, we might be using a larger
+			   load for a value returned in SSE registers and we
+			   want to put the final mode switch after this
+			   return value copy.  */
+			if (copy_start == ret_start
+			    && nregs
+			       == hard_regno_nregs[ret_start][GET_MODE (ret_reg)]
+			    && copy_num >= nregs
+			    && OBJECT_P (SET_SRC (return_copy_pat)))
+			  forced_late_switch = 1;
+
 			/* For the SH4, floating point loads depend on fpscr,
 			   thus we might need to put the final mode switch
 			   after the return value copy.  That is still OK,

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH, middle-end]: Fix mode-switching MODE_EXIT check with __builtin_apply/__builtin_return
  2012-11-05 18:34   ` Uros Bizjak
@ 2012-11-05 18:43     ` Uros Bizjak
  2012-11-05 19:07       ` Uros Bizjak
  2012-11-05 19:32       ` Eric Botcazou
  0 siblings, 2 replies; 12+ messages in thread
From: Uros Bizjak @ 2012-11-05 18:43 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc-patches, vbyakovl23, Kaz Kojima

On Mon, Nov 5, 2012 at 7:34 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Mon, Nov 5, 2012 at 7:05 PM, Eric Botcazou <ebotcazou@adacore.com> wrote:
>>> This sequence breaks assumption in mode-switching.c, that final
>>> function return register load operates in MODE_EXIT mode and triggere
>>> following code:
>>>
>>>                   for (j = n_entities - 1; j >= 0; j--)
>>>                     {
>>>                       int e = entity_map[j];
>>>                       int mode = MODE_NEEDED (e, return_copy);
>>>
>>>                       if (mode != num_modes[e] && mode != MODE_EXIT (e))
>>>                         break;
>>>                     }
>>>
>>> As discussed above, modes of loads, generated from __builtin_apply
>>> have no connection to function return mode. mode-switching.c does
>>> detect __builtin_apply situation and raises maybe_builtin_apply flag,
>>> but doesn't use it to short-circuit wrong check. In proposed patch, we
>>> detect this situation and raise force_late_switch in the same way, as
>>> SH4 does for its "late" fpscr emission.
>>
>> If I understand correctly, we need to insert the vzeroupper because the
>> function returns double in SSE registers but we generate an OImode load
>> instead of a DFmode load because of the __builtin_return.  So we're in the
>> forced_late_switch case but we fail to recognize the tweaked return value load
>> since the number of registers doesn't match.
>>
>> If so, I'd rather add another special case, like for the SH4, instead of a
>> generic bypass for maybe_builtin_apply, something along the lines of:
>>
>>         /* For the x86 with AVX, we might be using a larger load for a value
>>            returned in SSE registers and we want to put the final mode switch
>>            after this return value copy.  */
>>         if (copy_start == ret_start
>>             && nregs == hard_regno_nregs[ret_start][GET_MODE (ret_reg)]
>>             && copy_num >= nregs
>>             && OBJECT_P (SET_SRC (return_copy_pat)))
>>           forced_late_switch = 1;
>
> Yes, this approach also works.
>
> I assume it is OK to commit attached patch?
>
> 2012-11-05  Eric Botcazou  <ebotcazou@adacore.com>
>             Uros Bizjak  <ubizjak@gmail.com>
>
>         * mode-switching.c (create_pre_exit): Force late switch for
>         __builtin_return case, when value, returned in SSE register,
>         was loaded using OImode load.
>
> Tested on x86_64-pc-linux-gnu, also with to-be-committed avx-vzeroupper-27.c

Unfortunately the proposed patch fails the testcase from PR41993:

--cut here--
short retframe_short (void *rframe)
{
    __builtin_return (rframe);
}
--cut here--

Uros.
>
> Thanks,
> Uros.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH, middle-end]: Fix mode-switching MODE_EXIT check with __builtin_apply/__builtin_return
  2012-11-05 18:43     ` Uros Bizjak
@ 2012-11-05 19:07       ` Uros Bizjak
  2012-11-05 19:32       ` Eric Botcazou
  1 sibling, 0 replies; 12+ messages in thread
From: Uros Bizjak @ 2012-11-05 19:07 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc-patches, vbyakovl23, Kaz Kojima

On Mon, Nov 5, 2012 at 7:43 PM, Uros Bizjak <ubizjak@gmail.com> wrote:

>>>> As discussed above, modes of loads, generated from __builtin_apply
>>>> have no connection to function return mode. mode-switching.c does
>>>> detect __builtin_apply situation and raises maybe_builtin_apply flag,
>>>> but doesn't use it to short-circuit wrong check. In proposed patch, we
>>>> detect this situation and raise force_late_switch in the same way, as
>>>> SH4 does for its "late" fpscr emission.
>>>
>>> If I understand correctly, we need to insert the vzeroupper because the
>>> function returns double in SSE registers but we generate an OImode load
>>> instead of a DFmode load because of the __builtin_return.  So we're in the
>>> forced_late_switch case but we fail to recognize the tweaked return value load
>>> since the number of registers doesn't match.

Actually, the complication with __buitlin_apply/__builtin_return is
that it blindly loads all possible return registers. So, in PR41993
case, we load SSE register using OImode load (that forces AVX dirty
state), even if we actually return %eax. In your patch it is assumed
that wide-load corresponds to the current function return register,
which is not the case.

Please note, that we enter the above code only in case the needed mode
of processed insn is different than MODE_EXIT. So in x86 case, we know
that this was due to hard-register load in "wrong" mode from
__builtin_{apply,return}. My patch also uses extra condition, where
hard-reg should be one of possible return registers, but not only
current function return register.

Uros.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH, middle-end]: Fix mode-switching MODE_EXIT check with __builtin_apply/__builtin_return
  2012-11-05 18:43     ` Uros Bizjak
  2012-11-05 19:07       ` Uros Bizjak
@ 2012-11-05 19:32       ` Eric Botcazou
  1 sibling, 0 replies; 12+ messages in thread
From: Eric Botcazou @ 2012-11-05 19:32 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, vbyakovl23, Kaz Kojima

> Unfortunately the proposed patch fails the testcase from PR41993:
> 
> --cut here--
> short retframe_short (void *rframe)
> {
>     __builtin_return (rframe);
> }

OK, so that's not only an issue with the mode of the return register, but with 
the return register itself.  Then the original patch is OK if the comment is 
reworked, something like:

   /* __builtin_return emits a sequence of loads to all return
      registers.  One of them might require another mode than
      MODE_EXIT, even if it is unrelated to the return value,
      so we want to put the final mode switch after it.  */

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2012-11-05 19:32 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-11-04 13:52 [PATCH, middle-end]: Fix mode-switching MODE_EXIT check with __builtin_apply/__builtin_return Uros Bizjak
2012-11-04 14:58 ` Oleg Endo
2012-11-04 15:03   ` Uros Bizjak
2012-11-04 15:23     ` Uros Bizjak
2012-11-04 18:06       ` Uros Bizjak
2012-11-05  9:10 ` Kaz Kojima
2012-11-05  9:55   ` Vladimir Yakovlev
2012-11-05 18:07 ` Eric Botcazou
2012-11-05 18:34   ` Uros Bizjak
2012-11-05 18:43     ` Uros Bizjak
2012-11-05 19:07       ` Uros Bizjak
2012-11-05 19:32       ` Eric Botcazou

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).