public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* Re: GCC Status Report (2004-03-09)
@ 2004-03-19 20:08 Richard Kenner
  0 siblings, 0 replies; 32+ messages in thread
From: Richard Kenner @ 2004-03-19 20:08 UTC (permalink / raw)
  To: ebotcazou; +Cc: gcc-patches, gcc

    Before I start thinking about a replacement, I'd like to understand
    what I'll be trying the replace.  It appears that no-bo-dy can tell
    what is the purpose of RTX_UNCHANGING_P.

Well I can tell you the *purpose*, but if you want the *definition*, that's
something else entirely.

The original idea was to separate objects that we knew could not be
modified by stores, like constant pool objects.

Then, at some later point, it was realized that we could also optimize by
allowing a *single* store to "constant" object to initialize it, but things
started rapidly going downhill once we started to take more and more
advantage of that property.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: GCC Status Report (2004-03-09)
  2004-03-19 20:23                         ` Mark Mitchell
@ 2004-03-20 19:51                           ` Eric Botcazou
  0 siblings, 0 replies; 32+ messages in thread
From: Eric Botcazou @ 2004-03-20 19:51 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Jakub Jelinek, gcc, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 3878 bytes --]

> >Do you want me to revert that patch?
>
> Yes -- if your tests confirm that it is safe to do so.

The attached patch successfully completed a testing cycle for the 3.4 branch 
on x86 (including the testcase for PR opt/8634) but I'm not convinced it is 
safe because, even if the comment in maybe_set_unchanging seems to imply 
that the problem for PR opt/8634 was caused by the double store in 
store_constructor, this is not the case: the original problem came from the 
mere existence of /u on memory writes for non-static initializers.  See
http://gcc.gnu.org/ml/gcc-patches/2002-12/msg00533.html
http://gcc.gnu.org/ml/gcc-patches/2003-04/msg00482.html
http://gcc.gnu.org/ml/gcc-patches/2003-04/msg00489.html

And I'm not sure whether rth's patch catches all the cases.

> It's supposed to be the RTL equivalent of "const".  In other words, once
> initialized, an RTX_UNCHANGING_P thing is immutable, and therefore no
> writes can alias it.  If you're after the one write to the
> RTX_UNCHANGING_P thing, then it's value is always valid.  That's true
> even if someone has its address; they cannot write through the pointer.

Yes, I understand the definition.  The problem for me is really the purpose.

> This is clearly a valuable optimization aid.

Optimization or correctness?  We had several bugreports recently on the 3.3 
branch where reload generates moves to read-only memory.  The only way to 
prevent that without rewriting the compiler seems to be testing the /u flag.

> I think that simply making this flag a tri-state will be the cleanest fix.
> For memory to which this clearing optimization applies, the flag should not
> be set, because the memory is written twice.  If we're worried about
> pessimization in that case, we should avoid writing the memory twice.

I don't think this is radical enough.  I think we need to separate 
correctness from optimization, i.e having a way to say "this place can never 
ever be written to" and to say "this place is not supposed to have been 
written more than once because of the semantics of the language".  The 
latter could be relaxed by the middle-end if it deems it profitable, the 
former being of course immutable.

> Frankly, I suspect that there is virtually no real code where writing
> only to the holes (where "holes" means "fields that are not explicitly
> initialized to a non-zero value, and, if the compiler so desires, parts
> of the object that are not part of any field") has any observable
> performance from clearing the whole structure.  If most of the structure
> is zero, then that will certainly be true.  If only a tiny bit of the
> structure is non-zero, that will certainly be true.  If the non-zero
> parts are contiguous, that will probably be true.  In practice, there
> are few inner loops involving initializing every other field in a
> structure, and that is the case where we would lose.

Frankly, I don't feel confident enough to implement myself that change on the 
3.4 branch at this point.  On the other hand, it's clearly a better solution 
than the blockage.  So, if someone more confident than me wants to do it,  
I'll help him to merge its work with the patch from ACT and verify that it 
fixes all the known failures.


 2004-03-20  Eric Botcazou <ebotcazou@libertysurf.fr>
             Mark Mitchell <mark@codesourcery.com>

        PR optimization/13424
	* explow.c (maybe_set_unchanging): Revert 2003-04-07 patch.
	* tree.h (readwrite_fields_p): New prototype.
	* alias.c (readwrite_fields_p): New function.
        * expr.c (store_constructor): When clearing the aggregate because of
	an incomplete or mostly zero constructor, do not put the /u flag if the
	target is already unchanging.  Record whether a non-unchanging
	aggregate containing read-write fields is cleared with the /u flag.
	In that case, emit a blockage right after the clearing.


-- 
Eric Botcazou

[-- Attachment #2: pr13424-4.diff --]
[-- Type: text/x-diff, Size: 4695 bytes --]

Index: explow.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/explow.c,v
retrieving revision 1.118
diff -u -p -r1.118 explow.c
--- explow.c	13 Dec 2003 04:11:20 -0000	1.118
+++ explow.c	19 Mar 2004 19:18:45 -0000
@@ -597,18 +597,9 @@ maybe_set_unchanging (rtx ref, tree t)
   /* We can set RTX_UNCHANGING_P from TREE_READONLY for decls whose
      initialization is only executed once, or whose initializer always
      has the same value.  Currently we simplify this to PARM_DECLs in the
-     first case, and decls with TREE_CONSTANT initializers in the second.
-
-     We cannot do this for non-static aggregates, because of the double
-     writes that can be generated by store_constructor, depending on the
-     contents of the initializer.  Yes, this does eliminate a good fraction
-     of the number of uses of RTX_UNCHANGING_P for a language like Ada.
-     It also eliminates a good quantity of bugs.  Let this be incentive to
-     eliminate RTX_UNCHANGING_P entirely in favor of a more reliable
-     solution, perhaps based on alias sets.  */
+     first case, and decls with TREE_CONSTANT initializers in the second.  */
 
   if ((TREE_READONLY (t) && DECL_P (t)
-       && (TREE_STATIC (t) || ! AGGREGATE_TYPE_P (TREE_TYPE (t)))
        && (TREE_CODE (t) == PARM_DECL
 	   || (DECL_INITIAL (t) && TREE_CONSTANT (DECL_INITIAL (t)))))
       || TREE_CODE_CLASS (TREE_CODE (t)) == 'c')
Index: alias.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/alias.c,v
retrieving revision 1.209.2.4
diff -u -p -r1.209.2.4 alias.c
--- alias.c	12 Feb 2004 23:28:27 -0000	1.209.2.4
+++ alias.c	19 Mar 2004 19:18:53 -0000
@@ -312,6 +312,26 @@ readonly_fields_p (tree type)
 
   return 0;
 }
+
+/* Same as above but for read-write fields.  */
+
+int
+readwrite_fields_p (tree type)
+{
+  tree field;
+
+  if (TREE_CODE (type) != RECORD_TYPE && TREE_CODE (type) != UNION_TYPE
+      && TREE_CODE (type) != QUAL_UNION_TYPE)
+    return 0;
+
+  for (field = TYPE_FIELDS (type); field != 0; field = TREE_CHAIN (field))
+    if (TREE_CODE (field) == FIELD_DECL
+	&& (! TREE_READONLY (field)
+	    || readwrite_fields_p (TREE_TYPE (field))))
+      return 1;
+
+  return 0;
+}
 \f
 /* Return 1 if any MEM object of type T1 will always conflict (using the
    dependency routines in this file) with any MEM object of type T2.
Index: expr.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/expr.c,v
retrieving revision 1.615.4.9
diff -u -p -r1.615.4.9 expr.c
--- expr.c	13 Mar 2004 18:26:23 -0000	1.615.4.9
+++ expr.c	19 Mar 2004 19:19:19 -0000
@@ -4513,6 +4513,7 @@ store_constructor (tree exp, rtx target,
   if (TREE_CODE (type) == RECORD_TYPE || TREE_CODE (type) == UNION_TYPE
       || TREE_CODE (type) == QUAL_UNION_TYPE)
     {
+      bool readwrite_fields_marked_unchanging_p = false;
       tree elt;
 
       /* If size is zero or the target is already cleared, do nothing.  */
@@ -4552,15 +4553,28 @@ store_constructor (tree exp, rtx target,
 	{
 	  rtx xtarget = target;
 
-	  if (readonly_fields_p (type))
+	  if (readonly_fields_p (type) && ! RTX_UNCHANGING_P (xtarget))
 	    {
 	      xtarget = copy_rtx (xtarget);
 	      RTX_UNCHANGING_P (xtarget) = 1;
+	      if (readwrite_fields_p (type))
+		readwrite_fields_marked_unchanging_p = true;
 	    }
 
 	  clear_storage (xtarget, GEN_INT (size));
 	  cleared = 1;
 	}
+
+      /* ??? Emit a blockage to prevent the scheduler from swapping the
+	 memory write issued above with the /u flag and memory writes
+	 that may be issued later without it.  Note that the clearing
+	 above cannot be simply disabled in the unsafe cases because
+	 the C front-end relies on it to implement the semantics of
+	 constructors for automatic objects.  However, not all machine
+	 descriptions define a blockage insn, so emit an ASM_INPUT to
+	 act as one.  */
+      if (readwrite_fields_marked_unchanging_p)
+	emit_insn (gen_rtx_ASM_INPUT (VOIDmode, ""));
 
       if (! cleared)
 	emit_insn (gen_rtx_CLOBBER (VOIDmode, target));
Index: tree.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree.h,v
retrieving revision 1.458.2.4
diff -u -p -r1.458.2.4 tree.h
--- tree.h	8 Feb 2004 01:52:43 -0000	1.458.2.4
+++ tree.h	19 Mar 2004 19:19:27 -0000
@@ -2828,6 +2828,7 @@ extern void record_component_aliases (tr
 extern HOST_WIDE_INT get_alias_set (tree);
 extern int alias_sets_conflict_p (HOST_WIDE_INT, HOST_WIDE_INT);
 extern int readonly_fields_p (tree);
+extern int readwrite_fields_p (tree);
 extern int objects_must_conflict_p (tree, tree);
 
 /* In tree.c */

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: GCC Status Report (2004-03-09)
  2004-03-19 20:04                       ` Eric Botcazou
@ 2004-03-19 20:23                         ` Mark Mitchell
  2004-03-20 19:51                           ` Eric Botcazou
  0 siblings, 1 reply; 32+ messages in thread
From: Mark Mitchell @ 2004-03-19 20:23 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: Jakub Jelinek, gcc, gcc-patches

Eric Botcazou wrote:

>>Well, so, let's remove that chunk of code; it should no longer be needed.
>>    
>>
>
>
>Do you want me to revert that patch?
>
>  
>
Yes -- if your tests confirm that it is safe to do so.

>>Eric, I can tell you're unhappy with this approach, and so you're
>>casting about for something better.  That's good, but I think we've come
>>as far as we can for 3.4.0.
>>    
>>
>
>Yes, I'm pretty frustrated because we're (again) chasing down a far-reaching 
>bug just days before a release.
>
>  
>
Yes.  This bug has been lying around for a long time, and we were all 
afraid to tackle it.  The long-term solution is clearer interfaces and 
specifications so that it's more obvious what's wrong; the problem here 
is that none of us quite know what this code is supposed to do.

>Before I start thinking about a replacement, I'd like to understand what I'll 
>be trying the replace.  It appears that no-bo-dy can tell what is the 
>purpose of RTX_UNCHANGING_P.
>  
>
It's supposed to be the RTL equivalent of "const".  In other words, once 
initialized, an RTX_UNCHANGING_P thing is immutable, and therefore no 
writes can alias it.  If you're after the one write to the 
RTX_UNCHANGING_P thing, then it's value is always valid.  That's true 
even if someone has its address; they cannot write through the pointer.

This is clearly a valuable optimization aid.  I think that simply making 
this flag a tri-state will be the cleanest fix.  For memory to which 
this clearing optimization applies, the flag should not be set, because 
the memory is written twice.  If we're worried about pessimization in 
that case, we should avoid writing the memory twice. 

Frankly, I suspect that there is virtually no real code where writing 
only to the holes (where "holes" means "fields that are not explicitly 
initialized to a non-zero value, and, if the compiler so desires, parts 
of the object that are not part of any field") has any observable 
performance from clearing the whole structure.  If most of the structure 
is zero, then that will certainly be true.  If only a tiny bit of the 
structure is non-zero, that will certainly be true.  If the non-zero 
parts are contiguous, that will probably be true.  In practice, there 
are few inner loops involving initializing every other field in a 
structure, and that is the case where we would lose.

-- 
Mark Mitchell
CodeSourcery, LLC
(916) 791-8304
mark@codesourcery.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: GCC Status Report (2004-03-09)
  2004-03-19 19:29                     ` Mark Mitchell
@ 2004-03-19 20:04                       ` Eric Botcazou
  2004-03-19 20:23                         ` Mark Mitchell
  0 siblings, 1 reply; 32+ messages in thread
From: Eric Botcazou @ 2004-03-19 20:04 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Jakub Jelinek, gcc, gcc-patches

> Well, so, let's remove that chunk of code; it should no longer be needed.

No, not the whole chunk of code.  The pessimization was introduced by:


2003-04-07  Glen Nakamura  <glen@imodulo.com>

	PR opt/8634
	* explow.c (maybe_set_unchanging): Don't flag non-static const
	aggregate type initializers with RTX_UNCHANGING_P.

===================================================================
RCS file: /cvs/gcc/gcc/gcc/explow.c,v
retrieving revision 1.107
retrieving revision 1.108
diff -u -r1.107 -r1.108
--- gcc/gcc/explow.c	2003/03/20 22:51:39	1.107
+++ gcc/gcc/explow.c	2003/04/07 22:57:41	1.108
@@ -657,8 +657,18 @@
   /* We can set RTX_UNCHANGING_P from TREE_READONLY for decls whose
      initialization is only executed once, or whose initializer always
      has the same value.  Currently we simplify this to PARM_DECLs in the
-     first case, and decls with TREE_CONSTANT initializers in the second.  
*/
+     first case, and decls with TREE_CONSTANT initializers in the second.
+
+     We cannot do this for non-static aggregates, because of the double
+     writes that can be generated by store_constructor, depending on the
+     contents of the initializer.  Yes, this does eliminate a good fraction
+     of the number of uses of RTX_UNCHANGING_P for a language like Ada.
+     It also eliminates a good quantity of bugs.  Let this be incentive to
+     eliminate RTX_UNCHANGING_P entirely in favour of a more reliable
+     solution, perhaps based on alias sets.  */
+
   if ((TREE_READONLY (t) && DECL_P (t)
+       && (TREE_STATIC (t) || ! AGGREGATE_TYPE_P (TREE_TYPE (t)))
        && (TREE_CODE (t) == PARM_DECL
 	   || (DECL_INITIAL (t) && TREE_CONSTANT (DECL_INITIAL (t)))))
       || TREE_CODE_CLASS (TREE_CODE (t)) == 'c')


Do you want me to revert that patch?

> Eric, I can tell you're unhappy with this approach, and so you're
> casting about for something better.  That's good, but I think we've come
> as far as we can for 3.4.0.

Yes, I'm pretty frustrated because we're (again) chasing down a far-reaching 
bug just days before a release.

> If all this is sufficiently annoyhing to you, maybe you can work on ripping
> out RTX_UNCHANGING_P for 3.5 and replacing it with something better!

Before I start thinking about a replacement, I'd like to understand what I'll 
be trying the replace.  It appears that no-bo-dy can tell what is the 
purpose of RTX_UNCHANGING_P.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: GCC Status Report (2004-03-09)
  2004-03-19 14:31                   ` Eric Botcazou
@ 2004-03-19 19:29                     ` Mark Mitchell
  2004-03-19 20:04                       ` Eric Botcazou
  0 siblings, 1 reply; 32+ messages in thread
From: Mark Mitchell @ 2004-03-19 19:29 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: Jakub Jelinek, gcc, gcc-patches

Eric Botcazou wrote:

>>Can't we at least detect the case where either the whole aggregate
>>is const or all its fields are const?  Then we don't need any blockage.
>>    
>>
>
>Guess what?  The constness is not propagated to the RTL in the former case, 
>because of the pessimization recently introduced in maybe_set_unchanging:
>
>     We cannot do this for non-static aggregates, because of the double
>     writes that can be generated by store_constructor, depending on the
>     contents of the initializer.  Yes, this does eliminate a good fraction
>     of the number of uses of RTX_UNCHANGING_P for a language like Ada.
>     It also eliminates a good quantity of bugs.  Let this be incentive to
>     eliminate RTX_UNCHANGING_P entirely in favor of a more reliable
>     solution, perhaps based on alias sets.  */
>
>This means that const automatic aggregates would take a double pessimization 
>with the blockage scheme.
>  
>
Well, so, let's remove that chunk of code; it should no longer be needed.

Eric, I can tell you're unhappy with this approach, and so you're 
casting about for something better.  That's good, but I think we've come 
as far as we can for 3.4.0.   If all this is sufficiently annoyhing to 
you, maybe you can work on ripping out RTX_UNCHANGING_P for 3.5 and 
replacing it with something better!

-- 
Mark Mitchell
CodeSourcery, LLC
(916) 791-8304
mark@codesourcery.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: GCC Status Report (2004-03-09)
  2004-03-18 23:41                 ` Jakub Jelinek
  2004-03-19  1:23                   ` Eric Botcazou
@ 2004-03-19 14:31                   ` Eric Botcazou
  2004-03-19 19:29                     ` Mark Mitchell
  1 sibling, 1 reply; 32+ messages in thread
From: Eric Botcazou @ 2004-03-19 14:31 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Mark Mitchell, gcc, gcc-patches

> Can't we at least detect the case where either the whole aggregate
> is const or all its fields are const?  Then we don't need any blockage.

Guess what?  The constness is not propagated to the RTL in the former case, 
because of the pessimization recently introduced in maybe_set_unchanging:

     We cannot do this for non-static aggregates, because of the double
     writes that can be generated by store_constructor, depending on the
     contents of the initializer.  Yes, this does eliminate a good fraction
     of the number of uses of RTX_UNCHANGING_P for a language like Ada.
     It also eliminates a good quantity of bugs.  Let this be incentive to
     eliminate RTX_UNCHANGING_P entirely in favor of a more reliable
     solution, perhaps based on alias sets.  */

This means that const automatic aggregates would take a double pessimization 
with the blockage scheme.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: GCC Status Report (2004-03-09)
@ 2004-03-19 14:22 Richard Kenner
  0 siblings, 0 replies; 32+ messages in thread
From: Richard Kenner @ 2004-03-19 14:22 UTC (permalink / raw)
  To: jakub; +Cc: gcc-patches, gcc

    That would be certainly safest and the UNCHANGING would really mean
    unchanging.  But I hope we could at least extend it to objects which are
    *never* stored into in current function (see if the object is initialized
    in the function currently expanded into RTL, if not, use RTX_UNCHANGING_P
    for it as well, otherwise perhaps use some other bit or alias set).

I think so.  Certainly that would be true for parameters in languages
where they can't be changed.

It seems the trickiest problems come when we try to define what
*writing* to "unchanging" memory means.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: GCC Status Report (2004-03-09)
  2004-03-19  6:34 Richard Kenner
@ 2004-03-19 12:18 ` Jakub Jelinek
  0 siblings, 0 replies; 32+ messages in thread
From: Jakub Jelinek @ 2004-03-19 12:18 UTC (permalink / raw)
  To: Richard Kenner; +Cc: ebotcazou, gcc-patches, gcc

On Thu, Mar 18, 2004 at 08:26:30PM -0500, Richard Kenner wrote:
> I don't know what to think about /u overall.  It's becoming more and more
> clear that it's the source of a lot of trouble and probably is becoming
> less and less useful with aliasing code around anyway.  I'd be in favor
> of trying the experiment of only using it on data that's *never* explicitly
> stored (like constant-pool or items with static initializers).  So we
> disallow stores to /u.

That would be certainly safest and the UNCHANGING would really mean
unchanging.  But I hope we could at least extend it to objects which are
*never* stored into in current function (see if the object is initialized
in the function currently expanded into RTL, if not, use RTX_UNCHANGING_P
for it as well, otherwise perhaps use some other bit or alias set).

	Jakub

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: GCC Status Report (2004-03-09)
       [not found]                   ` <200403190155.18981.ebotcazou@libertysurf.fr>
@ 2004-03-19  6:42                     ` Mark Mitchell
  0 siblings, 0 replies; 32+ messages in thread
From: Mark Mitchell @ 2004-03-19  6:42 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc, gcc-patches

Eric Botcazou wrote:

>>I'm OK with the blockage approach.  We can try to do better for 3.4.1,
>>perhaps.
>>    
>>
>
>And what about Jakub's tweak?
>  
>
Oh, sorry, yes; I meant to include that.

-- 
Mark Mitchell
CodeSourcery, LLC
(916) 791-8304
mark@codesourcery.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: GCC Status Report (2004-03-09)
@ 2004-03-19  6:34 Richard Kenner
  2004-03-19 12:18 ` Jakub Jelinek
  0 siblings, 1 reply; 32+ messages in thread
From: Richard Kenner @ 2004-03-19  6:34 UTC (permalink / raw)
  To: ebotcazou; +Cc: gcc-patches, gcc

    > Can't we at least detect the case where either the whole aggregate
    > is const or all its fields are const?  Then we don't need any blockage.

    So we endorse the double /u liberality?

I don't.

As I said in more private email, my feeling in the /u case where you have
a partial aggregate is to do the stores into a temporary and move it into
the /u.  Yes, that's less efficient and in many ways more than undoes the
advantage of /u, but partial aggregates are rare.

I think that's cleaner than blockage stuff.

I don't know what to think about /u overall.  It's becoming more and more
clear that it's the source of a lot of trouble and probably is becoming
less and less useful with aliasing code around anyway.  I'd be in favor
of trying the experiment of only using it on data that's *never* explicitly
stored (like constant-pool or items with static initializers).  So we
disallow stores to /u.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: GCC Status Report (2004-03-09)
  2004-03-18 23:41                 ` Jakub Jelinek
@ 2004-03-19  1:23                   ` Eric Botcazou
  2004-03-19 14:31                   ` Eric Botcazou
  1 sibling, 0 replies; 32+ messages in thread
From: Eric Botcazou @ 2004-03-19  1:23 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Mark Mitchell, gcc, gcc-patches

> Can't we at least detect the case where either the whole aggregate
> is const or all its fields are const?  Then we don't need any blockage.

So we endorse the double /u liberality?

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: GCC Status Report (2004-03-09)
  2004-03-18 23:36               ` Eric Botcazou
@ 2004-03-18 23:41                 ` Jakub Jelinek
  2004-03-19  1:23                   ` Eric Botcazou
  2004-03-19 14:31                   ` Eric Botcazou
       [not found]                 ` <405A3F26.2050100@codesourcery.com>
  1 sibling, 2 replies; 32+ messages in thread
From: Jakub Jelinek @ 2004-03-18 23:41 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: Mark Mitchell, gcc, gcc-patches

On Fri, Mar 19, 2004 at 12:36:33AM +0100, Eric Botcazou wrote:
> > How about the idea above of just zeroing the holes?
> 
> If you're fine with the blockage approach...

Can't we at least detect the case where either the whole aggregate
is const or all its fields are const?  Then we don't need any blockage.

	Jakub

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: GCC Status Report (2004-03-09)
  2004-03-18 19:15             ` Mark Mitchell
@ 2004-03-18 23:36               ` Eric Botcazou
  2004-03-18 23:41                 ` Jakub Jelinek
       [not found]                 ` <405A3F26.2050100@codesourcery.com>
  0 siblings, 2 replies; 32+ messages in thread
From: Eric Botcazou @ 2004-03-18 23:36 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: gcc, gcc-patches

> We could only zero out the "holes" between the fields.  That would be
> slower than zeroing the hole thing at once, but usually not much,
> especially since memzero will be inlined for small chunks of memory.

This can probably be awful in certain cases too (mix const/non-const, 
bitfields, packed fields, ...).  And this would defeat the clrstr patterns, 
on x86 for example.

> We could also disable the use of RTX_UNCHANGING_P in alias.c.

Could this be done somewhat selectively?

> We could make RTX_UNCHANGING_P a tri-state (with a don't-know option),
> which would probably be best.  That would allow us to keep most of the
> optimizations, but it's probably a lot more work and more invasive.
>
> We have to pessimize something because we have a poorly-designed feature
> (i.e., one with an incomplete specification) that results in generating
> wrong code.

Unfortunately, yes, this appears to be unavoidable.

> What solution do you dislike least?

Nice formulation :-)  Richard (rth) even proposed to disable entirely 
RTX_UNCHANGING_P in 3.4/3.5.  I personally think the 3.3.x code is cleaner 
(albeit probably not 100% safe).

> How about the idea above of just zeroing the holes?

If you're fine with the blockage approach...

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: GCC Status Report (2004-03-09)
  2004-03-18 18:31           ` Eric Botcazou
@ 2004-03-18 19:15             ` Mark Mitchell
  2004-03-18 23:36               ` Eric Botcazou
  0 siblings, 1 reply; 32+ messages in thread
From: Mark Mitchell @ 2004-03-18 19:15 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc, gcc-patches

Eric Botcazou wrote:

> 2004-03-18  Eric Botcazou <ebotcazou@libertysurf.fr>
>             Mark Mitchell <mark@codesourcery.com>
>
> 	PR optimization/13424
> 	* expr.c (store_constructor): Emit a blockage after clearing the
> 	aggregate because of an incomplete or mostly zero constructor if
>	the aggregate contains read-only fields.
>
>
>But this would still pessimize a lot, because life analysis would not be able 
>to delete the redundant writes anymore.
>  
>
We could only zero out the "holes" between the fields.  That would be 
slower than zeroing the hole thing at once, but usually not much, 
especially since memzero will be inlined for small chunks of memory.

We could also disable the use of RTX_UNCHANGING_P in alias.c.

We could make RTX_UNCHANGING_P a tri-state (with a don't-know option), 
which would probably be best.  That would allow us to keep most of the 
optimizations, but it's probably a lot more work and more invasive.

We have to pessimize something because we have a poorly-designed feature 
(i.e., one with an incomplete specification) that results in generating 
wrong code.

What solution do you dislike least?

How about the idea above of just zeroing the holes?

-- 
Mark Mitchell
CodeSourcery, LLC
(916) 791-8304
mark@codesourcery.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: GCC Status Report (2004-03-09)
  2004-03-18  8:25         ` Eric Botcazou
@ 2004-03-18 18:31           ` Eric Botcazou
  2004-03-18 19:15             ` Mark Mitchell
  0 siblings, 1 reply; 32+ messages in thread
From: Eric Botcazou @ 2004-03-18 18:31 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: gcc, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 805 bytes --]

> 2004-03-18  Eric Botcazou <ebotcazou@libertysurf.fr>
>             Mark Mitchell <mark@codesourcery.com>
>
> 	PR optimization/13424
> 	* expr.c (store_constructor): Emit a blockage after clearing the
> 	aggregate because of an incomplete or mostly zero constructor.

Well, this pessimizes way too much.  We can emit the blockage only in the 
unsafe cases:


 2004-03-18  Eric Botcazou <ebotcazou@libertysurf.fr>
             Mark Mitchell <mark@codesourcery.com>

 	PR optimization/13424
 	* expr.c (store_constructor): Emit a blockage after clearing the
 	aggregate because of an incomplete or mostly zero constructor if
	the aggregate contains read-only fields.


But this would still pessimize a lot, because life analysis would not be able 
to delete the redundant writes anymore.

-- 
Eric Botcazou

[-- Attachment #2: pr13424-3.diff --]
[-- Type: text/x-diff, Size: 1464 bytes --]

Index: expr.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/expr.c,v
retrieving revision 1.615.4.9
diff -u -p -r1.615.4.9 expr.c
--- expr.c	13 Mar 2004 18:26:23 -0000	1.615.4.9
+++ expr.c	18 Mar 2004 18:14:18 -0000
@@ -4551,8 +4551,9 @@ store_constructor (tree exp, rtx target,
 		       == size)))
 	{
 	  rtx xtarget = target;
+	  bool partly_readonly_p = readonly_fields_p (type);
 
-	  if (readonly_fields_p (type))
+	  if (partly_readonly_p)
 	    {
 	      xtarget = copy_rtx (xtarget);
 	      RTX_UNCHANGING_P (xtarget) = 1;
@@ -4560,6 +4561,19 @@ store_constructor (tree exp, rtx target,
 
 	  clear_storage (xtarget, GEN_INT (size));
 	  cleared = 1;
+
+	  /* ??? Emit a blockage to prevent the scheduler from swapping the
+	     memory write issued just above and the memory write that may be
+	     issued below to initialize each field.  This is needed for a
+	     read-write field because the former write may carry the /u
+	     flag and not the latter, so they will not conflict.  Note that
+	     the clearing cannot be simply disabled in the unsafe cases
+	     because the C front-end relies on it to implement the semantics
+	     of constructors for automatic objects.
+	     However, not all machine descriptions define a blockage insn,
+	     so emit an ASM_INPUT to act as one.  */
+	  if (partly_readonly_p)
+	    emit_insn (gen_rtx_ASM_INPUT (VOIDmode, ""));
 	}
 
       if (! cleared)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: GCC Status Report (2004-03-09)
  2004-03-17 15:55       ` Mark Mitchell
@ 2004-03-18  8:25         ` Eric Botcazou
  2004-03-18 18:31           ` Eric Botcazou
  0 siblings, 1 reply; 32+ messages in thread
From: Eric Botcazou @ 2004-03-18  8:25 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: gcc, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1021 bytes --]

> (1) Try my scheduling barrier idea, where we introduce a volatile asm
> equivalent between the clearing and the normal assignment.  I know this
> is potentially pessimizing, but in practice the pessimization will
> probably be slight.
>
> (2) In the unsafe case, implement the C99 rule by explicitly clearing
> each individual field.  In other words, iterate through the fields,
> clearing all of the fields that do not have an explicit CONSTRUCTOR_ELT.
>
> I think (1) will probably be less pessimizing that (2).  Are you willing
> to give that a try?

The attached patch fixes PR opt/13424 (both on PA and UltraSPARC) and doesn't 
do any harm to the testcase on x86.  OK for mainline and 3.4 branch after a 
complete testing cycle on x86?


2004-03-18  Eric Botcazou <ebotcazou@libertysurf.fr>
            Mark Mitchell <mark@codesourcery.com>

	PR optimization/13424
	* expr.c (store_constructor): Emit a blockage after clearing the
	aggregate because of an incomplete or mostly zero constructor.


-- 
Eric Botcazou

[-- Attachment #2: pr13424-2.diff --]
[-- Type: text/x-diff, Size: 1117 bytes --]

Index: expr.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/expr.c,v
retrieving revision 1.615.4.9
diff -u -p -r1.615.4.9 expr.c
--- expr.c	13 Mar 2004 18:26:23 -0000	1.615.4.9
+++ expr.c	17 Mar 2004 17:56:20 -0000
@@ -4560,6 +4560,18 @@ store_constructor (tree exp, rtx target,
 
 	  clear_storage (xtarget, GEN_INT (size));
 	  cleared = 1;
+
+	  /* ??? Emit a blockage to prevent the scheduler from swapping the
+	     memory write issued just above and the memory write that may be
+	     issued below to initialize each field.  This is needed for a
+	     read-write field because the former write may carry the /u
+	     flag and not the latter, so they will not conflict.  Note that
+	     the clearing cannot be simply disabled in the unsafe cases
+	     because the C front-end relies on it to implement the semantics
+	     of constructors for automatic objects.
+	     However, not all machine descriptions define a blockage insn,
+	     so emit an ASM_INPUT to act as one.  */
+	  emit_insn (gen_rtx_ASM_INPUT (VOIDmode, ""));
 	}
 
       if (! cleared)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: GCC Status Report (2004-03-09)
  2004-03-17 10:56     ` Eric Botcazou
  2004-03-17 11:49       ` Eric Botcazou
@ 2004-03-17 15:55       ` Mark Mitchell
  2004-03-18  8:25         ` Eric Botcazou
  1 sibling, 1 reply; 32+ messages in thread
From: Mark Mitchell @ 2004-03-17 15:55 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc

Eric Botcazou wrote:

>>So, would you please apply Olivier's patch to the 3.4 branch and the
>>mainline?  Then, we need to come up with some better long-term solution.
>>    
>>
>
>There is a hitch: the C front-end is relying on the clearing optimization to 
>implement the semantics mandated by C99 §6.7.8 [19], that is for
>
>struct B {
>    struct A a;
>    void * const b;
>    struct A const * const c;
>    struct A const *d;
>};
>
>bar (void *x)
>{
>  struct B y = { .b = x, .c = (void *) 0 };
>}
>
>the front-end sends a constructor with only 2 elements to the middle-end, 
>while C99 wants b.a and b.d to be cleared too.
>  
>
Thank you for doing the experiment!

There are two other fixes I can think of:

(1) Try my scheduling barrier idea, where we introduce a volatile asm 
equivalent between the clearing and the normal assignment.  I know this 
is potentially pessimizing, but in practice the pessimization will 
probably be slight.

(2) In the unsafe case, implement the C99 rule by explicitly clearing 
each individual field.  In other words, iterate through the fields, 
clearing all of the fields that do not have an explicit CONSTRUCTOR_ELT.

I think (1) will probably be less pessimizing that (2).  Are you willing 
to give that a try?

-- 
Mark Mitchell
CodeSourcery, LLC
(916) 791-8304
mark@codesourcery.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: GCC Status Report (2004-03-09)
  2004-03-17 10:56     ` Eric Botcazou
@ 2004-03-17 11:49       ` Eric Botcazou
  2004-03-17 15:55       ` Mark Mitchell
  1 sibling, 0 replies; 32+ messages in thread
From: Eric Botcazou @ 2004-03-17 11:49 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: gcc

> bar (void *x)
> {
>   struct B y = { .b = x, .c = (void *) 0 };
> }
>
> the front-end sends a constructor with only 2 elements to the middle-end,
> while C99 wants b.a and b.d to be cleared too.

... y.a and y.d of course.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: GCC Status Report (2004-03-09)
  2004-03-16 16:53   ` Mark Mitchell
  2004-03-16 16:59     ` Paul Koning
@ 2004-03-17 10:56     ` Eric Botcazou
  2004-03-17 11:49       ` Eric Botcazou
  2004-03-17 15:55       ` Mark Mitchell
  1 sibling, 2 replies; 32+ messages in thread
From: Eric Botcazou @ 2004-03-17 10:56 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: gcc

> So, would you please apply Olivier's patch to the 3.4 branch and the
> mainline?  Then, we need to come up with some better long-term solution.

There is a hitch: the C front-end is relying on the clearing optimization to 
implement the semantics mandated by C99 §6.7.8 [19], that is for

struct B {
    struct A a;
    void * const b;
    struct A const * const c;
    struct A const *d;
};

bar (void *x)
{
  struct B y = { .b = x, .c = (void *) 0 };
}

the front-end sends a constructor with only 2 elements to the middle-end, 
while C99 wants b.a and b.d to be cleared too.

Amazingly, the patch causes no failure in the whole testsuite except for the 
PR's testcase (gcc.dg/20031202-1.c), which now fails everywhere.

Hum... what to do here?  Other front-ends may also be relying on this 
semantics (the middle-end clears every record or union whose constructor is 
not complete).  Maybe we could simply revert to the state of the 3.3 branch.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: GCC Status Report (2004-03-09)
  2004-03-16 17:24         ` Zack Weinberg
@ 2004-03-16 17:25           ` Paul Koning
  0 siblings, 0 replies; 32+ messages in thread
From: Paul Koning @ 2004-03-16 17:25 UTC (permalink / raw)
  To: zack; +Cc: ian, gcc

>>>>> "Zack" == Zack Weinberg <zack@codesourcery.com> writes:

 Zack> Ian Lance Taylor <ian@wasabisystems.com> writes:
 >> Paul Koning <pkoning@equallogic.com> writes:
 >>> Perhaps we need asm("":::"barrier")?
 >> Seconded.  For high performance device manipulation, it would be
 >> good to have a way to say "do not schedule instructions across
 >> this point" without also saying "clobber everything".

 Zack> __builtin_fence() (possibly with an argument to request various
 Zack> sorts of memory barriers that various architectures have) seems
 Zack> preferable to me over asm()...

Good point.  You'd want to be able to ask for memory barrier
instructions, or not -- in some cases all you need is to block
scheduler actions, and the ordering rules of the target machine are
otherwise sufficient for the purpose.

	  paul

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: GCC Status Report (2004-03-09)
  2004-03-16 17:11       ` Ian Lance Taylor
@ 2004-03-16 17:24         ` Zack Weinberg
  2004-03-16 17:25           ` Paul Koning
  0 siblings, 1 reply; 32+ messages in thread
From: Zack Weinberg @ 2004-03-16 17:24 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Paul Koning, gcc

Ian Lance Taylor <ian@wasabisystems.com> writes:
> Paul Koning <pkoning@equallogic.com> writes:
>> Perhaps we need asm("":::"barrier")?
>
> Seconded.  For high performance device manipulation, it would be good
> to have a way to say "do not schedule instructions across this point"
> without also saying "clobber everything".

__builtin_fence() (possibly with an argument to request various sorts
of memory barriers that various architectures have) seems preferable
to me over asm()...

zw

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: GCC Status Report (2004-03-09)
  2004-03-16 16:59     ` Paul Koning
@ 2004-03-16 17:11       ` Ian Lance Taylor
  2004-03-16 17:24         ` Zack Weinberg
  0 siblings, 1 reply; 32+ messages in thread
From: Ian Lance Taylor @ 2004-03-16 17:11 UTC (permalink / raw)
  To: Paul Koning; +Cc: gcc

Paul Koning <pkoning@equallogic.com> writes:

> >>>>> "Mark" == Mark Mitchell <mark@codesourcery.com> writes:
> 
>  Mark> The other alternative I can think of is to introduce a
>  Mark> scheduling barrier (e.g., a dummy vaolatile asm) after the
>  Mark> clearing operation. 
> 
> It would be nice if there were a clearly documented way to do that.
> 
> I've found two:
> 
>      asm("")
>      asm("":::"memory")
> 
> The former does what I want but isn't documented and (according to
> what I was told) should NOT be counted on to be a barrier.
> 
> The latter is documented and does a superset of what I needed at the
> time.  I say "superset" because it causes registers to be reloaded
> with data from variables (due to the memory clobber).  That's a
> performance hit if I didn't have any real memory clobbers in the
> system.
> 
> Perhaps we need asm("":::"barrier")?

Seconded.  For high performance device manipulation, it would be good
to have a way to say "do not schedule instructions across this point"
without also saying "clobber everything".

Ian

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: GCC Status Report (2004-03-09)
  2004-03-16 16:53   ` Mark Mitchell
@ 2004-03-16 16:59     ` Paul Koning
  2004-03-16 17:11       ` Ian Lance Taylor
  2004-03-17 10:56     ` Eric Botcazou
  1 sibling, 1 reply; 32+ messages in thread
From: Paul Koning @ 2004-03-16 16:59 UTC (permalink / raw)
  To: mark; +Cc: ebotcazou, gcc

>>>>> "Mark" == Mark Mitchell <mark@codesourcery.com> writes:

 Mark> The other alternative I can think of is to introduce a
 Mark> scheduling barrier (e.g., a dummy vaolatile asm) after the
 Mark> clearing operation. 

It would be nice if there were a clearly documented way to do that.

I've found two:

     asm("")
     asm("":::"memory")

The former does what I want but isn't documented and (according to
what I was told) should NOT be counted on to be a barrier.

The latter is documented and does a superset of what I needed at the
time.  I say "superset" because it causes registers to be reloaded
with data from variables (due to the memory clobber).  That's a
performance hit if I didn't have any real memory clobbers in the
system.

Perhaps we need asm("":::"barrier")?

	paul

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: GCC Status Report (2004-03-09)
  2004-03-11  9:45 ` Eric Botcazou
  2004-03-11 12:48   ` Jakub Jelinek
@ 2004-03-16 16:53   ` Mark Mitchell
  2004-03-16 16:59     ` Paul Koning
  2004-03-17 10:56     ` Eric Botcazou
  1 sibling, 2 replies; 32+ messages in thread
From: Mark Mitchell @ 2004-03-16 16:53 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc

Eric Botcazou wrote:

>>I'd particularly like to understand this RTX_UNCHANGING_P optimization
>>issue.  We've bumped into this before.  I think we need to take the
>>conservative approach, even if that is pessimizing in some case.
>>Would someone who understands the details of this bug please summarize
>>it, mail that to me, and also add that information to the appropriate
>>PR?
>>    
>>
>
>The problem is related to the semantics of the /u flag on memory writes.  
>Strictly speaking, the /u flag should be put only if the memory location is 
>guaranteed to be ever written once.
>  
>
Thanks for this summary.

>Third problem: now, on the 3.4 branch, if the whole structure contains 
>read-only fields and read-write fields, is wholly cleared and at least one 
>read-write field is set to non-zero, we have a memory write with the /u flag 
>followed by a memory write without the /u flag.  These of course don't 
>conflict so may be swapped by the scheduler (this is my testcase on 
>UltraSPARC for PR opt/13424).
>  
>
>A solution (by Olivier Hainque, that is now in the ACT tree) is to disable 
>the clearing optimization in the unsafe cases, that is when we know that 
>another write may be issued later which will not conflict
>
The other alternative I can think of is to introduce a scheduling 
barrier (e.g., a dummy vaolatile asm) after the clearing operation.  I 
don't know which of these alternatives would lead to greater 
pessimization.  I think the scheduling barrier approach might actually 
be better, but I can't justify that very well.

So, would you please apply Olivier's patch to the 3.4 branch and the 
mainline?  Then, we need to come up with some better long-term solution.

Thanks,

-- 
Mark Mitchell
CodeSourcery, LLC
(916) 791-8304
mark@codesourcery.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: GCC Status Report (2004-03-09)
  2004-03-11 12:48   ` Jakub Jelinek
@ 2004-03-11 21:11     ` Richard Henderson
  0 siblings, 0 replies; 32+ messages in thread
From: Richard Henderson @ 2004-03-11 21:11 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Eric Botcazou, mark, gcc

On Thu, Mar 11, 2004 at 11:39:11AM +0100, Jakub Jelinek wrote:
> BTW, there is one optimization which we perhaps could do.  People often
> write:
> void foo (void)
> {
>   const char bar[16] = { 0x80 };
>   baz (bar);
> }
> which could be optimized into:
> void foo (void)
> {
>   static const char bar[16] = { 0x80 };
>   baz (bar);
> }
> 
> Then we can get rid of the runtime initialization altogether.
> This could be done either if the whole aggregate is const, or if all fields
> are const.

This does indeed happen on tree-ssa branch if the whole aggregate
is const.


r~

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: GCC Status Report (2004-03-09)
  2004-03-11 12:49 Richard Kenner
@ 2004-03-11 14:09 ` Eric Botcazou
  0 siblings, 0 replies; 32+ messages in thread
From: Eric Botcazou @ 2004-03-11 14:09 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc

> I believe I became aware that the multiple writes wouldn't work *after*
> the discussion you mention.  But I'm not certain.

Ok, that makes sense.  Thanks.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: GCC Status Report (2004-03-09)
@ 2004-03-11 12:49 Richard Kenner
  2004-03-11 14:09 ` Eric Botcazou
  0 siblings, 1 reply; 32+ messages in thread
From: Richard Kenner @ 2004-03-11 12:49 UTC (permalink / raw)
  To: ebotcazou; +Cc: gcc

    Ok, fine, that's even clearer then.  But that was not the impression I got 
    when reading the discussion between you and Olivier on the subject.

I believe I became aware that the multiple writes wouldn't work *after*
the discussion you mention.  But I'm not certain.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: GCC Status Report (2004-03-09)
  2004-03-11  9:45 ` Eric Botcazou
@ 2004-03-11 12:48   ` Jakub Jelinek
  2004-03-11 21:11     ` Richard Henderson
  2004-03-16 16:53   ` Mark Mitchell
  1 sibling, 1 reply; 32+ messages in thread
From: Jakub Jelinek @ 2004-03-11 12:48 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: mark, gcc

On Thu, Mar 11, 2004 at 10:52:44AM +0100, Eric Botcazou wrote:
> So we may end up with a record that contains some read-only fields, has been 
> wholly cleared and is indirectly assigned.  In this case again, we have a 
> memory write without the /u flag followed by a memory write with the /u 
> flag.  Richard Kenner fixed that on the 3.4 branch *before* Jakub's patch by 
> putting the /u flag during the clearing optimization in store_constructor:

But I've reverted my patch on 3.4 branch and the trunk (after changing
i386 backend to honor /u in stringops).

BTW, there is one optimization which we perhaps could do.  People often
write:
void foo (void)
{
  const char bar[16] = { 0x80 };
  baz (bar);
}
which could be optimized into:
void foo (void)
{
  static const char bar[16] = { 0x80 };
  baz (bar);
}

Then we can get rid of the runtime initialization altogether.
This could be done either if the whole aggregate is const, or if all fields
are const.

	Jakub

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: GCC Status Report (2004-03-09)
  2004-03-11 12:15 Richard Kenner
@ 2004-03-11 12:39 ` Eric Botcazou
  0 siblings, 0 replies; 32+ messages in thread
From: Eric Botcazou @ 2004-03-11 12:39 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc

> No, I don't think that's right.  I'm pretty sure that there is some
> code someplace that will fail if there are two writes to the same /u
> location and I thought we added code to prevent any such.

Ok, fine, that's even clearer then.  But that was not the impression I got 
when reading the discussion between you and Olivier on the subject.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: GCC Status Report (2004-03-09)
@ 2004-03-11 12:15 Richard Kenner
  2004-03-11 12:39 ` Eric Botcazou
  0 siblings, 1 reply; 32+ messages in thread
From: Richard Kenner @ 2004-03-11 12:15 UTC (permalink / raw)
  To: ebotcazou; +Cc: gcc

    In practice, though, there has been a more liberal use of this /u flag 
    because putting it on several writes to the same memory location doesn't 
    seem to cause any problem as long as every write to the said location 
    carries the /u flag (because the memory writes then conflict).

No, I don't think that's right.  I'm pretty sure that there is some
code someplace that will fail if there are two writes to the same /u location
and I thought we added code to prevent any such.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: GCC Status Report (2004-03-09)
  2004-03-09 18:09 Mark Mitchell
@ 2004-03-11  9:45 ` Eric Botcazou
  2004-03-11 12:48   ` Jakub Jelinek
  2004-03-16 16:53   ` Mark Mitchell
  0 siblings, 2 replies; 32+ messages in thread
From: Eric Botcazou @ 2004-03-11  9:45 UTC (permalink / raw)
  To: mark; +Cc: gcc

> I'd particularly like to understand this RTX_UNCHANGING_P optimization
> issue.  We've bumped into this before.  I think we need to take the
> conservative approach, even if that is pessimizing in some case.
> Would someone who understands the details of this bug please summarize
> it, mail that to me, and also add that information to the appropriate
> PR?

The problem is related to the semantics of the /u flag on memory writes.  
Strictly speaking, the /u flag should be put only if the memory location is 
guaranteed to be ever written once.

In practice, though, there has been a more liberal use of this /u flag 
because putting it on several writes to the same memory location doesn't 
seem to cause any problem as long as every write to the said location 
carries the /u flag (because the memory writes then conflict).

store_constructor features an optimization: if the constructor is mostly 
zero, the whole structure is cleared first, even though a few members are set 
to something else.  Later, if some individual members are read-only and set 
to something else, additional individual writes are issued and the /u flag 
is naturally put on them.  This was the initial state on the 3.3 branch.

First problem: if the whole structure contains read-only fields, is wholly 
cleared and at least one read-only field is set to non-zero, we have a 
memory write without the /u flag followed by a memory write with the /u 
flag.  These don't conflict so may be swapped by the scheduler.  Jakub fixed 
that on the 3.3 branch by not putting the /u flag on the individual writes 
if the whole structure has been cleared:

before:

	  if (TREE_READONLY (field))
	    {
	      if (GET_CODE (to_rtx) == MEM)
		to_rtx = copy_rtx (to_rtx);

	      RTX_UNCHANGING_P (to_rtx) = 1;
	    }

after

	  /* If the constructor has been cleared, setting RTX_UNCHANGING_P
	     on the MEM might lead to scheduling the clearing after the
	     store.  */
	  if (TREE_READONLY (field) && !cleared)
	    {
	      if (GET_CODE (to_rtx) == MEM)
		to_rtx = copy_rtx (to_rtx);

	      RTX_UNCHANGING_P (to_rtx) = 1;
	    }

This is the current state of the 3.3 branch.


Second problem: expand_expr contains these lines:

case INDIRECT_REF:
...
	/* If we are writing to this object and its type is a record with
	   readonly fields, we must mark it as readonly so it will
	   conflict with readonly references to those fields.  */
	if (modifier == EXPAND_WRITE && readonly_fields_p (type))
	  RTX_UNCHANGING_P (temp) = 1;

So we may end up with a record that contains some read-only fields, has been 
wholly cleared and is indirectly assigned.  In this case again, we have a 
memory write without the /u flag followed by a memory write with the /u 
flag.  Richard Kenner fixed that on the 3.4 branch *before* Jakub's patch by 
putting the /u flag during the clearing optimization in store_constructor:

before

      /* If the constructor has fewer fields than the structure
	 or if we are initializing the structure to mostly zeros,
	 clear the whole structure first.  Don't do this if TARGET is a
	 register whose mode size isn't equal to SIZE since clear_storage
	 can't handle this case.  */
      else if (((list_length (CONSTRUCTOR_ELTS (exp)) != fields_length 
(type))
		|| mostly_zeros_p (exp))
	       && (GET_CODE (target) != REG
		   || ((HOST_WIDE_INT) GET_MODE_SIZE (GET_MODE (target))
		       == size)))
	{
	  clear_storage (target, GEN_INT (size));
	  cleared = 1;
	}

after

      /* If the constructor has fewer fields than the structure
	 or if we are initializing the structure to mostly zeros,
	 clear the whole structure first.  Don't do this if TARGET is a
	 register whose mode size isn't equal to SIZE since clear_storage
	 can't handle this case.  */
      else if (((list_length (CONSTRUCTOR_ELTS (exp)) != fields_length 
(type))
		|| mostly_zeros_p (exp))
	       && (GET_CODE (target) != REG
		   || ((HOST_WIDE_INT) GET_MODE_SIZE (GET_MODE (target))
		       == size)))
	{
	  rtx xtarget = target;

	  if (readonly_fields_p (type))
	    {
	      xtarget = copy_rtx (xtarget);
	      RTX_UNCHANGING_P (xtarget) = 1;
	    }

	  clear_storage (xtarget, GEN_INT (size));
	  cleared = 1;
	}

This is the current state of the 3.4 branch and means that Jakub's patch is 
not valid on this branch (because otherwise we may have a memory write with 
the /u flag followed by a memory write without the /u flag for read-only 
fields).


Third problem: now, on the 3.4 branch, if the whole structure contains 
read-only fields and read-write fields, is wholly cleared and at least one 
read-write field is set to non-zero, we have a memory write with the /u flag 
followed by a memory write without the /u flag.  These of course don't 
conflict so may be swapped by the scheduler (this is my testcase on 
UltraSPARC for PR opt/13424).


A solution (by Olivier Hainque, that is now in the ACT tree) is to disable 
the clearing optimization in the unsafe cases, that is when we know that 
another write may be issued later which will not conflict:

      /* If the constructor has fewer fields than the structure or if
	 we are initializing the structure to mostly zeros, clear the
	 whole structure first. This is an optimization which avoids
	 having to clear several parts individually later on.  */
      else if (! cleared && size > 0
	       && ((list_length (CONSTRUCTOR_ELTS (exp))
		    != fields_length (type))
		   || mostly_zeros_p (exp))
	       /* Don't do this if TARGET is a register whose mode
  	          size isn't equal to SIZE since clear_storage can't
  	          handle this case.  */
	       && (GET_CODE (target) != REG
		   || ((HOST_WIDE_INT) GET_MODE_SIZE (GET_MODE (target))
		       == size))
	       /* Don't do this if the type has readonly fields or target is
		  marked unchanging. We would first end up with multiple
		  stores to const mem (to clear and then to initialize), and
		  this could also interfere with later stores to non readonly
		  fields.  */
	       && ! (readonly_fields_p (type) || RTX_UNCHANGING_P (target)))
	{
	  clear_storage (target, GEN_INT (size));
	  cleared = 1;
	}

Of course this pessimizes.  In particular, I think we could still clear if 
the whole structure is read-only, assuming we endorse the liberal use of the 
/u flag I was talking about above.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 32+ messages in thread

* GCC Status Report (2004-03-09)
@ 2004-03-09 18:09 Mark Mitchell
  2004-03-11  9:45 ` Eric Botcazou
  0 siblings, 1 reply; 32+ messages in thread
From: Mark Mitchell @ 2004-03-09 18:09 UTC (permalink / raw)
  To: gcc


[I tried to send this out last night, but apparently failed.]

GCC 3.4
=======

Unfortunately, although progress was made last week, we still have a
ways to go.  We're down to 48 regressions targeted at 3.4.1, down from
57.  Some of those have patches that I've approved for 3.4, but there
are a lot that do not.  We're not ready to spin prerelease bits yet.

I'd particularly like to understand this RTX_UNCHANGING_P optimization
issue.  We've bumped into this before.  I think we need to take the
conservative approach, even if that is pessimizing in some case.
Would someone who understands the details of this bug please summarize
it, mail that to me, and also add that information to the appropriate
PR?

Some of the other PRs also look pretty significant.  I'll continue to
beat on the C++ issues, but there are serious issues with debugging
(13974) and with wrong code (14381, 14470, 12863, 13424, 13632).

Henceforth, please do not make any non-documentation check-ins to the
3.4 branch without my explicit approval.  To get that approval, please
do *not* send me mail directly.  Instead, add your patch to the
relevant PR, which must be targeted at 3.r, and add me to the CC list
for the PR.  Note that this procedure implies that if there is no PR
targeted at 3.4 I will not accept the patch.

Furthermore, please do not create any new PRs targeted for 3.4 without
my explicit permission.  If it's a regression, target it for 3.4.1.
If you think it might need to be fixed in 3.4, add me to the CC list,
and add a note asking me to move back the target.  Please do not do
this unless the PR is wrong-code, ICE-on-valid, or bootstrap for a
primary target.  New PRs referring to other categories of error are
simply not going to get fixed for 3.4.

GCC 3.5
=======

In a holding pattern until tree-ssa merge is complete.

--
Mark Mitchell
CodeSourcery, LLC
mark@codesourcery.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2004-03-20 11:03 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-03-19 20:08 GCC Status Report (2004-03-09) Richard Kenner
  -- strict thread matches above, loose matches on Subject: below --
2004-03-19 14:22 Richard Kenner
2004-03-19  6:34 Richard Kenner
2004-03-19 12:18 ` Jakub Jelinek
2004-03-11 12:49 Richard Kenner
2004-03-11 14:09 ` Eric Botcazou
2004-03-11 12:15 Richard Kenner
2004-03-11 12:39 ` Eric Botcazou
2004-03-09 18:09 Mark Mitchell
2004-03-11  9:45 ` Eric Botcazou
2004-03-11 12:48   ` Jakub Jelinek
2004-03-11 21:11     ` Richard Henderson
2004-03-16 16:53   ` Mark Mitchell
2004-03-16 16:59     ` Paul Koning
2004-03-16 17:11       ` Ian Lance Taylor
2004-03-16 17:24         ` Zack Weinberg
2004-03-16 17:25           ` Paul Koning
2004-03-17 10:56     ` Eric Botcazou
2004-03-17 11:49       ` Eric Botcazou
2004-03-17 15:55       ` Mark Mitchell
2004-03-18  8:25         ` Eric Botcazou
2004-03-18 18:31           ` Eric Botcazou
2004-03-18 19:15             ` Mark Mitchell
2004-03-18 23:36               ` Eric Botcazou
2004-03-18 23:41                 ` Jakub Jelinek
2004-03-19  1:23                   ` Eric Botcazou
2004-03-19 14:31                   ` Eric Botcazou
2004-03-19 19:29                     ` Mark Mitchell
2004-03-19 20:04                       ` Eric Botcazou
2004-03-19 20:23                         ` Mark Mitchell
2004-03-20 19:51                           ` Eric Botcazou
     [not found]                 ` <405A3F26.2050100@codesourcery.com>
     [not found]                   ` <200403190155.18981.ebotcazou@libertysurf.fr>
2004-03-19  6:42                     ` Mark Mitchell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).