public inbox for gcc-patches@gcc.gnu.org
* [PATCH] Fix unaligned access when predictive commoning (PR 71083)
@ 2016-08-08 19:57 Bernd Edlinger
  2016-08-09  7:29 ` Richard Biener
  0 siblings, 1 reply; 14+ messages in thread
From: Bernd Edlinger @ 2016-08-08 19:57 UTC (permalink / raw)
  To: gcc-patches, Richard Biener, Jeff Law, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 817 bytes --]

Hi!


As described in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71083
we generate unaligned accesses because tree-predcom rewrites
bitfield and packed accesses in a way that loses the proper
alignment information.
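As a minimal C sketch of the problem class (the struct mirrors lock_chain1
from the attached test case; the get_base accessor and sample variable are
hypothetical additions for illustration), a packed struct puts a two-byte
field at byte offset 1, so the compiler must not assume halfword alignment
when accessing it:

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors struct lock_chain1 from the attached test case. */
struct lock_chain1 {
  char x;
  unsigned short base;
} __attribute__((packed));

static struct lock_chain1 sample = { 7, 1234 };

unsigned short
get_base (const struct lock_chain1 *chain)
{
  /* 'base' sits at byte offset 1; on strict-alignment targets the
     compiler must use byte loads here.  This is exactly the alignment
     information that tree-predcom was dropping.  */
  return chain->base;
}
```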

The attached patch fixes this by re-using the gimple memory
expression from the original access.

I was not sure whether any non-constant array references would be
valid where ref_at_iteration is expanded, so I set these
to the array lower bound, as the DR_OFFSET contains the folded
value of all non-constant array references.

The patch compensates for the constant offset from the outer
reference to the inner reference, and replaces the inner
reference instead of the outer reference.


Boot-strapped and reg-tested on x86_64-linux-gnu.
OK for trunk?


Thanks
Bernd.

[-- Attachment #2: changelog-pr71083.txt --]
[-- Type: text/plain, Size: 295 bytes --]

2016-08-08  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR tree-optimization/71083
	* tree-predcom.c (ref_at_iteration): Rewrite the inner reference.

testsuite:
2016-08-08  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR tree-optimization/71083
	* gcc.c-torture/execute/pr71083.c: New test.

[-- Attachment #3: patch-pr71083.diff --]
[-- Type: text/x-patch; name="patch-pr71083.diff", Size: 4008 bytes --]

Index: gcc/tree-predcom.c
===================================================================
--- gcc/tree-predcom.c	(revision 239193)
+++ gcc/tree-predcom.c	(working copy)
@@ -1364,11 +1364,33 @@ replace_ref_with (gimple *stmt, tree new_tree, boo
 /* Returns a memory reference to DR in the ITER-th iteration of
    the loop it was analyzed in.  Append init stmts to STMTS.  */
 
-static tree 
+static tree
 ref_at_iteration (data_reference_p dr, int iter, gimple_seq *stmts)
 {
   tree off = DR_OFFSET (dr);
   tree coff = DR_INIT (dr);
+  tree ref = unshare_expr (DR_REF (dr));
+  tree *exp = &ref;
+  while (handled_component_p (*exp))
+    {
+      switch (TREE_CODE (*exp))
+	{
+	  case ARRAY_REF:
+	  case ARRAY_RANGE_REF:
+	    if (!cst_and_fits_in_hwi (TREE_OPERAND (*exp, 1)))
+	      TREE_OPERAND (*exp, 1) = array_ref_low_bound (*exp);
+	    break;
+	  default:
+	    break;
+	}
+      exp = &TREE_OPERAND (*exp, 0);
+    }
+  HOST_WIDE_INT bitsize, bitpos;
+  tree offset;
+  machine_mode mode;
+  int unsignedp, reversep, volatilep = 0;
+  get_inner_reference (ref, &bitsize, &bitpos, &offset, &mode,
+		       &unsignedp, &reversep, &volatilep);
   if (iter == 0)
     ;
   else if (TREE_CODE (DR_STEP (dr)) == INTEGER_CST)
@@ -1377,27 +1399,16 @@ ref_at_iteration (data_reference_p dr, int iter, g
   else
     off = size_binop (PLUS_EXPR, off,
 		      size_binop (MULT_EXPR, DR_STEP (dr), ssize_int (iter)));
+  if (offset)
+    off = size_binop (MINUS_EXPR, off, offset);
+  if (bitpos)
+    coff = size_binop (MINUS_EXPR, coff, ssize_int (bitpos / BITS_PER_UNIT));
   tree addr = fold_build_pointer_plus (DR_BASE_ADDRESS (dr), off);
   addr = force_gimple_operand_1 (unshare_expr (addr), stmts,
 				 is_gimple_mem_ref_addr, NULL_TREE);
-  tree alias_ptr = fold_convert (reference_alias_ptr_type (DR_REF (dr)), coff);
-  /* While data-ref analysis punts on bit offsets it still handles
-     bitfield accesses at byte boundaries.  Cope with that.  Note that
-     we cannot simply re-apply the outer COMPONENT_REF because the
-     byte-granular portion of it is already applied via DR_INIT and
-     DR_OFFSET, so simply build a BIT_FIELD_REF knowing that the bits
-     start at offset zero.  */
-  if (TREE_CODE (DR_REF (dr)) == COMPONENT_REF
-      && DECL_BIT_FIELD (TREE_OPERAND (DR_REF (dr), 1)))
-    {
-      tree field = TREE_OPERAND (DR_REF (dr), 1);
-      return build3 (BIT_FIELD_REF, TREE_TYPE (DR_REF (dr)),
-		     build2 (MEM_REF, DECL_BIT_FIELD_TYPE (field),
-			     addr, alias_ptr),
-		     DECL_SIZE (field), bitsize_zero_node);
-    }
-  else
-    return fold_build2 (MEM_REF, TREE_TYPE (DR_REF (dr)), addr, alias_ptr);
+  tree alias_ptr = fold_convert (reference_alias_ptr_type (*exp), coff);
+  *exp = fold_build2 (MEM_REF, TREE_TYPE (*exp), addr, alias_ptr);
+  return ref;
 }
 
 /* Get the initialization expression for the INDEX-th temporary variable
Index: gcc/testsuite/gcc.c-torture/execute/pr71083.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/pr71083.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/pr71083.c	(working copy)
@@ -0,0 +1,43 @@
+struct lock_chain {
+  unsigned int irq_context: 2,
+    depth: 6,
+    base: 24;
+};
+
+__attribute__((noinline, noclone))
+struct lock_chain * foo (struct lock_chain *chain)
+{
+  int i;
+  for (i = 0; i < 100; i++)
+    {
+      chain[i+1].base = chain[i].base;
+    }
+  return chain;
+}
+
+struct lock_chain1 {
+  char x;
+  unsigned short base;
+} __attribute__((packed));
+
+__attribute__((noinline, noclone))
+struct lock_chain1 * bar (struct lock_chain1 *chain)
+{
+  int i;
+  for (i = 0; i < 100; i++)
+    {
+      chain[i+1].base = chain[i].base;
+    }
+  return chain;
+}
+
+struct lock_chain test [101];
+struct lock_chain1 test1 [101];
+
+int
+main ()
+{
+  foo (test);
+  bar (test1);
+  return 0;
+}

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Fix unaligned access when predictive commoning (PR 71083)
  2016-08-08 19:57 [PATCH] Fix unaligned access when predictive commoning (PR 71083) Bernd Edlinger
@ 2016-08-09  7:29 ` Richard Biener
  2016-08-09 17:31   ` Bernd Edlinger
  0 siblings, 1 reply; 14+ messages in thread
From: Richard Biener @ 2016-08-09  7:29 UTC (permalink / raw)
  To: Bernd Edlinger; +Cc: gcc-patches, Jeff Law, Jakub Jelinek

On Mon, 8 Aug 2016, Bernd Edlinger wrote:

> Hi!
> 
> 
> As described in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71083
> we generate unaligned accesses because tree-predcom rewrites
> bitfield and packed accesses in a way that loses the proper
> alignment information.
> 
> The attached patch fixes this by re-using the gimple memory
> expression from the original access.
> 
> I was not sure whether any non-constant array references would be
> valid where ref_at_iteration is expanded, so I set these
> to the array lower bound, as the DR_OFFSET contains the folded
> value of all non-constant array references.
> 
> The patch compensates for the constant offset from the outer
> reference to the inner reference, and replaces the inner
> reference instead of the outer reference.
> 
> 
> Boot-strapped and reg-tested on x86_64-linux-gnu.
> OK for trunk?

I don't think what you do is correct.  Consider

 for (i)
  for (j)
   {
     ... = a[i][j];
   }

predcom considers innermost loops only and thus the DRs will
be analyzed with respect to that which means DR_BASE_ADDRESS
is &a[i][0] but you end up generating (*(&a) + i * ..)[0][0]
which is at best suspicious with respect to further analyses
by data-ref and TBAA.  Also note that this may get alignment
wrong as well as changing [i] to [0] may lead to false alignment
(consider a [2][2] char array aligned to 4 bytes).
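Richard's false-alignment example can be checked in plain C (the array and
helper below are illustrative, not part of the patch): with a 4-byte-aligned
char a[2][2], row 0 starts on a 4-byte boundary but row 1 does not, so
rewriting a[i][...] as if i were 0 over-states the alignment:

```c
#include <assert.h>
#include <stdint.h>

/* 2x2 char array whose whole object is aligned to 4 bytes; each row is
   only 2 bytes long, so odd rows start on a 2-byte boundary only. */
static char a[2][2] __attribute__((aligned(4)));

/* Returns 1 iff row I starts on a 4-byte boundary. */
int
row_is_4_aligned (int i)
{
  return ((uintptr_t) &a[i][0] & 3) == 0;
}
```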

Your patch goes halfway back to code we had in the past
(looking at the 4.3 branch right now) which had numerous
issues (don't remember exactly).

I believe that if we'd handle bitfields by

  if (TREE_CODE (DR_REF (dr)) == COMPONENT_REF
      && DECL_BIT_FIELD_TYPE (TREE_OPERAND (DR_REF (dr), 1)))
    ref = TREE_OPERAND (DR_REF (dr), 0);
  else
    ref = DR_REF (dr);
  unsigned align = get_object_alignment (ref);

and use the type / alignment of that ref for the built MEM_REF
(with coff adjusted by the split bitfield component-ref offset)
and build a COMPONENT_REF around the MEM_REF to handle the
bitfield part this should work ok.

Richard.

> 
> Thanks
> Bernd.
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)


* Re: [PATCH] Fix unaligned access when predictive commoning (PR 71083)
  2016-08-09  7:29 ` Richard Biener
@ 2016-08-09 17:31   ` Bernd Edlinger
  2016-08-09 20:48     ` Eric Botcazou
  0 siblings, 1 reply; 14+ messages in thread
From: Bernd Edlinger @ 2016-08-09 17:31 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Jeff Law, Jakub Jelinek, Eric Botcazou

[-- Attachment #1: Type: text/plain, Size: 3205 bytes --]

On 08/09/16 09:29, Richard Biener wrote:
> On Mon, 8 Aug 2016, Bernd Edlinger wrote:
>
>> Hi!
>>
>>
>> As described in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71083
>> we generate unaligned accesses because tree-predcom rewrites
>> bitfield and packed accesses in a way that loses the proper
>> alignment information.
>>
>> The attached patch fixes this by re-using the gimple memory
>> expression from the original access.
>>
>> I was not sure whether any non-constant array references would be
>> valid where ref_at_iteration is expanded, so I set these
>> to the array lower bound, as the DR_OFFSET contains the folded
>> value of all non-constant array references.
>>
>> The patch compensates for the constant offset from the outer
>> reference to the inner reference, and replaces the inner
>> reference instead of the outer reference.
>>
>>
>> Boot-strapped and reg-tested on x86_64-linux-gnu.
>> OK for trunk?
>
> I don't think what you do is correct.  Consider
>
>   for (i)
>    for (j)
>     {
>       ... = a[i][j];
>     }
>
> predcom considers innermost loops only and thus the DRs will
> be analyzed with respect to that which means DR_BASE_ADDRESS
> is &a[i][0] but you end up generating (*(&a) + i * ..)[0][0]
> which is at best suspicious with respect to further analyses
> by data-ref and TBAA.  Also note that this may get alignment
> wrong as well as changing [i] to [0] may lead to false alignment
> (consider a [2][2] char array aligned to 4 bytes).
>
> Your patch goes halfway back to code we had in the past
> (looking at the 4.3 branch right now) which had numerous
> issues (don't remember exactly).
>

Hmm.  Yes.  These ARRAY_REFs ruin the whole idea :)

> I believe that if we'd handle bitfields by
>
>    if (TREE_CODE (DR_REF (dr)) == COMPONENT_REF
>        && DECL_BIT_FIELD_TYPE (TREE_OPERAND (DR_REF (dr), 1)))
>      ref = TREE_OPERAND (DR_REF (dr), 0);
>    else
>      ref = DR_REF (dr);
>    unsigned align = get_object_alignment (ref);
>
> and use the type / alignment of that ref for the built MEM_REF
> (with coff adjusted by the split bitfield component-ref offset)
> and build a COMPONENT_REF around the MEM_REF to handle the
> bitfield part this should work ok.
>

Ooookay.  I think your idea works in principle.

Attached is a new version of the patch; I hope it did not become
too ugly.

I think from Eric's comment in get_inner_ref it can be possible
in Ada that the outer object is accidentally byte-aligned
but the inner reference is not.  In that case I think it is
better to generate a BIT_FIELD_REF instead of a COMPONENT_REF.

Eric do you agree?  Are there any Ada test cases where the
pcom optimization jumps in?

We prefer the COMPONENT_REF just because it is likely more
aligned than the bit field itself.  The mem_ref should be
correctly aligned in both cases.

I was not sure if I should pass TREE_OPERAND (ref, 2) to
the COMPONENT_REF constructor.  Is that used at all?  All other
places where COMPONENT_REFs are built seem to pass NULL there.


Bootstrapped on x86_64-linux-gnu; reg-test is still running.

Is it OK if there are no regressions?


Thanks
Bernd.

[-- Attachment #2: changelog-pr71083.txt --]
[-- Type: text/plain, Size: 315 bytes --]

2016-08-08  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR tree-optimization/71083
	* tree-predcom.c (ref_at_iteration): Correctly align the
	inner reference.

testsuite:
2016-08-08  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR tree-optimization/71083
	* gcc.c-torture/execute/pr71083.c: New test.

[-- Attachment #3: patch-pr71083.diff --]
[-- Type: text/x-patch; name="patch-pr71083.diff", Size: 4673 bytes --]

Index: gcc/tree-predcom.c
===================================================================
--- gcc/tree-predcom.c	(revision 239193)
+++ gcc/tree-predcom.c	(working copy)
@@ -213,6 +213,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-scalar-evolution.h"
 #include "params.h"
 #include "tree-affine.h"
+#include "builtins.h"
 
 /* The maximum number of iterations between the considered memory
    references.  */
@@ -1364,11 +1365,16 @@ replace_ref_with (gimple *stmt, tree new_tree, boo
 /* Returns a memory reference to DR in the ITER-th iteration of
    the loop it was analyzed in.  Append init stmts to STMTS.  */
 
-static tree 
+static tree
 ref_at_iteration (data_reference_p dr, int iter, gimple_seq *stmts)
 {
   tree off = DR_OFFSET (dr);
   tree coff = DR_INIT (dr);
+  tree ref = DR_REF (dr);
+  enum tree_code ref_code = ERROR_MARK;
+  tree ref_type = NULL_TREE;
+  tree ref_op1 = NULL_TREE;
+  tree ref_op2 = NULL_TREE;
   if (iter == 0)
     ;
   else if (TREE_CODE (DR_STEP (dr)) == INTEGER_CST)
@@ -1377,27 +1383,43 @@ ref_at_iteration (data_reference_p dr, int iter, g
   else
     off = size_binop (PLUS_EXPR, off,
 		      size_binop (MULT_EXPR, DR_STEP (dr), ssize_int (iter)));
+  if (TREE_CODE (ref) == COMPONENT_REF
+      && DECL_BIT_FIELD (TREE_OPERAND (ref, 1)))
+    {
+      unsigned HOST_WIDE_INT boff;
+      tree field = TREE_OPERAND (ref, 1);
+      ref_type = TREE_TYPE (ref);
+      boff = tree_to_uhwi (DECL_FIELD_BIT_OFFSET (field));
+      /* This can occur in Ada.  See Eric's comment in get_bit_range.  */
+      if ((boff % BITS_PER_UNIT) != 0)
+	{
+	  ref_code = BIT_FIELD_REF;
+	  ref_op1 = DECL_SIZE (field);
+	  ref_op2 = bitsize_zero_node;
+	}
+      else
+	{
+	  boff >>= LOG2_BITS_PER_UNIT;
+	  boff += tree_to_uhwi (component_ref_field_offset (ref));
+	  coff = size_binop (MINUS_EXPR, coff, ssize_int (boff));
+	  ref_code = COMPONENT_REF;
+	  ref_op1 = field;
+	  ref_op2 = TREE_OPERAND (ref, 2);
+	  ref = TREE_OPERAND (ref, 0);
+	}
+    }
   tree addr = fold_build_pointer_plus (DR_BASE_ADDRESS (dr), off);
   addr = force_gimple_operand_1 (unshare_expr (addr), stmts,
 				 is_gimple_mem_ref_addr, NULL_TREE);
-  tree alias_ptr = fold_convert (reference_alias_ptr_type (DR_REF (dr)), coff);
+  tree alias_ptr = fold_convert (reference_alias_ptr_type (ref), coff);
+  tree type = build_aligned_type (TREE_TYPE (ref),
+				  get_object_alignment (ref));
+  ref = build2 (MEM_REF, type, addr, alias_ptr);
   /* While data-ref analysis punts on bit offsets it still handles
-     bitfield accesses at byte boundaries.  Cope with that.  Note that
-     we cannot simply re-apply the outer COMPONENT_REF because the
-     byte-granular portion of it is already applied via DR_INIT and
-     DR_OFFSET, so simply build a BIT_FIELD_REF knowing that the bits
-     start at offset zero.  */
-  if (TREE_CODE (DR_REF (dr)) == COMPONENT_REF
-      && DECL_BIT_FIELD (TREE_OPERAND (DR_REF (dr), 1)))
-    {
-      tree field = TREE_OPERAND (DR_REF (dr), 1);
-      return build3 (BIT_FIELD_REF, TREE_TYPE (DR_REF (dr)),
-		     build2 (MEM_REF, DECL_BIT_FIELD_TYPE (field),
-			     addr, alias_ptr),
-		     DECL_SIZE (field), bitsize_zero_node);
-    }
-  else
-    return fold_build2 (MEM_REF, TREE_TYPE (DR_REF (dr)), addr, alias_ptr);
+     bitfield accesses at byte boundaries.  Cope with that.  */
+  if (ref_type)
+    ref = build3 (ref_code, ref_type, ref, ref_op1, ref_op2);
+  return ref;
 }
 
 /* Get the initialization expression for the INDEX-th temporary variable
Index: gcc/testsuite/gcc.c-torture/execute/pr71083.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/pr71083.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/pr71083.c	(working copy)
@@ -0,0 +1,43 @@
+struct lock_chain {
+  unsigned int irq_context: 2,
+    depth: 6,
+    base: 24;
+};
+
+__attribute__((noinline, noclone))
+struct lock_chain * foo (struct lock_chain *chain)
+{
+  int i;
+  for (i = 0; i < 100; i++)
+    {
+      chain[i+1].base = chain[i].base;
+    }
+  return chain;
+}
+
+struct lock_chain1 {
+  char x;
+  unsigned short base;
+} __attribute__((packed));
+
+__attribute__((noinline, noclone))
+struct lock_chain1 * bar (struct lock_chain1 *chain)
+{
+  int i;
+  for (i = 0; i < 100; i++)
+    {
+      chain[i+1].base = chain[i].base;
+    }
+  return chain;
+}
+
+struct lock_chain test [101];
+struct lock_chain1 test1 [101];
+
+int
+main ()
+{
+  foo (test);
+  bar (test1);
+  return 0;
+}


* Re: [PATCH] Fix unaligned access when predictive commoning (PR 71083)
  2016-08-09 17:31   ` Bernd Edlinger
@ 2016-08-09 20:48     ` Eric Botcazou
  2016-08-09 22:23       ` Bernd Edlinger
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Botcazou @ 2016-08-09 20:48 UTC (permalink / raw)
  To: Bernd Edlinger; +Cc: Richard Biener, gcc-patches, Jeff Law, Jakub Jelinek

> I think from Eric's comment in get_inner_ref it can be possible
> in Ada that the outer object is accidentally byte-aligned
> but the inner reference is not.  In that case I think it is
> better to generate a BIT_FIELD_REF instead of a COMPONENT_REF.

The patch says get_bit_range instead...  The comment therein means that 
bitfields can be nested in Ada: you can have a bitfield in a record which is 
itself a bitfield in another record.

> Eric do you agree?  Are there any Ada test cases where the
> pcom optimization jumps in?

According to the comment, data-ref analysis punts on bit offsets so I'm not 
sure how boff can be not byte-aligned.

-- 
Eric Botcazou


* Re: [PATCH] Fix unaligned access when predictive commoning (PR 71083)
  2016-08-09 20:48     ` Eric Botcazou
@ 2016-08-09 22:23       ` Bernd Edlinger
  2016-08-10  8:47         ` AW: " Bernd Edlinger
  2016-08-10 12:29         ` Richard Biener
  0 siblings, 2 replies; 14+ messages in thread
From: Bernd Edlinger @ 2016-08-09 22:23 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: Richard Biener, gcc-patches, Jeff Law, Jakub Jelinek

On 08/09/16 22:48, Eric Botcazou wrote:
>> I think from Eric's comment in get_inner_ref it can be possible
>> in Ada that the outer object is accidentally byte-aligned
>> but the inner reference is not.  In that case I think it is
>> better to generate a BIT_FIELD_REF instead of a COMPONENT_REF.
>
> The patch says get_bit_range instead...  The comment therein means that
> bitfields can be nested in Ada: you can have a bitfield in a record which is
> itself a bitfield in another record.
>
>> Eric do you agree?  Are there any Ada test cases where the
>> pcom optimization jumps in?
>
> According to the comment, data-ref analysis punts on bit offsets so I'm not
> sure how boff can be not byte-aligned.
>

I mean in dr_analyze_innermost, we have:

   base = get_inner_reference (ref, &pbitsize, &pbitpos, &poffset, &pmode,
                               &punsignedp, &preversep, &pvolatilep);
   gcc_assert (base != NULL_TREE);

   if (pbitpos % BITS_PER_UNIT != 0)
     {
       if (dump_file && (dump_flags & TDF_DETAILS))
         fprintf (dump_file, "failed: bit offset alignment.\n");
       return false;
     }

   if (preversep)
     {
       if (dump_file && (dump_flags & TDF_DETAILS))
         fprintf (dump_file, "failed: reverse storage order.\n");
       return false;
     }


and that means that get_inner_reference on the outermost ref
returns a byte-aligned bit offset, and from there I would
think it knows when to punt on reversep and
when the offset is not byte-aligned.

But in get_bit_range we have a bit-field with an arbitrary
bit offset, and surprisingly
get_inner_reference (TREE_OPERAND (exp, 0)) did not return
a byte-aligned bit offset.  I doubt that data-ref analysis
ever cares about the byte-alignment of intermediate
refs.

We know that the difference between the innermost ref
and the outer ref is byte-aligned, but we do not know
that the same is true for the offset between the COMPONENT_REF
and the outer ref.

I mean boff is essentially the difference between
bitpos of get_inner_reference(exp) and
bitpos of get_inner_reference(TREE_OPERAND (exp, 0))

This would be exposed by my patch, because previously
we always generated BIT_FIELD_REFS, with bit-offset 0,
but the MEM_REF at the byte-offset there is in all likelihood
pretty much unaligned, the MEM_REF at the COMPONENT_REF
is likely more aligned and allows better code for ARM processors,
but only if the MEM_REF is at a byte-aligned offset at all,
otherwise the whole transformation would be wrong.
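For reference, here is a C analogue of such a non-byte-aligned nested field
(hypothetical, not from the patch; struct nested, get_b, and n are my names):
on typical little-endian ABIs, field b below begins at bit offset 4, i.e. its
DECL_FIELD_BIT_OFFSET is not a multiple of BITS_PER_UNIT, which is the case
where the patch must fall back to a BIT_FIELD_REF:

```c
#include <assert.h>

/* Rough analogue of the Ada StructA/StructB nesting: 'b' begins at bit 4
   on typical little-endian ABIs, so no byte-aligned COMPONENT_REF covers
   exactly these 24 bits. */
struct nested {
  unsigned a : 4;
  unsigned b : 24;
  unsigned c : 4;
};

static struct nested n = { 1, 9999, 2 };

unsigned
get_b (const struct nested *p)
{
  /* Extracting 'b' requires shift-and-mask code rather than a plain
     byte-aligned load. */
  return p->b;
}
```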



Thanks
Bernd.


* AW: [PATCH] Fix unaligned access when predictive commoning (PR 71083)
  2016-08-09 22:23       ` Bernd Edlinger
@ 2016-08-10  8:47         ` Bernd Edlinger
  2016-08-10 12:19           ` Eric Botcazou
  2016-08-10 12:29         ` Richard Biener
  1 sibling, 1 reply; 14+ messages in thread
From: Bernd Edlinger @ 2016-08-10  8:47 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: Richard Biener, gcc-patches, Jeff Law, Jakub Jelinek

On 08/10/16, Bernd Edlinger wrote:
>On 08/09/16 22:48, Eric Botcazou wrote:
>>> I think from Eric's comment in get_inner_ref it can be possible
>>> in Ada that the outer object is accidentally byte-aligned
>>> but the inner reference is not.  In that case I think it is
>>> better to generate a BIT_FIELD_REF instead of a COMPONENT_REF.
>>

I actually meant to say get_bit_range.

I tried to translate the c-test case to an equivalent ada test case with
my not-so-fluent Ada speak...

So here is a test case that *actually* hits the if ((boff % BITS_PER_UNIT) != 0)
code path.

Before my patch there was an unaligned SImode access, and with
the patch we have 3 QImode accesses here.

cat pr71083_pkg.ads 
package Pr71083_pkg is
  type Nibble is mod 2**4;
  type Int24  is mod 2**24;
  type StructA is record
    a : Nibble;
    b : Int24;
  end record;
  pragma Pack(StructA);
  type StructB is record
    a : Nibble;
    b : StructA;
  end record;
  pragma Pack(StructB);
  type ArrayOfStructB is array(0..100) of StructB;
  procedure Foo (X : in out ArrayOfStructB);
end Pr71083_pkg;

cat pr71083_pkg.adb
-- { dg-do compile }
-- { dg-options "-O3" }
package body Pr71083_pkg is
  procedure Foo (X : in out ArrayOfStructB) is
  begin
    for K in 0..99 loop
      X (K+1).b.b := X (K).b.b;
    end loop;
  end Foo;
end Pr71083_pkg;

cat pr71083.adb 
-- { dg-do run }
-- { dg-options "-O3" }
with Pr71083_pkg;
use Pr71083_pkg;
procedure Pr71083 is
  Test : ArrayOfStructB;
begin
  Test (0).b.b := 9999;
  Foo (Test);
  if Test (100).b.b /= 9999 then
    raise Program_Error;
  end if;
end;


I was not sure which name to choose,
so I used the same convention as in the c-torture.
How do you like that?

I would like to add that to gnat.dg because
it seems that there is zero testing of predictive
commoning in the gnat.dg test suite.


Bernd.

>>> The patch says get_bit_range instead...  The comment therein means that
>>> bitfields can be nested in Ada: you can have a bitfield in a record which is
>>> itself a bitfield in another record.
>>>
>>>> Eric do you agree?  Are there any Ada test cases where the
>>>> pcom optimization jumps in?
>>>
>>> According to the comment, data-ref analysis punts on bit offsets so I'm not
>>> sure how boff can be not byte-aligned.
>>>
>
> I mean in dr_analyze_innermost, we have:
>
>   base = get_inner_reference (ref, &pbitsize, &pbitpos, &poffset, &pmode,
>                               &punsignedp, &preversep, &pvolatilep);
>   gcc_assert (base != NULL_TREE);
>
>   if (pbitpos % BITS_PER_UNIT != 0)
>     {
>       if (dump_file && (dump_flags & TDF_DETAILS))
>         fprintf (dump_file, "failed: bit offset alignment.\n");
>       return false;
>     }
>
>   if (preversep)
>     {
>       if (dump_file && (dump_flags & TDF_DETAILS))
>         fprintf (dump_file, "failed: reverse storage order.\n");
>       return false;
>     }
>
>
> and that means that get_inner_reference on the outermost ref
> returns a byte-aligned bit offset, and from there I would
> think it knows when to punt on reversep and
> when the offset is not byte-aligned.
>
> But in get_bit_range we have a bit-field with an arbitrary
> bit offset, and surprisingly
> get_inner_reference (TREE_OPERAND (exp, 0)) did not return
> a byte-aligned bit offset.  I doubt that data-ref analysis
> ever cares about the byte-alignment of intermediate
> refs.
>
> We know that the difference between the innermost ref
> and the outer ref is byte-aligned, but we do not know
> that the same is true for offset between the COMPONENT_REF
> and the outer ref.
>
> I mean boff is essentially the difference between
> bitpos of get_inner_reference(exp) and
> bitpos of get_inner_reference(TREE_OPERAND (exp, 0))
>
> This would be exposed by my patch: previously
> we always generated BIT_FIELD_REFs with bit-offset 0,
> and the MEM_REF at that byte offset is in all likelihood
> pretty much unaligned.  The MEM_REF at the COMPONENT_REF
> is likely more aligned and allows better code on ARM processors,
> but only if the MEM_REF is at a byte-aligned offset at all;
> otherwise the whole transformation would be wrong.
>
>
>
> Thanks
> Bernd.


* Re: AW: [PATCH] Fix unaligned access when predictive commoning (PR 71083)
  2016-08-10  8:47         ` AW: " Bernd Edlinger
@ 2016-08-10 12:19           ` Eric Botcazou
  0 siblings, 0 replies; 14+ messages in thread
From: Eric Botcazou @ 2016-08-10 12:19 UTC (permalink / raw)
  To: Bernd Edlinger; +Cc: Richard Biener, gcc-patches, Jeff Law, Jakub Jelinek

> I tried to translate the C test case to an equivalent Ada test case with
> my not-so-fluent Ada...
> 
> So here is a test case that *actually* hits the if ((boff % BITS_PER_UNIT)
> != 0) code path.

I see, well done then.

> Before my patch there was an unaligned SImode access, and with
> the patch we have 3 QImode accesses here.
> 
> cat pr71083_pkg.ads
> package Pr71083_pkg is
>   type Nibble is mod 2**4;
>   type Int24  is mod 2**24;
>   type StructA is record
>     a : Nibble;
>     b : Int24;
>   end record;
>   pragma Pack(StructA);
>   type StructB is record
>     a : Nibble;
>     b : StructA;
>   end record;
>   pragma Pack(StructB);
>   type ArrayOfStructB is array(0..100) of StructB;
>   procedure Foo (X : in out ArrayOfStructB);
> end Pr71083_pkg;
> 
> cat pr71083_pkg.adb
> -- { dg-do compile }
> -- { dg-options "-O3" }
> package body Pr71083_pkg is
>   procedure Foo (X : in out ArrayOfStructB) is
>   begin
>     for K in 0..99 loop
>       X (K+1).b.b := X (K).b.b;
>     end loop;
>   end Foo;
> end Pr71083_pkg;
> 
> cat pr71083.adb
> -- { dg-do run }
> -- { dg-options "-O3" }
> with Pr71083_pkg;
> use Pr71083_pkg;
> procedure Pr71083 is
>   Test : ArrayOfStructB;
> begin
>   Test (0).b.b := 9999;
>   Foo (Test);
>   if Test (100).b.b /= 9999 then
>     raise Program_Error;
>   end if;
> end;
> 
> 
> I was not sure which name to choose,
> so I used the same convention as in the c-torture.
> How do you like that?

Worst convention ever. :-)  Either opt57 or loop_optimization23 at your 
convenience (you can add a reference to the PR in a comment).

> I would like to add that to gnat.dg because
> it seems that there is zero testing of predictive
> commoning in the gnat.dg test suite.

Very likely so.  The renamed testcase is OK for mainline, but please replace 
"Eric's" with "the" in the patch or copy-and-paste the explanation.

-- 
Eric Botcazou


* Re: [PATCH] Fix unaligned access when predictive commoning (PR 71083)
  2016-08-09 22:23       ` Bernd Edlinger
  2016-08-10  8:47         ` AW: " Bernd Edlinger
@ 2016-08-10 12:29         ` Richard Biener
  2016-08-10 16:24           ` Bernd Edlinger
  1 sibling, 1 reply; 14+ messages in thread
From: Richard Biener @ 2016-08-10 12:29 UTC (permalink / raw)
  To: Bernd Edlinger; +Cc: Eric Botcazou, gcc-patches, Jeff Law, Jakub Jelinek

On Tue, 9 Aug 2016, Bernd Edlinger wrote:

> On 08/09/16 22:48, Eric Botcazou wrote:
> >> I think from Eric's comment in get_inner_ref it can be possible
> >> in Ada that the outer object is accidentally byte-aligned
> >> but the inner reference is not.  In that case I think it is
> >> better to generate a BIT_FIELD_REF instead of a COMPONENT_REF.
> >
> > The patch says get_bit_range instead...  The comment therein means that
> > bitfields can be nested in Ada: you can have a bitfield in a record which is
> > itself a bitfield in another record.
> >
> >> Eric do you agree?  Are there any Ada test cases where the
> >> pcom optimization jumps in?
> >
> > According to the comment, data-ref analysis punts on bit offsets so I'm not
> > sure how boff can be not byte-aligned.
> >
> 
> I mean in dr_analyze_innermost, we have:
> 
>    base = get_inner_reference (ref, &pbitsize, &pbitpos, &poffset, &pmode,
>                                &punsignedp, &preversep, &pvolatilep);
>    gcc_assert (base != NULL_TREE);
> 
>    if (pbitpos % BITS_PER_UNIT != 0)
>      {
>        if (dump_file && (dump_flags & TDF_DETAILS))
>          fprintf (dump_file, "failed: bit offset alignment.\n");
>        return false;
>      }
> 
>    if (preversep)
>      {
>        if (dump_file && (dump_flags & TDF_DETAILS))
>          fprintf (dump_file, "failed: reverse storage order.\n");
>        return false;
>      }
> 
> 
> and that means that get_inner_reference on the outermost ref
> returns a byte-aligned bit offset, and from there I would
> think it knows when to punt on reversep and
> when the offset is not byte-aligned.
> 
> But in get_bit_range we have a bit-field with an arbitrary
> bit offset, and surprisingly
> get_inner_reference (TREE_OPERAND (exp, 0)) did not return
> a byte-aligned bit offset.  I doubt that data-ref analysis
> ever cares about the byte-alignment of intermediate
> refs.

It doesn't.  Note that it cares about byte-alignment of the ref
only because it decomposes the ref into an address plus
a byte offset, and the address naturally points to a byte-aligned thing.

> We know that the difference between the innermost ref
> and the outer ref is byte-aligned, but we do not know
> that the same is true for offset between the COMPONENT_REF
> and the outer ref.
> 
> I mean boff is essentially the difference between
> bitpos of get_inner_reference(exp) and
> bitpos of get_inner_reference(TREE_OPERAND (exp, 0))
> 
> This would be exposed by my patch: previously
> we always generated BIT_FIELD_REFs with bit-offset 0,
> and the MEM_REF at that byte offset is in all likelihood
> pretty much unaligned.  The MEM_REF at the COMPONENT_REF
> is likely more aligned and allows better code on ARM processors,
> but only if the MEM_REF is at a byte-aligned offset at all;
> otherwise the whole transformation would be wrong.

Note that the important thing to ensure is that the access the
MEM_REF performs is correct TBAA-wise which means you either
have to use alias-set zero (ptr_type_node offset) or _not_
shuffle the offset arbitrarily between the MEM_REF and the
components you wrap it in.
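In source-level terms (a loose analogy, not the GIMPLE mechanism itself;
bits_to_float is a hypothetical helper), an alias-set-zero access behaves
like going through memcpy or a character pointer, which TBAA must assume can
alias anything:

```c
#include <assert.h>
#include <string.h>

/* Reinterprets the bits of an unsigned as a float via memcpy; like an
   alias-set-zero MEM_REF, such an access may alias any object, so TBAA
   cannot disambiguate it away. */
float
bits_to_float (unsigned u)
{
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}
```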

Richard.


* Re: [PATCH] Fix unaligned access when predictive commoning (PR 71083)
  2016-08-10 12:29         ` Richard Biener
@ 2016-08-10 16:24           ` Bernd Edlinger
  2016-08-11  7:07             ` Richard Biener
  0 siblings, 1 reply; 14+ messages in thread
From: Bernd Edlinger @ 2016-08-10 16:24 UTC (permalink / raw)
  To: Richard Biener; +Cc: Eric Botcazou, gcc-patches, Jeff Law, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 2167 bytes --]

On 08/10/16 14:29, Richard Biener wrote:
> On Tue, 9 Aug 2016, Bernd Edlinger wrote:
>> We know that the difference between the innermost ref
>> and the outer ref is byte-aligned, but we do not know
>> that the same is true for the offset between the COMPONENT_REF
>> and the outer ref.
>>
>> I mean boff is essentially the difference between
>> bitpos of get_inner_reference(exp) and
>> bitpos of get_inner_reference(TREE_OPERAND (exp, 0))
>>
>> This would be exposed by my patch, because previously
>> we always generated BIT_FIELD_REFS, with bit-offset 0,
>> but the MEM_REF at the byte-offset there is in all likelihood
>> pretty much unaligned, the MEM_REF at the COMPONENT_REF
>> is likely more aligned and allows better code for ARM processors,
>> but only if the MEM_REF is at a byte-aligned offset at all,
>> otherwise the whole transformation would be wrong.
>
> Note that the important thing to ensure is that the access the
> MEM_REF performs is correct TBAA-wise which means you either
> have to use alias-set zero (ptr_type_node offset) or _not_
> shuffle the offset arbitrarily between the MEM_REF and the
> components you wrap it in.
>
> Richard.
>

Yes, the patch exactly replicates the outermost COMPONENT_REF and
subtracts the component's byte-offset from the MEM_REF's address,
and the MEM_REF uses the pointer type of the inner reference.

In the case without bitfields, and for the Ada bitfields, the patch
changes nothing except that we build an aligned type from
TREE_TYPE (DR_REF (dr)) and get_object_alignment (DR_REF (dr)).

In the case of a COMPONENT_REF that is byte-aligned
we subtract the component's byte offset from the address
before the MEM_REF is constructed.  The
alias_ptr is of type reference_alias_ptr_type
(TREE_OPERAND (DR_REF (dr), 0)), and the alignment again comes
from get_object_alignment (TREE_OPERAND (DR_REF (dr), 0)),
so that should be exactly type-correct from TBAA's perspective.


Attached is a new version of the patch with an improved comment
and the new Ada test cases.


Bootstrap and reg-test on x86_64-pc-linux-gnu without regression.
Is it OK for trunk?


Thanks
Bernd.

[-- Attachment #2: changelog-pr71083.txt --]
[-- Type: text/plain, Size: 464 bytes --]

2016-08-08  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR tree-optimization/71083
	* tree-predcom.c (ref_at_iteration): Correctly align the
	inner reference.

testsuite:
2016-08-08  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR tree-optimization/71083
	* gcc.c-torture/execute/pr71083.c: New test.
	* gnat.dg/loop_optimization23.adb: New test.
	* gnat.dg/loop_optimization23_pkg.ads: New test.
	* gnat.dg/loop_optimization23_pkg.adb: New test.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: patch-pr71083.diff --]
[-- Type: text/x-patch; name="patch-pr71083.diff", Size: 6975 bytes --]

Index: gcc/tree-predcom.c
===================================================================
--- gcc/tree-predcom.c	(revision 239193)
+++ gcc/tree-predcom.c	(working copy)
@@ -213,6 +213,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-scalar-evolution.h"
 #include "params.h"
 #include "tree-affine.h"
+#include "builtins.h"
 
 /* The maximum number of iterations between the considered memory
    references.  */
@@ -1364,11 +1365,16 @@ replace_ref_with (gimple *stmt, tree new_tree, boo
 /* Returns a memory reference to DR in the ITER-th iteration of
    the loop it was analyzed in.  Append init stmts to STMTS.  */
 
-static tree 
+static tree
 ref_at_iteration (data_reference_p dr, int iter, gimple_seq *stmts)
 {
   tree off = DR_OFFSET (dr);
   tree coff = DR_INIT (dr);
+  tree ref = DR_REF (dr);
+  enum tree_code ref_code = ERROR_MARK;
+  tree ref_type = NULL_TREE;
+  tree ref_op1 = NULL_TREE;
+  tree ref_op2 = NULL_TREE;
   if (iter == 0)
     ;
   else if (TREE_CODE (DR_STEP (dr)) == INTEGER_CST)
@@ -1377,27 +1383,48 @@ ref_at_iteration (data_reference_p dr, int iter, g
   else
     off = size_binop (PLUS_EXPR, off,
 		      size_binop (MULT_EXPR, DR_STEP (dr), ssize_int (iter)));
-  tree addr = fold_build_pointer_plus (DR_BASE_ADDRESS (dr), off);
-  addr = force_gimple_operand_1 (unshare_expr (addr), stmts,
-				 is_gimple_mem_ref_addr, NULL_TREE);
-  tree alias_ptr = fold_convert (reference_alias_ptr_type (DR_REF (dr)), coff);
   /* While data-ref analysis punts on bit offsets it still handles
      bitfield accesses at byte boundaries.  Cope with that.  Note that
-     we cannot simply re-apply the outer COMPONENT_REF because the
-     byte-granular portion of it is already applied via DR_INIT and
-     DR_OFFSET, so simply build a BIT_FIELD_REF knowing that the bits
+     if the bitfield object also starts at a byte-boundary we can simply
+     replicate the COMPONENT_REF, but we have to subtract the component's
+     byte-offset from the MEM_REF address first.
+     Otherwise we simply build a BIT_FIELD_REF knowing that the bits
      start at offset zero.  */
-  if (TREE_CODE (DR_REF (dr)) == COMPONENT_REF
-      && DECL_BIT_FIELD (TREE_OPERAND (DR_REF (dr), 1)))
+  if (TREE_CODE (ref) == COMPONENT_REF
+      && DECL_BIT_FIELD (TREE_OPERAND (ref, 1)))
     {
-      tree field = TREE_OPERAND (DR_REF (dr), 1);
-      return build3 (BIT_FIELD_REF, TREE_TYPE (DR_REF (dr)),
-		     build2 (MEM_REF, DECL_BIT_FIELD_TYPE (field),
-			     addr, alias_ptr),
-		     DECL_SIZE (field), bitsize_zero_node);
+      unsigned HOST_WIDE_INT boff;
+      tree field = TREE_OPERAND (ref, 1);
+      ref_type = TREE_TYPE (ref);
+      boff = tree_to_uhwi (DECL_FIELD_BIT_OFFSET (field));
+      /* This can occur in Ada.  See the comment in get_bit_range.  */
+      if (boff % BITS_PER_UNIT != 0)
+	{
+	  ref_code = BIT_FIELD_REF;
+	  ref_op1 = DECL_SIZE (field);
+	  ref_op2 = bitsize_zero_node;
+	}
+      else
+	{
+	  boff >>= LOG2_BITS_PER_UNIT;
+	  boff += tree_to_uhwi (component_ref_field_offset (ref));
+	  coff = size_binop (MINUS_EXPR, coff, ssize_int (boff));
+	  ref_code = COMPONENT_REF;
+	  ref_op1 = field;
+	  ref_op2 = TREE_OPERAND (ref, 2);
+	  ref = TREE_OPERAND (ref, 0);
+	}
     }
-  else
-    return fold_build2 (MEM_REF, TREE_TYPE (DR_REF (dr)), addr, alias_ptr);
+  tree addr = fold_build_pointer_plus (DR_BASE_ADDRESS (dr), off);
+  addr = force_gimple_operand_1 (unshare_expr (addr), stmts,
+				 is_gimple_mem_ref_addr, NULL_TREE);
+  tree alias_ptr = fold_convert (reference_alias_ptr_type (ref), coff);
+  tree type = build_aligned_type (TREE_TYPE (ref),
+				  get_object_alignment (ref));
+  ref = build2 (MEM_REF, type, addr, alias_ptr);
+  if (ref_type)
+    ref = build3 (ref_code, ref_type, ref, ref_op1, ref_op2);
+  return ref;
 }
 
 /* Get the initialization expression for the INDEX-th temporary variable
Index: gcc/testsuite/gcc.c-torture/execute/pr71083.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/pr71083.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/pr71083.c	(working copy)
@@ -0,0 +1,43 @@
+struct lock_chain {
+  unsigned int irq_context: 2,
+    depth: 6,
+    base: 24;
+};
+
+__attribute__((noinline, noclone))
+struct lock_chain * foo (struct lock_chain *chain)
+{
+  int i;
+  for (i = 0; i < 100; i++)
+    {
+      chain[i+1].base = chain[i].base;
+    }
+  return chain;
+}
+
+struct lock_chain1 {
+  char x;
+  unsigned short base;
+} __attribute__((packed));
+
+__attribute__((noinline, noclone))
+struct lock_chain1 * bar (struct lock_chain1 *chain)
+{
+  int i;
+  for (i = 0; i < 100; i++)
+    {
+      chain[i+1].base = chain[i].base;
+    }
+  return chain;
+}
+
+struct lock_chain test [101];
+struct lock_chain1 test1 [101];
+
+int
+main ()
+{
+  foo (test);
+  bar (test1);
+  return 0;
+}
Index: gcc/testsuite/gnat.dg/loop_optimization23.adb
===================================================================
--- gcc/testsuite/gnat.dg/loop_optimization23.adb	(revision 0)
+++ gcc/testsuite/gnat.dg/loop_optimization23.adb	(working copy)
@@ -0,0 +1,14 @@
+-- { dg-do run }
+-- { dg-options "-O3" }
+-- PR tree-optimization/71083
+with Loop_Optimization23_Pkg;
+use Loop_Optimization23_Pkg;
+procedure Loop_Optimization23 is
+  Test : ArrayOfStructB;
+begin
+  Test (0).b.b := 9999;
+  Foo (Test);
+  if Test (100).b.b /= 9999 then
+    raise Program_Error;
+  end if;
+end;
Index: gcc/testsuite/gnat.dg/loop_optimization23_pkg.adb
===================================================================
--- gcc/testsuite/gnat.dg/loop_optimization23_pkg.adb	(revision 0)
+++ gcc/testsuite/gnat.dg/loop_optimization23_pkg.adb	(working copy)
@@ -0,0 +1,11 @@
+-- { dg-do compile }
+-- { dg-options "-O3" }
+-- PR tree-optimization/71083
+package body Loop_Optimization23_Pkg is
+  procedure Foo (X : in out ArrayOfStructB) is
+  begin
+    for K in 0..99 loop
+      X (K+1).b.b := X (K).b.b;
+    end loop;
+  end Foo;
+end Loop_Optimization23_Pkg;
Index: gcc/testsuite/gnat.dg/loop_optimization23_pkg.ads
===================================================================
--- gcc/testsuite/gnat.dg/loop_optimization23_pkg.ads	(revision 0)
+++ gcc/testsuite/gnat.dg/loop_optimization23_pkg.ads	(working copy)
@@ -0,0 +1,17 @@
+-- PR tree-optimization/71083
+package Loop_Optimization23_Pkg is
+  type Nibble is mod 2**4;
+  type Int24  is mod 2**24;
+  type StructA is record
+    a : Nibble;
+    b : Int24;
+  end record;
+  pragma Pack(StructA);
+  type StructB is record
+    a : Nibble;
+    b : StructA;
+  end record;
+  pragma Pack(StructB);
+  type ArrayOfStructB is array(0..100) of StructB;
+  procedure Foo (X : in out ArrayOfStructB);
+end Loop_Optimization23_Pkg;


* Re: [PATCH] Fix unaligned access when predictive commoning (PR 71083)
  2016-08-10 16:24           ` Bernd Edlinger
@ 2016-08-11  7:07             ` Richard Biener
  2016-08-11 10:09               ` Bernd Edlinger
  2016-08-11 19:47               ` [PATCH] Increase alignment for bit-field " Bernd Edlinger
  0 siblings, 2 replies; 14+ messages in thread
From: Richard Biener @ 2016-08-11  7:07 UTC (permalink / raw)
  To: Bernd Edlinger; +Cc: Eric Botcazou, gcc-patches, Jeff Law, Jakub Jelinek

On Wed, 10 Aug 2016, Bernd Edlinger wrote:

> On 08/10/16 14:29, Richard Biener wrote:
> > On Tue, 9 Aug 2016, Bernd Edlinger wrote:
> >> We know that the difference between the innermost ref
> >> and the outer ref is byte-aligned, but we do not know
> >> that the same is true for the offset between the COMPONENT_REF
> >> and the outer ref.
> >>
> >> I mean boff is essentially the difference between
> >> bitpos of get_inner_reference(exp) and
> >> bitpos of get_inner_reference(TREE_OPERAND (exp, 0))
> >>
> >> This would be exposed by my patch, because previously
> >> we always generated BIT_FIELD_REFS, with bit-offset 0,
> >> but the MEM_REF at the byte-offset there is in all likelihood
> >> pretty much unaligned, the MEM_REF at the COMPONENT_REF
> >> is likely more aligned and allows better code for ARM processors,
> >> but only if the MEM_REF is at a byte-aligned offset at all,
> >> otherwise the whole transformation would be wrong.
> >
> > Note that the important thing to ensure is that the access the
> > MEM_REF performs is correct TBAA-wise which means you either
> > have to use alias-set zero (ptr_type_node offset) or _not_
> > shuffle the offset arbitrarily between the MEM_REF and the
> > components you wrap it in.
> >
> > Richard.
> >
> 
> Yes, the patch exactly replicates the outermost COMPONENT_REF and
> subtracts the component's byte-offset from the MEM_REF's address,
> and the MEM_REF uses the pointer type of the inner reference.
> 
> In the case without bitfields, and for the Ada bitfields, the patch
> changes nothing except that we build an aligned type from
> TREE_TYPE (DR_REF (dr)) and get_object_alignment (DR_REF (dr)).
> 
> In the case of a COMPONENT_REF that is byte-aligned
> we subtract the component's byte offset from the address
> before the MEM_REF is constructed.  The
> alias_ptr is of type reference_alias_ptr_type
> (TREE_OPERAND (DR_REF (dr), 0)), and the alignment again comes
> from get_object_alignment (TREE_OPERAND (DR_REF (dr), 0)),
> so that should be exactly type-correct from TBAA's perspective.
> 
> 
> Attached is a new version of the patch with an improved comment
> and the new Ada test cases.
> 
> 
> Bootstrap and reg-test on x86_64-pc-linux-gnu without regression.
> Is it OK for trunk?

The patch looks mostly ok, but

+      else
+       {
+         boff >>= LOG2_BITS_PER_UNIT;
+         boff += tree_to_uhwi (component_ref_field_offset (ref));
+         coff = size_binop (MINUS_EXPR, coff, ssize_int (boff));

how can we be sure that component_ref_field_offset is an INTEGER_CST?
At least Ada can have variably-length fields before a bitfield.  We'd
need to apply component_ref_field_offset to off in that case.  Which
makes me wonder if we can simply avoid the COMPONENT_REF path in
a first iteration of the patch and always build a BIT_FIELD_REF?
It should solve the correctness issues as well and be more applicable
for branches.

Thanks,
Richard.


* Re: [PATCH] Fix unaligned access when predictive commoning (PR 71083)
  2016-08-11  7:07             ` Richard Biener
@ 2016-08-11 10:09               ` Bernd Edlinger
  2016-08-11 10:30                 ` Richard Biener
  2016-08-11 19:47               ` [PATCH] Increase alignment for bit-field " Bernd Edlinger
  1 sibling, 1 reply; 14+ messages in thread
From: Bernd Edlinger @ 2016-08-11 10:09 UTC (permalink / raw)
  To: Richard Biener; +Cc: Eric Botcazou, gcc-patches, Jeff Law, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 1703 bytes --]

On 08/11/16, Richard Biener wrote:
> 
> The patch looks mostly ok, but
> 
> +      else
> +       {
> +         boff >>= LOG2_BITS_PER_UNIT;
> +         boff += tree_to_uhwi (component_ref_field_offset (ref));
> +         coff = size_binop (MINUS_EXPR, coff, ssize_int (boff));
> 
> how can we be sure that component_ref_field_offset is an INTEGER_CST?
> At least Ada can have variably-length fields before a bitfield.  We'd
> need to apply component_ref_field_offset to off in that case.  Which
> makes me wonder if we can simply avoid the COMPONENT_REF path in
> a first iteration of the patch and always build a BIT_FIELD_REF?
> It should solve the correctness issues as well and be more applicable
> for branches.
> 

Oh yes, thanks for catching that!

If that is true, it ought to go into the comment before the if;
that would certainly be an interesting comment :-)

Are there any test cases for these non-constant field offsets?

I see many checks of whether component_ref_field_offset is an
INTEGER_CST, but with very little background on why it could be
otherwise.

I think we should simply fall back to the BIT_FIELD_REF in that case,
that would mean, the if should be something like:

tree offset = component_ref_field_offset (ref);
if (boff % BITS_PER_UNIT != 0
    || !tree_fits_uhwi_p (offset))

And yes, the correctness issue can certainly be solved with the
BIT_FIELD_REF alone.

So, as requested, here is a first iteration of my patch that always builds
a BIT_FIELD_REF, together with the test cases.


Boot-strap & regression testing was done on x86_64-pc-linux-gnu.
Is it OK for trunk (and active branches)?


Thanks
Bernd.

[-- Attachment #2: changelog-pr71083.txt --]
[-- Type: text/plain, Size: 449 bytes --]

2016-08-11  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR tree-optimization/71083
	* tree-predcom.c (ref_at_iteration): Correctly align the
	reference type.

testsuite:
2016-08-11  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR tree-optimization/71083
	* gcc.c-torture/execute/pr71083.c: New test.
	* gnat.dg/loop_optimization23.adb: New test.
	* gnat.dg/loop_optimization23_pkg.ads: New test.
	* gnat.dg/loop_optimization23_pkg.adb: New test.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: patch-pr71083.diff --]
[-- Type: text/x-patch; name="patch-pr71083.diff", Size: 4448 bytes --]

Index: gcc/tree-predcom.c
===================================================================
--- gcc/tree-predcom.c	(revision 239193)
+++ gcc/tree-predcom.c	(working copy)
@@ -213,6 +213,7 @@ along with GCC; see the file COPYING3.
 #include "tree-scalar-evolution.h"
 #include "params.h"
 #include "tree-affine.h"
+#include "builtins.h"
 
 /* The maximum number of iterations between the considered memory
    references.  */
@@ -1381,6 +1382,8 @@ ref_at_iteration (data_reference_p dr, i
   addr = force_gimple_operand_1 (unshare_expr (addr), stmts,
 				 is_gimple_mem_ref_addr, NULL_TREE);
   tree alias_ptr = fold_convert (reference_alias_ptr_type (DR_REF (dr)), coff);
+  tree type = build_aligned_type (TREE_TYPE (DR_REF (dr)),
+				  get_object_alignment (DR_REF (dr)));
   /* While data-ref analysis punts on bit offsets it still handles
      bitfield accesses at byte boundaries.  Cope with that.  Note that
      we cannot simply re-apply the outer COMPONENT_REF because the
@@ -1392,12 +1395,11 @@ ref_at_iteration (data_reference_p dr, i
     {
       tree field = TREE_OPERAND (DR_REF (dr), 1);
       return build3 (BIT_FIELD_REF, TREE_TYPE (DR_REF (dr)),
-		     build2 (MEM_REF, DECL_BIT_FIELD_TYPE (field),
-			     addr, alias_ptr),
+		     build2 (MEM_REF, type, addr, alias_ptr),
 		     DECL_SIZE (field), bitsize_zero_node);
     }
   else
-    return fold_build2 (MEM_REF, TREE_TYPE (DR_REF (dr)), addr, alias_ptr);
+    return fold_build2 (MEM_REF, type, addr, alias_ptr);
 }
 
 /* Get the initialization expression for the INDEX-th temporary variable
Index: gcc/testsuite/gcc.c-torture/execute/pr71083.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/pr71083.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/pr71083.c	(working copy)
@@ -0,0 +1,43 @@
+struct lock_chain {
+  unsigned int irq_context: 2,
+    depth: 6,
+    base: 24;
+};
+
+__attribute__((noinline, noclone))
+struct lock_chain * foo (struct lock_chain *chain)
+{
+  int i;
+  for (i = 0; i < 100; i++)
+    {
+      chain[i+1].base = chain[i].base;
+    }
+  return chain;
+}
+
+struct lock_chain1 {
+  char x;
+  unsigned short base;
+} __attribute__((packed));
+
+__attribute__((noinline, noclone))
+struct lock_chain1 * bar (struct lock_chain1 *chain)
+{
+  int i;
+  for (i = 0; i < 100; i++)
+    {
+      chain[i+1].base = chain[i].base;
+    }
+  return chain;
+}
+
+struct lock_chain test [101];
+struct lock_chain1 test1 [101];
+
+int
+main ()
+{
+  foo (test);
+  bar (test1);
+  return 0;
+}
Index: gcc/testsuite/gnat.dg/loop_optimization23.adb
===================================================================
--- gcc/testsuite/gnat.dg/loop_optimization23.adb	(revision 0)
+++ gcc/testsuite/gnat.dg/loop_optimization23.adb	(working copy)
@@ -0,0 +1,14 @@
+-- { dg-do run }
+-- { dg-options "-O3" }
+-- PR tree-optimization/71083
+with Loop_Optimization23_Pkg;
+use Loop_Optimization23_Pkg;
+procedure Loop_Optimization23 is
+  Test : ArrayOfStructB;
+begin
+  Test (0).b.b := 9999;
+  Foo (Test);
+  if Test (100).b.b /= 9999 then
+    raise Program_Error;
+  end if;
+end;
Index: gcc/testsuite/gnat.dg/loop_optimization23_pkg.adb
===================================================================
--- gcc/testsuite/gnat.dg/loop_optimization23_pkg.adb	(revision 0)
+++ gcc/testsuite/gnat.dg/loop_optimization23_pkg.adb	(working copy)
@@ -0,0 +1,11 @@
+-- { dg-do compile }
+-- { dg-options "-O3" }
+-- PR tree-optimization/71083
+package body Loop_Optimization23_Pkg is
+  procedure Foo (X : in out ArrayOfStructB) is
+  begin
+    for K in 0..99 loop
+      X (K+1).b.b := X (K).b.b;
+    end loop;
+  end Foo;
+end Loop_Optimization23_Pkg;
Index: gcc/testsuite/gnat.dg/loop_optimization23_pkg.ads
===================================================================
--- gcc/testsuite/gnat.dg/loop_optimization23_pkg.ads	(revision 0)
+++ gcc/testsuite/gnat.dg/loop_optimization23_pkg.ads	(working copy)
@@ -0,0 +1,17 @@
+-- PR tree-optimization/71083
+package Loop_Optimization23_Pkg is
+  type Nibble is mod 2**4;
+  type Int24  is mod 2**24;
+  type StructA is record
+    a : Nibble;
+    b : Int24;
+  end record;
+  pragma Pack(StructA);
+  type StructB is record
+    a : Nibble;
+    b : StructA;
+  end record;
+  pragma Pack(StructB);
+  type ArrayOfStructB is array(0..100) of StructB;
+  procedure Foo (X : in out ArrayOfStructB);
+end Loop_Optimization23_Pkg;


* Re: [PATCH] Fix unaligned access when predictive commoning (PR 71083)
  2016-08-11 10:09               ` Bernd Edlinger
@ 2016-08-11 10:30                 ` Richard Biener
  0 siblings, 0 replies; 14+ messages in thread
From: Richard Biener @ 2016-08-11 10:30 UTC (permalink / raw)
  To: Bernd Edlinger; +Cc: Eric Botcazou, gcc-patches, Jeff Law, Jakub Jelinek

On Thu, 11 Aug 2016, Bernd Edlinger wrote:

> On 08/11/16, Richard Biener wrote:
> > 
> > The patch looks mostly ok, but
> > 
> > +      else
> > +       {
> > +         boff >>= LOG2_BITS_PER_UNIT;
> > +         boff += tree_to_uhwi (component_ref_field_offset (ref));
> > +         coff = size_binop (MINUS_EXPR, coff, ssize_int (boff));
> > 
> > how can we be sure that component_ref_field_offset is an INTEGER_CST?
> > At least Ada can have variably-length fields before a bitfield.  We'd
> > need to apply component_ref_field_offset to off in that case.  Which
> > makes me wonder if we can simply avoid the COMPONENT_REF path in
> > a first iteration of the patch and always build a BIT_FIELD_REF?
> > It should solve the correctness issues as well and be more applicable
> > for branches.
> > 
> 
> Oh yes, thanks for catching that!
> 
> If that is true, it ought to go into the comment before the if;
> that would certainly be an interesting comment :-)
> 
> Are there any test cases for these non-constant field offsets?
> 
> I see many checks of whether component_ref_field_offset is an
> INTEGER_CST, but with very little background on why it could be
> otherwise.
> 
> I think we should simply fall back to the BIT_FIELD_REF in that case,
> that would mean, the if should be something like:
> 
> tree offset = component_ref_field_offset (ref);
> if (boff % BITS_PER_UNIT != 0
>     || !tree_fits_uhwi_p (offset))
> 
> And yes, the correctness issue can certainly be solved with the
> BIT_FIELD_REF alone.
> 
> So, as requested, here is a first iteration of my patch that always builds
> a BIT_FIELD_REF, together with the test cases.
> 
> 
> Boot-strap & regression testing was done on x86_64-pc-linux-gnu.
> Is it OK for trunk (and active branches)?

Yes.

Thanks,
Richard.


* [PATCH] Increase alignment for bit-field access when predictive commoning (PR 71083)
  2016-08-11  7:07             ` Richard Biener
  2016-08-11 10:09               ` Bernd Edlinger
@ 2016-08-11 19:47               ` Bernd Edlinger
  2016-08-12  7:13                 ` Richard Biener
  1 sibling, 1 reply; 14+ messages in thread
From: Bernd Edlinger @ 2016-08-11 19:47 UTC (permalink / raw)
  To: Richard Biener; +Cc: Eric Botcazou, gcc-patches, Jeff Law, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 1622 bytes --]

On 08/11/16 09:07, Richard Biener wrote:
> The patch looks mostly ok, but
>
> +      else
> +       {
> +         boff >>= LOG2_BITS_PER_UNIT;
> +         boff += tree_to_uhwi (component_ref_field_offset (ref));
> +         coff = size_binop (MINUS_EXPR, coff, ssize_int (boff));
>
> how can we be sure that component_ref_field_offset is an INTEGER_CST?
> At least Ada can have variably-length fields before a bitfield.  We'd
> need to apply component_ref_field_offset to off in that case.  Which
> makes me wonder if we can simply avoid the COMPONENT_REF path in
> a first iteration of the patch and always build a BIT_FIELD_REF?
> It should solve the correctness issues as well and be more applicable
> for branches.
>

I believe that it will be necessary to check for tree_fits_uhwi_p here,
but while it happens quite often that a normal COMPONENT_REF has a
variable offset, it rarely happens that a bit-field COMPONENT_REF has a
variable offset: in the whole Ada test suite it occurred only in the
pack16 test case.  However, I was not able to use that trick in the
loop_optimization23 test case; when I tried, it no longer got
optimized by pcom.  So there will be no new test case at this time.

Therefore I left the comment as-is, because it is not clear whether a
variable offset will ever happen here; but if it does, it will be
in Ada, and it is safe to create a BIT_FIELD_REF instead.

So this is the follow-up patch that tries to create a more aligned
access using the COMPONENT_REF.


Boot-strap and reg-test on x86_64-pc-linux-gnu.
Is it OK for trunk?


Thanks
Bernd.

[-- Attachment #2: changelog-pr71083.txt --]
[-- Type: text/plain, Size: 188 bytes --]

2016-08-11  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR tree-optimization/71083
	* tree-predcom.c (ref_at_iteration): Use a COMPONENT_REF for the
	bitfield access when possible.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: patch-pr71083.diff --]
[-- Type: text/x-patch; name="patch-pr71083.diff", Size: 3845 bytes --]

Index: gcc/tree-predcom.c
===================================================================
--- gcc/tree-predcom.c	(revision 239367)
+++ gcc/tree-predcom.c	(working copy)
@@ -1365,11 +1365,16 @@ replace_ref_with (gimple *stmt, tree new_tree, boo
 /* Returns a memory reference to DR in the ITER-th iteration of
    the loop it was analyzed in.  Append init stmts to STMTS.  */
 
-static tree 
+static tree
 ref_at_iteration (data_reference_p dr, int iter, gimple_seq *stmts)
 {
   tree off = DR_OFFSET (dr);
   tree coff = DR_INIT (dr);
+  tree ref = DR_REF (dr);
+  enum tree_code ref_code = ERROR_MARK;
+  tree ref_type = NULL_TREE;
+  tree ref_op1 = NULL_TREE;
+  tree ref_op2 = NULL_TREE;
   if (iter == 0)
     ;
   else if (TREE_CODE (DR_STEP (dr)) == INTEGER_CST)
@@ -1378,28 +1383,50 @@ ref_at_iteration (data_reference_p dr, int iter, g
   else
     off = size_binop (PLUS_EXPR, off,
 		      size_binop (MULT_EXPR, DR_STEP (dr), ssize_int (iter)));
-  tree addr = fold_build_pointer_plus (DR_BASE_ADDRESS (dr), off);
-  addr = force_gimple_operand_1 (unshare_expr (addr), stmts,
-				 is_gimple_mem_ref_addr, NULL_TREE);
-  tree alias_ptr = fold_convert (reference_alias_ptr_type (DR_REF (dr)), coff);
-  tree type = build_aligned_type (TREE_TYPE (DR_REF (dr)),
-				  get_object_alignment (DR_REF (dr)));
   /* While data-ref analysis punts on bit offsets it still handles
      bitfield accesses at byte boundaries.  Cope with that.  Note that
-     we cannot simply re-apply the outer COMPONENT_REF because the
-     byte-granular portion of it is already applied via DR_INIT and
-     DR_OFFSET, so simply build a BIT_FIELD_REF knowing that the bits
+     if the bitfield object also starts at a byte-boundary we can simply
+     replicate the COMPONENT_REF, but we have to subtract the component's
+     byte-offset from the MEM_REF address first.
+     Otherwise we simply build a BIT_FIELD_REF knowing that the bits
      start at offset zero.  */
-  if (TREE_CODE (DR_REF (dr)) == COMPONENT_REF
-      && DECL_BIT_FIELD (TREE_OPERAND (DR_REF (dr), 1)))
+  if (TREE_CODE (ref) == COMPONENT_REF
+      && DECL_BIT_FIELD (TREE_OPERAND (ref, 1)))
     {
-      tree field = TREE_OPERAND (DR_REF (dr), 1);
-      return build3 (BIT_FIELD_REF, TREE_TYPE (DR_REF (dr)),
-		     build2 (MEM_REF, type, addr, alias_ptr),
-		     DECL_SIZE (field), bitsize_zero_node);
+      unsigned HOST_WIDE_INT boff;
+      tree field = TREE_OPERAND (ref, 1);
+      tree offset = component_ref_field_offset (ref);
+      ref_type = TREE_TYPE (ref);
+      boff = tree_to_uhwi (DECL_FIELD_BIT_OFFSET (field));
+      /* This can occur in Ada.  See the comment in get_bit_range.  */
+      if (boff % BITS_PER_UNIT != 0
+	  || !tree_fits_uhwi_p (offset))
+	{
+	  ref_code = BIT_FIELD_REF;
+	  ref_op1 = DECL_SIZE (field);
+	  ref_op2 = bitsize_zero_node;
+	}
+      else
+	{
+	  boff >>= LOG2_BITS_PER_UNIT;
+	  boff += tree_to_uhwi (offset);
+	  coff = size_binop (MINUS_EXPR, coff, ssize_int (boff));
+	  ref_code = COMPONENT_REF;
+	  ref_op1 = field;
+	  ref_op2 = TREE_OPERAND (ref, 2);
+	  ref = TREE_OPERAND (ref, 0);
+	}
     }
-  else
-    return fold_build2 (MEM_REF, type, addr, alias_ptr);
+  tree addr = fold_build_pointer_plus (DR_BASE_ADDRESS (dr), off);
+  addr = force_gimple_operand_1 (unshare_expr (addr), stmts,
+				 is_gimple_mem_ref_addr, NULL_TREE);
+  tree alias_ptr = fold_convert (reference_alias_ptr_type (ref), coff);
+  tree type = build_aligned_type (TREE_TYPE (ref),
+				  get_object_alignment (ref));
+  ref = build2 (MEM_REF, type, addr, alias_ptr);
+  if (ref_type)
+    ref = build3 (ref_code, ref_type, ref, ref_op1, ref_op2);
+  return ref;
 }
 
 /* Get the initialization expression for the INDEX-th temporary variable


* Re: [PATCH] Increase alignment for bit-field access when predictive commoning (PR 71083)
  2016-08-11 19:47               ` [PATCH] Increase alignment for bit-field " Bernd Edlinger
@ 2016-08-12  7:13                 ` Richard Biener
  0 siblings, 0 replies; 14+ messages in thread
From: Richard Biener @ 2016-08-12  7:13 UTC (permalink / raw)
  To: Bernd Edlinger; +Cc: Eric Botcazou, gcc-patches, Jeff Law, Jakub Jelinek

On Thu, 11 Aug 2016, Bernd Edlinger wrote:

> On 08/11/16 09:07, Richard Biener wrote:
> > The patch looks mostly ok, but
> >
> > +      else
> > +       {
> > +         boff >>= LOG2_BITS_PER_UNIT;
> > +         boff += tree_to_uhwi (component_ref_field_offset (ref));
> > +         coff = size_binop (MINUS_EXPR, coff, ssize_int (boff));
> >
> > how can we be sure that component_ref_field_offset is an INTEGER_CST?
> > At least Ada can have variably-length fields before a bitfield.  We'd
> > need to apply component_ref_field_offset to off in that case.  Which
> > makes me wonder if we can simply avoid the COMPONENT_REF path in
> > a first iteration of the patch and always build a BIT_FIELD_REF?
> > It should solve the correctness issues as well and be more applicable
> > for branches.
> >
> 
> I believe that it will be necessary to check for tree_fits_uhwi_p here,
> but while it happens quite often that a normal COMPONENT_REF has a
> variable offset, it rarely happens that a bit-field COMPONENT_REF has a
> variable offset: in the whole Ada test suite it occurred only in the
> pack16 test case.  However, I was not able to use that trick in the
> loop_optimization23 test case; when I tried, it no longer got
> optimized by pcom.  So there will be no new test case at this time.
> 
> Therefore I left the comment as-is, because it is not clear whether a
> variable offset will ever happen here; but if it does, it will be
> in Ada, and it is safe to create a BIT_FIELD_REF instead.
> 
> So this is the follow-up patch that tries to create a more aligned
> access using the COMPONENT_REF.
> 
> 
> Boot-strap and reg-test on x86_64-pc-linux-gnu.
> Is it OK for trunk?

Ok.

Thanks,
Richard.


end of thread, other threads:[~2016-08-12  7:13 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-08 19:57 [PATCH] Fix unaligned access when predictive commoning (PR 71083) Bernd Edlinger
2016-08-09  7:29 ` Richard Biener
2016-08-09 17:31   ` Bernd Edlinger
2016-08-09 20:48     ` Eric Botcazou
2016-08-09 22:23       ` Bernd Edlinger
2016-08-10  8:47         ` AW: " Bernd Edlinger
2016-08-10 12:19           ` Eric Botcazou
2016-08-10 12:29         ` Richard Biener
2016-08-10 16:24           ` Bernd Edlinger
2016-08-11  7:07             ` Richard Biener
2016-08-11 10:09               ` Bernd Edlinger
2016-08-11 10:30                 ` Richard Biener
2016-08-11 19:47               ` [PATCH] Increase alignment for bit-field " Bernd Edlinger
2016-08-12  7:13                 ` Richard Biener
