public inbox for gcc-patches@gcc.gnu.org
* [PATCH][GCC7] Remove scaling of COMPONENT_REF/ARRAY_REF ops 2/3
@ 2016-02-18 14:30 Richard Biener
  2016-02-19  8:33 ` Eric Botcazou
  2016-05-02  9:13 ` Eric Botcazou
  0 siblings, 2 replies; 7+ messages in thread
From: Richard Biener @ 2016-02-18 14:30 UTC (permalink / raw)
  To: gcc-patches


The following experiment resulted from looking at making 
array_ref_low_bound and array_ref_element_size non-mutating.  Again
I wondered why we do this strange scaling by offset/element alignment.

This exposes something to GIMPLE (the division) that is in the end not
necessary, while costing extra tree building/folding each time
array_ref_low_bound or array_ref_element_size is called (for the
variable-size cases, of course).  And those are called a lot
via get_inner_reference and get_ref_base_and_extent.
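
For reference, the convention being removed looks like this (a condensed
sketch; the authoritative wording is in the tree.def and tree.c hunks
below):

  /* Old (scaled) form of the variable-size operands, built at
     gimplification time:
       COMPONENT_REF op2 = byte_offset  /[ex] (DECL_OFFSET_ALIGN (field) / BITS_PER_UNIT)
       ARRAY_REF     op3 = element_size /[ex] TYPE_ALIGN_UNIT (elmt_type)
     and component_ref_field_offset / array_ref_element_size rebuild a
     MULT_EXPR with the same factor on every call.  With the patch the
     operands simply hold the byte offset / byte size.  */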

So - the following patch gets rid of that scaling.  For a "simple"
C testcase

void bar (void *);
void foo (int n)
{
  struct S { struct R { int b[n]; } a[2]; int k; } s;
  s.k = 1;
  s.a[1].b[7] = 3;
  bar (&s);
}

the difference in .gimple (and also pre expand at -O1) is:

@@ -95,10 +93,8 @@
       D.1782 = D.1770 * 8;
       D.1788 = D.1782 + 4;
       s.1 = __builtin_alloca_with_align (D.1788, 32);
-      D.1790 = D.1782 /[ex] 4;
-      s.1->k{off: D.1790 * 4} = 1;
-      D.1791 = D.1773 /[ex] 4;
-      s.1->a[1]{lb: 0 sz: D.1791 * 4}.b[7] = 3;
+      s.1->k{off: D.1782} = 1;
+      s.1->a[1]{lb: 0 sz: D.1773}.b[7] = 3;
       s.2 = s.1;
       bar (s.2);
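
Spelling out the arithmetic in the removed statements (assuming D.1770
holds n, which matches the alloca size D.1770 * 8 + 4 = sizeof (struct S)):

  D.1782 = D.1770 * 8              byte size of a[2] == byte offset of k
  old:  D.1790 = D.1782 /[ex] 4    offset scaled down to DECL_OFFSET_ALIGN units
        k{off: D.1790 * 4}         ... and scaled back up by every consumer
  new:  k{off: D.1782}             the byte offset is used directly
  (the element size D.1773 of a[1] goes through the same /[ex] 4 ... * 4
  round trip)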

and the generated initial RTL is much smaller:

-;; s.1_10->k{off: _11 * 4} = 1;
+;; s.1_10->k{off: _7} = 1;
 
-(insn 21 20 22 (parallel [
-            (set (reg:DI 108)
-                (lshiftrt:DI (reg:DI 90 [ _7 ])
-                    (const_int 2 [0x2])))
-            (clobber (reg:CC 17 flags))
-        ]) t.c:5 -1
-     (expr_list:REG_EQUAL (udiv:DI (reg:DI 90 [ _7 ])
-            (const_int 4 [0x4]))
-        (nil)))
-
-(insn 22 21 0 (set (mem/c:SI (plus:DI (mult:DI (reg:DI 108)
-                    (const_int 4 [0x4]))
-                (reg/f:DI 92 [ s.1 ])) [3 s.1_10->k{off: _11 * 4}+0 S4 A32])
+(insn 20 19 0 (set (mem/c:SI (plus:DI (reg/f:DI 91 [ s.1 ])
+                (reg:DI 89 [ _7 ])) [3 s.1_10->k{off: _7}+0 S4 A32])
         (const_int 1 [0x1])) t.c:5 -1
      (nil))

...

-;; s.1_10->a[1]{lb: 0 sz: _13 * 4}.b[7] = 3;
+Applying pattern match.pd:83, generic-match.c:9034
 
-(insn 23 22 24 (parallel [
-            (set (reg:DI 109)
-                (ashift:DI (reg:DI 88 [ _5 ])
-                    (const_int 2 [0x2])))
-            (clobber (reg:CC 17 flags))
-        ]) t.c:4 -1
-     (nil))
-
-(insn 24 23 25 (parallel [
-            (set (reg:DI 110)
-                (lshiftrt:DI (reg:DI 109)
-                    (const_int 2 [0x2])))
-            (clobber (reg:CC 17 flags))
-        ]) t.c:6 -1
-     (expr_list:REG_EQUAL (udiv:DI (reg:DI 109)
-            (const_int 4 [0x4]))
-        (nil)))
+;; s.1_10->a[1]{lb: 0 sz: _6}.b[7] = 3;
 
-(insn 25 24 26 (parallel [
-            (set (reg:DI 111)
-                (plus:DI (reg:DI 110)
-                    (const_int 7 [0x7])))
-            (clobber (reg:CC 17 flags))
-        ]) t.c:6 -1
-     (nil))
-
-(insn 26 25 0 (set (mem:SI (plus:DI (mult:DI (reg:DI 111)
-                    (const_int 4 [0x4]))
-                (reg/f:DI 92 [ s.1 ])) [3 s.1_10->a[1]{lb: 0 sz: _13 * 4}.b+28 S4 A32])
+(insn 21 20 0 (set (mem:SI (plus:DI (plus:DI (mult:DI (reg:DI 87 [ _5 ])
+                        (const_int 4 [0x4]))
+                    (reg/f:DI 91 [ s.1 ]))
+                (const_int 28 [0x1c])) [3 s.1_10->a[1]{lb: 0 sz: _6}.b+28 S4 A32])
         (const_int 3 [0x3])) t.c:6 -1
      (nil))

assembler difference is

@@ -11,12 +11,11 @@
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        movslq  %edi, %rax
-       leaq    0(,%rax,8), %rcx
-       leaq    22(%rcx), %rdx
+       leaq    22(,%rax,8), %rdx
        andq    $-16, %rdx
        subq    %rdx, %rsp
        movq    %rsp, %rdi
-       movl    $1, (%rsp,%rcx)
+       movl    $1, (%rsp,%rax,8)
        movl    $3, 28(%rsp,%rax,4)
        call    bar
        leave

which may be a mixed bag - I suppose the testcase is too simple to
evaluate code changes.
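
Annotating the difference (just a reading of the diff above, with %rax = n):

  old:  leaq    0(,%rax,8), %rcx        # %rcx = 8*n, byte offset of k, in its own reg
        leaq    22(%rcx), %rdx          # alloca size computed from it
        movl    $1, (%rsp,%rcx)         # store to s.k through %rcx
  new:  leaq    22(,%rax,8), %rdx       # one lea less for the alloca size
        movl    $1, (%rsp,%rax,8)       # *8 folded into the store's addressing mode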

I tried to go back in time and find relevant discussions about this
but didn't find anything, as the whole (very large) discussion ended
up being about increasing the size of the COMPONENT_REF / ARRAY_REF trees ...

So - I hope somebody from AdaCore can evaluate this patch
code-generation-wise.

Bootstrapped and tested on x86_64-unknown-linux-gnu.  Queued for GCC 7
if I hear nothing negative from the Ada folks.

Back to making array_ref_element_size/array_ref_low_bound non-mutating
(for the PR69553 fix).  Bah - that doesn't work (substituting
placeholders during gimplification only), as PLACEHOLDER_EXPRs
leak into the IL via folding (calling get_inner_reference, for example)
before gimplification.  Even the Ada FE itself does this.

Thanks,
Richard.

2016-02-18  Richard Biener  <rguenther@suse.de>

	* tree.def (COMPONENT_REF): Adjust docs for operand 2.
	(ARRAY_REF): Adjust docs for operand 3.
	* gimplify.c (gimplify_compound_lval): Remove scaling of
	COMPONENT_REF and ARRAY_REF operands.
	* tree.c (array_ref_element_size): Remove scaling of operand 3.
	(component_ref_field_offset): Remove scaling of operand 2.
	* tree-ssa-pre.c (create_component_ref_by_pieces_1): Remove
	kludge for ARRAY_REF operand 3 scaling issue.

Index: gcc/tree.def
===================================================================
*** gcc/tree.def	(revision 233498)
--- gcc/tree.def	(working copy)
*************** DEFTREECODE (TRANSLATION_UNIT_DECL, "tra
*** 422,429 ****
  /* Value is structure or union component.
     Operand 0 is the structure or union (an expression).
     Operand 1 is the field (a node of type FIELD_DECL).
!    Operand 2, if present, is the value of DECL_FIELD_OFFSET, measured
!    in units of DECL_OFFSET_ALIGN / BITS_PER_UNIT.  */
  DEFTREECODE (COMPONENT_REF, "component_ref", tcc_reference, 3)
  
  /* Reference to a group of bits within an object.  Similar to COMPONENT_REF
--- 422,428 ----
  /* Value is structure or union component.
     Operand 0 is the structure or union (an expression).
     Operand 1 is the field (a node of type FIELD_DECL).
!    Operand 2, if present, is the value of DECL_FIELD_OFFSET.  */
  DEFTREECODE (COMPONENT_REF, "component_ref", tcc_reference, 3)
  
  /* Reference to a group of bits within an object.  Similar to COMPONENT_REF
*************** DEFTREECODE (BIT_FIELD_REF, "bit_field_r
*** 439,446 ****
  /* Array indexing.
     Operand 0 is the array; operand 1 is a (single) array index.
     Operand 2, if present, is a copy of TYPE_MIN_VALUE of the index.
!    Operand 3, if present, is the element size, measured in units of
!    the alignment of the element type.  */
  DEFTREECODE (ARRAY_REF, "array_ref", tcc_reference, 4)
  
  /* Likewise, except that the result is a range ("slice") of the array.  The
--- 438,444 ----
  /* Array indexing.
     Operand 0 is the array; operand 1 is a (single) array index.
     Operand 2, if present, is a copy of TYPE_MIN_VALUE of the index.
!    Operand 3, if present, is the element size as in TYPE_SIZE_UNIT.  */
  DEFTREECODE (ARRAY_REF, "array_ref", tcc_reference, 4)
  
  /* Likewise, except that the result is a range ("slice") of the array.  The
Index: gcc/gimplify.c
===================================================================
*** gcc/gimplify.c	(revision 233498)
--- gcc/gimplify.c	(working copy)
*************** gimplify_compound_lval (tree *expr_p, gi
*** 2040,2057 ****
  
  	  if (TREE_OPERAND (t, 3) == NULL_TREE)
  	    {
! 	      tree elmt_type = TREE_TYPE (TREE_TYPE (TREE_OPERAND (t, 0)));
! 	      tree elmt_size = unshare_expr (array_ref_element_size (t));
! 	      tree factor = size_int (TYPE_ALIGN_UNIT (elmt_type));
! 
! 	      /* Divide the element size by the alignment of the element
! 		 type (above).  */
! 	      elmt_size
! 		= size_binop_loc (loc, EXACT_DIV_EXPR, elmt_size, factor);
  
  	      if (!is_gimple_min_invariant (elmt_size))
  		{
! 		  TREE_OPERAND (t, 3) = elmt_size;
  		  tret = gimplify_expr (&TREE_OPERAND (t, 3), pre_p,
  					post_p, is_gimple_reg,
  					fb_rvalue);
--- 2040,2050 ----
  
  	  if (TREE_OPERAND (t, 3) == NULL_TREE)
  	    {
! 	      tree elmt_size = array_ref_element_size (t);
  
  	      if (!is_gimple_min_invariant (elmt_size))
  		{
! 		  TREE_OPERAND (t, 3) = unshare_expr (elmt_size);
  		  tret = gimplify_expr (&TREE_OPERAND (t, 3), pre_p,
  					post_p, is_gimple_reg,
  					fb_rvalue);
*************** gimplify_compound_lval (tree *expr_p, gi
*** 2070,2086 ****
  	  /* Set the field offset into T and gimplify it.  */
  	  if (TREE_OPERAND (t, 2) == NULL_TREE)
  	    {
! 	      tree offset = unshare_expr (component_ref_field_offset (t));
! 	      tree field = TREE_OPERAND (t, 1);
! 	      tree factor
! 		= size_int (DECL_OFFSET_ALIGN (field) / BITS_PER_UNIT);
! 
! 	      /* Divide the offset by its alignment.  */
! 	      offset = size_binop_loc (loc, EXACT_DIV_EXPR, offset, factor);
  
  	      if (!is_gimple_min_invariant (offset))
  		{
! 		  TREE_OPERAND (t, 2) = offset;
  		  tret = gimplify_expr (&TREE_OPERAND (t, 2), pre_p,
  					post_p, is_gimple_reg,
  					fb_rvalue);
--- 2063,2073 ----
  	  /* Set the field offset into T and gimplify it.  */
  	  if (TREE_OPERAND (t, 2) == NULL_TREE)
  	    {
! 	      tree offset = component_ref_field_offset (t);
  
  	      if (!is_gimple_min_invariant (offset))
  		{
! 		  TREE_OPERAND (t, 2) = unshare_expr (offset);
  		  tret = gimplify_expr (&TREE_OPERAND (t, 2), pre_p,
  					post_p, is_gimple_reg,
  					fb_rvalue);
Index: gcc/tree.c
===================================================================
*** gcc/tree.c	(revision 233498)
--- gcc/tree.c	(working copy)
*************** get_base_address (tree t)
*** 12862,12882 ****
  tree
  array_ref_element_size (tree exp)
  {
!   tree aligned_size = TREE_OPERAND (exp, 3);
    tree elmt_type = TREE_TYPE (TREE_TYPE (TREE_OPERAND (exp, 0)));
-   location_t loc = EXPR_LOCATION (exp);
  
!   /* If a size was specified in the ARRAY_REF, it's the size measured
!      in alignment units of the element type.  So multiply by that value.  */
!   if (aligned_size)
!     {
!       /* ??? tree_ssa_useless_type_conversion will eliminate casts to
! 	 sizetype from another type of the same width and signedness.  */
!       if (TREE_TYPE (aligned_size) != sizetype)
! 	aligned_size = fold_convert_loc (loc, sizetype, aligned_size);
!       return size_binop_loc (loc, MULT_EXPR, aligned_size,
! 			     size_int (TYPE_ALIGN_UNIT (elmt_type)));
!     }
  
    /* Otherwise, take the size from that of the element type.  Substitute
       any PLACEHOLDER_EXPR that we have.  */
--- 12865,12876 ----
  tree
  array_ref_element_size (tree exp)
  {
!   tree size = TREE_OPERAND (exp, 3);
    tree elmt_type = TREE_TYPE (TREE_TYPE (TREE_OPERAND (exp, 0)));
  
!   /* If a size was specified in the ARRAY_REF, return it.  */
!   if (size)
!     return size;
  
    /* Otherwise, take the size from that of the element type.  Substitute
       any PLACEHOLDER_EXPR that we have.  */
*************** array_at_struct_end_p (tree ref)
*** 12965,12987 ****
  tree
  component_ref_field_offset (tree exp)
  {
-   tree aligned_offset = TREE_OPERAND (exp, 2);
    tree field = TREE_OPERAND (exp, 1);
!   location_t loc = EXPR_LOCATION (exp);
  
!   /* If an offset was specified in the COMPONENT_REF, it's the offset measured
!      in units of DECL_OFFSET_ALIGN / BITS_PER_UNIT.  So multiply by that
!      value.  */
!   if (aligned_offset)
!     {
!       /* ??? tree_ssa_useless_type_conversion will eliminate casts to
! 	 sizetype from another type of the same width and signedness.  */
!       if (TREE_TYPE (aligned_offset) != sizetype)
! 	aligned_offset = fold_convert_loc (loc, sizetype, aligned_offset);
!       return size_binop_loc (loc, MULT_EXPR, aligned_offset,
! 			     size_int (DECL_OFFSET_ALIGN (field)
! 				       / BITS_PER_UNIT));
!     }
  
    /* Otherwise, take the offset from that of the field.  Substitute
       any PLACEHOLDER_EXPR that we have.  */
--- 12959,12970 ----
  tree
  component_ref_field_offset (tree exp)
  {
    tree field = TREE_OPERAND (exp, 1);
!   tree offset = TREE_OPERAND (exp, 2);
  
!   /* If an offset was specified in the COMPONENT_REF, return it.  */
!   if (offset)
!     return offset;
  
    /* Otherwise, take the offset from that of the field.  Substitute
       any PLACEHOLDER_EXPR that we have.  */
Index: gcc/tree-ssa-pre.c
===================================================================
*** gcc/tree-ssa-pre.c	(revision 233516)
--- gcc/tree-ssa-pre.c	(working copy)
*************** create_component_ref_by_pieces_1 (basic_
*** 2592,2607 ****
  	if (genop3)
  	  {
  	    tree elmt_type = TREE_TYPE (TREE_TYPE (genop0));
! 	    /* We can't always put a size in units of the element alignment
! 	       here as the element alignment may be not visible.  See
! 	       PR43783.  Simply drop the element size for constant
! 	       sizes.  */
  	    if (tree_int_cst_equal (genop3, TYPE_SIZE_UNIT (elmt_type)))
  	      genop3 = NULL_TREE;
  	    else
  	      {
- 		genop3 = size_binop (EXACT_DIV_EXPR, genop3,
- 				     size_int (TYPE_ALIGN_UNIT (elmt_type)));
  		genop3 = find_or_generate_expression (block, genop3, stmts);
  		if (!genop3)
  		  return NULL_TREE;
--- 2592,2602 ----
  	if (genop3)
  	  {
  	    tree elmt_type = TREE_TYPE (TREE_TYPE (genop0));
! 	    /* Drop the element size for constant sizes.  */
  	    if (tree_int_cst_equal (genop3, TYPE_SIZE_UNIT (elmt_type)))
  	      genop3 = NULL_TREE;
  	    else
  	      {
  		genop3 = find_or_generate_expression (block, genop3, stmts);
  		if (!genop3)
  		  return NULL_TREE;


* Re: [PATCH][GCC7] Remove scaling of COMPONENT_REF/ARRAY_REF ops 2/3
  2016-02-18 14:30 [PATCH][GCC7] Remove scaling of COMPONENT_REF/ARRAY_REF ops 2/3 Richard Biener
@ 2016-02-19  8:33 ` Eric Botcazou
  2016-04-19 12:15   ` Richard Biener
  2016-05-02  9:13 ` Eric Botcazou
  1 sibling, 1 reply; 7+ messages in thread
From: Eric Botcazou @ 2016-02-19  8:33 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

> The following experiment resulted from looking at making
> array_ref_low_bound and array_ref_element_size non-mutating.  Again
> I wondered why we do this strange scaling by offset/element alignment.

I personally never really grasped it either...

> So - I hope somebody from Adacore can evaluate this patch code-generation
> wise.

I will, this looks like a valuable simplification to me.

-- 
Eric Botcazou


* Re: [PATCH][GCC7] Remove scaling of COMPONENT_REF/ARRAY_REF ops 2/3
  2016-02-19  8:33 ` Eric Botcazou
@ 2016-04-19 12:15   ` Richard Biener
  2016-04-25 11:08     ` Eric Botcazou
  0 siblings, 1 reply; 7+ messages in thread
From: Richard Biener @ 2016-04-19 12:15 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: GCC Patches

On Fri, Feb 19, 2016 at 9:33 AM, Eric Botcazou <ebotcazou@adacore.com> wrote:
>> The following experiment resulted from looking at making
>> array_ref_low_bound and array_ref_element_size non-mutating.  Again
>> I wondered why we do this strange scaling by offset/element alignment.
>
> I personally never really grasped it either...
>
>> So - I hope somebody from Adacore can evaluate this patch code-generation
>> wise.
>
> I will, this looks like a valuable simplification to me.

Did you manage to do this yet?  I'm flushing my stage1 queue of
"simple cleanups" right now.

Thanks,
Richard.


* Re: [PATCH][GCC7] Remove scaling of COMPONENT_REF/ARRAY_REF ops 2/3
  2016-04-19 12:15   ` Richard Biener
@ 2016-04-25 11:08     ` Eric Botcazou
  0 siblings, 0 replies; 7+ messages in thread
From: Eric Botcazou @ 2016-04-25 11:08 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

> Did you manage to do this yet?  I'm flushing my stage1 queue of
> "simple cleanups" right now.

No, I'm going to have a look this week.

-- 
Eric Botcazou


* Re: [PATCH][GCC7] Remove scaling of COMPONENT_REF/ARRAY_REF ops 2/3
  2016-02-18 14:30 [PATCH][GCC7] Remove scaling of COMPONENT_REF/ARRAY_REF ops 2/3 Richard Biener
  2016-02-19  8:33 ` Eric Botcazou
@ 2016-05-02  9:13 ` Eric Botcazou
  2016-05-02  9:42   ` Richard Biener
  1 sibling, 1 reply; 7+ messages in thread
From: Eric Botcazou @ 2016-05-02  9:13 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

> The following experiment resulted from looking at making
> array_ref_low_bound and array_ref_element_size non-mutating.  Again
> I wondered why we do this strange scaling by offset/element alignment.

The idea is to expose the alignment factor to the RTL expander:

	tree tem
	  = get_inner_reference (exp, &bitsize, &bitpos, &offset, &mode1,
				 &unsignedp, &reversep, &volatilep, true);

[...]

	    rtx offset_rtx = expand_expr (offset, NULL_RTX, VOIDmode,
					  EXPAND_SUM);

[...]

	    op0 = offset_address (op0, offset_rtx,
				  highest_pow2_factor (offset));

With the scaling, offset is something like _69 * 4, so highest_pow2_factor can
see the factor and pass it down to offset_address:

(gdb) p debug_rtx(op0)
(mem/c:SI (plus:SI (reg/f:SI 193)
        (reg:SI 194)) [3 *s.16_63 S4 A32])

With your patch in the same situation:

(gdb) p debug_rtx(op0)
(mem/c:SI (plus:SI (reg/f:SI 139)
        (reg:SI 116 [ _33 ])) [3 *s.16_63 S4 A8])
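
To make that concrete, here is a minimal sketch of the factor the expander
can deduce from the offset tree alone (simplified, not the actual
highest_pow2_factor code; the helper name is made up):

static unsigned HOST_WIDE_INT
sketch_offset_factor (tree offset)
{
  /* Scaled case: offset is e.g. _69 * 4, the constant factor is
     syntactically visible, so the MEM ends up with A32.  */
  if (TREE_CODE (offset) == MULT_EXPR
      && tree_fits_uhwi_p (TREE_OPERAND (offset, 1)))
    {
      unsigned HOST_WIDE_INT c = tree_to_uhwi (TREE_OPERAND (offset, 1));
      return c & -c;   /* lowest set bit, i.e. the power-of-two factor */
    }
  /* Unscaled case: offset is a bare SSA name like _33, nothing is known
     from the tree itself, so only A8 remains - unless the nonzero-bits
     info from CCP rescues it, see point 2 below.  */
  return 1;
}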

On strict-alignment targets, this makes a big difference, e.g. SPARC:

	ld	[%i4+%i5], %i0

vs

	ldub	[%i5+%i4], %g1
	sll	%g1, 24, %g1
	add	%i5, %i4, %i5
	ldub	[%i5+1], %i0
	sll	%i0, 16, %i0
	or	%i0, %g1, %i0
	ldub	[%i5+2], %g1
	sll	%g1, 8, %g1
	or	%g1, %i0, %g1
	ldub	[%i5+3], %i0
	or	%i0, %g1, %i0


Now this is mitigated by a couple of things:

  1. the above pessimization only happens on the RHS; on the LHS, the expander 
calls highest_pow2_factor_for_target instead of highest_pow2_factor and the 
former takes into account the type's alignment thanks to the MAX:

/* Similar, except that the alignment requirements of TARGET are
   taken into account.  Assume it is at least as aligned as its
   type, unless it is a COMPONENT_REF in which case the layout of
   the structure gives the alignment.  */

static unsigned HOST_WIDE_INT
highest_pow2_factor_for_target (const_tree target, const_tree exp)
{
  unsigned HOST_WIDE_INT talign = target_align (target) / BITS_PER_UNIT;
  unsigned HOST_WIDE_INT factor = highest_pow2_factor (exp);

  return MAX (factor, talign);
}

  2. highest_pow2_factor can be rescued by the set_nonzero_bits machinery of 
the SSA CCP pass because it calls tree_ctz.  The above example was compiled 
with -O -fno-tree-ccp on SPARC; at -O, the code isn't pessimized.
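
Roughly, that rescue looks like this (a sketch of the arithmetic only,
assuming the offset _33 from the dump above is defined as some _5 * 4
upstream and that CCP has recorded the nonzero bits for it):

  _33 = _5 * 4;                       # a multiple of 4, but not visible in the ref
  nonzero_bits (_33) = 0x...fffffffc  # CCP bit-tracking: low 2 bits known zero
  tree_ctz (_33) = 2                  # trailing zeros taken from that mask
  highest_pow2_factor (_33) = 1 << 2  # = 4, so offset_address gets A32 again at -O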

> So - the following patch gets rid of that scaling.  For a "simple"
> C testcase
> 
> void bar (void *);
> void foo (int n)
> {
>   struct S { struct R { int b[n]; } a[2]; int k; } s;
>   s.k = 1;
>   s.a[1].b[7] = 3;
>   bar (&s);
> }

This only exposes the LHS case; here's a more complete testcase:

void bar (void *);

int foo (int n)
{
  struct S { struct R { char b[n]; } a[2]; int k; } s;
  s.k = 1;
  s.a[1].b[7] = 3;
  bar (&s);
  return s.k;
}

-- 
Eric Botcazou


* Re: [PATCH][GCC7] Remove scaling of COMPONENT_REF/ARRAY_REF ops 2/3
  2016-05-02  9:13 ` Eric Botcazou
@ 2016-05-02  9:42   ` Richard Biener
  2016-05-03  8:39     ` Eric Botcazou
  0 siblings, 1 reply; 7+ messages in thread
From: Richard Biener @ 2016-05-02  9:42 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc-patches

On Mon, 2 May 2016, Eric Botcazou wrote:

> > The following experiment resulted from looking at making
> > array_ref_low_bound and array_ref_element_size non-mutating.  Again
> > I wondered why we do this strange scaling by offset/element alignment.
> 
> The idea is to expose the alignment factor to the RTL expander:
> 
> 	tree tem
> 	  = get_inner_reference (exp, &bitsize, &bitpos, &offset, &mode1,
> 				 &unsignedp, &reversep, &volatilep, true);
> 
> [...]
> 
> 	    rtx offset_rtx = expand_expr (offset, NULL_RTX, VOIDmode,
> 					  EXPAND_SUM);
> 
> [...]
> 
> 	    op0 = offset_address (op0, offset_rtx,
> 				  highest_pow2_factor (offset));
> 
> With the scaling, offset is something like _69 * 4 so highest_pow2_factor can 
> see the factor and passes it down to offset_address:
> 
> (gdb) p debug_rtx(op0)
> (mem/c:SI (plus:SI (reg/f:SI 193)
>         (reg:SI 194)) [3 *s.16_63 S4 A32])
> 
> With your patch in the same situation:
> 
> (gdb) p debug_rtx(op0)
> (mem/c:SI (plus:SI (reg/f:SI 139)
>         (reg:SI 116 [ _33 ])) [3 *s.16_63 S4 A8])
> 
> On strict-alignment targets, this makes a big difference, e.g. SPARC:
> 
> 	ld	[%i4+%i5], %i0
> 
> vs
> 
> 	ldub	[%i5+%i4], %g1
> 	sll	%g1, 24, %g1
> 	add	%i5, %i4, %i5
> 	ldub	[%i5+1], %i0
> 	sll	%i0, 16, %i0
> 	or	%i0, %g1, %i0
> 	ldub	[%i5+2], %g1
> 	sll	%g1, 8, %g1
> 	or	%g1, %i0, %g1
> 	ldub	[%i5+3], %i0
> 	or	%i0, %g1, %i0
> 
> 
> Now this is mitigated by a couple of things:
> 
>   1. the above pessimization only happens on the RHS; on the LHS, the expander 
> calls highest_pow2_factor_for_target instead of highest_pow2_factor and the 
> former takes into account the type's alignment thanks to the MAX:
> 
> /* Similar, except that the alignment requirements of TARGET are
>    taken into account.  Assume it is at least as aligned as its
>    type, unless it is a COMPONENT_REF in which case the layout of
>    the structure gives the alignment.  */
> 
> static unsigned HOST_WIDE_INT
> highest_pow2_factor_for_target (const_tree target, const_tree exp)
> {
>   unsigned HOST_WIDE_INT talign = target_align (target) / BITS_PER_UNIT;
>   unsigned HOST_WIDE_INT factor = highest_pow2_factor (exp);
> 
>   return MAX (factor, talign);
> }
> 
>   2. highest_pow2_factor can be rescued by the set_nonzero_bits machinery of 
> the SSA CCP pass because it calls tree_ctz.  The above example was compiled 
> with -O -fno-tree-ccp on SPARC; at -O, the code isn't pessimized.
> 
> > So - the following patch gets rid of that scaling.  For a "simple"
> > C testcase
> > 
> > void bar (void *);
> > void foo (int n)
> > {
> >   struct S { struct R { int b[n]; } a[2]; int k; } s;
> >   s.k = 1;
> >   s.a[1].b[7] = 3;
> >   bar (&s);
> > }
> 
> This only exposes the LHS case, here's a more complete testcase:
> 
> void bar (void *);
> 
> int foo (int n)
> {
>   struct S { struct R { char b[n]; } a[2]; int k; } s;
>   s.k = 1;
>   s.a[1].b[7] = 3;
>   bar (&s);
>   return s.k;
> }

Ok.  Note that on x86_64 at least SLSR wrecks the component-ref case:

@@ -30,10 +34,14 @@
   _9 = _8 + 4;
   s.1_11 = __builtin_alloca_with_align (_9, 32);
   _12 = _7 >> 2;
-  s.1_11->k{off: _12 * 4} = 1;
+  _18 = _12 * 4;
+  _19 = s.1_11 + _18;
+  MEM[(struct S *)_19] = 1;
   s.1_11->a[1]{lb: 0 sz: _5}.b[7] = 3;
   bar (s.1_11);
-  _16 = s.1_11->k{off: _12 * 4};
+  _20 = _12 * 4;
+  _21 = s.1_11 + _20;
+  _16 = MEM[(struct S *)_21];
   __builtin_stack_restore (saved_stack.3_3);
   return _16;

It seems to me that the issue in the end is that we compute
alignment from the pieces gathered by get_inner_reference
instead of computing it alongside that information in
get_inner_reference itself, taking advantage of DECL_OFFSET_ALIGN
and the array element type alignment there.  This would be
another opportunity to merge get_inner_reference and
get_object_alignment_2 (or have a common worker).

That we expose the exact division in the IL has unfortunate side-effects
like above.

Do I understand you correctly that without using -fno-tree-ccp there
are currently no regressions?  What about -O0 then?  The code
generated by -O0 on x86_64 currently is quite horrible of course,
so maybe we don't care too much...  I think -Og disables CCP's
bit-tracking though.

I see that the constant offset parts are only sometimes folded
into op0 before calling offset_address with the variable offset parts.
Now with get_object_alignment_2 computing alignment and misalignment
one could split out the misaligning offset from bitpos in some way
to have op0 always be aligned.

Something like

Index: gcc/expr.c
===================================================================
--- gcc/expr.c  (revision 235706)
+++ gcc/expr.c  (working copy)
@@ -10334,6 +10334,9 @@ expand_expr_real_1 (tree exp, rtx target
                offset_rtx = convert_to_mode (address_mode, offset_rtx, 0);
              }
 
+           get_object_alignment_2 (exp, &align, &misalign, false);
+           bitpos -= misalign;
+
            /* See the comment in expand_assignment for the rationale.  */
            if (mode1 != VOIDmode
                && bitpos != 0
@@ -10348,6 +10351,7 @@ expand_expr_real_1 (tree exp, rtx target
 
            op0 = offset_address (op0, offset_rtx,
                                  highest_pow2_factor (offset));
+           set_mem_align (op0, align);
          }
 
        /* If OFFSET is making OP0 more aligned than BIGGEST_ALIGNMENT,

which of course still would need adjustments to get_object_alignment_2
and get_inner_reference to recover alignment info by looking at
DECL_OFFSET_ALIGN and friends.

For the moment (as my original motivation was to get rid of the
requirement to call component_ref_field_offset and friends
from GIMPLE parts of the compiler to make them non-tree-mutating
but that has turned out to be impossible or at least rather difficult)
I'm leaving things as-is.

Thanks,
Richard.


* Re: [PATCH][GCC7] Remove scaling of COMPONENT_REF/ARRAY_REF ops 2/3
  2016-05-02  9:42   ` Richard Biener
@ 2016-05-03  8:39     ` Eric Botcazou
  0 siblings, 0 replies; 7+ messages in thread
From: Eric Botcazou @ 2016-05-03  8:39 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

> It seems to me that the issue in the end is that we compute
> alignment from the pieces gathered by get_inner_reference
> instead of computing it alongside that information in
> get_inner_reference itself, taking advantage of DECL_OFFSET_ALIGN
> and the array element type alignment there.  This would be
> another opportunity to merge get_inner_reference and
> get_object_alignment_2 (or have a common worker).

Yes, the problem is the way we extract the alignment information in the RTL 
expander by means of get_inner_reference.

> Do I understand you correctly that without using -fno-tree-ccp there
> are currently no regressions?  What about -O0 then?  The code
> generated by -O0 on x86_64 currently is quite horrible of course,
> so maybe we don't care too much...  I think -Og disables CCPs
> bit-tracking though.

There are no regressions at -O, at least in simple cases.  Of course -O0 is 
also pessimized because CCP bit-tracking isn't enabled at -O0.

-- 
Eric Botcazou

