VRP causing extra register usage

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* VRP causing extra register usage
@ 2015-11-12 19:10 Senthil Kumar Selvaraj
  2015-11-12 19:37 ` Richard Biener
  0 siblings, 1 reply; 4+ messages in thread
From: Senthil Kumar Selvaraj @ 2015-11-12 19:10 UTC (permalink / raw)
  To: gcc; +Cc: law, rguenther, amacleod

Hi,

  When analyzing code size differences between a 4.9.x compiler and
  trunk for the AVR target, I found quite a few cases where extra
  registers were being used to hold smaller types (two 8 bit registers
  for a uint8_t, for example).

  On deeper analysis, I found that the VRP pass (gcc/tree-vrp.c) was the point
  at which the dumps start to diverge.

  For code like this

#include <stdint.h>

uint16_t get(const uint16_t wvalue)
{
  const uint8_t type = (wvalue >> 8);
  uint16_t size = 0;

  if (type == 1)
  {
    size = 20;
  }
  else if (type == 2)
  {
    size = 32;
  }
  return size;
}

  VRP substitutes wvalue for the type variable in the conditionals 
  (simplify_cond_using_ranges:9506), as it figures out that the range 
  of wvalue is the same as a uint8_t). That is, code goes from

<bb 2>:
_3 = wvalue_2(D) >> 8;
type_4 = (const uint8_t) _3;
if (type_4 == 1)
  goto <bb 5>;
else
  goto <bb 3>;

to

<bb 2>:
_3 = wvalue_2(D) >> 8;
type_4 = (const uint8_t) _3;
if (_3 == 1)
  goto <bb 5>;
else
  goto <bb 3>;

  This "widening" causes later passes to allocate extra registers 
  (holding zeros for the extra bits) and generate extra comparisons
  for the AVR target.

  I found that in the 4.9.2 compiler I was benchmarking against, VRP
  didn't figure out the range for wvalue and therefore the folding
  didn't happen, which was why the code was better.

  With the native compiler on my machine (gcc 5.2 x86_64) - replacing 
  uint8_t by uint32_t and uint16_t by uint64_t, and shifting right by 
  32 bits instead of 8 shows the same effect - the generated code uses
  rdi instead of just di to hold the type variable.

  Is this a bug? Should the folding happen only if the type
  conversion was from a smaller type to a bigger one? Or is the backend
  supposed to detect this pattern and deal with it?

Regards
Senthil


details-raw vrp1 dump

;; Function get (get, funcdef_no=0, decl_uid=1522, cgraph_uid=0, symbol_order=0)

;; 1 loops found
;;
;; Loop 0
;;  header 0, latch 1
;;  depth 0, outer -1
;;  nodes: 0 1 2 3 4 5
;; 2 succs { 5 3 }
;; 3 succs { 4 5 }
;; 4 succs { 5 }
;; 5 succs { 1 }

ASSERT_EXPRs to be inserted

Assertions to be inserted for type_4
	if (type_4 == 1)

	BB #3
	EDGE 2->3 2 [55.9%]  (FALSE_VALUE,EXECUTABLE)
	PREDICATE: type_4 ne_expr 1




Updating SSA:
Registering new PHI nodes in block #2
Updating SSA information for statement type_4 = (const uint8_t) _3;
Updating SSA information for statement if (type_4 == 1)
Registering new PHI nodes in block #3
Updating SSA information for statement type_6 = ASSERT_EXPR <type_4, type_4 != 1>;
Updating SSA information for statement if (type_4 == 2)
Registering new PHI nodes in block #4
Registering new PHI nodes in block #5

SSA replacement table
N_i -> { O_1 ... O_j } means that N_i replaces O_1, ..., O_j

type_6 -> { type_4 }
Incremental SSA update started at block: 2
Number of blocks in CFG: 6
Number of blocks to update: 2 ( 33%)
Affected blocks: 2 3



SSA form after inserting ASSERT_EXPRs
get (const uint16_t wvalue, const uint8_t windex, const void * * const address)
{
  uint16_t size;
  const uint8_t type;
  unsigned int _3;

  <bb 2>:
  _3 = wvalue_2(D) >> 8;
  type_4 = (const uint8_t) _3;
  if (type_4 == 1)
    goto <bb 5>;
  else
    goto <bb 3>;

  <bb 3>:
  type_6 = ASSERT_EXPR <type_4, type_4 != 1>;
  if (type_6 == 2)
    goto <bb 4>;
  else
    goto <bb 5>;

  <bb 4>:

  <bb 5>:
  # size_1 = PHI <20(2), 0(3), 32(4)>
  return size_1;

}


Immediate_uses: 

size_1 : --> single use.
return size_1;

wvalue_2(D) : --> single use.
_3 = wvalue_2(D) >> 8;

_3 : --> single use.
type_4 = (const uint8_t) _3;

type_4 : -->3 uses.
type_6 = ASSERT_EXPR <type_4, type_4 != 1>;
type_6 = ASSERT_EXPR <type_4, type_4 != 1>;
if (type_4 == 1)

.MEM_5(D) : --> single use.
# VUSE <.MEM_5(D)>
return size_1;

type_6 : --> single use.
if (type_6 == 2)

Adding destination of edge (0 -> 2) to worklist

Simulating block 2

Visiting statement:
_3 = wvalue_2(D) >> 8;
Intersecting
  [0, 255]
and
  [0, +INF]
to
  [0, 255]
Found new range for _3: [0, 255]
interesting_ssa_edges: adding SSA use in type_4 = (const uint8_t) _3;
marking stmt to be not simulated again

Visiting statement:
type_4 = (const uint8_t) _3;
Found new range for type_4: [0, +INF]
interesting_ssa_edges: adding SSA use in type_6 = ASSERT_EXPR <type_4, type_4 != 1>;
interesting_ssa_edges: adding SSA use in if (type_4 == 1)
marking stmt to be not simulated again

Visiting statement:
if (type_4 == 1)

Visiting conditional with predicate: if (type_4 == 1)

With known ranges
	type_4: [0, +INF]

Predicate evaluates to: DON'T KNOW
Adding destination of edge (2 -> 5) to worklist
Adding destination of edge (2 -> 3) to worklist

Simulating block 3

Visiting statement:
type_6 = ASSERT_EXPR <type_4, type_4 != 1>;
Intersecting
  ~[1, 1]  EQUIVALENCES: { type_4 } (1 elements)
and
  [0, +INF]
to
  ~[1, 1]  EQUIVALENCES: { type_4 } (1 elements)
Found new range for type_6: ~[1, 1]
interesting_ssa_edges: adding SSA use in if (type_6 == 2)
marking stmt to be not simulated again

Visiting statement:
if (type_6 == 2)

Visiting conditional with predicate: if (type_6 == 2)

With known ranges
	type_6: ~[1, 1]  EQUIVALENCES: { type_4 } (1 elements)

Predicate evaluates to: DON'T KNOW
Adding destination of edge (3 -> 4) to worklist

Simulating block 4

Simulating block 5

Visiting PHI node: size_1 = PHI <20(2), 0(3), 32(4)>
    Argument #0 (2 -> 5 executable)
	20: [20, 20]
    Argument #1 (3 -> 5 executable)
	0: [0, 0]
Meeting
  [20, 20]
and
  [0, 0]
to
  [0, 20]
    Argument #2 (4 -> 5 executable)
	32: [32, 32]
Meeting
  [0, 20]
and
  [32, 32]
to
  [0, 32]
Intersecting
  [0, 32]
and
  [0, +INF]
to
  [0, 32]
Found new range for size_1: [0, 32]
interesting_ssa_edges: adding SSA use in return size_1;
marking stmt to be not simulated again

Visiting statement:
return size_1;

Value ranges after VRP:

size_1: [0, 32]
wvalue_2(D): VARYING
_3: [0, 255]
type_4: [0, +INF]
type_6: ~[1, 1]  EQUIVALENCES: { type_4 } (1 elements)



Substituting values and folding statements

Folding statement: _3 = wvalue_2(D) >> 8;
Not folded
Folding statement: type_4 = (const uint8_t) _3;
Not folded
Folding statement: if (type_4 == 1)
Folded into: if (_3 == 1)

Folding statement: if (type_6 == 2)
Not folded
Folding PHI node: size_1 = PHI <20(2), 0(3), 32(4)>
No folding possible
Folding statement: return size_1;
Not folded
get (const uint16_t wvalue, const uint8_t windex, const void * * const address)
{
  uint16_t size;
  const uint8_t type;
  unsigned int _3;

  <bb 2>:
  _3 = wvalue_2(D) >> 8;
  type_4 = (const uint8_t) _3;
  if (_3 == 1)
    goto <bb 5>;
  else
    goto <bb 3>;

  <bb 3>:
  if (type_4 == 2)
    goto <bb 4>;
  else
    goto <bb 5>;

  <bb 4>:

  <bb 5>:
  # size_1 = PHI <20(2), 0(3), 32(4)>
  return size_1;

}



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: VRP causing extra register usage
  2015-11-12 19:10 VRP causing extra register usage Senthil Kumar Selvaraj
@ 2015-11-12 19:37 ` Richard Biener
  2015-11-13 13:36   ` Senthil Kumar Selvaraj
  0 siblings, 1 reply; 4+ messages in thread
From: Richard Biener @ 2015-11-12 19:37 UTC (permalink / raw)
  To: Senthil Kumar Selvaraj, gcc; +Cc: law, rguenther, amacleod

On November 12, 2015 8:10:05 PM GMT+01:00, Senthil Kumar Selvaraj <senthil_kumar.selvaraj@atmel.com> wrote:
>Hi,
>
>  When analyzing code size differences between a 4.9.x compiler and
>  trunk for the AVR target, I found quite a few cases where extra
>  registers were being used to hold smaller types (two 8 bit registers
>  for a uint8_t, for example).
>
>On deeper analysis, I found that the VRP pass (gcc/tree-vrp.c) was the
>point
>  at which the dumps start to diverge.
>
>  For code like this
>
>#include <stdint.h>
>
>uint16_t get(const uint16_t wvalue)
>{
>  const uint8_t type = (wvalue >> 8);
>  uint16_t size = 0;
>
>  if (type == 1)
>  {
>    size = 20;
>  }
>  else if (type == 2)
>  {
>    size = 32;
>  }
>  return size;
>}
>
>  VRP substitutes wvalue for the type variable in the conditionals 
>  (simplify_cond_using_ranges:9506), as it figures out that the range 
>  of wvalue is the same as a uint8_t). That is, code goes from
>
><bb 2>:
>_3 = wvalue_2(D) >> 8;
>type_4 = (const uint8_t) _3;
>if (type_4 == 1)
>  goto <bb 5>;
>else
>  goto <bb 3>;
>
>to
>
><bb 2>:
>_3 = wvalue_2(D) >> 8;
>type_4 = (const uint8_t) _3;
>if (_3 == 1)
>  goto <bb 5>;
>else
>  goto <bb 3>;
>
>  This "widening" causes later passes to allocate extra registers 
>  (holding zeros for the extra bits) and generate extra comparisons
>  for the AVR target.
>
>  I found that in the 4.9.2 compiler I was benchmarking against, VRP
>  didn't figure out the range for wvalue and therefore the folding
>  didn't happen, which was why the code was better.
>
>  With the native compiler on my machine (gcc 5.2 x86_64) - replacing 
>  uint8_t by uint32_t and uint16_t by uint64_t, and shifting right by 
>  32 bits instead of 8 shows the same effect - the generated code uses
>  rdi instead of just di to hold the type variable.
>
>  Is this a bug? Should the folding happen only if the type
>  conversion was from a smaller type to a bigger one? Or is the backend
>  supposed to detect this pattern and deal with it?

We should probably avoid widening beyond word_mode (or sth else if there is sth suitable).

Richard.

>Regards
>Senthil
>
>
>details-raw vrp1 dump
>
>;; Function get (get, funcdef_no=0, decl_uid=1522, cgraph_uid=0,
>symbol_order=0)
>
>;; 1 loops found
>;;
>;; Loop 0
>;;  header 0, latch 1
>;;  depth 0, outer -1
>;;  nodes: 0 1 2 3 4 5
>;; 2 succs { 5 3 }
>;; 3 succs { 4 5 }
>;; 4 succs { 5 }
>;; 5 succs { 1 }
>
>ASSERT_EXPRs to be inserted
>
>Assertions to be inserted for type_4
>	if (type_4 == 1)
>
>	BB #3
>	EDGE 2->3 2 [55.9%]  (FALSE_VALUE,EXECUTABLE)
>	PREDICATE: type_4 ne_expr 1
>
>
>
>
>Updating SSA:
>Registering new PHI nodes in block #2
>Updating SSA information for statement type_4 = (const uint8_t) _3;
>Updating SSA information for statement if (type_4 == 1)
>Registering new PHI nodes in block #3
>Updating SSA information for statement type_6 = ASSERT_EXPR <type_4,
>type_4 != 1>;
>Updating SSA information for statement if (type_4 == 2)
>Registering new PHI nodes in block #4
>Registering new PHI nodes in block #5
>
>SSA replacement table
>N_i -> { O_1 ... O_j } means that N_i replaces O_1, ..., O_j
>
>type_6 -> { type_4 }
>Incremental SSA update started at block: 2
>Number of blocks in CFG: 6
>Number of blocks to update: 2 ( 33%)
>Affected blocks: 2 3
>
>
>
>SSA form after inserting ASSERT_EXPRs
>get (const uint16_t wvalue, const uint8_t windex, const void * * const
>address)
>{
>  uint16_t size;
>  const uint8_t type;
>  unsigned int _3;
>
>  <bb 2>:
>  _3 = wvalue_2(D) >> 8;
>  type_4 = (const uint8_t) _3;
>  if (type_4 == 1)
>    goto <bb 5>;
>  else
>    goto <bb 3>;
>
>  <bb 3>:
>  type_6 = ASSERT_EXPR <type_4, type_4 != 1>;
>  if (type_6 == 2)
>    goto <bb 4>;
>  else
>    goto <bb 5>;
>
>  <bb 4>:
>
>  <bb 5>:
>  # size_1 = PHI <20(2), 0(3), 32(4)>
>  return size_1;
>
>}
>
>
>Immediate_uses: 
>
>size_1 : --> single use.
>return size_1;
>
>wvalue_2(D) : --> single use.
>_3 = wvalue_2(D) >> 8;
>
>_3 : --> single use.
>type_4 = (const uint8_t) _3;
>
>type_4 : -->3 uses.
>type_6 = ASSERT_EXPR <type_4, type_4 != 1>;
>type_6 = ASSERT_EXPR <type_4, type_4 != 1>;
>if (type_4 == 1)
>
>.MEM_5(D) : --> single use.
># VUSE <.MEM_5(D)>
>return size_1;
>
>type_6 : --> single use.
>if (type_6 == 2)
>
>Adding destination of edge (0 -> 2) to worklist
>
>Simulating block 2
>
>Visiting statement:
>_3 = wvalue_2(D) >> 8;
>Intersecting
>  [0, 255]
>and
>  [0, +INF]
>to
>  [0, 255]
>Found new range for _3: [0, 255]
>interesting_ssa_edges: adding SSA use in type_4 = (const uint8_t) _3;
>marking stmt to be not simulated again
>
>Visiting statement:
>type_4 = (const uint8_t) _3;
>Found new range for type_4: [0, +INF]
>interesting_ssa_edges: adding SSA use in type_6 = ASSERT_EXPR <type_4,
>type_4 != 1>;
>interesting_ssa_edges: adding SSA use in if (type_4 == 1)
>marking stmt to be not simulated again
>
>Visiting statement:
>if (type_4 == 1)
>
>Visiting conditional with predicate: if (type_4 == 1)
>
>With known ranges
>	type_4: [0, +INF]
>
>Predicate evaluates to: DON'T KNOW
>Adding destination of edge (2 -> 5) to worklist
>Adding destination of edge (2 -> 3) to worklist
>
>Simulating block 3
>
>Visiting statement:
>type_6 = ASSERT_EXPR <type_4, type_4 != 1>;
>Intersecting
>  ~[1, 1]  EQUIVALENCES: { type_4 } (1 elements)
>and
>  [0, +INF]
>to
>  ~[1, 1]  EQUIVALENCES: { type_4 } (1 elements)
>Found new range for type_6: ~[1, 1]
>interesting_ssa_edges: adding SSA use in if (type_6 == 2)
>marking stmt to be not simulated again
>
>Visiting statement:
>if (type_6 == 2)
>
>Visiting conditional with predicate: if (type_6 == 2)
>
>With known ranges
>	type_6: ~[1, 1]  EQUIVALENCES: { type_4 } (1 elements)
>
>Predicate evaluates to: DON'T KNOW
>Adding destination of edge (3 -> 4) to worklist
>
>Simulating block 4
>
>Simulating block 5
>
>Visiting PHI node: size_1 = PHI <20(2), 0(3), 32(4)>
>    Argument #0 (2 -> 5 executable)
>	20: [20, 20]
>    Argument #1 (3 -> 5 executable)
>	0: [0, 0]
>Meeting
>  [20, 20]
>and
>  [0, 0]
>to
>  [0, 20]
>    Argument #2 (4 -> 5 executable)
>	32: [32, 32]
>Meeting
>  [0, 20]
>and
>  [32, 32]
>to
>  [0, 32]
>Intersecting
>  [0, 32]
>and
>  [0, +INF]
>to
>  [0, 32]
>Found new range for size_1: [0, 32]
>interesting_ssa_edges: adding SSA use in return size_1;
>marking stmt to be not simulated again
>
>Visiting statement:
>return size_1;
>
>Value ranges after VRP:
>
>size_1: [0, 32]
>wvalue_2(D): VARYING
>_3: [0, 255]
>type_4: [0, +INF]
>type_6: ~[1, 1]  EQUIVALENCES: { type_4 } (1 elements)
>
>
>
>Substituting values and folding statements
>
>Folding statement: _3 = wvalue_2(D) >> 8;
>Not folded
>Folding statement: type_4 = (const uint8_t) _3;
>Not folded
>Folding statement: if (type_4 == 1)
>Folded into: if (_3 == 1)
>
>Folding statement: if (type_6 == 2)
>Not folded
>Folding PHI node: size_1 = PHI <20(2), 0(3), 32(4)>
>No folding possible
>Folding statement: return size_1;
>Not folded
>get (const uint16_t wvalue, const uint8_t windex, const void * * const
>address)
>{
>  uint16_t size;
>  const uint8_t type;
>  unsigned int _3;
>
>  <bb 2>:
>  _3 = wvalue_2(D) >> 8;
>  type_4 = (const uint8_t) _3;
>  if (_3 == 1)
>    goto <bb 5>;
>  else
>    goto <bb 3>;
>
>  <bb 3>:
>  if (type_4 == 2)
>    goto <bb 4>;
>  else
>    goto <bb 5>;
>
>  <bb 4>:
>
>  <bb 5>:
>  # size_1 = PHI <20(2), 0(3), 32(4)>
>  return size_1;
>
>}


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: VRP causing extra register usage
  2015-11-12 19:37 ` Richard Biener
@ 2015-11-13 13:36   ` Senthil Kumar Selvaraj
  2015-11-13 15:15     ` Richard Biener
  0 siblings, 1 reply; 4+ messages in thread
From: Senthil Kumar Selvaraj @ 2015-11-13 13:36 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc, law, rguenther, amacleod


On Thu, Nov 12, 2015 at 08:37:02PM +0100, Richard Biener wrote:
> On November 12, 2015 8:10:05 PM GMT+01:00, Senthil Kumar Selvaraj <senthil_kumar.selvaraj@atmel.com> wrote:
> >Hi,
> >
> >  When analyzing code size differences between a 4.9.x compiler and
> >  trunk for the AVR target, I found quite a few cases where extra
> >  registers were being used to hold smaller types (two 8 bit registers
> >  for a uint8_t, for example).
> >
> >On deeper analysis, I found that the VRP pass (gcc/tree-vrp.c) was the
> >point
> >  at which the dumps start to diverge.
> >
> >  For code like this
> >
> >#include <stdint.h>
> >
> >uint16_t get(const uint16_t wvalue)
> >{
> >  const uint8_t type = (wvalue >> 8);
> >  uint16_t size = 0;
> >
> >  if (type == 1)
> >  {
> >    size = 20;
> >  }
> >  else if (type == 2)
> >  {
> >    size = 32;
> >  }
> >  return size;
> >}
> >
> >  VRP substitutes wvalue for the type variable in the conditionals 
> >  (simplify_cond_using_ranges:9506), as it figures out that the range 
> >  of wvalue is the same as a uint8_t). That is, code goes from
> >
> ><bb 2>:
> >_3 = wvalue_2(D) >> 8;
> >type_4 = (const uint8_t) _3;
> >if (type_4 == 1)
> >  goto <bb 5>;
> >else
> >  goto <bb 3>;
> >
> >to
> >
> ><bb 2>:
> >_3 = wvalue_2(D) >> 8;
> >type_4 = (const uint8_t) _3;
> >if (_3 == 1)
> >  goto <bb 5>;
> >else
> >  goto <bb 3>;
> >
> >  This "widening" causes later passes to allocate extra registers 
> >  (holding zeros for the extra bits) and generate extra comparisons
> >  for the AVR target.
> >
> >  I found that in the 4.9.2 compiler I was benchmarking against, VRP
> >  didn't figure out the range for wvalue and therefore the folding
> >  didn't happen, which was why the code was better.
> >
> >  With the native compiler on my machine (gcc 5.2 x86_64) - replacing 
> >  uint8_t by uint32_t and uint16_t by uint64_t, and shifting right by 
> >  32 bits instead of 8 shows the same effect - the generated code uses
> >  rdi instead of just di to hold the type variable.
> >
> >  Is this a bug? Should the folding happen only if the type
> >  conversion was from a smaller type to a bigger one? Or is the backend
> >  supposed to detect this pattern and deal with it?
> 
> We should probably avoid widening beyond word_mode (or sth else if there is sth suitable).
> 

Hmm, that does fix this problem. The below patch allows folding only if
the operand size is smaller or equal to a word.

diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index e2393e4..3f5668c 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -9467,6 +9467,8 @@ simplify_cond_using_ranges (gcond *stmt)
       innerop = gimple_assign_rhs1 (def_stmt);
 
       if (TREE_CODE (innerop) == SSA_NAME
+    && (GET_MODE_SIZE(TYPE_MODE(TREE_TYPE(innerop)))
+      <= GET_MODE_SIZE(word_mode))
          && !POINTER_TYPE_P (TREE_TYPE (innerop)))
        {
          value_range *vr = get_value_range (innerop);

However, if the type conversion was from a smaller type to a larger type, and the
smaller type was bigger than a word, this patch results in worse code. For example.

#include <stdint.h>

uint16_t get(const uint16_t wvalue)
{
  const uint32_t type = (wvalue >> 8);
  uint16_t size = 0;

  if (type == 1)
  {
    size = 20;
  }
  else if (type == 2)
  {
    size = 32;
  }
  return size;
}

Before the patch, the folding done caused type to be substituted with wvalue, and the
smaller size resulted in better code. After the patch, since wvalue is bigger than a 
word (for AVR), the folding doesn't happen and registers are allocated to hold all 32
bits.

How does the below patch look? It does the folding only if it is beneficial i.e., the
value being substituted (innerop) is smaller than or equal the current one (op0)?

diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index e2393e4..3f5668c 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -9467,6 +9467,8 @@ simplify_cond_using_ranges (gcond *stmt)
       innerop = gimple_assign_rhs1 (def_stmt);
 
       if (TREE_CODE (innerop) == SSA_NAME
+    && (GET_MODE_SIZE(TYPE_MODE(TREE_TYPE(innerop)))
+      <= GET_MODE_SIZE(TYPE_MODE(TREE_TYPE(op0))))
          && !POINTER_TYPE_P (TREE_TYPE (innerop)))
        {
          value_range *vr = get_value_range (innerop);


Regards
Senthil

> Richard.
> 
> >Regards
> >Senthil
> >
> >
> >details-raw vrp1 dump
> >
> >;; Function get (get, funcdef_no=0, decl_uid=1522, cgraph_uid=0,
> >symbol_order=0)
> >
> >;; 1 loops found
> >;;
> >;; Loop 0
> >;;  header 0, latch 1
> >;;  depth 0, outer -1
> >;;  nodes: 0 1 2 3 4 5
> >;; 2 succs { 5 3 }
> >;; 3 succs { 4 5 }
> >;; 4 succs { 5 }
> >;; 5 succs { 1 }
> >
> >ASSERT_EXPRs to be inserted
> >
> >Assertions to be inserted for type_4
> >	if (type_4 == 1)
> >
> >	BB #3
> >	EDGE 2->3 2 [55.9%]  (FALSE_VALUE,EXECUTABLE)
> >	PREDICATE: type_4 ne_expr 1
> >
> >
> >
> >
> >Updating SSA:
> >Registering new PHI nodes in block #2
> >Updating SSA information for statement type_4 = (const uint8_t) _3;
> >Updating SSA information for statement if (type_4 == 1)
> >Registering new PHI nodes in block #3
> >Updating SSA information for statement type_6 = ASSERT_EXPR <type_4,
> >type_4 != 1>;
> >Updating SSA information for statement if (type_4 == 2)
> >Registering new PHI nodes in block #4
> >Registering new PHI nodes in block #5
> >
> >SSA replacement table
> >N_i -> { O_1 ... O_j } means that N_i replaces O_1, ..., O_j
> >
> >type_6 -> { type_4 }
> >Incremental SSA update started at block: 2
> >Number of blocks in CFG: 6
> >Number of blocks to update: 2 ( 33%)
> >Affected blocks: 2 3
> >
> >
> >
> >SSA form after inserting ASSERT_EXPRs
> >get (const uint16_t wvalue, const uint8_t windex, const void * * const
> >address)
> >{
> >  uint16_t size;
> >  const uint8_t type;
> >  unsigned int _3;
> >
> >  <bb 2>:
> >  _3 = wvalue_2(D) >> 8;
> >  type_4 = (const uint8_t) _3;
> >  if (type_4 == 1)
> >    goto <bb 5>;
> >  else
> >    goto <bb 3>;
> >
> >  <bb 3>:
> >  type_6 = ASSERT_EXPR <type_4, type_4 != 1>;
> >  if (type_6 == 2)
> >    goto <bb 4>;
> >  else
> >    goto <bb 5>;
> >
> >  <bb 4>:
> >
> >  <bb 5>:
> >  # size_1 = PHI <20(2), 0(3), 32(4)>
> >  return size_1;
> >
> >}
> >
> >
> >Immediate_uses: 
> >
> >size_1 : --> single use.
> >return size_1;
> >
> >wvalue_2(D) : --> single use.
> >_3 = wvalue_2(D) >> 8;
> >
> >_3 : --> single use.
> >type_4 = (const uint8_t) _3;
> >
> >type_4 : -->3 uses.
> >type_6 = ASSERT_EXPR <type_4, type_4 != 1>;
> >type_6 = ASSERT_EXPR <type_4, type_4 != 1>;
> >if (type_4 == 1)
> >
> >.MEM_5(D) : --> single use.
> ># VUSE <.MEM_5(D)>
> >return size_1;
> >
> >type_6 : --> single use.
> >if (type_6 == 2)
> >
> >Adding destination of edge (0 -> 2) to worklist
> >
> >Simulating block 2
> >
> >Visiting statement:
> >_3 = wvalue_2(D) >> 8;
> >Intersecting
> >  [0, 255]
> >and
> >  [0, +INF]
> >to
> >  [0, 255]
> >Found new range for _3: [0, 255]
> >interesting_ssa_edges: adding SSA use in type_4 = (const uint8_t) _3;
> >marking stmt to be not simulated again
> >
> >Visiting statement:
> >type_4 = (const uint8_t) _3;
> >Found new range for type_4: [0, +INF]
> >interesting_ssa_edges: adding SSA use in type_6 = ASSERT_EXPR <type_4,
> >type_4 != 1>;
> >interesting_ssa_edges: adding SSA use in if (type_4 == 1)
> >marking stmt to be not simulated again
> >
> >Visiting statement:
> >if (type_4 == 1)
> >
> >Visiting conditional with predicate: if (type_4 == 1)
> >
> >With known ranges
> >	type_4: [0, +INF]
> >
> >Predicate evaluates to: DON'T KNOW
> >Adding destination of edge (2 -> 5) to worklist
> >Adding destination of edge (2 -> 3) to worklist
> >
> >Simulating block 3
> >
> >Visiting statement:
> >type_6 = ASSERT_EXPR <type_4, type_4 != 1>;
> >Intersecting
> >  ~[1, 1]  EQUIVALENCES: { type_4 } (1 elements)
> >and
> >  [0, +INF]
> >to
> >  ~[1, 1]  EQUIVALENCES: { type_4 } (1 elements)
> >Found new range for type_6: ~[1, 1]
> >interesting_ssa_edges: adding SSA use in if (type_6 == 2)
> >marking stmt to be not simulated again
> >
> >Visiting statement:
> >if (type_6 == 2)
> >
> >Visiting conditional with predicate: if (type_6 == 2)
> >
> >With known ranges
> >	type_6: ~[1, 1]  EQUIVALENCES: { type_4 } (1 elements)
> >
> >Predicate evaluates to: DON'T KNOW
> >Adding destination of edge (3 -> 4) to worklist
> >
> >Simulating block 4
> >
> >Simulating block 5
> >
> >Visiting PHI node: size_1 = PHI <20(2), 0(3), 32(4)>
> >    Argument #0 (2 -> 5 executable)
> >	20: [20, 20]
> >    Argument #1 (3 -> 5 executable)
> >	0: [0, 0]
> >Meeting
> >  [20, 20]
> >and
> >  [0, 0]
> >to
> >  [0, 20]
> >    Argument #2 (4 -> 5 executable)
> >	32: [32, 32]
> >Meeting
> >  [0, 20]
> >and
> >  [32, 32]
> >to
> >  [0, 32]
> >Intersecting
> >  [0, 32]
> >and
> >  [0, +INF]
> >to
> >  [0, 32]
> >Found new range for size_1: [0, 32]
> >interesting_ssa_edges: adding SSA use in return size_1;
> >marking stmt to be not simulated again
> >
> >Visiting statement:
> >return size_1;
> >
> >Value ranges after VRP:
> >
> >size_1: [0, 32]
> >wvalue_2(D): VARYING
> >_3: [0, 255]
> >type_4: [0, +INF]
> >type_6: ~[1, 1]  EQUIVALENCES: { type_4 } (1 elements)
> >
> >
> >
> >Substituting values and folding statements
> >
> >Folding statement: _3 = wvalue_2(D) >> 8;
> >Not folded
> >Folding statement: type_4 = (const uint8_t) _3;
> >Not folded
> >Folding statement: if (type_4 == 1)
> >Folded into: if (_3 == 1)
> >
> >Folding statement: if (type_6 == 2)
> >Not folded
> >Folding PHI node: size_1 = PHI <20(2), 0(3), 32(4)>
> >No folding possible
> >Folding statement: return size_1;
> >Not folded
> >get (const uint16_t wvalue, const uint8_t windex, const void * * const
> >address)
> >{
> >  uint16_t size;
> >  const uint8_t type;
> >  unsigned int _3;
> >
> >  <bb 2>:
> >  _3 = wvalue_2(D) >> 8;
> >  type_4 = (const uint8_t) _3;
> >  if (_3 == 1)
> >    goto <bb 5>;
> >  else
> >    goto <bb 3>;
> >
> >  <bb 3>:
> >  if (type_4 == 2)
> >    goto <bb 4>;
> >  else
> >    goto <bb 5>;
> >
> >  <bb 4>:
> >
> >  <bb 5>:
> >  # size_1 = PHI <20(2), 0(3), 32(4)>
> >  return size_1;
> >
> >}
> 
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: VRP causing extra register usage
  2015-11-13 13:36   ` Senthil Kumar Selvaraj
@ 2015-11-13 15:15     ` Richard Biener
  0 siblings, 0 replies; 4+ messages in thread
From: Richard Biener @ 2015-11-13 15:15 UTC (permalink / raw)
  To: Senthil Kumar Selvaraj, Richard Biener; +Cc: gcc, law, amacleod

On November 13, 2015 2:35:41 PM GMT+01:00, Senthil Kumar Selvaraj <senthil_kumar.selvaraj@atmel.com> wrote:
>
>On Thu, Nov 12, 2015 at 08:37:02PM +0100, Richard Biener wrote:
>> On November 12, 2015 8:10:05 PM GMT+01:00, Senthil Kumar Selvaraj
><senthil_kumar.selvaraj@atmel.com> wrote:
>> >Hi,
>> >
>> >  When analyzing code size differences between a 4.9.x compiler and
>> >  trunk for the AVR target, I found quite a few cases where extra
>> >  registers were being used to hold smaller types (two 8 bit
>registers
>> >  for a uint8_t, for example).
>> >
>> >On deeper analysis, I found that the VRP pass (gcc/tree-vrp.c) was
>the
>> >point
>> >  at which the dumps start to diverge.
>> >
>> >  For code like this
>> >
>> >#include <stdint.h>
>> >
>> >uint16_t get(const uint16_t wvalue)
>> >{
>> >  const uint8_t type = (wvalue >> 8);
>> >  uint16_t size = 0;
>> >
>> >  if (type == 1)
>> >  {
>> >    size = 20;
>> >  }
>> >  else if (type == 2)
>> >  {
>> >    size = 32;
>> >  }
>> >  return size;
>> >}
>> >
>> >  VRP substitutes wvalue for the type variable in the conditionals 
>> >  (simplify_cond_using_ranges:9506), as it figures out that the
>range 
>> >  of wvalue is the same as a uint8_t). That is, code goes from
>> >
>> ><bb 2>:
>> >_3 = wvalue_2(D) >> 8;
>> >type_4 = (const uint8_t) _3;
>> >if (type_4 == 1)
>> >  goto <bb 5>;
>> >else
>> >  goto <bb 3>;
>> >
>> >to
>> >
>> ><bb 2>:
>> >_3 = wvalue_2(D) >> 8;
>> >type_4 = (const uint8_t) _3;
>> >if (_3 == 1)
>> >  goto <bb 5>;
>> >else
>> >  goto <bb 3>;
>> >
>> >  This "widening" causes later passes to allocate extra registers 
>> >  (holding zeros for the extra bits) and generate extra comparisons
>> >  for the AVR target.
>> >
>> >  I found that in the 4.9.2 compiler I was benchmarking against, VRP
>> >  didn't figure out the range for wvalue and therefore the folding
>> >  didn't happen, which was why the code was better.
>> >
>> >  With the native compiler on my machine (gcc 5.2 x86_64) -
>replacing 
>> >  uint8_t by uint32_t and uint16_t by uint64_t, and shifting right
>by 
>> >  32 bits instead of 8 shows the same effect - the generated code
>uses
>> >  rdi instead of just di to hold the type variable.
>> >
>> >  Is this a bug? Should the folding happen only if the type
>> >  conversion was from a smaller type to a bigger one? Or is the
>backend
>> >  supposed to detect this pattern and deal with it?
>> 
>> We should probably avoid widening beyond word_mode (or sth else if
>there is sth suitable).
>> 
>
>Hmm, that does fix this problem. The below patch allows folding only if
>the operand size is smaller or equal to a word.
>
>diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
>index e2393e4..3f5668c 100644
>--- a/gcc/tree-vrp.c
>+++ b/gcc/tree-vrp.c
>@@ -9467,6 +9467,8 @@ simplify_cond_using_ranges (gcond *stmt)
>       innerop = gimple_assign_rhs1 (def_stmt);
> 
>       if (TREE_CODE (innerop) == SSA_NAME
>+    && (GET_MODE_SIZE(TYPE_MODE(TREE_TYPE(innerop)))
>+      <= GET_MODE_SIZE(word_mode))
>          && !POINTER_TYPE_P (TREE_TYPE (innerop)))
>        {
>          value_range *vr = get_value_range (innerop);
>
>However, if the type conversion was from a smaller type to a larger
>type, and the
>smaller type was bigger than a word, this patch results in worse code.
>For example.
>
>#include <stdint.h>
>
>uint16_t get(const uint16_t wvalue)
>{
>  const uint32_t type = (wvalue >> 8);
>  uint16_t size = 0;
>
>  if (type == 1)
>  {
>    size = 20;
>  }
>  else if (type == 2)
>  {
>    size = 32;
>  }
>  return size;
>}
>
>Before the patch, the folding done caused type to be substituted with
>wvalue, and the
>smaller size resulted in better code. After the patch, since wvalue is
>bigger than a 
>word (for AVR), the folding doesn't happen and registers are allocated
>to hold all 32
>bits.
>
>How does the below patch look? It does the folding only if it is
>beneficial i.e., the
>value being substituted (innerop) is smaller than or equal the current
>one (op0)?

I think widenings up to word-mode are OK.

>diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
>index e2393e4..3f5668c 100644
>--- a/gcc/tree-vrp.c
>+++ b/gcc/tree-vrp.c
>@@ -9467,6 +9467,8 @@ simplify_cond_using_ranges (gcond *stmt)
>       innerop = gimple_assign_rhs1 (def_stmt);
> 
>       if (TREE_CODE (innerop) == SSA_NAME
>+    && (GET_MODE_SIZE(TYPE_MODE(TREE_TYPE(innerop)))
>+      <= GET_MODE_SIZE(TYPE_MODE(TREE_TYPE(op0))))
>          && !POINTER_TYPE_P (TREE_TYPE (innerop)))
>        {
>          value_range *vr = get_value_range (innerop);
>
>
>Regards
>Senthil
>
>> Richard.
>> 
>> >Regards
>> >Senthil
>> >
>> >
>> >details-raw vrp1 dump
>> >
>> >;; Function get (get, funcdef_no=0, decl_uid=1522, cgraph_uid=0,
>> >symbol_order=0)
>> >
>> >;; 1 loops found
>> >;;
>> >;; Loop 0
>> >;;  header 0, latch 1
>> >;;  depth 0, outer -1
>> >;;  nodes: 0 1 2 3 4 5
>> >;; 2 succs { 5 3 }
>> >;; 3 succs { 4 5 }
>> >;; 4 succs { 5 }
>> >;; 5 succs { 1 }
>> >
>> >ASSERT_EXPRs to be inserted
>> >
>> >Assertions to be inserted for type_4
>> >	if (type_4 == 1)
>> >
>> >	BB #3
>> >	EDGE 2->3 2 [55.9%]  (FALSE_VALUE,EXECUTABLE)
>> >	PREDICATE: type_4 ne_expr 1
>> >
>> >
>> >
>> >
>> >Updating SSA:
>> >Registering new PHI nodes in block #2
>> >Updating SSA information for statement type_4 = (const uint8_t) _3;
>> >Updating SSA information for statement if (type_4 == 1)
>> >Registering new PHI nodes in block #3
>> >Updating SSA information for statement type_6 = ASSERT_EXPR <type_4,
>> >type_4 != 1>;
>> >Updating SSA information for statement if (type_4 == 2)
>> >Registering new PHI nodes in block #4
>> >Registering new PHI nodes in block #5
>> >
>> >SSA replacement table
>> >N_i -> { O_1 ... O_j } means that N_i replaces O_1, ..., O_j
>> >
>> >type_6 -> { type_4 }
>> >Incremental SSA update started at block: 2
>> >Number of blocks in CFG: 6
>> >Number of blocks to update: 2 ( 33%)
>> >Affected blocks: 2 3
>> >
>> >
>> >
>> >SSA form after inserting ASSERT_EXPRs
>> >get (const uint16_t wvalue, const uint8_t windex, const void * *
>const
>> >address)
>> >{
>> >  uint16_t size;
>> >  const uint8_t type;
>> >  unsigned int _3;
>> >
>> >  <bb 2>:
>> >  _3 = wvalue_2(D) >> 8;
>> >  type_4 = (const uint8_t) _3;
>> >  if (type_4 == 1)
>> >    goto <bb 5>;
>> >  else
>> >    goto <bb 3>;
>> >
>> >  <bb 3>:
>> >  type_6 = ASSERT_EXPR <type_4, type_4 != 1>;
>> >  if (type_6 == 2)
>> >    goto <bb 4>;
>> >  else
>> >    goto <bb 5>;
>> >
>> >  <bb 4>:
>> >
>> >  <bb 5>:
>> >  # size_1 = PHI <20(2), 0(3), 32(4)>
>> >  return size_1;
>> >
>> >}
>> >
>> >
>> >Immediate_uses: 
>> >
>> >size_1 : --> single use.
>> >return size_1;
>> >
>> >wvalue_2(D) : --> single use.
>> >_3 = wvalue_2(D) >> 8;
>> >
>> >_3 : --> single use.
>> >type_4 = (const uint8_t) _3;
>> >
>> >type_4 : -->3 uses.
>> >type_6 = ASSERT_EXPR <type_4, type_4 != 1>;
>> >type_6 = ASSERT_EXPR <type_4, type_4 != 1>;
>> >if (type_4 == 1)
>> >
>> >.MEM_5(D) : --> single use.
>> ># VUSE <.MEM_5(D)>
>> >return size_1;
>> >
>> >type_6 : --> single use.
>> >if (type_6 == 2)
>> >
>> >Adding destination of edge (0 -> 2) to worklist
>> >
>> >Simulating block 2
>> >
>> >Visiting statement:
>> >_3 = wvalue_2(D) >> 8;
>> >Intersecting
>> >  [0, 255]
>> >and
>> >  [0, +INF]
>> >to
>> >  [0, 255]
>> >Found new range for _3: [0, 255]
>> >interesting_ssa_edges: adding SSA use in type_4 = (const uint8_t)
>_3;
>> >marking stmt to be not simulated again
>> >
>> >Visiting statement:
>> >type_4 = (const uint8_t) _3;
>> >Found new range for type_4: [0, +INF]
>> >interesting_ssa_edges: adding SSA use in type_6 = ASSERT_EXPR
><type_4,
>> >type_4 != 1>;
>> >interesting_ssa_edges: adding SSA use in if (type_4 == 1)
>> >marking stmt to be not simulated again
>> >
>> >Visiting statement:
>> >if (type_4 == 1)
>> >
>> >Visiting conditional with predicate: if (type_4 == 1)
>> >
>> >With known ranges
>> >	type_4: [0, +INF]
>> >
>> >Predicate evaluates to: DON'T KNOW
>> >Adding destination of edge (2 -> 5) to worklist
>> >Adding destination of edge (2 -> 3) to worklist
>> >
>> >Simulating block 3
>> >
>> >Visiting statement:
>> >type_6 = ASSERT_EXPR <type_4, type_4 != 1>;
>> >Intersecting
>> >  ~[1, 1]  EQUIVALENCES: { type_4 } (1 elements)
>> >and
>> >  [0, +INF]
>> >to
>> >  ~[1, 1]  EQUIVALENCES: { type_4 } (1 elements)
>> >Found new range for type_6: ~[1, 1]
>> >interesting_ssa_edges: adding SSA use in if (type_6 == 2)
>> >marking stmt to be not simulated again
>> >
>> >Visiting statement:
>> >if (type_6 == 2)
>> >
>> >Visiting conditional with predicate: if (type_6 == 2)
>> >
>> >With known ranges
>> >	type_6: ~[1, 1]  EQUIVALENCES: { type_4 } (1 elements)
>> >
>> >Predicate evaluates to: DON'T KNOW
>> >Adding destination of edge (3 -> 4) to worklist
>> >
>> >Simulating block 4
>> >
>> >Simulating block 5
>> >
>> >Visiting PHI node: size_1 = PHI <20(2), 0(3), 32(4)>
>> >    Argument #0 (2 -> 5 executable)
>> >	20: [20, 20]
>> >    Argument #1 (3 -> 5 executable)
>> >	0: [0, 0]
>> >Meeting
>> >  [20, 20]
>> >and
>> >  [0, 0]
>> >to
>> >  [0, 20]
>> >    Argument #2 (4 -> 5 executable)
>> >	32: [32, 32]
>> >Meeting
>> >  [0, 20]
>> >and
>> >  [32, 32]
>> >to
>> >  [0, 32]
>> >Intersecting
>> >  [0, 32]
>> >and
>> >  [0, +INF]
>> >to
>> >  [0, 32]
>> >Found new range for size_1: [0, 32]
>> >interesting_ssa_edges: adding SSA use in return size_1;
>> >marking stmt to be not simulated again
>> >
>> >Visiting statement:
>> >return size_1;
>> >
>> >Value ranges after VRP:
>> >
>> >size_1: [0, 32]
>> >wvalue_2(D): VARYING
>> >_3: [0, 255]
>> >type_4: [0, +INF]
>> >type_6: ~[1, 1]  EQUIVALENCES: { type_4 } (1 elements)
>> >
>> >
>> >
>> >Substituting values and folding statements
>> >
>> >Folding statement: _3 = wvalue_2(D) >> 8;
>> >Not folded
>> >Folding statement: type_4 = (const uint8_t) _3;
>> >Not folded
>> >Folding statement: if (type_4 == 1)
>> >Folded into: if (_3 == 1)
>> >
>> >Folding statement: if (type_6 == 2)
>> >Not folded
>> >Folding PHI node: size_1 = PHI <20(2), 0(3), 32(4)>
>> >No folding possible
>> >Folding statement: return size_1;
>> >Not folded
>> >get (const uint16_t wvalue, const uint8_t windex, const void * *
>const
>> >address)
>> >{
>> >  uint16_t size;
>> >  const uint8_t type;
>> >  unsigned int _3;
>> >
>> >  <bb 2>:
>> >  _3 = wvalue_2(D) >> 8;
>> >  type_4 = (const uint8_t) _3;
>> >  if (_3 == 1)
>> >    goto <bb 5>;
>> >  else
>> >    goto <bb 3>;
>> >
>> >  <bb 3>:
>> >  if (type_4 == 2)
>> >    goto <bb 4>;
>> >  else
>> >    goto <bb 5>;
>> >
>> >  <bb 4>:
>> >
>> >  <bb 5>:
>> >  # size_1 = PHI <20(2), 0(3), 32(4)>
>> >  return size_1;
>> >
>> >}
>> 
>> 


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2015-11-13 15:15 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-12 19:10 VRP causing extra register usage Senthil Kumar Selvaraj
2015-11-12 19:37 ` Richard Biener
2015-11-13 13:36   ` Senthil Kumar Selvaraj
2015-11-13 15:15     ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).