> Well - the best way would be to expose the target specifics to GIMPLE
> at some point in the optimization pipeline. My guess would be that it's
> appropriate after loop optimizations (but maybe before induction variable
> optimization).
>
> That is, have a pass that applies register promotion to all SSA names
> in the function, inserting appropriate truncations and extensions. That
> way you'd never see (set (subreg...) on RTL. The VRP and DOM
> passes running after that pass would then be able to aggressively
> optimize redundant truncations and extensions.
>
> Effects on debug information are to be considered. You can change
> the type of SSA names in-place but you don't want to do that for
> user DECLs (and we can't have the SSA name type and its DECL
> type differ - and not sure if we might want to lift that restriction).

Thanks for the comments. Here is a prototype patch that implements a
type promotion pass. This pass records the SSA names that would have
values in the higher bits (above the original type precision) if
promoted, and uses this information to insert the appropriate
truncations and extensions. It also classifies some of the statements
that set SSA names as unsafe to promote.

Here is the GIMPLE difference with type promotion, compared to the
previous dump, for a test case (a sketch of the test-case source is
appended at the end of this mail):

crc2 (short unsigned int crc, unsigned char data)
{
  unsigned char carry;
  unsigned char x16;
  unsigned char i;
- unsigned char ivtmp_5;
+ unsigned int _2;
+ unsigned char _3;
+ unsigned int _4;
+ unsigned int _5;
  unsigned char _9;
- unsigned char _10;
- unsigned char ivtmp_18;
+ unsigned int _10;
+ unsigned int _11;
+ unsigned int _12;
+ unsigned int _13;
+ unsigned int _15;
+ unsigned int _16;
+ unsigned int _18;
+ unsigned int _21;
+ unsigned int _22;
+ unsigned int _24;
+ short unsigned int _26;
+ unsigned char _27;
+ unsigned int _28;
+ unsigned int _29;
+ unsigned int _30;

  <bb 2>:
+ _12 = (unsigned int) data_8(D);
+ _2 = (unsigned int) crc_7(D);

  <bb 3>:
- # crc_28 = PHI <crc_2(5), crc_7(D)(2)>
- # data_29 = PHI <data_12(5), data_8(D)(2)>
- # ivtmp_18 = PHI <ivtmp_5(5), 8(2)>
- _9 = (unsigned char) crc_28;
- _10 = _9 ^ data_29;
- x16_11 = _10 & 1;
- data_12 = data_29 >> 1;
- if (x16_11 == 1)
+ # _30 = PHI <_28(5), _2(2)>
+ # _16 = PHI <_29(5), _12(2)>
+ # _4 = PHI <_18(5), 8(2)>
+ _9 = (unsigned char) _30;
+ _5 = (unsigned int) _9;
+ _22 = _5 ^ _16;
+ _10 = _22 & 1;
+ _29 = _16 >> 1;
+ _27 = (unsigned char) _10;
+ if (_27 == 1)
    goto <bb 4>;
  else
    goto <bb 7>;

  <bb 4>:
- crc_13 = crc_28 ^ 16386;
- crc_24 = crc_13 >> 1;
- crc_15 = crc_24 | 32768;
+ _11 = _30 ^ 16386;
+ _13 = _11 >> 1;
+ _24 = _13 | 32768;

  <bb 5>:
- # crc_2 = PHI <crc_15(4), crc_21(7)>
- ivtmp_5 = ivtmp_18 - 1;
- if (ivtmp_5 != 0)
+ # _28 = PHI <_24(4), _15(7)>
+ _18 = _4 - 1;
+ _3 = (unsigned char) _18;
+ if (_3 != 0)
    goto <bb 3>;
  else
    goto <bb 6>;

  <bb 6>:
- # crc_19 = PHI <crc_2(5)>
- return crc_19;
+ # _21 = PHI <_28(5)>
+ _26 = (short unsigned int) _21;
+ return _26;

  <bb 7>:
- crc_21 = crc_28 >> 1;
+ _15 = _30 >> 1;
  goto <bb 5>;

}

I experimented with a few simple test cases and the results so far are
mixed. It also seems that subsequent passes are not always optimising
the promoted code as expected. I haven't looked into this in detail yet,
but will do so based on the feedback.

Please also note that this pass does not yet handle debug statements,
and there are a couple of regression failures for ARM.

Thanks,
Kugan
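
For reference, here is a sketch of the kind of source that produces the
dump above. This is reconstructed from the GIMPLE rather than copied
from the actual test case, so the details may differ; the constants
0x4002 and 0x8000 appear as 16386 and 32768 in the dump:

unsigned short
crc2 (unsigned short crc, unsigned char data)
{
  unsigned char i, x16, carry;

  for (i = 0; i < 8; i++)
    {
      /* Compare the low bit of crc with the next data bit.  */
      x16 = ((unsigned char) crc ^ data) & 1;
      data >>= 1;

      if (x16 == 1)
        {
          crc ^= 0x4002;   /* 16386 in the dump.  */
          carry = 1;
        }
      else
        carry = 0;

      crc >>= 1;
      if (carry)
        crc |= 0x8000;     /* 32768 in the dump.  */
    }

  return crc;
}

By the time of this dump the carry handling has been folded into the
single xor/shift/or sequence in bb 4, which is why carry survives only
as an unused declaration; the (unsigned char) casts in the promoted
version are the truncations the pass inserts.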