[Patch, AVR]: QI builtins for parity, popcount, 1<< n

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [Patch, AVR]: QI builtins for parity, popcount, 1<< n
@ 2011-06-16 16:08 Georg-Johann Lay
  2011-06-16 17:08 ` Joseph S. Myers
  0 siblings, 1 reply; 10+ messages in thread
From: Georg-Johann Lay @ 2011-06-16 16:08 UTC (permalink / raw)
  To: gcc-patches; +Cc: Denis Chertykov, Anatoly Sokolov, Eric B. Weddington

[-- Attachment #1: Type: text/plain, Size: 1196 bytes --]

The recent implementation of some asm function in libgcc added
__popcountqi2 and __parityqi2. This patch makes these functions
available as __builtin_avr_popcount8 resp. __builtin_avr_parity8.

Moreover, just out of a mood, I wrote a builtin for 1<<n.
1<<n is sometimes used to set a variable SFR bit. The builtin supplies
a fast, loop-free implementation.

The nice thing is that all these can represented explicitly so that
gcc sees what's going on and can fold if it sees operation on a
compile time constant.

Johann

	* config/avr/avr.c (avr_init_builtins, avr_builtin_id,
	bdesc_1arg): Add support for __builtin_avr_parity8,
	__builtin_avr_popcount8, __builtin_avr_pow2.
	(adjust_insn_length): Filter out "ashlqi2_1" insn in length
	computation.
	* config/avr/avr.c (avr_cpu_cpp_builtins): New builtin defines for
	__BUILTIN_AVR_PARITY8, __BUILTIN_AVR_POPCOUNT8,	__BUILTIN_AVR_POW2.
	* config/avr/avr.md (parityqi2): New expander.
	(popcountqi2): New expander.
	(*parityqihi2.libgcc): New insn.
	(*popcountqi2.libgcc): New insn.
	(ashlqi2_1): New insn.
	* doc/extend.texi (AVR Built-in Functions): Document new builtins
	__builtin_avr_parity8, __builtin_avr_popcount8,
	__builtin_avr_pow2.

[-- Attachment #2: builtin-pop.diff --]
[-- Type: text/x-patch, Size: 5232 bytes --]

Index: doc/extend.texi
===================================================================
--- doc/extend.texi	(Revision 175104)
+++ doc/extend.texi	(Arbeitskopie)
@@ -8203,6 +8203,21 @@ int __builtin_avr_fmuls (char, char)
 int __builtin_avr_fmulsu (char, unsigned char)
 @end smallexample
 
+The following built-in functions are a supplements to gcc's
+@code{__builtin_parity*} resp. @code{__builtin_popcount*} built-ins
+for 8-bit values and are implemented as library calls:
+@smallexample
+unsigned char __builtin_avr_parity8 (unsigned char)
+unsigned char __builtin_avr_popcount8 (unsigned char)
+@end smallexample
+
+@smallexample
+unsigned char __builtin_avr_pow2 (unsigned char)
+@end smallexample
+This built-in supplies a fast, loop-free implementation for the @var{N}-th
+power of two as of @code{1 << (N & 7)} that takes about seven ticks resp.
+seven instructions.
+
 In order to delay execution for a specific number of cycles, GCC
 implements
 @smallexample
Index: config/avr/avr.md
===================================================================
--- config/avr/avr.md	(Revision 175104)
+++ config/avr/avr.md	(Arbeitskopie)
@@ -3321,6 +3321,62 @@ (define_insn "delay_cycles_4"
   [(set_attr "length" "9")
    (set_attr "cc" "clobber")])
 
+;; __builtin_avr_parity8
+(define_expand "parityqi2"
+  [(set (reg:QI 24)
+        (match_operand:QI 1 "register_operand" ""))
+   (set (reg:HI 24)
+        (zero_extend:HI (parity:QI (reg:QI 24))))
+   (set (match_operand:QI 0 "register_operand" "")
+        (reg:QI 24))]
+  ""
+  "")
+
+(define_insn "*parityqihi2.libgcc"
+  [(set (reg:HI 24)
+        (zero_extend:HI (parity:QI (reg:QI 24))))]
+  ""
+  "%~call __parityqi2"
+  [(set_attr "type" "xcall")
+   (set_attr "cc" "clobber")])
+
+;; __builtin_avr_popcount8
+(define_expand "popcountqi2"
+  [(set (reg:QI 24)
+        (match_operand:QI 1 "register_operand" ""))
+   (set (reg:QI 24)
+        (popcount:QI (reg:QI 24)))
+   (set (match_operand:QI 0 "register_operand" "")
+        (reg:QI 24))]
+  ""
+  "")
+
+(define_insn "*popcountqi2.libgcc"
+  [(set (reg:QI 24)
+        (popcount:QI (reg:QI 24)))]
+  ""
+  "%~call __popcountqi2"
+  [(set_attr "type" "xcall")
+   (set_attr "cc" "clobber")])
+
+;; __builtin_avr_pow2
+;; $0 = (1 << ($1 & 7))
+(define_insn "ashlqi2_1"
+  [(set (match_operand:QI 0 "register_operand" "=&d")
+        (ashift:QI (const_int 1)
+                   (and:QI (match_operand:QI 1 "register_operand" "r")
+                           (const_int 7))))]
+  ""
+  "ldi %0, 1
+	sbrc %1, 1
+	ldi %0, 4
+	sbrc %1, 0
+	lsl %0
+	sbrc %1, 2
+	swap %0"
+  [(set_attr "length" "7")
+   (set_attr "cc" "clobber")])
+
 ;; CPU instructions
 
 ;; NOP taking 1 or 2 Ticks 
Index: config/avr/avr-c.c
===================================================================
--- config/avr/avr-c.c	(Revision 175104)
+++ config/avr/avr-c.c	(Arbeitskopie)
@@ -93,6 +93,9 @@ avr_cpu_cpp_builtins (struct cpp_reader
   cpp_define (pfile, "__BUILTIN_AVR_SLEEP");
   cpp_define (pfile, "__BUILTIN_AVR_SWAP");
   cpp_define (pfile, "__BUILTIN_AVR_DELAY_CYCLES");
+  cpp_define (pfile, "__BUILTIN_AVR_PARITY8");
+  cpp_define (pfile, "__BUILTIN_AVR_POPCOUNT8");
+  cpp_define (pfile, "__BUILTIN_AVR_POW2");
 
   if (AVR_HAVE_MUL)
     {
Index: config/avr/avr.c
===================================================================
--- config/avr/avr.c	(Revision 175104)
+++ config/avr/avr.c	(Arbeitskopie)
@@ -4740,9 +4740,11 @@ adjust_insn_length (rtx insn, int len)
 	      break;
 	    }
 	}
-      else if (GET_CODE (op[1]) == ASHIFT
-	  || GET_CODE (op[1]) == ASHIFTRT
-	  || GET_CODE (op[1]) == LSHIFTRT)
+      else if ((GET_CODE (op[1]) == ASHIFT
+                || GET_CODE (op[1]) == ASHIFTRT
+                || GET_CODE (op[1]) == LSHIFTRT)
+               /* Filter out "ashlqi2_1" */
+               && AND != GET_CODE (XEXP (op[1], 1)))
 	{
 	  rtx ops[10];
 	  ops[0] = op[0];
@@ -6614,7 +6616,10 @@ enum avr_builtin_id
     AVR_BUILTIN_FMUL,
     AVR_BUILTIN_FMULS,
     AVR_BUILTIN_FMULSU,
-    AVR_BUILTIN_DELAY_CYCLES
+    AVR_BUILTIN_DELAY_CYCLES,
+    AVR_BUILTIN_PARITY8,
+    AVR_BUILTIN_POPCOUNT8,
+    AVR_BUILTIN_POW2
   };
 
 #define DEF_BUILTIN(NAME, TYPE, CODE)                                   \
@@ -6665,6 +6670,12 @@ avr_init_builtins (void)
   DEF_BUILTIN ("__builtin_avr_swap", uchar_ftype_uchar, AVR_BUILTIN_SWAP);
   DEF_BUILTIN ("__builtin_avr_delay_cycles", void_ftype_ulong, 
                AVR_BUILTIN_DELAY_CYCLES);
+  DEF_BUILTIN ("__builtin_avr_parity8", uchar_ftype_uchar,
+               AVR_BUILTIN_PARITY8);
+  DEF_BUILTIN ("__builtin_avr_popcount8", uchar_ftype_uchar,
+               AVR_BUILTIN_POPCOUNT8);
+  DEF_BUILTIN ("__builtin_avr_pow2", uchar_ftype_uchar,
+               AVR_BUILTIN_POW2);
 
   if (AVR_HAVE_MUL)
     {
@@ -6694,6 +6705,9 @@ static const struct avr_builtin_descript
 bdesc_1arg[] =
   {
     { CODE_FOR_rotlqi3_4, "__builtin_avr_swap", AVR_BUILTIN_SWAP }
+    , { CODE_FOR_parityqi2, "__builtin_avr_parity8", AVR_BUILTIN_PARITY8 }
+    , { CODE_FOR_popcountqi2, "__builtin_avr_popcount8", AVR_BUILTIN_POPCOUNT8 }
+    , { CODE_FOR_ashlqi2_1, "__builtin_avr_pow2", AVR_BUILTIN_POW2 }
   };
 
 static const struct avr_builtin_description

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Patch, AVR]: QI builtins for parity, popcount, 1<< n
  2011-06-16 16:08 [Patch, AVR]: QI builtins for parity, popcount, 1<< n Georg-Johann Lay
@ 2011-06-16 17:08 ` Joseph S. Myers
  2011-06-17 11:04   ` Georg-Johann Lay
  0 siblings, 1 reply; 10+ messages in thread
From: Joseph S. Myers @ 2011-06-16 17:08 UTC (permalink / raw)
  To: Georg-Johann Lay
  Cc: gcc-patches, Denis Chertykov, Anatoly Sokolov, Eric B. Weddington

On Thu, 16 Jun 2011, Georg-Johann Lay wrote:

> The recent implementation of some asm function in libgcc added
> __popcountqi2 and __parityqi2. This patch makes these functions
> available as __builtin_avr_popcount8 resp. __builtin_avr_parity8.
> 
> Moreover, just out of a mood, I wrote a builtin for 1<<n.
> 1<<n is sometimes used to set a variable SFR bit. The builtin supplies
> a fast, loop-free implementation.

I hope this whole patch is just intended as an example of how to add 
built-in functions and not as a serious proposal for an addition to GCC, 
since it doesn't make sense to add machine-specific built-in functions for 
things that can so readily be represented in generic GNU C; instead, you 
should make generic GNU C generate the right code (for example, by 
handling __builtin_parity and __builtin_popcount smartly on zero-extended 
values if QImode patterns are available - if it doesn't already do so).

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Patch, AVR]: QI builtins for parity, popcount, 1<< n
  2011-06-16 17:08 ` Joseph S. Myers
@ 2011-06-17 11:04   ` Georg-Johann Lay
  2011-06-17 13:06     ` Joseph S. Myers
  0 siblings, 1 reply; 10+ messages in thread
From: Georg-Johann Lay @ 2011-06-17 11:04 UTC (permalink / raw)
  To: Joseph S. Myers
  Cc: gcc-patches, Denis Chertykov, Anatoly Sokolov, Eric B. Weddington

Joseph S. Myers schrieb:
> On Thu, 16 Jun 2011, Georg-Johann Lay wrote:
> 
>> The recent implementation of some asm function in libgcc added
>> __popcountqi2 and __parityqi2. This patch makes these functions
>> available as __builtin_avr_popcount8 resp. __builtin_avr_parity8.
>>
>> Moreover, just out of a mood, I wrote a builtin for 1<<n.
>> 1<<n is sometimes used to set a variable SFR bit. The builtin supplies
>> a fast, loop-free implementation.
> 
> I hope this whole patch is just intended as an example of how to add 
> built-in functions and not as a serious proposal for an addition to GCC, 
> since it doesn't make sense to add machine-specific built-in functions for 
> things that can so readily be represented in generic GNU C; instead, you 
> should make generic GNU C generate the right code (for example, by 
> handling __builtin_parity and __builtin_popcount smartly on zero-extended 
> values if QImode patterns are available - if it doesn't already do so).
> 
> Joseph S. Myers

I don't see what's bat with the patch, it's straight forward.

As I don't intend to introduce new register classes and constraints
(hard regnos won't do any more if combine is intended to come up with
new insn, what it actually does) just for that... Dropping the patch.

Johann

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Patch, AVR]: QI builtins for parity, popcount, 1<< n
  2011-06-17 11:04   ` Georg-Johann Lay
@ 2011-06-17 13:06     ` Joseph S. Myers
  2011-06-17 19:03       ` Georg-Johann Lay
  0 siblings, 1 reply; 10+ messages in thread
From: Joseph S. Myers @ 2011-06-17 13:06 UTC (permalink / raw)
  To: Georg-Johann Lay
  Cc: gcc-patches, Denis Chertykov, Anatoly Sokolov, Eric B. Weddington

On Fri, 17 Jun 2011, Georg-Johann Lay wrote:

> I don't see what's bat with the patch, it's straight forward.

C is a high-level language, defined in terms of high-level semantics 
rather than machine instructions.  C code should be written where possible 
using machine-independent functionality, falling into machine-dependent 
operations only where the semantics cannot readily be represented in a 
machine-independent way; the compiler should generally be responsible for 
picking optimal instructions for the source code.

C code should write "1 << n" (or "1 << (n & 7)" if that's what it wants) 
for shifts rather than __builtin_avr_pow2.  That way it's more portable 
and more readable to general C programmers who aren't familiar with all 
the machine-specific interfaces.  Similarly, code wanting to add values 
should use the "+" operator rather than each target having its own 
__builtin_<arch>_add.  And since parity and population-count operations 
are widely present and generically useful, GNU C has generic built-in 
functions for those, so code should use __builtin_parity and 
__builtin_popcount on all machines rather than machine-specific variants; 
such machine-specific variants should not exist.

The machine-independent parts of the compiler should know about the 
equivalence of (popcount X) and (popcount (zero_extend X)), (parity X) and 
(parity (zero_extend X)) and (parity X) and (parity (sign_extend X)) if 
the sign-extension is by an even number of bits - in fact, there is 
already code in simplify_unary_operation_1 that does know this (the 
assumption that all sign-extensions are by an even number of bits is 
hardcoded).  The target should just need to describe how to code these 
operations on various modes.  If the existing code doesn't suffice to 
cause popcountqi etc. patterns to be used for the generic built-in 
functions, the appropriate fix is generic rather than adding new 
target-specific built-in functions - just as if the compiler didn't 
generate your target's addition instruction from the '+' operator, the 
right fix is not to add __builtin_<arch>_add to generate that instruction 
but rather to make '+' generate it.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Patch, AVR]: QI builtins for parity, popcount, 1<< n
  2011-06-17 13:06     ` Joseph S. Myers
@ 2011-06-17 19:03       ` Georg-Johann Lay
  2011-06-17 19:14         ` Georg-Johann Lay
  2011-06-20 10:53         ` Joseph S. Myers
  0 siblings, 2 replies; 10+ messages in thread
From: Georg-Johann Lay @ 2011-06-17 19:03 UTC (permalink / raw)
  To: Joseph S. Myers
  Cc: gcc-patches, Denis Chertykov, Anatoly Sokolov,
	Eric B. Weddington, Richard Henderson

[-- Attachment #1: Type: text/plain, Size: 8327 bytes --]

Joseph S. Myers schrieb:
> On Fri, 17 Jun 2011, Georg-Johann Lay wrote:
> 
>> I don't see what's bat with the patch, it's straight forward.
> 
> C is a high-level language, defined in terms of high-level
> semantics rather than machine instructions.  C code should be
> written where possible using machine-independent functionality,
> falling into machine-dependent operations only where the semantics
> cannot readily be represented in a machine-independent way; the
> compiler should generally be responsible for picking optimal
> instructions for the source code.
> 
> C code should write "1 << n" (or "1 << (n & 7)" if that's what it
> wants) for shifts rather than __builtin_avr_pow2.  That way it's
> more portable and more readable to general C programmers who aren't
> familiar with all the machine-specific interfaces.  Similarly, code
> wanting to add values should use the "+" operator rather than each
> target having its own __builtin_<arch>_add.  And since parity and
> population-count operations are widely present and generically
> useful, GNU C has generic built-in functions for those, so code
> should use __builtin_parity and __builtin_popcount on all machines
> rather than machine-specific variants; such machine-specific
> variants should not exist.
> 
> The machine-independent parts of the compiler should know about the
>  equivalence of (popcount X) and (popcount (zero_extend X)),
> (parity X) and (parity (zero_extend X)) and (parity X) and (parity
> (sign_extend X)) if the sign-extension is by an even number of bits
> - in fact, there is already code in simplify_unary_operation_1 that
> does know this (the assumption that all sign-extensions are by an
> even number of bits is hardcoded).  The target should just need to
> describe how to code these operations on various modes.  If the
> existing code doesn't suffice to cause popcountqi etc. patterns to
> be used for the generic built-in functions, the appropriate fix is
> generic rather than adding new target-specific built-in functions -
> just as if the compiler didn't generate your target's addition
> instruction from the '+' operator, the right fix is not to add
> __builtin_<arch>_add to generate that instruction but rather to
> make '+' generate it.
> 
>
> Joseph S. Myers

Thanks for you detailed explanations.  For popcount and parity I
agree, the patch is cleaner now and there are no ad-hoc builtins
needed to get the QI versions produced.

Trying popcountsi2 confused be a bit: Without popcountsi2 in md,
optabs expands to a function as HI = popcount (SI) as I expect from
the prototype/documentation.  For the expander, however, destination
(operand0) must be SI in order to get it to work. With HI destination
the expander is ignored.

For the 1 << n type of shifts the situation on a 8-bit ÂµC is
considerably different to a PC:

On bare metal with RAM of 128 bytes, 2K for the program and at your
option also hard real time constraints, the situation is different and
sometimes each instruction/tick/byte counts.  In my programs I avoid
shifting by variable offsets if possible; but some guys do it.
The resulting code is a loop because AVR can only shift by one bit.
Together with C spec that results in a bulky 16-bit loop, see PR29560.

A typical piece of code might look like this:

#define SFR (*((volatile unsigned char*) 0x2b))

void set_port (unsigned char pin)
{
    SFR |= (1 << pin);
}

which compiles to

set_port:
	in r25,43-0x20	 ;  7	*movqi/4	[length = 1]
	ldi r18,lo8(1)	 ;  9	*movhi/4	[length = 2]
	ldi r19,hi8(1)
	rjmp 2f	 ;  10	ashlhi3/1	[length = 6]
1:	lsl r18
	rol r19
2:	dec r24
	brpl 1b
	or r25,r18	 ;  11	iorqi3/1	[length = 1]
	out 43-0x20,r25	 ;  13	*movqi/3	[length = 1]
	ret	 ;  21	return	[length = 1]

Trying to hack around that and to get at least an 8-bit loop fails.
Sprinkling (unsigned char) casts won't help and writing
  1 << (pin & 7)
will add just another instruction (andqi3).

Insn combine does not come up with more than
(set (reg:HI)
     (ashift:HI (const_int 1)
                (reg:QI)))
and even if a suitable pattern would show up it is not nice to clutter
a backend with this sorts of code like
(set (reg:QI)
     (subreg:QI (ashift:HI (const_int 1)
                           (and:QI (reg:QI)
                                   (const_int 7))) 0))
or whatever.  There is a ashlqi3 pattern in avr BE but I never managed
to produce it (except with -mint8 which changes ABI and is not really
supported any more).

For the addition, some users torture poor AVR with long long.
I agree that it's overkill and anyone that writes that in C must know
what he is doing.

Anyway, there are programmers that write such code and expect avr-gcc
to come up with code like

  subi r18, -1
  sbci r19, -1
  sbci r20, -1
  sbci r21, -1
  sbci r22, -1
  sbci r23, -1
  sbci r24, -1
  sbci r25, -1

for x=x+1.  What they actually will get is

	mov r14,r18	 ;  215	*movqi/1	[length = 1]
	inc r14	 ;  24	addqi3/3	[length = 1]
	ldi r30,lo8(1)	 ;  25	*movqi/2	[length = 1]
	cp r14,r18	 ;  26	*cmpqi/2	[length = 1]
	brlo .L5	 ;  27	branch	[length = 1]
	ldi r30,lo8(0)	 ;  28	*movqi/1	[length = 1]
.L5:
	mov r15,r30	 ;  216	*movqi/1	[length = 1]
	add r15,r19	 ;  36	addqi3/1	[length = 1]
	ldi r16,lo8(1)	 ;  37	*movqi/2	[length = 1]
	cp r15,r19	 ;  38	*cmpqi/2	[length = 1]
	brlo .L7	 ;  39	branch	[length = 1]
	ldi r16,lo8(0)	 ;  40	*movqi/1	[length = 1]
.L7:
	add r16,r20	 ;  50	addqi3/1	[length = 1]
	ldi r17,lo8(1)	 ;  51	*movqi/2	[length = 1]
	cp r16,r20	 ;  52	*cmpqi/2	[length = 1]
	brlo .L9	 ;  53	branch	[length = 1]
	ldi r17,lo8(0)	 ;  54	*movqi/1	[length = 1]
.L9:
	add r17,r21	 ;  64	addqi3/1	[length = 1]
	ldi r27,lo8(1)	 ;  65	*movqi/2	[length = 1]
	cp r17,r21	 ;  66	*cmpqi/2	[length = 1]
	brlo .L11	 ;  67	branch	[length = 1]
	ldi r27,lo8(0)	 ;  68	*movqi/1	[length = 1]
.L11:
	add r27,r22	 ;  78	addqi3/1	[length = 1]
	ldi r26,lo8(1)	 ;  79	*movqi/2	[length = 1]
	cp r27,r22	 ;  80	*cmpqi/2	[length = 1]
	brlo .L13	 ;  81	branch	[length = 1]
	ldi r26,lo8(0)	 ;  82	*movqi/1	[length = 1]
.L13:
	add r26,r23	 ;  92	addqi3/1	[length = 1]
	ldi r31,lo8(1)	 ;  93	*movqi/2	[length = 1]
	cp r26,r23	 ;  94	*cmpqi/2	[length = 1]
	brlo .L15	 ;  95	branch	[length = 1]
	ldi r31,lo8(0)	 ;  96	*movqi/1	[length = 1]
.L15:
	add r31,r24	 ;  106	addqi3/1	[length = 1]
	ldi r30,lo8(1)	 ;  107	*movqi/2	[length = 1]
	cp r31,r24	 ;  108	*cmpqi/2	[length = 1]
	brlo .L17	 ;  109	branch	[length = 1]
	ldi r30,lo8(0)	 ;  110	*movqi/1	[length = 1]
.L17:
	mov r18,r14	 ;  138	*movqi/1	[length = 1]
	mov r19,r15	 ;  139	*movqi/1	[length = 1]
	mov r20,r16	 ;  140	*movqi/1	[length = 1]
	mov r21,r17	 ;  141	*movqi/1	[length = 1]
	mov r22,r27	 ;  142	*movqi/1	[length = 1]
	mov r23,r26	 ;  143	*movqi/1	[length = 1]
	mov r24,r31	 ;  144	*movqi/1	[length = 1]
	add r25,r30	 ;  145	addqi3/1	[length = 1]

I agree with Denis that 64-bit support in avr is too complex.

However, if someone sees such code and asks what can be done about it,
there is no straight forward solution (movdi in avr BE is not really
straight forward), and typical reactions are complete lack of
understanding and comments like
"free software = scrap, use another compiler"

Guys that write such code would be glad if there was something like
builtin_add64...

So here is my question: Would it be big deal to teach optabs to expand
plus:di, minus:di as libcall?  Maybe even compare is feasible? Like so:

* Just test if there is a corresponding insn
* if not, look if there is a corresponding optab entry
* if not, lower to word_mode.

To come back to the original topic, here is a tentative patch for
better popcount and parity:

	* config/avr/t-avr (LIB1ASMFUNCS): Rename _loop_ffsqi2 to
	_ffsqi2_nz.
	* confif/avr/libgcc.S: Ditto. Rename __loop_ffsqi2 to __ffsqi2_nz.
	(__ctzsi2, __ctzhi2): Map zero to 255.
	(__popcounthi2): Use r27 instead of r30.
	(__popcountdi2): Use r30 instead of r27.
	* config/avr/avr.md (parityhi2): New expander.
	(popcounthi2): New expander.
	(popcountsi2): New expander.
	(*parityhi2.libgcc): New insn.
	(*parityqihi2.libgcc): New insn.
	(*popcounthi2.libgcc): New insn.
	(*popcountsi2.libgcc): New insn.
	(*popcountqi2.libgcc): New insn.
	(*popcountqihi2.libgcc): New insn_and_split.

Johann

BTW, combine tests for
   (parity:HI (reg:QI))
it does not try
   (parity:HI (zero_extend:HI (reg:QI)))


[-- Attachment #2: builtin-pop.diff --]
[-- Type: text/x-patch, Size: 5232 bytes --]

Index: doc/extend.texi
===================================================================
--- doc/extend.texi	(Revision 175104)
+++ doc/extend.texi	(Arbeitskopie)
@@ -8203,6 +8203,21 @@ int __builtin_avr_fmuls (char, char)
 int __builtin_avr_fmulsu (char, unsigned char)
 @end smallexample
 
+The following built-in functions are a supplements to gcc's
+@code{__builtin_parity*} resp. @code{__builtin_popcount*} built-ins
+for 8-bit values and are implemented as library calls:
+@smallexample
+unsigned char __builtin_avr_parity8 (unsigned char)
+unsigned char __builtin_avr_popcount8 (unsigned char)
+@end smallexample
+
+@smallexample
+unsigned char __builtin_avr_pow2 (unsigned char)
+@end smallexample
+This built-in supplies a fast, loop-free implementation for the @var{N}-th
+power of two as of @code{1 << (N & 7)} that takes about seven ticks resp.
+seven instructions.
+
 In order to delay execution for a specific number of cycles, GCC
 implements
 @smallexample
Index: config/avr/avr.md
===================================================================
--- config/avr/avr.md	(Revision 175104)
+++ config/avr/avr.md	(Arbeitskopie)
@@ -3321,6 +3321,62 @@ (define_insn "delay_cycles_4"
   [(set_attr "length" "9")
    (set_attr "cc" "clobber")])
 
+;; __builtin_avr_parity8
+(define_expand "parityqi2"
+  [(set (reg:QI 24)
+        (match_operand:QI 1 "register_operand" ""))
+   (set (reg:HI 24)
+        (zero_extend:HI (parity:QI (reg:QI 24))))
+   (set (match_operand:QI 0 "register_operand" "")
+        (reg:QI 24))]
+  ""
+  "")
+
+(define_insn "*parityqihi2.libgcc"
+  [(set (reg:HI 24)
+        (zero_extend:HI (parity:QI (reg:QI 24))))]
+  ""
+  "%~call __parityqi2"
+  [(set_attr "type" "xcall")
+   (set_attr "cc" "clobber")])
+
+;; __builtin_avr_popcount8
+(define_expand "popcountqi2"
+  [(set (reg:QI 24)
+        (match_operand:QI 1 "register_operand" ""))
+   (set (reg:QI 24)
+        (popcount:QI (reg:QI 24)))
+   (set (match_operand:QI 0 "register_operand" "")
+        (reg:QI 24))]
+  ""
+  "")
+
+(define_insn "*popcountqi2.libgcc"
+  [(set (reg:QI 24)
+        (popcount:QI (reg:QI 24)))]
+  ""
+  "%~call __popcountqi2"
+  [(set_attr "type" "xcall")
+   (set_attr "cc" "clobber")])
+
+;; __builtin_avr_pow2
+;; $0 = (1 << ($1 & 7))
+(define_insn "ashlqi2_1"
+  [(set (match_operand:QI 0 "register_operand" "=&d")
+        (ashift:QI (const_int 1)
+                   (and:QI (match_operand:QI 1 "register_operand" "r")
+                           (const_int 7))))]
+  ""
+  "ldi %0, 1
+	sbrc %1, 1
+	ldi %0, 4
+	sbrc %1, 0
+	lsl %0
+	sbrc %1, 2
+	swap %0"
+  [(set_attr "length" "7")
+   (set_attr "cc" "clobber")])
+
 ;; CPU instructions
 
 ;; NOP taking 1 or 2 Ticks 
Index: config/avr/avr-c.c
===================================================================
--- config/avr/avr-c.c	(Revision 175104)
+++ config/avr/avr-c.c	(Arbeitskopie)
@@ -93,6 +93,9 @@ avr_cpu_cpp_builtins (struct cpp_reader
   cpp_define (pfile, "__BUILTIN_AVR_SLEEP");
   cpp_define (pfile, "__BUILTIN_AVR_SWAP");
   cpp_define (pfile, "__BUILTIN_AVR_DELAY_CYCLES");
+  cpp_define (pfile, "__BUILTIN_AVR_PARITY8");
+  cpp_define (pfile, "__BUILTIN_AVR_POPCOUNT8");
+  cpp_define (pfile, "__BUILTIN_AVR_POW2");
 
   if (AVR_HAVE_MUL)
     {
Index: config/avr/avr.c
===================================================================
--- config/avr/avr.c	(Revision 175104)
+++ config/avr/avr.c	(Arbeitskopie)
@@ -4740,9 +4740,11 @@ adjust_insn_length (rtx insn, int len)
 	      break;
 	    }
 	}
-      else if (GET_CODE (op[1]) == ASHIFT
-	  || GET_CODE (op[1]) == ASHIFTRT
-	  || GET_CODE (op[1]) == LSHIFTRT)
+      else if ((GET_CODE (op[1]) == ASHIFT
+                || GET_CODE (op[1]) == ASHIFTRT
+                || GET_CODE (op[1]) == LSHIFTRT)
+               /* Filter out "ashlqi2_1" */
+               && AND != GET_CODE (XEXP (op[1], 1)))
 	{
 	  rtx ops[10];
 	  ops[0] = op[0];
@@ -6614,7 +6616,10 @@ enum avr_builtin_id
     AVR_BUILTIN_FMUL,
     AVR_BUILTIN_FMULS,
     AVR_BUILTIN_FMULSU,
-    AVR_BUILTIN_DELAY_CYCLES
+    AVR_BUILTIN_DELAY_CYCLES,
+    AVR_BUILTIN_PARITY8,
+    AVR_BUILTIN_POPCOUNT8,
+    AVR_BUILTIN_POW2
   };
 
 #define DEF_BUILTIN(NAME, TYPE, CODE)                                   \
@@ -6665,6 +6670,12 @@ avr_init_builtins (void)
   DEF_BUILTIN ("__builtin_avr_swap", uchar_ftype_uchar, AVR_BUILTIN_SWAP);
   DEF_BUILTIN ("__builtin_avr_delay_cycles", void_ftype_ulong, 
                AVR_BUILTIN_DELAY_CYCLES);
+  DEF_BUILTIN ("__builtin_avr_parity8", uchar_ftype_uchar,
+               AVR_BUILTIN_PARITY8);
+  DEF_BUILTIN ("__builtin_avr_popcount8", uchar_ftype_uchar,
+               AVR_BUILTIN_POPCOUNT8);
+  DEF_BUILTIN ("__builtin_avr_pow2", uchar_ftype_uchar,
+               AVR_BUILTIN_POW2);
 
   if (AVR_HAVE_MUL)
     {
@@ -6694,6 +6705,9 @@ static const struct avr_builtin_descript
 bdesc_1arg[] =
   {
     { CODE_FOR_rotlqi3_4, "__builtin_avr_swap", AVR_BUILTIN_SWAP }
+    , { CODE_FOR_parityqi2, "__builtin_avr_parity8", AVR_BUILTIN_PARITY8 }
+    , { CODE_FOR_popcountqi2, "__builtin_avr_popcount8", AVR_BUILTIN_POPCOUNT8 }
+    , { CODE_FOR_ashlqi2_1, "__builtin_avr_pow2", AVR_BUILTIN_POW2 }
   };
 
 static const struct avr_builtin_description

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Patch, AVR]: QI builtins for parity, popcount, 1<< n
  2011-06-17 19:03       ` Georg-Johann Lay
@ 2011-06-17 19:14         ` Georg-Johann Lay
  2011-06-20 10:53         ` Joseph S. Myers
  1 sibling, 0 replies; 10+ messages in thread
From: Georg-Johann Lay @ 2011-06-17 19:14 UTC (permalink / raw)
  To: Joseph S. Myers
  Cc: gcc-patches, Denis Chertykov, Anatoly Sokolov,
	Eric B. Weddington, Richard Henderson

[-- Attachment #1: Type: text/plain, Size: 787 bytes --]

Georg-Johann Lay schrieb:

> To come back to the original topic, here is a tentative patch for
> better popcount and parity:
> 
> 	* config/avr/t-avr (LIB1ASMFUNCS): Rename _loop_ffsqi2 to
> 	_ffsqi2_nz.
> 	* confif/avr/libgcc.S: Ditto. Rename __loop_ffsqi2 to __ffsqi2_nz.
> 	(__ctzsi2, __ctzhi2): Map zero to 255.
> 	(__popcounthi2): Use r27 instead of r30.
> 	(__popcountdi2): Use r30 instead of r27.
> 	* config/avr/avr.md (parityhi2): New expander.
> 	(popcounthi2): New expander.
> 	(popcountsi2): New expander.
> 	(*parityhi2.libgcc): New insn.
> 	(*parityqihi2.libgcc): New insn.
> 	(*popcounthi2.libgcc): New insn.
> 	(*popcountsi2.libgcc): New insn.
> 	(*popcountqi2.libgcc): New insn.
> 	(*popcountqihi2.libgcc): New insn_and_split.
> 
> Johann

Oops, picked the wrong file.


[-- Attachment #2: popcount.diff --]
[-- Type: text/x-patch, Size: 5584 bytes --]

Index: config/avr/libgcc.S
===================================================================
--- config/avr/libgcc.S	(revision 175149)
+++ config/avr/libgcc.S	(working copy)
@@ -935,7 +935,7 @@ DEFUN __ffssi2
     brne 1f
     ret
 1:  mov  r24, r22
-    XJMP __loop_ffsqi2
+    XJMP __ffsqi2_nz
 ENDF __ffssi2
 #endif /* defined (L_ffssi2) */
 
@@ -946,7 +946,7 @@ ENDF __ffssi2
 DEFUN __ffshi2
     clr  r26
     cpse r24, __zero_reg__
-1:  XJMP __loop_ffsqi2
+1:  XJMP __ffsqi2_nz
     ldi  r26, 8
     or   r24, r25
     brne 1b
@@ -954,20 +954,20 @@ DEFUN __ffshi2
 ENDF __ffshi2
 #endif /* defined (L_ffshi2) */
 
-#if defined (L_loop_ffsqi2)
+#if defined (L_ffsqi2_nz)
 ;; Helper for ffshi2, ffssi2
 ;; r25:r24 = r26 + zero_extend16 (ffs8(r24))
 ;; r24 must be != 0
 ;; clobbers: r26
-DEFUN __loop_ffsqi2
+DEFUN __ffsqi2_nz
     inc  r26
     lsr  r24
-    brcc __loop_ffsqi2
+    brcc __ffsqi2_nz
     mov  r24, r26
     clr  r25
     ret    
-ENDF __loop_ffsqi2
-#endif /* defined (L_loop_ffsqi2) */
+ENDF __ffsqi2_nz
+#endif /* defined (L_ffsqi2_nz) */
 
 \f
 /**********************************
@@ -977,12 +977,11 @@ ENDF __loop_ffsqi2
 #if defined (L_ctzsi2)
 ;; count trailing zeros
 ;; r25:r24 = ctz32 (r25:r22)
-;; ctz(0) = 32
+;; ctz(0) = 255
+;; Note that ctz(0) is undefined for GCC.
 DEFUN __ctzsi2
     XCALL __ffssi2
     dec  r24
-    sbrc r24, 7
-    ldi  r24, 32
     ret
 ENDF __ctzsi2
 #endif /* defined (L_ctzsi2) */
@@ -990,12 +989,11 @@ ENDF __ctzsi2
 #if defined (L_ctzhi2)
 ;; count trailing zeros
 ;; r25:r24 = ctz16 (r25:r24)
-;; ctz(0) = 16
+;; ctz(0) = 255
+;; Note that ctz(0) is undefined for GCC.
 DEFUN __ctzhi2
     XCALL __ffshi2
     dec  r24
-    sbrc r24, 7
-    ldi  r24, 16
     ret
 ENDF __ctzhi2
 #endif /* defined (L_ctzhi2) */
@@ -1129,13 +1127,13 @@ ENDF __parityqi2
 #if defined (L_popcounthi2)
 ;; population count
 ;; r25:r24 = popcount16 (r25:r24)
-;; clobbers: r30, __tmp_reg__
+;; clobbers: r27, __tmp_reg__
 DEFUN __popcounthi2
     XCALL __popcountqi2
-    mov  r30, r24
+    mov  r27, r24
     mov  r24, r25
     XCALL __popcountqi2
-    add  r24, r30
+    add  r24, r27
     clr  r25
     ret
 ENDF __popcounthi2
@@ -1144,7 +1142,7 @@ ENDF __popcounthi2
 #if defined (L_popcountsi2)
 ;; population count
 ;; r25:r24 = popcount32 (r25:r22)
-;; clobbers: r26, r30, __tmp_reg__
+;; clobbers: r26, r27, __tmp_reg__
 DEFUN __popcountsi2
     XCALL __popcounthi2
     mov   r26, r24
@@ -1162,13 +1160,13 @@ ENDF __popcountsi2
 ;; clobbers: r22, r23, r26, r27, r30, __tmp_reg__
 DEFUN __popcountdi2
     XCALL __popcountsi2
-    mov   r27, r24
+    mov   r30, r24
     mov_l r22, r18
     mov_h r23, r19
     mov_l r24, r20
     mov_h r25, r21
     XCALL __popcountsi2
-    add   r24, r27
+    add   r24, r30
     ret
 ENDF __popcountdi2
 #endif /* defined (L_popcountdi2) */
Index: config/avr/avr.md
===================================================================
--- config/avr/avr.md	(revision 175149)
+++ config/avr/avr.md	(working copy)
@@ -3321,6 +3321,92 @@ (define_insn "delay_cycles_4"
   [(set_attr "length" "9")
    (set_attr "cc" "clobber")])
 
+(define_expand "parityhi2"
+  [(set (reg:HI 24)
+        (match_operand:HI 1 "register_operand" ""))
+   (set (reg:HI 24)
+        (parity:HI (reg:HI 24)))
+   (set (match_operand:HI 0 "register_operand" "")
+        (reg:HI 24))]
+  ""
+  "")
+
+(define_insn "*parityhi2.libgcc"
+  [(set (reg:HI 24)
+        (parity:HI (reg:HI 24)))]
+  ""
+  "%~call __parityhi2"
+  [(set_attr "type" "xcall")
+   (set_attr "cc" "clobber")])
+
+(define_insn "*parityqihi2.libgcc"
+  [(set (reg:HI 24)
+        (parity:HI (reg:QI 24)))]
+  ""
+  "%~call __parityqi2"
+  [(set_attr "type" "xcall")
+   (set_attr "cc" "clobber")])
+
+(define_expand "popcounthi2"
+  [(set (reg:HI 24)
+        (match_operand:HI 1 "register_operand" ""))
+   (parallel[(set (reg:HI 24)
+                  (popcount:HI (reg:HI 24)))
+             (clobber (reg:QI 27))])
+   (set (match_operand:HI 0 "register_operand" "")
+        (reg:HI 24))]
+  ""
+  "")
+
+(define_expand "popcountsi2"
+  [(set (reg:SI 22)
+        (match_operand:SI 1 "register_operand" ""))
+   (parallel[(set (reg:HI 24)
+                  (popcount:HI (reg:SI 22)))
+             (clobber (reg:HI 26))])
+   (set (match_operand:SI 0 "register_operand" "")
+        (zero_extend:SI (reg:HI 24)))]
+  ""
+  "")
+
+(define_insn "*popcounthi2.libgcc"
+  [(set (reg:HI 24)
+        (popcount:HI (reg:HI 24)))
+   (clobber (reg:QI 27))]
+  ""
+  "%~call __popcounthi2"
+  [(set_attr "type" "xcall")
+   (set_attr "cc" "clobber")])
+
+(define_insn "*popcountsi2.libgcc"
+  [(set (reg:HI 24)
+        (popcount:HI (reg:SI 22)))
+   (clobber (reg:HI 26))]
+  ""
+  "%~call __popcountsi2"
+  [(set_attr "type" "xcall")
+   (set_attr "cc" "clobber")])
+
+(define_insn "*popcountqi2.libgcc"
+  [(set (reg:QI 24)
+        (popcount:QI (reg:QI 24)))]
+  ""
+  "%~call __popcountqi2"
+  [(set_attr "type" "xcall")
+   (set_attr "cc" "clobber")])
+
+(define_insn_and_split "*popcountqihi2.libgcc"
+  [(set (reg:HI 24)
+        (popcount:HI (reg:QI 24)))]
+  ""
+  "#"
+  ""
+  [(set (reg:QI 24)
+        (popcount:QI (reg:QI 24)))
+   (set (reg:QI 25)
+        (const_int 0))]
+  "")
+
 ;; CPU instructions
 
 ;; NOP taking 1 or 2 Ticks 
Index: config/avr/t-avr
===================================================================
--- config/avr/t-avr	(revision 175149)
+++ config/avr/t-avr	(working copy)
@@ -53,7 +53,7 @@ LIB1ASMFUNCS = \
 	_dtors \
 	_ffssi2 \
 	_ffshi2 \
-	_loop_ffsqi2 \
+	_ffsqi2_nz \
 	_ctzsi2 \
 	_ctzhi2 \
 	_clzdi2 \

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Patch, AVR]: QI builtins for parity, popcount, 1<< n
  2011-06-17 19:03       ` Georg-Johann Lay
  2011-06-17 19:14         ` Georg-Johann Lay
@ 2011-06-20 10:53         ` Joseph S. Myers
  2011-06-20 14:31           ` Georg-Johann Lay
  1 sibling, 1 reply; 10+ messages in thread
From: Joseph S. Myers @ 2011-06-20 10:53 UTC (permalink / raw)
  To: Georg-Johann Lay
  Cc: gcc-patches, Denis Chertykov, Anatoly Sokolov,
	Eric B. Weddington, Richard Henderson

On Fri, 17 Jun 2011, Georg-Johann Lay wrote:

> So here is my question: Would it be big deal to teach optabs to expand
> plus:di, minus:di as libcall?  Maybe even compare is feasible? Like so:

I don't know what the best approach would be, but for just about any 
operation supported by GCC it makes sense to support expanding it as a 
libcall if that's the most efficient approach for a given target.

Some of the generic optabs code could probably do with having target hooks 
added to make it work better for the less common targets.  In particular, 
there are hardcoded references to SImode on the basis that some libcalls 
only exist for SImode and wider.  Those may not be optimal for 8-bit and 
16-bit targets, and certainly would be bad if any targets with bytes 
wider than 8 bits (so SImode, which is always four bytes, wider than 32 
bits) get added in future.  Changing that might require changes to how 
libgcc is built, to build extra functions, as well as to the optabs code.

(In general *any* hardcoded reference to a machine mode other than QImode 
is suspect and likely to need fixing for non-8-bit-byte targets as well as 
potentially being suboptimal for 8-bit and 16-bit targets.  I don't know 
how many different target hooks would be needed to replace such 
hardcoding; that needs careful thought and a clear conceptual 
understanding of the purpose of each such hardcoded mode reference.)

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Patch, AVR]: QI builtins for parity, popcount, 1<< n
  2011-06-20 10:53         ` Joseph S. Myers
@ 2011-06-20 14:31           ` Georg-Johann Lay
  2011-06-20 15:39             ` Richard Henderson
  0 siblings, 1 reply; 10+ messages in thread
From: Georg-Johann Lay @ 2011-06-20 14:31 UTC (permalink / raw)
  To: Joseph S. Myers
  Cc: gcc-patches, Denis Chertykov, Anatoly Sokolov,
	Eric B. Weddington, Richard Henderson

Joseph S. Myers schrieb:
> On Fri, 17 Jun 2011, Georg-Johann Lay wrote:
> 
>> So here is my question: Would it be big deal to teach optabs to
>> expand plus:di, minus:di as libcall?  Maybe even compare is
>> feasible? Like so:
> 
> I don't know what the best approach would be, but for just about
> any operation supported by GCC it makes sense to support expanding
> it as a libcall if that's the most efficient approach for a given
> target.

Using a libcall for + resp. + would we way more efficient than
expanding all the carry computations inline.

> Some of the generic optabs code could probably do with having
> target hooks added to make it work better for the less common
> targets.

A libcall could be added in TARGET_INIT_LIBCALLS, so a new hook is not
needed.  All that's needed is that optabs tests for presence of such
an entry and prefers it over inline expansion (and prefers insn over
libcall).  It appears that + and - are assumed to be cheaps or inline
expansion is always cheap.

It might already help if optabs expanded to SImode and the target
could chose if expansion of some op uses word_mode or some other, more
efficient mode.

The big targets do not need that complexity, and who cares for tiny
targets...?

Johann

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Patch, AVR]: QI builtins for parity, popcount, 1<< n
  2011-06-20 14:31           ` Georg-Johann Lay
@ 2011-06-20 15:39             ` Richard Henderson
  2011-06-20 16:52               ` Georg-Johann Lay
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Henderson @ 2011-06-20 15:39 UTC (permalink / raw)
  To: Georg-Johann Lay
  Cc: Joseph S. Myers, gcc-patches, Denis Chertykov, Anatoly Sokolov,
	Eric B. Weddington

On 06/20/2011 07:20 AM, Georg-Johann Lay wrote:
> A libcall could be added in TARGET_INIT_LIBCALLS, so a new hook is not
> needed.  All that's needed is that optabs tests for presence of such
> an entry and prefers it over inline expansion (and prefers insn over
> libcall).  It appears that + and - are assumed to be cheaps or inline
> expansion is always cheap.

No, we assume that if the inline pattern is present and enabled, it
is to be preferred over the libcall.  All you have to do to get 
__adddi3 called is remove/disable the adddi3 pattern.


r~

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Patch, AVR]: QI builtins for parity, popcount, 1<< n
  2011-06-20 15:39             ` Richard Henderson
@ 2011-06-20 16:52               ` Georg-Johann Lay
  0 siblings, 0 replies; 10+ messages in thread
From: Georg-Johann Lay @ 2011-06-20 16:52 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Joseph S. Myers, gcc-patches, Denis Chertykov, Anatoly Sokolov,
	Eric B. Weddington

Richard Henderson schrieb:
> On 06/20/2011 07:20 AM, Georg-Johann Lay wrote:
>> A libcall could be added in TARGET_INIT_LIBCALLS, so a new hook is not
>> needed.  All that's needed is that optabs tests for presence of such
>> an entry and prefers it over inline expansion (and prefers insn over
>> libcall).  It appears that + and - are assumed to be cheaps or inline
>> expansion is always cheap.
> 
> No, we assume that if the inline pattern is present and enabled, it
> is to be preferred over the libcall.  All you have to do to get 
> __adddi3 called is remove/disable the adddi3 pattern.

There is no adddi3 in avr backend to disable.

--

Johann

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2011-06-20 16:23 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-06-16 16:08 [Patch, AVR]: QI builtins for parity, popcount, 1<< n Georg-Johann Lay
2011-06-16 17:08 ` Joseph S. Myers
2011-06-17 11:04   ` Georg-Johann Lay
2011-06-17 13:06     ` Joseph S. Myers
2011-06-17 19:03       ` Georg-Johann Lay
2011-06-17 19:14         ` Georg-Johann Lay
2011-06-20 10:53         ` Joseph S. Myers
2011-06-20 14:31           ` Georg-Johann Lay
2011-06-20 15:39             ` Richard Henderson
2011-06-20 16:52               ` Georg-Johann Lay

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).