public inbox for gcc-patches@gcc.gnu.org
* Re: [Patch, AVR]: PR49313, fix PR29524
  2011-06-16  6:53 [Patch, AVR]: PR49313, fix PR29524 Georg-Johann Lay
@ 2011-06-15 10:46 ` Denis Chertykov
  2011-06-15 11:59   ` Georg-Johann Lay
  2011-06-16 10:00   ` Georg-Johann Lay
  2011-06-16 19:17 ` Richard Henderson
  1 sibling, 2 replies; 10+ messages in thread
From: Denis Chertykov @ 2011-06-15 10:46 UTC (permalink / raw)
  To: Georg-Johann Lay; +Cc: gcc-patches, Eric B. Weddington, Anatoly Sokolov

2011/6/15 Georg-Johann Lay <avr@gjlay.de>:
> This patch implements some libgcc functions in assembler.
> The functions are used only rarely, but when they are, they lead to
> an unpleasant waste of resources. For example, some SF functions
> eventually lead to __clz_tab being dragged in (PR29524).
>
> This patch avoids that with straightforward assembler implementations
> of functions that are easy to implement.
>
> Tested without regressions. Moreover, I tested the functions in some
> self-written code against the old C implementation. The HI/QI
> functions were tested for all possible inputs.
>

Approved for AVR.
Maybe you need another approval for longlong.h.

Denis.


* Re: [Patch, AVR]: PR49313, fix PR29524
  2011-06-15 10:46 ` Denis Chertykov
@ 2011-06-15 11:59   ` Georg-Johann Lay
  2011-06-16 10:00   ` Georg-Johann Lay
  1 sibling, 0 replies; 10+ messages in thread
From: Georg-Johann Lay @ 2011-06-15 11:59 UTC (permalink / raw)
  To: Denis Chertykov
  Cc: gcc-patches, Eric B. Weddington, Anatoly Sokolov, Ian Lance Taylor

Denis Chertykov wrote:
> 2011/6/15 Georg-Johann Lay <avr@gjlay.de>:
>> This patch implements some libgcc functions in assembler.
>> The functions are used only rarely, but when they are, they lead to
>> an unpleasant waste of resources. For example, some SF functions
>> eventually lead to __clz_tab being dragged in (PR29524).
>>
>> This patch avoids that with straightforward assembler implementations
>> of functions that are easy to implement.
>>
>> Tested without regressions. Moreover, I tested the functions in some
>> self-written code against the old C implementation. The HI/QI
>> functions were tested for all possible inputs.
>>
> 
> Approved for AVR.
> Maybe you need another approval for longlong.h.
> 
> Denis.

CCed Ian Taylor as libgcc maintainer (assuming this is his preferred
address).

Unfortunately, the original mail could not yet be delivered to
gcc-patches; I got a message reading something like this (translated
back into English):

Subject: [Patch, AVR]: PR49313, fix PR29524
Sender: avr@gjlay.de

Attention: Mail could not be delivered for 1 hour.

The following recipient is affected:

gcc-patches@gcc.gnu.org
   Error    : 452 4.0.0 Insufficient system storage
   Explanation: host gcc.gnu.org [209.132.180.131] said: Message
denied temporarily
   Last try: Wednesday, 15 June 2011 12:47:22 +0200 (MEST)

I have never received such a message before, and the patch is not really big.

As I cannot link back to the original message :-(
I am copy-pasting the relevant change inline:

--
gcc/
	PR target/49313
	PR target/29524
	
	* longlong.h: Add AVR support:
	(count_leading_zeros): New macro.
	(count_trailing_zeros): New macro.
	(COUNT_LEADING_ZEROS_0): New macro.


Index: gcc/longlong.h
===================================================================
--- gcc/longlong.h	(Revision 175036)
+++ gcc/longlong.h	(working copy)
@@ -250,6 +250,12 @@ UDItype __umulsidi3 (USItype, USItype);
 #define COUNT_LEADING_ZEROS_0 32
 #endif

+#if defined (__AVR__) && W_TYPE_SIZE == 32
+#define count_leading_zeros(COUNT,X)  ((COUNT) = __builtin_clzl (X))
+#define count_trailing_zeros(COUNT,X) ((COUNT) = __builtin_ctzl (X))
+#define COUNT_LEADING_ZEROS_0 32
+#endif /* defined (__AVR__) && W_TYPE_SIZE == 32 */
+
 #if defined (__CRIS__) && __CRIS_arch_version >= 3
 #define count_leading_zeros(COUNT, X) ((COUNT) = __builtin_clz (X))
 #if __CRIS_arch_version >= 8


* [Patch, AVR]: PR49313, fix PR29524
@ 2011-06-16  6:53 Georg-Johann Lay
  2011-06-15 10:46 ` Denis Chertykov
  2011-06-16 19:17 ` Richard Henderson
  0 siblings, 2 replies; 10+ messages in thread
From: Georg-Johann Lay @ 2011-06-16  6:53 UTC (permalink / raw)
  To: gcc-patches; +Cc: Denis Chertykov, Eric B. Weddington, Anatoly Sokolov

[-- Attachment #1: Type: text/plain, Size: 1903 bytes --]

This patch implements some libgcc functions in assembler.
The functions are used only rarely, but when they are, they lead to
an unpleasant waste of resources. For example, some SF functions
eventually lead to __clz_tab being dragged in (PR29524).

This patch avoids that with straightforward assembler implementations
of functions that are easy to implement.

Tested without regressions. Moreover, I tested the functions in some
self-written code against the old C implementation. The HI/QI
functions were tested for all possible inputs.

Johann

--

gcc/
	PR target/49313
	PR target/29524
	
	* longlong.h: Add AVR support:
	(count_leading_zeros): New macro.
	(count_trailing_zeros): New macro.
	(COUNT_LEADING_ZEROS_0): New macro.
	
	* config/avr/t-avr (LIB1ASMFUNCS): Add
	_ffssi2, _ffshi2, _loop_ffsqi2,
	_ctzsi2, _ctzhi2, _clzdi2, _clzsi2, _clzhi2,
	_paritydi2, _paritysi2, _parityhi2,
	_popcounthi2,_popcountsi2, _popcountdi2, _popcountqi2,
	_bswapsi2, _bswapdi2,
	_ashldi3, _ashrdi3, _lshrdi3
	(LIB2FUNCS_EXCLUDE): Add _clz.

	* config/avr/libgcc.S (XCALL): Move up in file.
	(XJMP): New C Macro.
	(DEFUN): New asm macro.
	(ENDF): New asm macro.
	(__ffssi2): New function.
	(__ffshi2): New function.
	(__loop_ffsqi2): New function.
	(__ctzsi2): New function.
	(__ctzhi2): New function.
	(__clzdi2): New function.
	(__clzsi2): New function.
	(__clzhi2): New function.
	(__paritydi2): New function.
	(__paritysi2): New function.
	(__parityhi2): New function.
	(__popcounthi2): New function.
	(__popcountsi2): New function.
	(__popcountdi2): New function.
	(__popcountqi2): New function.
	(__bswapsi2): New function.
	(__bswapdi2): New function.
	(__ashldi3): New function.
	(__ashrdi3): New function.
	(__lshrdi3): New function.
	Fix suspicious lines.

libgcc/
	PR target/49313
	PR target/29524

	* config/avr/t-avr: Fix line endings.
	(intfuncs16): Remove _ffsXX2,  _clzXX2, _ctzXX2, _popcountXX2,
	_parityXX2.
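
The nibble trick in __parityqi2 (subi/andi/subi followed by sbrc/inc)
is not obvious at first sight. The following is a rough C model of that
instruction sequence, written only from reading the attached assembler;
the function names parity8_ref, parity8_model, parity16_model and
check_parity_model are made up for this sketch and are not part of
libgcc.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference: XOR of all eight bits, the obvious way. */
uint8_t parity8_ref(uint8_t x)
{
    uint8_t p = 0;
    for (int i = 0; i < 8; i++)
        p ^= (uint8_t)((x >> i) & 1u);
    return p;
}

/* Hypothetical step-by-step model of the __parityqi2 instructions. */
uint8_t parity8_model(uint8_t r24)
{
    uint8_t tmp = (uint8_t)((r24 << 4) | (r24 >> 4)); /* swap __tmp_reg__ */
    r24 ^= tmp;                   /* eor: parity is now in bits 0..3 */
    r24 = (uint8_t)(r24 + 4);     /* subi r24, -4 */
    r24 &= 0xFB;                  /* andi r24, -5 */
    r24 = (uint8_t)(r24 + 6);     /* subi r24, -6 */
    if (r24 & 0x08)               /* sbrc r24, 3 */
        r24 = (uint8_t)(r24 + 1); /* inc  r24 */
    return r24 & 1u;              /* andi r24, 1 */
}

/* Wider parities fold down to one byte with XOR, as the eor chains in
   __parityhi2, __paritysi2 and __paritydi2 do. */
uint8_t parity16_model(uint16_t x)
{
    return parity8_model((uint8_t)(x ^ (x >> 8)));
}

/* Compare the model against the reference for every 8-bit input. */
void check_parity_model(void)
{
    for (unsigned x = 0; x < 256; x++)
        assert(parity8_model((uint8_t)x) == parity8_ref((uint8_t)x));
}
```

Calling check_parity_model() on a host machine compares the model with
the reference for all 256 byte values, in the spirit of the exhaustive
HI/QI testing mentioned above.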


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: libgcc-opt.diff --]
[-- Type: text/x-patch; name="libgcc-opt.diff", Size: 12299 bytes --]

Index: libgcc/config/avr/t-avr
===================================================================
--- libgcc/config/avr/t-avr	(Revision 175036)
+++ libgcc/config/avr/t-avr	(working copy)
@@ -1,19 +1,17 @@
-# Extra 16-bit integer functions.
-intfuncs16 = _absvXX2 _addvXX3 _subvXX3 _mulvXX3 _negvXX2 _ffsXX2 _clzXX2 \
-             _ctzXX2 _popcountXX2 _parityXX2
-hiintfuncs16 = $(subst XX,hi,$(intfuncs16))
-siintfuncs16 = $(subst XX,si,$(intfuncs16))
-
-iter-items := $(hiintfuncs16)
-iter-labels := $(siintfuncs16)
-iter-sizes := $(patsubst %,2,$(siintfuncs16)) $(patsubst %,2,$(hiintfuncs16))
-
-
-include $(srcdir)/empty.mk $(patsubst %,$(srcdir)/siditi-object.mk,$(iter-items))
-libgcc-objects += $(patsubst %,%$(objext),$(hiintfuncs16))
-
-ifeq ($(enable_shared),yes)
-libgcc-s-objects += $(patsubst %,%_s$(objext),$(hiintfuncs16))
-endif
-
-
+# Extra 16-bit integer functions.
+intfuncs16 = _absvXX2 _addvXX3 _subvXX3 _mulvXX3 _negvXX2 
+
+hiintfuncs16 = $(subst XX,hi,$(intfuncs16))
+siintfuncs16 = $(subst XX,si,$(intfuncs16))
+
+iter-items := $(hiintfuncs16)
+iter-labels := $(siintfuncs16)
+iter-sizes := $(patsubst %,2,$(siintfuncs16)) $(patsubst %,2,$(hiintfuncs16))
+
+
+include $(srcdir)/empty.mk $(patsubst %,$(srcdir)/siditi-object.mk,$(iter-items))
+libgcc-objects += $(patsubst %,%$(objext),$(hiintfuncs16))
+
+ifeq ($(enable_shared),yes)
+libgcc-s-objects += $(patsubst %,%_s$(objext),$(hiintfuncs16))
+endif
Index: gcc/longlong.h
===================================================================
--- gcc/longlong.h	(Revision 175036)
+++ gcc/longlong.h	(working copy)
@@ -250,6 +250,12 @@ UDItype __umulsidi3 (USItype, USItype);
 #define COUNT_LEADING_ZEROS_0 32
 #endif
 
+#if defined (__AVR__) && W_TYPE_SIZE == 32
+#define count_leading_zeros(COUNT,X)  ((COUNT) = __builtin_clzl (X))
+#define count_trailing_zeros(COUNT,X) ((COUNT) = __builtin_ctzl (X))
+#define COUNT_LEADING_ZEROS_0 32
+#endif /* defined (__AVR__) && W_TYPE_SIZE == 32 */
+
 #if defined (__CRIS__) && __CRIS_arch_version >= 3
 #define count_leading_zeros(COUNT, X) ((COUNT) = __builtin_clz (X))
 #if __CRIS_arch_version >= 8
Index: gcc/config/avr/libgcc.S
===================================================================
--- gcc/config/avr/libgcc.S	(Revision 175036)
+++ gcc/config/avr/libgcc.S	(working copy)
@@ -52,6 +52,26 @@ see the files COPYING3 and COPYING.RUNTI
 #endif
 	.endm
 
+#if defined (__AVR_HAVE_JMP_CALL__)
+#define XCALL call
+#define XJMP  jmp
+#else
+#define XCALL rcall
+#define XJMP  rjmp
+#endif
+
+.macro DEFUN name
+.global \name
+.func \name
+\name:
+.endm
+
+.macro ENDF name
+.size \name, .-\name
+.endfunc
+.endm
+
+\f
 /* Note: mulqi3, mulhi3 are open-coded on the enhanced core.  */
 #if !defined (__AVR_HAVE_MUL__)
 /*******************************************************
@@ -779,12 +799,6 @@ __do_clear_bss:
 /* __do_global_ctors and __do_global_dtors are only necessary
    if there are any constructors/destructors.  */
 
-#if defined (__AVR_HAVE_JMP_CALL__)
-#define XCALL call
-#else
-#define XCALL rcall
-#endif
-
 #ifdef L_ctors
 	.section .init6,"ax",@progbits
 	.global	__do_global_ctors
@@ -897,3 +911,393 @@ __tablejump_elpm__:
 	.endfunc
 #endif /* defined (L_tablejump_elpm) */
 
+\f
+/**********************************
+ * Find first set Bit (ffs)
+ **********************************/
+
+#if defined (L_ffssi2)
+;; find first set bit
+;; r25:r24 = ffs32 (r25:r22)
+;; clobbers: r22, r26
+DEFUN __ffssi2
+    clr  r26
+    tst  r22
+    brne 1f
+    subi r26, -8
+    or   r22, r23
+    brne 1f
+    subi r26, -8
+    or   r22, r24
+    brne 1f
+    subi r26, -8
+    or   r22, r25
+    brne 1f
+    ret
+1:  mov  r24, r22
+    XJMP __loop_ffsqi2
+ENDF __ffssi2
+#endif /* defined (L_ffssi2) */
+
+#if defined (L_ffshi2)
+;; find first set bit
+;; r25:r24 = ffs16 (r25:r24)
+;; clobbers: r26
+DEFUN __ffshi2
+    clr  r26
+    cpse r24, __zero_reg__
+1:  XJMP __loop_ffsqi2
+    ldi  r26, 8
+    or   r24, r25
+    brne 1b
+    ret
+ENDF __ffshi2
+#endif /* defined (L_ffshi2) */
+
+#if defined (L_loop_ffsqi2)
+;; Helper for ffshi2, ffssi2
+;; r25:r24 = r26 + zero_extend16 (ffs8(r24))
+;; r24 must be != 0
+;; clobbers: r26
+DEFUN __loop_ffsqi2
+    inc  r26
+    lsr  r24
+    brcc __loop_ffsqi2
+    mov  r24, r26
+    clr  r25
+    ret    
+ENDF __loop_ffsqi2
+#endif /* defined (L_loop_ffsqi2) */
+
+\f
+/**********************************
+ * Count trailing Zeros (ctz)
+ **********************************/
+
+#if defined (L_ctzsi2)
+;; count trailing zeros
+;; r25:r24 = ctz32 (r25:r22)
+;; ctz(0) = 32
+DEFUN __ctzsi2
+    XCALL __ffssi2
+    dec  r24
+    sbrc r24, 7
+    ldi  r24, 32
+    ret
+ENDF __ctzsi2
+#endif /* defined (L_ctzsi2) */
+
+#if defined (L_ctzhi2)
+;; count trailing zeros
+;; r25:r24 = ctz16 (r25:r24)
+;; ctz(0) = 16
+DEFUN __ctzhi2
+    XCALL __ffshi2
+    dec  r24
+    sbrc r24, 7
+    ldi  r24, 16
+    ret
+ENDF __ctzhi2
+#endif /* defined (L_ctzhi2) */
+
+\f
+/**********************************
+ * Count leading Zeros (clz)
+ **********************************/
+
+#if defined (L_clzdi2)
+;; count leading zeros
+;; r25:r24 = clz64 (r25:r18)
+;; clobbers: r22, r23, r26
+DEFUN __clzdi2
+    XCALL __clzsi2
+    sbrs r24, 5
+    ret
+    mov_l r22, r18
+    mov_h r23, r19
+    mov_l r24, r20
+    mov_h r25, r21
+    XCALL __clzsi2
+    subi r24, -32
+    ret
+ENDF __clzdi2
+#endif /* defined (L_clzdi2) */
+
+#if defined (L_clzsi2)
+;; count leading zeros
+;; r25:r24 = clz32 (r25:r22)
+;; clobbers: r26
+DEFUN __clzsi2
+    XCALL __clzhi2
+    sbrs r24, 4
+    ret
+    mov_l r24, r22
+    mov_h r25, r23
+    XCALL __clzhi2
+    subi r24, -16
+    ret
+ENDF __clzsi2
+#endif /* defined (L_clzsi2) */
+
+#if defined (L_clzhi2)
+;; count leading zeros
+;; r25:r24 = clz16 (r25:r24)
+;; clobbers: r26
+DEFUN __clzhi2
+    clr  r26
+    tst  r25
+    brne 1f
+    subi r26, -8
+    or   r25, r24
+    brne 1f
+    ldi  r24, 16
+    ret
+1:  cpi  r25, 16
+    brsh 3f
+    subi r26, -3
+    swap r25
+2:  inc  r26
+3:  lsl  r25
+    brcc 2b
+    mov  r24, r26
+    clr  r25
+    ret
+ENDF __clzhi2
+#endif /* defined (L_clzhi2) */
+
+\f
+/**********************************
+ * Parity 
+ **********************************/
+
+#if defined (L_paritydi2)
+;; r25:r24 = parity64 (r25:r18)
+;; clobbers: __tmp_reg__
+DEFUN __paritydi2
+    eor  r24, r18
+    eor  r24, r19
+    eor  r24, r20
+    eor  r24, r21
+    XJMP __paritysi2
+ENDF __paritydi2
+#endif /* defined (L_paritydi2) */
+
+#if defined (L_paritysi2)
+;; r25:r24 = parity32 (r25:r22)
+;; clobbers: __tmp_reg__
+DEFUN __paritysi2
+    eor  r24, r22
+    eor  r24, r23
+    XJMP __parityhi2
+ENDF __paritysi2
+#endif /* defined (L_paritysi2) */
+
+#if defined (L_parityhi2)
+;; r25:r24 = parity16 (r25:r24)
+;; clobbers: __tmp_reg__
+DEFUN __parityhi2
+    eor  r24, r25
+;; FALLTHRU
+ENDF __parityhi2
+
+;; r25:r24 = parity8 (r24)
+;; clobbers: __tmp_reg__
+DEFUN __parityqi2
+    ;; parity is in r24[0..7]
+    mov  __tmp_reg__, r24
+    swap __tmp_reg__
+    eor  r24, __tmp_reg__
+    ;; parity is in r24[0..3]
+    subi r24, -4
+    andi r24, -5
+    subi r24, -6
+    ;; parity is in r24[0,3]
+    sbrc r24, 3
+    inc  r24
+    ;; parity is in r24[0]
+    andi r24, 1
+    clr  r25
+    ret
+ENDF __parityqi2
+#endif /* defined (L_parityhi2) */
+
+\f
+/**********************************
+ * Population Count
+ **********************************/
+
+#if defined (L_popcounthi2)
+;; population count
+;; r25:r24 = popcount16 (r25:r24)
+;; clobbers: r30, __tmp_reg__
+DEFUN __popcounthi2
+    XCALL __popcountqi2
+    mov  r30, r24
+    mov  r24, r25
+    XCALL __popcountqi2
+    add  r24, r30
+    clr  r25
+    ret
+ENDF __popcounthi2
+#endif /* defined (L_popcounthi2) */
+
+#if defined (L_popcountsi2)
+;; population count
+;; r25:r24 = popcount32 (r25:r22)
+;; clobbers: r26, r30, __tmp_reg__
+DEFUN __popcountsi2
+    XCALL __popcounthi2
+    mov   r26, r24
+    mov_l r24, r22
+    mov_h r25, r23
+    XCALL __popcounthi2
+    add   r24, r26
+    ret
+ENDF __popcountsi2
+#endif /* defined (L_popcountsi2) */
+
+#if defined (L_popcountdi2)
+;; population count
+;; r25:r24 = popcount64 (r25:r18)
+;; clobbers: r22, r23, r26, r27, r30, __tmp_reg__
+DEFUN __popcountdi2
+    XCALL __popcountsi2
+    mov   r27, r24
+    mov_l r22, r18
+    mov_h r23, r19
+    mov_l r24, r20
+    mov_h r25, r21
+    XCALL __popcountsi2
+    add   r24, r27
+    ret
+ENDF __popcountdi2
+#endif /* defined (L_popcountdi2) */
+
+#if defined (L_popcountqi2)
+;; population count
+;; r24 = popcount8 (r24)
+;; clobbers: __tmp_reg__
+DEFUN __popcountqi2
+    mov  __tmp_reg__, r24
+    andi r24, 1
+    lsr  __tmp_reg__    
+    lsr  __tmp_reg__    
+    adc  r24, __zero_reg__
+    lsr  __tmp_reg__    
+    adc  r24, __zero_reg__
+    lsr  __tmp_reg__    
+    adc  r24, __zero_reg__
+    lsr  __tmp_reg__    
+    adc  r24, __zero_reg__
+    lsr  __tmp_reg__    
+    adc  r24, __zero_reg__
+    lsr  __tmp_reg__    
+    adc  r24, __tmp_reg__    
+    ret    
+ENDF __popcountqi2
+#endif /* defined (L_popcountqi2) */
+
+\f
+/**********************************
+ * Swap bytes
+ **********************************/
+
+;; swap two registers with different register number
+.macro bswap a, b
+    eor \a, \b
+    eor \b, \a
+    eor \a, \b
+.endm
+
+#if defined (L_bswapsi2)
+;; swap bytes
+;; r25:r22 = bswap32 (r25:r22)
+DEFUN __bswapsi2
+    bswap r22, r25
+    bswap r23, r24
+    ret
+ENDF __bswapsi2
+#endif /* defined (L_bswapsi2) */
+
+#if defined (L_bswapdi2)
+;; swap bytes
+;; r25:r18 = bswap64 (r25:r18)
+DEFUN __bswapdi2
+    bswap r18, r25
+    bswap r19, r24
+    bswap r20, r23
+    bswap r21, r22
+    ret
+ENDF __bswapdi2
+#endif /* defined (L_bswapdi2) */
+
+\f
+/**********************************
+ * 64-bit shifts
+ **********************************/
+
+#if defined (L_ashrdi3)
+;; Arithmetic shift right
+;; r25:r18 = ashr64 (r25:r18, r17:r16)
+DEFUN __ashrdi3
+    push r16
+    andi r16, 31
+    breq 2f
+1:  asr  r25
+    ror  r24
+    ror  r23
+    ror  r22
+    ror  r21
+    ror  r20
+    ror  r19
+    ror  r18
+    dec  r16
+    brne 1b
+2:  pop  r16
+    ret
+ENDF __ashrdi3
+#endif /* defined (L_ashrdi3) */
+
+#if defined (L_lshrdi3)
+;; Logic shift right
+;; r25:r18 = lshr64 (r25:r18, r17:r16)
+DEFUN __lshrdi3
+    push r16
+    andi r16, 31
+    breq 2f
+1:  lsr  r25
+    ror  r24
+    ror  r23
+    ror  r22
+    ror  r21
+    ror  r20
+    ror  r19
+    ror  r18
+    dec  r16
+    brne 1b
+2:  pop  r16
+    ret
+ENDF __lshrdi3
+#endif /* defined (L_lshrdi3) */
+
+#if defined (L_ashldi3)
+;; Shift left
+;; r25:r18 = ashl64 (r25:r18, r17:r16)
+DEFUN __ashldi3
+    push r16
+    andi r16, 31
+    breq 2f
+1:  lsl  r18
+    rol  r19
+    rol  r20
+    rol  r21
+    rol  r22
+    rol  r23
+    rol  r24
+    rol  r25
+    dec  r16
+    brne 1b
+2:  pop  r16
+    ret
+ENDF __ashldi3
+#endif /* defined (L_ashldi3) */
Index: gcc/config/avr/t-avr
===================================================================
--- gcc/config/avr/t-avr	(Revision 175036)
+++ gcc/config/avr/t-avr	(working copy)
@@ -24,12 +24,10 @@ driver-avr.o: $(srcdir)/config/avr/drive
 avr-devices.o: $(srcdir)/config/avr/avr-devices.c \
   $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H)
 	$(CC) -c $(ALL_CFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) $<
-	
 
 avr-c.o: $(srcdir)/config/avr/avr-c.c \
   $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) $(C_COMMON_H)
 	$(CC) -c $(ALL_CFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) $<
-	
 
 
 LIB1ASMSRC = avr/libgcc.S
@@ -52,7 +50,30 @@ LIB1ASMFUNCS = \
 	_copy_data \
 	_clear_bss \
 	_ctors \
-	_dtors
+	_dtors \
+	_ffssi2 \
+	_ffshi2 \
+	_loop_ffsqi2 \
+	_ctzsi2 \
+	_ctzhi2 \
+	_clzdi2 \
+	_clzsi2 \
+	_clzhi2 \
+	_paritydi2 \
+	_paritysi2 \
+	_parityhi2 \
+	_popcounthi2 \
+	_popcountsi2 \
+	_popcountdi2 \
+	_popcountqi2 \
+	_bswapsi2 \
+	_bswapdi2 \
+	_ashldi3 \
+	_ashrdi3 \
+	_lshrdi3
+
+LIB2FUNCS_EXCLUDE = \
+	_clz
 
 # We do not have the DF type.
 # Most of the C functions in libgcc2 use almost all registers,
@@ -216,8 +237,8 @@ MULTILIB_MATCHES = \
 	mmcu?avr51=mmcu?at90can128 \
 	mmcu?avr51=mmcu?at90usb1286 \
 	mmcu?avr51=mmcu?at90usb1287 \
- 	mmcu?avr6=mmcu?atmega2560 \
- 	mmcu?avr6=mmcu?atmega2561
+	mmcu?avr6=mmcu?atmega2560 \
+	mmcu?avr6=mmcu?atmega2561
 
 MULTILIB_EXCEPTIONS =
 


* Re: [Patch, AVR]: PR49313, fix PR29524
  2011-06-15 10:46 ` Denis Chertykov
  2011-06-15 11:59   ` Georg-Johann Lay
@ 2011-06-16 10:00   ` Georg-Johann Lay
  1 sibling, 0 replies; 10+ messages in thread
From: Georg-Johann Lay @ 2011-06-16 10:00 UTC (permalink / raw)
  To: Denis Chertykov; +Cc: gcc-patches, Eric B. Weddington, Anatoly Sokolov

Denis Chertykov wrote:
> 2011/6/15 Georg-Johann Lay <avr@gjlay.de>:
>> This patch implements some libgcc functions in assembler.
>> The functions are used only rarely, but when they are, they lead to
>> an unpleasant waste of resources. For example, some SF functions
>> eventually lead to __clz_tab being dragged in (PR29524).
>>
>> This patch avoids that with straightforward assembler implementations
>> of functions that are easy to implement.
>>
>> Tested without regressions. Moreover, I tested the functions in some
>> self-written code against the old C implementation. The HI/QI
>> functions were tested for all possible inputs.
>>
> 
> Approved for AVR.
> Maybe you need another approval for longlong.h.
> 
> Denis.

Committed the original version from

http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01200.html

together with the following corrigendum:

--- config/avr/libgcc.S (Revision 175097)
+++ config/avr/libgcc.S (working copy)
@@ -1241,7 +1241,7 @@ ENDF __bswapdi2
 ;; r25:r18 = ashr64 (r25:r18, r17:r16)
 DEFUN __ashrdi3
     push r16
-    andi r16, 31
+    andi r16, 63
     breq 2f
 1:  asr  r25
     ror  r24
@@ -1263,7 +1263,7 @@ ENDF __ashrdi3
 ;; r25:r18 = lshr64 (r25:r18, r17:r16)
 DEFUN __lshrdi3
     push r16
-    andi r16, 31
+    andi r16, 63
     breq 2f
 1:  lsr  r25
     ror  r24
@@ -1285,7 +1285,7 @@ ENDF __lshrdi3
 ;; r25:r18 = ashl64 (r25:r18, r17:r16)
 DEFUN __ashldi3
     push r16
-    andi r16, 31
+    andi r16, 63
     breq 2f
 1:  lsl  r18
     rol  r19
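
To see why the mask matters, here is a small C model of the shift loop,
offered as a sketch only. It assumes the loop shifts by one bit per
iteration as in the assembler above; lshr64_model and its mask
parameter are hypothetical names introduced for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the loop in __lshrdi3: mask the count, then
   shift right by one position per iteration.  The mask is a parameter
   so that both the original value (31) and the corrected one (63) can
   be tried. */
uint64_t lshr64_model(uint64_t value, uint8_t count, uint8_t mask)
{
    uint8_t n = (uint8_t)(count & mask); /* andi r16, <mask> */
    while (n != 0) {                     /* breq 2f ... brne 1b */
        value >>= 1;                     /* lsr r25; ror r24; ... ror r18 */
        n--;                             /* dec r16 */
    }
    return value;
}
```

With mask 31, a count of 40 is reduced to 8, so lshr64_model(1ULL << 40,
40, 31) yields 1ULL << 32 instead of 1. With mask 63 every count from 0
to 63 is honoured, which is what a 64-bit shift needs.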

Johann


* Re: [Patch, AVR]: PR49313, fix PR29524
  2011-06-16  6:53 [Patch, AVR]: PR49313, fix PR29524 Georg-Johann Lay
  2011-06-15 10:46 ` Denis Chertykov
@ 2011-06-16 19:17 ` Richard Henderson
  2011-06-17  9:41   ` Georg-Johann Lay
  1 sibling, 1 reply; 10+ messages in thread
From: Richard Henderson @ 2011-06-16 19:17 UTC (permalink / raw)
  To: Georg-Johann Lay
  Cc: gcc-patches, Denis Chertykov, Eric B. Weddington, Anatoly Sokolov

On 06/15/2011 02:47 AM, Georg-Johann Lay wrote:
> +#if defined (L_loop_ffsqi2)
> +;; Helper for ffshi2, ffssi2
> +;; r25:r24 = r26 + zero_extend16 (ffs8(r24))
> +;; r24 must be != 0
> +;; clobbers: r26
> +DEFUN __loop_ffsqi2

Why does this function have "loop" in its name?  The actual
implementation is surely irrelevant.

> +DEFUN __ffshi2
> +    clr  r26
> +    cpse r24, __zero_reg__
> +1:  XJMP __loop_ffsqi2
> +    ldi  r26, 8
> +    or   r24, r25

It probably doesn't matter to execution speed, but why the
OR here, when you know that r24 is 0?  Wouldn't the logic
be clearer spelling this with MOV?

> +#if defined (L_ctzsi2)
> +;; count trailing zeros
> +;; r25:r24 = ctz32 (r25:r22)
> +;; ctz(0) = 32

Note that GCC does not define ctz(0).  It's explicitly undefined.
Why are you forcing a particular value here?


r~


* Re: [Patch, AVR]: PR49313, fix PR29524
  2011-06-16 19:17 ` Richard Henderson
@ 2011-06-17  9:41   ` Georg-Johann Lay
  2011-06-17 17:26     ` Richard Henderson
  0 siblings, 1 reply; 10+ messages in thread
From: Georg-Johann Lay @ 2011-06-17  9:41 UTC (permalink / raw)
  To: Richard Henderson
  Cc: gcc-patches, Denis Chertykov, Eric B. Weddington, Anatoly Sokolov

Richard Henderson wrote:
> On 06/15/2011 02:47 AM, Georg-Johann Lay wrote:
>> +#if defined (L_loop_ffsqi2)
>> +;; Helper for ffshi2, ffssi2
>> +;; r25:r24 = r26 + zero_extend16 (ffs8(r24))
>> +;; r24 must be != 0
>> +;; clobbers: r26
>> +DEFUN __loop_ffsqi2
> 
> Why does this function have "loop" in its name?  The actual
> implementation is surely irrelevant.

Hmm. I needed some global name that can be referenced from __ffshi2
and __ffssi2. The function is not very useful on its own. Would you
prefer some other naming for such global helpers?

>> +DEFUN __ffshi2
>> +    clr  r26
>> +    cpse r24, __zero_reg__
>> +1:  XJMP __loop_ffsqi2
>> +    ldi  r26, 8
>> +    or   r24, r25
> 
> It probably doesn't matter to execution speed, but why the
> OR here, when you know that r24 is 0?  Wouldn't the logic
> be clearer spelling this with MOV?

The following instruction is BRNE, a conditional branch.
MOV does not modify the condition codes, so OR is used. Alternatives
would be EOR, or MOV+TST (note that TST Rx is sugar for AND Rx,Rx).

>> +#if defined (L_ctzsi2)
>> +;; count trailing zeros
>> +;; r25:r24 = ctz32 (r25:r22)
>> +;; ctz(0) = 32
> 
> Note that GCC does not define ctz(0).  It's explicitly undefined.
> Why are you forcing a particular value here?

Yes, you are right. Is the following patchlet OK?

Johann


	* config/avr/libgcc.S (__ctzsi2, __ctzhi2):
	Map zero to 255.

Index: config/avr/libgcc.S
===================================================================
--- config/avr/libgcc.S (Revision 175104)
+++ config/avr/libgcc.S (working copy)
@@ -977,12 +977,10 @@ ENDF __loop_ffsqi2
 #if defined (L_ctzsi2)
 ;; count trailing zeros
 ;; r25:r24 = ctz32 (r25:r22)
-;; ctz(0) = 32
+;; ctz(0) = 255
 DEFUN __ctzsi2
     XCALL __ffssi2
     dec  r24
-    sbrc r24, 7
-    ldi  r24, 32
     ret
 ENDF __ctzsi2
 #endif /* defined (L_ctzsi2) */
@@ -990,12 +988,10 @@ ENDF __ctzsi2
 #if defined (L_ctzhi2)
 ;; count trailing zeros
 ;; r25:r24 = ctz16 (r25:r24)
-;; ctz(0) = 16
+;; ctz(0) = 255
 DEFUN __ctzhi2
     XCALL __ffshi2
     dec  r24
-    sbrc r24, 7
-    ldi  r24, 16
     ret
 ENDF __ctzhi2
 #endif /* defined (L_ctzhi2) */
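
The value 255 simply falls out of the 8-bit decrement once the special
case is gone. A minimal C sketch of that, assuming ffs returns 0 for a
zero argument as in the assembler above; ffs16_ref and ctz16_model are
hypothetical names used only here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical portable stand-in for __ffshi2: 1-based index of the
   lowest set bit, or 0 when no bit is set. */
uint8_t ffs16_ref(uint16_t x)
{
    for (uint8_t i = 0; i < 16; i++)
        if ((x >> i) & 1u)
            return (uint8_t)(i + 1);
    return 0;
}

/* Hypothetical model of the simplified __ctzhi2: call ffs, then
   "dec r24".  Only the low byte is decremented; r25 stays 0. */
uint16_t ctz16_model(uint16_t x)
{
    uint8_t r24 = ffs16_ref(x);
    r24 = (uint8_t)(r24 - 1);    /* wraps from 0 to 255 for x == 0 */
    return r24;                  /* zero-extended, because r25 is 0 */
}
```

For nonzero arguments the result is the usual trailing-zero count; for
zero, where GCC leaves the result undefined, the model returns 255.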


> r~


* Re: [Patch, AVR]: PR49313, fix PR29524
  2011-06-17  9:41   ` Georg-Johann Lay
@ 2011-06-17 17:26     ` Richard Henderson
  2011-06-17 17:32       ` Georg-Johann Lay
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Henderson @ 2011-06-17 17:26 UTC (permalink / raw)
  To: Georg-Johann Lay
  Cc: gcc-patches, Denis Chertykov, Eric B. Weddington, Anatoly Sokolov

On 06/17/2011 02:20 AM, Georg-Johann Lay wrote:
> Richard Henderson wrote:
>> On 06/15/2011 02:47 AM, Georg-Johann Lay wrote:
>>> +#if defined (L_loop_ffsqi2)
>>> +;; Helper for ffshi2, ffssi2
>>> +;; r25:r24 = r26 + zero_extend16 (ffs8(r24))
>>> +;; r24 must be != 0
>>> +;; clobbers: r26
>>> +DEFUN __loop_ffsqi2
>>
>> Why does this function have "loop" in its name?  The actual
>> implementation is surely irrelevant.
> 
> Hmm. I needed some global name that can be referenced from __ffshi2
> and __ffssi2. The function is not very useful on its own. Would you
> prefer some other naming for such global helpers?

__ffsqi_nz perhaps?

> The following instruction is BRNE, a conditional branch.

Oops, missed that, sorry.

> Yes, you are right. Is the following patchlet OK?
> 
> Johann
> 
> 
> 	* config/avr/libgcc.S (__ctzsi2, __ctzhi2):
> 	Map zero to 255.

You'd also delete the COUNT_LEADING_ZEROS_0 definition in longlong.h.


r~


* Re: [Patch, AVR]: PR49313, fix PR29524
  2011-06-17 17:26     ` Richard Henderson
@ 2011-06-17 17:32       ` Georg-Johann Lay
  0 siblings, 0 replies; 10+ messages in thread
From: Georg-Johann Lay @ 2011-06-17 17:32 UTC (permalink / raw)
  To: Richard Henderson
  Cc: gcc-patches, Denis Chertykov, Eric B. Weddington, Anatoly Sokolov

Richard Henderson wrote:
> On 06/17/2011 02:20 AM, Georg-Johann Lay wrote:
>> Richard Henderson wrote:
>>> On 06/15/2011 02:47 AM, Georg-Johann Lay wrote:
>>
>> Yes, you are right. Is the following patchlet OK?
>>
>> Johann
>>
>> 	* config/avr/libgcc.S (__ctzsi2, __ctzhi2):
>> 	Map zero to 255.
> 
> You'd also delete the COUNT_LEADING_ZEROS_0 definition in longlong.h.
> 
> r~

__clzsi2(0) still returns 32, as it does not use ctz. So if the
implementation of ctz affects the validity of COUNT_LEADING_ZEROS_0,
that's a bit confusing...

Johann



* Re: [Patch, AVR]: PR49313, fix PR29524
  2011-06-16  6:49 Georg-Johann Lay
@ 2011-06-16  9:39 ` Ian Lance Taylor
  0 siblings, 0 replies; 10+ messages in thread
From: Ian Lance Taylor @ 2011-06-16  9:39 UTC (permalink / raw)
  To: Georg-Johann Lay
  Cc: gcc-patches, Denis Chertykov, Eric B. Weddington, Anatoly Sokolov

Georg-Johann Lay <avr@gjlay.de> writes:

> gcc/
> 	PR target/49313
> 	PR target/29524
> 	
> 	* longlong.h: Add AVR support:
> 	(count_leading_zeros): New macro.
> 	(count_trailing_zeros): New macro.
> 	(COUNT_LEADING_ZEROS_0): New macro.
> 	
> 	* config/avr/t-avr (LIB1ASMFUNCS): Add
> 	_ffssi2, _ffshi2, _loop_ffsqi2,
> 	_ctzsi2, _ctzhi2, _clzdi2, _clzsi2, _clzhi2,
> 	_paritydi2, _paritysi2, _parityhi2,
> 	_popcounthi2,_popcountsi2, _popcountdi2, _popcountqi2,
> 	_bswapsi2, _bswapdi2,
> 	_ashldi3, _ashrdi3, _lshrdi3
> 	(LIB2FUNCS_EXCLUDE): Add _clz.
>
> 	* config/avr/libgcc.S (XCALL): Move up in file.
> 	(XJMP): New C Macro.
> 	(DEFUN): New asm macro.
> 	(ENDF): New asm macro.
> 	(__ffssi2): New function.
> 	(__ffshi2): New function.
> 	(__loop_ffsqi2): New function.
> 	(__ctzsi2): New function.
> 	(__ctzhi2): New function.
> 	(__clzdi2): New function.
> 	(__clzsi2): New function.
> 	(__clzhi2): New function.
> 	(__paritydi2): New function.
> 	(__paritysi2): New function.
> 	(__parityhi2): New function.
> 	(__popcounthi2): New function.
> 	(__popcountsi2): New function.
> 	(__popcountdi2): New function.
> 	(__popcountqi2): New function.
> 	(__bswapsi2): New function.
> 	(__bswapdi2): New function.
> 	(__ashldi3): New function.
> 	(__ashrdi3): New function.
> 	(__lshrdi3): New function.
> 	Fix suspicious lines.
>
> libgcc/
> 	PR target/49313
> 	PR target/29524
>
> 	* config/avr/t-avr: Fix line endings.
> 	(intfuncs16): Remove _ffsXX2,  _clzXX2, _ctzXX2, _popcountXX2,
> 	_parityXX2.

The patch to longlong.h is fine if the rest of the patch is approved by
the AVR backend maintainers.

Thanks.

Ian


* [Patch, AVR]: PR49313, fix PR29524
@ 2011-06-16  6:49 Georg-Johann Lay
  2011-06-16  9:39 ` Ian Lance Taylor
  0 siblings, 1 reply; 10+ messages in thread
From: Georg-Johann Lay @ 2011-06-16  6:49 UTC (permalink / raw)
  To: gcc-patches
  Cc: Denis Chertykov, Eric B. Weddington, Anatoly Sokolov, Ian Lance Taylor

[-- Attachment #1: Type: text/plain, Size: 2014 bytes --]

[Resent as the original appears to have got lost in the net]

http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01141.html

This patch implements some libgcc functions in assembler.
The functions are used only rarely, but when they are, they lead to
an unpleasant waste of resources. For example, some SF functions
eventually lead to __clz_tab being dragged in (PR29524).

This patch avoids that with straightforward assembler implementations
of functions that are easy to implement.

Tested without regressions. Moreover, I tested the functions in some
self-written code against the old C implementation. The HI/QI
functions were tested for all possible inputs.

Johann

--

gcc/
	PR target/49313
	PR target/29524
	
	* longlong.h: Add AVR support:
	(count_leading_zeros): New macro.
	(count_trailing_zeros): New macro.
	(COUNT_LEADING_ZEROS_0): New macro.
	
	* config/avr/t-avr (LIB1ASMFUNCS): Add
	_ffssi2, _ffshi2, _loop_ffsqi2,
	_ctzsi2, _ctzhi2, _clzdi2, _clzsi2, _clzhi2,
	_paritydi2, _paritysi2, _parityhi2,
	_popcounthi2,_popcountsi2, _popcountdi2, _popcountqi2,
	_bswapsi2, _bswapdi2,
	_ashldi3, _ashrdi3, _lshrdi3
	(LIB2FUNCS_EXCLUDE): Add _clz.

	* config/avr/libgcc.S (XCALL): Move up in file.
	(XJMP): New C Macro.
	(DEFUN): New asm macro.
	(ENDF): New asm macro.
	(__ffssi2): New function.
	(__ffshi2): New function.
	(__loop_ffsqi2): New function.
	(__ctzsi2): New function.
	(__ctzhi2): New function.
	(__clzdi2): New function.
	(__clzsi2): New function.
	(__clzhi2): New function.
	(__paritydi2): New function.
	(__paritysi2): New function.
	(__parityhi2): New function.
	(__popcounthi2): New function.
	(__popcountsi2): New function.
	(__popcountdi2): New function.
	(__popcountqi2): New function.
	(__bswapsi2): New function.
	(__bswapdi2): New function.
	(__ashldi3): New function.
	(__ashrdi3): New function.
	(__lshrdi3): New function.
	Fix suspicious lines.

libgcc/
	PR target/49313
	PR target/29524

	* config/avr/t-avr: Fix line endings.
	(intfuncs16): Remove _ffsXX2,  _clzXX2, _ctzXX2, _popcountXX2,
	_parityXX2.
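
The carry-based counting in __popcountqi2 can be followed more easily
in C. The sketch below assumes the instruction order of the attached
patch (one lsr that drops bit 0, five lsr/adc pairs, and a final adc
that adds both the carry and the remaining bit); all function names in
it are made up for illustration and are not libgcc symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference: count set bits one by one. */
uint8_t popcount8_ref(uint8_t x)
{
    uint8_t n = 0;
    for (int i = 0; i < 8; i++)
        n = (uint8_t)(n + ((x >> i) & 1u));
    return n;
}

/* Hypothetical model of the __popcountqi2 instruction sequence. */
uint8_t popcount8_model(uint8_t r24)
{
    uint8_t tmp = r24;               /* mov __tmp_reg__, r24 */
    uint8_t carry;

    r24 &= 1u;                       /* andi r24, 1: bit 0 is counted */
    tmp >>= 1;                       /* first lsr only drops bit 0 */
    for (int i = 0; i < 5; i++) {    /* bits 1..5: lsr, adc with zero */
        carry = tmp & 1u;
        tmp >>= 1;
        r24 = (uint8_t)(r24 + carry);
    }
    carry = tmp & 1u;                /* last lsr: bit 6 goes to carry */
    tmp >>= 1;                       /* bit 7 is what remains in tmp */
    return (uint8_t)(r24 + tmp + carry); /* adc r24, __tmp_reg__ */
}

/* The wider variants add up the counts of their halves. */
uint8_t popcount16_model(uint16_t x)
{
    return (uint8_t)(popcount8_model((uint8_t)(x & 0xFF))
                     + popcount8_model((uint8_t)(x >> 8)));
}

/* Compare the model against the reference for every 8-bit input. */
void check_popcount_model(void)
{
    for (unsigned x = 0; x < 256; x++)
        assert(popcount8_model((uint8_t)x) == popcount8_ref((uint8_t)x));
}
```

The final adc with __tmp_reg__ saves one instruction pair: at that
point the register holds only bit 7, so adding it together with the
carry accounts for bits 6 and 7 at once.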



[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: libgcc-opt.diff --]
[-- Type: text/x-patch; name="libgcc-opt.diff", Size: 12299 bytes --]

Index: libgcc/config/avr/t-avr
===================================================================
--- libgcc/config/avr/t-avr	(Revision 175036)
+++ libgcc/config/avr/t-avr	(working copy)
@@ -1,19 +1,17 @@
-# Extra 16-bit integer functions.
-intfuncs16 = _absvXX2 _addvXX3 _subvXX3 _mulvXX3 _negvXX2 _ffsXX2 _clzXX2 \
-             _ctzXX2 _popcountXX2 _parityXX2
-hiintfuncs16 = $(subst XX,hi,$(intfuncs16))
-siintfuncs16 = $(subst XX,si,$(intfuncs16))
-
-iter-items := $(hiintfuncs16)
-iter-labels := $(siintfuncs16)
-iter-sizes := $(patsubst %,2,$(siintfuncs16)) $(patsubst %,2,$(hiintfuncs16))
-
-
-include $(srcdir)/empty.mk $(patsubst %,$(srcdir)/siditi-object.mk,$(iter-items))
-libgcc-objects += $(patsubst %,%$(objext),$(hiintfuncs16))
-
-ifeq ($(enable_shared),yes)
-libgcc-s-objects += $(patsubst %,%_s$(objext),$(hiintfuncs16))
-endif
-
-
+# Extra 16-bit integer functions.
+intfuncs16 = _absvXX2 _addvXX3 _subvXX3 _mulvXX3 _negvXX2 
+
+hiintfuncs16 = $(subst XX,hi,$(intfuncs16))
+siintfuncs16 = $(subst XX,si,$(intfuncs16))
+
+iter-items := $(hiintfuncs16)
+iter-labels := $(siintfuncs16)
+iter-sizes := $(patsubst %,2,$(siintfuncs16)) $(patsubst %,2,$(hiintfuncs16))
+
+
+include $(srcdir)/empty.mk $(patsubst %,$(srcdir)/siditi-object.mk,$(iter-items))
+libgcc-objects += $(patsubst %,%$(objext),$(hiintfuncs16))
+
+ifeq ($(enable_shared),yes)
+libgcc-s-objects += $(patsubst %,%_s$(objext),$(hiintfuncs16))
+endif
Index: gcc/longlong.h
===================================================================
--- gcc/longlong.h	(Revision 175036)
+++ gcc/longlong.h	(Arbeitskopie)
@@ -250,6 +250,12 @@ UDItype __umulsidi3 (USItype, USItype);
 #define COUNT_LEADING_ZEROS_0 32
 #endif
 
+#if defined (__AVR__) && W_TYPE_SIZE == 32
+#define count_leading_zeros(COUNT,X)  ((COUNT) = __builtin_clzl (X))
+#define count_trailing_zeros(COUNT,X) ((COUNT) = __builtin_ctzl (X))
+#define COUNT_LEADING_ZEROS_0 32
+#endif /* defined (__AVR__) && W_TYPE_SIZE == 32 */
+
 #if defined (__CRIS__) && __CRIS_arch_version >= 3
 #define count_leading_zeros(COUNT, X) ((COUNT) = __builtin_clz (X))
 #if __CRIS_arch_version >= 8
Index: gcc/config/avr/libgcc.S
===================================================================
--- gcc/config/avr/libgcc.S	(Revision 175036)
+++ gcc/config/avr/libgcc.S	(Arbeitskopie)
@@ -52,6 +52,26 @@ see the files COPYING3 and COPYING.RUNTI
 #endif
 	.endm
 
+#if defined (__AVR_HAVE_JMP_CALL__)
+#define XCALL call
+#define XJMP  jmp
+#else
+#define XCALL rcall
+#define XJMP  rjmp
+#endif
+
+.macro DEFUN name
+.global \name
+.func \name
+\name:
+.endm
+
+.macro ENDF name
+.size \name, .-\name
+.endfunc
+.endm
+
+\f
 /* Note: mulqi3, mulhi3 are open-coded on the enhanced core.  */
 #if !defined (__AVR_HAVE_MUL__)
 /*******************************************************
@@ -779,12 +799,6 @@ __do_clear_bss:
 /* __do_global_ctors and __do_global_dtors are only necessary
    if there are any constructors/destructors.  */
 
-#if defined (__AVR_HAVE_JMP_CALL__)
-#define XCALL call
-#else
-#define XCALL rcall
-#endif
-
 #ifdef L_ctors
 	.section .init6,"ax",@progbits
 	.global	__do_global_ctors
@@ -897,3 +911,393 @@ __tablejump_elpm__:
 	.endfunc
 #endif /* defined (L_tablejump_elpm) */
 
+\f
+/**********************************
+ * Find first set Bit (ffs)
+ **********************************/
+
+#if defined (L_ffssi2)
+;; find first set bit
+;; r25:r24 = ffs32 (r25:r22)
+;; clobbers: r22, r26
+DEFUN __ffssi2
+    clr  r26
+    tst  r22
+    brne 1f
+    subi r26, -8
+    or   r22, r23
+    brne 1f
+    subi r26, -8
+    or   r22, r24
+    brne 1f
+    subi r26, -8
+    or   r22, r25
+    brne 1f
+    ret
+1:  mov  r24, r22
+    XJMP __loop_ffsqi2
+ENDF __ffssi2
+#endif /* defined (L_ffssi2) */
+
+#if defined (L_ffshi2)
+;; find first set bit
+;; r25:r24 = ffs16 (r25:r24)
+;; clobbers: r26
+DEFUN __ffshi2
+    clr  r26
+    cpse r24, __zero_reg__
+1:  XJMP __loop_ffsqi2
+    ldi  r26, 8
+    or   r24, r25
+    brne 1b
+    ret
+ENDF __ffshi2
+#endif /* defined (L_ffshi2) */
+
+#if defined (L_loop_ffsqi2)
+;; Helper for ffshi2, ffssi2
+;; r25:r24 = r26 + zero_extend16 (ffs8(r24))
+;; r24 must be != 0
+;; clobbers: r26
+DEFUN __loop_ffsqi2
+    inc  r26
+    lsr  r24
+    brcc __loop_ffsqi2
+    mov  r24, r26
+    clr  r25
+    ret    
+ENDF __loop_ffsqi2
+#endif /* defined (L_loop_ffsqi2) */
+
+\f
+/**********************************
+ * Count trailing Zeros (ctz)
+ **********************************/
+
+#if defined (L_ctzsi2)
+;; count trailing zeros
+;; r25:r24 = ctz32 (r25:r22)
+;; ctz(0) = 32
+DEFUN __ctzsi2
+    XCALL __ffssi2
+    dec  r24
+    sbrc r24, 7
+    ldi  r24, 32
+    ret
+ENDF __ctzsi2
+#endif /* defined (L_ctzsi2) */
+
+#if defined (L_ctzhi2)
+;; count trailing zeros
+;; r25:r24 = ctz16 (r25:r24)
+;; ctz(0) = 16
+DEFUN __ctzhi2
+    XCALL __ffshi2
+    dec  r24
+    sbrc r24, 7
+    ldi  r24, 16
+    ret
+ENDF __ctzhi2
+#endif /* defined (L_ctzhi2) */
+
+\f
+/**********************************
+ * Count leading Zeros (clz)
+ **********************************/
+
+#if defined (L_clzdi2)
+;; count leading zeros
+;; r25:r24 = clz64 (r25:r18)
+;; clobbers: r22, r23, r26
+DEFUN __clzdi2
+    XCALL __clzsi2
+    sbrs r24, 5
+    ret
+    mov_l r22, r18
+    mov_h r23, r19
+    mov_l r24, r20
+    mov_h r25, r21
+    XCALL __clzsi2
+    subi r24, -32
+    ret
+ENDF __clzdi2
+#endif /* defined (L_clzdi2) */
+
+#if defined (L_clzsi2)
+;; count leading zeros
+;; r25:r24 = clz32 (r25:r22)
+;; clobbers: r26
+DEFUN __clzsi2
+    XCALL __clzhi2
+    sbrs r24, 4
+    ret
+    mov_l r24, r22
+    mov_h r25, r23
+    XCALL __clzhi2
+    subi r24, -16
+    ret
+ENDF __clzsi2
+#endif /* defined (L_clzsi2) */
+
+#if defined (L_clzhi2)
+;; count leading zeros
+;; r25:r24 = clz16 (r25:r24)
+;; clobbers: r26
+DEFUN __clzhi2
+    clr  r26
+    tst  r25
+    brne 1f
+    subi r26, -8
+    or   r25, r24
+    brne 1f
+    ldi  r24, 16
+    ret
+1:  cpi  r25, 16
+    brsh 3f
+    subi r26, -3
+    swap r25
+2:  inc  r26
+3:  lsl  r25
+    brcc 2b
+    mov  r24, r26
+    clr  r25
+    ret
+ENDF __clzhi2
+#endif /* defined (L_clzhi2) */
+
+\f
+/**********************************
+ * Parity 
+ **********************************/
+
+#if defined (L_paritydi2)
+;; r25:r24 = parity64 (r25:r18)
+;; clobbers: __tmp_reg__
+DEFUN __paritydi2
+    eor  r24, r18
+    eor  r24, r19
+    eor  r24, r20
+    eor  r24, r21
+    XJMP __paritysi2
+ENDF __paritydi2
+#endif /* defined (L_paritydi2) */
+
+#if defined (L_paritysi2)
+;; r25:r24 = parity32 (r25:r22)
+;; clobbers: __tmp_reg__
+DEFUN __paritysi2
+    eor  r24, r22
+    eor  r24, r23
+    XJMP __parityhi2
+ENDF __paritysi2
+#endif /* defined (L_paritysi2) */
+
+#if defined (L_parityhi2)
+;; r25:r24 = parity16 (r25:r24)
+;; clobbers: __tmp_reg__
+DEFUN __parityhi2
+    eor  r24, r25
+;; FALLTHRU
+ENDF __parityhi2
+
+;; r25:r24 = parity8 (r24)
+;; clobbers: __tmp_reg__
+DEFUN __parityqi2
+    ;; parity is in r24[0..7]
+    mov  __tmp_reg__, r24
+    swap __tmp_reg__
+    eor  r24, __tmp_reg__
+    ;; parity is in r24[0..3]
+    subi r24, -4
+    andi r24, -5
+    subi r24, -6
+    ;; parity is in r24[0,3]
+    sbrc r24, 3
+    inc  r24
+    ;; parity is in r24[0]
+    andi r24, 1
+    clr  r25
+    ret
+ENDF __parityqi2
+#endif /* defined (L_parityhi2) */
+
+\f
+/**********************************
+ * Population Count
+ **********************************/
+
+#if defined (L_popcounthi2)
+;; population count
+;; r25:r24 = popcount16 (r25:r24)
+;; clobbers: r30, __tmp_reg__
+DEFUN __popcounthi2
+    XCALL __popcountqi2
+    mov  r30, r24
+    mov  r24, r25
+    XCALL __popcountqi2
+    add  r24, r30
+    clr  r25
+    ret
+ENDF __popcounthi2
+#endif /* defined (L_popcounthi2) */
+
+#if defined (L_popcountsi2)
+;; population count
+;; r25:r24 = popcount32 (r25:r22)
+;; clobbers: r26, r30, __tmp_reg__
+DEFUN __popcountsi2
+    XCALL __popcounthi2
+    mov   r26, r24
+    mov_l r24, r22
+    mov_h r25, r23
+    XCALL __popcounthi2
+    add   r24, r26
+    ret
+ENDF __popcountsi2
+#endif /* defined (L_popcountsi2) */
+
+#if defined (L_popcountdi2)
+;; population count
+;; r25:r24 = popcount64 (r25:r18)
+;; clobbers: r22, r23, r26, r27, r30, __tmp_reg__
+DEFUN __popcountdi2
+    XCALL __popcountsi2
+    mov   r27, r24
+    mov_l r22, r18
+    mov_h r23, r19
+    mov_l r24, r20
+    mov_h r25, r21
+    XCALL __popcountsi2
+    add   r24, r27
+    ret
+ENDF __popcountdi2
+#endif /* defined (L_popcountdi2) */
+
+#if defined (L_popcountqi2)
+;; population count
+;; r24 = popcount8 (r24)
+;; clobbers: __tmp_reg__
+DEFUN __popcountqi2
+    mov  __tmp_reg__, r24
+    andi r24, 1
+    lsr  __tmp_reg__    
+    lsr  __tmp_reg__    
+    adc  r24, __zero_reg__
+    lsr  __tmp_reg__    
+    adc  r24, __zero_reg__
+    lsr  __tmp_reg__    
+    adc  r24, __zero_reg__
+    lsr  __tmp_reg__    
+    adc  r24, __zero_reg__
+    lsr  __tmp_reg__    
+    adc  r24, __zero_reg__
+    lsr  __tmp_reg__    
+    adc  r24, __tmp_reg__    
+    ret    
+ENDF __popcountqi2
+#endif /* defined (L_popcountqi2) */
+
+\f
+/**********************************
+ * Swap bytes
+ **********************************/
+
+;; swap two registers with different register number
+.macro bswap a, b
+    eor \a, \b
+    eor \b, \a
+    eor \a, \b
+.endm
+
+#if defined (L_bswapsi2)
+;; swap bytes
+;; r25:r22 = bswap32 (r25:r22)
+DEFUN __bswapsi2
+    bswap r22, r25
+    bswap r23, r24
+    ret
+ENDF __bswapsi2
+#endif /* defined (L_bswapsi2) */
+
+#if defined (L_bswapdi2)
+;; swap bytes
+;; r25:r18 = bswap64 (r25:r18)
+DEFUN __bswapdi2
+    bswap r18, r25
+    bswap r19, r24
+    bswap r20, r23
+    bswap r21, r22
+    ret
+ENDF __bswapdi2
+#endif /* defined (L_bswapdi2) */
+
+\f
+/**********************************
+ * 64-bit shifts
+ **********************************/
+
+#if defined (L_ashrdi3)
+;; Arithmetic shift right
+;; r25:r18 = ashr64 (r25:r18, r17:r16)
+DEFUN __ashrdi3
+    push r16
+    andi r16, 63
+    breq 2f
+1:  asr  r25
+    ror  r24
+    ror  r23
+    ror  r22
+    ror  r21
+    ror  r20
+    ror  r19
+    ror  r18
+    dec  r16
+    brne 1b
+2:  pop  r16
+    ret
+ENDF __ashrdi3
+#endif /* defined (L_ashrdi3) */
+
+#if defined (L_lshrdi3)
+;; Logical shift right
+;; r25:r18 = lshr64 (r25:r18, r17:r16)
+DEFUN __lshrdi3
+    push r16
+    andi r16, 63
+    breq 2f
+1:  lsr  r25
+    ror  r24
+    ror  r23
+    ror  r22
+    ror  r21
+    ror  r20
+    ror  r19
+    ror  r18
+    dec  r16
+    brne 1b
+2:  pop  r16
+    ret
+ENDF __lshrdi3
+#endif /* defined (L_lshrdi3) */
+
+#if defined (L_ashldi3)
+;; Shift left
+;; r25:r18 = ashl64 (r25:r18, r17:r16)
+DEFUN __ashldi3
+    push r16
+    andi r16, 63
+    breq 2f
+1:  lsl  r18
+    rol  r19
+    rol  r20
+    rol  r21
+    rol  r22
+    rol  r23
+    rol  r24
+    rol  r25
+    dec  r16
+    brne 1b
+2:  pop  r16
+    ret
+ENDF __ashldi3
+#endif /* defined (L_ashldi3) */
Index: gcc/config/avr/t-avr
===================================================================
--- gcc/config/avr/t-avr	(Revision 175036)
+++ gcc/config/avr/t-avr	(Arbeitskopie)
@@ -24,12 +24,10 @@ driver-avr.o: $(srcdir)/config/avr/drive
 avr-devices.o: $(srcdir)/config/avr/avr-devices.c \
   $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H)
 	$(CC) -c $(ALL_CFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) $<
-	
 
 avr-c.o: $(srcdir)/config/avr/avr-c.c \
   $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) $(C_COMMON_H)
 	$(CC) -c $(ALL_CFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) $<
-	
 
 
 LIB1ASMSRC = avr/libgcc.S
@@ -52,7 +50,30 @@ LIB1ASMFUNCS = \
 	_copy_data \
 	_clear_bss \
 	_ctors \
-	_dtors
+	_dtors \
+	_ffssi2 \
+	_ffshi2 \
+	_loop_ffsqi2 \
+	_ctzsi2 \
+	_ctzhi2 \
+	_clzdi2 \
+	_clzsi2 \
+	_clzhi2 \
+	_paritydi2 \
+	_paritysi2 \
+	_parityhi2 \
+	_popcounthi2 \
+	_popcountsi2 \
+	_popcountdi2 \
+	_popcountqi2 \
+	_bswapsi2 \
+	_bswapdi2 \
+	_ashldi3 \
+	_ashrdi3 \
+	_lshrdi3
+
+LIB2FUNCS_EXCLUDE = \
+	_clz
 
 # We do not have the DF type.
 # Most of the C functions in libgcc2 use almost all registers,
@@ -216,8 +237,8 @@ MULTILIB_MATCHES = \
 	mmcu?avr51=mmcu?at90can128 \
 	mmcu?avr51=mmcu?at90usb1286 \
 	mmcu?avr51=mmcu?at90usb1287 \
- 	mmcu?avr6=mmcu?atmega2560 \
- 	mmcu?avr6=mmcu?atmega2561
+	mmcu?avr6=mmcu?atmega2560 \
+	mmcu?avr6=mmcu?atmega2561
 
 MULTILIB_EXCEPTIONS =
 
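
The 64-bit shift routines move the operand one bit per loop iteration through a chain of eight byte registers. A host-side model of that carry chain might look like the sketch below. It is an assumption-laden illustration, not code from the patch: the name ref_lshr64_bytewise is hypothetical, and the count is reduced to 0..63, the range of shift counts that C defines for a 64-bit operand.

```c
#include <stdint.h>

/* Hypothetical model of a logical right shift done one bit at a
   time across eight bytes, like an lsr followed by seven ror. */
static uint64_t ref_lshr64_bytewise(uint64_t x, unsigned count)
{
    uint8_t b[8];
    uint64_t r = 0;
    int i;

    for (i = 0; i < 8; i++)             /* b[7] is the most significant byte */
        b[i] = (uint8_t)(x >> (8 * i));

    count &= 63u;                       /* counts 0..63 are meaningful */
    while (count--) {
        unsigned carry = 0;             /* lsr shifts in a zero at the top */
        for (i = 7; i >= 0; i--) {
            unsigned out = b[i] & 1u;   /* bit that falls into carry */
            b[i] = (uint8_t)((b[i] >> 1) | (carry << 7));
            carry = out;
        }
    }

    for (i = 0; i < 8; i++)
        r |= (uint64_t)b[i] << (8 * i);
    return r;
}
```

The result can be compared with the plain C expression x >> count for every count from 0 to 63.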

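The 8-bit parity and popcount routines rely on two compact tricks: parity folds the byte into a nibble and then uses an add/mask/add sequence, and popcount collects the bits through the carry flag. The C sketch below models both step by step so they can be checked exhaustively on a host. All names (parity8_trick, popcount8_adc, naive_popcount8, count_mismatches) are hypothetical and the code is only a model of the instruction sequences, not a replacement for them.

```c
#include <stdint.h>

/* Straightforward bit count used as the oracle. */
static unsigned naive_popcount8(uint8_t x)
{
    unsigned n = 0;

    while (x) {
        n += x & 1u;
        x >>= 1;
    }
    return n;
}

/* Models __parityqi2: swap/eor, then +4, clear bit 2, +6. */
static unsigned parity8_trick(uint8_t x)
{
    x ^= (uint8_t)((x << 4) | (x >> 4));  /* parity now in the low nibble */
    x = (uint8_t)(x + 4);                 /* subi r24, -4 */
    x &= 0xFB;                            /* andi r24, -5 */
    x = (uint8_t)(x + 6);                 /* subi r24, -6 */
    if (x & 0x08)                         /* parity is bit 0 xor bit 3 */
        x++;
    return x & 1u;
}

/* Models __popcountqi2: bit 0 is taken by the mask, bits 1..5 are
   added through the carry, and the final add picks up bit 6 (carry)
   together with bit 7 (what is left in the temporary). */
static unsigned popcount8_adc(uint8_t x)
{
    uint8_t tmp = x;
    unsigned cnt = x & 1u;
    unsigned carry;
    int i;

    tmp >>= 1;                            /* bit 0 is already counted */
    for (i = 0; i < 5; i++) {             /* bits 1..5 */
        carry = tmp & 1u;
        tmp >>= 1;
        cnt += carry;
    }
    carry = tmp & 1u;                     /* bit 6 goes to carry */
    tmp >>= 1;                            /* tmp now holds bit 7 only */
    cnt += tmp + carry;
    return cnt;
}

/* Returns the number of byte values for which a model disagrees
   with the oracle; 0 means both tricks are correct for all inputs. */
static int count_mismatches(void)
{
    int bad = 0;
    int v;

    for (v = 0; v < 256; v++) {
        unsigned n = naive_popcount8((uint8_t)v);

        if (popcount8_adc((uint8_t)v) != n)
            bad++;
        if (parity8_trick((uint8_t)v) != (n & 1u))
            bad++;
    }
    return bad;
}
```

Since the wider variants only combine bytes (by addition for popcount, by eor for parity), checking the 8-bit case over all 256 values covers the non-obvious part.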
^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2011-06-17 15:55 UTC | newest]

Thread overview: 10+ messages
2011-06-16  6:53 [Patch, AVR]: PR49313, fix PR29524 Georg-Johann Lay
2011-06-15 10:46 ` Denis Chertykov
2011-06-15 11:59   ` Georg-Johann Lay
2011-06-16 10:00   ` Georg-Johann Lay
2011-06-16 19:17 ` Richard Henderson
2011-06-17  9:41   ` Georg-Johann Lay
2011-06-17 17:26     ` Richard Henderson
2011-06-17 17:32       ` Georg-Johann Lay
  -- strict thread matches above, loose matches on Subject: below --
2011-06-16  6:49 Georg-Johann Lay
2011-06-16  9:39 ` Ian Lance Taylor
