* SH optimized software floating point routines
@ 2010-06-10 12:01 Naveen H. S
2010-06-14 4:50 ` Kaz Kojima
0 siblings, 1 reply; 30+ messages in thread
From: Naveen H. S @ 2010-06-10 12:01 UTC (permalink / raw)
To: gcc; +Cc: kkojima, Prafulla Thakare
Hi,
Software floating-point (libgcc) routines for SH were posted in the
following threads:
http://gcc.gnu.org/ml/gcc-patches/2006-09/msg00063.html
http://gcc.gnu.org/ml/gcc-patches/2006-09/msg00614.html
http://gcc.gnu.org/ml/gcc-patches/2004-08/msg00624.html
There was some discussion regarding the testing of these routines.
We briefly tested them and found no major issues.
http://gcc.gnu.org/ml/gcc-patches/2006-10/msg00791.html
Please let me know whether these routines can be used in the SH toolchain,
and whether we should invoke them by default. Currently, we are thinking
of invoking these routines only when a command-line option is specified.
Regards,
Naveen.H.S
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: SH optimized software floating point routines
2010-06-10 12:01 SH optimized software floating point routines Naveen H. S
@ 2010-06-14 4:50 ` Kaz Kojima
2010-06-14 7:27 ` Joern Rennecke
2010-07-16 10:04 ` Naveen H. S
0 siblings, 2 replies; 30+ messages in thread
From: Kaz Kojima @ 2010-06-14 4:50 UTC (permalink / raw)
To: Naveen.S; +Cc: gcc, Prafulla.Thakare
"Naveen H. S" <Naveen.S@kpitcummins.com> wrote:
> Software floating-point (libgcc) routines for SH were posted in the
> following threads:
> http://gcc.gnu.org/ml/gcc-patches/2006-09/msg00063.html
> http://gcc.gnu.org/ml/gcc-patches/2006-09/msg00614.html
> http://gcc.gnu.org/ml/gcc-patches/2004-08/msg00624.html
>
> There was some discussion regarding the testing of these routines.
> We briefly tested them and found no major issues.
> http://gcc.gnu.org/ml/gcc-patches/2006-10/msg00791.html
> Please let me know whether these routines can be used in the SH toolchain.
SH currently uses the fp-bit.c implementation of soft FP, which is
known to be rather inefficient. PowerPC uses a more efficient
soft-fp derived from glibc; other targets like arm, ia64 and sparc,
and some newer targets, use it as well. Please see config/*/t-*softfp
and config/*/sfp-machine.h. We can expect the best performance
from Joern's assembly soft FP, but in general the maintenance of
hand-crafted assembly code will be hard.
If I remember correctly, there were some arguments about this
target-specific implementation vs. the generic one when soft-fp was
introduced in gcc. It would be better to try soft-fp on SH and compare
numbers for the current fp-bit, soft-fp and the assembly implementation.
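As a concrete way of getting such numbers, a small portable harness along
these lines could be built against toolchains using each of the three
implementations and timed. This is a sketch of mine, not something from this
thread; the workload mix is arbitrary:

```c
/* Hypothetical micro-benchmark for comparing soft-FP implementations.
   Build the same file with fp-bit, soft-fp and the assembly routines,
   and compare the reported clock counts.  */
#include <stdio.h>
#include <time.h>

/* volatile keeps the compiler from constant-folding the FP work away.  */
static volatile double in_a = 1.2345e-3, in_b = 9.8765e4;

/* A mix of add, sub, mul and div, all of which go through libgcc calls
   in a -msoft-float build.  */
double fp_workout (long iterations)
{
  double acc = 0.0;
  for (long i = 0; i < iterations; i++)
    acc += in_a * in_b - in_a / (in_b + i);
  return acc;
}

/* Time one run; clock () granularity is coarse, so use large iteration
   counts on real hardware.  */
void report (long iterations)
{
  clock_t t0 = clock ();
  double r = fp_workout (iterations);
  clock_t t1 = clock ();
  printf ("result=%g clocks=%ld\n", r, (long) (t1 - t0));
}
```

Relative clock counts between the three builds, rather than the absolute
numbers, are what matter here.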
If soft-fp is still too inefficient for you, you are free to propose
a complete and regtested patch for SH assembly soft FP against
trunk. Joern's original patch touched not only the SH-specific
parts but also the generic part of the compiler. The revised patch
needs to be tested on various targets too.
Regards,
kaz
* Re: SH optimized software floating point routines
2010-06-14 4:50 ` Kaz Kojima
@ 2010-06-14 7:27 ` Joern Rennecke
2010-07-16 10:04 ` Naveen H. S
1 sibling, 0 replies; 30+ messages in thread
From: Joern Rennecke @ 2010-06-14 7:27 UTC (permalink / raw)
To: Kaz Kojima; +Cc: Naveen.S, gcc, Prafulla.Thakare
Quoting Kaz Kojima <kkojima@rr.iij4u.or.jp>:
> but in general the maintenance of
> hand crafted assembly codes will be hard.
If you have a fixed feature set and pipeline, and have made sure the code
is correct, no further maintenance should be required - which is more
than can be said of the target port code generator, which tends to fall
behind in generated-code performance if you have a finely tuned port and
don't keep up with changes in the optimizers.
AFAIK the pipeline variations in the SH still don't come close to the
difference between compiler generated code and assembly code that is
properly hand-optimized for the SH4-[123]00 pipelines.
> If soft-fp is still too inefficient for you, you are free to propose
> a complete and regtested patch for SH assembly soft fp against
> trunk.
IIRC I never finished working on some corner cases of division rounding in
the SH floating-point emulation because I had no testcase (and it was not
high on my agenda because there seemed to be little practical relevance).
In order to test the ARCompact floating point emulations, I made a new test
to check rounding of subnormal numbers during division:
http://gcc.gnu.org/viewcvs/branches/arc-4_4-20090909-branch/gcc/testsuite/gcc.c-torture/execute/ieee/denorm-rand.c?limit_changes=0&view=markup&pathrev=151545
To give the code a good workout, you can up the iteration count, e.g.
1000 -> 1000000. However, you probably don't want to do this with
fp-bit at -O0.
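The actual test lives at the URL above; as a rough illustration of what such
a check exercises (this is a hand-written sketch of mine, not the real
denorm-rand.c, and it assumes IEEE 754 doubles with the default
round-to-nearest-even mode):

```c
/* Sketch of a subnormal-division rounding check.  Division results that
   land in the subnormal range must still round correctly even though the
   result loses fraction bits.  */
#include <float.h>

int check_subnormal_div (void)
{
  volatile double d = 0x1p-1074;   /* smallest subnormal double */
  volatile double n = DBL_MIN;     /* smallest normal double, 0x1p-1022 */

  /* Exact case: DBL_MIN / 2 is the largest power-of-two subnormal,
     so doubling it must recover DBL_MIN exactly.  */
  if (n / 2.0 * 2.0 != n)
    return 0;

  /* Rounding case: (2*d)/3 is about 1.333*d; its representable
     neighbours are d and 2*d, so it must round down to d.  */
  if ((2.0 * d) / 3.0 != d)
    return 0;

  return 1;
}
```

A buggy emulation that drops the sticky bit or double-rounds typically fails
the second check.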
Also note that fp-bit.c in SVN gets the rounding wrong; you can use this
patch for a sane comparison:
http://gcc.gnu.org/viewcvs/branches/arc-4_4-20090909-branch/gcc/config/fp-bit.c?limit_changes=0&r1=151545&r2=151544&pathrev=151545
FWIW, these are the associated ChangeLog entries:
gcc/testsuite:
2008-04-04 J"orn Rennecke <joern.rennecke@arc.com>
* gcc.c-torture/execute/ieee/denorm-rand.c: New file.
* gcc.dg/torture/fp-int-convert.h: Avoid undefined behaviour.
gcc:
2008-04-04 J"orn Rennecke <joern.rennecke@arc.com>
* config/fp-bit.c (_fpdiv_parts): Avoid double rounding.
* RE: SH optimized software floating point routines
2010-06-14 4:50 ` Kaz Kojima
2010-06-14 7:27 ` Joern Rennecke
@ 2010-07-16 10:04 ` Naveen H. S
2010-07-16 10:26 ` Joern Rennecke
` (2 more replies)
1 sibling, 3 replies; 30+ messages in thread
From: Naveen H. S @ 2010-07-16 10:04 UTC (permalink / raw)
To: Kaz Kojima; +Cc: gcc, Prafulla Thakare, amylaar
[-- Attachment #1: Type: text/plain, Size: 2145 bytes --]
Hi,
>> you are free to propose a complete and regtested patch for SH
>> assembly soft fp against trunk.
Please find attached the ported soft-float patch "sh_softfloat.patch".
The original patch was posted by Joern Rennecke at the following link:
http://gcc.gnu.org/ml/gcc-patches/2006-09/msg00614.html
The following modifications were made to the patch.
sched-deps.c :- The hunk was not applied due to changes in the
current gcc source.
t-sh :- The divsf3, extendsfdf2 and truncdfsf2 routines are not included
as they resulted in some regressions.
divsf3 - c-c++-common/torture/complex-sign-mixed-div.c
extendsfdf2 - gcc.c-torture/execute/conversion.c
gcc.dg/torture/fp-int-convert-float.c, gcc.dg/pr28796-2.c
gcc.dg/torture/type-generic-1.c
sh.md :- The cbranchsf4, cbranchdf4, cstoresf4 and cstoredf4 instruction
patterns are not included as they are already present in the current
source; modifying them as per the patch resulted in a build failure.
Branch and other related instruction patterns (bgt, blt, ble, bge, bleu,
bunle, bunordered, bunlt, bunge, bungt, seq, sge, sgeu, sne etc.) are not
modified as they are no longer present in the current gcc source.
After the above modifications, the regressions are reduced. However,
there are still some regressions on SH3 and related (m4-nofpu,
m4a-nofpu and m2a-single) targets.
gcc.c-torture/execute/20060420-1.c - O1 and O2 execution failures
gcc.dg/fold-overflow-1.c scan-assembler-times 2139095040
We are working on fixing the above failures as well as the extendsfdf2,
divsf3 and truncdfsf2 routines. Please review the patch and let us know
if any modifications are needed. Any inputs on resolving the failures
and the excluded routines would be appreciated.
>> It would be better to try soft-fp on SH
Please find attached the patch "sh_softfp.patch", which implements basic
soft-fp support for the SH target. No regressions were found with the
patch. Please let us know if any further improvements are required for
complete soft-fp support.
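For reference, basic soft-fp wiring for a target typically amounts to a
makefile fragment along these lines. This is a hypothetical sketch modeled
on how other targets' t-*softfp files use the generic config/t-softfp
machinery; the variable names are the generic ones, but the values and the
sh/sfp-machine.h path are assumptions, not the attached patch:

```make
# Hypothetical soft-fp fragment for a SH target makefile.
softfp_float_modes := sf df
softfp_int_modes := si di
softfp_extensions := sfdf
softfp_truncations := dfsf
softfp_machine_header := sh/sfp-machine.h
softfp_exclude_libgcc2 := n
```

An accompanying sh/sfp-machine.h would then define the word size, NaN
layout and exception-handling macros that the soft-fp sources expect.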
Thanks for the co-operation and support.
Regards,
Naveen.H.S
[-- Attachment #2: sh_softfloat.patch --]
[-- Type: application/octet-stream, Size: 270966 bytes --]
diff -up -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/Makefile.in gcc-4.5.0/gcc/Makefile.in
--- gcc-4.5.0/gcc/Makefile.in 2010-04-02 13:19:06.000000000 +0530
+++ gcc-4.5.0/gcc/Makefile.in 2010-07-14 11:58:33.000000000 +0530
@@ -2701,7 +2701,7 @@ opts-common.o : opts-common.c opts.h $(C
coretypes.h intl.h
targhooks.o : targhooks.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TREE_H) \
$(EXPR_H) $(TM_H) $(RTL_H) $(TM_P_H) $(FUNCTION_H) output.h $(TOPLEV_H) \
- $(MACHMODE_H) $(TARGET_DEF_H) $(TARGET_H) $(GGC_H) gt-targhooks.h \
+ $(MACHMODE_H) $(TARGET_DEF_H) $(TARGET_H) $(GGC_H) $(REGS_H) gt-targhooks.h \
$(OPTABS_H) $(RECOG_H) reload.h hard-reg-set.h
bversion.h: s-bversion; @true
diff -up -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/reload.c gcc-4.5.0/gcc/reload.c
--- gcc-4.5.0/gcc/reload.c 2009-12-21 22:02:44.000000000 +0530
+++ gcc-4.5.0/gcc/reload.c 2010-07-14 11:58:33.000000000 +0530
@@ -2212,14 +2212,8 @@ operands_match_p (rtx x, rtx y)
multiple hard register group of scalar integer registers, so that
for example (reg:DI 0) and (reg:SI 1) will be considered the same
register. */
- if (WORDS_BIG_ENDIAN && GET_MODE_SIZE (GET_MODE (x)) > UNITS_PER_WORD
- && SCALAR_INT_MODE_P (GET_MODE (x))
- && i < FIRST_PSEUDO_REGISTER)
- i += hard_regno_nregs[i][GET_MODE (x)] - 1;
- if (WORDS_BIG_ENDIAN && GET_MODE_SIZE (GET_MODE (y)) > UNITS_PER_WORD
- && SCALAR_INT_MODE_P (GET_MODE (y))
- && j < FIRST_PSEUDO_REGISTER)
- j += hard_regno_nregs[j][GET_MODE (y)] - 1;
+ i = targetm.match_adjust (x, i);
+ j = targetm.match_adjust (y, j);
return i == j;
}
diff -up -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/target-def.h gcc-4.5.0/gcc/target-def.h
--- gcc-4.5.0/gcc/target-def.h 2010-03-25 02:14:48.000000000 +0530
+++ gcc-4.5.0/gcc/target-def.h 2010-07-14 11:58:33.000000000 +0530
@@ -401,6 +401,10 @@
#define TARGET_SUPPORT_VECTOR_MISALIGNMENT \
default_builtin_support_vector_misalignment
+#ifndef TARGET_MATCH_ADJUST
+#define TARGET_MATCH_ADJUST default_match_adjust
+#endif
+
#define TARGET_VECTORIZE \
{ \
@@ -1009,6 +1013,7 @@
TARGET_CONVERT_TO_TYPE, \
TARGET_IRA_COVER_CLASSES, \
TARGET_SECONDARY_RELOAD, \
+ TARGET_MATCH_ADJUST, \
TARGET_EXPAND_TO_RTL_HOOK, \
TARGET_INSTANTIATE_DECLS, \
TARGET_HARD_REGNO_SCRATCH_OK, \
diff -up -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/target.h gcc-4.5.0/gcc/target.h
--- gcc-4.5.0/gcc/target.h 2010-03-27 15:57:39.000000000 +0530
+++ gcc-4.5.0/gcc/target.h 2010-07-14 11:58:33.000000000 +0530
@@ -1027,6 +1027,10 @@ struct gcc_target
enum machine_mode,
secondary_reload_info *);
+ /* Take an rtx and its regno, and return the regno for purposes of
+ checking a matching constraint. */
+ int (*match_adjust) (rtx, int);
+
/* This target hook allows the backend to perform additional
processing while initializing for variable expansion. */
void (* expand_to_rtl_hook) (void);
diff -up -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/targhooks.c gcc-4.5.0/gcc/targhooks.c
--- gcc-4.5.0/gcc/targhooks.c 2010-03-27 15:57:39.000000000 +0530
+++ gcc-4.5.0/gcc/targhooks.c 2010-07-14 11:58:33.000000000 +0530
@@ -66,6 +66,7 @@ along with GCC; see the file COPYING3.
#include "reload.h"
#include "optabs.h"
#include "recog.h"
+#include "regs.h"
bool
@@ -268,6 +269,27 @@ default_cxx_guard_type (void)
return long_long_integer_type_node;
}
+/* Given an rtx and its regno, return a regno value that shall be used for
+ purposes of comparison in operands_match_p.
+ Generally, we say that integer registers are subject to big-endian
+ adjustment. This default target hook should generally work if the mode
+ of a register is a sufficient indication of whether this adjustment is to
+ take place; this will not work when software floating point is done in integer
+ registers. */
+int
+default_match_adjust (rtx x, int regno)
+{
+ /* On a WORDS_BIG_ENDIAN machine, point to the last register of a
+ multiple hard register group of scalar integer registers, so that
+ for example (reg:DI 0) and (reg:SI 1) will be considered the same
+ register. */
+ if (WORDS_BIG_ENDIAN && GET_MODE_SIZE (GET_MODE (x)) > UNITS_PER_WORD
+ && SCALAR_INT_MODE_P (GET_MODE (x))
+ && regno < FIRST_PSEUDO_REGISTER)
+ regno += hard_regno_nregs[regno][GET_MODE (x)] - 1;
+ return regno;
+}
+
/* Returns the size of the cookie to use when allocating an array
whose elements have the indicated TYPE. Assumes that it is already
diff -up -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/targhooks.h gcc-4.5.0/gcc/targhooks.h
--- gcc-4.5.0/gcc/targhooks.h 2010-03-27 15:57:39.000000000 +0530
+++ gcc-4.5.0/gcc/targhooks.h 2010-07-14 11:58:33.000000000 +0530
@@ -109,6 +109,7 @@ extern const enum reg_class *default_ira
extern enum reg_class default_secondary_reload (bool, rtx, enum reg_class,
enum machine_mode,
secondary_reload_info *);
+extern int default_match_adjust (rtx, int);
extern void hook_void_bitmap (bitmap);
extern bool default_handle_c_option (size_t, const char *, int);
extern int default_reloc_rw_mask (void);
Common subdirectories: gcc-4.5.0/gcc/testsuite and gcc-4.5.0/gcc/testsuite
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/adddf3.S gcc-4.5.0/gcc/config/sh/IEEE-754/adddf3.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/adddf3.S 1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/adddf3.S 2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,799 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Routine for adding two double numbers
+
+! Author: Rakesh Kumar
+! SH1 Support by Joern Rennecke
+! Sticky Bit handling : Joern Rennecke
+
+! Arguments: r4-r5, r6-r7
+! Result: r0-r1
+
+! The value in r4-r5 is referred to as op1
+! and that in r6-r7 is referred to as op2
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (subdf3)
+ FUNC (GLOBAL (subdf3))
+ .global GLOBAL (adddf3)
+ FUNC (GLOBAL (adddf3))
+
+GLOBAL (subdf3):
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+ mov r6,r2
+
+ mov r5,r4
+ mov r7,r6
+
+ mov r1,r5
+ mov r2,r7
+#endif
+ mov.l .L_sign,r2
+ bra .L_adddf3_1
+ xor r2,r6
+
+GLOBAL (adddf3):
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+ mov r6,r2
+
+ mov r5,r4
+ mov r7,r6
+
+ mov r1,r5
+ mov r2,r7
+#endif
+
+.L_adddf3_1:
+ mov.l r8,@-r15
+ mov r4,r1
+
+ mov.l .L_inf,r2
+ mov r6,r3
+
+ mov.l r9,@-r15
+ and r2,r1 !Exponent of op1 in r1
+
+ mov.l r10,@-r15
+ and r2,r3 !Exponent of op2 in r3
+
+ ! Check for Nan or Infinity
+ mov.l .L_sign,r9
+ cmp/eq r2,r1
+
+ mov r9,r10
+ bt .L_thread_inv_exp_op1
+
+ mov r9,r0
+ cmp/eq r2,r3
+! op1 has a valid exponent. We need not check it again.
+! Return op2 straight away.
+ and r4,r9 !r9 has sign bit for op1
+ bt .L_ret_op2
+
+ ! Check for -ve zero
+ cmp/eq r4,r0
+ and r6,r10 !r10 has sign bit for op2
+
+ bt .L_op1_nzero
+
+ cmp/eq r6,r0
+ bt .L_op2_nzero
+
+! Check for zero
+.L_non_zero:
+ tst r4,r4
+ bt .L_op1_zero
+
+ ! op1 is not zero, check op2 for zero
+ tst r6,r6
+ bt .L_op2_zero
+
+! r1 and r3 has masked out exponents, r9 and r10 has signs
+.L_add:
+ mov.l .L_high_mant,r8
+ mov #-20,r2
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r2,r1 ! r1 now has exponent for op1 in its lower bits
+#else
+ SHLR20 (r1)
+#endif
+ and r8,r6 ! Higher bits of mantissa of op2
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r2,r3 ! r3 has exponent for op2 in its lower bits
+#else
+ SHLR20 (r3)
+#endif
+ and r8,r4 ! Higher bits of mantissa of op1
+
+ mov.l .L_21bit,r8
+
+ tst r1,r1
+ bt .L_norm_op1
+
+ ! Set the 21st bit.
+ or r8,r4
+ tst r3,r3
+
+ bt .L_norm_op2
+ or r8,r6
+
+! Check for negative mantissas. Make them positive by negation
+! r9 and r10 have signs of op1 and op2 respectively
+.L_neg_mant:
+ tst r9,r9
+ bf .L_neg_op1
+
+ tst r10,r10
+ bf .L_neg_op2
+
+.L_add_1:
+ cmp/ge r1,r3
+
+ mov r1,r0
+ bt .L_op2_exp_greater
+
+ sub r3,r0
+ ! If exponent difference is greater than 54, the resultant exponent
+ ! won't be changed. Return op1 straight away.
+ mov #54,r2
+ cmp/gt r2,r0
+
+ bt .L_pack_op1
+
+ mov r1,r3
+ clrt
+
+ cmp/eq #0,r0
+ bt .L_add_mant
+
+ ! Shift left the first operand and apply rest of shifts to second operand.
+ mov #0,r2
+ shll r5
+
+ rotcl r4
+
+ add #-1,r3
+ dt r0
+
+ bt .L_add_mant
+ dt r0
+
+ bt LOCAL(got_guard)
+ dt r0
+
+ bt LOCAL(got_sticky)
+
+! Shift the mantissa part of op2 so that both exponents are equal
+.L_shfrac_op2:
+ shar r6
+ or r7,r2 ! sticky bit
+
+ rotcr r7
+ dt r0
+
+ bf .L_shfrac_op2
+
+ shlr r2
+
+ subc r2,r2 ! spread sticky bit across r2
+LOCAL(got_sticky):
+ shar r6
+
+ rotcr r7
+
+ rotcr r2
+LOCAL(got_guard):
+ shar r6
+
+ rotcr r7
+
+ rotcr r2
+
+
+! Add the positive mantissas and check for overflow by checking the
+! MSB of the resultant. In case of overflow, negate the result.
+.L_add_mant:
+ clrt
+ addc r7,r5
+
+ mov #0,r10 ! Assume resultant to be positive
+ addc r6,r4
+
+ cmp/pz r4
+
+ bt .L_mant_ptv
+ negc r2,r2
+
+ negc r5,r5
+
+ mov.l .L_sign,r10 ! The assumption was wrong, result is negative
+ negc r4,r4
+
+! 23rd bit in the high part of mantissa could be set.
+! In this case, right shift the mantissa.
+.L_mant_ptv:
+ mov.l .L_23bit,r0
+
+ tst r4,r0
+ bt .L_mant_ptv_0
+
+ shlr r4
+ rotcr r5
+
+ add #1,r3
+ bra .L_mant_ptv_1
+ rotcr r2
+
+.L_mant_ptv_0:
+ mov.l .L_22bit,r0
+ tst r4,r0
+
+ bt .L_norm_mant
+
+.L_mant_ptv_1:
+ ! The 22nd bit of the resultant mantissa is set. Shift right the mantissa
+ ! and add 1 to exponent
+ add #1,r3
+ shlr r4
+ rotcr r5
+ ! The mantissa is already normalized. We don't need to
+ ! spend any effort. Branch to epilogue.
+ bra .L_epil
+ rotcr r2
+
+! Normalize operands
+.L_norm_op1:
+ shll r5
+
+ rotcl r4
+ add #-1,r1
+
+ tst r4,r8
+ bt .L_norm_op1
+
+ tst r3,r3
+ SL(bf, .L_neg_mant,
+ add #1,r1)
+
+.L_norm_op2:
+ shll r7
+
+ rotcl r6
+ add #-1,r3
+
+ tst r6,r8
+ bt .L_norm_op2
+
+ bra .L_neg_mant
+ add #1,r3
+
+! Negate the mantissa of op1
+.L_neg_op1:
+ clrt
+ negc r5,r5
+
+ negc r4,r4
+ tst r10,r10
+
+ bt .L_add_1
+
+! Negate the mantissa of op2
+.L_neg_op2:
+ clrt
+ negc r7,r7
+
+ bra .L_add_1
+ negc r6,r6
+
+! Thread the jump to .L_inv_exp_op1
+.L_thread_inv_exp_op1:
+ bra .L_inv_exp_op1
+ nop
+
+.L_ret_op2:
+ mov.l @r15+,r10
+#ifdef __LITTLE_ENDIAN__
+ mov r6,r1
+#else
+ mov r6,r0
+#endif
+
+ mov.l @r15+,r9
+#ifdef __LITTLE_ENDIAN__
+ mov r7,r0
+#else
+ mov r7,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+.L_op1_nzero:
+ tst r5,r5
+ bt .L_ret_op2
+
+ ! op1 is not zero. Check op2 for negative zero
+ cmp/eq r6,r0
+ bf .L_non_zero ! both op1 and op2 are not -0
+
+.L_op2_nzero:
+ tst r7,r7
+ bf .L_non_zero
+
+ mov.l @r15+,r10
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+#else
+ mov r4,r0 ! op2 is -0, return op1
+#endif
+
+ mov.l @r15+,r9
+#ifdef __LITTLE_ENDIAN__
+ mov r5,r0
+#else
+ mov r5,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+! High bit of op1 is known to be zero.
+! Check low bit. r2 contains 0x00000000
+.L_op1_zero:
+ tst r5,r5
+ bt .L_ret_op2
+
+ ! op1 is not zero. Check high bit of op2
+ tst r6,r6
+ bf .L_add ! both op1 and op2 are not zero
+
+! op1 is not zero. High bit of op2 is known to be zero.
+! Check low bit of op2. r2 contains 0x00000000
+.L_op2_zero:
+ tst r7,r7
+ bf .L_add
+
+ mov.l @r15+,r10
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+#else
+ mov r4,r0 ! op2 is zero, return op1
+#endif
+
+ mov.l @r15+,r9
+#ifdef __LITTLE_ENDIAN__
+ mov r5,r0
+#else
+ mov r5,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+! exp (op1) is smaller or equal to exp (op2)
+! The same logic is present in .L_add; refer to it for
+! comments
+.L_op2_exp_greater:
+ mov r3,r0
+ sub r1,r0
+
+ mov #54,r2
+ cmp/gt r2,r0
+
+ bt .L_pack_op2
+
+ cmp/eq #0,r0
+ bt .L_add_mant
+
+ mov #0,r2
+ shll r7
+ rotcl r6
+ add #-1,r0
+ add #-1,r3
+
+ cmp/eq #0,r0
+ bt .L_add_mant
+.L_shfrac_op1:
+ add #-1,r0
+ shar r4
+
+ rotcr r5
+ rotcr r2
+
+ cmp/eq #0,r0
+ bf .L_shfrac_op1
+
+ bra .L_add_mant
+ nop
+
+! Return the value in op1
+.L_ret_op1:
+ mov.l @r15+,r10
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+#else
+ mov r4,r0
+#endif
+
+ mov.l @r15+,r9
+#ifdef __LITTLE_ENDIAN__
+ mov r5,r0
+#else
+ mov r5,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+! r1 has exp, r9 has sign, r4 and r5 mantissa
+.L_pack_op1:
+ mov.l .L_high_mant,r7
+ mov r4,r0
+
+ tst r9,r9
+ bt .L_pack_op1_1
+
+ clrt
+ negc r5,r5
+ negc r0,r0
+
+.L_pack_op1_1:
+ and r7,r0
+ mov r1,r3
+
+ mov #20,r2
+ mov r5,r1
+
+ mov.l @r15+,r10
+ or r9,r0
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r2,r3
+#else
+ SHLL20 (r3)
+#endif
+ mov.l @r15+,r9
+
+ or r3,r0
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+!r2 has exp, r10 has sign, r6 and r7 mantissa
+.L_pack_op2:
+ mov.l .L_high_mant,r9
+ mov r6,r0
+
+ tst r10,r10
+ bt .L_pack_op2_1
+
+ clrt
+ negc r7,r7
+ negc r0,r0
+
+.L_pack_op2_1:
+ and r9,r0
+ mov r7,r1
+
+ mov #20,r2
+ or r10,r0
+
+ mov.l @r15+,r10
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r2,r3
+#else
+ SHLL20 (r3)
+#endif
+
+ mov.l @r15+,r9
+
+ or r3,r0
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+! Normalize the mantissa by setting its 21st bit in the high part
+.L_norm_mant:
+ mov.l .L_21bit,r0
+
+ tst r4,r0
+ bf .L_epil
+
+ tst r4,r4
+ bf .L_shift_till_1
+
+ tst r5,r5
+ bf .L_shift_till_1
+
+ ! Mantissa is zero, return 0
+ mov.l @r15+,r10
+ mov #0,r0
+
+ mov.l @r15+,r9
+ mov.l @r15+,r8
+
+ rts
+ mov #0,r1
+
+! A loop for making the 21st bit 1 in high part of resultant mantissa
+! It is already ensured that 1 bit is present in the mantissa
+.L_shift_till_1:
+ clrt
+ shll r5
+
+ rotcl r4
+ add #-1,r3
+
+ tst r4,r0
+ bt .L_shift_till_1
+
+! Return the result. Mantissa is in r4-r5. Exponent is in r3
+! Sign bit in r10
+.L_epil:
+ cmp/pl r3
+
+ bf .L_denorm
+ mov.l LOCAL(x7fffffff),r0
+
+ mov r5,r1
+ shlr r1
+
+ mov #0,r1
+ addc r0,r2
+
+! Check extra MSB here
+ mov.l .L_22bit,r9
+ addc r1,r5 ! round to even
+
+ addc r1,r4
+ tst r9,r4
+
+ bf .L_epil_1
+
+.L_epil_0:
+ mov.l .L_21bit,r1
+
+ not r1,r1
+ and r1,r4
+
+ mov r4,r0
+ or r10,r0
+
+ mov.l @r15+,r10
+ mov #20,r2
+
+ mov.l @r15+,r9
+ mov r5,r1
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r2,r3
+#else
+ SHLL20 (r3)
+#endif
+ or r3,r0
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+.L_epil_1:
+ shlr r4
+ add #1,r3
+ bra .L_epil_0
+ rotcr r5
+
+.L_denorm:
+ add #-1,r3
+.L_denorm_1:
+ tst r3,r3
+ bt .L_denorm_2
+
+ shlr r4
+ rotcr r5
+
+ movt r1
+ bra .L_denorm_1
+ add #1,r3
+
+.L_denorm_2:
+ clrt
+ mov #0,r2
+ addc r1,r5
+
+ addc r2,r4
+ mov r4,r0
+
+ or r10,r0
+ mov.l @r15+,r10
+
+ mov r5,r1
+
+ mov.l @r15+,r9
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+! op1 is known to be positive infinity, and op2 is Inf. The sign
+! of op2 is not known. Return the appropriate value
+.L_op1_pinf_op2_inf:
+ mov.l .L_sign,r0
+ tst r6,r0
+
+ bt .L_ret_op2_1
+
+ ! op2 is negative infinity. Inf - Inf is being performed
+ mov.l .L_inf,r0
+ mov.l @r15+,r10
+ mov.l @r15+,r9
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r1
+#endif
+ mov.l @r15+,r8
+
+ rts
+#ifdef __LITTLE_ENDIAN__
+ mov #1,r0
+#else
+ mov #1,r1 ! Any value here will return Nan
+#endif
+
+.L_ret_op1_1:
+ mov.l @r15+,r10
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+#else
+ mov r4,r0
+#endif
+
+ mov.l @r15+,r9
+#ifdef __LITTLE_ENDIAN__
+ mov r5,r0
+#else
+ mov r5,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+.L_ret_op2_1:
+ mov.l @r15+,r10
+#ifdef __LITTLE_ENDIAN__
+ mov r6,r1
+#else
+ mov r6,r0
+#endif
+
+ mov.l @r15+,r9
+#ifdef __LITTLE_ENDIAN__
+ mov r7,r0
+#else
+ mov r7,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+! op1 is negative infinity. Check op2 for infinity or Nan
+.L_op1_ninf:
+ cmp/eq r2,r3
+ bf .L_ret_op1_1 ! op2 is neither Nan nor Inf
+
+ mov.l @r15+,r9
+ div0s r4,r6 ! different signs -> NaN
+ mov r4,DBLRH
+ or r6,DBLRH
+ mov.l @r15+,r8
+ SL(bf, 0f,
+ mov r5,DBLRL)
+ mov #-1,DBLRH ! return NaN.
+0: rts
+ or r7,DBLRL
+
+!r1 contains exponent for op1, r3 contains exponent for op2
+!r2 has .L_inf (+ve Inf)
+!op1 has invalid exponent. Either it contains Nan or Inf
+.L_inv_exp_op1:
+ ! Check if a is Nan
+ cmp/pl r5
+ bt .L_ret_op1_1
+
+ mov.l .L_high_mant,r0
+ and r4,r0
+
+ cmp/pl r0
+ bt .L_ret_op1_1
+
+ ! op1 is not Nan. It is infinity. Check the sign of it.
+ ! If op2 is Nan, return op2
+ cmp/pz r4
+
+ bf .L_op1_ninf
+
+ ! op2 is +ve infinity here
+ cmp/eq r2,r3
+ bf .L_ret_op1_1 ! op2 is neither Nan nor Inf
+
+ ! r2 is free now
+ mov.l .L_high_mant,r0
+ tst r6,r0 ! op2 also has invalid exponent
+
+ bf .L_ret_op2_1 ! op2 is Infinity, and op1 is +Infinity
+
+ tst r7,r7
+ bt .L_op1_pinf_op2_inf ! op2 is Infinity, and op1 is +Infinity
+ ! op2 is not infinity; it is NaN
+ bf .L_ret_op2_1
+
+ .align 2
+.L_high_mant:
+ .long 0x000FFFFF
+
+.L_21bits:
+ .long 0x001FFFFF
+
+.L_22bit:
+ .long 0x00200000
+
+.L_23bit:
+ .long 0x00400000
+
+.L_21bit:
+ .long 0x00100000
+
+.L_sign:
+ .long 0x80000000
+
+.L_inf:
+ .long 0x7ff00000
+
+LOCAL(x7fffffff): .long 0x7fffffff
+
+ENDFUNC (GLOBAL (subdf3))
+ENDFUNC (GLOBAL (adddf3))
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/addsf3.S gcc-4.5.0/gcc/config/sh/IEEE-754/addsf3.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/addsf3.S 1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/addsf3.S 2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,535 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Add floating point numbers in r4, r5.
+
+! Author: Rakesh Kumar
+
+! Arguments are in r4, r5 and result in r0
+
+! Entry points: ___subsf3, ___addsf3
+
+! r4 and r5 are referred as op1 and op2
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (subsf3)
+ .global GLOBAL (addsf3)
+ FUNC (GLOBAL (subsf3))
+ FUNC (GLOBAL (addsf3))
+
+GLOBAL (subsf3):
+ mov.l .L_sign_bit,r1
+ xor r1,r5
+
+GLOBAL (addsf3):
+ mov.l r8,@-r15
+ mov r4,r3
+
+ mov.l .L_pinf,r2
+ mov #0,r8
+
+ and r2,r3 ! op1's exponent.
+ mov r5,r6
+
+ ! Check NaN or Infinity
+ and r2,r6 ! op2's exponent.
+ cmp/eq r2,r3
+
+ ! go if op1 is NaN or INF.
+ mov.l .L_sign_bit,r0
+ SL(bt, .L_inv_op1,
+ mov #-23,r1)
+
+ ! Go if op2 is NaN/INF.
+ cmp/eq r2,r6
+ mov r0,r7
+ bt .L_ret_op2
+
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLR23 (r3)
+#else
+ shld r1,r3
+#endif
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLR23 (r6)
+#else
+ shld r1,r6
+#endif
+
+ ! Check for negative zero
+ cmp/eq r0,r5
+
+ mov r5,r1
+ SL(bt, .L_ret_op1,
+ and r7,r1)
+
+ cmp/eq r0,r4
+ bt .L_ret_op2
+
+ ! if op1 is zero return op2
+ tst r4,r4
+ bt .L_ret_op2
+
+ ! Equal numbers with opposite sign
+ mov r4,r2
+ xor r5,r2
+
+ cmp/eq r0,r2
+ bt .L_ret_zero
+
+ ! if op2 is zero return op1
+ mov.l .L_mask_fra,r2
+ tst r5,r5
+
+ ! Extract the mantissa
+ mov r4,r0
+ SL(bt, .L_ret_op1,
+ and r2,r5)
+
+ and r2,r4
+
+ mov.l .L_imp_bit,r2
+ and r7,r0 ! sign bit of op1
+
+ ! Check for denormals
+ tst r3,r3
+ bt .L_norm_op1
+
+ ! Attach the implicit bit
+ or r2,r4
+ tst r6,r6
+
+ bt .L_norm_op2
+
+ or r2,r5
+ tst r0,r0
+
+ ! operands are +ve or -ve??
+ bt .L_ptv_op1
+
+ neg r4,r4
+
+.L_ptv_op1:
+ tst r1,r1
+ bt .L_ptv_op2
+
+ neg r5,r5
+
+! Test exponents for equality
+.L_ptv_op2:
+ cmp/eq r3,r6
+ bt .L_exp_eq
+
+! Make exponents of two arguments equal
+.L_exp_ne:
+ ! r0, r1 contain sign bits.
+ ! r4, r5 contain mantissas.
+ ! r3, r6 contain exponents.
+ ! r2, r7 scratch.
+
+ ! Calculate result exponent.
+ mov r6,r2
+ sub r3,r2 ! e2 - e1
+
+ cmp/pl r2
+ mov #23,r7
+
+ ! e2 - e1 is -ve
+ bf .L_exp_ne_1
+
+ mov r6,r3 ! Result exp.
+ cmp/gt r7,r2 ! e2-e1 > 23
+
+ mov #1,r7
+ bt .L_pack_op2_0
+
+ ! Align the mantissa
+.L_loop_ne:
+ shar r4
+
+ rotcr r8
+ cmp/eq r7,r2
+
+ add #-1,r2
+ bf .L_loop_ne
+
+ bt .L_exp_eq
+
+! Exponent difference is too high.
+! Return op2 after placing pieces in proper place
+.L_pack_op2_0:
+ ! If op1 is -ve
+ tst r1,r1
+ bt .L_pack_op2
+
+ neg r5,r5
+
+! r6 has exponent
+! r5 has mantissa, r1 has sign
+.L_pack_op2:
+ mov.l .L_nimp_bit,r2
+ mov #23,r3
+
+ mov r1,r0
+
+ and r2,r5
+ mov.l @r15+,r8
+
+ or r5,r0
+
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLL23 (r6)
+#else
+ shld r3,r6
+#endif
+ rts
+ or r6,r0
+
+! return op1. It is NAN or INF or op2 is zero.
+.L_ret_op1:
+ mov r4,r0
+
+ rts
+ mov.l @r15+,r8
+
+! return zero
+.L_ret_zero:
+ mov #0,r0
+
+ rts
+ mov.l @r15+,r8
+
+! return op2. It is NaN or INF or op1 is zero.
+.L_ret_op2:
+ mov r5,r0
+
+ rts
+ mov.l @r15+,r8
+
+! op2 is denormal. Normalize it.
+.L_norm_op2:
+ shll r5
+ add #-1,r6
+
+ tst r2,r5
+ bt .L_norm_op2
+
+ ! Check sign
+ tst r1,r1
+ bt .L_norm_op2_2
+
+ neg r5,r5
+
+.L_norm_op2_2:
+ add #1,r6
+ cmp/eq r3,r6
+
+ bf .L_exp_ne
+ bt .L_exp_eq
+
+! Normalize op1
+.L_norm_op1:
+ shll r4
+ add #-1,r3
+
+ tst r2,r4
+ bt .L_norm_op1
+
+ ! Check sign
+ tst r0,r0
+ bt .L_norm_op1_1
+
+ neg r4,r4
+
+.L_norm_op1_1:
+ ! Adjust biasing
+ add #1,r3
+
+ ! Check op2 for denormalized value
+ tst r6,r6
+ bt .L_norm_op2
+
+ mov.l .L_imp_bit,r2
+
+ tst r1,r1 ! Check sign
+ or r2,r5 ! Attach 24th bit
+
+ bt .L_norm_op1_2
+
+ neg r5,r5
+
+.L_norm_op1_2:
+ cmp/eq r3,r6
+
+ bt .L_exp_eq
+ bf .L_exp_ne
+
+! op1 is NaN or Inf
+.L_inv_op1:
+ ! Return op1 if it is NAN.
+ ! r2 is infinity
+ cmp/gt r2,r4
+ bt .L_ret_op1
+
+ ! op1 is +/- INF
+ ! If op2 is same return now.
+ cmp/eq r4,r5
+ bt .L_ret_op1
+
+ ! return op2 if it is NAN
+ cmp/gt r2,r5
+ bt .L_ret_op2
+
+ ! Check if op2 is inf
+ cmp/eq r2,r6
+ bf .L_ret_op1
+
+ ! Both op1 and op2 are infinities
+ !of opp signs, or there is -NAN. Return a NAN.
+ mov.l @r15+,r8
+ rts
+ mov #-1,r0
+
+! Make unequal exponents equal.
+.L_exp_ne_1:
+ mov #-25,r7
+ cmp/gt r2,r7 ! -23 > e2 - e1
+
+ add #1,r2
+ bf .L_exp_ne_2
+
+ tst r0,r0
+ bt .L_pack_op1
+
+.L_pack_op1_0:
+ bra .L_pack_op1
+ neg r4,r4
+
+! Accumulate the shifted bits in r8
+.L_exp_ne_2:
+ ! Shift with rounding
+ shar r5
+ rotcr r8
+
+ tst r2,r2
+
+ add #1,r2
+ bf .L_exp_ne_2
+
+! Exponents of op1 and op2 are equal (or made so)
+! The mantissas are in r4-r5 and remaining bits in r8
+.L_exp_eq:
+ add r5,r4 ! Add fractions.
+ mov.l .L_sign_bit,r2
+
+ ! Check for negative result
+ mov #0,r0
+ tst r2,r4
+
+ mov.l .L_255,r5
+ bt .L_post_add
+
+ negc r8,r8
+ negc r4,r4
+ or r2,r0
+
+.L_post_add:
+ ! Check for extra MSB
+ mov.l .L_chk_25,r2
+
+ tst r2,r4
+ bt .L_imp_check
+
+ shar r4
+ rotcr r8
+
+ add #1,r3
+ cmp/ge r5,r3
+
+ ! Return Inf if exp > 254
+ bt .L_ret_inf
+
+! Check for implicit (24th) bit in result
+.L_imp_check:
+ mov.l .L_imp_bit,r2
+ tst r2,r4
+
+ bf .L_pack_op1
+
+! Result needs left shift
+.L_lft_shft:
+ shll r8
+ rotcl r4
+
+ add #-1,r3
+ tst r2,r4
+
+ bt .L_lft_shft
+
+! Pack the result after rounding
+.L_pack_op1:
+ ! See if denormalized result is possible
+ mov.l .L_chk_25,r5
+ cmp/pl r3
+
+ bf .L_denorm_res
+
+ ! Are there any bits shifted previously?
+ tst r8,r8
+ bt .L_pack_1
+
+ ! Round
+ shll r8
+ movt r6
+
+ add r6,r4
+
+ ! If we are halfway between two numbers,
+ ! round towards LSB = 0
+ tst r8,r8
+
+ bf .L_pack_1
+
+ shlr r4
+ shll r4
+
+.L_pack_1:
+ ! Adjust extra MSB generated after rounding
+ tst r4,r5
+ mov.l .L_255,r2
+
+ bt .L_pack_2
+ shar r4
+
+ add #1,r3
+ cmp/ge r2,r3 ! Check for exp overflow
+
+ bt .L_ret_inf
+
+! Pack it finally
+.L_pack_2:
+ ! Do not store implicit bit
+ mov.l .L_nimp_bit,r2
+ mov #23,r1
+
+ and r2,r4
+
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLL23 (r3)
+#else
+ shld r1,r3
+#endif
+ mov.l @r15+,r8
+
+ or r4,r0
+ rts
+ or r3,r0
+
+! Return infinity
+.L_ret_inf:
+ mov.l .L_pinf,r2
+
+ mov.l @r15+,r8
+ rts
+ or r2,r0
+
+! Result must be denormalized
+.L_denorm_res:
+ mov #0,r2
+
+! Denormalizing loop with rounding
+.L_den_1:
+ shar r4
+ movt r6
+
+ tst r3,r3
+ bt .L_den_2
+
+ ! Increment the exponent
+ add #1,r3
+
+ tst r6,r6
+ bt .L_den_0
+
+ ! Count number of ON bits shifted
+ add #1,r2
+
+.L_den_0:
+ bra .L_den_1
+ nop
+
+! Apply rounding
+.L_den_2:
+ cmp/eq r6,r1
+ bf .L_den_3
+
+ add r6,r4
+ mov #1,r1
+
+ ! If halfway between two numbers,
+ ! round towards LSB = 0
+ cmp/eq r2,r1
+ bf .L_den_3
+
+ shar r4
+ shll r4
+
+.L_den_3:
+
+ mov.l @r15+,r8
+ rts
+ or r4,r0
+
+ .align 2
+.L_imp_bit:
+ .long 0x00800000
+
+.L_nimp_bit:
+ .long 0xFF7FFFFF
+
+.L_mask_fra:
+ .long 0x007FFFFF
+
+.L_pinf:
+ .long 0x7F800000
+
+.L_sign_bit:
+ .long 0x80000000
+
+.L_bit_25:
+ .long 0x01000000
+
+.L_chk_25:
+ .long 0x7F000000
+
+.L_255:
+ .long 0x000000FF
+
+ENDFUNC (GLOBAL (addsf3))
+ENDFUNC (GLOBAL (subsf3))
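For reviewers: the round-to-nearest-even step in the `.L_pack_1` path above (round on the top shifted-out bit, then clear the LSB on an exact tie) may be easier to follow as C. This is an illustrative sketch, not code from the patch; the names are mine:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the rounding in .L_pack_1: 'mant' is the result mantissa,
   'guard' holds the bits shifted out during alignment, MSB first
   (the role r8 plays above).  Round to nearest, ties to even.  */
static uint32_t
round_nearest_even (uint32_t mant, uint32_t guard)
{
  if (guard == 0)
    return mant;                /* exact result: nothing to round */
  mant += guard >> 31;          /* round on the top discarded bit */
  if ((guard << 1) == 0)        /* exactly halfway: force LSB to 0 */
    mant &= ~UINT32_C (1);
  return mant;
}
```

On a tie the assembly achieves the same LSB clearing with the `shlr r4` / `shll r4` pair.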
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/divdf3.S gcc-4.5.0/gcc/config/sh/IEEE-754/divdf3.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/divdf3.S 1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/divdf3.S 2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,598 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!division of two double precision floating point numbers
+!Author: Aanchal Khanna
+!
+!Entry:
+!r4,r5:dividend
+!
+!r6,r7:divisor
+!
+!Exit:
+!r0,r1:quotient
+
+!Notes: the dividend is passed in regs r4 and r5 and the divisor in regs
+!r6 and r7; the quotient is returned in regs r0 and r1. The dividend is
+!referred to as op1 and the divisor as op2.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
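Before the mantissa division below, the routine forms the result exponent as e1 - e2 + 1023 and checks it against 2047 (the `.L_divide` / `.L_overflow` paths). A small C illustration of that bias arithmetic; the function name is my own, not part of the patch:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Biased exponent of the quotient of two normal doubles, as computed
   around .L_divide: e1 - e2 + 1023.  Values >= 2047 mean overflow
   (return infinity); values <= 0 lead to the denormal path.  */
static int
quotient_biased_exp (double a, double b)
{
  uint64_t ua, ub;
  memcpy (&ua, &a, sizeof ua);
  memcpy (&ub, &b, sizeof ub);
  int e1 = (int) ((ua >> 52) & 0x7ff);
  int e2 = (int) ((ub >> 52) & 0x7ff);
  return e1 - e2 + 1023;
}
```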
+
+ .text
+ .align 5
+ .global GLOBAL (divdf3)
+ FUNC (GLOBAL (divdf3))
+
+GLOBAL (divdf3):
+
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+ mov r5,r4
+ mov r1,r5
+
+ mov r6,r1
+ mov r7,r6
+ mov r1,r7
+#endif
+ mov r4,r2
+ mov.l .L_inf,r1
+
+ and r1,r2
+ mov.l r8,@-r15
+
+ cmp/eq r1,r2
+ mov r6,r8
+
+ bt .L_a_inv
+ and r1,r8
+
+ cmp/eq r1,r8
+ mov.l .L_high_mant,r3
+
+ bf .L_chk_zero
+ and r6,r3
+
+ mov.l .L_mask_sign,r8
+ cmp/pl r7
+
+ mov r8,r0
+ bt .L_ret_b !op2=NaN,return op2
+
+ and r4,r8
+ cmp/pl r3
+
+ and r6,r0
+ bt .L_ret_b !op2=NaN,return op2
+
+	xor	r8,r0	!op1 is a normal number, op2 is Inf: return zero
+ mov #0,r1
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+.L_ret_b:
+ mov r7,r1
+ mov r6,r0
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+.L_a_inv:
+ !chk if op1 is Inf or NaN
+ mov.l .L_high_mant,r2
+ cmp/pl r5
+
+ and r4,r2
+ bt .L_ret_a
+
+ and r1,r8 !r1 contains infinity
+ cmp/pl r2
+
+ bt .L_ret_a
+ cmp/eq r1,r8
+
+ mov r1,DBLRH
+ add DBLRH,DBLRH
+ bf 0f
+ mov #-1,DBLRH ! Inf/Inf, return NaN.
+0: div0s r4,r6
+ mov.l @r15+,r8
+ rts
+ rotcr DBLRH
+
+.L_ret_a:
+ !return op1
+ mov r5,r1
+ mov r4,r0
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+.L_chk_zero:
+ !chk if op1=0
+ mov.l .L_mask_sign,r0
+ mov r4,r3
+
+ and r0,r3
+ shll r4
+
+ and r6,r0
+ shlr r4
+
+ xor r3,r0
+ shll r6
+
+ shlr r6
+ tst r4,r4
+
+
+ bf .L_op1_not_zero
+ tst r5,r5
+
+ bf .L_op1_not_zero
+ tst r7,r7
+
+ mov.l @r15+,r8
+ bf .L_ret_zero
+
+ tst r6,r6
+ bf .L_ret_zero
+
+ rts
+ mov #-1,DBLRH !op1=op2=0, return NaN
+
+.L_ret_zero:
+ !return zero
+ mov r0,r1
+ rts
+#ifdef __LITTLE_ENDIAN__
+ mov #0,r0
+#else
+ mov #0,r1 !op1=0,op2=normal no,return zero
+#endif
+
+.L_norm_b:
+ !normalize op2
+ shll r7
+ mov.l .L_imp_bit,r3
+
+ rotcl r6
+ tst r3,r6
+
+ add #-1,r8
+ bt .L_norm_b
+
+ bra .L_divide
+ add #1,r8
+
+.L_op1_not_zero:
+ !op1!=0, chk if op2=0
+ tst r7,r7
+ mov r1,r3
+
+ mov #0,r1
+ bf .L_normal_nos
+
+ tst r6,r6
+ bf .L_normal_nos
+
+ mov.l @r15+,r8
+ or r3,r0
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+
+ rts
+ nop
+
+.L_normal_nos:
+	!op1 and op2 are normal numbers
+ tst r2,r2
+ mov #-20,r1
+
+! The branch below tests the compare done above.
+! The shift does not change its outcome: the macro
+! uses only shifts that leave the T bit untouched.
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r1,r2
+#else
+ SHLR20 (r2)
+#endif
+ bt .L_norm_a !normalize dividend
+
+.L_chk_b:
+ mov.l r9,@-r15
+ tst r8,r8
+
+ mov.l .L_high_mant,r9
+
+! The branch below tests the compare done above.
+! The shift does not change its outcome: the macro
+! uses only shifts that leave the T bit untouched.
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r1,r8
+#else
+ SHLR20 (r8)
+#endif
+ ! T set -> normalize divisor
+ SL(bt, .L_norm_b,
+ and r9,r4)
+
+.L_divide:
+ mov.l .L_2047,r1
+ sub r8,r2
+
+ mov.l .L_1023,r8
+ and r9,r6
+
+ !resultant exponent
+ add r8,r2
+ !chk the exponent for overflow
+ cmp/ge r1,r2
+
+ mov.l .L_imp_bit,r1
+ bt .L_overflow
+
+ mov #0,r8
+ or r1,r4
+
+ or r1,r6
+ mov #-24,r3
+
+	!chk if the divisor's mantissa is exactly 1 (implicit bit only)
+ cmp/eq r8,r7
+ bf .L_div2
+
+ cmp/eq r6,r1
+ bt .L_den_one
+
+.L_div2:
+ !divide the mantissas
+ shll8 r4
+ mov r5,r9
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r3,r9
+#else
+ SHLR24 (r9)
+#endif
+ shll8 r6
+
+ or r9,r4
+ shll8 r5
+
+ mov r7,r9
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r3,r9
+#else
+ SHLR24 (r9)
+#endif
+ mov r8,r3
+ shll8 r7
+
+ or r9,r6
+ cmp/gt r4,r6
+
+ mov r3,r9
+ bt .L_shift
+
+ cmp/eq r4,r6
+ bf .L_loop
+
+ cmp/gt r5,r7
+ bf .L_loop
+
+.L_shift:
+ add #-1,r2
+ shll r5
+ rotcl r4
+
+.L_loop:
+ !actual division loop
+ cmp/gt r6,r4
+ bt .L_subtract
+
+ cmp/eq r6,r4
+ bf .L_skip
+
+ cmp/ge r7,r5
+ bf .L_skip
+
+.L_subtract:
+ clrt
+ subc r7,r5
+
+ or r1,r8
+ subc r6,r4
+
+.L_skip:
+ shlr r1
+ shll r5
+
+ rotcl r4
+ cmp/eq r1,r3
+
+ bf .L_loop
+ mov.l .L_imp_bit,r1
+
+	!chk if the division was for the higher word of the quotient
+ tst r1,r9
+ bf .L_chk_exp
+
+ mov r8,r9
+ mov.l .L_mask_sign,r1
+
+ !divide for the lower word of the quotient
+ bra .L_loop
+ mov r3,r8
+
+.L_chk_exp:
+ !chk if the result needs to be denormalized
+ cmp/gt r2,r3
+ bf .L_round
+ mov #-53,r7
+
+.L_underflow:
+ !denormalize the result
+ add #1,r2
+ cmp/gt r2,r7
+
+ or r4,r5 !remainder
+ add #-2,r2
+
+ mov #32,r4
+ bt .L_return_zero
+
+ add r2,r4
+ cmp/ge r3,r4
+
+ mov r2,r7
+ mov r3,r1
+
+ mov #-54,r2
+ bt .L_denorm
+ mov #-32,r7
+
+.L_denorm:
+ shlr r8
+ rotcr r1
+
+ shll r8
+ add #1,r7
+
+ shlr r9
+ rotcr r8
+
+ cmp/eq r3,r7
+ bf .L_denorm
+
+ mov r4,r7
+ cmp/eq r2,r4
+
+ bt .L_break
+ mov r3,r6
+
+ cmp/gt r7,r3
+ bf .L_break
+
+ mov r2,r4
+ mov r1,r6
+
+ mov r3,r1
+ bt .L_denorm
+
+.L_break:
+ mov #0,r2
+
+ cmp/gt r1,r2
+
+ addc r2,r8
+ mov.l .L_comp_1,r4
+
+ addc r3,r9
+ or r9,r0
+
+ cmp/eq r5,r3
+ bf .L_return
+
+ cmp/eq r3,r6
+ mov.l .L_mask_sign,r7
+
+ bf .L_return
+ cmp/eq r7,r1
+
+ bf .L_return
+ and r4,r8
+
+.L_return:
+ mov.l @r15+,r9
+ mov r8,r1
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+.L_norm_a:
+ !normalize op1
+ shll r5
+ mov.l .L_imp_bit,r3
+
+ rotcl r4
+ tst r3,r4
+
+ add #-1,r2
+ bt .L_norm_a
+
+ bra .L_chk_b
+ add #1,r2
+
+.L_overflow:
+ !overflow, return inf
+ mov.l .L_inf,r2
+#ifdef __LITTLE_ENDIAN__
+ or r2,r1
+ mov #0,r0
+#else
+ or r2,r0
+ mov #0,r1
+#endif
+ mov.l @r15+,r9
+ rts
+ mov.l @r15+,r8
+
+.L_den_one:
+ !denominator=1, result=numerator
+ mov r4,r9
+ mov #-53,r7
+
+ cmp/ge r2,r8
+ mov r8,r4
+
+ mov r5,r8
+ mov r4,r3
+
+ !chk the exponent for underflow
+ SL(bt, .L_underflow,
+ mov r4,r5)
+
+ mov.l .L_high_mant,r7
+ bra .L_pack
+ mov #20,r6
+
+.L_return_zero:
+ !return zero
+ mov r3,r1
+ mov.l @r15+,r9
+
+ rts
+ mov.l @r15+,r8
+
+.L_round:
+ !apply rounding
+ cmp/eq r4,r6
+ bt .L_lower
+
+ clrt
+ subc r6,r4
+
+ bra .L_rounding
+ mov r4,r6
+
+.L_lower:
+ clrt
+ subc r7,r5
+ mov r5,r6
+
+.L_rounding:
+ !apply rounding
+ mov.l .L_invert,r1
+ mov r3,r4
+
+ movt r3
+ clrt
+
+ not r3,r3
+ and r1,r3
+
+ addc r3,r8
+ mov.l .L_high_mant,r7
+
+ addc r4,r9
+ cmp/eq r4,r6
+
+ mov.l .L_comp_1,r3
+ SL (bf, .L_pack,
+ mov #20,r6)
+ and r3,r8
+
+.L_pack:
+ !pack the result, r2=exponent,r0=sign,r8=lower mantissa, r9=higher mantissa
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r6,r2
+#else
+ SHLL20 (r2)
+#endif
+ and r7,r9
+
+ or r2,r0
+ mov r8,r1
+
+ or r9,r0
+ mov.l @r15+,r9
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+ .align 2
+
+.L_mask_sign:
+ .long 0x80000000
+.L_high_mant:
+ .long 0x000fffff
+.L_inf:
+ .long 0x7ff00000
+.L_1023:
+ .long 1023
+.L_2047:
+ .long 2047
+.L_imp_bit:
+ .long 0x00100000
+.L_comp_1:
+ .long 0xfffffffe
+.L_invert:
+ .long 0x00000001
+
+ENDFUNC (GLOBAL (divdf3))
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/divsf3.S gcc-4.5.0/gcc/config/sh/IEEE-754/divsf3.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/divsf3.S 1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/divsf3.S 2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,404 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!divides two single precision floating point numbers
+
+! Author: Aanchal Khanna
+
+! Arguments: Dividend is in r4, divisor in r5
+! Result: r0
+
+! r4 and r5 are referred to as op1 and op2, respectively.
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
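The `.L_loop` below is a restoring division of the two 24-bit mantissas (implicit bit set), producing one quotient bit per iteration. A C sketch of the same loop, assuming the operands have already been pre-shifted so that n >= d, as `.L_divide` arranges; names are illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* Restoring division of 24-bit mantissas (implicit bit 0x800000 set).
   Mirrors .L_loop: subtract when possible, record a quotient bit,
   shift the remainder left.  Assumes n >= d on entry.  */
static uint32_t
div_mantissa (uint32_t n, uint32_t d)
{
  uint32_t q = 0;
  for (uint32_t bit = UINT32_C (1) << 23; bit != 0; bit >>= 1)
    {
      if (n >= d)
        {
          q |= bit;             /* this quotient bit is 1 */
          n -= d;
        }
      n <<= 1;                  /* bring in the next bit */
    }
  return q;                     /* quotient mantissa, implicit bit set */
}
```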
+
+ .text
+ .align 5
+ .global GLOBAL (divsf3)
+ FUNC (GLOBAL (divsf3))
+
+GLOBAL (divsf3):
+ mov.l .L_mask_sign,r1
+ mov r4,r3
+
+ xor r5,r3
+ shll r4
+
+ shlr r4
+ mov.l .L_inf,r2
+
+ and r3,r1 !r1=resultant sign
+ mov r4,r6
+
+ shll r5
+ mov #0,r0
+
+ shlr r5
+ and r2,r6
+
+ cmp/eq r2,r6
+ mov r5,r7
+
+ and r2,r7
+ bt .L_op1_inv
+
+ cmp/eq r2,r7
+ mov #-23,r3
+
+ bt .L_op2_inv
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLR23 (r6)
+ SHLR23 (r7)
+#else
+ shld r3,r6
+ shld r3,r7
+#endif
+
+ cmp/eq r0,r4
+
+ bt .L_op1_zero !dividend=0
+ cmp/eq r0,r6
+
+ mov.l .L_imp_bit,r3
+ bt .L_norm_op1 !normalize dividend
+.L_chk_op2:
+ cmp/eq r0,r5
+ bt .L_op2_zero !divisor=0
+
+ cmp/eq r0,r7
+ bt .L_norm_op2 !normalize divisor
+
+.L_div1:
+ sub r7,r6
+ add #127,r6 !r6=resultant exponent
+
+ mov r3,r7
+ mov.l .L_mask_mant,r3
+
+ and r3,r4
+ !chk exponent for overflow
+ mov.l .L_255,r2
+
+ and r3,r5
+ or r7,r4
+
+ cmp/ge r2,r6
+ or r7,r5
+
+ bt .L_return_inf
+ mov r0,r2
+
+ cmp/eq r4,r5
+ bf .L_den_one
+
+ cmp/ge r6,r0
+ !numerator=denominator, quotient=1, remainder=0
+ mov r7,r2
+
+ mov r0,r4
+ !chk exponent for underflow
+ bt .L_underflow
+ bra .L_pack
+ nop
+
+.L_den_one:
+ !denominator=1, result=numerator
+
+ cmp/eq r7,r5
+ bf .L_divide
+
+ !chk exponent for underflow
+ cmp/ge r6,r0
+ mov r4,r2
+
+ SL(bt, .L_underflow,
+ mov r0,r4)
+ bra .L_pack
+ nop
+
+.L_divide:
+ !dividing the mantissas r4<-dividend, r5<-divisor
+
+ cmp/hi r4,r5
+ bf .L_loop
+
+ shll r4 ! if mantissa(op1)< mantissa(op2)
+ add #-1,r6 ! shift left the numerator and decrease the exponent.
+
+.L_loop:
+ !division loop
+
+ cmp/ge r5,r4
+ bf .L_skip
+
+ or r7,r2
+ sub r5,r4
+
+.L_skip:
+ shlr r7
+ shll r4
+
+ cmp/eq r0,r7
+ bf .L_loop
+
+ !chk the exponent for underflow
+ cmp/ge r6,r0
+ bt .L_underflow
+
+ !apply rounding
+ cmp/gt r5,r4
+ bt .L_round1
+
+ cmp/eq r4,r5
+ bt .L_round2
+
+.L_pack:
+ !pack the result, r1=sign, r2=quotient, r6=exponent
+
+ mov #23,r4
+ and r3,r2
+
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLL23 (r6)
+#else
+ shld r4,r6
+#endif
+ or r2,r1
+
+ or r6,r1
+ mov r1,r0
+
+ rts
+ nop
+
+.L_round1:
+ !Apply proper rounding
+
+ bra .L_pack
+ add #1,r2
+
+.L_round2:
+ !Apply proper rounding
+
+ mov.l .L_comp_1,r5
+ bra .L_pack
+ and r5,r2
+
+.L_op1_inv:
+ !chk if op1 is Inf or NaN
+
+ mov.l .L_mask_mant,r3
+ mov r4,r6
+
+ and r3,r6
+ cmp/hi r0,r6
+
+ bt .L_ret_op1
+ cmp/eq r2,r7
+
+ SL(bf, .L_ret_op1,
+ mov r1,r0)
+
+ rts
+ mov #-1,r0 ! 0/0, return NaN
+
+.L_op2_inv:
+ !chk if op2 is Inf or NaN
+
+ mov.l .L_mask_mant,r3
+ mov r5,r7
+
+ and r3,r7
+ cmp/hi r0,r7
+
+ bt .L_ret_op2
+ mov r1,r0
+
+ rts
+ nop
+
+.L_op1_zero:
+ !op1 is zero. If op2 is zero, return NaN, else return zero
+
+ cmp/eq r0,r5
+
+ bf .L_ret_op1
+
+ rts
+ mov #-1,r0
+
+.L_op2_zero:
+	!op2 is zero, return Inf
+
+ rts
+ or r2,r0
+
+.L_return_inf:
+ mov.l .L_inf,r0
+
+ rts
+ or r1,r0
+
+.L_norm_op1:
+ !normalize dividend
+
+ shll r4
+ tst r2,r4
+
+ add #-1,r6
+ bt .L_norm_op1
+
+ bra .L_chk_op2
+ add #1,r6
+
+.L_norm_op2:
+ !normalize divisor
+
+ shll r5
+ tst r2,r5
+
+ add #-1,r7
+ bt .L_norm_op2
+
+ bra .L_div1
+ add #1,r7
+
+.L_underflow:
+ !denormalize the result
+
+ add #1,r6
+ mov #-24,r7
+
+ cmp/gt r6,r7
+ mov r2,r5
+
+ bt .L_return_zero
+ add #-1,r6
+
+ mov #32,r3
+ neg r6,r7
+
+ add #1,r7
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r6,r2
+#else
+ cmp/ge r0,r6
+ bf .L_mov_right
+
+.L_mov_left:
+ cmp/eq r0,r6
+ bt .L_out
+
+ shll r2
+ bra .L_mov_left
+ add #-1,r6
+
+.L_mov_right:
+ cmp/eq r0,r6
+ bt .L_out
+
+ add #1,r6
+ bra .L_mov_right
+ shlr r2
+
+.L_out:
+#endif
+ sub r7,r3
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r3,r5
+#else
+ cmp/ge r0,r3
+ bf .L_mov_right_1
+
+.L_mov_left_1:
+ shll r5
+ add #-1,r3
+
+ cmp/eq r0,r3
+ bf .L_mov_left_1
+
+ bt .L_out_1
+
+.L_mov_right_1:
+ cmp/eq r0,r3
+ bt .L_out_1
+
+ add #1,r3
+ bra .L_mov_right_1
+ shlr r5
+
+.L_out_1:
+#endif
+ shlr r2
+ addc r0,r2
+
+ cmp/eq r4,r0 !r4 contains the remainder
+ mov r2,r0
+
+ mov.l .L_mask_sign,r7
+ bf .L_return
+
+ mov.l .L_comp_1,r2
+ cmp/eq r7,r5
+
+ bf .L_return
+ and r2,r0
+
+.L_return:
+ rts
+ or r1,r0
+
+.L_ret_op1:
+ rts
+ or r4,r0
+
+.L_ret_op2:
+ rts
+ or r5,r0
+
+.L_return_zero:
+ rts
+ or r1,r0
+
+
+
+ .align 2
+.L_inf:
+ .long 0x7f800000
+.L_mask_sign:
+ .long 0x80000000
+.L_mask_mant:
+ .long 0x007fffff
+.L_imp_bit:
+ .long 0x00800000
+.L_comp_1:
+ .long 0xfffffffe
+.L_255:
+ .long 255
+
+ENDFUNC (GLOBAL (divsf3))
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/fixdfsi.S gcc-4.5.0/gcc/config/sh/IEEE-754/fixdfsi.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/fixdfsi.S 1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/fixdfsi.S 2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,200 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!conversion of double precision floating point number to signed integer
+!Author: Aanchal Khanna
+!
+!Entry:
+!r4,r5:operand
+!
+!Exit:
+!r0:result
+!
+!Note: the argument is passed in regs r4 and r5; the result is returned
+!in reg r0.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
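The conversion below truncates toward zero: extract the biased exponent, return 0 for |x| < 1, clamp when the exponent exceeds 1053 (1023 + 30), otherwise shift the 53-bit mantissa into place and apply the sign. A hedged C sketch of that structure (the routine's own NaN handling differs and is omitted here; names are mine):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Truncating double -> int32, following the structure of fixdfsi:
   exp < 1023 -> 0; exp > 1053 -> max/min int; else shift the mantissa.  */
static int32_t
double_to_int (double x)
{
  uint64_t u;
  memcpy (&u, &x, sizeof u);
  int sign = (int) (u >> 63);
  int exp = (int) ((u >> 52) & 0x7ff);
  if (exp < 1023)
    return 0;                           /* |x| < 1 */
  if (exp > 1053)                       /* too big: clamp (NaN differs) */
    return sign ? INT32_MIN : INT32_MAX;
  uint64_t mant = (u & UINT64_C (0xfffffffffffff)) | (UINT64_C (1) << 52);
  int32_t v = (int32_t) (mant >> (52 - (exp - 1023)));
  return sign ? -v : v;
}
```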
+
+ .text
+ .align 5
+ .global GLOBAL (fixdfsi)
+ FUNC (GLOBAL (fixdfsi))
+
+GLOBAL (fixdfsi):
+
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+ mov r5,r4
+ mov r1,r5
+
+#endif
+ mov.l .L_p_inf,r2
+ mov #-20,r1
+
+ mov r2,r7
+ mov.l .L_1023,r3
+
+ and r4,r2
+ shll r4
+
+ movt r6 ! r6 contains the sign bit
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r1,r2 ! r2 contains the exponent
+#else
+ SHLR20 (r2)
+#endif
+ shlr r4
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r1,r7
+#else
+ SHLR20 (r7)
+#endif
+ cmp/hi r2,r3 ! if exp < 1023,return 0
+ mov.l .L_mask_high_mant,r1
+
+ SL(bt, .L_epil,
+ mov #0,r0)
+ and r4,r1 ! r1 contains high mantissa
+
+ cmp/eq r2,r7 ! chk if exp is invalid
+ mov.l .L_1053,r7
+
+ bt .L_inv_exp
+ mov #11,r0
+
+ cmp/hi r7,r2 ! If exp > 1053,return maxint
+ sub r2,r7
+
+ mov.l .L_21bit,r2
+ SL(bt, .L_ret_max,
+ add #1,r7) ! r7 contains the number of shifts
+
+ or r2,r1
+ mov r7,r3
+ shll8 r1
+
+ neg r7,r7
+ shll2 r1
+
+ shll r1
+ cmp/hi r3,r0
+
+ !chk if the result can be made only from higher mantissa
+ SL(bt, .L_lower_mantissa,
+ mov #21,r0)
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r7,r1
+#else
+.L_loop:
+ tst r7,r7
+ bt .L_break1
+ add #1,r7
+ bra .L_loop
+ shlr r1
+
+.L_break1:
+#endif
+ tst r6,r6
+ SL(bt, .L_epil,
+ mov r1,r0)
+
+ rts
+ neg r0,r0
+
+.L_lower_mantissa:
+ !result is made from lower mantissa also
+ neg r0,r0
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r0,r5
+#else
+ SHLR21 (r5)
+#endif
+
+ or r5,r1 !pack lower and higher mantissas
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r7,r1
+#else
+.L_sh_loop:
+ tst r7,r7
+ bt .L_break
+ add #1,r7
+ bra .L_sh_loop
+ shlr r1
+
+.L_break:
+#endif
+ mov r1,r0
+ bra .L_chk_sign
+ nop
+
+.L_epil:
+ rts
+ nop
+
+.L_inv_exp:
+ cmp/hi r0,r5
+ bt .L_epil
+
+ cmp/hi r0,r1 !compare high mantissa,r1
+ bt .L_epil
+
+.L_ret_max:
+ mov.l .L_maxint,r0
+ tst r6,r6
+ bt .L_epil
+
+ rts
+ add #1,r0
+
+.L_chk_sign:
+	tst	r6,r6	! if the sign bit is set, the number is negative
+ bt .L_epil
+
+ rts
+ neg r0,r0
+
+ .align 2
+
+.L_maxint:
+ .long 0x7fffffff
+.L_p_inf:
+ .long 0x7ff00000
+.L_mask_high_mant:
+ .long 0x000fffff
+.L_1023:
+ .long 0x000003ff
+.L_1053:
+ .long 1053
+.L_21bit:
+ .long 0x00100000
+
+ENDFUNC (GLOBAL (fixdfsi))
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/fixsfsi.S gcc-4.5.0/gcc/config/sh/IEEE-754/fixsfsi.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/fixsfsi.S 1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/fixsfsi.S 2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,165 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Conversion routine for float to integer
+
+! Author: Rakesh Kumar
+
+! Arguments: r4 (in floating point format)
+! Return: r0
+
+! r4 is referred to as op1
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (fixsfsi)
+ FUNC (GLOBAL (fixsfsi))
+
+GLOBAL (fixsfsi):
+ mov.l .L_mask_sign,r7
+ mov r4,r2
+
+ ! Check for NaN
+ mov.l .L_inf,r1
+ and r7,r2
+
+ cmp/gt r1,r2
+ mov #127,r5
+
+ mov r4,r3
+ SL(bt, .L_epil,
+ mov #0,r0)
+
+ shll r2
+ mov.l .L_frac,r6
+
+ shlr16 r2
+ and r6,r3 ! r3 has fraction
+
+ shlr8 r2 ! r2 has exponent
+ mov.l .L_24bit,r1
+
+ ! If exponent is less than 127, return 0
+ cmp/gt r2,r5
+ or r1,r3 ! Set the implicit bit
+
+ mov.l .L_157,r1
+ SL1(bt, .L_epil,
+ shll8 r3)
+
+	! If exponent is greater than 157,
+	! return the maximum/minimum integer
+	! value depending on the sign
+ cmp/gt r1,r2
+ sub r2,r1
+
+ mov.l .L_sign,r2
+ SL(bt, .L_ret_max,
+ add #1,r1)
+
+ and r4,r2 ! Sign in r2
+ neg r1,r1
+
+ ! Shift mantissa by exponent difference from 157
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r1,r3
+#else
+ cmp/gt r0,r1
+ bt .L_mov_left
+
+.L_mov_right:
+ cmp/eq r1,r0
+ bt .L_ret
+
+ add #1,r1
+ bra .L_mov_right
+
+ shlr r3
+
+.L_mov_left:
+ add #-1,r1
+
+ shll r3
+ cmp/eq r1,r0
+
+ bf .L_mov_left
+.L_ret:
+#endif
+ ! If op1 is negative, negate the result
+ cmp/eq r0,r2
+ SL(bf, .L_negate,
+ mov r3,r0)
+
+! r0 has the appropriate value
+.L_epil:
+ rts
+ nop
+
+! Return the max/min integer value
+.L_ret_max:
+ and r4,r2 ! Sign in r2
+ mov.l .L_max,r3
+
+ mov.l .L_sign,r1
+ cmp/eq r0,r2
+
+ mov r3,r0
+ bt .L_epil
+
+ ! Negative number, return min int
+ rts
+ mov r1,r0
+
+! Negate the result
+.L_negate:
+ rts
+ neg r0,r0
+
+ .align 2
+.L_inf:
+ .long 0x7F800000
+
+.L_157:
+ .long 157
+
+.L_max:
+ .long 0x7FFFFFFF
+
+.L_frac:
+ .long 0x007FFFFF
+
+.L_sign:
+ .long 0x80000000
+
+.L_24bit:
+ .long 0x00800000
+
+.L_mask_sign:
+ .long 0x7FFFFFFF
+
+ENDFUNC (GLOBAL (fixsfsi))
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/fixunsdfsi.S gcc-4.5.0/gcc/config/sh/IEEE-754/fixunsdfsi.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/fixunsdfsi.S 1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/fixunsdfsi.S 2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,181 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!conversion of double precision floating point number to unsigned integer
+!Author: Aanchal Khanna
+!
+!Entry:
+!r4,r5:operand
+!
+!Exit:
+!r0:result
+!
+!Note: the argument is passed in regs r4 and r5; the result is returned
+!in reg r0.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
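Structurally this is the signed conversion above with two changes: negative inputs return 0, and the exponent limit is 1054 rather than 1053, since an unsigned result has one more usable bit. A C sketch under the same caveats (illustrative names, NaN path simplified):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Truncating double -> uint32: negatives clamp to 0, exp > 1054
   clamps to 0xFFFFFFFF, otherwise shift the mantissa into place.  */
static uint32_t
double_to_uint (double x)
{
  uint64_t u;
  memcpy (&u, &x, sizeof u);
  if (u >> 63)
    return 0;                           /* negative: return 0 */
  int exp = (int) ((u >> 52) & 0x7ff);
  if (exp < 1023)
    return 0;                           /* |x| < 1 */
  if (exp > 1054)
    return UINT32_C (0xffffffff);       /* >= 2^32 (NaN differs) */
  uint64_t mant = (u & UINT64_C (0xfffffffffffff)) | (UINT64_C (1) << 52);
  return (uint32_t) (mant >> (1075 - exp));   /* 1075 = 1023 + 52 */
}
```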
+
+ .text
+ .align 5
+ .global GLOBAL (fixunsdfsi)
+ FUNC (GLOBAL (fixunsdfsi))
+
+GLOBAL (fixunsdfsi):
+
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+ mov r5,r4
+ mov r1,r5
+#endif
+ mov.l .L_p_inf,r2
+ mov #-20,r1
+
+ mov r2,r7
+ mov.l .L_1023,r3
+
+ and r4,r2
+ shll r4
+
+ movt r6 ! r6 contains the sign bit
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r1,r2 ! r2 contains the exponent
+#else
+ SHLR20 (r2)
+#endif
+ shlr r4
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r1,r7
+#else
+ SHLR20 (r7)
+#endif
+ tst r6,r6
+ SL(bf, .L_epil,
+ mov #0,r0)
+
+ cmp/hi r2,r3 ! if exp < 1023,return 0
+ mov.l .L_high_mant,r1
+
+ SL(bt, .L_epil,
+ and r4,r1) ! r1 contains high mantissa
+
+ cmp/eq r2,r7 ! chk if exp is invalid
+ mov.l .L_1054,r7
+
+ bt .L_inv_exp
+ mov #11,r0
+
+ cmp/hi r7,r2 ! If exp > 1054,return maxint
+ sub r2,r7 !r7 contains the number of shifts
+
+ mov.l .L_21bit,r2
+ bt .L_ret_max
+
+ or r2,r1
+ mov r7,r3
+
+ shll8 r1
+ neg r7,r7
+
+ shll2 r1
+
+ shll r1
+ cmp/hi r3,r0
+
+ SL(bt, .L_lower_mant,
+ mov #21,r0)
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r7,r1
+#else
+.L_sh_loop:
+ tst r7,r7
+ bt .L_break
+ add #1,r7
+ bra .L_sh_loop
+ shlr r1
+
+.L_break:
+#endif
+ rts
+ mov r1,r0
+
+.L_lower_mant:
+ neg r0,r0
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r0,r5
+#else
+ SHLR21 (r5)
+#endif
+ or r5,r1 !pack lower and higher mantissas
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r7,r1
+#else
+.L_loop:
+ tst r7,r7
+ bt .L_break1
+ add #1,r7
+ bra .L_loop
+ shlr r1
+
+.L_break1:
+#endif
+ mov r1,r0
+.L_epil:
+ rts
+ nop
+
+.L_inv_exp:
+ cmp/hi r0,r5
+ bt .L_epil
+
+ cmp/hi r0,r1 !compare high mantissa,r1
+ bt .L_epil
+
+.L_ret_max:
+ mov.l .L_maxint,r0
+
+ rts
+ nop
+
+ .align 2
+
+.L_maxint:
+ .long 0xffffffff
+.L_p_inf:
+ .long 0x7ff00000
+.L_high_mant:
+ .long 0x000fffff
+.L_1023:
+ .long 0x000003ff
+.L_1054:
+ .long 1054
+.L_21bit:
+ .long 0x00100000
+
+ENDFUNC (GLOBAL (fixunsdfsi))
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/fixunssfsi.S gcc-4.5.0/gcc/config/sh/IEEE-754/fixunssfsi.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/fixunssfsi.S 1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/fixunssfsi.S 2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,155 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Conversion from floating point to unsigned integer
+
+! Author: Rakesh Kumar
+
+! Argument: r4 (in floating point format)
+! Result: r0
+
+! For negative floating point numbers, it returns zero
+
+! The argument is referred to as op1
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (fixunssfsi)
+ FUNC (GLOBAL (fixunssfsi))
+
+GLOBAL (fixunssfsi):
+ mov.l .L_sign,r0
+ mov r4,r2
+
+ ! Check for NaN
+ mov.l .L_inf,r1
+ and r4,r0
+
+ mov.l .L_mask_sign,r7
+ mov #127,r5
+
+ ! Remove sign bit
+ cmp/eq #0,r0
+ and r7,r2
+
+ ! If number is negative, return 0
+ ! libgcc deviates from the standard in this regard.
+ mov r4,r3
+ SL(bf, .L_epil,
+ mov #0,r0)
+
+ mov.l .L_frac,r6
+ cmp/gt r1,r2
+
+ shll r2
+ SL1(bt, .L_epil,
+ shlr16 r2)
+
+ shlr8 r2 ! r2 has exponent
+ mov.l .L_24bit,r1
+
+ and r6,r3 ! r3 has fraction
+ cmp/gt r2,r5
+
+ ! If exponent is less than 127, return 0
+ or r1,r3
+ bt .L_epil
+
+ ! Process only if exponent is at most 158
+ mov.l .L_158,r1
+ shll8 r3
+
+ cmp/gt r1,r2
+ sub r2,r1
+
+ neg r1,r1
+ bt .L_ret_max
+
+! Shift the mantissa with exponent difference from 158
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r1,r3
+#else
+ cmp/gt r0,r1
+ bt .L_mov_left
+
+.L_mov_right:
+ cmp/eq r1,r0
+ bt .L_ret
+
+ add #1,r1
+ bra .L_mov_right
+ shlr r3
+
+.L_mov_left:
+ add #-1,r1
+
+ shll r3
+ cmp/eq r1,r0
+
+ bf .L_mov_left
+
+.L_ret:
+#endif
+ rts
+ mov r3,r0
+
+! r0 already has appropriate value
+.L_epil:
+ rts
+ nop
+
+! Return the maximum unsigned integer value
+.L_ret_max:
+ mov.l .L_max,r3
+
+ rts
+ mov r3,r0
+
+ .align 2
+.L_inf:
+ .long 0x7F800000
+
+.L_158:
+ .long 158
+
+.L_max:
+ .long 0xFFFFFFFF
+
+.L_frac:
+ .long 0x007FFFFF
+
+.L_sign:
+ .long 0x80000000
+
+.L_24bit:
+ .long 0x00800000
+
+.L_mask_sign:
+ .long 0x7FFFFFFF
+
+ENDFUNC (GLOBAL (fixunssfsi))
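The single-precision variant follows the same scheme; a minimal C sketch of what fixunssfsi computes is below. The `ref_fixunssfsi` name is ours; it works on the raw float bit pattern, and reproduces the routine's behavior of returning 0 for any negative input (the libgcc deviation noted in the comments) and the exponent bounds 127 and 158.

```c
#include <stdint.h>

/* Hypothetical C model of __fixunssfsi: convert IEEE-754 single bits
   to an unsigned 32-bit integer, truncating toward zero. */
static uint32_t ref_fixunssfsi(uint32_t bits)
{
    uint32_t exp  = (bits >> 23) & 0xFF;                 /* biased exponent */
    uint32_t frac = (bits & 0x007FFFFFu) | 0x00800000u;  /* implicit 1 */

    if (bits & 0x80000000u)                  /* negative -> 0, as above */
        return 0;
    if ((bits & 0x7FFFFFFFu) > 0x7F800000u)  /* NaN -> max */
        return 0xFFFFFFFFu;
    if (exp < 127)                           /* |x| < 1 -> 0 */
        return 0;
    if (exp > 158)                           /* |x| >= 2^32, +inf -> max */
        return 0xFFFFFFFFu;

    /* value = frac * 2^(exp - 150): 23 fraction bits below the point. */
    int sh = (int)exp - 150;
    return sh >= 0 ? frac << sh : frac >> -sh;
}
```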
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/floatsidf.S gcc-4.5.0/gcc/config/sh/IEEE-754/floatsidf.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/floatsidf.S 1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/floatsidf.S 2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,151 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Conversion of a signed integer to a double precision floating point number
+! Author: Rakesh Kumar
+!
+! Entry:
+! r4: operand
+!
+! Exit:
+! r0,r1: result
+!
+! Note: the argument is passed in reg r4 and the result is returned in
+! regs r0 and r1.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (floatsidf)
+ FUNC (GLOBAL (floatsidf))
+
+GLOBAL (floatsidf):
+ mov.l .L_sign,r0
+ mov #0,r1
+
+ mov r0,r2
+ tst r4,r4 ! check r4 for zero
+
+ ! Extract the sign
+ mov r2,r3
+ SL(bt, .L_ret_zero,
+ and r4,r0)
+
+ cmp/eq r1,r0
+ not r3,r3
+
+ mov r1,r7
+ SL(bt, .L_loop,
+ and r4,r3)
+
+ ! Treat -2147483648 as special case
+ cmp/eq r1,r3
+ neg r4,r4
+
+ bt .L_ret_min
+
+.L_loop:
+ shll r4
+ mov r4,r5
+
+ and r2,r5
+ cmp/eq r1,r5
+
+ add #1,r7
+ bt .L_loop
+
+ mov.l .L_initial_exp,r6
+ not r2,r2
+
+ and r2,r4
+ mov #21,r3
+
+ sub r7,r6
+ mov r4,r1
+
+ mov #20,r7
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r3,r1
+#else
+ SHLL21 (r1)
+#endif
+ mov #-11,r2
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r7,r6 ! Exponent in proper place
+#else
+ SHLL20 (r6)
+#endif
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r2,r4
+#else
+ SHLR11 (r4)
+#endif
+ or r6,r0
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+#ifdef __LITTLE_ENDIAN__
+ or r4,r1
+#else
+ or r4,r0
+#endif
+
+.L_ret_zero:
+ rts
+ mov #0,r0
+
+.L_ret_min:
+ mov.l .L_min,r0
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ nop
+
+ .align 2
+
+.L_initial_exp:
+ .long 0x0000041E
+
+.L_sign:
+ .long 0x80000000
+
+.L_min:
+ .long 0xC1E00000
+
+ENDFUNC (GLOBAL (floatsidf))
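Since every 32-bit integer fits exactly into a double's 52 fraction bits, floatsidf needs normalization but no rounding. A minimal C model of the packing (the `ref_floatsidf` name is ours; the INT_MIN special case matches the .L_min constant 0xC1E00000 above):

```c
#include <stdint.h>

/* Hypothetical C model of __floatsidf: pack a signed 32-bit integer
   into IEEE-754 double bits, returned as a hi:lo word pair. */
static void ref_floatsidf(int32_t n, uint32_t *hi, uint32_t *lo)
{
    if (n == 0) { *hi = 0; *lo = 0; return; }
    uint32_t sign = n < 0 ? 0x80000000u : 0;
    uint64_t mag  = n < 0 ? (uint64_t)-(int64_t)n : (uint64_t)n;

    int msb = 31;
    while (!(mag >> msb)) msb--;           /* position of leading one */

    /* Shift leading one to bit 52 (implicit), mask it away. */
    uint64_t frac = (mag << (52 - msb)) & 0x000FFFFFFFFFFFFFull;
    uint64_t bits = ((uint64_t)sign << 32)
                  | ((uint64_t)(1023 + msb) << 52)   /* biased exponent */
                  | frac;
    *hi = (uint32_t)(bits >> 32);
    *lo = (uint32_t)bits;
}
```

For n = INT32_MIN this yields hi = 0xC1E00000, lo = 0, the constant the routine above returns directly.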
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/floatsisf.S gcc-4.5.0/gcc/config/sh/IEEE-754/floatsisf.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/floatsisf.S 1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/floatsisf.S 2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,200 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Conversion of integer to floating point
+
+! Author: Rakesh Kumar
+
+! Argument: r4
+! Result: r0
+
+! r4 is referred to as op1
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (floatsisf)
+ FUNC (GLOBAL (floatsisf))
+
+GLOBAL (floatsisf):
+ mov.l .L_sign,r2
+ mov #23,r6
+
+ ! Check for zero
+ tst r4,r4
+ mov.l .L_24_bits,r7
+
+ ! Extract sign
+ and r4,r2
+ bt .L_ret
+
+ ! Negative ???
+ mov.l .L_imp_bit,r5
+ cmp/pl r4
+
+ not r7,r3
+ bf .L_neg
+
+ ! Decide the direction for shifting
+ cmp/gt r7,r4
+ mov r4,r0
+
+ and r5,r0
+ bt .L_shr_0
+
+ ! Number may already be in normalized form
+ cmp/eq #0,r0
+ bf .L_pack
+
+! Shift the bits to the left. Adjust the exponent
+.L_shl:
+ shll r4
+ mov r4,r0
+
+ and r5,r0
+ cmp/eq #0,r0
+
+ SL(bt, .L_shl,
+ add #-1,r6)
+
+! Pack the value in floating point format.
+! r6 has unbiased exponent, r4 has mantissa, r2 has sign
+.L_pack:
+ mov #23,r3
+ not r5,r5
+
+ mov r2,r0
+ add #127,r6
+
+ and r5,r4
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLL23 (r6)
+#else
+ shld r3,r6
+#endif
+
+ or r6,r0
+ rts
+ or r4,r0
+
+! Negate the number
+.L_neg:
+ ! Take care for -2147483648.
+ mov r4,r0
+ shll r0
+
+ cmp/eq #0,r0
+ SL(bt, .L_ret_min,
+ neg r4,r4)
+
+ cmp/gt r7,r4
+ bt .L_shr_0
+
+ mov r4,r0
+ and r5,r0
+
+ cmp/eq #0,r0
+ bf .L_pack
+ bt .L_shl
+
+.L_shr_0:
+ mov #0,r1
+
+! Shift right the number with rounding
+.L_shr:
+ shlr r4
+ movt r7
+
+ tst r7,r7
+
+ ! Count number of ON bits shifted
+ bt .L_shr_1
+ add #1,r1
+
+.L_shr_1:
+ mov r4,r0
+ add #1,r6
+
+ and r3,r0
+ cmp/eq #0,r0
+
+ ! Add MSB of shifted bits
+ bf .L_shr
+ add r7,r4
+
+ tst r7,r7
+ bt .L_pack
+
+.L_pack1:
+ mov #1,r0
+ cmp/eq r1,r0
+
+ bt .L_rnd
+ mov r4,r0
+
+ ! Rounding may have misplaced MSB. Adjust.
+ and r3,r0
+ cmp/eq #0,r0
+
+ bf .L_shr
+ bt .L_pack
+
+! If only MSB of shifted bits is ON, we are halfway
+! between two numbers. Round towards even LSB of
+! resultant mantissa.
+.L_rnd:
+ shlr r4
+ bra .L_pack
+ shll r4
+
+.L_ret:
+ rts
+ mov r4,r0
+
+! Return value for -2147483648
+.L_ret_min:
+ mov.l .L_min_val,r0
+ rts
+ nop
+
+ .align 2
+.L_sign:
+ .long 0x80000000
+
+.L_imp_bit:
+ .long 0x00800000
+
+.L_24_bits:
+ .long 0x00FFFFFF
+
+.L_nsign:
+ .long 0x7FFFFFFF
+
+.L_min_val:
+ .long 0xCF000000
+
+ENDFUNC (GLOBAL (floatsisf))
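Unlike the double-precision case, floatsisf must round: integers with more than 24 significant bits lose low bits, and the .L_rnd path above rounds halfway cases toward an even LSB. A minimal C model of that behavior (the `ref_floatsisf` name is ours; the INT_MIN result matches .L_min_val = 0xCF000000 above):

```c
#include <stdint.h>

/* Hypothetical C model of __floatsisf: signed 32-bit integer to
   IEEE-754 single bits, rounding to nearest, ties to even. */
static uint32_t ref_floatsisf(int32_t n)
{
    if (n == 0) return 0;
    uint32_t sign = n < 0 ? 0x80000000u : 0;
    uint64_t mag  = n < 0 ? (uint64_t)-(int64_t)n : (uint64_t)n;

    int msb = 31;
    while (!(mag >> msb)) msb--;           /* position of leading one */
    uint32_t exp = 127 + (uint32_t)msb;

    uint64_t frac;
    if (msb <= 23) {
        frac = mag << (23 - msb);          /* fits exactly, no rounding */
    } else {
        int sh = msb - 23;                 /* bits shifted out */
        uint64_t rest = mag & ((1ull << sh) - 1);
        uint64_t half = 1ull << (sh - 1);
        frac = mag >> sh;
        if (rest > half || (rest == half && (frac & 1)))
            frac++;                        /* round to nearest, ties to even */
        if (frac >> 24) { frac >>= 1; exp++; }  /* rounding carried out */
    }
    return sign | (exp << 23) | ((uint32_t)frac & 0x007FFFFFu);
}
```

E.g. 16777217 (2^24 + 1) is a tie and rounds down to 2^24, while 16777219 rounds up to 16777220.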
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/floatunssidf.S gcc-4.5.0/gcc/config/sh/IEEE-754/floatunssidf.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/floatunssidf.S 1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/floatunssidf.S 2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,76 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Conversion of an unsigned integer to a double precision floating point number
+! Author: Rakesh Kumar
+! Rewritten for SH1 support: Joern Rennecke
+!
+! Entry:
+! r4: operand
+!
+! Exit:
+! r0,r1: result
+!
+! Note: the argument is passed in reg r4 and the result is returned in
+! regs r0 and r1.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (floatunsidf)
+ FUNC (GLOBAL (floatunsidf))
+
+GLOBAL (floatunsidf):
+ mov.w LOCAL(x41f0),DBLRH ! bias + 32
+ tst r4,r4 ! check for zero
+ bt .L_ret_zero
+.L_loop:
+ shll r4
+ SL(bf, .L_loop,
+ add #-16,DBLRH)
+
+ mov r4,DBLRL
+
+ SHLL20 (DBLRL)
+
+ shll16 DBLRH ! put exponent in proper place
+
+ SHLR12 (r4)
+
+ rts
+ or r4,DBLRH
+
+.L_ret_zero:
+ mov #0,r1
+ rts
+ mov #0,r0
+
+LOCAL(x41f0): .word 0x41f0
+ .align 2
+
+ENDFUNC (GLOBAL (floatunsidf))
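The compact loop above starts the exponent at 0x41f (bias 1023 + 32) and decrements it once per normalization shift. A C sketch of the equivalent computation (the `ref_floatunsidf` name is ours; like floatsidf, no rounding is needed because 32 bits fit in the 52-bit fraction):

```c
#include <stdint.h>

/* Hypothetical C model of __floatunsidf: unsigned 32-bit integer to
   IEEE-754 double bits, returned as a hi:lo word pair. Exact. */
static void ref_floatunsidf(uint32_t n, uint32_t *hi, uint32_t *lo)
{
    if (n == 0) { *hi = 0; *lo = 0; return; }
    int msb = 31;
    while (!(n >> msb)) msb--;             /* position of leading one */

    /* Leading one moves to bit 52 (implicit) and is masked away;
       1023 + msb plays the role of the 0x41f starting value above. */
    uint64_t frac = ((uint64_t)n << (52 - msb)) & 0x000FFFFFFFFFFFFFull;
    uint64_t bits = ((uint64_t)(1023 + msb) << 52) | frac;
    *hi = (uint32_t)(bits >> 32);
    *lo = (uint32_t)bits;
}
```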
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/floatunssisf.S gcc-4.5.0/gcc/config/sh/IEEE-754/floatunssisf.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/floatunssisf.S 1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/floatunssisf.S 2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,137 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Conversion of unsigned integer to floating point
+
+! Author: Rakesh Kumar
+
+! Argument: r4
+! Result: r0
+
+! r4 is referred to as op1
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (floatunsisf)
+ FUNC (GLOBAL (floatunsisf))
+
+GLOBAL (floatunsisf):
+ tst r4,r4
+ mov #23,r6
+
+ mov.l .L_set_24_bits,r7
+ SL(bt, .L_return,
+ not r7,r3)
+
+ ! Decide the direction for shifting
+ mov.l .L_set_24_bit,r5
+ cmp/hi r7,r4
+
+ not r5,r2
+ SL(bt, .L_shift_right,
+ mov #0,r7)
+
+ tst r5,r4
+
+ mov #0,r0
+ bf .L_pack_sf
+
+! Shift the bits to the left. Adjust the exponent
+.L_shift_left:
+ shll r4
+ tst r5,r4
+
+ add #-1,r6
+ bt .L_shift_left
+
+! Pack the value in floating point format.
+! r6 has unbiased exponent, r4 has mantissa
+.L_pack_sf:
+ mov #23,r3
+ add #127,r6
+
+ ! Align the exponent
+ and r2,r4
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLL23 (r6)
+#else
+ shld r3,r6
+#endif
+
+ or r6,r0
+ rts
+ or r4,r0
+
+! Shift right the number with rounding
+.L_shift_right:
+ shlr r4
+ rotcr r7
+
+ tst r4,r3
+ add #1,r6
+
+ bf .L_shift_right
+
+ tst r7,r7
+ bt .L_sh_rt_1
+
+ shll r7
+ movt r1
+
+ add r1,r4
+
+ tst r7,r7
+ bf .L_sh_rt_1
+
+ ! Halfway between two numbers.
+ ! Round towards LSB = 0
+ shlr r4
+ shll r4
+
+.L_sh_rt_1:
+ mov r4,r0
+
+ ! Rounding may have misplaced MSB. Adjust.
+ and r3,r0
+ cmp/eq #0,r0
+
+ bf .L_shift_right
+ bt .L_pack_sf
+
+.L_return:
+ rts
+ mov r4,r0
+
+ .align 2
+.L_set_24_bit:
+ .long 0x00800000
+
+.L_set_24_bits:
+ .long 0x00FFFFFF
+
+ENDFUNC (GLOBAL (floatunsisf))
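floatunsisf shares the shift-and-round structure of floatsisf, but the "rounding may have misplaced MSB" adjustment matters more here: 0xFFFFFFFF rounds up across a power of two, bumping the exponent. A C sketch (the `ref_floatunsisf` name is ours):

```c
#include <stdint.h>

/* Hypothetical C model of __floatunsisf: unsigned 32-bit integer to
   IEEE-754 single bits, rounding to nearest, ties to even. */
static uint32_t ref_floatunsisf(uint32_t n)
{
    if (n == 0) return 0;
    int msb = 31;
    while (!(n >> msb)) msb--;             /* position of leading one */
    uint32_t exp = 127 + (uint32_t)msb;

    uint32_t frac;
    if (msb <= 23) {
        frac = n << (23 - msb);            /* exact */
    } else {
        int sh = msb - 23;
        uint32_t rest = n & ((1u << sh) - 1);
        uint32_t half = 1u << (sh - 1);
        frac = n >> sh;
        if (rest > half || (rest == half && (frac & 1)))
            frac++;                        /* round to nearest, ties to even */
        if (frac >> 24) { frac >>= 1; exp++; }  /* MSB misplaced by rounding */
    }
    return (exp << 23) | (frac & 0x007FFFFFu);
}
```

E.g. 0xFFFFFFFF becomes 0x4F800000 (2^32 as a float) via the exponent bump.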
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/m3/adddf3.S gcc-4.5.0/gcc/config/sh/IEEE-754/m3/adddf3.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/m3/adddf3.S 1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/m3/adddf3.S 2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,587 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+! adddf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4-200 without FPU, but can also be used for SH3.
+! Numbers with the same sign are added in typically 37 cycles; the worst case is
+! 43 cycles, unless there is an overflow, in which case the addition can
+! take up to 47 cycles.
+! Normal numbers with different sign are added in 56 (57 for PIC) cycles
+! or less on SH4.
+! If one of the inputs is a denormal, the worst case is 59 (60 for PIC)
+! cycles. (Two denormal inputs are faster than normal inputs, and
+! denormal outputs don't slow down computation).
+! Subtraction takes two cycles to negate the second input and then drops
+! through to addition.
+
+/* If the input exponents of a difference of two normalized numbers
+ differ by more than one, the output does not need to be adjusted
+ by more than one bit position. Hence, it makes sense to ensure that
+ the shifts by 0 & 1 are handled quickly to reduce average and worst
+ case times. */
+FUNC(GLOBAL(adddf3))
+FUNC(GLOBAL(subdf3))
+ .global GLOBAL(adddf3)
+ .global GLOBAL(subdf3)
+LOCAL(denorm_arg1):
+ bt LOCAL(inf_nan_arg0)
+ tst r0,r2
+ bt/s LOCAL(denorm_both)
+ shlr r1
+ mov.l LOCAL(x00100000),r3
+ bra LOCAL(denorm_arg1_done)
+ sub r2,r3
+
+! Handle denorm addition here because otherwise the ordinary addition would
+! have to check for denormal results.
+! Denormal subtraction could also be done faster, but the denorm subtraction
+! path here is still one cycle faster than the one for normalized input
+! numbers, and 16 instructions shorter than the fastest version.
+! Here we also generate +0.0 + +0.0 -> +0.0 ; -0.0 + -0.0 -> -0.0
+LOCAL(denorm_both):
+ div0s DBL0H,DBL1H
+ mov.l LOCAL(x800fffff),r9
+ bt/s LOCAL(denorm_sub)
+ and r1,DBL1H
+ and r9,DBL0H
+ mov.l @r15+,r9
+ mov DBL0L,DBLRL
+ mov DBL0H,DBLRH
+ addc DBL1L,DBLRL
+ mov.l @r15+,r8
+ rts
+ addc DBL1H,DBLRH
+
+! N.B., since subtraction also generates +0.0 for subtraction of numbers
+! with identical fractions, this also covers the +0.0 + -0.0 -> +0.0 /
+! -0.0 + +0.0 -> +0.0 cases.
+LOCAL(denorm_sub):
+ mov DBL0H,r8 ! tentative result sign
+ and r1,DBL0H
+ bra LOCAL(sub_same_exp)
+ addc r1,r2 ! exponent++, clear T
+
+LOCAL(inf_nan_arg0):
+ mov DBL0L,DBLRL
+ bra LOCAL(pop_r8_r9)
+ mov DBL0H,DBLRH
+
+LOCAL(ret_arg0):
+ mov.l LOCAL(x800fffff),DBLRH
+ mov DBL0L,DBLRL
+ mov r2,r3
+LOCAL(ret_arg):
+ mov.l @r15+,r9
+ and r8,DBLRH
+ mov.l @r15+,r8
+ rts
+ or r3,DBLRH
+
+ .balign 4
+GLOBAL(subdf3):
+ cmp/pz DBL1H
+ add DBL1H,DBL1H
+ rotcr DBL1H
+ nop
+
+GLOBAL(adddf3):
+ mov.l LOCAL(x7ff00000),r0
+ mov DBL0H,r2
+ mov.l LOCAL(x001fffff),r1
+ mov DBL1H,r3
+ mov.l r8,@-r15
+ and r0,r2
+ mov.l r9,@-r15
+ and r0,r3
+ cmp/hi r2,r3
+ or r0,DBL0H
+ or r0,DBL1H
+ bt LOCAL(arg1_gt)
+ tst r0,r3
+ mov #-20,r9
+ bt/s LOCAL(denorm_arg1)
+ cmp/hs r0,r2
+ bt LOCAL(inf_nan_arg0)
+ sub r2,r3
+LOCAL(denorm_arg1_done): ! r2 is tentative result exponent
+ shad r9,r3
+ mov.w LOCAL(m32),r9
+ mov DBL0H,r8 ! tentative result sign
+ and r1,DBL0H ! arg0 fraction
+ mov DBL1H,r0 ! the 'other' sign
+ and r1,DBL1H ! arg1 fraction
+ cmp/ge r9,r3
+ mov DBL1H,r1
+ bf/s LOCAL(large_shift_arg1)
+ shld r3,DBL1H
+LOCAL(small_shift_arg1):
+ mov DBL1L,r9
+ shld r3,DBL1L
+ tst r3,r3
+ add #32,r3
+ bt/s LOCAL(same_exp)
+ div0s r8,r0 ! compare signs
+ shld r3,r1
+
+ or r1,DBL1L
+ bf/s LOCAL(add)
+ shld r3,r9
+ clrt
+ negc r9,r9
+ mov.l LOCAL(x001f0000),r3
+LOCAL(sub_high):
+ mov DBL0L,DBLRL
+ subc DBL1L,DBLRL
+ mov DBL0H,DBLRH
+ bra LOCAL(subtract_done)
+ subc DBL1H,DBLRH
+
+LOCAL(large_shift_arg1):
+ mov.w LOCAL(d0),r9
+ add #64,r3
+ cmp/pl r3
+ shld r3,r1
+ bf LOCAL(ret_arg0)
+ cmp/hi r9,DBL1L
+ mov DBL1H,DBL1L
+ mov r9,DBL1H
+ addc r1,r9
+
+ div0s r8,r0 ! compare signs
+
+ bf LOCAL(add)
+ clrt
+ mov.l LOCAL(x001f0000),r3
+ bra LOCAL(sub_high)
+ negc r9,r9
+
+LOCAL(add_clr_r9):
+ mov #0,r9
+LOCAL(add):
+ mov.l LOCAL(x00200000),r3
+ addc DBL1L,DBL0L
+ addc DBL1H,DBL0H
+ mov.l LOCAL(x80000000),r1
+ tst r3,DBL0H
+ mov.l LOCAL(x7fffffff),r3
+ mov DBL0L,r0
+ bt/s LOCAL(no_carry)
+ and r1,r8
+ tst r9,r9
+ bf LOCAL(add_one)
+ tst #2,r0
+LOCAL(add_one):
+ subc r9,r9
+ sett
+ mov r0,DBLRL
+ addc r9,DBLRL
+ mov DBL0H,DBLRH
+ addc r9,DBLRH
+ shlr DBLRH
+ mov.l LOCAL(x7ff00000),r3
+ add r2,DBLRH
+ mov.l @r15+,r9
+ rotcr DBLRL
+ cmp/hi r3,DBLRH
+LOCAL(add_done):
+ bt LOCAL(inf)
+LOCAL(or_sign):
+ or r8,DBLRH
+ rts
+ mov.l @r15+,r8
+
+LOCAL(inf):
+ bra LOCAL(or_sign)
+ mov r3,DBLRH
+
+LOCAL(pos_difference_0):
+ tst r3,DBL0H
+ mov DBL0L,DBLRL
+ mov.l LOCAL(x80000000),DBL0L
+ mov DBL0H,DBLRH
+ mov.l LOCAL(x00100000),DBL0H
+ bt/s LOCAL(long_norm)
+ and DBL0L,r8
+ bra LOCAL(norm_loop)
+ not DBL0L,r3
+
+LOCAL(same_exp):
+ bf LOCAL(add_clr_r9)
+ clrt
+LOCAL(sub_same_exp):
+ subc DBL1L,DBL0L
+ mov.l LOCAL(x001f0000),r3
+ subc DBL1H,DBL0H
+ mov.w LOCAL(d0),r9
+ bf LOCAL(pos_difference_0)
+ clrt
+ negc DBL0L,DBLRL
+ mov.l LOCAL(x80000000),DBL0L
+ negc DBL0H,DBLRH
+ mov.l LOCAL(x00100000),DBL0H
+ tst r3,DBLRH
+ not r8,r8
+ bt/s LOCAL(long_norm)
+ and DBL0L,r8
+ bra LOCAL(norm_loop)
+ not DBL0L,r3
+
+LOCAL(large_shift_arg0):
+ add #64,r2
+
+ mov #0,r9
+ cmp/pl r2
+ shld r2,r1
+ bf LOCAL(ret_arg1_exp_r3)
+ cmp/hi r9,DBL0L
+ mov DBL0H,DBL0L
+ mov r9,DBL0H
+ addc r1,r9
+ div0s r8,r0 ! compare signs
+ mov r3,r2 ! tentative result exponent
+ bf LOCAL(add)
+ clrt
+ negc r9,r9
+ bra LOCAL(subtract_arg0_arg1_done)
+ mov DBL1L,DBLRL
+
+LOCAL(arg1_gt):
+ tst r0,r2
+ mov #-20,r9
+ bt/s LOCAL(denorm_arg0)
+ cmp/hs r0,r3
+ bt LOCAL(inf_nan_arg1)
+ sub r3,r2
+LOCAL(denorm_arg0_done):
+ shad r9,r2
+ mov.w LOCAL(m32),r9
+ mov DBL1H,r8 ! tentative result sign
+ and r1,DBL1H
+ mov DBL0H,r0 ! the 'other' sign
+ and r1,DBL0H
+ cmp/ge r9,r2
+ mov DBL0H,r1
+ shld r2,DBL0H
+ bf LOCAL(large_shift_arg0)
+ mov DBL0L,r9
+ shld r2,DBL0L
+ add #32,r2
+ mov.l r3,@-r15
+ shld r2,r1
+ mov r2,r3
+ div0s r8,r0 ! compare signs
+ mov.l @r15+,r2 ! tentative result exponent
+ shld r3,r9
+ bf/s LOCAL(add)
+ or r1,DBL0L
+ clrt
+ negc r9,r9
+ mov DBL1L,DBLRL
+LOCAL(subtract_arg0_arg1_done):
+ subc DBL0L,DBLRL
+ mov DBL1H,DBLRH
+ mov.l LOCAL(x001f0000),r3
+ subc DBL0H,DBLRH
+/* Since the exponents were different, the difference is positive. */
+/* Fall through */
+LOCAL(subtract_done):
+/* First check if a shift by a few bits is sufficient. This not only
+ speeds up this case, but also alleviates the need for considering
+ lower bits from r9 or rounding in the other code.
+ Moreover, by handling the upper 1+4 bits of the fraction here, long_norm
+ can assume that DBLRH fits into 16 bits. */
+ tst r3,DBLRH
+ mov.l LOCAL(x80000000),r3
+ mov.l LOCAL(x00100000),DBL0H
+ bt/s LOCAL(long_norm)
+ and r3,r8
+ mov.l LOCAL(x7fffffff),r3
+LOCAL(norm_loop): ! Well, this used to be a loop...
+ tst DBL0H,DBLRH
+ sub DBL0H,r2
+ bf LOCAL(norm_round)
+ shll r9
+ rotcl DBLRL
+
+ rotcl DBLRH
+
+ tst DBL0H,DBLRH
+ sub DBL0H,r2
+ bf LOCAL(norm_round)
+ shll DBLRL
+ rotcl DBLRH
+ mov.l @r15+,r9
+ cmp/gt r2,DBL0H
+ sub DBL0H,r2
+LOCAL(norm_loop_1):
+ bt LOCAL(denorm0_n)
+ tst DBL0H,DBLRH
+ bf LOCAL(norm_pack)
+ shll DBLRL
+ rotcl DBLRH ! clears T
+ bra LOCAL(norm_loop_1)
+ subc DBL0H,r2
+
+LOCAL(no_carry):
+ shlr r0
+ mov.l LOCAL(x000fffff),DBLRH
+ addc r3,r9
+ mov.w LOCAL(d0),DBL1H
+ mov DBL0L,DBLRL
+ and DBL0H,DBLRH ! mask out implicit 1
+ mov.l LOCAL(x7ff00000),r3
+ addc DBL1H,DBLRL
+ addc r2,DBLRH
+ mov.l @r15+,r9
+ add DBL1H,DBLRH ! fraction overflow -> exp increase
+ bra LOCAL(add_done)
+ cmp/hi r3,DBLRH
+
+LOCAL(denorm_arg0):
+ bt LOCAL(inf_nan_arg1)
+ mov.l LOCAL(x00100000),r2
+ shlr r1
+ bra LOCAL(denorm_arg0_done)
+ sub r3,r2
+
+LOCAL(inf_nan_arg1):
+ mov DBL1L,DBLRL
+ bra LOCAL(pop_r8_r9)
+ mov DBL1H,DBLRH
+
+LOCAL(ret_arg1_exp_r3):
+ mov.l LOCAL(x800fffff),DBLRH
+ bra LOCAL(ret_arg)
+ mov DBL1L,DBLRL
+
+#ifdef __pic__
+ .balign 8
+#endif
+LOCAL(m32):
+ .word -32
+LOCAL(d0):
+ .word 0
+#ifndef __pic__
+ .balign 8
+#endif
+! Because we had several bits of cancellation, we know that r9 contains
+! only one bit.
+! We'll normalize by shifting words so that DBLRH:DBLRL contains
+! the fraction with 0 < DBLRH <= 0x1fffff, then we shift DBLRH:DBLRL
+! up by 21 minus the number of non-zero bits in DBLRH.
+LOCAL(long_norm):
+ tst DBLRH,DBLRH
+ mov.w LOCAL(xff),DBL0L
+ mov #21,r3
+ bf LOCAL(long_norm_highset)
+ mov.l LOCAL(x02100000),DBL1L ! shift 32, implicit 1
+ tst DBLRL,DBLRL
+ extu.w DBLRL,DBL0H
+ bt LOCAL(zero_or_ulp)
+ mov DBLRL,DBLRH
+ cmp/hi DBL0H,DBLRL
+ bf 0f
+ mov.l LOCAL(x01100000),DBL1L ! shift 16, implicit 1
+ clrt
+ shlr16 DBLRH
+ xtrct DBLRL,r9
+ mov DBLRH,DBL0H
+LOCAL(long_norm_ulp_done):
+0: mov r9,DBLRL ! DBLRH:DBLRL == fraction; DBL0H == DBLRH
+ subc DBL1L,r2
+ bt LOCAL(denorm1_b)
+#ifdef __pic__
+ mov.l LOCAL(c__clz_tab),DBL1H
+LOCAL(long_norm_lookup):
+ mov r0,r9
+ mova LOCAL(c__clz_tab),r0
+ add DBL1H,r0
+#else
+ mov r0,r9
+LOCAL(long_norm_lookup):
+ mov.l LOCAL(c__clz_tab),r0
+#endif /* __pic__ */
+ cmp/hi DBL0L,DBL0H
+ bf 0f
+ shlr8 DBL0H
+0: mov.b @(r0,DBL0H),r0
+ bf 0f
+ add #-8,r3
+0: mov.w LOCAL(d20),DBL0L
+ mov #-20,DBL0H
+ clrt
+ sub r0,r3
+ mov r9,r0
+ mov r3,DBL1H
+ shld DBL0L,DBL1H
+ subc DBL1H,r2
+ !
+ bf LOCAL(no_denorm)
+ shad DBL0H,r2
+ bra LOCAL(denorm1_done)
+ add r2,r3
+
+LOCAL(norm_round):
+ cmp/pz r2
+ mov #0,DBL1H
+ bf LOCAL(denorm0_1)
+ or r8,r2
+ mov DBLRL,DBL1L
+ shlr DBL1L
+ addc r3,r9
+ mov.l @r15+,r9
+ addc DBL1H,DBLRL ! round to even
+ mov.l @r15+,r8
+ rts
+ addc r2,DBLRH
+
+LOCAL(norm_pack):
+ add r8,DBLRH
+ mov.l @r15+,r8
+ rts
+ add r2,DBLRH
+
+LOCAL(denorm0_1):
+ mov.l @r15+,r9
+ mov r8,DBL0L
+ mov.l @r15+,r8
+LOCAL(denorm0_shift):
+ shlr DBLRH
+ rotcr DBLRL
+
+ rts
+ add DBL0L,DBLRH
+
+LOCAL(denorm0_n):
+ mov r8,DBL0L
+ addc DBL0H,r2
+ mov.l @r15+,r8
+ bf LOCAL(denorm0_shift)
+ rts
+ add DBL0L,DBLRH
+
+LOCAL(no_denorm):
+ add r2,r8 ! add (exponent - 1) to sign
+
+LOCAL(denorm1_done):
+ shld r3,DBLRH
+ mov DBLRL,DBL0L
+ shld r3,DBLRL
+
+ add r8,DBLRH ! add in sign and (exponent - 1)
+ mov.l @r15+,r9
+ add #-32,r3
+ mov.l @r15+,r8
+ shld r3,DBL0L
+
+ rts
+ add DBL0L,DBLRH
+
+LOCAL(long_norm_highset):
+ mov.l LOCAL(x00200000),DBL1L ! shift 1, implicit 1
+ shll r9
+ rotcl DBLRL
+ mov DBLRH,DBL0H
+ rotcl DBLRH ! clears T
+#ifdef __pic__
+ mov.l LOCAL(c__clz_tab),DBL1H
+#else
+ mov r0,r9
+#endif /* __pic__ */
+ subc DBL1L,r2
+ add #-1,r3
+ bf LOCAL(long_norm_lookup)
+LOCAL(denorm1_a):
+ shlr DBLRH
+ rotcr DBLRL
+ mov.l @r15+,r9
+ or r8,DBLRH
+
+ rts
+ mov.l @r15+,r8
+
+ .balign 4
+LOCAL(denorm1_b):
+ mov #-20,DBL0L
+ shad DBL0L,r2
+ mov DBLRH,DBL0L
+ shld r2,DBLRH
+ shld r2,DBLRL
+ or r8,DBLRH
+ mov.l @r15+,r9
+ add #32,r2
+ mov.l @r15+,r8
+ shld r2,DBL0L
+ rts
+ or DBL0L,DBLRL
+
+LOCAL(zero_or_ulp):
+ tst r9,r9
+ bf LOCAL(long_norm_ulp_done)
+ ! return +0.0
+LOCAL(pop_r8_r9):
+ mov.l @r15+,r9
+ rts
+ mov.l @r15+,r8
+
+LOCAL(d20):
+ .word 20
+LOCAL(xff):
+ .word 0xff
+ .balign 4
+LOCAL(x7ff00000):
+ .long 0x7ff00000
+LOCAL(x001fffff):
+ .long 0x001fffff
+LOCAL(x80000000):
+ .long 0x80000000
+LOCAL(x000fffff):
+ .long 0x000fffff
+LOCAL(x800fffff):
+ .long 0x800fffff
+LOCAL(x001f0000):
+ .long 0x001f0000
+LOCAL(x00200000):
+ .long 0x00200000
+LOCAL(x7fffffff):
+ .long 0x7fffffff
+LOCAL(x00100000):
+ .long 0x00100000
+LOCAL(x02100000):
+ .long 0x02100000
+LOCAL(x01100000):
+ .long 0x01100000
+LOCAL(c__clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(adddf3))
+ENDFUNC(GLOBAL(subdf3))
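The long_norm path above narrows the leading-one search word by word and byte by byte, then finishes with a lookup in libgcc's clz_tab, whose entry for i is the number of significant bits in i (1 + floor(log2 i), with entry 0 for 0). A hedged C sketch of that table-driven scheme; the `bits_used` helper name is ours, and the table is rebuilt locally rather than referencing libgcc's __clz_tab:

```c
#include <stdint.h>

/* Returns 1 + floor(log2 x) for x > 0, and 0 for x == 0, using a
   256-entry byte table plus word/byte narrowing, as in long_norm. */
static int bits_used(uint32_t x)
{
    unsigned char tab[256];
    tab[0] = 0;
    for (int i = 1; i < 256; i++)
        tab[i] = (unsigned char)(tab[i / 2] + 1);  /* bits in i */

    if (x >> 16)    /* narrow to the highest non-zero byte */
        return (x >> 24) ? 24 + tab[x >> 24] : 16 + tab[x >> 16];
    return (x >> 8) ? 8 + tab[x >> 8] : tab[x];
}
```

This is why the assembly can compute the normalization shift for a fraction known to fit in 21 bits with one table load and a couple of conditional byte shifts.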
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/m3/addsf3.S gcc-4.5.0/gcc/config/sh/IEEE-754/m3/addsf3.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/m3/addsf3.S 1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/m3/addsf3.S 2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,290 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+! addsf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+
+#ifdef L_add_sub_sf3
+ .balign 4
+ .global GLOBAL(subsf3)
+ FUNC(GLOBAL(subsf3))
+ .global GLOBAL(addsf3)
+ FUNC(GLOBAL(addsf3))
+GLOBAL(subsf3):
+ cmp/pz r5
+ add r5,r5
+ rotcr r5
+ .balign 4
+GLOBAL(addsf3):
+ mov.l LOCAL(x7f800000),r3
+ mov r4,r6
+ add r6,r6
+ mov r5,r7
+ add r7,r7
+ mov r4,r0
+ or r3,r0
+ cmp/hi r6,r7
+ mov r5,r1
+ bf/s LOCAL(r4_hs)
+ or r3,r1
+ cmp/eq r5,r1
+ bt LOCAL(ret_r5) /* sole Inf or NaN, return unchanged. */
+ shll8 r0 ! r4 fraction
+ shll8 r1 ! r5 fraction
+ mov r6,r3
+ mov #-24,r2
+ mov r7,r6
+ shld r2,r6 ! r5 exp
+ mov r0,r7
+ shld r2,r3 ! r4 exp
+ tst r6,r6
+ sub r6,r3 ! exp difference (negative or 0)
+ bt LOCAL(denorm_r4)
+LOCAL(denorm_r4_done): ! r1: u1.31
+ shld r3,r0 ! Get 31 upper bits, including 8 guard bits
+ mov.l LOCAL(xff000000),r2
+ add #31,r3
+ mov.l r5,@-r15 ! push result sign.
+ cmp/pl r3 ! r0 has no more than one bit set -> return arg 1
+ shld r3,r7 ! copy of lowest guard bit in r0 and lower guard bits
+ bf LOCAL(ret_stack)
+ div0s r4,r5
+ bf/s LOCAL(add)
+ cmp/pl r7 /* Is LSB in r0 clear, but any lower guard bit set? */
+ subc r0,r1
+ mov.l LOCAL(c__clz_tab),r7
+ tst r2,r1
+ mov #-24,r3
+ bf/s LOCAL(norm_r0)
+ mov r1,r0
+ extu.w r1,r1
+ bra LOCAL(norm_check2)
+ cmp/eq r0,r1
+LOCAL(ret_r5):
+ rts
+ mov r5,r0
+LOCAL(ret_stack):
+ rts
+ mov.l @r15+,r0
+
+/* We leave the numbers denormalized, but we change the bit position to be
+ consistent with normalized numbers. This also removes the spurious
+ leading one that was inserted before. */
+LOCAL(denorm_r4):
+ tst r3,r3
+ bf/s LOCAL(denorm_r4_done)
+ add r0,r0
+ bra LOCAL(denorm_r4_done)
+ add r1,r1
+LOCAL(denorm_r5):
+ tst r6,r6
+ add r1,r1
+ bf LOCAL(denorm_r5_done)
+ clrt
+ bra LOCAL(denorm_r5_done)
+ add r0,r0
+
+/* If the exponent differs by two or more, normalization is minimal, and
+ few guard bits are needed for an exact final result, so sticky guard
+ bit compression before subtraction (or add) works fine.
+ If the exponent differs by one, only one extra guard bit is generated,
+ and effectively no guard bit compression takes place. */
+
+ .balign 4
+LOCAL(r4_hs):
+ cmp/eq r4,r0
+ mov #-24,r3
+ bt LOCAL(inf_nan_arg0)
+ shld r3,r7
+ shll8 r0
+ tst r7,r7
+ shll8 r1
+ mov.l LOCAL(xff000000),r2
+ bt/s LOCAL(denorm_r5)
+ shld r3,r6
+LOCAL(denorm_r5_done):
+ mov r1,r3
+ subc r6,r7
+ bf LOCAL(same_exp)
+ shld r7,r1 /* Get 31 upper bits. */
+ add #31,r7
+ mov.l r4,@-r15 ! push result sign.
+ cmp/pl r7
+ shld r7,r3
+ bf LOCAL(ret_stack)
+ div0s r4,r5
+ bf/s LOCAL(add)
+ cmp/pl r3 /* Is LSB in r1 clear, but any lower guard bit set? */
+ subc r1,r0
+ mov.l LOCAL(c__clz_tab),r7
+LOCAL(norm_check):
+ tst r2,r0
+ mov #-24,r3
+ bf LOCAL(norm_r0)
+ extu.w r0,r1
+ cmp/eq r0,r1
+LOCAL(norm_check2):
+ mov #-8,r3
+ bt LOCAL(norm_r0)
+ mov #-16,r3
+LOCAL(norm_r0):
+ mov r0,r1
+ shld r3,r0
+#ifdef __pic__
+ add r0,r7
+ mova LOCAL(c__clz_tab),r0
+#endif
+ mov.b @(r0,r7),r7
+ add #25,r3
+ add #-9+1,r6
+ mov r1,r0
+ sub r7,r3
+ mov.l LOCAL(xbfffffff),r7
+ sub r3,r6 /* generate exp-1 */
+ mov.w LOCAL(d24),r2
+ cmp/pz r6 /* check exp > 0 */
+ shld r3,r0 /* Leading 1 becomes +1 exp adjustment. */
+ bf LOCAL(zero_denorm)
+LOCAL(denorm_done):
+ add #30,r3
+ shld r3,r1
+ mov.w LOCAL(m1),r3
+ tst r7,r1 ! clear T if rounding up
+ shld r2,r6
+ subc r3,r0 ! round - overflow will boost exp adjustment to 2.
+ mov.l @r15+,r2
+ add r6,r0 ! overflow will generate inf
+ cmp/ge r2,r3 ! get sign into T
+ rts
+ rotcr r0
+LOCAL(ret_r4):
+ rts
+ mov r4,r0
+
+/* At worst, we are shifting the number back in place where an incoming
+ denormal was. Thus, the shifts won't get out of range. They still
+ might generate a zero fraction, but that's OK, that makes it 0. */
+LOCAL(zero_denorm):
+ add r6,r3
+ mov r1,r0
+ mov #0,r6 /* leading one will become free (except for rounding) */
+ bra LOCAL(denorm_done)
+ shld r3,r0
+
+/* Handle abs(r4) >= abs(r5), same exponents specially so we don't need
+ check for a zero fraction in the main path. */
+LOCAL(same_exp):
+ div0s r4,r5
+ mov.l r4,@-r15
+ bf LOCAL(add)
+ cmp/eq r1,r0
+ mov.l LOCAL(c__clz_tab),r7
+ bf/s LOCAL(norm_check)
+ sub r1,r0
+ rts ! zero difference -> return +zero
+ mov.l @r15+,r1
+
+/* r2: 0xff000000 */
+LOCAL(add):
+ addc r1,r0
+ mov.w LOCAL(x2ff),r7
+ shll8 r6
+ bf/s LOCAL(no_carry)
+ shll16 r6
+ tst r7,r0
+ shlr8 r0
+ mov.l @r15+,r3 ! discard saved sign
+ subc r2,r0
+ sett
+ addc r6,r0
+ cmp/hs r2,r0
+ bt/s LOCAL(inf)
+ div0s r7,r4 /* Copy sign. */
+ rts
+ rotcr r0
+LOCAL(inf):
+ mov r6,r0
+ rts
+ rotcr r0
+LOCAL(no_carry):
+ mov.w LOCAL(m1),r3
+ tst r6,r6
+ bt LOCAL(denorm_add)
+ add r0,r0
+ tst r7,r0 ! check if lower guard bit set or round to even
+ shlr8 r0
+ mov.l @r15+,r1 ! discard saved sign
+ subc r3,r0 ! round ; overflow -> exp++
+ cmp/ge r4,r3 /* Copy sign. */
+ add r6,r0 ! overflow -> inf
+ rts
+ rotcr r0
+
+LOCAL(denorm_add):
+ cmp/ge r4,r3 /* Copy sign. */
+ shlr8 r0
+ mov.l @r15+,r1 ! discard saved sign
+ rts
+ rotcr r0
+
+LOCAL(inf_nan_arg0):
+ cmp/eq r5,r1
+ bf LOCAL(ret_r4)
+ div0s r4,r5 /* Both are inf or NaN, check signs. */
+ bt LOCAL(ret_nan) /* inf - inf, or NaN. */
+ mov r4,r0 ! same sign; return NaN if either is NaN.
+ rts
+ or r5,r0
+LOCAL(ret_nan):
+ rts
+ mov #-1,r0
+
+LOCAL(d24):
+ .word 24
+LOCAL(x2ff):
+ .word 0x2ff
+LOCAL(m1):
+ .word -1
+ .balign 4
+LOCAL(x7f800000):
+ .long 0x7f800000
+LOCAL(xbfffffff):
+ .long 0xbfffffff
+LOCAL(xff000000):
+ .long 0xff000000
+LOCAL(xfe000000):
+ .long 0xfe000000
+LOCAL(c__clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+
+ ENDFUNC(GLOBAL(addsf3))
+ ENDFUNC(GLOBAL(subsf3))
+#endif /* L_add_sub_sf3 */
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/m3/divdf3-rt.S gcc-4.5.0/gcc/config/sh/IEEE-754/m3/divdf3-rt.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/m3/divdf3-rt.S 1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/m3/divdf3-rt.S 2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,519 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+! divdf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke joern.rennecke@st.com
+
+/* This version is not quite finished, since I've found that I can
+ get better average performance with a slightly altered algorithm.
+ Still, if you want a version for hard real time, this version here might
+ be a good starting point, since it has effectively no conditional
+ branches in the path that deals with normal numbers
+ (branches with zero offset are effectively conditional execution),
+ and thus it has a uniform execution time in this path. */
+
+/* y = 1/x ; x in [1,2)
+ y0 = 1.5 - x/2 - tab[(1-x)*64] = y + d ; abs(d)/y <= 0x1.0c/256
+
+ y1 = y0 - ((y0) * x - 1) * y0 = y-x*d^2
+ y2 = y1 - ((y1) * x - 1) * y1 =~= y-x^3*d^4
+
+ z0 = y2*a ; a1 = a - z0*x /# 32 * 64 -> 64 bit #/
+ z1 = y2*a1 (round to nearest odd 0.5 ulp);
+ a2 = a1 - z1*x /# 32 * 64 -> 64 bit #/
+
+ z = a/x = z0 + z1 - 0.5 ulp + (a2 > 0) * ulp
+
+ Unless stated otherwise, multiplies can be done in 32 * 32 bit or less
+ with suitable scaling and/or top truncation.
+ x truncated to 20 bits is sufficient to calculate y0 or even y1.
+ Table entries are adjusted by about +128 to use full signed byte range.
+ This adjustment has been perturbed slightly to allow cse with the
+ shift count constant -26.
+ The threshold point for the shift adjust before rounding is found by
+ comparing the fractions, which is exact, unlike the top bit of y2.
+ Therefore, the top bit of y2 becomes slightly random after the adjustment
+ shift, but that's OK because this can happen only at the boundaries of
+ the interval, and the biasing of the error means that it can in fact happen
+ only at the bottom end. And there, the carry propagation will make sure
+ that in the end we will have in effect an implicit 1 (or two when rounding
+ up...) */
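The refinement steps y1 = y0 - (y0*x - 1)*y0 etc. in the comment above are ordinary Newton-Raphson iterations for 1/x, where each step squares the relative error. A quick exact-arithmetic sketch (the table correction is dropped and only the plain linear seed is used, so this demonstrates the error-squaring behaviour rather than the routine's actual accuracy):

```python
from fractions import Fraction

def newton_step(y, x):
    # One Newton-Raphson step for 1/x: the approximation error is squared.
    return y - (y * x - 1) * y

x = Fraction(5, 4)            # x in [1, 2)
y = Fraction(3, 2) - x / 2    # seed y0 without the table correction
for _ in range(2):
    y = newton_step(y, x)
err = abs(y - 1 / x)          # |y2 - 1/x| ~ x**3 * d**4 for seed error d
```

With seed error d = 0.075 here, two steps bring the error below 1e-4, matching the y2 = y - x^3*d^4 estimate in the comment.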
+/* If an exact result exists, it can have no more bits than the dividend.
+ Hence, we don't need to bother with the round-to-even tie breaker
+ unless the result is denormalized. */
+/* 70 cycles through main path for sh4-300.  Some cycles might be
+ saved by more careful register allocation.
+ 122 cycles for sh4-200. If execution time for sh4-200 is of concern,
+ a specially scheduled version makes sense. */
+
+#define x_h r12
+#define yn r3
+
+FUNC(GLOBAL(divdf3))
+ .global GLOBAL(divdf3)
+
+/* Adjust arg0 now, too. We still have to come back to denorm_arg1_done,
+ since we haven't done any of the work yet that we do till the denorm_arg0
+ entry point. We know that neither of the arguments is inf/nan, but
+ arg0 might be zero. Check for that first to avoid having to establish an
+ rts return address. */
+LOCAL(both_denorm):
+ mov.l r9,@-r15
+ mov DBL0H,r1
+ mov.l r0,@-r15
+ shll2 r1
+ mov.w LOCAL(both_denorm_cleanup_off),r9
+ or DBL0L,r1
+ tst r1,r1
+ mov DBL0H,r0
+ bf/s LOCAL(zero_denorm_arg0_1)
+ shll2 r0
+ mov.l @(4,r15),r9
+ add #8,r15
+ bra LOCAL(ret_inf_nan_0)
+ mov r1,DBLRH
+
+LOCAL(both_denorm_cleanup):
+ mov.l @r15+,r0
+ !
+ mov.l @r15+,r9
+ mov #12,r3
+ mov.l LOCAL(xfffe2006),r1 ! yn := (-1. << 17) + (0x80 << 6) ; shift #-26
+ bra LOCAL(denorm_arg1_done)
+ !
+ add r0,DBL0H
+
+/* Denorm handling leaves the incoming denorm argument with an exponent of +1
+ (implicit 1). To leave the result exponent unaltered, the other
+ argument's exponent is adjusted by the shift count. */
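The compensation described above can be checked with exact arithmetic: scaling a denormal dividend's fraction up by 2**s while adding s to the divisor's exponent leaves the quotient's value, and hence its exponent, unchanged. A sketch with made-up example values:

```python
from fractions import Fraction

def value(frac, exp):
    # Model a finite binary floating-point value as frac * 2**exp.
    return frac * Fraction(2) ** exp

a_frac, a_exp = Fraction(3, 4), -1022   # denormal-style fraction < 1
x_frac, x_exp = Fraction(5, 4), 10      # normalized divisor
s = 9                                   # normalization shift count
before = value(a_frac, a_exp) / value(x_frac, x_exp)
after = value(a_frac * 2 ** s, a_exp) / value(x_frac, x_exp + s)
```

The identity a·2**s / (x·2**s) = a/x is what lets the routine normalize the denormal in place and fold the shift count into the other operand's exponent field.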
+
+ .balign 4
+LOCAL(arg0_tiny):
+ bsr LOCAL(clz)
+ mov DBL0L,r0
+ shll DBL0H
+ add #1,r0
+ mov DBL0L,DBL0H
+ shld r0,DBL0H
+ rotcr DBL0H
+ tst DBL0L,DBL0L /* Check for a zero dividend. */
+ add #-33,r0
+ shld r0,DBL0L
+ bf/s LOCAL(adjust_arg1_exp)
+ add #64,r0
+LOCAL(return_0): /* Return 0 with appropriate sign. */
+ mov.l @r15+,r10
+ mov #0,DBLRH
+ mov.l @r15+,r9
+ bra LOCAL(ret_inf_nan_0)
+ mov.l @r15+,r8
+
+ .balign 4
+LOCAL(arg1_tiny):
+ bsr LOCAL(clz)
+ mov DBL1L,r0
+ shll DBL1H
+ add #1,r0
+ mov DBL1L,DBL1H
+ shld r0,DBL1H
+ rotcr DBL1H
+ tst DBL1L,DBL1L /* Check for divide by zero. */
+ add #-33,r0
+ shld r0,DBL1L
+ bf/s LOCAL(adjust_arg0_exp)
+ add #64,r0
+ mov DBL0H,r0
+ add r0,r0
+ tst r0,r0 ! 0 / 0 ?
+ mov #-1,DBLRH
+ bf LOCAL(return_inf)
+ !
+ bt LOCAL(ret_inf_nan_0)
+ !
+
+ .balign 4
+LOCAL(zero_denorm_arg1):
+ not DBL0H,r3
+ mov DBL1H,r0
+ tst r2,r3
+ shll2 r0
+ bt LOCAL(early_inf_nan_arg0)
+ tst r0,r0
+ mov.w LOCAL(xff00),r12
+ bt/s LOCAL(arg1_tiny)
+ sts.l pr,@-r15
+ bsr LOCAL(clz)
+ shlr2 r0
+ !
+ shll DBL1H
+ mov DBL1L,r3
+ shld r0,DBL1H
+ shld r0,DBL1L
+ rotcr DBL1H
+ add #-32,r0
+ shld r0,r3
+ add #32,r0
+ or r3,DBL1H
+LOCAL(adjust_arg0_exp):
+ tst r2,DBL0H
+ mov #20,r3
+ shld r3,r0
+ bt LOCAL(both_denorm)
+ add DBL0H,r0
+ div0s r0,DBL0H ! Check for obvious overflow.
+ not r0,r3 ! Check for more subtle overflow - lest
+ bt LOCAL(return_inf)
+ mov r0,DBL0H
+ tst r2,r3 ! we mistake it for NaN later
+ mov #12,r3
+ bf LOCAL(denorm_arg1_done)
+LOCAL(return_inf): /* Return infinity with appropriate sign. */
+ mov #20,r3
+ mov #-2,DBLRH
+ bra LOCAL(ret_inf_nan_0)
+ shad r3,DBLRH
+
+/* inf/n -> inf; inf/0 -> inf; inf/inf -> nan; inf/nan -> nan; nan/x -> nan */
+LOCAL(inf_nan_arg0):
+ mov.l @r15+,r10
+ mov.l @r15+,r9
+ mov.l @r15+,r8
+LOCAL(early_inf_nan_arg0):
+ not DBL1H,r3
+ mov DBL0H,DBLRH
+ tst r2,r3 ! both inf/nan?
+ add DBLRH,DBLRH
+ bf LOCAL(ret_inf_nan_0)
+ mov #-1,DBLRH
+LOCAL(ret_inf_nan_0):
+ mov #0,DBLRL
+ mov.l @r15+,r12
+ div0s DBL0H,DBL1H
+ rts
+ rotcr DBLRH
+
+/* Already handled: inf/x, nan/x . Thus: x/inf -> 0; x/nan -> nan */
+ .balign 4
+LOCAL(inf_nan_arg1):
+ mov DBL1H,r2
+ mov #12,r1
+ shld r1,r2
+ mov.l @r15+,r10
+ mov #0,DBLRL
+ mov.l @r15+,r9
+ or DBL1L,r2
+ mov.l @r15+,r8
+ cmp/hi DBLRL,r2
+ mov.l @r15+,r12
+ subc DBLRH,DBLRH
+ div0s DBL0H,DBL1H
+ rts
+ rotcr DBLRH
+
+ .balign 4
+LOCAL(zero_denorm_arg0):
+ mov.w LOCAL(denorm_arg0_done_off),r9
+ not DBL1H,r1
+ mov DBL0H,r0
+ tst r2,r1
+ shll2 r0
+ bt LOCAL(inf_nan_arg1)
+LOCAL(zero_denorm_arg0_1):
+ tst r0,r0
+ mov.w LOCAL(xff00),r12
+ bt/s LOCAL(arg0_tiny)
+ sts.l pr,@-r15
+ bsr LOCAL(clz)
+ shlr2 r0
+ shll DBL0H
+ mov DBL0L,r12
+ shld r0,DBL0H
+ shld r0,DBL0L
+ rotcr DBL0H
+ add #-32,r0
+ shld r0,r12
+ add #32,r0
+ or r12,DBL0H
+LOCAL(adjust_arg1_exp):
+ mov #20,r12
+ shld r12,r0
+ add DBL1H,r0
+ div0s r0,DBL1H ! Check for obvious underflow.
+ not r0,r12 ! Check for more subtle underflow - lest
+ bt LOCAL(return_0)
+ mov r0,DBL1H
+ tst r2,r12 ! we mistake it for NaN later
+ bt LOCAL(return_0)
+ !
+ braf r9
+ mov #13,r0
+LOCAL(zero_denorm_arg1_dispatch):
+
+LOCAL(xff00): .word 0xff00
+LOCAL(denorm_arg0_done_off):
+ .word LOCAL(denorm_arg0_done)-LOCAL(zero_denorm_arg1_dispatch)
+LOCAL(both_denorm_cleanup_off):
+ .word LOCAL(both_denorm_cleanup)-LOCAL(zero_denorm_arg1_dispatch)
+
+ .balign 8
+GLOBAL(divdf3):
+ mov.l LOCAL(x7ff00000),r2
+ mov #12,r3
+ mov.l LOCAL(xfffe2006),r1 ! yn := (-1. << 17) + (0x80 << 6) ; shift #-26
+ tst r2,DBL1H
+ mov.l r12,@-r15
+ bt LOCAL(zero_denorm_arg1)
+
+LOCAL(denorm_arg1_done):
+ mov DBL1H,x_h ! x_h live in r12
+ shld r3,x_h ! x - 1 ; u0.20
+ mov x_h,yn
+ mova LOCAL(ytab),r0
+ mov.l r8,@-r15
+ shld r1,yn ! x-1 ; u26.6
+ mov.b @(r0,yn),yn
+ mov #6,r0
+ mov.l r9,@-r15
+ mov x_h,r8
+ mov.l r10,@-r15
+ shlr16 x_h ! x - 1; u16.16 ! x/2 - 0.5 ; u15.17
+ add x_h,r1 ! SH4-200 single-issues this insn
+ shld r0,yn
+ sub r1,yn ! yn := y0 ; u15.17
+ mov DBL1L,r1
+ mov #-20,r10
+ mul.l yn,x_h ! r12 dead
+ swap.w yn,r9
+ shld r10,r1
+ sts macl,r0 ! y0 * (x-1) - n ; u-1.32
+ add r9,r0 ! y0 * x - 1 ; s-1.32
+ tst r2,DBL0H
+ dmuls.l r0,yn
+ mov.w LOCAL(d13),r0
+ or r1,r8 ! x - 1; u0.32
+ add yn,yn ! yn = y0 ; u14.18
+ bt LOCAL(zero_denorm_arg0)
+
+LOCAL(denorm_arg0_done): ! This label must stay aligned.
+ sts mach,r1 ! d0 ; s14.18
+ sub r1,yn ! yn = y1 ; u14.18 ; <= 0x3fffc
+ mov DBL0L,r12
+ shld r0,yn ! yn = y1 ; u1.31 ; <= 0x7fff8000
+ mov.w LOCAL(d12),r9
+ dmulu.l yn,r8
+ shld r10,r12
+ mov yn,r0
+ mov DBL0H,r8
+ add yn,yn ! yn = y1 ; u0.32 ; <= 0xffff0000
+ sts mach,r1 ! y1 * (x-1); u1.31
+ add r0,r1 ! y1 * x ; u1.31
+ dmulu.l yn,r1
+ not DBL0H,r10
+ shld r9,r8
+ tst r2,r10
+ or r8,r12 ! a - 1; u0.32
+ bt LOCAL(inf_nan_arg0)
+ sts mach,r1 ! d1+yn; u1.31
+ sett ! adjust y2 so that it can be interpreted as s1.31
+ not DBL1H,r10
+ subc r1,yn ! yn := y2 ; u1.31 ; can be 0x7fffffff
+ mov.l LOCAL(x001fffff),r9
+ dmulu.l yn,r12
+ tst r2,r10
+ or DBL1H,r2
+ bt LOCAL(inf_nan_arg1)
+ mov.l r11,@-r15
+ sts mach,r11 ! y2*(a-1) ; u1.31
+ add yn,r11 ! z0 ; u1.31
+ dmulu.l r11,DBL1L
+ mov.l LOCAL(x40000000),DBLRH ! bias + 1
+ and r9,r2 ! x ; u12.20
+ cmp/hi DBL0L,DBL1L
+ sts macl,r8
+ mov #-24,r12
+ sts mach,r9 ! r9:r8 := z0 * DBL1L; u-19.64
+ subc DBL1H,DBLRH
+ mul.l r11,r2 ! (r9+macl):r8 == z0*x; u-19.64
+ shll r8
+ add DBL0H,DBLRH ! result sign/exponent + 1
+ mov r8,r10
+ sts macl,DBLRL
+ add DBLRL,r9
+ rotcl r9 ! r9:r8 := z*x; u-20.63
+ shld r12,r10
+ mov.l LOCAL(x7fe00000),DBLRL
+ sub DBL0L,r9 ! r9:r8 := -a ; u-20.63
+ mov.l LOCAL(x00200000),r12
+/* FIXME: the following shift might lose the sign. */
+ shll8 r9
+ or r10,r9 ! -a1 ; s-28.32
+ mov.l LOCAL(x00100000),r10
+ dmuls.l r9,yn ! r3 dead
+ mov DBL1H,r3
+ mov.l LOCAL(xfff00000),DBL0L
+ xor DBL0H,r3 ! calculate expected sign & bit20
+ div0s r3,DBLRH
+ xor DBLRH,r3
+ bt LOCAL(ret_denorm_inf)
+ tst DBLRL,DBLRH
+ bt LOCAL(ret_denorm)
+ sub r12,DBLRH ! calculate sign / exponent minus implicit 1
+ tst r10,r3 ! set T if a >= x
+ sts mach,r12 ! -z1 ; s-27.32
+ bt 0f
+ add r11,r11 ! z0 ; u1.31 / u0.31
+0: mov #6,r3
+ negc r3,r10 ! shift count := a >= x ? -7 : -6; T := 1
+ shll8 r8 ! r9:r8 := -a1 ; s-28.64
+ shad r10,r12 ! -z1 ; truncate to s-20.32 / s-21.32
+ rotcl r12 ! -z1 ; s-21.32 / s-22.32 / round to odd 0.5 ulp ; T := sign
+ add #20,r10
+ dmulu.l r12,DBL1L ! r12 signed, DBL1L unsigned
+ and DBL0L,DBLRH ! isolate sign / exponent
+ shld r10,r9
+ mov r8,r3
+ shld r10,r8
+ sts macl,DBL0L
+ sts mach,DBLRL
+ add #-32,r10
+ shld r10,r3
+ mul.l r12,r2
+ bf 0f ! adjustment for signed/unsigned multiply
+ sub DBL1L,DBLRL ! DBL1L dead
+0: shar r12 ! -z1 ; truncate to s-20.32 / s-21.32
+ sts macl,DBL1L
+ or r3,r9 ! r9:r8 := -a1 ; s-41.64/s-42.64
+ !
+ cmp/hi r8,DBL0L
+ add DBLRL,DBL1L ! DBL1L:DBL0L := -z1*x ; s-41.64/s-42.64
+ subc DBL1L,r9
+ not r12,DBLRL ! z1, truncated to s-20.32 / s-21.32
+ shll r9 ! T := a2 > 0
+ mov r11,r2
+ mov #21,r7
+ shld r7,r11
+ addc r11,DBLRL
+ mov.l @r15+,r11
+ mov.l @r15+,r10
+ mov #-11,r7
+ mov.l @r15+,r9
+ shld r7,r2
+ mov.l @r15+,r8
+ addc r2,DBLRH
+ rts
+ mov.l @r15+,r12
+
+LOCAL(ret_denorm):
+ tst r10,DBLRH
+ bra LOCAL(denorm_have_count)
+ movt DBLRH ! calculate shift count (off by 2)
+
+LOCAL(ret_denorm_inf):
+ mov DBLRH,r12
+ add r12,r12
+ cmp/pz r12
+ mov #-21,DBLRL
+ bt LOCAL(ret_inf_late)
+ shld DBLRL,DBLRH
+LOCAL(denorm_have_count):
+ add #-2,DBLRH
+/* FIXME */
+ bra LOCAL(return_0)
+ mov.l @r15+,r11
+
+LOCAL(ret_inf_late):
+ mov.l @r15+,r11
+ !
+ mov.l @r15+,r10
+ !
+ mov.l @r15+,r9
+ bra LOCAL(return_inf)
+ mov.l @r15+,r8
+
+ .balign 4
+LOCAL(clz):
+ mov.l r8,@-r15
+ extu.w r0,r8
+ mov.l r9,@-r15
+ cmp/eq r0,r8
+ bt/s 0f
+ mov #8-11,r9
+ xtrct r0,r8
+ add #16,r9
+0: tst r12,r8 ! 0xff00
+ mov.l LOCAL(c_clz_tab),r0
+ bt 0f
+ shlr8 r8
+0: bt 0f
+ add #8,r9
+0:
+#ifdef __PIC__
+ add r0,r8
+ mova LOCAL(c_clz_tab),r0
+#endif
+ mov.b @(r0,r8),r8
+ mov r9,r0
+ mov.l @r15+,r9
+ !
+ !
+ !
+ sub r8,r0
+ mov.l @r15+,r8
+ rts
+ lds.l @r15+,pr
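The clz helper above narrows the operand to its top nonzero byte with two width tests, then finishes with a single lookup in the shared table. The equivalent logic in Python, with the table recreated as bit-length values (matching libgcc's __clz_tab layout, as I understand it):

```python
# clz_tab[v] = number of significant bits in byte v (0 for v == 0),
# mirroring libgcc's __clz_tab.
clz_tab = [v.bit_length() for v in range(256)]

def clz32(x):
    """Leading-zero count of a nonzero 32-bit value: two range tests
    narrow x to its most significant nonzero byte, then one table
    lookup finishes the count."""
    n = 0
    if x >> 16:
        x >>= 16
        n += 16
    if x >> 8:
        x >>= 8
        n += 8
    return 32 - (n + clz_tab[x])
```

The assembly version folds the two tests and the adjustment constants together to save instructions, but the dataflow is the same.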
+
+! We encode even some words as pc-relative that would fit as immediate
+! in the instruction in order to avoid some pipeline stalls on
+! SH4-100 / SH4-200.
+LOCAL(d1): .word 1
+LOCAL(d12): .word 12
+LOCAL(d13): .word 13
+
+ .balign 4
+LOCAL(x7ff00000): .long 0x7ff00000
+LOCAL(xfffe2006): .long 0xfffe2006
+LOCAL(x001fffff): .long 0x001fffff
+LOCAL(x40000000): .long 0x40000000
+LOCAL(x7fe00000): .long 0x7fe00000
+LOCAL(x00100000): .long 0x00100000
+LOCAL(x00200000): .long 0x00200000
+LOCAL(xfff00000): .long 0xfff00000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+LOCAL(ytab):
+ .byte 120, 105, 91, 78, 66, 54, 43, 33
+ .byte 24, 15, 8, 0, -5, -12, -17, -22
+ .byte -27, -31, -34, -37, -40, -42, -44, -45
+ .byte -46, -46, -47, -46, -46, -45, -44, -42
+ .byte -41, -39, -36, -34, -31, -28, -24, -20
+ .byte -17, -12, -8, -4, 0, 5, 10, 16
+ .byte 21, 27, 33, 39, 45, 52, 58, 65
+ .byte 72, 79, 86, 93, 101, 109, 116, 124
+ENDFUNC(GLOBAL(divdf3))
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/m3/divdf3.S gcc-4.5.0/gcc/config/sh/IEEE-754/m3/divdf3.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/m3/divdf3.S 1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/m3/divdf3.S 2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,608 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+! divdf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke joern.rennecke@st.com
+
+/* y = 1/x ; x in [1,2)
+ y0 = 1.5 - x/2 - tab[(1-x)*64] = y + d ; abs(d)/y <= 0x1.0c/256
+
+ y1 = y0 - ((y0) * x - 1) * y0 = y-x*d^2
+ y2 = y1 - ((y1) * x - 1) * y1 =~= y-x^3*d^4
+
+ z0 = y2*a ; a1 = a - z0*x /# 32 * 64 -> 64 bit #/
+ z1 = y2*a1 (round to nearest odd 0.5 ulp);
+ a2 = a1 - z1*x /# 32 * 64 -> 64 bit #/
+
+ z = a/x = z0 + z1 - 0.5 ulp + (a2 > 0) * ulp
+
+ Unless stated otherwise, multiplies can be done in 32 * 32 bit or less
+ with suitable scaling and/or top truncation.
+ We use a slightly modified algorithm here that checks if the lower
+ bits in z1 are sufficient to determine the outcome of rounding - in that
+ case a2 is not computed.
+ -z1 is computed in units of 1/128 ulp, with an error in the range
+ -0x3.e/128 .. +0 ulp.
+ Thus, after adding three, the result can be safely rounded for normal
+ numbers if any of the bits 5..2 is set, or if the highest guard bit
+ (bit 6 if y <1, otherwise bit 7) is set.
+ (Because of the way truncation works, we would be fine for an open
+ error interval of (-4/128..+1/128) ulp )
+ For denormal numbers, the rounding point lies higher, but it would be
+ quite cumbersome to calculate where exactly; it is sufficient if any
+ of the bits 7..3 is set.
+ x truncated to 20 bits is sufficient to calculate y0 or even y1.
+ Table entries are adjusted by about +128 to use full signed byte range.
+ This adjustment has been perturbed slightly to allow cse with the
+ shift count constant -26.
+ The threshold point for the shift adjust before rounding is found by
+ comparing the fractions, which is exact, unlike the top bit of y2.
+ Therefore, the top bit of y2 becomes slightly random after the adjustment
+ shift, but that's OK because this can happen only at the boundaries of
+ the interval, and the biasing of the error means that it can in fact happen
+ only at the bottom end. And there, the carry propagation will make sure
+ that in the end we will have in effect an implicit 1 (or two when rounding
+ up...) */
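The "round to nearest odd 0.5 ulp" step for z1 mentioned above is the classic round-to-odd trick: when an intermediate is truncated with its LSB forced to 1 whenever any discarded bit was set, a later round-to-nearest of that intermediate gives the same answer as rounding the exact value directly, provided at least two extra bits are kept. A small exhaustive sanity check of that property in Python (illustrative; the routine's actual bit positions differ):

```python
def round_to_odd(v, drop):
    # Truncate `drop` bits, forcing the LSB to 1 if anything was lost.
    lost = v & ((1 << drop) - 1)
    return (v >> drop) | (1 if lost else 0)

def round_half_to_even(v, drop):
    # Round-to-nearest with ties to even, dropping `drop` bits.
    half = 1 << (drop - 1)
    low = v & ((1 << drop) - 1)
    q = v >> drop
    if low > half or (low == half and (q & 1)):
        q += 1
    return q

# Rounding to odd first (keeping 2 extra bits), then to nearest-even,
# matches a single direct rounding for every 12-bit input.
ok = all(
    round_half_to_even(round_to_odd(v, 3), 2) == round_half_to_even(v, 5)
    for v in range(1 << 12)
)
```

This is why z1 can be truncated early without risking a double-rounding error in the final result.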
+/* If an exact result exists, it can have no more bits than the dividend.
+ Hence, we don't need to bother with the round-to-even tie breaker
+ unless the result is denormalized. */
+/* 64 cycles through main path for sh4-300 (about 93.7% of normalized numbers),
+ 82 for the path for rounding tie-breaking for normalized numbers
+ (including one branch mispredict).
+ Some cycles might be saved by more careful register allocation. */
+
+#define x_h r12
+#define yn r3
+
+FUNC(GLOBAL(divdf3))
+ .global GLOBAL(divdf3)
+
+/* Adjust arg0 now, too. We still have to come back to denorm_arg1_done,
+ since we haven't done any of the work yet that we do till the denorm_arg0
+ entry point. We know that neither of the arguments is inf/nan, but
+ arg0 might be zero. Check for that first to avoid having to establish an
+ rts return address. */
+LOCAL(both_denorm):
+ mov.l r9,@-r15
+ mov DBL0H,r1
+ mov.l r0,@-r15
+ shll2 r1
+ mov.w LOCAL(both_denorm_cleanup_off),r9
+ or DBL0L,r1
+ tst r1,r1
+ mov DBL0H,r0
+ bf/s LOCAL(zero_denorm_arg0_1)
+ shll2 r0
+ mov.l @(4,r15),r9
+ add #8,r15
+ bra LOCAL(ret_inf_nan_0)
+ mov r1,DBLRH
+
+LOCAL(both_denorm_cleanup):
+ mov.l @r15+,r0
+ !
+ mov.l @r15+,r9
+ mov #12,r3
+ mov.l LOCAL(xfffe2006),r1 ! yn := (-1. << 17) + (0x80 << 6) ; shift #-26
+ bra LOCAL(denorm_arg1_done)
+ !
+ add r0,DBL0H
+
+/* Denorm handling leaves the incoming denorm argument with an exponent of +1
+ (implicit 1). To leave the result exponent unaltered, the other
+ argument's exponent is adjusted by the shift count. */
+
+ .balign 4
+LOCAL(arg0_tiny):
+ bsr LOCAL(clz)
+ mov DBL0L,r0
+ shll DBL0H
+ add #1,r0
+ mov DBL0L,DBL0H
+ shld r0,DBL0H
+ rotcr DBL0H
+ tst DBL0L,DBL0L /* Check for a zero dividend. */
+ add #-33,r0
+ shld r0,DBL0L
+ bf/s LOCAL(adjust_arg1_exp)
+ add #64,r0
+LOCAL(return_0): /* Return 0 with appropriate sign. */
+ mov.l @r15+,r10
+ mov #0,DBLRH
+ mov.l @r15+,r9
+ bra LOCAL(ret_inf_nan_0)
+ mov.l @r15+,r8
+
+ .balign 4
+LOCAL(arg1_tiny):
+ bsr LOCAL(clz)
+ mov DBL1L,r0
+ shll DBL1H
+ add #1,r0
+ mov DBL1L,DBL1H
+ shld r0,DBL1H
+ rotcr DBL1H
+ tst DBL1L,DBL1L /* Check for divide by zero. */
+ add #-33,r0
+ shld r0,DBL1L
+ bf/s LOCAL(adjust_arg0_exp)
+ add #64,r0
+ mov DBL0H,r0
+ add r0,r0
+ tst r0,r0 ! 0 / 0 ?
+ mov #-1,DBLRH
+ bf LOCAL(return_inf)
+ !
+ bt LOCAL(ret_inf_nan_0)
+ !
+
+ .balign 4
+LOCAL(zero_denorm_arg1):
+ not DBL0H,r3
+ mov DBL1H,r0
+ tst r2,r3
+ shll2 r0
+ bt LOCAL(early_inf_nan_arg0)
+ tst r0,r0
+ mov.w LOCAL(xff00),r12
+ bt/s LOCAL(arg1_tiny)
+ sts.l pr,@-r15
+ bsr LOCAL(clz)
+ shlr2 r0
+ !
+ shll DBL1H
+ mov DBL1L,r3
+ shld r0,DBL1H
+ shld r0,DBL1L
+ rotcr DBL1H
+ add #-32,r0
+ shld r0,r3
+ add #32,r0
+ or r3,DBL1H
+LOCAL(adjust_arg0_exp):
+ tst r2,DBL0H
+ mov #20,r3
+ shld r3,r0
+ bt LOCAL(both_denorm)
+ add DBL0H,r0
+ div0s r0,DBL0H ! Check for obvious overflow.
+ not r0,r3 ! Check for more subtle overflow - lest
+ bt LOCAL(return_inf)
+ mov r0,DBL0H
+ tst r2,r3 ! we mistake it for NaN later
+ mov #12,r3
+ bf LOCAL(denorm_arg1_done)
+LOCAL(return_inf): /* Return infinity with appropriate sign. */
+ mov #20,r3
+ mov #-2,DBLRH
+ bra LOCAL(ret_inf_nan_0)
+ shad r3,DBLRH
+
+/* inf/n -> inf; inf/0 -> inf; inf/inf -> nan; inf/nan -> nan; nan/x -> nan */
+LOCAL(inf_nan_arg0):
+ mov.l @r15+,r10
+ mov.l @r15+,r9
+ mov.l @r15+,r8
+LOCAL(early_inf_nan_arg0):
+ not DBL1H,r3
+ mov DBL0H,DBLRH
+ tst r2,r3 ! both inf/nan?
+ add DBLRH,DBLRH
+ bf LOCAL(ret_inf_nan_0)
+ mov #-1,DBLRH
+LOCAL(ret_inf_nan_0):
+ mov #0,DBLRL
+ mov.l @r15+,r12
+ div0s DBL0H,DBL1H
+ rts
+ rotcr DBLRH
+
+/* Already handled: inf/x, nan/x . Thus: x/inf -> 0; x/nan -> nan */
+ .balign 4
+LOCAL(inf_nan_arg1):
+ mov DBL1H,r2
+ mov #12,r1
+ shld r1,r2
+ mov.l @r15+,r10
+ mov #0,DBLRL
+ mov.l @r15+,r9
+ or DBL1L,r2
+ mov.l @r15+,r8
+ cmp/hi DBLRL,r2
+ mov.l @r15+,r12
+ subc DBLRH,DBLRH
+ div0s DBL0H,DBL1H
+ rts
+ rotcr DBLRH
+
+ .balign 4
+LOCAL(zero_denorm_arg0):
+ mov.w LOCAL(denorm_arg0_done_off),r9
+ not DBL1H,r1
+ mov DBL0H,r0
+ tst r2,r1
+ shll2 r0
+ bt LOCAL(inf_nan_arg1)
+LOCAL(zero_denorm_arg0_1):
+ tst r0,r0
+ mov.w LOCAL(xff00),r12
+ bt/s LOCAL(arg0_tiny)
+ sts.l pr,@-r15
+ bsr LOCAL(clz)
+ shlr2 r0
+ shll DBL0H
+ mov DBL0L,r12
+ shld r0,DBL0H
+ shld r0,DBL0L
+ rotcr DBL0H
+ add #-32,r0
+ shld r0,r12
+ add #32,r0
+ or r12,DBL0H
+LOCAL(adjust_arg1_exp):
+ mov #20,r12
+ shld r12,r0
+ add DBL1H,r0
+ div0s r0,DBL1H ! Check for obvious underflow.
+ not r0,r12 ! Check for more subtle underflow - lest
+ bt LOCAL(return_0)
+ mov r0,DBL1H
+ tst r2,r12 ! we mistake it for NaN later
+ bt LOCAL(return_0)
+ !
+ braf r9
+ mov #13,r0
+LOCAL(zero_denorm_arg1_dispatch):
+
+LOCAL(xff00): .word 0xff00
+LOCAL(denorm_arg0_done_off):
+ .word LOCAL(denorm_arg0_done)-LOCAL(zero_denorm_arg1_dispatch)
+LOCAL(both_denorm_cleanup_off):
+ .word LOCAL(both_denorm_cleanup)-LOCAL(zero_denorm_arg1_dispatch)
+
+ .balign 8
+GLOBAL(divdf3):
+ mov.l LOCAL(x7ff00000),r2
+ mov #12,r3
+ mov.l LOCAL(xfffe2006),r1 ! yn := (-1. << 17) + (0x80 << 6) ; shift #-26
+ tst r2,DBL1H
+ mov.l r12,@-r15
+ bt LOCAL(zero_denorm_arg1)
+
+LOCAL(denorm_arg1_done):
+ mov DBL1H,x_h ! x_h live in r12
+ shld r3,x_h ! x - 1 ; u0.20
+ mov x_h,yn
+ mova LOCAL(ytab),r0
+ mov.l r8,@-r15
+ shld r1,yn ! x-1 ; u26.6
+ mov.b @(r0,yn),yn
+ mov #6,r0
+ mov.l r9,@-r15
+ mov x_h,r8
+ mov.l r10,@-r15
+ shlr16 x_h ! x - 1; u16.16 ! x/2 - 0.5 ; u15.17
+ add x_h,r1 ! SH4-200 single-issues this insn
+ shld r0,yn
+ sub r1,yn ! yn := y0 ; u15.17
+ mov DBL1L,r1
+ mov #-20,r10
+ mul.l yn,x_h ! r12 dead
+ swap.w yn,r9
+ shld r10,r1
+ sts macl,r0 ! y0 * (x-1) - n ; u-1.32
+ add r9,r0 ! y0 * x - 1 ; s-1.32
+ tst r2,DBL0H
+ dmuls.l r0,yn
+ mov.w LOCAL(d13),r0
+ or r1,r8 ! x - 1; u0.32
+ add yn,yn ! yn = y0 ; u14.18
+ bt LOCAL(zero_denorm_arg0)
+
+LOCAL(denorm_arg0_done):
+ sts mach,r1 ! d0 ; s14.18
+ sub r1,yn ! yn = y1 ; u14.18 ; <= 0x3fffc
+ mov DBL0L,r12
+ shld r0,yn ! yn = y1 ; u1.31 ; <= 0x7fff8000
+ mov.w LOCAL(d12),r9
+ dmulu.l yn,r8
+ shld r10,r12
+ mov yn,r0
+ mov DBL0H,r8
+ add yn,yn ! yn = y1 ; u0.32 ; <= 0xffff0000
+ sts mach,r1 ! y1 * (x-1); u1.31
+ add r0,r1 ! y1 * x ; u1.31
+ dmulu.l yn,r1
+ not DBL0H,r10
+ shld r9,r8
+ tst r2,r10
+ or r8,r12 ! a - 1; u0.32
+ bt LOCAL(inf_nan_arg0)
+ sts mach,r1 ! d1+yn; u1.31
+ sett ! adjust y2 so that it can be interpreted as s1.31
+ not DBL1H,r10
+ subc r1,yn ! yn := y2 ; u1.31 ; can be 0x7fffffff
+ mov.l LOCAL(x001fffff),r9
+ dmulu.l yn,r12
+ tst r2,r10
+ or DBL1H,r2
+ bt LOCAL(inf_nan_arg1)
+ mov.l r11,@-r15
+ sts mach,r12 ! y2*(a-1) ; u1.31
+ add yn,r12 ! z0 ; u1.31
+ dmulu.l r12,DBL1L
+ mov.l LOCAL(x40000000),DBLRH ! bias + 1
+ and r9,r2 ! x ; u12.20
+ cmp/hi DBL0L,DBL1L
+ sts macl,r8
+ mov #-24,r11
+ sts mach,r9 ! r9:r8 := z0 * DBL1L; u-19.64
+ subc DBL1H,DBLRH
+ mul.l r12,r2 ! (r9+macl):r8 == z0*x; u-19.64
+ shll r8
+ add DBL0H,DBLRH ! result sign/exponent + 1
+ mov r8,r10
+ sts macl,DBLRL
+ add DBLRL,r9
+ rotcl r9 ! r9:r8 := z*x; u-20.63
+ shld r11,r10
+ mov.l LOCAL(x7fe00000),DBLRL
+ sub DBL0L,r9 ! r9:r8 := -a ; u-20.63
+ cmp/pz r9 ! In corner cases this shift can lose ..
+ shll8 r9 ! .. the sign, so check it first.
+ mov.l LOCAL(x00200000),r11
+ or r10,r9 ! -a1 ; s-28.32
+ mov.l LOCAL(x00100000),r10
+ dmulu.l r9,yn ! sign for r9 is in T
+ xor DBL0H,DBL1H ! calculate expected sign & bit20
+ mov.w LOCAL(d120),DBL0H ! to test bits 6..4
+ xor DBLRH,DBL1H
+ !
+ sts mach,DBL0L ! -z1 ; s-27.32
+ bt 0f
+ sub yn,DBL0L ! multiply adjust for -a1 negative; r3 dies here
+0:tst r10,DBL1H ! set T if a >= x
+ mov.l LOCAL(xfff00000),r3
+ bt 0f
+ add DBL0L,DBL0L ! z1 ; s-27.32 / s-28.32
+0:bt 0f
+ add r12,r12 ! z0 ; u1.31 / u0.31
+0:add #6-64,DBL0L
+ and r3,DBLRH ! isolate sign / exponent
+ tst DBL0H,DBL0L
+ bf/s LOCAL(exact) ! make the hot path taken for best branch prediction
+ cmp/pz DBL1H
+
+! Unless we follow the next branch, we need to test which way the rounding
+! should go.
+! For normal numbers, we know that the result is not exact, so the sign
+! of the rest will be conclusive.
+! We generate a number that looks safely rounded so that denorm handling
+! can safely test the number twice.
+! r10:r8 == 0 will indicate if the number was exact, which can happen
+! when we come here for denormals to check a number that is close or
+! equal to a result in whole ulps.
+ bf LOCAL(ret_denorm_inf) ! denorm or infinity, DBLRH has inverted sign
+ add #64,DBL0L
+LOCAL(find_adjust): tst r10,DBL1H ! set T if a >= x
+ mov #-2,r10
+ addc r10,r10
+ mov DBL0L,DBLRL ! z1 ; s-27.32 / s-28.32 ; lower 4 bits unsafe.
+ shad r10,DBLRL ! tentatively rounded z1 ; s-24.32
+ shll8 r8 ! r9:r8 := -a1 ; s-28.64
+ clrt
+ dmuls.l DBLRL,DBL1L ! DBLRL signed, DBL1L unsigned
+ mov r8,r10
+ shll16 r8 ! r8 := lowpart of -a1 ; s-44.48
+ xtrct r9,r10 ! r10 := highpart of -a1 ; s-44.48
+ !
+ sts macl,r3
+ subc r3,r8
+ sts mach,r3
+ subc r3,r10
+ cmp/pz DBL1L
+ mul.l DBLRL,r2
+ bt 0f
+ sub DBLRL,r10 ! adjust for signed/unsigned multiply
+0: mov.l LOCAL(x7fe00000),DBLRL
+ mov #-26,r2
+ sts macl,r9
+ sub r9,r10 ! r10:r8 := -a2
+ add #-64+16,DBL0L ! the denorm code negates this adj. for exact results
+ shld r2,r10 ! convert sign into adjustment in the range 32..63
+ sub r10,DBL0L
+ cmp/pz DBL1H
+
+ .balign 4
+LOCAL(exact):
+ bf LOCAL(ret_denorm_inf) ! denorm or infinity, DBLRH has inverted sign
+ tst DBLRL,DBLRH
+ bt LOCAL(ret_denorm_inf) ! denorm, DBLRH has correct sign
+ mov #-7,DBL1H
+ cmp/pz DBL0L ! T is sign extension of z1
+ not DBL0L,DBLRL
+ subc r11,DBLRH ! calculate sign / exponent minus implicit 1 minus T
+ mov.l @r15+,r11
+ mov.l @r15+,r10
+ shad DBL1H,DBLRL
+ mov.l @r15+,r9
+ mov #-11,DBL1H
+ mov r12,r8 ! z0 contributes to DBLRH and DBLRL
+ shld DBL1H,r12
+ mov #21,DBL1H
+ clrt
+ shld DBL1H,r8
+ addc r8,DBLRL
+ mov.l @r15+,r8
+ addc r12,DBLRH
+ rts
+ mov.l @r15+,r12
+
+! sign in DBLRH ^ DBL1H
+! If the last 7 bits are in the range 64..64+7, we might have an exact
+! value in the preceding bits - or we might not. For denorms, we need to
+! find out.
+! if r10:r8 is zero, we just have found out that there is an exact value.
+ .balign 4
+LOCAL(ret_denorm_inf):
+ mov DBLRH,r3
+ add r3,r3
+ div0s DBL1H,r3
+ mov #120,DBLRL
+ bt LOCAL(ret_inf_late)
+ add #64,DBL0L
+ tst DBLRL,DBL0L
+ mov #-21,DBLRL
+ bt LOCAL(find_adjust)
+ or r10,r8
+ tst r8,r8 ! check if find_adjust found an exact value.
+ shad DBLRL,r3
+ bf 0f
+ add #-16,DBL0L ! if yes, cancel adjustment
+0: mov #-8,DBLRL ! remove the three lowest (inexact) bits
+ and DBLRL,DBL0L
+ add #-2-11,r3 ! shift count for denorm generation
+ mov DBL0L,DBLRL
+ mov #28,r2
+ mov.l @r15+,r11
+ mov.l @r15+,r10
+ shll2 DBLRL
+ mov.l @r15+,r9
+ shld r2,DBL0L
+ mov.l @r15+,r8
+ mov #-31,r2
+ cmp/ge r2,r3
+ shll2 DBLRL
+ bt/s 0f
+ add DBL0L,r12 ! fraction in r12:DBLRL ; u1.63
+ negc DBLRL,DBLRL ! T := DBLRL != 0
+ add #31,r3
+ mov r12,DBLRL
+ rotcl DBLRL ! put in sticky bit
+ movt r12
+ cmp/ge r2,r3
+ bt/s LOCAL(return_0_late)
+0: div0s DBL1H,DBLRH ! calculate sign
+ mov r12,DBLRH
+ shld r3,DBLRH
+ mov DBLRL,r2
+ shld r3,DBLRL
+ add #32,r3
+ add DBLRH,DBLRH
+ mov.l LOCAL(x80000000),DBL1H
+ shld r3,r12
+ rotcr DBLRH ! combine sign with highpart
+ add #-1,r3
+ shld r3,r2
+ mov #0,r3
+ rotl r2
+ cmp/hi DBL1H,r2
+ addc r12,DBLRL
+ mov.l @r15+,r12
+ rts
+ addc r3,DBLRH
+
+LOCAL(ret_inf_late):
+ mov.l @r15+,r11
+ mov.l @r15+,r10
+ mov DBLRH,DBL0H
+ mov.l @r15+,r9
+ bra LOCAL(return_inf)
+ mov.l @r15+,r8
+
+LOCAL(return_0_late):
+ div0s DBLRH,DBL1H
+ mov.l @r15+,r12
+ mov #0,DBLRH
+ rts
+ rotcr DBLRH
+
+ .balign 4
+LOCAL(clz):
+ mov.l r8,@-r15
+ extu.w r0,r8
+ mov.l r9,@-r15
+ cmp/eq r0,r8
+ bt/s 0f
+ mov #21,r9
+ xtrct r0,r8
+ add #-16,r9
+0: tst r12,r8 ! 0xff00
+ mov.l LOCAL(c_clz_tab),r0
+ bt 0f
+ shlr8 r8
+0: bt 0f
+ add #-8,r9
+0:
+#ifdef __PIC__
+ add r0,r8
+ mova LOCAL(c_clz_tab),r0
+#endif
+ mov.b @(r0,r8),r8
+ mov r9,r0
+ mov.l @r15+,r9
+ !
+ !
+ !
+ sub r8,r0
+ mov.l @r15+,r8
+ rts
+ lds.l @r15+,pr
+
+! We encode some words as pc-relative even though they would fit as an
+! immediate in the instruction, in order to avoid some pipeline stalls on
+! SH4-100 / SH4-200.
+LOCAL(d1): .word 1
+LOCAL(d12): .word 12
+LOCAL(d13): .word 13
+LOCAL(d120): .word 120
+
+ .balign 4
+LOCAL(x7ff00000): .long 0x7ff00000
+LOCAL(xfffe2006): .long 0xfffe2006
+LOCAL(x001fffff): .long 0x001fffff
+LOCAL(x40000000): .long 0x40000000
+LOCAL(x7fe00000): .long 0x7fe00000
+LOCAL(x00100000): .long 0x00100000
+LOCAL(x00200000): .long 0x00200000
+LOCAL(xfff00000): .long 0xfff00000
+LOCAL(x80000000): .long 0x80000000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+LOCAL(ytab):
+ .byte 120, 105, 91, 78, 66, 54, 43, 33
+ .byte 24, 15, 8, 0, -5, -12, -17, -22
+ .byte -27, -31, -34, -37, -40, -42, -44, -45
+ .byte -46, -46, -47, -46, -46, -45, -44, -42
+ .byte -41, -39, -36, -34, -31, -28, -24, -20
+ .byte -17, -12, -8, -4, 0, 5, 10, 16
+ .byte 21, 27, 33, 39, 45, 52, 58, 65
+ .byte 72, 79, 86, 93, 101, 109, 116, 124
+ENDFUNC(GLOBAL(divdf3))
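The LOCAL(clz) helper above narrows its operand to the top nonzero byte with a couple of tests and lets the shared 256-entry clz_tab resolve the final eight bits. A rough C model of the technique follows; it is a sketch, not the scheduled assembly, and it assumes the table holds the bit length of the index (0 for index 0), matching libgcc's __clz_tab.

```c
#include <stdint.h>

/* Assumed table layout: clz_tab[i] = bit length of i (0 for i == 0),
   matching libgcc's __clz_tab. */
static unsigned char clz_tab[256];

static void init_clz_tab(void)
{
    for (int i = 1; i < 256; i++) {
        int len = 0;
        for (int j = i; j; j >>= 1)
            len++;
        clz_tab[i] = (unsigned char)len;
    }
}

/* Narrow x to its top nonzero byte with two tests, then let the table
   resolve the final eight bits -- the same shape as LOCAL(clz). */
static int clz32(uint32_t x)
{
    int n = 32;
    if (x >> 16) { x >>= 16; n -= 16; }
    if (x >> 8)  { x >>= 8;  n -= 8;  }
    return n - clz_tab[x];
}
```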
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/m3/divsf3.S gcc-4.5.0/gcc/config/sh/IEEE-754/m3/divsf3.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/m3/divsf3.S 1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/m3/divsf3.S 2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,365 @@
+/* Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+! divsf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+
+! long 0th..3rd significant byte
+#ifdef __LITTLE_ENDIAN__
+#define L0SB 3
+#define L1SB 2
+#define L2SB 1
+#define L3SB 0
+#else
+#define L0SB 0
+#define L1SB 1
+#define L2SB 2
+#define L3SB 3
+#endif
+
+! clobbered: r0,r1,r2,r3,r6,r7,T (and for sh.md's purposes PR)
+!
+! Note: When the divisor is larger than the dividend, we have to adjust the
+! exponent down by one. We do this automatically when subtracting the entire
+! exponent/fraction bitstring as an integer, by means of the borrow from
+! bit 23 to bit 24.
+! Note: non-denormal rounding of a division result cannot cause fraction
+! overflow / exponent change. (r4 > r5 : fraction must stay in (2..1] interval;
+! r4 < r5: having an extra bit of precision available, even the smallest
+! possible difference of the result from one is rounded in all rounding modes
+! to a fraction smaller than one.)
+! sh4-200: 59 cycles
+! sh4-300: 44 cycles
+! tab indent: exponent / sign computations
+! tab+space indent: fraction computation
+FUNC(GLOBAL(divsf3))
+ .global GLOBAL(divsf3)
+ .balign 4
+GLOBAL(divsf3):
+ mov.l LOCAL(x7f800000),r3
+ mov #1,r2
+ mov r4,r6
+ shll8 r6
+ mov r5,r7
+ shll8 r7
+ rotr r2
+ tst r3,r4
+ or r2,r6
+ bt/s LOCAL(denorm_arg0)
+ or r2,r7
+ tst r3,r5
+ bt LOCAL(denorm_arg1)
+ shlr r6
+ mov.l LOCAL(x3f000000),r3 ! bias minus explicit leading 1
+ div0u
+LOCAL(denorm_done):
+ div1 r7,r6
+ mov.l r8,@-r15
+ bt 0f
+ div1 r7,r6
+0: mov.l r9,@-r15
+ div1 r7,r6
+ add r4,r3
+ div1 r7,r6
+ sub r5,r3 ! result sign/exponent minus 1 if no overflow/underflow
+ div1 r7,r6
+ or r3,r2
+ div1 r7,r6
+ mov.w LOCAL(xff00),r9
+ div1 r7,r6
+ mov.l r2,@-r15 ! L0SB is 0xff iff denorm / infinity exp is computed
+ div1 r7,r6
+ mov.w LOCAL(m23),r2
+ div1 r7,r6
+ mov r4,r0
+ div1 r7,r6
+ extu.b r6,r1
+ and r9,r6
+ swap.w r1,r1 ! first 8 bits of result fraction in bit 23..16
+ div1 r7,r6
+ shld r2,r0
+ div1 r7,r6
+ mov.b r0,@(L3SB,r15) ! 0xff iff dividend was infinity / nan
+ div1 r7,r6
+ mov r5,r0
+ div1 r7,r6
+ shld r2,r0
+ div1 r7,r6
+ mov.b r0,@(L2SB,r15) ! 0xff iff divisor was infinity / nan
+ div1 r7,r6
+ mov r4,r0
+ div1 r7,r6
+ mov.w LOCAL(m31),r2
+ div1 r7,r6
+ extu.b r6,r8 ! second 8 bits of result fraction in bit 7..0
+ and r9,r6
+ mov.l LOCAL(xff800000),r9
+ div1 r7,r6
+ xor r5,r0 ! msb := correct result sign
+ div1 r7,r6
+ xor r3,r0 ! xor with sign of result sign/exponent word
+ div1 r7,r6
+ shad r2,r0
+ div1 r7,r6
+ mov.b r0,@(L1SB,r15) ! 0xff iff exponent over/underflows
+ and r9,r3 ! isolate sign / exponent
+ mov.w LOCAL(xff01),r2
+ div1 r7,r6
+ swap.b r8,r0 ! second 8 bits of result fraction in bit 15..8
+ div1 r7,r6
+ or r1,r0 ! first 16 bits of result fraction in bit 23..8
+ div1 r7,r6
+ mov.w LOCAL(m1),r9
+ div1 r7,r6
+ mov.l @r15+,r8 ! load encoding of unusual exponent conditions
+ and r6,r2 ! rest | result lsb
+ mov #0,r1
+ bf 0f ! bit below lsb clear -> no rounding
+ cmp/hi r1,r2
+0: extu.b r6,r1
+ or r1,r0 ! 24 bit result fraction with explicit leading 1
+ addc r3,r0 ! add in exponent / sign
+ cmp/str r9,r8
+ ! (no stall *here* for SH4-100 / SH4-200)
+ bt/s LOCAL(inf_nan_denorm_zero)
+ mov.l @r15+,r9
+ rts
+ mov.l @r15+,r8
+
+/* The exponent adjustment for denormal numbers is done by leaving an
+ adjusted value in r3; r4/r5 are not changed. */
+ .balign 4
+LOCAL(denorm_arg0):
+ mov.w LOCAL(xff00),r1
+ sub r2,r6 ! 0x80000000 : remove implicit 1
+ tst r6,r6
+ sts.l pr,@-r15
+ bt LOCAL(div_zero)
+ bsr LOCAL(clz)
+ mov r6,r0
+ shld r0,r6
+ tst r3,r5
+ mov.l LOCAL(x3f800000),r3 ! bias - 1 + 1
+ mov #23,r1
+ shld r1,r0
+ bt/s LOCAL(denorm_arg1_2)
+ sub r0,r3
+ shlr r6
+ bra LOCAL(denorm_done)
+ div0u
+
+LOCAL(denorm_arg1):
+ mov.l LOCAL(x3f000000),r3 ! bias - 1
+LOCAL(denorm_arg1_2):
+ sub r2,r7 ! 0x80000000 : remove implicit 1
+ mov.w LOCAL(xff00),r1
+ tst r7,r7
+ sts.l pr,@-r15
+ bt LOCAL(div_by_zero)
+ bsr LOCAL(clz)
+ mov r7,r0
+ shld r0,r7
+ add #-1,r0
+ mov #23,r1
+ shld r1,r0
+ add r0,r3
+ shlr r6
+ bra LOCAL(denorm_done)
+ div0u
+
+ .balign 4
+LOCAL(inf_nan_denorm_zero):
+! r0 has the rounded result, r6 has the non-rounded lowest bits & rest.
+! the bit just below the LSB of r6 is available as ~Q
+
+! Alternative way to get at ~Q:
+! if rounding took place, ~Q must be set.
+! if the rest appears to be zero, ~Q must be set.
+! if the rest appears to be nonzero, but rounding didn't take place,
+! ~Q must be clear; the apparent rest will then require adjusting to test if
+! the actual rest is nonzero.
+ mov r0,r2
+ not r8,r0
+ tst #0xff,r0
+ shlr8 r0
+ mov.l @r15+,r8
+ bt/s LOCAL(div_inf_or_nan)
+ tst #0xff,r0
+ mov r4,r0
+ bt LOCAL(div_by_inf_or_nan)
+ add r0,r0
+ mov r5,r1
+ add r1,r1
+ cmp/hi r1,r0
+ mov r6,r0
+ bt LOCAL(overflow)
+ sub r2,r0
+ exts.b r0,r0 ! -1 if rounding took place
+ shlr8 r6 ! isolate div1-mangled rest
+ addc r2,r0 ! generate carry if rounding took place
+ shlr8 r7
+ sub r3,r0 ! pre-rounding fraction
+ bt 0f ! going directly to denorm_sticky would cause mispredicts
+ tst r6,r6 ! rest can only be zero if lost bit was set
+0: add r7,r6 ! (T ? corrupt : reconstruct) actual rest
+ bt 0f
+ cmp/pl r6
+0: mov.w LOCAL(m24),r1
+ addc r0,r0 ! put in sticky bit
+ add #-1,r3
+ mov.l LOCAL(x40000000),r6
+ add r3,r3
+ mov r0,r2
+ shad r1,r3 ! exponent ; s32.0
+ !
+ shld r3,r0
+ add #30,r3
+ cmp/pl r3
+ shld r3,r2
+ bf LOCAL(zero_nan) ! return zero
+ rotl r2
+ cmp/hi r6,r2
+ mov #0,r7
+ addc r7,r0
+ div0s r4,r5
+ rts
+ rotcr r0
+
+! ????
+! undo normal rounding (lowest bits still in r6). then do denormal rounding.
+
+LOCAL(overflow):
+ mov.l LOCAL(xff000000),r0
+ div0s r4,r5
+ rts
+ rotcl r0
+
+LOCAL(div_inf_or_nan):
+ mov r4,r0
+ bra LOCAL(nan_if_t)
+ add r0,r0
+
+LOCAL(div_by_inf_or_nan):
+ mov.l LOCAL(xff000000),r1
+ mov #0,r0
+ mov r5,r2
+ add r2,r2
+ bra LOCAL(nan_if_t)
+ cmp/hi r1,r2
+
+! still need to check for divide by zero or divide by nan
+! r3: 0x7f800000
+ .balign 4
+LOCAL(div_zero):
+ mov r5,r1
+ add r1,r1
+ tst r1,r1 ! 0 / 0 -> nan
+ not r5,r1
+ bt LOCAL(nan)
+ add r3,r3
+ cmp/hi r3,r1 ! 0 / nan -> nan (but 0 / inf -> 0)
+LOCAL(zero_nan):
+ mov #0,r0
+LOCAL(nan_if_t):
+ bf 0f
+LOCAL(nan):
+ mov #-1,r0
+0: div0s r4,r5 ! compute sign
+ rts
+ rotcr r0 ! insert sign
+
+LOCAL(div_by_zero):
+ mov.l LOCAL(xff000000),r0
+ mov r5,r2
+ add r2,r2
+ bra LOCAL(nan_if_t)
+ cmp/hi r0,r2
+
+ .balign 4
+LOCAL(clz):
+ mov.l r8,@-r15
+ extu.w r0,r8
+ mov.l r9,@-r15
+ cmp/eq r0,r8
+ bt/s 0f
+ mov #8-8,r9
+ xtrct r0,r8
+ add #16,r9
+0: tst r1,r8 ! 0xff00
+ mov.l LOCAL(c_clz_tab),r0
+ bt 0f
+ shlr8 r8
+0: bt 0f
+ add #8,r9
+0:
+#ifdef __PIC__
+ add r0,r8
+ mova LOCAL(c_clz_tab),r0
+#endif
+ mov.b @(r0,r8),r8
+ mov r9,r0
+ mov.l @r15+,r9
+ !
+ !
+ !
+ sub r8,r0
+ mov.l @r15+,r8
+ rts
+ lds.l @r15+,pr
+
+! We encode some words as pc-relative even though they would fit as an
+! immediate in the instruction, in order to avoid some pipeline stalls on
+! SH4-100 / SH4-200.
+LOCAL(m23): .word -23
+LOCAL(m24): .word -24
+LOCAL(m31): .word -31
+LOCAL(xff01): .word 0xff01
+ .balign 4
+LOCAL(xff000000):
+#ifdef __LITTLE_ENDIAN__
+LOCAL(xff00): .word 0xff00
+LOCAL(m1): .word -1
+#else
+LOCAL(m1): .word -1
+LOCAL(xff00): .word 0xff00
+#endif
+LOCAL(x7f800000): .long 0x7f800000
+LOCAL(x3f000000): .long 0x3f000000
+LOCAL(x3f800000): .long 0x3f800000
+LOCAL(xff800000): .long 0xff800000
+LOCAL(x40000000): .long 0x40000000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(divsf3))
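The exponent trick described in the divsf3 header comment, where subtracting the whole sign/exponent/fraction words lets the borrow out of bit 23 decrement the exponent exactly when the divisor's fraction exceeds the dividend's, can be sketched in C. This is a functional model under stated assumptions (binary32 bit patterns of positive normal numbers; sign handling is omitted since the routine fixes the sign separately with div0s):

```c
#include <stdint.h>

/* Sketch: a and b are assumed to be IEEE-754 binary32 bit patterns of
   positive normal numbers.  Subtracting the whole words lets the borrow
   out of bit 23 decrement the exponent exactly when b's fraction is
   larger than a's.  The result is the quotient's sign/exponent word
   minus one; the explicit leading 1 of the computed fraction adds that
   one back later ("result sign/exponent minus 1" in the assembly). */
static uint32_t quotient_exp_minus_1(uint32_t a, uint32_t b)
{
    const uint32_t bias_m1 = 0x3f000000u;  /* LOCAL(x3f000000): bias - 1 */
    return a - b + bias_m1;
}
```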
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/m3/fixdfsi.S gcc-4.5.0/gcc/config/sh/IEEE-754/m3/fixdfsi.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/m3/fixdfsi.S 1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/m3/fixdfsi.S 2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,115 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+!! fixdfsi for Renesas SH / STMicroelectronics ST40 CPUs
+!! Contributed by Joern Rennecke joern.rennecke@st.com
+
+#ifdef L_fixdfsi
+ ! What is a bit unusual about this implementation is that the
+ ! sign bit influences the result for NANs: for cleared sign bit, you
+ ! get INT_MAX, for set sign bit, you get INT_MIN.
+ ! However, since the result for NANs is undefined, this should be no
+ ! problem.
+ ! N.B. This is scheduled both for SH4-200 and SH4-300
+ .balign 4
+ .global GLOBAL(fixdfsi)
+ FUNC(GLOBAL(fixdfsi))
+ .balign 4
+GLOBAL(fixdfsi):
+ mov.w LOCAL(x413),r1
+ mov DBL0H,r0
+ shll DBL0H
+ mov.l LOCAL(mask),r3
+ mov #-21,r2
+ shld r2,DBL0H ! SH4-200 will start this insn in a new cycle
+ bt/s LOCAL(neg)
+ sub r1,DBL0H
+ cmp/pl DBL0H ! SH4-200 will start this insn in a new cycle
+ and r3,r0
+ bf/s LOCAL(ignore_low)
+ addc r3,r0 ! uses T == 1; sets implicit 1
+ mov #10,r2
+ shld DBL0H,r0 ! SH4-200 will start this insn in a new cycle
+ cmp/gt r2,DBL0H
+ add #-32,DBL0H
+ bt LOCAL(retmax)
+ shld DBL0H,DBL0L
+ rts
+ or DBL0L,r0
+
+ .balign 8
+LOCAL(ignore_low):
+ mov #-21,r2
+ cmp/gt DBL0H,r2 ! SH4-200 will start this insn in a new cycle
+ bf 0f ! SH4-200 will start this insn in a new cycle
+ mov #-31,DBL0H ! results in 0 return
+0: add #1,r0
+ rts
+ shld DBL0H,r0
+
+ .balign 4
+LOCAL(neg):
+ cmp/pl DBL0H
+ and r3,r0
+ bf/s LOCAL(ignore_low_neg)
+ addc r3,r0 ! uses T == 1; sets implicit 1
+ mov #10,r2
+ shld DBL0H,r0 ! SH4-200 will start this insn in a new cycle
+ cmp/gt r2,DBL0H
+ add #-32,DBL0H
+ bt LOCAL(retmin)
+ shld DBL0H,DBL0L
+ or DBL0L,r0 ! SH4-200 will start this insn in a new cycle
+ rts
+ neg r0,r0
+
+ .balign 4
+LOCAL(ignore_low_neg):
+ mov #-21,r2
+ cmp/gt DBL0H,r2 ! SH4-200 will start this insn in a new cycle
+ add #1,r0
+ shld DBL0H,r0
+ bf 0f
+ mov #0,r0 ! results in 0 return
+0: rts
+ neg r0,r0
+
+LOCAL(retmax):
+ mov #-1,r0
+ rts
+ shlr r0
+
+LOCAL(retmin):
+ mov #1,r0
+ rts
+ rotr r0
+
+LOCAL(x413): .word 0x413
+
+ .balign 4
+LOCAL(mask): .long 0x000fffff
+ ENDFUNC(GLOBAL(fixdfsi))
+#endif /* L_fixdfsi */
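The strategy of fixdfsi can be restated in portable C: extract the biased exponent (0x413 in the assembly is bias 0x3ff plus 20, the width of the high word's fraction field), set the implicit 1, shift the 53-bit fraction into integer position, and saturate out-of-range inputs. The sketch below is a functional model under an assumed binary64 layout, not the scheduled assembly; its NaN behavior also differs from the sign-dependent extremes the routine's comment describes.

```c
#include <stdint.h>
#include <string.h>

/* Functional sketch of double -> int32 truncation (assumed binary64
   layout).  Out-of-range magnitudes saturate, |d| < 1 truncates to 0. */
static int32_t fix_dfsi(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    int32_t sign = (int64_t)bits < 0 ? -1 : 1;
    int exp = (int)((bits >> 52) & 0x7ff) - 0x3ff;   /* unbiased exponent */
    if (exp < 0)
        return 0;                                    /* |d| < 1 */
    if (exp > 30)
        return sign < 0 ? INT32_MIN : INT32_MAX;     /* saturate */
    uint64_t frac = (bits & 0x000fffffffffffffULL) | (1ULL << 52);
    return sign * (int32_t)(frac >> (52 - exp));
}
```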
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/m3/fixunsdfsi.S gcc-4.5.0/gcc/config/sh/IEEE-754/m3/fixunsdfsi.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/m3/fixunsdfsi.S 1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/m3/fixunsdfsi.S 2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,82 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+!! fixunsdfsi for Renesas SH / STMicroelectronics ST40 CPUs
+!! Contributed by Joern Rennecke joern.rennecke@st.com
+
+#ifdef L_fixunsdfsi
+ ! What is a bit unusual about this implementation is that the
+ ! sign bit influences the result for NANs: for cleared sign bit, you
+ ! get UINT_MAX, for set sign bit, you get 0.
+ ! However, since the result for NANs is undefined, this should be no
+ ! problem.
+ ! N.B. This is scheduled both for SH4-200 and SH4-300
+ .balign 4
+ .global GLOBAL(fixunsdfsi)
+ FUNC(GLOBAL(fixunsdfsi))
+ .balign 4
+GLOBAL(fixunsdfsi):
+ mov.w LOCAL(x413),r1 ! bias + 20
+ mov DBL0H,r0
+ shll DBL0H
+ mov.l LOCAL(mask),r3
+ mov #-21,r2
+ shld r2,DBL0H ! SH4-200 will start this insn in a new cycle
+ bt/s LOCAL(ret0)
+ sub r1,DBL0H
+ cmp/pl DBL0H ! SH4-200 will start this insn in a new cycle
+ and r3,r0
+ bf/s LOCAL(ignore_low)
+ addc r3,r0 ! uses T == 1; sets implicit 1
+ mov #11,r2
+ shld DBL0H,r0 ! SH4-200 will start this insn in a new cycle
+ cmp/gt r2,DBL0H
+ add #-32,DBL0H
+ bt LOCAL(retmax)
+ shld DBL0H,DBL0L
+ rts
+ or DBL0L,r0
+
+ .balign 8
+LOCAL(ignore_low):
+ mov #-21,r2
+ cmp/gt DBL0H,r2 ! SH4-200 will start this insn in a new cycle
+ add #1,r0
+ bf 0f
+LOCAL(ret0): mov #0,r0 ! results in 0 return
+0: rts
+ shld DBL0H,r0
+
+LOCAL(retmax):
+ rts
+ mov #-1,r0
+
+LOCAL(x413): .word 0x413
+
+ .balign 4
+LOCAL(mask): .long 0x000fffff
+ ENDFUNC(GLOBAL(fixunsdfsi))
+#endif /* L_fixunsdfsi */
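The unsigned variant follows the same pattern, with a wider in-range exponent span (up to but excluding 2^32) and 0 for negative inputs, mirroring ret0 and retmax above. Again a hedged functional model under an assumed binary64 layout, not the scheduled assembly:

```c
#include <stdint.h>
#include <string.h>

/* Functional sketch of double -> uint32 truncation (assumed binary64
   layout).  Negative inputs give 0; exponents past 31 saturate. */
static uint32_t fixuns_dfsi(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    if ((int64_t)bits < 0)
        return 0;                                    /* sign bit set */
    int exp = (int)((bits >> 52) & 0x7ff) - 0x3ff;
    if (exp < 0)
        return 0;                                    /* |d| < 1 */
    if (exp > 31)
        return UINT32_MAX;                           /* saturate */
    uint64_t frac = (bits & 0x000fffffffffffffULL) | (1ULL << 52);
    return (uint32_t)(frac >> (52 - exp));
}
```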
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/m3/floatsidf.S gcc-4.5.0/gcc/config/sh/IEEE-754/m3/floatsidf.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/m3/floatsidf.S 1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/m3/floatsidf.S 2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,103 @@
+/* Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+! floatsidf for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+
+FUNC(GLOBAL(floatsidf))
+ .global GLOBAL(floatsidf)
+ .balign 4
+GLOBAL(floatsidf):
+ tst r4,r4
+ mov r4,r1
+ bt LOCAL(ret0)
+ cmp/pz r4
+ bt 0f
+ neg r4,r1
+0: mov.l LOCAL(c_clz_tab),r0
+ extu.w r1,r5
+ mov.w LOCAL(xff00),r3
+ cmp/eq r1,r5
+ mov #21,r2
+ bt 0f
+ mov r1,r5
+ shlr16 r5
+ add #-16,r2
+0: tst r3,r5 ! 0xff00
+ bt 0f
+ shlr8 r5
+0: bt 0f
+ add #-8,r2
+0:
+#ifdef __PIC__
+ add r0,r5
+ mova LOCAL(c_clz_tab),r0
+#endif
+ mov.b @(r0,r5),r5
+ cmp/pz r4
+ mov.l LOCAL(x41200000),r3 ! bias + 20 - implicit 1
+ bt 0f
+ mov.l LOCAL(xc1200000),r3 ! sign + bias + 20 - implicit 1
+0: mov r1,r0 ! DBLRL & DBLRH
+ sub r5,r2
+ mov r2,r5
+ shld r2,DBLRH
+ cmp/pz r2
+ add r3,DBLRH
+ add #32,r2
+ shld r2,DBLRL
+ bf 0f
+ mov.w LOCAL(d0),DBLRL
+0: mov #20,r2
+ shld r2,r5
+ rts
+ sub r5,DBLRH
+LOCAL(ret0):
+ mov #0,DBLRL
+ rts
+ mov #0,DBLRH
+
+LOCAL(xff00): .word 0xff00
+ .balign 4
+LOCAL(x41200000):
+#ifdef __LITTLE_ENDIAN__
+LOCAL(d0): .word 0
+ .word 0x4120
+#else
+ .word 0x4120
+LOCAL(d0): .word 0
+#endif
+LOCAL(xc1200000): .long 0xc1200000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(floatsidf))
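The conversion above normalizes |n| with the clz table so its top bit becomes the implicit 1, then derives the exponent from the shift amount; LOCAL(x41200000) folds in the bias (bias + 20 minus the implicit 1). An equivalent but unscheduled C model under an assumed binary64 layout; the conversion is exact, so no rounding is needed:

```c
#include <stdint.h>
#include <string.h>

/* Functional sketch of int32 -> double conversion (assumed binary64
   layout): normalize, place the top bit at the implicit-1 position,
   compute the exponent from the normalization shift. */
static double float_sidf(int32_t n)
{
    if (n == 0)
        return 0.0;
    uint64_t sign = n < 0 ? 1ULL << 63 : 0;
    uint32_t m = n < 0 ? -(uint32_t)n : (uint32_t)n;
    int shift = 0;
    while (!(m & 0x80000000u)) {                     /* normalize (clz) */
        m <<= 1;
        shift++;
    }
    uint64_t exp = (uint64_t)(0x3ff + 31 - shift);   /* value = 1.f * 2^(31-shift) */
    uint64_t frac = ((uint64_t)m << 21) & 0x000fffffffffffffULL;
    uint64_t bits = sign | (exp << 52) | frac;
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```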
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/m3/floatsisf.S gcc-4.5.0/gcc/config/sh/IEEE-754/m3/floatsisf.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/m3/floatsisf.S 1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/m3/floatsisf.S 2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,106 @@
+/* Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+! floatsisf for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+
+FUNC(GLOBAL(floatsisf))
+ .global GLOBAL(floatsisf)
+ .balign 4
+GLOBAL(floatsisf):
+ cmp/pz r4
+ mov r4,r5
+ bt 0f
+ neg r4,r5
+0: mov.l LOCAL(c_clz_tab),r0
+ extu.w r5,r1
+ mov.w LOCAL(xff00),r3
+ cmp/eq r5,r1
+ mov #24,r2
+ bt 0f
+ mov r5,r1
+ shlr16 r1
+ add #-16,r2
+0: tst r3,r1 ! 0xff00
+ bt 0f
+ shlr8 r1
+0: bt 0f
+ add #-8,r2
+0:
+#ifdef __PIC__
+ add r0,r1
+ mova LOCAL(c_clz_tab),r0
+#endif
+ mov.b @(r0,r1),r1
+ cmp/pz r4
+ mov.l LOCAL(x4a800000),r3 ! bias + 23 - implicit 1
+ bt 0f
+ mov.l LOCAL(xca800000),r3 ! sign + bias + 23 - implicit 1
+0: mov r5,r0
+ sub r1,r2
+ mov.l LOCAL(x80000000),r1
+ shld r2,r0
+ cmp/pz r2
+ add r3,r0
+ bt LOCAL(noround)
+ add #31,r2
+ shld r2,r5
+ add #-31,r2
+ rotl r5
+ cmp/hi r1,r5
+ mov #0,r3
+ addc r3,r0
+ mov #23,r1
+ shld r1,r2
+ rts
+ sub r2,r0
+ .balign 8
+LOCAL(noround):
+ mov #23,r1
+ tst r4,r4
+ shld r1,r2
+ bt LOCAL(ret0)
+ rts
+ sub r2,r0
+LOCAL(ret0):
+ rts
+ mov #0,r0
+
+LOCAL(xff00): .word 0xff00
+ .balign 4
+LOCAL(x4a800000): .long 0x4a800000
+LOCAL(xca800000): .long 0xca800000
+LOCAL(x80000000): .long 0x80000000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(floatsisf))
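Unlike the double-precision case, a 32-bit integer may not fit in 24 mantissa bits, so floatsisf must round the discarded low bits to nearest, ties to even; that is the job of the rotl / cmp/hi / addc sequence above. A functional C sketch under an assumed binary32 layout (not the scheduled assembly):

```c
#include <stdint.h>
#include <string.h>

/* Functional sketch of int32 -> float conversion with round-to-nearest,
   ties-to-even (assumed binary32 layout). */
static float float_sisf(int32_t n)
{
    if (n == 0)
        return 0.0f;
    uint32_t sign = n < 0 ? 0x80000000u : 0;
    uint32_t m = n < 0 ? -(uint32_t)n : (uint32_t)n;
    int shift = 0;
    while (!(m & 0x80000000u)) {                     /* normalize (clz) */
        m <<= 1;
        shift++;
    }
    uint32_t exp = 0x7fu + 31 - shift;
    uint32_t frac = m >> 8;                  /* 24 bits incl. implicit 1 */
    uint32_t rest = m << 24;                 /* the discarded bits */
    if (rest > 0x80000000u || (rest == 0x80000000u && (frac & 1)))
        frac++;                              /* round up; may overflow frac */
    /* Adding frac minus the implicit 1 lets a rounding overflow carry
       into the exponent field, which is the correct result. */
    uint32_t bits = sign + (exp << 23) + frac - (1u << 23);
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```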
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/m3/floatunssidf.S gcc-4.5.0/gcc/config/sh/IEEE-754/m3/floatunssidf.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/m3/floatunssidf.S 1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/m3/floatunssidf.S 2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,96 @@
+/* Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+! floatunssidf for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+
+FUNC(GLOBAL(floatunsidf))
+ .global GLOBAL(floatunsidf)
+ .balign 4
+GLOBAL(floatunsidf):
+ mov.l LOCAL(c_clz_tab),r0
+ extu.w r4,r1
+ mov.w LOCAL(0xff00),r3
+ cmp/eq r4,r1
+ mov #21,r2
+ bt 0f
+ mov r4,r1
+ shlr16 r1
+ add #-16,r2
+0: tst r3,r1 ! 0xff00
+ bt 0f
+ shlr8 r1
+0: bt 0f
+ add #-8,r2
+0:
+#ifdef __PIC__
+ add r0,r1
+ mova LOCAL(c_clz_tab),r0
+#endif
+ mov.b @(r0,r1),r5
+ mov r4,DBLRL
+ mov.l LOCAL(x41200000),r3 ! bias + 20 - implicit 1
+ tst r4,r4
+ mov r4,DBLRH
+ bt LOCAL(ret0)
+ sub r5,r2
+ mov r2,r5
+ shld r2,DBLRH
+ cmp/pz r2
+ add r3,DBLRH
+ add #32,r2
+ shld r2,DBLRL
+ bf 0f
+ mov.w LOCAL(d0),DBLRL
+0: mov #20,r2
+ shld r2,r5
+ rts
+ sub r5,DBLRH
+LOCAL(ret0):
+ mov r4,DBLRL
+ rts
+ mov r4,DBLRH
+
+LOCAL(0xff00): .word 0xff00
+ .balign 4
+LOCAL(x41200000):
+#ifdef __LITTLE_ENDIAN__
+LOCAL(d0): .word 0
+ .word 0x4120
+#else
+ .word 0x4120
+LOCAL(d0): .word 0
+#endif
+LOCAL(c_clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(floatunsidf))
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/m3/floatunssisf.S gcc-4.5.0/gcc/config/sh/IEEE-754/m3/floatunssisf.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/m3/floatunssisf.S 1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/m3/floatunssisf.S 2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,94 @@
+/* Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+! floatunssisf for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+
+FUNC(GLOBAL(floatunsisf))
+ .global GLOBAL(floatunsisf)
+ .balign 4
+GLOBAL(floatunsisf):
+ mov.l LOCAL(c_clz_tab),r0
+ extu.w r4,r1
+ mov.w LOCAL(xff00),r3
+ cmp/eq r4,r1
+ mov #24,r2
+ bt 0f
+ mov r4,r1
+ shlr16 r1
+ add #-16,r2
+0: tst r3,r1 ! 0xff00
+ bt 0f
+ shlr8 r1
+0: bt 0f
+ add #-8,r2
+0:
+#ifdef __PIC__
+ add r0,r1
+ mova LOCAL(c_clz_tab),r0
+#endif
+ mov.b @(r0,r1),r1
+ mov r4,r0
+ mov.l LOCAL(x4a800000),r3 ! bias + 23 - implicit 1
+ tst r4,r4
+ bt LOCAL(ret0)
+ !
+ sub r1,r2
+ mov.l LOCAL(x80000000),r1
+ shld r2,r0
+ cmp/pz r2
+ add r3,r0
+ bt LOCAL(noround)
+ add #31,r2
+ shld r2,r4
+ rotl r4
+ add #-31,r2
+ cmp/hi r1,r4
+ mov #0,r3
+ addc r3,r0
+LOCAL(noround):
+ mov #23,r1
+ shld r1,r2
+ rts
+ sub r2,r0
+LOCAL(ret0):
+ rts
+ nop
+
+LOCAL(xff00): .word 0xff00
+ .balign 4
+LOCAL(x4a800000): .long 0x4a800000
+LOCAL(x80000000): .long 0x80000000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(floatunsisf))
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/m3/muldf3.S gcc-4.5.0/gcc/config/sh/IEEE-754/m3/muldf3.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/m3/muldf3.S 1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/m3/muldf3.S 2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,486 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+! muldf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+! Normal numbers are multiplied in 53 or 54 cycles on SH4-200.
+
+FUNC(GLOBAL(muldf3))
+ .global GLOBAL(muldf3)
+LOCAL(inf_nan_denorm_or_zero_a):
+ mov.l r8,@-r15
+ sub r3,DBL0H ! isolate high fraction
+ mov.l @(4,r15),r8 ! original DBL0H (with sign & exp)
+ sub r3,r1 ! 0x7ff00000
+ mov.l LOCAL(x60000000),r3
+ shll16 r2 ! 0xffff0000
+ ! no stall here for sh4-200
+ !
+ tst r1,r8
+ mov.l r0,@-r15
+ bf LOCAL(inf_nan_a)
+ tst r1,r0 ! test for DBL1 inf, nan or small
+ bt LOCAL(ret_inf_nan_zero)
+LOCAL(normalize_arg):
+ tst DBL0H,DBL0H
+ bf LOCAL(normalize_arg53)
+ tst DBL0L,DBL0L
+ bt LOCAL(a_zero)
+ tst r2,DBL0L
+ mov DBL0L,DBL0H
+ bt LOCAL(normalize_arg16)
+ shlr16 DBL0H
+ mov.w LOCAL(m15),r2 ! 1-16
+ bra LOCAL(normalize_arg48)
+ shll16 DBL0L
+
+LOCAL(normalize_arg53):
+ tst r2,DBL0H
+ mov #1,r2
+ bt LOCAL(normalize_arg48)
+ mov DBL0H,r1
+ shlr16 r1
+ bra LOCAL(normalize_DBL0H)
+ mov #21-16,r3
+
+LOCAL(normalize_arg16):
+ mov.w LOCAL(m31),r2 ! 1-32
+ mov #0,DBL0L
+LOCAL(normalize_arg48):
+ mov DBL0H,r1
+ mov #21,r3
+LOCAL(normalize_DBL0H):
+ extu.b r1,r8
+ mov.l LOCAL(c__clz_tab),r0
+ cmp/eq r8,r1
+ !
+ bt 0f
+ shlr8 r1
+0:
+#ifdef __pic__
+ add r0,r1
+
+ mova LOCAL(c__clz_tab),r0
+
+#endif /* __pic__ */
+ mov.b @(r0,r1),r8
+ mov DBL0L,r1
+ mov.l @r15+,r0
+ bt 0f
+ add #-8,r3
+0: clrt
+ sub r8,r3
+ mov.w LOCAL(d20),r8
+ shld r3,DBL0H
+ shld r3,DBL0L
+ sub r3,r2
+ add #-32,r3
+ shld r3,r1
+ mov.l LOCAL(x00100000),r3
+ or r1,DBL0H
+ shld r8,r2
+ mov.l @r15+,r8
+ add r2,DBL1H
+ mov.l LOCAL(x001fffff),r2
+ dmulu.l DBL0L,DBL1L
+ bra LOCAL(arg_denorm_done)
+ or r3,r0 ! set implicit 1 bit
+
+LOCAL(a_zero):
+ mov.l @(4,r15),r8
+ add #8,r15
+LOCAL(zero):
+ mov #0,DBLRH
+ bra LOCAL(pop_ret)
+ mov #0,DBLRL
+
+! both inf / nan -> result is nan if at least one is nan, else inf.
+! DBL0 inf/nan, DBL1 zero -> result is nan
+! DBL0 inf/nan, DBL1 finite -> result is DBL0 with sign adjustment
+LOCAL(inf_nan_a):
+ mov r8,DBLRH
+ mov.l @(4,r15),r8
+ add #8,r15
+ tst r1,r0 ! arg1 inf/nan ?
+ mov DBL0L,DBLRL
+ bt LOCAL(both_inf_nan)
+ tst DBL1L,DBL1L
+ mov DBL1H,r1
+ bf LOCAL(pop_ret)
+ add r1,r1
+ tst r1,r1
+ !
+ bf LOCAL(pop_ret)
+LOCAL(nan):
+ mov #-1,DBLRL
+ bra LOCAL(pop_ret)
+ mov #-1,DBLRH
+
+LOCAL(both_inf_nan):
+ or DBL1L,DBLRL
+ bra LOCAL(pop_ret)
+ or DBL1H,DBLRH
+
+LOCAL(ret_inf_nan_zero):
+ tst r1,r0
+ mov.l @(4,r15),r8
+ or DBL0L,DBL0H
+ bf/s LOCAL(zero)
+ add #8,r15
+ tst DBL0H,DBL0H
+ bt LOCAL(nan)
+LOCAL(inf_nan_b):
+ mov DBL1L,DBLRL
+ mov DBL1H,DBLRH
+LOCAL(pop_ret):
+ mov.l @r15+,DBL0H
+ add DBLRH,DBLRH
+
+
+ div0s DBL0H,DBL1H
+
+ rts
+ rotcr DBLRH
+
+ .balign 4
+/* Argument a has already been tested for being zero or denorm.
+ On this side we have to swap a and b so that we can share the
+ normalization code.
+ a: sign/exponent : @r15 fraction: DBL0H:DBL0L
+ b: sign/exponent: DBL1H fraction: r0:DBL1L */
+LOCAL(inf_nan_denorm_or_zero_b):
+ sub r3,r1 ! 0x7ff00000
+ mov.l @r15,r2 ! get original DBL0H
+ tst r1,DBL1H
+ sub r3,r0 ! isolate high fraction
+ bf LOCAL(inf_nan_b)
+ mov.l DBL1H,@r15
+ mov r0,DBL0H
+ mov.l r8,@-r15
+ mov r2,DBL1H
+ mov.l LOCAL(0xffff0000),r2
+ mov.l r1,@-r15
+ mov DBL1L,r1
+ mov DBL0L,DBL1L
+ bra LOCAL(normalize_arg)
+ mov r1,DBL0L
+
+LOCAL(d20):
+ .word 20
+LOCAL(m15):
+ .word -15
+LOCAL(m31):
+ .word -31
+LOCAL(xff):
+ .word 0xff
+
+ .balign 4
+LOCAL(0xffff0000): .word 0xffff0000
+
+ ! calculate a (DBL0H:DBL0L) * b (DBL1H:DBL1L)
+ .balign 4
+GLOBAL(muldf3):
+ mov.l LOCAL(xfff00000),r3
+ mov DBL1H,r0
+ dmulu.l DBL0L,DBL1L
+ mov.l LOCAL(x7fe00000),r1
+ sub r3,r0
+ mov.l DBL0H,@-r15
+ sub r3,DBL0H
+ tst r1,DBL0H
+ or r3,DBL0H
+ mov.l LOCAL(x001fffff),r2
+ bt LOCAL(inf_nan_denorm_or_zero_a)
+ tst r1,r0
+ or r3,r0 ! r0:DBL1L := b fraction ; u12.52
+ bt LOCAL(inf_nan_denorm_or_zero_b) ! T clear on fall-through
+LOCAL(arg_denorm_done):
+ and r2,r0 ! r0:DBL1L := b fraction ; u12.52
+ sts macl,r3
+ sts mach,r1
+ dmulu.l DBL0L,r0
+ and r2,DBL0H ! DBL0H:DBL0L := a fraction ; u12.52
+ mov.l r8,@-r15
+ mov #0,DBL0L
+ mov.l r9,@-r15
+ sts macl,r2
+ sts mach,r8
+ dmulu.l DBL0H,DBL1L
+ addc r1,r2
+
+ addc DBL0L,r8 ! add T; clears T
+
+ sts macl,r1
+ sts mach,DBL1L
+ dmulu.l DBL0H,r0
+ addc r1,r2
+ mov.l LOCAL(x7ff00000),DBL0H
+ addc DBL1L,r8 ! clears T
+ mov.l @(8,r15),DBL1L ! a sign/exp w/fraction
+ sts macl,DBLRL
+ sts mach,DBLRH
+ and DBL0H,DBL1L ! a exponent
+ mov.w LOCAL(x200),r9
+ addc r8,DBLRL
+ mov.l LOCAL(x3ff00000),r8 ! bias
+ addc DBL0L,DBLRH ! add T
+ cmp/hi DBL0L,r3 ! 32 guard bits -> sticky: T := r3 != 0
+ movt r3
+ tst r9,DBLRH ! T := fraction < 2
+ or r3,r2 ! DBLRH:DBLRL:r2 := result fraction; u24.72
+ bt/s LOCAL(shll12)
+ sub r8,DBL1L
+ mov.l LOCAL(x002fffff),r8
+ and DBL1H,DBL0H ! b exponent
+ mov.l LOCAL(x00100000),r9
+ add DBL0H,DBL1L ! result exponent - 1
+ tst r8,r2
+ mov.w LOCAL(m20),r8
+ subc DBL0L,r9
+ addc r2,r9 ! r2 value is still needed for denormal rounding
+ mov.w LOCAL(d11),DBL0L
+ rotcr r9
+ clrt
+ shld r8,r9
+ mov.w LOCAL(m21),r8
+ mov DBLRL,r3
+ shld DBL0L,DBLRL
+ addc r9,DBLRL
+ mov.l @r15+,r9
+ shld r8,r3
+ mov.l @r15+,r8
+ shld DBL0L,DBLRH
+ mov.l @r15+,DBL0H
+ addc r3,DBLRH
+ mov.l LOCAL(x7ff00000),DBL0L
+ add DBL1L,DBLRH ! implicit 1 adjusts exponent
+ mov.l LOCAL(xffe00000),r3
+ cmp/hs DBL0L,DBLRH
+ add DBLRH,DBLRH
+ bt LOCAL(ill_exp_11)
+ tst r3,DBLRH
+ bt LOCAL(denorm_exp0_11)
+ div0s DBL0H,DBL1H
+ rts
+ rotcr DBLRH
+
+
+LOCAL(shll12):
+ mov.l LOCAL(x0017ffff),r8
+ extu.b DBLRH,DBLRH ! remove implicit 1.
+ mov.l LOCAL(x00080000),r9
+ and DBL1H,DBL0H ! b exponent
+ add DBL0H,DBL1L ! result exponent
+ tst r8,r2 ! rounding adjust for lower guard ...
+ mov.w LOCAL(m19),r8
+ subc DBL0L,r9 ! ... bits and round to even; clear T
+ addc r2,r9 ! r2 value is still needed for denormal rounding
+ mov.w LOCAL(d12),DBL0L
+ rotcr r9
+ clrt
+ shld r8,r9
+ mov.w LOCAL(m20),r8
+ mov DBLRL,r3
+ shld DBL0L,DBLRL
+ addc r9,DBLRL
+ mov.l @r15+,r9
+ shld r8,r3
+ mov.l @r15+,r8
+ shld DBL0L,DBLRH
+ mov.l LOCAL(x7ff00000),DBL0L
+ addc r3,DBLRH
+ mov.l @r15+,DBL0H
+ add DBL1L,DBLRH
+ mov.l LOCAL(xffe00000),r3
+ cmp/hs DBL0L,DBLRH
+ add DBLRH,DBLRH
+ bt LOCAL(ill_exp_12)
+ tst r3,DBLRH
+ bt LOCAL(denorm_exp0_12)
+LOCAL(insert_sign):
+ div0s DBL0H,DBL1H
+ rts
+ rotcr DBLRH
+
+LOCAL(overflow):
+ mov r3,DBLRH
+ mov #0,DBLRL
+ bra LOCAL(insert_sign)
+ mov.l @r15+,r8
+
+LOCAL(denorm_exp0_11):
+ mov.l r8,@-r15
+ mov #-21,r8
+ mov.l r9,@-r15
+ bra LOCAL(denorm)
+ mov #-2,DBL1L ! one for denormal, and one for sticky bit
+
+LOCAL(ill_exp_11):
+ mov DBL1H,DBL1L
+ and r3,DBL0L ! 0x7fe00000
+ add DBL1L,DBL1L
+ mov.l r8,@-r15
+ cmp/hi DBL1L,DBL0L ! check if exp a was large
+ mov #-20,DBL0L
+ bf LOCAL(overflow)
+ mov #-21,r8
+ mov DBLRH,DBL1L
+ rotcr DBL1L ! shift in negative sign
+ mov.l r9,@-r15
+ shad DBL0L,DBL1L ! exponent ; s32
+ bra LOCAL(denorm)
+ add #-2,DBL1L ! add one for denormal, and one for sticky bit
+
+LOCAL(denorm_exp0_12):
+ mov.l r8,@-r15
+ mov #-20,r8
+ mov.l r9,@-r15
+ bra LOCAL(denorm)
+ mov #-2,DBL1L ! one for denormal, and one for sticky bit
+
+ .balign 4 ! also aligns LOCAL(denorm)
+LOCAL(ill_exp_12):
+ and r3,DBL0L ! 0x7fe00000
+ mov DBL1H,DBL1L
+ add DBL1L,DBL1L
+ mov.l r8,@-r15
+ cmp/hi DBL1L,DBL0L ! check if exp a was large
+ bf LOCAL(overflow)
+ mov DBLRH,DBL1L
+ rotcr DBL1L ! shift in negative sign
+ mov #-20,r8
+ shad r8,DBL1L ! exponent ; s32
+ mov.l r9,@-r15
+ add #-2,DBL1L ! add one for denormal, and one for sticky bit
+LOCAL(denorm):
+ not r3,r9 ! 0x001fffff
+ mov.l r10,@-r15
+ mov r2,r10
+ shld r8,r10 ! 11 or 12 lower bit valid
+ and r9,DBLRH ! Mask away vestiges of exponent.
+ add #32,r8
+ sub r3,DBLRH ! Make leading 1 explicit.
+ shld r8,r2 ! r10:r2 := unrounded result lowpart
+ shlr DBLRH ! compensate for doubling at end of normal code
+ sub DBLRL,r10 ! reconstruct effect of previous rounding
+ exts.b r10,r9
+ shad r3,r10 ! sign extension
+ mov #0,r3
+ clrt
+ addc r9,DBLRL ! Undo previous rounding.
+ mov.w LOCAL(m32),r9
+ addc r10,DBLRH
+ cmp/hi r3,r2
+ rotcl DBLRL ! fit in the rest of r2 as a sticky bit.
+ mov.l @r15+,r10
+ rotcl DBLRH
+ cmp/ge r9,DBL1L
+ bt LOCAL(small_norm_shift)
+ cmp/hi r3,DBLRL
+ add #32,DBL1L
+ movt DBLRL
+ cmp/gt r9,DBL1L
+ or DBLRH,DBLRL
+ bt/s LOCAL(small_norm_shift)
+ mov r3,DBLRH
+ mov r3,DBLRL ! exponent too negative to shift - return zero
+ mov.l @r15+,r9
+ mov.l @r15+,r8
+ div0s DBL0H,DBL1H
+ rts
+ rotcr DBLRH
+ .balign 4
+LOCAL(small_norm_shift):
+ mov DBLRL,r2 ! stash away guard bits
+ shld DBL1L,DBLRL
+ mov DBLRH,DBL0L
+ shld DBL1L,DBLRH
+ mov.l LOCAL(x7fffffff),r9
+ add #32,DBL1L
+ shld DBL1L,r2
+ shld DBL1L,DBL0L
+ or DBL0L,DBLRL
+ shlr DBL0L
+ addc r2,r9
+ mov.l @r15+,r9
+ mov.l @r15+,r8
+ addc r3,DBLRL
+ addc r3,DBLRH
+ div0s DBL0H,DBL1H
+ add DBLRH,DBLRH
+ rts
+ rotcr DBLRH
+
+
+LOCAL(x200):
+ .word 0x200
+LOCAL(m19):
+ .word -19
+LOCAL(m20):
+ .word -20
+LOCAL(m21):
+ .word -21
+LOCAL(m32):
+ .word -32
+LOCAL(d11):
+ .word 11
+LOCAL(d12):
+ .word 12
+ .balign 4
+LOCAL(x60000000):
+ .long 0x60000000
+LOCAL(c__clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+LOCAL(xfff00000):
+ .long 0xfff00000
+LOCAL(x7fffffff):
+ .long 0x7fffffff
+LOCAL(x00100000):
+ .long 0x00100000
+LOCAL(x7fe00000):
+ .long 0x7fe00000
+LOCAL(x001fffff):
+ .long 0x001fffff
+LOCAL(x7ff00000):
+ .long 0x7ff00000
+LOCAL(x3ff00000):
+ .long 0x3ff00000
+LOCAL(x002fffff):
+ .long 0x002fffff
+LOCAL(xffe00000):
+ .long 0xffe00000
+LOCAL(x0017ffff):
+ .long 0x0017ffff
+LOCAL(x00080000):
+ .long 0x00080000
+ENDFUNC(GLOBAL(muldf3))
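The normal-number path above (sign XOR via div0s, exponent add, four dmulu.l partial products forming a 106-bit fraction, normalize, round to nearest/even) can be sketched in portable code. This is a hypothetical reference model, not the routine itself: it handles only normal inputs with in-range result exponents, the inf/NaN/denormal paths are omitted, and the name `soft_muldf3` is invented for illustration.

```python
import struct

def soft_muldf3(a, b):
    """Sketch of the normal-number path: sign XOR, biased exponent add,
    106-bit fraction product, normalize, round to nearest, ties to even."""
    ua = struct.unpack('<Q', struct.pack('<d', a))[0]
    ub = struct.unpack('<Q', struct.pack('<d', b))[0]
    sign = (ua ^ ub) & (1 << 63)
    ea = (ua >> 52) & 0x7FF                       # biased exponents
    eb = (ub >> 52) & 0x7FF
    fa = (ua & ((1 << 52) - 1)) | (1 << 52)       # set implicit 1 bit,
    fb = (ub & ((1 << 52) - 1)) | (1 << 52)       # as "or r3,r0" does above
    prod = fa * fb                                # 106-bit product, value in [1, 4)
    exp = ea + eb - 1023                          # remove one bias
    if prod >> 105:                               # fraction >= 2: bump exponent
        exp += 1
    else:                                         # else shift fraction up one
        prod <<= 1
    frac = prod >> 53                             # top 53 result bits
    rest = prod & ((1 << 53) - 1)                 # guard/sticky bits shifted out
    half = 1 << 52
    if rest > half or (rest == half and frac & 1):
        frac += 1                                 # round to nearest, ties to even
    if frac >> 53:                                # rounding carried out of the top
        frac >>= 1
        exp += 1
    ur = sign | (exp << 52) | (frac & ((1 << 52) - 1))
    return struct.unpack('<d', struct.pack('<Q', ur))[0]
```

For normal inputs this agrees bit-for-bit with a hardware IEEE multiply, which makes it a convenient way to spot-check the assembly against a native build.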
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/m3/mulsf3.S gcc-4.5.0/gcc/config/sh/IEEE-754/m3/mulsf3.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/m3/mulsf3.S 1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/m3/mulsf3.S 2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,246 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+! mulsf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+
+ .balign 4
+ .global GLOBAL(mulsf3)
+ FUNC(GLOBAL(mulsf3))
+GLOBAL(mulsf3):
+ mov.l LOCAL(x7f800000),r1
+ not r4,r2
+ mov r4,r3
+ not r5,r0
+ tst r1,r2
+ or r1,r3
+ bt/s LOCAL(inf_nan_arg0)
+ tst r1,r0
+ bt LOCAL(inf_nan_arg1)
+ tst r1,r5
+ mov r1,r2
+ shll8 r3
+ or r5,r1
+ bt/s LOCAL(zero_denorm_arg1)
+ shll8 r1
+ tst r2,r4
+ bt LOCAL(zero_denorm_arg0)
+ dmulu.l r3,r1
+ mov r4,r0
+ and r2,r0
+LOCAL(arg_norm):
+ and r5,r2
+ mov.l LOCAL(x3f800000),r3
+ sts mach,r1
+ sub r3,r0
+ sts macl,r3
+ add r2,r0
+ cmp/pz r1
+ mov.w LOCAL(x100),r2
+ bf/s LOCAL(norm_frac)
+ tst r3,r3
+ shll2 r1 /* Shift one up, replace leading 1 with 0. */
+ shlr r1
+ tst r3,r3
+LOCAL(norm_frac):
+ mov.w LOCAL(mx80),r3
+ bf LOCAL(round_frac)
+ tst r2,r1
+LOCAL(round_frac):
+ mov.l LOCAL(xff000000),r2
+ subc r3,r1 /* Even overflow gives right result: exp++, frac=0. */
+ shlr8 r1
+ add r1,r0
+ shll r0
+ bt LOCAL(ill_exp)
+ tst r2,r0
+ bt LOCAL(denorm0)
+ cmp/hs r2,r0
+ bt LOCAL(inf)
+LOCAL(insert_sign):
+ div0s r4,r5
+ rts
+ rotcr r0
+LOCAL(denorm0):
+ sub r2,r0
+ bra LOCAL(insert_sign)
+ shlr r0
+LOCAL(zero_denorm_arg1):
+ mov.l LOCAL(x60000000),r2 /* Check exp0 >= -64 */
+ add r1,r1
+ tst r1,r1 /* arg1 == 0 ? */
+ mov #0,r0
+ bt LOCAL(insert_sign) /* argument 1 is zero ==> return 0 */
+ tst r4,r2
+ bt LOCAL(insert_sign) /* exp0 < -64 ==> return 0 */
+ mov.l LOCAL(c__clz_tab),r0
+ mov r3,r2
+ mov r1,r3
+ bra LOCAL(arg_normalize)
+ mov r2,r1
+LOCAL(zero_denorm_arg0):
+ mov.l LOCAL(x60000000),r2 /* Check exp1 >= -64 */
+ add r3,r3
+ tst r3,r3 /* arg0 == 0 ? */
+ mov #0,r0
+ bt LOCAL(insert_sign) /* argument 0 is zero ==> return 0 */
+ tst r5,r2
+ bt LOCAL(insert_sign) /* exp1 < -64 ==> return 0 */
+ mov.l LOCAL(c__clz_tab),r0
+LOCAL(arg_normalize):
+ mov.l r7,@-r15
+ extu.w r3,r7
+ cmp/eq r3,r7
+ mov.l LOCAL(xff000000),r7
+ mov #-8,r2
+ bt 0f
+ tst r7,r3
+ mov #-16,r2
+ bt 0f
+ mov #-24,r2
+0:
+ mov r3,r7
+ shld r2,r7
+#ifdef __pic__
+ add r0,r7
+ mova LOCAL(c__clz_tab),r0
+#endif
+ mov.b @(r0,r7),r0
+ add #32,r2
+ mov r2,r7
+ mov #23,r2
+ sub r0,r7
+ mov.l LOCAL(x7f800000),r0
+ shld r7,r3
+ shld r2,r7
+ mov r0,r2
+ and r4,r0
+ sub r7,r0
+ mov.l @r15+,r7
+ bra LOCAL(arg_norm)
+ dmulu.l r3,r1
+#if 0 /* This is slightly slower, but could be used if table lookup causes
+ cache thrashing. */
+ bt LOCAL(insert_sign) /* exp1 < -64 ==> return 0 */
+ mov.l LOCAL(xff000000),r2
+ mov r4,r0
+LOCAL(arg_normalize):
+ tst r2,r3
+ bf LOCAL(arg_bit_norm)
+LOCAL(arg_byte_loop):
+ tst r2,r3
+ add r2,r0
+ shll8 r3
+ bt LOCAL(arg_byte_loop)
+ add r4,r0
+LOCAL(arg_bit_norm):
+ mov.l LOCAL(x7f800000),r2
+ rotl r3
+LOCAL(arg_bit_loop):
+ add r2,r0
+ bf/s LOCAL(arg_bit_loop)
+ rotl r3
+ rotr r3
+ rotr r3
+ sub r2,r0
+ bra LOCAL(arg_norm)
+ dmulu.l r3,r1
+#endif /* 0 */
+LOCAL(inf):
+ bra LOCAL(insert_sign)
+ mov r2,r0
+LOCAL(inf_nan_arg0):
+ bt LOCAL(inf_nan_both)
+ add r0,r0
+ cmp/eq #-1,r0 /* arg1 zero? -> NAN */
+ bt LOCAL(insert_sign)
+ mov r4,r0
+LOCAL(inf_insert_sign):
+ bra LOCAL(insert_sign)
+ add r0,r0
+LOCAL(inf_nan_both):
+ mov r4,r0
+ bra LOCAL(inf_insert_sign)
+ or r5,r0
+LOCAL(inf_nan_arg1):
+ mov r2,r0
+ add r0,r0
+ cmp/eq #-1,r0 /* arg0 zero? */
+ bt LOCAL(insert_sign)
+ bra LOCAL(inf_insert_sign)
+ mov r5,r0
+LOCAL(ill_exp):
+ cmp/pz r0
+ mov #-24,r3
+ bt LOCAL(inf)
+ add r1,r1
+ mov r0,r2
+ sub r1,r2 ! remove fraction to get back pre-rounding exponent.
+ sts mach,r0
+ sts macl,r1
+ shad r3,r2
+ mov r0,r3
+ shld r2,r0
+ add #32,r2
+ cmp/pz r2
+ shld r2,r3
+ bf LOCAL(zero)
+ or r1,r3
+ mov #-1,r1
+ tst r3,r3
+ mov.w LOCAL(x100),r3
+ bf/s LOCAL(denorm_round_up)
+ mov #-0x80,r1
+ tst r3,r0
+LOCAL(denorm_round_up):
+ mov #-7,r3
+ subc r1,r0
+ bra LOCAL(insert_sign)
+ shld r3,r0
+LOCAL(zero):
+ bra LOCAL(insert_sign)
+ mov #0,r0
+LOCAL(x100):
+ .word 0x100
+LOCAL(mx80):
+ .word -0x80
+ .balign 4
+LOCAL(x7f800000):
+ .long 0x7f800000
+LOCAL(x3f800000):
+ .long 0x3f800000
+LOCAL(xff000000):
+ .long 0xff000000
+LOCAL(x60000000):
+ .long 0x60000000
+LOCAL(c__clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+ ENDFUNC(GLOBAL(mulsf3))
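The denormal-argument paths above normalize via a byte-wise lookup in libgcc's `__clz_tab` (see `LOCAL(arg_normalize)`): locate the highest nonzero byte with a couple of tests, then finish with one table access. A sketch of that two-step scheme follows; `clz_tab` here is rebuilt with `bit_length()` on the assumption that it matches libgcc's table, and `clz32` is an invented name.

```python
# Assumed shape of libgcc's __clz_tab: entry i is the bit-length of i
# (position of the highest set bit plus one; 0 for i == 0).
clz_tab = [i.bit_length() for i in range(256)]

def clz32(x):
    """Count leading zeros of a nonzero 32-bit value by locating the
    highest nonzero byte, then finishing with one table lookup --
    the same scheme as the cascaded tests in the assembly."""
    assert 0 < x < 1 << 32
    if x >> 24:
        n = 24
    elif x >> 16:
        n = 16
    elif x >> 8:
        n = 8
    else:
        n = 0
    return 32 - n - clz_tab[x >> n]
```

A denormal fraction is then normalized by shifting left until the implicit-1 position holds a set bit, reducing the exponent by the same amount.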
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/muldf3.S gcc-4.5.0/gcc/config/sh/IEEE-754/muldf3.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/muldf3.S 1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/muldf3.S 2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,601 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!multiplication of two double precision floating point numbers
+!Author:Aanchal Khanna
+!SH1 Support / Simplifications: Joern Rennecke
+!
+!Entry:
+!r4,r5:operand 1
+!
+!r6,r7:operand 2
+!
+!Exit:
+!r0,r1:result
+!
+!Notes: argument 1 is passed in regs r4 and r5 and argument 2 is passed in regs
+!r6 and r7; the result is returned in regs r0 and r1. Operand 1 is referred to
+!as op1 and operand 2 as op2.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

+ .text
+ .align 5
+ .global GLOBAL (muldf3)
+ FUNC (GLOBAL (muldf3))
+
+GLOBAL (muldf3):
+
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+ mov r5,r4
+ mov r1,r5
+
+ mov r6,r1
+ mov r7,r6
+ mov r1,r7
+#endif
+ mov.l .L_mask_sign,r0
+ mov r4,r2
+
+ and r0,r2
+ mov #0,r1
+
+ shll r4
+ and r6,r0
+
+ xor r2,r0 !r0 contains the result's sign bit
+ shlr r4
+
+ mov.l .L_inf,r2
+ shll r6
+
+ mov r4,r3
+ shlr r6
+
+.L_chk_a_inv:
+ !chk if op1 is Inf/NaN
+ and r2,r3
+ mov.l r8,@-r15
+
+ cmp/eq r3,r2
+ mov.l .L_mask_high_mant,r8
+
+ mov r2,r3
+ bf .L_chk_b_inv
+
+ mov r8,r3
+ and r4,r8
+
+ cmp/hi r1,r8
+ bt .L_return_a !op1 NaN, return op1
+
+ cmp/hi r1,r5
+ mov r2,r8
+
+ bt .L_return_a !op1 NaN, return op1
+ and r6,r8
+
+ cmp/eq r8,r2
+ and r6,r3
+
+ bt .L_b_inv
+ cmp/eq r1,r6
+
+ bf .L_return_a !op1 Inf, op2 normal number: return op1
+ cmp/eq r1,r7
+
+ bf .L_return_a !op1 Inf, op2 normal number: return op1
+ mov.l @r15+,r8
+
+ rts
+ mov #-1,DBLRH !op1=Inf, op2=0,return nan
+
+.L_b_inv:
+ !op2 is NaN/Inf
+ cmp/hi r1,r7
+ mov r1,r2
+
+ mov r5,r1
+ bt .L_return_b !op2=NaN,return op2
+
+ cmp/hi r2,r6
+ or r4,r0
+
+ bt .L_return_b !op2=NaN,return op2
+ mov.l @r15+,r8
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+
+ rts !op1=Inf,op2=Inf,return Inf with sign
+ nop
+
+.L_chk_b_inv:
+ !Chk if op2 is NaN/Inf
+ and r6,r2
+ cmp/eq r3,r2
+
+ bf .L_chk_a_for_zero
+ and r6,r8
+
+ cmp/hi r1,r8
+ bt .L_return_b !op2=NaN,return op2
+
+ cmp/hi r1,r7
+ bt .L_return_b !op2=NaN,return op2
+
+ cmp/eq r5,r1
+ bf .L_return_b !op1=normal number,op2=Inf,return Inf
+
+ mov r7,r1
+ cmp/eq r4,r1
+
+ bf .L_return_b /* op1=normal number, op2=Inf,return Inf */
+ mov.l @r15+,r8
+
+ rts
+ mov #-1,DBLRH !op1=0,op2=Inf,return NaN
+
+.L_return_a:
+ mov r5,r1
+ or r4,r0
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+.L_return_b:
+ mov r7,r1
+ or r6,r0
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+.L_chk_a_for_zero:
+ !Chk if op1 is zero
+ cmp/eq r1,r4
+ bf .L_chk_b_for_zero
+
+ cmp/eq r1,r5
+ bf .L_chk_b_for_zero
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+.L_chk_b_for_zero:
+ !op1=0,chk if op2 is zero
+ cmp/eq r1,r6
+ mov r1,r3
+
+ mov.l .L_inf,r1
+ bf .L_normal_nos
+
+ cmp/eq r3,r7
+ bf .L_normal_nos
+
+ mov r3,r1
+ mov.l @r15+,r8
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ nop
+
+.L_normal_nos:
+ !op1 and op2 are normal nos
+ mov.l r9,@-r15
+ mov r4,r3
+
+ mov #-20,r9
+ and r1,r3
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r9,r2
+#else
+ SHLR20 (r2)
+#endif
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r9,r3
+#else
+ SHLR20 (r3)
+#endif
+ cmp/pl r3
+
+ bf .L_norm_a !normalize op1
+.L_chk_b:
+ cmp/pl r2
+ bf .L_norm_b !normalize op2
+
+.L_mul1:
+ add r3,r2
+ mov.l .L_1023,r1
+
+ !resultant exponent in r2
+ add r1,r2
+ mov.l .L_2047,r1
+
+ !Chk the exponent for overflow
+ cmp/ge r1,r2
+ and r8,r4
+
+ bt .L_return_inf
+ mov.l .L_imp_bit,r1
+
+ or r1,r4
+ and r8,r6
+
+ or r1,r6
+ clrt
+
+ !multiplying the mantissas
+ DMULU_SAVE
+ DMULUL (r7,r5,r1) !bits 0-31 of product
+
+ DMULUH (r3)
+
+ DMULUL (r4,r7,r8)
+
+ addc r3,r8
+
+ DMULUH (r3)
+
+ movt r9
+ clrt
+
+ DMULUL (r5,r6,r7)
+
+ addc r7,r8 !bits 63-32 of product
+
+ movt r7
+ add r7,r9
+
+ DMULUH (r7)
+
+ add r7,r3
+
+ add r9,r3
+ clrt
+
+ DMULUL (r4,r6,r7)
+
+ addc r7,r3 !bits 64-95 of product
+
+ DMULUH (r7)
+ DMULU_RESTORE
+
+ mov #0,r5
+ addc r5,r7 !bits 96-105 of product
+
+ cmp/eq r5,r1
+ mov #1,r4
+
+ bt .L_skip
+ or r4,r8
+.L_skip:
+ mov.l .L_106_bit,r4
+ mov r8,r9
+
+.L_chk_extra_msb:
+ !chk if extra MSB is generated
+ and r7,r4
+ cmp/eq r5,r4
+
+ mov #12,r4
+ SL(bf, .L_shift_rt_by_1,
+ mov #31,r5)
+
+.L_pack_mantissa:
+ !scale the mantissa to 53 bits
+ mov #-19,r6
+ mov.l .L_mask_high_mant,r5
+
+ SHLRN (19, r6, r8)
+
+ and r3,r5
+
+ shlr r8
+ movt r1
+
+ SHLLN (12, r4, r5)
+
+ add #-1,r6
+
+ or r5,r8 !lower bits of resulting mantissa
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r6,r3
+#else
+ SHLR20 (r3)
+#endif
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r4,r7
+#else
+ SHLL12 (r7)
+#endif
+ clrt
+
+ or r7,r3 !higher bits of resulting mantissa
+ mov #0,r7
+
+ !chk the exponent for underflow
+ cmp/ge r2,r7
+ bt .L_underflow
+
+ addc r1,r8 !rounding
+ mov r8,r1
+
+ addc r7,r3 !rounding
+ mov.l .L_mask_22_bit,r5
+
+ and r3,r5
+ !chk if extra msb is generated after rounding
+ cmp/eq r7,r5
+
+ mov.l .L_mask_high_mant,r8
+ bt .L_pack_result
+
+ add #1,r2
+ mov.l .L_2047,r6
+
+ cmp/ge r6,r2
+
+ bt .L_return_inf
+ shlr r3
+
+ rotcr r1
+
+.L_pack_result:
+ !pack the result, r2=exponent, r3=higher mantissa, r1=lower mantissa
+ !r0=sign bit
+ mov #20,r6
+ and r8,r3
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r6,r2
+#else
+ SHLL20 (r2)
+#endif
+ or r3,r0
+
+ or r2,r0
+ mov.l @r15+,r9
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+.L_norm_a:
+ !normalize op1
+ shll r5
+ mov.l .L_imp_bit,r1
+
+ rotcl r4
+ add #-1,r3
+
+ tst r1,r4
+ bt .L_norm_a
+
+ bra .L_chk_b
+ add #1,r3
+
+.L_norm_b:
+ !normalize op2
+ shll r7
+ mov.l .L_imp_bit,r1
+
+ rotcl r6
+ add #-1,r2
+
+ tst r1,r6
+ bt .L_norm_b
+
+ bra .L_mul1
+ add #1,r2
+
+.L_shift_rt_by_1:
+ !adjust the extra msb
+
+ add #1,r2 !add 1 to exponent
+ mov.l .L_2047,r6
+
+ cmp/ge r6,r2
+ mov #20,r6
+
+ bt .L_return_inf
+ shlr r7 !r7 contains bit 96-105 of product
+
+ rotcr r3 !r3 contains bit 64-95 of product
+
+ rotcr r8 !r8 contains bit 32-63 of product
+ bra .L_pack_mantissa
+
+ rotcr r1 !r1 contains bit 31-0 of product
+
+.L_return_inf:
+ !return Inf
+ mov.l .L_inf,r2
+ mov #0,r1
+
+ or r2,r0
+ mov.l @r15+,r9
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+.L_underflow:
+ !check if the result needs to be denormalized
+ mov #-53,r1
+ add #1,r2
+
+ cmp/gt r2,r1
+ mov #32,r4
+
+ add #-2,r2
+ bt .L_return_zero
+
+ add r2,r4
+ mov r7,r1
+
+ cmp/ge r7,r4
+ mov r2,r6
+
+ mov #-54,r2
+ bt .L_denorm
+
+ mov #-32,r6
+
+.L_denorm:
+ !denormalize the result
+ shlr r8
+ rotcr r1
+
+ shll r8
+ add #1,r6
+
+ shlr r3
+ rotcr r8
+
+ cmp/eq r7,r6
+ bf .L_denorm
+
+ mov r4,r6
+ cmp/eq r2,r4
+
+ bt .L_break
+ mov r7,r5
+
+ cmp/gt r6,r7
+ bf .L_break
+
+ mov r2,r4
+ mov r1,r5
+
+ mov r7,r1
+ bt .L_denorm
+
+.L_break:
+ mov #0,r2
+
+ cmp/gt r1,r2
+
+ addc r2,r8
+ mov.l .L_comp_1,r4
+
+ addc r7,r3
+ or r3,r0
+
+ cmp/eq r9,r7
+ bf .L_return
+
+ cmp/eq r7,r5
+ mov.l .L_mask_sign,r6
+
+ bf .L_return
+ cmp/eq r1,r6
+
+ bf .L_return
+ and r4,r8
+
+.L_return:
+ mov.l @r15+,r9
+ mov r8,r1
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+.L_return_zero:
+ mov.l @r15+,r9
+ mov r7,r1
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+ .align 2
+
+.L_mask_high_mant:
+ .long 0x000fffff
+.L_inf:
+ .long 0x7ff00000
+.L_mask_sign:
+ .long 0x80000000
+.L_1023:
+ .long -1023
+.L_2047:
+ .long 2047
+.L_imp_bit:
+ .long 0x00100000
+.L_mask_22_bit:
+ .long 0x00200000
+.L_106_bit:
+ .long 0x00000200
+.L_comp_1:
+ .long 0xfffffffe
+
+ENDFUNC (GLOBAL (muldf3))
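The `__LITTLE_ENDIAN__` swaps at the entry and exits of this routine exist because a double travels as a pair of 32-bit registers, and which register holds the high word depends on endianness. The helper names below are invented for the sketch:

```python
import struct

def double_words(x):
    """Split a double into its (high, low) 32-bit words; the high word
    carries the sign, the 11 exponent bits and the top 20 fraction bits."""
    bits = struct.unpack('<Q', struct.pack('<d', x))[0]
    return bits >> 32, bits & 0xFFFFFFFF

def arg_regs(x, little_endian):
    """(r4, r5) as GLOBAL(muldf3) receives operand 1: on little-endian SH
    the low word arrives in r4, which is why the entry code swaps the
    pair so the rest of the routine can keep the high word in r4."""
    hi, lo = double_words(x)
    return (lo, hi) if little_endian else (hi, lo)
```

The same swap runs in reverse on return, so r0:r1 leave in the caller's expected order.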
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/mulsf3.S gcc-4.5.0/gcc/config/sh/IEEE-754/mulsf3.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/mulsf3.S 1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/mulsf3.S 2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,352 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Routine for multiplying two floating point numbers
+
+! Author: Rakesh Kumar
+
+! Arguments: r4 and r5
+! Result: r0
+
+! The arguments are referred as op1 and op2
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (mulsf3)
+ FUNC (GLOBAL (mulsf3))
+
+GLOBAL (mulsf3):
+ ! Extract the sign bits
+ mov.l .L_sign,r3
+ mov r3,r0
+
+ and r4,r3 ! sign bit for op1
+ mov.l .L_sign_mask,r6
+
+ ! Mask out the sign bit from op1 and op2
+ and r5,r0 ! sign bit for op2
+ mov.l .L_inf,r2
+
+ and r6,r4
+ xor r3,r0 ! Final sign in r0
+
+ and r6,r5
+ tst r4,r4
+
+ ! Check for zero
+ mov r5,r7
+ ! Check op1 for zero
+ SL(bt, .L_op1_zero,
+ mov r4,r6)
+
+ tst r5,r5
+ bt .L_op2_zero ! op2 is zero
+
+ ! Extract the exponents
+ and r2,r6 ! Exponent of op1
+ cmp/eq r2,r6
+
+ and r2,r7
+ bt .L_inv_op1 ! op1 is NaN or Inf
+
+ mov.l .L_mant,r3
+ cmp/eq r2,r7
+
+ and r3,r4 ! Mantissa of op1
+ bt .L_ret_op2 ! op2 is Nan or Inf
+
+ and r3,r5 ! Mantissa of op2
+
+ mov #-23,r3
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLR23 (r6)
+ SHLR23 (r7)
+#else
+ shld r3,r6
+ shld r3,r7
+#endif
+ ! Check for denormals
+ mov.l .L_24bit,r3
+ tst r6,r6
+
+ bt .L_norm_op1 ! op1 is denormal
+ add #-127,r6 ! Unbias op1's exp
+
+ tst r7,r7
+ bt .L_norm_op2 ! op2 is denormal
+
+ add #-127,r7 ! Unbias op2's exp
+
+.L_multiply:
+ add r6,r7 ! Final exponent in r7
+ mov.l .L_24bit,r1
+
+ ! set 24th bit of mantissas
+ mov #127,r3
+ or r1,r4
+
+ DMULU_SAVE
+
+ ! Multiply
+ or r1,r5
+ DMULUL (r4,r5,r4)
+
+ DMULUH (r5)
+
+ DMULU_RESTORE
+
+ mov.l .L_16bit,r6
+
+ ! Check for extra MSB generated
+ tst r5,r6
+
+ mov.l .L_255,r1
+ bf .L_shift_by_1 ! Adjust the extra MSB
+
+! Normalize the result with rounding
+.L_epil:
+ ! Bias the exponent
+ add #127,r7
+ cmp/ge r1,r7
+
+ ! Check exponent overflow and underflow
+ bt .L_ret_inf
+
+ cmp/pl r7
+ bf .L_denorm
+
+.L_epil_0:
+ mov #-23,r3
+ shll r5
+ mov #0,r6
+
+! Fit resultant mantissa in 24 bits
+! Apply default rounding
+.L_loop_epil_0:
+ tst r3,r3
+ bt .L_loop_epil_out
+
+ add #1,r3
+ shlr r4
+
+ bra .L_loop_epil_0
+ rotcr r6
+
+! Round mantissa
+.L_loop_epil_out:
+ shll8 r5
+ or r5,r4
+
+ mov.l .L_mant,r2
+ mov #23,r3
+
+ ! Check last bit shifted out of result
+ tst r6,r6
+ bt .L_epil_2
+
+ ! Round
+ shll r6
+ movt r5
+
+ add r5,r4
+
+ ! If this is the only ON bit shifted
+ ! Round towards LSB = 0
+ tst r6,r6
+ bf .L_epil_2
+
+ shlr r4
+ shll r4
+
+.L_epil_2:
+ ! Rounding may have produced extra MSB.
+ mov.l .L_25bit,r5
+ tst r4,r5
+
+ bt .L_epil_1
+
+ add #1,r7
+ shlr r4
+
+.L_epil_1:
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLL23 (r7)
+#else
+ shld r3,r7
+#endif
+
+ and r2,r4
+
+ or r7,r4
+ rts
+ or r4,r0
+
+.L_denorm:
+ mov #0,r3
+
+.L_den_1:
+ shlr r5
+ rotcr r4
+
+ cmp/eq r3,r7
+ bt .L_epil_0
+
+ bra .L_den_1
+ add #1,r7
+
+
+! Normalize the first argument
+.L_norm_op1:
+ shll r4
+ tst r3,r4
+
+ add #-1,r6
+ bt .L_norm_op1
+
+ ! The biasing is by 126
+ add #-126,r6
+ tst r7,r7
+
+ bt .L_norm_op2
+
+ bra .L_multiply
+ add #-127,r7
+
+! Normalize the second argument
+.L_norm_op2:
+ shll r5
+ tst r3,r5
+
+ add #-1,r7
+ bt .L_norm_op2
+
+ bra .L_multiply
+ add #-126,r7
+
+! op2 is zero. Check op1 for exceptional cases
+.L_op2_zero:
+ mov.l .L_inf,r2
+ and r2,r6
+
+ ! Check whether op1 is Inf/NaN
+ cmp/eq r2,r6
+ SL(bf, .L_ret_op2,
+ mov #1,r1)
+
+ ! Return NaN
+ rts
+ mov #-1,r0
+
+! Adjust the extra MSB
+.L_shift_by_1:
+ shlr r5
+ rotcr r4
+
+ add #1,r7 ! Show the shift in exponent
+
+ cmp/gt r3,r7
+ bf .L_epil
+
+ ! The resultant exponent is invalid
+ mov.l .L_inf,r1
+ rts
+ or r1,r0
+
+.L_ret_op1:
+ rts
+ or r4,r0
+
+! op1 is zero. Check op2 for exceptional cases
+.L_op1_zero:
+ mov.l .L_inf,r2
+ and r2,r7
+
+ ! Check whether op2 is Inf/NaN
+ cmp/eq r2,r7
+ SL(bf, .L_ret_op1,
+ mov #1,r1)
+
+ ! Return NaN
+ rts
+ mov #-1,r0
+
+.L_inv_op1:
+ mov.l .L_mant,r3
+ mov r4,r6
+
+ and r3,r6
+ tst r6,r6
+
+ bf .L_ret_op1 ! op1 is NaN
+ ! op1 is not NaN. It is Inf
+
+ cmp/eq r2,r7
+ bf .L_ret_op1 ! op2 has a valid exponent
+
+! op2 has an invalid exponent. It could be Inf, -Inf, or NaN.
+! It doesn't make any difference.
+.L_ret_op2:
+ rts
+ or r5,r0
+
+.L_ret_inf:
+ rts
+ or r2,r0
+
+.L_ret_zero:
+ mov #0,r2
+ rts
+ or r2,r0
+
+
+ .align 2
+.L_mant:
+ .long 0x007FFFFF
+
+.L_inf:
+ .long 0x7F800000
+
+.L_24bit:
+ .long 0x00800000
+
+.L_25bit:
+ .long 0x01000000
+
+.L_16bit:
+ .long 0x00008000
+
+.L_sign:
+ .long 0x80000000
+
+.L_sign_mask:
+ .long 0x7FFFFFFF
+
+.L_255:
+ .long 0x000000FF
+
+ENDFUNC (GLOBAL (mulsf3))
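The single-precision routine follows the same skeleton in 24-bit arithmetic: set the 24th bit, form the 48-bit product, adjust for an extra MSB (`.L_shift_by_1`), then round, with the "Round towards LSB = 0" step in `.L_epil_2` giving ties-to-even. A hypothetical sketch of the normal-number path only (NaN, infinity, denormal and overflow paths omitted; `soft_mulsf3` is an invented name):

```python
import struct

def soft_mulsf3(a, b):
    """Normal-number path of a single-precision multiply: 24x24-bit
    fraction product, normalize, round to nearest with ties to even."""
    ua = struct.unpack('<I', struct.pack('<f', a))[0]
    ub = struct.unpack('<I', struct.pack('<f', b))[0]
    sign = (ua ^ ub) & (1 << 31)
    ea = (ua >> 23) & 0xFF
    eb = (ub >> 23) & 0xFF
    fa = (ua & ((1 << 23) - 1)) | (1 << 23)   # set 24th bit, as .L_multiply does
    fb = (ub & ((1 << 23) - 1)) | (1 << 23)
    prod = fa * fb                            # 48-bit product
    exp = ea + eb - 127
    if prod >> 47:                            # extra MSB generated
        exp += 1                              # (the .L_shift_by_1 case)
    else:
        prod <<= 1
    frac = prod >> 24
    rest = prod & ((1 << 24) - 1)             # shifted-out guard/sticky bits
    half = 1 << 23
    if rest > half or (rest == half and frac & 1):
        frac += 1                             # ties round toward LSB = 0
    if frac >> 24:                            # extra MSB from rounding
        frac >>= 1
        exp += 1
    ur = sign | (exp << 23) | (frac & ((1 << 23) - 1))
    return struct.unpack('<f', struct.pack('<I', ur))[0]
```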
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/ieee-754-df.S gcc-4.5.0/gcc/config/sh/ieee-754-df.S
--- gcc-4.5.0/gcc/config/sh/ieee-754-df.S 1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/ieee-754-df.S 2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,789 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+!! libgcc software floating-point routines for Renesas SH /
+!! STMicroelectronics ST40 CPUs
+!! Contributed by Joern Rennecke joern.rennecke@st.com
+
+#ifndef __SH_FPU_DOUBLE__
+
+#include "lib1funcs.h"
+#include "insn-constants.h"
+
+/* Double-precision floating-point emulation.
+ We handle NANs, +-infinity, and +-zero.
+ However, we assume that for NANs, the topmost bit of the fraction is set. */
+
+#ifdef __LITTLE_ENDIAN__
+#define DBL0L r4
+#define DBL0H r5
+#define DBL1L r6
+#define DBL1H r7
+#define DBLRL r0
+#define DBLRH r1
+#else
+#define DBL0L r5
+#define DBL0H r4
+#define DBL1L r7
+#define DBL1H r6
+#define DBLRL r1
+#define DBLRH r0
+#endif
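As a sanity check on the register aliasing above: the two 32-bit words of a double swap places in memory depending on byte order, and (assuming the argument words land in r4/r5 in memory order, as the `#ifdef` implies) that is why `DBL0H`/`DBL0L` flip between the two branches. A small Python illustration:

```python
import struct

# 1.0 has high word 0x3FF00000 (sign + exponent side) and low word 0.
hi, lo = struct.unpack(">II", struct.pack(">d", 1.0))   # big-endian: high word first
assert (hi, lo) == (0x3FF00000, 0)

lo, hi = struct.unpack("<II", struct.pack("<d", 1.0))   # little-endian: low word first
assert (hi, lo) == (0x3FF00000, 0)
```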
+
+#ifdef __SH_FPU_ANY__
+#define RETURN_R0_MAIN
+#define RETURN_R0 bra LOCAL(return_r0)
+#define RETURN_FR0 \
+LOCAL(return_r0): \
+ lds r0,fpul; \
+ rts; \
+ fsts fpul,fr0
+#define ARG_TO_R4 \
+ flds fr4,fpul; \
+ sts fpul,r4
+#else /* ! __SH_FPU_ANY__ */
+#define RETURN_R0_MAIN rts
+#define RETURN_R0 rts
+#define RETURN_FR0
+#define ARG_TO_R4
+#endif /* ! __SH_FPU_ANY__ */
+
+#ifdef L_nedf2
+/* -ffinite-math-only -mb inline version, T := r4:DF == r6:DF
+ cmp/eq r5,r7
+ mov r4,r0
+ bf 0f
+ cmp/eq r4,r6
+ bt 0f
+ or r6,r0
+ add r0,r0
+ or r5,r0
+ tst r0,r0
+ 0: */
+ .balign 4
+ .global GLOBAL(nedf2)
+ HIDDEN_FUNC(GLOBAL(nedf2))
+GLOBAL(nedf2):
+ cmp/eq DBL0L,DBL1L
+ mov.l LOCAL(c_DF_NAN_MASK),r1
+ bf LOCAL(ne)
+ cmp/eq DBL0H,DBL1H
+ not DBL0H,r0
+ bt LOCAL(check_nan)
+ mov DBL0H,r0
+ or DBL1H,r0
+ add r0,r0
+ rts
+ or DBL0L,r0
+LOCAL(check_nan):
+ tst r1,r0
+ rts
+ movt r0
+LOCAL(ne):
+ rts
+ mov #1,r0
+ .balign 4
+LOCAL(c_DF_NAN_MASK):
+ .long DF_NAN_MASK
+ ENDFUNC(GLOBAL(nedf2))
+#endif /* L_nedf2 */
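The decision procedure of `nedf2` can be mirrored in a short Python sketch: compare raw bit patterns, treat equal patterns as equal unless the value is a NaN, and unequal patterns as unequal unless both operands are signed zeros. `DF_NAN_MASK` comes from `insn-constants.h`; the value below is an assumption consistent with the "topmost fraction bit set" convention stated in the header comment.

```python
import struct

DF_NAN_MASK = 0x7FF80000  # assumed value: exponent bits | top fraction bit

def bits64(x: float) -> int:
    """Raw IEEE-754 bit pattern of a double."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def nedf2(a: float, b: float) -> bool:
    """True iff a != b, using only integer operations on the bit patterns."""
    ba, bb = bits64(a), bits64(b)
    if ba == bb:
        # Equal bits mean equal values, unless the value is a NaN.
        return (~(ba >> 32) & DF_NAN_MASK) == 0
    # Unequal bits mean unequal values, unless both are +-0.0.
    return ((ba | bb) & ~(1 << 63)) != 0

assert nedf2(1.0, 2.0)
assert not nedf2(0.0, -0.0)          # +0.0 == -0.0 despite different bits
assert nedf2(float("nan"), float("nan"))
```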
+
+#ifdef L_unorddf2
+ .balign 4
+ .global GLOBAL(unorddf2)
+ HIDDEN_FUNC(GLOBAL(unorddf2))
+GLOBAL(unorddf2):
+ mov.l LOCAL(c_DF_NAN_MASK),r1
+ not DBL0H,r0
+ tst r1,r0
+ not r6,r0
+ bt LOCAL(unord)
+ tst r1,r0
+LOCAL(unord):
+ rts
+ movt r0
+ .balign 4
+LOCAL(c_DF_NAN_MASK):
+ .long DF_NAN_MASK
+ ENDFUNC(GLOBAL(unorddf2))
+#endif /* L_unorddf2 */
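`unorddf2` only has to look at the high word of each operand, which works because of the assumption stated above that NaNs always have the top fraction bit set. A Python rendition (with an assumed `DF_NAN_MASK` value):

```python
import struct

DF_NAN_MASK = 0x7FF80000  # assumed value: exponent bits | top fraction bit

def unorddf2(a: float, b: float) -> bool:
    """True iff either argument is a NaN, checking only the high word."""
    def is_nan(x: float) -> bool:
        hi = struct.unpack("<Q", struct.pack("<d", x))[0] >> 32
        return (~hi & DF_NAN_MASK) == 0   # all mask bits set -> NaN
    return is_nan(a) or is_nan(b)

assert unorddf2(float("nan"), 1.0)
assert not unorddf2(float("inf"), -0.0)   # infinities are ordered
```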
+
+#if defined(L_gtdf2t) || defined(L_gtdf2t_trap)
+#ifdef L_gtdf2t
+#define fun_label GLOBAL(gtdf2t)
+#else
+#define fun_label GLOBAL(gtdf2t_trap)
+#endif
+ .balign 4
+ .global fun_label
+ HIDDEN_FUNC(fun_label)
+fun_label:
+ /* If the raw values compare greater, the result is true, unless
+ either of them is a NaN (but infinity is fine), or both values are
+ +- zero. Otherwise, the result is false. */
+ mov.l LOCAL(c_DF_NAN_MASK),r1
+ cmp/pz DBL0H
+ not DBL1H,r0
+ SLC(bf, LOCAL(neg),
+ tst r1,r0)
+ mov DBL0H,r0
+ bt LOCAL(nan) /* return zero if DBL1 is NAN. */
+ cmp/eq DBL1H,DBL0H
+ bt LOCAL(cmp_low)
+ cmp/gt DBL1H,DBL0H
+ or DBL1H,r0
+ SLC(bf, LOCAL(check_nan),
+ cmp/gt DBL0H,r1)
+ add r0,r0
+ bf LOCAL(nan) /* return zero if DBL0 is NAN. */
+ or DBL0L,r0
+ rts
+ or DBL1L,r0 /* non-zero unless both DBL0 and DBL1 are +-zero. */
+LOCAL(cmp_low):
+ cmp/hi DBL1L,DBL0L
+ rts
+ movt r0
+LOCAL(neg):
+ SLI(tst r1,r0)
+ bt LOCAL(nan) /* return zero if DBL1 is NAN. */
+ cmp/eq DBL1H,DBL0H
+ SLC(bt, LOCAL(neg_cmp_low),
+ cmp/hi DBL0L,DBL1L)
+ not DBL0H,r0
+ tst r1,r0
+ bt LOCAL(nan) /* return zero if DBL0 is NAN. */
+ cmp/hi DBL0H,DBL1H
+ SLI(rts !,)
+ SLI(movt r0 !,)
+LOCAL(neg_cmp_low):
+ SLI(cmp/hi DBL0L,DBL1L)
+ rts
+ movt r0
+LOCAL(check_nan):
+#ifdef L_gtdf2t
+LOCAL(nan):
+ rts
+ mov #0,r0
+#else
+ SLI(cmp/gt DBL0H,r1)
+ bf LOCAL(nan) /* return zero if DBL0 is NAN. */
+ rts
+ mov #0,r0
+LOCAL(nan):
+ mov #0,r0
+ trapa #0
+#endif
+ .balign 4
+LOCAL(c_DF_NAN_MASK):
+ .long DF_NAN_MASK
+ ENDFUNC(fun_label)
+#endif /* defined(L_gtdf2t) || defined(L_gtdf2t_trap) */
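The ordered-greater comparison that `gtdf2t` implements rests on the fact that IEEE-754 values order like sign-magnitude integers. A float-level sketch of the same decision procedure (the assembly compares word by word and tests the NaN mask instead of using Python's NaN semantics):

```python
import struct

def bits64(x: float) -> int:
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def gtdf2t(a: float, b: float) -> bool:
    """Ordered a > b via integer comparison of the bit patterns."""
    if a != a or b != b:                      # any NaN -> false
        return False
    ba, bb = bits64(a), bits64(b)
    if ((ba | bb) & ~(1 << 63)) == 0:         # +0.0 and -0.0 compare equal
        return False
    # Map sign-magnitude to a monotone key: flip all bits of negatives,
    # set the top bit of non-negatives.
    key = lambda v: v ^ ((1 << 64) - 1) if v >> 63 else v | (1 << 63)
    return key(ba) > key(bb)

assert gtdf2t(2.0, 1.0) and gtdf2t(-1.0, -2.0)
assert not gtdf2t(0.0, -0.0)
assert not gtdf2t(float("nan"), 1.0)
```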
+
+#ifdef L_gedf2f
+ .balign 4
+ .global GLOBAL(gedf2f)
+ HIDDEN_FUNC(GLOBAL(gedf2f))
+GLOBAL(gedf2f):
+ /* If the raw values compare greater or equal, the result is
+ true, unless any of them is a nan, or both are the
+ same infinity. If both are -+zero, the result is true;
+ otherwise, it is false.
+ We use 0 as true and nonzero as false for this function. */
+ mov.l LOCAL(c_DF_NAN_MASK),r1
+ cmp/pz DBL1H
+ not DBL0H,r0
+ SLC(bf, LOCAL(neg),
+ tst r1,r0)
+ mov DBL0H,r0
+ bt LOCAL(nan)
+ cmp/eq DBL0H,DBL1H
+ bt LOCAL(cmp_low)
+ cmp/gt DBL0H,DBL1H
+ or DBL1H,r0
+ SLC(bf, LOCAL(check_nan),
+ cmp/ge r1,DBL1H)
+ add r0,r0
+ bt LOCAL(nan)
+ or DBL0L,r0
+ rts
+ or DBL1L,r0
+LOCAL(cmp_low):
+ cmp/hi DBL0L,DBL1L
+#if defined(L_gedf2f) && defined(DELAYED_BRANCHES)
+LOCAL(nan): LOCAL(check_nan):
+#endif
+ rts
+ movt r0
+#if defined(L_gedf2f) && ! defined(DELAYED_BRANCHES)
+LOCAL(check_nan):
+ SLI(cmp/ge r1,DBL1H)
+LOCAL(nan):
+ rts
+ movt r0
+#elif defined(L_gedf2f_trap)
+LOCAL(check_nan):
+ SLI(cmp/ge r1,DBL1H)
+ bt LOCAL(nan)
+ rts
+LOCAL(nan):
+ movt r0
+ trapa #0
+#endif /* L_gedf2f_trap */
+LOCAL(neg):
+ SLI(tst r1,r0)
+ bt LOCAL(nan)
+ cmp/eq DBL0H,DBL1H
+ not DBL1H,r0
+ SLC(bt, LOCAL(neg_cmp_low),
+ cmp/hi DBL1L,DBL0L)
+ tst r1,r0
+ bt LOCAL(nan)
+ cmp/hi DBL1H,DBL0H
+ SLI(rts !,)
+ SLI(movt r0 !,)
+LOCAL(neg_cmp_low):
+ SLI(cmp/hi DBL1L,DBL0L)
+ rts
+ movt r0
+ .balign 4
+LOCAL(c_DF_NAN_MASK):
+ .long DF_NAN_MASK
+ ENDFUNC(GLOBAL(gedf2f))
+#endif /* L_gedf2f */
+
+#ifndef DYN_SHIFT /* SH1 / SH2 code */
+#ifdef L_extendsfdf2
+ .balign 4
+ .global GLOBAL(extendsfdf2)
+ FUNC(GLOBAL(extendsfdf2))
+GLOBAL(extendsfdf2):
+ ARG_TO_R4
+ mov.l LOCAL(x7f800000),r3
+ mov r4,DBLRL
+ tst r3,r4
+ bt LOCAL(zero_denorm)
+ mov.l LOCAL(xe0000000),r2
+ rotr DBLRL
+ rotr DBLRL
+ rotr DBLRL
+ and r2,DBLRL
+ mov r4,DBLRH
+ not r4,r2
+ tst r3,r2
+ mov.l LOCAL(x38000000),r2
+ bf 0f
+ add r2,r2 ! infinity / NaN adjustment
+0: shll DBLRH
+ shlr2 DBLRH
+ shlr2 DBLRH
+ add DBLRH,DBLRH
+ rotcr DBLRH
+ rts
+ add r2,DBLRH
+LOCAL(zero_denorm):
+ mov.l r4,@-r15
+ add r4,r4
+ tst r4,r4
+ bt LOCAL(zero)
+ shlr8 r3 /* 0x007f8000 */
+ mov.w LOCAL(x389),r2
+LOCAL(shift_byte):
+ tst r3,r4
+ shll8 r4
+ SL(bt, LOCAL(shift_byte),
+ add #-8,r2)
+LOCAL(shift_bit):
+ shll r4
+ SL(bf, LOCAL(shift_bit),
+ add #-1,r2)
+ mov #0,DBLRL
+ mov r4,DBLRH
+ mov.l @r15+,r4
+ shlr8 DBLRH
+ shlr2 DBLRH
+ shlr DBLRH
+ rotcr DBLRL
+ cmp/gt r4,DBLRH ! get sign
+ rotcr DBLRH
+ rotcr DBLRL
+ shll16 r2
+ shll8 r2
+ rts
+ add r2,DBLRH
+LOCAL(zero):
+ mov.l @r15+,DBLRH
+ rts
+ mov #0,DBLRL
+LOCAL(x389): .word 0x389
+ .balign 4
+LOCAL(x7f800000):
+ .long 0x7f800000
+LOCAL(x38000000):
+ .long 0x38000000
+LOCAL(xe0000000):
+ .long 0xe0000000
+ ENDFUNC(GLOBAL(extendsfdf2))
+#endif /* L_extendsfdf2 */
+
+#ifdef L_truncdfsf2
+ .balign 4
+ .global GLOBAL(truncdfsf2)
+ FUNC(GLOBAL(truncdfsf2))
+GLOBAL(truncdfsf2):
+ mov.l LOCAL(x38000000),r3 ! exponent adjustment DF -> SF
+ mov DBL0H,r1
+ mov.l LOCAL(x70000000),r2 ! mask for out-of-range exponent bits
+ mov DBL0H,r0
+ mov.l DBL0L,@-r15
+ sub r3,r1
+ tst r2,r1
+ shll8 r0 !
+ shll2 r0 ! Isolate highpart fraction.
+ shll2 r0 !
+ bf LOCAL(ill_exp)
+ shll2 r1
+ mov.l LOCAL(x2fffffff),r2 /* Fraction lsb | lower guard bits. */
+ shll2 r1
+ mov.l LOCAL(xff000000),r3
+ shlr8 r0
+ tst r2,DBL0L /* Check if msb guard bit wants rounding up. */
+ shlr16 DBL0L
+ shlr8 DBL0L
+ shlr2 DBL0L
+ SL1(bt, LOCAL(add_frac),
+ shlr2 DBL0L)
+ add #1,DBL0L
+LOCAL(add_frac):
+ add DBL0L,r0
+ mov.l LOCAL(x01000000),r2
+ and r3,r1
+ mov.l @r15+,DBL0L
+ add r1,r0
+ tst r3,r0
+ bt LOCAL(inf_denorm0)
+ cmp/hs r3,r0
+LOCAL(denorm_noup_sh1):
+ bt LOCAL(inf)
+ div0s DBL0H,r2 /* copy orig. sign into T. */
+RETURN_R0_MAIN
+ rotcr r0
+RETURN_FR0
+LOCAL(inf_denorm0): ! We might need to undo previous rounding.
+ mov.l LOCAL(x2fffffff),r3 /* Old fraction lsb | lower guard bits. */
+ tst r1,r1
+ bf LOCAL(inf)
+ add #-1,r0
+ tst r3,DBL0L /* Check if msb guard bit was rounded up. */
+ mov.l LOCAL(x5fffffff),r3 /* Fraction lsb | lower guard bits. */
+ addc r2,r0
+ shlr r0
+ tst r3,DBL0L /* Check if msb guard bit wants rounding up. */
+#ifdef DELAYED_BRANCHES
+ bt/s LOCAL(denorm_noup)
+#else
+ bt LOCAL(denorm_noup_sh1)
+#endif
+ div0s DBL0H,r2 /* copy orig. sign into T. */
+ add #1,r0
+LOCAL(denorm_noup):
+ RETURN_R0
+ rotcr r0
+LOCAL(ill_exp):
+ div0s DBL0H,r1
+ mov.l LOCAL(x7ff80000),r2
+ add r1,r1
+ bf LOCAL(inf_nan)
+ mov.w LOCAL(m32),r3 /* Handle denormal or zero. */
+ shlr16 r1
+ exts.w r1,r1
+ shll2 r1
+ add r1,r1
+ shlr8 r1
+ exts.w r1,r1
+ add #-8,r1 /* Go from 9 to 1 guard bit in MSW. */
+ cmp/gt r3,r1
+ mov.l @r15+,r3 /* DBL0L */
+ bf LOCAL(zero)
+ mov.l DBL0L, @-r15
+ shll8 DBL0L
+ rotcr r0 /* Insert leading 1. */
+ shlr16 r3
+ shll2 r3
+ add r3,r3
+ shlr8 r3
+ cmp/pl DBL0L /* Check lower 23 guard bits if guard bit 23 is 0. */
+ addc r3,r0 /* Assemble fraction with compressed guard bits. */
+ mov.l @r15+,DBL0L
+ mov #0,r2
+ neg r1,r1
+LOCAL(denorm_loop):
+ shlr r0
+ rotcl r2
+ dt r1
+ bf LOCAL(denorm_loop)
+ tst #2,r0
+ rotcl r0
+ tst r2,r2
+ rotcl r0
+ xor #3,r0
+ add #3,r0 /* Even overflow gives the correct result. */
+ shlr2 r0
+ div0s r0,DBL0H
+ RETURN_R0
+ rotcr r0
+LOCAL(zero):
+ mov #0,r0
+ div0s r0,DBL0H
+ RETURN_R0
+ rotcr r0
+LOCAL(inf_nan):
+ not DBL0H,r0
+ tst r2,r0
+ mov.l @r15+,DBL0L
+ bf LOCAL(inf)
+ RETURN_R0
+ mov #-1,r0 /* NAN */
+LOCAL(inf): /* r2 must be positive here. */
+ mov.l LOCAL(xffe00000),r0
+ div0s r2,DBL0H
+ RETURN_R0
+ rotcr r0
+LOCAL(m32):
+ .word -32
+ .balign 4
+LOCAL(x38000000):
+ .long 0x38000000
+LOCAL(x70000000):
+ .long 0x70000000
+LOCAL(x2fffffff):
+ .long 0x2fffffff
+LOCAL(x01000000):
+ .long 0x01000000
+LOCAL(xff000000):
+ .long 0xff000000
+LOCAL(x5fffffff):
+ .long 0x5fffffff
+LOCAL(x7ff80000):
+ .long 0x7ff80000
+LOCAL(xffe00000):
+ .long 0xffe00000
+ ENDFUNC(GLOBAL(truncdfsf2))
+#endif /* L_truncdfsf2 */
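The rounding that `truncdfsf2` performs with its guard-bit masks can be restated at the bit level: keep the top 23 fraction bits, round to nearest with ties to even on the 29 discarded bits, and bump the exponent when rounding overflows the fraction. A sketch for in-range normal results (overflow, denormals, infinities and NaNs take the other paths above):

```python
import struct

def truncdfsf2(df: int) -> int:
    """Narrow DF bits to SF bits with round-to-nearest-even;
    in-range normal results only in this sketch."""
    sign = df >> 63
    e = ((df >> 52) & 0x7FF) - 1023 + 127
    assert 0 < e < 0xFF, "in-range normal results only"
    mant = (1 << 52) | (df & ((1 << 52) - 1))     # insert implicit leading 1
    keep, rest = mant >> 29, mant & ((1 << 29) - 1)
    if rest > (1 << 28) or (rest == (1 << 28) and keep & 1):
        keep += 1                                 # round up (ties to even)
        if keep >> 24:                            # rounding overflowed the fraction
            keep >>= 1
            e += 1
    return (sign << 31) | (e << 23) | (keep & 0x7FFFFF)

for v in (1.1, 3.141592653589793, -123456.789):
    df = struct.unpack("<Q", struct.pack("<d", v))[0]
    sf = struct.unpack("<I", struct.pack("<f", v))[0]
    assert truncdfsf2(df) == sf
```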
+#ifdef L_add_sub_df3
+#include "IEEE-754/adddf3.S"
+#endif /* _add_sub_df3 */
+
+#ifdef L_muldf3
+#include "IEEE-754/muldf3.S"
+#endif /* L_muldf3 */
+
+#ifdef L_fixunsdfsi
+#include "IEEE-754/fixunsdfsi.S"
+#endif /* L_fixunsdfsi */
+
+#ifdef L_fixdfsi
+#include "IEEE-754/fixdfsi.S"
+#endif /* L_fixdfsi */
+
+#ifdef L_floatunssidf
+#include "IEEE-754/floatunssidf.S"
+#endif /* L_floatunssidf */
+
+#ifdef L_floatsidf
+#include "IEEE-754/floatsidf.S"
+#endif /* L_floatsidf */
+
+#ifdef L_divdf3
+#include "IEEE-754/divdf3.S"
+#endif /* L_divdf3 */
+#endif /* ! DYN_SHIFT */
+
+/* The actual arithmetic uses dynamic shift. Supporting SH1 / SH2 here would
+ make this code too hard to maintain, so if you want to add SH1 / SH2
+ support, do it in a separate copy. */
+#ifdef DYN_SHIFT
+#ifdef L_extendsfdf2
+ .balign 4
+ .global GLOBAL(extendsfdf2)
+ FUNC(GLOBAL(extendsfdf2))
+GLOBAL(extendsfdf2):
+ ARG_TO_R4
+ mov.l LOCAL(x7f800000),r2
+ mov #29,r3
+ mov r4,DBLRL
+ not r4,DBLRH
+ tst r2,r4
+ shld r3,DBLRL
+ bt LOCAL(zero_denorm)
+ mov #-3,r3
+ tst r2,DBLRH
+ mov r4,DBLRH
+ mov.l LOCAL(x38000000),r2
+ bt/s LOCAL(inf_nan)
+ shll DBLRH
+ shld r3,DBLRH
+ rotcr DBLRH
+ rts
+ add r2,DBLRH
+ .balign 4
+LOCAL(inf_nan):
+ shld r3,DBLRH
+ add r2,r2
+ rotcr DBLRH
+ rts
+ add r2,DBLRH
+LOCAL(zero_denorm):
+ mov.l r4,@-r15
+ add r4,r4
+ tst r4,r4
+ extu.w r4,r2
+ bt LOCAL(zero)
+ cmp/eq r4,r2
+ extu.b r4,r1
+ bf/s LOCAL(three_bytes)
+ mov.l LOCAL(c__clz_tab),r0
+ cmp/eq r4,r1
+ mov #22,DBLRH
+ bt LOCAL(one_byte)
+ shlr8 r2
+ mov #14,DBLRH
+LOCAL(one_byte):
+#ifdef __pic__
+ add r0,r2
+ mova LOCAL(c__clz_tab),r0
+#endif
+ mov.b @(r0,r2),r2
+ mov #21,r3
+ mov.w LOCAL(x0),DBLRL
+ sub r2,DBLRH
+LOCAL(norm_shift):
+ shld DBLRH,r4
+ mov.l @r15+,r2
+ shld r3,DBLRH
+ mov.l LOCAL(xb7ffffff),r3
+ add r4,DBLRH
+ cmp/pz r2
+ mov r2,r4
+ rotcr DBLRH
+ rts
+ sub r3,DBLRH
+LOCAL(three_bytes):
+ mov r4,r2
+ shlr16 r2
+#ifdef __pic__
+ add r0,r2
+ mova LOCAL(c__clz_tab),r0
+#endif
+ mov.b @(r0,r2),r2
+ mov #21,r3
+ mov #6-32,DBLRH
+ sub r2,DBLRH
+ mov r4,DBLRL
+ shld DBLRH,DBLRL
+ bra LOCAL(norm_shift)
+ add #32,DBLRH
+LOCAL(zero):
+ rts /* DBLRL has already been zeroed above. */
+ mov.l @r15+,DBLRH
+LOCAL(x0):
+ .word 0
+ .balign 4
+LOCAL(x7f800000):
+ .long 0x7f800000
+LOCAL(x38000000):
+ .long 0x38000000
+LOCAL(xb7ffffff):
+ /* Flip sign back, do exponent adjustment, and remove leading one. */
+ .long 0x80000000 + 0x38000000 - 1
+LOCAL(c__clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+ ENDFUNC(GLOBAL(extendsfdf2))
+#endif /* L_extendsfdf2 */
+
+#ifdef L_truncdfsf2
+ .balign 4
+ .global GLOBAL(truncdfsf2)
+ FUNC(GLOBAL(truncdfsf2))
+GLOBAL(truncdfsf2):
+ mov.l LOCAL(x38000000),r3
+ mov DBL0H,r1
+ mov.l LOCAL(x70000000),r2
+ mov DBL0H,r0
+ sub r3,r1
+ mov.l DBL0L,@-r15
+ tst r2,r1
+ mov #12,r3
+ shld r3,r0 ! Isolate highpart fraction.
+ bf LOCAL(ill_exp)
+ shll2 r1
+ mov.l LOCAL(x2fffffff),r2 /* Fraction lsb | lower guard bits. */
+ shll2 r1
+ mov.l LOCAL(xff000000),r3
+ shlr8 r0
+ tst r2,DBL0L /* Check if msb guard bit wants rounding up. */
+ mov #-28,r2
+ bt/s LOCAL(add_frac)
+ shld r2,DBL0L
+ add #1,DBL0L
+LOCAL(add_frac):
+ add DBL0L,r0
+ mov.l LOCAL(x01000000),r2
+ and r3,r1
+ mov.l @r15+,DBL0L
+ add r1,r0
+ tst r3,r0
+ bt LOCAL(inf_denorm0)
+#if 0 // No point checking overflow -> infinity if we don't raise a signal.
+ cmp/hs r3,r0
+ bt LOCAL(inf)
+#endif
+ div0s DBL0H,r2 /* copy orig. sign into T. */
+ RETURN_R0_MAIN
+ rotcr r0
+RETURN_FR0
+LOCAL(inf_denorm0): ! We might need to undo previous rounding.
+ mov.l LOCAL(x2fffffff),r3 /* Old fraction lsb | lower guard bits. */
+ tst r1,r1
+ bf LOCAL(inf)
+ add #-1,r0
+ tst r3,DBL0L /* Check if msb guard bit was rounded up. */
+ mov.l LOCAL(x5fffffff),r3 /* Fraction lsb | lower guard bits. */
+ addc r2,r0
+ shlr r0
+ tst r3,DBL0L /* Check if msb guard bit wants rounding up. */
+ bt/s LOCAL(denorm_noup)
+ div0s DBL0H,r2 /* copy orig. sign into T. */
+ add #1,r0
+LOCAL(denorm_noup):
+ RETURN_R0
+ rotcr r0
+LOCAL(ill_exp):
+ div0s DBL0H,r1
+ mov.l LOCAL(x7ff80000),r2
+ add r1,r1
+ bf LOCAL(inf_nan)
+ mov.w LOCAL(m32),r3 /* Handle denormal or zero. */
+ mov #-21,r2
+ shad r2,r1
+ add #-8,r1 /* Go from 9 to 1 guard bit in MSW. */
+ cmp/gt r3,r1
+ mov.l @r15+,r3 /* DBL0L */
+ bf LOCAL(zero)
+ mov.l DBL0L, @-r15
+ shll8 DBL0L
+ rotcr r0 /* Insert leading 1. */
+ shld r2,r3
+ cmp/pl DBL0L /* Check lower 23 guard bits if guard bit 23 is 0. */
+ addc r3,r0 /* Assemble fraction with compressed guard bits. */
+ mov r0,r2
+ shld r1,r0
+ mov.l @r15+,DBL0L
+ add #32,r1
+ shld r1,r2
+ tst #2,r0
+ rotcl r0
+ tst r2,r2
+ rotcl r0
+ xor #3,r0
+ add #3,r0 /* Even overflow gives the correct result. */
+ shlr2 r0
+ div0s r0,DBL0H
+ RETURN_R0
+ rotcr r0
+LOCAL(zero):
+ mov #0,r0
+ div0s r0,DBL0H
+ RETURN_R0
+ rotcr r0
+LOCAL(inf_nan):
+ not DBL0H,r0
+ tst r2,r0
+ mov.l @r15+,DBL0L
+ bf LOCAL(inf)
+ RETURN_R0
+ mov #-1,r0 /* NAN */
+LOCAL(inf): /* r2 must be positive here. */
+ mov.l LOCAL(xffe00000),r0
+ div0s r2,DBL0H
+ RETURN_R0
+ rotcr r0
+LOCAL(m32):
+ .word -32
+ .balign 4
+LOCAL(x38000000):
+ .long 0x38000000
+LOCAL(x70000000):
+ .long 0x70000000
+LOCAL(x2fffffff):
+ .long 0x2fffffff
+LOCAL(x01000000):
+ .long 0x01000000
+LOCAL(xff000000):
+ .long 0xff000000
+LOCAL(x5fffffff):
+ .long 0x5fffffff
+LOCAL(x7ff80000):
+ .long 0x7ff80000
+LOCAL(xffe00000):
+ .long 0xffe00000
+ ENDFUNC(GLOBAL(truncdfsf2))
+#endif /* L_truncdfsf2 */
+
+
+#ifdef L_add_sub_df3
+#include "IEEE-754/m3/adddf3.S"
+#endif /* _add_sub_df3 */
+
+#ifdef L_muldf3
+#include "IEEE-754/m3/muldf3.S"
+#endif /* L_muldf3 */
+
+#ifdef L_fixunsdfsi
+#include "IEEE-754/m3/fixunsdfsi.S"
+#endif /* L_fixunsdfsi */
+
+#ifdef L_fixdfsi
+#include "IEEE-754/m3/fixdfsi.S"
+#endif /* L_fixdfsi */
+
+#ifdef L_floatunssidf
+#include "IEEE-754/m3/floatunssidf.S"
+#endif /* L_floatunssidf */
+
+#ifdef L_floatsidf
+#include "IEEE-754/m3/floatsidf.S"
+#endif /* L_floatsidf */
+
+#ifdef L_divdf3
+#include "IEEE-754/m3/divdf3.S"
+#endif /* L_divdf3 */
+#endif /* DYN_SHIFT */
+
+#endif /* __SH_FPU_DOUBLE__ */
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/ieee-754-sf.S gcc-4.5.0/gcc/config/sh/ieee-754-sf.S
--- gcc-4.5.0/gcc/config/sh/ieee-754-sf.S 1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/ieee-754-sf.S 2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,697 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+!! libgcc software floating-point routines for Renesas SH /
+!! STMicroelectronics ST40 CPUs
+!! Contributed by J"orn Rennecke joern.rennecke@st.com
+
+#ifndef __SH_FPU_ANY__
+
+#include "lib1funcs.h"
+#include "insn-constants.h"
+
+/* Single-precision floating-point emulation.
+ We handle NANs, +-infinity, and +-zero.
+ However, we assume that for NANs, the topmost bit of the fraction is set. */
+#ifdef L_nesf2
+/* -ffinite-math-only inline version, T := r4:SF == r5:SF
+ cmp/eq r4,r5
+ mov r4,r0
+ bt 0f
+ or r5,r0
+ add r0,r0
+ tst r0,r0 ! test for +0.0 == -0.0 ; -0.0 == +0.0
+ 0: */
+ .balign 4
+ .global GLOBAL(nesf2)
+ HIDDEN_FUNC(GLOBAL(nesf2))
+GLOBAL(nesf2):
+ /* If the raw values are unequal, the result is unequal, unless
+ both values are +-zero.
+ If the raw values are equal, the result is equal, unless
+ the values are NaN. */
+ cmp/eq r4,r5
+ mov.l LOCAL(c_SF_NAN_MASK),r1
+ not r4,r0
+ bt LOCAL(check_nan)
+ mov r4,r0
+ or r5,r0
+ rts
+ add r0,r0
+LOCAL(check_nan):
+ tst r1,r0
+ rts
+ movt r0
+ .balign 4
+LOCAL(c_SF_NAN_MASK):
+ .long SF_NAN_MASK
+ ENDFUNC(GLOBAL(nesf2))
+#endif /* L_nesf2 */
+
+#ifdef L_unordsf2
+ .balign 4
+ .global GLOBAL(unordsf2)
+ HIDDEN_FUNC(GLOBAL(unordsf2))
+GLOBAL(unordsf2):
+ mov.l LOCAL(c_SF_NAN_MASK),r1
+ not r4,r0
+ tst r1,r0
+ not r5,r0
+ bt LOCAL(unord)
+ tst r1,r0
+LOCAL(unord):
+ rts
+ movt r0
+ .balign 4
+LOCAL(c_SF_NAN_MASK):
+ .long SF_NAN_MASK
+ ENDFUNC(GLOBAL(unordsf2))
+#endif /* L_unordsf2 */
+
+#if defined(L_gtsf2t) || defined(L_gtsf2t_trap)
+/* -ffinite-math-only inline version, T := r4:SF > r5:SF ? 0 : 1
+ cmp/pz r4
+ mov r4,r0
+ bf/s 0f
+ cmp/hs r5,r4
+ cmp/ge r4,r5
+ or r5,r0
+ bt 0f
+ add r0,r0
+ tst r0,r0
+ 0: */
+#ifdef L_gtsf2t
+#define fun_label GLOBAL(gtsf2t)
+#else
+#define fun_label GLOBAL(gtsf2t_trap)
+#endif
+ .balign 4
+ .global fun_label
+ HIDDEN_FUNC(fun_label)
+fun_label:
+ /* If the raw values compare greater, the result is true, unless
+ either of them is a NaN (but infinity is fine), or both values are
+ +- zero. Otherwise, the result is false. */
+ mov.l LOCAL(c_SF_NAN_MASK),r1
+ cmp/pz r4
+ not r5,r0
+ SLC(bf, LOCAL(neg),
+ tst r1,r0)
+ mov r4,r0
+ bt LOCAL(nan)
+ cmp/gt r5,r4
+ SLC(bf, LOCAL(check_nan),
+ cmp/gt r4,r1)
+ bf LOCAL(nan)
+ or r5,r0
+ rts
+ add r0,r0
+LOCAL(neg):
+ SLI(tst r1,r0)
+ bt LOCAL(nan)
+ not r4,r0
+ tst r1,r0
+ bt LOCAL(nan)
+ cmp/hi r4,r5
+#if defined(L_gtsf2t) && defined(DELAYED_BRANCHES)
+LOCAL(check_nan):
+#endif /* DELAYED_BRANCHES */
+ rts
+ movt r0
+#ifdef L_gtsf2t
+LOCAL(check_nan):
+LOCAL(nan):
+ rts
+ mov #0,r0
+#else /* ! L_gtsf2t */
+LOCAL(check_nan):
+ SLI(cmp/gt r4,r1)
+ bf LOCAL(nan)
+ rts
+ movt r0
+LOCAL(nan):
+ mov #0,r0
+ trapa #0
+#endif /* ! L_gtsf2t */
+ .balign 4
+LOCAL(c_SF_NAN_MASK):
+ .long SF_NAN_MASK
+ ENDFUNC(fun_label)
+#endif /* defined(L_gtsf2t) || defined(L_gtsf2t_trap) */
+
+#if defined(L_gesf2f) || defined(L_gesf2f_trap)
+/* -ffinite-math-only inline version, T := r4:SF >= r5:SF
+ cmp/pz r5
+ mov r4,r0
+ bf/s 0f
+ cmp/hs r4,r5
+ cmp/ge r5,r4
+ or r5,r0
+ bt 0f
+ add r0,r0
+ tst r0,r0
+ 0: */
+#ifdef L_gesf2f
+#define fun_label GLOBAL(gesf2f)
+#else
+#define fun_label GLOBAL(gesf2f_trap)
+#endif
+ .balign 4
+ .global fun_label
+ HIDDEN_FUNC(fun_label)
+fun_label:
+ /* If the raw values compare greater or equal, the result is
+ true, unless any of them is a nan. If both are -+zero, the
+ result is true; otherwise, it is false.
+ We use 0 as true and nonzero as false for this function. */
+ mov.l LOCAL(c_SF_NAN_MASK),r1
+ cmp/pz r5
+ not r4,r0
+ SLC(bf, LOCAL(neg),
+ tst r1,r0)
+ mov r4,r0
+ bt LOCAL(nan)
+ cmp/gt r4,r5
+ SLC(bf, LOCAL(check_nan),
+ cmp/ge r1,r5)
+ bt LOCAL(nan)
+ or r5,r0
+ rts
+ add r0,r0
+LOCAL(neg):
+ SLI(tst r1,r0)
+ bt LOCAL(nan)
+ not r5,r0
+ tst r1,r0
+ bt LOCAL(nan)
+ cmp/hi r5,r4
+#if defined(L_gesf2f) && defined(DELAYED_BRANCHES)
+LOCAL(nan): LOCAL(check_nan):
+#endif
+ rts
+ movt r0
+#if defined(L_gesf2f) && ! defined(DELAYED_BRANCHES)
+LOCAL(check_nan):
+ cmp/ge r1,r5
+LOCAL(nan):
+ rts
+ movt r0
+#endif /* ! DELAYED_BRANCHES */
+#ifdef L_gesf2f_trap
+LOCAL(check_nan):
+ SLI(cmp/ge r1,r5)
+ bt LOCAL(nan)
+ rts
+LOCAL(nan):
+ movt r0
+ trapa #0
+#endif /* L_gesf2f_trap */
+ .balign 4
+LOCAL(c_SF_NAN_MASK):
+ .long SF_NAN_MASK
+ ENDFUNC(fun_label)
+#endif /* defined(L_gesf2f) || defined(L_gesf2f_trap) */
+
+#ifndef DYN_SHIFT /* SH1 / SH2 code */
+#ifdef L_add_sub_sf3
+#include "IEEE-754/addsf3.S"
+#endif /* _add_sub_sf3 */
+
+#ifdef L_mulsf3
+#include "IEEE-754/mulsf3.S"
+#endif /* L_mulsf3 */
+
+#ifdef L_fixunssfsi
+#include "IEEE-754/fixunssfsi.S"
+#endif /* L_fixunssfsi */
+
+#ifdef L_fixsfsi
+#include "IEEE-754/fixsfsi.S"
+#endif /* L_fixsfsi */
+
+#ifdef L_floatunssisf
+#include "IEEE-754/floatunssisf.S"
+#endif /* L_floatunssisf */
+
+#ifdef L_floatsisf
+#include "IEEE-754/floatsisf.S"
+#endif /* L_floatsisf */
+
+#ifdef L_divsf3
+#include "IEEE-754/divsf3.S"
+#endif /* L_divsf3 */
+#endif /* ! DYN_SHIFT */
+
+/* The actual arithmetic uses dynamic shift. Supporting SH1 / SH2 here would
+ make this code too hard to maintain, so if you want to add SH1 / SH2
+ support, do it in a separate copy. */
+#ifdef DYN_SHIFT
+#ifdef L_add_sub_sf3
+#include "IEEE-754/m3/addsf3.S"
+#endif /* L_add_sub_sf3 */
+
+#ifdef L_mulsf3
+#include "IEEE-754/m3/mulsf3.S"
+#endif /* L_mulsf3 */
+
+#ifdef L_fixunssfsi
+ ! What is a bit unusual about this implementation is that the
+ ! sign bit influences the result for NANs: for cleared sign bit, you
+ ! get UINT_MAX, for set sign bit, you get 0.
+ ! However, since the result for NANs is undefined, this should be no
+ ! problem.
+ ! N.B. This is scheduled both for SH4-200 and SH4-300
+ .balign 4
+ .global GLOBAL(fixunssfsi)
+ FUNC(GLOBAL(fixunssfsi))
+GLOBAL(fixunssfsi):
+ mov.l LOCAL(max),r2
+ mov #-23,r1
+ mov r4,r0
+ shad r1,r4
+ mov.l LOCAL(mask),r1
+ add #-127,r4
+ cmp/ge r2,r0
+ or r2,r0
+ bt LOCAL(retmax)
+ cmp/pz r4
+ and r1,r0
+ bf LOCAL(ret0)
+ add #-23,r4
+ rts
+ shld r4,r0
+LOCAL(ret0):
+LOCAL(retmax):
+ rts
+ subc r0,r0
+ .balign 4
+LOCAL(mask):
+ .long 0x00ffffff
+LOCAL(max):
+ .long 0x4f800000
+ ENDFUNC(GLOBAL(fixunssfsi))
+#endif /* L_fixunssfsi */
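The conversion above extracts the exponent with an arithmetic shift, inserts the implicit leading one into the fraction, and shifts by `exponent - 23` (a negative count shifting right). A Python sketch of the same flow, including the saturation behavior described in the comment (values at or above 2**32, +Inf and positive NaNs give UINT_MAX; negative inputs give 0):

```python
import struct

def sf_bits(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]

def fixunssfsi(b: int) -> int:
    """float (as SF bits) -> uint32, truncating, with the assembly's
    saturation behavior."""
    s = b - (1 << 32) if b >> 31 else b   # reinterpret as signed, like cmp/ge
    if s >= 0x4F800000:                   # 2.0**32 as a float; +Inf/+NaN too
        return 0xFFFFFFFF
    exp = (s >> 23) - 127                 # negative inputs yield exp < 0
    if exp < 0:
        return 0                          # magnitude below 1.0, or negative
    mant = (b & 0x7FFFFF) | (1 << 23)     # insert the implicit leading 1
    return (mant << (exp - 23)) & 0xFFFFFFFF if exp >= 23 else mant >> (23 - exp)

assert fixunssfsi(sf_bits(3.5)) == 3
assert fixunssfsi(sf_bits(2.0 ** 31)) == 2 ** 31
assert fixunssfsi(sf_bits(-1.0)) == 0
```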
+
+#ifdef L_fixsfsi
+ ! What is a bit unusual about this implementation is that the
+ ! sign bit influences the result for NANs: for cleared sign bit, you
+ ! get INT_MAX, for set sign bit, you get INT_MIN.
+ ! However, since the result for NANs is undefined, this should be no
+ ! problem.
+ ! N.B. This is scheduled both for SH4-200 and SH4-300
+ .balign 4
+ .global GLOBAL(fixsfsi)
+ FUNC(GLOBAL(fixsfsi))
+ .balign 4
+GLOBAL(fixsfsi):
+ mov r4,r0
+ shll r4
+ mov #-24,r1
+ bt LOCAL(neg)
+ mov.l LOCAL(max),r2
+ shld r1,r4
+ mov.l LOCAL(mask),r1
+ add #-127,r4
+ cmp/pz r4
+ add #-23,r4
+ bf LOCAL(ret0)
+ cmp/gt r0,r2
+ bf LOCAL(retmax)
+ and r1,r0
+ addc r1,r0
+ rts
+ shld r4,r0
+
+ .balign 4
+LOCAL(neg):
+ mov.l LOCAL(min),r2
+ shld r1,r4
+ mov.l LOCAL(mask),r1
+ add #-127,r4
+ cmp/pz r4
+ add #-23,r4
+ bf LOCAL(ret0)
+ cmp/gt r0,r2
+ bf LOCAL(retmin)
+ and r1,r0
+ addc r1,r0
+ shld r4,r0 ! SH4-200 will start this insn on a new cycle
+ rts
+ neg r0,r0
+
+ .balign 4
+LOCAL(ret0):
+ rts
+ mov #0,r0
+
+LOCAL(retmax):
+ mov #-1,r0
+ rts
+ shlr r0
+
+LOCAL(retmin):
+ mov #1,r0
+ rts
+ rotr r0
+
+ .balign 4
+LOCAL(mask):
+ .long 0x007fffff
+LOCAL(max):
+ .long 0x4f000000
+LOCAL(min):
+ .long 0xcf000000
+ ENDFUNC(GLOBAL(fixsfsi))
+#endif /* L_fixsfsi */
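The signed variant works on the magnitude and negates at the end, saturating to INT_MAX or INT_MIN depending on the sign (which, as the comment notes, also decides the NaN result). A Python sketch of the same behavior:

```python
import struct

def sf_bits(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]

def fixsfsi(b: int) -> int:
    """float (as SF bits) -> int32, truncating toward zero, with the
    assembly's saturation behavior."""
    neg = b >> 31
    mag = b & 0x7FFFFFFF
    if mag >= 0x4F000000:                  # |x| >= 2**31, Inf, NaN
        return -(1 << 31) if neg else (1 << 31) - 1
    exp = (mag >> 23) - 127
    if exp < 0:
        return 0                           # magnitude below 1.0
    mant = (mag & 0x7FFFFF) | (1 << 23)    # insert the implicit leading 1
    v = mant << (exp - 23) if exp >= 23 else mant >> (23 - exp)
    return -v if neg else v

assert fixsfsi(sf_bits(-3.9)) == -3       # truncation toward zero
assert fixsfsi(sf_bits(2.0 ** 31)) == 2 ** 31 - 1
```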
+
+#ifdef L_floatunssisf
+#include "IEEE-754/m3/floatunssisf.S"
+#endif /* L_floatunssisf */
+
+#ifdef L_floatsisf
+#include "IEEE-754/m3/floatsisf.S"
+#endif /* L_floatsisf */
+
+#ifdef L_divsf3
+#include "IEEE-754/m3/divsf3.S"
+#endif /* L_divsf3 */
+
+#ifdef L_hypotf
+ .balign 4
+ .global GLOBAL(hypotf)
+ FUNC(GLOBAL(hypotf))
+GLOBAL(hypotf):
+/* This integer implementation takes 71 to 72 cycles in the main path.
+ This is a bit slower than what the SH4 achieves with double precision
+ hardware floating point: 57 cycles, or 69 with mode switches. */
+ /* First, calculate x (r4) as the sum of the square of the fractions -
+ the exponent is calculated separately in r3.
+ Then, calculate sqrt(x) for the fraction by reciproot iteration.
+ We get a 7.5 bit initial value using linear approximation with two slopes
+ that are powers of two.
+ x (- [1. .. 2.) y0 := 1.25 - x/4 - tab(x) y (- (0.8 .. 1.0)
+ x (- [2. .. 4.) y0 := 1. - x/8 - tab(x) y (- (0.5 .. 0.8)
+ x is represented with two bits before the point,
+ y with 0 bits before the binary point.
+ Thus, to calculate y0 := 1. - x/8 - tab(x), all you have to do is to shift x
+ right by 1, negate it, and subtract tab(x). */
+
+ /* y1 := 1.5*y0 - 0.5 * (x * y0) * (y0 * y0)
+ z0 := x * y1
+ z1 := z0 + 0.5 * (y1 - (y1*y1) * z0) */
+
+ mov.l LOCAL(xff000000),r1
+ add r4,r4
+ mov r4,r0
+ add r5,r5
+ cmp/hs r5,r4
+ sub r5,r0
+ mov #-24,r2
+ bf/s LOCAL(r5_large)
+ shad r2,r0
+ mov r4,r3
+ shll8 r4
+ rotcr r4
+ tst #0xe0,r0
+ neg r0,r0
+ bt LOCAL(ret_abs_r3)
+ tst r1,r5
+ shll8 r5
+ bt/s LOCAL(denorm_r5)
+ cmp/hi r3,r1
+ dmulu.l r4,r4
+ bf LOCAL(inf_nan)
+ rotcr r5
+ shld r0,r5
+LOCAL(denorm_r5_done):
+ sts mach,r4
+ dmulu.l r5,r5
+ mov.l r6,@-r15
+ mov #20,r6
+
+ sts mach,r5
+LOCAL(add_frac):
+ mova LOCAL(tab)-32,r0
+ mov.l r7,@-r15
+ mov.w LOCAL(x1380),r7
+ and r1,r3
+ addc r5,r4
+ mov.w LOCAL(m25),r2 ! -25
+ bf LOCAL(frac_ok)
+ sub r1,r3
+ rotcr r4
+ cmp/eq r1,r3 ! did we generate infinity ?
+ bt LOCAL(inf_nan)
+ shlr r4
+ mov r4,r1
+ shld r2,r1
+ mov.b @(r0,r1),r0
+ mov r4,r1
+ shld r6,r1
+ bra LOCAL(frac_low2)
+ sub r1,r7
+
+LOCAL(frac_ok):
+ mov r4,r1
+ shld r2,r1
+ mov.b @(r0,r1),r1
+ cmp/pz r4
+ mov r4,r0
+ bt/s LOCAL(frac_low)
+ shld r6,r0
+ mov.w LOCAL(xf80),r7
+ shlr r0
+LOCAL(frac_low):
+ sub r0,r7
+LOCAL(frac_low2):
+ mov.l LOCAL(x40000080),r0 ! avoid denorm results near 1. << r3
+ sub r1,r7 ! {0.12}
+ mov.l LOCAL(xfffe0000),r5 ! avoid rounding overflow near 4. << r3
+ swap.w r7,r1 ! {0.28}
+ dmulu.l r1,r4 /* two issue cycles */
+ mulu.w r7,r7 /* two issue cycles */
+ sts mach,r2 ! {0.26}
+ mov r1,r7
+ shlr r1
+ sts macl,r6 ! {0.24}
+ cmp/hi r0,r4
+ shlr2 r2
+ bf LOCAL(near_one)
+ shlr r2 ! {0.23} systemic error of linear approximation keeps y1 < 1
+ dmulu.l r2,r6
+ cmp/hs r5,r4
+ add r7,r1 ! {1.28}
+ bt LOCAL(near_four)
+ shlr2 r1 ! {1.26}
+ sts mach,r0 ! {0.15} x*y0^3 == {0.16} 0.5*x*y0^3
+ shlr2 r1 ! {1.24}
+ shlr8 r1 ! {1.16}
+ sett ! compensate for truncation of subtrahend, keep y1 < 1
+ subc r0,r1 ! {0.16} y1; max error about 3.5 ulp
+ swap.w r1,r0
+ dmulu.l r0,r4 ! { 1.30 }
+ mulu.w r1,r1
+ sts mach,r2
+ shlr2 r0
+ sts macl,r1
+ add r2,r0
+ mov.l LOCAL(xff000000),r6
+ add r2,r0
+ dmulu.l r1,r2
+ add #127,r0
+ add r6,r3 ! precompensation for adding leading 1
+ sts mach,r1
+ shlr r3
+ mov.l @r15+,r7
+ sub r1,r0 ! {0.31} max error about 50 ulp (+127)
+ mov.l @r15+,r6
+ shlr8 r0 ! {0.23} max error about 0.7 ulp
+ rts
+ add r3,r0
+
+LOCAL(r5_large):
+ mov r5,r3
+ mov #-31,r2
+ cmp/ge r2,r0
+ shll8 r5
+ bf LOCAL(ret_abs_r3)
+ rotcr r5
+ tst r1,r4
+ shll8 r4
+ bt/s LOCAL(denorm_r4)
+ cmp/hi r3,r1
+ dmulu.l r5,r5
+ bf LOCAL(inf_nan)
+ rotcr r4
+LOCAL(denorm_r4_done):
+ shld r0,r4
+ sts mach,r5
+ dmulu.l r4,r4
+ mov.l r6,@-r15
+ mov #20,r6
+ bra LOCAL(add_frac)
+ sts mach,r4
+
+LOCAL(near_one):
+ bra LOCAL(assemble_sqrt)
+ mov #0,r0
+LOCAL(near_four):
+ ! exact round-to-nearest would add 255. We add 256 for speed & compactness.
+ mov r4,r0
+ shlr8 r0
+ add #1,r0
+ tst r0,r0
+ addc r0,r3 ! might generate infinity.
+LOCAL(assemble_sqrt):
+ mov.l @r15+,r7
+ shlr r3
+ mov.l @r15+,r6
+ rts
+ add r3,r0
+LOCAL(inf_nan):
+LOCAL(ret_abs_r3):
+ mov r3,r0
+ rts
+ shlr r0
+LOCAL(denorm_r5):
+ bf LOCAL(inf_nan)
+ tst r1,r4
+ bt LOCAL(denorm_both)
+ dmulu.l r4,r4
+ bra LOCAL(denorm_r5_done)
+ shld r0,r5
+LOCAL(denorm_r4):
+ bf LOCAL(inf_nan)
+ tst r1,r5
+ dmulu.l r5,r5
+ bf LOCAL(denorm_r4_done)
+LOCAL(denorm_both): ! normalize according to r3.
+ extu.w r3,r2
+ mov.l LOCAL(c__clz_tab),r0
+ cmp/eq r3,r2
+ mov #-8,r2
+ bt 0f
+ tst r1,r3
+ mov #-16,r2
+ bt 0f
+ mov #-24,r2
+0:
+ shld r2,r3
+ mov.l r7,@-r15
+#ifdef __pic__
+ add r0,r3
+ mova LOCAL(c__clz_tab),r0
+#endif
+ mov.b @(r0,r3),r0
+ add #32,r2
+ sub r0,r2
+ shld r2,r4
+ mov r2,r7
+ dmulu.l r4,r4
+ sts.l pr,@-r15
+ mov #1,r3
+ bsr LOCAL(denorm_r5_done)
+ shld r2,r5
+ mov.l LOCAL(x01000000),r1
+ neg r7,r2
+ lds.l @r15+,pr
+ tst r1,r0
+ mov.l @r15+,r7
+ bt 0f
+ add #1,r2
+ sub r1,r0
+0:
+ rts
+ shld r2,r0
+
+LOCAL(m25):
+ .word -25
+LOCAL(x1380):
+ .word 0x1380
+LOCAL(xf80):
+ .word 0xf80
+ .balign 4
+LOCAL(xff000000):
+ .long 0xff000000
+LOCAL(x40000080):
+ .long 0x40000080
+LOCAL(xfffe0000):
+ .long 0xfffe0000
+LOCAL(x01000000):
+ .long 0x01000000
+LOCAL(c__clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+
+/*
+double err(double x)
+{
+ return (x < 2. ? 1.25 - x/4. : 1. - x/8.) - 1./sqrt(x);
+}
+
+int
+main ()
+{
+ int i = 0;
+ double x, s, v;
+ double lx, hx;
+
+ s = 1./32.;
+ for (x = 1.; x < 4; x += s, i++)
+ {
+ lx = x;
+ hx = x + s - 1. / (1 << 30);
+ v = 0.5 * (err (lx) + err (hx));
+ printf ("%s% 4d%c",
+ (i & 7) == 0 ? "\t.byte\t" : "",
+ (int)(v * 4096 + 0.5) - 128,
+ (i & 7) == 7 ? '\n' : ',');
+ }
+ return 0;
+} */
+
+ .balign 4
+LOCAL(tab):
+ .byte -113, -84, -57, -33, -11, 8, 26, 41
+ .byte 55, 67, 78, 87, 94, 101, 106, 110
+ .byte 113, 115, 115, 115, 114, 112, 109, 106
+ .byte 101, 96, 91, 84, 77, 69, 61, 52
+ .byte 51, 57, 63, 68, 72, 77, 80, 84
+ .byte 87, 89, 91, 93, 95, 96, 97, 97
+ .byte 97, 97, 97, 96, 95, 94, 93, 91
+ .byte 89, 87, 84, 82, 79, 76, 72, 69
+ .byte 65, 61, 57, 53, 49, 44, 39, 34
+ .byte 29, 24, 19, 13, 8, 2, -4, -10
+ .byte -17, -23, -29, -36, -43, -50, -57, -64
+ .byte -71, -78, -85, -93,-101,-108,-116,-124
+ ENDFUNC(GLOBAL(hypotf))
+#endif /* L_hypotf */
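The reciproot scheme described in the big comment inside `hypotf` can be replayed in plain floats. The sketch below omits the tab() correction to the initial guess (so its accuracy is lower than the assembly's), and writes the final Newton step in its textbook float form; the assembly performs the equivalent iterations in fixed point with the scalings noted in braces.

```python
import math

def rsqrt_sqrt(x: float) -> float:
    """sqrt(x) for x in [1., 4.) via reciproot iteration:
    linear initial guess with power-of-two slopes, one Newton step
    for 1/sqrt(x), then one Newton step on sqrt(x) itself."""
    assert 1.0 <= x < 4.0
    y0 = 1.25 - x / 4.0 if x < 2.0 else 1.0 - x / 8.0   # initial 1/sqrt guess
    y1 = 1.5 * y0 - 0.5 * (x * y0) * (y0 * y0)          # Newton for reciproot
    z0 = x * y1                                          # first sqrt estimate
    z1 = z0 + 0.5 * y1 * (x - z0 * z0)                   # Newton step on sqrt
    return z1

x = 1.0
while x < 4.0:
    assert abs(rsqrt_sqrt(x) - math.sqrt(x)) < 1e-3
    x += 1.0 / 32.0
```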
+#endif /* DYN_SHIFT */
+
+#endif /* __SH_FPU_ANY__ */
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/lib1funcs.asm gcc-4.5.0/gcc/config/sh/lib1funcs.asm
--- gcc-4.5.0/gcc/config/sh/lib1funcs.asm 2009-04-18 03:50:40.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/lib1funcs.asm 2010-07-14 11:58:33.000000000 +0530
@@ -3931,3 +3931,6 @@ GLOBAL(udiv_qrnnd_16):
ENDFUNC(GLOBAL(udiv_qrnnd_16))
#endif /* !__SHMEDIA__ */
#endif /* L_udiv_qrnnd_16 */
+
+#include "ieee-754-sf.S"
+#include "ieee-754-df.S"
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/lib1funcs.h gcc-4.5.0/gcc/config/sh/lib1funcs.h
--- gcc-4.5.0/gcc/config/sh/lib1funcs.h 2009-07-06 19:25:09.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/lib1funcs.h 2010-07-14 11:58:33.000000000 +0530
@@ -64,13 +64,151 @@ see the files COPYING3 and COPYING.RUNTI
#endif /* !__LITTLE_ENDIAN__ */
#ifdef __sh1__
+/* branch with two-argument delay slot insn */
#define SL(branch, dest, in_slot, in_slot_arg2) \
in_slot, in_slot_arg2; branch dest
+/* branch with one-argument delay slot insn */
#define SL1(branch, dest, in_slot) \
in_slot; branch dest
+/* branch with comparison in delay slot */
+#define SLC(branch, dest, in_slot, in_slot_arg2) \
+ branch dest; in_slot, in_slot_arg2
+/* comparison in a delay slot, at branch destination */
+#define SLI(in_slot, in_slot_arg2) in_slot, in_slot_arg2
+#define SLCMP(branch, cmp1, cmp1arg2, cmp2, cmp2arg2) \
+ branch .+6; bra .+6; cmp2, cmp2arg2; cmp1, cmp1arg2
+#define DMULU_SAVE \
+ mov.l r10,@-r15; \
+ mov.l r11,@-r15; \
+ mov.l r12,@-r15; \
+ mov.l r13,@-r15
+#define DMULUL(m1, m2, rl) \
+ swap.w m1,r12; \
+ mulu.w r12,m2; \
+ swap.w m2,r13; \
+ sts macl,r10; \
+ mulu.w r13,m1; \
+ clrt; \
+ sts macl,r11; \
+ mulu.w r12,r13; \
+ addc r11,r10; \
+ sts macl,r12; \
+ mulu.w m1,m2; \
+ movt r11; \
+ sts macl,rl; \
+ mov r10,r13; \
+ shll16 r13; \
+ addc r13,rl; \
+ xtrct r11,r10; \
+ addc r10,r12 \
+/* N.B. the carry is cleared here. */
+#define DMULUH(rh) mov r12,rh
+#define DMULU_RESTORE \
+ mov.l @r15+,r13; \
+ mov.l @r15+,r12; \
+ mov.l @r15+,r11; \
+ mov.l @r15+,r10
#else /* ! __sh1__ */
+/* Branch with a two-argument delay-slot insn.  */
#define SL(branch, dest, in_slot, in_slot_arg2) \
- branch##.s dest; in_slot, in_slot_arg2
+ branch##/s dest; in_slot, in_slot_arg2
+/* Branch with a one-argument delay-slot insn.  */
#define SL1(branch, dest, in_slot) \
branch##/s dest; in_slot
+/* Branch with a comparison in the delay slot.  */
+#define SLC(branch, dest, in_slot, in_slot_arg2) \
+ branch##/s dest; in_slot, in_slot_arg2
+/* Comparison in a delay slot, at the branch destination.  */
+#define SLI(in_slot, in_slot_arg)
+#define SLCMP(branch, cmp1, cmp1arg2, cmp2, cmp2arg2) \
+ branch##/s .+6; cmp1, cmp1arg2; cmp2, cmp2arg2
+#define DMULU_SAVE
+#define DMULUL(m1, m2, rl) dmulu.l m1,m2; sts macl,rl
+#define DMULUH(rh) sts mach,rh
+#define DMULU_RESTORE
#endif /* !__sh1__ */
+
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+/* Don't #define DYN_SHIFT.  */
+ #define SHLL4(REG) \
+ shll2 REG; \
+ shll2 REG
+
+ #define SHLR4(REG) \
+ shlr2 REG; \
+ shlr2 REG
+
+ #define SHLL6(REG) \
+ shll2 REG; \
+ shll2 REG; \
+ shll2 REG
+
+ #define SHLR6(REG) \
+ shlr2 REG; \
+ shlr2 REG; \
+ shlr2 REG
+
+ #define SHLL12(REG) \
+ shll8 REG; \
+ SHLL4 (REG)
+
+ #define SHLR12(REG) \
+ shlr8 REG; \
+ SHLR4 (REG)
+
+ #define SHLR19(REG) \
+ shlr16 REG; \
+ shlr2 REG; \
+ shlr REG
+
+ #define SHLL23(REG) \
+ shll16 REG; \
+ shlr REG; \
+ shll8 REG
+
+ #define SHLR24(REG) \
+ shlr16 REG; \
+ shlr8 REG
+
+ #define SHLR21(REG) \
+ shlr16 REG; \
+ shll2 REG; \
+ add REG,REG;\
+ shlr8 REG
+
+ #define SHLL21(REG) \
+ shll16 REG; \
+ SHLL4 (REG); \
+ add REG,REG
+
+ #define SHLR11(REG) \
+ shlr8 REG; \
+ shlr2 REG; \
+ shlr REG
+
+ #define SHLR22(REG) \
+ shlr16 REG; \
+ shll2 REG; \
+ shlr8 REG
+
+ #define SHLR23(REG) \
+ shlr16 REG; \
+ add REG,REG;\
+ shlr8 REG
+
+ #define SHLR20(REG) \
+ shlr16 REG; \
+ SHLR4 (REG)
+
+ #define SHLL20(REG) \
+ shll16 REG; \
+ SHLL4 (REG)
+#define SHLD_COUNT(N,COUNT)
+#define SHLRN(N,COUNT,REG) SHLR##N(REG)
+#define SHLLN(N,COUNT,REG) SHLL##N(REG)
+#else
+#define SHLD_COUNT(N,COUNT) mov #N,COUNT
+#define SHLRN(N,COUNT,REG) shld COUNT,REG
+#define SHLLN(N,COUNT,REG) shld COUNT,REG
+#define DYN_SHIFT 1
+#endif
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/predicates.md gcc-4.5.0/gcc/config/sh/predicates.md
--- gcc-4.5.0/gcc/config/sh/predicates.md 2009-06-25 11:46:11.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/predicates.md 2010-07-14 11:58:33.000000000 +0530
@@ -699,6 +699,33 @@
(define_predicate "symbol_ref_operand"
(match_code "symbol_ref"))
+(define_special_predicate "soft_fp_comparison_operand"
+ (match_code "subreg,reg")
+{
+ switch (GET_MODE (op))
+ {
+ default:
+ return 0;
+ case CC_FP_NEmode: case CC_FP_GTmode: case CC_FP_UNLTmode:
+ break;
+ }
+ return register_operand (op, mode);
+})
+
+(define_predicate "soft_fp_comparison_operator"
+ (match_code "eq, unle, ge")
+{
+ switch (GET_CODE (op))
+ {
+ default:
+ return 0;
+ case EQ: mode = CC_FP_NEmode; break;
+ case UNLE: mode = CC_FP_GTmode; break;
+ case GE: mode = CC_FP_UNLTmode; break;
+ }
+ return register_operand (XEXP (op, 0), mode);
+})
+
;; Same as target_reg_operand, except that label_refs and symbol_refs
;; are accepted before reload.
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/sh.c gcc-4.5.0/gcc/config/sh/sh.c
--- gcc-4.5.0/gcc/config/sh/sh.c 2010-03-01 04:53:50.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/sh.c 2010-07-14 11:58:33.000000000 +0530
@@ -110,6 +110,12 @@ static short cached_can_issue_more;
/* Unique number for UNSPEC_BBR pattern. */
static unsigned int unspec_bbr_uid = 1;
+/* Saved operands from the last compare to use when we generate an scc
+ or bcc insn. */
+
+rtx sh_compare_op0;
+rtx sh_compare_op1;
+
/* Provides the class number of the smallest class containing
reg number. */
@@ -279,6 +285,7 @@ static int sh_arg_partial_bytes (CUMULAT
tree, bool);
static bool sh_scalar_mode_supported_p (enum machine_mode);
static int sh_dwarf_calling_convention (const_tree);
+static void sh_expand_float_condop (rtx operands[4], rtx (*[2]) (rtx));
static void sh_encode_section_info (tree, rtx, int);
static int sh2a_function_vector_p (tree);
static void sh_trampoline_init (rtx, tree, rtx);
@@ -536,6 +543,9 @@ static const struct attribute_spec sh_at
/* Machine-specific symbol_ref flags. */
#define SYMBOL_FLAG_FUNCVEC_FUNCTION (SYMBOL_FLAG_MACH_DEP << 0)
+#undef TARGET_MATCH_ADJUST
+#define TARGET_MATCH_ADJUST sh_match_adjust
+
struct gcc_target targetm = TARGET_INITIALIZER;
\f
/* Implement TARGET_HANDLE_OPTION. */
@@ -2146,6 +2156,78 @@ sh_emit_cheap_store_flag (enum machine_m
return gen_rtx_fmt_ee (code, VOIDmode, target, const0_rtx);
}
+static rtx
+sh_soft_fp_cmp (int code, enum machine_mode op_mode, rtx op0, rtx op1)
+{
+ const char *name = NULL;
+ rtx (*fun) (rtx, rtx), addr, tmp, first, last, equiv;
+ int df = op_mode == DFmode;
+ enum machine_mode mode = VOIDmode; /* shut up warning. */
+
+ switch (code)
+ {
+ case EQ:
+ if (!flag_finite_math_only)
+ {
+ name = df ? "__nedf2" : "__nesf2";
+ fun = df ? gen_cmpnedf_i1 : gen_cmpnesf_i1;
+ mode = CC_FP_NEmode;
+ break;
+ } /* Fall through. */
+ case UNEQ:
+ fun = gen_cmpuneq_sdf;
+ break;
+ case UNLE:
+ if (flag_finite_math_only && !df)
+ {
+ fun = gen_cmplesf_i1_finite;
+ break;
+ }
+ name = df ? "__gtdf2t" : "__gtsf2t";
+ fun = df ? gen_cmpgtdf_i1 : gen_cmpgtsf_i1;
+ mode = CC_FP_GTmode;
+ break;
+ case GE:
+ if (flag_finite_math_only && !df)
+ {
+ tmp = op0; op0 = op1; op1 = tmp;
+ fun = gen_cmplesf_i1_finite;
+ break;
+ }
+ name = df ? "__gedf2f" : "__gesf2f";
+ fun = df ? gen_cmpunltdf_i1 : gen_cmpunltsf_i1;
+ mode = CC_FP_UNLTmode;
+ break;
+ case UNORDERED:
+ fun = gen_cmpun_sdf;
+ break;
+ default: gcc_unreachable ();
+ }
+
+ if (!name)
+ return fun (force_reg (op_mode, op0), force_reg (op_mode, op1));
+
+ tmp = gen_reg_rtx (mode);
+ addr = gen_reg_rtx (Pmode);
+ function_symbol (addr, name, SFUNC_STATIC);
+ first = emit_move_insn (gen_rtx_REG (op_mode, R4_REG), op0);
+ emit_move_insn (gen_rtx_REG (op_mode, R5_REG + df), op1);
+ last = emit_insn (fun (tmp, addr));
+ equiv = gen_rtx_fmt_ee (COMPARE, mode, op0, op1);
+ REG_NOTES (last) = gen_rtx_EXPR_LIST (REG_EQUAL, equiv, REG_NOTES (last));
+ /* Wrapping the sequence in REG_LIBCALL / REG_RETVAL notes would let loop
+ invariant code motion move it; the notes are disabled for now.  */
+/*
+ REG_NOTES (first) = gen_rtx_INSN_LIST (REG_LIBCALL, last, REG_NOTES (first));
+ REG_NOTES (last) = gen_rtx_INSN_LIST (REG_RETVAL, first, REG_NOTES (last));
+*/
+ /* Use fpcmp_i1 rather than cmpeqsi_t, so that the optimizers can grok
+ the computation. */
+ return gen_rtx_SET (VOIDmode,
+ gen_rtx_REG (SImode, T_REG),
+ gen_rtx_fmt_ee (code, SImode, tmp, CONST0_RTX (mode)));
+}
+
/* Called from the md file, set up the operands of a compare instruction. */
void
@@ -8601,6 +8683,57 @@ sh_fix_range (const char *const_str)
str = comma + 1;
}
}
+
+/* Expand an sfunc operation taking NARGS MODE arguments, using generator
+ function FUN, which needs symbol NAME loaded int a register first.
+ Add a REG_EQUAL note using EQUIV. */
+static void
+expand_sfunc_op (int nargs, enum machine_mode mode, rtx (*fun) (rtx, rtx),
+ const char *name, rtx equiv, rtx *operands)
+{
+ int next_reg = FIRST_PARM_REG, i;
+ rtx addr, first = NULL_RTX, last, insn;
+
+ addr = gen_reg_rtx (Pmode);
+ function_symbol (addr, name, SFUNC_FREQUENT);
+ for ( i = 1; i <= nargs; i++)
+ {
+ insn = emit_move_insn (gen_rtx_REG (mode, next_reg), operands[i]);
+ if (!first)
+ first = insn;
+ next_reg += GET_MODE_SIZE (mode) / UNITS_PER_WORD;
+ }
+ last = emit_insn ((*fun) (operands[0], addr));
+ REG_NOTES (last) = gen_rtx_EXPR_LIST (REG_EQUAL, equiv, REG_NOTES (last));
+ /* Wrapping the sequence in REG_LIBCALL / REG_RETVAL notes would let loop
+ invariant code motion move it; the notes are disabled for now.  */
+/*
+ REG_NOTES (first) = gen_rtx_INSN_LIST (REG_LIBCALL, last, REG_NOTES (first));
+ REG_NOTES (last) = gen_rtx_INSN_LIST (REG_RETVAL, first, REG_NOTES (last));
+*/
+}
+
+/* Expand an sfunc unary operation taking a MODE argument, using generator
+ function FUN, which needs the symbol NAME loaded into a register first.
+ Add a REG_EQUAL note using CODE. */
+void
+expand_sfunc_unop (enum machine_mode mode, rtx (*fun) (rtx, rtx),
+ const char *name, enum rtx_code code, rtx *operands)
+{
+ rtx equiv = gen_rtx_fmt_e (code, GET_MODE (operands[0]), operands[1]);
+ expand_sfunc_op (1, mode, fun, name, equiv, operands);
+}
+
+/* Expand an sfunc binary operation in MODE, using generator function FUN,
+ which needs the symbol NAME loaded into a register first.
+ Add a REG_EQUAL note using CODE. */
+void
+expand_sfunc_binop (enum machine_mode mode, rtx (*fun) (rtx, rtx),
+ const char *name, enum rtx_code code, rtx *operands)
+{
+ rtx equiv = gen_rtx_fmt_ee (code, mode, operands[1], operands[2]);
+ expand_sfunc_op (2, mode, fun, name, equiv, operands);
+}
\f
/* Insert any deferred function attributes from earlier pragmas. */
static void
@@ -11429,11 +11562,10 @@ function_symbol (rtx target, const char
{
rtx sym;
- /* If this is not an ordinary function, the name usually comes from a
- string literal or an sprintf buffer. Make sure we use the same
+ /* The name usually comes from a string literal or an sprintf buffer.
+ Make sure we use the same
string consistently, so that cse will be able to unify address loads. */
- if (kind != FUNCTION_ORDINARY)
- name = IDENTIFIER_POINTER (get_identifier (name));
+ name = IDENTIFIER_POINTER (get_identifier (name));
sym = gen_rtx_SYMBOL_REF (Pmode, name);
SYMBOL_REF_FLAGS (sym) = SYMBOL_FLAG_FUNCTION;
if (flag_pic)
@@ -11441,6 +11573,10 @@ function_symbol (rtx target, const char
{
case FUNCTION_ORDINARY:
break;
+ case SFUNC_FREQUENT:
+ if (!optimize || optimize_size)
+ break;
+ /* Fall through. */
case SFUNC_GOT:
{
rtx reg = target ? target : gen_reg_rtx (Pmode);
@@ -11551,6 +11687,141 @@ sh_expand_t_scc (rtx operands[])
return 1;
}
+void
+sh_expand_float_cbranch (rtx operands[4])
+{
+ static rtx (*branches[]) (rtx) = { gen_branch_true, gen_branch_false };
+
+ sh_expand_float_condop (operands, branches);
+}
+
+void
+sh_expand_float_scc (rtx operands[4])
+{
+ static rtx (*movts[]) (rtx) = { gen_movt, gen_movnegt };
+
+ operands[3] = NULL_RTX;
+ sh_expand_float_condop (operands, movts);
+}
+
+/* The first element of USER is for positive logic, the second one for
+ negative logic. */
+static void
+sh_expand_float_condop (rtx operands[4], rtx (*user[2]) (rtx))
+{
+ enum machine_mode mode = GET_MODE (operands[1]);
+ enum rtx_code comparison = GET_CODE (operands[0]);
+ int swap_operands = 0;
+
+ if (TARGET_SH1_SOFTFP_MODE (mode))
+ {
+ switch (comparison)
+ {
+ case NE:
+ comparison = EQ;
+ user++;
+ break;
+ case LT:
+ swap_operands = 1; /* Fall through. */
+ case GT:
+ comparison = UNLE;
+ user++;
+ break;
+ case UNGT:
+ swap_operands = 1; /* Fall through. */
+ case UNLT:
+ comparison = GE;
+ user++;
+ break;
+ case UNGE:
+ swap_operands = 1;
+ comparison = UNLE;
+ break;
+ case LE:
+ swap_operands = 1;
+ comparison = GE; /* Fall through. */
+ case EQ:
+ case UNEQ:
+ case GE:
+ case UNLE:
+ case UNORDERED:
+ break;
+ case LTGT:
+ comparison = UNEQ;
+ user++;
+ break;
+ case ORDERED:
+ comparison = UNORDERED;
+ user++;
+ break;
+
+ default: gcc_unreachable ();
+ }
+ }
+ else /* SH2E .. SH4 Hardware floating point */
+ {
+ switch (comparison)
+ {
+ case NE:
+ comparison = EQ;
+ user++;
+ break;
+ case LT:
+ swap_operands = 1;
+ comparison = GT; /* Fall through. */
+ case GT:
+ case EQ:
+ case LTGT:
+ case ORDERED:
+ break;
+ case LE:
+ if (flag_finite_math_only)
+ {
+ comparison = GT;
+ user++;
+ break;
+ }
+ swap_operands = 1;
+ comparison = GE; /* Fall through. */
+ case GE:
+ if (flag_finite_math_only)
+ {
+ swap_operands = 1;
+ comparison = GT;
+ user++;
+ break;
+ }
+ break;
+ case UNGE:
+ swap_operands = 1; /* Fall through. */
+ case UNLE:
+ comparison = GT;
+ user++;
+ break;
+ case UNGT:
+ swap_operands = 1; /* Fall through. */
+ case UNLT:
+ comparison = GE;
+ user++;
+ break;
+ case UNEQ:
+ comparison = LTGT;
+ user++;
+ break;
+ case UNORDERED:
+ comparison = ORDERED;
+ user++;
+ break;
+
+ default: gcc_unreachable ();
+ }
+ }
+ sh_compare_op0 = operands[1+swap_operands];
+ sh_compare_op1 = operands[2-swap_operands];
+ sh_emit_compare_and_branch (&operands[3], comparison);
+ emit_jump_insn ((*user) (operands[3]));
+}
+
/* INSN is an sfunc; return the rtx that describes the address used. */
static rtx
extract_sfunc_addr (rtx insn)
@@ -12100,6 +12371,19 @@ sh_secondary_reload (bool in_p, rtx x, e
return NO_REGS;
}
+int
+sh_match_adjust (rtx x, int regno)
+{
+ /* On a WORDS_BIG_ENDIAN machine, point to the last register of a
+ multiple hard register group of scalar integer registers, so that
+ for example (reg:DI 0) and (reg:SI 1) will be considered the same
+ register. */
+ if (WORDS_BIG_ENDIAN && GET_MODE_SIZE (GET_MODE (x)) > UNITS_PER_WORD
+ && regno < FIRST_PSEUDO_REGISTER)
+ regno += hard_regno_nregs[regno][GET_MODE (x)] - 1;
+ return regno;
+}
+
enum sh_divide_strategy_e sh_div_strategy = SH_DIV_STRATEGY_DEFAULT;
#include "gt-sh.h"
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/sh.h gcc-4.5.0/gcc/config/sh/sh.h
--- gcc-4.5.0/gcc/config/sh/sh.h 2009-12-01 03:08:46.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/sh.h 2010-07-14 16:13:59.000000000 +0530
@@ -175,6 +175,11 @@ do { \
#define TARGET_FPU_DOUBLE \
((target_flags & MASK_SH4) != 0 || TARGET_SH2A_DOUBLE)
+#define TARGET_SH1_SOFTFP (TARGET_SH1 && !TARGET_FPU_DOUBLE)
+
+#define TARGET_SH1_SOFTFP_MODE(MODE) \
+ (TARGET_SH1_SOFTFP && (!TARGET_SH2E || (MODE) == DFmode))
+
/* Nonzero if an FPU is available. */
#define TARGET_FPU_ANY (TARGET_SH2E || TARGET_FPU_DOUBLE)
@@ -321,6 +326,38 @@ do { \
#define SUPPORT_ANY_SH5 \
(SUPPORT_ANY_SH5_32MEDIA || SUPPORT_ANY_SH5_64MEDIA)
+/* Check if we have support for optimized software floating point using
+ dynamic shifts - then some function calls clobber fewer registers. */
+#ifdef SUPPORT_SH3
+#define SUPPORT_SH3_OSFP 1
+#else
+#define SUPPORT_SH3_OSFP 0
+#endif
+
+#ifdef SUPPORT_SH3E
+#define SUPPORT_SH3E_OSFP 1
+#else
+#define SUPPORT_SH3E_OSFP 0
+#endif
+
+#if defined(SUPPORT_SH4_NOFPU) || defined(SUPPORT_SH3_OSFP)
+#define SUPPORT_SH4_NOFPU_OSFP 1
+#else
+#define SUPPORT_SH4_NOFPU_OSFP 0
+#endif
+
+#if defined(SUPPORT_SH4_SINGLE_ONLY) || defined (SUPPORT_SH3E_OSFP)
+#define SUPPORT_SH4_SINGLE_ONLY_OSFP 1
+#else
+#define SUPPORT_SH4_SINGLE_ONLY_OSFP 0
+#endif
+
+#define TARGET_OSFP (0 \
+ || (TARGET_SH3 && !TARGET_SH2E && SUPPORT_SH3_OSFP) \
+ || (TARGET_SH3E && SUPPORT_SH3E_OSFP) \
+ || (TARGET_HARD_SH4 && !TARGET_SH2E && SUPPORT_SH4_NOFPU_OSFP) \
+ || (TARGET_HARD_SH4 && TARGET_SH2E && SUPPORT_SH4_SINGLE_ONLY_OSFP))
+
/* Reset all target-selection flags. */
#define MASK_ARCH (MASK_SH1 | MASK_SH2 | MASK_SH3 | MASK_SH_E | MASK_SH4 \
| MASK_HARD_SH2A | MASK_HARD_SH2A_DOUBLE | MASK_SH4A \
@@ -2120,6 +2157,12 @@ struct sh_args {
#define LIBGCC2_DOUBLE_TYPE_SIZE 64
#endif
+#if defined(__SH2E__) || defined(__SH3E__) || defined( __SH4_SINGLE_ONLY__)
+#define LIBGCC2_DOUBLE_TYPE_SIZE 32
+#else
+#define LIBGCC2_DOUBLE_TYPE_SIZE 64
+#endif
+
/* 'char' is signed by default. */
#define DEFAULT_SIGNED_CHAR 1
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/sh.md gcc-4.5.0/gcc/config/sh/sh.md
--- gcc-4.5.0/gcc/config/sh/sh.md 2009-11-22 04:21:07.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/sh.md 2010-07-14 18:47:04.000000000 +0530
@@ -106,6 +106,7 @@
(DR0_REG 64)
(DR2_REG 66)
(DR4_REG 68)
+ (FR4_REG 68)
(FR23_REG 87)
(TR0_REG 128)
@@ -173,6 +174,15 @@
(UNSPECV_WINDOW_END 10)
(UNSPECV_CONST_END 11)
(UNSPECV_EH_RETURN 12)
+ ;; NaN handling for software floating point:
+ ;; We require one bit specific for a precision to be set in all NaNs,
+ ;; so that we can test them with a not / tst sequence.
+ ;; ??? Ironically, this is the quiet bit for now, because that is the
+ ;; only bit set by __builtin_nan ("").
+ ;; ??? Should really use one bit lower and force it set by using
+ ;; a custom encoding function.
+ (SF_NAN_MASK 0x7fc00000)
+ (DF_NAN_MASK 0x7ff80000)
])
;; -------------------------------------------------------------------------
@@ -614,6 +624,14 @@
cmp/eq %1,%0"
[(set_attr "type" "mt_group")])
+(define_insn "fpcmp_i1"
+ [(set (reg:SI T_REG)
+ (match_operator:SI 1 "soft_fp_comparison_operator"
+ [(match_operand 0 "soft_fp_comparison_operand" "r") (const_int 0)]))]
+ "TARGET_SH1_SOFTFP"
+ "tst %0,%0"
+ [(set_attr "type" "mt_group")])
+
(define_insn "cmpgtsi_t"
[(set (reg:SI T_REG)
(gt:SI (match_operand:SI 0 "arith_reg_operand" "r,r")
@@ -1153,9 +1171,9 @@
(define_insn "*movsicc_t_false"
[(set (match_operand:SI 0 "arith_reg_dest" "=r,r")
- (if_then_else (eq (reg:SI T_REG) (const_int 0))
- (match_operand:SI 1 "general_movsrc_operand" "r,I08")
- (match_operand:SI 2 "arith_reg_operand" "0,0")))]
+ (if_then_else:SI (eq (reg:SI T_REG) (const_int 0))
+ (match_operand:SI 1 "general_movsrc_operand" "r,I08")
+ (match_operand:SI 2 "arith_reg_operand" "0,0")))]
"TARGET_PRETEND_CMOVE
&& (arith_reg_operand (operands[1], SImode)
|| (immediate_operand (operands[1], SImode)
@@ -1166,9 +1184,9 @@
(define_insn "*movsicc_t_true"
[(set (match_operand:SI 0 "arith_reg_dest" "=r,r")
- (if_then_else (ne (reg:SI T_REG) (const_int 0))
- (match_operand:SI 1 "general_movsrc_operand" "r,I08")
- (match_operand:SI 2 "arith_reg_operand" "0,0")))]
+ (if_then_else:SI (ne (reg:SI T_REG) (const_int 0))
+ (match_operand:SI 1 "general_movsrc_operand" "r,I08")
+ (match_operand:SI 2 "arith_reg_operand" "0,0")))]
"TARGET_PRETEND_CMOVE
&& (arith_reg_operand (operands[1], SImode)
|| (immediate_operand (operands[1], SImode)
@@ -6833,6 +6851,50 @@ label:
\f
;; Conditional branch insns
+(define_expand "cmpun_sdf"
+ [(unordered (match_operand 0 "" "") (match_operand 1 "" ""))]
+ ""
+ "
+{
+ HOST_WIDE_INT mask;
+ switch (GET_MODE (operands[0]))
+ {
+ case SFmode:
+ mask = SF_NAN_MASK;
+ break;
+ case DFmode:
+ mask = DF_NAN_MASK;
+ break;
+ default:
+ FAIL;
+ }
+ emit_insn (gen_cmpunsf_i1 (operands[0], operands[1],
+ force_reg (SImode, GEN_INT (mask))));
+ DONE;
+}")
+
+(define_expand "cmpuneq_sdf"
+ [(uneq (match_operand 0 "" "") (match_operand 1 "" ""))]
+ ""
+ "
+{
+ HOST_WIDE_INT mask;
+ switch (GET_MODE (operands[0]))
+ {
+ case SFmode:
+ mask = SF_NAN_MASK;
+ break;
+ case DFmode:
+ mask = DF_NAN_MASK;
+ break;
+ default:
+ FAIL;
+ }
+ emit_insn (gen_cmpuneqsf_i1 (operands[0], operands[1],
+ force_reg (SImode, GEN_INT (mask))));
+ DONE;
+}")
+
(define_expand "cbranchint4_media"
[(set (pc)
(if_then_else (match_operator 0 "shmedia_cbranch_comparison_operator"
@@ -9340,6 +9402,19 @@ mov.l\\t1f,r0\\n\\
+(define_expand "sunle"
+ [(set (match_operand:SI 0 "arith_reg_operand" "")
+ (match_dup 1))]
+ "TARGET_SH1_SOFTFP"
+ "
+{
+ if (! currently_expanding_to_rtl)
+ FAIL;
+ sh_emit_compare_and_branch (operands, UNLE);
+ emit_insn (gen_movt (operands[0]));
+ DONE;
+}")
+
;; sne moves the complement of the T reg to DEST like this:
;; cmp/eq ...
;; mov #-1,temp
@@ -9750,7 +9825,7 @@ mov.l\\t1f,r0\\n\\
[(set (match_operand:SF 0 "arith_reg_operand" "")
(plus:SF (match_operand:SF 1 "arith_reg_operand" "")
(match_operand:SF 2 "arith_reg_operand" "")))]
- "TARGET_SH2E || TARGET_SHMEDIA_FPU"
+ "TARGET_SH2E || TARGET_SH3 || TARGET_SHMEDIA_FPU"
"
{
if (TARGET_SH2E)
@@ -9758,6 +9833,12 @@ mov.l\\t1f,r0\\n\\
expand_sf_binop (&gen_addsf3_i, operands);
DONE;
}
+ else if (TARGET_OSFP)
+ {
+ expand_sfunc_binop (SFmode, &gen_addsf3_i3, \"__addsf3\", PLUS,
+ operands);
+ DONE;
+ }
}")
(define_insn "*addsf3_media"
@@ -9856,6 +9937,22 @@ mov.l\\t1f,r0\\n\\
}"
[(set_attr "type" "fparith_media")])
+(define_insn "addsf3_i3"
+ [(set (match_operand:SF 0 "arith_reg_dest" "=z")
+ (plus:SF (reg:SF R4_REG) (reg:SF R5_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (clobber (reg:SI R6_REG))
+ (clobber (reg:SI R7_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_OSFP && ! TARGET_SH2E"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
(define_insn "addsf3_i"
[(set (match_operand:SF 0 "fp_arith_reg_operand" "=f")
(plus:SF (match_operand:SF 1 "fp_arith_reg_operand" "%0")
@@ -9870,7 +9967,7 @@ mov.l\\t1f,r0\\n\\
[(set (match_operand:SF 0 "fp_arith_reg_operand" "")
(minus:SF (match_operand:SF 1 "fp_arith_reg_operand" "")
(match_operand:SF 2 "fp_arith_reg_operand" "")))]
- "TARGET_SH2E || TARGET_SHMEDIA_FPU"
+ "TARGET_SH2E || TARGET_SH3 || TARGET_SHMEDIA_FPU"
"
{
if (TARGET_SH2E)
@@ -9878,6 +9975,12 @@ mov.l\\t1f,r0\\n\\
expand_sf_binop (&gen_subsf3_i, operands);
DONE;
}
+ else if (TARGET_OSFP)
+ {
+ expand_sfunc_binop (SFmode, &gen_subsf3_i3, \"__subsf3\", MINUS,
+ operands);
+ DONE;
+ }
}")
(define_insn "*subsf3_media"
@@ -9888,6 +9991,23 @@ mov.l\\t1f,r0\\n\\
"fsub.s %1, %2, %0"
[(set_attr "type" "fparith_media")])
+(define_insn "subsf3_i3"
+ [(set (match_operand:SF 0 "arith_reg_dest" "=z")
+ (minus:SF (reg:SF R4_REG) (reg:SF R5_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (clobber (reg:SI R5_REG))
+ (clobber (reg:SI R6_REG))
+ (clobber (reg:SI R7_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_OSFP && ! TARGET_SH2E"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
(define_insn "subsf3_i"
[(set (match_operand:SF 0 "fp_arith_reg_operand" "=f")
(minus:SF (match_operand:SF 1 "fp_arith_reg_operand" "0")
@@ -9900,11 +10020,32 @@ mov.l\\t1f,r0\\n\\
(define_expand "mulsf3"
[(set (match_operand:SF 0 "fp_arith_reg_operand" "")
- (mult:SF (match_operand:SF 1 "fp_arith_reg_operand" "")
- (match_operand:SF 2 "fp_arith_reg_operand" "")))]
+ (mult:SF (match_operand:SF 1 "fp_arith_reg_operand" "")
+ (match_operand:SF 2 "fp_arith_reg_operand" "")))]
"TARGET_SH2E || TARGET_SHMEDIA_FPU"
"")
+;(define_expand "mulsf3"
+; [(set (match_operand:SF 0 "fp_arith_reg_operand" "")
+; (mult:SF (match_operand:SF 1 "fp_arith_reg_operand" "")
+; (match_operand:SF 2 "fp_arith_reg_operand" "")))]
+; "TARGET_SH2E || TARGET_SH3 || TARGET_SHMEDIA_FPU"
+; "
+; {
+; if (TARGET_SH4 || TARGET_SH2A_SINGLE)
+; expand_sf_binop (&gen_mulsf3_i4, operands);
+; else if (TARGET_SH2E)
+; emit_insn (gen_mulsf3_ie (operands[0], operands[1], operands[2]));
+; else if (TARGET_OSFP)
+; {
+; expand_sfunc_binop (SFmode, &gen_mulsf3_i3, \"__mulsf3\", MULT,
+; operands);
+; DONE;
+; }
+; if (! TARGET_SHMEDIA)
+; DONE;
+; }")
+
(define_insn "*mulsf3_media"
[(set (match_operand:SF 0 "fp_arith_reg_operand" "=f")
(mult:SF (match_operand:SF 1 "fp_arith_reg_operand" "%f")
@@ -9944,6 +10085,22 @@ mov.l\\t1f,r0\\n\\
[(set_attr "type" "fp")
(set_attr "fp_mode" "single")])
+(define_insn "mulsf3_i3"
+ [(set (match_operand:SF 0 "arith_reg_dest" "=z")
+ (mult:SF (reg:SF R4_REG) (reg:SF R5_REG)))
+ (clobber (reg:SI MACH_REG))
+ (clobber (reg:SI MACL_REG))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_OSFP && ! TARGET_SH2E"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
(define_insn "mac_media"
[(set (match_operand:SF 0 "fp_arith_reg_operand" "=f")
(plus:SF (mult:SF (match_operand:SF 1 "fp_arith_reg_operand" "%f")
@@ -10104,6 +10261,149 @@ mov.l\\t1f,r0\\n\\
"ftrc %1,%0"
[(set_attr "type" "fp")])
+(define_insn "cmpnesf_i1"
+ [(set (match_operand:CC_FP_NE 0 "register_operand" "=z")
+ (compare:CC_FP_NE (reg:SF R4_REG) (reg:SF R5_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1 && ! TARGET_SH2E"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpgtsf_i1"
+ [(set (match_operand:CC_FP_GT 0 "register_operand" "=z")
+ (compare:CC_FP_GT (reg:SF R4_REG) (reg:SF R5_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1 && ! TARGET_SH2E"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpunltsf_i1"
+ [(set (match_operand:CC_FP_UNLT 0 "register_operand" "=z")
+ (compare:CC_FP_UNLT (reg:SF R4_REG) (reg:SF R5_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1 && ! TARGET_SH2E"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpeqsf_i1_finite"
+ [(set (reg:SI T_REG)
+ (eq:SI (match_operand:SF 0 "arith_reg_operand" "r,r,r")
+ (match_operand:SF 1 "arith_reg_operand" "r,r,r")))
+ (clobber (match_scratch:SI 2 "=0,1,?r"))]
+ "TARGET_SH1 && ! TARGET_SH2E && flag_finite_math_only"
+ "*
+{
+ if (which_alternative == 0)
+ output_asm_insn (\"cmp/eq\t%0,%1\;or\t%1,%2\;bt\t0f\", operands);
+ else if (which_alternative == 1)
+ output_asm_insn (\"cmp/eq\t%0,%1\;or\t%0,%2\;bt\t0f\", operands);
+ else
+ output_asm_insn (\"cmp/eq\t%0,%1\;mov\t%0,%2\;bt\t0f\;or\t%1,%2\",
+ operands);
+ return \"add\t%2,%2\;tst\t%2,%2\\n0:\";
+}"
+ [(set_attr "length" "10,10,12")])
+
+(define_insn "cmplesf_i1_finite"
+ [(set (reg:SI T_REG)
+ (le:SI (match_operand:SF 0 "arith_reg_operand" "r,r,r")
+ (match_operand:SF 1 "arith_reg_operand" "r,r,r")))
+ (clobber (match_scratch:SI 2 "=0,1,r"))]
+ "TARGET_SH1 && ! TARGET_SH2E && flag_finite_math_only"
+ "*
+{
+ output_asm_insn (\"cmp/pz\t%0\", operands);
+ if (which_alternative == 2)
+ output_asm_insn (\"mov\t%0,%2\", operands);
+ if (TARGET_SH2)
+ output_asm_insn (\"bf/s\t0f\;cmp/hs\t%1,%0\;cmp/ge\t%0,%1\", operands);
+ else
+ output_asm_insn (\"bt\t1f\;bra\t0f\;cmp/hs\t%1,%0\\n1:\tcmp/ge\t%0,%1\",
+ operands);
+ if (which_alternative == 1)
+ output_asm_insn (\"or\t%0,%2\", operands);
+ else
+ output_asm_insn (\"or\t%1,%2\", operands);
+ return \"bt\t0f\;add\t%2,%2\;tst\t%2,%2\\n0:\";
+}"
+ [(set_attr "length" "18,18,20")])
+
+(define_insn "cmpunsf_i1"
+ [(set (reg:SI T_REG)
+ (unordered:SI (match_operand:SF 0 "arith_reg_operand" "r,r")
+ (match_operand:SF 1 "arith_reg_operand" "r,r")))
+ (use (match_operand:SI 2 "arith_reg_operand" "r,r"))
+ (clobber (match_scratch:SI 3 "=0,&r"))]
+ "TARGET_SH1 && ! TARGET_SH2E"
+ "not\t%0,%3\;tst\t%2,%3\;not\t%1,%3\;bt\t0f\;tst\t%2,%3\;0:"
+ [(set_attr "length" "10")])
+
+;; ??? This is a lot of code with a lot of branches; a library function
+;; might be better.
+(define_insn "cmpuneqsf_i1"
+ [(set (reg:SI T_REG)
+ (uneq:SI (match_operand:SF 0 "arith_reg_operand" "r")
+ (match_operand:SF 1 "arith_reg_operand" "r")))
+ (use (match_operand:SI 2 "arith_reg_operand" "r"))
+ (clobber (match_scratch:SI 3 "=&r"))]
+ "TARGET_SH1 && ! TARGET_SH2E"
+ "*
+{
+ output_asm_insn (\"not\t%0,%3\;tst\t%2,%3\;not\t%1,%3\", operands);
+ output_asm_insn (\"bt\t0f\;tst\t%2,%3\;bt\t0f\;cmp/eq\t%0,%1\", operands);
+ output_asm_insn (\"mov\t%0,%3\;bt\t0f\;or\t%1,%3\", operands);
+ return \"add\t%3,%3\;tst\t%3,%3\\n0:\";
+}"
+ [(set_attr "length" "24")])
+
+(define_insn "movcc_fp_ne"
+ [(set (match_operand:CC_FP_NE 0 "general_movdst_operand"
+ "=r,r,m")
+ (match_operand:CC_FP_NE 1 "general_movsrc_operand"
+ "rI08,mr,r"))]
+ "TARGET_SH1"
+ "@
+ mov %1,%0
+ mov.l %1,%0
+ mov.l %1,%0"
+ [(set_attr "type" "move,load,store")])
+
+(define_insn "movcc_fp_gt"
+ [(set (match_operand:CC_FP_GT 0 "general_movdst_operand"
+ "=r,r,m")
+ (match_operand:CC_FP_GT 1 "general_movsrc_operand"
+ "rI08,mr,r"))]
+ "TARGET_SH1"
+ "@
+ mov %1,%0
+ mov.l %1,%0
+ mov.l %1,%0"
+ [(set_attr "type" "move,load,store")])
+
+(define_insn "movcc_fp_unlt"
+ [(set (match_operand:CC_FP_UNLT 0 "general_movdst_operand"
+ "=r,r,m")
+ (match_operand:CC_FP_UNLT 1 "general_movsrc_operand"
+ "rI08,mr,r"))]
+ "TARGET_SH1"
+ "@
+ mov %1,%0
+ mov.l %1,%0
+ mov.l %1,%0"
+ [(set_attr "type" "move,load,store")])
+
(define_insn "cmpgtsf_t"
[(set (reg:SI T_REG)
(gt:SI (match_operand:SF 0 "fp_arith_reg_operand" "f")
@@ -10131,6 +10431,22 @@ mov.l\\t1f,r0\\n\\
"* return output_ieee_ccmpeq (insn, operands);"
[(set_attr "length" "4")])
+(define_insn "*cmpltgtsf_t"
+ [(set (reg:SI T_REG)
+ (ltgt:SI (match_operand:SF 0 "fp_arith_reg_operand" "f")
+ (match_operand:SF 1 "fp_arith_reg_operand" "f")))]
+ "TARGET_SH2E && ! (TARGET_SH4 || TARGET_SH2A_SINGLE)"
+ "fcmp/gt\t%1,%0\;bt\t0f\;fcmp/gt\t%0,%1\\n0:"
+ [(set_attr "length" "6")])
+
+(define_insn "*cmporderedsf_t"
+ [(set (reg:SI T_REG)
+ (ordered:SI (match_operand:SF 0 "fp_arith_reg_operand" "f")
+ (match_operand:SF 1 "fp_arith_reg_operand" "f")))]
+ "TARGET_SH2E && ! (TARGET_SH4 || TARGET_SH2A_SINGLE)"
+ "fcmp/eq\t%0,%0\;bf\t0f\;fcmp/eq\t%1,%1\\n0:"
+ [(set_attr "length" "6")])
+
(define_insn "cmpgtsf_t_i4"
[(set (reg:SI T_REG)
@@ -10163,6 +10479,26 @@ mov.l\\t1f,r0\\n\\
[(set_attr "length" "4")
(set_attr "fp_mode" "single")])
+(define_insn "*cmpltgtsf_t_4"
+ [(set (reg:SI T_REG)
+ (ltgt:SI (match_operand:SF 0 "fp_arith_reg_operand" "f")
+ (match_operand:SF 1 "fp_arith_reg_operand" "f")))
+ (use (match_operand:PSI 2 "fpscr_operand" "c"))]
+ "TARGET_SH4 || TARGET_SH2A_SINGLE"
+ "fcmp/gt\t%1,%0\;bt\t0f\;fcmp/gt\t%0,%1\\n0:"
+ [(set_attr "length" "6")
+ (set_attr "fp_mode" "single")])
+
+(define_insn "*cmporderedsf_t_4"
+ [(set (reg:SI T_REG)
+ (ordered:SI (match_operand:SF 0 "fp_arith_reg_operand" "f")
+ (match_operand:SF 1 "fp_arith_reg_operand" "f")))
+ (use (match_operand:PSI 2 "fpscr_operand" "c"))]
+ "TARGET_SH4 || TARGET_SH2A_SINGLE"
+ "fcmp/eq\t%0,%0\;bf\t0f\;fcmp/eq\t%1,%1\\n0:"
+ [(set_attr "length" "6")
+ (set_attr "fp_mode" "single")])
+
(define_insn "cmpeqsf_media"
[(set (match_operand:SI 0 "register_operand" "=r")
(eq:SI (match_operand:SF 1 "fp_arith_reg_operand" "f")
@@ -10411,11 +10747,39 @@ mov.l\\t1f,r0\\n\\
[(set_attr "type" "fmove")
(set_attr "fp_mode" "single")])
+(define_expand "abssc2"
+ [(set (match_operand:SF 0 "fp_arith_reg_operand" "")
+ (abs:SF (match_operand:SC 1 "fp_arith_reg_operand" "")))]
+ "TARGET_OSFP && ! TARGET_SH2E"
+ "
+{
+ expand_sfunc_unop (SCmode, &gen_abssc2_i3, \"__hypotf\", ABS, operands);
+ DONE;
+}")
+
+(define_insn "abssc2_i3"
+ [(set (match_operand:SF 0 "arith_reg_dest" "=z")
+ (abs:SF (reg:SC R4_REG)))
+ (clobber (reg:SI MACH_REG))
+ (clobber (reg:SI MACL_REG))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (clobber (reg:SI R4_REG))
+ (clobber (reg:SI R5_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_OSFP && ! TARGET_SH2E"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
(define_expand "adddf3"
[(set (match_operand:DF 0 "fp_arith_reg_operand" "")
(plus:DF (match_operand:DF 1 "fp_arith_reg_operand" "")
(match_operand:DF 2 "fp_arith_reg_operand" "")))]
- "(TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU"
+ "TARGET_FPU_DOUBLE || TARGET_SH3"
"
{
if (TARGET_SH4 || TARGET_SH2A_DOUBLE)
@@ -10423,6 +10787,12 @@ mov.l\\t1f,r0\\n\\
expand_df_binop (&gen_adddf3_i, operands);
DONE;
}
+ else if (TARGET_SH3)
+ {
+ expand_sfunc_binop (DFmode, &gen_adddf3_i3_wrap, \"__adddf3\", PLUS,
+ operands);
+ DONE;
+ }
}")
(define_insn "*adddf3_media"
@@ -10443,6 +10813,30 @@ mov.l\\t1f,r0\\n\\
[(set_attr "type" "dfp_arith")
(set_attr "fp_mode" "double")])
+(define_expand "adddf3_i3_wrap"
+ [(match_operand:DF 0 "" "") (match_operand:SI 1 "" "")]
+ "TARGET_SH3"
+ "
+{
+ emit_insn (gen_adddf3_i3 (operands[1]));
+ emit_move_insn (operands[0], gen_rtx_REG (DFmode, R0_REG));
+ DONE;
+}")
+
+(define_insn "adddf3_i3"
+ [(set (reg:DF R0_REG)
+ (plus:DF (reg:DF R4_REG) (reg:DF R6_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:DI R2_REG))
+ (clobber (reg:DF R4_REG))
+ (clobber (reg:DF R6_REG))
+ (use (match_operand:SI 0 "arith_reg_operand" "r"))]
+ "TARGET_SH3"
+ "jsr @%0%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
(define_expand "subdf3"
[(set (match_operand:DF 0 "fp_arith_reg_operand" "")
(minus:DF (match_operand:DF 1 "fp_arith_reg_operand" "")
@@ -10479,7 +10873,7 @@ mov.l\\t1f,r0\\n\\
[(set (match_operand:DF 0 "fp_arith_reg_operand" "")
(mult:DF (match_operand:DF 1 "fp_arith_reg_operand" "")
(match_operand:DF 2 "fp_arith_reg_operand" "")))]
- "(TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU"
+ "TARGET_FPU_DOUBLE || TARGET_SH3"
"
{
if (TARGET_SH4 || TARGET_SH2A_DOUBLE)
@@ -10487,6 +10881,12 @@ mov.l\\t1f,r0\\n\\
expand_df_binop (&gen_muldf3_i, operands);
DONE;
}
+ else if (TARGET_SH3)
+ {
+ expand_sfunc_binop (DFmode, &gen_muldf3_i3_wrap, \"__muldf3\", MULT,
+ operands);
+ DONE;
+ }
}")
(define_insn "*muldf3_media"
@@ -10507,6 +10907,32 @@ mov.l\\t1f,r0\\n\\
[(set_attr "type" "dfp_mul")
(set_attr "fp_mode" "double")])
+(define_expand "muldf3_i3_wrap"
+ [(match_operand:DF 0 "" "") (match_operand:SI 1 "" "")]
+ "TARGET_SH3"
+ "
+{
+ emit_insn (gen_muldf3_i3 (operands[1]));
+ emit_move_insn (operands[0], gen_rtx_REG (DFmode, R0_REG));
+ DONE;
+}")
+
+(define_insn "muldf3_i3"
+ [(set (reg:DF R0_REG)
+ (mult:DF (reg:DF R4_REG) (reg:DF R6_REG)))
+ (clobber (reg:SI MACH_REG))
+ (clobber (reg:SI MACL_REG))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:DI R2_REG))
+ (clobber (reg:DF R4_REG))
+ (clobber (reg:DF R6_REG))
+ (use (match_operand:SI 0 "arith_reg_operand" "r"))]
+ "TARGET_SH3"
+ "jsr @%0%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
(define_expand "divdf3"
[(set (match_operand:DF 0 "fp_arith_reg_operand" "")
(div:DF (match_operand:DF 1 "fp_arith_reg_operand" "")
@@ -10636,6 +11062,73 @@ mov.l\\t1f,r0\\n\\
;; (use (match_dup 2))])
;; (set (match_dup 0) (reg:SI FPUL_REG))])
+(define_insn "cmpnedf_i1"
+ [(set (match_operand:CC_FP_NE 0 "register_operand" "=z")
+ (compare:CC_FP_NE (reg:DF R4_REG) (reg:DF R6_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1 && ! TARGET_SH2E"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpgtdf_i1"
+ [(set (match_operand:CC_FP_GT 0 "register_operand" "=z")
+ (compare:CC_FP_GT (reg:DF R4_REG) (reg:DF R6_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1_SOFTFP"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpunltdf_i1"
+ [(set (match_operand:CC_FP_UNLT 0 "register_operand" "=z")
+ (compare:CC_FP_UNLT (reg:DF R4_REG) (reg:DF R6_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1_SOFTFP"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpeqdf_i1_finite"
+ [(set (reg:SI T_REG)
+ (eq:SI (match_operand:DF 0 "arith_reg_operand" "r")
+ (match_operand:DF 1 "arith_reg_operand" "r")))
+ (clobber (match_scratch:SI 2 "=&r"))]
+ "TARGET_SH1_SOFTFP && flag_finite_math_only"
+ "cmp/eq\t%R0,%R1\;mov\t%S0,%2\;bf\t0f\;cmp/eq\t%S0,%S1\;bt\t0f\;or\t%S1,%2\;add\t%2,%2\;or\t%R0,%2\;tst\t%2,%2\\n0:"
+ [(set_attr "length" "18")])
+
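The cmpeqdf_i1_finite sequence above packs the comparison into nine two-byte insns. Its logic, written out in plain C as a sketch (not the generated code; hi/lo stand for the %S/%R half-words of each double):

```c
#include <stdint.h>

/* Bitwise double equality under -ffinite-math-only: two values are
   equal when all their bits match, or when both are +0.0 / -0.0.
   hi/lo are the high and low 32-bit halves of each double.  */
static int df_eq_finite (uint32_t hi0, uint32_t lo0,
                         uint32_t hi1, uint32_t lo1)
{
  if (lo0 != lo1)
    return 0;                   /* cmp/eq %R0,%R1; bf 0f */
  if (hi0 == hi1)
    return 1;                   /* cmp/eq %S0,%S1; bt 0f */
  /* Low words equal, high words differ: the only equal pair left is
     +0.0 vs. -0.0.  Shift the sign bits out and OR in the low word;
     the result is zero exactly in that case.  */
  return (((hi0 | hi1) << 1) | lo0) == 0;   /* or; add; or; tst */
}
```

The flag_finite_math_only condition matters: with NaNs admitted, bit equality would wrongly report a NaN equal to itself.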
+(define_insn "cmpundf_i1"
+ [(set (reg:SI T_REG)
+ (unordered:SI (match_operand:DF 0 "arith_reg_operand" "r,r")
+ (match_operand:DF 1 "arith_reg_operand" "r,r")))
+ (use (match_operand:SI 2 "arith_reg_operand" "r,r"))
+ (clobber (match_scratch:SI 3 "=0,&r"))]
+ "TARGET_SH1 && ! TARGET_SH2E"
+ "not\t%S0,%3\;tst\t%2,%3\;not\t%S1,%3\;bt\t0f\;tst\t%2,%3\;0:"
+ [(set_attr "length" "10")])
+
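The unordered patterns hinge on recognizing NaNs from raw bits. For reference, the full predicate for a 64-bit double is sketched below (illustrative C, not what the insn emits; the pattern above only inspects the exponent bits of each high word against the mask passed in operand 2):

```c
#include <stdint.h>

/* A 64-bit IEEE double is a NaN iff its exponent field (bits 62..52)
   is all ones and its fraction (bits 51..0) is nonzero.  */
static int df_isnan (uint64_t bits)
{
  uint64_t exponent = (bits >> 52) & 0x7ff;
  uint64_t fraction = bits & 0x000fffffffffffffULL;
  return exponent == 0x7ff && fraction != 0;
}
```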
+;; ??? This is a lot of code with a lot of branches; a library function
+;; might be better.
+(define_insn "cmpuneqdf_i1"
+ [(set (reg:SI T_REG)
+ (uneq:SI (match_operand:DF 0 "arith_reg_operand" "r")
+ (match_operand:DF 1 "arith_reg_operand" "r")))
+ (use (match_operand:SI 2 "arith_reg_operand" "r"))
+ (clobber (match_scratch:SI 3 "=&r"))]
+ "TARGET_SH1_SOFTFP"
+ "not\t%S0,%3\;tst\t%2,%3\;not\t%S1,%3\;bt\t0f\;tst\t%2,%3\;bt\t0f\;cmp/eq\t%R0,%R1\; bf\t0f\;cmp/eq\t%S0,%S1\;bt\t0f\;mov\t%S0,%3\;or\t%S1,%3\;add\t%3,%3\;or\t%R0,%3\;tst\t%3,%3\\n0:"
+ [(set_attr "length" "30")])
+
(define_insn "cmpgtdf_t"
[(set (reg:SI T_REG)
(gt:SI (match_operand:DF 0 "arith_reg_operand" "f")
@@ -10667,6 +11160,26 @@ mov.l\\t1f,r0\\n\\
[(set_attr "length" "4")
(set_attr "fp_mode" "double")])
+(define_insn "*cmpltgtdf_t"
+ [(set (reg:SI T_REG)
+ (ltgt:SI (match_operand:DF 0 "fp_arith_reg_operand" "f")
+ (match_operand:DF 1 "fp_arith_reg_operand" "f")))
+ (use (match_operand:PSI 2 "fpscr_operand" "c"))]
+ "TARGET_SH4 || TARGET_SH2A_DOUBLE"
+ "fcmp/gt\t%1,%0\;bt\t0f\;fcmp/gt\t%0,%1\\n0:"
+ [(set_attr "length" "6")
+ (set_attr "fp_mode" "double")])
+
+(define_insn "*cmpordereddf_t_4"
+ [(set (reg:SI T_REG)
+ (ordered:SI (match_operand:DF 0 "fp_arith_reg_operand" "f")
+ (match_operand:DF 1 "fp_arith_reg_operand" "f")))
+ (use (match_operand:PSI 2 "fpscr_operand" "c"))]
+ "TARGET_SH4 || TARGET_SH2A_SINGLE"
+ "fcmp/eq\t%0,%0\;bf\t0f\;fcmp/eq\t%1,%1\\n0:"
+ [(set_attr "length" "6")
+ (set_attr "fp_mode" "double")])
+
(define_insn "cmpeqdf_media"
[(set (match_operand:SI 0 "register_operand" "=r")
(eq:SI (match_operand:DF 1 "fp_arith_reg_operand" "f")
@@ -10835,16 +11348,82 @@ mov.l\\t1f,r0\\n\\
[(set_attr "type" "fp")
(set_attr "fp_mode" "double")])
+;; ??? In order to use this efficiently, we'd have to have an extra
+;; register class for r0 and r1 - and that would cause repercussions in
+;; register allocation elsewhere. So just say we clobber r0 / r1, and
+;; that we can use an arbitrary target.
+(define_insn_and_split "extendsfdf2_i1"
+ [(set (match_operand:DF 0 "arith_reg_dest" "=r")
+ (float_extend:DF (reg:SF R4_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R0_REG))
+ (clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1_SOFTFP && !TARGET_SH2E"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 0) (reg:DF R0_REG))]
+ "emit_insn (gen_extendsfdf2_i1_r0 (operands[1]));"
+ [(set_attr "type" "sfunc")])
+
+(define_insn "extendsfdf2_i1_r0"
+ [(set (reg:DF R0_REG) (float_extend:DF (reg:SF R4_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (use (match_operand:SI 0 "arith_reg_operand" "r"))]
+ "TARGET_SH1_SOFTFP && !TARGET_SH2E"
+ "jsr @%0%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
+(define_insn_and_split "extendsfdf2_i2e"
+ [(set (match_operand:DF 0 "arith_reg_dest" "=r")
+ (float_extend:DF (reg:SF FR4_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R0_REG))
+ (clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (clobber (reg:SI R4_REG))
+ (clobber (reg:SI FPUL_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1_SOFTFP && TARGET_SH2E"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 0) (reg:DF R0_REG))]
+ "emit_insn (gen_extendsfdf2_i2e_r0 (operands[1]));"
+ [(set_attr "type" "sfunc")])
+
+(define_insn "extendsfdf2_i2e_r0"
+ [(set (reg:DF R0_REG) (float_extend:DF (reg:SF FR4_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (clobber (reg:SI R4_REG))
+ (clobber (reg:SI FPUL_REG))
+ (use (match_operand:SI 0 "arith_reg_operand" "r"))]
+ "TARGET_SH1_SOFTFP && TARGET_SH2E"
+ "jsr @%0%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
(define_expand "truncdfsf2"
[(set (match_operand:SF 0 "fpul_operand" "")
- (float_truncate:SF (match_operand:DF 1 "fp_arith_reg_operand" "")))]
- "(TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU"
- "
-{
+ (float_truncate:SF (match_operand:DF 1 "fp_arith_reg_operand" "")))]
+ "(TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU"
+ "
+{
if (TARGET_SH4 || TARGET_SH2A_DOUBLE)
{
emit_df_insn (gen_truncdfsf2_i4 (operands[0], operands[1],
- get_fpscr_rtx ()));
+ get_fpscr_rtx ()));
DONE;
}
}")
@@ -10864,6 +11443,23 @@ mov.l\\t1f,r0\\n\\
"fcnvds %1,%0"
[(set_attr "type" "fp")
(set_attr "fp_mode" "double")])
+
+(define_insn "truncdfsf2_i2e"
+ [(set (match_operand:SF 0 "arith_reg_dest" "=w")
+ (float_truncate:SF (reg:DF R4_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI FPUL_REG))
+ (clobber (reg:SI R0_REG))
+ (clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1_SOFTFP && TARGET_SH2E"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
\f
;; Bit field extract patterns. These give better code for packed bitfields,
;; because they allow auto-increment addresses to be generated.
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/sh-modes.def gcc-4.5.0/gcc/config/sh/sh-modes.def
--- gcc-4.5.0/gcc/config/sh/sh-modes.def 2007-08-02 16:19:31.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/sh-modes.def 2010-07-14 11:58:33.000000000 +0530
@@ -22,6 +22,11 @@ PARTIAL_INT_MODE (SI);
/* PDI mode is used to represent a function address in a target register. */
PARTIAL_INT_MODE (DI);
+/* For software floating point comparisons. */
+CC_MODE (CC_FP_NE);
+CC_MODE (CC_FP_GT);
+CC_MODE (CC_FP_UNLT);
+
/* Vector modes. */
VECTOR_MODE (INT, QI, 2); /* V2QI */
VECTOR_MODES (INT, 4); /* V4QI V2HI */
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/sh.opt gcc-4.5.0/gcc/config/sh/sh.opt
--- gcc-4.5.0/gcc/config/sh/sh.opt 2009-07-20 13:07:37.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/sh.opt 2010-07-14 11:58:33.000000000 +0530
@@ -21,7 +21,7 @@
;; Used for various architecture options.
Mask(SH_E)
-;; Set if the default precision of th FPU is single.
+;; Set if the default precision of the FPU is single.
Mask(FPU_SINGLE)
;; Set if we should generate code using type 2A insns.
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/sh-protos.h gcc-4.5.0/gcc/config/sh/sh-protos.h
--- gcc-4.5.0/gcc/config/sh/sh-protos.h 2009-12-01 03:08:46.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/sh-protos.h 2010-07-14 11:58:33.000000000 +0530
@@ -25,8 +25,13 @@ along with GCC; see the file COPYING3.
#define GCC_SH_PROTOS_H
enum sh_function_kind {
- /* A function with normal C ABI */
+ /* A function with normal C ABI, or an SH1..SH4 sfunc that may be
+ resolved via a PLT. */
FUNCTION_ORDINARY,
+ /* A function that is a bit too large to put in every calling dso, but
+ that is typically used often enough that calling via the GOT makes
+ sense for speed. */
+ SFUNC_FREQUENT,
/* A special function that guarantees that some otherwise call-clobbered
registers are not clobbered. These can't go through the SH5 resolver,
because it only saves argument passing registers. */
@@ -116,6 +121,10 @@ extern void expand_sf_binop (rtx (*)(rtx
extern void expand_df_unop (rtx (*)(rtx, rtx, rtx), rtx *);
extern void expand_df_binop (rtx (*)(rtx, rtx, rtx, rtx), rtx *);
extern void expand_fp_branch (rtx (*)(void), rtx (*)(void));
+extern void expand_sfunc_unop (enum machine_mode, rtx (*) (rtx, rtx),
+ const char *, enum rtx_code code, rtx *);
+extern void expand_sfunc_binop (enum machine_mode, rtx (*) (rtx, rtx),
+ const char *, enum rtx_code code, rtx *);
extern int sh_insn_length_adjustment (rtx);
extern int sh_can_redirect_branch (rtx, rtx);
extern void sh_expand_unop_v2sf (enum rtx_code, rtx, rtx);
@@ -133,6 +142,8 @@ extern struct rtx_def *get_fpscr_rtx (vo
extern int sh_media_register_for_return (void);
extern void sh_expand_prologue (void);
extern void sh_expand_epilogue (bool);
+extern void sh_expand_float_cbranch (rtx operands[4]);
+extern void sh_expand_float_scc (rtx operands[4]);
extern int sh_need_epilogue (void);
extern void sh_set_return_address (rtx, rtx);
extern int initial_elimination_offset (int, int);
@@ -176,6 +187,7 @@ struct secondary_reload_info;
extern enum reg_class sh_secondary_reload (bool, rtx, enum reg_class,
enum machine_mode,
struct secondary_reload_info *);
+extern int sh_match_adjust (rtx, int);
extern int sh2a_get_function_vector_number (rtx);
extern int sh2a_is_function_vector_call (rtx);
extern void sh_fix_range (const char *);
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/t-sh gcc-4.5.0/gcc/config/sh/t-sh
--- gcc-4.5.0/gcc/config/sh/t-sh 2009-08-23 03:13:07.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/t-sh 2010-07-14 17:23:41.000000000 +0530
@@ -26,6 +26,10 @@ LIB1ASMSRC = sh/lib1funcs.asm
LIB1ASMFUNCS = _ashiftrt _ashiftrt_n _ashiftlt _lshiftrt _movmem \
_movmem_i4 _mulsi3 _sdivsi3 _sdivsi3_i4 _udivsi3 _udivsi3_i4 _set_fpscr \
_div_table _udiv_qrnnd_16 \
+ _nesf2 _nedf2 _gtsf2t _gtdf2t _gesf2f _gedf2f \
+ _add_sub_sf3 _mulsf3 _hypotf _muldf3 _add_sub_df3 _divdf3 \
+ _fixunssfsi _fixsfsi _fixunsdfsi _fixdfsi _floatunssisf _floatsisf \
+ _floatunssidf _floatsidf \
$(LIB1ASMFUNCS_CACHE)
LIB1ASMFUNCS_CACHE = _ic_invalidate _ic_invalidate_array
@@ -131,17 +135,17 @@ OPT_EXTRA_PARTS= libgcc-Os-4-200.a libgc
EXTRA_MULTILIB_PARTS= $(IC_EXTRA_PARTS) $(OPT_EXTRA_PARTS)
$(T)ic_invalidate_array_4-100.o: $(srcdir)/config/sh/lib1funcs.asm $(GCC_PASSES)
- $(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -c -o $(T)ic_invalidate_array_4-100.o -DL_ic_invalidate_array -DWAYS=1 -DWAY_SIZE=0x2000 -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
+ $(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -I. -c -o $(T)ic_invalidate_array_4-100.o -DL_ic_invalidate_array -DWAYS=1 -DWAY_SIZE=0x2000 -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
$(T)libic_invalidate_array_4-100.a: $(T)ic_invalidate_array_4-100.o $(GCC_PASSES)
$(AR_CREATE_FOR_TARGET) $(T)libic_invalidate_array_4-100.a $(T)ic_invalidate_array_4-100.o
$(T)ic_invalidate_array_4-200.o: $(srcdir)/config/sh/lib1funcs.asm $(GCC_PASSES)
- $(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -c -o $(T)ic_invalidate_array_4-200.o -DL_ic_invalidate_array -DWAYS=2 -DWAY_SIZE=0x2000 -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
+ $(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -I. -c -o $(T)ic_invalidate_array_4-200.o -DL_ic_invalidate_array -DWAYS=2 -DWAY_SIZE=0x2000 -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
$(T)libic_invalidate_array_4-200.a: $(T)ic_invalidate_array_4-200.o $(GCC_PASSES)
$(AR_CREATE_FOR_TARGET) $(T)libic_invalidate_array_4-200.a $(T)ic_invalidate_array_4-200.o
$(T)ic_invalidate_array_4a.o: $(srcdir)/config/sh/lib1funcs.asm $(GCC_PASSES)
- $(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -c -o $(T)ic_invalidate_array_4a.o -DL_ic_invalidate_array -D__FORCE_SH4A__ -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
+ $(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -I. -c -o $(T)ic_invalidate_array_4a.o -DL_ic_invalidate_array -D__FORCE_SH4A__ -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
$(T)libic_invalidate_array_4a.a: $(T)ic_invalidate_array_4a.o $(GCC_PASSES)
$(AR_CREATE_FOR_TARGET) $(T)libic_invalidate_array_4a.a $(T)ic_invalidate_array_4a.o
[-- Attachment #3: sh_softfp.patch --]
[-- Type: application/octet-stream, Size: 5477 bytes --]
diff -up -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config.gcc gcc-4.5.0/gcc/config.gcc
--- gcc-4.5.0/gcc/config.gcc 2010-04-07 16:04:00.000000000 +0530
+++ gcc-4.5.0/gcc/config.gcc 2010-07-14 18:43:09.000000000 +0530
@@ -2167,7 +2167,7 @@ sh-*-symbianelf* | sh[12346l]*-*-symbian
sh-*-linux* | sh[2346lbe]*-*-linux* | \
sh-*-netbsdelf* | shl*-*-netbsdelf* | sh5-*-netbsd* | sh5l*-*-netbsd* | \
sh64-*-netbsd* | sh64l*-*-netbsd*)
- tmake_file="${tmake_file} sh/t-sh sh/t-elf"
+ tmake_file="${tmake_file} sh/t-sh sh/t-elf sh/t-fprules-softfp soft-fp/t-softfp"
if test x${with_endian} = x; then
case ${target} in
sh[1234]*be-*-* | sh[1234]*eb-*-*) with_endian=big ;;
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/sfp-machine.h gcc-4.5.0/gcc/config/sh/sfp-machine.h
--- gcc-4.5.0/gcc/config/sh/sfp-machine.h 1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/sfp-machine.h 2010-07-14 18:45:09.000000000 +0530
@@ -0,0 +1,67 @@
+#define _FP_W_TYPE_SIZE 32
+#define _FP_W_TYPE unsigned long
+#define _FP_WS_TYPE signed long
+#define _FP_I_TYPE long
+
+/* The type of the result of a floating point comparison. This must
+ match `__libgcc_cmp_return__' in GCC for the target. */
+typedef int __gcc_CMPtype __attribute__ ((mode (__libgcc_cmp_return__)));
+#define CMPtype __gcc_CMPtype
+
+#define _FP_MUL_MEAT_S(R,X,Y) \
+ _FP_MUL_MEAT_1_wide(_FP_WFRACBITS_S,R,X,Y,umul_ppmm)
+#define _FP_MUL_MEAT_D(R,X,Y) \
+ _FP_MUL_MEAT_2_wide(_FP_WFRACBITS_D,R,X,Y,umul_ppmm)
+#define _FP_MUL_MEAT_Q(R,X,Y) \
+ _FP_MUL_MEAT_4_wide(_FP_WFRACBITS_Q,R,X,Y,umul_ppmm)
+
+#define _FP_DIV_MEAT_S(R,X,Y) _FP_DIV_MEAT_1_loop(S,R,X,Y)
+#define _FP_DIV_MEAT_D(R,X,Y) _FP_DIV_MEAT_2_udiv(D,R,X,Y)
+#define _FP_DIV_MEAT_Q(R,X,Y) _FP_DIV_MEAT_4_udiv(Q,R,X,Y)
+
+#define _FP_NANFRAC_S ((_FP_QNANBIT_S << 1) - 1)
+#define _FP_NANFRAC_D ((_FP_QNANBIT_D << 1) - 1), -1
+#define _FP_NANFRAC_Q ((_FP_QNANBIT_Q << 1) - 1), -1, -1, -1
+#define _FP_NANSIGN_S 0
+#define _FP_NANSIGN_D 0
+#define _FP_NANSIGN_Q 0
+
+#define _FP_KEEPNANFRACP 1
+
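The _FP_MUL_MEAT macros above delegate the core work to umul_ppmm, a full 32x32->64 widening multiply that yields separate high and low words. Its contract can be sketched with a 64-bit intermediate (on SH this would typically map to dmulu.l plus mach/macl reads where the insn is available):

```c
#include <stdint.h>

/* Reference semantics of umul_ppmm: multiply two 32-bit words to a
   64-bit product, returned as separate high and low 32-bit halves.  */
static void umul_ppmm_ref (uint32_t *hi, uint32_t *lo,
                           uint32_t a, uint32_t b)
{
  uint64_t product = (uint64_t) a * b;
  *hi = (uint32_t) (product >> 32);
  *lo = (uint32_t) product;
}
```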
+/* Someone please check this. */
+#define _FP_CHOOSENAN(fs, wc, R, X, Y, OP) \
+ do { \
+ if ((_FP_FRAC_HIGH_RAW_##fs(X) & _FP_QNANBIT_##fs) \
+ && !(_FP_FRAC_HIGH_RAW_##fs(Y) & _FP_QNANBIT_##fs)) \
+ { \
+ R##_s = Y##_s; \
+ _FP_FRAC_COPY_##wc(R,Y); \
+ } \
+ else \
+ { \
+ R##_s = X##_s; \
+ _FP_FRAC_COPY_##wc(R,X); \
+ } \
+ R##_c = FP_CLS_NAN; \
+ } while (0)
+
+#define __LITTLE_ENDIAN 1234
+#define __BIG_ENDIAN 4321
+
+#if defined __BIG_ENDIAN__ || defined _BIG_ENDIAN
+# if defined __LITTLE_ENDIAN__ || defined _LITTLE_ENDIAN
+# error "Both BIG_ENDIAN and LITTLE_ENDIAN defined!"
+# endif
+# define __BYTE_ORDER __BIG_ENDIAN
+#else
+# if defined __LITTLE_ENDIAN__ || defined _LITTLE_ENDIAN
+# define __BYTE_ORDER __LITTLE_ENDIAN
+# else
+# error "Cannot determine current byte order"
+# endif
+#endif
+
+/* Define ALIASNAME as a strong alias for NAME. */
+# define strong_alias(name, aliasname) _strong_alias(name, aliasname)
+# define _strong_alias(name, aliasname) \
+ extern __typeof (name) aliasname __attribute__ ((alias (#name)));
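The _FP_CHOOSENAN macro above selects which operand's NaN payload survives: Y's when X is quiet and Y is not, X's otherwise. For single precision the test reduces to a check of the top fraction bit; a sketch on raw bit patterns (the mask is the quiet-NaN bit of IEEE single, bit 22):

```c
#include <stdint.h>

#define SF_QNANBIT 0x00400000u  /* top fraction bit: the quiet-NaN flag */

/* Mirror of _FP_CHOOSENAN for single precision: given the raw bit
   patterns of two NaN operands, keep Y's payload when X is quiet
   and Y is not, otherwise keep X's.  */
static uint32_t choose_nan (uint32_t x, uint32_t y)
{
  if ((x & SF_QNANBIT) && !(y & SF_QNANBIT))
    return y;
  return x;
}
```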
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/t-fprules-softfp gcc-4.5.0/gcc/config/sh/t-fprules-softfp
--- gcc-4.5.0/gcc/config/sh/t-fprules-softfp 1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/t-fprules-softfp 2010-07-14 18:43:09.000000000 +0530
@@ -0,0 +1,6 @@
+softfp_float_modes := sf df
+softfp_int_modes := si di
+softfp_extensions := sfdf
+softfp_truncations := dfsf
+softfp_machine_header := sh/sfp-machine.h
+softfp_exclude_libgcc2 := y
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/t-sh gcc-4.5.0/gcc/config/sh/t-sh
--- gcc-4.5.0/gcc/config/sh/t-sh 2009-08-23 03:13:07.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/t-sh 2010-07-14 18:43:09.000000000 +0530
@@ -31,24 +31,6 @@ LIB1ASMFUNCS_CACHE = _ic_invalidate _ic_
TARGET_LIBGCC2_CFLAGS = -mieee
-# We want fine grained libraries, so use the new code to build the
-# floating point emulation libraries.
-FPBIT = fp-bit.c
-DPBIT = dp-bit.c
-
-dp-bit.c: $(srcdir)/config/fp-bit.c
- echo '#ifdef __LITTLE_ENDIAN__' > dp-bit.c
- echo '#define FLOAT_BIT_ORDER_MISMATCH' >>dp-bit.c
- echo '#endif' >> dp-bit.c
- cat $(srcdir)/config/fp-bit.c >> dp-bit.c
-
-fp-bit.c: $(srcdir)/config/fp-bit.c
- echo '#define FLOAT' > fp-bit.c
- echo '#ifdef __LITTLE_ENDIAN__' >> fp-bit.c
- echo '#define FLOAT_BIT_ORDER_MISMATCH' >>fp-bit.c
- echo '#endif' >> fp-bit.c
- cat $(srcdir)/config/fp-bit.c >> fp-bit.c
-
DEFAULT_ENDIAN = $(word 1,$(TM_ENDIAN_CONFIG))
OTHER_ENDIAN = $(word 2,$(TM_ENDIAN_CONFIG))
* RE: SH optimized software floating point routines
2010-07-16 10:04 ` Naveen H. S
@ 2010-07-16 10:26 ` Joern Rennecke
2010-07-22 23:10 ` Joseph S. Myers
2010-07-16 14:01 ` Kaz Kojima
2010-07-17 13:30 ` Joern Rennecke
2 siblings, 1 reply; 30+ messages in thread
From: Joern Rennecke @ 2010-07-16 10:26 UTC (permalink / raw)
To: Naveen H. S; +Cc: Kaz Kojima, gcc, Prafulla Thakare
Quoting "Naveen H. S" <Naveen.S@kpitcummins.com>:
> extendsfdf2 - gcc.c-torture/execute/conversion.c
> gcc.dg/torture/fp-int-convert-float.c, gcc.dg/pr28796-2.c
Note that some tests invoke undefined behaviour; I've also come across this
when doing optimized soft FP for ARCompact:
http://gcc.gnu.org/viewcvs/branches/arc-4_4-20090909-branch/gcc/testsuite/gcc.dg/torture/fp-int-convert.h?r1=151539&r2=151545
* Re: SH optimized software floating point routines
2010-07-16 10:04 ` Naveen H. S
2010-07-16 10:26 ` Joern Rennecke
@ 2010-07-16 14:01 ` Kaz Kojima
2010-07-17 14:31 ` Joern Rennecke
2010-07-17 13:30 ` Joern Rennecke
2 siblings, 1 reply; 30+ messages in thread
From: Kaz Kojima @ 2010-07-16 14:01 UTC (permalink / raw)
To: Naveen.S; +Cc: gcc, Prafulla.Thakare, amylaar
"Naveen H. S" <Naveen.S@kpitcummins.com> wrote:
>>> you are free to propose a complete and regtested patch for SH
>>> assembly soft fp against trunk.
>
> Please find attached the ported soft float patch "sh_softfloat.patch".
> The original patch was posted at the following link by Joern RENNECKE.
> http://gcc.gnu.org/ml/gcc-patches/2006-09/msg00614.html
Your patches are for 4.5.0, and sh_softfloat.patch doesn't apply
cleanly to trunk. Please provide patches against svn trunk, along
with ChangeLog entries for them.
Here is an incomplete list of comments for sh_softfloat.patch:
*target*:
New macro TARGET_MATCH_ADJUST requires a doc patch.
reload.c:
config/sh/lib1funcs.asm:
config/sh/lib1funcs.h:
Copyright years of these files should be updated.
All copyright of IEEE-754/* files should be GPLv3 with Runtime
library exception instead of v2 and 2010 should be added to
their copyright years.
Please see the one used in sh/lib1funcs.asm for example.
config/sh/sh.c:
>+/* Saved operands from the last compare to use when we generate an scc
>+ or bcc insn. */
>+
>+rtx sh_compare_op0;
>+rtx sh_compare_op1;
It looks like sh_compare_op0 and sh_compare_op1 are set but
not used.
>+ REG_NOTES (last) = gen_rtx_EXPR_LIST (REG_EQUAL, equiv, REG_NOTES (last));
Use add_reg_note here. Several other similar cases.
> Please find attached the patch ""sh_softfp.patch" which implements basic
> support of soft-fp for SH target. There were no regressions found with
> the patch. Please let us know if there should be any further improvements
> required for complete soft-fp support.
sh_softfp.patch looks basically OK to me, though I'm curious
about the numbers for fp-bit.c/softfp/softfloat. Could you show us
some real speed and size numbers?
Regards,
kaz
* RE: SH optimized software floating point routines
2010-07-16 10:04 ` Naveen H. S
2010-07-16 10:26 ` Joern Rennecke
2010-07-16 14:01 ` Kaz Kojima
@ 2010-07-17 13:30 ` Joern Rennecke
2010-07-19 0:59 ` Joern Rennecke
2 siblings, 1 reply; 30+ messages in thread
From: Joern Rennecke @ 2010-07-17 13:30 UTC (permalink / raw)
To: Naveen H. S; +Cc: Kaz Kojima, gcc, Prafulla Thakare
[-- Attachment #1: Type: text/plain, Size: 1335 bytes --]
Quoting "Naveen H. S" <Naveen.S@kpitcummins.com>:
> Hi,
>
>>> you are free to propose a complete and regtested patch for SH
>>> assembly soft fp against trunk.
>
> Please find attached the ported soft float patch "sh_softfloat.patch".
> The original patch was posted at the following link by Joern RENNECKE.
> http://gcc.gnu.org/ml/gcc-patches/2006-09/msg00614.html
>
> The following modifications have been done in the patch.
>
> sched-deps.c :- Hunk was not applied due to modifications in the
> current gcc source.
>
> t-sh :- divsf3, extendsfdf2 and truncdfsf2 routines are not included
> as they resulted in some regressions.
> divsf3 - c-c++-common/torture/complex-sign-mixed-div.c
> extendsfdf2 - gcc.c-torture/execute/conversion.c
> gcc.dg/torture/fp-int-convert-float.c, gcc.dg/pr28796-2.c
> gcc.dg/torture/type-generic-1.c
I've found some bugs in the SH[12] implementation of divsf3 / extendsfdf2.
What test case fails for truncdfsf2?
> sh.md :- cbranchsf4, cbranchdf4, cstoresf4, cstoredf4 instruction
> patterns are not included as they are already present in current source.
> Modifying the routines referring patch resulted in build failure.
Without optimized comparisons, conversions and division, the performance
will be heavily compromised.
Could you please test the attached patch?
[-- Attachment #2: sh-softfp-20100717-1350 --]
[-- Type: text/plain, Size: 285198 bytes --]
Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi (revision 162269)
+++ gcc/doc/tm.texi (working copy)
@@ -2753,6 +2753,10 @@ of the individual moves due to expected
forwarding logic, you can set @code{sri->extra_cost} to a negative amount.
@end deftypefn
+@deftypefn {Target Hook} int TARGET_MATCH_ADJUST (rtx, @var{int})
+This hook is documented in @file{target.def} / @file{targhooks.c}.
+@end deftypefn
+
@defmac SECONDARY_RELOAD_CLASS (@var{class}, @var{mode}, @var{x})
@defmacx SECONDARY_INPUT_RELOAD_CLASS (@var{class}, @var{mode}, @var{x})
@defmacx SECONDARY_OUTPUT_RELOAD_CLASS (@var{class}, @var{mode}, @var{x})
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in (revision 162269)
+++ gcc/doc/tm.texi.in (working copy)
@@ -2753,6 +2753,8 @@ of the individual moves due to expected
forwarding logic, you can set @code{sri->extra_cost} to a negative amount.
@end deftypefn
+@hook TARGET_MATCH_ADJUST
+
@defmac SECONDARY_RELOAD_CLASS (@var{class}, @var{mode}, @var{x})
@defmacx SECONDARY_INPUT_RELOAD_CLASS (@var{class}, @var{mode}, @var{x})
@defmacx SECONDARY_OUTPUT_RELOAD_CLASS (@var{class}, @var{mode}, @var{x})
Index: gcc/targhooks.c
===================================================================
--- gcc/targhooks.c (revision 162269)
+++ gcc/targhooks.c (working copy)
@@ -67,6 +67,7 @@ along with GCC; see the file COPYING3.
#include "reload.h"
#include "optabs.h"
#include "recog.h"
+#include "regs.h"
bool
@@ -906,6 +907,27 @@ default_secondary_reload (bool in_p ATTR
return rclass;
}
+/* Given an rtx and its regno, return a regno value that shall be used for
+ purposes of comparison in operands_match_p.
+ Generally, we say that integer registers are subject to big-endian
+ adjustment. This default target hook should generally work if the mode
+ of a register is a sufficient indication if this adjustment is to take
+ place; this will not work when software floating point is done in integer
+ registers. */
+int
+default_match_adjust (rtx x, int regno)
+{
+ /* On a WORDS_BIG_ENDIAN machine, point to the last register of a
+ multiple hard register group of scalar integer registers, so that
+ for example (reg:DI 0) and (reg:SI 1) will be considered the same
+ register. */
+ if (WORDS_BIG_ENDIAN && GET_MODE_SIZE (GET_MODE (x)) > UNITS_PER_WORD
+ && SCALAR_INT_MODE_P (GET_MODE (x))
+ && regno < FIRST_PSEUDO_REGISTER)
+ regno += hard_regno_nregs[regno][GET_MODE (x)] - 1;
+ return regno;
+}
+
void
default_target_option_override (void)
{
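The effect of default_match_adjust can be seen in a standalone sketch, where the word size, register count, and nregs calculation are simplified stand-ins for the real target macros:

```c
/* Simplified model of default_match_adjust: on a WORDS_BIG_ENDIAN
   target, a multi-word scalar integer is identified by its *last*
   hard register, so that e.g. (reg:DI 0) and (reg:SI 1) compare as
   the same register in operands_match_p.  */
#define UNITS_PER_WORD 4
#define FIRST_PSEUDO_REGISTER 16
#define WORDS_BIG_ENDIAN 1

static int match_adjust (int regno, int mode_size, int scalar_int_p)
{
  if (WORDS_BIG_ENDIAN && mode_size > UNITS_PER_WORD
      && scalar_int_p && regno < FIRST_PSEUDO_REGISTER)
    regno += (mode_size + UNITS_PER_WORD - 1) / UNITS_PER_WORD - 1;
  return regno;
}
```

Making this a hook is what lets SH override the adjustment for DFmode values held in integer registers, which the mode-based default cannot detect.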
Index: gcc/targhooks.h
===================================================================
--- gcc/targhooks.h (revision 162269)
+++ gcc/targhooks.h (working copy)
@@ -121,6 +121,7 @@ extern const reg_class_t *default_ira_co
extern reg_class_t default_secondary_reload (bool, rtx, reg_class_t,
enum machine_mode,
secondary_reload_info *);
+extern int default_match_adjust (rtx, int);
extern void default_target_option_override (void);
extern void hook_void_bitmap (bitmap);
extern bool default_handle_c_option (size_t, const char *, int);
Index: gcc/target.def
===================================================================
--- gcc/target.def (revision 162269)
+++ gcc/target.def (working copy)
@@ -1945,6 +1945,14 @@ DEFHOOK
secondary_reload_info *sri),
default_secondary_reload)
+/* Take an rtx and its regno, and return the regno for purposes of
+ checking a matching constraint. */
+DEFHOOK
+(match_adjust,
+ "This hook is documented in @file{target.def} / @file{targhooks.c}.",
+ int, (rtx, int),
+ default_match_adjust)
+
/* This target hook allows the backend to perform additional
processing while initializing for variable expansion. */
DEFHOOK
Index: gcc/reload.c
===================================================================
--- gcc/reload.c (revision 162269)
+++ gcc/reload.c (working copy)
@@ -2216,14 +2216,8 @@ operands_match_p (rtx x, rtx y)
multiple hard register group of scalar integer registers, so that
for example (reg:DI 0) and (reg:SI 1) will be considered the same
register. */
- if (WORDS_BIG_ENDIAN && GET_MODE_SIZE (GET_MODE (x)) > UNITS_PER_WORD
- && SCALAR_INT_MODE_P (GET_MODE (x))
- && i < FIRST_PSEUDO_REGISTER)
- i += hard_regno_nregs[i][GET_MODE (x)] - 1;
- if (WORDS_BIG_ENDIAN && GET_MODE_SIZE (GET_MODE (y)) > UNITS_PER_WORD
- && SCALAR_INT_MODE_P (GET_MODE (y))
- && j < FIRST_PSEUDO_REGISTER)
- j += hard_regno_nregs[j][GET_MODE (y)] - 1;
+ i = targetm.match_adjust (x, i);
+ j = targetm.match_adjust (y, j);
return i == j;
}
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in (revision 162269)
+++ gcc/Makefile.in (working copy)
@@ -2806,7 +2806,7 @@ opts-common.o : opts-common.c opts.h opt
targhooks.o : targhooks.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TREE_H) \
$(EXPR_H) $(TM_H) $(RTL_H) $(TM_P_H) $(FUNCTION_H) output.h $(TOPLEV_H) $(DIAGNOSTIC_CORE_H) \
$(MACHMODE_H) $(TARGET_DEF_H) $(TARGET_H) $(GGC_H) gt-targhooks.h \
- $(OPTABS_H) $(RECOG_H) reload.h hard-reg-set.h
+ $(OPTABS_H) $(RECOG_H) reload.h hard-reg-set.h $(REGS_H)
bversion.h: s-bversion; @true
s-bversion: BASE-VER
Index: gcc/config/sh/sh-protos.h
===================================================================
--- gcc/config/sh/sh-protos.h (revision 162269)
+++ gcc/config/sh/sh-protos.h (working copy)
@@ -25,8 +25,13 @@ along with GCC; see the file COPYING3.
#define GCC_SH_PROTOS_H
enum sh_function_kind {
- /* A function with normal C ABI */
+ /* A function with normal C ABI, or an SH1..SH4 sfunc that may be
+ resolved via a PLT. */
FUNCTION_ORDINARY,
+ /* A function that is a bit too large to put in every calling dso, but
+ that is typically used often enough that calling via the GOT makes
+ sense for speed. */
+ SFUNC_FREQUENT,
/* A special function that guarantees that some otherwise call-clobbered
registers are not clobbered. These can't go through the SH5 resolver,
because it only saves argument passing registers. */
@@ -115,6 +120,10 @@ extern void expand_sf_binop (rtx (*)(rtx
extern void expand_df_unop (rtx (*)(rtx, rtx, rtx), rtx *);
extern void expand_df_binop (rtx (*)(rtx, rtx, rtx, rtx), rtx *);
extern void expand_fp_branch (rtx (*)(void), rtx (*)(void));
+extern void expand_sfunc_unop (enum machine_mode, rtx (*) (rtx, rtx),
+ const char *, enum rtx_code code, rtx *);
+extern void expand_sfunc_binop (enum machine_mode, rtx (*) (rtx, rtx),
+ const char *, enum rtx_code code, rtx *);
extern int sh_insn_length_adjustment (rtx);
extern int sh_can_redirect_branch (rtx, rtx);
extern void sh_expand_unop_v2sf (enum rtx_code, rtx, rtx);
@@ -132,6 +141,8 @@ extern struct rtx_def *get_fpscr_rtx (vo
extern int sh_media_register_for_return (void);
extern void sh_expand_prologue (void);
extern void sh_expand_epilogue (bool);
+extern void sh_expand_float_cbranch (rtx operands[4]);
+extern void sh_expand_float_scc (rtx operands[4]);
extern int sh_need_epilogue (void);
extern void sh_set_return_address (rtx, rtx);
extern int initial_elimination_offset (int, int);
@@ -176,6 +187,7 @@ struct secondary_reload_info;
extern reg_class_t sh_secondary_reload (bool, rtx, reg_class_t,
enum machine_mode,
struct secondary_reload_info *);
+extern int sh_match_adjust (rtx, int);
extern int sh2a_get_function_vector_number (rtx);
extern int sh2a_is_function_vector_call (rtx);
extern void sh_fix_range (const char *);
Index: gcc/config/sh/lib1funcs.asm
===================================================================
--- gcc/config/sh/lib1funcs.asm (revision 162269)
+++ gcc/config/sh/lib1funcs.asm (working copy)
@@ -3931,3 +3931,6 @@ GLOBAL(udiv_qrnnd_16):
ENDFUNC(GLOBAL(udiv_qrnnd_16))
#endif /* !__SHMEDIA__ */
#endif /* L_udiv_qrnnd_16 */
+
+#include "ieee-754-sf.S"
+#include "ieee-754-df.S"
Index: gcc/config/sh/t-sh
===================================================================
--- gcc/config/sh/t-sh (revision 162269)
+++ gcc/config/sh/t-sh (working copy)
@@ -26,6 +26,10 @@ LIB1ASMSRC = sh/lib1funcs.asm
LIB1ASMFUNCS = _ashiftrt _ashiftrt_n _ashiftlt _lshiftrt _movmem \
_movmem_i4 _mulsi3 _sdivsi3 _sdivsi3_i4 _udivsi3 _udivsi3_i4 _set_fpscr \
_div_table _udiv_qrnnd_16 \
+ _nesf2 _nedf2 _gtsf2t _gtdf2t _gesf2f _gedf2f _extendsfdf2 _truncdfsf2 \
+ _add_sub_sf3 _mulsf3 _hypotf _muldf3 _add_sub_df3 _divsf3 _divdf3 \
+ _fixunssfsi _fixsfsi _fixunsdfsi _fixdfsi _floatunssisf _floatsisf \
+ _floatunssidf _floatsidf \
$(LIB1ASMFUNCS_CACHE)
LIB1ASMFUNCS_CACHE = _ic_invalidate _ic_invalidate_array
@@ -120,7 +124,6 @@ $(T)crtn.o: $(srcdir)/config/sh/crtn.asm
$(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -c -o $(T)crtn.o -x assembler-with-cpp $(srcdir)/config/sh/crtn.asm
$(out_object_file): gt-sh.h
-gt-sh.h : s-gtype ; @true
# These are not suitable for COFF.
# EXTRA_MULTILIB_PARTS= crt1.o crti.o crtn.o crtbegin.o crtend.o
@@ -131,17 +134,17 @@ OPT_EXTRA_PARTS= libgcc-Os-4-200.a libgc
EXTRA_MULTILIB_PARTS= $(IC_EXTRA_PARTS) $(OPT_EXTRA_PARTS)
$(T)ic_invalidate_array_4-100.o: $(srcdir)/config/sh/lib1funcs.asm $(GCC_PASSES)
- $(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -c -o $(T)ic_invalidate_array_4-100.o -DL_ic_invalidate_array -DWAYS=1 -DWAY_SIZE=0x2000 -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
+ $(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -I. -c -o $(T)ic_invalidate_array_4-100.o -DL_ic_invalidate_array -DWAYS=1 -DWAY_SIZE=0x2000 -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
$(T)libic_invalidate_array_4-100.a: $(T)ic_invalidate_array_4-100.o $(GCC_PASSES)
$(AR_CREATE_FOR_TARGET) $(T)libic_invalidate_array_4-100.a $(T)ic_invalidate_array_4-100.o
$(T)ic_invalidate_array_4-200.o: $(srcdir)/config/sh/lib1funcs.asm $(GCC_PASSES)
- $(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -c -o $(T)ic_invalidate_array_4-200.o -DL_ic_invalidate_array -DWAYS=2 -DWAY_SIZE=0x2000 -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
+ $(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -I. -c -o $(T)ic_invalidate_array_4-200.o -DL_ic_invalidate_array -DWAYS=2 -DWAY_SIZE=0x2000 -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
$(T)libic_invalidate_array_4-200.a: $(T)ic_invalidate_array_4-200.o $(GCC_PASSES)
$(AR_CREATE_FOR_TARGET) $(T)libic_invalidate_array_4-200.a $(T)ic_invalidate_array_4-200.o
$(T)ic_invalidate_array_4a.o: $(srcdir)/config/sh/lib1funcs.asm $(GCC_PASSES)
- $(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -c -o $(T)ic_invalidate_array_4a.o -DL_ic_invalidate_array -D__FORCE_SH4A__ -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
+ $(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -I. -c -o $(T)ic_invalidate_array_4a.o -DL_ic_invalidate_array -D__FORCE_SH4A__ -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
$(T)libic_invalidate_array_4a.a: $(T)ic_invalidate_array_4a.o $(GCC_PASSES)
$(AR_CREATE_FOR_TARGET) $(T)libic_invalidate_array_4a.a $(T)ic_invalidate_array_4a.o
Index: gcc/config/sh/sh.opt
===================================================================
--- gcc/config/sh/sh.opt (revision 162269)
+++ gcc/config/sh/sh.opt (working copy)
@@ -21,7 +21,7 @@
;; Used for various architecture options.
Mask(SH_E)
-;; Set if the default precision of th FPU is single.
+;; Set if the default precision of the FPU is single.
Mask(FPU_SINGLE)
;; Set if we should generate code using type 2A insns.
Index: gcc/config/sh/ieee-754-df.S
===================================================================
--- gcc/config/sh/ieee-754-df.S (revision 0)
+++ gcc/config/sh/ieee-754-df.S (revision 0)
@@ -0,0 +1,791 @@
+/* Copyright (C) 2004, 2006, 2010 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+!! libgcc software floating-point routines for Renesas SH /
+!! STMicroelectronics ST40 CPUs
+!! Contributed by Joern Rennecke <joern.rennecke@st.com>
+
+#ifndef __SH_FPU_DOUBLE__
+
+#include "lib1funcs.h"
+#include "insn-constants.h"
+
+/* Double-precision floating-point emulation.
+ We handle NANs, +-infinity, and +-zero.
+ However, we assume that for NANs, the topmost bit of the fraction is set. */
+
+#ifdef __LITTLE_ENDIAN__
+#define DBL0L r4
+#define DBL0H r5
+#define DBL1L r6
+#define DBL1H r7
+#define DBLRL r0
+#define DBLRH r1
+#else
+#define DBL0L r5
+#define DBL0H r4
+#define DBL1L r7
+#define DBL1H r6
+#define DBLRL r1
+#define DBLRH r0
+#endif
+
+#ifdef __SH_FPU_ANY__
+#define RETURN_R0_MAIN
+#define RETURN_R0 bra LOCAL(return_r0)
+#define RETURN_FR0 \
+LOCAL(return_r0): \
+ lds r0,fpul; \
+ rts; \
+ fsts fpul,fr0
+#define ARG_TO_R4 \
+ flds fr4,fpul; \
+ sts fpul,r4
+#else /* ! __SH_FPU_ANY__ */
+#define RETURN_R0_MAIN rts
+#define RETURN_R0 rts
+#define RETURN_FR0
+#define ARG_TO_R4
+#endif /* ! __SH_FPU_ANY__ */
+
+#ifdef L_nedf2
+/* -ffinite-math-only -mb inline version, T := r4:DF == r6:DF
+ cmp/eq r5,r7
+ mov r4,r0
+ bf 0f
+ cmp/eq r4,r6
+ bt 0f
+ or r6,r0
+ add r0,r0
+ or r5,r0
+ tst r0,r0
+ 0: */
+ .balign 4
+ .global GLOBAL(nedf2)
+ HIDDEN_FUNC(GLOBAL(nedf2))
+GLOBAL(nedf2):
+ cmp/eq DBL0L,DBL1L
+ mov.l LOCAL(c_DF_NAN_MASK),r1
+ bf LOCAL(ne)
+ cmp/eq DBL0H,DBL1H
+ not DBL0H,r0
+ bt LOCAL(check_nan)
+ mov DBL0H,r0
+ or DBL1H,r0
+ add r0,r0
+ rts
+ or DBL0L,r0
+LOCAL(check_nan):
+ tst r1,r0
+ rts
+ movt r0
+LOCAL(ne):
+ rts
+ mov #1,r0
+ .balign 4
+LOCAL(c_DF_NAN_MASK):
+ .long DF_NAN_MASK
+ ENDFUNC(GLOBAL(nedf2))
+#endif /* L_nedf2 */
+
+#ifdef L_unorddf2
+ .balign 4
+ .global GLOBAL(unorddf2)
+ HIDDEN_FUNC(GLOBAL(unorddf2))
+GLOBAL(unorddf2):
+ mov.l LOCAL(c_DF_NAN_MASK),r1
+ not DBL0H,r0
+ tst r1,r0
+ not r6,r0
+ bt LOCAL(unord)
+ tst r1,r0
+LOCAL(unord):
+ rts
+ movt r0
+ .balign 4
+LOCAL(c_DF_NAN_MASK):
+ .long DF_NAN_MASK
+ ENDFUNC(GLOBAL(unorddf2))
+#endif /* L_unorddf2 */
+
+#if defined(L_gtdf2t) || defined(L_gtdf2t_trap)
+#ifdef L_gtdf2t
+#define fun_label GLOBAL(gtdf2t)
+#else
+#define fun_label GLOBAL(gtdf2t_trap)
+#endif
+ .balign 4
+ .global fun_label
+ HIDDEN_FUNC(fun_label)
+fun_label:
+ /* If the raw values compare greater, the result is true, unless
+ either of them is a NaN (but infinity is fine), or both values are
+ +-zero. Otherwise, the result is false. */
+ mov.l LOCAL(c_DF_NAN_MASK),r1
+ cmp/pz DBL0H
+ not DBL1H,r0
+ SLC(bf, LOCAL(neg),
+ tst r1,r0)
+ mov DBL0H,r0
+ bt LOCAL(nan) /* return zero if DBL1 is NAN. */
+ cmp/eq DBL1H,DBL0H
+ bt LOCAL(cmp_low)
+ cmp/gt DBL1H,DBL0H
+ or DBL1H,r0
+ SLC(bf, LOCAL(check_nan),
+ cmp/gt DBL0H,r1)
+ add r0,r0
+ bf LOCAL(nan) /* return zero if DBL0 is NAN. */
+ or DBL0L,r0
+ rts
+ or DBL1L,r0 /* non-zero unless both DBL0 and DBL1 are +-zero. */
+LOCAL(cmp_low):
+ cmp/hi DBL1L,DBL0L
+ rts
+ movt r0
+LOCAL(neg):
+ SLI(tst r1,r0)
+ bt LOCAL(nan) /* return zero if DBL1 is NAN. */
+ cmp/eq DBL1H,DBL0H
+ SLC(bt, LOCAL(neg_cmp_low),
+ cmp/hi DBL0L,DBL1L)
+ not DBL0H,r0
+ tst r1,r0
+ bt LOCAL(nan) /* return zero if DBL0 is NAN. */
+ cmp/hi DBL0H,DBL1H
+ SLI(rts !,)
+ SLI(movt r0 !,)
+LOCAL(neg_cmp_low):
+ SLI(cmp/hi DBL0L,DBL1L)
+ rts
+ movt r0
+LOCAL(check_nan):
+#ifdef L_gtdf2t
+LOCAL(nan):
+ rts
+ mov #0,r0
+#else
+ SLI(cmp/gt DBL0H,r1)
+ bf LOCAL(nan) /* return zero if DBL0 is NAN. */
+ rts
+ mov #0,r0
+LOCAL(nan):
+ mov #0,r0
+ trapa #0
+#endif
+ .balign 4
+LOCAL(c_DF_NAN_MASK):
+ .long DF_NAN_MASK
+ ENDFUNC(fun_label)
+#endif /* defined(L_gtdf2t) || defined(L_gtdf2t_trap) */
+
+#ifdef L_gedf2f
+ .balign 4
+ .global GLOBAL(gedf2f)
+ HIDDEN_FUNC(GLOBAL(gedf2f))
+GLOBAL(gedf2f):
+ /* If the raw values compare greater or equal, the result is
+ true, unless either of them is a NaN, or both are the
+ same infinity. If both are +-zero, the result is true;
+ otherwise, it is false.
+ We use 0 as true and nonzero as false for this function. */
+ mov.l LOCAL(c_DF_NAN_MASK),r1
+ cmp/pz DBL1H
+ not DBL0H,r0
+ SLC(bf, LOCAL(neg),
+ tst r1,r0)
+ mov DBL0H,r0
+ bt LOCAL(nan)
+ cmp/eq DBL0H,DBL1H
+ bt LOCAL(cmp_low)
+ cmp/gt DBL0H,DBL1H
+ or DBL1H,r0
+ SLC(bf, LOCAL(check_nan),
+ cmp/ge r1,DBL1H)
+ add r0,r0
+ bt LOCAL(nan)
+ or DBL0L,r0
+ rts
+ or DBL1L,r0
+LOCAL(cmp_low):
+ cmp/hi DBL0L,DBL1L
+#if defined(L_gedf2f) && defined(DELAYED_BRANCHES)
+LOCAL(nan): LOCAL(check_nan):
+#endif
+ rts
+ movt r0
+#if defined(L_gedf2f) && ! defined(DELAYED_BRANCHES)
+LOCAL(check_nan):
+ SLI(cmp/ge r1,DBL1H)
+LOCAL(nan):
+ rts
+ movt r0
+#elif defined(L_gedf2f_trap)
+LOCAL(check_nan):
+ SLI(cmp/ge r1,DBL1H)
+ bt LOCAL(nan)
+ rts
+LOCAL(nan):
+ movt r0
+ trapa #0
+#endif /* L_gedf2f_trap */
+LOCAL(neg):
+ SLI(tst r1,r0)
+ bt LOCAL(nan)
+ cmp/eq DBL0H,DBL1H
+ not DBL1H,r0
+ SLC(bt, LOCAL(neg_cmp_low),
+ cmp/hi DBL1L,DBL0L)
+ tst r1,r0
+ bt LOCAL(nan)
+ cmp/hi DBL1H,DBL0H
+ SLI(rts !,)
+ SLI(movt r0 !,)
+LOCAL(neg_cmp_low):
+ SLI(cmp/hi DBL1L,DBL0L)
+ rts
+ movt r0
+ .balign 4
+LOCAL(c_DF_NAN_MASK):
+ .long DF_NAN_MASK
+ ENDFUNC(GLOBAL(gedf2f))
+#endif /* L_gedf2f */
+
+#ifndef DYN_SHIFT /* SH1 / SH2 code */
+#ifdef L_extendsfdf2
+ .balign 4
+ .global GLOBAL(extendsfdf2)
+ FUNC(GLOBAL(extendsfdf2))
+GLOBAL(extendsfdf2):
+ ARG_TO_R4
+ mov.l LOCAL(x7f800000),r3
+ mov r4,DBLRL
+ tst r3,r4
+ bt LOCAL(zero_denorm)
+ mov.l LOCAL(xe0000000),r2
+ rotr DBLRL
+ rotr DBLRL
+ rotr DBLRL
+ and r2,DBLRL
+ mov r4,DBLRH
+ not r4,r2
+ tst r3,r2
+ mov.l LOCAL(x38000000),r2
+ bf 0f
+ add r2,r2 ! infinity / NaN adjustment
+0: shll DBLRH
+ shlr2 DBLRH
+ shlr2 DBLRH
+ add DBLRH,DBLRH
+ rotcr DBLRH
+ rts
+ add r2,DBLRH
+LOCAL(zero_denorm):
+ mov.l r4,@-r15
+ add r4,r4
+ tst r4,r4
+ bt LOCAL(zero)
+ mov.l LOCAL(x00ff0000),r3
+ mov.w LOCAL(x389),r2
+LOCAL(shift_byte):
+ tst r3,r4
+ shll8 r4
+ SL(bt, LOCAL(shift_byte),
+ add #-8,r2)
+LOCAL(shift_bit):
+ shll r4
+ SL(bf, LOCAL(shift_bit),
+ add #-1,r2)
+ mov #0,DBLRL
+ mov r4,DBLRH
+ mov.l @r15+,r4
+ shlr8 DBLRH
+ shlr2 DBLRH
+ shlr DBLRH
+ rotcr DBLRL
+ cmp/gt r4,DBLRH ! get sign
+ rotcr DBLRH
+ rotcr DBLRL
+ shll16 r2
+ shll8 r2
+ rts
+ add r2,DBLRH
+LOCAL(zero):
+ mov.l @r15+,DBLRH
+ rts
+ mov #0,DBLRL
+LOCAL(x389): .word 0x389
+ .balign 4
+LOCAL(x7f800000):
+ .long 0x7f800000
+LOCAL(x38000000):
+ .long 0x38000000
+LOCAL(xe0000000):
+ .long 0xe0000000
+LOCAL(x00ff0000):
+ .long 0x00ff0000
+ ENDFUNC(GLOBAL(extendsfdf2))
+#endif /* L_extendsfdf2 */
+
+#ifdef L_truncdfsf2
+ .balign 4
+ .global GLOBAL(truncdfsf2)
+ FUNC(GLOBAL(truncdfsf2))
+GLOBAL(truncdfsf2):
+ mov.l LOCAL(x38000000),r3 ! exponent adjustment DF -> SF
+ mov DBL0H,r1
+ mov.l LOCAL(x70000000),r2 ! mask for out-of-range exponent bits
+ mov DBL0H,r0
+ mov.l DBL0L,@-r15
+ sub r3,r1
+ tst r2,r1
+ shll8 r0 !
+ shll2 r0 ! Isolate highpart fraction.
+ shll2 r0 !
+ bf LOCAL(ill_exp)
+ shll2 r1
+ mov.l LOCAL(x2fffffff),r2 /* Fraction lsb | lower guard bits. */
+ shll2 r1
+ mov.l LOCAL(xff000000),r3
+ shlr8 r0
+ tst r2,DBL0L /* Check if msb guard bit wants rounding up. */
+ shlr16 DBL0L
+ shlr8 DBL0L
+ shlr2 DBL0L
+ SL1(bt, LOCAL(add_frac),
+ shlr2 DBL0L)
+ add #1,DBL0L
+LOCAL(add_frac):
+ add DBL0L,r0
+ mov.l LOCAL(x01000000),r2
+ and r3,r1
+ mov.l @r15+,DBL0L
+ add r1,r0
+ tst r3,r0
+ bt LOCAL(inf_denorm0)
+ cmp/hs r3,r0
+LOCAL(denorm_noup_sh1):
+ bt LOCAL(inf)
+ div0s DBL0H,r2 /* copy orig. sign into T. */
+RETURN_R0_MAIN
+ rotcr r0
+RETURN_FR0
+LOCAL(inf_denorm0): ! We might need to undo previous rounding.
+ mov.l LOCAL(x2fffffff),r3 /* Old fraction lsb | lower guard bits. */
+ tst r1,r1
+ bf LOCAL(inf)
+ add #-1,r0
+ tst r3,DBL0L /* Check if msb guard bit was rounded up. */
+ mov.l LOCAL(x5fffffff),r3 /* Fraction lsb | lower guard bits. */
+ addc r2,r0
+ shlr r0
+ tst r3,DBL0L /* Check if msb guard bit wants rounding up. */
+#ifdef DELAYED_BRANCHES
+ bt/s LOCAL(denorm_noup)
+#else
+ bt LOCAL(denorm_noup_sh1)
+#endif
+ div0s DBL0H,r2 /* copy orig. sign into T. */
+ add #1,r0
+LOCAL(denorm_noup):
+ RETURN_R0
+ rotcr r0
+LOCAL(ill_exp):
+ div0s DBL0H,r1
+ mov.l LOCAL(x7ff80000),r2
+ add r1,r1
+ bf LOCAL(inf_nan)
+ mov.w LOCAL(m32),r3 /* Handle denormal or zero. */
+ shlr16 r1
+ exts.w r1,r1
+ shll2 r1
+ add r1,r1
+ shlr8 r1
+ exts.w r1,r1
+ add #-8,r1 /* Go from 9 to 1 guard bit in MSW. */
+ cmp/gt r3,r1
+ mov.l @r15+,r3 /* DBL0L */
+ bf LOCAL(zero)
+ mov.l DBL0L, @-r15
+ shll8 DBL0L
+ rotcr r0 /* Insert leading 1. */
+ shlr16 r3
+ shll2 r3
+ add r3,r3
+ shlr8 r3
+ cmp/pl DBL0L /* Check lower 23 guard bits if guard bit 23 is 0. */
+ addc r3,r0 /* Assemble fraction with compressed guard bits. */
+ mov.l @r15+,DBL0L
+ mov #0,r2
+ neg r1,r1
+LOCAL(denorm_loop):
+ shlr r0
+ rotcl r2
+ dt r1
+ bf LOCAL(denorm_loop)
+ tst #2,r0
+ rotcl r0
+ tst r2,r2
+ rotcl r0
+ xor #3,r0
+ add #3,r0 /* Even overflow gives the correct result. */
+ shlr2 r0
+ div0s r0,DBL0H
+ RETURN_R0
+ rotcr r0
+LOCAL(zero):
+ mov #0,r0
+ div0s r0,DBL0H
+ RETURN_R0
+ rotcr r0
+LOCAL(inf_nan):
+ not DBL0H,r0
+ tst r2,r0
+ mov.l @r15+,DBL0L
+ bf LOCAL(inf)
+ RETURN_R0
+ mov #-1,r0 /* NAN */
+LOCAL(inf): /* r2 must be positive here. */
+ mov.l LOCAL(xffe00000),r0
+ div0s r2,DBL0H
+ RETURN_R0
+ rotcr r0
+LOCAL(m32):
+ .word -32
+ .balign 4
+LOCAL(x38000000):
+ .long 0x38000000
+LOCAL(x70000000):
+ .long 0x70000000
+LOCAL(x2fffffff):
+ .long 0x2fffffff
+LOCAL(x01000000):
+ .long 0x01000000
+LOCAL(xff000000):
+ .long 0xff000000
+LOCAL(x5fffffff):
+ .long 0x5fffffff
+LOCAL(x7ff80000):
+ .long 0x7ff80000
+LOCAL(xffe00000):
+ .long 0xffe00000
+ ENDFUNC(GLOBAL(truncdfsf2))
+#endif /* L_truncdfsf2 */
+#ifdef L_add_sub_df3
+#include "IEEE-754/adddf3.S"
+#endif /* _add_sub_df3 */
+
+#ifdef L_muldf3
+#include "IEEE-754/muldf3.S"
+#endif /* L_muldf3 */
+
+#ifdef L_fixunsdfsi
+#include "IEEE-754/fixunsdfsi.S"
+#endif /* L_fixunsdfsi */
+
+#ifdef L_fixdfsi
+#include "IEEE-754/fixdfsi.S"
+#endif /* L_fixdfsi */
+
+#ifdef L_floatunssidf
+#include "IEEE-754/floatunssidf.S"
+#endif /* L_floatunssidf */
+
+#ifdef L_floatsidf
+#include "IEEE-754/floatsidf.S"
+#endif /* L_floatsidf */
+
+#ifdef L_divdf3
+#include "IEEE-754/divdf3.S"
+#endif /* L_divdf3 */
+#endif /* ! DYN_SHIFT */
+
+/* The actual arithmetic uses dynamic shift. Supporting SH1 / SH2 here would
+ make this code too hard to maintain, so if you want to add SH1 / SH2
+ support, do it in a separate copy. */
+#ifdef DYN_SHIFT
+#ifdef L_extendsfdf2
+ .balign 4
+ .global GLOBAL(extendsfdf2)
+ FUNC(GLOBAL(extendsfdf2))
+GLOBAL(extendsfdf2):
+ ARG_TO_R4
+ mov.l LOCAL(x7f800000),r2
+ mov #29,r3
+ mov r4,DBLRL
+ not r4,DBLRH
+ tst r2,r4
+ shld r3,DBLRL
+ bt LOCAL(zero_denorm)
+ mov #-3,r3
+ tst r2,DBLRH
+ mov r4,DBLRH
+ mov.l LOCAL(x38000000),r2
+ bt/s LOCAL(inf_nan)
+ shll DBLRH
+ shld r3,DBLRH
+ rotcr DBLRH
+ rts
+ add r2,DBLRH
+ .balign 4
+LOCAL(inf_nan):
+ shld r3,DBLRH
+ add r2,r2
+ rotcr DBLRH
+ rts
+ add r2,DBLRH
+LOCAL(zero_denorm):
+ mov.l r4,@-r15
+ add r4,r4
+ tst r4,r4
+ extu.w r4,r2
+ bt LOCAL(zero)
+ cmp/eq r4,r2
+ extu.b r4,r1
+ bf/s LOCAL(three_bytes)
+ mov.l LOCAL(c__clz_tab),r0
+ cmp/eq r4,r1
+ mov #22,DBLRH
+ bt LOCAL(one_byte)
+ shlr8 r2
+ mov #14,DBLRH
+LOCAL(one_byte):
+#ifdef __pic__
+ add r0,r2
+ mova LOCAL(c__clz_tab),r0
+#endif
+ mov.b @(r0,r2),r2
+ mov #21,r3
+ mov.w LOCAL(x0),DBLRL
+ sub r2,DBLRH
+LOCAL(norm_shift):
+ shld DBLRH,r4
+ mov.l @r15+,r2
+ shld r3,DBLRH
+ mov.l LOCAL(xb7ffffff),r3
+ add r4,DBLRH
+ cmp/pz r2
+ mov r2,r4
+ rotcr DBLRH
+ rts
+ sub r3,DBLRH
+LOCAL(three_bytes):
+ mov r4,r2
+ shlr16 r2
+#ifdef __pic__
+ add r0,r2
+ mova LOCAL(c__clz_tab),r0
+#endif
+ mov.b @(r0,r2),r2
+ mov #21,r3
+ mov #6-32,DBLRH
+ sub r2,DBLRH
+ mov r4,DBLRL
+ shld DBLRH,DBLRL
+ bra LOCAL(norm_shift)
+ add #32,DBLRH
+LOCAL(zero):
+ rts /* DBLRL has already been zeroed above. */
+ mov.l @r15+,DBLRH
+LOCAL(x0):
+ .word 0
+ .balign 4
+LOCAL(x7f800000):
+ .long 0x7f800000
+LOCAL(x38000000):
+ .long 0x38000000
+LOCAL(xb7ffffff):
+ /* Flip sign back, do exponent adjustment, and remove leading one. */
+ .long 0x80000000 + 0x38000000 - 1
+LOCAL(c__clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+ ENDFUNC(GLOBAL(extendsfdf2))
+#endif /* L_extendsfdf2 */
+
+#ifdef L_truncdfsf2
+ .balign 4
+ .global GLOBAL(truncdfsf2)
+ FUNC(GLOBAL(truncdfsf2))
+GLOBAL(truncdfsf2):
+ mov.l LOCAL(x38000000),r3
+ mov DBL0H,r1
+ mov.l LOCAL(x70000000),r2
+ mov DBL0H,r0
+ sub r3,r1
+ mov.l DBL0L,@-r15
+ tst r2,r1
+ mov #12,r3
+ shld r3,r0 ! Isolate highpart fraction.
+ bf LOCAL(ill_exp)
+ shll2 r1
+ mov.l LOCAL(x2fffffff),r2 /* Fraction lsb | lower guard bits. */
+ shll2 r1
+ mov.l LOCAL(xff000000),r3
+ shlr8 r0
+ tst r2,DBL0L /* Check if msb guard bit wants rounding up. */
+ mov #-28,r2
+ bt/s LOCAL(add_frac)
+ shld r2,DBL0L
+ add #1,DBL0L
+LOCAL(add_frac):
+ add DBL0L,r0
+ mov.l LOCAL(x01000000),r2
+ and r3,r1
+ mov.l @r15+,DBL0L
+ add r1,r0
+ tst r3,r0
+ bt LOCAL(inf_denorm0)
+#if 0 // No point checking overflow -> infinity if we don't raise a signal.
+ cmp/hs r3,r0
+ bt LOCAL(inf)
+#endif
+ div0s DBL0H,r2 /* copy orig. sign into T. */
+ RETURN_R0_MAIN
+ rotcr r0
+RETURN_FR0
+LOCAL(inf_denorm0): ! We might need to undo previous rounding.
+ mov.l LOCAL(x2fffffff),r3 /* Old fraction lsb | lower guard bits. */
+ tst r1,r1
+ bf LOCAL(inf)
+ add #-1,r0
+ tst r3,DBL0L /* Check if msb guard bit was rounded up. */
+ mov.l LOCAL(x5fffffff),r3 /* Fraction lsb | lower guard bits. */
+ addc r2,r0
+ shlr r0
+ tst r3,DBL0L /* Check if msb guard bit wants rounding up. */
+ bt/s LOCAL(denorm_noup)
+ div0s DBL0H,r2 /* copy orig. sign into T. */
+ add #1,r0
+LOCAL(denorm_noup):
+ RETURN_R0
+ rotcr r0
+LOCAL(ill_exp):
+ div0s DBL0H,r1
+ mov.l LOCAL(x7ff80000),r2
+ add r1,r1
+ bf LOCAL(inf_nan)
+ mov.w LOCAL(m32),r3 /* Handle denormal or zero. */
+ mov #-21,r2
+ shad r2,r1
+ add #-8,r1 /* Go from 9 to 1 guard bit in MSW. */
+ cmp/gt r3,r1
+ mov.l @r15+,r3 /* DBL0L */
+ bf LOCAL(zero)
+ mov.l DBL0L, @-r15
+ shll8 DBL0L
+ rotcr r0 /* Insert leading 1. */
+ shld r2,r3
+ cmp/pl DBL0L /* Check lower 23 guard bits if guard bit 23 is 0. */
+ addc r3,r0 /* Assemble fraction with compressed guard bits. */
+ mov r0,r2
+ shld r1,r0
+ mov.l @r15+,DBL0L
+ add #32,r1
+ shld r1,r2
+ tst #2,r0
+ rotcl r0
+ tst r2,r2
+ rotcl r0
+ xor #3,r0
+ add #3,r0 /* Even overflow gives the correct result. */
+ shlr2 r0
+ div0s r0,DBL0H
+ RETURN_R0
+ rotcr r0
+LOCAL(zero):
+ mov #0,r0
+ div0s r0,DBL0H
+ RETURN_R0
+ rotcr r0
+LOCAL(inf_nan):
+ not DBL0H,r0
+ tst r2,r0
+ mov.l @r15+,DBL0L
+ bf LOCAL(inf)
+ RETURN_R0
+ mov #-1,r0 /* NAN */
+LOCAL(inf): /* r2 must be positive here. */
+ mov.l LOCAL(xffe00000),r0
+ div0s r2,DBL0H
+ RETURN_R0
+ rotcr r0
+LOCAL(m32):
+ .word -32
+ .balign 4
+LOCAL(x38000000):
+ .long 0x38000000
+LOCAL(x70000000):
+ .long 0x70000000
+LOCAL(x2fffffff):
+ .long 0x2fffffff
+LOCAL(x01000000):
+ .long 0x01000000
+LOCAL(xff000000):
+ .long 0xff000000
+LOCAL(x5fffffff):
+ .long 0x5fffffff
+LOCAL(x7ff80000):
+ .long 0x7ff80000
+LOCAL(xffe00000):
+ .long 0xffe00000
+ ENDFUNC(GLOBAL(truncdfsf2))
+#endif /* L_truncdfsf2 */
+
+
+#ifdef L_add_sub_df3
+#include "IEEE-754/m3/adddf3.S"
+#endif /* _add_sub_df3 */
+
+#ifdef L_muldf3
+#include "IEEE-754/m3/muldf3.S"
+#endif /* L_muldf3 */
+
+#ifdef L_fixunsdfsi
+#include "IEEE-754/m3/fixunsdfsi.S"
+#endif /* L_fixunsdfsi */
+
+#ifdef L_fixdfsi
+#include "IEEE-754/m3/fixdfsi.S"
+#endif /* L_fixdfsi */
+
+#ifdef L_floatunssidf
+#include "IEEE-754/m3/floatunssidf.S"
+#endif /* L_floatunssidf */
+
+#ifdef L_floatsidf
+#include "IEEE-754/m3/floatsidf.S"
+#endif /* L_floatsidf */
+
+#ifdef L_divdf3
+#include "IEEE-754/m3/divdf3.S"
+#endif /* L_divdf3 */
+#endif /* DYN_SHIFT */
+
+#endif /* __SH_FPU_DOUBLE__ */
Index: gcc/config/sh/predicates.md
===================================================================
--- gcc/config/sh/predicates.md (revision 162269)
+++ gcc/config/sh/predicates.md (working copy)
@@ -719,6 +719,33 @@ (define_predicate "shift_operator"
(define_predicate "symbol_ref_operand"
(match_code "symbol_ref"))
+(define_special_predicate "soft_fp_comparison_operand"
+ (match_code "subreg,reg")
+{
+ switch (GET_MODE (op))
+ {
+ default:
+ return 0;
+ case CC_FP_NEmode: case CC_FP_GTmode: case CC_FP_UNLTmode:
+ break;
+ }
+ return register_operand (op, mode);
+})
+
+(define_predicate "soft_fp_comparison_operator"
+ (match_code "eq, unle, ge")
+{
+ switch (GET_CODE (op))
+ {
+ default:
+ return 0;
+ case EQ: mode = CC_FP_NEmode; break;
+ case UNLE: mode = CC_FP_GTmode; break;
+ case GE: mode = CC_FP_UNLTmode; break;
+ }
+ return register_operand (XEXP (op, 0), mode);
+})
+
;; Same as target_reg_operand, except that label_refs and symbol_refs
;; are accepted before reload.
Index: gcc/config/sh/sh.c
===================================================================
--- gcc/config/sh/sh.c (revision 162269)
+++ gcc/config/sh/sh.c (working copy)
@@ -111,6 +111,12 @@ static short cached_can_issue_more;
/* Unique number for UNSPEC_BBR pattern. */
static unsigned int unspec_bbr_uid = 1;
+/* Saved operands from the last compare to use when we generate an scc
+ or bcc insn. */
+
+rtx sh_compare_op0;
+rtx sh_compare_op1;
+
/* Provides the class number of the smallest class containing
reg number. */
@@ -284,6 +290,7 @@ static int sh_arg_partial_bytes (CUMULAT
tree, bool);
static bool sh_scalar_mode_supported_p (enum machine_mode);
static int sh_dwarf_calling_convention (const_tree);
+static void sh_expand_float_condop (rtx *operands, rtx, rtx (*[2]) (rtx));
static void sh_encode_section_info (tree, rtx, int);
static int sh2a_function_vector_p (tree);
static void sh_trampoline_init (rtx, tree, rtx);
@@ -551,6 +558,9 @@ static const struct attribute_spec sh_at
/* Machine-specific symbol_ref flags. */
#define SYMBOL_FLAG_FUNCVEC_FUNCTION (SYMBOL_FLAG_MACH_DEP << 0)
+#undef TARGET_MATCH_ADJUST
+#define TARGET_MATCH_ADJUST sh_match_adjust
+
struct gcc_target targetm = TARGET_INITIALIZER;
\f
/* Implement TARGET_HANDLE_OPTION. */
@@ -2180,6 +2190,78 @@ sh_emit_cheap_store_flag (enum machine_m
return gen_rtx_fmt_ee (code, VOIDmode, target, const0_rtx);
}
+static rtx
+sh_soft_fp_cmp (int code, enum machine_mode op_mode, rtx op0, rtx op1)
+{
+ const char *name = NULL;
+ rtx (*fun) (rtx, rtx), addr, tmp, first, last, equiv;
+ int df = op_mode == DFmode;
+ enum machine_mode mode = VOIDmode; /* Silence uninitialized-use warning. */
+
+ switch (code)
+ {
+ case EQ:
+ if (!flag_finite_math_only)
+ {
+ name = df ? "__nedf2" : "__nesf2";
+ fun = df ? gen_cmpnedf_i1 : gen_cmpnesf_i1;
+ mode = CC_FP_NEmode;
+ break;
+ } /* Fall through. */
+ case UNEQ:
+ fun = gen_cmpuneq_sdf;
+ break;
+ case UNLE:
+ if (flag_finite_math_only && !df)
+ {
+ fun = gen_cmplesf_i1_finite;
+ break;
+ }
+ name = df ? "__gtdf2t" : "__gtsf2t";
+ fun = df ? gen_cmpgtdf_i1 : gen_cmpgtsf_i1;
+ mode = CC_FP_GTmode;
+ break;
+ case GE:
+ if (flag_finite_math_only && !df)
+ {
+ tmp = op0; op0 = op1; op1 = tmp;
+ fun = gen_cmplesf_i1_finite;
+ break;
+ }
+ name = df ? "__gedf2f" : "__gesf2f";
+ fun = df ? gen_cmpunltdf_i1 : gen_cmpunltsf_i1;
+ mode = CC_FP_UNLTmode;
+ break;
+ case UNORDERED:
+ fun = gen_cmpun_sdf;
+ break;
+ default: gcc_unreachable ();
+ }
+
+ if (!name)
+ return fun (force_reg (op_mode, op0), force_reg (op_mode, op1));
+
+ tmp = gen_reg_rtx (mode);
+ addr = gen_reg_rtx (Pmode);
+ function_symbol (addr, name, SFUNC_STATIC);
+ first = emit_move_insn (gen_rtx_REG (op_mode, R4_REG), op0);
+ emit_move_insn (gen_rtx_REG (op_mode, R5_REG + df), op1);
+ last = emit_insn (fun (tmp, addr));
+ equiv = gen_rtx_fmt_ee (COMPARE, mode, op0, op1);
+ REG_NOTES (last) = gen_rtx_EXPR_LIST (REG_EQUAL, equiv, REG_NOTES (last));
+ /* Wrap the sequence in REG_LIBCALL / REG_RETVAL notes so that loop
+ invariant code motion can move it. */
+/*
+ REG_NOTES (first) = gen_rtx_INSN_LIST (REG_LIBCALL, last, REG_NOTES (first));
+ REG_NOTES (last) = gen_rtx_INSN_LIST (REG_RETVAL, first, REG_NOTES (last));
+*/
+ /* Use fpcmp_i1 rather than cmpeqsi_t, so that the optimizers can grok
+ the computation. */
+ return gen_rtx_SET (VOIDmode,
+ gen_rtx_REG (SImode, T_REG),
+ gen_rtx_fmt_ee (code, SImode, tmp, CONST0_RTX (mode)));
+}
+
/* Called from the md file, set up the operands of a compare instruction. */
void
@@ -8662,6 +8744,57 @@ sh_fix_range (const char *const_str)
str = comma + 1;
}
}
+
+/* Expand an sfunc operation taking NARGS MODE arguments, using generator
+ function FUN, which needs symbol NAME loaded into a register first.
+ Add a REG_EQUAL note using EQUIV. */
+static void
+expand_sfunc_op (int nargs, enum machine_mode mode, rtx (*fun) (rtx, rtx),
+ const char *name, rtx equiv, rtx *operands)
+{
+ int next_reg = FIRST_PARM_REG, i;
+ rtx addr, first = NULL_RTX, last, insn;
+
+ addr = gen_reg_rtx (Pmode);
+ function_symbol (addr, name, SFUNC_FREQUENT);
+ for ( i = 1; i <= nargs; i++)
+ {
+ insn = emit_move_insn (gen_rtx_REG (mode, next_reg), operands[i]);
+ if (!first)
+ first = insn;
+ next_reg += GET_MODE_SIZE (mode) / UNITS_PER_WORD;
+ }
+ last = emit_insn ((*fun) (operands[0], addr));
+ REG_NOTES (last) = gen_rtx_EXPR_LIST (REG_EQUAL, equiv, REG_NOTES (last));
+ /* Wrap the sequence in REG_LIBCALL / REG_RETVAL notes so that loop
+ invariant code motion can move it. */
+/*
+ REG_NOTES (first) = gen_rtx_INSN_LIST (REG_LIBCALL, last, REG_NOTES (first));
+ REG_NOTES (last) = gen_rtx_INSN_LIST (REG_RETVAL, first, REG_NOTES (last));
+*/
+}
+
+/* Expand an sfunc unary operation taking one MODE argument, using generator
+ function FUN, which needs symbol NAME loaded into a register first.
+ Add a REG_EQUAL note using CODE. */
+void
+expand_sfunc_unop (enum machine_mode mode, rtx (*fun) (rtx, rtx),
+ const char *name, enum rtx_code code, rtx *operands)
+{
+ rtx equiv = gen_rtx_fmt_e (code, GET_MODE (operands[0]), operands[1]);
+ expand_sfunc_op (1, mode, fun, name, equiv, operands);
+}
+
+/* Expand an sfunc binary operation in MODE, using generator function FUN,
+ which needs symbol NAME loaded into a register first.
+ Add a REG_EQUAL note using CODE. */
+void
+expand_sfunc_binop (enum machine_mode mode, rtx (*fun) (rtx, rtx),
+ const char *name, enum rtx_code code, rtx *operands)
+{
+ rtx equiv = gen_rtx_fmt_ee (code, mode, operands[1], operands[2]);
+ expand_sfunc_op (2, mode, fun, name, equiv, operands);
+}
\f
/* Insert any deferred function attributes from earlier pragmas. */
static void
@@ -11593,11 +11726,10 @@ function_symbol (rtx target, const char
{
rtx sym;
- /* If this is not an ordinary function, the name usually comes from a
- string literal or an sprintf buffer. Make sure we use the same
+ /* The name usually comes from a string literal or an sprintf buffer.
+ Make sure we use the same string consistently, so that cse will be
+ able to unify address loads. */
- if (kind != FUNCTION_ORDINARY)
- name = IDENTIFIER_POINTER (get_identifier (name));
+ name = IDENTIFIER_POINTER (get_identifier (name));
sym = gen_rtx_SYMBOL_REF (Pmode, name);
SYMBOL_REF_FLAGS (sym) = SYMBOL_FLAG_FUNCTION;
if (flag_pic)
@@ -11605,6 +11737,10 @@ function_symbol (rtx target, const char
{
case FUNCTION_ORDINARY:
break;
+ case SFUNC_FREQUENT:
+ if (!optimize || optimize_size)
+ break;
+ /* Fall through. */
case SFUNC_GOT:
{
rtx reg = target ? target : gen_reg_rtx (Pmode);
@@ -11715,6 +11851,168 @@ sh_expand_t_scc (rtx operands[])
return 1;
}
+void
+sh_expand_float_cbranch (rtx operands[4])
+{
+ static rtx (*branches[]) (rtx) = { gen_branch_true, gen_branch_false };
+
+ sh_expand_float_condop (operands, operands[3], branches);
+}
+
+void
+sh_expand_float_scc (rtx operands[4])
+{
+ static rtx (*movts[]) (rtx) = { gen_movt, gen_movnegt };
+
+ sh_expand_float_condop (&operands[1], operands[0], movts);
+}
+
+/* The first element of USER is for positive logic, the second one for
+ negative logic. */
+static void
+sh_expand_float_condop (rtx *operands, rtx dest, rtx (*user[2]) (rtx))
+{
+ enum machine_mode mode = GET_MODE (operands[1]);
+ enum rtx_code comparison = GET_CODE (operands[0]);
+ int swap_operands = 0;
+ rtx op0, op1;
+ rtx lab = NULL_RTX;
+
+ if (TARGET_SH1_SOFTFP_MODE (mode))
+ {
+ switch (comparison)
+ {
+ case NE:
+ comparison = EQ;
+ user++;
+ break;
+ case LT:
+ swap_operands = 1; /* Fall through. */
+ case GT:
+ comparison = UNLE;
+ user++;
+ break;
+ case UNGT:
+ swap_operands = 1; /* Fall through. */
+ case UNLT:
+ comparison = GE;
+ user++;
+ break;
+ case UNGE:
+ swap_operands = 1;
+ comparison = UNLE;
+ break;
+ case LE:
+ swap_operands = 1;
+ comparison = GE; /* Fall through. */
+ case EQ:
+ case UNEQ:
+ case GE:
+ case UNLE:
+ case UNORDERED:
+ break;
+ case LTGT:
+ comparison = UNEQ;
+ user++;
+ break;
+ case ORDERED:
+ comparison = UNORDERED;
+ user++;
+ break;
+
+ default: gcc_unreachable ();
+ }
+ }
+ else /* SH2E .. SH4 Hardware floating point */
+ {
+ switch (comparison)
+ {
+ case LTGT:
+ if (!flag_finite_math_only)
+ break;
+ /* Fall through. */
+ case NE:
+ comparison = EQ;
+ user++;
+ break;
+ case LT:
+ swap_operands = 1;
+ comparison = GT; /* Fall through. */
+ case GT:
+ case EQ:
+ case ORDERED:
+ break;
+ case LE:
+ swap_operands = 1;
+ comparison = GE; /* Fall through. */
+ case GE:
+ if (flag_finite_math_only)
+ {
+ swap_operands ^= 1;
+ comparison = GT;
+ user++;
+ break;
+ }
+ break;
+ case UNGT:
+ swap_operands = 1; /* Fall through. */
+ case UNLT:
+ if (flag_finite_math_only)
+ {
+ swap_operands ^= 1;
+ comparison = GT;
+ break;
+ }
+ comparison = GE;
+ user++;
+ break;
+ case UNGE:
+ swap_operands = 1; /* Fall through. */
+ case UNLE:
+ comparison = GT;
+ user++;
+ break;
+ case UNEQ:
+ if (flag_finite_math_only)
+ {
+ comparison = EQ;
+ break;
+ }
+ comparison = LTGT;
+ user++;
+ break;
+ case UNORDERED:
+ comparison = ORDERED;
+ user++;
+ break;
+
+ default: gcc_unreachable ();
+ }
+ operands[1] = force_reg (mode, operands[1]);
+ operands[2] = force_reg (mode, operands[2]);
+ if (comparison == GE)
+ {
+ lab = gen_label_rtx ();
+ sh_emit_scc_to_t (GT, operands[1+swap_operands],
+ operands[2-swap_operands]);
+ emit_jump_insn (gen_branch_true (lab));
+ comparison = EQ;
+ }
+ }
+ op0 = operands[1+swap_operands];
+ op1 = operands[2-swap_operands];
+ if (GET_MODE_CLASS (mode) == MODE_FLOAT && TARGET_SH1_SOFTFP_MODE (mode))
+ emit_insn (sh_soft_fp_cmp (comparison, mode, op0, op1));
+ else
+ sh_emit_set_t_insn (gen_rtx_SET (VOIDmode, gen_rtx_REG (SImode, T_REG),
+ gen_rtx_fmt_ee (comparison, SImode,
+ op0, op1)),
+ mode);
+ if (lab)
+ emit_label (lab);
+ emit ((*user) (dest));
+}
+
/* INSN is an sfunc; return the rtx that describes the address used. */
static rtx
extract_sfunc_addr (rtx insn)
@@ -12266,6 +12564,19 @@ sh_secondary_reload (bool in_p, rtx x, r
return NO_REGS;
}
+int
+sh_match_adjust (rtx x, int regno)
+{
+ /* On a WORDS_BIG_ENDIAN machine, point to the last register of a
+ multiple hard register group of scalar integer registers, so that
+ for example (reg:DI 0) and (reg:SI 1) will be considered the same
+ register. */
+ if (WORDS_BIG_ENDIAN && GET_MODE_SIZE (GET_MODE (x)) > UNITS_PER_WORD
+ && regno < FIRST_PSEUDO_REGISTER)
+ regno += hard_regno_nregs[regno][GET_MODE (x)] - 1;
+ return regno;
+}
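
For illustration (not part of the patch), the register-number adjustment above can be sketched in Python; `UNITS_PER_WORD` is assumed to be 4, as on 32-bit SH:

```python
# Sketch of sh_match_adjust: on a WORDS_BIG_ENDIAN target, a multi-word
# value keeps its least significant word in the highest-numbered hard
# register, so (reg:DI 0) and (reg:SI 1) name overlapping storage.
UNITS_PER_WORD = 4  # assumption: 32-bit words, as on SH

def match_regno(regno, mode_size, words_big_endian=True):
    nregs = (mode_size + UNITS_PER_WORD - 1) // UNITS_PER_WORD
    if words_big_endian and mode_size > UNITS_PER_WORD:
        return regno + nregs - 1
    return regno

assert match_regno(0, 8) == 1   # (reg:DI 0) compares as register 1
assert match_regno(1, 4) == 1   # (reg:SI 1) is also register 1
```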
+
enum sh_divide_strategy_e sh_div_strategy = SH_DIV_STRATEGY_DEFAULT;
#include "gt-sh.h"
Index: gcc/config/sh/sh.h
===================================================================
--- gcc/config/sh/sh.h (revision 162269)
+++ gcc/config/sh/sh.h (working copy)
@@ -183,6 +183,11 @@ do { \
#define TARGET_FPU_DOUBLE \
((target_flags & MASK_SH4) != 0 || TARGET_SH2A_DOUBLE)
+#define TARGET_SH1_SOFTFP (TARGET_SH1 && !TARGET_FPU_DOUBLE)
+
+#define TARGET_SH1_SOFTFP_MODE(MODE) \
+ (TARGET_SH1_SOFTFP && (!TARGET_SH2E || (MODE) == DFmode))
+
/* Nonzero if an FPU is available. */
#define TARGET_FPU_ANY (TARGET_SH2E || TARGET_FPU_DOUBLE)
@@ -329,6 +334,38 @@ do { \
#define SUPPORT_ANY_SH5 \
(SUPPORT_ANY_SH5_32MEDIA || SUPPORT_ANY_SH5_64MEDIA)
+/* Check if we have support for optimized software floating point using
+   dynamic shifts; in that case some library calls clobber fewer registers.  */
+#ifdef SUPPORT_SH3
+#define SUPPORT_SH3_OSFP 1
+#else
+#define SUPPORT_SH3_OSFP 0
+#endif
+
+#ifdef SUPPORT_SH3E
+#define SUPPORT_SH3E_OSFP 1
+#else
+#define SUPPORT_SH3E_OSFP 0
+#endif
+
+#if defined(SUPPORT_SH4_NOFPU) || defined(SUPPORT_SH3_OSFP)
+#define SUPPORT_SH4_NOFPU_OSFP 1
+#else
+#define SUPPORT_SH4_NOFPU_OSFP 0
+#endif
+
+#if defined(SUPPORT_SH4_SINGLE_ONLY) || defined (SUPPORT_SH3E_OSFP)
+#define SUPPORT_SH4_SINGLE_ONLY_OSFP 1
+#else
+#define SUPPORT_SH4_SINGLE_ONLY_OSFP 0
+#endif
+
+#define TARGET_OSFP (0 \
+ || (TARGET_SH3 && !TARGET_SH2E && SUPPORT_SH3_OSFP) \
+ || (TARGET_SH3E && SUPPORT_SH3E_OSFP) \
+ || (TARGET_HARD_SH4 && !TARGET_SH2E && SUPPORT_SH4_NOFPU_OSFP) \
+ || (TARGET_HARD_SH4 && TARGET_SH2E && SUPPORT_SH4_SINGLE_ONLY_OSFP))
+
/* Reset all target-selection flags. */
#define MASK_ARCH (MASK_SH1 | MASK_SH2 | MASK_SH3 | MASK_SH_E | MASK_SH4 \
| MASK_HARD_SH2A | MASK_HARD_SH2A_DOUBLE | MASK_SH4A \
@@ -2047,6 +2084,12 @@ struct sh_args {
#define LIBGCC2_DOUBLE_TYPE_SIZE 64
#endif
+#if defined(__SH2E__) || defined(__SH3E__) || defined(__SH4_SINGLE_ONLY__)
+#define LIBGCC2_DOUBLE_TYPE_SIZE 32
+#else
+#define LIBGCC2_DOUBLE_TYPE_SIZE 64
+#endif
+
/* 'char' is signed by default. */
#define DEFAULT_SIGNED_CHAR 1
Index: gcc/config/sh/sh-modes.def
===================================================================
--- gcc/config/sh/sh-modes.def (revision 162269)
+++ gcc/config/sh/sh-modes.def (working copy)
@@ -22,6 +22,11 @@ PARTIAL_INT_MODE (SI);
/* PDI mode is used to represent a function address in a target register. */
PARTIAL_INT_MODE (DI);
+/* For software floating point comparisons. */
+CC_MODE (CC_FP_NE);
+CC_MODE (CC_FP_GT);
+CC_MODE (CC_FP_UNLT);
+
/* Vector modes. */
VECTOR_MODE (INT, QI, 2); /* V2QI */
VECTOR_MODES (INT, 4); /* V4QI V2HI */
Index: gcc/config/sh/lib1funcs.h
===================================================================
--- gcc/config/sh/lib1funcs.h (revision 162269)
+++ gcc/config/sh/lib1funcs.h (working copy)
@@ -64,13 +64,151 @@ see the files COPYING3 and COPYING.RUNTI
#endif /* !__LITTLE_ENDIAN__ */
#ifdef __sh1__
+/* branch with two-argument delay slot insn */
#define SL(branch, dest, in_slot, in_slot_arg2) \
in_slot, in_slot_arg2; branch dest
+/* branch with one-argument delay slot insn */
#define SL1(branch, dest, in_slot) \
in_slot; branch dest
+/* branch with comparison in delay slot */
+#define SLC(branch, dest, in_slot, in_slot_arg2) \
+ branch dest; in_slot, in_slot_arg2
+/* comparison in a delay slot, at branch destination */
+#define SLI(in_slot, in_slot_arg2) in_slot, in_slot_arg2
+#define SLCMP(branch, cmp1, cmp1arg2, cmp2, cmp2arg2) \
+ branch .+6; bra .+6; cmp2, cmp2arg2; cmp1, cmp1arg2
+#define DMULU_SAVE \
+ mov.l r10,@-r15; \
+ mov.l r11,@-r15; \
+ mov.l r12,@-r15; \
+ mov.l r13,@-r15
+#define DMULUL(m1, m2, rl) \
+ swap.w m1,r12; \
+ mulu.w r12,m2; \
+ swap.w m2,r13; \
+ sts macl,r10; \
+ mulu.w r13,m1; \
+ clrt; \
+ sts macl,r11; \
+ mulu.w r12,r13; \
+ addc r11,r10; \
+ sts macl,r12; \
+ mulu.w m1,m2; \
+ movt r11; \
+ sts macl,rl; \
+ mov r10,r13; \
+ shll16 r13; \
+ addc r13,rl; \
+ xtrct r11,r10; \
+ addc r10,r12 \
+/* N.B. the carry is cleared here. */
+#define DMULUH(rh) mov r12,rh
+#define DMULU_RESTORE \
+ mov.l @r15+,r13; \
+ mov.l @r15+,r12; \
+ mov.l @r15+,r11; \
+ mov.l @r15+,r10
#else /* ! __sh1__ */
+/* branch with two-argument delay slot insn */
#define SL(branch, dest, in_slot, in_slot_arg2) \
- branch##.s dest; in_slot, in_slot_arg2
+ branch##/s dest; in_slot, in_slot_arg2
+/* branch with one-argument delay slot insn */
#define SL1(branch, dest, in_slot) \
branch##/s dest; in_slot
+/* branch with comparison in delay slot */
+#define SLC(branch, dest, in_slot, in_slot_arg2) \
+ branch##/s dest; in_slot, in_slot_arg2
+/* comparison in a delay slot, at branch destination */
+#define SLI(in_slot, in_slot_arg)
+#define SLCMP(branch, cmp1, cmp1arg2, cmp2, cmp2arg2) \
+ branch##/s .+6; cmp1, cmp1arg2; cmp2, cmp2arg2
+#define DMULU_SAVE
+#define DMULUL(m1, m2, rl) dmulu.l m1,m2; sts macl,rl
+#define DMULUH(rh) sts mach,rh
+#define DMULU_RESTORE
#endif /* !__sh1__ */
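
The SH1 `DMULUL`/`DMULUH` macros above synthesize a 32x32->64-bit unsigned multiply out of 16x16-bit `mulu.w` partial products, since SH1 has no `dmulu.l`. A rough Python model of the decomposition (illustrative only, not a cycle-accurate transcription of the macro):

```python
def dmulu(m1, m2):
    # 32x32 -> 64 unsigned multiply from four 16x16 partial products,
    # mirroring what DMULUL/DMULUH compute with mulu.w on SH1.
    a_lo, a_hi = m1 & 0xffff, m1 >> 16
    b_lo, b_hi = m2 & 0xffff, m2 >> 16
    lo = a_lo * b_lo            # mulu.w m1,m2
    mid1 = a_hi * b_lo          # mulu.w r12,m2 (swapped halves)
    mid2 = a_lo * b_hi          # mulu.w r13,m1
    hi = a_hi * b_hi            # mulu.w r12,r13
    result = lo + ((mid1 + mid2) << 16) + (hi << 32)
    return result >> 32, result & 0xffffffff   # (DMULUH, DMULUL)

assert dmulu(0xffffffff, 0xffffffff) == (0xfffffffe, 0x00000001)
```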
+
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+/* don't #define DYN_SHIFT */
+ #define SHLL4(REG) \
+ shll2 REG; \
+ shll2 REG
+
+ #define SHLR4(REG) \
+ shlr2 REG; \
+ shlr2 REG
+
+ #define SHLL6(REG) \
+ shll2 REG; \
+ shll2 REG; \
+ shll2 REG
+
+ #define SHLR6(REG) \
+ shlr2 REG; \
+ shlr2 REG; \
+ shlr2 REG
+
+ #define SHLL12(REG) \
+ shll8 REG; \
+ SHLL4 (REG)
+
+ #define SHLR12(REG) \
+ shlr8 REG; \
+ SHLR4 (REG)
+
+ #define SHLR19(REG) \
+ shlr16 REG; \
+ shlr2 REG; \
+ shlr REG
+
+ #define SHLL23(REG) \
+ shll16 REG; \
+ shlr REG; \
+ shll8 REG
+
+ #define SHLR24(REG) \
+ shlr16 REG; \
+ shlr8 REG
+
+ #define SHLR21(REG) \
+ shlr16 REG; \
+ shll2 REG; \
+ add REG,REG;\
+ shlr8 REG
+
+ #define SHLL21(REG) \
+ shll16 REG; \
+ SHLL4 (REG); \
+ add REG,REG
+
+ #define SHLR11(REG) \
+ shlr8 REG; \
+ shlr2 REG; \
+ shlr REG
+
+ #define SHLR22(REG) \
+ shlr16 REG; \
+ shll2 REG; \
+ shlr8 REG
+
+ #define SHLR23(REG) \
+ shlr16 REG; \
+ add REG,REG;\
+ shlr8 REG
+
+ #define SHLR20(REG) \
+ shlr16 REG; \
+ SHLR4 (REG)
+
+ #define SHLL20(REG) \
+ shll16 REG; \
+ SHLL4 (REG)
+#define SHLD_COUNT(N,COUNT)
+#define SHLRN(N,COUNT,REG) SHLR##N(REG)
+#define SHLLN(N,COUNT,REG) SHLL##N(REG)
+#else
+#define SHLD_COUNT(N,COUNT) mov #N,COUNT
+#define SHLRN(N,COUNT,REG) shld COUNT,REG
+#define SHLLN(N,COUNT,REG) shld COUNT,REG
+#define DYN_SHIFT 1
+#endif
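
Each fixed-count macro above composes the shift widths SH1/SH2 implement in hardware (1, 2, 8, 16 bits); for example SHLR21 performs >>16, <<2, <<1, >>8, a net right shift by 21 with no intermediate overflow in 32 bits. A quick check of that composition:

```python
MASK32 = 0xffffffff

def shlr21(x):
    # mirrors the SHLR21 macro: shlr16; shll2; add REG,REG; shlr8
    x = (x >> 16) & MASK32    # shlr16
    x = (x << 2) & MASK32     # shll2
    x = (x + x) & MASK32      # add REG,REG (shift left by one)
    return (x >> 8) & MASK32  # shlr8

for v in (0xffffffff, 0x12345678, 0x80000000):
    assert shlr21(v) == v >> 21
```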
Index: gcc/config/sh/IEEE-754/m3/divsf3.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/divsf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/divsf3.S (revision 0)
@@ -0,0 +1,365 @@
+/* Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+! divsf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+
+! long 0th..3rd significant byte
+#ifdef __LITTLE_ENDIAN__
+#define L0SB 3
+#define L1SB 2
+#define L2SB 1
+#define L3SB 0
+#else
+#define L0SB 0
+#define L1SB 1
+#define L2SB 2
+#define L3SB 3
+#endif
+
+! clobbered: r0,r1,r2,r3,r6,r7,T (and for sh.md's purposes PR)
+!
+! Note: When the divisor is larger than the dividend, we have to adjust the
+! exponent down by one. We do this automatically when subtracting the entire
+! exponent/fraction bitstring as an integer, by means of the borrow from
+! bit 23 to bit 24.
+! Note: non-denormal rounding of a division result cannot cause fraction
+! overflow / exponent change. (r4 > r5 : fraction must stay in (2..1] interval;
+! r4 < r5: having an extra bit of precision available, even the smallest
+! possible difference of the result from one is rounded in all rounding modes
+! to a fraction smaller than one.)
+! sh4-200: 59 cycles
+! sh4-300: 44 cycles
+! tab indent: exponent / sign computations
+! tab+space indent: fraction computation
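
The borrow trick described in the first note can be checked numerically. The sketch below (an illustration, not part of the routine) subtracts the whole rebiased exponent/fraction bitstrings and shows the "exponent minus one" adjustment fall out of the borrow from bit 23, for normal positive operands with a normal quotient:

```python
import struct

def bits(f):
    # IEEE single-precision bit pattern of a Python float
    return struct.unpack('>I', struct.pack('>f', f))[0]

def quotient_sign_exp(a, b):
    # Rebias with 0x3f800000 and subtract whole bitstrings; the borrow
    # out of the fraction field supplies the extra exponent decrement
    # automatically whenever frac(a) < frac(b).
    return (0x3f800000 + bits(a) - bits(b)) & 0xff800000

for a, b in ((3.0, 2.0), (2.0, 3.0), (10.0, 7.0), (7.0, 10.0)):
    assert quotient_sign_exp(a, b) == bits(a / b) & 0xff800000
```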
+FUNC(GLOBAL(divsf3))
+ .global GLOBAL(divsf3)
+ .balign 4
+GLOBAL(divsf3):
+ mov.l LOCAL(x7f800000),r3
+ mov #1,r2
+ mov r4,r6
+ shll8 r6
+ mov r5,r7
+ shll8 r7
+ rotr r2
+ tst r3,r4
+ or r2,r6
+ bt/s LOCAL(denorm_arg0)
+ or r2,r7
+ tst r3,r5
+ bt LOCAL(denorm_arg1)
+ shlr r6
+	mov.l	LOCAL(x3f000000),r3 ! bias minus explicit leading 1
+ div0u
+LOCAL(denorm_done):
+ div1 r7,r6
+ mov.l r8,@-r15
+ bt 0f
+ div1 r7,r6
+0: mov.l r9,@-r15
+ div1 r7,r6
+ add r4,r3
+ div1 r7,r6
+ sub r5,r3 ! result sign/exponent minus 1 if no overflow/underflow
+ div1 r7,r6
+ or r3,r2
+ div1 r7,r6
+ mov.w LOCAL(xff00),r9
+ div1 r7,r6
+ mov.l r2,@-r15 ! L0SB is 0xff iff denorm / infinity exp is computed
+ div1 r7,r6
+ mov.w LOCAL(m23),r2
+ div1 r7,r6
+ mov r4,r0
+ div1 r7,r6
+ extu.b r6,r1
+ and r9,r6
+ swap.w r1,r1 ! first 8 bits of result fraction in bit 23..16
+ div1 r7,r6
+ shld r2,r0
+ div1 r7,r6
+	mov.b	r0,@(L3SB,r15)	! 0xff iff dividend was infinity / nan
+ div1 r7,r6
+ mov r5,r0
+ div1 r7,r6
+ shld r2,r0
+ div1 r7,r6
+ mov.b r0,@(L2SB,r15) ! 0xff iff divisor was infinity / nan
+ div1 r7,r6
+ mov r4,r0
+ div1 r7,r6
+ mov.w LOCAL(m31),r2
+ div1 r7,r6
+ extu.b r6,r8 ! second 8 bits of result fraction in bit 7..0
+ and r9,r6
+ mov.l LOCAL(xff800000),r9
+ div1 r7,r6
+ xor r5,r0 ! msb := correct result sign
+ div1 r7,r6
+ xor r3,r0 ! xor with sign of result sign/exponent word
+ div1 r7,r6
+ shad r2,r0
+ div1 r7,r6
+ mov.b r0,@(L1SB,r15) ! 0xff iff exponent over/underflows
+ and r9,r3 ! isolate sign / exponent
+ mov.w LOCAL(xff01),r2
+ div1 r7,r6
+ swap.b r8,r0 ! second 8 bits of result fraction in bit 15..8
+ div1 r7,r6
+ or r1,r0 ! first 16 bits of result fraction in bit 23..8
+ div1 r7,r6
+ mov.w LOCAL(m1),r9
+ div1 r7,r6
+	mov.l	@r15+,r8 ! load encoding of unusual exponent conditions
+ and r6,r2 ! rest | result lsb
+ mov #0,r1
+ bf 0f ! bit below lsb clear -> no rounding
+ cmp/hi r1,r2
+0: extu.b r6,r1
+ or r1,r0 ! 24 bit result fraction with explicit leading 1
+ addc r3,r0 ! add in exponent / sign
+ cmp/str r9,r8
+ ! (no stall *here* for SH4-100 / SH4-200)
+ bt/s LOCAL(inf_nan_denorm_zero)
+ mov.l @r15+,r9
+ rts
+ mov.l @r15+,r8
+
+/* The exponent adjustment for denormal numbers is done by leaving an
+ adjusted value in r3; r4/r5 are not changed. */
+ .balign 4
+LOCAL(denorm_arg0):
+ mov.w LOCAL(xff00),r1
+	sub	r2,r6	! 0x80000000 : remove implicit 1
+ tst r6,r6
+ sts.l pr,@-r15
+ bt LOCAL(div_zero)
+ bsr LOCAL(clz)
+ mov r6,r0
+ shld r0,r6
+ tst r3,r5
+ mov.l LOCAL(x3f800000),r3 ! bias - 1 + 1
+ mov #23,r1
+ shld r1,r0
+ bt/s LOCAL(denorm_arg1_2)
+ sub r0,r3
+ shlr r6
+ bra LOCAL(denorm_done)
+ div0u
+
+LOCAL(denorm_arg1):
+ mov.l LOCAL(x3f000000),r3 ! bias - 1
+LOCAL(denorm_arg1_2):
+	sub	r2,r7	! 0x80000000 : remove implicit 1
+ mov.w LOCAL(xff00),r1
+ tst r7,r7
+ sts.l pr,@-r15
+ bt LOCAL(div_by_zero)
+ bsr LOCAL(clz)
+ mov r7,r0
+ shld r0,r7
+ add #-1,r0
+ mov #23,r1
+ shld r1,r0
+ add r0,r3
+ shlr r6
+ bra LOCAL(denorm_done)
+ div0u
+
+ .balign 4
+LOCAL(inf_nan_denorm_zero):
+! r0 has the rounded result, r6 has the non-rounded lowest bits & rest.
+! the bit just below the LSB of r6 is available as ~Q
+
+! Alternative way to get at ~Q:
+! if rounding took place, ~Q must be set.
+! if the rest appears to be zero, ~Q must be set.
+! if the rest appears to be nonzero, but rounding didn't take place,
+! ~Q must be clear; the apparent rest will then require adjusting to test if
+! the actual rest is nonzero.
+ mov r0,r2
+ not r8,r0
+ tst #0xff,r0
+ shlr8 r0
+ mov.l @r15+,r8
+ bt/s LOCAL(div_inf_or_nan)
+ tst #0xff,r0
+ mov r4,r0
+ bt LOCAL(div_by_inf_or_nan)
+ add r0,r0
+ mov r5,r1
+ add r1,r1
+ cmp/hi r1,r0
+ mov r6,r0
+ bt LOCAL(overflow)
+ sub r2,r0
+ exts.b r0,r0 ! -1 if rounding took place
+ shlr8 r6 ! isolate div1-mangled rest
+ addc r2,r0 ! generate carry if rounding took place
+ shlr8 r7
+ sub r3,r0 ! pre-rounding fraction
+ bt 0f ! going directly to denorm_sticky would cause mispredicts
+ tst r6,r6 ! rest can only be zero if lost bit was set
+0: add r7,r6 ! (T ? corrupt : reconstruct) actual rest
+ bt 0f
+ cmp/pl r6
+0: mov.w LOCAL(m24),r1
+ addc r0,r0 ! put in sticky bit
+ add #-1,r3
+ mov.l LOCAL(x40000000),r6
+ add r3,r3
+ mov r0,r2
+ shad r1,r3 ! exponent ; s32.0
+ !
+ shld r3,r0
+ add #30,r3
+ cmp/pl r3
+ shld r3,r2
+ bf LOCAL(zero_nan) ! return zero
+ rotl r2
+ cmp/hi r6,r2
+ mov #0,r7
+ addc r7,r0
+ div0s r4,r5
+ rts
+ rotcr r0
+
+! ????
+! undo normal rounding (lowest bits still in r6). then do denormal rounding.
+
+LOCAL(overflow):
+ mov.l LOCAL(xff000000),r0
+ div0s r4,r5
+ rts
+ rotcl r0
+
+LOCAL(div_inf_or_nan):
+ mov r4,r0
+ bra LOCAL(nan_if_t)
+ add r0,r0
+
+LOCAL(div_by_inf_or_nan):
+ mov.l LOCAL(xff000000),r1
+ mov #0,r0
+ mov r5,r2
+ add r2,r2
+ bra LOCAL(nan_if_t)
+ cmp/hi r1,r2
+
+
+
+! still need to check for divide by zero or divide by nan
+! r3: 0x7f800000
+ .balign 4
+LOCAL(div_zero):
+ mov r5,r1
+ add r1,r1
+ tst r1,r1 ! 0 / 0 -> nan
+ not r5,r1
+ bt LOCAL(nan)
+ add r3,r3
+ cmp/hi r3,r1 ! 0 / nan -> nan (but 0 / inf -> 0)
+LOCAL(zero_nan):
+ mov #0,r0
+LOCAL(nan_if_t):
+	bf	0f
+LOCAL(nan):
+ mov #-1,r0
+0: div0s r4,r5 ! compute sign
+ rts
+ rotcr r0 ! insert sign
+
+LOCAL(div_by_zero):
+ mov.l LOCAL(xff000000),r0
+ mov r5,r2
+ add r2,r2
+ bra LOCAL(nan_if_t)
+ cmp/hi r0,r2
+
+ .balign 4
+LOCAL(clz):
+ mov.l r8,@-r15
+ extu.w r0,r8
+ mov.l r9,@-r15
+ cmp/eq r0,r8
+ bt/s 0f
+ mov #8-8,r9
+ xtrct r0,r8
+ add #16,r9
+0: tst r1,r8 ! 0xff00
+ mov.l LOCAL(c_clz_tab),r0
+ bt 0f
+ shlr8 r8
+0: bt 0f
+ add #8,r9
+0:
+#ifdef __PIC__
+ add r0,r8
+ mova LOCAL(c_clz_tab),r0
+#endif
+ mov.b @(r0,r8),r8
+ mov r9,r0
+ mov.l @r15+,r9
+ !
+ !
+ !
+ sub r8,r0
+ mov.l @r15+,r8
+ rts
+ lds.l @r15+,pr
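
`LOCAL(clz)` narrows the argument to its highest nonzero byte with 16-bit and 8-bit tests, then finishes with a single byte-table lookup (`GLOBAL(clz_tab)`, whose entry for byte `b` is `b`'s bit length, in the usual libgcc convention). A Python model of the same narrowing:

```python
# clz_tab[b] = bit length of byte b (clz_tab[0] = 0), as in libgcc
clz_tab = [0] + [i.bit_length() for i in range(1, 256)]

def clz32(x):
    # table-driven count-leading-zeros for a nonzero 32-bit value
    assert 0 < x < 1 << 32
    shift = 0
    if x >> 16:            # top halfword nonzero?
        shift += 16
    if (x >> shift) >> 8:  # top byte of the remaining halfword nonzero?
        shift += 8
    return 32 - shift - clz_tab[x >> shift]

for v in (1, 0x80, 0x12345678, 0xffffffff):
    assert clz32(v) == 32 - v.bit_length()
```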
+
+! We encode even some words as pc-relative that would fit as immediate
+! in the instruction in order to avoid some pipeline stalls on
+! SH4-100 / SH4-200.
+LOCAL(m23): .word -23
+LOCAL(m24): .word -24
+LOCAL(m31): .word -31
+LOCAL(xff01): .word 0xff01
+ .balign 4
+LOCAL(xff000000):
+#ifdef __LITTLE_ENDIAN__
+LOCAL(xff00): .word 0xff00
+LOCAL(m1): .word -1
+#else
+LOCAL(m1): .word -1
+LOCAL(xff00): .word 0xff00
+#endif
+LOCAL(x7f800000): .long 0x7f800000
+LOCAL(x3f000000): .long 0x3f000000
+LOCAL(x3f800000): .long 0x3f800000
+LOCAL(xff800000): .long 0xff800000
+LOCAL(x40000000): .long 0x40000000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(divsf3))
Index: gcc/config/sh/IEEE-754/m3/divdf3.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/divdf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/divdf3.S (revision 0)
@@ -0,0 +1,608 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+! divdf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke joern.rennecke@st.com
+
+/* y = 1/x ; x in [1,2)
+ y0 = 1.5 - x/2 - tab[(1-x)*64] = y + d ; abs(d)/y <= 0x1.0c/256
+
+ y1 = y0 - ((y0) * x - 1) * y0 = y-x*d^2
+ y2 = y1 - ((y1) * x - 1) * y1 =~= y-x^3*d^4
+
+ z0 = y2*a ; a1 = a - z0*x /# 32 * 64 -> 64 bit #/
+ z1 = y2*a1 (round to nearest odd 0.5 ulp);
+ a2 = a1 - z1*x /# 32 * 64 -> 64 bit #/
+
+ z = a/x = z0 + z1 - 0.5 ulp + (a2 > 0) * ulp
+
+ Unless stated otherwise, multiplies can be done in 32 * 32 bit or less
+ with suitable scaling and/or top truncation.
+ We use a slightly modified algorithm here that checks if the lower
+ bits in z1 are sufficient to determine the outcome of rounding - in that
+ case a2 is not computed.
+ -z1 is computed in units of 1/128 ulp, with an error in the range
+ -0x3.e/128 .. +0 ulp.
+ Thus, after adding three, the result can be safely rounded for normal
+ numbers if any of the bits 5..2 is set, or if the highest guard bit
+ (bit 6 if y <1, otherwise bit 7) is set.
+   (bit 6 if y < 1, otherwise bit 7) is set.
+ error interval of (-4/128..+1/128) ulp )
+ For denormal numbers, the rounding point lies higher, but it would be
+ quite cumbersome to calculate where exactly; it is sufficient if any
+ of the bits 7..3 is set.
+ x truncated to 20 bits is sufficient to calculate y0 or even y1.
+ Table entries are adjusted by about +128 to use full signed byte range.
+ This adjustment has been perturbed slightly to allow cse with the
+ shift count constant -26.
+ The threshold point for the shift adjust before rounding is found by
+ comparing the fractions, which is exact, unlike the top bit of y2.
+ Therefore, the top bit of y2 becomes slightly random after the adjustment
+ shift, but that's OK because this can happen only at the boundaries of
+ the interval, and the biasing of the error means that it can in fact happen
+ only at the bottom end. And there, the carry propagation will make sure
+   that in the end we will have in effect an implicit 1 (or two when rounding
+ up...) */
+/* If an exact result exists, it can have no more bits than the dividend.
+ Hence, we don't need to bother with the round-to-even tie breaker
+ unless the result is denormalized. */
+/* 64 cycles through main path for sh4-300 (about 93.7% of normalized numbers),
+ 82 for the path for rounding tie-breaking for normalized numbers
+ (including one branch mispredict).
+ Some cycles might be saved by more careful register allocation. */
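
The y0/y1/y2 refinement described above is plain Newton-Raphson iteration for the reciprocal, y' = y - (y*x - 1)*y, which roughly squares the relative error at each step. A floating-point sketch (the 0.02 correction below is an illustrative stand-in for the tab[] lookup, not a value from the table):

```python
def refine(y, x):
    # one Newton-Raphson step for 1/x: y' = y - (y*x - 1)*y
    return y - (y * x - 1.0) * y

x = 1.7                    # x in [1, 2)
y0 = 1.5 - x / 2.0 + 0.02  # 0.02 stands in for the tab[] correction
y1 = refine(y0, x)
y2 = refine(y1, x)
# each step shrinks the residual y*x - 1 roughly quadratically
assert abs(y2 * x - 1.0) < abs(y1 * x - 1.0) < abs(y0 * x - 1.0)
assert abs(y2 * x - 1.0) < 1e-3
```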
+
+#define x_h r12
+#define yn r3
+
+FUNC(GLOBAL(divdf3))
+ .global GLOBAL(divdf3)
+
+/* Adjust arg0 now, too. We still have to come back to denorm_arg1_done,
+ since we heven't done any of the work yet that we do till the denorm_arg0
+ entry point. We know that neither of the arguments is inf/nan, but
+ arg0 might be zero. Check for that first to avoid having to establish an
+ rts return address. */
+LOCAL(both_denorm):
+ mov.l r9,@-r15
+ mov DBL0H,r1
+ mov.l r0,@-r15
+ shll2 r1
+ mov.w LOCAL(both_denorm_cleanup_off),r9
+ or DBL0L,r1
+ tst r1,r1
+ mov DBL0H,r0
+ bf/s LOCAL(zero_denorm_arg0_1)
+ shll2 r0
+ mov.l @(4,r15),r9
+ add #8,r15
+ bra LOCAL(ret_inf_nan_0)
+ mov r1,DBLRH
+
+LOCAL(both_denorm_cleanup):
+ mov.l @r15+,r0
+ !
+ mov.l @r15+,r9
+ mov #12,r3
+ mov.l LOCAL(xfffe2006),r1 ! yn := (-1. << 17) + (0x80 << 6) ; shift #-26
+ bra LOCAL(denorm_arg1_done)
+ !
+ add r0,DBL0H
+
+/* Denorm handling leaves the incoming denorm argument with an exponent of +1
+ (implicit 1). To leave the result exponent unaltered, the other
+   argument's exponent is adjusted by the shift count.  */
+
+ .balign 4
+LOCAL(arg0_tiny):
+ bsr LOCAL(clz)
+ mov DBL0L,r0
+ shll DBL0H
+ add #1,r0
+ mov DBL0L,DBL0H
+ shld r0,DBL0H
+ rotcr DBL0H
+	tst	DBL0L,DBL0L /* Check for a zero dividend.  */
+ add #-33,r0
+ shld r0,DBL0L
+ bf/s LOCAL(adjust_arg1_exp)
+ add #64,r0
+LOCAL(return_0): /* Return 0 with appropriate sign. */
+ mov.l @r15+,r10
+ mov #0,DBLRH
+ mov.l @r15+,r9
+ bra LOCAL(ret_inf_nan_0)
+ mov.l @r15+,r8
+
+ .balign 4
+LOCAL(arg1_tiny):
+ bsr LOCAL(clz)
+ mov DBL1L,r0
+ shll DBL1H
+ add #1,r0
+ mov DBL1L,DBL1H
+ shld r0,DBL1H
+ rotcr DBL1H
+ tst DBL1L,DBL1L /* Check for divide by zero. */
+ add #-33,r0
+ shld r0,DBL1L
+ bf/s LOCAL(adjust_arg0_exp)
+ add #64,r0
+ mov DBL0H,r0
+ add r0,r0
+ tst r0,r0 ! 0 / 0 ?
+ mov #-1,DBLRH
+ bf LOCAL(return_inf)
+ !
+ bt LOCAL(ret_inf_nan_0)
+ !
+
+ .balign 4
+LOCAL(zero_denorm_arg1):
+ not DBL0H,r3
+ mov DBL1H,r0
+ tst r2,r3
+ shll2 r0
+ bt LOCAL(early_inf_nan_arg0)
+ tst r0,r0
+ mov.w LOCAL(xff00),r12
+ bt/s LOCAL(arg1_tiny)
+ sts.l pr,@-r15
+ bsr LOCAL(clz)
+ shlr2 r0
+ !
+ shll DBL1H
+ mov DBL1L,r3
+ shld r0,DBL1H
+ shld r0,DBL1L
+ rotcr DBL1H
+ add #-32,r0
+ shld r0,r3
+ add #32,r0
+ or r3,DBL1H
+LOCAL(adjust_arg0_exp):
+ tst r2,DBL0H
+ mov #20,r3
+ shld r3,r0
+ bt LOCAL(both_denorm)
+ add DBL0H,r0
+	div0s	r0,DBL0H	! Check for obvious overflow
+ not r0,r3 ! Check for more subtle overflow - lest
+ bt LOCAL(return_inf)
+ mov r0,DBL0H
+ tst r2,r3 ! we mistake it for NaN later
+ mov #12,r3
+ bf LOCAL(denorm_arg1_done)
+LOCAL(return_inf): /* Return infinity with appropriate sign. */
+ mov #20,r3
+ mov #-2,DBLRH
+ bra LOCAL(ret_inf_nan_0)
+ shad r3,DBLRH
+
+/* inf/n -> inf; inf/0 -> inf; inf/inf -> nan; inf/nan->nan nan/x -> nan */
+LOCAL(inf_nan_arg0):
+ mov.l @r15+,r10
+ mov.l @r15+,r9
+ mov.l @r15+,r8
+LOCAL(early_inf_nan_arg0):
+ not DBL1H,r3
+ mov DBL0H,DBLRH
+ tst r2,r3 ! both inf/nan?
+ add DBLRH,DBLRH
+ bf LOCAL(ret_inf_nan_0)
+ mov #-1,DBLRH
+LOCAL(ret_inf_nan_0):
+ mov #0,DBLRL
+ mov.l @r15+,r12
+ div0s DBL0H,DBL1H
+ rts
+ rotcr DBLRH
+
+/* Already handled: inf/x, nan/x . Thus: x/inf -> 0; x/nan -> nan */
+ .balign 4
+LOCAL(inf_nan_arg1):
+ mov DBL1H,r2
+ mov #12,r1
+ shld r1,r2
+ mov.l @r15+,r10
+ mov #0,DBLRL
+ mov.l @r15+,r9
+ or DBL1L,r2
+ mov.l @r15+,r8
+ cmp/hi DBLRL,r2
+ mov.l @r15+,r12
+ subc DBLRH,DBLRH
+ div0s DBL0H,DBL1H
+ rts
+ rotcr DBLRH
+
+ .balign 4
+LOCAL(zero_denorm_arg0):
+ mov.w LOCAL(denorm_arg0_done_off),r9
+ not DBL1H,r1
+ mov DBL0H,r0
+ tst r2,r1
+ shll2 r0
+ bt LOCAL(inf_nan_arg1)
+LOCAL(zero_denorm_arg0_1):
+ tst r0,r0
+ mov.w LOCAL(xff00),r12
+ bt/s LOCAL(arg0_tiny)
+ sts.l pr,@-r15
+ bsr LOCAL(clz)
+ shlr2 r0
+ shll DBL0H
+ mov DBL0L,r12
+ shld r0,DBL0H
+ shld r0,DBL0L
+ rotcr DBL0H
+ add #-32,r0
+ shld r0,r12
+ add #32,r0
+ or r12,DBL0H
+LOCAL(adjust_arg1_exp):
+ mov #20,r12
+ shld r12,r0
+ add DBL1H,r0
+	div0s	r0,DBL1H	! Check for obvious underflow
+ not r0,r12 ! Check for more subtle underflow - lest
+ bt LOCAL(return_0)
+ mov r0,DBL1H
+ tst r2,r12 ! we mistake it for NaN later
+ bt LOCAL(return_0)
+ !
+ braf r9
+ mov #13,r0
+LOCAL(zero_denorm_arg1_dispatch):
+
+LOCAL(xff00): .word 0xff00
+LOCAL(denorm_arg0_done_off):
+ .word LOCAL(denorm_arg0_done)-LOCAL(zero_denorm_arg1_dispatch)
+LOCAL(both_denorm_cleanup_off):
+ .word LOCAL(both_denorm_cleanup)-LOCAL(zero_denorm_arg1_dispatch)
+
+ .balign 8
+GLOBAL(divdf3):
+ mov.l LOCAL(x7ff00000),r2
+ mov #12,r3
+ mov.l LOCAL(xfffe2006),r1 ! yn := (-1. << 17) + (0x80 << 6) ; shift #-26
+ tst r2,DBL1H
+ mov.l r12,@-r15
+ bt LOCAL(zero_denorm_arg1)
+
+LOCAL(denorm_arg1_done):
+ mov DBL1H,x_h ! x_h live in r12
+ shld r3,x_h ! x - 1 ; u0.20
+ mov x_h,yn
+ mova LOCAL(ytab),r0
+ mov.l r8,@-r15
+ shld r1,yn ! x-1 ; u26.6
+ mov.b @(r0,yn),yn
+ mov #6,r0
+ mov.l r9,@-r15
+ mov x_h,r8
+ mov.l r10,@-r15
+ shlr16 x_h ! x - 1; u16.16 ! x/2 - 0.5 ; u15.17
+ add x_h,r1 ! SH4-200 single-issues this insn
+ shld r0,yn
+ sub r1,yn ! yn := y0 ; u15.17
+ mov DBL1L,r1
+ mov #-20,r10
+ mul.l yn,x_h ! r12 dead
+ swap.w yn,r9
+ shld r10,r1
+ sts macl,r0 ! y0 * (x-1) - n ; u-1.32
+ add r9,r0 ! y0 * x - 1 ; s-1.32
+ tst r2,DBL0H
+ dmuls.l r0,yn
+ mov.w LOCAL(d13),r0
+ or r1,r8 ! x - 1; u0.32
+ add yn,yn ! yn = y0 ; u14.18
+ bt LOCAL(zero_denorm_arg0)
+
+LOCAL(denorm_arg0_done):
+ sts mach,r1 ! d0 ; s14.18
+ sub r1,yn ! yn = y1 ; u14.18 ; <= 0x3fffc
+ mov DBL0L,r12
+ shld r0,yn ! yn = y1 ; u1.31 ; <= 0x7fff8000
+ mov.w LOCAL(d12),r9
+ dmulu.l yn,r8
+ shld r10,r12
+ mov yn,r0
+ mov DBL0H,r8
+ add yn,yn ! yn = y1 ; u0.32 ; <= 0xffff0000
+ sts mach,r1 ! y1 * (x-1); u1.31
+ add r0,r1 ! y1 * x ; u1.31
+ dmulu.l yn,r1
+ not DBL0H,r10
+ shld r9,r8
+ tst r2,r10
+ or r8,r12 ! a - 1; u0.32
+ bt LOCAL(inf_nan_arg0)
+ sts mach,r1 ! d1+yn; u1.31
+ sett ! adjust y2 so that it can be interpreted as s1.31
+ not DBL1H,r10
+ subc r1,yn ! yn := y2 ; u1.31 ; can be 0x7fffffff
+ mov.l LOCAL(x001fffff),r9
+ dmulu.l yn,r12
+ tst r2,r10
+ or DBL1H,r2
+ bt LOCAL(inf_nan_arg1)
+ mov.l r11,@-r15
+ sts mach,r12 ! y2*(a-1) ; u1.31
+ add yn,r12 ! z0 ; u1.31
+ dmulu.l r12,DBL1L
+ mov.l LOCAL(x40000000),DBLRH ! bias + 1
+ and r9,r2 ! x ; u12.20
+ cmp/hi DBL0L,DBL1L
+ sts macl,r8
+ mov #-24,r11
+ sts mach,r9 ! r9:r8 := z0 * DBL1L; u-19.64
+ subc DBL1H,DBLRH
+ mul.l r12,r2 ! (r9+macl):r8 == z0*x; u-19.64
+ shll r8
+ add DBL0H,DBLRH ! result sign/exponent + 1
+ mov r8,r10
+ sts macl,DBLRL
+ add DBLRL,r9
+ rotcl r9 ! r9:r8 := z*x; u-20.63
+ shld r11,r10
+ mov.l LOCAL(x7fe00000),DBLRL
+ sub DBL0L,r9 ! r9:r8 := -a ; u-20.63
+	cmp/pz	r9	! In corner cases this shift can lose ..
+ shll8 r9 ! .. the sign, so check it first.
+ mov.l LOCAL(x00200000),r11
+ or r10,r9 ! -a1 ; s-28.32
+ mov.l LOCAL(x00100000),r10
+ dmulu.l r9,yn ! sign for r9 is in T
+ xor DBL0H,DBL1H ! calculate expected sign & bit20
+ mov.w LOCAL(d120),DBL0H ! to test bits 6..4
+ xor DBLRH,DBL1H
+ !
+ sts mach,DBL0L ! -z1 ; s-27.32
+ bt 0f
+ sub yn,DBL0L ! multiply adjust for -a1 negative; r3 dies here
+0:tst r10,DBL1H ! set T if a >= x
+ mov.l LOCAL(xfff00000),r3
+ bt 0f
+ add DBL0L,DBL0L ! z1 ; s-27.32 / s-28.32
+0:bt 0f
+ add r12,r12 ! z0 ; u1.31 / u0.31
+0:add #6-64,DBL0L
+ and r3,DBLRH ! isolate sign / exponent
+ tst DBL0H,DBL0L
+ bf/s LOCAL(exact) ! make the hot path taken for best branch prediction
+ cmp/pz DBL1H
+
+! Unless we follow the next branch, we need to test which way the rounding
+! should go.
+! For normal numbers, we know that the result is not exact, so the sign
+! of the rest will be conclusive.
+! We generate a number that looks safely rounded so that denorm handling
+! can safely test the number twice.
+! r10:r8 == 0 will indicate if the number was exact, which can happen
+! when we come here for denormals to check a number that is close or
+! equal to a result in whole ulps.
+ bf LOCAL(ret_denorm_inf) ! denorm or infinity, DBLRH has inverted sign
+ add #64,DBL0L
+LOCAL(find_adjust): tst r10,DBL1H ! set T if a >= x
+ mov #-2,r10
+ addc r10,r10
+ mov DBL0L,DBLRL ! z1 ; s-27.32 / s-28.32 ; lower 4 bits unsafe.
+ shad r10,DBLRL ! tentatively rounded z1 ; s-24.32
+ shll8 r8 ! r9:r8 := -a1 ; s-28.64
+ clrt
+ dmuls.l DBLRL,DBL1L ! DBLRL signed, DBL1L unsigned
+ mov r8,r10
+ shll16 r8 ! r8 := lowpart of -a1 ; s-44.48
+ xtrct r9,r10 ! r10 := highpart of -a1 ; s-44.48
+ !
+ sts macl,r3
+ subc r3,r8
+ sts mach,r3
+ subc r3,r10
+ cmp/pz DBL1L
+ mul.l DBLRL,r2
+ bt 0f
+ sub DBLRL,r10 ! adjust for signed/unsigned multiply
+0: mov.l LOCAL(x7fe00000),DBLRL
+ mov #-26,r2
+ sts macl,r9
+ sub r9,r10 ! r10:r8 := -a2
+ add #-64+16,DBL0L ! the denorm code negates this adj. for exact results
+ shld r2,r10 ! convert sign into adjustment in the range 32..63
+ sub r10,DBL0L
+ cmp/pz DBL1H
+
+ .balign 4
+LOCAL(exact):
+ bf LOCAL(ret_denorm_inf) ! denorm or infinity, DBLRH has inverted sign
+ tst DBLRL,DBLRH
+ bt LOCAL(ret_denorm_inf) ! denorm, DBLRH has correct sign
+ mov #-7,DBL1H
+ cmp/pz DBL0L ! T is sign extension of z1
+ not DBL0L,DBLRL
+ subc r11,DBLRH ! calculate sign / exponent minus implicit 1 minus T
+ mov.l @r15+,r11
+ mov.l @r15+,r10
+ shad DBL1H,DBLRL
+ mov.l @r15+,r9
+ mov #-11,DBL1H
+ mov r12,r8 ! z0 contributes to DBLRH and DBLRL
+ shld DBL1H,r12
+ mov #21,DBL1H
+ clrt
+ shld DBL1H,r8
+ addc r8,DBLRL
+ mov.l @r15+,r8
+ addc r12,DBLRH
+ rts
+ mov.l @r15+,r12
+
+! sign in DBLRH ^ DBL1H
+! If the last 7 bits are in the range 64..64+7, we might have an exact
+! value in the preceding bits - or we might not. For denorms, we need to
+! find out.
+! if r10:r8 is zero, we just have found out that there is an exact value.
+ .balign 4
+LOCAL(ret_denorm_inf):
+ mov DBLRH,r3
+ add r3,r3
+ div0s DBL1H,r3
+ mov #120,DBLRL
+ bt LOCAL(ret_inf_late)
+ add #64,DBL0L
+ tst DBLRL,DBL0L
+ mov #-21,DBLRL
+ bt LOCAL(find_adjust)
+ or r10,r8
+ tst r8,r8 ! check if find_adjust found an exact value.
+ shad DBLRL,r3
+ bf 0f
+ add #-16,DBL0L ! if yes, cancel adjustment
+0: mov #-8,DBLRL ! remove the three lowest (inexact) bits
+ and DBLRL,DBL0L
+ add #-2-11,r3 ! shift count for denorm generation
+ mov DBL0L,DBLRL
+ mov #28,r2
+ mov.l @r15+,r11
+ mov.l @r15+,r10
+ shll2 DBLRL
+ mov.l @r15+,r9
+ shld r2,DBL0L
+ mov.l @r15+,r8
+ mov #-31,r2
+ cmp/ge r2,r3
+ shll2 DBLRL
+ bt/s 0f
+ add DBL0L,r12 ! fraction in r12:DBLRL ; u1.63
+ negc DBLRL,DBLRL ! T := DBLRL != 0
+ add #31,r3
+ mov r12,DBLRL
+ rotcl DBLRL ! put in sticky bit
+ movt r12
+ cmp/ge r2,r3
+ bt/s LOCAL(return_0_late)
+0: div0s DBL1H,DBLRH ! calculate sign
+ mov r12,DBLRH
+ shld r3,DBLRH
+ mov DBLRL,r2
+ shld r3,DBLRL
+ add #32,r3
+ add DBLRH,DBLRH
+ mov.l LOCAL(x80000000),DBL1H
+ shld r3,r12
+ rotcr DBLRH ! combine sign with highpart
+ add #-1,r3
+ shld r3,r2
+ mov #0,r3
+ rotl r2
+ cmp/hi DBL1H,r2
+ addc r12,DBLRL
+ mov.l @r15+,r12
+ rts
+ addc r3,DBLRH
+
+LOCAL(ret_inf_late):
+ mov.l @r15+,r11
+ mov.l @r15+,r10
+ mov DBLRH,DBL0H
+ mov.l @r15+,r9
+ bra LOCAL(return_inf)
+ mov.l @r15+,r8
+
+LOCAL(return_0_late):
+ div0s DBLRH,DBL1H
+ mov.l @r15+,r12
+ mov #0,DBLRH
+ rts
+ rotcr DBLRH
+
+ .balign 4
+LOCAL(clz):
+ mov.l r8,@-r15
+ extu.w r0,r8
+ mov.l r9,@-r15
+ cmp/eq r0,r8
+ bt/s 0f
+ mov #21,r9
+ xtrct r0,r8
+ add #-16,r9
+0: tst r12,r8 ! 0xff00
+ mov.l LOCAL(c_clz_tab),r0
+ bt 0f
+ shlr8 r8
+0: bt 0f
+ add #-8,r9
+0:
+#ifdef __PIC__
+ add r0,r8
+ mova LOCAL(c_clz_tab),r0
+#endif
+ mov.b @(r0,r8),r8
+ mov r9,r0
+ mov.l @r15+,r9
+ !
+ !
+ !
+ sub r8,r0
+ mov.l @r15+,r8
+ rts
+ lds.l @r15+,pr
+
+! We encode even some words as pc-relative that would fit as immediate
+! in the instruction in order to avoid some pipeline stalls on
+! SH4-100 / SH4-200.
+LOCAL(d1): .word 1
+LOCAL(d12): .word 12
+LOCAL(d13): .word 13
+LOCAL(d120): .word 120
+
+ .balign 4
+LOCAL(x7ff00000): .long 0x7ff00000
+LOCAL(xfffe2006): .long 0xfffe2006
+LOCAL(x001fffff): .long 0x001fffff
+LOCAL(x40000000): .long 0x40000000
+LOCAL(x7fe00000): .long 0x7fe00000
+LOCAL(x00100000): .long 0x00100000
+LOCAL(x00200000): .long 0x00200000
+LOCAL(xfff00000): .long 0xfff00000
+LOCAL(x80000000): .long 0x80000000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+LOCAL(ytab):
+ .byte 120, 105, 91, 78, 66, 54, 43, 33
+ .byte 24, 15, 8, 0, -5, -12, -17, -22
+ .byte -27, -31, -34, -37, -40, -42, -44, -45
+ .byte -46, -46, -47, -46, -46, -45, -44, -42
+ .byte -41, -39, -36, -34, -31, -28, -24, -20
+ .byte -17, -12, -8, -4, 0, 5, 10, 16
+ .byte 21, 27, 33, 39, 45, 52, 58, 65
+ .byte 72, 79, 86, 93, 101, 109, 116, 124
+ENDFUNC(GLOBAL(divdf3))
Index: gcc/config/sh/IEEE-754/m3/floatunssisf.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/floatunssisf.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/floatunssisf.S (revision 0)
@@ -0,0 +1,94 @@
+/* Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+! floatunsisf for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+
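A high-level sketch of the conversion the routine below performs may help when reading the assembly. This is an illustrative Python model, not the generated code: it mirrors the clz lookup, the normalize-to-bit-31 step, the 8 guard bits rounded to nearest-even, and the exponent/fraction assembly (the bias constant 0x4a800000 in the code folds the same quantities differently).

```python
import struct

def floatunsisf(u):
    """Sketch of uint32 -> IEEE single: count leading zeros, normalize so
    bit 31 holds the leading 1, round the 8 low guard bits to nearest-even,
    then assemble the word.  A rounding carry out of the fraction bumps the
    exponent automatically because `keep` still carries the implicit 1."""
    assert 0 <= u < 1 << 32
    if u == 0:
        return 0.0
    n = 32 - u.bit_length()          # leading-zero count (the clz_tab lookup)
    x = u << n                       # normalized: bit 31 set
    keep, rem = x >> 8, x & 0xFF     # 24-bit fraction + 8 guard bits
    if rem > 0x80 or (rem == 0x80 and keep & 1):
        keep += 1                    # round to nearest, ties to even
    exp = 127 + 31 - n               # biased single-precision exponent
    bits = ((exp - 1) << 23) + keep  # subtract the implicit 1 via exp - 1
    return struct.unpack(">f", struct.pack(">I", bits))[0]
```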
+FUNC(GLOBAL(floatunsisf))
+ .global GLOBAL(floatunsisf)
+ .balign 4
+GLOBAL(floatunsisf):
+ mov.l LOCAL(c_clz_tab),r0
+ extu.w r4,r1
+ mov.w LOCAL(xff00),r3
+ cmp/eq r4,r1
+ mov #24,r2
+ bt 0f
+ mov r4,r1
+ shlr16 r1
+ add #-16,r2
+0: tst r3,r1 ! 0xff00
+ bt 0f
+ shlr8 r1
+0: bt 0f
+ add #-8,r2
+0:
+#ifdef __PIC__
+ add r0,r1
+ mova LOCAL(c_clz_tab),r0
+#endif
+ mov.b @(r0,r1),r1
+ mov r4,r0
+ mov.l LOCAL(x4a800000),r3 ! bias + 23 - implicit 1
+ tst r4,r4
+ bt LOCAL(ret0)
+ !
+ sub r1,r2
+ mov.l LOCAL(x80000000),r1
+ shld r2,r0
+ cmp/pz r2
+ add r3,r0
+ bt LOCAL(noround)
+ add #31,r2
+ shld r2,r4
+ rotl r4
+ add #-31,r2
+ cmp/hi r1,r4
+ mov #0,r3
+ addc r3,r0
+LOCAL(noround):
+ mov #23,r1
+ shld r1,r2
+ rts
+ sub r2,r0
+LOCAL(ret0):
+ rts
+ nop
+
+LOCAL(xff00): .word 0xff00
+ .balign 4
+LOCAL(x4a800000): .long 0x4a800000
+LOCAL(x80000000): .long 0x80000000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(floatunsisf))
Index: gcc/config/sh/IEEE-754/m3/floatunssidf.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/floatunssidf.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/floatunssidf.S (revision 0)
@@ -0,0 +1,96 @@
+/* Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+! floatunssidf for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+
+FUNC(GLOBAL(floatunsidf))
+ .global GLOBAL(floatunsidf)
+ .balign 4
+GLOBAL(floatunsidf):
+ mov.l LOCAL(c_clz_tab),r0
+ extu.w r4,r1
+ mov.w LOCAL(0xff00),r3
+ cmp/eq r4,r1
+ mov #21,r2
+ bt 0f
+ mov r4,r1
+ shlr16 r1
+ add #-16,r2
+0: tst r3,r1 ! 0xff00
+ bt 0f
+ shlr8 r1
+0: bt 0f
+ add #-8,r2
+0:
+#ifdef __PIC__
+ add r0,r1
+ mova LOCAL(c_clz_tab),r0
+#endif
+ mov.b @(r0,r1),r5
+ mov r4,DBLRL
+ mov.l LOCAL(x41200000),r3 ! bias + 20 - implicit 1
+ tst r4,r4
+ mov r4,DBLRH
+ bt LOCAL(ret0)
+ sub r5,r2
+ mov r2,r5
+ shld r2,DBLRH
+ cmp/pz r2
+ add r3,DBLRH
+ add #32,r2
+ shld r2,DBLRL
+ bf 0f
+ mov.w LOCAL(d0),DBLRL
+0: mov #20,r2
+ shld r2,r5
+ rts
+ sub r5,DBLRH
+LOCAL(ret0):
+ mov r4,DBLRL
+ rts
+ mov r4,DBLRH
+
+LOCAL(0xff00): .word 0xff00
+ .balign 4
+LOCAL(x41200000):
+#ifdef __LITTLE_ENDIAN__
+LOCAL(d0): .word 0
+ .word 0x4120
+#else
+ .word 0x4120
+LOCAL(d0): .word 0
+#endif
+LOCAL(c_clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(floatunsidf))
Index: gcc/config/sh/IEEE-754/m3/fixunsdfsi.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/fixunsdfsi.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/fixunsdfsi.S (revision 0)
@@ -0,0 +1,82 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+!! fixunsdfsi for Renesas SH / STMicroelectronics ST40 CPUs
+!! Contributed by J"orn Rennecke joern.rennecke@st.com
+
+#ifdef L_fixunsdfsi
+	! What is a bit unusual about this implementation is that the
+ ! sign bit influences the result for NANs: for cleared sign bit, you
+ ! get INT_MAX, for set sign bit, you get INT_MIN.
+ ! However, since the result for NANs is undefined, this should be no
+ ! problem.
+ ! N.B. This is scheduled both for SH4-200 and SH4-300
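	! The bit-level conversion below can be modelled in a few lines of
	! Python.  This is a hypothetical sketch for in-range non-negative
	! inputs only; the real routine additionally maps negatives to 0 and
	! out-of-range values to UINT_MAX, as the comment above describes.

```python
import struct

def fixunsdfsi(d):
    """Model of the truncating double -> uint32 path: extract the biased
    exponent, reinsert the implicit 1 above the 52 fraction bits, and shift
    the fraction into integer position (truncating toward zero)."""
    bits = struct.unpack(">Q", struct.pack(">d", d))[0]
    exp = (bits >> 52) & 0x7FF
    frac = (bits & ((1 << 52) - 1)) | (1 << 52)   # implicit leading 1
    return frac >> (1023 + 52 - exp)              # scale by 2^(exp - bias - 52)
```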
+ .balign 4
+ .global GLOBAL(fixunsdfsi)
+ FUNC(GLOBAL(fixunsdfsi))
+ .balign 4
+GLOBAL(fixunsdfsi):
+ mov.w LOCAL(x413),r1 ! bias + 20
+ mov DBL0H,r0
+ shll DBL0H
+ mov.l LOCAL(mask),r3
+ mov #-21,r2
+ shld r2,DBL0H ! SH4-200 will start this insn in a new cycle
+ bt/s LOCAL(ret0)
+ sub r1,DBL0H
+ cmp/pl DBL0H ! SH4-200 will start this insn in a new cycle
+ and r3,r0
+ bf/s LOCAL(ignore_low)
+	addc	r3,r0	! uses T == 1; sets implicit 1
+ mov #11,r2
+ shld DBL0H,r0 ! SH4-200 will start this insn in a new cycle
+ cmp/gt r2,DBL0H
+ add #-32,DBL0H
+ bt LOCAL(retmax)
+ shld DBL0H,DBL0L
+ rts
+ or DBL0L,r0
+
+ .balign 8
+LOCAL(ignore_low):
+ mov #-21,r2
+ cmp/gt DBL0H,r2 ! SH4-200 will start this insn in a new cycle
+ add #1,r0
+ bf 0f
+LOCAL(ret0): mov #0,r0 ! results in 0 return
+0: rts
+ shld DBL0H,r0
+
+LOCAL(retmax):
+ rts
+ mov #-1,r0
+
+LOCAL(x413): .word 0x413
+
+ .balign 4
+LOCAL(mask): .long 0x000fffff
+ ENDFUNC(GLOBAL(fixunsdfsi))
+#endif /* L_fixunsdfsi */
Index: gcc/config/sh/IEEE-754/m3/divdf3-rt.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/divdf3-rt.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/divdf3-rt.S (revision 0)
@@ -0,0 +1,519 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+! divdf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke joern.rennecke@st.com
+
+/* This version is not quite finished, since I've found that I can
+ get better average performance with a slightly altered algorithm.
+ Still, if you want a version for hard real time, this version here might
+ be a good starting point, since it has effectively no conditional
+ branches in the path that deals with normal numbers
+ (branches with zero offset are effectively conditional execution),
+ and thus it has a uniform execution time in this path. */
+
+/* y = 1/x ; x (- [1,2)
+   y0 = 1.5 - x/2 - tab[(x-1)*64] = y + d ; abs(d)/y <= 0x1.0c/256
+
+ y1 = y0 - ((y0) * x - 1) * y0 = y-x*d^2
+ y2 = y1 - ((y1) * x - 1) * y1 =~= y-x^3*d^4
+
+ z0 = y2*a ; a1 = a - z0*x /# 32 * 64 -> 64 bit #/
+ z1 = y2*a1 (round to nearest odd 0.5 ulp);
+ a2 = a1 - z1*x /# 32 * 64 -> 64 bit #/
+
+ z = a/x = z0 + z1 - 0.5 ulp + (a2 > 0) * ulp
+
+ Unless stated otherwise, multiplies can be done in 32 * 32 bit or less
+ with suitable scaling and/or top truncation.
+ x truncated to 20 bits is sufficient to calculate y0 or even y1.
+ Table entries are adjusted by about +128 to use full signed byte range.
+ This adjustment has been perturbed slightly to allow cse with the
+ shift count constant -26.
+ The threshold point for the shift adjust before rounding is found by
+ comparing the fractions, which is exact, unlike the top bit of y2.
+ Therefore, the top bit of y2 becomes slightly random after the adjustment
+ shift, but that's OK because this can happen only at the boundaries of
+   the interval, and the biasing of the error means that it can in fact happen
+   only at the bottom end.  And there, the carry propagation will make sure
+   that in the end we will have in effect an implicit 1 (or two when rounding
+ up...) */
+/* If an exact result exists, it can have no more bits than the dividend.
+ Hence, we don't need to bother with the round-to-even tie breaker
+ unless the result is denormalized. */
+/* 70 cycles through main path for sh4-300.  Some cycles might be
+ saved by more careful register allocation.
+ 122 cycles for sh4-200. If execution time for sh4-200 is of concern,
+ a specially scheduled version makes sense. */
+
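The y0 -> y1 -> y2 refinement described above is an ordinary Newton-Raphson reciprocal iteration. The following Python sketch shows only the convergence behaviour in plain floats; it omits the table correction term and all of the fixed-point (u15.17 / u1.31) scaling and rounding that the assembly performs, so the error bounds differ from the comment's.

```python
def recip_refine(x):
    """Two Newton-Raphson refinements of a crude reciprocal seed,
    mirroring y1 = y0 - (y0*x - 1)*y0 and the analogous y2 step.
    Each step squares the relative error (e_{n+1} ~ x * e_n^2)."""
    assert 1.0 <= x < 2.0
    y = 1.5 - x / 2.0                 # seed; the real code also subtracts tab[...]
    for _ in range(2):
        y = y - (y * x - 1.0) * y     # y_{n+1} = y_n - (y_n*x - 1)*y_n
    return y
```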
+#define x_h r12
+#define yn r3
+
+FUNC(GLOBAL(divdf3))
+ .global GLOBAL(divdf3)
+
+/* Adjust arg0 now, too. We still have to come back to denorm_arg1_done,
+   since we haven't done any of the work yet that we do till the denorm_arg0
+ entry point. We know that neither of the arguments is inf/nan, but
+ arg0 might be zero. Check for that first to avoid having to establish an
+ rts return address. */
+LOCAL(both_denorm):
+ mov.l r9,@-r15
+ mov DBL0H,r1
+ mov.l r0,@-r15
+ shll2 r1
+ mov.w LOCAL(both_denorm_cleanup_off),r9
+ or DBL0L,r1
+ tst r1,r1
+ mov DBL0H,r0
+ bf/s LOCAL(zero_denorm_arg0_1)
+ shll2 r0
+ mov.l @(4,r15),r9
+ add #8,r15
+ bra LOCAL(ret_inf_nan_0)
+ mov r1,DBLRH
+
+LOCAL(both_denorm_cleanup):
+ mov.l @r15+,r0
+ !
+ mov.l @r15+,r9
+ mov #12,r3
+ mov.l LOCAL(xfffe2006),r1 ! yn := (-1. << 17) + (0x80 << 6) ; shift #-26
+ bra LOCAL(denorm_arg1_done)
+ !
+ add r0,DBL0H
+
+/* Denorm handling leaves the incoming denorm argument with an exponent of +1
+ (implicit 1). To leave the result exponent unaltered, the other
+   argument's exponent is adjusted by the shift count. */
+
+ .balign 4
+LOCAL(arg0_tiny):
+ bsr LOCAL(clz)
+ mov DBL0L,r0
+ shll DBL0H
+ add #1,r0
+ mov DBL0L,DBL0H
+ shld r0,DBL0H
+ rotcr DBL0H
+	tst	DBL0L,DBL0L /* Check for a zero dividend (0 / x). */
+ add #-33,r0
+ shld r0,DBL0L
+ bf/s LOCAL(adjust_arg1_exp)
+ add #64,r0
+LOCAL(return_0): /* Return 0 with appropriate sign. */
+ mov.l @r15+,r10
+ mov #0,DBLRH
+ mov.l @r15+,r9
+ bra LOCAL(ret_inf_nan_0)
+ mov.l @r15+,r8
+
+ .balign 4
+LOCAL(arg1_tiny):
+ bsr LOCAL(clz)
+ mov DBL1L,r0
+ shll DBL1H
+ add #1,r0
+ mov DBL1L,DBL1H
+ shld r0,DBL1H
+ rotcr DBL1H
+ tst DBL1L,DBL1L /* Check for divide by zero. */
+ add #-33,r0
+ shld r0,DBL1L
+ bf/s LOCAL(adjust_arg0_exp)
+ add #64,r0
+ mov DBL0H,r0
+ add r0,r0
+ tst r0,r0 ! 0 / 0 ?
+ mov #-1,DBLRH
+ bf LOCAL(return_inf)
+ !
+ bt LOCAL(ret_inf_nan_0)
+ !
+
+ .balign 4
+LOCAL(zero_denorm_arg1):
+ not DBL0H,r3
+ mov DBL1H,r0
+ tst r2,r3
+ shll2 r0
+ bt LOCAL(early_inf_nan_arg0)
+ tst r0,r0
+ mov.w LOCAL(xff00),r12
+ bt/s LOCAL(arg1_tiny)
+ sts.l pr,@-r15
+ bsr LOCAL(clz)
+ shlr2 r0
+ !
+ shll DBL1H
+ mov DBL1L,r3
+ shld r0,DBL1H
+ shld r0,DBL1L
+ rotcr DBL1H
+ add #-32,r0
+ shld r0,r3
+ add #32,r0
+ or r3,DBL1H
+LOCAL(adjust_arg0_exp):
+ tst r2,DBL0H
+ mov #20,r3
+ shld r3,r0
+ bt LOCAL(both_denorm)
+ add DBL0H,r0
+	div0s	r0,DBL0H ! Check for obvious overflow.
+ not r0,r3 ! Check for more subtle overflow - lest
+ bt LOCAL(return_inf)
+ mov r0,DBL0H
+ tst r2,r3 ! we mistake it for NaN later
+ mov #12,r3
+ bf LOCAL(denorm_arg1_done)
+LOCAL(return_inf): /* Return infinity with appropriate sign. */
+ mov #20,r3
+ mov #-2,DBLRH
+ bra LOCAL(ret_inf_nan_0)
+ shad r3,DBLRH
+
+/* inf/n -> inf; inf/0 -> inf; inf/inf -> nan; inf/nan->nan nan/x -> nan */
+LOCAL(inf_nan_arg0):
+ mov.l @r15+,r10
+ mov.l @r15+,r9
+ mov.l @r15+,r8
+LOCAL(early_inf_nan_arg0):
+ not DBL1H,r3
+ mov DBL0H,DBLRH
+ tst r2,r3 ! both inf/nan?
+ add DBLRH,DBLRH
+ bf LOCAL(ret_inf_nan_0)
+ mov #-1,DBLRH
+LOCAL(ret_inf_nan_0):
+ mov #0,DBLRL
+ mov.l @r15+,r12
+ div0s DBL0H,DBL1H
+ rts
+ rotcr DBLRH
+
+/* Already handled: inf/x, nan/x . Thus: x/inf -> 0; x/nan -> nan */
+ .balign 4
+LOCAL(inf_nan_arg1):
+ mov DBL1H,r2
+ mov #12,r1
+ shld r1,r2
+ mov.l @r15+,r10
+ mov #0,DBLRL
+ mov.l @r15+,r9
+ or DBL1L,r2
+ mov.l @r15+,r8
+ cmp/hi DBLRL,r2
+ mov.l @r15+,r12
+ subc DBLRH,DBLRH
+ div0s DBL0H,DBL1H
+ rts
+ rotcr DBLRH
+
+ .balign 4
+LOCAL(zero_denorm_arg0):
+ mov.w LOCAL(denorm_arg0_done_off),r9
+ not DBL1H,r1
+ mov DBL0H,r0
+ tst r2,r1
+ shll2 r0
+ bt LOCAL(inf_nan_arg1)
+LOCAL(zero_denorm_arg0_1):
+ tst r0,r0
+ mov.w LOCAL(xff00),r12
+ bt/s LOCAL(arg0_tiny)
+ sts.l pr,@-r15
+ bsr LOCAL(clz)
+ shlr2 r0
+ shll DBL0H
+ mov DBL0L,r12
+ shld r0,DBL0H
+ shld r0,DBL0L
+ rotcr DBL0H
+ add #-32,r0
+ shld r0,r12
+ add #32,r0
+ or r12,DBL0H
+LOCAL(adjust_arg1_exp):
+ mov #20,r12
+ shld r12,r0
+ add DBL1H,r0
+	div0s	r0,DBL1H ! Check for obvious underflow.
+ not r0,r12 ! Check for more subtle underflow - lest
+ bt LOCAL(return_0)
+ mov r0,DBL1H
+ tst r2,r12 ! we mistake it for NaN later
+ bt LOCAL(return_0)
+ !
+ braf r9
+ mov #13,r0
+LOCAL(zero_denorm_arg1_dispatch):
+
+LOCAL(xff00): .word 0xff00
+LOCAL(denorm_arg0_done_off):
+ .word LOCAL(denorm_arg0_done)-LOCAL(zero_denorm_arg1_dispatch)
+LOCAL(both_denorm_cleanup_off):
+ .word LOCAL(both_denorm_cleanup)-LOCAL(zero_denorm_arg1_dispatch)
+
+ .balign 8
+GLOBAL(divdf3):
+ mov.l LOCAL(x7ff00000),r2
+ mov #12,r3
+ mov.l LOCAL(xfffe2006),r1 ! yn := (-1. << 17) + (0x80 << 6) ; shift #-26
+ tst r2,DBL1H
+ mov.l r12,@-r15
+ bt LOCAL(zero_denorm_arg1)
+
+LOCAL(denorm_arg1_done):
+ mov DBL1H,x_h ! x_h live in r12
+ shld r3,x_h ! x - 1 ; u0.20
+ mov x_h,yn
+ mova LOCAL(ytab),r0
+ mov.l r8,@-r15
+ shld r1,yn ! x-1 ; u26.6
+ mov.b @(r0,yn),yn
+ mov #6,r0
+ mov.l r9,@-r15
+ mov x_h,r8
+ mov.l r10,@-r15
+ shlr16 x_h ! x - 1; u16.16 ! x/2 - 0.5 ; u15.17
+ add x_h,r1 ! SH4-200 single-issues this insn
+ shld r0,yn
+ sub r1,yn ! yn := y0 ; u15.17
+ mov DBL1L,r1
+ mov #-20,r10
+ mul.l yn,x_h ! r12 dead
+ swap.w yn,r9
+ shld r10,r1
+ sts macl,r0 ! y0 * (x-1) - n ; u-1.32
+ add r9,r0 ! y0 * x - 1 ; s-1.32
+ tst r2,DBL0H
+ dmuls.l r0,yn
+ mov.w LOCAL(d13),r0
+ or r1,r8 ! x - 1; u0.32
+ add yn,yn ! yn = y0 ; u14.18
+ bt LOCAL(zero_denorm_arg0)
+
+LOCAL(denorm_arg0_done): ! This label must stay aligned.
+ sts mach,r1 ! d0 ; s14.18
+ sub r1,yn ! yn = y1 ; u14.18 ; <= 0x3fffc
+ mov DBL0L,r12
+ shld r0,yn ! yn = y1 ; u1.31 ; <= 0x7fff8000
+ mov.w LOCAL(d12),r9
+ dmulu.l yn,r8
+ shld r10,r12
+ mov yn,r0
+ mov DBL0H,r8
+ add yn,yn ! yn = y1 ; u0.32 ; <= 0xffff0000
+ sts mach,r1 ! y1 * (x-1); u1.31
+ add r0,r1 ! y1 * x ; u1.31
+ dmulu.l yn,r1
+ not DBL0H,r10
+ shld r9,r8
+ tst r2,r10
+ or r8,r12 ! a - 1; u0.32
+ bt LOCAL(inf_nan_arg0)
+ sts mach,r1 ! d1+yn; u1.31
+ sett ! adjust y2 so that it can be interpreted as s1.31
+ not DBL1H,r10
+ subc r1,yn ! yn := y2 ; u1.31 ; can be 0x7fffffff
+ mov.l LOCAL(x001fffff),r9
+ dmulu.l yn,r12
+ tst r2,r10
+ or DBL1H,r2
+ bt LOCAL(inf_nan_arg1)
+ mov.l r11,@-r15
+ sts mach,r11 ! y2*(a-1) ; u1.31
+ add yn,r11 ! z0 ; u1.31
+ dmulu.l r11,DBL1L
+ mov.l LOCAL(x40000000),DBLRH ! bias + 1
+ and r9,r2 ! x ; u12.20
+ cmp/hi DBL0L,DBL1L
+ sts macl,r8
+ mov #-24,r12
+ sts mach,r9 ! r9:r8 := z0 * DBL1L; u-19.64
+ subc DBL1H,DBLRH
+ mul.l r11,r2 ! (r9+macl):r8 == z0*x; u-19.64
+ shll r8
+ add DBL0H,DBLRH ! result sign/exponent + 1
+ mov r8,r10
+ sts macl,DBLRL
+ add DBLRL,r9
+ rotcl r9 ! r9:r8 := z*x; u-20.63
+ shld r12,r10
+ mov.l LOCAL(x7fe00000),DBLRL
+ sub DBL0L,r9 ! r9:r8 := -a ; u-20.63
+ mov.l LOCAL(x00200000),r12
+! FIXME: the following shift might lose the sign.
+ shll8 r9
+ or r10,r9 ! -a1 ; s-28.32
+ mov.l LOCAL(x00100000),r10
+ dmuls.l r9,yn ! r3 dead
+ mov DBL1H,r3
+ mov.l LOCAL(xfff00000),DBL0L
+ xor DBL0H,r3 ! calculate expected sign & bit20
+ div0s r3,DBLRH
+ xor DBLRH,r3
+ bt LOCAL(ret_denorm_inf)
+ tst DBLRL,DBLRH
+ bt LOCAL(ret_denorm)
+ sub r12,DBLRH ! calculate sign / exponent minus implicit 1
+ tst r10,r3 ! set T if a >= x
+ sts mach,r12! -z1 ; s-27.32
+ bt 0f
+ add r11,r11 ! z0 ; u1.31 / u0.31
+0: mov #6,r3
+ negc r3,r10 ! shift count := a >= x ? -7 : -6; T := 1
+ shll8 r8 ! r9:r8 := -a1 ; s-28.64
+ shad r10,r12 ! -z1 ; truncate to s-20.32 / s-21.32
+ rotcl r12 ! -z1 ; s-21.32 / s-22.32 / round to odd 0.5 ulp ; T := sign
+ add #20,r10
+ dmulu.l r12,DBL1L ! r12 signed, DBL1L unsigned
+ and DBL0L,DBLRH ! isolate sign / exponent
+ shld r10,r9
+ mov r8,r3
+ shld r10,r8
+ sts macl,DBL0L
+ sts mach,DBLRL
+ add #-32,r10
+ shld r10,r3
+ mul.l r12,r2
+ bf 0f ! adjustment for signed/unsigned multiply
+ sub DBL1L,DBLRL ! DBL1L dead
+0: shar r12 ! -z1 ; truncate to s-20.32 / s-21.32
+ sts macl,DBL1L
+ or r3,r9 ! r9:r8 := -a1 ; s-41.64/s-42.64
+ !
+ cmp/hi r8,DBL0L
+ add DBLRL,DBL1L ! DBL1L:DBL0L := -z1*x ; s-41.64/s-42.64
+ subc DBL1L,r9
+ not r12,DBLRL ! z1, truncated to s-20.32 / s-21.32
+ shll r9 ! T := a2 > 0
+ mov r11,r2
+ mov #21,r7
+ shld r7,r11
+ addc r11,DBLRL
+ mov.l @r15+,r11
+ mov.l @r15+,r10
+ mov #-11,r7
+ mov.l @r15+,r9
+ shld r7,r2
+ mov.l @r15+,r8
+ addc r2,DBLRH
+ rts
+ mov.l @r15+,r12
+
+LOCAL(ret_denorm):
+ tst r10,DBLRH
+ bra LOCAL(denorm_have_count)
+ movt DBLRH ! calculate shift count (off by 2)
+
+LOCAL(ret_denorm_inf):
+ mov DBLRH,r12
+ add r12,r12
+ cmp/pz r12
+ mov #-21,DBLRL
+ bt LOCAL(ret_inf_late)
+ shld DBLRL,DBLRH
+LOCAL(denorm_have_count):
+ add #-2,DBLRH
+/* FIXME */
+ bra LOCAL(return_0)
+ mov.l @r15+,r11
+
+LOCAL(ret_inf_late):
+ mov.l @r15+,r11
+ !
+ mov.l @r15+,r10
+ !
+ mov.l @r15+,r9
+ bra LOCAL(return_inf)
+ mov.l @r15+,r8
+
+ .balign 4
+LOCAL(clz):
+ mov.l r8,@-r15
+ extu.w r0,r8
+ mov.l r9,@-r15
+ cmp/eq r0,r8
+ bt/s 0f
+ mov #8-11,r9
+ xtrct r0,r8
+ add #16,r9
+0: tst r12,r8 ! 0xff00
+ mov.l LOCAL(c_clz_tab),r0
+ bt 0f
+ shlr8 r8
+0: bt 0f
+ add #8,r9
+0:
+#ifdef __PIC__
+ add r0,r8
+ mova LOCAL(c_clz_tab),r0
+#endif
+ mov.b @(r0,r8),r8
+ mov r9,r0
+ mov.l @r15+,r9
+ !
+ !
+ !
+ sub r8,r0
+ mov.l @r15+,r8
+ rts
+ lds.l @r15+,pr
+
+! We encode as pc-relative even some words that would fit as an immediate
+! in the instruction, in order to avoid pipeline stalls on
+! SH4-100 / SH4-200.
+LOCAL(d1): .word 1
+LOCAL(d12): .word 12
+LOCAL(d13): .word 13
+
+ .balign 4
+LOCAL(x7ff00000): .long 0x7ff00000
+LOCAL(xfffe2006): .long 0xfffe2006
+LOCAL(x001fffff): .long 0x001fffff
+LOCAL(x40000000): .long 0x40000000
+LOCAL(x7fe00000): .long 0x7fe00000
+LOCAL(x00100000): .long 0x00100000
+LOCAL(x00200000): .long 0x00200000
+LOCAL(xfff00000): .long 0xfff00000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+LOCAL(ytab):
+ .byte 120, 105, 91, 78, 66, 54, 43, 33
+ .byte 24, 15, 8, 0, -5, -12, -17, -22
+ .byte -27, -31, -34, -37, -40, -42, -44, -45
+ .byte -46, -46, -47, -46, -46, -45, -44, -42
+ .byte -41, -39, -36, -34, -31, -28, -24, -20
+ .byte -17, -12, -8, -4, 0, 5, 10, 16
+ .byte 21, 27, 33, 39, 45, 52, 58, 65
+ .byte 72, 79, 86, 93, 101, 109, 116, 124
+ENDFUNC(GLOBAL(divdf3))
Index: gcc/config/sh/IEEE-754/m3/addsf3.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/addsf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/addsf3.S (revision 0)
@@ -0,0 +1,290 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+! addsf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+
+#ifdef L_add_sub_sf3
+ .balign 4
+ .global GLOBAL(subsf3)
+ FUNC(GLOBAL(subsf3))
+ .global GLOBAL(addsf3)
+ FUNC(GLOBAL(addsf3))
+GLOBAL(subsf3):
+ cmp/pz r5
+ add r5,r5
+ rotcr r5
+ .balign 4
+GLOBAL(addsf3):
+ mov.l LOCAL(x7f800000),r3
+ mov r4,r6
+ add r6,r6
+ mov r5,r7
+ add r7,r7
+ mov r4,r0
+ or r3,r0
+ cmp/hi r6,r7
+ mov r5,r1
+ bf/s LOCAL(r4_hs)
+ or r3,r1
+ cmp/eq r5,r1
+ bt LOCAL(ret_r5) /* sole Inf or NaN, return unchanged. */
+ shll8 r0 ! r4 fraction
+ shll8 r1 ! r5 fraction
+ mov r6,r3
+ mov #-24,r2
+ mov r7,r6
+ shld r2,r6 ! r5 exp
+ mov r0,r7
+ shld r2,r3 ! r4 exp
+ tst r6,r6
+ sub r6,r3 ! exp difference (negative or 0)
+ bt LOCAL(denorm_r4)
+LOCAL(denorm_r4_done): ! r1: u1.31
+ shld r3,r0 ! Get 31 upper bits, including 8 guard bits
+ mov.l LOCAL(xff000000),r2
+ add #31,r3
+ mov.l r5,@-r15 ! push result sign.
+ cmp/pl r3 ! r0 has no more than one bit set -> return arg 1
+ shld r3,r7 ! copy of lowest guard bit in r0 and lower guard bits
+ bf LOCAL(ret_stack)
+ div0s r4,r5
+ bf/s LOCAL(add)
+	cmp/pl	r7 /* Is LSB in r0 clear, but any lower guard bit set? */
+ subc r0,r1
+ mov.l LOCAL(c__clz_tab),r7
+ tst r2,r1
+ mov #-24,r3
+ bf/s LOCAL(norm_r0)
+ mov r1,r0
+ extu.w r1,r1
+ bra LOCAL(norm_check2)
+ cmp/eq r0,r1
+LOCAL(ret_r5):
+ rts
+ mov r5,r0
+LOCAL(ret_stack):
+ rts
+ mov.l @r15+,r0
+
+/* We leave the numbers denormalized, but we change the bit position to be
+ consistent with normalized numbers. This also removes the spurious
+ leading one that was inserted before. */
+LOCAL(denorm_r4):
+ tst r3,r3
+ bf/s LOCAL(denorm_r4_done)
+ add r0,r0
+ bra LOCAL(denorm_r4_done)
+ add r1,r1
+LOCAL(denorm_r5):
+ tst r6,r6
+ add r1,r1
+ bf LOCAL(denorm_r5_done)
+ clrt
+ bra LOCAL(denorm_r5_done)
+ add r0,r0
+
+/* If the exponent differs by two or more, normalization is minimal, and
+ few guard bits are needed for an exact final result, so sticky guard
+   bit compression before subtraction (or add) works fine.
+ If the exponent differs by one, only one extra guard bit is generated,
+ and effectively no guard bit compression takes place. */
+
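The rounding that this guard/sticky handling implements can be sketched compactly. This is an illustrative model, not the register-level code: it shows why compressing all bits below the lowest kept guard bit into a single sticky bit is lossless for round-to-nearest-even, since only the `above half` / `exactly half` distinction matters.

```python
def round_guard_sticky(frac, nguard):
    """Drop `nguard` low (guard) bits from `frac`, rounding to nearest
    with ties broken to even -- the IEEE rounding the code above builds
    out of guard and sticky bits."""
    keep = frac >> nguard
    half = 1 << (nguard - 1)
    rem = frac & ((1 << nguard) - 1)       # discarded guard bits
    if rem > half or (rem == half and keep & 1):
        keep += 1                          # round up, or tie to even
    return keep
```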
+ .balign 4
+LOCAL(r4_hs):
+ cmp/eq r4,r0
+ mov #-24,r3
+ bt LOCAL(inf_nan_arg0)
+ shld r3,r7
+ shll8 r0
+ tst r7,r7
+ shll8 r1
+ mov.l LOCAL(xff000000),r2
+ bt/s LOCAL(denorm_r5)
+ shld r3,r6
+LOCAL(denorm_r5_done):
+ mov r1,r3
+ subc r6,r7
+ bf LOCAL(same_exp)
+ shld r7,r1 /* Get 31 upper bits. */
+ add #31,r7
+ mov.l r4,@-r15 ! push result sign.
+ cmp/pl r7
+ shld r7,r3
+ bf LOCAL(ret_stack)
+ div0s r4,r5
+ bf/s LOCAL(add)
+ cmp/pl r3 /* Is LSB in r1 clear, but any lower guard bit set? */
+ subc r1,r0
+ mov.l LOCAL(c__clz_tab),r7
+LOCAL(norm_check):
+ tst r2,r0
+ mov #-24,r3
+ bf LOCAL(norm_r0)
+ extu.w r0,r1
+ cmp/eq r0,r1
+LOCAL(norm_check2):
+ mov #-8,r3
+ bt LOCAL(norm_r0)
+ mov #-16,r3
+LOCAL(norm_r0):
+ mov r0,r1
+ shld r3,r0
+#ifdef __pic__
+ add r0,r7
+ mova LOCAL(c__clz_tab),r0
+#endif
+ mov.b @(r0,r7),r7
+ add #25,r3
+ add #-9+1,r6
+ mov r1,r0
+ sub r7,r3
+ mov.l LOCAL(xbfffffff),r7
+ sub r3,r6 /* generate exp-1 */
+ mov.w LOCAL(d24),r2
+ cmp/pz r6 /* check exp > 0 */
+ shld r3,r0 /* Leading 1 becomes +1 exp adjustment. */
+ bf LOCAL(zero_denorm)
+LOCAL(denorm_done):
+ add #30,r3
+ shld r3,r1
+ mov.w LOCAL(m1),r3
+ tst r7,r1 ! clear T if rounding up
+ shld r2,r6
+ subc r3,r0 ! round - overflow will boost exp adjustment to 2.
+ mov.l @r15+,r2
+ add r6,r0 ! overflow will generate inf
+ cmp/ge r2,r3 ! get sign into T
+ rts
+ rotcr r0
+LOCAL(ret_r4):
+ rts
+ mov r4,r0
+
+/* At worst, we are shifting the number back in place where an incoming
+ denormal was. Thus, the shifts won't get out of range. They still
+ might generate a zero fraction, but that's OK, that makes it 0. */
+LOCAL(zero_denorm):
+ add r6,r3
+ mov r1,r0
+ mov #0,r6 /* leading one will become free (except for rounding) */
+ bra LOCAL(denorm_done)
+ shld r3,r0
+
+/* Handle abs(r4) >= abs(r5), same exponents specially so we don't need
+ check for a zero fraction in the main path. */
+LOCAL(same_exp):
+ div0s r4,r5
+ mov.l r4,@-r15
+ bf LOCAL(add)
+ cmp/eq r1,r0
+ mov.l LOCAL(c__clz_tab),r7
+ bf/s LOCAL(norm_check)
+ sub r1,r0
+ rts ! zero difference -> return +zero
+ mov.l @r15+,r1
+
+/* r2: 0xff000000 */
+LOCAL(add):
+ addc r1,r0
+ mov.w LOCAL(x2ff),r7
+ shll8 r6
+ bf/s LOCAL(no_carry)
+ shll16 r6
+ tst r7,r0
+ shlr8 r0
+ mov.l @r15+,r3 ! discard saved sign
+ subc r2,r0
+ sett
+ addc r6,r0
+ cmp/hs r2,r0
+ bt/s LOCAL(inf)
+ div0s r7,r4 /* Copy sign. */
+ rts
+ rotcr r0
+LOCAL(inf):
+ mov r6,r0
+ rts
+ rotcr r0
+LOCAL(no_carry):
+ mov.w LOCAL(m1),r3
+ tst r6,r6
+ bt LOCAL(denorm_add)
+ add r0,r0
+ tst r7,r0 ! check if lower guard bit set or round to even
+ shlr8 r0
+ mov.l @r15+,r1 ! discard saved sign
+ subc r3,r0 ! round ; overflow -> exp++
+ cmp/ge r4,r3 /* Copy sign. */
+ add r6,r0 ! overflow -> inf
+ rts
+ rotcr r0
+
+LOCAL(denorm_add):
+ cmp/ge r4,r3 /* Copy sign. */
+ shlr8 r0
+ mov.l @r15+,r1 ! discard saved sign
+ rts
+ rotcr r0
+
+LOCAL(inf_nan_arg0):
+ cmp/eq r5,r1
+ bf LOCAL(ret_r4)
+ div0s r4,r5 /* Both are inf or NaN, check signs. */
+ bt LOCAL(ret_nan) /* inf - inf, or NaN. */
+ mov r4,r0 ! same sign; return NaN if either is NaN.
+ rts
+ or r5,r0
+LOCAL(ret_nan):
+ rts
+ mov #-1,r0
+
+LOCAL(d24):
+ .word 24
+LOCAL(x2ff):
+ .word 0x2ff
+LOCAL(m1):
+ .word -1
+ .balign 4
+LOCAL(x7f800000):
+ .long 0x7f800000
+LOCAL(xbfffffff):
+ .long 0xbfffffff
+LOCAL(xff000000):
+ .long 0xff000000
+LOCAL(xfe000000):
+ .long 0xfe000000
+LOCAL(c__clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+
+ ENDFUNC(GLOBAL(addsf3))
+ ENDFUNC(GLOBAL(subsf3))
+#endif /* L_add_sub_sf3 */
Index: gcc/config/sh/IEEE-754/m3/adddf3.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/adddf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/adddf3.S (revision 0)
@@ -0,0 +1,587 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+! adddf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4-200 without FPU, but can also be used for SH3.
+! Numbers with the same sign are typically added in 37 cycles; the worst
+! case is 43 cycles, unless there is an overflow, in which case the
+! addition can take up to 47 cycles.
+! Normal numbers with different sign are added in 56 (57 for PIC) cycles
+! or less on SH4.
+! If one of the inputs is a denormal, the worst case is 59 (60 for PIC)
+! cycles. (Two denormal inputs are faster than normal inputs, and
+! denormal outputs don't slow down computation).
+! Subtraction takes two cycles to negate the second input and then drops
+! through to addition.
+
+/* If the input exponents of a difference of two normalized numbers
+ differ by more than one, the output does not need to be adjusted
+ by more than one bit position. Hence, it makes sense to ensure that
+ the shifts by 0 & 1 are handled quickly to reduce average and worst
+ case times. */
+FUNC(GLOBAL(adddf3))
+FUNC(GLOBAL(subdf3))
+ .global GLOBAL(adddf3)
+ .global GLOBAL(subdf3)
+LOCAL(denorm_arg1):
+ bt LOCAL(inf_nan_arg0)
+ tst r0,r2
+ bt/s LOCAL(denorm_both)
+ shlr r1
+ mov.l LOCAL(x00100000),r3
+ bra LOCAL(denorm_arg1_done)
+ sub r2,r3
+
+! Handle denorm addition here because otherwise the ordinary addition would
+! have to check for denormal results.
+! Denormal subtraction could also be done faster, but the denorm subtraction
+! path here is still one cycle faster than the one for normalized input
+! numbers, and 16 instructions shorter than the fastest version.
+! Here we also generate +0.0 + +0.0 -> +0.0 ; -0.0 + -0.0 -> -0.0
+LOCAL(denorm_both):
+ div0s DBL0H,DBL1H
+ mov.l LOCAL(x800fffff),r9
+ bt/s LOCAL(denorm_sub)
+ and r1,DBL1H
+ and r9,DBL0H
+ mov.l @r15+,r9
+ mov DBL0L,DBLRL
+ mov DBL0H,DBLRH
+ addc DBL1L,DBLRL
+ mov.l @r15+,r8
+ rts
+ addc DBL1H,DBLRH
+
+! N.B., since subtraction also generates +0.0 for subtraction of numbers
+! with identical fractions, this also covers the +0.0 + -0.0 -> +0.0 /
+! -0.0 + +0.0 -> +0.0 cases.
+LOCAL(denorm_sub):
+ mov DBL0H,r8 ! tentative result sign
+ and r1,DBL0H
+ bra LOCAL(sub_same_exp)
+ addc r1,r2 ! exponent++, clear T
+
+LOCAL(inf_nan_arg0):
+ mov DBL0L,DBLRL
+ bra LOCAL(pop_r8_r9)
+ mov DBL0H,DBLRH
+
+LOCAL(ret_arg0):
+ mov.l LOCAL(x800fffff),DBLRH
+ mov DBL0L,DBLRL
+ mov r2,r3
+LOCAL(ret_arg):
+ mov.l @r15+,r9
+ and r8,DBLRH
+ mov.l @r15+,r8
+ rts
+ or r3,DBLRH
+
+ .balign 4
+GLOBAL(subdf3):
+ cmp/pz DBL1H
+ add DBL1H,DBL1H
+ rotcr DBL1H
+ nop
+
+GLOBAL(adddf3):
+ mov.l LOCAL(x7ff00000),r0
+ mov DBL0H,r2
+ mov.l LOCAL(x001fffff),r1
+ mov DBL1H,r3
+ mov.l r8,@-r15
+ and r0,r2
+ mov.l r9,@-r15
+ and r0,r3
+ cmp/hi r2,r3
+ or r0,DBL0H
+ or r0,DBL1H
+ bt LOCAL(arg1_gt)
+ tst r0,r3
+ mov #-20,r9
+ bt/s LOCAL(denorm_arg1)
+ cmp/hs r0,r2
+ bt LOCAL(inf_nan_arg0)
+ sub r2,r3
+LOCAL(denorm_arg1_done): ! r2 is tentative result exponent
+ shad r9,r3
+ mov.w LOCAL(m32),r9
+ mov DBL0H,r8 ! tentative result sign
+ and r1,DBL0H ! arg0 fraction
+ mov DBL1H,r0 ! the 'other' sign
+ and r1,DBL1H ! arg1 fraction
+ cmp/ge r9,r3
+ mov DBL1H,r1
+ bf/s LOCAL(large_shift_arg1)
+ shld r3,DBL1H
+LOCAL(small_shift_arg1):
+ mov DBL1L,r9
+ shld r3,DBL1L
+ tst r3,r3
+ add #32,r3
+ bt/s LOCAL(same_exp)
+ div0s r8,r0 ! compare signs
+ shld r3,r1
+
+ or r1,DBL1L
+ bf/s LOCAL(add)
+ shld r3,r9
+ clrt
+ negc r9,r9
+ mov.l LOCAL(x001f0000),r3
+LOCAL(sub_high):
+ mov DBL0L,DBLRL
+ subc DBL1L,DBLRL
+ mov DBL0H,DBLRH
+ bra LOCAL(subtract_done)
+ subc DBL1H,DBLRH
+
+LOCAL(large_shift_arg1):
+ mov.w LOCAL(d0),r9
+ add #64,r3
+ cmp/pl r3
+ shld r3,r1
+ bf LOCAL(ret_arg0)
+ cmp/hi r9,DBL1L
+ mov DBL1H,DBL1L
+ mov r9,DBL1H
+ addc r1,r9
+
+ div0s r8,r0 ! compare signs
+
+ bf LOCAL(add)
+ clrt
+ mov.l LOCAL(x001f0000),r3
+ bra LOCAL(sub_high)
+ negc r9,r9
+
+LOCAL(add_clr_r9):
+ mov #0,r9
+LOCAL(add):
+ mov.l LOCAL(x00200000),r3
+ addc DBL1L,DBL0L
+ addc DBL1H,DBL0H
+ mov.l LOCAL(x80000000),r1
+ tst r3,DBL0H
+ mov.l LOCAL(x7fffffff),r3
+ mov DBL0L,r0
+ bt/s LOCAL(no_carry)
+ and r1,r8
+ tst r9,r9
+ bf LOCAL(add_one)
+ tst #2,r0
+LOCAL(add_one):
+ subc r9,r9
+ sett
+ mov r0,DBLRL
+ addc r9,DBLRL
+ mov DBL0H,DBLRH
+ addc r9,DBLRH
+ shlr DBLRH
+ mov.l LOCAL(x7ff00000),r3
+ add r2,DBLRH
+ mov.l @r15+,r9
+ rotcr DBLRL
+ cmp/hi r3,DBLRH
+LOCAL(add_done):
+ bt LOCAL(inf)
+LOCAL(or_sign):
+ or r8,DBLRH
+ rts
+ mov.l @r15+,r8
+
+LOCAL(inf):
+ bra LOCAL(or_sign)
+ mov r3,DBLRH
+
+LOCAL(pos_difference_0):
+ tst r3,DBL0H
+ mov DBL0L,DBLRL
+ mov.l LOCAL(x80000000),DBL0L
+ mov DBL0H,DBLRH
+ mov.l LOCAL(x00100000),DBL0H
+ bt/s LOCAL(long_norm)
+ and DBL0L,r8
+ bra LOCAL(norm_loop)
+ not DBL0L,r3
+
+LOCAL(same_exp):
+ bf LOCAL(add_clr_r9)
+ clrt
+LOCAL(sub_same_exp):
+ subc DBL1L,DBL0L
+ mov.l LOCAL(x001f0000),r3
+ subc DBL1H,DBL0H
+ mov.w LOCAL(d0),r9
+ bf LOCAL(pos_difference_0)
+ clrt
+ negc DBL0L,DBLRL
+ mov.l LOCAL(x80000000),DBL0L
+ negc DBL0H,DBLRH
+ mov.l LOCAL(x00100000),DBL0H
+ tst r3,DBLRH
+ not r8,r8
+ bt/s LOCAL(long_norm)
+ and DBL0L,r8
+ bra LOCAL(norm_loop)
+ not DBL0L,r3
+
+LOCAL(large_shift_arg0):
+ add #64,r2
+
+ mov #0,r9
+ cmp/pl r2
+ shld r2,r1
+ bf LOCAL(ret_arg1_exp_r3)
+ cmp/hi r9,DBL0L
+ mov DBL0H,DBL0L
+ mov r9,DBL0H
+ addc r1,r9
+ div0s r8,r0 ! compare signs
+ mov r3,r2 ! tentative result exponent
+ bf LOCAL(add)
+ clrt
+ negc r9,r9
+ bra LOCAL(subtract_arg0_arg1_done)
+ mov DBL1L,DBLRL
+
+LOCAL(arg1_gt):
+ tst r0,r2
+ mov #-20,r9
+ bt/s LOCAL(denorm_arg0)
+ cmp/hs r0,r3
+ bt LOCAL(inf_nan_arg1)
+ sub r3,r2
+LOCAL(denorm_arg0_done):
+ shad r9,r2
+ mov.w LOCAL(m32),r9
+ mov DBL1H,r8 ! tentative result sign
+ and r1,DBL1H
+ mov DBL0H,r0 ! the 'other' sign
+ and r1,DBL0H
+ cmp/ge r9,r2
+ mov DBL0H,r1
+ shld r2,DBL0H
+ bf LOCAL(large_shift_arg0)
+ mov DBL0L,r9
+ shld r2,DBL0L
+ add #32,r2
+ mov.l r3,@-r15
+ shld r2,r1
+ mov r2,r3
+ div0s r8,r0 ! compare signs
+ mov.l @r15+,r2 ! tentative result exponent
+ shld r3,r9
+ bf/s LOCAL(add)
+ or r1,DBL0L
+ clrt
+ negc r9,r9
+ mov DBL1L,DBLRL
+LOCAL(subtract_arg0_arg1_done):
+ subc DBL0L,DBLRL
+ mov DBL1H,DBLRH
+ mov.l LOCAL(x001f0000),r3
+ subc DBL0H,DBLRH
+/* Since the exponents were different, the difference is positive. */
+/* Fall through */
+LOCAL(subtract_done):
+/* First check if a shift by a few bits is sufficient. This not only
+ speeds up this case, but also alleviates the need for considering
+ lower bits from r9 or rounding in the other code.
+ Moreover, by handling the upper 1+4 bits of the fraction here, long_norm
+   can assume that DBLRH fits into 20 bits.  */
+ tst r3,DBLRH
+ mov.l LOCAL(x80000000),r3
+ mov.l LOCAL(x00100000),DBL0H
+ bt/s LOCAL(long_norm)
+ and r3,r8
+ mov.l LOCAL(x7fffffff),r3
+LOCAL(norm_loop): ! Well, this used to be a loop...
+ tst DBL0H,DBLRH
+ sub DBL0H,r2
+ bf LOCAL(norm_round)
+ shll r9
+ rotcl DBLRL
+
+ rotcl DBLRH
+
+ tst DBL0H,DBLRH
+ sub DBL0H,r2
+ bf LOCAL(norm_round)
+ shll DBLRL
+ rotcl DBLRH
+ mov.l @r15+,r9
+ cmp/gt r2,DBL0H
+ sub DBL0H,r2
+LOCAL(norm_loop_1):
+ bt LOCAL(denorm0_n)
+ tst DBL0H,DBLRH
+ bf LOCAL(norm_pack)
+ shll DBLRL
+ rotcl DBLRH ! clears T
+ bra LOCAL(norm_loop_1)
+ subc DBL0H,r2
+
+LOCAL(no_carry):
+ shlr r0
+ mov.l LOCAL(x000fffff),DBLRH
+ addc r3,r9
+ mov.w LOCAL(d0),DBL1H
+ mov DBL0L,DBLRL
+ and DBL0H,DBLRH ! mask out implicit 1
+ mov.l LOCAL(x7ff00000),r3
+ addc DBL1H,DBLRL
+ addc r2,DBLRH
+ mov.l @r15+,r9
+ add DBL1H,DBLRH ! fraction overflow -> exp increase
+ bra LOCAL(add_done)
+ cmp/hi r3,DBLRH
+
+LOCAL(denorm_arg0):
+ bt LOCAL(inf_nan_arg1)
+ mov.l LOCAL(x00100000),r2
+ shlr r1
+ bra LOCAL(denorm_arg0_done)
+ sub r3,r2
+
+LOCAL(inf_nan_arg1):
+ mov DBL1L,DBLRL
+ bra LOCAL(pop_r8_r9)
+ mov DBL1H,DBLRH
+
+LOCAL(ret_arg1_exp_r3):
+ mov.l LOCAL(x800fffff),DBLRH
+ bra LOCAL(ret_arg)
+ mov DBL1L,DBLRL
+
+#ifdef __pic__
+ .balign 8
+#endif
+LOCAL(m32):
+ .word -32
+LOCAL(d0):
+ .word 0
+#ifndef __pic__
+ .balign 8
+#endif
+! Because we had several bits of cancellations, we know that r9 contains
+! only one bit.
+! We'll normalize by shifting words so that DBLRH:DBLRL contains
+! the fraction with 0 < DBLRH <= 0x1fffff, then we shift DBLRH:DBLRL
+! up by 21 minus the number of non-zero bits in DBLRH.
+LOCAL(long_norm):
+ tst DBLRH,DBLRH
+ mov.w LOCAL(xff),DBL0L
+ mov #21,r3
+ bf LOCAL(long_norm_highset)
+ mov.l LOCAL(x02100000),DBL1L ! shift 32, implicit 1
+ tst DBLRL,DBLRL
+ extu.w DBLRL,DBL0H
+ bt LOCAL(zero_or_ulp)
+ mov DBLRL,DBLRH
+ cmp/hi DBL0H,DBLRL
+ bf 0f
+ mov.l LOCAL(x01100000),DBL1L ! shift 16, implicit 1
+ clrt
+ shlr16 DBLRH
+ xtrct DBLRL,r9
+ mov DBLRH,DBL0H
+LOCAL(long_norm_ulp_done):
+0: mov r9,DBLRL ! DBLRH:DBLRL == fraction; DBL0H == DBLRH
+ subc DBL1L,r2
+ bt LOCAL(denorm1_b)
+#ifdef __pic__
+ mov.l LOCAL(c__clz_tab),DBL1H
+LOCAL(long_norm_lookup):
+ mov r0,r9
+ mova LOCAL(c__clz_tab),r0
+ add DBL1H,r0
+#else
+ mov r0,r9
+LOCAL(long_norm_lookup):
+ mov.l LOCAL(c__clz_tab),r0
+#endif /* __pic__ */
+ cmp/hi DBL0L,DBL0H
+ bf 0f
+ shlr8 DBL0H
+0: mov.b @(r0,DBL0H),r0
+ bf 0f
+ add #-8,r3
+0: mov.w LOCAL(d20),DBL0L
+ mov #-20,DBL0H
+ clrt
+ sub r0,r3
+ mov r9,r0
+ mov r3,DBL1H
+ shld DBL0L,DBL1H
+ subc DBL1H,r2
+ !
+ bf LOCAL(no_denorm)
+ shad DBL0H,r2
+ bra LOCAL(denorm1_done)
+ add r2,r3
+
+LOCAL(norm_round):
+ cmp/pz r2
+ mov #0,DBL1H
+ bf LOCAL(denorm0_1)
+ or r8,r2
+ mov DBLRL,DBL1L
+ shlr DBL1L
+ addc r3,r9
+ mov.l @r15+,r9
+ addc DBL1H,DBLRL ! round to even
+ mov.l @r15+,r8
+ rts
+ addc r2,DBLRH
+
+LOCAL(norm_pack):
+ add r8,DBLRH
+ mov.l @r15+,r8
+ rts
+ add r2,DBLRH
+
+LOCAL(denorm0_1):
+ mov.l @r15+,r9
+ mov r8,DBL0L
+ mov.l @r15+,r8
+LOCAL(denorm0_shift):
+ shlr DBLRH
+ rotcr DBLRL
+
+ rts
+ add DBL0L,DBLRH
+
+LOCAL(denorm0_n):
+ mov r8,DBL0L
+ addc DBL0H,r2
+ mov.l @r15+,r8
+ bf LOCAL(denorm0_shift)
+ rts
+ add DBL0L,DBLRH
+
+LOCAL(no_denorm):
+ add r2,r8 ! add (exponent - 1) to sign
+
+LOCAL(denorm1_done):
+ shld r3,DBLRH
+ mov DBLRL,DBL0L
+ shld r3,DBLRL
+
+ add r8,DBLRH ! add in sign and (exponent - 1)
+ mov.l @r15+,r9
+ add #-32,r3
+ mov.l @r15+,r8
+ shld r3,DBL0L
+
+ rts
+ add DBL0L,DBLRH
+
+LOCAL(long_norm_highset):
+ mov.l LOCAL(x00200000),DBL1L ! shift 1, implicit 1
+ shll r9
+ rotcl DBLRL
+ mov DBLRH,DBL0H
+ rotcl DBLRH ! clears T
+#ifdef __pic__
+ mov.l LOCAL(c__clz_tab),DBL1H
+#else
+ mov r0,r9
+#endif /* __pic__ */
+ subc DBL1L,r2
+ add #-1,r3
+ bf LOCAL(long_norm_lookup)
+LOCAL(denorm1_a):
+ shlr DBLRH
+ rotcr DBLRL
+ mov.l @r15+,r9
+ or r8,DBLRH
+
+ rts
+ mov.l @r15+,r8
+
+ .balign 4
+LOCAL(denorm1_b):
+ mov #-20,DBL0L
+ shad DBL0L,r2
+ mov DBLRH,DBL0L
+ shld r2,DBLRH
+ shld r2,DBLRL
+ or r8,DBLRH
+ mov.l @r15+,r9
+ add #32,r2
+ mov.l @r15+,r8
+ shld r2,DBL0L
+ rts
+ or DBL0L,DBLRL
+
+LOCAL(zero_or_ulp):
+ tst r9,r9
+ bf LOCAL(long_norm_ulp_done)
+ ! return +0.0
+LOCAL(pop_r8_r9):
+ mov.l @r15+,r9
+ rts
+ mov.l @r15+,r8
+
+LOCAL(d20):
+ .word 20
+LOCAL(xff):
+ .word 0xff
+ .balign 4
+LOCAL(x7ff00000):
+ .long 0x7ff00000
+LOCAL(x001fffff):
+ .long 0x001fffff
+LOCAL(x80000000):
+ .long 0x80000000
+LOCAL(x000fffff):
+ .long 0x000fffff
+LOCAL(x800fffff):
+ .long 0x800fffff
+LOCAL(x001f0000):
+ .long 0x001f0000
+LOCAL(x00200000):
+ .long 0x00200000
+LOCAL(x7fffffff):
+ .long 0x7fffffff
+LOCAL(x00100000):
+ .long 0x00100000
+LOCAL(x02100000):
+ .long 0x02100000
+LOCAL(x01100000):
+ .long 0x01100000
+LOCAL(c__clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(adddf3))
+ENDFUNC(GLOBAL(subdf3))
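The normalization claim in the adddf3 header comment (an exponent difference of two or more between normalized inputs implies at most a one-bit post-subtraction shift) can be spot-checked on the host. A small sketch, reading the exponent straight out of the host `double` bit pattern rather than running the soft-float routine itself; `exp_of` and `shift_bounded` are illustrative names:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Unbiased IEEE-754 exponent of a normal double, taken from its bits.  */
static int exp_of(double x)
{
    uint64_t u;
    memcpy(&u, &x, 8);
    return (int)((u >> 52) & 0x7ff) - 1023;
}

/* If exp(a) exceeds exp(b) by 2 or more, then |b| < |a|/2, so the
   difference a - b loses at most one bit position relative to a, i.e.
   at most a one-bit normalization shift is needed.  */
static int shift_bounded(double a, double b)
{
    if (exp_of(a) - exp_of(b) < 2)
        return 1;                 /* claim does not apply */
    return exp_of(a - b) >= exp_of(a) - 1;
}
```

This is why the code fast-paths shifts by 0 and 1 and only falls back to `long_norm` when many bits cancel, which requires the exponents to have been equal or adjacent.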
Index: gcc/config/sh/IEEE-754/m3/mulsf3.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/mulsf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/mulsf3.S (revision 0)
@@ -0,0 +1,246 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+! mulsf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+
+ .balign 4
+ .global GLOBAL(mulsf3)
+ FUNC(GLOBAL(mulsf3))
+GLOBAL(mulsf3):
+ mov.l LOCAL(x7f800000),r1
+ not r4,r2
+ mov r4,r3
+ not r5,r0
+ tst r1,r2
+ or r1,r3
+ bt/s LOCAL(inf_nan_arg0)
+ tst r1,r0
+ bt LOCAL(inf_nan_arg1)
+ tst r1,r5
+ mov r1,r2
+ shll8 r3
+ or r5,r1
+ bt/s LOCAL(zero_denorm_arg1)
+ shll8 r1
+ tst r2,r4
+ bt LOCAL(zero_denorm_arg0)
+ dmulu.l r3,r1
+ mov r4,r0
+ and r2,r0
+LOCAL(arg_norm):
+ and r5,r2
+ mov.l LOCAL(x3f800000),r3
+ sts mach,r1
+ sub r3,r0
+ sts macl,r3
+ add r2,r0
+ cmp/pz r1
+ mov.w LOCAL(x100),r2
+ bf/s LOCAL(norm_frac)
+ tst r3,r3
+ shll2 r1 /* Shift one up, replace leading 1 with 0. */
+ shlr r1
+ tst r3,r3
+LOCAL(norm_frac):
+ mov.w LOCAL(mx80),r3
+ bf LOCAL(round_frac)
+ tst r2,r1
+LOCAL(round_frac):
+ mov.l LOCAL(xff000000),r2
+ subc r3,r1 /* Even overflow gives right result: exp++, frac=0. */
+ shlr8 r1
+ add r1,r0
+ shll r0
+ bt LOCAL(ill_exp)
+ tst r2,r0
+ bt LOCAL(denorm0)
+ cmp/hs r2,r0
+ bt LOCAL(inf)
+LOCAL(insert_sign):
+ div0s r4,r5
+ rts
+ rotcr r0
+LOCAL(denorm0):
+ sub r2,r0
+ bra LOCAL(insert_sign)
+ shlr r0
+LOCAL(zero_denorm_arg1):
+ mov.l LOCAL(x60000000),r2 /* Check exp0 >= -64 */
+ add r1,r1
+ tst r1,r1 /* arg1 == 0 ? */
+ mov #0,r0
+ bt LOCAL(insert_sign) /* argument 1 is zero ==> return 0 */
+ tst r4,r2
+ bt LOCAL(insert_sign) /* exp0 < -64 ==> return 0 */
+ mov.l LOCAL(c__clz_tab),r0
+ mov r3,r2
+ mov r1,r3
+ bra LOCAL(arg_normalize)
+ mov r2,r1
+LOCAL(zero_denorm_arg0):
+ mov.l LOCAL(x60000000),r2 /* Check exp1 >= -64 */
+ add r3,r3
+ tst r3,r3 /* arg0 == 0 ? */
+ mov #0,r0
+ bt LOCAL(insert_sign) /* argument 0 is zero ==> return 0 */
+ tst r5,r2
+ bt LOCAL(insert_sign) /* exp1 < -64 ==> return 0 */
+ mov.l LOCAL(c__clz_tab),r0
+LOCAL(arg_normalize):
+ mov.l r7,@-r15
+ extu.w r3,r7
+ cmp/eq r3,r7
+ mov.l LOCAL(xff000000),r7
+ mov #-8,r2
+ bt 0f
+ tst r7,r3
+ mov #-16,r2
+ bt 0f
+ mov #-24,r2
+0:
+ mov r3,r7
+ shld r2,r7
+#ifdef __pic__
+ add r0,r7
+ mova LOCAL(c__clz_tab),r0
+#endif
+ mov.b @(r0,r7),r0
+ add #32,r2
+ mov r2,r7
+ mov #23,r2
+ sub r0,r7
+ mov.l LOCAL(x7f800000),r0
+ shld r7,r3
+ shld r2,r7
+ mov r0,r2
+ and r4,r0
+ sub r7,r0
+ mov.l @r15+,r7
+ bra LOCAL(arg_norm)
+ dmulu.l r3,r1
+#if 0 /* This is slightly slower, but could be used if table lookup causes
+ cache thrashing. */
+ bt LOCAL(insert_sign) /* exp1 < -64 ==> return 0 */
+ mov.l LOCAL(xff000000),r2
+ mov r4,r0
+LOCAL(arg_normalize):
+ tst r2,r3
+ bf LOCAL(arg_bit_norm)
+LOCAL(arg_byte_loop):
+ tst r2,r3
+ add r2,r0
+ shll8 r3
+ bt LOCAL(arg_byte_loop)
+ add r4,r0
+LOCAL(arg_bit_norm):
+ mov.l LOCAL(x7f800000),r2
+ rotl r3
+LOCAL(arg_bit_loop):
+ add r2,r0
+ bf/s LOCAL(arg_bit_loop)
+ rotl r3
+ rotr r3
+ rotr r3
+ sub r2,r0
+ bra LOCAL(arg_norm)
+ dmulu.l r3,r1
+#endif /* 0 */
+LOCAL(inf):
+ bra LOCAL(insert_sign)
+ mov r2,r0
+LOCAL(inf_nan_arg0):
+ bt LOCAL(inf_nan_both)
+ add r0,r0
+ cmp/eq #-1,r0 /* arg1 zero? -> NAN */
+ bt LOCAL(insert_sign)
+ mov r4,r0
+LOCAL(inf_insert_sign):
+ bra LOCAL(insert_sign)
+ add r0,r0
+LOCAL(inf_nan_both):
+ mov r4,r0
+ bra LOCAL(inf_insert_sign)
+ or r5,r0
+LOCAL(inf_nan_arg1):
+ mov r2,r0
+ add r0,r0
+ cmp/eq #-1,r0 /* arg0 zero? */
+ bt LOCAL(insert_sign)
+ bra LOCAL(inf_insert_sign)
+ mov r5,r0
+LOCAL(ill_exp):
+ cmp/pz r0
+ mov #-24,r3
+ bt LOCAL(inf)
+ add r1,r1
+ mov r0,r2
+ sub r1,r2 ! remove fraction to get back pre-rounding exponent.
+ sts mach,r0
+ sts macl,r1
+ shad r3,r2
+ mov r0,r3
+ shld r2,r0
+ add #32,r2
+ cmp/pz r2
+ shld r2,r3
+ bf LOCAL(zero)
+ or r1,r3
+ mov #-1,r1
+ tst r3,r3
+ mov.w LOCAL(x100),r3
+ bf/s LOCAL(denorm_round_up)
+ mov #-0x80,r1
+ tst r3,r0
+LOCAL(denorm_round_up):
+ mov #-7,r3
+ subc r1,r0
+ bra LOCAL(insert_sign)
+ shld r3,r0
+LOCAL(zero):
+ bra LOCAL(insert_sign)
+ mov #0,r0
+LOCAL(x100):
+ .word 0x100
+LOCAL(mx80):
+ .word -0x80
+ .balign 4
+LOCAL(x7f800000):
+ .long 0x7f800000
+LOCAL(x3f800000):
+ .long 0x3f800000
+LOCAL(xff000000):
+ .long 0xff000000
+LOCAL(x60000000):
+ .long 0x60000000
+LOCAL(c__clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+ ENDFUNC(GLOBAL(mulsf3))
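The normal-operand path of mulsf3 above — multiply the two 24-bit significands into a 48-bit product, renormalize from [1,4) to [1,2), round to nearest even — can be sketched in C. This is a host-arithmetic illustration only, limited to positive normal inputs with a normal result (zeros, denormals, infinities and NaNs are deliberately left out); `mulsf3_sketch` is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint32_t bits_of(float f)     { uint32_t u; memcpy(&u, &f, 4); return u; }
static float    float_of(uint32_t u) { float f; memcpy(&f, &u, 4); return f; }

static float mulsf3_sketch(float a, float b)
{
    uint32_t ua = bits_of(a), ub = bits_of(b);
    uint64_t fa = (ua & 0x7fffff) | 0x800000;   /* significand, implicit 1 set */
    uint64_t fb = (ub & 0x7fffff) | 0x800000;
    int32_t  e  = (int32_t)((ua >> 23) & 0xff)
                + (int32_t)((ub >> 23) & 0xff) - 127;
    uint64_t p = fa * fb;                       /* product in [2^46, 2^48) */
    if (p >= (1ull << 47))
        e++;                                    /* product >= 2.0 */
    else
        p <<= 1;                                /* normalize to [2^47, 2^48) */
    uint64_t keep = p >> 24;                    /* top 24 bits */
    uint64_t rem  = p & 0xffffff;               /* guard + sticky bits */
    if (rem > 0x800000 || (rem == 0x800000 && (keep & 1)))
        keep++;                                 /* round to nearest even */
    if (keep == (1ull << 24)) {
        keep >>= 1;                             /* rounding carried out */
        e++;
    }
    return float_of(((uint32_t)e << 23) | ((uint32_t)keep & 0x7fffff));
}
```

The assembly gets the 48-bit product from `dmulu.l` (`mach:macl`) and folds the round-to-even adjustment into a `subc` so that an overflowing round falls through correctly ("Even overflow gives right result: exp++, frac=0").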
Index: gcc/config/sh/IEEE-754/m3/floatsisf.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/floatsisf.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/floatsisf.S (revision 0)
@@ -0,0 +1,106 @@
+/* Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+! floatsisf for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+
+FUNC(GLOBAL(floatsisf))
+ .global GLOBAL(floatsisf)
+ .balign 4
+GLOBAL(floatsisf):
+ cmp/pz r4
+ mov r4,r5
+ bt 0f
+ neg r4,r5
+0: mov.l LOCAL(c_clz_tab),r0
+ extu.w r5,r1
+ mov.w LOCAL(xff00),r3
+ cmp/eq r5,r1
+ mov #24,r2
+ bt 0f
+ mov r5,r1
+ shlr16 r1
+ add #-16,r2
+0: tst r3,r1 ! 0xff00
+ bt 0f
+ shlr8 r1
+0: bt 0f
+ add #-8,r2
+0:
+#ifdef __PIC__
+ add r0,r1
+ mova LOCAL(c_clz_tab),r0
+#endif
+ mov.b @(r0,r1),r1
+ cmp/pz r4
+ mov.l LOCAL(x4a800000),r3 ! bias + 23 - implicit 1
+ bt 0f
+ mov.l LOCAL(xca800000),r3 ! sign + bias + 23 - implicit 1
+0: mov r5,r0
+ sub r1,r2
+ mov.l LOCAL(x80000000),r1
+ shld r2,r0
+ cmp/pz r2
+ add r3,r0
+ bt LOCAL(noround)
+ add #31,r2
+ shld r2,r5
+ add #-31,r2
+ rotl r5
+ cmp/hi r1,r5
+ mov #0,r3
+ addc r3,r0
+ mov #23,r1
+ shld r1,r2
+ rts
+ sub r2,r0
+ .balign 8
+LOCAL(noround):
+ mov #23,r1
+ tst r4,r4
+ shld r1,r2
+ bt LOCAL(ret0)
+ rts
+ sub r2,r0
+LOCAL(ret0):
+ rts
+ mov #0,r0
+
+LOCAL(xff00): .word 0xff00
+ .balign 4
+LOCAL(x4a800000): .long 0x4a800000
+LOCAL(xca800000): .long 0xca800000
+LOCAL(x80000000): .long 0x80000000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(floatsisf))
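floatsisf's structure — take the magnitude, count leading zeros, form the biased exponent from the "bias + 23 - implicit 1" constant and the shift amount, and round when the magnitude has more than 24 significant bits — can be sketched as follows. A plain loop replaces the table-driven leading-zero count, and the names are illustrative:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static float float_of(uint32_t u) { float f; memcpy(&f, &u, 4); return f; }

static float floatsisf_sketch(int32_t i)
{
    if (i == 0)
        return 0.0f;
    uint32_t sign = i < 0 ? 0x80000000u : 0;
    uint32_t mag  = i < 0 ? 0u - (uint32_t)i : (uint32_t)i;  /* safe for INT_MIN */
    int lz = 0;
    for (uint32_t t = mag; !(t & 0x80000000u); t <<= 1)
        lz++;                                   /* count leading zeros */
    uint32_t e = 158 - (uint32_t)lz;            /* biased exponent: 127 + (31 - lz) */
    uint64_t m = (uint64_t)mag << (lz + 16);    /* leading 1 now at bit 47 */
    uint64_t keep = m >> 24;                    /* 24-bit significand */
    uint64_t rem  = m & 0xffffff;               /* dropped bits */
    if (rem > 0x800000 || (rem == 0x800000 && (keep & 1)))
        keep++;                                 /* round to nearest even */
    if (keep == (1ull << 24)) {
        keep >>= 1;                             /* rounding carried out */
        e++;
    }
    return float_of(sign | (e << 23) | ((uint32_t)keep & 0x7fffff));
}
```

Rounding only matters for magnitudes above 2^24, which is why the assembly has a separate `noround` path for small values.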
Index: gcc/config/sh/IEEE-754/m3/muldf3.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/muldf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/muldf3.S (revision 0)
@@ -0,0 +1,486 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+! muldf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+! Normal numbers are multiplied in 53 or 54 cycles on SH4-200.
+
+FUNC(GLOBAL(muldf3))
+ .global GLOBAL(muldf3)
+LOCAL(inf_nan_denorm_or_zero_a):
+ mov.l r8,@-r15
+ sub r3,DBL0H ! isolate high fraction
+ mov.l @(4,r15),r8 ! original DBL0H (with sign & exp)
+ sub r3,r1 ! 0x7ff00000
+ mov.l LOCAL(x60000000),r3
+ shll16 r2 ! 0xffff0000
+ ! no stall here for sh4-200
+ !
+ tst r1,r8
+ mov.l r0,@-r15
+ bf LOCAL(inf_nan_a)
+ tst r1,r0 ! test for DBL1 inf, nan or small
+ bt LOCAL(ret_inf_nan_zero)
+LOCAL(normalize_arg):
+ tst DBL0H,DBL0H
+ bf LOCAL(normalize_arg53)
+ tst DBL0L,DBL0L
+ bt LOCAL(a_zero)
+ tst r2,DBL0L
+ mov DBL0L,DBL0H
+ bt LOCAL(normalize_arg16)
+ shlr16 DBL0H
+ mov.w LOCAL(m15),r2 ! 1-16
+ bra LOCAL(normalize_arg48)
+ shll16 DBL0L
+
+LOCAL(normalize_arg53):
+ tst r2,DBL0H
+ mov #1,r2
+ bt LOCAL(normalize_arg48)
+ mov DBL0H,r1
+ shlr16 r1
+ bra LOCAL(normalize_DBL0H)
+ mov #21-16,r3
+
+LOCAL(normalize_arg16):
+ mov.w LOCAL(m31),r2 ! 1-32
+ mov #0,DBL0L
+LOCAL(normalize_arg48):
+ mov DBL0H,r1
+ mov #21,r3
+LOCAL(normalize_DBL0H):
+ extu.b r1,r8
+ mov.l LOCAL(c__clz_tab),r0
+ cmp/eq r8,r1
+ !
+ bt 0f
+ shlr8 r1
+0:
+#ifdef __pic__
+ add r0,r1
+
+ mova LOCAL(c__clz_tab),r0
+
+#endif /* __pic__ */
+ mov.b @(r0,r1),r8
+ mov DBL0L,r1
+ mov.l @r15+,r0
+ bt 0f
+ add #-8,r3
+0: clrt
+ sub r8,r3
+ mov.w LOCAL(d20),r8
+ shld r3,DBL0H
+ shld r3,DBL0L
+ sub r3,r2
+ add #-32,r3
+ shld r3,r1
+ mov.l LOCAL(x00100000),r3
+ or r1,DBL0H
+ shld r8,r2
+ mov.l @r15+,r8
+ add r2,DBL1H
+ mov.l LOCAL(x001fffff),r2
+ dmulu.l DBL0L,DBL1L
+ bra LOCAL(arg_denorm_done)
+ or r3,r0 ! set implicit 1 bit
+
+LOCAL(a_zero):
+ mov.l @(4,r15),r8
+ add #8,r15
+LOCAL(zero):
+ mov #0,DBLRH
+ bra LOCAL(pop_ret)
+ mov #0,DBLRL
+
+! Both inf/nan -> result is nan if at least one is nan, else inf.
+! DBL0 inf/nan, DBL1 zero -> result is nan
+! DBL0 inf/nan, DBL1 finite -> result is DBL0 with sign adjustment
+LOCAL(inf_nan_a):
+ mov r8,DBLRH
+ mov.l @(4,r15),r8
+ add #8,r15
+ tst r1,r0 ! arg1 inf/nan ?
+ mov DBL0L,DBLRL
+ bt LOCAL(both_inf_nan)
+ tst DBL1L,DBL1L
+ mov DBL1H,r1
+ bf LOCAL(pop_ret)
+ add r1,r1
+ tst r1,r1
+ !
+ bf LOCAL(pop_ret)
+LOCAL(nan):
+ mov #-1,DBLRL
+ bra LOCAL(pop_ret)
+ mov #-1,DBLRH
+
+LOCAL(both_inf_nan):
+ or DBL1L,DBLRL
+ bra LOCAL(pop_ret)
+ or DBL1H,DBLRH
+
+LOCAL(ret_inf_nan_zero):
+ tst r1,r0
+ mov.l @(4,r15),r8
+ or DBL0L,DBL0H
+ bf/s LOCAL(zero)
+ add #8,r15
+ tst DBL0H,DBL0H
+ bt LOCAL(nan)
+LOCAL(inf_nan_b):
+ mov DBL1L,DBLRL
+ mov DBL1H,DBLRH
+LOCAL(pop_ret):
+ mov.l @r15+,DBL0H
+ add DBLRH,DBLRH
+
+
+ div0s DBL0H,DBL1H
+
+ rts
+ rotcr DBLRH
+
+ .balign 4
+/* Argument a has already been tested for being zero or denorm.
+   For argument b, we have to swap a and b so that we can share the
+   normalization code.
+ a: sign/exponent : @r15 fraction: DBL0H:DBL0L
+ b: sign/exponent: DBL1H fraction: r0:DBL1L */
+LOCAL(inf_nan_denorm_or_zero_b):
+ sub r3,r1 ! 0x7ff00000
+ mov.l @r15,r2 ! get original DBL0H
+ tst r1,DBL1H
+ sub r3,r0 ! isolate high fraction
+ bf LOCAL(inf_nan_b)
+ mov.l DBL1H,@r15
+ mov r0,DBL0H
+ mov.l r8,@-r15
+ mov r2,DBL1H
+ mov.l LOCAL(0xffff0000),r2
+ mov.l r1,@-r15
+ mov DBL1L,r1
+ mov DBL0L,DBL1L
+ bra LOCAL(normalize_arg)
+ mov r1,DBL0L
+
+LOCAL(d20):
+ .word 20
+LOCAL(m15):
+ .word -15
+LOCAL(m31):
+ .word -31
+LOCAL(xff):
+ .word 0xff
+
+ .balign 4
+LOCAL(0xffff0000): .long 0xffff0000
+
+ ! calculate a (DBL0H:DBL0L) * b (DBL1H:DBL1L)
+ .balign 4
+GLOBAL(muldf3):
+ mov.l LOCAL(xfff00000),r3
+ mov DBL1H,r0
+ dmulu.l DBL0L,DBL1L
+ mov.l LOCAL(x7fe00000),r1
+ sub r3,r0
+ mov.l DBL0H,@-r15
+ sub r3,DBL0H
+ tst r1,DBL0H
+ or r3,DBL0H
+ mov.l LOCAL(x001fffff),r2
+ bt LOCAL(inf_nan_denorm_or_zero_a)
+ tst r1,r0
+ or r3,r0 ! r0:DBL1L := b fraction ; u12.52
+ bt LOCAL(inf_nan_denorm_or_zero_b) ! T clear on fall-through
+LOCAL(arg_denorm_done):
+ and r2,r0 ! r0:DBL1L := b fraction ; u12.52
+ sts macl,r3
+ sts mach,r1
+ dmulu.l DBL0L,r0
+ and r2,DBL0H ! DBL0H:DBL0L := a fraction ; u12.52
+ mov.l r8,@-r15
+ mov #0,DBL0L
+ mov.l r9,@-r15
+ sts macl,r2
+ sts mach,r8
+ dmulu.l DBL0H,DBL1L
+ addc r1,r2
+
+ addc DBL0L,r8 ! add T; clears T
+
+ sts macl,r1
+ sts mach,DBL1L
+ dmulu.l DBL0H,r0
+ addc r1,r2
+ mov.l LOCAL(x7ff00000),DBL0H
+ addc DBL1L,r8 ! clears T
+ mov.l @(8,r15),DBL1L ! a sign/exp w/fraction
+ sts macl,DBLRL
+ sts mach,DBLRH
+ and DBL0H,DBL1L ! a exponent
+ mov.w LOCAL(x200),r9
+ addc r8,DBLRL
+ mov.l LOCAL(x3ff00000),r8 ! bias
+ addc DBL0L,DBLRH ! add T
+ cmp/hi DBL0L,r3 ! 32 guard bits -> sticky: T := r3 != 0
+ movt r3
+ tst r9,DBLRH ! T := fraction < 2
+ or r3,r2 ! DBLRH:DBLRL:r2 := result fraction; u24.72
+ bt/s LOCAL(shll12)
+ sub r8,DBL1L
+ mov.l LOCAL(x002fffff),r8
+ and DBL1H,DBL0H ! b exponent
+ mov.l LOCAL(x00100000),r9
+ add DBL0H,DBL1L ! result exponent - 1
+ tst r8,r2
+ mov.w LOCAL(m20),r8
+ subc DBL0L,r9
+ addc r2,r9 ! r2 value is still needed for denormal rounding
+ mov.w LOCAL(d11),DBL0L
+ rotcr r9
+ clrt
+ shld r8,r9
+ mov.w LOCAL(m21),r8
+ mov DBLRL,r3
+ shld DBL0L,DBLRL
+ addc r9,DBLRL
+ mov.l @r15+,r9
+ shld r8,r3
+ mov.l @r15+,r8
+ shld DBL0L,DBLRH
+ mov.l @r15+,DBL0H
+ addc r3,DBLRH
+ mov.l LOCAL(x7ff00000),DBL0L
+ add DBL1L,DBLRH ! implicit 1 adjusts exponent
+ mov.l LOCAL(xffe00000),r3
+ cmp/hs DBL0L,DBLRH
+ add DBLRH,DBLRH
+ bt LOCAL(ill_exp_11)
+ tst r3,DBLRH
+ bt LOCAL(denorm_exp0_11)
+ div0s DBL0H,DBL1H
+ rts
+ rotcr DBLRH
+
+
+LOCAL(shll12):
+ mov.l LOCAL(x0017ffff),r8
+ extu.b DBLRH,DBLRH ! remove implicit 1.
+ mov.l LOCAL(x00080000),r9
+ and DBL1H,DBL0H ! b exponent
+ add DBL0H,DBL1L ! result exponent
+ tst r8,r2 ! rounding adjust for lower guard ...
+ mov.w LOCAL(m19),r8
+ subc DBL0L,r9 ! ... bits and round to even; clear T
+ addc r2,r9 ! r2 value is still needed for denormal rounding
+ mov.w LOCAL(d12),DBL0L
+ rotcr r9
+ clrt
+ shld r8,r9
+ mov.w LOCAL(m20),r8
+ mov DBLRL,r3
+ shld DBL0L,DBLRL
+ addc r9,DBLRL
+ mov.l @r15+,r9
+ shld r8,r3
+ mov.l @r15+,r8
+ shld DBL0L,DBLRH
+ mov.l LOCAL(x7ff00000),DBL0L
+ addc r3,DBLRH
+ mov.l @r15+,DBL0H
+ add DBL1L,DBLRH
+ mov.l LOCAL(xffe00000),r3
+ cmp/hs DBL0L,DBLRH
+ add DBLRH,DBLRH
+ bt LOCAL(ill_exp_12)
+ tst r3,DBLRH
+ bt LOCAL(denorm_exp0_12)
+LOCAL(insert_sign):
+ div0s DBL0H,DBL1H
+ rts
+ rotcr DBLRH
+
+LOCAL(overflow):
+ mov r3,DBLRH
+ mov #0,DBLRL
+ bra LOCAL(insert_sign)
+ mov.l @r15+,r8
+
+LOCAL(denorm_exp0_11):
+ mov.l r8,@-r15
+ mov #-21,r8
+ mov.l r9,@-r15
+ bra LOCAL(denorm)
+ mov #-2,DBL1L ! one for denormal, and one for sticky bit
+
+LOCAL(ill_exp_11):
+ mov DBL1H,DBL1L
+ and r3,DBL0L ! 0x7fe00000
+ add DBL1L,DBL1L
+ mov.l r8,@-r15
+ cmp/hi DBL1L,DBL0L ! check if exp a was large
+ mov #-20,DBL0L
+ bf LOCAL(overflow)
+ mov #-21,r8
+ mov DBLRH,DBL1L
+ rotcr DBL1L ! shift in negative sign
+ mov.l r9,@-r15
+ shad DBL0L,DBL1L ! exponent ; s32
+ bra LOCAL(denorm)
+ add #-2,DBL1L ! add one for denormal, and one for sticky bit
+
+LOCAL(denorm_exp0_12):
+ mov.l r8,@-r15
+ mov #-20,r8
+ mov.l r9,@-r15
+ bra LOCAL(denorm)
+ mov #-2,DBL1L ! one for denormal, and one for sticky bit
+
+ .balign 4 ! also aligns LOCAL(denorm)
+LOCAL(ill_exp_12):
+ and r3,DBL0L ! 0x7fe00000
+ mov DBL1H,DBL1L
+ add DBL1L,DBL1L
+ mov.l r8,@-r15
+ cmp/hi DBL1L,DBL0L ! check if exp a was large
+ bf LOCAL(overflow)
+ mov DBLRH,DBL1L
+ rotcr DBL1L ! shift in negative sign
+ mov #-20,r8
+ shad r8,DBL1L ! exponent ; s32
+ mov.l r9,@-r15
+ add #-2,DBL1L ! add one for denormal, and one for sticky bit
+LOCAL(denorm):
+ not r3,r9 ! 0x001fffff
+ mov.l r10,@-r15
+ mov r2,r10
+ shld r8,r10 ! 11 or 12 lower bit valid
+ and r9,DBLRH ! Mask away vestiges of exponent.
+ add #32,r8
+ sub r3,DBLRH ! Make leading 1 explicit.
+ shld r8,r2 ! r10:r2 := unrounded result lowpart
+ shlr DBLRH ! compensate for doubling at end of normal code
+ sub DBLRL,r10 ! reconstruct effect of previous rounding
+ exts.b r10,r9
+ shad r3,r10 ! sign extension
+ mov #0,r3
+ clrt
+ addc r9,DBLRL ! Undo previous rounding.
+ mov.w LOCAL(m32),r9
+ addc r10,DBLRH
+ cmp/hi r3,r2
+ rotcl DBLRL ! fit in the rest of r2 as a sticky bit.
+ mov.l @r15+,r10
+ rotcl DBLRH
+ cmp/ge r9,DBL1L
+ bt LOCAL(small_norm_shift)
+ cmp/hi r3,DBLRL
+ add #32,DBL1L
+ movt DBLRL
+ cmp/gt r9,DBL1L
+ or DBLRH,DBLRL
+ bt/s LOCAL(small_norm_shift)
+ mov r3,DBLRH
+ mov r3,DBLRL ! exponent too negative to shift - return zero
+ mov.l @r15+,r9
+ mov.l @r15+,r8
+ div0s DBL0H,DBL1H
+ rts
+ rotcr DBLRH
+ .balign 4
+LOCAL(small_norm_shift):
+ mov DBLRL,r2 ! stash away guard bits
+ shld DBL1L,DBLRL
+ mov DBLRH,DBL0L
+ shld DBL1L,DBLRH
+ mov.l LOCAL(x7fffffff),r9
+ add #32,DBL1L
+ shld DBL1L,r2
+ shld DBL1L,DBL0L
+ or DBL0L,DBLRL
+ shlr DBL0L
+ addc r2,r9
+ mov.l @r15+,r9
+ mov.l @r15+,r8
+ addc r3,DBLRL
+ addc r3,DBLRH
+ div0s DBL0H,DBL1H
+ add DBLRH,DBLRH
+ rts
+ rotcr DBLRH
+
+
+LOCAL(x200):
+ .word 0x200
+LOCAL(m19):
+ .word -19
+LOCAL(m20):
+ .word -20
+LOCAL(m21):
+ .word -21
+LOCAL(m32):
+ .word -32
+LOCAL(d11):
+ .word 11
+LOCAL(d12):
+ .word 12
+ .balign 4
+LOCAL(x60000000):
+ .long 0x60000000
+LOCAL(c__clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+LOCAL(xfff00000):
+ .long 0xfff00000
+LOCAL(x7fffffff):
+ .long 0x7fffffff
+LOCAL(x00100000):
+ .long 0x00100000
+LOCAL(x7fe00000):
+ .long 0x7fe00000
+LOCAL(x001fffff):
+ .long 0x001fffff
+LOCAL(x7ff00000):
+ .long 0x7ff00000
+LOCAL(x3ff00000):
+ .long 0x3ff00000
+LOCAL(x002fffff):
+ .long 0x002fffff
+LOCAL(xffe00000):
+ .long 0xffe00000
+LOCAL(x0017ffff):
+ .long 0x0017ffff
+LOCAL(x00080000):
+ .long 0x00080000
+ENDFUNC(GLOBAL(muldf3))
Index: gcc/config/sh/IEEE-754/m3/floatsidf.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/floatsidf.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/floatsidf.S (revision 0)
@@ -0,0 +1,103 @@
+/* Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+! floatsidf for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+
+FUNC(GLOBAL(floatsidf))
+ .global GLOBAL(floatsidf)
+ .balign 4
+GLOBAL(floatsidf):
+ tst r4,r4
+ mov r4,r1
+ bt LOCAL(ret0)
+ cmp/pz r4
+ bt 0f
+ neg r4,r1
+0: mov.l LOCAL(c_clz_tab),r0
+ extu.w r1,r5
+ mov.w LOCAL(xff00),r3
+ cmp/eq r1,r5
+ mov #21,r2
+ bt 0f
+ mov r1,r5
+ shlr16 r5
+ add #-16,r2
+0: tst r3,r5 ! 0xff00
+ bt 0f
+ shlr8 r5
+0: bt 0f
+ add #-8,r2
+0:
+#ifdef __PIC__
+ add r0,r5
+ mova LOCAL(c_clz_tab),r0
+#endif
+ mov.b @(r0,r5),r5
+ cmp/pz r4
+ mov.l LOCAL(x41200000),r3 ! bias + 20 - implicit 1
+ bt 0f
+ mov.l LOCAL(xc1200000),r3 ! sign + bias + 20 - implicit 1
+0: mov r1,r0 ! DBLRL & DBLRH
+ sub r5,r2
+ mov r2,r5
+ shld r2,DBLRH
+ cmp/pz r2
+ add r3,DBLRH
+ add #32,r2
+ shld r2,DBLRL
+ bf 0f
+ mov.w LOCAL(d0),DBLRL
+0: mov #20,r2
+ shld r2,r5
+ rts
+ sub r5,DBLRH
+LOCAL(ret0):
+ mov #0,DBLRL
+ rts
+ mov #0,DBLRH
+
+LOCAL(xff00): .word 0xff00
+ .balign 4
+LOCAL(x41200000):
+#ifdef __LITTLE_ENDIAN__
+LOCAL(d0): .word 0
+ .word 0x4120
+#else
+ .word 0x4120
+LOCAL(d0): .word 0
+#endif
+LOCAL(xc1200000): .long 0xc1200000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(floatsidf))
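Reviewer note (not part of the patch): ignoring scheduling, the routine above
computes the equivalent of the following C sketch; the name `my_floatsidf` is
a placeholder, and the normalization loop plays the role of the clz_tab
lookup in the assembly. Every int32 is exactly representable in a double, so
no rounding is involved.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Convert int32 to IEEE-754 binary64 by normalizing the magnitude and
   building the biased exponent, mirroring the CLZ-based path above.  */
static double my_floatsidf(int32_t x)
{
    if (x == 0)
        return 0.0;
    uint64_t sign = x < 0 ? (1ULL << 63) : 0;
    uint32_t mag = x < 0 ? 0u - (uint32_t)x : (uint32_t)x;
    int shift = 0;
    while (!(mag & 0x80000000u)) {      /* normalize: bit 31 becomes 1 */
        mag <<= 1;
        shift++;
    }
    uint64_t exp = (uint64_t)(1023 + 31 - shift);        /* biased exponent */
    uint64_t mant = (uint64_t)(mag & 0x7fffffffu) << 21; /* drop implicit 1 */
    uint64_t bits = sign | (exp << 52) | mant;
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```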
Index: gcc/config/sh/IEEE-754/m3/fixdfsi.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/fixdfsi.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/fixdfsi.S (revision 0)
@@ -0,0 +1,115 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+!! fixdfsi for Renesas SH / STMicroelectronics ST40 CPUs
+!! Contributed by Joern Rennecke joern.rennecke@st.com
+
+#ifdef L_fixdfsi
+	! What is a bit unusual about this implementation is that the
+	! sign bit influences the result for NaNs: for a cleared sign bit,
+	! you get UINT_MAX; for a set sign bit, you get 0.
+	! However, since the result for NaNs is undefined, this should be
+	! no problem.
+ ! N.B. This is scheduled both for SH4-200 and SH4-300
+ .balign 4
+ .global GLOBAL(fixdfsi)
+ FUNC(GLOBAL(fixdfsi))
+ .balign 4
+GLOBAL(fixdfsi):
+ mov.w LOCAL(x413),r1
+ mov DBL0H,r0
+ shll DBL0H
+ mov.l LOCAL(mask),r3
+ mov #-21,r2
+ shld r2,DBL0H ! SH4-200 will start this insn in a new cycle
+ bt/s LOCAL(neg)
+ sub r1,DBL0H
+ cmp/pl DBL0H ! SH4-200 will start this insn in a new cycle
+ and r3,r0
+ bf/s LOCAL(ignore_low)
+	addc r3,r0 ! uses T == 1; sets implicit 1
+ mov #10,r2
+ shld DBL0H,r0 ! SH4-200 will start this insn in a new cycle
+ cmp/gt r2,DBL0H
+ add #-32,DBL0H
+ bt LOCAL(retmax)
+ shld DBL0H,DBL0L
+ rts
+ or DBL0L,r0
+
+ .balign 8
+LOCAL(ignore_low):
+ mov #-21,r2
+ cmp/gt DBL0H,r2 ! SH4-200 will start this insn in a new cycle
+ bf 0f ! SH4-200 will start this insn in a new cycle
+ mov #-31,DBL0H ! results in 0 return
+0: add #1,r0
+ rts
+ shld DBL0H,r0
+
+ .balign 4
+LOCAL(neg):
+ cmp/pl DBL0H
+ and r3,r0
+ bf/s LOCAL(ignore_low_neg)
+	addc r3,r0 ! uses T == 1; sets implicit 1
+ mov #10,r2
+ shld DBL0H,r0 ! SH4-200 will start this insn in a new cycle
+ cmp/gt r2,DBL0H
+ add #-32,DBL0H
+ bt LOCAL(retmin)
+ shld DBL0H,DBL0L
+ or DBL0L,r0 ! SH4-200 will start this insn in a new cycle
+ rts
+ neg r0,r0
+
+ .balign 4
+LOCAL(ignore_low_neg):
+ mov #-21,r2
+ cmp/gt DBL0H,r2 ! SH4-200 will start this insn in a new cycle
+ add #1,r0
+ shld DBL0H,r0
+ bf 0f
+ mov #0,r0 ! results in 0 return
+0: rts
+ neg r0,r0
+
+LOCAL(retmax):
+ mov #-1,r0
+ rts
+ shlr r0
+
+LOCAL(retmin):
+ mov #1,r0
+ rts
+ rotr r0
+
+LOCAL(x413): .word 0x413
+
+ .balign 4
+LOCAL(mask): .long 0x000fffff
+ ENDFUNC(GLOBAL(fixdfsi))
+#endif /* L_fixdfsi */
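Reviewer note (not part of the patch): the truncation above corresponds
roughly to this C sketch. `my_fixdfsi` is a placeholder name, and the
saturation shown for out-of-range and NaN inputs is just one permitted
outcome, per the comment at the top of the routine.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Truncate a binary64 value to int32: extract the unbiased exponent,
   restore the implicit 1, shift the significand into place, negate for
   negative inputs.  Magnitudes below 1 give 0; overflow saturates.  */
static int32_t my_fixdfsi(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    int sign = (int)(bits >> 63);
    int exp = (int)((bits >> 52) & 0x7ff) - 1023;       /* unbiased */
    uint64_t mant = (bits & 0x000fffffffffffffULL) | (1ULL << 52);
    if (exp < 0)
        return 0;                                       /* |d| < 1 */
    if (exp > 30)
        return sign ? INT32_MIN : INT32_MAX;            /* saturate */
    uint64_t val = mant >> (52 - exp);                  /* truncate */
    return sign ? -(int32_t)val : (int32_t)val;
}
```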
Index: gcc/config/sh/IEEE-754/divdf3.S
===================================================================
--- gcc/config/sh/IEEE-754/divdf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/divdf3.S (revision 0)
@@ -0,0 +1,598 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!division of two double precision floating point numbers
+!Author:Aanchal Khanna
+!
+!Entry:
+!r4,r5:dividend
+!
+!r6,r7:divisor
+!
+!Exit:
+!r0,r1:quotient
+
+!Notes: the dividend is passed in regs r4 and r5, the divisor in regs r6
+!and r7; the quotient is returned in regs r0 and r1. The dividend is
+!referred to as op1 and the divisor as op2.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (divdf3)
+ FUNC (GLOBAL (divdf3))
+
+GLOBAL (divdf3):
+
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+ mov r5,r4
+ mov r1,r5
+
+ mov r6,r1
+ mov r7,r6
+ mov r1,r7
+#endif
+ mov r4,r2
+ mov.l .L_inf,r1
+
+ and r1,r2
+ mov.l r8,@-r15
+
+ cmp/eq r1,r2
+ mov r6,r8
+
+ bt .L_a_inv
+ and r1,r8
+
+ cmp/eq r1,r8
+ mov.l .L_high_mant,r3
+
+ bf .L_chk_zero
+ and r6,r3
+
+ mov.l .L_mask_sign,r8
+ cmp/pl r7
+
+ mov r8,r0
+ bt .L_ret_b !op2=NaN,return op2
+
+ and r4,r8
+ cmp/pl r3
+
+ and r6,r0
+ bt .L_ret_b !op2=NaN,return op2
+
+ xor r8,r0 !op1=normal no,op2=Inf, return Zero
+ mov #0,r1
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+.L_ret_b:
+ mov r7,r1
+ mov r6,r0
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+.L_a_inv:
+ !chk if op1 is Inf or NaN
+ mov.l .L_high_mant,r2
+ cmp/pl r5
+
+ and r4,r2
+ bt .L_ret_a
+
+ and r1,r8 !r1 contains infinity
+ cmp/pl r2
+
+ bt .L_ret_a
+ cmp/eq r1,r8
+
+ mov r1,DBLRH
+ add DBLRH,DBLRH
+ bf 0f
+ mov #-1,DBLRH ! Inf/Inf, return NaN.
+0: div0s r4,r6
+ mov.l @r15+,r8
+ rts
+ rotcr DBLRH
+
+.L_ret_a:
+ !return op1
+ mov r5,r1
+ mov r4,r0
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+.L_chk_zero:
+ !chk if op1=0
+ mov.l .L_mask_sign,r0
+ mov r4,r3
+
+ and r0,r3
+ shll r4
+
+ and r6,r0
+ shlr r4
+
+ xor r3,r0
+ shll r6
+
+ shlr r6
+ tst r4,r4
+
+
+ bf .L_op1_not_zero
+ tst r5,r5
+
+ bf .L_op1_not_zero
+ tst r7,r7
+
+ mov.l @r15+,r8
+ bf .L_ret_zero
+
+ tst r6,r6
+ bf .L_ret_zero
+
+ rts
+ mov #-1,DBLRH !op1=op2=0, return NaN
+
+.L_ret_zero:
+ !return zero
+ mov r0,r1
+ rts
+#ifdef __LITTLE_ENDIAN__
+ mov #0,r0
+#else
+ mov #0,r1 !op1=0,op2=normal no,return zero
+#endif
+
+.L_norm_b:
+ !normalize op2
+ shll r7
+ mov.l .L_imp_bit,r3
+
+ rotcl r6
+ tst r3,r6
+
+ add #-1,r8
+ bt .L_norm_b
+
+ bra .L_divide
+ add #1,r8
+
+.L_op1_not_zero:
+ !op1!=0, chk if op2=0
+ tst r7,r7
+ mov r1,r3
+
+ mov #0,r1
+ bf .L_normal_nos
+
+ tst r6,r6
+ bf .L_normal_nos
+
+ mov.l @r15+,r8
+ or r3,r0
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+
+ rts
+ nop
+
+.L_normal_nos:
+ !op1 and op2 are normal nos
+ tst r2,r2
+ mov #-20,r1
+
+! The subsequent branch uses the T bit set by the compare above.
+! The shift does not change it: the SHLR20 macro is written so as
+! not to clobber the T bit.
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r1,r2
+#else
+ SHLR20 (r2)
+#endif
+ bt .L_norm_a !normalize dividend
+
+.L_chk_b:
+ mov.l r9,@-r15
+ tst r8,r8
+
+ mov.l .L_high_mant,r9
+
+! The subsequent branch uses the T bit set by the compare above.
+! The shift does not change it: the SHLR20 macro is written so as
+! not to clobber the T bit.
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r1,r8
+#else
+ SHLR20 (r8)
+#endif
+ ! T set -> normalize divisor
+ SL(bt, .L_norm_b,
+ and r9,r4)
+
+.L_divide:
+ mov.l .L_2047,r1
+ sub r8,r2
+
+ mov.l .L_1023,r8
+ and r9,r6
+
+ !resultant exponent
+ add r8,r2
+ !chk the exponent for overflow
+ cmp/ge r1,r2
+
+ mov.l .L_imp_bit,r1
+ bt .L_overflow
+
+ mov #0,r8
+ or r1,r4
+
+ or r1,r6
+ mov #-24,r3
+
+ !chk if the divisor is 1(mantissa only)
+ cmp/eq r8,r7
+ bf .L_div2
+
+ cmp/eq r6,r1
+ bt .L_den_one
+
+.L_div2:
+ !divide the mantissas
+ shll8 r4
+ mov r5,r9
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r3,r9
+#else
+ SHLR24 (r9)
+#endif
+ shll8 r6
+
+ or r9,r4
+ shll8 r5
+
+ mov r7,r9
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r3,r9
+#else
+ SHLR24 (r9)
+#endif
+ mov r8,r3
+ shll8 r7
+
+ or r9,r6
+ cmp/gt r4,r6
+
+ mov r3,r9
+ bt .L_shift
+
+ cmp/eq r4,r6
+ bf .L_loop
+
+ cmp/gt r5,r7
+ bf .L_loop
+
+.L_shift:
+ add #-1,r2
+ shll r5
+ rotcl r4
+
+.L_loop:
+ !actual division loop
+ cmp/gt r6,r4
+ bt .L_subtract
+
+ cmp/eq r6,r4
+ bf .L_skip
+
+ cmp/ge r7,r5
+ bf .L_skip
+
+.L_subtract:
+ clrt
+ subc r7,r5
+
+ or r1,r8
+ subc r6,r4
+
+.L_skip:
+ shlr r1
+ shll r5
+
+ rotcl r4
+ cmp/eq r1,r3
+
+ bf .L_loop
+ mov.l .L_imp_bit,r1
+
+	!chk if the division was for the higher word of the quotient
+ tst r1,r9
+ bf .L_chk_exp
+
+ mov r8,r9
+ mov.l .L_mask_sign,r1
+
+ !divide for the lower word of the quotient
+ bra .L_loop
+ mov r3,r8
+
+.L_chk_exp:
+ !chk if the result needs to be denormalized
+ cmp/gt r2,r3
+ bf .L_round
+ mov #-53,r7
+
+.L_underflow:
+ !denormalize the result
+ add #1,r2
+ cmp/gt r2,r7
+
+ or r4,r5 !remainder
+ add #-2,r2
+
+ mov #32,r4
+ bt .L_return_zero
+
+ add r2,r4
+ cmp/ge r3,r4
+
+ mov r2,r7
+ mov r3,r1
+
+ mov #-54,r2
+ bt .L_denorm
+ mov #-32,r7
+
+.L_denorm:
+ shlr r8
+ rotcr r1
+
+ shll r8
+ add #1,r7
+
+ shlr r9
+ rotcr r8
+
+ cmp/eq r3,r7
+ bf .L_denorm
+
+ mov r4,r7
+ cmp/eq r2,r4
+
+ bt .L_break
+ mov r3,r6
+
+ cmp/gt r7,r3
+ bf .L_break
+
+ mov r2,r4
+ mov r1,r6
+
+ mov r3,r1
+ bt .L_denorm
+
+.L_break:
+ mov #0,r2
+
+ cmp/gt r1,r2
+
+ addc r2,r8
+ mov.l .L_comp_1,r4
+
+ addc r3,r9
+ or r9,r0
+
+ cmp/eq r5,r3
+ bf .L_return
+
+ cmp/eq r3,r6
+ mov.l .L_mask_sign,r7
+
+ bf .L_return
+ cmp/eq r7,r1
+
+ bf .L_return
+ and r4,r8
+
+.L_return:
+ mov.l @r15+,r9
+ mov r8,r1
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+.L_norm_a:
+ !normalize op1
+ shll r5
+ mov.l .L_imp_bit,r3
+
+ rotcl r4
+ tst r3,r4
+
+ add #-1,r2
+ bt .L_norm_a
+
+ bra .L_chk_b
+ add #1,r2
+
+.L_overflow:
+ !overflow, return inf
+ mov.l .L_inf,r2
+#ifdef __LITTLE_ENDIAN__
+ or r2,r1
+ mov #0,r0
+#else
+ or r2,r0
+ mov #0,r1
+#endif
+ mov.l @r15+,r9
+ rts
+ mov.l @r15+,r8
+
+.L_den_one:
+ !denominator=1, result=numerator
+ mov r4,r9
+ mov #-53,r7
+
+ cmp/ge r2,r8
+ mov r8,r4
+
+ mov r5,r8
+ mov r4,r3
+
+ !chk the exponent for underflow
+ SL(bt, .L_underflow,
+ mov r4,r5)
+
+ mov.l .L_high_mant,r7
+ bra .L_pack
+ mov #20,r6
+
+.L_return_zero:
+ !return zero
+ mov r3,r1
+ mov.l @r15+,r9
+
+ rts
+ mov.l @r15+,r8
+
+.L_round:
+ !apply rounding
+ cmp/eq r4,r6
+ bt .L_lower
+
+ clrt
+ subc r6,r4
+
+ bra .L_rounding
+ mov r4,r6
+
+.L_lower:
+ clrt
+ subc r7,r5
+ mov r5,r6
+
+.L_rounding:
+ !apply rounding
+ mov.l .L_invert,r1
+ mov r3,r4
+
+ movt r3
+ clrt
+
+ not r3,r3
+ and r1,r3
+
+ addc r3,r8
+ mov.l .L_high_mant,r7
+
+ addc r4,r9
+ cmp/eq r4,r6
+
+ mov.l .L_comp_1,r3
+ SL (bf, .L_pack,
+ mov #20,r6)
+ and r3,r8
+
+.L_pack:
+ !pack the result, r2=exponent,r0=sign,r8=lower mantissa, r9=higher mantissa
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r6,r2
+#else
+ SHLL20 (r2)
+#endif
+ and r7,r9
+
+ or r2,r0
+ mov r8,r1
+
+ or r9,r0
+ mov.l @r15+,r9
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+ .align 2
+
+.L_mask_sign:
+ .long 0x80000000
+.L_high_mant:
+ .long 0x000fffff
+.L_inf:
+ .long 0x7ff00000
+.L_1023:
+ .long 1023
+.L_2047:
+ .long 2047
+.L_imp_bit:
+ .long 0x00100000
+.L_comp_1:
+ .long 0xfffffffe
+.L_invert:
+ .long 0x00000001
+
+ENDFUNC (GLOBAL (divdf3))
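Reviewer note (not part of the patch): the core of .L_loop above is a
restoring shift-and-subtract division of the two significands. A minimal C
sketch of just that loop, with packing, rounding and special cases omitted
(the name `divide_mantissas` is a placeholder):

```c
#include <assert.h>
#include <stdint.h>

/* Restoring division: a and b are significands with the implicit 1 set
   and b <= a < 2b; returns floor((a / b) * 2^(bits-1)), i.e. `bits`
   quotient bits with the integer bit first, as in .L_loop.  */
static uint64_t divide_mantissas(uint64_t a, uint64_t b, int bits)
{
    uint64_t q = 0, rem = a;
    for (int i = 0; i < bits; i++) {
        q <<= 1;
        if (rem >= b) {         /* subtract when the divisor fits */
            rem -= b;
            q |= 1;             /* record a 1 quotient bit */
        }
        rem <<= 1;              /* bring in the next bit position */
    }
    return q;
}
```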
Index: gcc/config/sh/IEEE-754/floatunssisf.S
===================================================================
--- gcc/config/sh/IEEE-754/floatunssisf.S (revision 0)
+++ gcc/config/sh/IEEE-754/floatunssisf.S (revision 0)
@@ -0,0 +1,137 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Conversion of unsigned integer to floating point
+
+! Author: Rakesh Kumar
+
+! Argument: r4
+! Result: r0
+
+! r4 is referred to as op1
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (floatunsisf)
+ FUNC (GLOBAL (floatunsisf))
+
+GLOBAL (floatunsisf):
+ tst r4,r4
+ mov #23,r6
+
+ mov.l .L_set_24_bits,r7
+ SL(bt, .L_return,
+ not r7,r3)
+
+ ! Decide the direction for shifting
+ mov.l .L_set_24_bit,r5
+ cmp/hi r7,r4
+
+ not r5,r2
+ SL(bt, .L_shift_right,
+ mov #0,r7)
+
+ tst r5,r4
+
+ mov #0,r0
+ bf .L_pack_sf
+
+! Shift the bits to the left. Adjust the exponent
+.L_shift_left:
+ shll r4
+ tst r5,r4
+
+ add #-1,r6
+ bt .L_shift_left
+
+! Pack the value in floating point format.
+! r6 has unbiased exponent, r4 has mantissa
+.L_pack_sf:
+ mov #23,r3
+ add #127,r6
+
+ ! Align the exponent
+ and r2,r4
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLL23 (r6)
+#else
+ shld r3,r6
+#endif
+
+ or r6,r0
+ rts
+ or r4,r0
+
+! Shift right the number with rounding
+.L_shift_right:
+ shlr r4
+ rotcr r7
+
+ tst r4,r3
+ add #1,r6
+
+ bf .L_shift_right
+
+ tst r7,r7
+ bt .L_sh_rt_1
+
+ shll r7
+ movt r1
+
+ add r1,r4
+
+ tst r7,r7
+ bf .L_sh_rt_1
+
+ ! Halfway between two numbers.
+ ! Round towards LSB = 0
+ shlr r4
+ shll r4
+
+.L_sh_rt_1:
+ mov r4,r0
+
+ ! Rounding may have misplaced MSB. Adjust.
+ and r3,r0
+ cmp/eq #0,r0
+
+ bf .L_shift_right
+ bt .L_pack_sf
+
+.L_return:
+ rts
+ mov r4,r0
+
+ .align 2
+.L_set_24_bit:
+ .long 0x00800000
+
+.L_set_24_bits:
+ .long 0x00FFFFFF
+
+ENDFUNC (GLOBAL (floatunsisf))
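Reviewer note (not part of the patch): the .L_shift_right path above
implements round-to-nearest, ties-to-even for integers needing more than 24
significand bits. An illustrative C sketch (the name `my_floatunsisf` is a
placeholder):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Convert uint32 to binary32.  Small values shift left into the 24-bit
   significand; wide values shift right with round-to-nearest, ties to
   even, carrying into the exponent if rounding overflows 24 bits.  */
static float my_floatunsisf(uint32_t x)
{
    if (x == 0)
        return 0.0f;
    int msb = 31;
    while (!(x & (1u << msb)))
        msb--;                          /* position of the leading 1 */
    int exp = msb;
    uint32_t mant;
    if (msb <= 23) {
        mant = x << (23 - msb);         /* exact: fits in 24 bits */
    } else {
        int sh = msb - 23;
        uint32_t rest = x & ((1u << sh) - 1);
        uint32_t half = 1u << (sh - 1);
        mant = x >> sh;
        if (rest > half || (rest == half && (mant & 1)))
            mant++;                     /* round to nearest, ties to even */
        if (mant & 0x01000000u) {       /* rounding carried out */
            mant >>= 1;
            exp++;
        }
    }
    uint32_t bits = ((uint32_t)(exp + 127) << 23) | (mant & 0x007fffffu);
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```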
Index: gcc/config/sh/IEEE-754/fixunsdfsi.S
===================================================================
--- gcc/config/sh/IEEE-754/fixunsdfsi.S (revision 0)
+++ gcc/config/sh/IEEE-754/fixunsdfsi.S (revision 0)
@@ -0,0 +1,181 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!conversion of double precision floating point number to unsigned integer
+!Author:Aanchal Khanna
+!
+!Entry:
+!r4,r5:operand
+!
+!Exit:
+!r0:result
+!
+!Note: the argument is passed in regs r4 and r5; the result is returned
+!in reg r0.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (fixunsdfsi)
+ FUNC (GLOBAL (fixunsdfsi))
+
+GLOBAL (fixunsdfsi):
+
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+ mov r5,r4
+ mov r1,r5
+#endif
+ mov.l .L_p_inf,r2
+ mov #-20,r1
+
+ mov r2,r7
+ mov.l .L_1023,r3
+
+ and r4,r2
+ shll r4
+
+ movt r6 ! r6 contains the sign bit
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r1,r2 ! r2 contains the exponent
+#else
+ SHLR20 (r2)
+#endif
+ shlr r4
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r1,r7
+#else
+ SHLR20 (r7)
+#endif
+ tst r6,r6
+ SL(bf, .L_epil,
+ mov #0,r0)
+
+ cmp/hi r2,r3 ! if exp < 1023,return 0
+ mov.l .L_high_mant,r1
+
+ SL(bt, .L_epil,
+ and r4,r1) ! r1 contains high mantissa
+
+ cmp/eq r2,r7 ! chk if exp is invalid
+ mov.l .L_1054,r7
+
+ bt .L_inv_exp
+ mov #11,r0
+
+ cmp/hi r7,r2 ! If exp > 1054,return maxint
+ sub r2,r7 !r7 contains the number of shifts
+
+ mov.l .L_21bit,r2
+ bt .L_ret_max
+
+ or r2,r1
+ mov r7,r3
+
+ shll8 r1
+ neg r7,r7
+
+ shll2 r1
+
+ shll r1
+ cmp/hi r3,r0
+
+ SL(bt, .L_lower_mant,
+ mov #21,r0)
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r7,r1
+#else
+.L_sh_loop:
+ tst r7,r7
+ bt .L_break
+ add #1,r7
+ bra .L_sh_loop
+ shlr r1
+
+.L_break:
+#endif
+ rts
+ mov r1,r0
+
+.L_lower_mant:
+ neg r0,r0
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r0,r5
+#else
+ SHLR21 (r5)
+#endif
+ or r5,r1 !pack lower and higher mantissas
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r7,r1
+#else
+.L_loop:
+ tst r7,r7
+ bt .L_break1
+ add #1,r7
+ bra .L_loop
+ shlr r1
+
+.L_break1:
+#endif
+ mov r1,r0
+.L_epil:
+ rts
+ nop
+
+.L_inv_exp:
+ cmp/hi r0,r5
+ bt .L_epil
+
+ cmp/hi r0,r1 !compare high mantissa,r1
+ bt .L_epil
+
+.L_ret_max:
+ mov.l .L_maxint,r0
+
+ rts
+ nop
+
+ .align 2
+
+.L_maxint:
+ .long 0xffffffff
+.L_p_inf:
+ .long 0x7ff00000
+.L_high_mant:
+ .long 0x000fffff
+.L_1023:
+ .long 0x000003ff
+.L_1054:
+ .long 1054
+.L_21bit:
+ .long 0x00100000
+
+ENDFUNC (GLOBAL (fixunsdfsi))
Index: gcc/config/sh/IEEE-754/adddf3.S
===================================================================
--- gcc/config/sh/IEEE-754/adddf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/adddf3.S (revision 0)
@@ -0,0 +1,799 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Routine for adding two double numbers
+
+! Author: Rakesh Kumar
+! SH1 Support by Joern Rennecke
+! Sticky Bit handling : Joern Rennecke
+
+! Arguments: r4-r5, r6-r7
+! Result: r0-r1
+
+! The value in r4-r5 is referred to as op1
+! and that in r6-r7 is referred to as op2
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (subdf3)
+ FUNC (GLOBAL (subdf3))
+ .global GLOBAL (adddf3)
+ FUNC (GLOBAL (adddf3))
+
+GLOBAL (subdf3):
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+ mov r6,r2
+
+ mov r5,r4
+ mov r7,r6
+
+ mov r1,r5
+ mov r2,r7
+#endif
+ mov.l .L_sign,r2
+ bra .L_adddf3_1
+ xor r2,r6
+
+GLOBAL (adddf3):
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+ mov r6,r2
+
+ mov r5,r4
+ mov r7,r6
+
+ mov r1,r5
+ mov r2,r7
+#endif
+
+.L_adddf3_1:
+ mov.l r8,@-r15
+ mov r4,r1
+
+ mov.l .L_inf,r2
+ mov r6,r3
+
+ mov.l r9,@-r15
+ and r2,r1 !Exponent of op1 in r1
+
+ mov.l r10,@-r15
+ and r2,r3 !Exponent of op2 in r3
+
+	! Check for NaN or Infinity
+ mov.l .L_sign,r9
+ cmp/eq r2,r1
+
+ mov r9,r10
+ bt .L_thread_inv_exp_op1
+
+ mov r9,r0
+ cmp/eq r2,r3
+! op1 has a valid exponent. We need not check it again.
+! Return op2 straight away.
+ and r4,r9 !r9 has sign bit for op1
+ bt .L_ret_op2
+
+ ! Check for -ve zero
+ cmp/eq r4,r0
+ and r6,r10 !r10 has sign bit for op2
+
+ bt .L_op1_nzero
+
+ cmp/eq r6,r0
+ bt .L_op2_nzero
+
+! Check for zero
+.L_non_zero:
+ tst r4,r4
+ bt .L_op1_zero
+
+ ! op1 is not zero, check op2 for zero
+ tst r6,r6
+ bt .L_op2_zero
+
+! r1 and r3 have the masked-out exponents, r9 and r10 the signs
+.L_add:
+ mov.l .L_high_mant,r8
+ mov #-20,r2
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r2,r1 ! r1 now has exponent for op1 in its lower bits
+#else
+ SHLR20 (r1)
+#endif
+ and r8,r6 ! Higher bits of mantissa of op2
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r2,r3 ! r3 has exponent for op2 in its lower bits
+#else
+ SHLR20 (r3)
+#endif
+ and r8,r4 ! Higher bits of mantissa of op1
+
+ mov.l .L_21bit,r8
+
+ tst r1,r1
+ bt .L_norm_op1
+
+ ! Set the 21st bit.
+ or r8,r4
+ tst r3,r3
+
+ bt .L_norm_op2
+ or r8,r6
+
+! Check for negative mantissas. Make them positive by negation
+! r9 and r10 have signs of op1 and op2 respectively
+.L_neg_mant:
+ tst r9,r9
+ bf .L_neg_op1
+
+ tst r10,r10
+ bf .L_neg_op2
+
+.L_add_1:
+ cmp/ge r1,r3
+
+ mov r1,r0
+ bt .L_op2_exp_greater
+
+ sub r3,r0
+ ! If exponent difference is greater than 54, the resultant exponent
+ ! won't be changed. Return op1 straight away.
+ mov #54,r2
+ cmp/gt r2,r0
+
+ bt .L_pack_op1
+
+ mov r1,r3
+ clrt
+
+ cmp/eq #0,r0
+ bt .L_add_mant
+
+	! Shift left the first operand and apply the rest of the shifts to
+	! the second operand.
+ mov #0,r2
+ shll r5
+
+ rotcl r4
+
+ add #-1,r3
+ dt r0
+
+ bt .L_add_mant
+ dt r0
+
+ bt LOCAL(got_guard)
+ dt r0
+
+ bt LOCAL(got_sticky)
+
+! Shift the mantissa part of op2 so that both exponents are equal
+.L_shfrac_op2:
+ shar r6
+ or r7,r2 ! sticky bit
+
+ rotcr r7
+ dt r0
+
+ bf .L_shfrac_op2
+
+ shlr r2
+
+ subc r2,r2 ! spread sticky bit across r2
+LOCAL(got_sticky):
+ shar r6
+
+ rotcr r7
+
+ rotcr r2
+LOCAL(got_guard):
+ shar r6
+
+ rotcr r7
+
+ rotcr r2
+
+
+! Add the positive mantissas and check for overflow by checking the
+! MSB of the result. In case of overflow, negate the result.
+.L_add_mant:
+ clrt
+ addc r7,r5
+
+ mov #0,r10 ! Assume resultant to be positive
+ addc r6,r4
+
+ cmp/pz r4
+
+ bt .L_mant_ptv
+ negc r2,r2
+
+ negc r5,r5
+
+ mov.l .L_sign,r10 ! The assumption was wrong, result is negative
+ negc r4,r4
+
+! 23rd bit in the high part of mantissa could be set.
+! In this case, right shift the mantissa.
+.L_mant_ptv:
+ mov.l .L_23bit,r0
+
+ tst r4,r0
+ bt .L_mant_ptv_0
+
+ shlr r4
+ rotcr r5
+
+ add #1,r3
+ bra .L_mant_ptv_1
+ rotcr r2
+
+.L_mant_ptv_0:
+ mov.l .L_22bit,r0
+ tst r4,r0
+
+ bt .L_norm_mant
+
+.L_mant_ptv_1:
+	! The 22nd bit of the resultant mantissa is set. Shift the mantissa
+	! right and add 1 to the exponent
+ add #1,r3
+ shlr r4
+ rotcr r5
+ ! The mantissa is already normalized. We don't need to
+ ! spend any effort. Branch to epilogue.
+ bra .L_epil
+ rotcr r2
+
+! Normalize operands
+.L_norm_op1:
+ shll r5
+
+ rotcl r4
+ add #-1,r1
+
+ tst r4,r8
+ bt .L_norm_op1
+
+ tst r3,r3
+ SL(bf, .L_neg_mant,
+ add #1,r1)
+
+.L_norm_op2:
+ shll r7
+
+ rotcl r6
+ add #-1,r3
+
+ tst r6,r8
+ bt .L_norm_op2
+
+ bra .L_neg_mant
+ add #1,r3
+
+! Negate the mantissa of op1
+.L_neg_op1:
+ clrt
+ negc r5,r5
+
+ negc r4,r4
+ tst r10,r10
+
+ bt .L_add_1
+
+! Negate the mantissa of op2
+.L_neg_op2:
+ clrt
+ negc r7,r7
+
+ bra .L_add_1
+ negc r6,r6
+
+! Thread the jump to .L_inv_exp_op1
+.L_thread_inv_exp_op1:
+ bra .L_inv_exp_op1
+ nop
+
+.L_ret_op2:
+ mov.l @r15+,r10
+#ifdef __LITTLE_ENDIAN__
+ mov r6,r1
+#else
+ mov r6,r0
+#endif
+
+ mov.l @r15+,r9
+#ifdef __LITTLE_ENDIAN__
+ mov r7,r0
+#else
+ mov r7,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+.L_op1_nzero:
+ tst r5,r5
+ bt .L_ret_op2
+
+ ! op1 is not zero. Check op2 for negative zero
+ cmp/eq r6,r0
+ bf .L_non_zero ! both op1 and op2 are not -0
+
+.L_op2_nzero:
+ tst r7,r7
+ bf .L_non_zero
+
+ mov.l @r15+,r10
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+#else
+ mov r4,r0 ! op2 is -0, return op1
+#endif
+
+ mov.l @r15+,r9
+#ifdef __LITTLE_ENDIAN__
+ mov r5,r0
+#else
+ mov r5,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+! High bit of op1 is known to be zero.
+! Check low bit. r2 contains 0x00000000
+.L_op1_zero:
+ tst r5,r5
+ bt .L_ret_op2
+
+ ! op1 is not zero. Check high bit of op2
+ tst r6,r6
+ bf .L_add ! both op1 and op2 are not zero
+
+! op1 is not zero. High bit of op2 is known to be zero.
+! Check low bit of op2. r2 contains 0x00000000
+.L_op2_zero:
+ tst r7,r7
+ bf .L_add
+
+ mov.l @r15+,r10
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+#else
+ mov r4,r0 ! op2 is zero, return op1
+#endif
+
+ mov.l @r15+,r9
+#ifdef __LITTLE_ENDIAN__
+ mov r5,r0
+#else
+ mov r5,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+! exp (op1) is smaller than or equal to exp (op2).
+! The same operations appear in .L_add; refer to it for comments.
+.L_op2_exp_greater:
+ mov r3,r0
+ sub r1,r0
+
+ mov #54,r2
+ cmp/gt r2,r0
+
+ bt .L_pack_op2
+
+ cmp/eq #0,r0
+ bt .L_add_mant
+
+ mov #0,r2
+ shll r7
+ rotcl r6
+ add #-1,r0
+ add #-1,r3
+
+ cmp/eq #0,r0
+ bt .L_add_mant
+.L_shfrac_op1:
+ add #-1,r0
+ shar r4
+
+ rotcr r5
+ rotcr r2
+
+ cmp/eq #0,r0
+ bf .L_shfrac_op1
+
+ bra .L_add_mant
+ nop
+
+! Return the value in op1
+.L_ret_op1:
+ mov.l @r15+,r10
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+#else
+ mov r4,r0
+#endif
+
+ mov.l @r15+,r9
+#ifdef __LITTLE_ENDIAN__
+ mov r5,r0
+#else
+ mov r5,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+! r1 has exp, r9 has sign, r4 and r5 mantissa
+.L_pack_op1:
+ mov.l .L_high_mant,r7
+ mov r4,r0
+
+ tst r9,r9
+ bt .L_pack_op1_1
+
+ clrt
+ negc r5,r5
+ negc r0,r0
+
+.L_pack_op1_1:
+ and r7,r0
+ mov r1,r3
+
+ mov #20,r2
+ mov r5,r1
+
+ mov.l @r15+,r10
+ or r9,r0
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r2,r3
+#else
+ SHLL20 (r3)
+#endif
+ mov.l @r15+,r9
+
+ or r3,r0
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+!r2 has exp, r10 has sign, r6 and r7 mantissa
+.L_pack_op2:
+ mov.l .L_high_mant,r9
+ mov r6,r0
+
+ tst r10,r10
+ bt .L_pack_op2_1
+
+ clrt
+ negc r7,r7
+ negc r0,r0
+
+.L_pack_op2_1:
+ and r9,r0
+ mov r7,r1
+
+ mov #20,r2
+ or r10,r0
+
+ mov.l @r15+,r10
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r2,r3
+#else
+ SHLL20 (r3)
+#endif
+
+ mov.l @r15+,r9
+
+ or r3,r0
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+! Normalize the mantissa by setting its 21st bit in the high part
+.L_norm_mant:
+ mov.l .L_21bit,r0
+
+ tst r4,r0
+ bf .L_epil
+
+ tst r4,r4
+ bf .L_shift_till_1
+
+ tst r5,r5
+ bf .L_shift_till_1
+
+ ! Mantissa is zero, return 0
+ mov.l @r15+,r10
+ mov #0,r0
+
+ mov.l @r15+,r9
+ mov.l @r15+,r8
+
+ rts
+ mov #0,r1
+
+! A loop for making the 21st bit 1 in the high part of the resultant
+! mantissa. It is already ensured that a 1 bit is present in the mantissa
+.L_shift_till_1:
+ clrt
+ shll r5
+
+ rotcl r4
+ add #-1,r3
+
+ tst r4,r0
+ bt .L_shift_till_1
+
+! Return the result. Mantissa is in r4-r5. Exponent is in r3
+! Sign bit in r10
+.L_epil:
+ cmp/pl r3
+
+ bf .L_denorm
+ mov.l LOCAL(x7fffffff),r0
+
+ mov r5,r1
+ shlr r1
+
+ mov #0,r1
+ addc r0,r2
+
+! Check extra MSB here
+ mov.l .L_22bit,r9
+ addc r1,r5 ! round to even
+
+ addc r1,r4
+ tst r9,r4
+
+ bf .L_epil_1
+
+.L_epil_0:
+ mov.l .L_21bit,r1
+
+ not r1,r1
+ and r1,r4
+
+ mov r4,r0
+ or r10,r0
+
+ mov.l @r15+,r10
+ mov #20,r2
+
+ mov.l @r15+,r9
+ mov r5,r1
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r2,r3
+#else
+ SHLL20 (r3)
+#endif
+ or r3,r0
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+.L_epil_1:
+ shlr r4
+ add #1,r3
+ bra .L_epil_0
+ rotcr r5
+
+.L_denorm:
+ add #-1,r3
+.L_denorm_1:
+ tst r3,r3
+ bt .L_denorm_2
+
+ shlr r4
+ rotcr r5
+
+ movt r1
+ bra .L_denorm_1
+ add #1,r3
+
+.L_denorm_2:
+ clrt
+ mov #0,r2
+ addc r1,r5
+
+ addc r2,r4
+ mov r4,r0
+
+ or r10,r0
+ mov.l @r15+,r10
+
+ mov r5,r1
+
+ mov.l @r15+,r9
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+! op1 is known to be positive infinity, and op2 is Inf. The sign
+! of op2 is not known. Return the appropriate value
+.L_op1_pinf_op2_inf:
+ mov.l .L_sign,r0
+ tst r6,r0
+
+ bt .L_ret_op2_1
+
+ ! op2 is negative infinity. Inf - Inf is being performed
+ mov.l .L_inf,r0
+ mov.l @r15+,r10
+ mov.l @r15+,r9
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r1
+#endif
+ mov.l @r15+,r8
+
+ rts
+#ifdef __LITTLE_ENDIAN__
+ mov #1,r0
+#else
+ mov #1,r1 ! Any value here will return NaN
+#endif
+
+.L_ret_op1_1:
+ mov.l @r15+,r10
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+#else
+ mov r4,r0
+#endif
+
+ mov.l @r15+,r9
+#ifdef __LITTLE_ENDIAN__
+ mov r5,r0
+#else
+ mov r5,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+.L_ret_op2_1:
+ mov.l @r15+,r10
+#ifdef __LITTLE_ENDIAN__
+ mov r6,r1
+#else
+ mov r6,r0
+#endif
+
+ mov.l @r15+,r9
+#ifdef __LITTLE_ENDIAN__
+ mov r7,r0
+#else
+ mov r7,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+! op1 is negative infinity. Check op2 for infinity or NaN
+.L_op1_ninf:
+ cmp/eq r2,r3
+ bf .L_ret_op1_1 ! op2 is neither NaN nor Inf
+
+ mov.l @r15+,r9
+ div0s r4,r6 ! different signs -> NaN
+ mov r4,DBLRH
+ or r6,DBLRH
+ mov.l @r15+,r8
+ SL(bf, 0f,
+ mov r5,DBLRL)
+ mov #-1,DBLRH ! return NaN.
+0: rts
+ or r7,DBLRL
+
+!r1 contains exponent for op1, r3 contains exponent for op2
+!r2 has .L_inf (+ve Inf)
+!op1 has invalid exponent. Either it contains NaN or Inf
+.L_inv_exp_op1:
+ ! Check if op1 is NaN
+ cmp/pl r5
+ bt .L_ret_op1_1
+
+ mov.l .L_high_mant,r0
+ and r4,r0
+
+ cmp/pl r0
+ bt .L_ret_op1_1
+
+ ! op1 is not NaN; it is infinity. Check its sign.
+ ! If op2 is NaN, return op2
+ cmp/pz r4
+
+ bf .L_op1_ninf
+
+ ! op2 is +ve infinity here
+ cmp/eq r2,r3
+ bf .L_ret_op1_1 ! op2 is neither NaN nor Inf
+
+ ! r2 is free now
+ mov.l .L_high_mant,r0
+ tst r6,r0 ! op2 also has invalid exponent
+
+ bf .L_ret_op2_1 ! op2 is NaN, return op2
+
+ tst r7,r7
+ bt .L_op1_pinf_op2_inf ! op2 is Infinity, and op1 is +Infinity
+ ! op2 is not infinity; it is NaN
+ bf .L_ret_op2_1
+
+ .align 2
+.L_high_mant:
+ .long 0x000FFFFF
+
+.L_21bits:
+ .long 0x001FFFFF
+
+.L_22bit:
+ .long 0x00200000
+
+.L_23bit:
+ .long 0x00400000
+
+.L_21bit:
+ .long 0x00100000
+
+.L_sign:
+ .long 0x80000000
+
+.L_inf:
+ .long 0x7ff00000
+
+LOCAL(x7fffffff): .long 0x7fffffff
+
+ENDFUNC (GLOBAL (subdf3))
+ENDFUNC (GLOBAL (adddf3))
Index: gcc/config/sh/IEEE-754/floatsisf.S
===================================================================
--- gcc/config/sh/IEEE-754/floatsisf.S (revision 0)
+++ gcc/config/sh/IEEE-754/floatsisf.S (revision 0)
@@ -0,0 +1,200 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Conversion of signed integer to single precision floating point
+
+! Author: Rakesh Kumar
+
+! Argument: r4
+! Result: r0
+
+! r4 is referred to as op1
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
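The shift-and-round steps performed below can be modelled in C. This is an illustrative sketch of the algorithm (round to nearest, ties to even), not the libgcc source; the function name is invented.

```c
#include <stdint.h>
#include <assert.h>

/* Illustrative C model of the shift-and-round conversion above
   (round to nearest, ties to even).  Not the libgcc source; the
   function name is invented. */
uint32_t int_to_float_bits(int32_t n)
{
    if (n == 0)
        return 0;
    uint32_t sign = n < 0 ? 0x80000000u : 0u;
    uint32_t mag  = n < 0 ? 0u - (uint32_t)n : (uint32_t)n;

    int msb = 31;                       /* find the highest set bit */
    while (!(mag & (1u << msb)))
        msb--;

    int exp = msb;                      /* unbiased exponent */
    uint32_t mant;
    if (msb <= 23) {
        mant = mag << (23 - msb);       /* shift left, no rounding needed */
    } else {
        int sh = msb - 23;              /* shift right and round */
        uint32_t rest = mag & ((1u << sh) - 1);
        uint32_t half = 1u << (sh - 1);
        mant = mag >> sh;
        if (rest > half || (rest == half && (mant & 1)))
            mant++;                     /* round to nearest, ties to even */
        if (mant & 0x01000000u) {       /* rounding carried past bit 23 */
            mant >>= 1;
            exp++;
        }
    }
    return sign | ((uint32_t)(exp + 127) << 23) | (mant & 0x007FFFFFu);
}
```

Note how -2147483648 falls out naturally as 0xCF000000, the constant the assembly keeps in `.L_min_val`.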
+
+ .text
+ .align 5
+ .global GLOBAL (floatsisf)
+ FUNC (GLOBAL (floatsisf))
+
+GLOBAL (floatsisf):
+ mov.l .L_sign,r2
+ mov #23,r6
+
+ ! Check for zero
+ tst r4,r4
+ mov.l .L_24_bits,r7
+
+ ! Extract sign
+ and r4,r2
+ bt .L_ret
+
+ ! Is the number negative?
+ mov.l .L_imp_bit,r5
+ cmp/pl r4
+
+ not r7,r3
+ bf .L_neg
+
+ ! Decide the direction for shifting
+ cmp/gt r7,r4
+ mov r4,r0
+
+ and r5,r0
+ bt .L_shr_0
+
+ ! Number may already be in normalized form
+ cmp/eq #0,r0
+ bf .L_pack
+
+! Shift the bits to the left. Adjust the exponent
+.L_shl:
+ shll r4
+ mov r4,r0
+
+ and r5,r0
+ cmp/eq #0,r0
+
+ SL(bt, .L_shl,
+ add #-1,r6)
+
+! Pack the value in floating point format.
+! r6 has unbiased exponent, r4 has mantissa, r2 has sign
+.L_pack:
+ mov #23,r3
+ not r5,r5
+
+ mov r2,r0
+ add #127,r6
+
+ and r5,r4
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLL23 (r6)
+#else
+ shld r3,r6
+#endif
+
+ or r6,r0
+ rts
+ or r4,r0
+
+! Negate the number
+.L_neg:
+ ! Take care for -2147483648.
+ mov r4,r0
+ shll r0
+
+ cmp/eq #0,r0
+ SL(bt, .L_ret_min,
+ neg r4,r4)
+
+ cmp/gt r7,r4
+ bt .L_shr_0
+
+ mov r4,r0
+ and r5,r0
+
+ cmp/eq #0,r0
+ bf .L_pack
+ bt .L_shl
+
+.L_shr_0:
+ mov #0,r1
+
+! Shift right the number with rounding
+.L_shr:
+ shlr r4
+ movt r7
+
+ tst r7,r7
+
+ ! Count number of ON bits shifted
+ bt .L_shr_1
+ add #1,r1
+
+.L_shr_1:
+ mov r4,r0
+ add #1,r6
+
+ and r3,r0
+ cmp/eq #0,r0
+
+ ! Add MSB of shifted bits
+ bf .L_shr
+ add r7,r4
+
+ tst r7,r7
+ bt .L_pack
+
+.L_pack1:
+ mov #1,r0
+ cmp/eq r1,r0
+
+ bt .L_rnd
+ mov r4,r0
+
+ ! Rounding may have misplaced MSB. Adjust.
+ and r3,r0
+ cmp/eq #0,r0
+
+ bf .L_shr
+ bt .L_pack
+
+! If only MSB of shifted bits is ON, we are halfway
+! between two numbers. Round towards even LSB of
+! resultant mantissa.
+.L_rnd:
+ shlr r4
+ bra .L_pack
+ shll r4
+
+.L_ret:
+ rts
+ mov r4,r0
+
+! Return value for -2147483648
+.L_ret_min:
+ mov.l .L_min_val,r0
+ rts
+ nop
+
+ .align 2
+.L_sign:
+ .long 0x80000000
+
+.L_imp_bit:
+ .long 0x00800000
+
+.L_24_bits:
+ .long 0x00FFFFFF
+
+.L_nsign:
+ .long 0x7FFFFFFF
+
+.L_min_val:
+ .long 0xCF000000
+
+ENDFUNC (GLOBAL (floatsisf))
Index: gcc/config/sh/IEEE-754/muldf3.S
===================================================================
--- gcc/config/sh/IEEE-754/muldf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/muldf3.S (revision 0)
@@ -0,0 +1,601 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!multiplication of two double precision floating point numbers
+!Author:Aanchal Khanna
+!SH1 Support / Simplifications: Joern Rennecke
+!
+!Entry:
+!r4,r5:operand 1
+!
+!r6,r7:operand 2
+!
+!Exit:
+!r0,r1:result
+!
+!Notes: argument 1 is passed in regs r4 and r5, argument 2 is passed in regs
+!r6 and r7, and the result is returned in regs r0 and r1. Operand 1 is
+!referred to as op1 and operand 2 as op2.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
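The mantissa multiplication below builds a 106-bit product from four 32x32->64 partial products (the DMULUL/DMULUH pairs with addc carry chains). A hedged C sketch of the same partial-product scheme, with an invented name:

```c
#include <stdint.h>
#include <assert.h>

/* Sketch of the partial-product scheme used below: a full 64x64->128
   multiply built from four 32x32->64 multiplies, as the assembly does
   with DMULUL/DMULUH and addc carry chains.  The name is invented. */
void mul64x64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    uint32_t a0 = (uint32_t)a, a1 = (uint32_t)(a >> 32);
    uint32_t b0 = (uint32_t)b, b1 = (uint32_t)(b >> 32);

    uint64_t p00 = (uint64_t)a0 * b0;   /* bits   0..63  */
    uint64_t p01 = (uint64_t)a0 * b1;   /* bits  32..95  */
    uint64_t p10 = (uint64_t)a1 * b0;   /* bits  32..95  */
    uint64_t p11 = (uint64_t)a1 * b1;   /* bits  64..127 */

    /* Middle column with carries, like the addc chains in the asm. */
    uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;

    *lo = (mid << 32) | (uint32_t)p00;
    *hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}
```

For two 53-bit mantissas the top 22 bits of `hi` are always zero, which is why the assembly only needs bits 96-105 of the product.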
+ .text
+ .align 5
+ .global GLOBAL (muldf3)
+ FUNC (GLOBAL (muldf3))
+
+GLOBAL (muldf3):
+
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+ mov r5,r4
+ mov r1,r5
+
+ mov r6,r1
+ mov r7,r6
+ mov r1,r7
+#endif
+ mov.l .L_mask_sign,r0
+ mov r4,r2
+
+ and r0,r2
+ mov #0,r1
+
+ shll r4
+ and r6,r0
+
+ xor r2,r0 !r0 contains the result's sign bit
+ shlr r4
+
+ mov.l .L_inf,r2
+ shll r6
+
+ mov r4,r3
+ shlr r6
+
+.L_chk_a_inv:
+ !chk if op1 is Inf/NaN
+ and r2,r3
+ mov.l r8,@-r15
+
+ cmp/eq r3,r2
+ mov.l .L_mask_high_mant,r8
+
+ mov r2,r3
+ bf .L_chk_b_inv
+
+ mov r8,r3
+ and r4,r8
+
+ cmp/hi r1,r8
+ bt .L_return_a !op1 NaN, return op1
+
+ cmp/hi r1,r5
+ mov r2,r8
+
+ bt .L_return_a !op1 NaN, return op1
+ and r6,r8
+
+ cmp/eq r8,r2
+ and r6,r3
+
+ bt .L_b_inv
+ cmp/eq r1,r6
+
+ bf .L_return_a ! op1 Inf, op2 normal number: return op1
+ cmp/eq r1,r7
+
+ bf .L_return_a ! op1 Inf, op2 normal number: return op1
+ mov.l @r15+,r8
+
+ rts
+ mov #-1,DBLRH ! op1=Inf, op2=0, return NaN
+
+.L_b_inv:
+ !op2 is NaN/Inf
+ cmp/hi r1,r7
+ mov r1,r2
+
+ mov r5,r1
+ bt .L_return_b !op2=NaN,return op2
+
+ cmp/hi r2,r6
+ or r4,r0
+
+ bt .L_return_b !op2=NaN,return op2
+ mov.l @r15+,r8
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+
+ rts !op1=Inf,op2=Inf,return Inf with sign
+ nop
+
+.L_chk_b_inv:
+ !Chk if op2 is NaN/Inf
+ and r6,r2
+ cmp/eq r3,r2
+
+ bf .L_chk_a_for_zero
+ and r6,r8
+
+ cmp/hi r1,r8
+ bt .L_return_b !op2=NaN,return op2
+
+ cmp/hi r1,r7
+ bt .L_return_b !op2=NaN,return op2
+
+ cmp/eq r5,r1
+ bf .L_return_b !op1=normal number,op2=Inf,return Inf
+
+ mov r7,r1
+ cmp/eq r4,r1
+
+ bf .L_return_b ! op1=normal number, op2=Inf, return Inf
+ mov.l @r15+,r8
+
+ rts
+ mov #-1,DBLRH !op1=0,op2=Inf,return NaN
+
+.L_return_a:
+ mov r5,r1
+ or r4,r0
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+.L_return_b:
+ mov r7,r1
+ or r6,r0
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+.L_chk_a_for_zero:
+ !Chk if op1 is zero
+ cmp/eq r1,r4
+ bf .L_chk_b_for_zero
+
+ cmp/eq r1,r5
+ bf .L_chk_b_for_zero
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+.L_chk_b_for_zero:
+ !op1=0,chk if op2 is zero
+ cmp/eq r1,r6
+ mov r1,r3
+
+ mov.l .L_inf,r1
+ bf .L_normal_nos
+
+ cmp/eq r3,r7
+ bf .L_normal_nos
+
+ mov r3,r1
+ mov.l @r15+,r8
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ nop
+
+.L_normal_nos:
+ !op1 and op2 are normal numbers
+ mov.l r9,@-r15
+ mov r4,r3
+
+ mov #-20,r9
+ and r1,r3
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r9,r2
+#else
+ SHLR20 (r2)
+#endif
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r9,r3
+#else
+ SHLR20 (r3)
+#endif
+ cmp/pl r3
+
+ bf .L_norm_a !normalize op1
+.L_chk_b:
+ cmp/pl r2
+ bf .L_norm_b !normalize op2
+
+.L_mul1:
+ add r3,r2
+ mov.l .L_1023,r1
+
+ !resultant exponent in r2
+ add r1,r2
+ mov.l .L_2047,r1
+
+ !Chk the exponent for overflow
+ cmp/ge r1,r2
+ and r8,r4
+
+ bt .L_return_inf
+ mov.l .L_imp_bit,r1
+
+ or r1,r4
+ and r8,r6
+
+ or r1,r6
+ clrt
+
+ !multiplying the mantissas
+ DMULU_SAVE
+ DMULUL (r7,r5,r1) !bits 0-31 of product
+
+ DMULUH (r3)
+
+ DMULUL (r4,r7,r8)
+
+ addc r3,r8
+
+ DMULUH (r3)
+
+ movt r9
+ clrt
+
+ DMULUL (r5,r6,r7)
+
+ addc r7,r8 !bits 63-32 of product
+
+ movt r7
+ add r7,r9
+
+ DMULUH (r7)
+
+ add r7,r3
+
+ add r9,r3
+ clrt
+
+ DMULUL (r4,r6,r7)
+
+ addc r7,r3 !bits 64-95 of product
+
+ DMULUH (r7)
+ DMULU_RESTORE
+
+ mov #0,r5
+ addc r5,r7 !bits 96-105 of product
+
+ cmp/eq r5,r1
+ mov #1,r4
+
+ bt .L_skip
+ or r4,r8
+.L_skip:
+ mov.l .L_106_bit,r4
+ mov r8,r9
+
+.L_chk_extra_msb:
+ !chk if extra MSB is generated
+ and r7,r4
+ cmp/eq r5,r4
+
+ mov #12,r4
+ SL(bf, .L_shift_rt_by_1,
+ mov #31,r5)
+
+.L_pack_mantissa:
+ !scale the mantissa to 53 bits
+ mov #-19,r6
+ mov.l .L_mask_high_mant,r5
+
+ SHLRN (19, r6, r8)
+
+ and r3,r5
+
+ shlr r8
+ movt r1
+
+ SHLLN (12, r4, r5)
+
+ add #-1,r6
+
+ or r5,r8 !lower bits of resulting mantissa
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r6,r3
+#else
+ SHLR20 (r3)
+#endif
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r4,r7
+#else
+ SHLL12 (r7)
+#endif
+ clrt
+
+ or r7,r3 !higher bits of resulting mantissa
+ mov #0,r7
+
+ !chk the exponent for underflow
+ cmp/ge r2,r7
+ bt .L_underflow
+
+ addc r1,r8 !rounding
+ mov r8,r1
+
+ addc r7,r3 !rounding
+ mov.l .L_mask_22_bit,r5
+
+ and r3,r5
+ !chk if extra msb is generated after rounding
+ cmp/eq r7,r5
+
+ mov.l .L_mask_high_mant,r8
+ bt .L_pack_result
+
+ add #1,r2
+ mov.l .L_2047,r6
+
+ cmp/ge r6,r2
+
+ bt .L_return_inf
+ shlr r3
+
+ rotcr r1
+
+.L_pack_result:
+ !pack the result, r2=exponent, r3=higher mantissa, r1=lower mantissa
+ !r0=sign bit
+ mov #20,r6
+ and r8,r3
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r6,r2
+#else
+ SHLL20 (r2)
+#endif
+ or r3,r0
+
+ or r2,r0
+ mov.l @r15+,r9
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+.L_norm_a:
+ !normalize op1
+ shll r5
+ mov.l .L_imp_bit,r1
+
+ rotcl r4
+ add #-1,r3
+
+ tst r1,r4
+ bt .L_norm_a
+
+ bra .L_chk_b
+ add #1,r3
+
+.L_norm_b:
+ !normalize op2
+ shll r7
+ mov.l .L_imp_bit,r1
+
+ rotcl r6
+ add #-1,r2
+
+ tst r1,r6
+ bt .L_norm_b
+
+ bra .L_mul1
+ add #1,r2
+
+.L_shift_rt_by_1:
+ !adjust the extra msb
+
+ add #1,r2 !add 1 to exponent
+ mov.l .L_2047,r6
+
+ cmp/ge r6,r2
+ mov #20,r6
+
+ bt .L_return_inf
+ shlr r7 !r7 contains bit 96-105 of product
+
+ rotcr r3 !r3 contains bit 64-95 of product
+
+ rotcr r8 !r8 contains bit 32-63 of product
+ bra .L_pack_mantissa
+
+ rotcr r1 !r1 contains bit 31-0 of product
+
+.L_return_inf:
+ !return Inf
+ mov.l .L_inf,r2
+ mov #0,r1
+
+ or r2,r0
+ mov.l @r15+,r9
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+.L_underflow:
+ !check if the result needs to be denormalized
+ mov #-53,r1
+ add #1,r2
+
+ cmp/gt r2,r1
+ mov #32,r4
+
+ add #-2,r2
+ bt .L_return_zero
+
+ add r2,r4
+ mov r7,r1
+
+ cmp/ge r7,r4
+ mov r2,r6
+
+ mov #-54,r2
+ bt .L_denorm
+
+ mov #-32,r6
+
+.L_denorm:
+ !denormalize the result
+ shlr r8
+ rotcr r1
+
+ shll r8
+ add #1,r6
+
+ shlr r3
+ rotcr r8
+
+ cmp/eq r7,r6
+ bf .L_denorm
+
+ mov r4,r6
+ cmp/eq r2,r4
+
+ bt .L_break
+ mov r7,r5
+
+ cmp/gt r6,r7
+ bf .L_break
+
+ mov r2,r4
+ mov r1,r5
+
+ mov r7,r1
+ bt .L_denorm
+
+.L_break:
+ mov #0,r2
+
+ cmp/gt r1,r2
+
+ addc r2,r8
+ mov.l .L_comp_1,r4
+
+ addc r7,r3
+ or r3,r0
+
+ cmp/eq r9,r7
+ bf .L_return
+
+ cmp/eq r7,r5
+ mov.l .L_mask_sign,r6
+
+ bf .L_return
+ cmp/eq r1,r6
+
+ bf .L_return
+ and r4,r8
+
+.L_return:
+ mov.l @r15+,r9
+ mov r8,r1
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+.L_return_zero:
+ mov.l @r15+,r9
+ mov r7,r1
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+ .align 2
+
+.L_mask_high_mant:
+ .long 0x000fffff
+.L_inf:
+ .long 0x7ff00000
+.L_mask_sign:
+ .long 0x80000000
+.L_1023:
+ .long -1023
+.L_2047:
+ .long 2047
+.L_imp_bit:
+ .long 0x00100000
+.L_mask_22_bit:
+ .long 0x00200000
+.L_106_bit:
+ .long 0x00000200
+.L_comp_1:
+ .long 0xfffffffe
+
+ENDFUNC (GLOBAL (muldf3))
Index: gcc/config/sh/IEEE-754/fixdfsi.S
===================================================================
--- gcc/config/sh/IEEE-754/fixdfsi.S (revision 0)
+++ gcc/config/sh/IEEE-754/fixdfsi.S (revision 0)
@@ -0,0 +1,200 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!conversion of double precision floating point number to signed integer
+!Author:Aanchal Khanna
+!
+!Entry:
+!r4,r5:operand
+!
+!Exit:
+!r0:result
+!
+!Note:argument is passed in regs r4 and r5, the result is returned in
+!reg r0.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
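The exponent tests below (against 1023 and 1053) can be read more easily in a C model. This is an illustrative sketch of the truncation, not the libgcc source; the function name is invented.

```c
#include <stdint.h>
#include <assert.h>

/* Illustrative C model of the conversion below: IEEE-754 double bits
   (hi word, lo word) to a signed 32-bit integer, truncating toward
   zero and saturating out-of-range values.  The name is invented. */
int32_t double_bits_to_int32(uint32_t hi, uint32_t lo)
{
    uint32_t sign = hi >> 31;
    int      exp  = (hi >> 20) & 0x7FF;
    uint64_t mant = ((uint64_t)(hi & 0x000FFFFFu) << 32) | lo;

    if (exp < 1023)
        return 0;                       /* |x| < 1 truncates to 0 */
    if (exp == 0x7FF && mant)
        return 0;                       /* NaN: the routine returns 0 */
    if (exp > 1053)                     /* too large, includes +/-Inf */
        return sign ? INT32_MIN : INT32_MAX;
    mant |= (uint64_t)1 << 52;          /* restore the implicit bit */
    uint32_t r = (uint32_t)(mant >> (52 - (exp - 1023)));
    return sign ? -(int32_t)r : (int32_t)r;
}
```

The 1053 cutoff is 1023 + 30: any biased exponent above that puts the value outside the signed 32-bit range.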
+
+ .text
+ .align 5
+ .global GLOBAL (fixdfsi)
+ FUNC (GLOBAL (fixdfsi))
+
+GLOBAL (fixdfsi):
+
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+ mov r5,r4
+ mov r1,r5
+
+#endif
+ mov.l .L_p_inf,r2
+ mov #-20,r1
+
+ mov r2,r7
+ mov.l .L_1023,r3
+
+ and r4,r2
+ shll r4
+
+ movt r6 ! r6 contains the sign bit
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r1,r2 ! r2 contains the exponent
+#else
+ SHLR20 (r2)
+#endif
+ shlr r4
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r1,r7
+#else
+ SHLR20 (r7)
+#endif
+ cmp/hi r2,r3 ! if exp < 1023,return 0
+ mov.l .L_mask_high_mant,r1
+
+ SL(bt, .L_epil,
+ mov #0,r0)
+ and r4,r1 ! r1 contains high mantissa
+
+ cmp/eq r2,r7 ! chk if exp is invalid
+ mov.l .L_1053,r7
+
+ bt .L_inv_exp
+ mov #11,r0
+
+ cmp/hi r7,r2 ! If exp > 1053,return maxint
+ sub r2,r7
+
+ mov.l .L_21bit,r2
+ SL(bt, .L_ret_max,
+ add #1,r7) ! r7 contains the number of shifts
+
+ or r2,r1
+ mov r7,r3
+ shll8 r1
+
+ neg r7,r7
+ shll2 r1
+
+ shll r1
+ cmp/hi r3,r0
+
+ !chk if the result can be made only from higher mantissa
+ SL(bt, .L_lower_mantissa,
+ mov #21,r0)
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r7,r1
+#else
+.L_loop:
+ tst r7,r7
+ bt .L_break1
+ add #1,r7
+ bra .L_loop
+ shlr r1
+
+.L_break1:
+#endif
+ tst r6,r6
+ SL(bt, .L_epil,
+ mov r1,r0)
+
+ rts
+ neg r0,r0
+
+.L_lower_mantissa:
+ !result is made from lower mantissa also
+ neg r0,r0
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r0,r5
+#else
+ SHLR21 (r5)
+#endif
+
+ or r5,r1 !pack lower and higher mantissas
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r7,r1
+#else
+.L_sh_loop:
+ tst r7,r7
+ bt .L_break
+ add #1,r7
+ bra .L_sh_loop
+ shlr r1
+
+.L_break:
+#endif
+ mov r1,r0
+ bra .L_chk_sign
+ nop
+
+.L_epil:
+ rts
+ nop
+
+.L_inv_exp:
+ cmp/hi r0,r5
+ bt .L_epil
+
+ cmp/hi r0,r1 !compare high mantissa,r1
+ bt .L_epil
+
+.L_ret_max:
+ mov.l .L_maxint,r0
+ tst r6,r6
+ bt .L_epil
+
+ rts
+ add #1,r0
+
+.L_chk_sign:
+ tst r6,r6 ! if sign bit is set, number is negative
+ bt .L_epil
+
+ rts
+ neg r0,r0
+
+ .align 2
+
+.L_maxint:
+ .long 0x7fffffff
+.L_p_inf:
+ .long 0x7ff00000
+.L_mask_high_mant:
+ .long 0x000fffff
+.L_1023:
+ .long 0x000003ff
+.L_1053:
+ .long 1053
+.L_21bit:
+ .long 0x00100000
+
+ENDFUNC (GLOBAL (fixdfsi))
Index: gcc/config/sh/IEEE-754/divsf3.S
===================================================================
--- gcc/config/sh/IEEE-754/divsf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/divsf3.S (revision 0)
@@ -0,0 +1,398 @@
+/* Copyright (C) 2004, 2006, 2010 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!divides two single precision floating point numbers
+
+! Author: Aanchal Khanna
+
+! Arguments: Dividend is in r4, divisor in r5
+! Result: r0
+
+! r4 and r5 are referred to as op1 and op2 respectively.
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
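The division loop below (`.L_loop`) produces one quotient bit per iteration by compare, subtract, and shift. A hedged C sketch of the same restoring-division step on 24-bit mantissas; the caller is assumed to have pre-shifted the numerator (and adjusted the exponent) when it is smaller than the denominator, as the assembly does, so the quotient here may be unnormalized. The name is invented.

```c
#include <stdint.h>
#include <assert.h>

/* Sketch of the restoring-division loop below: one quotient bit per
   iteration, dividing 24-bit mantissas (implicit bit 23 set).  The
   name is invented. */
uint32_t div_mantissa(uint32_t num, uint32_t den, int *exact)
{
    uint32_t q = 0;
    for (uint32_t bit = 1u << 23; bit; bit >>= 1) {
        if (num >= den) {               /* subtract and set quotient bit */
            q |= bit;
            num -= den;
        }
        num <<= 1;                      /* bring down the next bit */
    }
    *exact = (num == 0);                /* nonzero remainder drives rounding */
    return q;
}
```

The remainder test at the end corresponds to the rounding comparisons (`cmp/gt r5,r4` and `cmp/eq r4,r5`) after the assembly loop.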
+
+ .text
+ .align 5
+ .global GLOBAL (divsf3)
+ FUNC (GLOBAL (divsf3))
+
+GLOBAL (divsf3):
+ mov.l .L_mask_sign,r1
+ mov r4,r3
+
+ xor r5,r3
+ shll r4
+
+ shlr r4
+ mov.l .L_inf,r2
+
+ and r3,r1 !r1=resultant sign
+ mov r4,r6
+
+ shll r5
+ mov #0,r0
+
+ shlr r5
+ and r2,r6
+
+ cmp/eq r2,r6
+ mov r5,r7
+
+ and r2,r7
+ bt .L_op1_inv
+
+ cmp/eq r2,r7
+ mov #-23,r3
+
+ bt .L_op2_inv
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLR23 (r6)
+ SHLR23 (r7)
+#else
+ shld r3,r6
+ shld r3,r7
+#endif
+
+ cmp/eq r0,r4
+
+ bt .L_op1_zero !dividend=0
+ cmp/eq r0,r6
+
+ mov.l .L_imp_bit,r3
+ bt .L_norm_op1 !normalize dividend
+.L_chk_op2:
+ cmp/eq r0,r5
+ bt .L_op2_zero !divisor=0
+
+ cmp/eq r0,r7
+ bt .L_norm_op2 !normalize divisor
+
+.L_div1:
+ sub r7,r6
+ add #127,r6 !r6=resultant exponent
+
+ mov r3,r7
+ mov.l .L_mask_mant,r3
+
+ and r3,r4
+ !chk exponent for overflow
+ mov.l .L_255,r2
+
+ and r3,r5
+ or r7,r4
+
+ cmp/ge r2,r6
+ or r7,r5
+
+ bt .L_return_inf
+ mov r0,r2
+
+ cmp/eq r4,r5
+ bf .L_den_one
+
+ cmp/ge r6,r0
+ !numerator=denominator, quotient=1, remainder=0
+ mov r7,r2
+
+ mov r0,r4
+ !chk exponent for underflow
+ bt .L_underflow
+ bra .L_pack
+ nop
+
+.L_den_one:
+ !denominator=1, result=numerator
+
+ cmp/eq r7,r5
+ bf .L_divide
+
+ !chk exponent for underflow
+ cmp/ge r6,r0
+ mov r4,r2
+
+ SL(bt, .L_underflow,
+ mov r0,r4)
+ bra .L_pack
+ nop
+
+.L_divide:
+ !dividing the mantissas r4<-dividend, r5<-divisor
+
+ cmp/hi r4,r5
+ bf .L_loop
+
+ shll r4 ! if mantissa(op1)< mantissa(op2)
+ add #-1,r6 ! shift left the numerator and decrease the exponent.
+
+.L_loop:
+ !division loop
+
+ cmp/ge r5,r4
+ bf .L_skip
+
+ or r7,r2
+ sub r5,r4
+
+.L_skip:
+ shlr r7
+ shll r4
+
+ cmp/eq r0,r7
+ bf .L_loop
+
+ !chk the exponent for underflow
+ cmp/ge r6,r0
+ bt .L_underflow
+
+ !apply rounding
+ cmp/gt r5,r4
+ bt .L_round1
+
+ cmp/eq r4,r5
+ bt .L_round2
+
+.L_pack:
+ !pack the result, r1=sign, r2=quotient, r6=exponent
+
+ mov #23,r4
+ and r3,r2
+
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLL23 (r6)
+#else
+ shld r4,r6
+#endif
+ or r2,r1
+
+ or r6,r1
+ mov r1,r0
+
+ rts
+ nop
+
+.L_round1:
+ !Apply proper rounding
+
+ bra .L_pack
+ add #1,r2
+
+.L_round2:
+ !Apply proper rounding
+
+ mov.l .L_comp_1,r5
+ bra .L_pack
+ and r5,r2
+
+.L_op1_inv:
+ !chk if op1 is Inf or NaN
+
+ mov.l .L_mask_mant,r3
+ mov r4,r6
+
+ and r3,r6
+ cmp/hi r0,r6
+
+ bt .L_ret_NaN ! op1 is NaN, return NaN.
+ cmp/eq r2,r7
+
+ SL(bf, .L_return,
+ mov r4,r0) ! Inf/finite, return Inf
+
+ ! Inf/Inf or Inf/NaN, return NaN
+.L_ret_NaN:
+ rts
+ mov #-1,r0
+
+.L_op2_inv:
+ !chk if op2 is Inf or NaN
+
+ mov.l .L_mask_mant,r3
+ mov r5,r7
+
+ and r3,r7
+ cmp/hi r0,r7
+
+ bt .L_ret_op2
+ mov r1,r0
+
+ rts
+ nop
+
+.L_op1_zero:
+ !op1 is zero. If op2 is zero, return NaN, else return zero
+
+ cmp/eq r0,r5
+
+ bf .L_return
+
+ rts
+ mov #-1,r0
+
+.L_op2_zero:
+ !op2 is zero, return Inf
+
+ rts
+ or r2,r0
+
+.L_return_inf:
+ mov.l .L_inf,r0
+
+ rts
+ or r1,r0
+
+.L_norm_op1:
+ !normalize dividend
+
+ shll r4
+ tst r2,r4
+
+ add #-1,r6
+ bt .L_norm_op1
+
+ bra .L_chk_op2
+ add #1,r6
+
+.L_norm_op2:
+ !normalize divisor
+
+ shll r5
+ tst r2,r5
+
+ add #-1,r7
+ bt .L_norm_op2
+
+ bra .L_div1
+ add #1,r7
+
+.L_underflow:
+ !denormalize the result
+
+ add #1,r6
+ mov #-24,r7
+
+ cmp/gt r6,r7
+ mov r2,r5
+
+ bt .L_return_zero
+ add #-1,r6
+
+ mov #32,r3
+ neg r6,r7
+
+ add #1,r7
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r6,r2
+#else
+ cmp/ge r0,r6
+ bf .L_mov_right
+
+.L_mov_left:
+ cmp/eq r0,r6
+ bt .L_out
+
+ shll r2
+ bra .L_mov_left
+ add #-1,r6
+
+.L_mov_right:
+ cmp/eq r0,r6
+ bt .L_out
+
+ add #1,r6
+ bra .L_mov_right
+ shlr r2
+
+.L_out:
+#endif
+ sub r7,r3
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r3,r5
+#else
+ cmp/ge r0,r3
+ bf .L_mov_right_1
+
+.L_mov_left_1:
+ shll r5
+ add #-1,r3
+
+ cmp/eq r0,r3
+ bf .L_mov_left_1
+
+ bt .L_out_1
+
+.L_mov_right_1:
+ cmp/eq r0,r3
+ bt .L_out_1
+
+ add #1,r3
+ bra .L_mov_right_1
+ shlr r5
+
+.L_out_1:
+#endif
+ shlr r2
+ addc r0,r2
+
+ cmp/eq r4,r0 !r4 contains the remainder
+ mov r2,r0
+
+ mov.l .L_mask_sign,r7
+ bf .L_return
+
+ mov.l .L_comp_1,r2
+ cmp/eq r7,r5
+
+ bf .L_return
+ and r2,r0
+
+.L_return:
+.L_return_zero:
+ rts
+ or r1,r0
+
+.L_ret_op2:
+ rts
+ or r5,r0
+
+
+ .align 2
+.L_inf:
+ .long 0x7f800000
+.L_mask_sign:
+ .long 0x80000000
+.L_mask_mant:
+ .long 0x007fffff
+.L_imp_bit:
+ .long 0x00800000
+.L_comp_1:
+ .long 0xfffffffe
+.L_255:
+ .long 255
+
+ENDFUNC (GLOBAL (divsf3))
Index: gcc/config/sh/IEEE-754/fixunssfsi.S
===================================================================
--- gcc/config/sh/IEEE-754/fixunssfsi.S (revision 0)
+++ gcc/config/sh/IEEE-754/fixunssfsi.S (revision 0)
@@ -0,0 +1,155 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Conversion from floating point to unsigned integer
+
+! Author: Rakesh Kumar
+
+! Argument: r4 (in floating point format)
+! Result: r0
+
+! For negative floating point numbers, it returns zero
+
+! The argument is referred as op1
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
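The behaviour described above (negative inputs return zero, which deviates from the standard) is easiest to see in a C model. This is an illustrative sketch, not the libgcc source; the function name is invented.

```c
#include <stdint.h>
#include <assert.h>

/* Illustrative C model of the conversion below: single-precision bits
   to unsigned 32-bit integer.  Negative inputs and NaN return 0 and
   overflow returns UINT32_MAX, matching the routine.  The name is
   invented. */
uint32_t float_bits_to_uint32(uint32_t f)
{
    uint32_t abs = f & 0x7FFFFFFFu;
    if (f & 0x80000000u)
        return 0;                       /* negative -> 0 (libgcc choice) */
    if (abs > 0x7F800000u)
        return 0;                       /* NaN -> 0 */
    uint32_t exp = abs >> 23;
    if (exp < 127)
        return 0;                       /* |x| < 1 */
    if (exp > 158)
        return 0xFFFFFFFFu;             /* too large, includes Inf */
    uint32_t mant = (abs & 0x007FFFFFu) | 0x00800000u;
    int sh = (int)exp - 150;            /* value = mant * 2^sh */
    return sh >= 0 ? mant << sh : mant >> -sh;
}
```

158 is 127 + 31, the largest biased exponent whose value still fits in 32 unsigned bits, which is why the assembly keeps the constant in `.L_158`.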
+
+ .text
+ .align 5
+ .global GLOBAL (fixunssfsi)
+ FUNC (GLOBAL (fixunssfsi))
+
+GLOBAL (fixunssfsi):
+ mov.l .L_sign,r0
+ mov r4,r2
+
+ ! Check for NaN
+ mov.l .L_inf,r1
+ and r4,r0
+
+ mov.l .L_mask_sign,r7
+ mov #127,r5
+
+ ! Remove sign bit
+ cmp/eq #0,r0
+ and r7,r2
+
+ ! If number is negative, return 0
+ ! LIBGCC deviates from standard in this regard.
+ mov r4,r3
+ SL(bf, .L_epil,
+ mov #0,r0)
+
+ mov.l .L_frac,r6
+ cmp/gt r1,r2
+
+ shll r2
+ SL1(bt, .L_epil,
+ shlr16 r2)
+
+ shlr8 r2 ! r2 has exponent
+ mov.l .L_24bit,r1
+
+ and r6,r3 ! r3 has fraction
+ cmp/gt r2,r5
+
+ ! If exponent is less than 127, return 0
+ or r1,r3
+ bt .L_epil
+
+ ! Process only if exponent is less than 158
+ mov.l .L_158,r1
+ shll8 r3
+
+ cmp/gt r1,r2
+ sub r2,r1
+
+ neg r1,r1
+ bt .L_ret_max
+
+! Shift the mantissa with exponent difference from 158
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r1,r3
+#else
+ cmp/gt r0,r1
+ bt .L_mov_left
+
+.L_mov_right:
+ cmp/eq r1,r0
+ bt .L_ret
+
+ add #1,r1
+ bra .L_mov_right
+ shlr r3
+
+.L_mov_left:
+ add #-1,r1
+
+ shll r3
+ cmp/eq r1,r0
+
+ bf .L_mov_left
+
+.L_ret:
+#endif
+ rts
+ mov r3,r0
+
+! r0 already has appropriate value
+.L_epil:
+ rts
+ nop
+
+! Return the maximum unsigned integer value
+.L_ret_max:
+ mov.l .L_max,r3
+
+ rts
+ mov r3,r0
+
+ .align 2
+.L_inf:
+ .long 0x7F800000
+
+.L_158:
+ .long 158
+
+.L_max:
+ .long 0xFFFFFFFF
+
+.L_frac:
+ .long 0x007FFFFF
+
+.L_sign:
+ .long 0x80000000
+
+.L_24bit:
+ .long 0x00800000
+
+.L_mask_sign:
+ .long 0x7FFFFFFF
+
+ENDFUNC (GLOBAL (fixunssfsi))
Index: gcc/config/sh/IEEE-754/floatunssidf.S
===================================================================
--- gcc/config/sh/IEEE-754/floatunssidf.S (revision 0)
+++ gcc/config/sh/IEEE-754/floatunssidf.S (revision 0)
@@ -0,0 +1,76 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!conversion of unsigned integer to double precision floating point number
+!Author:Rakesh Kumar
+!Rewritten for SH1 support: Joern Rennecke
+!
+!Entry:
+!r4:operand
+!
+!Exit:
+!r0,r1:result
+!
+!Note:argument is passed in reg r4 and the result is returned in
+!regs r0 and r1.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
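Since every 32-bit unsigned value is exactly representable in double precision, the routine only has to normalize and pack; no rounding path exists. A hedged C model of the same steps, with an invented name:

```c
#include <stdint.h>
#include <assert.h>

/* C model of the conversion below: normalize so the most significant
   set bit becomes the implicit leading 1, then pack exponent and
   fraction.  No rounding is needed.  The name is invented. */
void uint32_to_double_bits(uint32_t n, uint32_t *hi, uint32_t *lo)
{
    if (n == 0) {
        *hi = 0;
        *lo = 0;
        return;
    }
    int exp = 1023 + 31;                /* exponent if the msb were bit 31 */
    while (!(n & 0x80000000u)) {        /* normalize, as the shll loop does */
        n <<= 1;
        exp--;
    }
    /* Bit 31 of n is now the implicit leading 1; drop it and place the
       remaining 31 fraction bits in the 52-bit fraction field. */
    uint64_t frac = ((uint64_t)(n & 0x7FFFFFFFu)) << 21;  /* 52 - 31 */
    *hi = ((uint32_t)exp << 20) | (uint32_t)(frac >> 32);
    *lo = (uint32_t)frac;
}
```

The assembly starts from the halfword 0x41f0 (bias + 32 in the top exponent bits) and subtracts 0x10 per normalization shift, which after the `shll16` is exactly the `exp--` above.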
+
+ .text
+ .align 5
+ .global GLOBAL (floatunsidf)
+ FUNC (GLOBAL (floatunsidf))
+
+GLOBAL (floatunsidf):
+ mov.w LOCAL(x41f0),DBLRH ! bias + 32
+ tst r4,r4 ! check for zero
+ bt .L_ret_zero
+.L_loop:
+ shll r4
+ SL(bf, .L_loop,
+ add #-16,DBLRH)
+
+ mov r4,DBLRL
+
+ SHLL20 (DBLRL)
+
+ shll16 DBLRH ! put exponent in proper place
+
+ SHLR12 (r4)
+
+ rts
+ or r4,DBLRH
+
+.L_ret_zero:
+ mov #0,r1
+ rts
+ mov #0,r0
+
+LOCAL(x41f0): .word 0x41f0
+ .align 2
+
+ENDFUNC (GLOBAL (floatunsidf))
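The normalize-and-pack sequence above can be modeled in C (a hypothetical sketch, not part of the patch; the function name is invented). Every 32-bit unsigned integer fits exactly into a double's 52-bit fraction, so the routine only needs to find the leading 1 bit and place the exponent and fraction fields; no rounding is involved.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of the routine: locate the leading 1 bit,
   derive the biased exponent from its position (1023 + 31 when bit 31
   leads), and pack the remaining bits into the 52-bit fraction. */
uint64_t floatunsidf_bits(uint32_t x)
{
    if (x == 0)
        return 0;                      /* +0.0 */
    int exp = 1023 + 31;               /* biased exponent if bit 31 leads */
    while (!(x & 0x80000000u)) {       /* normalize, like the shll loop */
        x <<= 1;
        exp--;
    }
    /* x now has the implicit 1 in bit 31; shifting left by 21 puts it
       at bit 52, where the mask drops it, leaving the 31 fraction bits
       aligned to the top of the 52-bit fraction field. */
    uint64_t frac = ((uint64_t)x << 21) & 0x000FFFFFFFFFFFFFull;
    return ((uint64_t)exp << 52) | frac;
}
```

Since the conversion is exact, the sketch can be checked directly against the hardware (or compiler-emulated) cast to double.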
Index: gcc/config/sh/IEEE-754/addsf3.S
===================================================================
--- gcc/config/sh/IEEE-754/addsf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/addsf3.S (revision 0)
@@ -0,0 +1,535 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Add floating point numbers in r4, r5.
+
+! Author: Rakesh Kumar
+
+! Arguments are in r4, r5 and result in r0
+
+! Entry points: ___subsf3, ___addsf3
+
+! r4 and r5 are referred as op1 and op2
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (subsf3)
+ .global GLOBAL (addsf3)
+ FUNC (GLOBAL (subsf3))
+ FUNC (GLOBAL (addsf3))
+
+GLOBAL (subsf3):
+ mov.l .L_sign_bit,r1
+ xor r1,r5
+
+GLOBAL (addsf3):
+ mov.l r8,@-r15
+ mov r4,r3
+
+ mov.l .L_pinf,r2
+ mov #0,r8
+
+ and r2,r3 ! op1's exponent.
+ mov r5,r6
+
+ ! Check NaN or Infinity
+ and r2,r6 ! op2's exponent.
+ cmp/eq r2,r3
+
+ ! go if op1 is NaN or INF.
+ mov.l .L_sign_bit,r0
+ SL(bt, .L_inv_op1,
+ mov #-23,r1)
+
+ ! Go if op2 is NaN/INF.
+ cmp/eq r2,r6
+ mov r0,r7
+ bt .L_ret_op2
+
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLR23 (r3)
+#else
+ shld r1,r3
+#endif
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLR23 (r6)
+#else
+ shld r1,r6
+#endif
+
+ ! Check for negative zero
+ cmp/eq r0,r5
+
+ mov r5,r1
+ SL(bt, .L_ret_op1,
+ and r7,r1)
+
+ cmp/eq r0,r4
+ bt .L_ret_op2
+
+ ! if op1 is zero return op2
+ tst r4,r4
+ bt .L_ret_op2
+
+ ! Equal numbers with opposite sign
+ mov r4,r2
+ xor r5,r2
+
+ cmp/eq r0,r2
+ bt .L_ret_zero
+
+ ! if op2 is zero return op1
+ mov.l .L_mask_fra,r2
+ tst r5,r5
+
+ ! Extract the mantissa
+ mov r4,r0
+ SL(bt, .L_ret_op1,
+ and r2,r5)
+
+ and r2,r4
+
+ mov.l .L_imp_bit,r2
+ and r7,r0 ! sign bit of op1
+
+ ! Check for denormals
+ tst r3,r3
+ bt .L_norm_op1
+
+ ! Attach the implicit bit
+ or r2,r4
+ tst r6,r6
+
+ bt .L_norm_op2
+
+ or r2,r5
+ tst r0,r0
+
+	! Are the operands positive or negative?
+ bt .L_ptv_op1
+
+ neg r4,r4
+
+.L_ptv_op1:
+ tst r1,r1
+ bt .L_ptv_op2
+
+ neg r5,r5
+
+! Test exponents for equality
+.L_ptv_op2:
+ cmp/eq r3,r6
+ bt .L_exp_eq
+
+! Make exponents of two arguments equal
+.L_exp_ne:
+ ! r0, r1 contain sign bits.
+ ! r4, r5 contain mantissas.
+ ! r3, r6 contain exponents.
+ ! r2, r7 scratch.
+
+ ! Calculate result exponent.
+ mov r6,r2
+ sub r3,r2 ! e2 - e1
+
+ cmp/pl r2
+ mov #23,r7
+
+ ! e2 - e1 is -ve
+ bf .L_exp_ne_1
+
+ mov r6,r3 ! Result exp.
+ cmp/gt r7,r2 ! e2-e1 > 23
+
+ mov #1,r7
+ bt .L_pack_op2_0
+
+ ! Align the mantissa
+.L_loop_ne:
+ shar r4
+
+ rotcr r8
+ cmp/eq r7,r2
+
+ add #-1,r2
+ bf .L_loop_ne
+
+ bt .L_exp_eq
+
+! Exponent difference is too high.
+! Return op2 after placing pieces in proper place
+.L_pack_op2_0:
+ ! If op1 is -ve
+ tst r1,r1
+ bt .L_pack_op2
+
+ neg r5,r5
+
+! r6 has exponent
+! r5 has mantissa, r1 has sign
+.L_pack_op2:
+ mov.l .L_nimp_bit,r2
+ mov #23,r3
+
+ mov r1,r0
+
+ and r2,r5
+ mov.l @r15+,r8
+
+ or r5,r0
+
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLL23 (r6)
+#else
+ shld r3,r6
+#endif
+ rts
+ or r6,r0
+
+! return op1. It is NAN or INF or op2 is zero.
+.L_ret_op1:
+ mov r4,r0
+
+ rts
+ mov.l @r15+,r8
+
+! return zero
+.L_ret_zero:
+ mov #0,r0
+
+ rts
+ mov.l @r15+,r8
+
+! return op2. It is NaN or INF or op1 is zero.
+.L_ret_op2:
+ mov r5,r0
+
+ rts
+ mov.l @r15+,r8
+
+! op2 is denormal. Normalize it.
+.L_norm_op2:
+ shll r5
+ add #-1,r6
+
+ tst r2,r5
+ bt .L_norm_op2
+
+ ! Check sign
+ tst r1,r1
+ bt .L_norm_op2_2
+
+ neg r5,r5
+
+.L_norm_op2_2:
+ add #1,r6
+ cmp/eq r3,r6
+
+ bf .L_exp_ne
+ bt .L_exp_eq
+
+! Normalize op1
+.L_norm_op1:
+ shll r4
+ add #-1,r3
+
+ tst r2,r4
+ bt .L_norm_op1
+
+ ! Check sign
+ tst r0,r0
+ bt .L_norm_op1_1
+
+ neg r4,r4
+
+.L_norm_op1_1:
+ ! Adjust biasing
+ add #1,r3
+
+ ! Check op2 for denormalized value
+ tst r6,r6
+ bt .L_norm_op2
+
+ mov.l .L_imp_bit,r2
+
+ tst r1,r1 ! Check sign
+ or r2,r5 ! Attach 24th bit
+
+ bt .L_norm_op1_2
+
+ neg r5,r5
+
+.L_norm_op1_2:
+ cmp/eq r3,r6
+
+ bt .L_exp_eq
+ bf .L_exp_ne
+
+! op1 is NaN or Inf
+.L_inv_op1:
+ ! Return op1 if it is NAN.
+ ! r2 is infinity
+ cmp/gt r2,r4
+ bt .L_ret_op1
+
+ ! op1 is +/- INF
+ ! If op2 is same return now.
+ cmp/eq r4,r5
+ bt .L_ret_op1
+
+ ! return op2 if it is NAN
+ cmp/gt r2,r5
+ bt .L_ret_op2
+
+ ! Check if op2 is inf
+ cmp/eq r2,r6
+ bf .L_ret_op1
+
+	! Both op1 and op2 are infinities
+	! of opposite signs, or there is a -NaN. Return a NaN.
+ mov.l @r15+,r8
+ rts
+ mov #-1,r0
+
+! Make unequal exponents equal.
+.L_exp_ne_1:
+ mov #-25,r7
+ cmp/gt r2,r7 ! -23 > e2 - e1
+
+ add #1,r2
+ bf .L_exp_ne_2
+
+ tst r0,r0
+ bt .L_pack_op1
+
+.L_pack_op1_0:
+ bra .L_pack_op1
+ neg r4,r4
+
+! Accumulate the shifted bits in r8
+.L_exp_ne_2:
+ ! Shift with rounding
+ shar r5
+ rotcr r8
+
+ tst r2,r2
+
+ add #1,r2
+ bf .L_exp_ne_2
+
+! Exponents of op1 and op2 are equal (or made so)
+! The mantissas are in r4-r5 and remaining bits in r8
+.L_exp_eq:
+ add r5,r4 ! Add fractions.
+ mov.l .L_sign_bit,r2
+
+ ! Check for negative result
+ mov #0,r0
+ tst r2,r4
+
+ mov.l .L_255,r5
+ bt .L_post_add
+
+ negc r8,r8
+ negc r4,r4
+ or r2,r0
+
+.L_post_add:
+ ! Check for extra MSB
+ mov.l .L_chk_25,r2
+
+ tst r2,r4
+ bt .L_imp_check
+
+ shar r4
+ rotcr r8
+
+ add #1,r3
+ cmp/ge r5,r3
+
+ ! Return Inf if exp > 254
+ bt .L_ret_inf
+
+! Check for implicit (24th) bit in result
+.L_imp_check:
+ mov.l .L_imp_bit,r2
+ tst r2,r4
+
+ bf .L_pack_op1
+
+! Result needs left shift
+.L_lft_shft:
+ shll r8
+ rotcl r4
+
+ add #-1,r3
+ tst r2,r4
+
+ bt .L_lft_shft
+
+! Pack the result after rounding
+.L_pack_op1:
+ ! See if denormalized result is possible
+ mov.l .L_chk_25,r5
+ cmp/pl r3
+
+ bf .L_denorm_res
+
+ ! Are there any bits shifted previously?
+ tst r8,r8
+ bt .L_pack_1
+
+ ! Round
+ shll r8
+ movt r6
+
+ add r6,r4
+
+ ! If we are halfway between two numbers,
+ ! round towards LSB = 0
+ tst r8,r8
+
+ bf .L_pack_1
+
+ shlr r4
+ shll r4
+
+.L_pack_1:
+ ! Adjust extra MSB generated after rounding
+ tst r4,r5
+ mov.l .L_255,r2
+
+ bt .L_pack_2
+ shar r4
+
+ add #1,r3
+ cmp/ge r2,r3 ! Check for exp overflow
+
+ bt .L_ret_inf
+
+! Pack it finally
+.L_pack_2:
+ ! Do not store implicit bit
+ mov.l .L_nimp_bit,r2
+ mov #23,r1
+
+ and r2,r4
+
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLL23 (r3)
+#else
+ shld r1,r3
+#endif
+ mov.l @r15+,r8
+
+ or r4,r0
+ rts
+ or r3,r0
+
+! Return infinity
+.L_ret_inf:
+ mov.l .L_pinf,r2
+
+ mov.l @r15+,r8
+ rts
+ or r2,r0
+
+! Result must be denormalized
+.L_denorm_res:
+ mov #0,r2
+
+! Denormalizing loop with rounding
+.L_den_1:
+ shar r4
+ movt r6
+
+ tst r3,r3
+ bt .L_den_2
+
+ ! Increment the exponent
+ add #1,r3
+
+ tst r6,r6
+ bt .L_den_0
+
+ ! Count number of ON bits shifted
+ add #1,r2
+
+.L_den_0:
+ bra .L_den_1
+ nop
+
+! Apply rounding
+.L_den_2:
+ cmp/eq r6,r1
+ bf .L_den_3
+
+ add r6,r4
+ mov #1,r1
+
+ ! If halfway between two numbers,
+ ! round towards LSB = 0
+ cmp/eq r2,r1
+ bf .L_den_3
+
+ shar r4
+ shll r4
+
+.L_den_3:
+
+ mov.l @r15+,r8
+ rts
+ or r4,r0
+
+ .align 2
+.L_imp_bit:
+ .long 0x00800000
+
+.L_nimp_bit:
+ .long 0xFF7FFFFF
+
+.L_mask_fra:
+ .long 0x007FFFFF
+
+.L_pinf:
+ .long 0x7F800000
+
+.L_sign_bit:
+ .long 0x80000000
+
+.L_bit_25:
+ .long 0x01000000
+
+.L_chk_25:
+ .long 0x7F000000
+
+.L_255:
+ .long 0x000000FF
+
+ENDFUNC (GLOBAL (addsf3))
+ENDFUNC (GLOBAL (subsf3))
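The two rounding steps above ("Round", then "If we are halfway between two numbers, round towards LSB = 0") together implement round-to-nearest-even on the bits accumulated in r8. A minimal C sketch of that step (helper name hypothetical, assuming 1 <= shift <= 31):

```c
#include <assert.h>
#include <stdint.h>

/* Round-to-nearest-even: discard the low `shift` bits of `mant`,
   rounding up when the discarded part exceeds half an ULP, and
   breaking the exact tie towards an even result. */
uint32_t round_shift_rne(uint32_t mant, int shift)
{
    uint32_t lost = mant & ((1u << shift) - 1); /* discarded bits */
    uint32_t half = 1u << (shift - 1);          /* half an ULP */
    mant >>= shift;
    if (lost > half || (lost == half && (mant & 1u)))
        mant++;                      /* round up; ties go to even LSB */
    return mant;
}
```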
Index: gcc/config/sh/IEEE-754/mulsf3.S
===================================================================
--- gcc/config/sh/IEEE-754/mulsf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/mulsf3.S (revision 0)
@@ -0,0 +1,352 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Routine for multiplying two floating point numbers
+
+! Author: Rakesh Kumar
+
+! Arguments: r4 and r5
+! Result: r0
+
+! The arguments are referred as op1 and op2
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (mulsf3)
+ FUNC (GLOBAL (mulsf3))
+
+GLOBAL (mulsf3):
+ ! Extract the sign bits
+ mov.l .L_sign,r3
+ mov r3,r0
+
+ and r4,r3 ! sign bit for op1
+ mov.l .L_sign_mask,r6
+
+ ! Mask out the sign bit from op1 and op2
+ and r5,r0 ! sign bit for op2
+ mov.l .L_inf,r2
+
+ and r6,r4
+ xor r3,r0 ! Final sign in r0
+
+ and r6,r5
+ tst r4,r4
+
+ ! Check for zero
+ mov r5,r7
+ ! Check op1 for zero
+ SL(bt, .L_op1_zero,
+ mov r4,r6)
+
+ tst r5,r5
+ bt .L_op2_zero ! op2 is zero
+
+ ! Extract the exponents
+ and r2,r6 ! Exponent of op1
+ cmp/eq r2,r6
+
+ and r2,r7
+ bt .L_inv_op1 ! op1 is NaN or Inf
+
+ mov.l .L_mant,r3
+ cmp/eq r2,r7
+
+ and r3,r4 ! Mantissa of op1
+ bt .L_ret_op2 ! op2 is Nan or Inf
+
+ and r3,r5 ! Mantissa of op2
+
+ mov #-23,r3
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLR23 (r6)
+ SHLR23 (r7)
+#else
+ shld r3,r6
+ shld r3,r7
+#endif
+ ! Check for denormals
+ mov.l .L_24bit,r3
+ tst r6,r6
+
+ bt .L_norm_op1 ! op1 is denormal
+ add #-127,r6 ! Unbias op1's exp
+
+ tst r7,r7
+ bt .L_norm_op2 ! op2 is denormal
+
+ add #-127,r7 ! Unbias op2's exp
+
+.L_multiply:
+ add r6,r7 ! Final exponent in r7
+ mov.l .L_24bit,r1
+
+ ! set 24th bit of mantissas
+ mov #127,r3
+ or r1,r4
+
+ DMULU_SAVE
+
+ ! Multiply
+ or r1,r5
+ DMULUL (r4,r5,r4)
+
+ DMULUH (r5)
+
+ DMULU_RESTORE
+
+ mov.l .L_16bit,r6
+
+ ! Check for extra MSB generated
+ tst r5,r6
+
+ mov.l .L_255,r1
+ bf .L_shift_by_1 ! Adjust the extra MSB
+
+! Normalize the result with rounding
+.L_epil:
+ ! Bias the exponent
+ add #127,r7
+ cmp/ge r1,r7
+
+ ! Check exponent overflow and underflow
+ bt .L_ret_inf
+
+ cmp/pl r7
+ bf .L_denorm
+
+.L_epil_0:
+ mov #-23,r3
+ shll r5
+ mov #0,r6
+
+! Fit resultant mantissa in 24 bits
+! Apply default rounding
+.L_loop_epil_0:
+ tst r3,r3
+ bt .L_loop_epil_out
+
+ add #1,r3
+ shlr r4
+
+ bra .L_loop_epil_0
+ rotcr r6
+
+! Round mantissa
+.L_loop_epil_out:
+ shll8 r5
+ or r5,r4
+
+ mov.l .L_mant,r2
+ mov #23,r3
+
+ ! Check last bit shifted out of result
+ tst r6,r6
+ bt .L_epil_2
+
+ ! Round
+ shll r6
+ movt r5
+
+ add r5,r4
+
+ ! If this is the only ON bit shifted
+ ! Round towards LSB = 0
+ tst r6,r6
+ bf .L_epil_2
+
+ shlr r4
+ shll r4
+
+.L_epil_2:
+ ! Rounding may have produced extra MSB.
+ mov.l .L_25bit,r5
+ tst r4,r5
+
+ bt .L_epil_1
+
+ add #1,r7
+ shlr r4
+
+.L_epil_1:
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLL23 (r7)
+#else
+ shld r3,r7
+#endif
+
+ and r2,r4
+
+ or r7,r4
+ rts
+ or r4,r0
+
+.L_denorm:
+ mov #0,r3
+
+.L_den_1:
+ shlr r5
+ rotcr r4
+
+ cmp/eq r3,r7
+ bt .L_epil_0
+
+ bra .L_den_1
+ add #1,r7
+
+
+! Normalize the first argument
+.L_norm_op1:
+ shll r4
+ tst r3,r4
+
+ add #-1,r6
+ bt .L_norm_op1
+
+ ! The biasing is by 126
+ add #-126,r6
+ tst r7,r7
+
+ bt .L_norm_op2
+
+ bra .L_multiply
+ add #-127,r7
+
+! Normalize the second argument
+.L_norm_op2:
+ shll r5
+ tst r3,r5
+
+ add #-1,r7
+ bt .L_norm_op2
+
+ bra .L_multiply
+ add #-126,r7
+
+! op2 is zero. Check op1 for exceptional cases
+.L_op2_zero:
+ mov.l .L_inf,r2
+ and r2,r6
+
+	! Check whether op1 is Inf or NaN (0 * Inf/NaN yields NaN)
+ cmp/eq r2,r6
+ SL(bf, .L_ret_op2,
+ mov #1,r1)
+
+ ! Return NaN
+ rts
+ mov #-1,r0
+
+! Adjust the extra MSB
+.L_shift_by_1:
+ shlr r5
+ rotcr r4
+
+ add #1,r7 ! Show the shift in exponent
+
+ cmp/gt r3,r7
+ bf .L_epil
+
+ ! The resultant exponent is invalid
+ mov.l .L_inf,r1
+ rts
+ or r1,r0
+
+.L_ret_op1:
+ rts
+ or r4,r0
+
+! op1 is zero. Check op2 for exceptional cases
+.L_op1_zero:
+ mov.l .L_inf,r2
+ and r2,r7
+
+	! Check whether op2 is Inf or NaN (0 * Inf/NaN yields NaN)
+ cmp/eq r2,r7
+ SL(bf, .L_ret_op1,
+ mov #1,r1)
+
+ ! Return NaN
+ rts
+ mov #-1,r0
+
+.L_inv_op1:
+ mov.l .L_mant,r3
+ mov r4,r6
+
+ and r3,r6
+ tst r6,r6
+
+ bf .L_ret_op1 ! op1 is Nan
+ ! op1 is not Nan. It is Inf
+
+ cmp/eq r2,r7
+ bf .L_ret_op1 ! op2 has a valid exponent
+
+! op2 has an invalid exponent. It could be Inf, -Inf, or NaN.
+! It doesn't make any difference.
+.L_ret_op2:
+ rts
+ or r5,r0
+
+.L_ret_inf:
+ rts
+ or r2,r0
+
+.L_ret_zero:
+ mov #0,r2
+ rts
+ or r2,r0
+
+
+ .align 2
+.L_mant:
+ .long 0x007FFFFF
+
+.L_inf:
+ .long 0x7F800000
+
+.L_24bit:
+ .long 0x00800000
+
+.L_25bit:
+ .long 0x01000000
+
+.L_16bit:
+ .long 0x00008000
+
+.L_sign:
+ .long 0x80000000
+
+.L_sign_mask:
+ .long 0x7FFFFFFF
+
+.L_255:
+ .long 0x000000FF
+
+ENDFUNC (GLOBAL (mulsf3))
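For normal, finite, non-zero operands the multiply above reduces to: xor the signs, add the unbiased exponents, multiply the two 24-bit significands (implicit bit set) into a 48-bit product, renormalize the extra MSB if one was generated, and round to nearest even. A C model of that path only (function name hypothetical; zeros, denormals, Inf/NaN, and overflow/underflow are omitted):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the mulsf3 core for normal operands whose
   result is also normal. The 48-bit product of two [1,2) significands
   lies in [2^46, 2^48); bit 47 set means an extra MSB was generated. */
uint32_t mulsf3_normals(uint32_t a, uint32_t b)
{
    uint32_t sign = (a ^ b) & 0x80000000u;
    int ea = (a >> 23) & 0xFF, eb = (b >> 23) & 0xFF;
    uint32_t ma = (a & 0x7FFFFFu) | 0x800000u;  /* set implicit bit */
    uint32_t mb = (b & 0x7FFFFFu) | 0x800000u;
    int exp = ea + eb - 127;                    /* unbias one operand */
    uint64_t prod = (uint64_t)ma * mb;          /* 48-bit product */
    if (prod & (1ull << 47))
        exp++;                                  /* extra MSB generated */
    else
        prod <<= 1;                             /* normalize to bit 47 */
    uint32_t mant = (uint32_t)(prod >> 24);     /* keep 24 bits */
    uint32_t rest = prod & 0xFFFFFFu;           /* discarded bits */
    if (rest > 0x800000u || (rest == 0x800000u && (mant & 1u)))
        mant++;                                 /* round to nearest even */
    if (mant >> 24) {                           /* rounding carried out */
        mant >>= 1;
        exp++;
    }
    return sign | ((uint32_t)exp << 23) | (mant & 0x7FFFFFu);
}
```

Because hardware single-precision multiplication also rounds to nearest even by default, the sketch can be checked against it.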
Index: gcc/config/sh/IEEE-754/floatsidf.S
===================================================================
--- gcc/config/sh/IEEE-754/floatsidf.S (revision 0)
+++ gcc/config/sh/IEEE-754/floatsidf.S (revision 0)
@@ -0,0 +1,151 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!conversion of signed integer to double precision floating point number
+!Author:Rakesh Kumar
+!
+!Entry:
+!r4:operand
+!
+!Exit:
+!r0,r1:result
+!
+!Note:argument is passed in reg r4 and the result is returned in
+!regs r0 and r1.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (floatsidf)
+ FUNC (GLOBAL (floatsidf))
+
+GLOBAL (floatsidf):
+ mov.l .L_sign,r0
+ mov #0,r1
+
+ mov r0,r2
+ tst r4,r4 ! check r4 for zero
+
+ ! Extract the sign
+ mov r2,r3
+ SL(bt, .L_ret_zero,
+ and r4,r0)
+
+ cmp/eq r1,r0
+ not r3,r3
+
+ mov r1,r7
+ SL(bt, .L_loop,
+ and r4,r3)
+
+ ! Treat -2147483648 as special case
+ cmp/eq r1,r3
+ neg r4,r4
+
+ bt .L_ret_min
+
+.L_loop:
+ shll r4
+ mov r4,r5
+
+ and r2,r5
+ cmp/eq r1,r5
+
+ add #1,r7
+ bt .L_loop
+
+ mov.l .L_initial_exp,r6
+ not r2,r2
+
+ and r2,r4
+ mov #21,r3
+
+ sub r7,r6
+ mov r4,r1
+
+ mov #20,r7
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r3,r1
+#else
+ SHLL21 (r1)
+#endif
+ mov #-11,r2
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r7,r6 ! Exponent in proper place
+#else
+ SHLL20 (r6)
+#endif
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r2,r4
+#else
+ SHLR11 (r4)
+#endif
+ or r6,r0
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+#ifdef __LITTLE_ENDIAN__
+ or r4,r1
+#else
+ or r4,r0
+#endif
+
+.L_ret_zero:
+ rts
+ mov #0,r0
+
+.L_ret_min:
+ mov.l .L_min,r0
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ nop
+
+ .align 2
+
+.L_initial_exp:
+ .long 0x0000041E
+
+.L_sign:
+ .long 0x80000000
+
+.L_min:
+ .long 0xC1E00000
+
+ENDFUNC (GLOBAL (floatsidf))
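The signed conversion is the unsigned normalize-and-pack plus a sign bit, with one wrinkle the code handles explicitly: -2147483648 cannot be negated in 32 bits, so it is returned directly (high word 0xC1E00000, i.e. -2^31). A hypothetical C model:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model: extract the sign, negate the magnitude, and
   normalize/pack as for unsigned; INT32_MIN is special-cased since
   its magnitude does not fit in a 32-bit negation. */
uint64_t floatsidf_bits(int32_t v)
{
    if (v == 0)
        return 0;
    if (v == INT32_MIN)
        return 0xC1E0000000000000ull;   /* -2147483648.0, as in .L_min */
    uint64_t sign = v < 0 ? 1ull << 63 : 0;
    uint32_t x = v < 0 ? (uint32_t)-v : (uint32_t)v;
    int exp = 1023 + 31;                /* biased exponent if bit 31 leads */
    while (!(x & 0x80000000u)) {        /* find the leading 1 */
        x <<= 1;
        exp--;
    }
    uint64_t frac = ((uint64_t)x << 21) & 0x000FFFFFFFFFFFFFull;
    return sign | ((uint64_t)exp << 52) | frac;
}
```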
Index: gcc/config/sh/IEEE-754/fixsfsi.S
===================================================================
--- gcc/config/sh/IEEE-754/fixsfsi.S (revision 0)
+++ gcc/config/sh/IEEE-754/fixsfsi.S (revision 0)
@@ -0,0 +1,165 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Conversion routine for float to integer
+
+! Author: Rakesh Kumar
+
+! Arguments: r4 (in floating point format)
+! Return: r0
+
+! r4 is referred as op1
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (fixsfsi)
+ FUNC (GLOBAL (fixsfsi))
+
+GLOBAL (fixsfsi):
+ mov.l .L_mask_sign,r7
+ mov r4,r2
+
+ ! Check for NaN
+ mov.l .L_inf,r1
+ and r7,r2
+
+ cmp/gt r1,r2
+ mov #127,r5
+
+ mov r4,r3
+ SL(bt, .L_epil,
+ mov #0,r0)
+
+ shll r2
+ mov.l .L_frac,r6
+
+ shlr16 r2
+ and r6,r3 ! r3 has fraction
+
+ shlr8 r2 ! r2 has exponent
+ mov.l .L_24bit,r1
+
+ ! If exponent is less than 127, return 0
+ cmp/gt r2,r5
+ or r1,r3 ! Set the implicit bit
+
+ mov.l .L_157,r1
+ SL1(bt, .L_epil,
+ shll8 r3)
+
+	! If the exponent is greater than 157,
+	! return the maximum/minimum integer
+	! value, depending on the sign
+ cmp/gt r1,r2
+ sub r2,r1
+
+ mov.l .L_sign,r2
+ SL(bt, .L_ret_max,
+ add #1,r1)
+
+ and r4,r2 ! Sign in r2
+ neg r1,r1
+
+ ! Shift mantissa by exponent difference from 157
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r1,r3
+#else
+ cmp/gt r0,r1
+ bt .L_mov_left
+
+.L_mov_right:
+ cmp/eq r1,r0
+ bt .L_ret
+
+ add #1,r1
+ bra .L_mov_right
+
+ shlr r3
+
+.L_mov_left:
+ add #-1,r1
+
+ shll r3
+ cmp/eq r1,r0
+
+ bf .L_mov_left
+.L_ret:
+#endif
+ ! If op1 is negative, negate the result
+ cmp/eq r0,r2
+ SL(bf, .L_negate,
+ mov r3,r0)
+
+! r0 has the appropriate value
+.L_epil:
+ rts
+ nop
+
+! Return the max/min integer value
+.L_ret_max:
+ and r4,r2 ! Sign in r2
+ mov.l .L_max,r3
+
+ mov.l .L_sign,r1
+ cmp/eq r0,r2
+
+ mov r3,r0
+ bt .L_epil
+
+ ! Negative number, return min int
+ rts
+ mov r1,r0
+
+! Negate the result
+.L_negate:
+ rts
+ neg r0,r0
+
+ .align 2
+.L_inf:
+ .long 0x7F800000
+
+.L_157:
+ .long 157
+
+.L_max:
+ .long 0x7FFFFFFF
+
+.L_frac:
+ .long 0x007FFFFF
+
+.L_sign:
+ .long 0x80000000
+
+.L_24bit:
+ .long 0x00800000
+
+.L_mask_sign:
+ .long 0x7FFFFFFF
+
+ENDFUNC (GLOBAL (fixsfsi))
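The exponent thresholds above can be modeled in C (hypothetical sketch; function name invented). A float's value is significand * 2^(exp - 150), so truncation is a shift by exp - 150: a biased exponent below 127 means |x| < 1.0 and the result is 0, above 157 the magnitude is at least 2^31 and the result saturates, and NaN returns 0 as in the routine above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of the truncating float-to-int conversion. */
int32_t fixsfsi_bits(uint32_t f)
{
    int exp = (f >> 23) & 0xFF;
    int neg = f >> 31;
    uint32_t mant = (f & 0x7FFFFFu) | 0x800000u;   /* implicit bit */
    if ((f & 0x7FFFFFFFu) > 0x7F800000u)
        return 0;                                  /* NaN */
    if (exp < 127)
        return 0;                                  /* |x| < 1.0 */
    if (exp > 157)
        return neg ? INT32_MIN : INT32_MAX;        /* saturate */
    int shift = exp - 150;                         /* value = mant * 2^shift */
    uint32_t u = shift >= 0 ? mant << shift : mant >> -shift;
    return neg ? -(int32_t)u : (int32_t)u;
}
```

Note that (float)INT32_MIN has biased exponent 158 and a set sign bit, so it lands in the saturating branch and still yields the correct INT32_MIN.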
Index: gcc/config/sh/ieee-754-sf.S
===================================================================
--- gcc/config/sh/ieee-754-sf.S (revision 0)
+++ gcc/config/sh/ieee-754-sf.S (revision 0)
@@ -0,0 +1,697 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file. (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING. If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA. */
+
+!! libgcc software floating-point routines for Renesas SH /
+!! STMicroelectronics ST40 CPUs
+!! Contributed by Joern Rennecke joern.rennecke@st.com
+
+#ifndef __SH_FPU_ANY__
+
+#include "lib1funcs.h"
+#include "insn-constants.h"
+
+/* Single-precision floating-point emulation.
+ We handle NANs, +-infinity, and +-zero.
+ However, we assume that for NANs, the topmost bit of the fraction is set. */
+#ifdef L_nesf2
+/* -ffinite-math-only inline version, T := r4:SF == r5:SF
+ cmp/eq r4,r5
+ mov r4,r0
+ bt 0f
+ or r5,r0
+ add r0,r0
+ tst r0,r0 ! test for +0.0 == -0.0 ; -0.0 == +0.0
+ 0: */
+ .balign 4
+ .global GLOBAL(nesf2)
+ HIDDEN_FUNC(GLOBAL(nesf2))
+GLOBAL(nesf2):
+ /* If the raw values are unequal, the result is unequal, unless
+ both values are +-zero.
+ If the raw values are equal, the result is equal, unless
+ the values are NaN. */
+ cmp/eq r4,r5
+ mov.l LOCAL(c_SF_NAN_MASK),r1
+ not r4,r0
+ bt LOCAL(check_nan)
+ mov r4,r0
+ or r5,r0
+ rts
+ add r0,r0
+LOCAL(check_nan):
+ tst r1,r0
+ rts
+ movt r0
+ .balign 4
+LOCAL(c_SF_NAN_MASK):
+ .long SF_NAN_MASK
+ ENDFUNC(GLOBAL(nesf2))
+#endif /* L_nesf2 */
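The two-case logic in the comment above ("unequal raw values are unequal unless both are +-zero; equal raw values are equal unless NaN") can be sketched in C. Note this sketch uses the full NaN test, whereas the assembly, per the file comment, assumes the topmost fraction bit of a NaN is set (SF_NAN_MASK); the function name is hypothetical and nonzero means "unequal", matching libgcc's __nesf2 convention.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of the nesf2 test on raw bit patterns. */
int nesf2_bits(uint32_t a, uint32_t b)
{
    if (a == b)
        return (a & 0x7FFFFFFFu) > 0x7F800000u;  /* NaN is unequal to itself */
    return ((a | b) << 1) != 0;                  /* +0.0 equals -0.0 */
}
```

The `(a | b) << 1` trick mirrors the `or r5,r0; add r0,r0` pair: shifting out the sign bits leaves zero only when both operands are some combination of +0.0 and -0.0.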
+
+#ifdef L_unordsf2
+ .balign 4
+ .global GLOBAL(unordsf2)
+ HIDDEN_FUNC(GLOBAL(unordsf2))
+GLOBAL(unordsf2):
+ mov.l LOCAL(c_SF_NAN_MASK),r1
+ not r4,r0
+ tst r1,r0
+ not r5,r0
+ bt LOCAL(unord)
+ tst r1,r0
+LOCAL(unord):
+ rts
+ movt r0
+ .balign 4
+LOCAL(c_SF_NAN_MASK):
+ .long SF_NAN_MASK
+ ENDFUNC(GLOBAL(unordsf2))
+#endif /* L_unordsf2 */
+
+#if defined(L_gtsf2t) || defined(L_gtsf2t_trap)
+/* -ffinite-math-only inline version, T := r4:SF > r5:SF ? 0 : 1
+ cmp/pz r4
+ mov r4,r0
+ bf/s 0f
+ cmp/hs r5,r4
+ cmp/ge r4,r5
+ or r5,r0
+ bt 0f
+ add r0,r0
+ tst r0,r0
+ 0: */
+#ifdef L_gtsf2t
+#define fun_label GLOBAL(gtsf2t)
+#else
+#define fun_label GLOBAL(gtsf2t_trap)
+#endif
+ .balign 4
+ .global fun_label
+ HIDDEN_FUNC(fun_label)
+fun_label:
+	/* If the raw values compare greater, the result is true, unless
+	 either of them is a NaN (but infinity is fine), or both values are
+	 +-zero. Otherwise, the result is false. */
+ mov.l LOCAL(c_SF_NAN_MASK),r1
+ cmp/pz r4
+ not r5,r0
+ SLC(bf, LOCAL(neg),
+ tst r1,r0)
+ mov r4,r0
+ bt LOCAL(nan)
+ cmp/gt r5,r4
+ SLC(bf, LOCAL(check_nan),
+ cmp/gt r4,r1)
+ bf LOCAL(nan)
+ or r5,r0
+ rts
+ add r0,r0
+LOCAL(neg):
+ SLI(tst r1,r0)
+ bt LOCAL(nan)
+ not r4,r0
+ tst r1,r0
+ bt LOCAL(nan)
+ cmp/hi r4,r5
+#if defined(L_gtsf2t) && defined(DELAYED_BRANCHES)
+LOCAL(check_nan):
+#endif /* DELAYED_BRANCHES */
+ rts
+ movt r0
+#ifdef L_gtsf2t
+LOCAL(check_nan):
+LOCAL(nan):
+ rts
+ mov #0,r0
+#else /* ! L_gtsf2t */
+LOCAL(check_nan):
+ SLI(cmp/gt r4,r1)
+ bf LOCAL(nan)
+ rts
+ movt r0
+LOCAL(nan):
+ mov #0,r0
+ trapa #0
+#endif /* ! L_gtsf2t */
+ .balign 4
+LOCAL(c_SF_NAN_MASK):
+ .long SF_NAN_MASK
+ ENDFUNC(fun_label)
+#endif /* L_gtsf2t */
+
+#if defined(L_gesf2f) || defined(L_gesf2f_trap)
+/* -ffinite-math-only inline version, T := r4:SF >= r5:SF
+	cmp/pz	r5
+	mov	r4,r0
+	bf/s	0f
+	cmp/hs	r4,r5
+	cmp/ge	r5,r4
+	or	r5,r0
+	bt	0f
+	add	r0,r0
+	tst	r0,r0
+	0: */
+#ifdef L_gesf2f
+#define fun_label GLOBAL(gesf2f)
+#else
+#define fun_label GLOBAL(gesf2f_trap)
+#endif
+ .balign 4
+ .global fun_label
+ HIDDEN_FUNC(fun_label)
+fun_label:
+ /* If the raw values compare greater or equal, the result is
+	 true, unless either of them is a NaN. If both are +-zero, the
+ result is true; otherwise, it is false.
+ We use 0 as true and nonzero as false for this function. */
+ mov.l LOCAL(c_SF_NAN_MASK),r1
+ cmp/pz r5
+ not r4,r0
+ SLC(bf, LOCAL(neg),
+ tst r1,r0)
+ mov r4,r0
+ bt LOCAL(nan)
+ cmp/gt r4,r5
+ SLC(bf, LOCAL(check_nan),
+ cmp/ge r1,r5)
+ bt LOCAL(nan)
+ or r5,r0
+ rts
+ add r0,r0
+LOCAL(neg):
+ SLI(tst r1,r0)
+ bt LOCAL(nan)
+ not r5,r0
+ tst r1,r0
+ bt LOCAL(nan)
+ cmp/hi r5,r4
+#if defined(L_gesf2f) && defined(DELAYED_BRANCHES)
+LOCAL(nan): LOCAL(check_nan):
+#endif
+ rts
+ movt r0
+#if defined(L_gesf2f) && ! defined(DELAYED_BRANCHES)
+LOCAL(check_nan):
+ cmp/ge r1,r5
+LOCAL(nan):
+ rts
+ movt r0
+#endif /* ! DELAYED_BRANCHES */
+#ifdef L_gesf2f_trap
+LOCAL(check_nan):
+ SLI(cmp/ge r1,r5)
+ bt LOCAL(nan)
+ rts
+LOCAL(nan):
+ movt r0
+ trapa #0
+#endif /* L_gesf2f_trap */
+ .balign 4
+LOCAL(c_SF_NAN_MASK):
+ .long SF_NAN_MASK
+	ENDFUNC(fun_label)
+#endif /* L_gesf2f */
+
+#ifndef DYN_SHIFT /* SH1 / SH2 code */
+#ifdef L_add_sub_sf3
+#include "IEEE-754/addsf3.S"
+#endif /* _add_sub_sf3 */
+
+#ifdef L_mulsf3
+#include "IEEE-754/mulsf3.S"
+#endif /* L_mulsf3 */
+
+#ifdef L_fixunssfsi
+#include "IEEE-754/fixunssfsi.S"
+#endif /* L_fixunssfsi */
+
+#ifdef L_fixsfsi
+#include "IEEE-754/fixsfsi.S"
+#endif /* L_fixsfsi */
+
+#ifdef L_floatunssisf
+#include "IEEE-754/floatunssisf.S"
+#endif /* L_floatunssisf */
+
+#ifdef L_floatsisf
+#include "IEEE-754/floatsisf.S"
+#endif /* L_floatsisf */
+
+#ifdef L_divsf3
+#include "IEEE-754/divsf3.S"
+#endif /* L_divsf3 */
+#endif /* ! DYN_SHIFT */
+
+/* The actual arithmetic uses dynamic shift. Supporting SH1 / SH2 here would
+ make this code too hard to maintain, so if you want to add SH1 / SH2
+ support, do it in a separate copy. */
+#ifdef DYN_SHIFT
+#ifdef L_add_sub_sf3
+#include "IEEE-754/m3/addsf3.S"
+#endif /* L_add_sub_sf3 */
+
+#ifdef L_mulsf3
+#include "IEEE-754/m3/mulsf3.S"
+#endif /* L_mulsf3 */
+
+#ifdef L_fixunssfsi
+	! What is a bit unusual about this implementation is that the
+ ! sign bit influences the result for NANs: for cleared sign bit, you
+ ! get UINT_MAX, for set sign bit, you get 0.
+ ! However, since the result for NANs is undefined, this should be no
+ ! problem.
+ ! N.B. This is scheduled both for SH4-200 and SH4-300
+ .balign 4
+ .global GLOBAL(fixunssfsi)
+ FUNC(GLOBAL(fixunssfsi))
+GLOBAL(fixunssfsi):
+ mov.l LOCAL(max),r2
+ mov #-23,r1
+ mov r4,r0
+ shad r1,r4
+ mov.l LOCAL(mask),r1
+ add #-127,r4
+ cmp/ge r2,r0
+ or r2,r0
+ bt LOCAL(retmax)
+ cmp/pz r4
+ and r1,r0
+ bf LOCAL(ret0)
+ add #-23,r4
+ rts
+ shld r4,r0
+LOCAL(ret0):
+LOCAL(retmax):
+ rts
+ subc r0,r0
+ .balign 4
+LOCAL(mask):
+ .long 0x00ffffff
+LOCAL(max):
+ .long 0x4f800000
+ ENDFUNC(GLOBAL(fixunssfsi))
+#endif /* L_fixunssfsi */
+
+#ifdef L_fixsfsi
+	! What is a bit unusual about this implementation is that the
+ ! sign bit influences the result for NANs: for cleared sign bit, you
+ ! get INT_MAX, for set sign bit, you get INT_MIN.
+ ! However, since the result for NANs is undefined, this should be no
+ ! problem.
+ ! N.B. This is scheduled both for SH4-200 and SH4-300
+ .balign 4
+ .global GLOBAL(fixsfsi)
+ FUNC(GLOBAL(fixsfsi))
+ .balign 4
+GLOBAL(fixsfsi):
+ mov r4,r0
+ shll r4
+ mov #-24,r1
+ bt LOCAL(neg)
+ mov.l LOCAL(max),r2
+ shld r1,r4
+ mov.l LOCAL(mask),r1
+ add #-127,r4
+ cmp/pz r4
+ add #-23,r4
+ bf LOCAL(ret0)
+ cmp/gt r0,r2
+ bf LOCAL(retmax)
+ and r1,r0
+ addc r1,r0
+ rts
+ shld r4,r0
+
+ .balign 4
+LOCAL(neg):
+ mov.l LOCAL(min),r2
+ shld r1,r4
+ mov.l LOCAL(mask),r1
+ add #-127,r4
+ cmp/pz r4
+ add #-23,r4
+ bf LOCAL(ret0)
+ cmp/gt r0,r2
+ bf LOCAL(retmin)
+ and r1,r0
+ addc r1,r0
+ shld r4,r0 ! SH4-200 will start this insn on a new cycle
+ rts
+ neg r0,r0
+
+ .balign 4
+LOCAL(ret0):
+ rts
+ mov #0,r0
+
+LOCAL(retmax):
+ mov #-1,r0
+ rts
+ shlr r0
+
+LOCAL(retmin):
+ mov #1,r0
+ rts
+ rotr r0
+
+ .balign 4
+LOCAL(mask):
+ .long 0x007fffff
+LOCAL(max):
+ .long 0x4f000000
+LOCAL(min):
+ .long 0xcf000000
+ ENDFUNC(GLOBAL(fixsfsi))
+#endif /* L_fixsfsi */
+
+#ifdef L_floatunssisf
+#include "IEEE-754/m3/floatunssisf.S"
+#endif /* L_floatunssisf */
+
+#ifdef L_floatsisf
+#include "IEEE-754/m3/floatsisf.S"
+#endif /* L_floatsisf */
+
+#ifdef L_divsf3
+#include "IEEE-754/m3/divsf3.S"
+#endif /* L_divsf3 */
+
+#ifdef L_hypotf
+ .balign 4
+ .global GLOBAL(hypotf)
+ FUNC(GLOBAL(hypotf))
+GLOBAL(hypotf):
+/* This integer implementation takes 71 to 72 cycles in the main path.
+ This is a bit slower than the SH4 can do this computation using double
+ precision hardware floating point - 57 cycles, or 69 with mode switches. */
+ /* First, calculate x (r4) as the sum of the square of the fractions -
+ the exponent is calculated separately in r3.
+ Then, calculate sqrt(x) for the fraction by reciproot iteration.
+	We get a 7.5-bit initial value using linear approximation with two slopes
+ that are powers of two.
+ x (- [1. .. 2.) y0 := 1.25 - x/4 - tab(x) y (- (0.8 .. 1.0)
+ x (- [2. .. 4.) y0 := 1. - x/8 - tab(x) y (- (0.5 .. 0.8)
+ x is represented with two bits before the point,
+ y with 0 bits before the binary point.
+ Thus, to calculate y0 := 1. - x/8 - tab(x), all you have to do is to shift x
+ right by 1, negate it, and subtract tab(x). */
+
+ /* y1 := 1.5*y0 - 0.5 * (x * y0) * (y0 * y0)
+ z0 := x * y1
+ z1 := z0 + 0.5 * (y1 - (y1*y1) * z0) */
+
+ mov.l LOCAL(xff000000),r1
+ add r4,r4
+ mov r4,r0
+ add r5,r5
+ cmp/hs r5,r4
+ sub r5,r0
+ mov #-24,r2
+ bf/s LOCAL(r5_large)
+ shad r2,r0
+ mov r4,r3
+ shll8 r4
+ rotcr r4
+ tst #0xe0,r0
+ neg r0,r0
+ bt LOCAL(ret_abs_r3)
+ tst r1,r5
+ shll8 r5
+ bt/s LOCAL(denorm_r5)
+ cmp/hi r3,r1
+ dmulu.l r4,r4
+ bf LOCAL(inf_nan)
+ rotcr r5
+ shld r0,r5
+LOCAL(denorm_r5_done):
+ sts mach,r4
+ dmulu.l r5,r5
+ mov.l r6,@-r15
+ mov #20,r6
+
+ sts mach,r5
+LOCAL(add_frac):
+ mova LOCAL(tab)-32,r0
+ mov.l r7,@-r15
+ mov.w LOCAL(x1380),r7
+ and r1,r3
+ addc r5,r4
+ mov.w LOCAL(m25),r2 ! -25
+ bf LOCAL(frac_ok)
+ sub r1,r3
+ rotcr r4
+ cmp/eq r1,r3 ! did we generate infinity ?
+ bt LOCAL(inf_nan)
+ shlr r4
+ mov r4,r1
+ shld r2,r1
+ mov.b @(r0,r1),r0
+ mov r4,r1
+ shld r6,r1
+ bra LOCAL(frac_low2)
+ sub r1,r7
+
+LOCAL(frac_ok):
+ mov r4,r1
+ shld r2,r1
+ mov.b @(r0,r1),r1
+ cmp/pz r4
+ mov r4,r0
+ bt/s LOCAL(frac_low)
+ shld r6,r0
+ mov.w LOCAL(xf80),r7
+ shlr r0
+LOCAL(frac_low):
+ sub r0,r7
+LOCAL(frac_low2):
+ mov.l LOCAL(x40000080),r0 ! avoid denorm results near 1. << r3
+ sub r1,r7 ! {0.12}
+ mov.l LOCAL(xfffe0000),r5 ! avoid rounding overflow near 4. << r3
+ swap.w r7,r1 ! {0.28}
+ dmulu.l r1,r4 /* two issue cycles */
+ mulu.w r7,r7 /* two issue cycles */
+ sts mach,r2 ! {0.26}
+ mov r1,r7
+ shlr r1
+ sts macl,r6 ! {0.24}
+ cmp/hi r0,r4
+ shlr2 r2
+ bf LOCAL(near_one)
+ shlr r2 ! {0.23} systemic error of linear approximation keeps y1 < 1
+ dmulu.l r2,r6
+ cmp/hs r5,r4
+ add r7,r1 ! {1.28}
+ bt LOCAL(near_four)
+ shlr2 r1 ! {1.26}
+ sts mach,r0 ! {0.15} x*y0^3 == {0.16} 0.5*x*y0^3
+ shlr2 r1 ! {1.24}
+ shlr8 r1 ! {1.16}
+ sett ! compensate for truncation of subtrahend, keep y1 < 1
+ subc r0,r1 ! {0.16} y1; max error about 3.5 ulp
+ swap.w r1,r0
+ dmulu.l r0,r4 ! { 1.30 }
+ mulu.w r1,r1
+ sts mach,r2
+ shlr2 r0
+ sts macl,r1
+ add r2,r0
+ mov.l LOCAL(xff000000),r6
+ add r2,r0
+ dmulu.l r1,r2
+ add #127,r0
+ add r6,r3 ! precompensation for adding leading 1
+ sts mach,r1
+ shlr r3
+ mov.l @r15+,r7
+ sub r1,r0 ! {0.31} max error about 50 ulp (+127)
+ mov.l @r15+,r6
+ shlr8 r0 ! {0.23} max error about 0.7 ulp
+ rts
+ add r3,r0
+
+LOCAL(r5_large):
+ mov r5,r3
+ mov #-31,r2
+ cmp/ge r2,r0
+ shll8 r5
+ bf LOCAL(ret_abs_r3)
+ rotcr r5
+ tst r1,r4
+ shll8 r4
+ bt/s LOCAL(denorm_r4)
+ cmp/hi r3,r1
+ dmulu.l r5,r5
+ bf LOCAL(inf_nan)
+ rotcr r4
+LOCAL(denorm_r4_done):
+ shld r0,r4
+ sts mach,r5
+ dmulu.l r4,r4
+ mov.l r6,@-r15
+ mov #20,r6
+ bra LOCAL(add_frac)
+ sts mach,r4
+
+LOCAL(near_one):
+ bra LOCAL(assemble_sqrt)
+ mov #0,r0
+LOCAL(near_four):
+ ! exact round-to-nearest would add 255. We add 256 for speed & compactness.
+ mov r4,r0
+ shlr8 r0
+ add #1,r0
+ tst r0,r0
+ addc r0,r3 ! might generate infinity.
+LOCAL(assemble_sqrt):
+ mov.l @r15+,r7
+ shlr r3
+ mov.l @r15+,r6
+ rts
+ add r3,r0
+LOCAL(inf_nan):
+LOCAL(ret_abs_r3):
+ mov r3,r0
+ rts
+ shlr r0
+LOCAL(denorm_r5):
+ bf LOCAL(inf_nan)
+ tst r1,r4
+ bt LOCAL(denorm_both)
+ dmulu.l r4,r4
+ bra LOCAL(denorm_r5_done)
+ shld r0,r5
+LOCAL(denorm_r4):
+ bf LOCAL(inf_nan)
+ tst r1,r5
+ dmulu.l r5,r5
+ bf LOCAL(denorm_r4_done)
+LOCAL(denorm_both): ! normalize according to r3.
+ extu.w r3,r2
+ mov.l LOCAL(c__clz_tab),r0
+ cmp/eq r3,r2
+ mov #-8,r2
+ bt 0f
+ tst r1,r3
+ mov #-16,r2
+ bt 0f
+ mov #-24,r2
+0:
+ shld r2,r3
+ mov.l r7,@-r15
+#ifdef __pic__
+ add r0,r3
+ mova LOCAL(c__clz_tab),r0
+#endif
+ mov.b @(r0,r3),r0
+ add #32,r2
+ sub r0,r2
+ shld r2,r4
+ mov r2,r7
+ dmulu.l r4,r4
+ sts.l pr,@-r15
+ mov #1,r3
+ bsr LOCAL(denorm_r5_done)
+ shld r2,r5
+ mov.l LOCAL(x01000000),r1
+ neg r7,r2
+ lds.l @r15+,pr
+ tst r1,r0
+ mov.l @r15+,r7
+ bt 0f
+ add #1,r2
+ sub r1,r0
+0:
+ rts
+ shld r2,r0
+
+LOCAL(m25):
+ .word -25
+LOCAL(x1380):
+ .word 0x1380
+LOCAL(xf80):
+ .word 0xf80
+ .balign 4
+LOCAL(xff000000):
+ .long 0xff000000
+LOCAL(x40000080):
+ .long 0x40000080
+LOCAL(xfffe0000):
+ .long 0xfffe0000
+LOCAL(x01000000):
+ .long 0x01000000
+LOCAL(c__clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+
+/*
+double err(double x)
+{
+ return (x < 2. ? 1.25 - x/4. : 1. - x/8.) - 1./sqrt(x);
+}
+
+int
+main ()
+{
+ int i = 0;
+ double x, s, v;
+ double lx, hx;
+
+ s = 1./32.;
+ for (x = 1.; x < 4; x += s, i++)
+ {
+ lx = x;
+ hx = x + s - 1. / (1 << 30);
+ v = 0.5 * (err (lx) + err (hx));
+ printf ("%s% 4d%c",
+ (i & 7) == 0 ? "\t.byte\t" : "",
+ (int)(v * 4096 + 0.5) - 128,
+ (i & 7) == 7 ? '\n' : ',');
+ }
+ return 0;
+} */
+
+ .balign 4
+LOCAL(tab):
+ .byte -113, -84, -57, -33, -11, 8, 26, 41
+ .byte 55, 67, 78, 87, 94, 101, 106, 110
+ .byte 113, 115, 115, 115, 114, 112, 109, 106
+ .byte 101, 96, 91, 84, 77, 69, 61, 52
+ .byte 51, 57, 63, 68, 72, 77, 80, 84
+ .byte 87, 89, 91, 93, 95, 96, 97, 97
+ .byte 97, 97, 97, 96, 95, 94, 93, 91
+ .byte 89, 87, 84, 82, 79, 76, 72, 69
+ .byte 65, 61, 57, 53, 49, 44, 39, 34
+ .byte 29, 24, 19, 13, 8, 2, -4, -10
+ .byte -17, -23, -29, -36, -43, -50, -57, -64
+ .byte -71, -78, -85, -93,-101,-108,-116,-124
+ ENDFUNC(GLOBAL(hypotf))
+#endif /* L_hypotf */
+#endif /* DYN_SHIFT */
+
+#endif /* __SH_FPU_ANY__ */
Index: gcc/config/sh/sh.md
===================================================================
--- gcc/config/sh/sh.md (revision 162269)
+++ gcc/config/sh/sh.md (working copy)
@@ -107,6 +107,7 @@ (define_constants [
(DR0_REG 64)
(DR2_REG 66)
(DR4_REG 68)
+ (FR4_REG 68)
(FR23_REG 87)
(TR0_REG 128)
@@ -174,6 +175,15 @@ (define_constants [
(UNSPECV_WINDOW_END 10)
(UNSPECV_CONST_END 11)
(UNSPECV_EH_RETURN 12)
+ ;; NaN handling for software floating point:
+ ;; We require one precision-specific bit to be set in all NaNs,
+ ;; so that we can test them with a not / tst sequence.
+ ;; ??? Ironically, this is the quiet bit for now, because that is the
+ ;; only bit set by __builtin_nan ("").
+ ;; ??? Should really use one bit lower and force it set by using
+ ;; a custom encoding function.
+ (SF_NAN_MASK 0x7fc00000)
+ (DF_NAN_MASK 0x7ff80000)
])
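The not / tst NaN test that SF_NAN_MASK enables can be sketched in C. The helper name is hypothetical; the mask value 0x7fc00000 is exponent-all-ones plus the quiet bit, so a float is a quiet NaN exactly when every mask bit is set in its representation, which is what the SH sequence "not rX,rY; tst mask,rY" checks (T = 1 iff NaN).

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* True iff f is a quiet NaN: all bits of the mask (exponent field
   plus quiet bit) are set, i.e. (~bits & mask) == 0 -- the C
   equivalent of the not + tst sequence.  */
static int is_quiet_nan (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  return (~bits & 0x7fc00000u) == 0;
}
```

Infinity (0x7f800000) fails the test because the quiet bit is clear, and every finite value fails because some exponent bit is clear.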
;; -------------------------------------------------------------------------
@@ -615,6 +625,14 @@ (define_insn "cmpeqsi_t"
cmp/eq %1,%0"
[(set_attr "type" "mt_group")])
+(define_insn "fpcmp_i1"
+ [(set (reg:SI T_REG)
+ (match_operator:SI 1 "soft_fp_comparison_operator"
+ [(match_operand 0 "soft_fp_comparison_operand" "r") (const_int 0)]))]
+ "TARGET_SH1_SOFTFP"
+ "tst %0,%0"
+ [(set_attr "type" "mt_group")])
+
(define_insn "cmpgtsi_t"
[(set (reg:SI T_REG)
(gt:SI (match_operand:SI 0 "arith_reg_operand" "r,r")
@@ -1154,9 +1172,9 @@ (define_insn_and_split "*movsicc_umin"
(define_insn "*movsicc_t_false"
[(set (match_operand:SI 0 "arith_reg_dest" "=r,r")
- (if_then_else (eq (reg:SI T_REG) (const_int 0))
- (match_operand:SI 1 "general_movsrc_operand" "r,I08")
- (match_operand:SI 2 "arith_reg_operand" "0,0")))]
+ (if_then_else:SI (eq (reg:SI T_REG) (const_int 0))
+ (match_operand:SI 1 "general_movsrc_operand" "r,I08")
+ (match_operand:SI 2 "arith_reg_operand" "0,0")))]
"TARGET_PRETEND_CMOVE
&& (arith_reg_operand (operands[1], SImode)
|| (immediate_operand (operands[1], SImode)
@@ -1167,9 +1185,9 @@ (define_insn "*movsicc_t_false"
(define_insn "*movsicc_t_true"
[(set (match_operand:SI 0 "arith_reg_dest" "=r,r")
- (if_then_else (ne (reg:SI T_REG) (const_int 0))
- (match_operand:SI 1 "general_movsrc_operand" "r,I08")
- (match_operand:SI 2 "arith_reg_operand" "0,0")))]
+ (if_then_else:SI (ne (reg:SI T_REG) (const_int 0))
+ (match_operand:SI 1 "general_movsrc_operand" "r,I08")
+ (match_operand:SI 2 "arith_reg_operand" "0,0")))]
"TARGET_PRETEND_CMOVE
&& (arith_reg_operand (operands[1], SImode)
|| (immediate_operand (operands[1], SImode)
@@ -6849,6 +6867,50 @@ (define_insn "stuff_delay_slot"
\f
;; Conditional branch insns
+(define_expand "cmpun_sdf"
+ [(unordered (match_operand 0 "" "") (match_operand 1 "" ""))]
+ ""
+ "
+{
+ HOST_WIDE_INT mask;
+ switch (GET_MODE (operands[0]))
+ {
+ case SFmode:
+ mask = SF_NAN_MASK;
+ break;
+ case DFmode:
+ mask = DF_NAN_MASK;
+ break;
+ default:
+ FAIL;
+ }
+ emit_insn (gen_cmpunsf_i1 (operands[0], operands[1],
+ force_reg (SImode, GEN_INT (mask))));
+ DONE;
+}")
+
+(define_expand "cmpuneq_sdf"
+ [(uneq (match_operand 0 "" "") (match_operand 1 "" ""))]
+ ""
+ "
+{
+ HOST_WIDE_INT mask;
+ switch (GET_MODE (operands[0]))
+ {
+ case SFmode:
+ mask = SF_NAN_MASK;
+ break;
+ case DFmode:
+ mask = DF_NAN_MASK;
+ break;
+ default:
+ FAIL;
+ }
+ emit_insn (gen_cmpuneqsf_i1 (operands[0], operands[1],
+ force_reg (SImode, GEN_INT (mask))));
+ DONE;
+}")
+
(define_expand "cbranchint4_media"
[(set (pc)
(if_then_else (match_operator 0 "shmedia_cbranch_comparison_operator"
@@ -9355,6 +9417,19 @@ (define_expand "cstoredi4"
+(define_expand "sunle"
+ [(set (match_operand:SI 0 "arith_reg_operand" "")
+ (match_dup 1))]
+ "TARGET_SH1_SOFTFP"
+ "
+{
+ if (! currently_expanding_to_rtl)
+ FAIL;
+ sh_emit_compare_and_branch (operands, UNLE);
+ emit_insn (gen_movt (operands[0]));
+ DONE;
+}")
+
;; sne moves the complement of the T reg to DEST like this:
;; cmp/eq ...
;; mov #-1,temp
@@ -9394,11 +9469,15 @@ (define_split
(define_expand "cstoresf4"
[(set (match_operand:SI 0 "register_operand" "=r")
(match_operator:SI 1 "sh_float_comparison_operator"
- [(match_operand:SF 2 "arith_operand" "")
- (match_operand:SF 3 "arith_operand" "")]))]
+ [(match_operand:SF 2 "nonmemory_operand" "")
+ (match_operand:SF 3 "nonmemory_operand" "")]))]
"TARGET_SH2E || TARGET_SHMEDIA_FPU"
"if (TARGET_SHMEDIA)
{
+ if (!arith_operand (operands[2], SFmode))
+ operands[2] = copy_to_mode_reg (SFmode, operands[2]);
+ if (!arith_operand (operands[3], SFmode))
+ operands[3] = copy_to_mode_reg (SFmode, operands[3]);
emit_insn (gen_cstore4_media (operands[0], operands[1],
operands[2], operands[3]));
DONE;
@@ -9407,18 +9486,22 @@ (define_expand "cstoresf4"
if (! currently_expanding_to_rtl)
FAIL;
- sh_emit_compare_and_set (operands, SFmode);
+ sh_expand_float_scc (operands);
DONE;
")
(define_expand "cstoredf4"
[(set (match_operand:SI 0 "register_operand" "=r")
(match_operator:SI 1 "sh_float_comparison_operator"
- [(match_operand:DF 2 "arith_operand" "")
- (match_operand:DF 3 "arith_operand" "")]))]
- "(TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU"
+ [(match_operand:DF 2 "nonmemory_operand" "")
+ (match_operand:DF 3 "nonmemory_operand" "")]))]
+ "TARGET_SH1 || TARGET_SHMEDIA_FPU"
"if (TARGET_SHMEDIA)
{
+ if (!arith_operand (operands[2], DFmode))
+ operands[2] = copy_to_mode_reg (DFmode, operands[2]);
+ if (!arith_operand (operands[3], DFmode))
+ operands[3] = copy_to_mode_reg (DFmode, operands[3]);
emit_insn (gen_cstore4_media (operands[0], operands[1],
operands[2], operands[3]));
DONE;
@@ -9427,7 +9510,7 @@ (define_expand "cstoredf4"
if (! currently_expanding_to_rtl)
FAIL;
- sh_emit_compare_and_set (operands, DFmode);
+ sh_expand_float_scc (operands);
DONE;
")
@@ -9765,7 +9848,7 @@ (define_expand "addsf3"
[(set (match_operand:SF 0 "arith_reg_operand" "")
(plus:SF (match_operand:SF 1 "arith_reg_operand" "")
(match_operand:SF 2 "arith_reg_operand" "")))]
- "TARGET_SH2E || TARGET_SHMEDIA_FPU"
+ "TARGET_SH2E || TARGET_SH3 || TARGET_SHMEDIA_FPU"
"
{
if (TARGET_SH2E)
@@ -9773,6 +9856,12 @@ (define_expand "addsf3"
expand_sf_binop (&gen_addsf3_i, operands);
DONE;
}
+ else if (TARGET_OSFP)
+ {
+ expand_sfunc_binop (SFmode, &gen_addsf3_i3, \"__addsf3\", PLUS,
+ operands);
+ DONE;
+ }
}")
(define_insn "*addsf3_media"
@@ -9871,6 +9960,22 @@ (define_insn_and_split "binary_sf_op1"
}"
[(set_attr "type" "fparith_media")])
+(define_insn "addsf3_i3"
+ [(set (match_operand:SF 0 "arith_reg_dest" "=z")
+ (plus:SF (reg:SF R4_REG) (reg:SF R5_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (clobber (reg:SI R6_REG))
+ (clobber (reg:SI R7_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_OSFP && ! TARGET_SH2E"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
(define_insn "addsf3_i"
[(set (match_operand:SF 0 "fp_arith_reg_operand" "=f")
(plus:SF (match_operand:SF 1 "fp_arith_reg_operand" "%0")
@@ -9885,7 +9990,7 @@ (define_expand "subsf3"
[(set (match_operand:SF 0 "fp_arith_reg_operand" "")
(minus:SF (match_operand:SF 1 "fp_arith_reg_operand" "")
(match_operand:SF 2 "fp_arith_reg_operand" "")))]
- "TARGET_SH2E || TARGET_SHMEDIA_FPU"
+ "TARGET_SH2E || TARGET_SH3 || TARGET_SHMEDIA_FPU"
"
{
if (TARGET_SH2E)
@@ -9893,6 +9998,12 @@ (define_expand "subsf3"
expand_sf_binop (&gen_subsf3_i, operands);
DONE;
}
+ else if (TARGET_OSFP)
+ {
+ expand_sfunc_binop (SFmode, &gen_subsf3_i3, \"__subsf3\", MINUS,
+ operands);
+ DONE;
+ }
}")
(define_insn "*subsf3_media"
@@ -9903,6 +10014,23 @@ (define_insn "*subsf3_media"
"fsub.s %1, %2, %0"
[(set_attr "type" "fparith_media")])
+(define_insn "subsf3_i3"
+ [(set (match_operand:SF 0 "arith_reg_dest" "=z")
+ (minus:SF (reg:SF R4_REG) (reg:SF R5_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (clobber (reg:SI R5_REG))
+ (clobber (reg:SI R6_REG))
+ (clobber (reg:SI R7_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_OSFP && ! TARGET_SH2E"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
(define_insn "subsf3_i"
[(set (match_operand:SF 0 "fp_arith_reg_operand" "=f")
(minus:SF (match_operand:SF 1 "fp_arith_reg_operand" "0")
@@ -9915,11 +10043,32 @@ (define_insn "subsf3_i"
(define_expand "mulsf3"
[(set (match_operand:SF 0 "fp_arith_reg_operand" "")
- (mult:SF (match_operand:SF 1 "fp_arith_reg_operand" "")
- (match_operand:SF 2 "fp_arith_reg_operand" "")))]
+ (mult:SF (match_operand:SF 1 "fp_arith_reg_operand" "")
+ (match_operand:SF 2 "fp_arith_reg_operand" "")))]
"TARGET_SH2E || TARGET_SHMEDIA_FPU"
"")
+;(define_expand "mulsf3"
+; [(set (match_operand:SF 0 "fp_arith_reg_operand" "")
+; (mult:SF (match_operand:SF 1 "fp_arith_reg_operand" "")
+; (match_operand:SF 2 "fp_arith_reg_operand" "")))]
+; "TARGET_SH2E || TARGET_SH3 || TARGET_SHMEDIA_FPU"
+; "
+; {
+; if (TARGET_SH4 || TARGET_SH2A_SINGLE)
+; expand_sf_binop (&gen_mulsf3_i4, operands);
+; else if (TARGET_SH2E)
+; emit_insn (gen_mulsf3_ie (operands[0], operands[1], operands[2]));
+; else if (TARGET_OSFP)
+; {
+; expand_sfunc_binop (SFmode, &gen_mulsf3_i3, \"__mulsf3\", MULT,
+; operands);
+; DONE;
+; }
+; if (! TARGET_SHMEDIA)
+; DONE;
+; }")
+
(define_insn "*mulsf3_media"
[(set (match_operand:SF 0 "fp_arith_reg_operand" "=f")
(mult:SF (match_operand:SF 1 "fp_arith_reg_operand" "%f")
@@ -9959,6 +10108,22 @@ (define_insn "mulsf3_i4"
[(set_attr "type" "fp")
(set_attr "fp_mode" "single")])
+(define_insn "mulsf3_i3"
+ [(set (match_operand:SF 0 "arith_reg_dest" "=z")
+ (mult:SF (reg:SF R4_REG) (reg:SF R5_REG)))
+ (clobber (reg:SI MACH_REG))
+ (clobber (reg:SI MACL_REG))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_OSFP && ! TARGET_SH2E"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
(define_insn "mac_media"
[(set (match_operand:SF 0 "fp_arith_reg_operand" "=f")
(plus:SF (mult:SF (match_operand:SF 1 "fp_arith_reg_operand" "%f")
@@ -10119,6 +10284,149 @@ (define_insn "*fixsfsi"
"ftrc %1,%0"
[(set_attr "type" "fp")])
+(define_insn "cmpnesf_i1"
+ [(set (match_operand:CC_FP_NE 0 "register_operand" "=z")
+ (compare:CC_FP_NE (reg:SF R4_REG) (reg:SF R5_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1 && ! TARGET_SH2E"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpgtsf_i1"
+ [(set (match_operand:CC_FP_GT 0 "register_operand" "=z")
+ (compare:CC_FP_GT (reg:SF R4_REG) (reg:SF R5_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1 && ! TARGET_SH2E"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpunltsf_i1"
+ [(set (match_operand:CC_FP_UNLT 0 "register_operand" "=z")
+ (compare:CC_FP_UNLT (reg:SF R4_REG) (reg:SF R5_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1 && ! TARGET_SH2E"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpeqsf_i1_finite"
+ [(set (reg:SI T_REG)
+ (eq:SI (match_operand:SF 0 "arith_reg_operand" "r,r,r")
+ (match_operand:SF 1 "arith_reg_operand" "r,r,r")))
+ (clobber (match_scratch:SI 2 "=0,1,?r"))]
+ "TARGET_SH1 && ! TARGET_SH2E && flag_finite_math_only"
+ "*
+{
+ if (which_alternative == 0)
+ output_asm_insn (\"cmp/eq\t%0,%1\;or\t%1,%2\;bt\t0f\", operands);
+ else if (which_alternative == 1)
+ output_asm_insn (\"cmp/eq\t%0,%1\;or\t%0,%2\;bt\t0f\", operands);
+ else
+ output_asm_insn (\"cmp/eq\t%0,%1\;mov\t%0,%2\;bt\t0f\;or\t%1,%2\",
+ operands);
+ return \"add\t%2,%2\;tst\t%2,%2\\n0:\";
+}"
+ [(set_attr "length" "10,10,12")])
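At the bit level, the finite-math equality pattern above computes the following (hypothetical helper name, operating on the raw float bits): with -ffinite-math-only there are no NaNs, so two floats are equal exactly when their bit patterns match, or when both are zeros; +0.0 and -0.0 differ only in the sign bit, which the "add %2,%2" shifts out before the final tst.

```c
#include <assert.h>
#include <stdint.h>

/* Equality on raw IEEE binary32 bit patterns under finite-math
   assumptions: bit-identical, or both operands are +/-0.0 (the OR of
   the two words, shifted left by one to drop the sign, is zero).  */
static int sf_eq_finite (uint32_t a, uint32_t b)
{
  return a == b || (uint32_t) ((a | b) << 1) == 0;
}
```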
+
+(define_insn "cmplesf_i1_finite"
+ [(set (reg:SI T_REG)
+ (le:SI (match_operand:SF 0 "arith_reg_operand" "r,r,r")
+ (match_operand:SF 1 "arith_reg_operand" "r,r,r")))
+ (clobber (match_scratch:SI 2 "=0,1,r"))]
+ "TARGET_SH1 && ! TARGET_SH2E && flag_finite_math_only"
+ "*
+{
+ output_asm_insn (\"cmp/pz\t%0\", operands);
+ if (which_alternative == 2)
+ output_asm_insn (\"mov\t%0,%2\", operands);
+ if (TARGET_SH2)
+ output_asm_insn (\"bf/s\t0f\;cmp/hs\t%1,%0\;cmp/ge\t%0,%1\", operands);
+ else
+ output_asm_insn (\"bt\t1f\;bra\t0f\;cmp/hs\t%1,%0\\n1:\tcmp/ge\t%0,%1\",
+ operands);
+ if (which_alternative == 1)
+ output_asm_insn (\"or\t%0,%2\", operands);
+ else
+ output_asm_insn (\"or\t%1,%2\", operands);
+ return \"bt\t0f\;add\t%2,%2\;tst\t%2,%2\\n0:\";
+}"
+ [(set_attr "length" "18,18,20")])
+
+(define_insn "cmpunsf_i1"
+ [(set (reg:SI T_REG)
+ (unordered:SI (match_operand:SF 0 "arith_reg_operand" "r,r")
+ (match_operand:SF 1 "arith_reg_operand" "r,r")))
+ (use (match_operand:SI 2 "arith_reg_operand" "r,r"))
+ (clobber (match_scratch:SI 3 "=0,&r"))]
+ "TARGET_SH1 && ! TARGET_SH2E"
+ "not\t%0,%3\;tst\t%2,%3\;not\t%1,%3\;bt\t0f\;tst\t%2,%3\;0:"
+ [(set_attr "length" "10")])
+
+;; ??? This is a lot of code with a lot of branches; a library function
+;; might be better.
+(define_insn "cmpuneqsf_i1"
+ [(set (reg:SI T_REG)
+ (uneq:SI (match_operand:SF 0 "arith_reg_operand" "r")
+ (match_operand:SF 1 "arith_reg_operand" "r")))
+ (use (match_operand:SI 2 "arith_reg_operand" "r"))
+ (clobber (match_scratch:SI 3 "=&r"))]
+ "TARGET_SH1 && ! TARGET_SH2E"
+ "*
+{
+ output_asm_insn (\"not\t%0,%3\;tst\t%2,%3\;not\t%1,%3\", operands);
+ output_asm_insn (\"bt\t0f\;tst\t%2,%3\;bt\t0f\;cmp/eq\t%0,%1\", operands);
+ output_asm_insn (\"mov\t%0,%3\;bt\t0f\;or\t%1,%3\", operands);
+ return \"add\t%3,%3\;tst\t%3,%3\\n0:\";
+}"
+ [(set_attr "length" "24")])
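The UNEQ predicate this branchy pattern implements has simple C semantics, shown here for clarity (the insn realizes it with the mask-based NaN tests plus the bitwise equality check; the helper name is an assumption):

```c
#include <assert.h>
#include <math.h>

/* uneq: true when the operands are unordered (at least one NaN)
   or compare equal.  */
static int uneq_sf (float a, float b)
{
  return isnan (a) || isnan (b) || a == b;
}
```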
+
+(define_insn "movcc_fp_ne"
+ [(set (match_operand:CC_FP_NE 0 "general_movdst_operand"
+ "=r,r,m")
+ (match_operand:CC_FP_NE 1 "general_movsrc_operand"
+ "rI08,mr,r"))]
+ "TARGET_SH1"
+ "@
+ mov %1,%0
+ mov.l %1,%0
+ mov.l %1,%0"
+ [(set_attr "type" "move,load,store")])
+
+(define_insn "movcc_fp_gt"
+ [(set (match_operand:CC_FP_GT 0 "general_movdst_operand"
+ "=r,r,m")
+ (match_operand:CC_FP_GT 1 "general_movsrc_operand"
+ "rI08,mr,r"))]
+ "TARGET_SH1"
+ "@
+ mov %1,%0
+ mov.l %1,%0
+ mov.l %1,%0"
+ [(set_attr "type" "move,load,store")])
+
+(define_insn "movcc_fp_unlt"
+ [(set (match_operand:CC_FP_UNLT 0 "general_movdst_operand"
+ "=r,r,m")
+ (match_operand:CC_FP_UNLT 1 "general_movsrc_operand"
+ "rI08,mr,r"))]
+ "TARGET_SH1"
+ "@
+ mov %1,%0
+ mov.l %1,%0
+ mov.l %1,%0"
+ [(set_attr "type" "move,load,store")])
+
(define_insn "cmpgtsf_t"
[(set (reg:SI T_REG)
(gt:SI (match_operand:SF 0 "fp_arith_reg_operand" "f")
@@ -10146,6 +10454,22 @@ (define_insn "ieee_ccmpeqsf_t"
"* return output_ieee_ccmpeq (insn, operands);"
[(set_attr "length" "4")])
+(define_insn "*cmpltgtsf_t"
+ [(set (reg:SI T_REG)
+ (ltgt:SI (match_operand:SF 0 "fp_arith_reg_operand" "f")
+ (match_operand:SF 1 "fp_arith_reg_operand" "f")))]
+ "TARGET_SH2E && ! (TARGET_SH4 || TARGET_SH2A_SINGLE)"
+ "fcmp/gt\t%1,%0\;bt\t0f\;fcmp/gt\t%0,%1\\n0:"
+ [(set_attr "length" "6")])
+
+(define_insn "*cmporderedsf_t"
+ [(set (reg:SI T_REG)
+ (ordered:SI (match_operand:SF 0 "fp_arith_reg_operand" "f")
+ (match_operand:SF 1 "fp_arith_reg_operand" "f")))]
+ "TARGET_SH2E && ! (TARGET_SH4 || TARGET_SH2A_SINGLE)"
+ "fcmp/eq\t%0,%0\;bf\t0f\;fcmp/eq\t%1,%1\\n0:"
+ [(set_attr "length" "6")])
+
(define_insn "cmpgtsf_t_i4"
[(set (reg:SI T_REG)
@@ -10178,6 +10502,26 @@ (define_insn "*ieee_ccmpeqsf_t_4"
[(set_attr "length" "4")
(set_attr "fp_mode" "single")])
+(define_insn "*cmpltgtsf_t_4"
+ [(set (reg:SI T_REG)
+ (ltgt:SI (match_operand:SF 0 "fp_arith_reg_operand" "f")
+ (match_operand:SF 1 "fp_arith_reg_operand" "f")))
+ (use (match_operand:PSI 2 "fpscr_operand" "c"))]
+ "TARGET_SH4 || TARGET_SH2A_SINGLE"
+ "fcmp/gt\t%1,%0\;bt\t0f\;fcmp/gt\t%0,%1\\n0:"
+ [(set_attr "length" "6")
+ (set_attr "fp_mode" "single")])
+
+(define_insn "*cmporderedsf_t_4"
+ [(set (reg:SI T_REG)
+ (ordered:SI (match_operand:SF 0 "fp_arith_reg_operand" "f")
+ (match_operand:SF 1 "fp_arith_reg_operand" "f")))
+ (use (match_operand:PSI 2 "fpscr_operand" "c"))]
+ "TARGET_SH4 || TARGET_SH2A_SINGLE"
+ "fcmp/eq\t%0,%0\;bf\t0f\;fcmp/eq\t%1,%1\\n0:"
+ [(set_attr "length" "6")
+ (set_attr "fp_mode" "single")])
+
(define_insn "cmpeqsf_media"
[(set (match_operand:SI 0 "register_operand" "=r")
(eq:SI (match_operand:SF 1 "fp_arith_reg_operand" "f")
@@ -10213,18 +10557,24 @@ (define_insn "cmpunsf_media"
(define_expand "cbranchsf4"
[(set (pc)
(if_then_else (match_operator 0 "sh_float_comparison_operator"
- [(match_operand:SF 1 "arith_operand" "")
- (match_operand:SF 2 "arith_operand" "")])
+ [(match_operand:SF 1 "nonmemory_operand" "")
+ (match_operand:SF 2 "nonmemory_operand" "")])
(match_operand 3 "" "")
(pc)))]
- "TARGET_SH2E || TARGET_SHMEDIA_FPU"
+ "TARGET_SH1 || TARGET_SHMEDIA_FPU"
"
{
if (TARGET_SHMEDIA)
- emit_jump_insn (gen_cbranchfp4_media (operands[0], operands[1], operands[2],
- operands[3]));
+ {
+ if (!arith_operand (operands[1], SFmode))
+ operands[1] = copy_to_mode_reg (SFmode, operands[1]);
+ if (!arith_operand (operands[2], SFmode))
+ operands[2] = copy_to_mode_reg (SFmode, operands[2]);
+ emit_jump_insn (gen_cbranchfp4_media (operands[0], operands[1],
+ operands[2], operands[3]));
+ }
else
- sh_emit_compare_and_branch (operands, SFmode);
+ sh_expand_float_cbranch (operands);
DONE;
}")
@@ -10426,11 +10776,39 @@ (define_insn "abssf2_i"
[(set_attr "type" "fmove")
(set_attr "fp_mode" "single")])
+(define_expand "abssc2"
+ [(set (match_operand:SF 0 "fp_arith_reg_operand" "")
+ (abs:SF (match_operand:SC 1 "fp_arith_reg_operand" "")))]
+ "TARGET_OSFP && ! TARGET_SH2E"
+ "
+{
+ expand_sfunc_unop (SCmode, &gen_abssc2_i3, \"__hypotf\", ABS, operands);
+ DONE;
+}")
+
+(define_insn "abssc2_i3"
+ [(set (match_operand:SF 0 "arith_reg_dest" "=z")
+ (abs:SF (reg:SC R4_REG)))
+ (clobber (reg:SI MACH_REG))
+ (clobber (reg:SI MACL_REG))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (clobber (reg:SI R4_REG))
+ (clobber (reg:SI R5_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_OSFP && ! TARGET_SH2E"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
(define_expand "adddf3"
[(set (match_operand:DF 0 "fp_arith_reg_operand" "")
(plus:DF (match_operand:DF 1 "fp_arith_reg_operand" "")
(match_operand:DF 2 "fp_arith_reg_operand" "")))]
- "(TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU"
+ "TARGET_FPU_DOUBLE || TARGET_SH3"
"
{
if (TARGET_SH4 || TARGET_SH2A_DOUBLE)
@@ -10438,6 +10816,12 @@ (define_expand "adddf3"
expand_df_binop (&gen_adddf3_i, operands);
DONE;
}
+ else if (TARGET_SH3)
+ {
+ expand_sfunc_binop (DFmode, &gen_adddf3_i3_wrap, \"__adddf3\", PLUS,
+ operands);
+ DONE;
+ }
}")
(define_insn "*adddf3_media"
@@ -10458,6 +10842,30 @@ (define_insn "adddf3_i"
[(set_attr "type" "dfp_arith")
(set_attr "fp_mode" "double")])
+(define_expand "adddf3_i3_wrap"
+ [(match_operand:DF 0 "" "") (match_operand:SI 1 "" "")]
+ "TARGET_SH3"
+ "
+{
+ emit_insn (gen_adddf3_i3 (operands[1]));
+ emit_move_insn (operands[0], gen_rtx_REG (DFmode, R0_REG));
+ DONE;
+}")
+
+(define_insn "adddf3_i3"
+ [(set (reg:DF R0_REG)
+ (plus:DF (reg:DF R4_REG) (reg:DF R6_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:DI R2_REG))
+ (clobber (reg:DF R4_REG))
+ (clobber (reg:DF R6_REG))
+ (use (match_operand:SI 0 "arith_reg_operand" "r"))]
+ "TARGET_SH3"
+ "jsr @%0%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
(define_expand "subdf3"
[(set (match_operand:DF 0 "fp_arith_reg_operand" "")
(minus:DF (match_operand:DF 1 "fp_arith_reg_operand" "")
@@ -10494,7 +10902,7 @@ (define_expand "muldf3"
[(set (match_operand:DF 0 "fp_arith_reg_operand" "")
(mult:DF (match_operand:DF 1 "fp_arith_reg_operand" "")
(match_operand:DF 2 "fp_arith_reg_operand" "")))]
- "(TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU"
+ "TARGET_FPU_DOUBLE || TARGET_SH3"
"
{
if (TARGET_SH4 || TARGET_SH2A_DOUBLE)
@@ -10502,6 +10910,12 @@ (define_expand "muldf3"
expand_df_binop (&gen_muldf3_i, operands);
DONE;
}
+ else if (TARGET_SH3)
+ {
+ expand_sfunc_binop (DFmode, &gen_muldf3_i3_wrap, \"__muldf3\", MULT,
+ operands);
+ DONE;
+ }
}")
(define_insn "*muldf3_media"
@@ -10522,6 +10936,32 @@ (define_insn "muldf3_i"
[(set_attr "type" "dfp_mul")
(set_attr "fp_mode" "double")])
+(define_expand "muldf3_i3_wrap"
+ [(match_operand:DF 0 "" "") (match_operand:SI 1 "" "")]
+ "TARGET_SH3"
+ "
+{
+ emit_insn (gen_muldf3_i3 (operands[1]));
+ emit_move_insn (operands[0], gen_rtx_REG (DFmode, R0_REG));
+ DONE;
+}")
+
+(define_insn "muldf3_i3"
+ [(set (reg:DF R0_REG)
+ (mult:DF (reg:DF R4_REG) (reg:DF R6_REG)))
+ (clobber (reg:SI MACH_REG))
+ (clobber (reg:SI MACL_REG))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:DI R2_REG))
+ (clobber (reg:DF R4_REG))
+ (clobber (reg:DF R6_REG))
+ (use (match_operand:SI 0 "arith_reg_operand" "r"))]
+ "TARGET_SH3"
+ "jsr @%0%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
(define_expand "divdf3"
[(set (match_operand:DF 0 "fp_arith_reg_operand" "")
(div:DF (match_operand:DF 1 "fp_arith_reg_operand" "")
@@ -10651,6 +11091,73 @@ (define_insn "fix_truncdfsi2_i"
;; (use (match_dup 2))])
;; (set (match_dup 0) (reg:SI FPUL_REG))])
+(define_insn "cmpnedf_i1"
+ [(set (match_operand:CC_FP_NE 0 "register_operand" "=z")
+ (compare:CC_FP_NE (reg:DF R4_REG) (reg:DF R6_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1_SOFTFP"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpgtdf_i1"
+ [(set (match_operand:CC_FP_GT 0 "register_operand" "=z")
+ (compare:CC_FP_GT (reg:DF R4_REG) (reg:DF R6_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1_SOFTFP"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpunltdf_i1"
+ [(set (match_operand:CC_FP_UNLT 0 "register_operand" "=z")
+ (compare:CC_FP_UNLT (reg:DF R4_REG) (reg:DF R6_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1_SOFTFP"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpeqdf_i1_finite"
+ [(set (reg:SI T_REG)
+ (eq:SI (match_operand:DF 0 "arith_reg_operand" "r")
+ (match_operand:DF 1 "arith_reg_operand" "r")))
+ (clobber (match_scratch:SI 2 "=&r"))]
+ "TARGET_SH1_SOFTFP && flag_finite_math_only"
+ "cmp/eq\t%R0,%R1\;mov\t%S0,%2\;bf\t0f\;cmp/eq\t%S0,%S1\;bt\t0f\;or\t%S1,%2\;add\t%2,%2\;or\t%R0,%2\;tst\t%2,%2\\n0:"
+ [(set_attr "length" "18")])
+
+(define_insn "cmpundf_i1"
+ [(set (reg:SI T_REG)
+ (unordered:SI (match_operand:DF 0 "arith_reg_operand" "r,r")
+ (match_operand:DF 1 "arith_reg_operand" "r,r")))
+ (use (match_operand:SI 2 "arith_reg_operand" "r,r"))
+ (clobber (match_scratch:SI 3 "=0,&r"))]
+ "TARGET_SH1 && ! TARGET_SH2E"
+ "not\t%S0,%3\;tst\t%2,%3\;not\t%S1,%3\;bt\t0f\;tst\t%2,%3\;0:"
+ [(set_attr "length" "10")])
+
+;; ??? This is a lot of code with a lot of branches; a library function
+;; might be better.
+(define_insn "cmpuneqdf_i1"
+ [(set (reg:SI T_REG)
+ (uneq:SI (match_operand:DF 0 "arith_reg_operand" "r")
+ (match_operand:DF 1 "arith_reg_operand" "r")))
+ (use (match_operand:SI 2 "arith_reg_operand" "r"))
+ (clobber (match_scratch:SI 3 "=&r"))]
+ "TARGET_SH1_SOFTFP"
+ "not\t%S0,%3\;tst\t%2,%3\;not\t%S1,%3\;bt\t0f\;tst\t%2,%3\;bt\t0f\;cmp/eq\t%R0,%R1\; bf\t0f\;cmp/eq\t%S0,%S1\;bt\t0f\;mov\t%S0,%3\;or\t%S1,%3\;add\t%3,%3\;or\t%R0,%3\;tst\t%3,%3\\n0:"
+ [(set_attr "length" "30")])
+
(define_insn "cmpgtdf_t"
[(set (reg:SI T_REG)
(gt:SI (match_operand:DF 0 "arith_reg_operand" "f")
@@ -10682,6 +11189,26 @@ (define_insn "*ieee_ccmpeqdf_t"
[(set_attr "length" "4")
(set_attr "fp_mode" "double")])
+(define_insn "*cmpltgtdf_t"
+ [(set (reg:SI T_REG)
+ (ltgt:SI (match_operand:DF 0 "fp_arith_reg_operand" "f")
+ (match_operand:DF 1 "fp_arith_reg_operand" "f")))
+ (use (match_operand:PSI 2 "fpscr_operand" "c"))]
+ "TARGET_SH4 || TARGET_SH2A_DOUBLE"
+ "fcmp/gt\t%1,%0\;bt\t0f\;fcmp/gt\t%0,%1\\n0:"
+ [(set_attr "length" "6")
+ (set_attr "fp_mode" "double")])
+
+(define_insn "*cmpordereddf_t_4"
+ [(set (reg:SI T_REG)
+ (ordered:SI (match_operand:DF 0 "fp_arith_reg_operand" "f")
+ (match_operand:DF 1 "fp_arith_reg_operand" "f")))
+ (use (match_operand:PSI 2 "fpscr_operand" "c"))]
+ "TARGET_SH4 || TARGET_SH2A_DOUBLE"
+ "fcmp/eq\t%0,%0\;bf\t0f\;fcmp/eq\t%1,%1\\n0:"
+ [(set_attr "length" "6")
+ (set_attr "fp_mode" "double")])
+
(define_insn "cmpeqdf_media"
[(set (match_operand:SI 0 "register_operand" "=r")
(eq:SI (match_operand:DF 1 "fp_arith_reg_operand" "f")
@@ -10717,18 +11244,24 @@ (define_insn "cmpundf_media"
(define_expand "cbranchdf4"
[(set (pc)
(if_then_else (match_operator 0 "sh_float_comparison_operator"
- [(match_operand:DF 1 "arith_operand" "")
- (match_operand:DF 2 "arith_operand" "")])
+ [(match_operand:DF 1 "nonmemory_operand" "")
+ (match_operand:DF 2 "nonmemory_operand" "")])
(match_operand 3 "" "")
(pc)))]
- "(TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU"
+ "TARGET_SH1 || TARGET_SHMEDIA_FPU"
"
{
if (TARGET_SHMEDIA)
- emit_jump_insn (gen_cbranchfp4_media (operands[0], operands[1], operands[2],
- operands[3]));
+ {
+ if (!arith_operand (operands[1], DFmode))
+ operands[1] = copy_to_mode_reg (DFmode, operands[1]);
+ if (!arith_operand (operands[2], DFmode))
+ operands[2] = copy_to_mode_reg (DFmode, operands[2]);
+ emit_jump_insn (gen_cbranchfp4_media (operands[0], operands[1],
+ operands[2], operands[3]));
+ }
else
- sh_emit_compare_and_branch (operands, DFmode);
+ sh_expand_float_cbranch (operands);
DONE;
}")
@@ -10850,16 +11383,82 @@ (define_insn "extendsfdf2_i4"
[(set_attr "type" "fp")
(set_attr "fp_mode" "double")])
+;; ??? In order to use this efficiently, we'd have to have an extra
+;; register class for r0 and r1 - and that would cause repercussions in
+;; register allocation elsewhere. So just say we clobber r0 / r1, and
+;; that we can use an arbitrary target.
+(define_insn_and_split "extendsfdf2_i1"
+ [(set (match_operand:DF 0 "arith_reg_dest" "=r")
+ (float_extend:DF (reg:SF R4_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R0_REG))
+ (clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1_SOFTFP && !TARGET_SH2E"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 0) (reg:DF R0_REG))]
+ "emit_insn (gen_extendsfdf2_i1_r0 (operands[1]));"
+ [(set_attr "type" "sfunc")])
+
+(define_insn "extendsfdf2_i1_r0"
+ [(set (reg:DF R0_REG) (float_extend:DF (reg:SF R4_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (use (match_operand:SI 0 "arith_reg_operand" "r"))]
+ "TARGET_SH1_SOFTFP && !TARGET_SH2E"
+ "jsr @%0%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
+(define_insn_and_split "extendsfdf2_i2e"
+ [(set (match_operand:DF 0 "arith_reg_dest" "=r")
+ (float_extend:DF (reg:SF FR4_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R0_REG))
+ (clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (clobber (reg:SI R4_REG))
+ (clobber (reg:SI FPUL_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1_SOFTFP && TARGET_SH2E"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 0) (reg:DF R0_REG))]
+ "emit_insn (gen_extendsfdf2_i2e_r0 (operands[1]));"
+ [(set_attr "type" "sfunc")])
+
+(define_insn "extendsfdf2_i2e_r0"
+ [(set (reg:DF R0_REG) (float_extend:DF (reg:SF FR4_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (clobber (reg:SI R4_REG))
+ (clobber (reg:SI FPUL_REG))
+ (use (match_operand:SI 0 "arith_reg_operand" "r"))]
+ "TARGET_SH1_SOFTFP && TARGET_SH2E"
+ "jsr @%0%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
(define_expand "truncdfsf2"
[(set (match_operand:SF 0 "fpul_operand" "")
- (float_truncate:SF (match_operand:DF 1 "fp_arith_reg_operand" "")))]
- "(TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU"
- "
-{
+ (float_truncate:SF (match_operand:DF 1 "fp_arith_reg_operand" "")))]
+ "(TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU"
+ "
+{
if (TARGET_SH4 || TARGET_SH2A_DOUBLE)
{
emit_df_insn (gen_truncdfsf2_i4 (operands[0], operands[1],
- get_fpscr_rtx ()));
+ get_fpscr_rtx ()));
DONE;
}
}")
@@ -10879,6 +11478,23 @@ (define_insn "truncdfsf2_i4"
"fcnvds %1,%0"
[(set_attr "type" "fp")
(set_attr "fp_mode" "double")])
+
+(define_insn "truncdfsf2_i2e"
+ [(set (match_operand:SF 0 "arith_reg_dest" "=w")
+ (float_truncate:SF (reg:DF R4_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI FPUL_REG))
+ (clobber (reg:SI R0_REG))
+ (clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1_SOFTFP && TARGET_SH2E"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
\f
;; Bit field extract patterns. These give better code for packed bitfields,
;; because they allow auto-increment addresses to be generated.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: SH optimized software floating point routines
2010-07-16 14:01 ` Kaz Kojima
@ 2010-07-17 14:31 ` Joern Rennecke
2010-07-17 21:23 ` Kaz Kojima
0 siblings, 1 reply; 30+ messages in thread
From: Joern Rennecke @ 2010-07-17 14:31 UTC (permalink / raw)
To: Kaz Kojima; +Cc: Naveen.S, gcc, Prafulla.Thakare
Quoting Kaz Kojima <kkojima@rr.iij4u.or.jp>:
> sh_softfp.patch looks basically OK to me, though I'm curious
> with numbers for fp-bit.c/softfp/softfloat. Could you show us
> some real speed&size numbers?
I don't have any sh[1234] hardware to run EEMBC tests on, but the runtime
difference of 'make check' on i686-pc-linux-gnu X sh-elf using a core2 quad
for fp-bit vs. softfloat (w/ compare/conversion/divsf) is two hours 4 minutes
versus 38 minutes.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: SH optimized software floating point routines
2010-07-17 14:31 ` Joern Rennecke
@ 2010-07-17 21:23 ` Kaz Kojima
2010-07-19 9:23 ` Naveen H. S
0 siblings, 1 reply; 30+ messages in thread
From: Kaz Kojima @ 2010-07-17 21:23 UTC (permalink / raw)
To: amylaar; +Cc: Naveen.S, gcc, Prafulla.Thakare
Joern Rennecke <amylaar@spamcop.net> wrote:
> I don't have any sh[1234] hardware to run EEMBC tests on, but the runtime
> difference of 'make check' on i686-pc-linux-gnu X sh-elf using a core2 quad
> for fp-bit vs. softfloat (w/ compare/conversion/divsf) is two hours 4 minutes
> versus 38 minutes.
Very impressive. Thanks!
Reducing 124 minutes to 38 minutes is too attractive to pass up.
Regards,
kaz
^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: SH optimized software floating point routines
2010-07-17 13:30 ` Joern Rennecke
@ 2010-07-19 0:59 ` Joern Rennecke
2010-07-20 13:35 ` Kaz Kojima
0 siblings, 1 reply; 30+ messages in thread
From: Joern Rennecke @ 2010-07-19 0:59 UTC (permalink / raw)
To: Naveen H. S, Kaz Kojima, gcc; +Cc: Prafulla Thakare
[-- Attachment #1: Type: text/plain, Size: 286 bytes --]
I've found two bugs in truncdfsf2.
I've also added back a number of hunks that Naveen had dropped.
Note that most of the patch was prepared in 2006, so that is the
proper most recent copyright date for those files that haven't been touched
except for updating the copyright notice.
[-- Attachment #2: sh-softfp-20100718-2131 --]
[-- Type: text/plain, Size: 284140 bytes --]
TODO:
- Test & submit companion patches separately.
- Test.
2010-07-18 Joern Rennecke <joern.rennecke@embecosm.com>
* config/sh/IEEE-754/divsf3.S (divsf3):
Fix sign for zero r4 input.
Fix comments for NaN return.
Remove some redundant code.
* config/sh/ieee-754-df.S: Add comments on
RETURN_R0_MAIN / RETURN_R0 / RETURN_FR0.
(RETURN_FR0): Add missing backslash.
[!DYN_SHIFT] (extendsfdf2) <zero_denorm>: Fix mask used in
shift_byte loop.
[!DYN_SHIFT] (extendsfdf2) <x00ff0000>: New constant.
[!DYN_SHIFT] (truncdfsf2) <inf>: Fix returned value.
[DYN_SHIFT] (truncdfsf2) <inf>: Likewise.
[!DYN_SHIFT] (truncdfsf2) <xffe00000>: Remove now unused constant.
[DYN_SHIFT] (truncdfsf2) <xffe00000>: Likewise.
* config/sh/sh.c (sh_expand_float_condop): Changed parameters to
allow separate passing of comparison operands and destination.
Changed callers.
Replace use of from_compare.
Use emit instead of emit_jump_insn.
(sh_soft_fp_cmp): Remove REG_LIBCALL / REG_RETCAL code.
Use set_unique_reg_note.
(expand_sfunc_op): Likewise.
* config/sh/sh.md (cstoresf4): Add support for software floating point.
(cstoredf4, cbranchsf4, cbranchdf4): Likewise.
(cmpnedf_i1): Fix predicate.
(truncdfsf2): Add TARGET_SH2E case.
(mulsf3): Fix condition for emitting mulsf3_i3.
* config/sh/IEEE-754/adddf3.S: Adjust NaN value returned for
+inf + -inf to agree with DF_NAN_MASK mask.
* config/sh/t-sh (gt-sh.h): Remove redundant rule.
(LIB1ASMFUNCS): Add _unordsf2 and _unorddf2.
2006-09-02 J"orn Rennecke <joern.rennecke@st.com>
* targhooks.c (regs.h): #include.
(default_match_adjust): New function.
* targhooks.h (default_match_adjust): Declare.
* reload.c (operands_match_p): Use targetm.match_adjust.
* target.h (struct gcc_target): Add member match_adjust.
* target-def.h (TARGET_MATCH_ADJUST): New macro.
* Makefile.in (targhooks.o): Depend on $(REGS_H).
* config/sh/sh-protos.h (sh_match_adjust): Declare.
* config/sh/sh.c (TARGET_MATCH_ADJUST): Define as sh_match_adjust.
(sh_match_adjust): New function.
2006-09-15 J"orn Rennecke <joern.rennecke@st.com>
* sched-deps.c (sched_analyze_2): When a likely spilled register
is used, put in into a scheduling group with the insn that
sets it and with all the insns in-between.
2006-09-02 J"orn Rennecke <joern.rennecke@st.com>
config/sh/t-sh: ($(T)ic_invalidate_array_4-100.o): Add -I. .
($(T)ic_invalidate_array_4-200.o): Likewise.
($(T)ic_invalidate_array_4a.o): Likewise.
2006-09-02 J"orn Rennecke <joern.rennecke@st.com>
* config/sh/sh.h (LIBGCC2_DOUBLE_TYPE_SIZE): Define.
2006-09-02 J"orn Rennecke <joern.rennecke@st.com>
* sh.md (*movsicc_t_false, *movsicc_t_true): Add mode.
2006-09-02 J"orn Rennecke <joern.rennecke@st.com>
Aanchal Khanna <aanchalk@noida.hcltech.com>
Rakesh Kumar <rakesh.kumar@noida.hcltech.com>
* config/sh/sh-protos.h (sh_function_kind): New enumerator
SFUNC_FREQUENT.
(expand_sfunc_unop, expand_sfunc_binop): Declare.
(sh_expand_float_cbranch): Likewise.
* config/sh/lib1funcs.asm (ieee-754-sf.S, ieee-754-df.S): #include.
* config/sh/t-sh (LIB1ASMFUNCS): Add nesf2, _nedf2, _gtsf2t, _gtdf2t,
_gesf2f, _gedf2f, _extendsfdf2, _truncdfsf2, _add_sub_sf3, _mulsf3,
_hypotf, _muldf3, _add_sub_df3, _divsf3, _divdf3, _fixunssfsi,
_fixsfsi, _fixunsdfsi, _fixdfsi, _floatunssisf, _floatsisf,
_floatunssidf and _floatsidf.
(FPBIT, DPBIT, dp-bit.c, fp-bit.c): Removed.
* config/sh/ieee-754-df.S, config/sh/ieee-754-sf.S: New files.
* config/sh/predicates.md (soft_fp_comparison_operand): New predicate.
(soft_fp_comparison_operator): Likewise.
* config/sh/sh.c (sh_soft_fp_cmp, expand_sfunc_op): New functions.
(expand_sfunc_unop, expand_sfunc_binop): Likewise.
(sh_expand_float_cbranch): Likewise.
(sh_expand_float_condop, sh_expand_float_scc): Likewise.
(from_compare): Add support for software floating point.
(function_symbol): Always look up name. Add SFUNC_FREQUENT case.
* config/sh/sh.h (TARGET_SH1_SOFTFP): New macro.
(TARGET_SH1_SOFTFP_MODE): Likewise.
* config/sh/sh-modes.def (CC_FP_NE, CC_FP_GT, CC_FP_UNLT): New modes.
* config/sh/lib1funcs.h (SLC, SLI, SLCMP, DMULU_SAVE): New macros.
(DMULUL, DMULUH, DMULU_RESTORE, SHLL4, SHLR4, SHLL6, SHLR6): Likewise.
(SHLL12, SHLR12, SHLR19, SHLL23, SHLR24, SHLR21, SHLL21): Likewise.
(SHLR11, SHLR22, SHLR23, SHLR20, SHLL20, SHLD_COUNT, SHLRN): Likewise.
(SHLLN, DYN_SHIFT): Likewise.
(SUPPORT_SH3_OSFP, SUPPORT_SH3E_OSFP): Likewise.
(SUPPORT_SH4_NOFPU_OSFP, SUPPORT_SH4_SINGLE_ONLY_OSFP): Likewise.
(TARGET_OSFP): Likewise.
* config/sh/IEEE-754/m3/divsf3.S: New file.
* config/sh/IEEE-754/m3/divdf3.S: Likewise.
* config/sh/IEEE-754/m3/floatunssisf.S: Likewise.
* config/sh/IEEE-754/m3/floatunssidf.S: Likewise.
* config/sh/IEEE-754/m3/fixunsdfsi.S: Likewise.
* config/sh/IEEE-754/m3/divdf3-rt.S: Likewise.
* config/sh/IEEE-754/m3/addsf3.S: Likewise.
* config/sh/IEEE-754/m3/adddf3.S: Likewise.
* config/sh/IEEE-754/m3/mulsf3.S: Likewise.
* config/sh/IEEE-754/m3/muldf3.S: Likewise.
* config/sh/IEEE-754/m3/floatsisf.S: Likewise.
* config/sh/IEEE-754/m3/floatsidf.S: Likewise.
* config/sh/IEEE-754/m3/fixdfsi.S: Likewise.
* config/sh/IEEE-754/divdf3.S: Likewise.
* config/sh/IEEE-754/floatunssisf.S: Likewise.
* config/sh/IEEE-754/fixunsdfsi.S: Likewise.
* config/sh/IEEE-754/adddf3.S: Likewise.
* config/sh/IEEE-754/floatsisf.S: Likewise.
* config/sh/IEEE-754/muldf3.S: Likewise.
* config/sh/IEEE-754/fixdfsi.S: Likewise.
* config/sh/IEEE-754/divsf3.S: Likewise.
* config/sh/IEEE-754/fixunssfsi.S: Likewise.
* config/sh/IEEE-754/floatunssidf.S: Likewise.
* config/sh/IEEE-754/addsf3.S: Likewise.
* config/sh/IEEE-754/mulsf3.S: Likewise.
* config/sh/IEEE-754/floatsidf.S: Likewise.
* config/sh/IEEE-754/fixsfsi.S: Likewise.
* config/sh/sh.md (SF_NAN_MASK, DF_NAN_MASK, FR4_REG): New constants.
(fpcmp_i1, addsf3_i3, subsf3_i3): New patterns.
(mulsf3_i3, cmpnesf_i1, cmpgtsf_i1, cmpunltsf_i1): Likewise.
(cmpeqsf_i1_finite, cmplesf_i1_finite, cmpunsf_i1): Likewise.
(cmpuneqsf_i1, movcc_fp_ne, movcc_fp_gt, movcc_fp_unlt): Likewise.
(cmpltgtsf_t, cmporderedsf_t, cmpltgtsf_t_4): Likewise.
(cmporderedsf_t_4, abssc2, adddf3_i3_wrap, adddf3_i3): Likewise.
(muldf3_i3_wrap, muldf3_i3, cmpnedf_i1, cmpgtdf_i1): Likewise.
(cmpunltdf_i1, cmpeqdf_i1_finite, cmpundf_i1, cmpuneqdf_i1): Likewise.
(cmpltgtdf_t, cmpordereddf_t_4, extendsfdf2_i1): Likewise.
(extendsfdf2_i2e, extendsfdf2_i2e_r0, truncdfsf2_i2e): Likewise.
(extendsfdf2_i1_r0, truncdfsf2_i1): Likewise.
(cmpun_sdf, cmpuneq_sdf): Likewise.
(addsf3, subsf3, mulsf3): Add support for software floating point.
(adddf3, subdf3, muldf3, extendsfdf2, truncdfsf2): Likewise.
(cmpsf, cmpdf): Don't enable for TARGET_SH2E.
(movnegt): Match only one operand. Changed user.
Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi (revision 162269)
+++ gcc/doc/tm.texi (working copy)
@@ -2753,6 +2753,10 @@ of the individual moves due to expected
forwarding logic, you can set @code{sri->extra_cost} to a negative amount.
@end deftypefn
+@deftypefn {Target Hook} int TARGET_MATCH_ADJUST (rtx, @var{int})
+This hook is documented in @file{target.def} / @file{targhooks.c}.
+@end deftypefn
+
@defmac SECONDARY_RELOAD_CLASS (@var{class}, @var{mode}, @var{x})
@defmacx SECONDARY_INPUT_RELOAD_CLASS (@var{class}, @var{mode}, @var{x})
@defmacx SECONDARY_OUTPUT_RELOAD_CLASS (@var{class}, @var{mode}, @var{x})
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in (revision 162269)
+++ gcc/doc/tm.texi.in (working copy)
@@ -2753,6 +2753,8 @@ of the individual moves due to expected
forwarding logic, you can set @code{sri->extra_cost} to a negative amount.
@end deftypefn
+@hook TARGET_MATCH_ADJUST
+
@defmac SECONDARY_RELOAD_CLASS (@var{class}, @var{mode}, @var{x})
@defmacx SECONDARY_INPUT_RELOAD_CLASS (@var{class}, @var{mode}, @var{x})
@defmacx SECONDARY_OUTPUT_RELOAD_CLASS (@var{class}, @var{mode}, @var{x})
Index: gcc/targhooks.c
===================================================================
--- gcc/targhooks.c (revision 162269)
+++ gcc/targhooks.c (working copy)
@@ -67,6 +67,7 @@ along with GCC; see the file COPYING3.
#include "reload.h"
#include "optabs.h"
#include "recog.h"
+#include "regs.h"
bool
@@ -906,6 +907,27 @@ default_secondary_reload (bool in_p ATTR
return rclass;
}
+/* Given an rtx and its regno, return a regno value that shall be used for
+ purposes of comparison in operands_match_p.
+ Generally, we say that integer registers are subject to big-endian
+ adjustment. This default target hook should generally work if the mode
+ of a register is a sufficient indication of whether this adjustment is to take
+ place; this will not work when software floating point is done in integer
+ registers. */
+int
+default_match_adjust (rtx x, int regno)
+{
+ /* On a WORDS_BIG_ENDIAN machine, point to the last register of a
+ multiple hard register group of scalar integer registers, so that
+ for example (reg:DI 0) and (reg:SI 1) will be considered the same
+ register. */
+ if (WORDS_BIG_ENDIAN && GET_MODE_SIZE (GET_MODE (x)) > UNITS_PER_WORD
+ && SCALAR_INT_MODE_P (GET_MODE (x))
+ && regno < FIRST_PSEUDO_REGISTER)
+ regno += hard_regno_nregs[regno][GET_MODE (x)] - 1;
+ return regno;
+}
+
void
default_target_option_override (void)
{
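The big-endian regno adjustment in default_match_adjust can be modeled in isolation. The following is a simplified sketch, not GCC's actual configuration: the 4-byte word size, the 16-hard-register limit, and the omission of the SCALAR_INT_MODE_P test are assumptions for illustration only.

```c
#include <assert.h>

/* Simplified model of default_match_adjust on a WORDS_BIG_ENDIAN target:
   point at the last hard register of a multi-register group, so that
   (reg:DI 0) and (reg:SI 1) compare as the same register.  The word size
   and hard register count below are illustrative assumptions.  */
#define UNITS_PER_WORD_ 4
#define FIRST_PSEUDO_ 16

int match_adjust_model (int mode_size, int regno)
{
  int nregs = (mode_size + UNITS_PER_WORD_ - 1) / UNITS_PER_WORD_;
  if (mode_size > UNITS_PER_WORD_ && regno < FIRST_PSEUDO_)
    regno += nregs - 1;          /* last register of the group */
  return regno;
}
```

With these assumptions, an 8-byte value starting in register 0 and a 4-byte value in register 1 both map to regno 1, which is exactly the equivalence operands_match_p needs.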
Index: gcc/targhooks.h
===================================================================
--- gcc/targhooks.h (revision 162269)
+++ gcc/targhooks.h (working copy)
@@ -121,6 +121,7 @@ extern const reg_class_t *default_ira_co
extern reg_class_t default_secondary_reload (bool, rtx, reg_class_t,
enum machine_mode,
secondary_reload_info *);
+extern int default_match_adjust (rtx, int);
extern void default_target_option_override (void);
extern void hook_void_bitmap (bitmap);
extern bool default_handle_c_option (size_t, const char *, int);
Index: gcc/target.def
===================================================================
--- gcc/target.def (revision 162269)
+++ gcc/target.def (working copy)
@@ -1945,6 +1945,14 @@ DEFHOOK
secondary_reload_info *sri),
default_secondary_reload)
+/* Take an rtx and its regno, and return the regno for purposes of
+ checking a matching constraint. */
+DEFHOOK
+(match_adjust,
+ "This hook is documented in @file{target.def} / @file{targhooks.c}.",
+ int, (rtx, int),
+ default_match_adjust)
+
/* This target hook allows the backend to perform additional
processing while initializing for variable expansion. */
DEFHOOK
Index: gcc/reload.c
===================================================================
--- gcc/reload.c (revision 162269)
+++ gcc/reload.c (working copy)
@@ -2216,14 +2216,8 @@ operands_match_p (rtx x, rtx y)
multiple hard register group of scalar integer registers, so that
for example (reg:DI 0) and (reg:SI 1) will be considered the same
register. */
- if (WORDS_BIG_ENDIAN && GET_MODE_SIZE (GET_MODE (x)) > UNITS_PER_WORD
- && SCALAR_INT_MODE_P (GET_MODE (x))
- && i < FIRST_PSEUDO_REGISTER)
- i += hard_regno_nregs[i][GET_MODE (x)] - 1;
- if (WORDS_BIG_ENDIAN && GET_MODE_SIZE (GET_MODE (y)) > UNITS_PER_WORD
- && SCALAR_INT_MODE_P (GET_MODE (y))
- && j < FIRST_PSEUDO_REGISTER)
- j += hard_regno_nregs[j][GET_MODE (y)] - 1;
+ i = targetm.match_adjust (x, i);
+ j = targetm.match_adjust (y, j);
return i == j;
}
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in (revision 162269)
+++ gcc/Makefile.in (working copy)
@@ -2806,7 +2806,7 @@ opts-common.o : opts-common.c opts.h opt
targhooks.o : targhooks.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TREE_H) \
$(EXPR_H) $(TM_H) $(RTL_H) $(TM_P_H) $(FUNCTION_H) output.h $(TOPLEV_H) $(DIAGNOSTIC_CORE_H) \
$(MACHMODE_H) $(TARGET_DEF_H) $(TARGET_H) $(GGC_H) gt-targhooks.h \
- $(OPTABS_H) $(RECOG_H) reload.h hard-reg-set.h
+ $(OPTABS_H) $(RECOG_H) reload.h hard-reg-set.h $(REGS_H)
bversion.h: s-bversion; @true
s-bversion: BASE-VER
Index: gcc/config/sh/sh-protos.h
===================================================================
--- gcc/config/sh/sh-protos.h (revision 162269)
+++ gcc/config/sh/sh-protos.h (working copy)
@@ -25,8 +25,13 @@ along with GCC; see the file COPYING3.
#define GCC_SH_PROTOS_H
enum sh_function_kind {
- /* A function with normal C ABI */
+ /* A function with normal C ABI, or an SH1..SH4 sfunc that may be resolved via
+ a PLT. */
FUNCTION_ORDINARY,
+ /* A function that is a bit too large to put in every calling dso, but that's
+ typically used often enough that calling via the GOT makes sense for
+ speed. */
+ SFUNC_FREQUENT,
/* A special function that guarantees that some otherwise call-clobbered
registers are not clobbered. These can't go through the SH5 resolver,
because it only saves argument passing registers. */
@@ -115,6 +120,10 @@ extern void expand_sf_binop (rtx (*)(rtx
extern void expand_df_unop (rtx (*)(rtx, rtx, rtx), rtx *);
extern void expand_df_binop (rtx (*)(rtx, rtx, rtx, rtx), rtx *);
extern void expand_fp_branch (rtx (*)(void), rtx (*)(void));
+extern void expand_sfunc_unop (enum machine_mode, rtx (*) (rtx, rtx),
+ const char *, enum rtx_code code, rtx *);
+extern void expand_sfunc_binop (enum machine_mode, rtx (*) (rtx, rtx),
+ const char *, enum rtx_code code, rtx *);
extern int sh_insn_length_adjustment (rtx);
extern int sh_can_redirect_branch (rtx, rtx);
extern void sh_expand_unop_v2sf (enum rtx_code, rtx, rtx);
@@ -132,6 +141,8 @@ extern struct rtx_def *get_fpscr_rtx (vo
extern int sh_media_register_for_return (void);
extern void sh_expand_prologue (void);
extern void sh_expand_epilogue (bool);
+extern void sh_expand_float_cbranch (rtx operands[4]);
+extern void sh_expand_float_scc (rtx operands[4]);
extern int sh_need_epilogue (void);
extern void sh_set_return_address (rtx, rtx);
extern int initial_elimination_offset (int, int);
@@ -176,6 +187,7 @@ struct secondary_reload_info;
extern reg_class_t sh_secondary_reload (bool, rtx, reg_class_t,
enum machine_mode,
struct secondary_reload_info *);
+extern int sh_match_adjust (rtx, int);
extern int sh2a_get_function_vector_number (rtx);
extern int sh2a_is_function_vector_call (rtx);
extern void sh_fix_range (const char *);
Index: gcc/config/sh/lib1funcs.asm
===================================================================
--- gcc/config/sh/lib1funcs.asm (revision 162269)
+++ gcc/config/sh/lib1funcs.asm (working copy)
@@ -3931,3 +3931,6 @@ GLOBAL(udiv_qrnnd_16):
ENDFUNC(GLOBAL(udiv_qrnnd_16))
#endif /* !__SHMEDIA__ */
#endif /* L_udiv_qrnnd_16 */
+
+#include "ieee-754-sf.S"
+#include "ieee-754-df.S"
Index: gcc/config/sh/t-sh
===================================================================
--- gcc/config/sh/t-sh (revision 162269)
+++ gcc/config/sh/t-sh (working copy)
@@ -25,30 +25,16 @@ sh-c.o: $(srcdir)/config/sh/sh-c.c \
LIB1ASMSRC = sh/lib1funcs.asm
LIB1ASMFUNCS = _ashiftrt _ashiftrt_n _ashiftlt _lshiftrt _movmem \
_movmem_i4 _mulsi3 _sdivsi3 _sdivsi3_i4 _udivsi3 _udivsi3_i4 _set_fpscr \
- _div_table _udiv_qrnnd_16 \
+ _div_table _udiv_qrnnd_16 _unordsf2 _unorddf2 \
+ _nesf2 _nedf2 _gtsf2t _gtdf2t _gesf2f _gedf2f _extendsfdf2 _truncdfsf2 \
+ _add_sub_sf3 _mulsf3 _hypotf _muldf3 _add_sub_df3 _divsf3 _divdf3 \
+ _fixunssfsi _fixsfsi _fixunsdfsi _fixdfsi _floatunssisf _floatsisf \
+ _floatunssidf _floatsidf \
$(LIB1ASMFUNCS_CACHE)
LIB1ASMFUNCS_CACHE = _ic_invalidate _ic_invalidate_array
TARGET_LIBGCC2_CFLAGS = -mieee
-# We want fine grained libraries, so use the new code to build the
-# floating point emulation libraries.
-FPBIT = fp-bit.c
-DPBIT = dp-bit.c
-
-dp-bit.c: $(srcdir)/config/fp-bit.c
- echo '#ifdef __LITTLE_ENDIAN__' > dp-bit.c
- echo '#define FLOAT_BIT_ORDER_MISMATCH' >>dp-bit.c
- echo '#endif' >> dp-bit.c
- cat $(srcdir)/config/fp-bit.c >> dp-bit.c
-
-fp-bit.c: $(srcdir)/config/fp-bit.c
- echo '#define FLOAT' > fp-bit.c
- echo '#ifdef __LITTLE_ENDIAN__' >> fp-bit.c
- echo '#define FLOAT_BIT_ORDER_MISMATCH' >>fp-bit.c
- echo '#endif' >> fp-bit.c
- cat $(srcdir)/config/fp-bit.c >> fp-bit.c
-
DEFAULT_ENDIAN = $(word 1,$(TM_ENDIAN_CONFIG))
OTHER_ENDIAN = $(word 2,$(TM_ENDIAN_CONFIG))
@@ -120,7 +106,6 @@ $(T)crtn.o: $(srcdir)/config/sh/crtn.asm
$(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -c -o $(T)crtn.o -x assembler-with-cpp $(srcdir)/config/sh/crtn.asm
$(out_object_file): gt-sh.h
-gt-sh.h : s-gtype ; @true
# These are not suitable for COFF.
# EXTRA_MULTILIB_PARTS= crt1.o crti.o crtn.o crtbegin.o crtend.o
@@ -131,17 +116,17 @@ OPT_EXTRA_PARTS= libgcc-Os-4-200.a libgc
EXTRA_MULTILIB_PARTS= $(IC_EXTRA_PARTS) $(OPT_EXTRA_PARTS)
$(T)ic_invalidate_array_4-100.o: $(srcdir)/config/sh/lib1funcs.asm $(GCC_PASSES)
- $(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -c -o $(T)ic_invalidate_array_4-100.o -DL_ic_invalidate_array -DWAYS=1 -DWAY_SIZE=0x2000 -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
+ $(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -I. -c -o $(T)ic_invalidate_array_4-100.o -DL_ic_invalidate_array -DWAYS=1 -DWAY_SIZE=0x2000 -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
$(T)libic_invalidate_array_4-100.a: $(T)ic_invalidate_array_4-100.o $(GCC_PASSES)
$(AR_CREATE_FOR_TARGET) $(T)libic_invalidate_array_4-100.a $(T)ic_invalidate_array_4-100.o
$(T)ic_invalidate_array_4-200.o: $(srcdir)/config/sh/lib1funcs.asm $(GCC_PASSES)
- $(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -c -o $(T)ic_invalidate_array_4-200.o -DL_ic_invalidate_array -DWAYS=2 -DWAY_SIZE=0x2000 -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
+ $(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -I. -c -o $(T)ic_invalidate_array_4-200.o -DL_ic_invalidate_array -DWAYS=2 -DWAY_SIZE=0x2000 -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
$(T)libic_invalidate_array_4-200.a: $(T)ic_invalidate_array_4-200.o $(GCC_PASSES)
$(AR_CREATE_FOR_TARGET) $(T)libic_invalidate_array_4-200.a $(T)ic_invalidate_array_4-200.o
$(T)ic_invalidate_array_4a.o: $(srcdir)/config/sh/lib1funcs.asm $(GCC_PASSES)
- $(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -c -o $(T)ic_invalidate_array_4a.o -DL_ic_invalidate_array -D__FORCE_SH4A__ -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
+ $(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -I. -c -o $(T)ic_invalidate_array_4a.o -DL_ic_invalidate_array -D__FORCE_SH4A__ -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
$(T)libic_invalidate_array_4a.a: $(T)ic_invalidate_array_4a.o $(GCC_PASSES)
$(AR_CREATE_FOR_TARGET) $(T)libic_invalidate_array_4a.a $(T)ic_invalidate_array_4a.o
Index: gcc/config/sh/sh.opt
===================================================================
--- gcc/config/sh/sh.opt (revision 162269)
+++ gcc/config/sh/sh.opt (working copy)
@@ -21,7 +21,7 @@
;; Used for various architecture options.
Mask(SH_E)
-;; Set if the default precision of th FPU is single.
+;; Set if the default precision of the FPU is single.
Mask(FPU_SINGLE)
;; Set if we should generate code using type 2A insns.
Index: gcc/config/sh/ieee-754-df.S
===================================================================
--- gcc/config/sh/ieee-754-df.S (revision 0)
+++ gcc/config/sh/ieee-754-df.S (revision 0)
@@ -0,0 +1,791 @@
+/* Copyright (C) 2004, 2006, 2010 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!! libgcc software floating-point routines for Renesas SH /
+!! STMicroelectronics ST40 CPUs
+!! Contributed by J"orn Rennecke joern.rennecke@st.com
+
+#ifndef __SH_FPU_DOUBLE__
+
+#include "lib1funcs.h"
+#include "insn-constants.h"
+
+/* Double-precision floating-point emulation.
+ We handle NANs, +-infinity, and +-zero.
+ However, we assume that for NANs, the topmost bit of the fraction is set. */
+
+#ifdef __LITTLE_ENDIAN__
+#define DBL0L r4
+#define DBL0H r5
+#define DBL1L r6
+#define DBL1H r7
+#define DBLRL r0
+#define DBLRH r1
+#else
+#define DBL0L r5
+#define DBL0H r4
+#define DBL1L r7
+#define DBL1H r6
+#define DBLRL r1
+#define DBLRH r0
+#endif
+
+/* The SH[123] ABI returns floats in r0; -m4-single returns them in fr0.
+ To abstract from this, a function that returns the single-precision
+ float value in r0 should use as in-line epilogue:
+ RETURN_R0_MAIN
+ <delay-slot insn>
+ RETURN_FR0
+ and may branch to that epilogue with:
+ RETURN_R0
+ <delay-slot insn> */
+#ifdef __SH_FPU_ANY__
+#define RETURN_R0_MAIN
+#define RETURN_R0 bra LOCAL(return_r0)
+#define RETURN_FR0 \
+LOCAL(return_r0): \
+ lds r0,fpul; \
+ rts; \
+ fsts fpul,fr0
+#define ARG_TO_R4 \
+ flds fr4,fpul; \
+ sts fpul,r4
+#else /* ! __SH_FPU_ANY__ */
+#define RETURN_R0_MAIN rts
+#define RETURN_R0 rts
+#define RETURN_FR0
+#define ARG_TO_R4
+#endif /* ! __SH_FPU_ANY__ */
+
+#ifdef L_nedf2
+/* -ffinite-math-only -mb inline version, T := r4:DF == r6:DF
+ cmp/eq r5,r7
+ mov r4,r0
+ bf 0f
+ cmp/eq r4,r6
+ bt 0f
+ or r6,r0
+ add r0,r0
+ or r5,r0
+ tst r0,r0
+ 0: */
+ .balign 4
+ .global GLOBAL(nedf2)
+ HIDDEN_FUNC(GLOBAL(nedf2))
+GLOBAL(nedf2):
+ cmp/eq DBL0L,DBL1L
+ mov.l LOCAL(c_DF_NAN_MASK),r1
+ bf LOCAL(ne)
+ cmp/eq DBL0H,DBL1H
+ not DBL0H,r0
+ bt LOCAL(check_nan)
+ mov DBL0H,r0
+ or DBL1H,r0
+ add r0,r0
+ rts
+ or DBL0L,r0
+LOCAL(check_nan):
+ tst r1,r0
+ rts
+ movt r0
+LOCAL(ne):
+ rts
+ mov #1,r0
+ .balign 4
+LOCAL(c_DF_NAN_MASK):
+ .long DF_NAN_MASK
+ ENDFUNC(GLOBAL(nedf2))
+#endif /* L_nedf2 */
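A C model of the comparison rule this routine implements may make the bit tricks easier to follow: two doubles are unequal unless their bit patterns match (and are not a NaN) or both are some form of zero. The 0x7ff8... constant is an assumed value for DF_NAN_MASK, consistent with the file's note that the topmost fraction bit of a NaN is set; this sketch is not the library routine itself.

```c
#include <assert.h>
#include <stdint.h>

/* Model of __nedf2 on raw IEEE binary64 bits: nonzero iff a != b.
   Assumed NaN mask: exponent all ones plus the top fraction bit.  */
static uint64_t const DF_NAN_MASK_ = 0x7ff8000000000000ULL;

int nedf2_model (uint64_t a, uint64_t b)
{
  if (a == b)                       /* identical bits: equal unless NaN */
    return (~a & DF_NAN_MASK_) == 0;
  return ((a | b) << 1) != 0;       /* differing bits: equal only for +0 vs -0 */
}
```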
+
+#ifdef L_unorddf2
+ .balign 4
+ .global GLOBAL(unorddf2)
+ HIDDEN_FUNC(GLOBAL(unorddf2))
+GLOBAL(unorddf2):
+ mov.l LOCAL(c_DF_NAN_MASK),r1
+ not DBL0H,r0
+ tst r1,r0
+ not r6,r0
+ bt LOCAL(unord)
+ tst r1,r0
+LOCAL(unord):
+ rts
+ movt r0
+ .balign 4
+LOCAL(c_DF_NAN_MASK):
+ .long DF_NAN_MASK
+ ENDFUNC(GLOBAL(unorddf2))
+#endif /* L_unorddf2 */
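The unorddf2 test reduces to one mask check per operand. A hedged C sketch, again assuming DF_NAN_MASK is 0x7ff8...0 and that NaNs always have the top fraction bit set:

```c
#include <assert.h>
#include <stdint.h>

/* Model of __unorddf2: nonzero iff either operand is a NaN.  A value is
   treated as NaN when every bit of the assumed mask is set in it, i.e.
   when its complement has no bit in common with the mask.  */
int unorddf2_model (uint64_t a, uint64_t b)
{
  uint64_t const mask = 0x7ff8000000000000ULL;  /* assumed DF_NAN_MASK */
  return (~a & mask) == 0 || (~b & mask) == 0;
}
```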
+
+#if defined(L_gtdf2t) || defined(L_gtdf2t_trap)
+#ifdef L_gtdf2t
+#define fun_label GLOBAL(gtdf2t)
+#else
+#define fun_label GLOBAL(gtdf2t_trap)
+#endif
+ .balign 4
+ .global fun_label
+ HIDDEN_FUNC(fun_label)
+fun_label:
+ /* If the raw values compare greater, the result is true, unless
+ either of them is a NaN (but infinity is fine), or both values are
+ +-zero. Otherwise, the result is false. */
+ mov.l LOCAL(c_DF_NAN_MASK),r1
+ cmp/pz DBL0H
+ not DBL1H,r0
+ SLC(bf, LOCAL(neg),
+ tst r1,r0)
+ mov DBL0H,r0
+ bt LOCAL(nan) /* return zero if DBL1 is NAN. */
+ cmp/eq DBL1H,DBL0H
+ bt LOCAL(cmp_low)
+ cmp/gt DBL1H,DBL0H
+ or DBL1H,r0
+ SLC(bf, LOCAL(check_nan),
+ cmp/gt DBL0H,r1)
+ add r0,r0
+ bf LOCAL(nan) /* return zero if DBL0 is NAN. */
+ or DBL0L,r0
+ rts
+ or DBL1L,r0 /* non-zero unless both DBL0 and DBL1 are +-zero. */
+LOCAL(cmp_low):
+ cmp/hi DBL1L,DBL0L
+ rts
+ movt r0
+LOCAL(neg):
+ SLI(tst r1,r0)
+ bt LOCAL(nan) /* return zero if DBL1 is NAN. */
+ cmp/eq DBL1H,DBL0H
+ SLC(bt, LOCAL(neg_cmp_low),
+ cmp/hi DBL0L,DBL1L)
+ not DBL0H,r0
+ tst r1,r0
+ bt LOCAL(nan) /* return zero if DBL0 is NAN. */
+ cmp/hi DBL0H,DBL1H
+ SLI(rts !,)
+ SLI(movt r0 !,)
+LOCAL(neg_cmp_low):
+ SLI(cmp/hi DBL0L,DBL1L)
+ rts
+ movt r0
+LOCAL(check_nan):
+#ifdef L_gtdf2t
+LOCAL(nan):
+ rts
+ mov #0,r0
+#else
+ SLI(cmp/gt DBL0H,r1)
+ bf LOCAL(nan) /* return zero if DBL0 is NAN. */
+ rts
+ mov #0,r0
+LOCAL(nan):
+ mov #0,r0
+ trapa #0
+#endif
+ .balign 4
+LOCAL(c_DF_NAN_MASK):
+ .long DF_NAN_MASK
+ ENDFUNC(fun_label)
+#endif /* defined(L_gtdf2t) || defined(L_gtdf2t_trap) */
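The comment's rule ("greater unless a NaN is involved or both are +-zero") can be restated in C. Note the order-preserving key trick below is a common alternative formulation of sign-magnitude comparison, not a line-by-line transcription of the assembly, and the NaN mask is again an assumption:

```c
#include <assert.h>
#include <stdint.h>

static uint64_t const GT_NAN_MASK_ = 0x7ff8000000000000ULL;  /* assumed */

/* Map IEEE bits to an integer whose unsigned order matches float order:
   negative values have all bits flipped, positive ones get the sign set.  */
static uint64_t order_key (uint64_t x)
{
  return (x >> 63) ? ~x : (x | (1ULL << 63));
}

int gtdf2t_model (uint64_t a, uint64_t b)
{
  if ((~a & GT_NAN_MASK_) == 0 || (~b & GT_NAN_MASK_) == 0)
    return 0;                       /* NaN involved: not greater */
  if (((a | b) << 1) == 0)
    return 0;                       /* +-0 vs +-0: equal, not greater */
  return order_key (a) > order_key (b);
}
```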
+
+#ifdef L_gedf2f
+ .balign 4
+ .global GLOBAL(gedf2f)
+ HIDDEN_FUNC(GLOBAL(gedf2f))
+GLOBAL(gedf2f):
+ /* If the raw values compare greater or equal, the result is
+ true, unless either of them is a NaN, or both are the
+ same infinity. If both are +-zero, the result is true;
+ otherwise, it is false.
+ We use 0 as true and nonzero as false for this function. */
+ mov.l LOCAL(c_DF_NAN_MASK),r1
+ cmp/pz DBL1H
+ not DBL0H,r0
+ SLC(bf, LOCAL(neg),
+ tst r1,r0)
+ mov DBL0H,r0
+ bt LOCAL(nan)
+ cmp/eq DBL0H,DBL1H
+ bt LOCAL(cmp_low)
+ cmp/gt DBL0H,DBL1H
+ or DBL1H,r0
+ SLC(bf, LOCAL(check_nan),
+ cmp/ge r1,DBL1H)
+ add r0,r0
+ bt LOCAL(nan)
+ or DBL0L,r0
+ rts
+ or DBL1L,r0
+LOCAL(cmp_low):
+ cmp/hi DBL0L,DBL1L
+#if defined(L_gedf2f) && defined(DELAYED_BRANCHES)
+LOCAL(nan): LOCAL(check_nan):
+#endif
+ rts
+ movt r0
+#if defined(L_gedf2f) && ! defined(DELAYED_BRANCHES)
+LOCAL(check_nan):
+ SLI(cmp/ge r1,DBL1H)
+LOCAL(nan):
+ rts
+ movt r0
+#elif defined(L_gedf2f_trap)
+LOCAL(check_nan):
+ SLI(cmp/ge r1,DBL1H)
+ bt LOCAL(nan)
+ rts
+LOCAL(nan):
+ movt r0
+ trapa #0
+#endif /* L_gedf2f_trap */
+LOCAL(neg):
+ SLI(tst r1,r0)
+ bt LOCAL(nan)
+ cmp/eq DBL0H,DBL1H
+ not DBL1H,r0
+ SLC(bt, LOCAL(neg_cmp_low),
+ cmp/hi DBL1L,DBL0L)
+ tst r1,r0
+ bt LOCAL(nan)
+ cmp/hi DBL1H,DBL0H
+ SLI(rts !,)
+ SLI(movt r0 !,)
+LOCAL(neg_cmp_low):
+ SLI(cmp/hi DBL1L,DBL0L)
+ rts
+ movt r0
+ .balign 4
+LOCAL(c_DF_NAN_MASK):
+ .long DF_NAN_MASK
+ ENDFUNC(GLOBAL(gedf2f))
+#endif /* L_gedf2f */
+
+#ifndef DYN_SHIFT /* SH1 / SH2 code */
+#ifdef L_extendsfdf2
+ .balign 4
+ .global GLOBAL(extendsfdf2)
+ FUNC(GLOBAL(extendsfdf2))
+GLOBAL(extendsfdf2):
+ ARG_TO_R4
+ mov.l LOCAL(x7f800000),r3
+ mov r4,DBLRL
+ tst r3,r4
+ bt LOCAL(zero_denorm)
+ mov.l LOCAL(xe0000000),r2
+ rotr DBLRL
+ rotr DBLRL
+ rotr DBLRL
+ and r2,DBLRL
+ mov r4,DBLRH
+ not r4,r2
+ tst r3,r2
+ mov.l LOCAL(x38000000),r2
+ bf 0f
+ add r2,r2 ! infinity / NaN adjustment
+0: shll DBLRH
+ shlr2 DBLRH
+ shlr2 DBLRH
+ add DBLRH,DBLRH
+ rotcr DBLRH
+ rts
+ add r2,DBLRH
+LOCAL(zero_denorm):
+ mov.l r4,@-r15
+ add r4,r4
+ tst r4,r4
+ bt LOCAL(zero)
+ mov.l LOCAL(x00ff0000),r3
+ mov.w LOCAL(x389),r2
+LOCAL(shift_byte):
+ tst r3,r4
+ shll8 r4
+ SL(bt, LOCAL(shift_byte),
+ add #-8,r2)
+LOCAL(shift_bit):
+ shll r4
+ SL(bf, LOCAL(shift_bit),
+ add #-1,r2)
+ mov #0,DBLRL
+ mov r4,DBLRH
+ mov.l @r15+,r4
+ shlr8 DBLRH
+ shlr2 DBLRH
+ shlr DBLRH
+ rotcr DBLRL
+ cmp/gt r4,DBLRH ! get sign
+ rotcr DBLRH
+ rotcr DBLRL
+ shll16 r2
+ shll8 r2
+ rts
+ add r2,DBLRH
+LOCAL(zero):
+ mov.l @r15+,DBLRH
+ rts
+ mov #0,DBLRL
+LOCAL(x389): .word 0x389
+ .balign 4
+LOCAL(x7f800000):
+ .long 0x7f800000
+LOCAL(x38000000):
+ .long 0x38000000
+LOCAL(xe0000000):
+ .long 0xe0000000
+LOCAL(x00ff0000):
+ .long 0x00ff0000
+ ENDFUNC(GLOBAL(extendsfdf2))
+#endif /* L_extendsfdf2 */
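The shift-and-rebias sequence above implements the standard binary32 to binary64 widening. As a portable C model of that conversion on raw bits (a sketch for reference, not the SH routine itself): no rounding is ever needed, since every float is exactly representable as a double, and the constants follow from the IEEE-754 bias difference 1023 - 127 = 896.

```c
#include <assert.h>
#include <stdint.h>

/* Bit-level model of __extendsfdf2: widen IEEE binary32 bits to binary64.  */
uint64_t extendsfdf2_model (uint32_t f)
{
  uint64_t sign = (uint64_t)(f >> 31) << 63;
  uint32_t exp  = (f >> 23) & 0xff;
  uint64_t frac = f & 0x7fffff;

  if (exp == 0xff)                  /* infinity / NaN: max out the exponent */
    return sign | (0x7ffULL << 52) | (frac << 29);
  if (exp == 0)
    {
      if (frac == 0)
        return sign;                /* +-0.0 */
      /* Denormal: normalize, decrementing the exponent per shift.  */
      int e = 1023 - 126;           /* 897; becomes 897 - #shifts */
      while (!(frac & 0x800000))
        {
          frac <<= 1;
          e--;
        }
      return sign | ((uint64_t)e << 52) | ((frac & 0x7fffff) << 29);
    }
  return sign | ((uint64_t)(exp + 896) << 52) | (frac << 29);
}
```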
+
+#ifdef L_truncdfsf2
+ .balign 4
+ .global GLOBAL(truncdfsf2)
+ FUNC(GLOBAL(truncdfsf2))
+GLOBAL(truncdfsf2):
+ mov.l LOCAL(x38000000),r3 ! exponent adjustment DF -> SF
+ mov DBL0H,r1
+ mov.l LOCAL(x70000000),r2 ! mask for out-of-range exponent bits
+ mov DBL0H,r0
+ mov.l DBL0L,@-r15
+ sub r3,r1
+ tst r2,r1
+ shll8 r0 !
+ shll2 r0 ! Isolate highpart fraction.
+ shll2 r0 !
+ bf LOCAL(ill_exp)
+ shll2 r1
+ mov.l LOCAL(x2fffffff),r2 /* Fraction lsb | lower guard bits. */
+ shll2 r1
+ mov.l LOCAL(xff000000),r3
+ shlr8 r0
+ tst r2,DBL0L /* Check if msb guard bit wants rounding up. */
+ shlr16 DBL0L
+ shlr8 DBL0L
+ shlr2 DBL0L
+ SL1(bt, LOCAL(add_frac),
+ shlr2 DBL0L)
+ add #1,DBL0L
+LOCAL(add_frac):
+ add DBL0L,r0
+ mov.l LOCAL(x01000000),r2
+ and r3,r1
+ mov.l @r15+,DBL0L
+ add r1,r0
+ tst r3,r0
+ bt LOCAL(inf_denorm0)
+ cmp/hs r3,r0
+LOCAL(denorm_noup_sh1):
+ bt LOCAL(inf)
+ div0s DBL0H,r2 /* copy orig. sign into T. */
+ RETURN_R0_MAIN
+ rotcr r0
+RETURN_FR0
+LOCAL(inf_denorm0): ! We might need to undo previous rounding.
+ mov.l LOCAL(x2fffffff),r3 /* Old fraction lsb | lower guard bits. */
+ tst r1,r1
+ bf LOCAL(inf)
+ add #-1,r0
+ tst r3,DBL0L /* Check if msb guard bit was rounded up. */
+ mov.l LOCAL(x5fffffff),r3 /* Fraction lsb | lower guard bits. */
+ addc r2,r0
+ shlr r0
+ tst r3,DBL0L /* Check if msb guard bit wants rounding up. */
+#ifdef DELAYED_BRANCHES
+ bt/s LOCAL(denorm_noup)
+#else
+ bt LOCAL(denorm_noup_sh1)
+#endif
+ div0s DBL0H,r2 /* copy orig. sign into T. */
+ add #1,r0
+LOCAL(denorm_noup):
+ RETURN_R0
+ rotcr r0
+LOCAL(ill_exp):
+ div0s DBL0H,r1
+ mov.l LOCAL(x7ff80000),r2
+ add r1,r1
+ bf LOCAL(inf_nan)
+ mov.w LOCAL(m32),r3 /* Handle denormal or zero. */
+ shlr16 r1
+ exts.w r1,r1
+ shll2 r1
+ add r1,r1
+ shlr8 r1
+ exts.w r1,r1
+ add #-8,r1 /* Go from 9 to 1 guard bit in MSW. */
+ cmp/gt r3,r1
+ mov.l @r15+,r3 /* DBL0L */
+ bf LOCAL(zero)
+ mov.l DBL0L, @-r15
+ shll8 DBL0L
+ rotcr r0 /* Insert leading 1. */
+ shlr16 r3
+ shll2 r3
+ add r3,r3
+ shlr8 r3
+ cmp/pl DBL0L /* Check lower 23 guard bits if guard bit 23 is 0. */
+ addc r3,r0 /* Assemble fraction with compressed guard bits. */
+ mov.l @r15+,DBL0L
+ mov #0,r2
+ neg r1,r1
+LOCAL(denorm_loop):
+ shlr r0
+ rotcl r2
+ dt r1
+ bf LOCAL(denorm_loop)
+ tst #2,r0
+ rotcl r0
+ tst r2,r2
+ rotcl r0
+ xor #3,r0
+ add #3,r0 /* Even overflow gives the correct result. */
+ shlr2 r0
+ div0s r0,DBL0H
+ RETURN_R0
+ rotcr r0
+LOCAL(zero):
+ mov #0,r0
+ div0s r0,DBL0H
+ RETURN_R0
+ rotcr r0
+LOCAL(inf_nan):
+ not DBL0H,r0
+ tst r2,r0
+ mov.l @r15+,DBL0L
+ bf LOCAL(inf)
+ RETURN_R0
+ mov #-1,r0 /* NAN */
+LOCAL(inf): /* r2 must be positive here. */
+ mov.l LOCAL(xff000000),r0
+ div0s r2,DBL0H
+ RETURN_R0
+ rotcr r0
+LOCAL(m32):
+ .word -32
+ .balign 4
+LOCAL(x38000000):
+ .long 0x38000000
+LOCAL(x70000000):
+ .long 0x70000000
+LOCAL(x2fffffff):
+ .long 0x2fffffff
+LOCAL(x01000000):
+ .long 0x01000000
+LOCAL(xff000000):
+ .long 0xff000000
+LOCAL(x5fffffff):
+ .long 0x5fffffff
+LOCAL(x7ff80000):
+ .long 0x7ff80000
+ ENDFUNC(GLOBAL(truncdfsf2))
+#endif /* L_truncdfsf2 */
+#ifdef L_add_sub_df3
+#include "IEEE-754/adddf3.S"
+#endif /* _add_sub_df3 */
+
+#ifdef L_muldf3
+#include "IEEE-754/muldf3.S"
+#endif /* L_muldf3 */
+
+#ifdef L_fixunsdfsi
+#include "IEEE-754/fixunsdfsi.S"
+#endif /* L_fixunsdfsi */
+
+#ifdef L_fixdfsi
+#include "IEEE-754/fixdfsi.S"
+#endif /* L_fixdfsi */
+
+#ifdef L_floatunssidf
+#include "IEEE-754/floatunssidf.S"
+#endif /* L_floatunssidf */
+
+#ifdef L_floatsidf
+#include "IEEE-754/floatsidf.S"
+#endif /* L_floatsidf */
+
+#ifdef L_divdf3
+#include "IEEE-754/divdf3.S"
+#endif /* L_divdf3 */
+#endif /* ! DYN_SHIFT */
+
+/* The actual arithmetic uses dynamic shift. Supporting SH1 / SH2 here would
+ make this code too hard to maintain, so if you want to add SH1 / SH2
+ support, do it in a separate copy. */
+#ifdef DYN_SHIFT
+#ifdef L_extendsfdf2
+ .balign 4
+ .global GLOBAL(extendsfdf2)
+ FUNC(GLOBAL(extendsfdf2))
+GLOBAL(extendsfdf2):
+ ARG_TO_R4
+ mov.l LOCAL(x7f800000),r2
+ mov #29,r3
+ mov r4,DBLRL
+ not r4,DBLRH
+ tst r2,r4
+ shld r3,DBLRL
+ bt LOCAL(zero_denorm)
+ mov #-3,r3
+ tst r2,DBLRH
+ mov r4,DBLRH
+ mov.l LOCAL(x38000000),r2
+ bt/s LOCAL(inf_nan)
+ shll DBLRH
+ shld r3,DBLRH
+ rotcr DBLRH
+ rts
+ add r2,DBLRH
+ .balign 4
+LOCAL(inf_nan):
+ shld r3,DBLRH
+ add r2,r2
+ rotcr DBLRH
+ rts
+ add r2,DBLRH
+LOCAL(zero_denorm):
+ mov.l r4,@-r15
+ add r4,r4
+ tst r4,r4
+ extu.w r4,r2
+ bt LOCAL(zero)
+ cmp/eq r4,r2
+ extu.b r4,r1
+ bf/s LOCAL(three_bytes)
+ mov.l LOCAL(c__clz_tab),r0
+ cmp/eq r4,r1
+ mov #22,DBLRH
+ bt LOCAL(one_byte)
+ shlr8 r2
+ mov #14,DBLRH
+LOCAL(one_byte):
+#ifdef __pic__
+ add r0,r2
+ mova LOCAL(c__clz_tab),r0
+#endif
+ mov.b @(r0,r2),r2
+ mov #21,r3
+ mov.w LOCAL(x0),DBLRL
+ sub r2,DBLRH
+LOCAL(norm_shift):
+ shld DBLRH,r4
+ mov.l @r15+,r2
+ shld r3,DBLRH
+ mov.l LOCAL(xb7ffffff),r3
+ add r4,DBLRH
+ cmp/pz r2
+ mov r2,r4
+ rotcr DBLRH
+ rts
+ sub r3,DBLRH
+LOCAL(three_bytes):
+ mov r4,r2
+ shlr16 r2
+#ifdef __pic__
+ add r0,r2
+ mova LOCAL(c__clz_tab),r0
+#endif
+ mov.b @(r0,r2),r2
+ mov #21,r3
+ mov #6-32,DBLRH
+ sub r2,DBLRH
+ mov r4,DBLRL
+ shld DBLRH,DBLRL
+ bra LOCAL(norm_shift)
+ add #32,DBLRH
+LOCAL(zero):
+ rts /* DBLRL has already been zeroed above. */
+ mov.l @r15+,DBLRH
+LOCAL(x0):
+ .word 0
+ .balign 4
+LOCAL(x7f800000):
+ .long 0x7f800000
+LOCAL(x38000000):
+ .long 0x38000000
+LOCAL(xb7ffffff):
+ /* Flip sign back, do exponent adjustment, and remove leading one. */
+ .long 0x80000000 + 0x38000000 - 1
+LOCAL(c__clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+ ENDFUNC(GLOBAL(extendsfdf2))
+#endif /* L_extendsfdf2 */
+
+#ifdef L_truncdfsf2
+ .balign 4
+ .global GLOBAL(truncdfsf2)
+ FUNC(GLOBAL(truncdfsf2))
+GLOBAL(truncdfsf2):
+ mov.l LOCAL(x38000000),r3
+ mov DBL0H,r1
+ mov.l LOCAL(x70000000),r2
+ mov DBL0H,r0
+ sub r3,r1
+ mov.l DBL0L,@-r15
+ tst r2,r1
+ mov #12,r3
+ shld r3,r0 ! Isolate highpart fraction.
+ bf LOCAL(ill_exp)
+ shll2 r1
+ mov.l LOCAL(x2fffffff),r2 /* Fraction lsb | lower guard bits. */
+ shll2 r1
+ mov.l LOCAL(xff000000),r3
+ shlr8 r0
+ tst r2,DBL0L /* Check if msb guard bit wants rounding up. */
+ mov #-28,r2
+ bt/s LOCAL(add_frac)
+ shld r2,DBL0L
+ add #1,DBL0L
+LOCAL(add_frac):
+ add DBL0L,r0
+ mov.l LOCAL(x01000000),r2
+ and r3,r1
+ mov.l @r15+,DBL0L
+ add r1,r0
+ tst r3,r0
+ bt LOCAL(inf_denorm0)
+#if 0 // No point checking overflow -> infinity if we don't raise a signal.
+ cmp/hs r3,r0
+ bt LOCAL(inf)
+#endif
+ div0s DBL0H,r2 /* copy orig. sign into T. */
+ RETURN_R0_MAIN
+ rotcr r0
+RETURN_FR0
+LOCAL(inf_denorm0): ! We might need to undo previous rounding.
+ mov.l LOCAL(x2fffffff),r3 /* Old fraction lsb | lower guard bits. */
+ tst r1,r1
+ bf LOCAL(inf)
+ add #-1,r0
+ tst r3,DBL0L /* Check if msb guard bit was rounded up. */
+ mov.l LOCAL(x5fffffff),r3 /* Fraction lsb | lower guard bits. */
+ addc r2,r0
+ shlr r0
+ tst r3,DBL0L /* Check if msb guard bit wants rounding up. */
+ bt/s LOCAL(denorm_noup)
+ div0s DBL0H,r2 /* copy orig. sign into T. */
+ add #1,r0
+LOCAL(denorm_noup):
+ RETURN_R0
+ rotcr r0
+LOCAL(ill_exp):
+ div0s DBL0H,r1
+ mov.l LOCAL(x7ff80000),r2
+ add r1,r1
+ bf LOCAL(inf_nan)
+ mov.w LOCAL(m32),r3 /* Handle denormal or zero. */
+ mov #-21,r2
+ shad r2,r1
+ add #-8,r1 /* Go from 9 to 1 guard bit in MSW. */
+ cmp/gt r3,r1
+ mov.l @r15+,r3 /* DBL0L */
+ bf LOCAL(zero)
+ mov.l DBL0L, @-r15
+ shll8 DBL0L
+ rotcr r0 /* Insert leading 1. */
+ shld r2,r3
+ cmp/pl DBL0L /* Check lower 23 guard bits if guard bit 23 is 0. */
+ addc r3,r0 /* Assemble fraction with compressed guard bits. */
+ mov r0,r2
+ shld r1,r0
+ mov.l @r15+,DBL0L
+ add #32,r1
+ shld r1,r2
+ tst #2,r0
+ rotcl r0
+ tst r2,r2
+ rotcl r0
+ xor #3,r0
+ add #3,r0 /* Even overflow gives the correct result. */
+ shlr2 r0
+ div0s r0,DBL0H
+ RETURN_R0
+ rotcr r0
+LOCAL(zero):
+ mov #0,r0
+ div0s r0,DBL0H
+ RETURN_R0
+ rotcr r0
+LOCAL(inf_nan):
+ not DBL0H,r0
+ tst r2,r0
+ mov.l @r15+,DBL0L
+ bf LOCAL(inf)
+ RETURN_R0
+ mov #-1,r0 /* NAN */
+LOCAL(inf): /* r2 must be positive here. */
+ mov.l LOCAL(xff000000),r0
+ div0s r2,DBL0H
+ RETURN_R0
+ rotcr r0
+LOCAL(m32):
+ .word -32
+ .balign 4
+LOCAL(x38000000):
+ .long 0x38000000
+LOCAL(x70000000):
+ .long 0x70000000
+LOCAL(x2fffffff):
+ .long 0x2fffffff
+LOCAL(x01000000):
+ .long 0x01000000
+LOCAL(xff000000):
+ .long 0xff000000
+LOCAL(x5fffffff):
+ .long 0x5fffffff
+LOCAL(x7ff80000):
+ .long 0x7ff80000
+ ENDFUNC(GLOBAL(truncdfsf2))
+#endif /* L_truncdfsf2 */
+
+
+#ifdef L_add_sub_df3
+#include "IEEE-754/m3/adddf3.S"
+#endif /* _add_sub_df3 */
+
+#ifdef L_muldf3
+#include "IEEE-754/m3/muldf3.S"
+#endif /* L_muldf3 */
+
+#ifdef L_fixunsdfsi
+#include "IEEE-754/m3/fixunsdfsi.S"
+#endif /* L_fixunsdfsi */
+
+#ifdef L_fixdfsi
+#include "IEEE-754/m3/fixdfsi.S"
+#endif /* L_fixdfsi */
+
+#ifdef L_floatunssidf
+#include "IEEE-754/m3/floatunssidf.S"
+#endif /* L_floatunssidf */
+
+#ifdef L_floatsidf
+#include "IEEE-754/m3/floatsidf.S"
+#endif /* L_floatsidf */
+
+#ifdef L_divdf3
+#include "IEEE-754/m3/divdf3.S"
+#endif /* L_divdf3 */
+#endif /* DYN_SHIFT */
+
+#endif /* __SH_FPU_DOUBLE__ */
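For reference, the 0x38000000 constant used by extendsfdf2 above is the exponent rebias from binary32 (bias 127) to binary64 (bias 1023): 1023 - 127 = 896 = 0x380, positioned in the DF high word. A host-side Python sketch of the normal-number path (illustrative only, not part of the patch; the helper name is made up):

```python
import struct

def extendsfdf2_normal(bits32):
    """Widen an IEEE-754 binary32 (normal, nonzero) value to binary64,
    mirroring the rebias done by the assembly's 0x38000000 constant."""
    sign = bits32 >> 31
    exp = (bits32 >> 23) & 0xFF          # biased by 127
    frac = bits32 & 0x7FFFFF
    dexp = exp + (1023 - 127)            # rebias: add 0x380
    hi = (sign << 31) | (dexp << 20) | (frac >> 3)
    lo = (frac & 0x7) << 29              # low 3 fraction bits -> DF low word
    return (hi << 32) | lo

# cross-check against the host's own float widening
b32 = struct.unpack('>I', struct.pack('>f', 1.5))[0]
b64 = struct.unpack('>Q', struct.pack('>d', 1.5))[0]
assert extendsfdf2_normal(b32) == b64
```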
Index: gcc/config/sh/predicates.md
===================================================================
--- gcc/config/sh/predicates.md (revision 162269)
+++ gcc/config/sh/predicates.md (working copy)
@@ -719,6 +719,33 @@ (define_predicate "shift_operator"
(define_predicate "symbol_ref_operand"
(match_code "symbol_ref"))
+(define_special_predicate "soft_fp_comparison_operand"
+ (match_code "subreg,reg")
+{
+ switch (GET_MODE (op))
+ {
+ default:
+ return 0;
+ case CC_FP_NEmode: case CC_FP_GTmode: case CC_FP_UNLTmode:
+ break;
+ }
+ return register_operand (op, mode);
+})
+
+(define_predicate "soft_fp_comparison_operator"
+ (match_code "eq, unle, ge")
+{
+ switch (GET_CODE (op))
+ {
+ default:
+ return 0;
+ case EQ: mode = CC_FP_NEmode; break;
+ case UNLE: mode = CC_FP_GTmode; break;
+ case GE: mode = CC_FP_UNLTmode; break;
+ }
+ return register_operand (XEXP (op, 0), mode);
+})
+
;; Same as target_reg_operand, except that label_refs and symbol_refs
;; are accepted before reload.
Index: gcc/config/sh/sh.c
===================================================================
--- gcc/config/sh/sh.c (revision 162269)
+++ gcc/config/sh/sh.c (working copy)
@@ -284,6 +284,7 @@ static int sh_arg_partial_bytes (CUMULAT
tree, bool);
static bool sh_scalar_mode_supported_p (enum machine_mode);
static int sh_dwarf_calling_convention (const_tree);
+static void sh_expand_float_condop (rtx *operands, rtx, rtx (*[2]) (rtx));
static void sh_encode_section_info (tree, rtx, int);
static int sh2a_function_vector_p (tree);
static void sh_trampoline_init (rtx, tree, rtx);
@@ -551,6 +552,9 @@ static const struct attribute_spec sh_at
/* Machine-specific symbol_ref flags. */
#define SYMBOL_FLAG_FUNCVEC_FUNCTION (SYMBOL_FLAG_MACH_DEP << 0)
+#undef TARGET_MATCH_ADJUST
+#define TARGET_MATCH_ADJUST sh_match_adjust
+
struct gcc_target targetm = TARGET_INITIALIZER;
\f
/* Implement TARGET_HANDLE_OPTION. */
@@ -2180,6 +2184,72 @@ sh_emit_cheap_store_flag (enum machine_m
return gen_rtx_fmt_ee (code, VOIDmode, target, const0_rtx);
}
+static rtx
+sh_soft_fp_cmp (int code, enum machine_mode op_mode, rtx op0, rtx op1)
+{
+ const char *name = NULL;
+ rtx (*fun) (rtx, rtx), addr, tmp, last, equiv;
+ int df = op_mode == DFmode;
+  enum machine_mode mode = VOIDmode; /* shut up warning. */
+
+ switch (code)
+ {
+ case EQ:
+ if (!flag_finite_math_only)
+ {
+ name = df ? "__nedf2" : "__nesf2";
+ fun = df ? gen_cmpnedf_i1 : gen_cmpnesf_i1;
+ mode = CC_FP_NEmode;
+ break;
+ } /* Fall through. */
+ case UNEQ:
+ fun = gen_cmpuneq_sdf;
+ break;
+ case UNLE:
+ if (flag_finite_math_only && !df)
+ {
+ fun = gen_cmplesf_i1_finite;
+ break;
+ }
+ name = df ? "__gtdf2t" : "__gtsf2t";
+ fun = df ? gen_cmpgtdf_i1 : gen_cmpgtsf_i1;
+ mode = CC_FP_GTmode;
+ break;
+ case GE:
+ if (flag_finite_math_only && !df)
+ {
+ tmp = op0; op0 = op1; op1 = tmp;
+ fun = gen_cmplesf_i1_finite;
+ break;
+ }
+ name = df ? "__gedf2f" : "__gesf2f";
+ fun = df ? gen_cmpunltdf_i1 : gen_cmpunltsf_i1;
+ mode = CC_FP_UNLTmode;
+ break;
+ case UNORDERED:
+ fun = gen_cmpun_sdf;
+ break;
+ default: gcc_unreachable ();
+ }
+
+ if (!name)
+ return fun (force_reg (op_mode, op0), force_reg (op_mode, op1));
+
+ tmp = gen_reg_rtx (mode);
+ addr = gen_reg_rtx (Pmode);
+ function_symbol (addr, name, SFUNC_STATIC);
+ emit_move_insn (gen_rtx_REG (op_mode, R4_REG), op0);
+ emit_move_insn (gen_rtx_REG (op_mode, R5_REG + df), op1);
+ last = emit_insn (fun (tmp, addr));
+ equiv = gen_rtx_fmt_ee (COMPARE, mode, op0, op1);
+ set_unique_reg_note (last, REG_EQUAL, equiv);
+ /* Use fpcmp_i1 rather than cmpeqsi_t, so that the optimizers can grok
+ the computation. */
+ return gen_rtx_SET (VOIDmode,
+ gen_rtx_REG (SImode, T_REG),
+ gen_rtx_fmt_ee (code, SImode, tmp, CONST0_RTX (mode)));
+}
+
/* Called from the md file, set up the operands of a compare instruction. */
void
@@ -8662,6 +8732,49 @@ sh_fix_range (const char *const_str)
str = comma + 1;
}
}
+
+/* Expand an sfunc operation taking NARGS MODE arguments, using generator
+   function FUN, which needs symbol NAME loaded into a register first.
+   Add a REG_EQUAL note using EQUIV.  */
+static void
+expand_sfunc_op (int nargs, enum machine_mode mode, rtx (*fun) (rtx, rtx),
+ const char *name, rtx equiv, rtx *operands)
+{
+ int next_reg = FIRST_PARM_REG, i;
+ rtx addr, last, insn;
+
+ addr = gen_reg_rtx (Pmode);
+ function_symbol (addr, name, SFUNC_FREQUENT);
+ for ( i = 1; i <= nargs; i++)
+ {
+ insn = emit_move_insn (gen_rtx_REG (mode, next_reg), operands[i]);
+ next_reg += GET_MODE_SIZE (mode) / UNITS_PER_WORD;
+ }
+ last = emit_insn ((*fun) (operands[0], addr));
+ set_unique_reg_note (last, REG_EQUAL, equiv);
+}
+
+/* Expand an sfunc unary operation taking a MODE argument, using generator
+   function FUN, which needs symbol NAME loaded into a register first.
+   Add a REG_EQUAL note using CODE.  */
+void
+expand_sfunc_unop (enum machine_mode mode, rtx (*fun) (rtx, rtx),
+ const char *name, enum rtx_code code, rtx *operands)
+{
+ rtx equiv = gen_rtx_fmt_e (code, GET_MODE (operands[0]), operands[1]);
+ expand_sfunc_op (1, mode, fun, name, equiv, operands);
+}
+
+/* Expand an sfunc binary operation in MODE, using generator function FUN,
+   which needs symbol NAME loaded into a register first.
+   Add a REG_EQUAL note using CODE.  */
+void
+expand_sfunc_binop (enum machine_mode mode, rtx (*fun) (rtx, rtx),
+ const char *name, enum rtx_code code, rtx *operands)
+{
+ rtx equiv = gen_rtx_fmt_ee (code, mode, operands[1], operands[2]);
+ expand_sfunc_op (2, mode, fun, name, equiv, operands);
+}
\f
/* Insert any deferred function attributes from earlier pragmas. */
static void
@@ -11593,11 +11706,10 @@ function_symbol (rtx target, const char
{
rtx sym;
- /* If this is not an ordinary function, the name usually comes from a
- string literal or an sprintf buffer. Make sure we use the same
+ /* The name usually comes from a string literal or an sprintf buffer.
+ Make sure we use the same
string consistently, so that cse will be able to unify address loads. */
- if (kind != FUNCTION_ORDINARY)
- name = IDENTIFIER_POINTER (get_identifier (name));
+ name = IDENTIFIER_POINTER (get_identifier (name));
sym = gen_rtx_SYMBOL_REF (Pmode, name);
SYMBOL_REF_FLAGS (sym) = SYMBOL_FLAG_FUNCTION;
if (flag_pic)
@@ -11605,6 +11717,10 @@ function_symbol (rtx target, const char
{
case FUNCTION_ORDINARY:
break;
+ case SFUNC_FREQUENT:
+ if (!optimize || optimize_size)
+ break;
+ /* Fall through. */
case SFUNC_GOT:
{
rtx reg = target ? target : gen_reg_rtx (Pmode);
@@ -11715,6 +11831,168 @@ sh_expand_t_scc (rtx operands[])
return 1;
}
+void
+sh_expand_float_cbranch (rtx operands[4])
+{
+ static rtx (*branches[]) (rtx) = { gen_branch_true, gen_branch_false };
+
+ sh_expand_float_condop (operands, operands[3], branches);
+}
+
+void
+sh_expand_float_scc (rtx operands[4])
+{
+ static rtx (*movts[]) (rtx) = { gen_movt, gen_movnegt };
+
+ sh_expand_float_condop (&operands[1], operands[0], movts);
+}
+
+/* The first element of USER is for positive logic, the second one for
+ negative logic. */
+static void
+sh_expand_float_condop (rtx *operands, rtx dest, rtx (*user[2]) (rtx))
+{
+ enum machine_mode mode = GET_MODE (operands[1]);
+ enum rtx_code comparison = GET_CODE (operands[0]);
+ int swap_operands = 0;
+ rtx op0, op1;
+ rtx lab = NULL_RTX;
+
+ if (TARGET_SH1_SOFTFP_MODE (mode))
+ {
+ switch (comparison)
+ {
+ case NE:
+ comparison = EQ;
+ user++;
+ break;
+ case LT:
+ swap_operands = 1; /* Fall through. */
+ case GT:
+ comparison = UNLE;
+ user++;
+ break;
+ case UNGT:
+ swap_operands = 1; /* Fall through. */
+ case UNLT:
+ comparison = GE;
+ user++;
+ break;
+ case UNGE:
+ swap_operands = 1;
+ comparison = UNLE;
+ break;
+ case LE:
+ swap_operands = 1;
+ comparison = GE; /* Fall through. */
+ case EQ:
+ case UNEQ:
+ case GE:
+ case UNLE:
+ case UNORDERED:
+ break;
+ case LTGT:
+ comparison = UNEQ;
+ user++;
+ break;
+ case ORDERED:
+ comparison = UNORDERED;
+ user++;
+ break;
+
+ default: gcc_unreachable ();
+ }
+ }
+ else /* SH2E .. SH4 Hardware floating point */
+ {
+ switch (comparison)
+ {
+ case LTGT:
+ if (!flag_finite_math_only)
+ break;
+ /* Fall through. */
+ case NE:
+ comparison = EQ;
+ user++;
+ break;
+ case LT:
+ swap_operands = 1;
+ comparison = GT; /* Fall through. */
+ case GT:
+ case EQ:
+ case ORDERED:
+ break;
+ case LE:
+ swap_operands = 1;
+ comparison = GE; /* Fall through. */
+ case GE:
+ if (flag_finite_math_only)
+ {
+ swap_operands ^= 1;
+ comparison = GT;
+ user++;
+ break;
+ }
+ break;
+ case UNGT:
+ swap_operands = 1; /* Fall through. */
+ case UNLT:
+ if (flag_finite_math_only)
+ {
+ swap_operands ^= 1;
+ comparison = GT;
+ break;
+ }
+ comparison = GE;
+ user++;
+ break;
+ case UNGE:
+ swap_operands = 1; /* Fall through. */
+ case UNLE:
+ comparison = GT;
+ user++;
+ break;
+ case UNEQ:
+ if (flag_finite_math_only)
+ {
+ comparison = EQ;
+ break;
+ }
+ comparison = LTGT;
+ user++;
+ break;
+ case UNORDERED:
+ comparison = ORDERED;
+ user++;
+ break;
+
+ default: gcc_unreachable ();
+ }
+ operands[1] = force_reg (mode, operands[1]);
+ operands[2] = force_reg (mode, operands[2]);
+ if (comparison == GE)
+ {
+ lab = gen_label_rtx ();
+ sh_emit_scc_to_t (GT, operands[1+swap_operands],
+ operands[2-swap_operands]);
+ emit_jump_insn (gen_branch_true (lab));
+ comparison = EQ;
+ }
+ }
+ op0 = operands[1+swap_operands];
+ op1 = operands[2-swap_operands];
+ if (GET_MODE_CLASS (mode) == MODE_FLOAT && TARGET_SH1_SOFTFP_MODE (mode))
+ emit_insn (sh_soft_fp_cmp (comparison, mode, op0, op1));
+ else
+ sh_emit_set_t_insn (gen_rtx_SET (VOIDmode, gen_rtx_REG (SImode, T_REG),
+ gen_rtx_fmt_ee (comparison, SImode,
+ op0, op1)),
+ mode);
+ if (lab)
+ emit_label (lab);
+ emit ((*user) (dest));
+}
+
/* INSN is an sfunc; return the rtx that describes the address used. */
static rtx
extract_sfunc_addr (rtx insn)
@@ -12266,6 +12544,19 @@ sh_secondary_reload (bool in_p, rtx x, r
return NO_REGS;
}
+int
+sh_match_adjust (rtx x, int regno)
+{
+ /* On a WORDS_BIG_ENDIAN machine, point to the last register of a
+ multiple hard register group of scalar integer registers, so that
+ for example (reg:DI 0) and (reg:SI 1) will be considered the same
+ register. */
+ if (WORDS_BIG_ENDIAN && GET_MODE_SIZE (GET_MODE (x)) > UNITS_PER_WORD
+ && regno < FIRST_PSEUDO_REGISTER)
+ regno += hard_regno_nregs[regno][GET_MODE (x)] - 1;
+ return regno;
+}
+
enum sh_divide_strategy_e sh_div_strategy = SH_DIV_STRATEGY_DEFAULT;
#include "gt-sh.h"
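The comparison rewrites in sh_expand_float_condop rest on standard IEEE-754 negation identities: for example LT is obtained by swapping operands, evaluating UNLE, and using the inverted ("negative logic") branch. These identities, including the NaN cases, can be checked in a few lines of Python (illustrative sketch; `unle`/`ge` model the soft-fp library predicates):

```python
import math

def unordered(a, b):
    return math.isnan(a) or math.isnan(b)

def unle(a, b):          # unordered, or a <= b
    return unordered(a, b) or a <= b

def ge(a, b):            # ordered and a >= b
    return not unordered(a, b) and a >= b

nan = float('nan')
samples = [(1.0, 2.0), (2.0, 1.0), (1.0, 1.0), (nan, 1.0), (1.0, nan), (nan, nan)]
for a, b in samples:
    # LT -> swap operands, compute UNLE, invert the result
    assert (not unordered(a, b) and a < b) == (not unle(b, a))
    # GT -> compute UNLE, invert
    assert (not unordered(a, b) and a > b) == (not unle(a, b))
    # UNLT -> compute GE, invert
    assert (unordered(a, b) or a < b) == (not ge(a, b))
    # UNGT -> swap operands, compute GE, invert
    assert (unordered(a, b) or a > b) == (not ge(b, a))
```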
Index: gcc/config/sh/sh.h
===================================================================
--- gcc/config/sh/sh.h (revision 162269)
+++ gcc/config/sh/sh.h (working copy)
@@ -183,6 +183,11 @@ do { \
#define TARGET_FPU_DOUBLE \
((target_flags & MASK_SH4) != 0 || TARGET_SH2A_DOUBLE)
+#define TARGET_SH1_SOFTFP (TARGET_SH1 && !TARGET_FPU_DOUBLE)
+
+#define TARGET_SH1_SOFTFP_MODE(MODE) \
+ (TARGET_SH1_SOFTFP && (!TARGET_SH2E || (MODE) == DFmode))
+
/* Nonzero if an FPU is available. */
#define TARGET_FPU_ANY (TARGET_SH2E || TARGET_FPU_DOUBLE)
@@ -329,6 +334,38 @@ do { \
#define SUPPORT_ANY_SH5 \
(SUPPORT_ANY_SH5_32MEDIA || SUPPORT_ANY_SH5_64MEDIA)
+/* Check if we have support for optimized software floating point using
+ dynamic shifts - then some function calls clobber fewer registers. */
+#ifdef SUPPORT_SH3
+#define SUPPORT_SH3_OSFP 1
+#else
+#define SUPPORT_SH3_OSFP 0
+#endif
+
+#ifdef SUPPORT_SH3E
+#define SUPPORT_SH3E_OSFP 1
+#else
+#define SUPPORT_SH3E_OSFP 0
+#endif
+
+#if defined(SUPPORT_SH4_NOFPU) || defined(SUPPORT_SH3_OSFP)
+#define SUPPORT_SH4_NOFPU_OSFP 1
+#else
+#define SUPPORT_SH4_NOFPU_OSFP 0
+#endif
+
+#if defined(SUPPORT_SH4_SINGLE_ONLY) || defined (SUPPORT_SH3E_OSFP)
+#define SUPPORT_SH4_SINGLE_ONLY_OSFP 1
+#else
+#define SUPPORT_SH4_SINGLE_ONLY_OSFP 0
+#endif
+
+#define TARGET_OSFP (0 \
+ || (TARGET_SH3 && !TARGET_SH2E && SUPPORT_SH3_OSFP) \
+ || (TARGET_SH3E && SUPPORT_SH3E_OSFP) \
+ || (TARGET_HARD_SH4 && !TARGET_SH2E && SUPPORT_SH4_NOFPU_OSFP) \
+ || (TARGET_HARD_SH4 && TARGET_SH2E && SUPPORT_SH4_SINGLE_ONLY_OSFP))
+
/* Reset all target-selection flags. */
#define MASK_ARCH (MASK_SH1 | MASK_SH2 | MASK_SH3 | MASK_SH_E | MASK_SH4 \
| MASK_HARD_SH2A | MASK_HARD_SH2A_DOUBLE | MASK_SH4A \
@@ -2047,6 +2084,12 @@ struct sh_args {
#define LIBGCC2_DOUBLE_TYPE_SIZE 64
#endif
+#if defined(__SH2E__) || defined(__SH3E__) || defined( __SH4_SINGLE_ONLY__)
+#define LIBGCC2_DOUBLE_TYPE_SIZE 32
+#else
+#define LIBGCC2_DOUBLE_TYPE_SIZE 64
+#endif
+
/* 'char' is signed by default. */
#define DEFAULT_SIGNED_CHAR 1
Index: gcc/config/sh/sh-modes.def
===================================================================
--- gcc/config/sh/sh-modes.def (revision 162269)
+++ gcc/config/sh/sh-modes.def (working copy)
@@ -22,6 +22,11 @@ PARTIAL_INT_MODE (SI);
/* PDI mode is used to represent a function address in a target register. */
PARTIAL_INT_MODE (DI);
+/* For software floating point comparisons. */
+CC_MODE (CC_FP_NE);
+CC_MODE (CC_FP_GT);
+CC_MODE (CC_FP_UNLT);
+
/* Vector modes. */
VECTOR_MODE (INT, QI, 2); /* V2QI */
VECTOR_MODES (INT, 4); /* V4QI V2HI */
Index: gcc/config/sh/lib1funcs.h
===================================================================
--- gcc/config/sh/lib1funcs.h (revision 162269)
+++ gcc/config/sh/lib1funcs.h (working copy)
@@ -64,13 +64,151 @@ see the files COPYING3 and COPYING.RUNTI
#endif /* !__LITTLE_ENDIAN__ */
#ifdef __sh1__
+/* branch with two-argument delay slot insn */
#define SL(branch, dest, in_slot, in_slot_arg2) \
in_slot, in_slot_arg2; branch dest
+/* branch with one-argument delay slot insn */
#define SL1(branch, dest, in_slot) \
in_slot; branch dest
+/* branch with comparison in delay slot */
+#define SLC(branch, dest, in_slot, in_slot_arg2) \
+ branch dest; in_slot, in_slot_arg2
+/* comparison in a delay slot, at branch destination */
+#define SLI(in_slot, in_slot_arg2) in_slot, in_slot_arg2
+#define SLCMP(branch, cmp1, cmp1arg2, cmp2, cmp2arg2) \
+ branch .+6; bra .+6; cmp2, cmp2arg2; cmp1, cmp1arg2
+#define DMULU_SAVE \
+ mov.l r10,@-r15; \
+ mov.l r11,@-r15; \
+ mov.l r12,@-r15; \
+ mov.l r13,@-r15
+#define DMULUL(m1, m2, rl) \
+ swap.w m1,r12; \
+ mulu.w r12,m2; \
+ swap.w m2,r13; \
+ sts macl,r10; \
+ mulu.w r13,m1; \
+ clrt; \
+ sts macl,r11; \
+ mulu.w r12,r13; \
+ addc r11,r10; \
+ sts macl,r12; \
+ mulu.w m1,m2; \
+ movt r11; \
+ sts macl,rl; \
+ mov r10,r13; \
+ shll16 r13; \
+ addc r13,rl; \
+ xtrct r11,r10; \
+ addc r10,r12 \
+/* N.B. the carry is cleared here. */
+#define DMULUH(rh) mov r12,rh
+#define DMULU_RESTORE \
+ mov.l @r15+,r13; \
+ mov.l @r15+,r12; \
+ mov.l @r15+,r11; \
+ mov.l @r15+,r10
#else /* ! __sh1__ */
+/* branch with two-argument delay slot insn */
#define SL(branch, dest, in_slot, in_slot_arg2) \
- branch##.s dest; in_slot, in_slot_arg2
+ branch##/s dest; in_slot, in_slot_arg2
+/* branch with one-argument delay slot insn */
#define SL1(branch, dest, in_slot) \
branch##/s dest; in_slot
+/* branch with comparison in delay slot */
+#define SLC(branch, dest, in_slot, in_slot_arg2) \
+ branch##/s dest; in_slot, in_slot_arg2
+/* comparison in a delay slot, at branch destination */
+#define SLI(in_slot, in_slot_arg)
+#define SLCMP(branch, cmp1, cmp1arg2, cmp2, cmp2arg2) \
+ branch##/s .+6; cmp1, cmp1arg2; cmp2, cmp2arg2
+#define DMULU_SAVE
+#define DMULUL(m1, m2, rl) dmulu.l m1,m2; sts macl,rl
+#define DMULUH(rh) sts mach,rh
+#define DMULU_RESTORE
#endif /* !__sh1__ */
+
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+/* don't #define DYN_SHIFT */
+ #define SHLL4(REG) \
+ shll2 REG; \
+ shll2 REG
+
+ #define SHLR4(REG) \
+ shlr2 REG; \
+ shlr2 REG
+
+ #define SHLL6(REG) \
+ shll2 REG; \
+ shll2 REG; \
+ shll2 REG
+
+ #define SHLR6(REG) \
+ shlr2 REG; \
+ shlr2 REG; \
+ shlr2 REG
+
+ #define SHLL12(REG) \
+ shll8 REG; \
+ SHLL4 (REG)
+
+ #define SHLR12(REG) \
+ shlr8 REG; \
+ SHLR4 (REG)
+
+ #define SHLR19(REG) \
+ shlr16 REG; \
+ shlr2 REG; \
+ shlr REG
+
+ #define SHLL23(REG) \
+ shll16 REG; \
+ shlr REG; \
+ shll8 REG
+
+ #define SHLR24(REG) \
+ shlr16 REG; \
+ shlr8 REG
+
+ #define SHLR21(REG) \
+ shlr16 REG; \
+ shll2 REG; \
+ add REG,REG;\
+ shlr8 REG
+
+ #define SHLL21(REG) \
+ shll16 REG; \
+ SHLL4 (REG); \
+ add REG,REG
+
+ #define SHLR11(REG) \
+ shlr8 REG; \
+ shlr2 REG; \
+ shlr REG
+
+ #define SHLR22(REG) \
+ shlr16 REG; \
+ shll2 REG; \
+ shlr8 REG
+
+ #define SHLR23(REG) \
+ shlr16 REG; \
+ add REG,REG;\
+ shlr8 REG
+
+ #define SHLR20(REG) \
+ shlr16 REG; \
+ SHLR4 (REG)
+
+ #define SHLL20(REG) \
+ shll16 REG; \
+ SHLL4 (REG)
+#define SHLD_COUNT(N,COUNT)
+#define SHLRN(N,COUNT,REG) SHLR##N(REG)
+#define SHLLN(N,COUNT,REG) SHLL##N(REG)
+#else
+#define SHLD_COUNT(N,COUNT) mov #N,COUNT
+#define SHLRN(N,COUNT,REG) shld COUNT,REG
+#define SHLLN(N,COUNT,REG) shld COUNT,REG
+#define DYN_SHIFT 1
+#endif
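The SHLL*/SHLR* macros above synthesize constant shifts for SH1/SH2 (which lack `shld`) out of the available shll2/shll8/shll16, shlr2/shlr8/shlr16, shlr, and `add REG,REG` (shift left by one) instructions. Their expansions can be checked against plain 32-bit shifts, e.g. in Python (each SH insn modelled as a function; illustrative only):

```python
M = 0xFFFFFFFF  # 32-bit register width

def shll2(x):  return (x << 2) & M
def shll8(x):  return (x << 8) & M
def shll16(x): return (x << 16) & M
def shlr(x):   return x >> 1
def shlr8(x):  return x >> 8
def shlr16(x): return x >> 16
def add_self(x): return (x + x) & M   # 'add REG,REG' == shift left by 1

# macro expansions as listed in the header
def SHLR21(x): return shlr8(add_self(shll2(shlr16(x))))
def SHLL21(x): return add_self(shll2(shll2(shll16(x))))
def SHLL23(x): return shll8(shlr(shll16(x)))

for x in (0, 1, 0xDEADBEEF, 0x80000001, 0xFFFFFFFF):
    assert SHLR21(x) == x >> 21
    assert SHLL21(x) == (x << 21) & M
    assert SHLL23(x) == (x << 23) & M
```

Each step truncates to 32 bits exactly as the hardware does, so the intermediate overflows (e.g. in SHLL23's shll16) are harmless.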
Index: gcc/config/sh/IEEE-754/m3/divsf3.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/divsf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/divsf3.S (revision 0)
@@ -0,0 +1,360 @@
+/* Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+! divsf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+
+! long 0th..3rd significant byte
+#ifdef __LITTLE_ENDIAN__
+#define L0SB 3
+#define L1SB 2
+#define L2SB 1
+#define L3SB 0
+#else
+#define L0SB 0
+#define L1SB 1
+#define L2SB 2
+#define L3SB 3
+#endif
+
+! clobbered: r0,r1,r2,r3,r6,r7,T (and for sh.md's purposes PR)
+!
+! Note: When the divisor is larger than the dividend, we have to adjust the
+! exponent down by one. We do this automatically when subtracting the entire
+! exponent/fraction bitstring as an integer, by means of the borrow from
+! bit 23 to bit 24.
+! Note: non-denormal rounding of a division result cannot cause fraction
+! overflow / exponent change. (r4 > r5 : fraction must stay in (2..1] interval;
+! r4 < r5: having an extra bit of precision available, even the smallest
+! possible difference of the result from one is rounded in all rounding modes
+! to a fraction smaller than one.)
+! sh4-200: 59 cycles
+! sh4-300: 44 cycles
+! tab indent: exponent / sign computations
+! tab+space indent: fraction computation
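The first note above can be verified numerically on a host: subtracting the packed sign/exponent/fraction words as integers borrows out of bit 23 exactly when the divisor's fraction exceeds the dividend's, which supplies the exponent-minus-one adjustment automatically. A Python sketch (helper names are made up; the identity holds while the quotient stays normal, as in these samples):

```python
import struct

def bits(f):
    """IEEE-754 binary32 bit pattern of f as an unsigned int."""
    return struct.unpack('>I', struct.pack('>f', f))[0]

def exp_field(u):
    return (u >> 23) & 0xFF

BIAS = 127 << 23   # 0x3f800000

for a, b in [(1.5, 3.0), (3.0, 1.5), (1.25, 1.5), (1.5, 1.25)]:
    # integer subtraction of the whole bit strings, rebias, read exponent
    predicted = exp_field((bits(a) - bits(b) + BIAS) & 0xFFFFFFFF)
    actual = exp_field(bits(a / b))
    assert predicted == actual
```

The (1.25, 1.5) case is the interesting one: the divisor's fraction is bigger, the subtraction borrows out of bit 23, and the predicted exponent drops by one, matching the true quotient.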
+FUNC(GLOBAL(divsf3))
+ .global GLOBAL(divsf3)
+ .balign 4
+GLOBAL(divsf3):
+ mov.l LOCAL(x7f800000),r3
+ mov #1,r2
+ mov r4,r6
+ shll8 r6
+ mov r5,r7
+ shll8 r7
+ rotr r2
+ tst r3,r4
+ or r2,r6
+ bt/s LOCAL(denorm_arg0)
+ or r2,r7
+ tst r3,r5
+ bt LOCAL(denorm_arg1)
+ shlr r6
+	mov.l	LOCAL(x3f000000),r3 ! bias minus explicit leading 1
+ div0u
+LOCAL(denorm_done):
+ div1 r7,r6
+ mov.l r8,@-r15
+ bt 0f
+ div1 r7,r6
+0: mov.l r9,@-r15
+ div1 r7,r6
+ add r4,r3
+ div1 r7,r6
+ sub r5,r3 ! result sign/exponent minus 1 if no overflow/underflow
+ div1 r7,r6
+ or r3,r2
+ div1 r7,r6
+ mov.w LOCAL(xff00),r9
+ div1 r7,r6
+ mov.l r2,@-r15 ! L0SB is 0xff iff denorm / infinity exp is computed
+ div1 r7,r6
+ mov.w LOCAL(m23),r2
+ div1 r7,r6
+ mov r4,r0
+ div1 r7,r6
+ extu.b r6,r1
+ and r9,r6
+ swap.w r1,r1 ! first 8 bits of result fraction in bit 23..16
+ div1 r7,r6
+ shld r2,r0
+ div1 r7,r6
+	mov.b	r0,@(L3SB,r15)	! 0xff iff dividend was infinity / nan
+ div1 r7,r6
+ mov r5,r0
+ div1 r7,r6
+ shld r2,r0
+ div1 r7,r6
+ mov.b r0,@(L2SB,r15) ! 0xff iff divisor was infinity / nan
+ div1 r7,r6
+ mov r4,r0
+ div1 r7,r6
+ mov.w LOCAL(m31),r2
+ div1 r7,r6
+ extu.b r6,r8 ! second 8 bits of result fraction in bit 7..0
+ and r9,r6
+ mov.l LOCAL(xff800000),r9
+ div1 r7,r6
+ xor r5,r0 ! msb := correct result sign
+ div1 r7,r6
+ xor r3,r0 ! xor with sign of result sign/exponent word
+ div1 r7,r6
+ shad r2,r0
+ div1 r7,r6
+ mov.b r0,@(L1SB,r15) ! 0xff iff exponent over/underflows
+ and r9,r3 ! isolate sign / exponent
+ mov.w LOCAL(xff01),r2
+ div1 r7,r6
+ swap.b r8,r0 ! second 8 bits of result fraction in bit 15..8
+ div1 r7,r6
+ or r1,r0 ! first 16 bits of result fraction in bit 23..8
+ div1 r7,r6
+ mov.w LOCAL(m1),r9
+ div1 r7,r6
+	mov.l	@r15+,r8 ! load encoding of unusual exponent conditions
+ and r6,r2 ! rest | result lsb
+ mov #0,r1
+ bf 0f ! bit below lsb clear -> no rounding
+ cmp/hi r1,r2
+0: extu.b r6,r1
+ or r1,r0 ! 24 bit result fraction with explicit leading 1
+ addc r3,r0 ! add in exponent / sign
+ cmp/str r9,r8
+ ! (no stall *here* for SH4-100 / SH4-200)
+ bt/s LOCAL(inf_nan_denorm_zero)
+ mov.l @r15+,r9
+ rts
+ mov.l @r15+,r8
+
+/* The exponent adjustment for denormal numbers is done by leaving an
+   adjusted value in r3; r4/r5 are not changed.  */
+ .balign 4
+LOCAL(denorm_arg0):
+ mov.w LOCAL(xff00),r1
+	sub	r2,r6	! 0x80000000 : remove implicit 1
+ tst r6,r6
+ sts.l pr,@-r15
+ bt LOCAL(div_zero)
+ bsr LOCAL(clz)
+ mov r6,r0
+ shld r0,r6
+ tst r3,r5
+ mov.l LOCAL(x3f800000),r3 ! bias - 1 + 1
+ mov #23,r1
+ shld r1,r0
+ bt/s LOCAL(denorm_arg1_2)
+ sub r0,r3
+ shlr r6
+ bra LOCAL(denorm_done)
+ div0u
+
+LOCAL(denorm_arg1):
+ mov.l LOCAL(x3f000000),r3 ! bias - 1
+LOCAL(denorm_arg1_2):
+	sub	r2,r7	! 0x80000000 : remove implicit 1
+ mov.w LOCAL(xff00),r1
+ tst r7,r7
+ sts.l pr,@-r15
+ bt LOCAL(div_by_zero)
+ bsr LOCAL(clz)
+ mov r7,r0
+ shld r0,r7
+ add #-1,r0
+ mov #23,r1
+ shld r1,r0
+ add r0,r3
+ shlr r6
+ bra LOCAL(denorm_done)
+ div0u
+
+ .balign 4
+LOCAL(inf_nan_denorm_zero):
+! r0 has the rounded result, r6 has the non-rounded lowest bits & rest.
+! the bit just below the LSB of r6 is available as ~Q
+
+! Alternative way to get at ~Q:
+! if rounding took place, ~Q must be set.
+! if the rest appears to be zero, ~Q must be set.
+! if the rest appears to be nonzero, but rounding didn't take place,
+! ~Q must be clear; the apparent rest will then require adjusting to test if
+! the actual rest is nonzero.
+ mov r0,r2
+ not r8,r0
+ tst #0xff,r0
+ shlr8 r0
+ mov.l @r15+,r8
+ bt/s LOCAL(div_inf_or_nan)
+ tst #0xff,r0
+ mov r4,r0
+ bt LOCAL(div_by_inf_or_nan)
+ add r0,r0
+ mov r5,r1
+ add r1,r1
+ cmp/hi r1,r0
+ mov r6,r0
+ bt LOCAL(overflow)
+ sub r2,r0
+ exts.b r0,r0 ! -1 if rounding took place
+ shlr8 r6 ! isolate div1-mangled rest
+ addc r2,r0 ! generate carry if rounding took place
+ shlr8 r7
+ sub r3,r0 ! pre-rounding fraction
+ bt 0f ! going directly to denorm_sticky would cause mispredicts
+ tst r6,r6 ! rest can only be zero if lost bit was set
+0: add r7,r6 ! (T ? corrupt : reconstruct) actual rest
+ bt 0f
+ cmp/pl r6
+0: mov.w LOCAL(m24),r1
+ addc r0,r0 ! put in sticky bit
+ add #-1,r3
+ mov.l LOCAL(x40000000),r6
+ add r3,r3
+ mov r0,r2
+ shad r1,r3 ! exponent ; s32.0
+ !
+ shld r3,r0
+ add #30,r3
+ cmp/pl r3
+ shld r3,r2
+ bf LOCAL(zero_nan) ! return zero
+ rotl r2
+ cmp/hi r6,r2
+ mov #0,r7
+ addc r7,r0
+ div0s r4,r5
+ rts
+ rotcr r0
+
+! ????
+! undo normal rounding (lowest bits still in r6). then do denormal rounding.
+
+LOCAL(overflow):
+ mov.l LOCAL(xff000000),r0
+ div0s r4,r5
+ rts
+ rotcl r0
+
+LOCAL(div_inf_or_nan):
+ mov r4,r0
+ bra LOCAL(nan_if_t)
+ add r0,r0
+
+LOCAL(div_by_inf_or_nan):
+ mov.l LOCAL(xff000000),r1
+ mov #0,r0
+ mov r5,r2
+ add r2,r2
+ bra LOCAL(nan_if_t)
+ cmp/hi r1,r2
+
+
+
+! still need to check for divide by zero or divide by nan
+! r3: 0x7f800000
+ .balign 4
+LOCAL(div_zero):
+ mov r5,r1
+ add r1,r1
+ tst r1,r1 ! 0 / 0 -> nan
+ not r5,r1
+ bt LOCAL(nan)
+ add r3,r3
+ cmp/hi r3,r1 ! 0 / nan -> nan (but 0 / inf -> 0)
+LOCAL(zero_nan):
+ mov #0,r0
+LOCAL(nan_if_t):
+ bf 0f
+LOCAL(nan):
+ mov #-1,r0
+0: div0s r4,r5 ! compute sign
+ rts
+ rotcr r0 ! insert sign
+
+LOCAL(div_by_zero):
+ mov.l LOCAL(xff000000),r0
+ mov r5,r2
+ add r2,r2
+ bra LOCAL(nan_if_t)
+ cmp/hi r0,r2
+
+ .balign 4
+LOCAL(clz):
+ mov.l r8,@-r15
+ extu.w r0,r8
+ mov.l r9,@-r15
+ cmp/eq r0,r8
+ bt/s 0f
+ mov #8-8,r9
+ xtrct r0,r8
+ add #16,r9
+0: tst r1,r8 ! 0xff00
+ mov.l LOCAL(c_clz_tab),r0
+ bt 0f
+ shlr8 r8
+0: bt 0f
+ add #8,r9
+0:
+#ifdef __PIC__
+ add r0,r8
+ mova LOCAL(c_clz_tab),r0
+#endif
+ mov.b @(r0,r8),r8
+ mov r9,r0
+ mov.l @r15+,r9
+ !
+ !
+ !
+ sub r8,r0
+ mov.l @r15+,r8
+ rts
+ lds.l @r15+,pr
+
+! Some constants are encoded as PC-relative words even though they would
+! fit as instruction immediates, to avoid pipeline stalls on
+! SH4-100 / SH4-200.
+LOCAL(m23): .word -23
+LOCAL(m24): .word -24
+LOCAL(m31): .word -31
+LOCAL(xff01): .word 0xff01
+ .balign 4
+LOCAL(xff000000):
+#ifdef __LITTLE_ENDIAN__
+LOCAL(xff00): .word 0xff00
+LOCAL(m1): .word -1
+#else
+LOCAL(m1): .word -1
+LOCAL(xff00): .word 0xff00
+#endif
+LOCAL(x7f800000): .long 0x7f800000
+LOCAL(x3f000000): .long 0x3f000000
+LOCAL(x3f800000): .long 0x3f800000
+LOCAL(xff800000): .long 0xff800000
+LOCAL(x40000000): .long 0x40000000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(divsf3))
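The LOCAL(clz) helper above narrows the input to its top non-zero byte with conditional 16- and 8-bit shifts, then finishes with a byte lookup in GLOBAL(clz_tab). Here is a hedged C sketch of the same scheme; the table contents are an assumption (1-based position of each byte's highest set bit), and the two LOCAL(clz) helpers in this patch bias the count differently for their callers, while this sketch computes the plain count:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of clz_tab: the 1-based position of the highest set
   bit for each byte value; built here instead of spelled out. */
static unsigned char clz_tab[256];

static void init_clz_tab(void)
{
    for (int i = 1; i < 256; i++) {
        int b = 0;
        for (int v = i; v; v >>= 1)
            b++;
        clz_tab[i] = (unsigned char) b;
    }
}

/* Count leading zeros of a nonzero 32-bit value: at most two
   conditional shifts to isolate the top non-zero byte, then one
   table lookup. */
static int clz32(uint32_t x)
{
    int base = 32;
    if (x & 0xffff0000u) { x >>= 16; base -= 16; }
    if (x & 0x0000ff00u) { x >>= 8;  base -= 8;  }
    return base - clz_tab[x];
}
```

The assembly version folds the table base address and the shift-count bias into the surrounding code, but the data flow is the same.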
Index: gcc/config/sh/IEEE-754/m3/divdf3.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/divdf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/divdf3.S (revision 0)
@@ -0,0 +1,603 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+! divdf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke joern.rennecke@st.com
+
+/* y = 1/x ; x (- [1,2)
+ y0 = 1.5 - x/2 - tab[(1-x)*64] = y + d ; abs(d)/y <= 0x1.0c/256
+
+ y1 = y0 - ((y0) * x - 1) * y0 = y-x*d^2
+ y2 = y1 - ((y1) * x - 1) * y1 =~= y-x^3*d^4
+
+ z0 = y2*a ; a1 = a - z0*x /# 32 * 64 -> 64 bit #/
+ z1 = y2*a1 (round to nearest odd 0.5 ulp);
+ a2 = a1 - z1*x /# 32 * 64 -> 64 bit #/
+
+ z = a/x = z0 + z1 - 0.5 ulp + (a2 > 0) * ulp
+
+ Unless stated otherwise, multiplies can be done in 32 * 32 bit or less
+ with suitable scaling and/or top truncation.
+ We use a slightly modified algorithm here that checks if the lower
+ bits in z1 are sufficient to determine the outcome of rounding - in that
+ case a2 is not computed.
+ -z1 is computed in units of 1/128 ulp, with an error in the range
+ -0x3.e/128 .. +0 ulp.
+ Thus, after adding three, the result can be safely rounded for normal
+ numbers if any of the bits 5..2 is set, or if the highest guard bit
+ (bit 6 if y <1, otherwise bit 7) is set.
+ (Because of the way truncation works, we would be fine for an open
+ error interval of (-4/128..+1/128) ulp )
+ For denormal numbers, the rounding point lies higher, but it would be
+ quite cumbersome to calculate where exactly; it is sufficient if any
+ of the bits 7..3 is set.
+ x truncated to 20 bits is sufficient to calculate y0 or even y1.
+ Table entries are adjusted by about +128 to use full signed byte range.
+ This adjustment has been perturbed slightly to allow cse with the
+ shift count constant -26.
+ The threshold point for the shift adjust before rounding is found by
+ comparing the fractions, which is exact, unlike the top bit of y2.
+ Therefore, the top bit of y2 becomes slightly random after the adjustment
+ shift, but that's OK because this can happen only at the boundaries of
+ the interval, and the biasing of the error means that it can in fact happen
+ only at the bottom end. And there, the carry propagation will make sure
+ that in the end we will have in effect an implicit 1 (or two when rounding
+ up...) */
+/* If an exact result exists, it can have no more bits than the dividend.
+ Hence, we don't need to bother with the round-to-even tie breaker
+ unless the result is denormalized. */
+/* 64 cycles through the main path for sh4-300 (about 93.7% of normalized numbers),
+ 82 cycles for the rounding tie-break path for normalized numbers
+ (including one branch mispredict).
+ Some cycles might be saved by more careful register allocation. */
+
+#define x_h r12
+#define yn r3
+
+FUNC(GLOBAL(divdf3))
+ .global GLOBAL(divdf3)
+
+/* Adjust arg0 now, too. We still have to come back to denorm_arg1_done,
+ since we haven't done any of the work yet that we do until the denorm_arg0
+ entry point. We know that neither of the arguments is inf/nan, but
+ arg0 might be zero. Check for that first to avoid having to establish an
+ rts return address. */
+LOCAL(both_denorm):
+ mov.l r9,@-r15
+ mov DBL0H,r1
+ mov.l r0,@-r15
+ shll2 r1
+ mov.w LOCAL(both_denorm_cleanup_off),r9
+ or DBL0L,r1
+ tst r1,r1
+ mov DBL0H,r0
+ bf/s LOCAL(zero_denorm_arg0_1)
+ shll2 r0
+ mov.l @(4,r15),r9
+ add #8,r15
+ bra LOCAL(ret_inf_nan_0)
+ mov r1,DBLRH
+
+LOCAL(both_denorm_cleanup):
+ mov.l @r15+,r0
+ !
+ mov.l @r15+,r9
+ mov #12,r3
+ mov.l LOCAL(xfffe2006),r1 ! yn := (-1. << 17) + (0x80 << 6) ; shift #-26
+ bra LOCAL(denorm_arg1_done)
+ !
+ add r0,DBL0H
+
+/* Denorm handling leaves the incoming denorm argument with an exponent of +1
+ (implicit 1). To leave the result exponent unaltered, the other
+ argument's exponent is adjusted by the shift count. */
+
+ .balign 4
+LOCAL(arg0_tiny):
+ bsr LOCAL(clz)
+ mov DBL0L,r0
+ shll DBL0H
+ add #1,r0
+ mov DBL0L,DBL0H
+ shld r0,DBL0H
+ rotcr DBL0H
+ tst DBL0L,DBL0L /* Check for a zero dividend. */
+ add #-33,r0
+ shld r0,DBL0L
+ bf/s LOCAL(adjust_arg1_exp)
+ add #64,r0
+LOCAL(return_0): /* Return 0 with appropriate sign. */
+ mov.l @r15+,r10
+ mov #0,DBLRH
+ mov.l @r15+,r9
+ bra LOCAL(ret_inf_nan_0)
+ mov.l @r15+,r8
+
+ .balign 4
+LOCAL(arg1_tiny):
+ bsr LOCAL(clz)
+ mov DBL1L,r0
+ shll DBL1H
+ add #1,r0
+ mov DBL1L,DBL1H
+ shld r0,DBL1H
+ rotcr DBL1H
+ tst DBL1L,DBL1L /* Check for divide by zero. */
+ add #-33,r0
+ shld r0,DBL1L
+ bf/s LOCAL(adjust_arg0_exp)
+ add #64,r0
+ mov DBL0H,r0
+ add r0,r0
+ tst r0,r0 ! 0 / 0 ?
+ mov #-1,DBLRH
+ bf LOCAL(return_inf)
+ !
+ bt LOCAL(ret_inf_nan_0)
+ !
+
+ .balign 4
+LOCAL(zero_denorm_arg1):
+ not DBL0H,r3
+ mov DBL1H,r0
+ tst r2,r3
+ shll2 r0
+ bt LOCAL(early_inf_nan_arg0)
+ tst r0,r0
+ mov.w LOCAL(xff00),r12
+ bt/s LOCAL(arg1_tiny)
+ sts.l pr,@-r15
+ bsr LOCAL(clz)
+ shlr2 r0
+ !
+ shll DBL1H
+ mov DBL1L,r3
+ shld r0,DBL1H
+ shld r0,DBL1L
+ rotcr DBL1H
+ add #-32,r0
+ shld r0,r3
+ add #32,r0
+ or r3,DBL1H
+LOCAL(adjust_arg0_exp):
+ tst r2,DBL0H
+ mov #20,r3
+ shld r3,r0
+ bt LOCAL(both_denorm)
+ add DBL0H,r0
+ div0s r0,DBL0H ! Check for obvious overflow.
+ not r0,r3 ! Check for more subtle overflow - lest
+ bt LOCAL(return_inf)
+ mov r0,DBL0H
+ tst r2,r3 ! we mistake it for NaN later
+ mov #12,r3
+ bf LOCAL(denorm_arg1_done)
+LOCAL(return_inf): /* Return infinity with appropriate sign. */
+ mov #20,r3
+ mov #-2,DBLRH
+ bra LOCAL(ret_inf_nan_0)
+ shad r3,DBLRH
+
+/* inf/n -> inf; inf/0 -> inf; inf/inf -> nan; inf/nan -> nan; nan/x -> nan */
+LOCAL(inf_nan_arg0):
+ mov.l @r15+,r10
+ mov.l @r15+,r9
+ mov.l @r15+,r8
+LOCAL(early_inf_nan_arg0):
+ not DBL1H,r3
+ mov DBL0H,DBLRH
+ tst r2,r3 ! both inf/nan?
+ add DBLRH,DBLRH
+ bf LOCAL(ret_inf_nan_0)
+ mov #-1,DBLRH
+LOCAL(ret_inf_nan_0):
+ mov #0,DBLRL
+ mov.l @r15+,r12
+ div0s DBL0H,DBL1H
+ rts
+ rotcr DBLRH
+
+/* Already handled: inf/x, nan/x . Thus: x/inf -> 0; x/nan -> nan */
+ .balign 4
+LOCAL(inf_nan_arg1):
+ mov DBL1H,r2
+ mov #12,r1
+ shld r1,r2
+ mov.l @r15+,r10
+ mov #0,DBLRL
+ mov.l @r15+,r9
+ or DBL1L,r2
+ mov.l @r15+,r8
+ cmp/hi DBLRL,r2
+ mov.l @r15+,r12
+ subc DBLRH,DBLRH
+ div0s DBL0H,DBL1H
+ rts
+ rotcr DBLRH
+
+ .balign 4
+LOCAL(zero_denorm_arg0):
+ mov.w LOCAL(denorm_arg0_done_off),r9
+ not DBL1H,r1
+ mov DBL0H,r0
+ tst r2,r1
+ shll2 r0
+ bt LOCAL(inf_nan_arg1)
+LOCAL(zero_denorm_arg0_1):
+ tst r0,r0
+ mov.w LOCAL(xff00),r12
+ bt/s LOCAL(arg0_tiny)
+ sts.l pr,@-r15
+ bsr LOCAL(clz)
+ shlr2 r0
+ shll DBL0H
+ mov DBL0L,r12
+ shld r0,DBL0H
+ shld r0,DBL0L
+ rotcr DBL0H
+ add #-32,r0
+ shld r0,r12
+ add #32,r0
+ or r12,DBL0H
+LOCAL(adjust_arg1_exp):
+ mov #20,r12
+ shld r12,r0
+ add DBL1H,r0
+ div0s r0,DBL1H ! Check for obvious underflow.
+ not r0,r12 ! Check for more subtle underflow - lest
+ bt LOCAL(return_0)
+ mov r0,DBL1H
+ tst r2,r12 ! we mistake it for NaN later
+ bt LOCAL(return_0)
+ !
+ braf r9
+ mov #13,r0
+LOCAL(zero_denorm_arg1_dispatch):
+
+LOCAL(xff00): .word 0xff00
+LOCAL(denorm_arg0_done_off):
+ .word LOCAL(denorm_arg0_done)-LOCAL(zero_denorm_arg1_dispatch)
+LOCAL(both_denorm_cleanup_off):
+ .word LOCAL(both_denorm_cleanup)-LOCAL(zero_denorm_arg1_dispatch)
+
+ .balign 8
+GLOBAL(divdf3):
+ mov.l LOCAL(x7ff00000),r2
+ mov #12,r3
+ mov.l LOCAL(xfffe2006),r1 ! yn := (-1. << 17) + (0x80 << 6) ; shift #-26
+ tst r2,DBL1H
+ mov.l r12,@-r15
+ bt LOCAL(zero_denorm_arg1)
+
+LOCAL(denorm_arg1_done):
+ mov DBL1H,x_h ! x_h live in r12
+ shld r3,x_h ! x - 1 ; u0.20
+ mov x_h,yn
+ mova LOCAL(ytab),r0
+ mov.l r8,@-r15
+ shld r1,yn ! x-1 ; u26.6
+ mov.b @(r0,yn),yn
+ mov #6,r0
+ mov.l r9,@-r15
+ mov x_h,r8
+ mov.l r10,@-r15
+ shlr16 x_h ! x - 1; u16.16 ! x/2 - 0.5 ; u15.17
+ add x_h,r1 ! SH4-200 single-issues this insn
+ shld r0,yn
+ sub r1,yn ! yn := y0 ; u15.17
+ mov DBL1L,r1
+ mov #-20,r10
+ mul.l yn,x_h ! r12 dead
+ swap.w yn,r9
+ shld r10,r1
+ sts macl,r0 ! y0 * (x-1) - n ; u-1.32
+ add r9,r0 ! y0 * x - 1 ; s-1.32
+ tst r2,DBL0H
+ dmuls.l r0,yn
+ mov.w LOCAL(d13),r0
+ or r1,r8 ! x - 1; u0.32
+ add yn,yn ! yn = y0 ; u14.18
+ bt LOCAL(zero_denorm_arg0)
+
+LOCAL(denorm_arg0_done):
+ sts mach,r1 ! d0 ; s14.18
+ sub r1,yn ! yn = y1 ; u14.18 ; <= 0x3fffc
+ mov DBL0L,r12
+ shld r0,yn ! yn = y1 ; u1.31 ; <= 0x7fff8000
+ mov.w LOCAL(d12),r9
+ dmulu.l yn,r8
+ shld r10,r12
+ mov yn,r0
+ mov DBL0H,r8
+ add yn,yn ! yn = y1 ; u0.32 ; <= 0xffff0000
+ sts mach,r1 ! y1 * (x-1); u1.31
+ add r0,r1 ! y1 * x ; u1.31
+ dmulu.l yn,r1
+ not DBL0H,r10
+ shld r9,r8
+ tst r2,r10
+ or r8,r12 ! a - 1; u0.32
+ bt LOCAL(inf_nan_arg0)
+ sts mach,r1 ! d1+yn; u1.31
+ sett ! adjust y2 so that it can be interpreted as s1.31
+ not DBL1H,r10
+ subc r1,yn ! yn := y2 ; u1.31 ; can be 0x7fffffff
+ mov.l LOCAL(x001fffff),r9
+ dmulu.l yn,r12
+ tst r2,r10
+ or DBL1H,r2
+ bt LOCAL(inf_nan_arg1)
+ mov.l r11,@-r15
+ sts mach,r12 ! y2*(a-1) ; u1.31
+ add yn,r12 ! z0 ; u1.31
+ dmulu.l r12,DBL1L
+ mov.l LOCAL(x40000000),DBLRH ! bias + 1
+ and r9,r2 ! x ; u12.20
+ cmp/hi DBL0L,DBL1L
+ sts macl,r8
+ mov #-24,r11
+ sts mach,r9 ! r9:r8 := z0 * DBL1L; u-19.64
+ subc DBL1H,DBLRH
+ mul.l r12,r2 ! (r9+macl):r8 == z0*x; u-19.64
+ shll r8
+ add DBL0H,DBLRH ! result sign/exponent + 1
+ mov r8,r10
+ sts macl,DBLRL
+ add DBLRL,r9
+ rotcl r9 ! r9:r8 := z*x; u-20.63
+ shld r11,r10
+ mov.l LOCAL(x7fe00000),DBLRL
+ sub DBL0L,r9 ! r9:r8 := -a ; u-20.63
+ cmp/pz r9 ! In corner cases this shift can lose ..
+ shll8 r9 ! .. the sign, so check it first.
+ mov.l LOCAL(x00200000),r11
+ or r10,r9 ! -a1 ; s-28.32
+ mov.l LOCAL(x00100000),r10
+ dmulu.l r9,yn ! sign for r9 is in T
+ xor DBL0H,DBL1H ! calculate expected sign & bit20
+ mov.w LOCAL(d120),DBL0H ! to test bits 6..4
+ xor DBLRH,DBL1H
+ !
+ sts mach,DBL0L ! -z1 ; s-27.32
+ bt 0f
+ sub yn,DBL0L ! multiply adjust for -a1 negative; r3 dies here
+0:tst r10,DBL1H ! set T if a >= x
+ mov.l LOCAL(xfff00000),r3
+ bt 0f
+ add DBL0L,DBL0L ! z1 ; s-27.32 / s-28.32
+0:bt 0f
+ add r12,r12 ! z0 ; u1.31 / u0.31
+0:add #6-64,DBL0L
+ and r3,DBLRH ! isolate sign / exponent
+ tst DBL0H,DBL0L
+ bf/s LOCAL(exact) ! make the hot path taken for best branch prediction
+ cmp/pz DBL1H
+
+! Unless we follow the next branch, we need to test which way the rounding
+! should go.
+! For normal numbers, we know that the result is not exact, so the sign
+! of the rest will be conclusive.
+! We generate a number that looks safely rounded so that denorm handling
+! can safely test the number twice.
+! r10:r8 == 0 will indicate if the number was exact, which can happen
+! when we come here for denormals to check a number that is close or
+! equal to a result in whole ulps.
+ bf LOCAL(ret_denorm_inf) ! denorm or infinity, DBLRH has inverted sign
+ add #64,DBL0L
+LOCAL(find_adjust): tst r10,DBL1H ! set T if a >= x
+ mov #-2,r10
+ addc r10,r10
+ mov DBL0L,DBLRL ! z1 ; s-27.32 / s-28.32 ; lower 4 bits unsafe.
+ shad r10,DBLRL ! tentatively rounded z1 ; s-24.32
+ shll8 r8 ! r9:r8 := -a1 ; s-28.64
+ clrt
+ dmuls.l DBLRL,DBL1L ! DBLRL signed, DBL1L unsigned
+ mov r8,r10
+ shll16 r8 ! r8 := lowpart of -a1 ; s-44.48
+ xtrct r9,r10 ! r10 := highpart of -a1 ; s-44.48
+ !
+ sts macl,r3
+ subc r3,r8
+ sts mach,r3
+ subc r3,r10
+ cmp/pz DBL1L
+ mul.l DBLRL,r2
+ bt 0f
+ sub DBLRL,r10 ! adjust for signed/unsigned multiply
+0: mov.l LOCAL(x7fe00000),DBLRL
+ mov #-26,r2
+ sts macl,r9
+ sub r9,r10 ! r10:r8 := -a2
+ add #-64+16,DBL0L ! the denorm code negates this adj. for exact results
+ shld r2,r10 ! convert sign into adjustment in the range 32..63
+ sub r10,DBL0L
+ cmp/pz DBL1H
+
+ .balign 4
+LOCAL(exact):
+ bf LOCAL(ret_denorm_inf) ! denorm or infinity, DBLRH has inverted sign
+ tst DBLRL,DBLRH
+ bt LOCAL(ret_denorm_inf) ! denorm, DBLRH has correct sign
+ mov #-7,DBL1H
+ cmp/pz DBL0L ! T is sign extension of z1
+ not DBL0L,DBLRL
+ subc r11,DBLRH ! calculate sign / exponent minus implicit 1 minus T
+ mov.l @r15+,r11
+ mov.l @r15+,r10
+ shad DBL1H,DBLRL
+ mov.l @r15+,r9
+ mov #-11,DBL1H
+ mov r12,r8 ! z0 contributes to DBLRH and DBLRL
+ shld DBL1H,r12
+ mov #21,DBL1H
+ clrt
+ shld DBL1H,r8
+ addc r8,DBLRL
+ mov.l @r15+,r8
+ addc r12,DBLRH
+ rts
+ mov.l @r15+,r12
+
+! sign in DBLRH ^ DBL1H
+! If the last 7 bits are in the range 64..64+7, we might have an exact
+! value in the preceding bits - or we might not. For denorms, we need to
+! find out.
+! if r10:r8 is zero, we just have found out that there is an exact value.
+ .balign 4
+LOCAL(ret_denorm_inf):
+ mov DBLRH,r3
+ add r3,r3
+ div0s DBL1H,r3
+ mov #120,DBLRL
+ bt LOCAL(ret_inf_late)
+ add #64,DBL0L
+ tst DBLRL,DBL0L
+ mov #-21,DBLRL
+ bt LOCAL(find_adjust)
+ or r10,r8
+ tst r8,r8 ! check if find_adjust found an exact value.
+ shad DBLRL,r3
+ bf 0f
+ add #-16,DBL0L ! if yes, cancel adjustment
+0: mov #-8,DBLRL ! remove the three lowest (inexact) bits
+ and DBLRL,DBL0L
+ add #-2-11,r3 ! shift count for denorm generation
+ mov DBL0L,DBLRL
+ mov #28,r2
+ mov.l @r15+,r11
+ mov.l @r15+,r10
+ shll2 DBLRL
+ mov.l @r15+,r9
+ shld r2,DBL0L
+ mov.l @r15+,r8
+ mov #-31,r2
+ cmp/ge r2,r3
+ shll2 DBLRL
+ bt/s 0f
+ add DBL0L,r12 ! fraction in r12:DBLRL ; u1.63
+ negc DBLRL,DBLRL ! T := DBLRL != 0
+ add #31,r3
+ mov r12,DBLRL
+ rotcl DBLRL ! put in sticky bit
+ movt r12
+ cmp/ge r2,r3
+ bt/s LOCAL(return_0_late)
+0: div0s DBL1H,DBLRH ! calculate sign
+ mov r12,DBLRH
+ shld r3,DBLRH
+ mov DBLRL,r2
+ shld r3,DBLRL
+ add #32,r3
+ add DBLRH,DBLRH
+ mov.l LOCAL(x80000000),DBL1H
+ shld r3,r12
+ rotcr DBLRH ! combine sign with highpart
+ add #-1,r3
+ shld r3,r2
+ mov #0,r3
+ rotl r2
+ cmp/hi DBL1H,r2
+ addc r12,DBLRL
+ mov.l @r15+,r12
+ rts
+ addc r3,DBLRH
+
+LOCAL(ret_inf_late):
+ mov.l @r15+,r11
+ mov.l @r15+,r10
+ mov DBLRH,DBL0H
+ mov.l @r15+,r9
+ bra LOCAL(return_inf)
+ mov.l @r15+,r8
+
+LOCAL(return_0_late):
+ div0s DBLRH,DBL1H
+ mov.l @r15+,r12
+ mov #0,DBLRH
+ rts
+ rotcr DBLRH
+
+ .balign 4
+LOCAL(clz):
+ mov.l r8,@-r15
+ extu.w r0,r8
+ mov.l r9,@-r15
+ cmp/eq r0,r8
+ bt/s 0f
+ mov #21,r9
+ xtrct r0,r8
+ add #-16,r9
+0: tst r12,r8 ! 0xff00
+ mov.l LOCAL(c_clz_tab),r0
+ bt 0f
+ shlr8 r8
+0: bt 0f
+ add #-8,r9
+0:
+#ifdef __PIC__
+ add r0,r8
+ mova LOCAL(c_clz_tab),r0
+#endif
+ mov.b @(r0,r8),r8
+ mov r9,r0
+ mov.l @r15+,r9
+ !
+ !
+ !
+ sub r8,r0
+ mov.l @r15+,r8
+ rts
+ lds.l @r15+,pr
+
+! Some constants are encoded as PC-relative words even though they would
+! fit as instruction immediates, to avoid pipeline stalls on
+! SH4-100 / SH4-200.
+LOCAL(d1): .word 1
+LOCAL(d12): .word 12
+LOCAL(d13): .word 13
+LOCAL(d120): .word 120
+
+ .balign 4
+LOCAL(x7ff00000): .long 0x7ff00000
+LOCAL(xfffe2006): .long 0xfffe2006
+LOCAL(x001fffff): .long 0x001fffff
+LOCAL(x40000000): .long 0x40000000
+LOCAL(x7fe00000): .long 0x7fe00000
+LOCAL(x00100000): .long 0x00100000
+LOCAL(x00200000): .long 0x00200000
+LOCAL(xfff00000): .long 0xfff00000
+LOCAL(x80000000): .long 0x80000000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+LOCAL(ytab):
+ .byte 120, 105, 91, 78, 66, 54, 43, 33
+ .byte 24, 15, 8, 0, -5, -12, -17, -22
+ .byte -27, -31, -34, -37, -40, -42, -44, -45
+ .byte -46, -46, -47, -46, -46, -45, -44, -42
+ .byte -41, -39, -36, -34, -31, -28, -24, -20
+ .byte -17, -12, -8, -4, 0, 5, 10, 16
+ .byte 21, 27, 33, 39, 45, 52, 58, 65
+ .byte 72, 79, 86, 93, 101, 109, 116, 124
+ENDFUNC(GLOBAL(divdf3))
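The refinement steps in the header comment (y1 = y0 - (y0 * x - 1) * y0, and so on) are ordinary Newton-Raphson iterations for the reciprocal; each step roughly squares the relative error. A hedged C sketch using plain doubles for clarity (the assembly works in scaled fixed point and adds the tab[] correction to the seed, which is omitted here, so extra iterations are needed to converge):

```c
#include <assert.h>
#include <math.h>

/* Refine a reciprocal estimate for x in [1, 2).  The seed is the
   1.5 - x/2 term from the comment above, without the table
   correction, so its error is larger than in the assembly version. */
static double recip_refine(double x)
{
    double y = 1.5 - 0.5 * x;        /* crude seed: y = 1/x + d */
    for (int i = 0; i < 5; i++)
        y = y - (y * x - 1.0) * y;   /* error goes from d to ~ -x*d^2 */
    return y;
}
```

With the table-corrected seed the assembly needs only two refinement steps to make y2 accurate enough for correctly rounded double division.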
Index: gcc/config/sh/IEEE-754/m3/floatunssisf.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/floatunssisf.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/floatunssisf.S (revision 0)
@@ -0,0 +1,89 @@
+/* Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+! floatunsisf for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+
+FUNC(GLOBAL(floatunsisf))
+ .global GLOBAL(floatunsisf)
+ .balign 4
+GLOBAL(floatunsisf):
+ mov.l LOCAL(c_clz_tab),r0
+ extu.w r4,r1
+ mov.w LOCAL(xff00),r3
+ cmp/eq r4,r1
+ mov #24,r2
+ bt 0f
+ mov r4,r1
+ shlr16 r1
+ add #-16,r2
+0: tst r3,r1 ! 0xff00
+ bt 0f
+ shlr8 r1
+0: bt 0f
+ add #-8,r2
+0:
+#ifdef __PIC__
+ add r0,r1
+ mova LOCAL(c_clz_tab),r0
+#endif
+ mov.b @(r0,r1),r1
+ mov r4,r0
+ mov.l LOCAL(x4a800000),r3 ! bias + 23 - implicit 1
+ tst r4,r4
+ bt LOCAL(ret0)
+ !
+ sub r1,r2
+ mov.l LOCAL(x80000000),r1
+ shld r2,r0
+ cmp/pz r2
+ add r3,r0
+ bt LOCAL(noround)
+ add #31,r2
+ shld r2,r4
+ rotl r4
+ add #-31,r2
+ cmp/hi r1,r4
+ mov #0,r3
+ addc r3,r0
+LOCAL(noround):
+ mov #23,r1
+ shld r1,r2
+ rts
+ sub r2,r0
+LOCAL(ret0):
+ rts
+ nop
+
+LOCAL(xff00): .word 0xff00
+ .balign 4
+LOCAL(x4a800000): .long 0x4a800000
+LOCAL(x80000000): .long 0x80000000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(floatunsisf))
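The routine above normalizes the integer with the clz table, biases the exponent, and folds the shifted-out low bits into a round-to-nearest decision. A hedged C sketch of the same bit manipulation (function and variable names are illustrative, not from the patch):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Convert an unsigned 32-bit integer to IEEE single-precision bits:
   normalize, drop the implicit 1, round to nearest (ties to even). */
static uint32_t uint_to_float_bits(uint32_t u)
{
    if (u == 0)
        return 0;
    int lz = 0;
    for (uint32_t t = u; !(t & 0x80000000u); t <<= 1)
        lz++;
    uint32_t norm = u << lz;                    /* MSB now set */
    uint32_t frac = (norm >> 8) & 0x007fffffu;  /* 23 fraction bits */
    uint32_t rest = norm << 24;                 /* shifted-out bits, top-aligned */
    uint32_t bits = ((uint32_t)(127 + 31 - lz) << 23) | frac;
    if (rest > 0x80000000u || (rest == 0x80000000u && (bits & 1)))
        bits++;                                 /* round up; may carry into exponent */
    return bits;
}
```

A carry out of the fraction on rounding bumps the exponent, which is exactly what is wanted: 0xFFFFFFFF rounds up to 2^32.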
Index: gcc/config/sh/IEEE-754/m3/floatunssidf.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/floatunssidf.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/floatunssidf.S (revision 0)
@@ -0,0 +1,91 @@
+/* Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+! floatunssidf for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+
+FUNC(GLOBAL(floatunsidf))
+ .global GLOBAL(floatunsidf)
+ .balign 4
+GLOBAL(floatunsidf):
+ mov.l LOCAL(c_clz_tab),r0
+ extu.w r4,r1
 mov.w LOCAL(0xff00),r3
+ cmp/eq r4,r1
+ mov #21,r2
+ bt 0f
+ mov r4,r1
+ shlr16 r1
+ add #-16,r2
+0: tst r3,r1 ! 0xff00
+ bt 0f
+ shlr8 r1
+0: bt 0f
+ add #-8,r2
+0:
+#ifdef __PIC__
+ add r0,r1
+ mova LOCAL(c_clz_tab),r0
+#endif
+ mov.b @(r0,r1),r5
+ mov r4,DBLRL
+ mov.l LOCAL(x41200000),r3 ! bias + 20 - implicit 1
+ tst r4,r4
+ mov r4,DBLRH
+ bt LOCAL(ret0)
+ sub r5,r2
+ mov r2,r5
+ shld r2,DBLRH
+ cmp/pz r2
+ add r3,DBLRH
+ add #32,r2
+ shld r2,DBLRL
+ bf 0f
+ mov.w LOCAL(d0),DBLRL
+0: mov #20,r2
+ shld r2,r5
+ rts
+ sub r5,DBLRH
+LOCAL(ret0):
+ mov r4,DBLRL
+ rts
+ mov r4,DBLRH
+
+LOCAL(0xff00): .word 0xff00
+ .balign 4
+LOCAL(x41200000):
+#ifdef __LITTLE_ENDIAN__
+LOCAL(d0): .word 0
+ .word 0x4120
+#else
+ .word 0x4120
+LOCAL(d0): .word 0
+#endif
+LOCAL(c_clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(floatunsidf))
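Unlike the single-precision case, no rounding path is needed here: any 32-bit integer fits exactly in the 52-bit double fraction, which is why the routine above has no round/sticky logic. A hedged C sketch (names are illustrative):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Convert an unsigned 32-bit integer to IEEE double bits, exactly. */
static uint64_t uint_to_double_bits(uint32_t u)
{
    if (u == 0)
        return 0;
    int lz = 0;
    for (uint32_t t = u; !(t & 0x80000000u); t <<= 1)
        lz++;
    uint64_t exp  = (uint64_t)(1023 + 31 - lz);
    uint64_t norm = (uint64_t)u << (lz + 21);   /* implicit 1 lands on bit 52 */
    return (exp << 52) | (norm & 0x000fffffffffffffULL);
}
```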
Index: gcc/config/sh/IEEE-754/m3/fixunsdfsi.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/fixunsdfsi.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/fixunsdfsi.S (revision 0)
@@ -0,0 +1,77 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!! fixunsdfsi for Renesas SH / STMicroelectronics ST40 CPUs
+!! Contributed by Jörn Rennecke joern.rennecke@st.com
+
+#ifdef L_fixunsdfsi
+ ! What is a bit unusual about this implementation is that the
+ ! sign bit influences the result for NANs: for cleared sign bit, you
+ ! get INT_MAX, for set sign bit, you get INT_MIN.
+ ! However, since the result for NANs is undefined, this should be no
+ ! problem.
+ ! N.B. This is scheduled both for SH4-200 and SH4-300
+ .balign 4
+ .global GLOBAL(fixunsdfsi)
+ FUNC(GLOBAL(fixunsdfsi))
+ .balign 4
+GLOBAL(fixunsdfsi):
+ mov.w LOCAL(x413),r1 ! bias + 20
+ mov DBL0H,r0
+ shll DBL0H
+ mov.l LOCAL(mask),r3
+ mov #-21,r2
+ shld r2,DBL0H ! SH4-200 will start this insn in a new cycle
+ bt/s LOCAL(ret0)
+ sub r1,DBL0H
+ cmp/pl DBL0H ! SH4-200 will start this insn in a new cycle
+ and r3,r0
+ bf/s LOCAL(ignore_low)
+ addc r3,r0 ! uses T == 1; sets implicit 1
+ mov #11,r2
+ shld DBL0H,r0 ! SH4-200 will start this insn in a new cycle
+ cmp/gt r2,DBL0H
+ add #-32,DBL0H
+ bt LOCAL(retmax)
+ shld DBL0H,DBL0L
+ rts
+ or DBL0L,r0
+
+ .balign 8
+LOCAL(ignore_low):
+ mov #-21,r2
+ cmp/gt DBL0H,r2 ! SH4-200 will start this insn in a new cycle
+ add #1,r0
+ bf 0f
+LOCAL(ret0): mov #0,r0 ! results in 0 return
+0: rts
+ shld DBL0H,r0
+
+LOCAL(retmax):
+ rts
+ mov #-1,r0
+
+LOCAL(x413): .word 0x413
+
+ .balign 4
+LOCAL(mask): .long 0x000fffff
+ ENDFUNC(GLOBAL(fixunsdfsi))
+#endif /* L_fixunsdfsi */
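The conversion above can be mirrored in C: extract the exponent, reattach the implicit 1, and shift the 53-bit fraction into place, saturating when the value will not fit. A hedged sketch under the simplifying assumption that negative inputs return 0 (the library routine's result there is undefined anyway, as its comment notes):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Truncate a double (given as its bit pattern) to an unsigned 32-bit
   integer: values < 1 give 0, values >= 2^32 saturate to UINT32_MAX. */
static uint32_t double_to_uint_trunc(uint64_t bits)
{
    int exp = (int)((bits >> 52) & 0x7ff) - 1023;            /* unbiased */
    uint64_t frac = (bits & 0x000fffffffffffffULL) | (1ULL << 52);
    if ((bits >> 63) || exp < 0)
        return 0;                    /* negative, zero, denormal, or < 1 */
    if (exp > 31)
        return 0xffffffffu;          /* too large (or inf/nan): saturate */
    return (uint32_t)(frac >> (52 - exp));
}
```

The assembly takes the same three-way decision (return 0, saturate, or shift the fraction) but folds it into the T-bit tests around LOCAL(ignore_low) and LOCAL(retmax).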
Index: gcc/config/sh/IEEE-754/m3/divdf3-rt.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/divdf3-rt.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/divdf3-rt.S (revision 0)
@@ -0,0 +1,514 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+! divdf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke joern.rennecke@st.com
+
+/* This version is not quite finished, since I've found that I can
+ get better average performance with a slightly altered algorithm.
+ Still, if you want a version for hard real time, this version here might
+ be a good starting point, since it has effectively no conditional
+ branches in the path that deals with normal numbers
+ (branches with zero offset are effectively conditional execution),
+ and thus it has a uniform execution time in this path. */
+
+/* y = 1/x ; x (- [1,2)
+ y0 = 1.5 - x/2 - tab[(1-x)*64] = y + d ; abs(d)/y <= 0x1.0c/256
+
+ y1 = y0 - ((y0) * x - 1) * y0 = y-x*d^2
+ y2 = y1 - ((y1) * x - 1) * y1 =~= y-x^3*d^4
+
+ z0 = y2*a ; a1 = a - z0*x /# 32 * 64 -> 64 bit #/
+ z1 = y2*a1 (round to nearest odd 0.5 ulp);
+ a2 = a1 - z1*x /# 32 * 64 -> 64 bit #/
+
+ z = a/x = z0 + z1 - 0.5 ulp + (a2 > 0) * ulp
+
+ Unless stated otherwise, multiplies can be done in 32 * 32 bit or less
+ with suitable scaling and/or top truncation.
+ x truncated to 20 bits is sufficient to calculate y0 or even y1.
+ Table entries are adjusted by about +128 to use full signed byte range.
+ This adjustment has been perturbed slightly to allow cse with the
+ shift count constant -26.
+ The threshold point for the shift adjust before rounding is found by
+ comparing the fractions, which is exact, unlike the top bit of y2.
+ Therefore, the top bit of y2 becomes slightly random after the adjustment
+ shift, but that's OK because this can happen only at the boundaries of
+ the interval, and the biasing of the error means that it can in fact happen
+ only at the bottom end. And there, the carry propagation will make sure
+ that in the end we will have in effect an implicit 1 (or two when rounding
+ up...) */
+/* If an exact result exists, it can have no more bits than the dividend.
+ Hence, we don't need to bother with the round-to-even tie breaker
+ unless the result is denormalized. */
+/* 70 cycles through the main path for sh4-300. Some cycles might be
+ saved by more careful register allocation.
+ 122 cycles for sh4-200. If execution time for sh4-200 is of concern,
+ a specially scheduled version makes sense. */
+
+#define x_h r12
+#define yn r3
+
+FUNC(GLOBAL(divdf3))
+ .global GLOBAL(divdf3)
+
+/* Adjust arg0 now, too. We still have to come back to denorm_arg1_done,
+ since we haven't done any of the work yet that we do until the denorm_arg0
+ entry point. We know that neither of the arguments is inf/nan, but
+ arg0 might be zero. Check for that first to avoid having to establish an
+ rts return address. */
+LOCAL(both_denorm):
+ mov.l r9,@-r15
+ mov DBL0H,r1
+ mov.l r0,@-r15
+ shll2 r1
+ mov.w LOCAL(both_denorm_cleanup_off),r9
+ or DBL0L,r1
+ tst r1,r1
+ mov DBL0H,r0
+ bf/s LOCAL(zero_denorm_arg0_1)
+ shll2 r0
+ mov.l @(4,r15),r9
+ add #8,r15
+ bra LOCAL(ret_inf_nan_0)
+ mov r1,DBLRH
+
+LOCAL(both_denorm_cleanup):
+ mov.l @r15+,r0
+ !
+ mov.l @r15+,r9
+ mov #12,r3
+ mov.l LOCAL(xfffe2006),r1 ! yn := (-1. << 17) + (0x80 << 6) ; shift #-26
+ bra LOCAL(denorm_arg1_done)
+ !
+ add r0,DBL0H
+
+/* Denorm handling leaves the incoming denorm argument with an exponent of +1
+ (implicit 1). To leave the result exponent unaltered, the other
+ argument's exponent is adjusted by the shift count. */
+
+ .balign 4
+LOCAL(arg0_tiny):
+ bsr LOCAL(clz)
+ mov DBL0L,r0
+ shll DBL0H
+ add #1,r0
+ mov DBL0L,DBL0H
+ shld r0,DBL0H
+ rotcr DBL0H
+ tst DBL0L,DBL0L /* Check for a zero dividend. */
+ add #-33,r0
+ shld r0,DBL0L
+ bf/s LOCAL(adjust_arg1_exp)
+ add #64,r0
+LOCAL(return_0): /* Return 0 with appropriate sign. */
+ mov.l @r15+,r10
+ mov #0,DBLRH
+ mov.l @r15+,r9
+ bra LOCAL(ret_inf_nan_0)
+ mov.l @r15+,r8
+
+ .balign 4
+LOCAL(arg1_tiny):
+ bsr LOCAL(clz)
+ mov DBL1L,r0
+ shll DBL1H
+ add #1,r0
+ mov DBL1L,DBL1H
+ shld r0,DBL1H
+ rotcr DBL1H
+ tst DBL1L,DBL1L /* Check for divide by zero. */
+ add #-33,r0
+ shld r0,DBL1L
+ bf/s LOCAL(adjust_arg0_exp)
+ add #64,r0
+ mov DBL0H,r0
+ add r0,r0
+ tst r0,r0 ! 0 / 0 ?
+ mov #-1,DBLRH
+ bf LOCAL(return_inf)
+ !
+ bt LOCAL(ret_inf_nan_0)
+ !
+
+ .balign 4
+LOCAL(zero_denorm_arg1):
+ not DBL0H,r3
+ mov DBL1H,r0
+ tst r2,r3
+ shll2 r0
+ bt LOCAL(early_inf_nan_arg0)
+ tst r0,r0
+ mov.w LOCAL(xff00),r12
+ bt/s LOCAL(arg1_tiny)
+ sts.l pr,@-r15
+ bsr LOCAL(clz)
+ shlr2 r0
+ !
+ shll DBL1H
+ mov DBL1L,r3
+ shld r0,DBL1H
+ shld r0,DBL1L
+ rotcr DBL1H
+ add #-32,r0
+ shld r0,r3
+ add #32,r0
+ or r3,DBL1H
+LOCAL(adjust_arg0_exp):
+ tst r2,DBL0H
+ mov #20,r3
+ shld r3,r0
+ bt LOCAL(both_denorm)
+ add DBL0H,r0
+ div0s r0,DBL0H ! Check for obvious overflow.
+ not r0,r3 ! Check for more subtle overflow - lest
+ bt LOCAL(return_inf)
+ mov r0,DBL0H
+ tst r2,r3 ! we mistake it for NaN later
+ mov #12,r3
+ bf LOCAL(denorm_arg1_done)
+LOCAL(return_inf): /* Return infinity with appropriate sign. */
+ mov #20,r3
+ mov #-2,DBLRH
+ bra LOCAL(ret_inf_nan_0)
+ shad r3,DBLRH
+
+/* inf/n -> inf; inf/0 -> inf; inf/inf -> nan; inf/nan->nan nan/x -> nan */
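[Reviewer aside, not part of the patch: the special-case table in the comment above is ordinary IEEE-754 double division behaviour, which can be checked against the host FPU in C (illustrative only).]

```c
#include <math.h>

/* Checks the division special cases listed above against the host's
   IEEE-754 double arithmetic.  'volatile' keeps the compiler from
   folding the divisions away at build time. */
static int div_special_cases_hold(void)
{
    volatile double inf = INFINITY, zero = 0.0, n = 2.0;
    return isinf(inf / n)        /* inf/n   -> inf */
        && isinf(inf / zero)     /* inf/0   -> inf */
        && isnan(inf / inf)      /* inf/inf -> nan */
        && isnan(zero / zero)    /* 0/0     -> nan */
        && n / inf == 0.0;       /* x/inf   -> 0   */
}
```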
+LOCAL(inf_nan_arg0):
+ mov.l @r15+,r10
+ mov.l @r15+,r9
+ mov.l @r15+,r8
+LOCAL(early_inf_nan_arg0):
+ not DBL1H,r3
+ mov DBL0H,DBLRH
+ tst r2,r3 ! both inf/nan?
+ add DBLRH,DBLRH
+ bf LOCAL(ret_inf_nan_0)
+ mov #-1,DBLRH
+LOCAL(ret_inf_nan_0):
+ mov #0,DBLRL
+ mov.l @r15+,r12
+ div0s DBL0H,DBL1H
+ rts
+ rotcr DBLRH
+
+/* Already handled: inf/x, nan/x . Thus: x/inf -> 0; x/nan -> nan */
+ .balign 4
+LOCAL(inf_nan_arg1):
+ mov DBL1H,r2
+ mov #12,r1
+ shld r1,r2
+ mov.l @r15+,r10
+ mov #0,DBLRL
+ mov.l @r15+,r9
+ or DBL1L,r2
+ mov.l @r15+,r8
+ cmp/hi DBLRL,r2
+ mov.l @r15+,r12
+ subc DBLRH,DBLRH
+ div0s DBL0H,DBL1H
+ rts
+ rotcr DBLRH
+
+ .balign 4
+LOCAL(zero_denorm_arg0):
+ mov.w LOCAL(denorm_arg0_done_off),r9
+ not DBL1H,r1
+ mov DBL0H,r0
+ tst r2,r1
+ shll2 r0
+ bt LOCAL(inf_nan_arg1)
+LOCAL(zero_denorm_arg0_1):
+ tst r0,r0
+ mov.w LOCAL(xff00),r12
+ bt/s LOCAL(arg0_tiny)
+ sts.l pr,@-r15
+ bsr LOCAL(clz)
+ shlr2 r0
+ shll DBL0H
+ mov DBL0L,r12
+ shld r0,DBL0H
+ shld r0,DBL0L
+ rotcr DBL0H
+ add #-32,r0
+ shld r0,r12
+ add #32,r0
+ or r12,DBL0H
+LOCAL(adjust_arg1_exp):
+ mov #20,r12
+ shld r12,r0
+ add DBL1H,r0
+ div0s r0,DBL1H ! Check for obvious underflow.
+ not r0,r12 ! Check for more subtle underflow - lest
+ bt LOCAL(return_0)
+ mov r0,DBL1H
+ tst r2,r12 ! we mistake it for NaN later
+ bt LOCAL(return_0)
+ !
+ braf r9
+ mov #13,r0
+LOCAL(zero_denorm_arg1_dispatch):
+
+LOCAL(xff00): .word 0xff00
+LOCAL(denorm_arg0_done_off):
+ .word LOCAL(denorm_arg0_done)-LOCAL(zero_denorm_arg1_dispatch)
+LOCAL(both_denorm_cleanup_off):
+ .word LOCAL(both_denorm_cleanup)-LOCAL(zero_denorm_arg1_dispatch)
+
+ .balign 8
+GLOBAL(divdf3):
+ mov.l LOCAL(x7ff00000),r2
+ mov #12,r3
+ mov.l LOCAL(xfffe2006),r1 ! yn := (-1. << 17) + (0x80 << 6) ; shift #-26
+ tst r2,DBL1H
+ mov.l r12,@-r15
+ bt LOCAL(zero_denorm_arg1)
+
+LOCAL(denorm_arg1_done):
+ mov DBL1H,x_h ! x_h live in r12
+ shld r3,x_h ! x - 1 ; u0.20
+ mov x_h,yn
+ mova LOCAL(ytab),r0
+ mov.l r8,@-r15
+ shld r1,yn ! x-1 ; u26.6
+ mov.b @(r0,yn),yn
+ mov #6,r0
+ mov.l r9,@-r15
+ mov x_h,r8
+ mov.l r10,@-r15
+ shlr16 x_h ! x - 1; u16.16 ! x/2 - 0.5 ; u15.17
+ add x_h,r1 ! SH4-200 single-issues this insn
+ shld r0,yn
+ sub r1,yn ! yn := y0 ; u15.17
+ mov DBL1L,r1
+ mov #-20,r10
+ mul.l yn,x_h ! r12 dead
+ swap.w yn,r9
+ shld r10,r1
+ sts macl,r0 ! y0 * (x-1) - n ; u-1.32
+ add r9,r0 ! y0 * x - 1 ; s-1.32
+ tst r2,DBL0H
+ dmuls.l r0,yn
+ mov.w LOCAL(d13),r0
+ or r1,r8 ! x - 1; u0.32
+ add yn,yn ! yn = y0 ; u14.18
+ bt LOCAL(zero_denorm_arg0)
+
+LOCAL(denorm_arg0_done): ! This label must stay aligned.
+ sts mach,r1 ! d0 ; s14.18
+ sub r1,yn ! yn = y1 ; u14.18 ; <= 0x3fffc
+ mov DBL0L,r12
+ shld r0,yn ! yn = y1 ; u1.31 ; <= 0x7fff8000
+ mov.w LOCAL(d12),r9
+ dmulu.l yn,r8
+ shld r10,r12
+ mov yn,r0
+ mov DBL0H,r8
+ add yn,yn ! yn = y1 ; u0.32 ; <= 0xffff0000
+ sts mach,r1 ! y1 * (x-1); u1.31
+ add r0,r1 ! y1 * x ; u1.31
+ dmulu.l yn,r1
+ not DBL0H,r10
+ shld r9,r8
+ tst r2,r10
+ or r8,r12 ! a - 1; u0.32
+ bt LOCAL(inf_nan_arg0)
+ sts mach,r1 ! d1+yn; u1.31
+ sett ! adjust y2 so that it can be interpreted as s1.31
+ not DBL1H,r10
+ subc r1,yn ! yn := y2 ; u1.31 ; can be 0x7fffffff
+ mov.l LOCAL(x001fffff),r9
+ dmulu.l yn,r12
+ tst r2,r10
+ or DBL1H,r2
+ bt LOCAL(inf_nan_arg1)
+ mov.l r11,@-r15
+ sts mach,r11 ! y2*(a-1) ; u1.31
+ add yn,r11 ! z0 ; u1.31
+ dmulu.l r11,DBL1L
+ mov.l LOCAL(x40000000),DBLRH ! bias + 1
+ and r9,r2 ! x ; u12.20
+ cmp/hi DBL0L,DBL1L
+ sts macl,r8
+ mov #-24,r12
+ sts mach,r9 ! r9:r8 := z0 * DBL1L; u-19.64
+ subc DBL1H,DBLRH
+ mul.l r11,r2 ! (r9+macl):r8 == z0*x; u-19.64
+ shll r8
+ add DBL0H,DBLRH ! result sign/exponent + 1
+ mov r8,r10
+ sts macl,DBLRL
+ add DBLRL,r9
+ rotcl r9 ! r9:r8 := z*x; u-20.63
+ shld r12,r10
+ mov.l LOCAL(x7fe00000),DBLRL
+ sub DBL0L,r9 ! r9:r8 := -a ; u-20.63
+ mov.l LOCAL(x00200000),r12
+! FIXME: the following shift might lose the sign.
+ shll8 r9
+ or r10,r9 ! -a1 ; s-28.32
+ mov.l LOCAL(x00100000),r10
+ dmuls.l r9,yn ! r3 dead
+ mov DBL1H,r3
+ mov.l LOCAL(xfff00000),DBL0L
+ xor DBL0H,r3 ! calculate expected sign & bit20
+ div0s r3,DBLRH
+ xor DBLRH,r3
+ bt LOCAL(ret_denorm_inf)
+ tst DBLRL,DBLRH
+ bt LOCAL(ret_denorm)
+ sub r12,DBLRH ! calculate sign / exponent minus implicit 1
+ tst r10,r3 ! set T if a >= x
+ sts mach,r12 ! -z1 ; s-27.32
+ bt 0f
+ add r11,r11 ! z0 ; u1.31 / u0.31
+0: mov #6,r3
+ negc r3,r10 ! shift count := a >= x ? -7 : -6; T := 1
+ shll8 r8 ! r9:r8 := -a1 ; s-28.64
+ shad r10,r12 ! -z1 ; truncate to s-20.32 / s-21.32
+ rotcl r12 ! -z1 ; s-21.32 / s-22.32 / round to odd 0.5 ulp ; T := sign
+ add #20,r10
+ dmulu.l r12,DBL1L ! r12 signed, DBL1L unsigned
+ and DBL0L,DBLRH ! isolate sign / exponent
+ shld r10,r9
+ mov r8,r3
+ shld r10,r8
+ sts macl,DBL0L
+ sts mach,DBLRL
+ add #-32,r10
+ shld r10,r3
+ mul.l r12,r2
+ bf 0f ! adjustment for signed/unsigned multiply
+ sub DBL1L,DBLRL ! DBL1L dead
+0: shar r12 ! -z1 ; truncate to s-20.32 / s-21.32
+ sts macl,DBL1L
+ or r3,r9 ! r9:r8 := -a1 ; s-41.64/s-42.64
+ !
+ cmp/hi r8,DBL0L
+ add DBLRL,DBL1L ! DBL1L:DBL0L := -z1*x ; s-41.64/s-42.64
+ subc DBL1L,r9
+ not r12,DBLRL ! z1, truncated to s-20.32 / s-21.32
+ shll r9 ! T := a2 > 0
+ mov r11,r2
+ mov #21,r7
+ shld r7,r11
+ addc r11,DBLRL
+ mov.l @r15+,r11
+ mov.l @r15+,r10
+ mov #-11,r7
+ mov.l @r15+,r9
+ shld r7,r2
+ mov.l @r15+,r8
+ addc r2,DBLRH
+ rts
+ mov.l @r15+,r12
+
+LOCAL(ret_denorm):
+ tst r10,DBLRH
+ bra LOCAL(denorm_have_count)
+ movt DBLRH ! calculate shift count (off by 2)
+
+LOCAL(ret_denorm_inf):
+ mov DBLRH,r12
+ add r12,r12
+ cmp/pz r12
+ mov #-21,DBLRL
+ bt LOCAL(ret_inf_late)
+ shld DBLRL,DBLRH
+LOCAL(denorm_have_count):
+ add #-2,DBLRH
+/* FIXME */
+ bra LOCAL(return_0)
+ mov.l @r15+,r11
+
+LOCAL(ret_inf_late):
+ mov.l @r15+,r11
+ !
+ mov.l @r15+,r10
+ !
+ mov.l @r15+,r9
+ bra LOCAL(return_inf)
+ mov.l @r15+,r8
+
+ .balign 4
+LOCAL(clz):
+ mov.l r8,@-r15
+ extu.w r0,r8
+ mov.l r9,@-r15
+ cmp/eq r0,r8
+ bt/s 0f
+ mov #8-11,r9
+ xtrct r0,r8
+ add #16,r9
+0: tst r12,r8 ! 0xff00
+ mov.l LOCAL(c_clz_tab),r0
+ bt 0f
+ shlr8 r8
+0: bt 0f
+ add #8,r9
+0:
+#ifdef __PIC__
+ add r0,r8
+ mova LOCAL(c_clz_tab),r0
+#endif
+ mov.b @(r0,r8),r8
+ mov r9,r0
+ mov.l @r15+,r9
+ !
+ !
+ !
+ sub r8,r0
+ mov.l @r15+,r8
+ rts
+ lds.l @r15+,pr
+
+! We encode some words as pc-relative loads even though they would fit
+! as immediates in the instruction, in order to avoid some pipeline
+! stalls on SH4-100 / SH4-200.
+LOCAL(d1): .word 1
+LOCAL(d12): .word 12
+LOCAL(d13): .word 13
+
+ .balign 4
+LOCAL(x7ff00000): .long 0x7ff00000
+LOCAL(xfffe2006): .long 0xfffe2006
+LOCAL(x001fffff): .long 0x001fffff
+LOCAL(x40000000): .long 0x40000000
+LOCAL(x7fe00000): .long 0x7fe00000
+LOCAL(x00100000): .long 0x00100000
+LOCAL(x00200000): .long 0x00200000
+LOCAL(xfff00000): .long 0xfff00000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+LOCAL(ytab):
+ .byte 120, 105, 91, 78, 66, 54, 43, 33
+ .byte 24, 15, 8, 0, -5, -12, -17, -22
+ .byte -27, -31, -34, -37, -40, -42, -44, -45
+ .byte -46, -46, -47, -46, -46, -45, -44, -42
+ .byte -41, -39, -36, -34, -31, -28, -24, -20
+ .byte -17, -12, -8, -4, 0, 5, 10, 16
+ .byte 21, 27, 33, 39, 45, 52, 58, 65
+ .byte 72, 79, 86, 93, 101, 109, 116, 124
+ENDFUNC(GLOBAL(divdf3))
Index: gcc/config/sh/IEEE-754/m3/addsf3.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/addsf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/addsf3.S (revision 0)
@@ -0,0 +1,285 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+! addsf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+
+#ifdef L_add_sub_sf3
+ .balign 4
+ .global GLOBAL(subsf3)
+ FUNC(GLOBAL(subsf3))
+ .global GLOBAL(addsf3)
+ FUNC(GLOBAL(addsf3))
+GLOBAL(subsf3):
+ cmp/pz r5
+ add r5,r5
+ rotcr r5
+ .balign 4
+GLOBAL(addsf3):
+ mov.l LOCAL(x7f800000),r3
+ mov r4,r6
+ add r6,r6
+ mov r5,r7
+ add r7,r7
+ mov r4,r0
+ or r3,r0
+ cmp/hi r6,r7
+ mov r5,r1
+ bf/s LOCAL(r4_hs)
+ or r3,r1
+ cmp/eq r5,r1
+ bt LOCAL(ret_r5) /* sole Inf or NaN, return unchanged. */
+ shll8 r0 ! r4 fraction
+ shll8 r1 ! r5 fraction
+ mov r6,r3
+ mov #-24,r2
+ mov r7,r6
+ shld r2,r6 ! r5 exp
+ mov r0,r7
+ shld r2,r3 ! r4 exp
+ tst r6,r6
+ sub r6,r3 ! exp difference (negative or 0)
+ bt LOCAL(denorm_r4)
+LOCAL(denorm_r4_done): ! r1: u1.31
+ shld r3,r0 ! Get 31 upper bits, including 8 guard bits
+ mov.l LOCAL(xff000000),r2
+ add #31,r3
+ mov.l r5,@-r15 ! push result sign.
+ cmp/pl r3 ! r0 has no more than one bit set -> return arg 1
+ shld r3,r7 ! copy of lowest guard bit in r0 and lower guard bits
+ bf LOCAL(ret_stack)
+ div0s r4,r5
+ bf/s LOCAL(add)
+ cmp/pl r7 /* Is LSB in r0 clear, but any lower guard bit set? */
+ subc r0,r1
+ mov.l LOCAL(c__clz_tab),r7
+ tst r2,r1
+ mov #-24,r3
+ bf/s LOCAL(norm_r0)
+ mov r1,r0
+ extu.w r1,r1
+ bra LOCAL(norm_check2)
+ cmp/eq r0,r1
+LOCAL(ret_r5):
+ rts
+ mov r5,r0
+LOCAL(ret_stack):
+ rts
+ mov.l @r15+,r0
+
+/* We leave the numbers denormalized, but we change the bit position to be
+ consistent with normalized numbers. This also removes the spurious
+ leading one that was inserted before. */
+LOCAL(denorm_r4):
+ tst r3,r3
+ bf/s LOCAL(denorm_r4_done)
+ add r0,r0
+ bra LOCAL(denorm_r4_done)
+ add r1,r1
+LOCAL(denorm_r5):
+ tst r6,r6
+ add r1,r1
+ bf LOCAL(denorm_r5_done)
+ clrt
+ bra LOCAL(denorm_r5_done)
+ add r0,r0
+
+/* If the exponents differ by two or more, normalization is minimal, and
+   few guard bits are needed for an exact final result, so sticky guard
+   bit compression before subtraction (or addition) works fine.
+   If the exponents differ by one, only one extra guard bit is generated,
+   and effectively no guard bit compression takes place. */
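[Reviewer aside, not part of the patch: sticky guard-bit compression as described above can be sketched in C. `align_sticky` is an illustrative name.]

```c
#include <stdint.h>

/* Illustrative sketch: right-align a fraction by d bits (0 < d < 32)
   for an exponent difference of d, OR-ing every shifted-out bit into
   the LSB (the "sticky" bit).  Round-to-nearest-even later only needs
   to know whether any nonzero bits were lost, not which ones. */
static uint32_t align_sticky(uint32_t frac, int d)
{
    uint32_t lost = frac & ((1u << d) - 1);  /* bits about to fall off */
    return (frac >> d) | (lost != 0);        /* compress them into bit 0 */
}
```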
+
+ .balign 4
+LOCAL(r4_hs):
+ cmp/eq r4,r0
+ mov #-24,r3
+ bt LOCAL(inf_nan_arg0)
+ shld r3,r7
+ shll8 r0
+ tst r7,r7
+ shll8 r1
+ mov.l LOCAL(xff000000),r2
+ bt/s LOCAL(denorm_r5)
+ shld r3,r6
+LOCAL(denorm_r5_done):
+ mov r1,r3
+ subc r6,r7
+ bf LOCAL(same_exp)
+ shld r7,r1 /* Get 31 upper bits. */
+ add #31,r7
+ mov.l r4,@-r15 ! push result sign.
+ cmp/pl r7
+ shld r7,r3
+ bf LOCAL(ret_stack)
+ div0s r4,r5
+ bf/s LOCAL(add)
+ cmp/pl r3 /* Is LSB in r1 clear, but any lower guard bit set? */
+ subc r1,r0
+ mov.l LOCAL(c__clz_tab),r7
+LOCAL(norm_check):
+ tst r2,r0
+ mov #-24,r3
+ bf LOCAL(norm_r0)
+ extu.w r0,r1
+ cmp/eq r0,r1
+LOCAL(norm_check2):
+ mov #-8,r3
+ bt LOCAL(norm_r0)
+ mov #-16,r3
+LOCAL(norm_r0):
+ mov r0,r1
+ shld r3,r0
+#ifdef __pic__
+ add r0,r7
+ mova LOCAL(c__clz_tab),r0
+#endif
+ mov.b @(r0,r7),r7
+ add #25,r3
+ add #-9+1,r6
+ mov r1,r0
+ sub r7,r3
+ mov.l LOCAL(xbfffffff),r7
+ sub r3,r6 /* generate exp-1 */
+ mov.w LOCAL(d24),r2
+ cmp/pz r6 /* check exp > 0 */
+ shld r3,r0 /* Leading 1 becomes +1 exp adjustment. */
+ bf LOCAL(zero_denorm)
+LOCAL(denorm_done):
+ add #30,r3
+ shld r3,r1
+ mov.w LOCAL(m1),r3
+ tst r7,r1 ! clear T if rounding up
+ shld r2,r6
+ subc r3,r0 ! round - overflow will boost exp adjustment to 2.
+ mov.l @r15+,r2
+ add r6,r0 ! overflow will generate inf
+ cmp/ge r2,r3 ! get sign into T
+ rts
+ rotcr r0
+LOCAL(ret_r4):
+ rts
+ mov r4,r0
+
+/* At worst, we are shifting the number back into the place where an
+   incoming denormal was, so the shifts won't go out of range.  They may
+   still produce a zero fraction, but that's fine: the result is then 0. */
+LOCAL(zero_denorm):
+ add r6,r3
+ mov r1,r0
+ mov #0,r6 /* leading one will become free (except for rounding) */
+ bra LOCAL(denorm_done)
+ shld r3,r0
+
+/* Handle abs(r4) >= abs(r5), same exponents specially so we don't need
+ check for a zero fraction in the main path. */
+LOCAL(same_exp):
+ div0s r4,r5
+ mov.l r4,@-r15
+ bf LOCAL(add)
+ cmp/eq r1,r0
+ mov.l LOCAL(c__clz_tab),r7
+ bf/s LOCAL(norm_check)
+ sub r1,r0
+ rts ! zero difference -> return +zero
+ mov.l @r15+,r1
+
+/* r2: 0xff000000 */
+LOCAL(add):
+ addc r1,r0
+ mov.w LOCAL(x2ff),r7
+ shll8 r6
+ bf/s LOCAL(no_carry)
+ shll16 r6
+ tst r7,r0
+ shlr8 r0
+ mov.l @r15+,r3 ! discard saved sign
+ subc r2,r0
+ sett
+ addc r6,r0
+ cmp/hs r2,r0
+ bt/s LOCAL(inf)
+ div0s r7,r4 /* Copy sign. */
+ rts
+ rotcr r0
+LOCAL(inf):
+ mov r6,r0
+ rts
+ rotcr r0
+LOCAL(no_carry):
+ mov.w LOCAL(m1),r3
+ tst r6,r6
+ bt LOCAL(denorm_add)
+ add r0,r0
+ tst r7,r0 ! check if lower guard bit set or round to even
+ shlr8 r0
+ mov.l @r15+,r1 ! discard saved sign
+ subc r3,r0 ! round ; overflow -> exp++
+ cmp/ge r4,r3 /* Copy sign. */
+ add r6,r0 ! overflow -> inf
+ rts
+ rotcr r0
+
+LOCAL(denorm_add):
+ cmp/ge r4,r3 /* Copy sign. */
+ shlr8 r0
+ mov.l @r15+,r1 ! discard saved sign
+ rts
+ rotcr r0
+
+LOCAL(inf_nan_arg0):
+ cmp/eq r5,r1
+ bf LOCAL(ret_r4)
+ div0s r4,r5 /* Both are inf or NaN, check signs. */
+ bt LOCAL(ret_nan) /* inf - inf, or NaN. */
+ mov r4,r0 ! same sign; return NaN if either is NaN.
+ rts
+ or r5,r0
+LOCAL(ret_nan):
+ rts
+ mov #-1,r0
+
+LOCAL(d24):
+ .word 24
+LOCAL(x2ff):
+ .word 0x2ff
+LOCAL(m1):
+ .word -1
+ .balign 4
+LOCAL(x7f800000):
+ .long 0x7f800000
+LOCAL(xbfffffff):
+ .long 0xbfffffff
+LOCAL(xff000000):
+ .long 0xff000000
+LOCAL(xfe000000):
+ .long 0xfe000000
+LOCAL(c__clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+
+ ENDFUNC(GLOBAL(addsf3))
+ ENDFUNC(GLOBAL(subsf3))
+#endif /* L_add_sub_sf3 */
Index: gcc/config/sh/IEEE-754/m3/adddf3.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/adddf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/adddf3.S (revision 0)
@@ -0,0 +1,582 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+! adddf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4-200 without FPU, but can also be used for SH3.
+! Numbers with same sign are added in typically 37 cycles, worst case is
+! 43 cycles, unless there is an overflow, in which case the addition can
+! take up to 47 cycles.
+! Normal numbers with different sign are added in 56 (57 for PIC) cycles
+! or less on SH4.
+! If one of the inputs is a denormal, the worst case is 59 (60 for PIC)
+! cycles. (Two denormal inputs are faster than normal inputs, and
+! denormal outputs don't slow down computation).
+! Subtraction takes two cycles to negate the second input and then drops
+! through to addition.
+
+/* If the input exponents of a difference of two normalized numbers
+ differ by more than one, the output does not need to be adjusted
+ by more than one bit position. Hence, it makes sense to ensure that
+ the shifts by 0 & 1 are handled quickly to reduce average and worst
+ case times. */
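[Reviewer aside, not part of the patch: the property the comment above relies on can be checked numerically in C (illustrative). When the exponents of two normalized operands differ by at least two, the aligned significand difference stays at or above 0.5, so at most one normalization shift is ever needed.]

```c
/* Worst case for an exponent difference of 2: subtract the largest
   possible aligned significand (just under 2.0, scaled down by 4)
   from the smallest one (1.0).  The result stays at or above 0.5,
   i.e. within one shift of normalized form. */
static double worst_case_diff(void)
{
    double smallest = 1.0;             /* smallest normal significand */
    double largest  = 2.0 - 1e-7;      /* just under 2.0 */
    return smallest - largest / 4.0;   /* aligned: exponents differ by 2 */
}
```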
+FUNC(GLOBAL(adddf3))
+FUNC(GLOBAL(subdf3))
+ .global GLOBAL(adddf3)
+ .global GLOBAL(subdf3)
+LOCAL(denorm_arg1):
+ bt LOCAL(inf_nan_arg0)
+ tst r0,r2
+ bt/s LOCAL(denorm_both)
+ shlr r1
+ mov.l LOCAL(x00100000),r3
+ bra LOCAL(denorm_arg1_done)
+ sub r2,r3
+
+! Handle denorm addition here because otherwise the ordinary addition would
+! have to check for denormal results.
+! Denormal subtraction could also be done faster, but the denorm subtraction
+! path here is still one cycle faster than the one for normalized input
+! numbers, and 16 instructions shorter than the fastest version.
+! Here we also generate +0.0 + +0.0 -> +0.0 ; -0.0 + -0.0 -> -0.0
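[Reviewer aside, not part of the patch: the signed-zero results promised above are the standard IEEE-754 ones under round-to-nearest; an illustrative C check against the host:]

```c
#include <math.h>

/* Signed-zero sums from the comment above, evaluated on the host
   (round-to-nearest assumed, no -ffast-math). */
static int signed_zero_sums_ok(void)
{
    return !signbit(0.0 + 0.0)     /* +0.0 + +0.0 -> +0.0 */
        &&  signbit(-0.0 + -0.0)   /* -0.0 + -0.0 -> -0.0 */
        && !signbit(0.0 + -0.0);   /* +0.0 + -0.0 -> +0.0 */
}
```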
+LOCAL(denorm_both):
+ div0s DBL0H,DBL1H
+ mov.l LOCAL(x800fffff),r9
+ bt/s LOCAL(denorm_sub)
+ and r1,DBL1H
+ and r9,DBL0H
+ mov.l @r15+,r9
+ mov DBL0L,DBLRL
+ mov DBL0H,DBLRH
+ addc DBL1L,DBLRL
+ mov.l @r15+,r8
+ rts
+ addc DBL1H,DBLRH
+
+! N.B., since subtraction also generates +0.0 for subtraction of numbers
+! with identical fractions, this also covers the +0.0 + -0.0 -> +0.0 /
+! -0.0 + +0.0 -> +0.0 cases.
+LOCAL(denorm_sub):
+ mov DBL0H,r8 ! tentative result sign
+ and r1,DBL0H
+ bra LOCAL(sub_same_exp)
+ addc r1,r2 ! exponent++, clear T
+
+LOCAL(inf_nan_arg0):
+ mov DBL0L,DBLRL
+ bra LOCAL(pop_r8_r9)
+ mov DBL0H,DBLRH
+
+LOCAL(ret_arg0):
+ mov.l LOCAL(x800fffff),DBLRH
+ mov DBL0L,DBLRL
+ mov r2,r3
+LOCAL(ret_arg):
+ mov.l @r15+,r9
+ and r8,DBLRH
+ mov.l @r15+,r8
+ rts
+ or r3,DBLRH
+
+ .balign 4
+GLOBAL(subdf3):
+ cmp/pz DBL1H
+ add DBL1H,DBL1H
+ rotcr DBL1H
+ nop
+
+GLOBAL(adddf3):
+ mov.l LOCAL(x7ff00000),r0
+ mov DBL0H,r2
+ mov.l LOCAL(x001fffff),r1
+ mov DBL1H,r3
+ mov.l r8,@-r15
+ and r0,r2
+ mov.l r9,@-r15
+ and r0,r3
+ cmp/hi r2,r3
+ or r0,DBL0H
+ or r0,DBL1H
+ bt LOCAL(arg1_gt)
+ tst r0,r3
+ mov #-20,r9
+ bt/s LOCAL(denorm_arg1)
+ cmp/hs r0,r2
+ bt LOCAL(inf_nan_arg0)
+ sub r2,r3
+LOCAL(denorm_arg1_done): ! r2 is tentative result exponent
+ shad r9,r3
+ mov.w LOCAL(m32),r9
+ mov DBL0H,r8 ! tentative result sign
+ and r1,DBL0H ! arg0 fraction
+ mov DBL1H,r0 ! the 'other' sign
+ and r1,DBL1H ! arg1 fraction
+ cmp/ge r9,r3
+ mov DBL1H,r1
+ bf/s LOCAL(large_shift_arg1)
+ shld r3,DBL1H
+LOCAL(small_shift_arg1):
+ mov DBL1L,r9
+ shld r3,DBL1L
+ tst r3,r3
+ add #32,r3
+ bt/s LOCAL(same_exp)
+ div0s r8,r0 ! compare signs
+ shld r3,r1
+
+ or r1,DBL1L
+ bf/s LOCAL(add)
+ shld r3,r9
+ clrt
+ negc r9,r9
+ mov.l LOCAL(x001f0000),r3
+LOCAL(sub_high):
+ mov DBL0L,DBLRL
+ subc DBL1L,DBLRL
+ mov DBL0H,DBLRH
+ bra LOCAL(subtract_done)
+ subc DBL1H,DBLRH
+
+LOCAL(large_shift_arg1):
+ mov.w LOCAL(d0),r9
+ add #64,r3
+ cmp/pl r3
+ shld r3,r1
+ bf LOCAL(ret_arg0)
+ cmp/hi r9,DBL1L
+ mov DBL1H,DBL1L
+ mov r9,DBL1H
+ addc r1,r9
+
+ div0s r8,r0 ! compare signs
+
+ bf LOCAL(add)
+ clrt
+ mov.l LOCAL(x001f0000),r3
+ bra LOCAL(sub_high)
+ negc r9,r9
+
+LOCAL(add_clr_r9):
+ mov #0,r9
+LOCAL(add):
+ mov.l LOCAL(x00200000),r3
+ addc DBL1L,DBL0L
+ addc DBL1H,DBL0H
+ mov.l LOCAL(x80000000),r1
+ tst r3,DBL0H
+ mov.l LOCAL(x7fffffff),r3
+ mov DBL0L,r0
+ bt/s LOCAL(no_carry)
+ and r1,r8
+ tst r9,r9
+ bf LOCAL(add_one)
+ tst #2,r0
+LOCAL(add_one):
+ subc r9,r9
+ sett
+ mov r0,DBLRL
+ addc r9,DBLRL
+ mov DBL0H,DBLRH
+ addc r9,DBLRH
+ shlr DBLRH
+ mov.l LOCAL(x7ff00000),r3
+ add r2,DBLRH
+ mov.l @r15+,r9
+ rotcr DBLRL
+ cmp/hi r3,DBLRH
+LOCAL(add_done):
+ bt LOCAL(inf)
+LOCAL(or_sign):
+ or r8,DBLRH
+ rts
+ mov.l @r15+,r8
+
+LOCAL(inf):
+ bra LOCAL(or_sign)
+ mov r3,DBLRH
+
+LOCAL(pos_difference_0):
+ tst r3,DBL0H
+ mov DBL0L,DBLRL
+ mov.l LOCAL(x80000000),DBL0L
+ mov DBL0H,DBLRH
+ mov.l LOCAL(x00100000),DBL0H
+ bt/s LOCAL(long_norm)
+ and DBL0L,r8
+ bra LOCAL(norm_loop)
+ not DBL0L,r3
+
+LOCAL(same_exp):
+ bf LOCAL(add_clr_r9)
+ clrt
+LOCAL(sub_same_exp):
+ subc DBL1L,DBL0L
+ mov.l LOCAL(x001f0000),r3
+ subc DBL1H,DBL0H
+ mov.w LOCAL(d0),r9
+ bf LOCAL(pos_difference_0)
+ clrt
+ negc DBL0L,DBLRL
+ mov.l LOCAL(x80000000),DBL0L
+ negc DBL0H,DBLRH
+ mov.l LOCAL(x00100000),DBL0H
+ tst r3,DBLRH
+ not r8,r8
+ bt/s LOCAL(long_norm)
+ and DBL0L,r8
+ bra LOCAL(norm_loop)
+ not DBL0L,r3
+
+LOCAL(large_shift_arg0):
+ add #64,r2
+
+ mov #0,r9
+ cmp/pl r2
+ shld r2,r1
+ bf LOCAL(ret_arg1_exp_r3)
+ cmp/hi r9,DBL0L
+ mov DBL0H,DBL0L
+ mov r9,DBL0H
+ addc r1,r9
+ div0s r8,r0 ! compare signs
+ mov r3,r2 ! tentative result exponent
+ bf LOCAL(add)
+ clrt
+ negc r9,r9
+ bra LOCAL(subtract_arg0_arg1_done)
+ mov DBL1L,DBLRL
+
+LOCAL(arg1_gt):
+ tst r0,r2
+ mov #-20,r9
+ bt/s LOCAL(denorm_arg0)
+ cmp/hs r0,r3
+ bt LOCAL(inf_nan_arg1)
+ sub r3,r2
+LOCAL(denorm_arg0_done):
+ shad r9,r2
+ mov.w LOCAL(m32),r9
+ mov DBL1H,r8 ! tentative result sign
+ and r1,DBL1H
+ mov DBL0H,r0 ! the 'other' sign
+ and r1,DBL0H
+ cmp/ge r9,r2
+ mov DBL0H,r1
+ shld r2,DBL0H
+ bf LOCAL(large_shift_arg0)
+ mov DBL0L,r9
+ shld r2,DBL0L
+ add #32,r2
+ mov.l r3,@-r15
+ shld r2,r1
+ mov r2,r3
+ div0s r8,r0 ! compare signs
+ mov.l @r15+,r2 ! tentative result exponent
+ shld r3,r9
+ bf/s LOCAL(add)
+ or r1,DBL0L
+ clrt
+ negc r9,r9
+ mov DBL1L,DBLRL
+LOCAL(subtract_arg0_arg1_done):
+ subc DBL0L,DBLRL
+ mov DBL1H,DBLRH
+ mov.l LOCAL(x001f0000),r3
+ subc DBL0H,DBLRH
+/* Since the exponents were different, the difference is positive. */
+/* Fall through */
+LOCAL(subtract_done):
+/* First check if a shift by a few bits is sufficient. This not only
+ speeds up this case, but also alleviates the need for considering
+ lower bits from r9 or rounding in the other code.
+ Moreover, by handling the upper 1+4 bits of the fraction here, long_norm
+   can assume that DBLRH fits into 16 bits. */
+ tst r3,DBLRH
+ mov.l LOCAL(x80000000),r3
+ mov.l LOCAL(x00100000),DBL0H
+ bt/s LOCAL(long_norm)
+ and r3,r8
+ mov.l LOCAL(x7fffffff),r3
+LOCAL(norm_loop): ! Well, this used to be a loop...
+ tst DBL0H,DBLRH
+ sub DBL0H,r2
+ bf LOCAL(norm_round)
+ shll r9
+ rotcl DBLRL
+
+ rotcl DBLRH
+
+ tst DBL0H,DBLRH
+ sub DBL0H,r2
+ bf LOCAL(norm_round)
+ shll DBLRL
+ rotcl DBLRH
+ mov.l @r15+,r9
+ cmp/gt r2,DBL0H
+ sub DBL0H,r2
+LOCAL(norm_loop_1):
+ bt LOCAL(denorm0_n)
+ tst DBL0H,DBLRH
+ bf LOCAL(norm_pack)
+ shll DBLRL
+ rotcl DBLRH ! clears T
+ bra LOCAL(norm_loop_1)
+ subc DBL0H,r2
+
+LOCAL(no_carry):
+ shlr r0
+ mov.l LOCAL(x000fffff),DBLRH
+ addc r3,r9
+ mov.w LOCAL(d0),DBL1H
+ mov DBL0L,DBLRL
+ and DBL0H,DBLRH ! mask out implicit 1
+ mov.l LOCAL(x7ff00000),r3
+ addc DBL1H,DBLRL
+ addc r2,DBLRH
+ mov.l @r15+,r9
+ add DBL1H,DBLRH ! fraction overflow -> exp increase
+ bra LOCAL(add_done)
+ cmp/hi r3,DBLRH
+
+LOCAL(denorm_arg0):
+ bt LOCAL(inf_nan_arg1)
+ mov.l LOCAL(x00100000),r2
+ shlr r1
+ bra LOCAL(denorm_arg0_done)
+ sub r3,r2
+
+LOCAL(inf_nan_arg1):
+ mov DBL1L,DBLRL
+ bra LOCAL(pop_r8_r9)
+ mov DBL1H,DBLRH
+
+LOCAL(ret_arg1_exp_r3):
+ mov.l LOCAL(x800fffff),DBLRH
+ bra LOCAL(ret_arg)
+ mov DBL1L,DBLRL
+
+#ifdef __pic__
+ .balign 8
+#endif
+LOCAL(m32):
+ .word -32
+LOCAL(d0):
+ .word 0
+#ifndef __pic__
+ .balign 8
+#endif
+! Because we had several bits of cancellation, we know that r9 contains
+! only one bit.
+! We'll normalize by shifting words so that DBLRH:DBLRL contains
+! the fraction with 0 < DBLRH <= 0x1fffff, then we shift DBLRH:DBLRL
+! up by 21 minus the bit length of DBLRH.
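[Reviewer aside, not part of the patch: the table-driven count-leading-zeros used throughout (GLOBAL(clz_tab), whose byte entries give the bit length of the index, as in libgcc) can be sketched in C; `clz32`/`init_tab` are illustrative names.]

```c
#include <stdint.h>

/* Illustrative sketch of the libgcc-style clz: tab[i] is the bit
   length of i (tab[0] == 0).  A 32-bit clz narrows the word to its
   highest nonzero byte, then finishes with one table lookup, much
   like LOCAL(clz) above. */
static unsigned char tab[256];

static void init_tab(void)
{
    for (int i = 1; i < 256; i++) {
        int len = 0;
        for (int v = i; v; v >>= 1)   /* bit length of i */
            len++;
        tab[i] = (unsigned char)len;
    }
}

static int clz32(uint32_t x)          /* returns 32 for x == 0 */
{
    int base = 0;
    if (x >> 16) { x >>= 16; base += 16; }
    if (x >> 8)  { x >>= 8;  base += 8;  }
    return 32 - (base + tab[x]);
}
```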
+LOCAL(long_norm):
+ tst DBLRH,DBLRH
+ mov.w LOCAL(xff),DBL0L
+ mov #21,r3
+ bf LOCAL(long_norm_highset)
+ mov.l LOCAL(x02100000),DBL1L ! shift 32, implicit 1
+ tst DBLRL,DBLRL
+ extu.w DBLRL,DBL0H
+ bt LOCAL(zero_or_ulp)
+ mov DBLRL,DBLRH
+ cmp/hi DBL0H,DBLRL
+ bf 0f
+ mov.l LOCAL(x01100000),DBL1L ! shift 16, implicit 1
+ clrt
+ shlr16 DBLRH
+ xtrct DBLRL,r9
+ mov DBLRH,DBL0H
+LOCAL(long_norm_ulp_done):
+0: mov r9,DBLRL ! DBLRH:DBLRL == fraction; DBL0H == DBLRH
+ subc DBL1L,r2
+ bt LOCAL(denorm1_b)
+#ifdef __pic__
+ mov.l LOCAL(c__clz_tab),DBL1H
+LOCAL(long_norm_lookup):
+ mov r0,r9
+ mova LOCAL(c__clz_tab),r0
+ add DBL1H,r0
+#else
+ mov r0,r9
+LOCAL(long_norm_lookup):
+ mov.l LOCAL(c__clz_tab),r0
+#endif /* __pic__ */
+ cmp/hi DBL0L,DBL0H
+ bf 0f
+ shlr8 DBL0H
+0: mov.b @(r0,DBL0H),r0
+ bf 0f
+ add #-8,r3
+0: mov.w LOCAL(d20),DBL0L
+ mov #-20,DBL0H
+ clrt
+ sub r0,r3
+ mov r9,r0
+ mov r3,DBL1H
+ shld DBL0L,DBL1H
+ subc DBL1H,r2
+ !
+ bf LOCAL(no_denorm)
+ shad DBL0H,r2
+ bra LOCAL(denorm1_done)
+ add r2,r3
+
+LOCAL(norm_round):
+ cmp/pz r2
+ mov #0,DBL1H
+ bf LOCAL(denorm0_1)
+ or r8,r2
+ mov DBLRL,DBL1L
+ shlr DBL1L
+ addc r3,r9
+ mov.l @r15+,r9
+ addc DBL1H,DBLRL ! round to even
+ mov.l @r15+,r8
+ rts
+ addc r2,DBLRH
+
+LOCAL(norm_pack):
+ add r8,DBLRH
+ mov.l @r15+,r8
+ rts
+ add r2,DBLRH
+
+LOCAL(denorm0_1):
+ mov.l @r15+,r9
+ mov r8,DBL0L
+ mov.l @r15+,r8
+LOCAL(denorm0_shift):
+ shlr DBLRH
+ rotcr DBLRL
+
+ rts
+ add DBL0L,DBLRH
+
+LOCAL(denorm0_n):
+ mov r8,DBL0L
+ addc DBL0H,r2
+ mov.l @r15+,r8
+ bf LOCAL(denorm0_shift)
+ rts
+ add DBL0L,DBLRH
+
+LOCAL(no_denorm):
+ add r2,r8 ! add (exponent - 1) to sign
+
+LOCAL(denorm1_done):
+ shld r3,DBLRH
+ mov DBLRL,DBL0L
+ shld r3,DBLRL
+
+ add r8,DBLRH ! add in sign and (exponent - 1)
+ mov.l @r15+,r9
+ add #-32,r3
+ mov.l @r15+,r8
+ shld r3,DBL0L
+
+ rts
+ add DBL0L,DBLRH
+
+LOCAL(long_norm_highset):
+ mov.l LOCAL(x00200000),DBL1L ! shift 1, implicit 1
+ shll r9
+ rotcl DBLRL
+ mov DBLRH,DBL0H
+ rotcl DBLRH ! clears T
+#ifdef __pic__
+ mov.l LOCAL(c__clz_tab),DBL1H
+#else
+ mov r0,r9
+#endif /* __pic__ */
+ subc DBL1L,r2
+ add #-1,r3
+ bf LOCAL(long_norm_lookup)
+LOCAL(denorm1_a):
+ shlr DBLRH
+ rotcr DBLRL
+ mov.l @r15+,r9
+ or r8,DBLRH
+
+ rts
+ mov.l @r15+,r8
+
+ .balign 4
+LOCAL(denorm1_b):
+ mov #-20,DBL0L
+ shad DBL0L,r2
+ mov DBLRH,DBL0L
+ shld r2,DBLRH
+ shld r2,DBLRL
+ or r8,DBLRH
+ mov.l @r15+,r9
+ add #32,r2
+ mov.l @r15+,r8
+ shld r2,DBL0L
+ rts
+ or DBL0L,DBLRL
+
+LOCAL(zero_or_ulp):
+ tst r9,r9
+ bf LOCAL(long_norm_ulp_done)
+ ! return +0.0
+LOCAL(pop_r8_r9):
+ mov.l @r15+,r9
+ rts
+ mov.l @r15+,r8
+
+LOCAL(d20):
+ .word 20
+LOCAL(xff):
+ .word 0xff
+ .balign 4
+LOCAL(x7ff00000):
+ .long 0x7ff00000
+LOCAL(x001fffff):
+ .long 0x001fffff
+LOCAL(x80000000):
+ .long 0x80000000
+LOCAL(x000fffff):
+ .long 0x000fffff
+LOCAL(x800fffff):
+ .long 0x800fffff
+LOCAL(x001f0000):
+ .long 0x001f0000
+LOCAL(x00200000):
+ .long 0x00200000
+LOCAL(x7fffffff):
+ .long 0x7fffffff
+LOCAL(x00100000):
+ .long 0x00100000
+LOCAL(x02100000):
+ .long 0x02100000
+LOCAL(x01100000):
+ .long 0x01100000
+LOCAL(c__clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(adddf3))
+ENDFUNC(GLOBAL(subdf3))
Index: gcc/config/sh/IEEE-754/m3/mulsf3.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/mulsf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/mulsf3.S (revision 0)
@@ -0,0 +1,241 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+! mulsf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+
+ .balign 4
+ .global GLOBAL(mulsf3)
+ FUNC(GLOBAL(mulsf3))
+GLOBAL(mulsf3):
+ mov.l LOCAL(x7f800000),r1
+ not r4,r2
+ mov r4,r3
+ not r5,r0
+ tst r1,r2
+ or r1,r3
+ bt/s LOCAL(inf_nan_arg0)
+ tst r1,r0
+ bt LOCAL(inf_nan_arg1)
+ tst r1,r5
+ mov r1,r2
+ shll8 r3
+ or r5,r1
+ bt/s LOCAL(zero_denorm_arg1)
+ shll8 r1
+ tst r2,r4
+ bt LOCAL(zero_denorm_arg0)
+ dmulu.l r3,r1
+ mov r4,r0
+ and r2,r0
+LOCAL(arg_norm):
+ and r5,r2
+ mov.l LOCAL(x3f800000),r3
+ sts mach,r1
+ sub r3,r0
+ sts macl,r3
+ add r2,r0
+ cmp/pz r1
+ mov.w LOCAL(x100),r2
+ bf/s LOCAL(norm_frac)
+ tst r3,r3
+ shll2 r1 /* Shift one up, replace leading 1 with 0. */
+ shlr r1
+ tst r3,r3
+LOCAL(norm_frac):
+ mov.w LOCAL(mx80),r3
+ bf LOCAL(round_frac)
+ tst r2,r1
+LOCAL(round_frac):
+ mov.l LOCAL(xff000000),r2
+ subc r3,r1 /* Even overflow gives right result: exp++, frac=0. */
+ shlr8 r1
+ add r1,r0
+ shll r0
+ bt LOCAL(ill_exp)
+ tst r2,r0
+ bt LOCAL(denorm0)
+ cmp/hs r2,r0
+ bt LOCAL(inf)
+LOCAL(insert_sign):
+ div0s r4,r5
+ rts
+ rotcr r0
+LOCAL(denorm0):
+ sub r2,r0
+ bra LOCAL(insert_sign)
+ shlr r0
+LOCAL(zero_denorm_arg1):
+ mov.l LOCAL(x60000000),r2 /* Check exp0 >= -64 */
+ add r1,r1
+ tst r1,r1 /* arg1 == 0 ? */
+ mov #0,r0
+ bt LOCAL(insert_sign) /* argument 1 is zero ==> return 0 */
+ tst r4,r2
+ bt LOCAL(insert_sign) /* exp0 < -64 ==> return 0 */
+ mov.l LOCAL(c__clz_tab),r0
+ mov r3,r2
+ mov r1,r3
+ bra LOCAL(arg_normalize)
+ mov r2,r1
+LOCAL(zero_denorm_arg0):
+ mov.l LOCAL(x60000000),r2 /* Check exp1 >= -64 */
+ add r3,r3
+ tst r3,r3 /* arg0 == 0 ? */
+ mov #0,r0
+ bt LOCAL(insert_sign) /* argument 0 is zero ==> return 0 */
+ tst r5,r2
+ bt LOCAL(insert_sign) /* exp1 < -64 ==> return 0 */
+ mov.l LOCAL(c__clz_tab),r0
+LOCAL(arg_normalize):
+ mov.l r7,@-r15
+ extu.w r3,r7
+ cmp/eq r3,r7
+ mov.l LOCAL(xff000000),r7
+ mov #-8,r2
+ bt 0f
+ tst r7,r3
+ mov #-16,r2
+ bt 0f
+ mov #-24,r2
+0:
+ mov r3,r7
+ shld r2,r7
+#ifdef __pic__
+ add r0,r7
+ mova LOCAL(c__clz_tab),r0
+#endif
+ mov.b @(r0,r7),r0
+ add #32,r2
+ mov r2,r7
+ mov #23,r2
+ sub r0,r7
+ mov.l LOCAL(x7f800000),r0
+ shld r7,r3
+ shld r2,r7
+ mov r0,r2
+ and r4,r0
+ sub r7,r0
+ mov.l @r15+,r7
+ bra LOCAL(arg_norm)
+ dmulu.l r3,r1
+#if 0 /* This is slightly slower, but could be used if table lookup causes
+ cache thrashing. */
+ bt LOCAL(insert_sign) /* exp1 < -64 ==> return 0 */
+ mov.l LOCAL(xff000000),r2
+ mov r4,r0
+LOCAL(arg_normalize):
+ tst r2,r3
+ bf LOCAL(arg_bit_norm)
+LOCAL(arg_byte_loop):
+ tst r2,r3
+ add r2,r0
+ shll8 r3
+ bt LOCAL(arg_byte_loop)
+ add r4,r0
+LOCAL(arg_bit_norm):
+ mov.l LOCAL(x7f800000),r2
+ rotl r3
+LOCAL(arg_bit_loop):
+ add r2,r0
+ bf/s LOCAL(arg_bit_loop)
+ rotl r3
+ rotr r3
+ rotr r3
+ sub r2,r0
+ bra LOCAL(arg_norm)
+ dmulu.l r3,r1
+#endif /* 0 */
+LOCAL(inf):
+ bra LOCAL(insert_sign)
+ mov r2,r0
+LOCAL(inf_nan_arg0):
+ bt LOCAL(inf_nan_both)
+ add r0,r0
+ cmp/eq #-1,r0 /* arg1 zero? -> NAN */
+ bt LOCAL(insert_sign)
+ mov r4,r0
+LOCAL(inf_insert_sign):
+ bra LOCAL(insert_sign)
+ add r0,r0
+LOCAL(inf_nan_both):
+ mov r4,r0
+ bra LOCAL(inf_insert_sign)
+ or r5,r0
+LOCAL(inf_nan_arg1):
+ mov r2,r0
+ add r0,r0
+ cmp/eq #-1,r0 /* arg0 zero? */
+ bt LOCAL(insert_sign)
+ bra LOCAL(inf_insert_sign)
+ mov r5,r0
+LOCAL(ill_exp):
+ cmp/pz r0
+ mov #-24,r3
+ bt LOCAL(inf)
+ add r1,r1
+ mov r0,r2
+ sub r1,r2 ! remove fraction to get back pre-rounding exponent.
+ sts mach,r0
+ sts macl,r1
+ shad r3,r2
+ mov r0,r3
+ shld r2,r0
+ add #32,r2
+ cmp/pz r2
+ shld r2,r3
+ bf LOCAL(zero)
+ or r1,r3
+ mov #-1,r1
+ tst r3,r3
+ mov.w LOCAL(x100),r3
+ bf/s LOCAL(denorm_round_up)
+ mov #-0x80,r1
+ tst r3,r0
+LOCAL(denorm_round_up):
+ mov #-7,r3
+ subc r1,r0
+ bra LOCAL(insert_sign)
+ shld r3,r0
+LOCAL(zero):
+ bra LOCAL(insert_sign)
+ mov #0,r0
+LOCAL(x100):
+ .word 0x100
+LOCAL(mx80):
+ .word -0x80
+ .balign 4
+LOCAL(x7f800000):
+ .long 0x7f800000
+LOCAL(x3f800000):
+ .long 0x3f800000
+LOCAL(xff000000):
+ .long 0xff000000
+LOCAL(x60000000):
+ .long 0x60000000
+LOCAL(c__clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+ ENDFUNC(GLOBAL(mulsf3))
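Both the `LOCAL(arg_normalize)` code above and the float/int conversion routines below normalize via `GLOBAL(clz_tab)`: locate the highest non-zero byte with a few tests, then look up that byte's leading-bit position in a 256-entry table. A C sketch of the idea (the table lookup is replaced by a small loop here; `clz32_ref` is a made-up name, not part of the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Count leading zeros of a 32-bit value, byte-search style:
   first narrow to the highest non-zero byte (the cmp/eq, tst
   sequences above), then find that byte's top set bit (the
   clz_tab lookup, done here with a loop for clarity). */
static int clz32_ref (uint32_t x)
{
  int base = 0;
  if (x & 0xffff0000u) { base += 16; x >>= 16; }
  if (x & 0xff00u)     { base += 8;  x >>= 8;  }
  int t = 0;                  /* clz_tab[b] = 1-based position of b's top bit */
  for (uint32_t b = x; b; b >>= 1)
    t++;
  return 32 - (base + t);     /* clz32_ref (0) == 32 */
}
```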
Index: gcc/config/sh/IEEE-754/m3/floatsisf.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/floatsisf.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/floatsisf.S (revision 0)
@@ -0,0 +1,101 @@
+/* Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+! floatsisf for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+
+FUNC(GLOBAL(floatsisf))
+ .global GLOBAL(floatsisf)
+ .balign 4
+GLOBAL(floatsisf):
+ cmp/pz r4
+ mov r4,r5
+ bt 0f
+ neg r4,r5
+0: mov.l LOCAL(c_clz_tab),r0
+ extu.w r5,r1
+ mov.w LOCAL(xff00),r3
+ cmp/eq r5,r1
+ mov #24,r2
+ bt 0f
+ mov r5,r1
+ shlr16 r1
+ add #-16,r2
+0: tst r3,r1 ! 0xff00
+ bt 0f
+ shlr8 r1
+0: bt 0f
+ add #-8,r2
+0:
+#ifdef __PIC__
+ add r0,r1
+ mova LOCAL(c_clz_tab),r0
+#endif
+ mov.b @(r0,r1),r1
+ cmp/pz r4
+ mov.l LOCAL(x4a800000),r3 ! bias + 23 - implicit 1
+ bt 0f
+ mov.l LOCAL(xca800000),r3 ! sign + bias + 23 - implicit 1
+0: mov r5,r0
+ sub r1,r2
+ mov.l LOCAL(x80000000),r1
+ shld r2,r0
+ cmp/pz r2
+ add r3,r0
+ bt LOCAL(noround)
+ add #31,r2
+ shld r2,r5
+ add #-31,r2
+ rotl r5
+ cmp/hi r1,r5
+ mov #0,r3
+ addc r3,r0
+ mov #23,r1
+ shld r1,r2
+ rts
+ sub r2,r0
+ .balign 8
+LOCAL(noround):
+ mov #23,r1
+ tst r4,r4
+ shld r1,r2
+ bt LOCAL(ret0)
+ rts
+ sub r2,r0
+LOCAL(ret0):
+ rts
+ mov #0,r0
+
+LOCAL(xff00): .word 0xff00
+ .balign 4
+LOCAL(x4a800000): .long 0x4a800000
+LOCAL(xca800000): .long 0xca800000
+LOCAL(x80000000): .long 0x80000000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(floatsisf))
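For reference, the conversion floatsisf performs (normalize via a leading-zero count, then round to nearest with ties to even on the shifted-out guard bits) can be sketched in C. The function name is made up for illustration; the assembly above is the implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of floatsisf's job: signed 32-bit int -> IEEE-754 single
   precision, round to nearest, ties to even.  Returns the raw bits. */
static uint32_t floatsisf_ref (int32_t x)
{
  if (x == 0)
    return 0;
  uint32_t sign = x < 0 ? 0x80000000u : 0;
  uint32_t mag = x < 0 ? 0u - (uint32_t) x : (uint32_t) x;
  int shift = 0;
  while (!(mag & 0x80000000u))   /* the clz_tab lookup, in loop form */
    mag <<= 1, shift++;
  int exp = 127 + 31 - shift;    /* biased exponent */
  uint32_t mant = mag >> 8;      /* 24 bits incl. the implicit 1 */
  uint32_t rest = mag & 0xffu;   /* guard bits shifted out */
  if (rest > 0x80u || (rest == 0x80u && (mant & 1)))
    mant++;                      /* round to nearest, ties to even */
  if (mant == 0x1000000u)        /* rounding carried out of bit 23 */
    mant >>= 1, exp++;
  return sign | ((uint32_t) exp << 23) | (mant & 0x7fffffu);
}
```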
Index: gcc/config/sh/IEEE-754/m3/muldf3.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/muldf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/muldf3.S (revision 0)
@@ -0,0 +1,481 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+! muldf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+! Normal numbers are multiplied in 53 or 54 cycles on SH4-200.
+
+FUNC(GLOBAL(muldf3))
+ .global GLOBAL(muldf3)
+LOCAL(inf_nan_denorm_or_zero_a):
+ mov.l r8,@-r15
+ sub r3,DBL0H ! isolate high fraction
+ mov.l @(4,r15),r8 ! original DBL0H (with sign & exp)
+ sub r3,r1 ! 0x7ff00000
+ mov.l LOCAL(x60000000),r3
+ shll16 r2 ! 0xffff0000
+ ! no stall here for sh4-200
+ !
+ tst r1,r8
+ mov.l r0,@-r15
+ bf LOCAL(inf_nan_a)
+ tst r1,r0 ! test for DBL1 inf, nan or small
+ bt LOCAL(ret_inf_nan_zero)
+LOCAL(normalize_arg):
+ tst DBL0H,DBL0H
+ bf LOCAL(normalize_arg53)
+ tst DBL0L,DBL0L
+ bt LOCAL(a_zero)
+ tst r2,DBL0L
+ mov DBL0L,DBL0H
+ bt LOCAL(normalize_arg16)
+ shlr16 DBL0H
+ mov.w LOCAL(m15),r2 ! 1-16
+ bra LOCAL(normalize_arg48)
+ shll16 DBL0L
+
+LOCAL(normalize_arg53):
+ tst r2,DBL0H
+ mov #1,r2
+ bt LOCAL(normalize_arg48)
+ mov DBL0H,r1
+ shlr16 r1
+ bra LOCAL(normalize_DBL0H)
+ mov #21-16,r3
+
+LOCAL(normalize_arg16):
+ mov.w LOCAL(m31),r2 ! 1-32
+ mov #0,DBL0L
+LOCAL(normalize_arg48):
+ mov DBL0H,r1
+ mov #21,r3
+LOCAL(normalize_DBL0H):
+ extu.b r1,r8
+ mov.l LOCAL(c__clz_tab),r0
+ cmp/eq r8,r1
+ !
+ bt 0f
+ shlr8 r1
+0:
+#ifdef __pic__
+ add r0,r1
+
+ mova LOCAL(c__clz_tab),r0
+
+#endif /* __pic__ */
+ mov.b @(r0,r1),r8
+ mov DBL0L,r1
+ mov.l @r15+,r0
+ bt 0f
+ add #-8,r3
+0: clrt
+ sub r8,r3
+ mov.w LOCAL(d20),r8
+ shld r3,DBL0H
+ shld r3,DBL0L
+ sub r3,r2
+ add #-32,r3
+ shld r3,r1
+ mov.l LOCAL(x00100000),r3
+ or r1,DBL0H
+ shld r8,r2
+ mov.l @r15+,r8
+ add r2,DBL1H
+ mov.l LOCAL(x001fffff),r2
+ dmulu.l DBL0L,DBL1L
+ bra LOCAL(arg_denorm_done)
+ or r3,r0 ! set implicit 1 bit
+
+LOCAL(a_zero):
+ mov.l @(4,r15),r8
+ add #8,r15
+LOCAL(zero):
+ mov #0,DBLRH
+ bra LOCAL(pop_ret)
+ mov #0,DBLRL
+
+! both inf / nan -> result is nan if at least one is a nan, else inf.
+! DBL0 inf/nan, DBL1 zero -> result is nan
+! DBL0 inf/nan, DBL1 finite -> result is DBL0 with sign adjustment
+LOCAL(inf_nan_a):
+ mov r8,DBLRH
+ mov.l @(4,r15),r8
+ add #8,r15
+ tst r1,r0 ! arg1 inf/nan ?
+ mov DBL0L,DBLRL
+ bt LOCAL(both_inf_nan)
+ tst DBL1L,DBL1L
+ mov DBL1H,r1
+ bf LOCAL(pop_ret)
+ add r1,r1
+ tst r1,r1
+ !
+ bf LOCAL(pop_ret)
+LOCAL(nan):
+ mov #-1,DBLRL
+ bra LOCAL(pop_ret)
+ mov #-1,DBLRH
+
+LOCAL(both_inf_nan):
+ or DBL1L,DBLRL
+ bra LOCAL(pop_ret)
+ or DBL1H,DBLRH
+
+LOCAL(ret_inf_nan_zero):
+ tst r1,r0
+ mov.l @(4,r15),r8
+ or DBL0L,DBL0H
+ bf/s LOCAL(zero)
+ add #8,r15
+ tst DBL0H,DBL0H
+ bt LOCAL(nan)
+LOCAL(inf_nan_b):
+ mov DBL1L,DBLRL
+ mov DBL1H,DBLRH
+LOCAL(pop_ret):
+ mov.l @r15+,DBL0H
+ add DBLRH,DBLRH
+
+
+ div0s DBL0H,DBL1H
+
+ rts
+ rotcr DBLRH
+
+ .balign 4
+/* Argument a has already been tested for being zero or denorm.
+   For argument b, on the other hand, we have to swap a and b so that we can share the
+ normalization code.
+ a: sign/exponent : @r15 fraction: DBL0H:DBL0L
+ b: sign/exponent: DBL1H fraction: r0:DBL1L */
+LOCAL(inf_nan_denorm_or_zero_b):
+ sub r3,r1 ! 0x7ff00000
+ mov.l @r15,r2 ! get original DBL0H
+ tst r1,DBL1H
+ sub r3,r0 ! isolate high fraction
+ bf LOCAL(inf_nan_b)
+ mov.l DBL1H,@r15
+ mov r0,DBL0H
+ mov.l r8,@-r15
+ mov r2,DBL1H
+ mov.l LOCAL(0xffff0000),r2
+ mov.l r1,@-r15
+ mov DBL1L,r1
+ mov DBL0L,DBL1L
+ bra LOCAL(normalize_arg)
+ mov r1,DBL0L
+
+LOCAL(d20):
+ .word 20
+LOCAL(m15):
+ .word -15
+LOCAL(m31):
+ .word -31
+LOCAL(xff):
+ .word 0xff
+
+ .balign 4
+LOCAL(0xffff0000):	.long 0xffff0000
+
+ ! calculate a (DBL0H:DBL0L) * b (DBL1H:DBL1L)
+ .balign 4
+GLOBAL(muldf3):
+ mov.l LOCAL(xfff00000),r3
+ mov DBL1H,r0
+ dmulu.l DBL0L,DBL1L
+ mov.l LOCAL(x7fe00000),r1
+ sub r3,r0
+ mov.l DBL0H,@-r15
+ sub r3,DBL0H
+ tst r1,DBL0H
+ or r3,DBL0H
+ mov.l LOCAL(x001fffff),r2
+ bt LOCAL(inf_nan_denorm_or_zero_a)
+ tst r1,r0
+ or r3,r0 ! r0:DBL1L := b fraction ; u12.52
+ bt LOCAL(inf_nan_denorm_or_zero_b) ! T clear on fall-through
+LOCAL(arg_denorm_done):
+ and r2,r0 ! r0:DBL1L := b fraction ; u12.52
+ sts macl,r3
+ sts mach,r1
+ dmulu.l DBL0L,r0
+ and r2,DBL0H ! DBL0H:DBL0L := a fraction ; u12.52
+ mov.l r8,@-r15
+ mov #0,DBL0L
+ mov.l r9,@-r15
+ sts macl,r2
+ sts mach,r8
+ dmulu.l DBL0H,DBL1L
+ addc r1,r2
+
+ addc DBL0L,r8 ! add T; clears T
+
+ sts macl,r1
+ sts mach,DBL1L
+ dmulu.l DBL0H,r0
+ addc r1,r2
+ mov.l LOCAL(x7ff00000),DBL0H
+ addc DBL1L,r8 ! clears T
+ mov.l @(8,r15),DBL1L ! a sign/exp w/fraction
+ sts macl,DBLRL
+ sts mach,DBLRH
+ and DBL0H,DBL1L ! a exponent
+ mov.w LOCAL(x200),r9
+ addc r8,DBLRL
+ mov.l LOCAL(x3ff00000),r8 ! bias
+ addc DBL0L,DBLRH ! add T
+ cmp/hi DBL0L,r3 ! 32 guard bits -> sticky: T := r3 != 0
+ movt r3
+ tst r9,DBLRH ! T := fraction < 2
+ or r3,r2 ! DBLRH:DBLRL:r2 := result fraction; u24.72
+ bt/s LOCAL(shll12)
+ sub r8,DBL1L
+ mov.l LOCAL(x002fffff),r8
+ and DBL1H,DBL0H ! b exponent
+ mov.l LOCAL(x00100000),r9
+ add DBL0H,DBL1L ! result exponent - 1
+ tst r8,r2
+ mov.w LOCAL(m20),r8
+ subc DBL0L,r9
+ addc r2,r9 ! r2 value is still needed for denormal rounding
+ mov.w LOCAL(d11),DBL0L
+ rotcr r9
+ clrt
+ shld r8,r9
+ mov.w LOCAL(m21),r8
+ mov DBLRL,r3
+ shld DBL0L,DBLRL
+ addc r9,DBLRL
+ mov.l @r15+,r9
+ shld r8,r3
+ mov.l @r15+,r8
+ shld DBL0L,DBLRH
+ mov.l @r15+,DBL0H
+ addc r3,DBLRH
+ mov.l LOCAL(x7ff00000),DBL0L
+ add DBL1L,DBLRH ! implicit 1 adjusts exponent
+ mov.l LOCAL(xffe00000),r3
+ cmp/hs DBL0L,DBLRH
+ add DBLRH,DBLRH
+ bt LOCAL(ill_exp_11)
+ tst r3,DBLRH
+ bt LOCAL(denorm_exp0_11)
+ div0s DBL0H,DBL1H
+ rts
+ rotcr DBLRH
+
+
+LOCAL(shll12):
+ mov.l LOCAL(x0017ffff),r8
+ extu.b DBLRH,DBLRH ! remove implicit 1.
+ mov.l LOCAL(x00080000),r9
+ and DBL1H,DBL0H ! b exponent
+ add DBL0H,DBL1L ! result exponent
+ tst r8,r2 ! rounding adjust for lower guard ...
+ mov.w LOCAL(m19),r8
+ subc DBL0L,r9 ! ... bits and round to even; clear T
+ addc r2,r9 ! r2 value is still needed for denormal rounding
+ mov.w LOCAL(d12),DBL0L
+ rotcr r9
+ clrt
+ shld r8,r9
+ mov.w LOCAL(m20),r8
+ mov DBLRL,r3
+ shld DBL0L,DBLRL
+ addc r9,DBLRL
+ mov.l @r15+,r9
+ shld r8,r3
+ mov.l @r15+,r8
+ shld DBL0L,DBLRH
+ mov.l LOCAL(x7ff00000),DBL0L
+ addc r3,DBLRH
+ mov.l @r15+,DBL0H
+ add DBL1L,DBLRH
+ mov.l LOCAL(xffe00000),r3
+ cmp/hs DBL0L,DBLRH
+ add DBLRH,DBLRH
+ bt LOCAL(ill_exp_12)
+ tst r3,DBLRH
+ bt LOCAL(denorm_exp0_12)
+LOCAL(insert_sign):
+ div0s DBL0H,DBL1H
+ rts
+ rotcr DBLRH
+
+LOCAL(overflow):
+ mov r3,DBLRH
+ mov #0,DBLRL
+ bra LOCAL(insert_sign)
+ mov.l @r15+,r8
+
+LOCAL(denorm_exp0_11):
+ mov.l r8,@-r15
+ mov #-21,r8
+ mov.l r9,@-r15
+ bra LOCAL(denorm)
+ mov #-2,DBL1L ! one for denormal, and one for sticky bit
+
+LOCAL(ill_exp_11):
+ mov DBL1H,DBL1L
+ and r3,DBL0L ! 0x7fe00000
+ add DBL1L,DBL1L
+ mov.l r8,@-r15
+ cmp/hi DBL1L,DBL0L ! check if exp a was large
+ mov #-20,DBL0L
+ bf LOCAL(overflow)
+ mov #-21,r8
+ mov DBLRH,DBL1L
+ rotcr DBL1L ! shift in negative sign
+ mov.l r9,@-r15
+ shad DBL0L,DBL1L ! exponent ; s32
+ bra LOCAL(denorm)
+ add #-2,DBL1L ! add one for denormal, and one for sticky bit
+
+LOCAL(denorm_exp0_12):
+ mov.l r8,@-r15
+ mov #-20,r8
+ mov.l r9,@-r15
+ bra LOCAL(denorm)
+ mov #-2,DBL1L ! one for denormal, and one for sticky bit
+
+ .balign 4 ! also aligns LOCAL(denorm)
+LOCAL(ill_exp_12):
+ and r3,DBL0L ! 0x7fe00000
+ mov DBL1H,DBL1L
+ add DBL1L,DBL1L
+ mov.l r8,@-r15
+ cmp/hi DBL1L,DBL0L ! check if exp a was large
+ bf LOCAL(overflow)
+ mov DBLRH,DBL1L
+ rotcr DBL1L ! shift in negative sign
+ mov #-20,r8
+ shad r8,DBL1L ! exponent ; s32
+ mov.l r9,@-r15
+ add #-2,DBL1L ! add one for denormal, and one for sticky bit
+LOCAL(denorm):
+ not r3,r9 ! 0x001fffff
+ mov.l r10,@-r15
+ mov r2,r10
+ shld r8,r10 ! 11 or 12 lower bit valid
+ and r9,DBLRH ! Mask away vestiges of exponent.
+ add #32,r8
+ sub r3,DBLRH ! Make leading 1 explicit.
+ shld r8,r2 ! r10:r2 := unrounded result lowpart
+ shlr DBLRH ! compensate for doubling at end of normal code
+ sub DBLRL,r10 ! reconstruct effect of previous rounding
+ exts.b r10,r9
+ shad r3,r10 ! sign extension
+ mov #0,r3
+ clrt
+ addc r9,DBLRL ! Undo previous rounding.
+ mov.w LOCAL(m32),r9
+ addc r10,DBLRH
+ cmp/hi r3,r2
+ rotcl DBLRL ! fit in the rest of r2 as a sticky bit.
+ mov.l @r15+,r10
+ rotcl DBLRH
+ cmp/ge r9,DBL1L
+ bt LOCAL(small_norm_shift)
+ cmp/hi r3,DBLRL
+ add #32,DBL1L
+ movt DBLRL
+ cmp/gt r9,DBL1L
+ or DBLRH,DBLRL
+ bt/s LOCAL(small_norm_shift)
+ mov r3,DBLRH
+ mov r3,DBLRL ! exponent too negative to shift - return zero
+ mov.l @r15+,r9
+ mov.l @r15+,r8
+ div0s DBL0H,DBL1H
+ rts
+ rotcr DBLRH
+ .balign 4
+LOCAL(small_norm_shift):
+ mov DBLRL,r2 ! stash away guard bits
+ shld DBL1L,DBLRL
+ mov DBLRH,DBL0L
+ shld DBL1L,DBLRH
+ mov.l LOCAL(x7fffffff),r9
+ add #32,DBL1L
+ shld DBL1L,r2
+ shld DBL1L,DBL0L
+ or DBL0L,DBLRL
+ shlr DBL0L
+ addc r2,r9
+ mov.l @r15+,r9
+ mov.l @r15+,r8
+ addc r3,DBLRL
+ addc r3,DBLRH
+ div0s DBL0H,DBL1H
+ add DBLRH,DBLRH
+ rts
+ rotcr DBLRH
+
+
+LOCAL(x200):
+ .word 0x200
+LOCAL(m19):
+ .word -19
+LOCAL(m20):
+ .word -20
+LOCAL(m21):
+ .word -21
+LOCAL(m32):
+ .word -32
+LOCAL(d11):
+ .word 11
+LOCAL(d12):
+ .word 12
+ .balign 4
+LOCAL(x60000000):
+ .long 0x60000000
+LOCAL(c__clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+LOCAL(xfff00000):
+ .long 0xfff00000
+LOCAL(x7fffffff):
+ .long 0x7fffffff
+LOCAL(x00100000):
+ .long 0x00100000
+LOCAL(x7fe00000):
+ .long 0x7fe00000
+LOCAL(x001fffff):
+ .long 0x001fffff
+LOCAL(x7ff00000):
+ .long 0x7ff00000
+LOCAL(x3ff00000):
+ .long 0x3ff00000
+LOCAL(x002fffff):
+ .long 0x002fffff
+LOCAL(xffe00000):
+ .long 0xffe00000
+LOCAL(x0017ffff):
+ .long 0x0017ffff
+LOCAL(x00080000):
+ .long 0x00080000
+ENDFUNC(GLOBAL(muldf3))
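The core of muldf3 is the fraction multiply built from four `dmulu.l` 32x32->64 partial products, accumulated with `addc`. The scheme in C, with comments mapping each partial product to the corresponding instruction above (`mul64x64` is a hypothetical helper name used for this sketch only):

```c
#include <assert.h>
#include <stdint.h>

/* 64x64 -> 128 bit multiply from 32x32 -> 64 pieces, as muldf3 does
   for the two 53-bit fractions. */
static void mul64x64 (uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
  uint64_t al = (uint32_t) a, ah = a >> 32;
  uint64_t bl = (uint32_t) b, bh = b >> 32;
  uint64_t ll = al * bl;            /* dmulu.l DBL0L,DBL1L */
  uint64_t lh = al * bh;            /* dmulu.l DBL0L,r0    */
  uint64_t hl = ah * bl;            /* dmulu.l DBL0H,DBL1L */
  uint64_t hh = ah * bh;            /* dmulu.l DBL0H,r0    */
  /* middle column; the addc chains above propagate its carries */
  uint64_t mid = (ll >> 32) + (uint32_t) lh + (uint32_t) hl;
  *lo = (mid << 32) | (uint32_t) ll;
  *hi = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
}
```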
Index: gcc/config/sh/IEEE-754/m3/floatsidf.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/floatsidf.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/floatsidf.S (revision 0)
@@ -0,0 +1,98 @@
+/* Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+! floatsidf for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+
+FUNC(GLOBAL(floatsidf))
+ .global GLOBAL(floatsidf)
+ .balign 4
+GLOBAL(floatsidf):
+ tst r4,r4
+ mov r4,r1
+ bt LOCAL(ret0)
+ cmp/pz r4
+ bt 0f
+ neg r4,r1
+0: mov.l LOCAL(c_clz_tab),r0
+ extu.w r1,r5
+ mov.w LOCAL(xff00),r3
+ cmp/eq r1,r5
+ mov #21,r2
+ bt 0f
+ mov r1,r5
+ shlr16 r5
+ add #-16,r2
+0: tst r3,r5 ! 0xff00
+ bt 0f
+ shlr8 r5
+0: bt 0f
+ add #-8,r2
+0:
+#ifdef __PIC__
+ add r0,r5
+ mova LOCAL(c_clz_tab),r0
+#endif
+ mov.b @(r0,r5),r5
+ cmp/pz r4
+ mov.l LOCAL(x41200000),r3 ! bias + 20 - implicit 1
+ bt 0f
+ mov.l LOCAL(xc1200000),r3 ! sign + bias + 20 - implicit 1
+0: mov r1,r0 ! DBLRL & DBLRH
+ sub r5,r2
+ mov r2,r5
+ shld r2,DBLRH
+ cmp/pz r2
+ add r3,DBLRH
+ add #32,r2
+ shld r2,DBLRL
+ bf 0f
+ mov.w LOCAL(d0),DBLRL
+0: mov #20,r2
+ shld r2,r5
+ rts
+ sub r5,DBLRH
+LOCAL(ret0):
+ mov #0,DBLRL
+ rts
+ mov #0,DBLRH
+
+LOCAL(xff00): .word 0xff00
+ .balign 4
+LOCAL(x41200000):
+#ifdef __LITTLE_ENDIAN__
+LOCAL(d0): .word 0
+ .word 0x4120
+#else
+ .word 0x4120
+LOCAL(d0): .word 0
+#endif
+LOCAL(xc1200000): .long 0xc1200000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(floatsidf))
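Unlike floatsisf, floatsidf needs no rounding path: every 32-bit integer fits exactly in the 53-bit double significand. A C sketch of the packing (hypothetical function name; returns the raw bit pattern):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of floatsidf's job: signed 32-bit int -> IEEE-754 double.
   Exact, so there is no rounding step. */
static uint64_t floatsidf_ref (int32_t x)
{
  if (x == 0)
    return 0;
  uint64_t sign = x < 0 ? 1ull << 63 : 0;
  uint32_t mag = x < 0 ? 0u - (uint32_t) x : (uint32_t) x;
  int shift = 0;
  while (!(mag & 0x80000000u))   /* normalize; the clz_tab lookup */
    mag <<= 1, shift++;
  int exp = 1023 + 31 - shift;   /* biased exponent */
  /* drop the implicit 1 and align the remaining 31 bits to bit 51 */
  uint64_t mant = ((uint64_t) (mag & 0x7fffffffu)) << 21;
  return sign | ((uint64_t) exp << 52) | mant;
}
```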
Index: gcc/config/sh/IEEE-754/m3/fixdfsi.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/fixdfsi.S (revision 0)
+++ gcc/config/sh/IEEE-754/m3/fixdfsi.S (revision 0)
@@ -0,0 +1,110 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!! fixdfsi for Renesas SH / STMicroelectronics ST40 CPUs
+!! Contributed by Joern Rennecke joern.rennecke@st.com
+
+#ifdef L_fixdfsi
+	! What is a bit unusual about this implementation is that the
+	! sign bit influences the result for NANs: for a cleared sign bit, you
+	! get INT_MAX, for a set sign bit, you get INT_MIN.
+	! However, since the result for NANs is undefined, this should be no
+	! problem.
+	! N.B. This is scheduled both for SH4-200 and SH4-300
+ .balign 4
+ .global GLOBAL(fixdfsi)
+ FUNC(GLOBAL(fixdfsi))
+ .balign 4
+GLOBAL(fixdfsi):
+ mov.w LOCAL(x413),r1
+ mov DBL0H,r0
+ shll DBL0H
+ mov.l LOCAL(mask),r3
+ mov #-21,r2
+ shld r2,DBL0H ! SH4-200 will start this insn in a new cycle
+ bt/s LOCAL(neg)
+ sub r1,DBL0H
+ cmp/pl DBL0H ! SH4-200 will start this insn in a new cycle
+ and r3,r0
+ bf/s LOCAL(ignore_low)
+	addc	r3,r0	! uses T == 1; sets implicit 1
+ mov #10,r2
+ shld DBL0H,r0 ! SH4-200 will start this insn in a new cycle
+ cmp/gt r2,DBL0H
+ add #-32,DBL0H
+ bt LOCAL(retmax)
+ shld DBL0H,DBL0L
+ rts
+ or DBL0L,r0
+
+ .balign 8
+LOCAL(ignore_low):
+ mov #-21,r2
+ cmp/gt DBL0H,r2 ! SH4-200 will start this insn in a new cycle
+ bf 0f ! SH4-200 will start this insn in a new cycle
+ mov #-31,DBL0H ! results in 0 return
+0: add #1,r0
+ rts
+ shld DBL0H,r0
+
+ .balign 4
+LOCAL(neg):
+ cmp/pl DBL0H
+ and r3,r0
+ bf/s LOCAL(ignore_low_neg)
+	addc	r3,r0	! uses T == 1; sets implicit 1
+ mov #10,r2
+ shld DBL0H,r0 ! SH4-200 will start this insn in a new cycle
+ cmp/gt r2,DBL0H
+ add #-32,DBL0H
+ bt LOCAL(retmin)
+ shld DBL0H,DBL0L
+ or DBL0L,r0 ! SH4-200 will start this insn in a new cycle
+ rts
+ neg r0,r0
+
+ .balign 4
+LOCAL(ignore_low_neg):
+ mov #-21,r2
+ cmp/gt DBL0H,r2 ! SH4-200 will start this insn in a new cycle
+ add #1,r0
+ shld DBL0H,r0
+ bf 0f
+ mov #0,r0 ! results in 0 return
+0: rts
+ neg r0,r0
+
+LOCAL(retmax):
+ mov #-1,r0
+ rts
+ shlr r0
+
+LOCAL(retmin):
+ mov #1,r0
+ rts
+ rotr r0
+
+LOCAL(x413): .word 0x413
+
+ .balign 4
+LOCAL(mask): .long 0x000fffff
+ ENDFUNC(GLOBAL(fixdfsi))
+#endif /* L_fixdfsi */
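The conversion fixdfsi implements, truncation toward zero with saturation on overflow, can be sketched on the raw IEEE-754 bits as follows (hypothetical function name; like the routine, this sends inf and NaN down the saturating path):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of fixdfsi's behaviour, operating on the raw double bits:
   truncate toward zero, saturate when the value does not fit. */
static int32_t fixdfsi_ref (uint64_t bits)
{
  int sign = (int) (bits >> 63);
  int exp = (int) ((bits >> 52) & 0x7ff) - 1023;        /* unbiased */
  uint64_t mant = (bits & 0xfffffffffffffull) | (1ull << 52);
  if (exp < 0)
    return 0;                        /* |x| < 1 truncates to 0 */
  if (exp > 30)                      /* too big (also inf/nan) */
    return sign ? INT32_MIN : INT32_MAX;
  uint64_t val = mant >> (52 - exp); /* truncate toward zero */
  return sign ? -(int32_t) val : (int32_t) val;
}
```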
Index: gcc/config/sh/IEEE-754/divdf3.S
===================================================================
--- gcc/config/sh/IEEE-754/divdf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/divdf3.S (revision 0)
@@ -0,0 +1,593 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!division of two double precision floating point numbers
+!Author:Aanchal Khanna
+!
+!Entry:
+!r4,r5:dividend
+!
+!r6,r7:divisor
+!
+!Exit:
+!r0,r1:quotient
+
+!Notes: the dividend is passed in regs r4 and r5 and the divisor in regs
+!r6 and r7; the quotient is returned in regs r0 and r1. The dividend is
+!referred to as op1 and the divisor as op2.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (divdf3)
+ FUNC (GLOBAL (divdf3))
+
+GLOBAL (divdf3):
+
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+ mov r5,r4
+ mov r1,r5
+
+ mov r6,r1
+ mov r7,r6
+ mov r1,r7
+#endif
+ mov r4,r2
+ mov.l .L_inf,r1
+
+ and r1,r2
+ mov.l r8,@-r15
+
+ cmp/eq r1,r2
+ mov r6,r8
+
+ bt .L_a_inv
+ and r1,r8
+
+ cmp/eq r1,r8
+ mov.l .L_high_mant,r3
+
+ bf .L_chk_zero
+ and r6,r3
+
+ mov.l .L_mask_sign,r8
+ cmp/pl r7
+
+ mov r8,r0
+ bt .L_ret_b !op2=NaN,return op2
+
+ and r4,r8
+ cmp/pl r3
+
+ and r6,r0
+ bt .L_ret_b !op2=NaN,return op2
+
+ xor r8,r0 !op1=normal no,op2=Inf, return Zero
+ mov #0,r1
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+.L_ret_b:
+ mov r7,r1
+ mov r6,r0
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+.L_a_inv:
+ !chk if op1 is Inf or NaN
+ mov.l .L_high_mant,r2
+ cmp/pl r5
+
+ and r4,r2
+ bt .L_ret_a
+
+ and r1,r8 !r1 contains infinity
+ cmp/pl r2
+
+ bt .L_ret_a
+ cmp/eq r1,r8
+
+ mov r1,DBLRH
+ add DBLRH,DBLRH
+ bf 0f
+ mov #-1,DBLRH ! Inf/Inf, return NaN.
+0: div0s r4,r6
+ mov.l @r15+,r8
+ rts
+ rotcr DBLRH
+
+.L_ret_a:
+ !return op1
+ mov r5,r1
+ mov r4,r0
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+.L_chk_zero:
+ !chk if op1=0
+ mov.l .L_mask_sign,r0
+ mov r4,r3
+
+ and r0,r3
+ shll r4
+
+ and r6,r0
+ shlr r4
+
+ xor r3,r0
+ shll r6
+
+ shlr r6
+ tst r4,r4
+
+
+ bf .L_op1_not_zero
+ tst r5,r5
+
+ bf .L_op1_not_zero
+ tst r7,r7
+
+ mov.l @r15+,r8
+ bf .L_ret_zero
+
+ tst r6,r6
+ bf .L_ret_zero
+
+ rts
+ mov #-1,DBLRH !op1=op2=0, return NaN
+
+.L_ret_zero:
+ !return zero
+ mov r0,r1
+ rts
+#ifdef __LITTLE_ENDIAN__
+ mov #0,r0
+#else
+ mov #0,r1 !op1=0,op2=normal no,return zero
+#endif
+
+.L_norm_b:
+ !normalize op2
+ shll r7
+ mov.l .L_imp_bit,r3
+
+ rotcl r6
+ tst r3,r6
+
+ add #-1,r8
+ bt .L_norm_b
+
+ bra .L_divide
+ add #1,r8
+
+.L_op1_not_zero:
+ !op1!=0, chk if op2=0
+ tst r7,r7
+ mov r1,r3
+
+ mov #0,r1
+ bf .L_normal_nos
+
+ tst r6,r6
+ bf .L_normal_nos
+
+ mov.l @r15+,r8
+ or r3,r0
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+
+ rts
+ nop
+
+.L_normal_nos:
+ !op1 and op2 are normal nos
+ tst r2,r2
+ mov #-20,r1
+
+! The following branch uses the T bit set by the compare above.
+! The intervening shift does not clobber it: the SHLR20 macro is
+! written to avoid T-bit-modifying shift instructions.
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r1,r2
+#else
+ SHLR20 (r2)
+#endif
+ bt .L_norm_a !normalize dividend
+
+.L_chk_b:
+ mov.l r9,@-r15
+ tst r8,r8
+
+ mov.l .L_high_mant,r9
+
+! The following branch uses the T bit set by the compare above.
+! The intervening shift does not clobber it: the SHLR20 macro is
+! written to avoid T-bit-modifying shift instructions.
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r1,r8
+#else
+ SHLR20 (r8)
+#endif
+ ! T set -> normalize divisor
+ SL(bt, .L_norm_b,
+ and r9,r4)
+
+.L_divide:
+ mov.l .L_2047,r1
+ sub r8,r2
+
+ mov.l .L_1023,r8
+ and r9,r6
+
+ !resultant exponent
+ add r8,r2
+ !chk the exponent for overflow
+ cmp/ge r1,r2
+
+ mov.l .L_imp_bit,r1
+ bt .L_overflow
+
+ mov #0,r8
+ or r1,r4
+
+ or r1,r6
+ mov #-24,r3
+
+ !chk if the divisor is 1(mantissa only)
+ cmp/eq r8,r7
+ bf .L_div2
+
+ cmp/eq r6,r1
+ bt .L_den_one
+
+.L_div2:
+ !divide the mantissas
+ shll8 r4
+ mov r5,r9
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r3,r9
+#else
+ SHLR24 (r9)
+#endif
+ shll8 r6
+
+ or r9,r4
+ shll8 r5
+
+ mov r7,r9
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r3,r9
+#else
+ SHLR24 (r9)
+#endif
+ mov r8,r3
+ shll8 r7
+
+ or r9,r6
+ cmp/gt r4,r6
+
+ mov r3,r9
+ bt .L_shift
+
+ cmp/eq r4,r6
+ bf .L_loop
+
+ cmp/gt r5,r7
+ bf .L_loop
+
+.L_shift:
+ add #-1,r2
+ shll r5
+ rotcl r4
+
+.L_loop:
+ !actual division loop
+ cmp/gt r6,r4
+ bt .L_subtract
+
+ cmp/eq r6,r4
+ bf .L_skip
+
+ cmp/ge r7,r5
+ bf .L_skip
+
+.L_subtract:
+ clrt
+ subc r7,r5
+
+ or r1,r8
+ subc r6,r4
+
+.L_skip:
+ shlr r1
+ shll r5
+
+ rotcl r4
+ cmp/eq r1,r3
+
+ bf .L_loop
+ mov.l .L_imp_bit,r1
+
+	!chk if the division was for the higher word of the quotient
+ tst r1,r9
+ bf .L_chk_exp
+
+ mov r8,r9
+ mov.l .L_mask_sign,r1
+
+ !divide for the lower word of the quotient
+ bra .L_loop
+ mov r3,r8
+
+.L_chk_exp:
+ !chk if the result needs to be denormalized
+ cmp/gt r2,r3
+ bf .L_round
+ mov #-53,r7
+
+.L_underflow:
+ !denormalize the result
+ add #1,r2
+ cmp/gt r2,r7
+
+ or r4,r5 !remainder
+ add #-2,r2
+
+ mov #32,r4
+ bt .L_return_zero
+
+ add r2,r4
+ cmp/ge r3,r4
+
+ mov r2,r7
+ mov r3,r1
+
+ mov #-54,r2
+ bt .L_denorm
+ mov #-32,r7
+
+.L_denorm:
+ shlr r8
+ rotcr r1
+
+ shll r8
+ add #1,r7
+
+ shlr r9
+ rotcr r8
+
+ cmp/eq r3,r7
+ bf .L_denorm
+
+ mov r4,r7
+ cmp/eq r2,r4
+
+ bt .L_break
+ mov r3,r6
+
+ cmp/gt r7,r3
+ bf .L_break
+
+ mov r2,r4
+ mov r1,r6
+
+ mov r3,r1
+ bt .L_denorm
+
+.L_break:
+ mov #0,r2
+
+ cmp/gt r1,r2
+
+ addc r2,r8
+ mov.l .L_comp_1,r4
+
+ addc r3,r9
+ or r9,r0
+
+ cmp/eq r5,r3
+ bf .L_return
+
+ cmp/eq r3,r6
+ mov.l .L_mask_sign,r7
+
+ bf .L_return
+ cmp/eq r7,r1
+
+ bf .L_return
+ and r4,r8
+
+.L_return:
+ mov.l @r15+,r9
+ mov r8,r1
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+.L_norm_a:
+ !normalize op1
+ shll r5
+ mov.l .L_imp_bit,r3
+
+ rotcl r4
+ tst r3,r4
+
+ add #-1,r2
+ bt .L_norm_a
+
+ bra .L_chk_b
+ add #1,r2
+
+.L_overflow:
+ !overflow, return inf
+ mov.l .L_inf,r2
+#ifdef __LITTLE_ENDIAN__
+ or r2,r1
+ mov #0,r0
+#else
+ or r2,r0
+ mov #0,r1
+#endif
+ mov.l @r15+,r9
+ rts
+ mov.l @r15+,r8
+
+.L_den_one:
+ !denominator=1, result=numerator
+ mov r4,r9
+ mov #-53,r7
+
+ cmp/ge r2,r8
+ mov r8,r4
+
+ mov r5,r8
+ mov r4,r3
+
+ !chk the exponent for underflow
+ SL(bt, .L_underflow,
+ mov r4,r5)
+
+ mov.l .L_high_mant,r7
+ bra .L_pack
+ mov #20,r6
+
+.L_return_zero:
+ !return zero
+ mov r3,r1
+ mov.l @r15+,r9
+
+ rts
+ mov.l @r15+,r8
+
+.L_round:
+ !apply rounding
+ cmp/eq r4,r6
+ bt .L_lower
+
+ clrt
+ subc r6,r4
+
+ bra .L_rounding
+ mov r4,r6
+
+.L_lower:
+ clrt
+ subc r7,r5
+ mov r5,r6
+
+.L_rounding:
+ !apply rounding
+ mov.l .L_invert,r1
+ mov r3,r4
+
+ movt r3
+ clrt
+
+ not r3,r3
+ and r1,r3
+
+ addc r3,r8
+ mov.l .L_high_mant,r7
+
+ addc r4,r9
+ cmp/eq r4,r6
+
+ mov.l .L_comp_1,r3
+ SL (bf, .L_pack,
+ mov #20,r6)
+ and r3,r8
+
+.L_pack:
+ !pack the result, r2=exponent,r0=sign,r8=lower mantissa, r9=higher mantissa
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r6,r2
+#else
+ SHLL20 (r2)
+#endif
+ and r7,r9
+
+ or r2,r0
+ mov r8,r1
+
+ or r9,r0
+ mov.l @r15+,r9
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+ .align 2
+
+.L_mask_sign:
+ .long 0x80000000
+.L_high_mant:
+ .long 0x000fffff
+.L_inf:
+ .long 0x7ff00000
+.L_1023:
+ .long 1023
+.L_2047:
+ .long 2047
+.L_imp_bit:
+ .long 0x00100000
+.L_comp_1:
+ .long 0xfffffffe
+.L_invert:
+ .long 0x00000001
+
+ENDFUNC (GLOBAL (divdf3))
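The exponent bookkeeping in divdf3 above reduces to: the quotient's biased exponent starts as exp1 - exp2 + 1023 (the `.L_1023` bias), and drops by one when the dividend mantissa is smaller than the divisor's (the `.L_shift` adjustment); values >= 2047 (`.L_2047`) overflow to infinity, values <= 0 denormalize. A sketch of that arithmetic (hypothetical helper name):

```c
#include <assert.h>
#include <stdint.h>

/* Biased exponent of a/b before rounding, as computed in divdf3.
   mant_* are the full mantissas including the implicit 1 at bit 52. */
static int quot_biased_exp (int exp_a, int exp_b,
                            uint64_t mant_a, uint64_t mant_b)
{
  int e = exp_a - exp_b + 1023;
  if (mant_a < mant_b)   /* quotient of mantissas is below 1.0 */
    e -= 1;
  return e;              /* >= 2047: overflow to inf; <= 0: denormal */
}
```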
Index: gcc/config/sh/IEEE-754/floatunssisf.S
===================================================================
--- gcc/config/sh/IEEE-754/floatunssisf.S (revision 0)
+++ gcc/config/sh/IEEE-754/floatunssisf.S (revision 0)
@@ -0,0 +1,132 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Conversion of unsigned integer to floating point
+
+! Author: Rakesh Kumar
+
+! Argument: r4
+! Result: r0
+
+! r4 is referred to as op1
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (floatunsisf)
+ FUNC (GLOBAL (floatunsisf))
+
+GLOBAL (floatunsisf):
+ tst r4,r4
+ mov #23,r6
+
+ mov.l .L_set_24_bits,r7
+ SL(bt, .L_return,
+ not r7,r3)
+
+ ! Decide the direction for shifting
+ mov.l .L_set_24_bit,r5
+ cmp/hi r7,r4
+
+ not r5,r2
+ SL(bt, .L_shift_right,
+ mov #0,r7)
+
+ tst r5,r4
+
+ mov #0,r0
+ bf .L_pack_sf
+
+! Shift the bits to the left. Adjust the exponent
+.L_shift_left:
+ shll r4
+ tst r5,r4
+
+ add #-1,r6
+ bt .L_shift_left
+
+! Pack the value in floating point format.
+! r6 has unbiased exponent, r4 has mantissa
+.L_pack_sf:
+ mov #23,r3
+ add #127,r6
+
+ ! Align the exponent
+ and r2,r4
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLL23 (r6)
+#else
+ shld r3,r6
+#endif
+
+ or r6,r0
+ rts
+ or r4,r0
+
+! Shift right the number with rounding
+.L_shift_right:
+ shlr r4
+ rotcr r7
+
+ tst r4,r3
+ add #1,r6
+
+ bf .L_shift_right
+
+ tst r7,r7
+ bt .L_sh_rt_1
+
+ shll r7
+ movt r1
+
+ add r1,r4
+
+ tst r7,r7
+ bf .L_sh_rt_1
+
+ ! Halfway between two numbers.
+ ! Round towards LSB = 0
+ shlr r4
+ shll r4
+
+.L_sh_rt_1:
+ mov r4,r0
+
+ ! Rounding may have misplaced MSB. Adjust.
+ and r3,r0
+ cmp/eq #0,r0
+
+ bf .L_shift_right
+ bt .L_pack_sf
+
+.L_return:
+ rts
+ mov r4,r0
+
+ .align 2
+.L_set_24_bit:
+ .long 0x00800000
+
+.L_set_24_bits:
+ .long 0x00FFFFFF
+
+ENDFUNC (GLOBAL (floatunsisf))
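The `.L_shift_right` path above collects the shifted-out bits in r7 and rounds to nearest, clearing the LSB on an exact halfway case ("Round towards LSB = 0", i.e. ties to even). The decision rule isolated in C (hypothetical helper for this sketch; `guard` is the top discarded bit, `sticky` the OR of the rest):

```c
#include <assert.h>
#include <stdint.h>

/* Round-to-nearest-even step when narrowing a mantissa: round up when
   the discarded bits exceed half an ulp, or equal half an ulp and the
   mantissa is odd. */
static uint32_t round_ne (uint32_t mant, int guard, int sticky)
{
  if (guard && (sticky || (mant & 1)))
    mant += 1;
  return mant;
}
```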
Index: gcc/config/sh/IEEE-754/fixunsdfsi.S
===================================================================
--- gcc/config/sh/IEEE-754/fixunsdfsi.S (revision 0)
+++ gcc/config/sh/IEEE-754/fixunsdfsi.S (revision 0)
@@ -0,0 +1,176 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!conversion of double precision floating point number to unsigned integer
+!Author:Aanchal Khanna
+!
+!Entry:
+!r4,r5:operand
+!
+!Exit:
+!r0:result
+!
+!Note: the argument is passed in regs r4 and r5; the result is returned
+!in reg r0.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (fixunsdfsi)
+ FUNC (GLOBAL (fixunsdfsi))
+
+GLOBAL (fixunsdfsi):
+
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+ mov r5,r4
+ mov r1,r5
+#endif
+ mov.l .L_p_inf,r2
+ mov #-20,r1
+
+ mov r2,r7
+ mov.l .L_1023,r3
+
+ and r4,r2
+ shll r4
+
+ movt r6 ! r6 contains the sign bit
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r1,r2 ! r2 contains the exponent
+#else
+ SHLR20 (r2)
+#endif
+ shlr r4
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r1,r7
+#else
+ SHLR20 (r7)
+#endif
+ tst r6,r6
+ SL(bf, .L_epil,
+ mov #0,r0)
+
+ cmp/hi r2,r3 ! if exp < 1023,return 0
+ mov.l .L_high_mant,r1
+
+ SL(bt, .L_epil,
+ and r4,r1) ! r1 contains high mantissa
+
+ cmp/eq r2,r7 ! chk if exp is invalid
+ mov.l .L_1054,r7
+
+ bt .L_inv_exp
+ mov #11,r0
+
+ cmp/hi r7,r2 ! If exp > 1054,return maxint
+ sub r2,r7 !r7 contains the number of shifts
+
+ mov.l .L_21bit,r2
+ bt .L_ret_max
+
+ or r2,r1
+ mov r7,r3
+
+ shll8 r1
+ neg r7,r7
+
+ shll2 r1
+
+ shll r1
+ cmp/hi r3,r0
+
+ SL(bt, .L_lower_mant,
+ mov #21,r0)
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r7,r1
+#else
+.L_sh_loop:
+ tst r7,r7
+ bt .L_break
+ add #1,r7
+ bra .L_sh_loop
+ shlr r1
+
+.L_break:
+#endif
+ rts
+ mov r1,r0
+
+.L_lower_mant:
+ neg r0,r0
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r0,r5
+#else
+ SHLR21 (r5)
+#endif
+ or r5,r1 !pack lower and higher mantissas
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r7,r1
+#else
+.L_loop:
+ tst r7,r7
+ bt .L_break1
+ add #1,r7
+ bra .L_loop
+ shlr r1
+
+.L_break1:
+#endif
+ mov r1,r0
+.L_epil:
+ rts
+ nop
+
+.L_inv_exp:
+ cmp/hi r0,r5
+ bt .L_epil
+
+ cmp/hi r0,r1 !compare high mantissa,r1
+ bt .L_epil
+
+.L_ret_max:
+ mov.l .L_maxint,r0
+
+ rts
+ nop
+
+ .align 2
+
+.L_maxint:
+ .long 0xffffffff
+.L_p_inf:
+ .long 0x7ff00000
+.L_high_mant:
+ .long 0x000fffff
+.L_1023:
+ .long 0x000003ff
+.L_1054:
+ .long 1054
+.L_21bit:
+ .long 0x00100000
+
+ENDFUNC (GLOBAL (fixunsdfsi))
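Again not part of the patch: a C model of the conversion above, operating on the raw IEEE-754 words (`hi` is the big-endian high word, as the routine sees it after the little-endian swap). The helper name is invented; the saturation and NaN behaviour follow the assembly (negative and NaN inputs return 0, +Inf and exp > 1054 return maxint):

```c
#include <stdint.h>

/* Reference model of __fixunsdfsi: truncate a double, given as its two
   32-bit IEEE words, to an unsigned 32-bit integer. */
static uint32_t dbl_bits_to_u32(uint32_t hi, uint32_t lo)
{
    uint32_t sign = hi >> 31;
    uint32_t exp  = (hi >> 20) & 0x7FFu;
    uint64_t mant = ((uint64_t)(hi & 0x000FFFFFu) << 32) | lo;

    if (sign)
        return 0;                      /* negative input: routine returns 0 */
    if (exp < 1023)
        return 0;                      /* |x| < 1 truncates to 0 */
    if (exp == 0x7FFu)
        return mant ? 0 : 0xFFFFFFFFu; /* NaN -> 0, +Inf -> maxint */
    uint32_t e = exp - 1023;
    if (e > 31)
        return 0xFFFFFFFFu;            /* exp > 1054: saturate to maxint */
    uint64_t m = (1ull << 52) | mant;  /* restore the implicit bit */
    return (uint32_t)(m >> (52 - e));
}
```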
Index: gcc/config/sh/IEEE-754/adddf3.S
===================================================================
--- gcc/config/sh/IEEE-754/adddf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/adddf3.S (revision 0)
@@ -0,0 +1,786 @@
+/* Copyright (C) 2004, 2006, 2010 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Routine for adding two double numbers
+
+! Author: Rakesh Kumar
+! SH1 Support by Joern Rennecke
+! Sticky Bit handling : Joern Rennecke
+
+! Arguments: r4-r5, r6-r7
+! Result: r0-r1
+
+! The value in r4-r5 is referred to as op1
+! and that in r6-r7 is referred to as op2
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (subdf3)
+ FUNC (GLOBAL (subdf3))
+ .global GLOBAL (adddf3)
+ FUNC (GLOBAL (adddf3))
+
+GLOBAL (subdf3):
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+ mov r6,r2
+
+ mov r5,r4
+ mov r7,r6
+
+ mov r1,r5
+ mov r2,r7
+#endif
+ mov.l .L_sign,r2
+ bra .L_adddf3_1
+ xor r2,r6
+
+GLOBAL (adddf3):
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+ mov r6,r2
+
+ mov r5,r4
+ mov r7,r6
+
+ mov r1,r5
+ mov r2,r7
+#endif
+
+.L_adddf3_1:
+ mov.l r8,@-r15
+ mov r4,r1
+
+ mov.l .L_inf,r2
+ mov r6,r3
+
+ mov.l r9,@-r15
+ and r2,r1 !Exponent of op1 in r1
+
+ mov.l r10,@-r15
+ and r2,r3 !Exponent of op2 in r3
+
+ ! Check for Nan or Infinity
+ mov.l .L_sign,r9
+ cmp/eq r2,r1
+
+ mov r9,r10
+ bt .L_thread_inv_exp_op1
+
+ mov r9,r0
+ cmp/eq r2,r3
+! op1 has a valid exponent. We need not check it again.
+! Return op2 straight away.
+ and r4,r9 !r9 has sign bit for op1
+ bt .L_ret_op2
+
+ ! Check for -ve zero
+ cmp/eq r4,r0
+ and r6,r10 !r10 has sign bit for op2
+
+ bt .L_op1_nzero
+
+ cmp/eq r6,r0
+ bt .L_op2_nzero
+
+! Check for zero
+.L_non_zero:
+ tst r4,r4
+ bt .L_op1_zero
+
+ ! op1 is not zero, check op2 for zero
+ tst r6,r6
+ bt .L_op2_zero
+
+! r1 and r3 have the masked-out exponents, r9 and r10 the signs
+.L_add:
+ mov.l .L_high_mant,r8
+ mov #-20,r2
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r2,r1 ! r1 now has exponent for op1 in its lower bits
+#else
+ SHLR20 (r1)
+#endif
+ and r8,r6 ! Higher bits of mantissa of op2
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r2,r3 ! r3 has exponent for op2 in its lower bits
+#else
+ SHLR20 (r3)
+#endif
+ and r8,r4 ! Higher bits of mantissa of op1
+
+ mov.l .L_21bit,r8
+
+ tst r1,r1
+ bt .L_norm_op1
+
+ ! Set the 21st bit.
+ or r8,r4
+ tst r3,r3
+
+ bt .L_norm_op2
+ or r8,r6
+
+! Check for negative mantissas. Make them positive by negation
+! r9 and r10 have signs of op1 and op2 respectively
+.L_neg_mant:
+ tst r9,r9
+ bf .L_neg_op1
+
+ tst r10,r10
+ bf .L_neg_op2
+
+.L_add_1:
+ cmp/ge r1,r3
+
+ mov r1,r0
+ bt .L_op2_exp_greater
+
+ sub r3,r0
+ ! If exponent difference is greater than 54, the resultant exponent
+ ! won't be changed. Return op1 straight away.
+ mov #54,r2
+ cmp/gt r2,r0
+
+ bt .L_pack_op1
+
+ mov r1,r3
+ clrt
+
+ cmp/eq #0,r0
+ bt .L_add_mant
+
+ ! Shift left the first operand and apply the rest of the shifts to the second operand.
+ mov #0,r2
+ shll r5
+
+ rotcl r4
+
+ add #-1,r3
+ dt r0
+
+ bt .L_add_mant
+ dt r0
+
+ bt LOCAL(got_guard)
+ dt r0
+
+ bt LOCAL(got_sticky)
+
+! Shift the mantissa part of op2 so that both exponents are equal
+.L_shfrac_op2:
+ shar r6
+ or r7,r2 ! sticky bit
+
+ rotcr r7
+ dt r0
+
+ bf .L_shfrac_op2
+
+ shlr r2
+
+ subc r2,r2 ! spread sticky bit across r2
+LOCAL(got_sticky):
+ shar r6
+
+ rotcr r7
+
+ rotcr r2
+LOCAL(got_guard):
+ shar r6
+
+ rotcr r7
+
+ rotcr r2
+
+
+! Add the positive mantissas and check for overflow by checking the
+! MSB of the resultant. In case of overflow, negate the result.
+.L_add_mant:
+ clrt
+ addc r7,r5
+
+ mov #0,r10 ! Assume resultant to be positive
+ addc r6,r4
+
+ cmp/pz r4
+
+ bt .L_mant_ptv
+ negc r2,r2
+
+ negc r5,r5
+
+ mov.l .L_sign,r10 ! The assumption was wrong, result is negative
+ negc r4,r4
+
+! 23rd bit in the high part of mantissa could be set.
+! In this case, right shift the mantissa.
+.L_mant_ptv:
+ mov.l .L_23bit,r0
+
+ tst r4,r0
+ bt .L_mant_ptv_0
+
+ shlr r4
+ rotcr r5
+
+ add #1,r3
+ bra .L_mant_ptv_1
+ rotcr r2
+
+.L_mant_ptv_0:
+ mov.l .L_22bit,r0
+ tst r4,r0
+
+ bt .L_norm_mant
+
+.L_mant_ptv_1:
+ ! The 22nd bit of the resultant mantissa is set. Shift the mantissa
+ ! right and add 1 to the exponent
+ add #1,r3
+ shlr r4
+ rotcr r5
+ ! The mantissa is already normalized. We don't need to
+ ! spend any effort. Branch to epilogue.
+ bra .L_epil
+ rotcr r2
+
+! Normalize operands
+.L_norm_op1:
+ shll r5
+
+ rotcl r4
+ add #-1,r1
+
+ tst r4,r8
+ bt .L_norm_op1
+
+ tst r3,r3
+ SL(bf, .L_neg_mant,
+ add #1,r1)
+
+.L_norm_op2:
+ shll r7
+
+ rotcl r6
+ add #-1,r3
+
+ tst r6,r8
+ bt .L_norm_op2
+
+ bra .L_neg_mant
+ add #1,r3
+
+! Negate the mantissa of op1
+.L_neg_op1:
+ clrt
+ negc r5,r5
+
+ negc r4,r4
+ tst r10,r10
+
+ bt .L_add_1
+
+! Negate the mantissa of op2
+.L_neg_op2:
+ clrt
+ negc r7,r7
+
+ bra .L_add_1
+ negc r6,r6
+
+! Thread the jump to .L_inv_exp_op1
+.L_thread_inv_exp_op1:
+ bra .L_inv_exp_op1
+ nop
+
+.L_ret_op2:
+ mov.l @r15+,r10
+#ifdef __LITTLE_ENDIAN__
+ mov r6,r1
+#else
+ mov r6,r0
+#endif
+
+ mov.l @r15+,r9
+#ifdef __LITTLE_ENDIAN__
+ mov r7,r0
+#else
+ mov r7,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+.L_op1_nzero:
+ tst r5,r5
+ bt .L_ret_op2
+
+ ! op1 is not zero. Check op2 for negative zero
+ cmp/eq r6,r0
+ bf .L_non_zero ! both op1 and op2 are not -0
+
+.L_op2_nzero:
+ tst r7,r7
+ bf .L_non_zero
+
+ mov.l @r15+,r10
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+#else
+ mov r4,r0 ! op2 is -0, return op1
+#endif
+
+ mov.l @r15+,r9
+#ifdef __LITTLE_ENDIAN__
+ mov r5,r0
+#else
+ mov r5,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+! High bit of op1 is known to be zero.
+! Check low bit. r2 contains 0x00000000
+.L_op1_zero:
+ tst r5,r5
+ bt .L_ret_op2
+
+ ! op1 is not zero. Check high bit of op2
+ tst r6,r6
+ bf .L_add ! both op1 and op2 are not zero
+
+! op1 is not zero. High bit of op2 is known to be zero.
+! Check low bit of op2. r2 contains 0x00000000
+.L_op2_zero:
+ tst r7,r7
+ bf .L_add
+
+ mov.l @r15+,r10
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+#else
+ mov r4,r0 ! op2 is zero, return op1
+#endif
+
+ mov.l @r15+,r9
+#ifdef __LITTLE_ENDIAN__
+ mov r5,r0
+#else
+ mov r5,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+! exp (op1) is smaller or equal to exp (op2)
+! The same operations are performed in .L_add; refer to it for
+! comments
+.L_op2_exp_greater:
+ mov r3,r0
+ sub r1,r0
+
+ mov #54,r2
+ cmp/gt r2,r0
+
+ bt .L_pack_op2
+
+ cmp/eq #0,r0
+ bt .L_add_mant
+
+ mov #0,r2
+ shll r7
+ rotcl r6
+ add #-1,r0
+ add #-1,r3
+
+ cmp/eq #0,r0
+ bt .L_add_mant
+.L_shfrac_op1:
+ add #-1,r0
+ shar r4
+
+ rotcr r5
+ rotcr r2
+
+ cmp/eq #0,r0
+ bf .L_shfrac_op1
+
+ bra .L_add_mant
+ nop
+
+! Return the value in op1
+.L_ret_op1:
+ mov.l @r15+,r10
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+#else
+ mov r4,r0
+#endif
+
+ mov.l @r15+,r9
+#ifdef __LITTLE_ENDIAN__
+ mov r5,r0
+#else
+ mov r5,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+! r1 has exp, r9 has sign, r4 and r5 mantissa
+.L_pack_op1:
+ mov.l .L_high_mant,r7
+ mov r4,r0
+
+ tst r9,r9
+ bt .L_pack_op1_1
+
+ clrt
+ negc r5,r5
+ negc r0,r0
+
+.L_pack_op1_1:
+ and r7,r0
+ mov r1,r3
+
+ mov #20,r2
+ mov r5,r1
+
+ mov.l @r15+,r10
+ or r9,r0
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r2,r3
+#else
+ SHLL20 (r3)
+#endif
+ mov.l @r15+,r9
+
+ or r3,r0
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+!r2 has exp, r10 has sign, r6 and r7 mantissa
+.L_pack_op2:
+ mov.l .L_high_mant,r9
+ mov r6,r0
+
+ tst r10,r10
+ bt .L_pack_op2_1
+
+ clrt
+ negc r7,r7
+ negc r0,r0
+
+.L_pack_op2_1:
+ and r9,r0
+ mov r7,r1
+
+ mov #20,r2
+ or r10,r0
+
+ mov.l @r15+,r10
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r2,r3
+#else
+ SHLL20 (r3)
+#endif
+
+ mov.l @r15+,r9
+
+ or r3,r0
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+! Normalize the mantissa by setting the 21st bit in its high part
+.L_norm_mant:
+ mov.l .L_21bit,r0
+
+ tst r4,r0
+ bf .L_epil
+
+ tst r4,r4
+ bf .L_shift_till_1
+
+ tst r5,r5
+ bf .L_shift_till_1
+
+ ! Mantissa is zero, return 0
+ mov.l @r15+,r10
+ mov #0,r0
+
+ mov.l @r15+,r9
+ mov.l @r15+,r8
+
+ rts
+ mov #0,r1
+
+! A loop for making the 21st bit 1 in the high part of the resultant
+! mantissa. It is already ensured that a 1 bit is present in the mantissa
+.L_shift_till_1:
+ clrt
+ shll r5
+
+ rotcl r4
+ add #-1,r3
+
+ tst r4,r0
+ bt .L_shift_till_1
+
+! Return the result. Mantissa is in r4-r5. Exponent is in r3
+! Sign bit in r10
+.L_epil:
+ cmp/pl r3
+
+ bf .L_denorm
+ mov.l LOCAL(x7fffffff),r0
+
+ mov r5,r1
+ shlr r1
+
+ mov #0,r1
+ addc r0,r2
+
+! Check extra MSB here
+ mov.l .L_22bit,r9
+ addc r1,r5 ! round to even
+
+ addc r1,r4
+ tst r9,r4
+
+ bf .L_epil_1
+
+.L_epil_0:
+ mov.l .L_21bit,r1
+
+ not r1,r1
+ and r1,r4
+
+ mov r4,r0
+ or r10,r0
+
+ mov.l @r15+,r10
+ mov #20,r2
+
+ mov.l @r15+,r9
+ mov r5,r1
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r2,r3
+#else
+ SHLL20 (r3)
+#endif
+ or r3,r0
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+.L_epil_1:
+ shlr r4
+ add #1,r3
+ bra .L_epil_0
+ rotcr r5
+
+.L_denorm:
+ add #-1,r3
+.L_denorm_1:
+ tst r3,r3
+ bt .L_denorm_2
+
+ shlr r4
+ rotcr r5
+
+ movt r1
+ bra .L_denorm_1
+ add #1,r3
+
+.L_denorm_2:
+ clrt
+ mov #0,r2
+ addc r1,r5
+
+ addc r2,r4
+ mov r4,r0
+
+ or r10,r0
+ mov.l @r15+,r10
+
+ mov r5,r1
+
+ mov.l @r15+,r9
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+! op1 is known to be positive infinity, and op2 is Inf. The sign
+! of op2 is not known. Return the appropriate value
+.L_op1_pinf_op2_inf:
+ mov.l .L_sign,r0
+ tst r6,r0
+
+ bt .L_ret_op2_1
+
+ ! op2 is negative infinity. Inf - Inf is being performed
+ mov.l .L_inf,r0
+ mov.l @r15+,r10
+ mov.l @r15+,r9
+ mov.l @r15+,r8
+ rts
+ mov #-1,DBLRH ! return NaN.
+
+.L_ret_op1_1:
+ mov.l @r15+,r10
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+#else
+ mov r4,r0
+#endif
+
+ mov.l @r15+,r9
+#ifdef __LITTLE_ENDIAN__
+ mov r5,r0
+#else
+ mov r5,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+.L_ret_op2_1:
+ mov.l @r15+,r10
+#ifdef __LITTLE_ENDIAN__
+ mov r6,r1
+#else
+ mov r6,r0
+#endif
+
+ mov.l @r15+,r9
+#ifdef __LITTLE_ENDIAN__
+ mov r7,r0
+#else
+ mov r7,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+! op1 is negative infinity. Check op2 for infinity or Nan
+.L_op1_ninf:
+ cmp/eq r2,r3
+ bf .L_ret_op1_1 ! op2 is neither Nan nor Inf
+
+ mov.l @r15+,r9
+ div0s r4,r6 ! different signs -> NaN
+ mov r4,DBLRH
+ or r6,DBLRH
+ mov.l @r15+,r8
+ SL(bf, 0f,
+ mov r5,DBLRL)
+ mov #-1,DBLRH ! return NaN.
+0: rts
+ or r7,DBLRL
+
+!r1 contains exponent for op1, r3 contains exponent for op2
+!r2 has .L_inf (+ve Inf)
+!op1 has invalid exponent. Either it contains Nan or Inf
+.L_inv_exp_op1:
+ ! Check if a is Nan
+ cmp/pl r5
+ bt .L_ret_op1_1
+
+ mov.l .L_high_mant,r0
+ and r4,r0
+
+ cmp/pl r0
+ bt .L_ret_op1_1
+
+ ! op1 is not Nan. It is infinity. Check the sign of it.
+ ! If op2 is Nan, return op2
+ cmp/pz r4
+
+ bf .L_op1_ninf
+
+ ! op2 is +ve infinity here
+ cmp/eq r2,r3
+ bf .L_ret_op1_1 ! op2 is neither Nan nor Inf
+
+ ! r2 is free now
+ mov.l .L_high_mant,r0
+ tst r6,r0 ! op2 also has invalid exponent
+
+ bf .L_ret_op2_1 ! branch if op2 is NaN
+
+ tst r7,r7
+ bt .L_op1_pinf_op2_inf ! op2 is Infinity, and op1 is +Infinity
+ !op2 is not infinity; it is NaN
+ bf .L_ret_op2_1
+
+ .align 2
+.L_high_mant:
+ .long 0x000FFFFF
+
+.L_21bits:
+ .long 0x001FFFFF
+
+.L_22bit:
+ .long 0x00200000
+
+.L_23bit:
+ .long 0x00400000
+
+.L_21bit:
+ .long 0x00100000
+
+.L_sign:
+ .long 0x80000000
+
+.L_inf:
+ .long 0x7ff00000
+
+LOCAL(x7fffffff): .long 0x7fffffff
+
+ENDFUNC (GLOBAL (subdf3))
+ENDFUNC (GLOBAL (adddf3))
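A note for reviewers (not part of the patch): the sticky-bit handling around `.L_shfrac_op2` / `LOCAL(got_sticky)` / `LOCAL(got_guard)` can be hard to follow in delay-slot-scheduled assembly. A C sketch of the alignment step, with an invented helper name, is:

```c
#include <stdint.h>

/* Align a mantissa right by `shift` bits before addition, keeping a
   guard bit (the last bit shifted out) and a sticky bit (the OR of all
   earlier discarded bits) so round-to-nearest-even stays possible. */
static uint64_t align_mantissa(uint64_t mant, int shift,
                               unsigned *guard, unsigned *sticky)
{
    *guard = 0;
    *sticky = 0;
    while (shift-- > 0) {
        *sticky |= *guard;            /* older discarded bits collapse into sticky */
        *guard = (unsigned)(mant & 1u);
        mant >>= 1;
    }
    return mant;
}
```

After the aligned add, the rounding decision is the usual one: round up when the guard bit is set and either the sticky bit or the result's LSB is set.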
Index: gcc/config/sh/IEEE-754/floatsisf.S
===================================================================
--- gcc/config/sh/IEEE-754/floatsisf.S (revision 0)
+++ gcc/config/sh/IEEE-754/floatsisf.S (revision 0)
@@ -0,0 +1,195 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Conversion of integer to floating point
+
+! Author: Rakesh Kumar
+
+! Argument: r4
+! Result: r0
+
+! r4 is referred to as op1
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (floatsisf)
+ FUNC (GLOBAL (floatsisf))
+
+GLOBAL (floatsisf):
+ mov.l .L_sign,r2
+ mov #23,r6
+
+ ! Check for zero
+ tst r4,r4
+ mov.l .L_24_bits,r7
+
+ ! Extract sign
+ and r4,r2
+ bt .L_ret
+
+ ! Is the number negative?
+ mov.l .L_imp_bit,r5
+ cmp/pl r4
+
+ not r7,r3
+ bf .L_neg
+
+ ! Decide the direction for shifting
+ cmp/gt r7,r4
+ mov r4,r0
+
+ and r5,r0
+ bt .L_shr_0
+
+ ! Number may already be in normalized form
+ cmp/eq #0,r0
+ bf .L_pack
+
+! Shift the bits to the left. Adjust the exponent
+.L_shl:
+ shll r4
+ mov r4,r0
+
+ and r5,r0
+ cmp/eq #0,r0
+
+ SL(bt, .L_shl,
+ add #-1,r6)
+
+! Pack the value in floating point format.
+! r6 has unbiased exponent, r4 has mantissa, r2 has sign
+.L_pack:
+ mov #23,r3
+ not r5,r5
+
+ mov r2,r0
+ add #127,r6
+
+ and r5,r4
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLL23 (r6)
+#else
+ shld r3,r6
+#endif
+
+ or r6,r0
+ rts
+ or r4,r0
+
+! Negate the number
+.L_neg:
+ ! Take care of -2147483648: its negation overflows.
+ mov r4,r0
+ shll r0
+
+ cmp/eq #0,r0
+ SL(bt, .L_ret_min,
+ neg r4,r4)
+
+ cmp/gt r7,r4
+ bt .L_shr_0
+
+ mov r4,r0
+ and r5,r0
+
+ cmp/eq #0,r0
+ bf .L_pack
+ bt .L_shl
+
+.L_shr_0:
+ mov #0,r1
+
+! Shift right the number with rounding
+.L_shr:
+ shlr r4
+ movt r7
+
+ tst r7,r7
+
+ ! Count number of ON bits shifted
+ bt .L_shr_1
+ add #1,r1
+
+.L_shr_1:
+ mov r4,r0
+ add #1,r6
+
+ and r3,r0
+ cmp/eq #0,r0
+
+ ! Add MSB of shifted bits
+ bf .L_shr
+ add r7,r4
+
+ tst r7,r7
+ bt .L_pack
+
+.L_pack1:
+ mov #1,r0
+ cmp/eq r1,r0
+
+ bt .L_rnd
+ mov r4,r0
+
+ ! Rounding may have misplaced MSB. Adjust.
+ and r3,r0
+ cmp/eq #0,r0
+
+ bf .L_shr
+ bt .L_pack
+
+! If only MSB of shifted bits is ON, we are halfway
+! between two numbers. Round towards even LSB of
+! resultant mantissa.
+.L_rnd:
+ shlr r4
+ bra .L_pack
+ shll r4
+
+.L_ret:
+ rts
+ mov r4,r0
+
+! Return value for -2147483648
+.L_ret_min:
+ mov.l .L_min_val,r0
+ rts
+ nop
+
+ .align 2
+.L_sign:
+ .long 0x80000000
+
+.L_imp_bit:
+ .long 0x00800000
+
+.L_24_bits:
+ .long 0x00FFFFFF
+
+.L_nsign:
+ .long 0x7FFFFFFF
+
+.L_min_val:
+ .long 0xCF000000
+
+ENDFUNC (GLOBAL (floatsisf))
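For comparison with the unsigned routine, a C model of the signed conversion above (not part of the patch; the helper name is invented). Negating the magnitude as an unsigned value makes INT32_MIN fall out of the normal path with the same bits the routine returns from `.L_ret_min` (0xCF000000):

```c
#include <stdint.h>

/* Reference model of __floatsisf: convert a signed 32-bit integer to
   IEEE-754 single-precision bits, rounding to nearest, ties to even. */
static uint32_t s32_to_sf_bits(int32_t v)
{
    if (v == 0)
        return 0;
    uint32_t sign = v < 0 ? 0x80000000u : 0u;
    uint32_t x = (uint32_t)v;
    if (v < 0)
        x = 0u - x;                 /* safe even for INT32_MIN */
    int exp = 127 + 23;
    uint32_t guard = 0, sticky = 0;
    while (x > 0x00FFFFFFu) {       /* shift right with guard/sticky */
        sticky |= guard;
        guard = x & 1u;
        x >>= 1;
        exp++;
    }
    while (x < 0x00800000u) {       /* shift left to normalize */
        x <<= 1;
        exp--;
    }
    if (guard && (sticky || (x & 1u)))
        x++;                        /* round to nearest, ties to even */
    if (x == 0x01000000u) {
        x >>= 1;
        exp++;
    }
    return sign | ((uint32_t)exp << 23) | (x & 0x007FFFFFu);
}
```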
Index: gcc/config/sh/IEEE-754/muldf3.S
===================================================================
--- gcc/config/sh/IEEE-754/muldf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/muldf3.S (revision 0)
@@ -0,0 +1,596 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!multiplication of two double precision floating point numbers
+!Author:Aanchal Khanna
+!SH1 Support / Simplifications: Joern Rennecke
+!
+!Entry:
+!r4,r5:operand 1
+!
+!r6,r7:operand 2
+!
+!Exit:
+!r0,r1:result
+!
+!Notes: argument 1 is passed in regs r4 and r5 and argument 2 in regs
+!r6 and r7; the result is returned in regs r0 and r1. Operand 1 is
+!referred to as op1 and operand 2 as op2.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+ .text
+ .align 5
+ .global GLOBAL (muldf3)
+ FUNC (GLOBAL (muldf3))
+
+GLOBAL (muldf3):
+
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+ mov r5,r4
+ mov r1,r5
+
+ mov r6,r1
+ mov r7,r6
+ mov r1,r7
+#endif
+ mov.l .L_mask_sign,r0
+ mov r4,r2
+
+ and r0,r2
+ mov #0,r1
+
+ shll r4
+ and r6,r0
+
+ xor r2,r0 !r0 contains the result's sign bit
+ shlr r4
+
+ mov.l .L_inf,r2
+ shll r6
+
+ mov r4,r3
+ shlr r6
+
+.L_chk_a_inv:
+ !chk if op1 is Inf/NaN
+ and r2,r3
+ mov.l r8,@-r15
+
+ cmp/eq r3,r2
+ mov.l .L_mask_high_mant,r8
+
+ mov r2,r3
+ bf .L_chk_b_inv
+
+ mov r8,r3
+ and r4,r8
+
+ cmp/hi r1,r8
+ bt .L_return_a !op1 NaN, return op1
+
+ cmp/hi r1,r5
+ mov r2,r8
+
+ bt .L_return_a !op1 NaN, return op1
+ and r6,r8
+
+ cmp/eq r8,r2
+ and r6,r3
+
+ bt .L_b_inv
+ cmp/eq r1,r6
+
+ bf .L_return_a !op1=Inf, op2 a normal number: return op1
+ cmp/eq r1,r7
+
+ bf .L_return_a !op1=Inf, op2 a normal number: return op1
+ mov.l @r15+,r8
+
+ rts
+ mov #-1,DBLRH !op1=Inf, op2=0,return nan
+
+.L_b_inv:
+ !op2 is NaN/Inf
+ cmp/hi r1,r7
+ mov r1,r2
+
+ mov r5,r1
+ bt .L_return_b !op2=NaN,return op2
+
+ cmp/hi r2,r6
+ or r4,r0
+
+ bt .L_return_b !op2=NaN,return op2
+ mov.l @r15+,r8
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+
+ rts !op1=Inf,op2=Inf,return Inf with sign
+ nop
+
+.L_chk_b_inv:
+ !Chk if op2 is NaN/Inf
+ and r6,r2
+ cmp/eq r3,r2
+
+ bf .L_chk_a_for_zero
+ and r6,r8
+
+ cmp/hi r1,r8
+ bt .L_return_b !op2=NaN,return op2
+
+ cmp/hi r1,r7
+ bt .L_return_b !op2=NaN,return op2
+
+ cmp/eq r5,r1
+ bf .L_return_b !op1=normal number,op2=Inf,return Inf
+
+ mov r7,r1
+ cmp/eq r4,r1
+
+ bf .L_return_b !op1=normal number, op2=Inf, return Inf
+ mov.l @r15+,r8
+
+ rts
+ mov #-1,DBLRH !op1=0,op2=Inf,return NaN
+
+.L_return_a:
+ mov r5,r1
+ or r4,r0
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+.L_return_b:
+ mov r7,r1
+ or r6,r0
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+.L_chk_a_for_zero:
+ !Chk if op1 is zero
+ cmp/eq r1,r4
+ bf .L_chk_b_for_zero
+
+ cmp/eq r1,r5
+ bf .L_chk_b_for_zero
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+.L_chk_b_for_zero:
+ !op1=0,chk if op2 is zero
+ cmp/eq r1,r6
+ mov r1,r3
+
+ mov.l .L_inf,r1
+ bf .L_normal_nos
+
+ cmp/eq r3,r7
+ bf .L_normal_nos
+
+ mov r3,r1
+ mov.l @r15+,r8
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ nop
+
+.L_normal_nos:
+ !op1 and op2 are normal nos
+ mov.l r9,@-r15
+ mov r4,r3
+
+ mov #-20,r9
+ and r1,r3
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r9,r2
+#else
+ SHLR20 (r2)
+#endif
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r9,r3
+#else
+ SHLR20 (r3)
+#endif
+ cmp/pl r3
+
+ bf .L_norm_a !normalize op1
+.L_chk_b:
+ cmp/pl r2
+ bf .L_norm_b !normalize op2
+
+.L_mul1:
+ add r3,r2
+ mov.l .L_1023,r1
+
+ !resultant exponent in r2
+ add r1,r2
+ mov.l .L_2047,r1
+
+ !Chk the exponent for overflow
+ cmp/ge r1,r2
+ and r8,r4
+
+ bt .L_return_inf
+ mov.l .L_imp_bit,r1
+
+ or r1,r4
+ and r8,r6
+
+ or r1,r6
+ clrt
+
+ !multiplying the mantissas
+ DMULU_SAVE
+ DMULUL (r7,r5,r1) !bits 0-31 of product
+
+ DMULUH (r3)
+
+ DMULUL (r4,r7,r8)
+
+ addc r3,r8
+
+ DMULUH (r3)
+
+ movt r9
+ clrt
+
+ DMULUL (r5,r6,r7)
+
+ addc r7,r8 !bits 63-32 of product
+
+ movt r7
+ add r7,r9
+
+ DMULUH (r7)
+
+ add r7,r3
+
+ add r9,r3
+ clrt
+
+ DMULUL (r4,r6,r7)
+
+ addc r7,r3 !bits 64-95 of product
+
+ DMULUH (r7)
+ DMULU_RESTORE
+
+ mov #0,r5
+ addc r5,r7 !bits 96-105 of product
+
+ cmp/eq r5,r1
+ mov #1,r4
+
+ bt .L_skip
+ or r4,r8
+.L_skip:
+ mov.l .L_106_bit,r4
+ mov r8,r9
+
+.L_chk_extra_msb:
+ !chk if an extra MSB is generated
+ and r7,r4
+ cmp/eq r5,r4
+
+ mov #12,r4
+ SL(bf, .L_shift_rt_by_1,
+ mov #31,r5)
+
+.L_pack_mantissa:
+ !scale the mantissa to 53 bits
+ mov #-19,r6
+ mov.l .L_mask_high_mant,r5
+
+ SHLRN (19, r6, r8)
+
+ and r3,r5
+
+ shlr r8
+ movt r1
+
+ SHLLN (12, r4, r5)
+
+ add #-1,r6
+
+ or r5,r8 !lower bits of resulting mantissa
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r6,r3
+#else
+ SHLR20 (r3)
+#endif
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r4,r7
+#else
+ SHLL12 (r7)
+#endif
+ clrt
+
+ or r7,r3 !higher bits of resulting mantissa
+ mov #0,r7
+
+ !chk the exponent for underflow
+ cmp/ge r2,r7
+ bt .L_underflow
+
+ addc r1,r8 !rounding
+ mov r8,r1
+
+ addc r7,r3 !rounding
+ mov.l .L_mask_22_bit,r5
+
+ and r3,r5
+ !chk if extra msb is generated after rounding
+ cmp/eq r7,r5
+
+ mov.l .L_mask_high_mant,r8
+ bt .L_pack_result
+
+ add #1,r2
+ mov.l .L_2047,r6
+
+ cmp/ge r6,r2
+
+ bt .L_return_inf
+ shlr r3
+
+ rotcr r1
+
+.L_pack_result:
+ !pack the result, r2=exponent, r3=higher mantissa, r1=lower mantissa
+ !r0=sign bit
+ mov #20,r6
+ and r8,r3
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r6,r2
+#else
+ SHLL20 (r2)
+#endif
+ or r3,r0
+
+ or r2,r0
+ mov.l @r15+,r9
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+.L_norm_a:
+ !normalize op1
+ shll r5
+ mov.l .L_imp_bit,r1
+
+ rotcl r4
+ add #-1,r3
+
+ tst r1,r4
+ bt .L_norm_a
+
+ bra .L_chk_b
+ add #1,r3
+
+.L_norm_b:
+ !normalize op2
+ shll r7
+ mov.l .L_imp_bit,r1
+
+ rotcl r6
+ add #-1,r2
+
+ tst r1,r6
+ bt .L_norm_b
+
+ bra .L_mul1
+ add #1,r2
+
+.L_shift_rt_by_1:
+ !adjust the extra msb
+
+ add #1,r2 !add 1 to exponent
+ mov.l .L_2047,r6
+
+ cmp/ge r6,r2
+ mov #20,r6
+
+ bt .L_return_inf
+ shlr r7 !r7 contains bit 96-105 of product
+
+ rotcr r3 !r3 contains bit 64-95 of product
+
+ rotcr r8 !r8 contains bit 32-63 of product
+ bra .L_pack_mantissa
+
+ rotcr r1 !r1 contains bit 31-0 of product
+
+.L_return_inf:
+ !return Inf
+ mov.l .L_inf,r2
+ mov #0,r1
+
+ or r2,r0
+ mov.l @r15+,r9
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+.L_underflow:
+ !check if the result needs to be denormalized
+ mov #-53,r1
+ add #1,r2
+
+ cmp/gt r2,r1
+ mov #32,r4
+
+ add #-2,r2
+ bt .L_return_zero
+
+ add r2,r4
+ mov r7,r1
+
+ cmp/ge r7,r4
+ mov r2,r6
+
+ mov #-54,r2
+ bt .L_denorm
+
+ mov #-32,r6
+
+.L_denorm:
+ !denormalize the result
+ shlr r8
+ rotcr r1
+
+ shll r8
+ add #1,r6
+
+ shlr r3
+ rotcr r8
+
+ cmp/eq r7,r6
+ bf .L_denorm
+
+ mov r4,r6
+ cmp/eq r2,r4
+
+ bt .L_break
+ mov r7,r5
+
+ cmp/gt r6,r7
+ bf .L_break
+
+ mov r2,r4
+ mov r1,r5
+
+ mov r7,r1
+ bt .L_denorm
+
+.L_break:
+ mov #0,r2
+
+ cmp/gt r1,r2
+
+ addc r2,r8
+ mov.l .L_comp_1,r4
+
+ addc r7,r3
+ or r3,r0
+
+ cmp/eq r9,r7
+ bf .L_return
+
+ cmp/eq r7,r5
+ mov.l .L_mask_sign,r6
+
+ bf .L_return
+ cmp/eq r1,r6
+
+ bf .L_return
+ and r4,r8
+
+.L_return:
+ mov.l @r15+,r9
+ mov r8,r1
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ mov.l @r15+,r8
+
+.L_return_zero:
+ mov.l @r15+,r9
+ mov r7,r1
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+
+ rts
+ mov.l @r15+,r8
+
+ .align 2
+
+.L_mask_high_mant:
+ .long 0x000fffff
+.L_inf:
+ .long 0x7ff00000
+.L_mask_sign:
+ .long 0x80000000
+.L_1023:
+ .long -1023
+.L_2047:
+ .long 2047
+.L_imp_bit:
+ .long 0x00100000
+.L_mask_22_bit:
+ .long 0x00200000
+.L_106_bit:
+ .long 0x00000200
+.L_comp_1:
+ .long 0xfffffffe
+
+ENDFUNC (GLOBAL (muldf3))
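Not part of the patch: the DMULUL/DMULUH sequence above composes the 53x53-bit mantissa product from four 32x32->64 partial products with carry propagation. The same scheme in C, with an invented helper name, looks like this:

```c
#include <stdint.h>

/* Full 64x64 -> 128-bit multiply built from four 32x32 -> 64 partial
   products, as the dmulu.l-based sequence in muldf3 does in assembly. */
static void mul64to128(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    uint64_t a0 = (uint32_t)a, a1 = a >> 32;
    uint64_t b0 = (uint32_t)b, b1 = b >> 32;

    uint64_t p00 = a0 * b0;   /* low  x low  */
    uint64_t p01 = a0 * b1;   /* low  x high */
    uint64_t p10 = a1 * b0;   /* high x low  */
    uint64_t p11 = a1 * b1;   /* high x high */

    /* Sum the middle column; mid may carry into the high word. */
    uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;

    *lo = (mid << 32) | (uint32_t)p00;
    *hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}
```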
Index: gcc/config/sh/IEEE-754/fixdfsi.S
===================================================================
--- gcc/config/sh/IEEE-754/fixdfsi.S (revision 0)
+++ gcc/config/sh/IEEE-754/fixdfsi.S (revision 0)
@@ -0,0 +1,195 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!conversion of double precision floating point number to signed integer
+!Author:Aanchal Khanna
+!
+!Entry:
+!r4,r5:operand
+!
+!Exit:
+!r0:result
+!
+!Note: the argument is passed in regs r4 and r5; the result is returned
+!in reg r0.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (fixdfsi)
+ FUNC (GLOBAL (fixdfsi))
+
+GLOBAL (fixdfsi):
+
+#ifdef __LITTLE_ENDIAN__
+ mov r4,r1
+ mov r5,r4
+ mov r1,r5
+
+#endif
+ mov.l .L_p_inf,r2
+ mov #-20,r1
+
+ mov r2,r7
+ mov.l .L_1023,r3
+
+ and r4,r2
+ shll r4
+
+ movt r6 ! r6 contains the sign bit
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r1,r2 ! r2 contains the exponent
+#else
+ SHLR20 (r2)
+#endif
+ shlr r4
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r1,r7
+#else
+ SHLR20 (r7)
+#endif
+ cmp/hi r2,r3 ! if exp < 1023,return 0
+ mov.l .L_mask_high_mant,r1
+
+ SL(bt, .L_epil,
+ mov #0,r0)
+ and r4,r1 ! r1 contains high mantissa
+
+ cmp/eq r2,r7 ! chk if exp is invalid
+ mov.l .L_1053,r7
+
+ bt .L_inv_exp
+ mov #11,r0
+
+ cmp/hi r7,r2 ! If exp > 1053,return maxint
+ sub r2,r7
+
+ mov.l .L_21bit,r2
+ SL(bt, .L_ret_max,
+ add #1,r7) ! r7 contains the number of shifts
+
+ or r2,r1
+ mov r7,r3
+ shll8 r1
+
+ neg r7,r7
+ shll2 r1
+
+ shll r1
+ cmp/hi r3,r0
+
+ !chk if the result can be made only from higher mantissa
+ SL(bt, .L_lower_mantissa,
+ mov #21,r0)
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r7,r1
+#else
+.L_loop:
+ tst r7,r7
+ bt .L_break1
+ add #1,r7
+ bra .L_loop
+ shlr r1
+
+.L_break1:
+#endif
+ tst r6,r6
+ SL(bt, .L_epil,
+ mov r1,r0)
+
+ rts
+ neg r0,r0
+
+.L_lower_mantissa:
+ !the result is formed from the lower mantissa as well
+ neg r0,r0
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r0,r5
+#else
+ SHLR21 (r5)
+#endif
+
+ or r5,r1 !pack lower and higher mantissas
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r7,r1
+#else
+.L_sh_loop:
+ tst r7,r7
+ bt .L_break
+ add #1,r7
+ bra .L_sh_loop
+ shlr r1
+
+.L_break:
+#endif
+ mov r1,r0
+ bra .L_chk_sign
+ nop
+
+.L_epil:
+ rts
+ nop
+
+.L_inv_exp:
+ cmp/hi r0,r5
+ bt .L_epil
+
+ cmp/hi r0,r1 !compare high mantissa,r1
+ bt .L_epil
+
+.L_ret_max:
+ mov.l .L_maxint,r0
+ tst r6,r6
+ bt .L_epil
+
+ rts
+ add #1,r0
+
+.L_chk_sign:
+ tst r6,r6 !if the sign bit is set, the number is negative
+ bt .L_epil
+
+ rts
+ neg r0,r0
+
+ .align 2
+
+.L_maxint:
+ .long 0x7fffffff
+.L_p_inf:
+ .long 0x7ff00000
+.L_mask_high_mant:
+ .long 0x000fffff
+.L_1023:
+ .long 0x000003ff
+.L_1053:
+ .long 1053
+.L_21bit:
+ .long 0x00100000
+
+ENDFUNC (GLOBAL (fixdfsi))
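For reference, the conversion logic of the routine above can be sketched in Python (an illustrative sketch only, not part of the patch; it mirrors the code paths above: values below 1.0 truncate to 0, NaN returns 0, and Inf or an exponent above 1053 saturates):

```python
import struct

def fixdfsi(x):
    """Truncate a double to a 32-bit signed int, mirroring the shift-based
    paths of the assembly routine (illustrative sketch)."""
    bits = struct.unpack('>Q', struct.pack('>d', x))[0]
    sign = bits >> 63
    exp = (bits >> 52) & 0x7FF
    mant = bits & ((1 << 52) - 1)
    if exp < 1023:                  # |x| < 1.0 truncates to zero
        return 0
    if exp == 0x7FF:                # exponent all ones: Inf or NaN
        if mant:
            return 0                # NaN falls out as 0 on this path
        return -0x80000000 if sign else 0x7FFFFFFF
    if exp > 1053:                  # magnitude too large: saturate
        return -0x80000000 if sign else 0x7FFFFFFF
    val = (mant | (1 << 52)) >> (1075 - exp)   # 1075 = bias 1023 + 52
    return -val if sign else val
```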
Index: gcc/config/sh/IEEE-754/divsf3.S
===================================================================
--- gcc/config/sh/IEEE-754/divsf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/divsf3.S (revision 0)
@@ -0,0 +1,393 @@
+/* Copyright (C) 2004, 2006, 2010 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!divides two single-precision floating point numbers
+
+! Author: Aanchal Khanna
+
+! Arguments: Dividend is in r4, divisor in r5
+! Result: r0
+
+! r4 and r5 are referred to as op1 and op2, respectively.
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (divsf3)
+ FUNC (GLOBAL (divsf3))
+
+GLOBAL (divsf3):
+ mov.l .L_mask_sign,r1
+ mov r4,r3
+
+ xor r5,r3
+ shll r4
+
+ shlr r4
+ mov.l .L_inf,r2
+
+ and r3,r1 !r1=resultant sign
+ mov r4,r6
+
+ shll r5
+ mov #0,r0
+
+ shlr r5
+ and r2,r6
+
+ cmp/eq r2,r6
+ mov r5,r7
+
+ and r2,r7
+ bt .L_op1_inv
+
+ cmp/eq r2,r7
+ mov #-23,r3
+
+ bt .L_op2_inv
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLR23 (r6)
+ SHLR23 (r7)
+#else
+ shld r3,r6
+ shld r3,r7
+#endif
+
+ cmp/eq r0,r4
+
+ bt .L_op1_zero !dividend=0
+ cmp/eq r0,r6
+
+ mov.l .L_imp_bit,r3
+ bt .L_norm_op1 !normalize dividend
+.L_chk_op2:
+ cmp/eq r0,r5
+ bt .L_op2_zero !divisor=0
+
+ cmp/eq r0,r7
+ bt .L_norm_op2 !normalize divisor
+
+.L_div1:
+ sub r7,r6
+ add #127,r6 !r6=resultant exponent
+
+ mov r3,r7
+ mov.l .L_mask_mant,r3
+
+ and r3,r4
+ !chk exponent for overflow
+ mov.l .L_255,r2
+
+ and r3,r5
+ or r7,r4
+
+ cmp/ge r2,r6
+ or r7,r5
+
+ bt .L_return_inf
+ mov r0,r2
+
+ cmp/eq r4,r5
+ bf .L_den_one
+
+ cmp/ge r6,r0
+ !numerator=denominator, quotient=1, remainder=0
+ mov r7,r2
+
+ mov r0,r4
+ !chk exponent for underflow
+ bt .L_underflow
+ bra .L_pack
+ nop
+
+.L_den_one:
+ !denominator=1, result=numerator
+
+ cmp/eq r7,r5
+ bf .L_divide
+
+ !chk exponent for underflow
+ cmp/ge r6,r0
+ mov r4,r2
+
+ SL(bt, .L_underflow,
+ mov r0,r4)
+ bra .L_pack
+ nop
+
+.L_divide:
+ !dividing the mantissas r4<-dividend, r5<-divisor
+
+ cmp/hi r4,r5
+ bf .L_loop
+
+ shll r4 ! if mantissa(op1)< mantissa(op2)
+ add #-1,r6 ! shift left the numerator and decrease the exponent.
+
+.L_loop:
+ !division loop
+
+ cmp/ge r5,r4
+ bf .L_skip
+
+ or r7,r2
+ sub r5,r4
+
+.L_skip:
+ shlr r7
+ shll r4
+
+ cmp/eq r0,r7
+ bf .L_loop
+
+ !chk the exponent for underflow
+ cmp/ge r6,r0
+ bt .L_underflow
+
+ !apply rounding
+ cmp/gt r5,r4
+ bt .L_round1
+
+ cmp/eq r4,r5
+ bt .L_round2
+
+.L_pack:
+ !pack the result, r1=sign, r2=quotient, r6=exponent
+
+ mov #23,r4
+ and r3,r2
+
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLL23 (r6)
+#else
+ shld r4,r6
+#endif
+ or r2,r1
+
+ or r6,r1
+ mov r1,r0
+
+ rts
+ nop
+
+.L_round1:
+ !Apply proper rounding
+
+ bra .L_pack
+ add #1,r2
+
+.L_round2:
+ !Apply proper rounding
+
+ mov.l .L_comp_1,r5
+ bra .L_pack
+ and r5,r2
+
+.L_op1_inv:
+ !chk if op1 is Inf or NaN
+
+ mov.l .L_mask_mant,r3
+ mov r4,r6
+
+ and r3,r6
+ cmp/hi r0,r6
+
+ bt .L_ret_NaN ! op1 is NaN, return NaN.
+ cmp/eq r2,r7
+
+ SL(bf, .L_return,
+ mov r4,r0) ! Inf/finite, return Inf
+
+ ! Inf/Inf or Inf/NaN, return NaN
+.L_ret_NaN:
+ rts
+ mov #-1,r0
+
+.L_op2_inv:
+ !chk if op2 is Inf or NaN
+
+ mov.l .L_mask_mant,r3
+ mov r5,r7
+
+ and r3,r7
+ cmp/hi r0,r7
+
+ bt .L_ret_op2
+ mov r1,r0
+
+ rts
+ nop
+
+.L_op1_zero:
+ !op1 is zero. If op2 is zero, return NaN, else return zero
+
+ cmp/eq r0,r5
+
+ bf .L_return
+
+ rts
+ mov #-1,r0
+
+.L_op2_zero:
+ !op2 is zero, return Inf
+
+ rts
+ or r2,r0
+
+.L_return_inf:
+ mov.l .L_inf,r0
+
+ rts
+ or r1,r0
+
+.L_norm_op1:
+ !normalize dividend
+
+ shll r4
+ tst r2,r4
+
+ add #-1,r6
+ bt .L_norm_op1
+
+ bra .L_chk_op2
+ add #1,r6
+
+.L_norm_op2:
+ !normalize divisor
+
+ shll r5
+ tst r2,r5
+
+ add #-1,r7
+ bt .L_norm_op2
+
+ bra .L_div1
+ add #1,r7
+
+.L_underflow:
+ !denormalize the result
+
+ add #1,r6
+ mov #-24,r7
+
+ cmp/gt r6,r7
+ mov r2,r5
+
+ bt .L_return_zero
+ add #-1,r6
+
+ mov #32,r3
+ neg r6,r7
+
+ add #1,r7
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r6,r2
+#else
+ cmp/ge r0,r6
+ bf .L_mov_right
+
+.L_mov_left:
+ cmp/eq r0,r6
+ bt .L_out
+
+ shll r2
+ bra .L_mov_left
+ add #-1,r6
+
+.L_mov_right:
+ cmp/eq r0,r6
+ bt .L_out
+
+ add #1,r6
+ bra .L_mov_right
+ shlr r2
+
+.L_out:
+#endif
+ sub r7,r3
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r3,r5
+#else
+ cmp/ge r0,r3
+ bf .L_mov_right_1
+
+.L_mov_left_1:
+ shll r5
+ add #-1,r3
+
+ cmp/eq r0,r3
+ bf .L_mov_left_1
+
+ bt .L_out_1
+
+.L_mov_right_1:
+ cmp/eq r0,r3
+ bt .L_out_1
+
+ add #1,r3
+ bra .L_mov_right_1
+ shlr r5
+
+.L_out_1:
+#endif
+ shlr r2
+ addc r0,r2
+
+ cmp/eq r4,r0 !r4 contains the remainder
+ mov r2,r0
+
+ mov.l .L_mask_sign,r7
+ bf .L_return
+
+ mov.l .L_comp_1,r2
+ cmp/eq r7,r5
+
+ bf .L_return
+ and r2,r0
+
+.L_return:
+.L_return_zero:
+ rts
+ or r1,r0
+
+.L_ret_op2:
+ rts
+ or r5,r0
+
+
+ .align 2
+.L_inf:
+ .long 0x7f800000
+.L_mask_sign:
+ .long 0x80000000
+.L_mask_mant:
+ .long 0x007fffff
+.L_imp_bit:
+ .long 0x00800000
+.L_comp_1:
+ .long 0xfffffffe
+.L_255:
+ .long 255
+
+ENDFUNC (GLOBAL (divsf3))
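The normal-number path of the divider above can be sketched in Python (illustrative only, not part of the patch): a restoring division produces one quotient bit per step, as the `.L_loop` above does, and ties are resolved by clearing the LSB, as `.L_round2` does. Denormal and exceptional operands are omitted here.

```python
import struct

def f2b(x):
    return struct.unpack('>I', struct.pack('>f', x))[0]

def b2f(b):
    return struct.unpack('>f', struct.pack('>I', b))[0]

def divsf3(a, b):
    """Restoring division of the 24-bit mantissas (illustrative sketch)."""
    ab, bb = f2b(a), f2b(b)
    sign = (ab ^ bb) & 0x80000000
    exp = ((ab >> 23) & 0xFF) - ((bb >> 23) & 0xFF) + 127
    ma = (ab & 0x7FFFFF) | 0x800000
    mb = (bb & 0x7FFFFF) | 0x800000
    if mb > ma:            # quotient below 1: shift dividend, drop exponent
        ma <<= 1
        exp -= 1
    q, bit = 0, 0x800000   # generate 24 quotient bits, MSB first
    while bit:
        if ma >= mb:
            q |= bit
            ma -= mb
        ma <<= 1
        bit >>= 1
    if ma > mb:            # remainder above half an ulp: round up
        q += 1
    elif ma == mb:         # exactly halfway: clear the LSB
        q &= ~1
    return b2f(sign | (exp << 23) | (q & 0x7FFFFF))
```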
Index: gcc/config/sh/IEEE-754/fixunssfsi.S
===================================================================
--- gcc/config/sh/IEEE-754/fixunssfsi.S (revision 0)
+++ gcc/config/sh/IEEE-754/fixunssfsi.S (revision 0)
@@ -0,0 +1,150 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Conversion from floating point to unsigned integer
+
+! Author: Rakesh Kumar
+
+! Argument: r4 (in floating point format)
+! Result: r0
+
+! For negative floating point numbers, it returns zero
+
+! The argument is referred to as op1
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (fixunssfsi)
+ FUNC (GLOBAL (fixunssfsi))
+
+GLOBAL (fixunssfsi):
+ mov.l .L_sign,r0
+ mov r4,r2
+
+ ! Check for NaN
+ mov.l .L_inf,r1
+ and r4,r0
+
+ mov.l .L_mask_sign,r7
+ mov #127,r5
+
+ ! Remove sign bit
+ cmp/eq #0,r0
+ and r7,r2
+
+ ! If the number is negative, return 0.
+ ! libgcc deviates from the standard in this regard.
+ mov r4,r3
+ SL(bf, .L_epil,
+ mov #0,r0)
+
+ mov.l .L_frac,r6
+ cmp/gt r1,r2
+
+ shll r2
+ SL1(bt, .L_epil,
+ shlr16 r2)
+
+ shlr8 r2 ! r2 has exponent
+ mov.l .L_24bit,r1
+
+ and r6,r3 ! r3 has fraction
+ cmp/gt r2,r5
+
+ ! If exponent is less than 127, return 0
+ or r1,r3
+ bt .L_epil
+
+ ! Process only if exponent is less than 158
+ mov.l .L_158,r1
+ shll8 r3
+
+ cmp/gt r1,r2
+ sub r2,r1
+
+ neg r1,r1
+ bt .L_ret_max
+
+! Shift the mantissa by the exponent's difference from 158
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r1,r3
+#else
+ cmp/gt r0,r1
+ bt .L_mov_left
+
+.L_mov_right:
+ cmp/eq r1,r0
+ bt .L_ret
+
+ add #1,r1
+ bra .L_mov_right
+ shlr r3
+
+.L_mov_left:
+ add #-1,r1
+
+ shll r3
+ cmp/eq r1,r0
+
+ bf .L_mov_left
+
+.L_ret:
+#endif
+ rts
+ mov r3,r0
+
+! r0 already has appropriate value
+.L_epil:
+ rts
+ nop
+
+! Return the maximum unsigned integer value
+.L_ret_max:
+ mov.l .L_max,r3
+
+ rts
+ mov r3,r0
+
+ .align 2
+.L_inf:
+ .long 0x7F800000
+
+.L_158:
+ .long 158
+
+.L_max:
+ .long 0xFFFFFFFF
+
+.L_frac:
+ .long 0x007FFFFF
+
+.L_sign:
+ .long 0x80000000
+
+.L_24bit:
+ .long 0x00800000
+
+.L_mask_sign:
+ .long 0x7FFFFFFF
+
+ENDFUNC (GLOBAL (fixunssfsi))
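The paths of the routine above can be sketched in Python (illustrative only, not part of the patch): negative inputs and NaN return 0, values below 1.0 truncate to 0, and values that do not fit (including +Inf) saturate to the maximum unsigned value.

```python
import struct

def fixunssfsi(x):
    """Float to unsigned 32-bit conversion (illustrative sketch)."""
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    if bits & 0x80000000:        # negative: this libgcc variant returns 0
        return 0
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    if exp == 0xFF and frac:     # NaN
        return 0
    if exp < 127:                # |x| < 1.0
        return 0
    if exp > 158:                # does not fit (includes +Inf)
        return 0xFFFFFFFF
    mant = (frac | 0x800000) << 8        # left-justify the 24-bit mantissa
    return (mant >> (158 - exp)) & 0xFFFFFFFF
```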
Index: gcc/config/sh/IEEE-754/floatunssidf.S
===================================================================
--- gcc/config/sh/IEEE-754/floatunssidf.S (revision 0)
+++ gcc/config/sh/IEEE-754/floatunssidf.S (revision 0)
@@ -0,0 +1,71 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!conversion of unsigned integer to double precision floating point number
+!Author: Rakesh Kumar
+!Rewritten for SH1 support: Joern Rennecke
+!
+!Entry:
+!r4:operand
+!
+!Exit:
+!r0,r1:result
+!
+!Note: the argument is passed in reg r4 and the result is returned in
+!regs r0 and r1.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (floatunsidf)
+ FUNC (GLOBAL (floatunsidf))
+
+GLOBAL (floatunsidf):
+ mov.w LOCAL(x41f0),DBLRH ! bias + 32
+ tst r4,r4 ! check for zero
+ bt .L_ret_zero
+.L_loop:
+ shll r4
+ SL(bf, .L_loop,
+ add #-16,DBLRH)
+
+ mov r4,DBLRL
+
+ SHLL20 (DBLRL)
+
+ shll16 DBLRH ! put exponent in proper place
+
+ SHLR12 (r4)
+
+ rts
+ or r4,DBLRH
+
+.L_ret_zero:
+ mov #0,r1
+ rts
+ mov #0,r0
+
+LOCAL(x41f0): .word 0x41f0
+ .align 2
+
+ENDFUNC (GLOBAL (floatunsidf))
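The normalization loop above can be sketched in Python (illustrative only, not part of the patch): the operand is shifted left until the leading 1 falls off the top, where it becomes the implicit bit, with the exponent counted down from bias + 32 = 0x41F exactly as the routine does.

```python
import struct

def floatunsidf(n):
    """Unsigned 32-bit int to double (illustrative sketch)."""
    if n == 0:
        return 0.0
    exp = 1023 + 32
    while True:
        exp -= 1
        msb = n & 0x80000000
        n = (n << 1) & 0xFFFFFFFF
        if msb:                  # leading 1 shifted out: now implicit
            break
    # the remaining 31 bits fill the top of the 52-bit mantissa field
    bits = (exp << 52) | (n << 20)
    return struct.unpack('>d', struct.pack('>Q', bits))[0]
```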
Index: gcc/config/sh/IEEE-754/addsf3.S
===================================================================
--- gcc/config/sh/IEEE-754/addsf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/addsf3.S (revision 0)
@@ -0,0 +1,530 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Add floating point numbers in r4, r5.
+
+! Author: Rakesh Kumar
+
+! Arguments are in r4, r5 and result in r0
+
+! Entry points: ___subsf3, ___addsf3
+
+! r4 and r5 are referred as op1 and op2
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (subsf3)
+ .global GLOBAL (addsf3)
+ FUNC (GLOBAL (subsf3))
+ FUNC (GLOBAL (addsf3))
+
+GLOBAL (subsf3):
+ mov.l .L_sign_bit,r1
+ xor r1,r5
+
+GLOBAL (addsf3):
+ mov.l r8,@-r15
+ mov r4,r3
+
+ mov.l .L_pinf,r2
+ mov #0,r8
+
+ and r2,r3 ! op1's exponent.
+ mov r5,r6
+
+ ! Check NaN or Infinity
+ and r2,r6 ! op2's exponent.
+ cmp/eq r2,r3
+
+ ! go if op1 is NaN or INF.
+ mov.l .L_sign_bit,r0
+ SL(bt, .L_inv_op1,
+ mov #-23,r1)
+
+ ! Go if op2 is NaN/INF.
+ cmp/eq r2,r6
+ mov r0,r7
+ bt .L_ret_op2
+
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLR23 (r3)
+#else
+ shld r1,r3
+#endif
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLR23 (r6)
+#else
+ shld r1,r6
+#endif
+
+ ! Check for negative zero
+ cmp/eq r0,r5
+
+ mov r5,r1
+ SL(bt, .L_ret_op1,
+ and r7,r1)
+
+ cmp/eq r0,r4
+ bt .L_ret_op2
+
+ ! if op1 is zero return op2
+ tst r4,r4
+ bt .L_ret_op2
+
+ ! Equal numbers with opposite sign
+ mov r4,r2
+ xor r5,r2
+
+ cmp/eq r0,r2
+ bt .L_ret_zero
+
+ ! if op2 is zero return op1
+ mov.l .L_mask_fra,r2
+ tst r5,r5
+
+ ! Extract the mantissa
+ mov r4,r0
+ SL(bt, .L_ret_op1,
+ and r2,r5)
+
+ and r2,r4
+
+ mov.l .L_imp_bit,r2
+ and r7,r0 ! sign bit of op1
+
+ ! Check for denormals
+ tst r3,r3
+ bt .L_norm_op1
+
+ ! Attach the implicit bit
+ or r2,r4
+ tst r6,r6
+
+ bt .L_norm_op2
+
+ or r2,r5
+ tst r0,r0
+
+ ! are the operands positive or negative?
+ bt .L_ptv_op1
+
+ neg r4,r4
+
+.L_ptv_op1:
+ tst r1,r1
+ bt .L_ptv_op2
+
+ neg r5,r5
+
+! Test exponents for equality
+.L_ptv_op2:
+ cmp/eq r3,r6
+ bt .L_exp_eq
+
+! Make exponents of two arguments equal
+.L_exp_ne:
+ ! r0, r1 contain sign bits.
+ ! r4, r5 contain mantissas.
+ ! r3, r6 contain exponents.
+ ! r2, r7 scratch.
+
+ ! Calculate result exponent.
+ mov r6,r2
+ sub r3,r2 ! e2 - e1
+
+ cmp/pl r2
+ mov #23,r7
+
+ ! e2 - e1 is -ve
+ bf .L_exp_ne_1
+
+ mov r6,r3 ! Result exp.
+ cmp/gt r7,r2 ! e2-e1 > 23
+
+ mov #1,r7
+ bt .L_pack_op2_0
+
+ ! Align the mantissa
+.L_loop_ne:
+ shar r4
+
+ rotcr r8
+ cmp/eq r7,r2
+
+ add #-1,r2
+ bf .L_loop_ne
+
+ bt .L_exp_eq
+
+! Exponent difference is too high.
+! Return op2 after placing pieces in proper place
+.L_pack_op2_0:
+ ! If op1 is -ve
+ tst r1,r1
+ bt .L_pack_op2
+
+ neg r5,r5
+
+! r6 has exponent
+! r5 has mantissa, r1 has sign
+.L_pack_op2:
+ mov.l .L_nimp_bit,r2
+ mov #23,r3
+
+ mov r1,r0
+
+ and r2,r5
+ mov.l @r15+,r8
+
+ or r5,r0
+
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLL23 (r6)
+#else
+ shld r3,r6
+#endif
+ rts
+ or r6,r0
+
+! return op1. It is NAN or INF or op2 is zero.
+.L_ret_op1:
+ mov r4,r0
+
+ rts
+ mov.l @r15+,r8
+
+! return zero
+.L_ret_zero:
+ mov #0,r0
+
+ rts
+ mov.l @r15+,r8
+
+! return op2. It is NaN or INF or op1 is zero.
+.L_ret_op2:
+ mov r5,r0
+
+ rts
+ mov.l @r15+,r8
+
+! op2 is denormal. Normalize it.
+.L_norm_op2:
+ shll r5
+ add #-1,r6
+
+ tst r2,r5
+ bt .L_norm_op2
+
+ ! Check sign
+ tst r1,r1
+ bt .L_norm_op2_2
+
+ neg r5,r5
+
+.L_norm_op2_2:
+ add #1,r6
+ cmp/eq r3,r6
+
+ bf .L_exp_ne
+ bt .L_exp_eq
+
+! Normalize op1
+.L_norm_op1:
+ shll r4
+ add #-1,r3
+
+ tst r2,r4
+ bt .L_norm_op1
+
+ ! Check sign
+ tst r0,r0
+ bt .L_norm_op1_1
+
+ neg r4,r4
+
+.L_norm_op1_1:
+ ! Adjust biasing
+ add #1,r3
+
+ ! Check op2 for denormalized value
+ tst r6,r6
+ bt .L_norm_op2
+
+ mov.l .L_imp_bit,r2
+
+ tst r1,r1 ! Check sign
+ or r2,r5 ! Attach 24th bit
+
+ bt .L_norm_op1_2
+
+ neg r5,r5
+
+.L_norm_op1_2:
+ cmp/eq r3,r6
+
+ bt .L_exp_eq
+ bf .L_exp_ne
+
+! op1 is NaN or Inf
+.L_inv_op1:
+ ! Return op1 if it is NAN.
+ ! r2 is infinity
+ cmp/gt r2,r4
+ bt .L_ret_op1
+
+ ! op1 is +/- INF
+ ! If op2 is same return now.
+ cmp/eq r4,r5
+ bt .L_ret_op1
+
+ ! return op2 if it is NAN
+ cmp/gt r2,r5
+ bt .L_ret_op2
+
+ ! Check if op2 is inf
+ cmp/eq r2,r6
+ bf .L_ret_op1
+
+ ! Both op1 and op2 are infinities of
+ ! opposite signs, or op2 is a -NaN. Return a NaN.
+ mov.l @r15+,r8
+ rts
+ mov #-1,r0
+
+! Make unequal exponents equal.
+.L_exp_ne_1:
+ mov #-25,r7
+ cmp/gt r2,r7 ! -23 > e2 - e1
+
+ add #1,r2
+ bf .L_exp_ne_2
+
+ tst r0,r0
+ bt .L_pack_op1
+
+.L_pack_op1_0:
+ bra .L_pack_op1
+ neg r4,r4
+
+! Accumulate the shifted bits in r8
+.L_exp_ne_2:
+ ! Shift with rounding
+ shar r5
+ rotcr r8
+
+ tst r2,r2
+
+ add #1,r2
+ bf .L_exp_ne_2
+
+! Exponents of op1 and op2 are equal (or made so)
+! The mantissas are in r4-r5 and remaining bits in r8
+.L_exp_eq:
+ add r5,r4 ! Add fractions.
+ mov.l .L_sign_bit,r2
+
+ ! Check for negative result
+ mov #0,r0
+ tst r2,r4
+
+ mov.l .L_255,r5
+ bt .L_post_add
+
+ negc r8,r8
+ negc r4,r4
+ or r2,r0
+
+.L_post_add:
+ ! Check for extra MSB
+ mov.l .L_chk_25,r2
+
+ tst r2,r4
+ bt .L_imp_check
+
+ shar r4
+ rotcr r8
+
+ add #1,r3
+ cmp/ge r5,r3
+
+ ! Return Inf if exp > 254
+ bt .L_ret_inf
+
+! Check for implicit (24th) bit in result
+.L_imp_check:
+ mov.l .L_imp_bit,r2
+ tst r2,r4
+
+ bf .L_pack_op1
+
+! Result needs left shift
+.L_lft_shft:
+ shll r8
+ rotcl r4
+
+ add #-1,r3
+ tst r2,r4
+
+ bt .L_lft_shft
+
+! Pack the result after rounding
+.L_pack_op1:
+ ! See if denormalized result is possible
+ mov.l .L_chk_25,r5
+ cmp/pl r3
+
+ bf .L_denorm_res
+
+ ! Are there any bits shifted previously?
+ tst r8,r8
+ bt .L_pack_1
+
+ ! Round
+ shll r8
+ movt r6
+
+ add r6,r4
+
+ ! If we are halfway between two numbers,
+ ! round towards LSB = 0
+ tst r8,r8
+
+ bf .L_pack_1
+
+ shlr r4
+ shll r4
+
+.L_pack_1:
+ ! Adjust extra MSB generated after rounding
+ tst r4,r5
+ mov.l .L_255,r2
+
+ bt .L_pack_2
+ shar r4
+
+ add #1,r3
+ cmp/ge r2,r3 ! Check for exp overflow
+
+ bt .L_ret_inf
+
+! Pack it finally
+.L_pack_2:
+ ! Do not store implicit bit
+ mov.l .L_nimp_bit,r2
+ mov #23,r1
+
+ and r2,r4
+
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLL23 (r3)
+#else
+ shld r1,r3
+#endif
+ mov.l @r15+,r8
+
+ or r4,r0
+ rts
+ or r3,r0
+
+! Return infinity
+.L_ret_inf:
+ mov.l .L_pinf,r2
+
+ mov.l @r15+,r8
+ rts
+ or r2,r0
+
+! Result must be denormalized
+.L_denorm_res:
+ mov #0,r2
+
+! Denormalizing loop with rounding
+.L_den_1:
+ shar r4
+ movt r6
+
+ tst r3,r3
+ bt .L_den_2
+
+ ! Increment the exponent
+ add #1,r3
+
+ tst r6,r6
+ bt .L_den_0
+
+ ! Count number of ON bits shifted
+ add #1,r2
+
+.L_den_0:
+ bra .L_den_1
+ nop
+
+! Apply rounding
+.L_den_2:
+ cmp/eq r6,r1
+ bf .L_den_3
+
+ add r6,r4
+ mov #1,r1
+
+ ! If halfway between two numbers,
+ ! round towards LSB = 0
+ cmp/eq r2,r1
+ bf .L_den_3
+
+ shar r4
+ shll r4
+
+.L_den_3:
+
+ mov.l @r15+,r8
+ rts
+ or r4,r0
+
+ .align 2
+.L_imp_bit:
+ .long 0x00800000
+
+.L_nimp_bit:
+ .long 0xFF7FFFFF
+
+.L_mask_fra:
+ .long 0x007FFFFF
+
+.L_pinf:
+ .long 0x7F800000
+
+.L_sign_bit:
+ .long 0x80000000
+
+.L_bit_25:
+ .long 0x01000000
+
+.L_chk_25:
+ .long 0x7F000000
+
+.L_255:
+ .long 0x000000FF
+
+ENDFUNC (GLOBAL (addsf3))
+ENDFUNC (GLOBAL (subsf3))
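The normal path of the adder above can be sketched in Python (illustrative only, not part of the patch): align the mantissas, add them with their signs, renormalize, and round to nearest. The tie rule above, adding the guard bit and then clearing the LSB on an exact tie, is equivalent to round-to-even, which is what this sketch uses. Denormal and overflow handling are omitted.

```python
import struct

def f2b(x):
    return struct.unpack('>I', struct.pack('>f', x))[0]

def b2f(b):
    return struct.unpack('>f', struct.pack('>I', b))[0]

def addsf3(x, y):
    """Normal-path float add: align, add, renormalize, round (sketch)."""
    xb, yb = f2b(x), f2b(y)
    ex, ey = (xb >> 23) & 0xFF, (yb >> 23) & 0xFF
    mx = (xb & 0x7FFFFF) | 0x800000
    my = (yb & 0x7FFFFF) | 0x800000
    if xb >> 31:
        mx = -mx
    if yb >> 31:
        my = -my
    e = max(ex, ey)
    d = e - min(ex, ey)
    # align exactly by scaling the larger operand up (Python ints are wide)
    s = (mx << d) + my if ex >= ey else mx + (my << d)
    if s == 0:
        return 0.0
    sign = 0x80000000 if s < 0 else 0
    m = abs(s)
    L = m.bit_length()
    e = e - d + L - 24          # exponent of a 24-bit normalized mantissa
    if L > 24:                  # shift down, rounding to nearest even
        shift = L - 24
        rem = m & ((1 << shift) - 1)
        m >>= shift
        half = 1 << (shift - 1)
        if rem > half or (rem == half and m & 1):
            m += 1
            if m == 1 << 24:    # rounding produced an extra MSB
                m >>= 1
                e += 1
    else:
        m <<= 24 - L
    return b2f(sign | (e << 23) | (m & 0x7FFFFF))
```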
Index: gcc/config/sh/IEEE-754/mulsf3.S
===================================================================
--- gcc/config/sh/IEEE-754/mulsf3.S (revision 0)
+++ gcc/config/sh/IEEE-754/mulsf3.S (revision 0)
@@ -0,0 +1,347 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Routine for multiplying two floating point numbers
+
+! Author: Rakesh Kumar
+
+! Arguments: r4 and r5
+! Result: r0
+
+! The arguments are referred to as op1 and op2
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (mulsf3)
+ FUNC (GLOBAL (mulsf3))
+
+GLOBAL (mulsf3):
+ ! Extract the sign bits
+ mov.l .L_sign,r3
+ mov r3,r0
+
+ and r4,r3 ! sign bit for op1
+ mov.l .L_sign_mask,r6
+
+ ! Mask out the sign bit from op1 and op2
+ and r5,r0 ! sign bit for op2
+ mov.l .L_inf,r2
+
+ and r6,r4
+ xor r3,r0 ! Final sign in r0
+
+ and r6,r5
+ tst r4,r4
+
+ ! Check for zero
+ mov r5,r7
+ ! Check op1 for zero
+ SL(bt, .L_op1_zero,
+ mov r4,r6)
+
+ tst r5,r5
+ bt .L_op2_zero ! op2 is zero
+
+ ! Extract the exponents
+ and r2,r6 ! Exponent of op1
+ cmp/eq r2,r6
+
+ and r2,r7
+ bt .L_inv_op1 ! op1 is NaN or Inf
+
+ mov.l .L_mant,r3
+ cmp/eq r2,r7
+
+ and r3,r4 ! Mantissa of op1
+ bt .L_ret_op2 ! op2 is Nan or Inf
+
+ and r3,r5 ! Mantissa of op2
+
+ mov #-23,r3
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLR23 (r6)
+ SHLR23 (r7)
+#else
+ shld r3,r6
+ shld r3,r7
+#endif
+ ! Check for denormals
+ mov.l .L_24bit,r3
+ tst r6,r6
+
+ bt .L_norm_op1 ! op1 is denormal
+ add #-127,r6 ! Unbias op1's exp
+
+ tst r7,r7
+ bt .L_norm_op2 ! op2 is denormal
+
+ add #-127,r7 ! Unbias op2's exp
+
+.L_multiply:
+ add r6,r7 ! Final exponent in r7
+ mov.l .L_24bit,r1
+
+ ! set 24th bit of mantissas
+ mov #127,r3
+ or r1,r4
+
+ DMULU_SAVE
+
+ ! Multiply
+ or r1,r5
+ DMULUL (r4,r5,r4)
+
+ DMULUH (r5)
+
+ DMULU_RESTORE
+
+ mov.l .L_16bit,r6
+
+ ! Check for extra MSB generated
+ tst r5,r6
+
+ mov.l .L_255,r1
+ bf .L_shift_by_1 ! Adjust the extra MSB
+
+! Normalize the result with rounding
+.L_epil:
+ ! Bias the exponent
+ add #127,r7
+ cmp/ge r1,r7
+
+ ! Check exponent overflow and underflow
+ bt .L_ret_inf
+
+ cmp/pl r7
+ bf .L_denorm
+
+.L_epil_0:
+ mov #-23,r3
+ shll r5
+ mov #0,r6
+
+! Fit resultant mantissa in 24 bits
+! Apply default rounding
+.L_loop_epil_0:
+ tst r3,r3
+ bt .L_loop_epil_out
+
+ add #1,r3
+ shlr r4
+
+ bra .L_loop_epil_0
+ rotcr r6
+
+! Round mantissa
+.L_loop_epil_out:
+ shll8 r5
+ or r5,r4
+
+ mov.l .L_mant,r2
+ mov #23,r3
+
+ ! Check last bit shifted out of result
+ tst r6,r6
+ bt .L_epil_2
+
+ ! Round
+ shll r6
+ movt r5
+
+ add r5,r4
+
+ ! If this is the only ON bit shifted
+ ! Round towards LSB = 0
+ tst r6,r6
+ bf .L_epil_2
+
+ shlr r4
+ shll r4
+
+.L_epil_2:
+ ! Rounding may have produced extra MSB.
+ mov.l .L_25bit,r5
+ tst r4,r5
+
+ bt .L_epil_1
+
+ add #1,r7
+ shlr r4
+
+.L_epil_1:
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+ SHLL23 (r7)
+#else
+ shld r3,r7
+#endif
+
+ and r2,r4
+
+ or r7,r4
+ rts
+ or r4,r0
+
+.L_denorm:
+ mov #0,r3
+
+.L_den_1:
+ shlr r5
+ rotcr r4
+
+ cmp/eq r3,r7
+ bt .L_epil_0
+
+ bra .L_den_1
+ add #1,r7
+
+
+! Normalize the first argument
+.L_norm_op1:
+ shll r4
+ tst r3,r4
+
+ add #-1,r6
+ bt .L_norm_op1
+
+ ! The biasing is by 126
+ add #-126,r6
+ tst r7,r7
+
+ bt .L_norm_op2
+
+ bra .L_multiply
+ add #-127,r7
+
+! Normalize the second argument
+.L_norm_op2:
+ shll r5
+ tst r3,r5
+
+ add #-1,r7
+ bt .L_norm_op2
+
+ bra .L_multiply
+ add #-126,r7
+
+! op2 is zero. Check op1 for exceptional cases
+.L_op2_zero:
+ mov.l .L_inf,r2
+ and r2,r6
+
+ ! Check whether op1 is Inf or NaN (0 * Inf = NaN)
+ cmp/eq r2,r6
+ SL(bf, .L_ret_op2,
+ mov #1,r1)
+
+ ! Return NaN
+ rts
+ mov #-1,r0
+
+! Adjust the extra MSB
+.L_shift_by_1:
+ shlr r5
+ rotcr r4
+
+ add #1,r7 ! Show the shift in exponent
+
+ cmp/gt r3,r7
+ bf .L_epil
+
+ ! The resultant exponent is invalid
+ mov.l .L_inf,r1
+ rts
+ or r1,r0
+
+.L_ret_op1:
+ rts
+ or r4,r0
+
+! op1 is zero. Check op2 for exceptional cases
+.L_op1_zero:
+ mov.l .L_inf,r2
+ and r2,r7
+
+ ! Check whether op2 is Inf or NaN (0 * Inf = NaN)
+ cmp/eq r2,r7
+ SL(bf, .L_ret_op1,
+ mov #1,r1)
+
+ ! Return NaN
+ rts
+ mov #-1,r0
+
+.L_inv_op1:
+ mov.l .L_mant,r3
+ mov r4,r6
+
+ and r3,r6
+ tst r6,r6
+
+ bf .L_ret_op1 ! op1 is NaN
+ ! op1 is not NaN. It is Inf
+
+ cmp/eq r2,r7
+ bf .L_ret_op1 ! op2 has a valid exponent
+
+! op2 has an invalid exponent. It could be Inf, -Inf, or NaN.
+! It doesn't make any difference.
+.L_ret_op2:
+ rts
+ or r5,r0
+
+.L_ret_inf:
+ rts
+ or r2,r0
+
+.L_ret_zero:
+ mov #0,r2
+ rts
+ or r2,r0
+
+
+ .align 2
+.L_mant:
+ .long 0x007FFFFF
+
+.L_inf:
+ .long 0x7F800000
+
+.L_24bit:
+ .long 0x00800000
+
+.L_25bit:
+ .long 0x01000000
+
+.L_16bit:
+ .long 0x00008000
+
+.L_sign:
+ .long 0x80000000
+
+.L_sign_mask:
+ .long 0x7FFFFFFF
+
+.L_255:
+ .long 0x000000FF
+
+ENDFUNC (GLOBAL (mulsf3))
Index: gcc/config/sh/IEEE-754/floatsidf.S
===================================================================
--- gcc/config/sh/IEEE-754/floatsidf.S (revision 0)
+++ gcc/config/sh/IEEE-754/floatsidf.S (revision 0)
@@ -0,0 +1,146 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!conversion of signed integer to double precision floating point number
+!Author: Rakesh Kumar
+!
+!Entry:
+!r4:operand
+!
+!Exit:
+!r0,r1:result
+!
+!Note: the argument is passed in reg r4 and the result is returned in
+!regs r0 and r1.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (floatsidf)
+ FUNC (GLOBAL (floatsidf))
+
+GLOBAL (floatsidf):
+ mov.l .L_sign,r0
+ mov #0,r1
+
+ mov r0,r2
+ tst r4,r4 ! check r4 for zero
+
+ ! Extract the sign
+ mov r2,r3
+ SL(bt, .L_ret_zero,
+ and r4,r0)
+
+ cmp/eq r1,r0
+ not r3,r3
+
+ mov r1,r7
+ SL(bt, .L_loop,
+ and r4,r3)
+
+ ! Treat -2147483648 as special case
+ cmp/eq r1,r3
+ neg r4,r4
+
+ bt .L_ret_min
+
+.L_loop:
+ shll r4
+ mov r4,r5
+
+ and r2,r5
+ cmp/eq r1,r5
+
+ add #1,r7
+ bt .L_loop
+
+ mov.l .L_initial_exp,r6
+ not r2,r2
+
+ and r2,r4
+ mov #21,r3
+
+ sub r7,r6
+ mov r4,r1
+
+ mov #20,r7
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r3,r1
+#else
+ SHLL21 (r1)
+#endif
+ mov #-11,r2
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r7,r6 ! Exponent in proper place
+#else
+ SHLL20 (r6)
+#endif
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r2,r4
+#else
+ SHLR11 (r4)
+#endif
+ or r6,r0
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+#ifdef __LITTLE_ENDIAN__
+ or r4,r1
+#else
+ or r4,r0
+#endif
+
+.L_ret_zero:
+ rts
+ mov #0,r0
+
+.L_ret_min:
+ mov.l .L_min,r0
+
+#ifdef __LITTLE_ENDIAN__
+ mov r0,r2
+ mov r1,r0
+ mov r2,r1
+#endif
+ rts
+ nop
+
+ .align 2
+
+.L_initial_exp:
+ .long 0x0000041E
+
+.L_sign:
+ .long 0x80000000
+
+.L_min:
+ .long 0xC1E00000
+
+ENDFUNC (GLOBAL (floatsidf))
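The conversion above can be sketched in Python (illustrative only, not part of the patch). Every 32-bit magnitude fits in the 52-bit mantissa, so the conversion is exact; the routine's -2147483648 special case disappears here because Python's unbounded ints can take the magnitude directly.

```python
import struct

def floatsidf(n):
    """Signed 32-bit int to double (illustrative sketch)."""
    if n == 0:
        return 0.0
    sign = 1 << 63 if n < 0 else 0
    m = abs(n)
    e = 1023 + 52
    while m < 1 << 52:          # normalize: implicit bit to position 52
        m <<= 1
        e -= 1
    bits = sign | (e << 52) | (m & ((1 << 52) - 1))
    return struct.unpack('>d', struct.pack('>Q', bits))[0]
```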
Index: gcc/config/sh/IEEE-754/fixsfsi.S
===================================================================
--- gcc/config/sh/IEEE-754/fixsfsi.S (revision 0)
+++ gcc/config/sh/IEEE-754/fixsfsi.S (revision 0)
@@ -0,0 +1,160 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Conversion routine for float to integer
+
+! Author: Rakesh Kumar
+
+! Arguments: r4 (in floating point format)
+! Return: r0
+
+! r4 is referred to as op1
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ .text
+ .align 5
+ .global GLOBAL (fixsfsi)
+ FUNC (GLOBAL (fixsfsi))
+
+GLOBAL (fixsfsi):
+ mov.l .L_mask_sign,r7
+ mov r4,r2
+
+ ! Check for NaN
+ mov.l .L_inf,r1
+ and r7,r2
+
+ cmp/gt r1,r2
+ mov #127,r5
+
+ mov r4,r3
+ SL(bt, .L_epil,
+ mov #0,r0)
+
+ shll r2
+ mov.l .L_frac,r6
+
+ shlr16 r2
+ and r6,r3 ! r3 has fraction
+
+ shlr8 r2 ! r2 has exponent
+ mov.l .L_24bit,r1
+
+ ! If exponent is less than 127, return 0
+ cmp/gt r2,r5
+ or r1,r3 ! Set the implicit bit
+
+ mov.l .L_157,r1
+ SL1(bt, .L_epil,
+ shll8 r3)
+
+ ! If exponent is greater than 157,
+ ! return the maximum/minimum integer
+ ! value depending on the sign
+ cmp/gt r1,r2
+ sub r2,r1
+
+ mov.l .L_sign,r2
+ SL(bt, .L_ret_max,
+ add #1,r1)
+
+ and r4,r2 ! Sign in r2
+ neg r1,r1
+
+ ! Shift mantissa by exponent difference from 157
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+ shld r1,r3
+#else
+ cmp/gt r0,r1
+ bt .L_mov_left
+
+.L_mov_right:
+ cmp/eq r1,r0
+ bt .L_ret
+
+ add #1,r1
+ bra .L_mov_right
+
+ shlr r3
+
+.L_mov_left:
+ add #-1,r1
+
+ shll r3
+ cmp/eq r1,r0
+
+ bf .L_mov_left
+.L_ret:
+#endif
+ ! If op1 is negative, negate the result
+ cmp/eq r0,r2
+ SL(bf, .L_negate,
+ mov r3,r0)
+
+! r0 has the appropriate value
+.L_epil:
+ rts
+ nop
+
+! Return the max/min integer value
+.L_ret_max:
+ and r4,r2 ! Sign in r2
+ mov.l .L_max,r3
+
+ mov.l .L_sign,r1
+ cmp/eq r0,r2
+
+ mov r3,r0
+ bt .L_epil
+
+ ! Negative number, return min int
+ rts
+ mov r1,r0
+
+! Negate the result
+.L_negate:
+ rts
+ neg r0,r0
+
+ .align 2
+.L_inf:
+ .long 0x7F800000
+
+.L_157:
+ .long 157
+
+.L_max:
+ .long 0x7FFFFFFF
+
+.L_frac:
+ .long 0x007FFFFF
+
+.L_sign:
+ .long 0x80000000
+
+.L_24bit:
+ .long 0x00800000
+
+.L_mask_sign:
+ .long 0x7FFFFFFF
+
+ENDFUNC (GLOBAL (fixsfsi))
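The flow of the routine above can be modeled in C. The following is an illustrative sketch only (the masks and the 127/157 exponent window come from the constants in the routine; the function name is made up), not part of the patch:

```c
#include <stdint.h>

/* Illustrative C model of the conversion above: truncate a float,
   given as its IEEE-754 bit pattern, to a signed 32-bit integer.  */
static int32_t
ref_fixsfsi (uint32_t op1)
{
  uint32_t abs = op1 & 0x7FFFFFFF;
  if (abs > 0x7F800000)		/* NaN: the routine returns 0.  */
    return 0;
  int32_t exp = (int32_t) (abs >> 23);	/* biased exponent */
  if (exp < 127)		/* |value| < 1.0 truncates to 0.  */
    return 0;
  if (exp > 157)		/* magnitude >= 2^31: clamp by sign.  */
    return (op1 & 0x80000000) ? INT32_MIN : INT32_MAX;
  /* Set the implicit bit, shift the fraction so the leading 1 sits at
     bit 31, then move it right by the exponent difference from 158.  */
  uint32_t mant = ((abs & 0x007FFFFF) | 0x00800000) << 8;
  int32_t val = (int32_t) (mant >> (158 - exp));
  return (op1 & 0x80000000) ? -val : val;
}
```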
Index: gcc/config/sh/ieee-754-sf.S
===================================================================
--- gcc/config/sh/ieee-754-sf.S (revision 0)
+++ gcc/config/sh/ieee-754-sf.S (revision 0)
@@ -0,0 +1,692 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+!! libgcc software floating-point routines for Renesas SH /
+!! STMicroelectronics ST40 CPUs
+!! Contributed by J"orn Rennecke joern.rennecke@st.com
+
+#ifndef __SH_FPU_ANY__
+
+#include "lib1funcs.h"
+#include "insn-constants.h"
+
+/* Single-precision floating-point emulation.
+ We handle NANs, +-infinity, and +-zero.
+ However, we assume that for NANs, the topmost bit of the fraction is set. */
+#ifdef L_nesf2
+/* -ffinite-math-only inline version, T := r4:SF == r5:SF
+ cmp/eq r4,r5
+ mov r4,r0
+ bt 0f
+ or r5,r0
+ add r0,r0
+ tst r0,r0 ! test for +0.0 == -0.0 ; -0.0 == +0.0
+ 0: */
+ .balign 4
+ .global GLOBAL(nesf2)
+ HIDDEN_FUNC(GLOBAL(nesf2))
+GLOBAL(nesf2):
+ /* If the raw values are unequal, the result is unequal, unless
+ both values are +-zero.
+ If the raw values are equal, the result is equal, unless
+ the value is a NaN. */
+ cmp/eq r4,r5
+ mov.l LOCAL(c_SF_NAN_MASK),r1
+ not r4,r0
+ bt LOCAL(check_nan)
+ mov r4,r0
+ or r5,r0
+ rts
+ add r0,r0
+LOCAL(check_nan):
+ tst r1,r0
+ rts
+ movt r0
+ .balign 4
+LOCAL(c_SF_NAN_MASK):
+ .long SF_NAN_MASK
+ ENDFUNC(GLOBAL(nesf2))
+#endif /* L_nesf2 */
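The not/tst sequence at check_nan relies on the assumption stated at the top of the file: every NaN has all bits of SF_NAN_MASK (the exponent field plus the quiet bit) set. A minimal C sketch of that test, using the SF_NAN_MASK value defined in sh.md (the function name is illustrative):

```c
#include <stdint.h>

#define SF_NAN_MASK 0x7fc00000u

/* The not/tst idiom above: T := (~bits & SF_NAN_MASK) == 0,
   i.e. true iff the exponent is all-ones and the quiet bit is set.  */
static int
sf_is_nan (uint32_t bits)
{
  return (~bits & SF_NAN_MASK) == 0;
}
```

Note that a signaling NaN with the quiet bit clear escapes this test, which is exactly the caveat raised in the ??? comments next to SF_NAN_MASK in sh.md.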
+
+#ifdef L_unordsf2
+ .balign 4
+ .global GLOBAL(unordsf2)
+ HIDDEN_FUNC(GLOBAL(unordsf2))
+GLOBAL(unordsf2):
+ mov.l LOCAL(c_SF_NAN_MASK),r1
+ not r4,r0
+ tst r1,r0
+ not r5,r0
+ bt LOCAL(unord)
+ tst r1,r0
+LOCAL(unord):
+ rts
+ movt r0
+ .balign 4
+LOCAL(c_SF_NAN_MASK):
+ .long SF_NAN_MASK
+ ENDFUNC(GLOBAL(unordsf2))
+#endif /* L_unordsf2 */
+
+#if defined(L_gtsf2t) || defined(L_gtsf2t_trap)
+/* -ffinite-math-only inline version, T := r4:SF > r5:SF ? 0 : 1
+ cmp/pz r4
+ mov r4,r0
+ bf/s 0f
+ cmp/hs r5,r4
+ cmp/ge r4,r5
+ or r5,r0
+ bt 0f
+ add r0,r0
+ tst r0,r0
+ 0: */
+#ifdef L_gtsf2t
+#define fun_label GLOBAL(gtsf2t)
+#else
+#define fun_label GLOBAL(gtsf2t_trap)
+#endif
+ .balign 4
+ .global fun_label
+ HIDDEN_FUNC(fun_label)
+fun_label:
+ /* If the raw values compare greater, the result is true, unless
+ either of them is a NaN (but infinity is fine), or both values are
+ +- zero. Otherwise, the result is false. */
+ mov.l LOCAL(c_SF_NAN_MASK),r1
+ cmp/pz r4
+ not r5,r0
+ SLC(bf, LOCAL(neg),
+ tst r1,r0)
+ mov r4,r0
+ bt LOCAL(nan)
+ cmp/gt r5,r4
+ SLC(bf, LOCAL(check_nan),
+ cmp/gt r4,r1)
+ bf LOCAL(nan)
+ or r5,r0
+ rts
+ add r0,r0
+LOCAL(neg):
+ SLI(tst r1,r0)
+ bt LOCAL(nan)
+ not r4,r0
+ tst r1,r0
+ bt LOCAL(nan)
+ cmp/hi r4,r5
+#if defined(L_gtsf2t) && defined(DELAYED_BRANCHES)
+LOCAL(check_nan):
+#endif /* DELAYED_BRANCHES */
+ rts
+ movt r0
+#ifdef L_gtsf2t
+LOCAL(check_nan):
+LOCAL(nan):
+ rts
+ mov #0,r0
+#else /* ! L_gtsf2t */
+LOCAL(check_nan):
+ SLI(cmp/gt r4,r1)
+ bf LOCAL(nan)
+ rts
+ movt r0
+LOCAL(nan):
+ mov #0,r0
+ trapa #0
+#endif /* ! L_gtsf2t */
+ .balign 4
+LOCAL(c_SF_NAN_MASK):
+ .long SF_NAN_MASK
+ ENDFUNC(fun_label)
+#endif /* L_gtsf2t */
+
+#if defined(L_gesf2f) || defined(L_gesf2f_trap)
+/* -ffinite-math-only inline version, T := r4:SF >= r5:SF */
+ cmp/pz r5
+ mov r4,r0
+ bf/s 0f
+ cmp/hs r4,r5
+ cmp/ge r5,r4
+ or r5,r0
+ bt 0f
+ add r0,r0
+ tst r0,r0
+ 0:
+#ifdef L_gesf2f
+#define fun_label GLOBAL(gesf2f)
+#else
+#define fun_label GLOBAL(gesf2f_trap)
+#endif
+ .balign 4
+ .global fun_label
+ HIDDEN_FUNC(fun_label)
+fun_label:
+ /* If the raw values compare greater or equal, the result is
+ true, unless either of them is a NaN. If both are +-zero, the
+ result is true; otherwise, it is false.
+ We use 0 as true and nonzero as false for this function. */
+ mov.l LOCAL(c_SF_NAN_MASK),r1
+ cmp/pz r5
+ not r4,r0
+ SLC(bf, LOCAL(neg),
+ tst r1,r0)
+ mov r4,r0
+ bt LOCAL(nan)
+ cmp/gt r4,r5
+ SLC(bf, LOCAL(check_nan),
+ cmp/ge r1,r5)
+ bt LOCAL(nan)
+ or r5,r0
+ rts
+ add r0,r0
+LOCAL(neg):
+ SLI(tst r1,r0)
+ bt LOCAL(nan)
+ not r5,r0
+ tst r1,r0
+ bt LOCAL(nan)
+ cmp/hi r5,r4
+#if defined(L_gesf2f) && defined(DELAYED_BRANCHES)
+LOCAL(nan): LOCAL(check_nan):
+#endif
+ rts
+ movt r0
+#if defined(L_gesf2f) && ! defined(DELAYED_BRANCHES)
+LOCAL(check_nan):
+ cmp/ge r1,r5
+LOCAL(nan):
+ rts
+ movt r0
+#endif /* ! DELAYED_BRANCHES */
+#ifdef L_gesf2f_trap
+LOCAL(check_nan):
+ SLI(cmp/ge r1,r5)
+ bt LOCAL(nan)
+ rts
+LOCAL(nan):
+ movt r0
+ trapa #0
+#endif /* L_gesf2f_trap */
+ .balign 4
+LOCAL(c_SF_NAN_MASK):
+ .long SF_NAN_MASK
+ ENDFUNC(GLOBAL(gesf2f))
+#endif /* L_gesf2f */
+
+#ifndef DYN_SHIFT /* SH1 / SH2 code */
+#ifdef L_add_sub_sf3
+#include "IEEE-754/addsf3.S"
+#endif /* _add_sub_sf3 */
+
+#ifdef L_mulsf3
+#include "IEEE-754/mulsf3.S"
+#endif /* L_mulsf3 */
+
+#ifdef L_fixunssfsi
+#include "IEEE-754/fixunssfsi.S"
+#endif /* L_fixunssfsi */
+
+#ifdef L_fixsfsi
+#include "IEEE-754/fixsfsi.S"
+#endif /* L_fixsfsi */
+
+#ifdef L_floatunssisf
+#include "IEEE-754/floatunssisf.S"
+#endif /* L_floatunssisf */
+
+#ifdef L_floatsisf
+#include "IEEE-754/floatsisf.S"
+#endif /* L_floatsisf */
+
+#ifdef L_divsf3
+#include "IEEE-754/divsf3.S"
+#endif /* L_divsf3 */
+#endif /* ! DYN_SHIFT */
+
+/* The actual arithmetic uses dynamic shift. Supporting SH1 / SH2 here would
+ make this code too hard to maintain, so if you want to add SH1 / SH2
+ support, do it in a separate copy. */
+#ifdef DYN_SHIFT
+#ifdef L_add_sub_sf3
+#include "IEEE-754/m3/addsf3.S"
+#endif /* L_add_sub_sf3 */
+
+#ifdef L_mulsf3
+#include "IEEE-754/m3/mulsf3.S"
+#endif /* L_mulsf3 */
+
+#ifdef L_fixunssfsi
+ ! What is a bit unusual about this implementation is that the
+ ! sign bit influences the result for NANs: for cleared sign bit, you
+ ! get UINT_MAX, for set sign bit, you get 0.
+ ! However, since the result for NANs is undefined, this should be no
+ ! problem.
+ ! N.B. This is scheduled both for SH4-200 and SH4-300
+ .balign 4
+ .global GLOBAL(fixunssfsi)
+ FUNC(GLOBAL(fixunssfsi))
+GLOBAL(fixunssfsi):
+ mov.l LOCAL(max),r2
+ mov #-23,r1
+ mov r4,r0
+ shad r1,r4
+ mov.l LOCAL(mask),r1
+ add #-127,r4
+ cmp/ge r2,r0
+ or r2,r0
+ bt LOCAL(retmax)
+ cmp/pz r4
+ and r1,r0
+ bf LOCAL(ret0)
+ add #-23,r4
+ rts
+ shld r4,r0
+LOCAL(ret0):
+LOCAL(retmax):
+ rts
+ subc r0,r0
+ .balign 4
+LOCAL(mask):
+ .long 0x00ffffff
+LOCAL(max):
+ .long 0x4f800000
+ ENDFUNC(GLOBAL(fixunssfsi))
+#endif /* L_fixunssfsi */
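The effect of this sequence, including the sign-dependent NaN result noted in the comment, can be sketched in C; the constants mirror LOCAL(max) and LOCAL(mask), and the negative-count shift mimics shld. The function name is illustrative:

```c
#include <stdint.h>

/* Illustrative model: float bits to unsigned 32-bit, truncating.
   Inputs >= 2^32, +inf and +NaN clamp to UINT32_MAX; anything with
   the sign bit set (including -NaN) yields 0, as described above.  */
static uint32_t
ref_fixunssfsi (uint32_t bits)
{
  if ((int32_t) bits >= 0x4f800000)	/* signed compare, as cmp/ge */
    return UINT32_MAX;
  int32_t exp = ((int32_t) bits >> 23) - 127;	/* negative if sign set */
  if (exp < 0)
    return 0;
  uint32_t frac = (bits & 0x007fffff) | 0x00800000;
  int32_t sh = exp - 23;	/* shld: left if >= 0, right if < 0 */
  return sh >= 0 ? frac << sh : frac >> -sh;
}
```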
+
+#ifdef L_fixsfsi
+ ! What is a bit unusual about this implementation is that the
+ ! sign bit influences the result for NANs: for cleared sign bit, you
+ ! get INT_MAX, for set sign bit, you get INT_MIN.
+ ! However, since the result for NANs is undefined, this should be no
+ ! problem.
+ ! N.B. This is scheduled both for SH4-200 and SH4-300
+ .balign 4
+ .global GLOBAL(fixsfsi)
+ FUNC(GLOBAL(fixsfsi))
+ .balign 4
+GLOBAL(fixsfsi):
+ mov r4,r0
+ shll r4
+ mov #-24,r1
+ bt LOCAL(neg)
+ mov.l LOCAL(max),r2
+ shld r1,r4
+ mov.l LOCAL(mask),r1
+ add #-127,r4
+ cmp/pz r4
+ add #-23,r4
+ bf LOCAL(ret0)
+ cmp/gt r0,r2
+ bf LOCAL(retmax)
+ and r1,r0
+ addc r1,r0
+ rts
+ shld r4,r0
+
+ .balign 4
+LOCAL(neg):
+ mov.l LOCAL(min),r2
+ shld r1,r4
+ mov.l LOCAL(mask),r1
+ add #-127,r4
+ cmp/pz r4
+ add #-23,r4
+ bf LOCAL(ret0)
+ cmp/gt r0,r2
+ bf LOCAL(retmin)
+ and r1,r0
+ addc r1,r0
+ shld r4,r0 ! SH4-200 will start this insn on a new cycle
+ rts
+ neg r0,r0
+
+ .balign 4
+LOCAL(ret0):
+ rts
+ mov #0,r0
+
+LOCAL(retmax):
+ mov #-1,r0
+ rts
+ shlr r0
+
+LOCAL(retmin):
+ mov #1,r0
+ rts
+ rotr r0
+
+ .balign 4
+LOCAL(mask):
+ .long 0x007fffff
+LOCAL(max):
+ .long 0x4f000000
+LOCAL(min):
+ .long 0xcf000000
+ ENDFUNC(GLOBAL(fixsfsi))
+#endif /* L_fixsfsi */
+
+#ifdef L_floatunssisf
+#include "IEEE-754/m3/floatunssisf.S"
+#endif /* L_floatunssisf */
+
+#ifdef L_floatsisf
+#include "IEEE-754/m3/floatsisf.S"
+#endif /* L_floatsisf */
+
+#ifdef L_divsf3
+#include "IEEE-754/m3/divsf3.S"
+#endif /* L_divsf3 */
+
+#ifdef L_hypotf
+ .balign 4
+ .global GLOBAL(hypotf)
+ FUNC(GLOBAL(hypotf))
+GLOBAL(hypotf):
+/* This integer implementation takes 71 to 72 cycles in the main path.
+ This is somewhat slower than the SH4 can manage with its double-precision
+ hardware floating point - 57 cycles, or 69 with mode switches. */
+ /* First, calculate x (r4) as the sum of the squares of the fractions -
+ the exponent is calculated separately in r3.
+ Then, calculate sqrt(x) for the fraction by reciproot iteration.
+ We get a 7.5-bit initial value using linear approximation with two slopes
+ that are powers of two.
+ x (- [1. .. 2.) y0 := 1.25 - x/4 - tab(x) y (- (0.8 .. 1.0)
+ x (- [2. .. 4.) y0 := 1. - x/8 - tab(x) y (- (0.5 .. 0.8)
+ x is represented with two bits before the point,
+ y with 0 bits before the binary point.
+ Thus, to calculate y0 := 1. - x/8 - tab(x), all you have to do is to shift x
+ right by 1, negate it, and subtract tab(x). */
+
+ /* y1 := 1.5*y0 - 0.5 * (x * y0) * (y0 * y0)
+ z0 := x * y1
+ z1 := z0 + 0.5 * (y1 - (y1*y1) * z0) */
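Ignoring the fixed-point scaling used by the routine, the two refinement steps quoted above can be modeled in floating point. This is only a numerical sketch of the formulas as written, not the routine's arithmetic:

```c
#include <math.h>

/* Numerical model of the refinement above: y0 roughly approximates
   1/sqrt(x), y1 is one Newton step for the reciprocal square root,
   z0 = x*y1 approximates sqrt(x), and z1 applies a final correction.  */
static double
reciproot_sqrt (double x, double y0)
{
  double y1 = 1.5 * y0 - 0.5 * (x * y0) * (y0 * y0);
  double z0 = x * y1;
  double z1 = z0 + 0.5 * (y1 - (y1 * y1) * z0);
  return z1;
}
```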
+
+ mov.l LOCAL(xff000000),r1
+ add r4,r4
+ mov r4,r0
+ add r5,r5
+ cmp/hs r5,r4
+ sub r5,r0
+ mov #-24,r2
+ bf/s LOCAL(r5_large)
+ shad r2,r0
+ mov r4,r3
+ shll8 r4
+ rotcr r4
+ tst #0xe0,r0
+ neg r0,r0
+ bt LOCAL(ret_abs_r3)
+ tst r1,r5
+ shll8 r5
+ bt/s LOCAL(denorm_r5)
+ cmp/hi r3,r1
+ dmulu.l r4,r4
+ bf LOCAL(inf_nan)
+ rotcr r5
+ shld r0,r5
+LOCAL(denorm_r5_done):
+ sts mach,r4
+ dmulu.l r5,r5
+ mov.l r6,@-r15
+ mov #20,r6
+
+ sts mach,r5
+LOCAL(add_frac):
+ mova LOCAL(tab)-32,r0
+ mov.l r7,@-r15
+ mov.w LOCAL(x1380),r7
+ and r1,r3
+ addc r5,r4
+ mov.w LOCAL(m25),r2 ! -25
+ bf LOCAL(frac_ok)
+ sub r1,r3
+ rotcr r4
+ cmp/eq r1,r3 ! did we generate infinity ?
+ bt LOCAL(inf_nan)
+ shlr r4
+ mov r4,r1
+ shld r2,r1
+ mov.b @(r0,r1),r0
+ mov r4,r1
+ shld r6,r1
+ bra LOCAL(frac_low2)
+ sub r1,r7
+
+LOCAL(frac_ok):
+ mov r4,r1
+ shld r2,r1
+ mov.b @(r0,r1),r1
+ cmp/pz r4
+ mov r4,r0
+ bt/s LOCAL(frac_low)
+ shld r6,r0
+ mov.w LOCAL(xf80),r7
+ shlr r0
+LOCAL(frac_low):
+ sub r0,r7
+LOCAL(frac_low2):
+ mov.l LOCAL(x40000080),r0 ! avoid denorm results near 1. << r3
+ sub r1,r7 ! {0.12}
+ mov.l LOCAL(xfffe0000),r5 ! avoid rounding overflow near 4. << r3
+ swap.w r7,r1 ! {0.28}
+ dmulu.l r1,r4 /* two issue cycles */
+ mulu.w r7,r7 /* two issue cycles */
+ sts mach,r2 ! {0.26}
+ mov r1,r7
+ shlr r1
+ sts macl,r6 ! {0.24}
+ cmp/hi r0,r4
+ shlr2 r2
+ bf LOCAL(near_one)
+ shlr r2 ! {0.23} systemic error of linear approximation keeps y1 < 1
+ dmulu.l r2,r6
+ cmp/hs r5,r4
+ add r7,r1 ! {1.28}
+ bt LOCAL(near_four)
+ shlr2 r1 ! {1.26}
+ sts mach,r0 ! {0.15} x*y0^3 == {0.16} 0.5*x*y0^3
+ shlr2 r1 ! {1.24}
+ shlr8 r1 ! {1.16}
+ sett ! compensate for truncation of subtrahend, keep y1 < 1
+ subc r0,r1 ! {0.16} y1; max error about 3.5 ulp
+ swap.w r1,r0
+ dmulu.l r0,r4 ! { 1.30 }
+ mulu.w r1,r1
+ sts mach,r2
+ shlr2 r0
+ sts macl,r1
+ add r2,r0
+ mov.l LOCAL(xff000000),r6
+ add r2,r0
+ dmulu.l r1,r2
+ add #127,r0
+ add r6,r3 ! precompensation for adding leading 1
+ sts mach,r1
+ shlr r3
+ mov.l @r15+,r7
+ sub r1,r0 ! {0.31} max error about 50 ulp (+127)
+ mov.l @r15+,r6
+ shlr8 r0 ! {0.23} max error about 0.7 ulp
+ rts
+ add r3,r0
+
+LOCAL(r5_large):
+ mov r5,r3
+ mov #-31,r2
+ cmp/ge r2,r0
+ shll8 r5
+ bf LOCAL(ret_abs_r3)
+ rotcr r5
+ tst r1,r4
+ shll8 r4
+ bt/s LOCAL(denorm_r4)
+ cmp/hi r3,r1
+ dmulu.l r5,r5
+ bf LOCAL(inf_nan)
+ rotcr r4
+LOCAL(denorm_r4_done):
+ shld r0,r4
+ sts mach,r5
+ dmulu.l r4,r4
+ mov.l r6,@-r15
+ mov #20,r6
+ bra LOCAL(add_frac)
+ sts mach,r4
+
+LOCAL(near_one):
+ bra LOCAL(assemble_sqrt)
+ mov #0,r0
+LOCAL(near_four):
+ ! exact round-to-nearest would add 255. We add 256 for speed & compactness.
+ mov r4,r0
+ shlr8 r0
+ add #1,r0
+ tst r0,r0
+ addc r0,r3 ! might generate infinity.
+LOCAL(assemble_sqrt):
+ mov.l @r15+,r7
+ shlr r3
+ mov.l @r15+,r6
+ rts
+ add r3,r0
+LOCAL(inf_nan):
+LOCAL(ret_abs_r3):
+ mov r3,r0
+ rts
+ shlr r0
+LOCAL(denorm_r5):
+ bf LOCAL(inf_nan)
+ tst r1,r4
+ bt LOCAL(denorm_both)
+ dmulu.l r4,r4
+ bra LOCAL(denorm_r5_done)
+ shld r0,r5
+LOCAL(denorm_r4):
+ bf LOCAL(inf_nan)
+ tst r1,r5
+ dmulu.l r5,r5
+ bf LOCAL(denorm_r4_done)
+LOCAL(denorm_both): ! normalize according to r3.
+ extu.w r3,r2
+ mov.l LOCAL(c__clz_tab),r0
+ cmp/eq r3,r2
+ mov #-8,r2
+ bt 0f
+ tst r1,r3
+ mov #-16,r2
+ bt 0f
+ mov #-24,r2
+0:
+ shld r2,r3
+ mov.l r7,@-r15
+#ifdef __pic__
+ add r0,r3
+ mova LOCAL(c__clz_tab),r0
+#endif
+ mov.b @(r0,r3),r0
+ add #32,r2
+ sub r0,r2
+ shld r2,r4
+ mov r2,r7
+ dmulu.l r4,r4
+ sts.l pr,@-r15
+ mov #1,r3
+ bsr LOCAL(denorm_r5_done)
+ shld r2,r5
+ mov.l LOCAL(x01000000),r1
+ neg r7,r2
+ lds.l @r15+,pr
+ tst r1,r0
+ mov.l @r15+,r7
+ bt 0f
+ add #1,r2
+ sub r1,r0
+0:
+ rts
+ shld r2,r0
+
+LOCAL(m25):
+ .word -25
+LOCAL(x1380):
+ .word 0x1380
+LOCAL(xf80):
+ .word 0xf80
+ .balign 4
+LOCAL(xff000000):
+ .long 0xff000000
+LOCAL(x40000080):
+ .long 0x40000080
+LOCAL(xfffe0000):
+ .long 0xfffe0000
+LOCAL(x01000000):
+ .long 0x01000000
+LOCAL(c__clz_tab):
+#ifdef __pic__
+ .long GLOBAL(clz_tab) - .
+#else
+ .long GLOBAL(clz_tab)
+#endif
+
+/*
+#include <math.h>
+#include <stdio.h>
+
+double err(double x)
+{
+ return (x < 2. ? 1.25 - x/4. : 1. - x/8.) - 1./sqrt(x);
+}
+
+int
+main ()
+{
+ int i = 0;
+ double x, s, v;
+ double lx, hx;
+
+ s = 1./32.;
+ for (x = 1.; x < 4; x += s, i++)
+ {
+ lx = x;
+ hx = x + s - 1. / (1 << 30);
+ v = 0.5 * (err (lx) + err (hx));
+ printf ("%s% 4d%c",
+ (i & 7) == 0 ? "\t.byte\t" : "",
+ (int)(v * 4096 + 0.5) - 128,
+ (i & 7) == 7 ? '\n' : ',');
+ }
+ return 0;
+} */
+
+ .balign 4
+LOCAL(tab):
+ .byte -113, -84, -57, -33, -11, 8, 26, 41
+ .byte 55, 67, 78, 87, 94, 101, 106, 110
+ .byte 113, 115, 115, 115, 114, 112, 109, 106
+ .byte 101, 96, 91, 84, 77, 69, 61, 52
+ .byte 51, 57, 63, 68, 72, 77, 80, 84
+ .byte 87, 89, 91, 93, 95, 96, 97, 97
+ .byte 97, 97, 97, 96, 95, 94, 93, 91
+ .byte 89, 87, 84, 82, 79, 76, 72, 69
+ .byte 65, 61, 57, 53, 49, 44, 39, 34
+ .byte 29, 24, 19, 13, 8, 2, -4, -10
+ .byte -17, -23, -29, -36, -43, -50, -57, -64
+ .byte -71, -78, -85, -93,-101,-108,-116,-124
+ ENDFUNC(GLOBAL(hypotf))
+#endif /* L_hypotf */
+#endif /* DYN_SHIFT */
+
+#endif /* __SH_FPU_ANY__ */
Index: gcc/config/sh/sh.md
===================================================================
--- gcc/config/sh/sh.md (revision 162269)
+++ gcc/config/sh/sh.md (working copy)
@@ -107,6 +107,7 @@ (define_constants [
(DR0_REG 64)
(DR2_REG 66)
(DR4_REG 68)
+ (FR4_REG 68)
(FR23_REG 87)
(TR0_REG 128)
@@ -174,6 +175,16 @@ (define_constants [
(UNSPECV_WINDOW_END 10)
(UNSPECV_CONST_END 11)
(UNSPECV_EH_RETURN 12)
+
+ ;; NaN handling for software floating point:
+ ;; We require one precision-specific bit to be set in all NaNs,
+ ;; so that we can test them with a not / tst sequence.
+ ;; ??? Ironically, this is the quiet bit for now, because that is the
+ ;; only bit set by __builtin_nan ("").
+ ;; ??? Should really use one bit lower and force it set by using
+ ;; a custom encoding function.
+ (SF_NAN_MASK 0x7fc00000)
+ (DF_NAN_MASK 0x7ff80000)
])
;; -------------------------------------------------------------------------
@@ -615,6 +626,14 @@ (define_insn "cmpeqsi_t"
cmp/eq %1,%0"
[(set_attr "type" "mt_group")])
+(define_insn "fpcmp_i1"
+ [(set (reg:SI T_REG)
+ (match_operator:SI 1 "soft_fp_comparison_operator"
+ [(match_operand 0 "soft_fp_comparison_operand" "r") (const_int 0)]))]
+ "TARGET_SH1_SOFTFP"
+ "tst %0,%0"
+ [(set_attr "type" "mt_group")])
+
(define_insn "cmpgtsi_t"
[(set (reg:SI T_REG)
(gt:SI (match_operand:SI 0 "arith_reg_operand" "r,r")
@@ -1154,9 +1173,9 @@ (define_insn_and_split "*movsicc_umin"
(define_insn "*movsicc_t_false"
[(set (match_operand:SI 0 "arith_reg_dest" "=r,r")
- (if_then_else (eq (reg:SI T_REG) (const_int 0))
- (match_operand:SI 1 "general_movsrc_operand" "r,I08")
- (match_operand:SI 2 "arith_reg_operand" "0,0")))]
+ (if_then_else:SI (eq (reg:SI T_REG) (const_int 0))
+ (match_operand:SI 1 "general_movsrc_operand" "r,I08")
+ (match_operand:SI 2 "arith_reg_operand" "0,0")))]
"TARGET_PRETEND_CMOVE
&& (arith_reg_operand (operands[1], SImode)
|| (immediate_operand (operands[1], SImode)
@@ -1167,9 +1186,9 @@ (define_insn "*movsicc_t_false"
(define_insn "*movsicc_t_true"
[(set (match_operand:SI 0 "arith_reg_dest" "=r,r")
- (if_then_else (ne (reg:SI T_REG) (const_int 0))
- (match_operand:SI 1 "general_movsrc_operand" "r,I08")
- (match_operand:SI 2 "arith_reg_operand" "0,0")))]
+ (if_then_else:SI (ne (reg:SI T_REG) (const_int 0))
+ (match_operand:SI 1 "general_movsrc_operand" "r,I08")
+ (match_operand:SI 2 "arith_reg_operand" "0,0")))]
"TARGET_PRETEND_CMOVE
&& (arith_reg_operand (operands[1], SImode)
|| (immediate_operand (operands[1], SImode)
@@ -6849,6 +6868,50 @@ (define_insn "stuff_delay_slot"
\f
;; Conditional branch insns
+(define_expand "cmpun_sdf"
+ [(unordered (match_operand 0 "" "") (match_operand 1 "" ""))]
+ ""
+ "
+{
+ HOST_WIDE_INT mask;
+ switch (GET_MODE (operands[0]))
+ {
+ case SFmode:
+ mask = SF_NAN_MASK;
+ break;
+ case DFmode:
+ mask = DF_NAN_MASK;
+ break;
+ default:
+ FAIL;
+ }
+ emit_insn (gen_cmpunsf_i1 (operands[0], operands[1],
+ force_reg (SImode, GEN_INT (mask))));
+ DONE;
+}")
+
+(define_expand "cmpuneq_sdf"
+ [(uneq (match_operand 0 "" "") (match_operand 1 "" ""))]
+ ""
+ "
+{
+ HOST_WIDE_INT mask;
+ switch (GET_MODE (operands[0]))
+ {
+ case SFmode:
+ mask = SF_NAN_MASK;
+ break;
+ case DFmode:
+ mask = DF_NAN_MASK;
+ break;
+ default:
+ FAIL;
+ }
+ emit_insn (gen_cmpuneqsf_i1 (operands[0], operands[1],
+ force_reg (SImode, GEN_INT (mask))));
+ DONE;
+}")
+
(define_expand "cbranchint4_media"
[(set (pc)
(if_then_else (match_operator 0 "shmedia_cbranch_comparison_operator"
@@ -9394,11 +9457,15 @@ (define_split
(define_expand "cstoresf4"
[(set (match_operand:SI 0 "register_operand" "=r")
(match_operator:SI 1 "sh_float_comparison_operator"
- [(match_operand:SF 2 "arith_operand" "")
- (match_operand:SF 3 "arith_operand" "")]))]
- "TARGET_SH2E || TARGET_SHMEDIA_FPU"
+ [(match_operand:SF 2 "nonmemory_operand" "")
+ (match_operand:SF 3 "nonmemory_operand" "")]))]
+ "TARGET_SH1 || TARGET_SHMEDIA_FPU"
"if (TARGET_SHMEDIA)
{
+ if (!arith_operand (operands[2], DFmode))
+ operands[2] = copy_to_mode_reg (DFmode, operands[2]);
+ if (!arith_operand (operands[3], DFmode))
+ operands[3] = copy_to_mode_reg (DFmode, operands[3]);
emit_insn (gen_cstore4_media (operands[0], operands[1],
operands[2], operands[3]));
DONE;
@@ -9407,18 +9474,22 @@ (define_expand "cstoresf4"
if (! currently_expanding_to_rtl)
FAIL;
- sh_emit_compare_and_set (operands, SFmode);
+ sh_expand_float_scc (operands);
DONE;
")
(define_expand "cstoredf4"
[(set (match_operand:SI 0 "register_operand" "=r")
(match_operator:SI 1 "sh_float_comparison_operator"
- [(match_operand:DF 2 "arith_operand" "")
- (match_operand:DF 3 "arith_operand" "")]))]
- "(TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU"
+ [(match_operand:DF 2 "nonmemory_operand" "")
+ (match_operand:DF 3 "nonmemory_operand" "")]))]
+ "TARGET_SH1 || TARGET_SHMEDIA_FPU"
"if (TARGET_SHMEDIA)
{
+ if (!arith_operand (operands[2], DFmode))
+ operands[2] = copy_to_mode_reg (DFmode, operands[2]);
+ if (!arith_operand (operands[3], DFmode))
+ operands[3] = copy_to_mode_reg (DFmode, operands[3]);
emit_insn (gen_cstore4_media (operands[0], operands[1],
operands[2], operands[3]));
DONE;
@@ -9427,7 +9498,7 @@ (define_expand "cstoredf4"
if (! currently_expanding_to_rtl)
FAIL;
- sh_emit_compare_and_set (operands, DFmode);
+ sh_expand_float_scc (operands);
DONE;
")
@@ -9765,7 +9836,7 @@ (define_expand "addsf3"
[(set (match_operand:SF 0 "arith_reg_operand" "")
(plus:SF (match_operand:SF 1 "arith_reg_operand" "")
(match_operand:SF 2 "arith_reg_operand" "")))]
- "TARGET_SH2E || TARGET_SHMEDIA_FPU"
+ "TARGET_SH2E || TARGET_SH3 || TARGET_SHMEDIA_FPU"
"
{
if (TARGET_SH2E)
@@ -9773,6 +9844,12 @@ (define_expand "addsf3"
expand_sf_binop (&gen_addsf3_i, operands);
DONE;
}
+ else if (TARGET_OSFP)
+ {
+ expand_sfunc_binop (SFmode, &gen_addsf3_i3, \"__addsf3\", PLUS,
+ operands);
+ DONE;
+ }
}")
(define_insn "*addsf3_media"
@@ -9871,6 +9948,22 @@ (define_insn_and_split "binary_sf_op1"
}"
[(set_attr "type" "fparith_media")])
+(define_insn "addsf3_i3"
+ [(set (match_operand:SF 0 "arith_reg_dest" "=z")
+ (plus:SF (reg:SF R4_REG) (reg:SF R5_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (clobber (reg:SI R6_REG))
+ (clobber (reg:SI R7_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_OSFP && ! TARGET_SH2E"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
(define_insn "addsf3_i"
[(set (match_operand:SF 0 "fp_arith_reg_operand" "=f")
(plus:SF (match_operand:SF 1 "fp_arith_reg_operand" "%0")
@@ -9885,7 +9978,7 @@ (define_expand "subsf3"
[(set (match_operand:SF 0 "fp_arith_reg_operand" "")
(minus:SF (match_operand:SF 1 "fp_arith_reg_operand" "")
(match_operand:SF 2 "fp_arith_reg_operand" "")))]
- "TARGET_SH2E || TARGET_SHMEDIA_FPU"
+ "TARGET_SH2E || TARGET_SH3 || TARGET_SHMEDIA_FPU"
"
{
if (TARGET_SH2E)
@@ -9893,6 +9986,12 @@ (define_expand "subsf3"
expand_sf_binop (&gen_subsf3_i, operands);
DONE;
}
+ else if (TARGET_OSFP)
+ {
+ expand_sfunc_binop (SFmode, &gen_subsf3_i3, \"__subsf3\", MINUS,
+ operands);
+ DONE;
+ }
}")
(define_insn "*subsf3_media"
@@ -9903,6 +10002,23 @@ (define_insn "*subsf3_media"
"fsub.s %1, %2, %0"
[(set_attr "type" "fparith_media")])
+(define_insn "subsf3_i3"
+ [(set (match_operand:SF 0 "arith_reg_dest" "=z")
+ (minus:SF (reg:SF R4_REG) (reg:SF R5_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (clobber (reg:SI R5_REG))
+ (clobber (reg:SI R6_REG))
+ (clobber (reg:SI R7_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_OSFP && ! TARGET_SH2E"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
(define_insn "subsf3_i"
[(set (match_operand:SF 0 "fp_arith_reg_operand" "=f")
(minus:SF (match_operand:SF 1 "fp_arith_reg_operand" "0")
@@ -9915,10 +10031,15 @@ (define_insn "subsf3_i"
(define_expand "mulsf3"
[(set (match_operand:SF 0 "fp_arith_reg_operand" "")
- (mult:SF (match_operand:SF 1 "fp_arith_reg_operand" "")
- (match_operand:SF 2 "fp_arith_reg_operand" "")))]
- "TARGET_SH2E || TARGET_SHMEDIA_FPU"
- "")
+ (mult:SF (match_operand:SF 1 "fp_arith_reg_operand" "")
+ (match_operand:SF 2 "fp_arith_reg_operand" "")))]
+ "TARGET_SH2E || TARGET_SH3 || TARGET_SHMEDIA_FPU"
+ "if (TARGET_SH1_SOFTFP_MODE (SFmode))
+ {
+ expand_sfunc_binop (SFmode, &gen_mulsf3_i3, \"__mulsf3\", MULT,
+ operands);
+ DONE;
+ }")
(define_insn "*mulsf3_media"
[(set (match_operand:SF 0 "fp_arith_reg_operand" "=f")
@@ -9959,6 +10080,22 @@ (define_insn "mulsf3_i4"
[(set_attr "type" "fp")
(set_attr "fp_mode" "single")])
+(define_insn "mulsf3_i3"
+ [(set (match_operand:SF 0 "arith_reg_dest" "=z")
+ (mult:SF (reg:SF R4_REG) (reg:SF R5_REG)))
+ (clobber (reg:SI MACH_REG))
+ (clobber (reg:SI MACL_REG))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_OSFP && ! TARGET_SH2E"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
(define_insn "mac_media"
[(set (match_operand:SF 0 "fp_arith_reg_operand" "=f")
(plus:SF (mult:SF (match_operand:SF 1 "fp_arith_reg_operand" "%f")
@@ -10119,6 +10256,149 @@ (define_insn "*fixsfsi"
"ftrc %1,%0"
[(set_attr "type" "fp")])
+(define_insn "cmpnesf_i1"
+ [(set (match_operand:CC_FP_NE 0 "register_operand" "=z")
+ (compare:CC_FP_NE (reg:SF R4_REG) (reg:SF R5_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1 && ! TARGET_SH2E"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpgtsf_i1"
+ [(set (match_operand:CC_FP_GT 0 "register_operand" "=z")
+ (compare:CC_FP_GT (reg:SF R4_REG) (reg:SF R5_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1 && ! TARGET_SH2E"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpunltsf_i1"
+ [(set (match_operand:CC_FP_UNLT 0 "register_operand" "=z")
+ (compare:CC_FP_UNLT (reg:SF R4_REG) (reg:SF R5_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1 && ! TARGET_SH2E"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpeqsf_i1_finite"
+ [(set (reg:SI T_REG)
+ (eq:SI (match_operand:SF 0 "arith_reg_operand" "r,r,r")
+ (match_operand:SF 1 "arith_reg_operand" "r,r,r")))
+ (clobber (match_scratch:SI 2 "=0,1,?r"))]
+ "TARGET_SH1 && ! TARGET_SH2E && flag_finite_math_only"
+ "*
+{
+ if (which_alternative == 0)
+ output_asm_insn (\"cmp/eq\t%0,%1\;or\t%1,%2\;bt\t0f\", operands);
+ else if (which_alternative == 1)
+ output_asm_insn (\"cmp/eq\t%0,%1\;or\t%0,%2\;bt\t0f\", operands);
+ else
+ output_asm_insn (\"cmp/eq\t%0,%1\;mov\t%0,%2\;bt\t0f\;or\t%1,%2\",
+ operands);
+ return \"add\t%2,%2\;tst\t%2,%2\\n0:\";
+}"
+ [(set_attr "length" "10,10,12")])
+
+(define_insn "cmplesf_i1_finite"
+ [(set (reg:SI T_REG)
+ (le:SI (match_operand:SF 0 "arith_reg_operand" "r,r,r")
+ (match_operand:SF 1 "arith_reg_operand" "r,r,r")))
+ (clobber (match_scratch:SI 2 "=0,1,r"))]
+ "TARGET_SH1 && ! TARGET_SH2E && flag_finite_math_only"
+ "*
+{
+ output_asm_insn (\"cmp/pz\t%0\", operands);
+ if (which_alternative == 2)
+ output_asm_insn (\"mov\t%0,%2\", operands);
+ if (TARGET_SH2)
+ output_asm_insn (\"bf/s\t0f\;cmp/hs\t%1,%0\;cmp/ge\t%0,%1\", operands);
+ else
+ output_asm_insn (\"bt\t1f\;bra\t0f\;cmp/hs\t%1,%0\\n1:\tcmp/ge\t%0,%1\",
+ operands);
+ if (which_alternative == 1)
+ output_asm_insn (\"or\t%0,%2\", operands);
+ else
+ output_asm_insn (\"or\t%1,%2\", operands);
+ return \"bt\t0f\;add\t%2,%2\;tst\t%2,%2\\n0:\";
+}"
+ [(set_attr "length" "18,18,20")])
+
+(define_insn "cmpunsf_i1"
+ [(set (reg:SI T_REG)
+ (unordered:SI (match_operand:SF 0 "arith_reg_operand" "r,r")
+ (match_operand:SF 1 "arith_reg_operand" "r,r")))
+ (use (match_operand:SI 2 "arith_reg_operand" "r,r"))
+ (clobber (match_scratch:SI 3 "=0,&r"))]
+ "TARGET_SH1 && ! TARGET_SH2E"
+ "not\t%0,%3\;tst\t%2,%3\;not\t%1,%3\;bt\t0f\;tst\t%2,%3\;0:"
+ [(set_attr "length" "10")])
+
+;; ??? This is a lot of code with a lot of branches; a library function
+;; might be better.
+(define_insn "cmpuneqsf_i1"
+ [(set (reg:SI T_REG)
+ (uneq:SI (match_operand:SF 0 "arith_reg_operand" "r")
+ (match_operand:SF 1 "arith_reg_operand" "r")))
+ (use (match_operand:SI 2 "arith_reg_operand" "r"))
+ (clobber (match_scratch:SI 3 "=&r"))]
+ "TARGET_SH1 && ! TARGET_SH2E"
+ "*
+{
+ output_asm_insn (\"not\t%0,%3\;tst\t%2,%3\;not\t%1,%3\", operands);
+ output_asm_insn (\"bt\t0f\;tst\t%2,%3\;bt\t0f\;cmp/eq\t%0,%1\", operands);
+ output_asm_insn (\"mov\t%0,%3\;bt\t0f\;or\t%1,%3\", operands);
+ return \"add\t%3,%3\;tst\t%3,%3\\n0:\";
+}"
+ [(set_attr "length" "24")])
+
+(define_insn "movcc_fp_ne"
+ [(set (match_operand:CC_FP_NE 0 "general_movdst_operand"
+ "=r,r,m")
+ (match_operand:CC_FP_NE 1 "general_movsrc_operand"
+ "rI08,mr,r"))]
+ "TARGET_SH1"
+ "@
+ mov %1,%0
+ mov.l %1,%0
+ mov.l %1,%0"
+ [(set_attr "type" "move,load,store")])
+
+(define_insn "movcc_fp_gt"
+ [(set (match_operand:CC_FP_GT 0 "general_movdst_operand"
+ "=r,r,m")
+ (match_operand:CC_FP_GT 1 "general_movsrc_operand"
+ "rI08,mr,r"))]
+ "TARGET_SH1"
+ "@
+ mov %1,%0
+ mov.l %1,%0
+ mov.l %1,%0"
+ [(set_attr "type" "move,load,store")])
+
+(define_insn "movcc_fp_unlt"
+ [(set (match_operand:CC_FP_UNLT 0 "general_movdst_operand"
+ "=r,r,m")
+ (match_operand:CC_FP_UNLT 1 "general_movsrc_operand"
+ "rI08,mr,r"))]
+ "TARGET_SH1"
+ "@
+ mov %1,%0
+ mov.l %1,%0
+ mov.l %1,%0"
+ [(set_attr "type" "move,load,store")])
+
(define_insn "cmpgtsf_t"
[(set (reg:SI T_REG)
(gt:SI (match_operand:SF 0 "fp_arith_reg_operand" "f")
@@ -10146,6 +10426,22 @@ (define_insn "ieee_ccmpeqsf_t"
"* return output_ieee_ccmpeq (insn, operands);"
[(set_attr "length" "4")])
+(define_insn "*cmpltgtsf_t"
+ [(set (reg:SI T_REG)
+ (ltgt:SI (match_operand:SF 0 "fp_arith_reg_operand" "f")
+ (match_operand:SF 1 "fp_arith_reg_operand" "f")))]
+ "TARGET_SH2E && ! (TARGET_SH4 || TARGET_SH2A_SINGLE)"
+ "fcmp/gt\t%1,%0\;bt\t0f\;fcmp/gt\t%0,%1\\n0:"
+ [(set_attr "length" "6")])
+
+(define_insn "*cmporderedsf_t"
+ [(set (reg:SI T_REG)
+ (ordered:SI (match_operand:SF 0 "fp_arith_reg_operand" "f")
+ (match_operand:SF 1 "fp_arith_reg_operand" "f")))]
+ "TARGET_SH2E && ! (TARGET_SH4 || TARGET_SH2A_SINGLE)"
+ "fcmp/eq\t%0,%0\;bf\t0f\;fcmp/eq\t%1,%1\\n0:"
+ [(set_attr "length" "6")])
+
(define_insn "cmpgtsf_t_i4"
[(set (reg:SI T_REG)
@@ -10178,6 +10474,26 @@ (define_insn "*ieee_ccmpeqsf_t_4"
[(set_attr "length" "4")
(set_attr "fp_mode" "single")])
+(define_insn "*cmpltgtsf_t_4"
+ [(set (reg:SI T_REG)
+ (ltgt:SI (match_operand:SF 0 "fp_arith_reg_operand" "f")
+ (match_operand:SF 1 "fp_arith_reg_operand" "f")))
+ (use (match_operand:PSI 2 "fpscr_operand" "c"))]
+ "TARGET_SH4 || TARGET_SH2A_SINGLE"
+ "fcmp/gt\t%1,%0\;bt\t0f\;fcmp/gt\t%0,%1\\n0:"
+ [(set_attr "length" "6")
+ (set_attr "fp_mode" "single")])
+
+(define_insn "*cmporderedsf_t_4"
+ [(set (reg:SI T_REG)
+ (ordered:SI (match_operand:SF 0 "fp_arith_reg_operand" "f")
+ (match_operand:SF 1 "fp_arith_reg_operand" "f")))
+ (use (match_operand:PSI 2 "fpscr_operand" "c"))]
+ "TARGET_SH4 || TARGET_SH2A_SINGLE"
+ "fcmp/eq\t%0,%0\;bf\t0f\;fcmp/eq\t%1,%1\\n0:"
+ [(set_attr "length" "6")
+ (set_attr "fp_mode" "single")])
+
(define_insn "cmpeqsf_media"
[(set (match_operand:SI 0 "register_operand" "=r")
(eq:SI (match_operand:SF 1 "fp_arith_reg_operand" "f")
@@ -10213,18 +10529,24 @@ (define_insn "cmpunsf_media"
(define_expand "cbranchsf4"
[(set (pc)
(if_then_else (match_operator 0 "sh_float_comparison_operator"
- [(match_operand:SF 1 "arith_operand" "")
- (match_operand:SF 2 "arith_operand" "")])
+ [(match_operand:SF 1 "nonmemory_operand" "")
+ (match_operand:SF 2 "nonmemory_operand" "")])
(match_operand 3 "" "")
(pc)))]
- "TARGET_SH2E || TARGET_SHMEDIA_FPU"
+ "TARGET_SH1 || TARGET_SHMEDIA_FPU"
"
{
if (TARGET_SHMEDIA)
- emit_jump_insn (gen_cbranchfp4_media (operands[0], operands[1], operands[2],
- operands[3]));
+ {
+ if (!arith_operand (operands[1], SFmode))
+ operands[1] = copy_to_mode_reg (SFmode, operands[1]);
+ if (!arith_operand (operands[2], SFmode))
+ operands[2] = copy_to_mode_reg (SFmode, operands[2]);
+ emit_jump_insn (gen_cbranchfp4_media (operands[0], operands[1],
+ operands[2], operands[3]));
+ }
else
- sh_emit_compare_and_branch (operands, SFmode);
+ sh_expand_float_cbranch (operands);
DONE;
}")
@@ -10426,11 +10748,39 @@ (define_insn "abssf2_i"
[(set_attr "type" "fmove")
(set_attr "fp_mode" "single")])
+(define_expand "abssc2"
+ [(set (match_operand:SF 0 "fp_arith_reg_operand" "")
+ (abs:SF (match_operand:SC 1 "fp_arith_reg_operand" "")))]
+ "TARGET_OSFP && ! TARGET_SH2E"
+ "
+{
+ expand_sfunc_unop (SCmode, &gen_abssc2_i3, \"__hypotf\", ABS, operands);
+ DONE;
+}")
+
+(define_insn "abssc2_i3"
+ [(set (match_operand:SF 0 "arith_reg_dest" "=z")
+ (abs:SF (reg:SC R4_REG)))
+ (clobber (reg:SI MACH_REG))
+ (clobber (reg:SI MACL_REG))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (clobber (reg:SI R4_REG))
+ (clobber (reg:SI R5_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_OSFP && ! TARGET_SH2E"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
(define_expand "adddf3"
[(set (match_operand:DF 0 "fp_arith_reg_operand" "")
(plus:DF (match_operand:DF 1 "fp_arith_reg_operand" "")
(match_operand:DF 2 "fp_arith_reg_operand" "")))]
- "(TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU"
+ "TARGET_FPU_DOUBLE || TARGET_SH3"
"
{
if (TARGET_SH4 || TARGET_SH2A_DOUBLE)
@@ -10438,6 +10788,12 @@ (define_expand "adddf3"
expand_df_binop (&gen_adddf3_i, operands);
DONE;
}
+ else if (TARGET_SH3)
+ {
+ expand_sfunc_binop (DFmode, &gen_adddf3_i3_wrap, \"__adddf3\", PLUS,
+ operands);
+ DONE;
+ }
}")
(define_insn "*adddf3_media"
@@ -10458,6 +10814,30 @@ (define_insn "adddf3_i"
[(set_attr "type" "dfp_arith")
(set_attr "fp_mode" "double")])
+(define_expand "adddf3_i3_wrap"
+ [(match_operand:DF 0 "" "") (match_operand:SI 1 "" "")]
+ "TARGET_SH3"
+ "
+{
+ emit_insn (gen_adddf3_i3 (operands[1]));
+ emit_move_insn (operands[0], gen_rtx_REG (DFmode, R0_REG));
+ DONE;
+}")
+
+(define_insn "adddf3_i3"
+ [(set (reg:DF R0_REG)
+ (plus:DF (reg:DF R4_REG) (reg:DF R6_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:DI R2_REG))
+ (clobber (reg:DF R4_REG))
+ (clobber (reg:DF R6_REG))
+ (use (match_operand:SI 0 "arith_reg_operand" "r"))]
+ "TARGET_SH3"
+ "jsr @%0%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
(define_expand "subdf3"
[(set (match_operand:DF 0 "fp_arith_reg_operand" "")
(minus:DF (match_operand:DF 1 "fp_arith_reg_operand" "")
@@ -10494,7 +10874,7 @@ (define_expand "muldf3"
[(set (match_operand:DF 0 "fp_arith_reg_operand" "")
(mult:DF (match_operand:DF 1 "fp_arith_reg_operand" "")
(match_operand:DF 2 "fp_arith_reg_operand" "")))]
- "(TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU"
+ "TARGET_FPU_DOUBLE || TARGET_SH3"
"
{
if (TARGET_SH4 || TARGET_SH2A_DOUBLE)
@@ -10502,6 +10882,12 @@ (define_expand "muldf3"
expand_df_binop (&gen_muldf3_i, operands);
DONE;
}
+ else if (TARGET_SH3)
+ {
+ expand_sfunc_binop (DFmode, &gen_muldf3_i3_wrap, \"__muldf3\", MULT,
+ operands);
+ DONE;
+ }
}")
(define_insn "*muldf3_media"
@@ -10522,6 +10908,32 @@ (define_insn "muldf3_i"
[(set_attr "type" "dfp_mul")
(set_attr "fp_mode" "double")])
+(define_expand "muldf3_i3_wrap"
+ [(match_operand:DF 0 "" "") (match_operand:SI 1 "" "")]
+ "TARGET_SH3"
+ "
+{
+ emit_insn (gen_muldf3_i3 (operands[1]));
+ emit_move_insn (operands[0], gen_rtx_REG (DFmode, R0_REG));
+ DONE;
+}")
+
+(define_insn "muldf3_i3"
+ [(set (reg:DF R0_REG)
+ (mult:DF (reg:DF R4_REG) (reg:DF R6_REG)))
+ (clobber (reg:SI MACH_REG))
+ (clobber (reg:SI MACL_REG))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:DI R2_REG))
+ (clobber (reg:DF R4_REG))
+ (clobber (reg:DF R6_REG))
+ (use (match_operand:SI 0 "arith_reg_operand" "r"))]
+ "TARGET_SH3"
+ "jsr @%0%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
(define_expand "divdf3"
[(set (match_operand:DF 0 "fp_arith_reg_operand" "")
(div:DF (match_operand:DF 1 "fp_arith_reg_operand" "")
@@ -10651,6 +11063,73 @@ (define_insn "fix_truncdfsi2_i"
;; (use (match_dup 2))])
;; (set (match_dup 0) (reg:SI FPUL_REG))])
+(define_insn "cmpnedf_i1"
+ [(set (match_operand:CC_FP_NE 0 "register_operand" "=z")
+ (compare:CC_FP_NE (reg:DF R4_REG) (reg:DF R6_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1_SOFTFP"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpgtdf_i1"
+ [(set (match_operand:CC_FP_GT 0 "register_operand" "=z")
+ (compare:CC_FP_GT (reg:DF R4_REG) (reg:DF R6_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1_SOFTFP"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpunltdf_i1"
+ [(set (match_operand:CC_FP_UNLT 0 "register_operand" "=z")
+ (compare:CC_FP_UNLT (reg:DF R4_REG) (reg:DF R6_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1_SOFTFP"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpeqdf_i1_finite"
+ [(set (reg:SI T_REG)
+ (eq:SI (match_operand:DF 0 "arith_reg_operand" "r")
+ (match_operand:DF 1 "arith_reg_operand" "r")))
+ (clobber (match_scratch:SI 2 "=&r"))]
+ "TARGET_SH1_SOFTFP && flag_finite_math_only"
+ "cmp/eq\t%R0,%R1\;mov\t%S0,%2\;bf\t0f\;cmp/eq\t%S0,%S1\;bt\t0f\;or\t%S1,%2\;add\t%2,%2\;or\t%R0,%2\;tst\t%2,%2\\n0:"
+ [(set_attr "length" "18")])
+
+(define_insn "cmpundf_i1"
+ [(set (reg:SI T_REG)
+ (unordered:SI (match_operand:DF 0 "arith_reg_operand" "r,r")
+ (match_operand:DF 1 "arith_reg_operand" "r,r")))
+ (use (match_operand:SI 2 "arith_reg_operand" "r,r"))
+ (clobber (match_scratch:SI 3 "=0,&r"))]
+ "TARGET_SH1 && ! TARGET_SH2E"
+ "not\t%S0,%3\;tst\t%2,%3\;not\t%S1,%3\;bt\t0f\;tst\t%2,%3\;0:"
+ [(set_attr "length" "10")])
+
+;; ??? This is a lot of code with a lot of branches; a library function
+;; might be better.
+(define_insn "cmpuneqdf_i1"
+ [(set (reg:SI T_REG)
+ (uneq:SI (match_operand:DF 0 "arith_reg_operand" "r")
+ (match_operand:DF 1 "arith_reg_operand" "r")))
+ (use (match_operand:SI 2 "arith_reg_operand" "r"))
+ (clobber (match_scratch:SI 3 "=&r"))]
+ "TARGET_SH1_SOFTFP"
+ "not\t%S0,%3\;tst\t%2,%3\;not\t%S1,%3\;bt\t0f\;tst\t%2,%3\;bt\t0f\;cmp/eq\t%R0,%R1\; bf\t0f\;cmp/eq\t%S0,%S1\;bt\t0f\;mov\t%S0,%3\;or\t%S1,%3\;add\t%3,%3\;or\t%R0,%3\;tst\t%3,%3\\n0:"
+ [(set_attr "length" "30")])
+
(define_insn "cmpgtdf_t"
[(set (reg:SI T_REG)
(gt:SI (match_operand:DF 0 "arith_reg_operand" "f")
@@ -10682,6 +11161,26 @@ (define_insn "*ieee_ccmpeqdf_t"
[(set_attr "length" "4")
(set_attr "fp_mode" "double")])
+(define_insn "*cmpltgtdf_t"
+ [(set (reg:SI T_REG)
+ (ltgt:SI (match_operand:DF 0 "fp_arith_reg_operand" "f")
+ (match_operand:DF 1 "fp_arith_reg_operand" "f")))
+ (use (match_operand:PSI 2 "fpscr_operand" "c"))]
+ "TARGET_SH4 || TARGET_SH2A_DOUBLE"
+ "fcmp/gt\t%1,%0\;bt\t0f\;fcmp/gt\t%0,%1\\n0:"
+ [(set_attr "length" "6")
+ (set_attr "fp_mode" "double")])
+
+(define_insn "*cmpordereddf_t_4"
+ [(set (reg:SI T_REG)
+ (ordered:SI (match_operand:DF 0 "fp_arith_reg_operand" "f")
+ (match_operand:DF 1 "fp_arith_reg_operand" "f")))
+ (use (match_operand:PSI 2 "fpscr_operand" "c"))]
+ "TARGET_SH4 || TARGET_SH2A_SINGLE"
+ "fcmp/eq\t%0,%0\;bf\t0f\;fcmp/eq\t%1,%1\\n0:"
+ [(set_attr "length" "6")
+ (set_attr "fp_mode" "double")])
+
(define_insn "cmpeqdf_media"
[(set (match_operand:SI 0 "register_operand" "=r")
(eq:SI (match_operand:DF 1 "fp_arith_reg_operand" "f")
@@ -10717,18 +11216,24 @@ (define_insn "cmpundf_media"
(define_expand "cbranchdf4"
[(set (pc)
(if_then_else (match_operator 0 "sh_float_comparison_operator"
- [(match_operand:DF 1 "arith_operand" "")
- (match_operand:DF 2 "arith_operand" "")])
+ [(match_operand:DF 1 "nonmemory_operand" "")
+ (match_operand:DF 2 "nonmemory_operand" "")])
(match_operand 3 "" "")
(pc)))]
- "(TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU"
+ "TARGET_SH1 || TARGET_SHMEDIA_FPU"
"
{
if (TARGET_SHMEDIA)
- emit_jump_insn (gen_cbranchfp4_media (operands[0], operands[1], operands[2],
- operands[3]));
+ {
+ if (!arith_operand (operands[1], DFmode))
+ operands[1] = copy_to_mode_reg (DFmode, operands[1]);
+ if (!arith_operand (operands[2], DFmode))
+ operands[2] = copy_to_mode_reg (DFmode, operands[2]);
+ emit_jump_insn (gen_cbranchfp4_media (operands[0], operands[1],
+ operands[2], operands[3]));
+ }
else
- sh_emit_compare_and_branch (operands, DFmode);
+ sh_expand_float_cbranch (operands);
DONE;
}")
@@ -10823,7 +11328,7 @@ (define_insn "absdf2_i"
(define_expand "extendsfdf2"
[(set (match_operand:DF 0 "fp_arith_reg_operand" "")
(float_extend:DF (match_operand:SF 1 "fpul_operand" "")))]
- "(TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU"
+ "TARGET_SH1 || TARGET_SHMEDIA_FPU"
"
{
if (TARGET_SH4 || TARGET_SH2A_DOUBLE)
@@ -10832,6 +11337,18 @@ (define_expand "extendsfdf2"
get_fpscr_rtx ()));
DONE;
}
+ else if (TARGET_SH2E)
+ {
+ expand_sfunc_unop (SFmode, &gen_extendsfdf2_i2e, \"__extendsfdf2\",
+ FLOAT_EXTEND, operands);
+ DONE;
+ }
+ else if (TARGET_SH1)
+ {
+ expand_sfunc_unop (SFmode, &gen_extendsfdf2_i1, \"__extendsfdf2\",
+ FLOAT_EXTEND, operands);
+ DONE;
+ }
}")
(define_insn "*extendsfdf2_media"
@@ -10850,16 +11367,94 @@ (define_insn "extendsfdf2_i4"
[(set_attr "type" "fp")
(set_attr "fp_mode" "double")])
+;; ??? In order to use this efficiently, we'd have to have an extra
+;; register class for r0 and r1 - and that would cause repercussions in
+;; register allocation elsewhere. So just say we clobber r0 / r1, and
+;; that we can use an arbitrary target.
+(define_insn_and_split "extendsfdf2_i1"
+ [(set (match_operand:DF 0 "arith_reg_dest" "=r")
+ (float_extend:DF (reg:SF R4_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R0_REG))
+ (clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1_SOFTFP && !TARGET_SH2E"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 0) (reg:DF R0_REG))]
+ "emit_insn (gen_extendsfdf2_i1_r0 (operands[1]));"
+ [(set_attr "type" "sfunc")])
+
+(define_insn "extendsfdf2_i1_r0"
+ [(set (reg:DF R0_REG) (float_extend:DF (reg:SF R4_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (use (match_operand:SI 0 "arith_reg_operand" "r"))]
+ "TARGET_SH1_SOFTFP && !TARGET_SH2E"
+ "jsr @%0%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
+(define_insn_and_split "extendsfdf2_i2e"
+ [(set (match_operand:DF 0 "arith_reg_dest" "=r")
+ (float_extend:DF (reg:SF FR4_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R0_REG))
+ (clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (clobber (reg:SI R4_REG))
+ (clobber (reg:SI FPUL_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1_SOFTFP && TARGET_SH2E"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 0) (reg:DF R0_REG))]
+ "emit_insn (gen_extendsfdf2_i2e_r0 (operands[1]));"
+ [(set_attr "type" "sfunc")])
+
+(define_insn "extendsfdf2_i2e_r0"
+ [(set (reg:DF R0_REG) (float_extend:DF (reg:SF FR4_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (clobber (reg:SI R4_REG))
+ (clobber (reg:SI FPUL_REG))
+ (use (match_operand:SI 0 "arith_reg_operand" "r"))]
+ "TARGET_SH1_SOFTFP && TARGET_SH2E"
+ "jsr @%0%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
(define_expand "truncdfsf2"
[(set (match_operand:SF 0 "fpul_operand" "")
- (float_truncate:SF (match_operand:DF 1 "fp_arith_reg_operand" "")))]
- "(TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU"
- "
-{
+ (float_truncate:SF (match_operand:DF 1 "fp_arith_reg_operand" "")))]
+ "TARGET_SH1 || TARGET_SHMEDIA_FPU"
+ "
+{
if (TARGET_SH4 || TARGET_SH2A_DOUBLE)
{
emit_df_insn (gen_truncdfsf2_i4 (operands[0], operands[1],
- get_fpscr_rtx ()));
+ get_fpscr_rtx ()));
+ DONE;
+ }
+ else if (TARGET_SH2E)
+ {
+ expand_sfunc_unop (DFmode, &gen_truncdfsf2_i2e, \"__truncdfsf2\",
+ FLOAT_TRUNCATE, operands);
+ DONE;
+ }
+ else if (TARGET_SH1)
+ {
+ expand_sfunc_unop (DFmode, &gen_truncdfsf2_i1, \"__truncdfsf2\",
+ FLOAT_TRUNCATE, operands);
DONE;
}
}")
@@ -10879,6 +11474,37 @@ (define_insn "truncdfsf2_i4"
"fcnvds %1,%0"
[(set_attr "type" "fp")
(set_attr "fp_mode" "double")])
+
+(define_insn "truncdfsf2_i1"
+ [(set (match_operand:SF 0 "arith_reg_dest" "=z")
+ (float_truncate:SF (reg:DF R4_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1_SOFTFP && !TARGET_SH2E"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "truncdfsf2_i2e"
+ [(set (match_operand:SF 0 "arith_reg_dest" "=w")
+ (float_truncate:SF (reg:DF R4_REG)))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ (clobber (reg:SI FPUL_REG))
+ (clobber (reg:SI R0_REG))
+ (clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
+ (clobber (reg:SI R3_REG))
+ (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+ "TARGET_SH1_SOFTFP && TARGET_SH2E"
+ "jsr @%1%#"
+ [(set_attr "type" "sfunc")
+ (set_attr "needs_delay_slot" "yes")])
+
\f
;; Bit field extract patterns. These give better code for packed bitfields,
;; because they allow auto-increment addresses to be generated.
^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: SH optimized software floating point routines
2010-07-17 21:23 ` Kaz Kojima
@ 2010-07-19 9:23 ` Naveen H. S
0 siblings, 0 replies; 30+ messages in thread
From: Naveen H. S @ 2010-07-19 9:23 UTC (permalink / raw)
To: amylaar; +Cc: gcc, Prafulla Thakare, Kaz Kojima
Hi.
Thank you for the modified patch.
I have applied the patch to the gcc-4.5 sources and am checking for
regressions on SH[1234].
I will run some more tests on the patched toolchain and share the results
once the regression and other tests are complete.
Regards,
Naveen
* Re: SH optimized software floating point routines
2010-07-19 0:59 ` Joern Rennecke
@ 2010-07-20 13:35 ` Kaz Kojima
2010-07-21 11:25 ` Christian Bruel
` (2 more replies)
0 siblings, 3 replies; 30+ messages in thread
From: Kaz Kojima @ 2010-07-20 13:35 UTC (permalink / raw)
To: joern.rennecke; +Cc: Naveen.S, gcc, Prafulla.Thakare
Joern Rennecke <joern.rennecke@embecosm.com> wrote:
> I've found two bugs in truncdfsf2;
> I've also added back a number of hunks that Naveen had dropped.
>
> Note that most of the patch has been prepared in 2006, so that is the
> proper most recent copyright date for those files that haven't been touched
> save for swapping the Copyright notice.
I've got some regressions with "make check" on sh4-unknown-linux-gnu.
It looks like all of them fail with undefined references to
__unorddf2/__unordsf2 when -mieee is enabled.
I'm trying the attached patch on top of the sh-softfp-20100718-2131 patch.
All regressions go away with it on a cross sh4-unknown-linux-gnu build,
though the native bootstrap will take a few more days.
BTW, it looks like the softfp __unord?f2 routines check for signaling NaNs
only. This makes __builtin_isnan return false for quiet NaNs, for
which the current fp-bit ones return true when -mieee is enabled. Perhaps
that change of behavior is OK for software FP.
Regards,
kaz
--
diff -uprN ORIG/trunk/gcc/config/sh/ieee-754-df.S trunk/gcc/config/sh/ieee-754-df.S
--- ORIG/trunk/gcc/config/sh/ieee-754-df.S 2010-07-20 11:39:29.000000000 +0900
+++ trunk/gcc/config/sh/ieee-754-df.S 2010-07-20 11:36:15.000000000 +0900
@@ -23,11 +23,11 @@ see the files COPYING3 and COPYING.RUNTI
!! STMicroelectronics ST40 CPUs
!! Contributed by J"orn Rennecke joern.rennecke@st.com
-#ifndef __SH_FPU_DOUBLE__
-
#include "lib1funcs.h"
#include "insn-constants.h"
+#ifndef __SH_FPU_DOUBLE__
+
/* Double-precision floating-point emulation.
We handle NANs, +-infinity, and +-zero.
However, we assume that for NANs, the topmost bit of the fraction is set. */
@@ -123,7 +123,7 @@ GLOBAL(unorddf2):
mov.l LOCAL(c_DF_NAN_MASK),r1
not DBL0H,r0
tst r1,r0
- not r6,r0
+ not DBL1H,r0
bt LOCAL(unord)
tst r1,r0
LOCAL(unord):
@@ -788,4 +788,52 @@ LOCAL(x7ff80000):
#endif /* L_divdf3 */
#endif /* DYN_SHIFT */
+#else /* __SH_FPU_DOUBLE__ */
+
+#ifdef L_unorddf2
+ .balign 4
+ .global GLOBAL(unorddf2)
+ FUNC(GLOBAL(unorddf2))
+GLOBAL(unorddf2):
+ flds fr4,fpul
+ sts fpul,r4
+ shll r4
+ mov.l LOCAL(c_DF_QNAN_MASK),r1
+ shlr r4
+ cmp/eq r4,r1
+ bf/s LOCAL(unord_check_qnan0)
+ flds fr5,fpul
+ sts fpul,r5
+ tst r5,r5
+ bt LOCAL(unord_next)
+LOCAL(unord_check_qnan0):
+ not r4,r0
+ tst r1,r0
+ bt LOCAL(unord)
+LOCAL(unord_next):
+ flds fr6,fpul
+ sts fpul,r6
+ shll r6
+ shlr r6
+ cmp/eq r6,r1
+ bf/s LOCAL(unord_check_qnan1)
+ flds fr7,fpul
+ sts fpul,r7
+ tst r7,r7
+ bt LOCAL(unord_fail)
+LOCAL(unord_check_qnan1):
+ not r6,r0
+ tst r1,r0
+LOCAL(unord):
+ rts
+ movt r0
+LOCAL(unord_fail):
+ rts
+ mov #0,r0
+ .balign 4
+LOCAL(c_DF_QNAN_MASK):
+ .long 0x7ff00000
+ ENDFUNC(GLOBAL(unorddf2))
+#endif /* L_unorddf2 */
+
#endif /* __SH_FPU_DOUBLE__ */
diff -uprN ORIG/trunk/gcc/config/sh/ieee-754-sf.S trunk/gcc/config/sh/ieee-754-sf.S
--- ORIG/trunk/gcc/config/sh/ieee-754-sf.S 2010-07-20 11:39:30.000000000 +0900
+++ trunk/gcc/config/sh/ieee-754-sf.S 2010-07-20 11:35:58.000000000 +0900
@@ -23,11 +23,11 @@ see the files COPYING3 and COPYING.RUNTI
!! STMicroelectronics ST40 CPUs
!! Contributed by J"orn Rennecke joern.rennecke@st.com
-#ifndef __SH_FPU_ANY__
-
#include "lib1funcs.h"
#include "insn-constants.h"
+#ifndef __SH_FPU_ANY__
+
/* Single-precision floating-point emulation.
We handle NANs, +-infinity, and +-zero.
However, we assume that for NANs, the topmost bit of the fraction is set. */
@@ -689,4 +689,42 @@ LOCAL(tab):
#endif /* L_hypotf */
#endif /* DYN_SHIFT */
+#else /* __SH_FPU_ANY__ */
+
+#ifdef L_unordsf2
+ .balign 4
+ .global GLOBAL(unordsf2)
+ FUNC(GLOBAL(unordsf2))
+GLOBAL(unordsf2):
+ flds fr5,fpul
+ sts fpul,r4
+ shll r4
+ mov.l LOCAL(c_SF_QNAN_MASK),r1
+ shlr r4
+ cmp/eq r4,r1
+ bt/s LOCAL(unord_next)
+ not r4,r0
+ tst r1,r0
+ bt LOCAL(unord)
+LOCAL(unord_next):
+ flds fr4,fpul
+ sts fpul,r5
+ shll r5
+ shlr r5
+ cmp/eq r5,r1
+ bt/s LOCAL(unord_fail)
+ not r5,r0
+ tst r1,r0
+LOCAL(unord):
+ rts
+ movt r0
+LOCAL(unord_fail):
+ rts
+ mov #0,r0
+ .balign 4
+LOCAL(c_SF_QNAN_MASK):
+ .long 0x7f800000
+ ENDFUNC(GLOBAL(unordsf2))
+#endif /* L_unordsf2 */
+
#endif /* __SH_FPU_ANY__ */
* Re: SH optimized software floating point routines
2010-07-20 13:35 ` Kaz Kojima
@ 2010-07-21 11:25 ` Christian Bruel
2010-07-22 7:22 ` Christian Bruel
2010-07-22 0:45 ` Kaz Kojima
2010-07-22 6:41 ` Joern Rennecke
2 siblings, 1 reply; 30+ messages in thread
From: Christian Bruel @ 2010-07-21 11:25 UTC (permalink / raw)
To: Kaz Kojima; +Cc: joern.rennecke, Naveen.S, gcc, Prafulla.Thakare
[-- Attachment #1: Type: text/plain, Size: 790 bytes --]
Hi Kaz,
Kaz Kojima wrote:
>
> BTW, it looks like the softfp __unord?f2 routines check for signaling NaNs
> only. This makes __builtin_isnan return false for quiet NaNs, for
> which the current fp-bit ones return true when -mieee is enabled. Perhaps
> that change of behavior is OK for software FP.
I use the attached patch to handle the QNaNs in the assembly soft-fp.
It needs to be updated for trunk (and the dates in the ChangeLogs). Will do.
Cheers
Christian
2010-04-20 Christian Bruel <christian.bruel@st.com>
* gcc.dg/builtins-nan.c: New test.
2010-04-20 Christian Bruel <christian.bruel@st.com>
* config/sh/ieee-754-df.S (nedf2f): Don't check Qbit for NaNs.
* config/sh/ieee-754-sf.S (nesf2f): Likewise.
* config/sh/sh.md (cmpunsf_i1, cmpundf_i1): Likewise. Clobber R2.
[-- Attachment #2: qnans.patch --]
[-- Type: text/plain, Size: 4377 bytes --]
2010-04-20 Christian Bruel <christian.bruel@st.com>
* gcc.dg/builtins-nan.c: New test.
2010-04-20 Christian Bruel <christian.bruel@st.com>
* config/sh/ieee-754-df.S (nedf2f): Don't check Qbit for NaNs.
* config/sh/ieee-754-sf.S (nesf2f): Likewise.
* config/sh/sh.md (cmpunsf_i1, cmpundf_i1): Likewise. Clobber R2.
Index: gcc/config/sh/ieee-754-df.S
===================================================================
--- gcc/config/sh/ieee-754-df.S (revision 1352)
+++ gcc/config/sh/ieee-754-df.S (revision 1373)
@@ -88,11 +88,12 @@
HIDDEN_FUNC(GLOBAL(nedf2f))
GLOBAL(nedf2f):
cmp/eq DBL0L,DBL1L
+ bf.s LOCAL(ne)
+ mov #1,r0
+ cmp/eq DBL0H,DBL1H
mov.l LOCAL(c_DF_NAN_MASK),r1
- bf LOCAL(ne)
- cmp/eq DBL0H,DBL1H
- not DBL0H,r0
- bt LOCAL(check_nan)
+ bt.s LOCAL(check_nan)
+ not DBL0H,r0
mov DBL0H,r0
or DBL1H,r0
add r0,r0
@@ -100,11 +101,17 @@
or DBL0L,r0
LOCAL(check_nan):
tst r1,r0
- rts
+ bt.s LOCAL(nan)
+ mov #12,r2
+ shll16 r2
+ xor r2,r1
+ tst r1,r0
+LOCAL(nan):
movt r0
LOCAL(ne):
rts
- mov #1,r0
+ nop
+
.balign 4
LOCAL(c_DF_NAN_MASK):
.long DF_NAN_MASK
Index: gcc/config/sh/ieee-754-sf.S
===================================================================
--- gcc/config/sh/ieee-754-sf.S (revision 1352)
+++ gcc/config/sh/ieee-754-sf.S (revision 1373)
@@ -55,19 +55,27 @@
the values are NaN. */
cmp/eq r4,r5
mov.l LOCAL(c_SF_NAN_MASK),r1
+ bt.s LOCAL(check_nan)
not r4,r0
- bt LOCAL(check_nan)
mov r4,r0
or r5,r0
rts
add r0,r0
LOCAL(check_nan):
tst r1,r0
+ bt.s LOCAL(nan)
+ mov #96,r2
+ shll16 r2
+ xor r2,r1
+ tst r1,r0
+LOCAL(nan):
rts
movt r0
+
.balign 4
LOCAL(c_SF_NAN_MASK):
.long SF_NAN_MASK
+LOCAL(c_SF_SNAN_MASK):
ENDFUNC(GLOBAL(nesf2f))
#endif /* L_nesf2f */
Index: gcc/config/sh/sh.md
===================================================================
--- gcc/config/sh/sh.md (revision 1352)
+++ gcc/config/sh/sh.md (revision 1373)
@@ -11182,6 +11182,7 @@
(clobber (reg:SI T_REG))
(clobber (reg:SI PR_REG))
(clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
(use (match_operand:SI 1 "arith_reg_operand" "r"))]
"TARGET_SH1 && ! TARGET_SH2E"
"jsr @%1%#"
@@ -11257,13 +11258,18 @@
(define_insn "cmpunsf_i1"
[(set (reg:SI T_REG)
- (unordered:SI (match_operand:SF 0 "arith_reg_operand" "r,r")
- (match_operand:SF 1 "arith_reg_operand" "r,r")))
- (use (match_operand:SI 2 "arith_reg_operand" "r,r"))
- (clobber (match_scratch:SI 3 "=0,&r"))]
+ (unordered:SI (match_operand:SF 0 "arith_reg_operand" "r")
+ (match_operand:SF 1 "arith_reg_operand" "r")))
+ (use (match_operand:SI 2 "arith_reg_operand" "r"))
+ (clobber (match_scratch:SI 3 "=&r"))]
"TARGET_SH1 && ! TARGET_SH2E"
- "not\t%0,%3\;tst\t%2,%3\;not\t%1,%3\;bt\t0f\;tst\t%2,%3\;0:"
- [(set_attr "length" "10")])
+ "not\t%0,%3\;tst\t%2,%3\;bt.s\t0f
+\tnot\t%1,%3\;tst\t%2,%3\;bt.s\t0f
+\tmov\t#96,%3\;shll16\t%3\;xor\t%3,%2
+\tnot\t%0,%3\;tst\t%2,%3\;bt.s\t0f
+\tnot\t%1,%3\;tst\t%2,%3
+0:"
+ [(set_attr "length" "28")])
;; ??? This is a lot of code with a lot of branches; a library function
;; might be better.
@@ -11967,6 +11973,7 @@
(clobber (reg:SI T_REG))
(clobber (reg:SI PR_REG))
(clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
(use (match_operand:SI 1 "arith_reg_operand" "r"))]
"TARGET_SH1_SOFTFP"
"jsr @%1%#"
@@ -12008,13 +12015,18 @@
(define_insn "cmpundf_i1"
[(set (reg:SI T_REG)
- (unordered:SI (match_operand:DF 0 "arith_reg_operand" "r,r")
- (match_operand:DF 1 "arith_reg_operand" "r,r")))
- (use (match_operand:SI 2 "arith_reg_operand" "r,r"))
- (clobber (match_scratch:SI 3 "=0,&r"))]
+ (unordered:SI (match_operand:DF 0 "arith_reg_operand" "r")
+ (match_operand:DF 1 "arith_reg_operand" "r")))
+ (use (match_operand:SI 2 "arith_reg_operand" "r"))
+ (clobber (match_scratch:SI 3 "=&r"))]
"TARGET_SH1 && ! TARGET_SH2E"
- "not\t%S0,%3\;tst\t%2,%3\;not\t%S1,%3\;bt\t0f\;tst\t%2,%3\;0:"
- [(set_attr "length" "10")])
+ "not\t%S0,%3\;tst\t%2,%3\;bt.s\t0f
+ \tnot\t%S1,%3\;tst\t%2,%3\;bt.s\t0f
+ \tmov\t#12,%3\;shll16\t%3\;xor\t%3,%2
+ \tnot\t%S0,%3\;tst\t%2,%3\;bt.s\t0f
+ \tnot\t%S1,%3\;tst\t%2,%3
+0:"
+ [(set_attr "length" "28")])
;; ??? This is a lot of code with a lot of branches; a library function
;; might be better.
[-- Attachment #3: buitins-nan.c --]
[-- Type: text/plain, Size: 724 bytes --]
/* { dg-do run } */
/* { dg-options "-mieee" { target sh*-*-* } } */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
static int lisnan(double v)
{
return (v != v);
}
static int lisnanf(float v)
{
return (v != v);
}
int main(void)
{
double d;
float f;
/* double */
d = __builtin_nans("");
if (! lisnan(d))
abort();
if (! __builtin_isnan(d))
abort();
d = __builtin_nan("");
if (! lisnan(d))
abort();
if (! __builtin_isnan(d))
abort();
/* float */
f = __builtin_nansf("");
if (! lisnanf(f))
abort();
if (! __builtin_isnanf(f))
abort();
f = __builtin_nanf("");
if (! lisnanf(f))
abort();
if (! __builtin_isnanf(f))
abort();
exit (0);
}
* Re: SH optimized software floating point routines
2010-07-20 13:35 ` Kaz Kojima
2010-07-21 11:25 ` Christian Bruel
@ 2010-07-22 0:45 ` Kaz Kojima
2010-07-22 6:41 ` Joern Rennecke
2 siblings, 0 replies; 30+ messages in thread
From: Kaz Kojima @ 2010-07-22 0:45 UTC (permalink / raw)
To: joern.rennecke; +Cc: Naveen.S, gcc, Prafulla.Thakare
> I'm trying the attached patch over sh-softfp-20100718-2131 patch.
> All regressions go away with it on cross sh4-unknown-linux-gnu,
> though the native bootstrap will take a few days more.
There are a few warnings in bootstrap:
../trunk/gcc/config/sh/sh.c: In function 'sh_soft_fp_cmp':
../trunk/gcc/config/sh/sh.c:2193:8: error: enum conversion in initialization is invalid in C++ [-Werror=c++-compat]
../trunk/gcc/config/sh/sh.c:2248:3: error: enum conversion when passing argument 1 of 'gen_rtx_fmt_ee_stat' is invalid in C++ [-Werror=c++-compat]
./genrtl.h:24:1: note: expected 'enum rtx_code' but argument is of type 'int'
../trunk/gcc/config/sh/sh.c: In function 'expand_sfunc_op':
../trunk/gcc/config/sh/sh.c:8744:19: error: variable 'insn' set but not used [-Werror=unused-but-set-variable]
cc1: all warnings being treated as errors
The attached is a fixup for them.
Unfortunately, the bootstrap failed with a stage2/stage3
comparison failure, though that might be unrelated to the softfp
patches. I'll try again with an older revision.
Regards,
kaz
--
--- ORIG/trunk/gcc/config/sh/sh.c 2010-07-19 10:58:36.000000000 +0900
+++ trunk/gcc/config/sh/sh.c 2010-07-21 06:45:18.000000000 +0900
@@ -2185,12 +2185,13 @@ sh_emit_cheap_store_flag (enum machine_m
}
static rtx
-sh_soft_fp_cmp (int code, enum machine_mode op_mode, rtx op0, rtx op1)
+sh_soft_fp_cmp (enum rtx_code code, enum machine_mode op_mode, rtx op0,
+ rtx op1)
{
const char *name = NULL;
rtx (*fun) (rtx, rtx), addr, tmp, last, equiv;
int df = op_mode == DFmode;
- enum machine_mode mode = CODE_FOR_nothing; /* shut up warning. */
+ enum machine_mode mode = MAX_MACHINE_MODE; /* shut up warning. */
switch (code)
{
@@ -8741,13 +8742,13 @@ expand_sfunc_op (int nargs, enum machine
const char *name, rtx equiv, rtx *operands)
{
int next_reg = FIRST_PARM_REG, i;
- rtx addr, last, insn;
+ rtx addr, last;
addr = gen_reg_rtx (Pmode);
function_symbol (addr, name, SFUNC_FREQUENT);
for ( i = 1; i <= nargs; i++)
{
- insn = emit_move_insn (gen_rtx_REG (mode, next_reg), operands[i]);
+ emit_move_insn (gen_rtx_REG (mode, next_reg), operands[i]);
next_reg += GET_MODE_SIZE (mode) / UNITS_PER_WORD;
}
last = emit_insn ((*fun) (operands[0], addr));
* Re: SH optimized software floating point routines
2010-07-20 13:35 ` Kaz Kojima
2010-07-21 11:25 ` Christian Bruel
2010-07-22 0:45 ` Kaz Kojima
@ 2010-07-22 6:41 ` Joern Rennecke
2010-07-22 12:23 ` Kaz Kojima
2 siblings, 1 reply; 30+ messages in thread
From: Joern Rennecke @ 2010-07-22 6:41 UTC (permalink / raw)
To: Kaz Kojima; +Cc: Naveen.S, gcc, Prafulla.Thakare
Quoting Kaz Kojima <kkojima@rr.iij4u.or.jp>:
> I've got some regressions with "make check" on sh4-unknown-linux-gnu.
> It looks like all of them fail with undefined references to
> __unorddf2/__unordsf2 when -mieee is enabled.
That's a bug, then; we shouldn't use a library function there,
but the cmpordered[sd]f_t_4 patterns.
> I'm trying the attached patch over sh-softfp-20100718-2131 patch.
> All regressions go away with it on cross sh4-unknown-linux-gnu,
> though the native bootstrap will take a few days more.
But it's really the instruction expansion that needs to be fixed.
> BTW, it looks that softfp __unord?f2 routines check signaling NaNs
> only. This makes __builtin_isnan return false for quiet NaNs for
> which current fp-bit ones return true when -mieee enabled. Perhaps
> that change of behavior might be OK for software FP.
The SH port so far has been using fp-bit.c, which does not actually
support floating point signals, and neither does this optimized software
floating point.
So in essence, we only have quiet NaNs. We might as well choose a bit pattern
that is easy to process, to keep code size down and improve performance.
Having a mantissa bit set that is adjacent to the exponent makes for easier
testing.
There is precedent for having the signalling bit in different places and
with different values (i.e. some have 1 == signalling, others 0 ==
signalling).
So we could say that the bit two below the exponent is the signalling bit,
and is active-low. Thus a 0xffffffff in the high or only word is a quiet
NaN.
Tests that feed specific NaN hex values could be disabled or feed modified
values for the SH[123].
OTOH for unorddf / unordsf support with sh4, you would want to keep the
distinction between signalling / quiet NaNs.
(Although I doubt many use signalling support, considering the cost when
you take a trap on every floating point instruction.)
The sh.md cmpordered* patterns should do the right thing there; we just
have to keep emitting them.
* Re: SH optimized software floating point routines
2010-07-21 11:25 ` Christian Bruel
@ 2010-07-22 7:22 ` Christian Bruel
2010-07-22 7:37 ` Joern Rennecke
2010-07-22 13:58 ` Christian Bruel
0 siblings, 2 replies; 30+ messages in thread
From: Christian Bruel @ 2010-07-22 7:22 UTC (permalink / raw)
To: Kaz Kojima; +Cc: joern.rennecke, Naveen.S, gcc, Prafulla.Thakare
[-- Attachment #1: Type: text/plain, Size: 695 bytes --]
Christian Bruel wrote:
> Hi Kaz,
>
> Kaz Kojima wrote:
>
>>
>> BTW, it looks like the softfp __unord?f2 routines check for signaling NaNs
>> only. This makes __builtin_isnan return false for quiet NaNs, for
>> which the current fp-bit ones return true when -mieee is enabled. Perhaps
>> that change of behavior is OK for software FP.
>
> I use the attached patch to handle the QNaNs in the assembly soft-fp.
> Need to be updated for trunk (and update the dates in changelogs). Will do.
Edited to apply on top of Joern's latest patch. Certainly not optimal,
but it fixes the QNaN checks for builtins and inlined unordered
comparisons with -mieee or -fno-finite-math-only.
Best Regards
Christian
[-- Attachment #2: qnans_trunk.patch --]
[-- Type: text/plain, Size: 4848 bytes --]
2010-07-22 Christian Bruel <christian.bruel@st.com>
* gcc.dg/builtins-nan.c: New test.
2010-07-22 Christian Bruel <christian.bruel@st.com>
* config/sh/ieee-754-df.S (nedf2f): Don't check Qbit for NaNs.
* config/sh/ieee-754-sf.S (nesf2f): Likewise.
* config/sh/sh.md (cmpunsf_i1, cmpundf_i1): Likewise.
(cmpnesf_i1, cmpnedf_i1): Clobber R2.
diff '--exclude=.svn' '--exclude=*.rej' '--exclude=*~' -ubrN gnu_trunk.ref/gcc/gcc/config/sh/ieee-754-df.S gnu_trunk/gcc/gcc/config/sh/ieee-754-df.S
--- gnu_trunk.ref/gcc/gcc/config/sh/ieee-754-df.S 2010-07-21 18:04:17.000000000 +0200
+++ gnu_trunk/gcc/gcc/config/sh/ieee-754-df.S 2010-07-21 18:09:10.000000000 +0200
@@ -92,11 +92,12 @@
HIDDEN_FUNC(GLOBAL(nedf2))
GLOBAL(nedf2):
cmp/eq DBL0L,DBL1L
- mov.l LOCAL(c_DF_NAN_MASK),r1
- bf LOCAL(ne)
+ bf.s LOCAL(ne)
+ mov #1,r0
cmp/eq DBL0H,DBL1H
+ mov.l LOCAL(c_DF_NAN_MASK),r1
+ bt.s LOCAL(check_nan)
not DBL0H,r0
- bt LOCAL(check_nan)
mov DBL0H,r0
or DBL1H,r0
add r0,r0
@@ -104,11 +105,17 @@
or DBL0L,r0
LOCAL(check_nan):
tst r1,r0
- rts
+ bt.s LOCAL(nan)
+ mov #12,r2
+ shll16 r2
+ xor r2,r1
+ tst r1,r0
+LOCAL(nan):
movt r0
LOCAL(ne):
rts
- mov #1,r0
+ nop
+
.balign 4
LOCAL(c_DF_NAN_MASK):
.long DF_NAN_MASK
diff '--exclude=.svn' '--exclude=*.rej' '--exclude=*~' -ubrN gnu_trunk.ref/gcc/gcc/config/sh/ieee-754-sf.S gnu_trunk/gcc/gcc/config/sh/ieee-754-sf.S
--- gnu_trunk.ref/gcc/gcc/config/sh/ieee-754-sf.S 2010-07-21 18:04:18.000000000 +0200
+++ gnu_trunk/gcc/gcc/config/sh/ieee-754-sf.S 2010-07-21 18:09:10.000000000 +0200
@@ -51,13 +51,19 @@
cmp/eq r4,r5
mov.l LOCAL(c_SF_NAN_MASK),r1
not r4,r0
- bt LOCAL(check_nan)
+ bt.s LOCAL(check_nan)
mov r4,r0
or r5,r0
rts
add r0,r0
LOCAL(check_nan):
tst r1,r0
+ bt.s LOCAL(nan)
+ mov #96,r2
+ shll16 r2
+ xor r2,r1
+ tst r1,r0
+ LOCAL(nan):
rts
movt r0
.balign 4
diff '--exclude=.svn' '--exclude=*.rej' '--exclude=*~' -ubrN gnu_trunk.ref/gcc/gcc/config/sh/sh.md gnu_trunk/gcc/gcc/config/sh/sh.md
--- gnu_trunk.ref/gcc/gcc/config/sh/sh.md 2010-07-21 18:06:25.000000000 +0200
+++ gnu_trunk/gcc/gcc/config/sh/sh.md 2010-07-22 09:13:12.000000000 +0200
@@ -10262,6 +10262,7 @@
(clobber (reg:SI T_REG))
(clobber (reg:SI PR_REG))
(clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
(use (match_operand:SI 1 "arith_reg_operand" "r"))]
"TARGET_SH1 && ! TARGET_SH2E"
"jsr @%1%#"
@@ -10337,13 +10338,18 @@
(define_insn "cmpunsf_i1"
[(set (reg:SI T_REG)
- (unordered:SI (match_operand:SF 0 "arith_reg_operand" "r,r")
- (match_operand:SF 1 "arith_reg_operand" "r,r")))
- (use (match_operand:SI 2 "arith_reg_operand" "r,r"))
- (clobber (match_scratch:SI 3 "=0,&r"))]
+ (unordered:SI (match_operand:SF 0 "arith_reg_operand" "r")
+ (match_operand:SF 1 "arith_reg_operand" "r")))
+ (use (match_operand:SI 2 "arith_reg_operand" "r"))
+ (clobber (match_scratch:SI 3 "=&r"))]
"TARGET_SH1 && ! TARGET_SH2E"
- "not\t%0,%3\;tst\t%2,%3\;not\t%1,%3\;bt\t0f\;tst\t%2,%3\;0:"
- [(set_attr "length" "10")])
+ "not\t%0,%3\;tst\t%2,%3\;bt.s\t0f
+ \tnot\t%1,%3\;tst\t%2,%3\;bt.s\t0f
+ \tmov\t#96,%3\;shll16\t%3\;xor\t%3,%2
+ \tnot\t%0,%3\;tst\t%2,%3\;bt.s\t0f
+ \tnot\t%1,%3\;tst\t%2,%3
+ 0:"
+ [(set_attr "length" "28")])
;; ??? This is a lot of code with a lot of branches; a library function
;; might be better.
@@ -11069,6 +11075,7 @@
(clobber (reg:SI T_REG))
(clobber (reg:SI PR_REG))
(clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
(use (match_operand:SI 1 "arith_reg_operand" "r"))]
"TARGET_SH1_SOFTFP"
"jsr @%1%#"
@@ -11093,6 +11100,7 @@
(clobber (reg:SI T_REG))
(clobber (reg:SI PR_REG))
(clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
(use (match_operand:SI 1 "arith_reg_operand" "r"))]
"TARGET_SH1_SOFTFP"
"jsr @%1%#"
@@ -11110,13 +11118,18 @@
(define_insn "cmpundf_i1"
[(set (reg:SI T_REG)
- (unordered:SI (match_operand:DF 0 "arith_reg_operand" "r,r")
- (match_operand:DF 1 "arith_reg_operand" "r,r")))
- (use (match_operand:SI 2 "arith_reg_operand" "r,r"))
- (clobber (match_scratch:SI 3 "=0,&r"))]
+ (unordered:SI (match_operand:DF 0 "arith_reg_operand" "r")
+ (match_operand:DF 1 "arith_reg_operand" "r")))
+ (use (match_operand:SI 2 "arith_reg_operand" "r"))
+ (clobber (match_scratch:SI 3 "=&r"))]
"TARGET_SH1 && ! TARGET_SH2E"
- "not\t%S0,%3\;tst\t%2,%3\;not\t%S1,%3\;bt\t0f\;tst\t%2,%3\;0:"
- [(set_attr "length" "10")])
+ "not\t%S0,%3\;tst\t%2,%3\;bt.s\t0f
+ \tnot\t%S1,%3\;tst\t%2,%3\;bt.s\t0f
+ \tmov\t#12,%3\;shll16\t%3\;xor\t%3,%2
+ \tnot\t%S0,%3\;tst\t%2,%3\;bt.s\t0f
+ \tnot\t%S1,%3\;tst\t%2,%3
+0:"
+ [(set_attr "length" "28")])
;; ??? This is a lot of code with a lot of branches; a library function
;; might be better.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: SH optimized software floating point routines
2010-07-22 7:22 ` Christian Bruel
@ 2010-07-22 7:37 ` Joern Rennecke
2010-07-22 11:58 ` Christian Bruel
2010-07-22 13:58 ` Christian Bruel
1 sibling, 1 reply; 30+ messages in thread
From: Joern Rennecke @ 2010-07-22 7:37 UTC (permalink / raw)
To: Christian Bruel; +Cc: Kaz Kojima, Naveen.S, gcc, Prafulla.Thakare
[-- Attachment #1: Type: text/plain, Size: 412 bytes --]
Quoting Christian Bruel <christian.bruel@st.com>:
> Edited to apply on top of Joern's latest patch. Certainly not optimal,
> but it fixes the QNaN checks for builtins and inlined unordered
> comparisons for -mieee or -fno-finite-math-only.
You are still on the wrong track; as I said in my earlier message, we
should not emit the library call for SH4 in the first place.
Please try the attached patch instead.
[-- Attachment #2: sh-softfp-predicate-fix --]
[-- Type: text/plain, Size: 665 bytes --]
--- predicates.md-20100718 2010-07-22 08:17:37.273500678 +0100
+++ predicates.md 2010-07-22 08:28:57.257502902 +0100
@@ -575,9 +575,10 @@
;; UNORDERED is only supported on SHMEDIA.
(define_predicate "sh_float_comparison_operator"
- (ior (match_operand 0 "ordered_comparison_operator")
- (and (match_test "TARGET_SHMEDIA")
- (match_code "unordered"))))
+ (if_then_else (match_test "TARGET_SHMEDIA")
+ (ior (match_operand 0 "ordered_comparison_operator")
+ (match_code "unordered"))
+ (match_operand 0 "comparison_operator")))
(define_predicate "shmedia_cbranch_comparison_operator"
(ior (match_operand 0 "equality_comparison_operator")
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: SH optimized software floating point routines
2010-07-22 7:37 ` Joern Rennecke
@ 2010-07-22 11:58 ` Christian Bruel
2010-07-22 14:25 ` Joern Rennecke
0 siblings, 1 reply; 30+ messages in thread
From: Christian Bruel @ 2010-07-22 11:58 UTC (permalink / raw)
To: Joern Rennecke; +Cc: Kaz Kojima, Naveen.S, gcc, Prafulla.Thakare
Joern Rennecke wrote:
> Quoting Christian Bruel <christian.bruel@st.com>:
>
>> Edited to apply on top of Joern's latest patch. Certainly not optimal,
>> but it fixes the QNaN checks for builtins and inlined unordered
>> comparisons for -mieee or -fno-finite-math-only.
>
> You are still on the wrong track; as I said in my earlier message, we
> should not emit the library call for SH4 in the first place.
>
>
> Please try the attached patch instead.
>
Hello, sorry for the mails that crossed.
I think we are dealing with two different problems here that have the
same root. The original one was the undefined __unorddf2/__unordsf2
regression, for which you said that the library functions should not be
called. I agree, and my patch is not exclusive with yours in this regard.
I was dealing with functional issues in the SNaN bit checking in the
cmpun_ patterns (in addition to the floating-point comparison
functions), which is exposed by the regression test that I provided (for
-m4-nofpu -mieee).
About the other part of your answer, not supporting SNaNs in
fp-bit.c: that is a possibility I didn't consider in my fix. This
restriction is quite a surprise to me because, regarding NaNs, it is
not what I would infer from the implementation of fp-bit.c's isnan
function, which checks for both CLASS_SNAN and CLASS_QNAN.
See for example the result of
static int misnanf(float v)
{
  return (v != v);
}
called with either a QNaN or an SNaN. IMO the assembly implementation
should have the same semantics as the C model, which is not the case today.
Using -fsignaling-nans and eventually putting #ifdef __SUPPORT_SNAN__
around the check doesn't change anything, since the same call is made
to the floating-point comparison function, which really needs to check
for both formats. If you are concerned about the extra cycles needed in
the nesf2 implementation (which is nothing anyway compared to the C
model), we could certainly provide a specialized one just for
-fsignaling-nans.
Best Regards
Christian
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: SH optimized software floating point routines
2010-07-22 6:41 ` Joern Rennecke
@ 2010-07-22 12:23 ` Kaz Kojima
2010-07-23 13:45 ` Kaz Kojima
0 siblings, 1 reply; 30+ messages in thread
From: Kaz Kojima @ 2010-07-22 12:23 UTC (permalink / raw)
To: joern.rennecke; +Cc: Naveen.S, gcc, Prafulla.Thakare
Joern Rennecke <joern.rennecke@embecosm.com> wrote:
> That's a bug, then; we shouldn't use a library function there,
> but the cmpordered[sd]f_t_4 patterns.
Argh, I missed that the required patterns are already incorporated
in your patch. I'll test it again with sh-softfp-predicate-fix
when the tests for 4.5.1-rc are done. Thanks!
Regards,
kaz
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: SH optimized software floating point routines
2010-07-22 7:22 ` Christian Bruel
2010-07-22 7:37 ` Joern Rennecke
@ 2010-07-22 13:58 ` Christian Bruel
2010-07-22 16:14 ` Joern Rennecke
2010-07-22 16:23 ` Joern Rennecke
1 sibling, 2 replies; 30+ messages in thread
From: Christian Bruel @ 2010-07-22 13:58 UTC (permalink / raw)
To: Kaz Kojima; +Cc: joern.rennecke, Naveen.S, gcc, Prafulla.Thakare
[-- Attachment #1: Type: text/plain, Size: 1220 bytes --]
Oops, resending it with a small typo fix (a branch became delayed :-().
Just in case we agree that the SNaN and QNaN checks are not exclusive
and should mimic the C model, here is a synthetic illustrative test case:
Compile with
sh-superh-elf-gcc -O2 -mieee -m4-nofpu snan.c snan2.c -g -o l.u ;
sh-superh-elf-run l.u ; echo $?
Original 4.6 fp-bit C model:
OK
Using the ieee-sf.S implementation:
FAIL
Using the ieee-sf.S + this patch
OK
same for sh4-linux.
Best Regards,
Christian
Christian Bruel wrote:
> Christian Bruel wrote:
>> Hi Kaz,
>>
>> Kaz Kojima wrote:
>>
>>> BTW, it looks that softfp __unord?f2 routines check signaling NaNs
>>> only. This makes __builtin_isnan return false for quiet NaNs for
>>> which current fp-bit ones return true when -mieee enabled. Perhaps
>>> that change of behavior might be OK for software FP.
>> I use the attached patch to handle the QNaNs in the assembly soft-fp.
>> Needs to be updated for trunk (and the dates in the changelogs); will do.
>
> Edited to apply on top of Joern's latest patch. Certainly not optimal,
> but it fixes the QNaN checks for builtins and inlined unordered
> comparisons for -mieee or -fno-finite-math-only.
>
> Best Regards
>
> Christian
>
>
>
[-- Attachment #2: qnans_trunk.patch --]
[-- Type: text/plain, Size: 4357 bytes --]
diff '--exclude=.svn' '--exclude=*.rej' '--exclude=*~' -ubrN gnu_trunk.ref/gcc/gcc/config/sh/ieee-754-df.S gnu_trunk/gcc/gcc/config/sh/ieee-754-df.S
--- gnu_trunk.ref/gcc/gcc/config/sh/ieee-754-df.S 2010-07-21 18:04:17.949950000 +0200
+++ gnu_trunk/gcc/gcc/config/sh/ieee-754-df.S 2010-07-21 18:09:10.602376000 +0200
@@ -92,11 +92,12 @@
HIDDEN_FUNC(GLOBAL(nedf2))
GLOBAL(nedf2):
cmp/eq DBL0L,DBL1L
- mov.l LOCAL(c_DF_NAN_MASK),r1
- bf LOCAL(ne)
+ bf.s LOCAL(ne)
+ mov #1,r0
cmp/eq DBL0H,DBL1H
+ mov.l LOCAL(c_DF_NAN_MASK),r1
+ bt.s LOCAL(check_nan)
not DBL0H,r0
- bt LOCAL(check_nan)
mov DBL0H,r0
or DBL1H,r0
add r0,r0
@@ -104,11 +105,17 @@
or DBL0L,r0
LOCAL(check_nan):
tst r1,r0
- rts
+ bt.s LOCAL(nan)
+ mov #12,r2
+ shll16 r2
+ xor r2,r1
+ tst r1,r0
+LOCAL(nan):
movt r0
LOCAL(ne):
rts
- mov #1,r0
+ nop
+
.balign 4
LOCAL(c_DF_NAN_MASK):
.long DF_NAN_MASK
diff '--exclude=.svn' '--exclude=*.rej' '--exclude=*~' -ubrN gnu_trunk.ref/gcc/gcc/config/sh/ieee-754-sf.S gnu_trunk/gcc/gcc/config/sh/ieee-754-sf.S
--- gnu_trunk.ref/gcc/gcc/config/sh/ieee-754-sf.S 2010-07-22 14:21:50.606831000 +0200
+++ gnu_trunk/gcc/gcc/config/sh/ieee-754-sf.S 2010-07-22 15:30:17.928097000 +0200
@@ -58,6 +58,12 @@
add r0,r0
LOCAL(check_nan):
tst r1,r0
+ bt.s LOCAL(nan)
+ mov #96,r2
+ shll16 r2
+ xor r2,r1
+ tst r1,r0
+ LOCAL(nan):
rts
movt r0
.balign 4
diff '--exclude=.svn' '--exclude=*.rej' '--exclude=*~' -ubrN gnu_trunk.ref/gcc/gcc/config/sh/sh.md gnu_trunk/gcc/gcc/config/sh/sh.md
--- gnu_trunk.ref/gcc/gcc/config/sh/sh.md 2010-07-21 18:06:25.978547000 +0200
+++ gnu_trunk/gcc/gcc/config/sh/sh.md 2010-07-22 09:13:12.599669000 +0200
@@ -10262,6 +10262,7 @@
(clobber (reg:SI T_REG))
(clobber (reg:SI PR_REG))
(clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
(use (match_operand:SI 1 "arith_reg_operand" "r"))]
"TARGET_SH1 && ! TARGET_SH2E"
"jsr @%1%#"
@@ -10337,13 +10338,18 @@
(define_insn "cmpunsf_i1"
[(set (reg:SI T_REG)
- (unordered:SI (match_operand:SF 0 "arith_reg_operand" "r,r")
- (match_operand:SF 1 "arith_reg_operand" "r,r")))
- (use (match_operand:SI 2 "arith_reg_operand" "r,r"))
- (clobber (match_scratch:SI 3 "=0,&r"))]
+ (unordered:SI (match_operand:SF 0 "arith_reg_operand" "r")
+ (match_operand:SF 1 "arith_reg_operand" "r")))
+ (use (match_operand:SI 2 "arith_reg_operand" "r"))
+ (clobber (match_scratch:SI 3 "=&r"))]
"TARGET_SH1 && ! TARGET_SH2E"
- "not\t%0,%3\;tst\t%2,%3\;not\t%1,%3\;bt\t0f\;tst\t%2,%3\;0:"
- [(set_attr "length" "10")])
+ "not\t%0,%3\;tst\t%2,%3\;bt.s\t0f
+ \tnot\t%1,%3\;tst\t%2,%3\;bt.s\t0f
+ \tmov\t#96,%3\;shll16\t%3\;xor\t%3,%2
+ \tnot\t%0,%3\;tst\t%2,%3\;bt.s\t0f
+ \tnot\t%1,%3\;tst\t%2,%3
+ 0:"
+ [(set_attr "length" "28")])
;; ??? This is a lot of code with a lot of branches; a library function
;; might be better.
@@ -11069,6 +11075,7 @@
(clobber (reg:SI T_REG))
(clobber (reg:SI PR_REG))
(clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
(use (match_operand:SI 1 "arith_reg_operand" "r"))]
"TARGET_SH1_SOFTFP"
"jsr @%1%#"
@@ -11093,6 +11100,7 @@
(clobber (reg:SI T_REG))
(clobber (reg:SI PR_REG))
(clobber (reg:SI R1_REG))
+ (clobber (reg:SI R2_REG))
(use (match_operand:SI 1 "arith_reg_operand" "r"))]
"TARGET_SH1_SOFTFP"
"jsr @%1%#"
@@ -11110,13 +11118,18 @@
(define_insn "cmpundf_i1"
[(set (reg:SI T_REG)
- (unordered:SI (match_operand:DF 0 "arith_reg_operand" "r,r")
- (match_operand:DF 1 "arith_reg_operand" "r,r")))
- (use (match_operand:SI 2 "arith_reg_operand" "r,r"))
- (clobber (match_scratch:SI 3 "=0,&r"))]
+ (unordered:SI (match_operand:DF 0 "arith_reg_operand" "r")
+ (match_operand:DF 1 "arith_reg_operand" "r")))
+ (use (match_operand:SI 2 "arith_reg_operand" "r"))
+ (clobber (match_scratch:SI 3 "=&r"))]
"TARGET_SH1 && ! TARGET_SH2E"
- "not\t%S0,%3\;tst\t%2,%3\;not\t%S1,%3\;bt\t0f\;tst\t%2,%3\;0:"
- [(set_attr "length" "10")])
+ "not\t%S0,%3\;tst\t%2,%3\;bt.s\t0f
+ \tnot\t%S1,%3\;tst\t%2,%3\;bt.s\t0f
+ \tmov\t#12,%3\;shll16\t%3\;xor\t%3,%2
+ \tnot\t%S0,%3\;tst\t%2,%3\;bt.s\t0f
+ \tnot\t%S1,%3\;tst\t%2,%3
+0:"
+ [(set_attr "length" "28")])
;; ??? This is a lot of code with a lot of branches; a library function
;; might be better.
[-- Attachment #3: snan.c --]
[-- Type: text/plain, Size: 401 bytes --]
#include <stdlib.h>
extern int misnanf(float v);
extern int eqnf(float f3, float f4);
int main(void)
{
  float f1 = __builtin_nansf("");
  float f2 = __builtin_nanf("");
  float f3 = 2.0;
  float f4 = 3.3;

  if (! misnanf (f1))
    abort();
  if (! misnanf (f2))
    abort();
  if (misnanf (f3))
    abort();
  if (!eqnf (f3, f4))
    abort();
  if (eqnf (f4, f4))
    abort();
  return 0;
}
[-- Attachment #4: snan2.c --]
[-- Type: text/plain, Size: 97 bytes --]
int eqnf(float f3, float f4)
{
  return f3 != f4;
}

int misnanf(float v)
{
  return (v != v);
}
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: SH optimized software floating point routines
2010-07-22 11:58 ` Christian Bruel
@ 2010-07-22 14:25 ` Joern Rennecke
0 siblings, 0 replies; 30+ messages in thread
From: Joern Rennecke @ 2010-07-22 14:25 UTC (permalink / raw)
To: Christian Bruel; +Cc: Kaz Kojima, Naveen.S, gcc, Prafulla.Thakare
Quoting Christian Bruel <christian.bruel@st.com>:
> About the other part of your answer, non supporting SNaNs in the
> fp-bit.c, it is a possibility that I didn't consider in my fix. This
> restriction is quite a surprise to me because, related to NaNs, it is
> not what I guess from the implementation of the fp-bit.c's isnan
> function that does check for CLASS_SNAN, and CLASS_QNAN.
Well, it looks like a classic top-down implementation, carving up the
problem into little sub-problems, and then not implementing some of them,
so that the case distinction between CLASS_SNAN and CLASS_QNAN becomes
pointless.
> See for example the result of
>
> static int misnanf(float v)
> {
> return (v != v);
> }
>
> called with either a QNaN or a SNaN. IMO The assembly model should have
> the same semantic that the C model, which is not the case today.
I would consider the exact bit patterns used for NaNs an implementation
detail, which the user should not need to care about.
We only implement QNaNs. fp-bit.c recognizes all NaN patterns, but treats
them all as QNaNs.
> Using -fsignaling-nans and eventually putting #ifdef __SUPPORT_SNAN__
> around the checking doesn't change anything since the same call is done
> to the floating point comparison function, that really needs to check
> for both formats.
Considering that the signals don't work, wouldn't a better implementation
of -fsignaling-nans be to issue a diagnostic when using this for a software
floating point ABI in sh.h OVERRIDE_OPTIONS ?
And somehow make using __builtin_nans / __builtin_nansf give a
diagnostic, too.
Unless you want to go further and really implement the signals.
I suppose you could use config/soft-fp for that.
> If you are concerned about the extra cycles needed
Both cycles and bytes.
> in the nesf2 implementation (which is nothing anyway compared to the C
> model),
fp-bit is so slow that it can't be taken seriously as a benchmark for
software floating point emulation speed. The point of having a
hand-optimized assembly version is that you actually can show reasonable
performance for codes with light fpu usage, compared to a processors with
hardware floating point (which needs more die space and power, and might
not clock as high as the fpu-less version).
IIRC some EEMBC benchmarks are in that class, i.e. with the hand-optimized
software floating point they run several times faster than with fp-bit,
but going all the way to hardware floating point then gives diminishing
returns.
> we could certainly provide a specialized one just for
> -fsignaling-nans.
You'd also have to handle the other comparisons. grep for F_NAN_MASK
in ieee-754-sf.S / ieee-754-df.S.
The original intent was that the faster & more compact NaN check would
be available for all the software emulation code, although I used a more
inclusive check if I saw it could be done with the same cycle count.
I can't remember if I ended up using the mask check anywhere but in
ieee-754-sf.S / ieee-754-df.S .
If you want all possible IEEE NaN patterns to be honoured, someone should
check all these checks in the config/sh/IEEE-754/m3 directory...
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: SH optimized software floating point routines
2010-07-22 13:58 ` Christian Bruel
@ 2010-07-22 16:14 ` Joern Rennecke
2010-07-23 9:32 ` Christian Bruel
2010-07-22 16:23 ` Joern Rennecke
1 sibling, 1 reply; 30+ messages in thread
From: Joern Rennecke @ 2010-07-22 16:14 UTC (permalink / raw)
To: Christian Bruel; +Cc: Kaz Kojima, Naveen.S, gcc, Prafulla.Thakare
Quoting Christian Bruel <christian.bruel@st.com>:
> Using the ieee-sf.S + this patch
> OK
Is this only a proof-of-concept, because you only change the ne[sd]f2
implementation? And you go out of your way to only accept a restricted
set of values. Plus, the overuse of the arithmetic unit hurts SH4-100 /
SH4-200 instruction pairing.
AFAICT you need only one cycle penalty, in the check_nan path:
GLOBAL(nesf2):
/* If the raw values are unequal, the result is unequal, unless
both values are +-zero.
If the raw values are equal, the result is equal, unless
the values are NaN. */
cmp/eq r4,r5
mov.l LOCAL(inf2),r1
bt/s LOCAL(check_nan)
mov r4,r0
or r5,r0
rts
add r0,r0
LOCAL(check_nan):
add r0,r0
cmp/hi r1,r0
rts
movt r0
.balign 4
LOCAL(inf2):
.long 0xff000000
You could even save four bytes by putting the check_nan label into the
delay slot, but I'm not sure if that'll discomfit any branch
prediction mechanism.
Disclaimer: I've not tested this code.
For the DFmode case, what about NaNs denoted by the low word, e.g.
0x7ff00000 00000001 ?
If so, the DFmode code could become something like this:
GLOBAL(nedf2):
cmp/eq DBL0L,DBL1L
mov.l LOCAL(inf2),r1
bf LOCAL(ne)
cmp/eq DBL0H,DBL1H
bt/s LOCAL(check_nan)
mov DBL0H,r0
or DBL1H,r0
add r0,r0
rts
or DBL0L,r0
LOCAL(check_nan):
tst DBL0L,DBL0L
add r0,r0
subc r1,r0
mov #-1,r0
rts
negc r0,r0
LOCAL(ne):
rts
mov #1,r0
.balign 4
LOCAL(inf2):
.long 0xffe00000
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: SH optimized software floating point routines
2010-07-22 13:58 ` Christian Bruel
2010-07-22 16:14 ` Joern Rennecke
@ 2010-07-22 16:23 ` Joern Rennecke
1 sibling, 0 replies; 30+ messages in thread
From: Joern Rennecke @ 2010-07-22 16:23 UTC (permalink / raw)
To: gcc
Quoting Christian Bruel <christian.bruel@st.com>:
> oops, resending it with a small typo fix (a branch became delayed :-().
For an actual patch, you need to use the SL* macros from
config/sh/lib1funcs.h because the SH1 does not have delayed branches.
^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: SH optimized software floating point routines
2010-07-16 10:26 ` Joern Rennecke
@ 2010-07-22 23:10 ` Joseph S. Myers
2010-07-23 2:00 ` Joern Rennecke
0 siblings, 1 reply; 30+ messages in thread
From: Joseph S. Myers @ 2010-07-22 23:10 UTC (permalink / raw)
To: Joern Rennecke; +Cc: Naveen H. S, Kaz Kojima, gcc, Prafulla Thakare
On Fri, 16 Jul 2010, Joern Rennecke wrote:
> Quoting "Naveen H. S" <Naveen.S@kpitcummins.com>:
>
> > extendsfdf2 - gcc.c-torture/execute/conversion.c
> > gcc.dg/torture/fp-int-convert-float.c, gcc.dg/pr28796-2.c
>
> Note that some tests invoke undefined behaviour; I've also come across this
> when doing optimized soft FP for ARCompact:
>
> http://gcc.gnu.org/viewcvs/branches/arc-4_4-20090909-branch/gcc/testsuite/gcc.dg/torture/fp-int-convert.h?r1=151539&r2=151545
That diff does not appear to relate to undefined behavior. GCC considers
these out-of-range conversions to yield an unspecified value, possibly
raising an exception, as per Annex F, and does not take the liberty of
optimizing on the basis of them being undefined when not in an IEEE mode.
--
Joseph S. Myers
joseph@codesourcery.com
^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: SH optimized software floating point routines
2010-07-22 23:10 ` Joseph S. Myers
@ 2010-07-23 2:00 ` Joern Rennecke
2010-07-23 9:02 ` Joseph S. Myers
0 siblings, 1 reply; 30+ messages in thread
From: Joern Rennecke @ 2010-07-23 2:00 UTC (permalink / raw)
To: Joseph S. Myers; +Cc: Naveen H. S, Kaz Kojima, gcc, Prafulla Thakare
Quoting "Joseph S. Myers" <joseph@codesourcery.com>:
> That diff does not appear to relate to undefined behavior. GCC considers
> these out-of-range conversions to yield an unspecified value, possibly
> raising an exception, as per Annex F, and does not take the liberty of
> optimizing on the basis of them being undefined when not in an IEEE mode.
Well, still, the test is wrong in possibly raising an exception there,
with no provisions to ignore the exception or catch any signal raised.
For the ARCompact, in order to test the floating point emulation better,
I had (they are still there in #if 0 /* DEBUG */ blocks) small wrappers
for each function to evaluate it once with the hand-optimized version,
and once with fp-bit.c, and abort on getting different values.
Now, fp-bit generally tries to yield some value that the programmer thought
might mean something, whereas the hand-optimized version treats computations
of unspecified values as irrelevant.
Considering:
GLOBAL(fixunsdfsi):
mov.w LOCAL(x413),r1 ! bias + 20
mov DBL0H,r0
shll DBL0H
mov.l LOCAL(mask),r3
mov #-21,r2
shld r2,DBL0H ! SH4-200 will start this insn in a new cycle
bt/s LOCAL(ret0)
sub r1,DBL0H
cmp/pl DBL0H ! SH4-200 will start this insn in a new cycle
and r3,r0
bf/s LOCAL(ignore_low)
addc r3,r0 ! uses T == 1; sets implict 1
mov #11,r2
shld DBL0H,r0 ! SH4-200 will start this insn in a new cycle
cmp/gt r2,DBL0H
add #-32,DBL0H
bt LOCAL(retmax)
shld DBL0H,DBL0L
rts
or DBL0L,r0
and:
__fixunsdfsi:
bbit0 DBL0H,30,.Lret0or1
lsr r2,DBL0H,20
bmsk_s DBL0H,DBL0H,19
sub_s r2,r2,19; 0x3ff+20-0x400
neg_s r3,r2
btst_s r3,10
bset_s DBL0H,DBL0H,20
#ifdef __LITTLE_ENDIAN__
mov.ne DBL0L,DBL0H
asl DBL0H,DBL0H,r2
#else
asl.eq DBL0H,DBL0H,r2
lsr.ne DBL0H,DBL0H,r3
#endif
lsr DBL0L,DBL0L,r3
j_s.d [blink]
add.eq r0,r0,r1
.Lret0:
j_s.d [blink]
mov_l r0,0
.Lret0or1:
add_s DBL0H,DBL0H,0x100000
lsr_s DBL0H,DBL0H,30
j_s.d [blink]
bmsk_l r0,DBL0H,0
You can see that an SH4-300 can perform software floating point
fixunsdfsi in ten cycles, and the SH4-400 (SH4-200 sans FPU)
and ARC700 in twelve.
Adding any code in order to compute nice, fluffy values for
unspecified results would cause a significant performance degradation.
^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: SH optimized software floating point routines
2010-07-23 2:00 ` Joern Rennecke
@ 2010-07-23 9:02 ` Joseph S. Myers
0 siblings, 0 replies; 30+ messages in thread
From: Joseph S. Myers @ 2010-07-23 9:02 UTC (permalink / raw)
To: Joern Rennecke; +Cc: Naveen H. S, Kaz Kojima, gcc, Prafulla Thakare
On Thu, 22 Jul 2010, Joern Rennecke wrote:
> Quoting "Joseph S. Myers" <joseph@codesourcery.com>:
>
> > That diff does not appear to relate to undefined behavior. GCC considers
> > these out-of-range conversions to yield an unspecified value, possibly
> > raising an exception, as per Annex F, and does not take the liberty of
> > optimizing on the basis of them being undefined when not in an IEEE mode.
>
> Well, still, the test is wrong in possibly raising an exception there,
> with no provisions to ignore the exception or catch any signal raised.
The expectation is that floating-point exceptions will not trap by
default, again in accordance with Annex F even when not in an IEEE mode.
--
Joseph S. Myers
joseph@codesourcery.com
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: SH optimized software floating point routines
2010-07-22 16:14 ` Joern Rennecke
@ 2010-07-23 9:32 ` Christian Bruel
0 siblings, 0 replies; 30+ messages in thread
From: Christian Bruel @ 2010-07-23 9:32 UTC (permalink / raw)
To: Joern Rennecke; +Cc: Kaz Kojima, Naveen.S, gcc, Prafulla.Thakare
Joern Rennecke wrote:
> Quoting Christian Bruel <christian.bruel@st.com>:
>
>> Using the ieee-sf.S + this patch
>> OK
>
> Is this only a proof-of-concept, because you only change the ne[sd]f2
> implementation?
I also changed the unordered comparison patterns (cmpunsf_i1,
cmpundf_i1). But yes, the other functions that would need the same kind
of check are unordsf2 and all the comparisons (gtsf2, gesf2, ...)
for floats and doubles.
But I will only consider those after/if we all agree that this needs to
be done instead of keeping the current QNaN-only restrictions.
> And you go out of your way to only accept a restricted
> set of values.
This holds for the original optimized implementation as well; for example,
I don't think that 0x7f800001 was caught. In fact, implementing the
isnan check correctly, without a restricted set of values, makes the
original discussion pointless, since the Q/S bits are a subset of all
possible encodings with any fractional part != 0.
> Plus, the overuse of the arithmetic unit hurts SH4-100 /
> SH4-200 instruction pairing.
>
> AFAICT you need only one cycle penalty, in the check_nan path:
>
> GLOBAL(nesf2):
> /* If the raw values are unequal, the result is unequal, unless
> both values are +-zero.
> If the raw values are equal, the result is equal, unless
> the values are NaN. */
> cmp/eq r4,r5
> mov.l LOCAL(inf2),r1
> bt/s LOCAL(check_nan)
> mov r4,r0
> or r5,r0
> rts
> add r0,r0
> LOCAL(check_nan):
> add r0,r0
> cmp/hi r1,r0
> rts
> movt r0
> .balign 4
> LOCAL(inf2):
> .long 0xff000000
>
> You could even save four bytes by putting the check_nan label into the
> delay slot, but I'm not sure if that'll discomfit any branch
> prediction mechanism.
Thanks a lot for this one; it should fix the original problem on the
restricted set of values as well. The cmpund patterns fix should
probably have similar checks.
>
> Disclaimer: I've not tested this code.
>
> For the DFmode case, what about NaNs denoted by the low word, e.g.
> 0x7ff00000 00000001 ?
>
> If so, the DFmode code could become something like this:
>
> GLOBAL(nedf2):
> cmp/eq DBL0L,DBL1L
> mov.l LOCAL(inf2),r1
> bf LOCAL(ne)
> cmp/eq DBL0H,DBL1H
> bt/s LOCAL(check_nan)
> mov DBL0H,r0
> or DBL1H,r0
>
> add r0,r0
> rts
> or DBL0L,r0
> LOCAL(check_nan):
> tst DBL0L,DBL0L
> add r0,r0
> subc r1,r0
> mov #-1,r0
> rts
> negc r0,r0
> LOCAL(ne):
> rts
> mov #1,r0
> .balign 4
> LOCAL(inf2):
> .long 0xffe00000
> For an actual patch, you need to use the SL* macros from
> config/sh/lib1funcs.h because the SH1 does not have delayed branches.
OK, thanks
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: SH optimized software floating point routines
2010-07-22 12:23 ` Kaz Kojima
@ 2010-07-23 13:45 ` Kaz Kojima
2010-08-04 10:03 ` Naveen H. S
0 siblings, 1 reply; 30+ messages in thread
From: Kaz Kojima @ 2010-07-23 13:45 UTC (permalink / raw)
To: joern.rennecke; +Cc: christian.bruel, Naveen.S, gcc, Prafulla.Thakare
> Joern Rennecke <joern.rennecke@embecosm.com> wrote:
>> That's a bug, then; we shouldn't use a library function there,
>> but the cmpordered[sd]f_t_4 patterns.
>
> Argh, I missed that the required patterns are already incorporated
> in your patch. I'll test it again with sh-softfp-predicate-fix
> when the tests for 4.5.1-rc are done. Thanks!
I've tested sh-softfp-20100718-2131 + sh-softfp-predicate-fix
on -m1, -m2, -m3, -m3 -ml, -m2a on sh-elf, sh4-linux and
sh64-linux. sh64-linux required the first hunk of the attached
patch to build. The test with -m3 -ml shows some regressions
which appear to be solved by the second hunk of the patch.
Now all test results look clean.
For the NaN issue, I'd like to wait for a full patch from Christian.
I'm curious again about the numbers with/without it.
Regards,
kaz
--
* config/sh/sh.md (cstoresf4): Fix typos.
* config/sh/ieee-754-df.S (unorddf2): Use DBL1H instead of r6.
diff -upr ORIG/trunk/gcc/config/sh/sh.md trunk/gcc/config/sh/sh.md
--- ORIG/trunk/gcc/config/sh/sh.md Wed Jul 21 08:12:23 2010
+++ trunk/gcc/config/sh/sh.md Thu Jul 22 10:36:36 2010
@@ -9462,10 +9462,10 @@ mov.l\\t1f,r0\\n\\
"TARGET_SH1 || TARGET_SHMEDIA_FPU"
"if (TARGET_SHMEDIA)
{
- if (!arith_operand (operands[2], DFmode))
- operands[2] = copy_to_mode_reg (DFmode, operands[2]);
- if (!arith_operand (operands[3], DFmode))
- operands[3] = copy_to_mode_reg (DFmode, operands[3]);
+ if (!arith_operand (operands[2], SFmode))
+ operands[2] = copy_to_mode_reg (SFmode, operands[2]);
+ if (!arith_operand (operands[3], SFmode))
+ operands[3] = copy_to_mode_reg (SFmode, operands[3]);
emit_insn (gen_cstore4_media (operands[0], operands[1],
operands[2], operands[3]));
DONE;
diff -uprN ORIG/trunk/gcc/config/sh/ieee-754-df.S trunk/gcc/config/sh/ieee-754-df.S
--- ORIG/trunk/gcc/config/sh/ieee-754-df.S 2010-07-20 11:39:29.000000000 +0900
+++ trunk/gcc/config/sh/ieee-754-df.S 2010-07-22 13:16:07.000000000 +0900
@@ -123,7 +123,7 @@ GLOBAL(unorddf2):
mov.l LOCAL(c_DF_NAN_MASK),r1
not DBL0H,r0
tst r1,r0
- not r6,r0
+ not DBL1H,r0
bt LOCAL(unord)
tst r1,r0
LOCAL(unord):
^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: SH optimized software floating point routines
2010-07-23 13:45 ` Kaz Kojima
@ 2010-08-04 10:03 ` Naveen H. S
2010-08-04 11:36 ` Kaz Kojima
0 siblings, 1 reply; 30+ messages in thread
From: Naveen H. S @ 2010-08-04 10:03 UTC (permalink / raw)
To: Kaz Kojima, joern.rennecke; +Cc: christian.bruel, gcc, Prafulla Thakare
Hi,
>> I've tested sh-softfp-20100718-2131 + sh-softfp-predicate-fix
>> on -m1, -m2, -m3, -m3 -ml, -m2a on sh-elf, sh4-linux and
>> sh64-linux
The SH toolchain was built with the following patches and regression
is completed.
1. sh-softfp-20100718-2131
2. sh-softfp-predicate-fix
3. Patch by Kaz Kojima-san at following link
http://gcc.gnu.org/ml/gcc/2010-07/msg00352.html
However, there were some regressions compared to fresh toolchain.
The following list summarizes the regressions for each target.
m1, m2, m2a-nofpu
gcc.dg/pr28796-2.c
gcc.dg/torture/type-generic-1.c
m2e, m3e, -m2a-single-only
gcc.c-torture/execute/ieee/fp-cmp-4l.c
gcc.c-torture/execute/ieee/fp-cmp-8l.c
gcc.dg/builtins-43.c
gcc.dg/pr28796-2.c
gcc.dg/torture/type-generic-1.c
m3, m4-nofpu, m4-single-only, m4a-nofpu, m4a-single-only
gcc.c-torture/execute/20000605-1.c (-O0)
gcc.c-torture/execute/20060420-1.c (-Os)
gcc.c-torture/execute/loop-ivopts-1.c
gcc.c-torture/execute/pr39228.c (-O0)
gcc.dg/pr28796-2.c
gcc.dg/torture/type-generic-1.c
gcc.dg/pr41963.c
c-c++-common/torture/complex-sign-mixed-mul.c
gcc.target/sh/sh4a-fprun.c
>> Now all test results look clean
Please let me know whether these regressions are known and acceptable,
or whether I am missing something in the patches that solves them.
Thanks & Regards,
Naveen
* Re: SH optimized software floating point routines
2010-08-04 10:03 ` Naveen H. S
@ 2010-08-04 11:36 ` Kaz Kojima
0 siblings, 0 replies; 30+ messages in thread
From: Kaz Kojima @ 2010-08-04 11:36 UTC (permalink / raw)
To: Naveen.S; +Cc: joern.rennecke, christian.bruel, gcc, Prafulla.Thakare
"Naveen H. S" <Naveen.S@kpitcummins.com> wrote:
> The SH toolchain was built with the following patches, and the regression
> run is complete.
> 1. sh-softfp-20100718-2131
> 2. sh-softfp-predicate-fix
> 3. Patch by Kaz Kojima-san at following link
> http://gcc.gnu.org/ml/gcc/2010-07/msg00352.html
Thanks for testing.
> However, there were some regressions compared to a fresh toolchain.
> The following list summarizes the regressions for each target.
[snip]
>>> Now all test results look clean
>
> Please let me know whether these regressions are known and acceptable?
"clean" might be a bad choice of words. I meant that the results
are as expected, i.e. the regressions I saw are the NaN issues we
discussed on the list.
Although your list of arches and regressions differs from
mine, some regressions you've caught
> gcc.c-torture/execute/20000605-1.c (-O0)
> gcc.c-torture/execute/20060420-1.c (-Os)
> gcc.c-torture/execute/loop-ivopts-1.c
> gcc.c-torture/execute/pr39228.c(-O0)
> gcc.dg/pr41963.c
> c-c++-common/torture/complex-sign-mixed-mul.c
> gcc.target/sh/sh4a-fprun.c
don't look like NaN issues, and we should find out what is going on
in these cases.
Regards,
kaz
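The distinction Kaz draws between NaN regressions and the rest hinges on how soft-float code classifies a double by its bit pattern: an IEEE-754 binary64 value is a NaN exactly when its exponent field is all ones and its mantissa is nonzero; an all-ones exponent with a zero mantissa is an infinity, which is ordered. A rough illustration (df_is_nan is a hypothetical helper, not part of libgcc):

```c
#include <stdint.h>
#include <string.h>
#include <math.h>

/* Hypothetical helper showing the bit-level NaN test that soft-float
   routines such as __unorddf2 rely on: exponent field (bits 62..52)
   all ones, mantissa (bits 51..0) nonzero.  Not libgcc code. */
int df_is_nan(double d)
{
  uint64_t bits;
  memcpy(&bits, &d, sizeof bits);               /* safe type pun */
  uint64_t exponent = (bits >> 52) & 0x7FFu;    /* 11-bit exponent */
  uint64_t mantissa = bits & ((1ULL << 52) - 1);/* 52-bit mantissa */
  return exponent == 0x7FF && mantissa != 0;
}
```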
Thread overview: 30+ messages
2010-06-10 12:01 SH optimized software floating point routines Naveen H. S
2010-06-14 4:50 ` Kaz Kojima
2010-06-14 7:27 ` Joern Rennecke
2010-07-16 10:04 ` Naveen H. S
2010-07-16 10:26 ` Joern Rennecke
2010-07-22 23:10 ` Joseph S. Myers
2010-07-23 2:00 ` Joern Rennecke
2010-07-23 9:02 ` Joseph S. Myers
2010-07-16 14:01 ` Kaz Kojima
2010-07-17 14:31 ` Joern Rennecke
2010-07-17 21:23 ` Kaz Kojima
2010-07-19 9:23 ` Naveen H. S
2010-07-17 13:30 ` Joern Rennecke
2010-07-19 0:59 ` Joern Rennecke
2010-07-20 13:35 ` Kaz Kojima
2010-07-21 11:25 ` Christian Bruel
2010-07-22 7:22 ` Christian Bruel
2010-07-22 7:37 ` Joern Rennecke
2010-07-22 11:58 ` Christian Bruel
2010-07-22 14:25 ` Joern Rennecke
2010-07-22 13:58 ` Christian Bruel
2010-07-22 16:14 ` Joern Rennecke
2010-07-23 9:32 ` Christian Bruel
2010-07-22 16:23 ` Joern Rennecke
2010-07-22 0:45 ` Kaz Kojima
2010-07-22 6:41 ` Joern Rennecke
2010-07-22 12:23 ` Kaz Kojima
2010-07-23 13:45 ` Kaz Kojima
2010-08-04 10:03 ` Naveen H. S
2010-08-04 11:36 ` Kaz Kojima