public inbox for gcc@gcc.gnu.org
* SH optimized software floating point routines
@ 2010-06-10 12:01 Naveen H. S
  2010-06-14  4:50 ` Kaz Kojima
  0 siblings, 1 reply; 30+ messages in thread
From: Naveen H. S @ 2010-06-10 12:01 UTC (permalink / raw)
  To: gcc; +Cc: kkojima, Prafulla Thakare

Hi,

Software floating point (libgcc) routines were implemented for SH in
the patches at the following links:
http://gcc.gnu.org/ml/gcc-patches/2006-09/msg00063.html
http://gcc.gnu.org/ml/gcc-patches/2006-09/msg00614.html
http://gcc.gnu.org/ml/gcc-patches/2004-08/msg00624.html

There were some discussions regarding the testing of these routines.
We briefly tested those routines and found no major issues.
http://gcc.gnu.org/ml/gcc-patches/2006-10/msg00791.html
Please let me know whether these routines can be used in the SH toolchain.

Please also let me know whether we should invoke these routines by
default. Currently, we are thinking of invoking them only when specific
command line options are given.

Regards,
Naveen.H.S



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SH optimized software floating point routines
  2010-06-10 12:01 SH optimized software floating point routines Naveen H. S
@ 2010-06-14  4:50 ` Kaz Kojima
  2010-06-14  7:27   ` Joern Rennecke
  2010-07-16 10:04   ` Naveen H. S
  0 siblings, 2 replies; 30+ messages in thread
From: Kaz Kojima @ 2010-06-14  4:50 UTC (permalink / raw)
  To: Naveen.S; +Cc: gcc, Prafulla.Thakare

"Naveen H. S" <Naveen.S@kpitcummins.com> wrote:
> Software floating point (libgcc) routines were implemented for SH in
> the patches at the following links:
> http://gcc.gnu.org/ml/gcc-patches/2006-09/msg00063.html
> http://gcc.gnu.org/ml/gcc-patches/2006-09/msg00614.html
> http://gcc.gnu.org/ml/gcc-patches/2004-08/msg00624.html
> 
> There were some discussions regarding the testing of these routines.
> We briefly tested those routines and found no major issues.
> http://gcc.gnu.org/ml/gcc-patches/2006-10/msg00791.html
> Please let me know whether these routines can be used in the SH toolchain.

SH currently uses the fp-bit.c implementation of soft FP, which is
known to be rather inefficient.  PowerPC uses a more efficient soft-fp
derived from glibc, and other targets like arm, ia64 and sparc, as
well as some new targets, use it too.  Please see config/*/t-*softfp
and config/*/sfp-machine.h.  We can expect the best performance from
Joern's assembly soft FP, but in general the maintenance of
hand-crafted assembly code will be hard.
If I remember correctly, there were some arguments about this
target-specific implementation vs. the generic one when soft-fp was
introduced into gcc.  It would be better to try soft-fp on SH and
compare numbers for the current fp-bit, soft-fp and the assembly
implementation.
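
To give an idea, here is a rough sketch of the kind of definitions a
32-bit config/*/sfp-machine.h provides; the exact choices for SH are
an assumption here and would have to be verified against soft-fp:

/* Hypothetical config/sh/sfp-machine.h fragment, modeled on other
   32-bit targets; an untested assumption, not a working port.  */
#define _FP_W_TYPE_SIZE		32
#define _FP_W_TYPE		unsigned long
#define _FP_WS_TYPE		signed long
#define _FP_I_TYPE		long

/* How fraction-word multiplication and division are carried out.  */
#define _FP_MUL_MEAT_S(R,X,Y) \
  _FP_MUL_MEAT_1_wide (_FP_WFRACBITS_S, R, X, Y, umul_ppmm)
#define _FP_MUL_MEAT_D(R,X,Y) \
  _FP_MUL_MEAT_2_wide (_FP_WFRACBITS_D, R, X, Y, umul_ppmm)
#define _FP_DIV_MEAT_S(R,X,Y)	_FP_DIV_MEAT_1_loop (S, R, X, Y)
#define _FP_DIV_MEAT_D(R,X,Y)	_FP_DIV_MEAT_2_udiv (D, R, X, Y)

/* Default NaN: all fraction bits set, positive sign.  */
#define _FP_NANFRAC_S		((_FP_QNANBIT_S << 1) - 1)
#define _FP_NANFRAC_D		((_FP_QNANBIT_D << 1) - 1), -1
#define _FP_NANSIGN_S		0
#define _FP_NANSIGN_D		0
#define _FP_KEEPNANFRACP	1
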
If soft-fp is still too inefficient for you, feel free to propose a
complete and regtested patch for SH assembly soft FP against trunk.
Joern's original patch touched not only the SH-specific parts but
also the generic part of the compiler, so a revised patch needs to
be tested on various targets as well.

Regards,
	kaz

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SH optimized software floating point routines
  2010-06-14  4:50 ` Kaz Kojima
@ 2010-06-14  7:27   ` Joern Rennecke
  2010-07-16 10:04   ` Naveen H. S
  1 sibling, 0 replies; 30+ messages in thread
From: Joern Rennecke @ 2010-06-14  7:27 UTC (permalink / raw)
  To: Kaz Kojima; +Cc: Naveen.S, gcc, Prafulla.Thakare

Quoting Kaz Kojima <kkojima@rr.iij4u.or.jp>:
> but in general the maintenance of
> hand-crafted assembly code will be hard.

If you have a fixed feature set and pipeline, and have made sure the
code is correct, no further maintenance should be required - which is
more than can be said of the target port's code generator, which tends
to fall behind in generated-code performance if you have a finely tuned
port and don't keep up with changes in the optimizers.

AFAIK the pipeline variations in the SH still don't come close to the
difference between compiler-generated code and assembly code that is
properly hand-optimized for the SH4-[123]00 pipelines.

> If soft-fp is still too inefficient for you, feel free to propose
> a complete and regtested patch for SH assembly soft FP against
> trunk.

IIRC I never finished working on some corner cases of division rounding in
the SH floating-point emulation because I had no testcase (and it was not
high on my agenda because there seemed to be little practical relevance).

In order to test the ARCompact floating point emulations, I made a new test
to check rounding of subnormal numbers during division:

http://gcc.gnu.org/viewcvs/branches/arc-4_4-20090909-branch/gcc/testsuite/gcc.c-torture/execute/ieee/denorm-rand.c?limit_changes=0&view=markup&pathrev=151545
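
For flavor, a minimal sketch of how such a test can steer quotients
into the subnormal range (an illustration only, not the actual
denorm-rand.c): dividing a random subnormal by a divisor in [1,2)
keeps the quotient subnormal, so the denormalizing path of the
division routine gets exercised.

/* Sketch: exercise division with subnormal quotients.
   Assumes IEEE-754 binary64 doubles.  */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static double
from_bits (uint64_t bits)
{
  double d;
  memcpy (&d, &bits, sizeof d);
  return d;
}

int
main (void)
{
  int i;
  for (i = 0; i < 1000; i++)
    {
      /* Random subnormal dividend: exponent field is zero.  */
      uint64_t m = (((uint64_t) rand () << 31) ^ (uint64_t) rand ())
		   & 0x000FFFFFFFFFFFFFULL;
      /* Random divisor in [1,2): exponent field is 1023.  */
      uint64_t n = 0x3FF0000000000000ULL | (uint64_t) (rand () & 0xFFFFF);
      double a = from_bits (m), b = from_bits (n);
      double q = a / b;	/* |q| <= |a|, so q is subnormal or zero.  */
      printf ("%a / %a = %a\n", a, b, q);
    }
  return 0;
}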

To give the code a good workout, you can up the iteration count, e.g.
1000 -> 1000000.  However, you probably don't want to do this with
fp-bit at -O0.

Also note that fp-bit.c in SVN gets the rounding wrong; you can use this
patch to get a sane comparison:
http://gcc.gnu.org/viewcvs/branches/arc-4_4-20090909-branch/gcc/config/fp-bit.c?limit_changes=0&r1=151545&r2=151544&pathrev=151545
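
The bug class here is double rounding: rounding an intermediate
quotient once at working precision and again when the result is
denormalized can differ from rounding the exact quotient in one step.
A self-contained integer illustration of the effect (not code from the
patch):

/* Demonstrate that rounding in two steps can differ from rounding
   once.  rne_shift rounds x >> k to nearest, ties to even; k >= 1.  */
#include <stdint.h>
#include <stdio.h>

static uint64_t
rne_shift (uint64_t x, unsigned k)
{
  uint64_t q = x >> k;
  uint64_t rem = x & ((1ULL << k) - 1);
  uint64_t half = 1ULL << (k - 1);
  if (rem > half || (rem == half && (q & 1)))
    q++;
  return q;
}

int
main (void)
{
  uint64_t v = 21;	/* binary 10101, i.e. 2.625 * 2^3 */
  /* One-step rounding: 2.625 -> 3.  */
  uint64_t once = rne_shift (v, 3);
  /* Two-step rounding: 5.25 -> 5, then 2.5 -> 2 (ties to even).  */
  uint64_t twice = rne_shift (rne_shift (v, 2), 1);
  printf ("direct %llu vs. double-rounded %llu\n",
	  (unsigned long long) once, (unsigned long long) twice);
  return 0;
}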

FWIW, these are the associated ChangeLog entries:

gcc/testsuite:

2008-04-04 J"orn Rennecke <joern.rennecke@arc.com>

	* gcc.c-torture/execute/ieee/denorm-rand.c: New file.
	* gcc.dg/torture/fp-int-convert.h: Avoid undefined behaviour.
gcc:

2008-04-04 J"orn Rennecke <joern.rennecke@arc.com>

	* config/fp-bit.c (_fpdiv_parts): Avoid double rounding.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: SH optimized software floating point routines
  2010-06-14  4:50 ` Kaz Kojima
  2010-06-14  7:27   ` Joern Rennecke
@ 2010-07-16 10:04   ` Naveen H. S
  2010-07-16 10:26     ` Joern Rennecke
                       ` (2 more replies)
  1 sibling, 3 replies; 30+ messages in thread
From: Naveen H. S @ 2010-07-16 10:04 UTC (permalink / raw)
  To: Kaz Kojima; +Cc: gcc, Prafulla Thakare, amylaar

[-- Attachment #1: Type: text/plain, Size: 2145 bytes --]

Hi,

>> feel free to propose a complete and regtested patch for SH
>> assembly soft FP against trunk.

Please find attached the ported soft float patch "sh_softfloat.patch".
The original patch was posted by Joern Rennecke at the following link:
http://gcc.gnu.org/ml/gcc-patches/2006-09/msg00614.html

The following modifications have been made to the patch:

sched-deps.c :- The hunk was not applied due to changes in the
current gcc source.

t-sh :- The divsf3, extendsfdf2 and truncdfsf2 routines are not
included as they resulted in some regressions:
divsf3 - c-c++-common/torture/complex-sign-mixed-div.c
extendsfdf2 - gcc.c-torture/execute/conversion.c,
gcc.dg/torture/fp-int-convert-float.c, gcc.dg/pr28796-2.c,
gcc.dg/torture/type-generic-1.c

sh.md :- The cbranchsf4, cbranchdf4, cstoresf4 and cstoredf4
instruction patterns are not included as they are already present in
the current source; modifying those patterns as per the patch resulted
in a build failure. Branch and other related instruction patterns
(bgt, blt, ble, bge, bleu, bunle, bunordered, bunlt, bunge, bungt,
seq, sge, sgeu, sne etc.) are not modified as they are no longer
present in the current gcc source.

After the above modifications, the regressions are reduced. However,
some regressions remain on the SH3 and related (m4-nofpu, m4a-nofpu
and m2a-single) targets:
gcc.c-torture/execute/20060420-1.c - O1 and O2 execution failures
gcc.dg/fold-overflow-1.c scan-assembler-times 2139095040

We are working on fixing the above failures and also on the
extendsfdf2, divsf3 and truncdfsf2 routines. Please review the patch
and let us know if any modifications are needed.

Any inputs on resolving the failures or completing the routines would
be appreciated.

>> It would be better to try soft-fp on SH 

Please find attached the patch "sh_softfp.patch", which implements
basic soft-fp support for the SH target. No regressions were found
with this patch. Please let us know whether any further improvements
are required for complete soft-fp support.

Thanks for the co-operation and support.

Regards,
Naveen.H.S


[-- Attachment #2: sh_softfloat.patch --]
[-- Type: application/octet-stream, Size: 270966 bytes --]

diff -up -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/Makefile.in gcc-4.5.0/gcc/Makefile.in
--- gcc-4.5.0/gcc/Makefile.in	2010-04-02 13:19:06.000000000 +0530
+++ gcc-4.5.0/gcc/Makefile.in	2010-07-14 11:58:33.000000000 +0530
@@ -2701,7 +2701,7 @@ opts-common.o : opts-common.c opts.h $(C
    coretypes.h intl.h
 targhooks.o : targhooks.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TREE_H) \
    $(EXPR_H) $(TM_H) $(RTL_H) $(TM_P_H) $(FUNCTION_H) output.h $(TOPLEV_H) \
-   $(MACHMODE_H) $(TARGET_DEF_H) $(TARGET_H) $(GGC_H) gt-targhooks.h \
+   $(MACHMODE_H) $(TARGET_DEF_H) $(TARGET_H) $(GGC_H) $(REGS_H) gt-targhooks.h \
    $(OPTABS_H) $(RECOG_H) reload.h hard-reg-set.h
 
 bversion.h: s-bversion; @true
diff -up -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/reload.c gcc-4.5.0/gcc/reload.c
--- gcc-4.5.0/gcc/reload.c	2009-12-21 22:02:44.000000000 +0530
+++ gcc-4.5.0/gcc/reload.c	2010-07-14 11:58:33.000000000 +0530
@@ -2212,14 +2212,8 @@ operands_match_p (rtx x, rtx y)
 	 multiple hard register group of scalar integer registers, so that
 	 for example (reg:DI 0) and (reg:SI 1) will be considered the same
 	 register.  */
-      if (WORDS_BIG_ENDIAN && GET_MODE_SIZE (GET_MODE (x)) > UNITS_PER_WORD
-	  && SCALAR_INT_MODE_P (GET_MODE (x))
-	  && i < FIRST_PSEUDO_REGISTER)
-	i += hard_regno_nregs[i][GET_MODE (x)] - 1;
-      if (WORDS_BIG_ENDIAN && GET_MODE_SIZE (GET_MODE (y)) > UNITS_PER_WORD
-	  && SCALAR_INT_MODE_P (GET_MODE (y))
-	  && j < FIRST_PSEUDO_REGISTER)
-	j += hard_regno_nregs[j][GET_MODE (y)] - 1;
+      i = targetm.match_adjust (x, i);
+      j = targetm.match_adjust (y, j);
 
       return i == j;
     }
diff -up -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/target-def.h gcc-4.5.0/gcc/target-def.h
--- gcc-4.5.0/gcc/target-def.h	2010-03-25 02:14:48.000000000 +0530
+++ gcc-4.5.0/gcc/target-def.h	2010-07-14 11:58:33.000000000 +0530
@@ -401,6 +401,10 @@
 #define TARGET_SUPPORT_VECTOR_MISALIGNMENT \
   default_builtin_support_vector_misalignment
 
+#ifndef TARGET_MATCH_ADJUST
+#define TARGET_MATCH_ADJUST default_match_adjust
+#endif
+
 
 #define TARGET_VECTORIZE                                                \
   {									\
@@ -1009,6 +1013,7 @@
   TARGET_CONVERT_TO_TYPE,			\
   TARGET_IRA_COVER_CLASSES,			\
   TARGET_SECONDARY_RELOAD,			\
+  TARGET_MATCH_ADJUST,				\
   TARGET_EXPAND_TO_RTL_HOOK,			\
   TARGET_INSTANTIATE_DECLS,			\
   TARGET_HARD_REGNO_SCRATCH_OK,			\
diff -up -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/target.h gcc-4.5.0/gcc/target.h
--- gcc-4.5.0/gcc/target.h	2010-03-27 15:57:39.000000000 +0530
+++ gcc-4.5.0/gcc/target.h	2010-07-14 11:58:33.000000000 +0530
@@ -1027,6 +1027,10 @@ struct gcc_target
 				      enum machine_mode,
 				      secondary_reload_info *);
 
+  /* Take an rtx and its regno, and return the regno for purposes of
+     checking a matching constraint.  */
+  int (*match_adjust) (rtx, int);
+
   /* This target hook allows the backend to perform additional
      processing while initializing for variable expansion.  */
   void (* expand_to_rtl_hook) (void);
diff -up -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/targhooks.c gcc-4.5.0/gcc/targhooks.c
--- gcc-4.5.0/gcc/targhooks.c	2010-03-27 15:57:39.000000000 +0530
+++ gcc-4.5.0/gcc/targhooks.c	2010-07-14 11:58:33.000000000 +0530
@@ -66,6 +66,7 @@ along with GCC; see the file COPYING3.  
 #include "reload.h"
 #include "optabs.h"
 #include "recog.h"
+#include "regs.h"
 
 
 bool
@@ -268,6 +269,27 @@ default_cxx_guard_type (void)
   return long_long_integer_type_node;
 }
 
+/*  Given an rtx and its regno, return a regno value that shall be used for
+    purposes of comparison in operands_match_p.
+    Generally, we say that integer registers are subject to big-endian
+    adjustment.  This default target hook should generally work if the mode
+    of a register is a sufficient indication if this adjustment is to take
+    place; this will not work when software floating point is done in integer
+    registers.  */
+int
+default_match_adjust (rtx x, int regno)
+{
+  /* On a WORDS_BIG_ENDIAN machine, point to the last register of a
+     multiple hard register group of scalar integer registers, so that
+     for example (reg:DI 0) and (reg:SI 1) will be considered the same
+     register.  */
+  if (WORDS_BIG_ENDIAN && GET_MODE_SIZE (GET_MODE (x)) > UNITS_PER_WORD
+      && SCALAR_INT_MODE_P (GET_MODE (x))
+      && regno < FIRST_PSEUDO_REGISTER)
+    regno += hard_regno_nregs[regno][GET_MODE (x)] - 1;
+  return regno;
+}
+
 
 /* Returns the size of the cookie to use when allocating an array
    whose elements have the indicated TYPE.  Assumes that it is already
diff -up -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/targhooks.h gcc-4.5.0/gcc/targhooks.h
--- gcc-4.5.0/gcc/targhooks.h	2010-03-27 15:57:39.000000000 +0530
+++ gcc-4.5.0/gcc/targhooks.h	2010-07-14 11:58:33.000000000 +0530
@@ -109,6 +109,7 @@ extern const enum reg_class *default_ira
 extern enum reg_class default_secondary_reload (bool, rtx, enum reg_class,
 						enum machine_mode,
 						secondary_reload_info *);
+extern int default_match_adjust (rtx, int);
 extern void hook_void_bitmap (bitmap);
 extern bool default_handle_c_option (size_t, const char *, int);
 extern int default_reloc_rw_mask (void);
Common subdirectories: gcc-4.5.0/gcc/testsuite and gcc-4.5.0/gcc/testsuite
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/adddf3.S gcc-4.5.0/gcc/config/sh/IEEE-754/adddf3.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/adddf3.S	1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/adddf3.S	2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,799 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Routine for adding two double numbers
+
+! Author: Rakesh Kumar
+! SH1 Support by Joern Rennecke
+! Sticky Bit handling : Joern Rennecke
+
+! Arguments: r4-r5, r6-r7
+! Result: r0-r1
+
+! The value in r4-r5 is referred to as op1
+! and that in r6-r7 is referred to as op2
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+	.text
+        .align 5
+	.global	GLOBAL (subdf3)
+	FUNC (GLOBAL (subdf3))
+        .global GLOBAL (adddf3)
+	FUNC (GLOBAL (adddf3))
+
+GLOBAL (subdf3):
+#ifdef __LITTLE_ENDIAN__
+	mov	r4,r1
+	mov	r6,r2
+
+	mov	r5,r4
+	mov	r7,r6
+
+	mov	r1,r5
+	mov	r2,r7
+#endif
+	mov.l	.L_sign,r2
+	bra	.L_adddf3_1
+	xor	r2,r6
+
+GLOBAL (adddf3):
+#ifdef	__LITTLE_ENDIAN__
+	mov	r4,r1
+	mov	r6,r2
+
+	mov	r5,r4
+	mov	r7,r6
+
+	mov	r1,r5
+	mov	r2,r7
+#endif
+	
+.L_adddf3_1:
+	mov.l	r8,@-r15
+	mov	r4,r1
+
+	mov.l 	.L_inf,r2
+	mov	r6,r3
+
+	mov.l	r9,@-r15
+	and	r2,r1		!Exponent of op1 in r1
+
+	mov.l	r10,@-r15
+	and	r2,r3		!Exponent of op2 in r3
+
+	! Check for Nan or Infinity
+	mov.l	.L_sign,r9
+	cmp/eq	r2,r1
+
+	mov	r9,r10
+	bt	.L_thread_inv_exp_op1
+
+	mov	r9,r0
+	cmp/eq	r2,r3
+! op1 has a valid exponent. We need not check it again.
+! Return op2 straight away.
+	and	r4,r9		!r9 has sign bit for op1
+	bt	.L_ret_op2
+
+	! Check for -ve zero
+	cmp/eq	r4,r0
+	and	r6,r10		!r10 has sign bit for op2
+
+	bt	.L_op1_nzero
+
+	cmp/eq	r6,r0
+	bt	.L_op2_nzero
+
+! Check for zero
+.L_non_zero:
+	tst	r4,r4
+	bt	.L_op1_zero
+
+	! op1 is not zero, check op2 for zero
+	tst	r6,r6
+	bt	.L_op2_zero
+
+! r1 and r3 has masked out exponents, r9 and r10 has signs
+.L_add:
+	mov.l	.L_high_mant,r8
+	mov	#-20,r2
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+	shld	r2,r1		! r1 now has exponent for op1 in its lower bits
+#else
+	SHLR20 (r1)
+#endif
+	and	r8,r6	! Higher bits of mantissa of op2
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+	shld	r2,r3		! r3 has exponent for op2 in its lower bits
+#else
+	SHLR20 (r3)
+#endif
+	and	r8,r4	! Higher bits of mantissa of op1
+
+	mov.l	.L_21bit,r8
+
+	tst	r1,r1
+	bt	.L_norm_op1
+
+	! Set the 21st bit.
+	or	r8,r4
+	tst	r3,r3
+
+	bt	.L_norm_op2
+	or	r8,r6
+
+! Check for negative mantissas. Make them positive by negation
+! r9 and r10 have signs of op1 and op2 respectively
+.L_neg_mant:
+	tst	r9,r9
+	bf	.L_neg_op1
+
+	tst	r10,r10
+	bf	.L_neg_op2
+
+.L_add_1:
+	cmp/ge	r1,r3
+
+	mov	r1,r0
+	bt	.L_op2_exp_greater
+
+	sub	r3,r0
+	! If exponent difference is greater than 54, the resultant exponent
+	! won't be changed. Return op1 straight away.
+	mov	#54,r2
+	cmp/gt	r2,r0
+
+	bt	.L_pack_op1
+
+	mov	r1,r3
+	clrt
+
+	cmp/eq	#0,r0
+	bt	.L_add_mant
+
+	! Shift left the first operand and apply rest of shifts to second operand.
+	mov	#0,r2
+	shll	r5
+
+	rotcl	r4
+
+	add	#-1,r3
+	dt	r0
+
+	bt	.L_add_mant
+	dt	r0
+
+	bt	LOCAL(got_guard)
+	dt	r0
+
+	bt	LOCAL(got_sticky)
+
+! Shift the mantissa part of op2 so that both exponents are equal
+.L_shfrac_op2:
+	shar	r6
+	or	r7,r2	! sticky bit
+
+	rotcr	r7
+	dt	r0
+
+	bf	.L_shfrac_op2
+
+	shlr	r2
+
+	subc	r2,r2	! spread sticky bit across r2
+LOCAL(got_sticky):
+	shar	r6
+
+	rotcr	r7
+
+	rotcr	r2
+LOCAL(got_guard):
+	shar	r6
+
+	rotcr	r7
+
+	rotcr	r2
+
+
+! Add the positive mantissas and check for overflow by checking the
+! MSB of the resultant. In case of overflow, negate the result.
+.L_add_mant:
+	clrt
+	addc	r7,r5
+
+	mov	#0,r10	! Assume resultant to be positive
+	addc	r6,r4
+
+	cmp/pz	r4
+
+	bt	.L_mant_ptv
+	negc	r2,r2
+
+	negc	r5,r5
+
+	mov.l	.L_sign,r10 ! The assumption was wrong, result is negative
+	negc	r4,r4
+
+! 23rd bit in the high part of mantissa could be set.
+! In this case, right shift the mantissa.
+.L_mant_ptv:
+	mov.l	.L_23bit,r0
+
+	tst	r4,r0
+	bt	.L_mant_ptv_0
+
+	shlr	r4
+	rotcr	r5
+
+	add	#1,r3
+	bra	.L_mant_ptv_1
+	rotcr	r2
+
+.L_mant_ptv_0:
+	mov.l	.L_22bit,r0
+	tst	r4,r0
+
+	bt	.L_norm_mant
+
+.L_mant_ptv_1:
+	! 22 bit of resultant mantissa is set. Shift right the mantissa
+	! and add 1 to exponent
+	add	#1,r3
+	shlr	r4
+	rotcr	r5
+	! The mantissa is already normalized. We don't need to
+	! spend any effort. Branch to epilogue. 
+	bra	.L_epil
+	rotcr	r2
+
+! Normalize operands
+.L_norm_op1:
+	shll	r5
+
+	rotcl	r4
+	add	#-1,r1
+
+	tst	r4,r8
+	bt	.L_norm_op1
+
+	tst	r3,r3
+	SL(bf,	.L_neg_mant,
+	 add	#1,r1)
+
+.L_norm_op2:
+	shll	r7
+
+	rotcl	r6
+	add	#-1,r3
+
+	tst	r6,r8
+	bt	.L_norm_op2
+
+	bra	.L_neg_mant
+	add	#1,r3
+
+! Negate the mantissa of op1
+.L_neg_op1:
+	clrt
+	negc	r5,r5
+
+	negc	r4,r4
+	tst	r10,r10
+
+	bt	.L_add_1
+
+! Negate the mantissa of op2
+.L_neg_op2:
+	clrt
+	negc	r7,r7
+
+	bra	.L_add_1
+	negc	r6,r6
+
+! Thread the jump to .L_inv_exp_op1
+.L_thread_inv_exp_op1:
+	bra	.L_inv_exp_op1
+	nop
+
+.L_ret_op2:
+	mov.l	@r15+,r10
+#ifdef	__LITTLE_ENDIAN__
+	mov	r6,r1
+#else
+	mov	r6,r0
+#endif
+
+	mov.l	@r15+,r9
+#ifdef	__LITTLE_ENDIAN__
+	mov	r7,r0
+#else
+	mov	r7,r1
+#endif
+
+	rts
+	mov.l	@r15+,r8
+
+.L_op1_nzero:
+	tst	r5,r5
+	bt	.L_ret_op2
+
+	! op1 is not zero. Check op2 for negative zero
+	cmp/eq	r6,r0
+	bf	.L_non_zero	! both op1 and op2 are not -0
+
+.L_op2_nzero:
+	tst	r7,r7
+	bf	.L_non_zero
+
+	mov.l	@r15+,r10
+#ifdef	__LITTLE_ENDIAN__
+	mov	r4,r1
+#else
+	mov	r4,r0	! op2 is -0, return op1
+#endif
+
+	mov.l	@r15+,r9
+#ifdef	__LITTLE_ENDIAN__
+	mov	r5,r0
+#else
+	mov	r5,r1
+#endif
+
+	rts
+	mov.l	@r15+,r8
+
+! High bit of op1 is known to be zero.
+! Check low bit. r2 contains 0x00000000
+.L_op1_zero:
+	tst	r5,r5
+	bt	.L_ret_op2
+
+	! op1 is not zero. Check high bit of op2
+	tst	r6,r6
+	bf	.L_add	! both op1 and op2 are not zero
+
+! op1 is not zero. High bit of op2 is known to be zero.
+! Check low bit of op2. r2 contains 0x00000000
+.L_op2_zero:
+	tst	r7,r7
+	bf	.L_add
+
+	mov.l	@r15+,r10
+#ifdef	__LITTLE_ENDIAN__
+	mov	r4,r1
+#else
+	mov	r4,r0	! op2 is zero, return op1
+#endif
+
+	mov.l	@r15+,r9
+#ifdef	__LITTLE_ENDIAN__
+	mov	r5,r0
+#else
+	mov	r5,r1
+#endif
+
+	rts
+	mov.l	@r15+,r8
+
+! exp (op1) is smaller or equal to exp (op2)
+! The logic of the same operations is present in .L_add.  Kindly
+! refer to it for comments.
+.L_op2_exp_greater:
+	mov	r3,r0
+	sub	r1,r0
+
+	mov	#54,r2
+	cmp/gt	r2,r0
+
+	bt	.L_pack_op2
+
+	cmp/eq	#0,r0
+	bt	.L_add_mant
+
+	mov	#0,r2
+	shll	r7
+	rotcl	r6
+	add	#-1,r0
+	add	#-1,r3
+
+	cmp/eq	#0,r0
+	bt	.L_add_mant
+.L_shfrac_op1:	
+        add     #-1,r0
+        shar    r4
+
+	rotcr	r5
+	rotcr	r2
+
+        cmp/eq  #0,r0
+        bf      .L_shfrac_op1
+
+	bra	.L_add_mant
+	nop
+
+! Return the value in op1
+.L_ret_op1:
+        mov.l   @r15+,r10
+#ifdef	__LITTLE_ENDIAN__
+	mov	r4,r1
+#else
+        mov     r4,r0
+#endif
+
+        mov.l   @r15+,r9
+#ifdef	__LITTLE_ENDIAN__
+	mov	r5,r0
+#else
+        mov     r5,r1
+#endif
+
+        rts
+        mov.l   @r15+,r8
+
+! r1 has exp, r9 has sign, r4 and r5 mantissa
+.L_pack_op1:
+	mov.l	.L_high_mant,r7
+	mov	r4,r0
+
+	tst	r9,r9
+	bt	.L_pack_op1_1
+
+	clrt
+	negc	r5,r5
+	negc	r0,r0
+
+.L_pack_op1_1:
+	and	r7,r0
+	mov	r1,r3
+
+	mov	#20,r2
+	mov	r5,r1
+
+	mov.l	@r15+,r10
+	or	r9,r0
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+	shld	r2,r3
+#else
+	SHLL20 (r3)
+#endif
+	mov.l	@r15+,r9
+
+	or	r3,r0
+#ifdef	__LITTLE_ENDIAN__
+	mov	r0,r2
+	mov	r1,r0
+	mov	r2,r1
+#endif
+	rts
+	mov.l	@r15+,r8
+
+!r2 has exp, r10 has sign, r6 and r7 mantissa
+.L_pack_op2:
+	mov.l	.L_high_mant,r9
+	mov	r6,r0
+
+	tst	r10,r10
+	bt	.L_pack_op2_1
+
+	clrt
+	negc	r7,r7
+	negc	r0,r0
+
+.L_pack_op2_1:
+	and	r9,r0
+	mov	r7,r1
+
+	mov	#20,r2
+	or	r10,r0
+
+	mov.l	@r15+,r10
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+	shld	r2,r3
+#else
+	SHLL20 (r3)
+#endif
+
+	mov.l	@r15+,r9
+
+	or	r3,r0
+#ifdef	__LITTLE_ENDIAN__
+	mov	r0,r2
+	mov	r1,r0
+	mov	r2,r1
+#endif
+	rts
+	mov.l	@r15+,r8
+
+! Normalize the mantissa by setting its 21 bit in high part
+.L_norm_mant:
+	mov.l	.L_21bit,r0
+
+	tst	r4,r0
+	bf	.L_epil
+
+	tst	r4,r4
+	bf	.L_shift_till_1
+
+	tst	r5,r5
+	bf	.L_shift_till_1
+
+	! Mantissa is zero, return 0
+	mov.l	@r15+,r10
+	mov	#0,r0
+
+	mov.l	@r15+,r9
+	mov.l	@r15+,r8
+
+	rts
+	mov	#0,r1
+
+! A loop for making the 21st bit 1 in high part of resultant mantissa
+! It is already ensured that 1 bit is present in the mantissa
+.L_shift_till_1:
+	clrt
+	shll	r5
+
+	rotcl	r4
+	add	#-1,r3
+
+	tst	r4,r0
+	bt	.L_shift_till_1
+
+! Return the result. Mantissa is in r4-r5. Exponent is in r3
+! Sign bit in r10
+.L_epil:
+	cmp/pl	r3
+
+	bf	.L_denorm
+	mov.l	LOCAL(x7fffffff),r0
+
+	mov	r5,r1
+	shlr	r1
+
+	mov	#0,r1
+	addc	r0,r2
+
+! Check extra MSB here
+	mov.l	.L_22bit,r9
+	addc	r1,r5	! round to even
+
+	addc	r1,r4
+	tst	r9,r4
+
+	bf	.L_epil_1
+
+.L_epil_0:
+	mov.l	.L_21bit,r1
+
+	not	r1,r1
+	and	r1,r4
+
+	mov	r4,r0
+	or	r10,r0
+
+	mov.l	@r15+,r10
+	mov	#20,r2
+
+	mov.l	@r15+,r9
+	mov	r5,r1
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+	shld	r2,r3
+#else
+	SHLL20 (r3)
+#endif
+	or	r3,r0
+
+#ifdef	__LITTLE_ENDIAN__
+	mov	r0,r2
+	mov	r1,r0
+	mov	r2,r1
+#endif
+	rts
+	mov.l	@r15+,r8
+
+.L_epil_1:
+	shlr	r4
+	add	#1,r3
+	bra	.L_epil_0
+	rotcr	r5
+
+.L_denorm:
+	add	#-1,r3
+.L_denorm_1:
+	tst	r3,r3
+	bt	.L_denorm_2
+
+	shlr	r4
+	rotcr	r5
+
+	movt	r1
+	bra	.L_denorm_1
+	add	#1,r3
+
+.L_denorm_2:
+	clrt
+	mov	#0,r2
+	addc	r1,r5
+
+	addc	r2,r4
+	mov	r4,r0
+
+	or	r10,r0
+	mov.l	@r15+,r10
+
+	mov	r5,r1
+
+	mov.l	@r15+,r9
+#ifdef	__LITTLE_ENDIAN__
+	mov	r0,r2
+	mov	r1,r0
+	mov	r2,r1
+#endif
+	rts
+	mov.l	@r15+,r8
+
+! op1 is known to be positive infinity, and op2 is Inf. The sign
+! of op2 is not known. Return the appropriate value
+.L_op1_pinf_op2_inf:
+	mov.l	.L_sign,r0
+	tst	r6,r0
+
+	bt	.L_ret_op2_1
+
+	! op2 is negative infinity. Inf - Inf is being performed
+	mov.l	.L_inf,r0
+	mov.l	@r15+,r10
+	mov.l	@r15+,r9
+#ifdef	__LITTLE_ENDIAN__
+	mov	r0,r1
+#endif
+	mov.l	@r15+,r8
+
+	rts
+#ifdef	__LITTLE_ENDIAN__
+	mov	#1,r0
+#else
+	mov	#1,r1	! Any value here will return Nan
+#endif
+	
+.L_ret_op1_1:
+        mov.l   @r15+,r10
+#ifdef	__LITTLE_ENDIAN__
+	mov	r4,r1
+#else
+        mov     r4,r0
+#endif
+
+        mov.l   @r15+,r9
+#ifdef	__LITTLE_ENDIAN__
+	mov	r5,r0
+#else
+        mov     r5,r1
+#endif
+
+        rts
+        mov.l   @r15+,r8
+
+.L_ret_op2_1:
+	mov.l	@r15+,r10
+#ifdef	__LITTLE_ENDIAN__
+	mov	r6,r1
+#else
+	mov	r6,r0
+#endif
+
+	mov.l	@r15+,r9
+#ifdef	__LITTLE_ENDIAN__
+	mov	r7,r0
+#else
+	mov	r7,r1
+#endif
+
+	rts
+	mov.l	@r15+,r8
+
+! op1 is negative infinity. Check op2 for infinity or Nan
+.L_op1_ninf:
+	cmp/eq	r2,r3
+	bf	.L_ret_op1_1	! op2 is neither Nan nor Inf
+
+	mov.l	@r15+,r9
+	div0s	r4,r6		! different signs -> NaN
+	mov	r4,DBLRH
+	or	r6,DBLRH
+	mov.l	@r15+,r8
+	SL(bf, 0f,
+	 mov	r5,DBLRL)
+	mov	#-1,DBLRH	! return NaN.
+0:	rts
+	or	r7,DBLRL
+
+!r1 contains exponent for op1, r3 contains exponent for op2
+!r2 has .L_inf (+ve Inf)
+!op1 has invalid exponent. Either it contains Nan or Inf
+.L_inv_exp_op1:
+	! Check if a is Nan
+	cmp/pl	r5
+	bt	.L_ret_op1_1
+
+	mov.l	.L_high_mant,r0
+	and	r4,r0
+
+	cmp/pl	r0
+	bt	.L_ret_op1_1
+
+	! op1 is not Nan. It is infinity. Check the sign of it.
+	! If op2 is Nan, return op2
+	cmp/pz	r4
+
+	bf	.L_op1_ninf
+
+	! op2 is +ve infinity here
+	cmp/eq	r2,r3
+	bf	.L_ret_op1_1	! op2 is neither Nan nor Inf
+
+	! r2 is free now
+	mov.l	.L_high_mant,r0
+	tst	r6,r0		! op2 also has invalid exponent
+
+	bf	.L_ret_op2_1	! op2 is Infinity, and op1 is +Infinity
+
+	tst	r7,r7
+	bt	.L_op1_pinf_op2_inf	! op2 is Infinity, and op1 is +Infinity
+	!op2 is not infinity, It is Nan
+	bf	.L_ret_op2_1
+
+	.align 2	
+.L_high_mant:
+	.long 0x000FFFFF
+
+.L_21bits:
+	.long 0x001FFFFF
+
+.L_22bit:
+	.long 0x00200000
+
+.L_23bit:
+	.long 0x00400000
+
+.L_21bit:
+	.long 0x00100000
+
+.L_sign:
+	.long 0x80000000
+
+.L_inf:
+	.long 0x7ff00000
+
+LOCAL(x7fffffff): .long 0x7fffffff
+
+ENDFUNC (GLOBAL (subdf3))
+ENDFUNC (GLOBAL (adddf3))
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/addsf3.S gcc-4.5.0/gcc/config/sh/IEEE-754/addsf3.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/addsf3.S	1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/addsf3.S	2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,535 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Add floating point numbers in r4, r5.
+
+! Author: Rakesh Kumar
+
+! Arguments are in r4, r5 and result in r0
+
+! Entry points: ___subsf3, ___addsf3
+
+! r4 and r5 are referred as op1 and op2
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+	.text
+	.align 5
+        .global GLOBAL (subsf3)
+	.global	GLOBAL (addsf3)
+	FUNC (GLOBAL (subsf3))
+	FUNC (GLOBAL (addsf3))
+
+GLOBAL (subsf3):
+        mov.l   .L_sign_bit,r1
+        xor     r1,r5
+
+GLOBAL (addsf3):
+	mov.l	r8,@-r15
+	mov	r4,r3
+
+	mov.l	.L_pinf,r2
+	mov	#0,r8
+
+	and	r2,r3 ! op1's exponent.
+	mov	r5,r6
+
+	! Check NaN or Infinity
+	and	r2,r6 ! op2's exponent.
+	cmp/eq	r2,r3
+
+	! go if op1 is NaN or INF. 
+	mov.l	.L_sign_bit,r0
+	SL(bt,	.L_inv_op1,
+	 mov	#-23,r1)
+	
+	! Go if op2 is NaN/INF.
+	cmp/eq	r2,r6
+	mov	r0,r7
+	bt	.L_ret_op2
+
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+	SHLR23 (r3)
+#else
+	shld	r1,r3
+#endif
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+	SHLR23 (r6)
+#else
+	shld	r1,r6
+#endif
+
+	! Check for negative zero
+	cmp/eq	r0,r5
+
+	mov	r5,r1
+	SL(bt,	.L_ret_op1,
+	 and	r7,r1)
+
+	cmp/eq	r0,r4
+	bt	.L_ret_op2
+
+	! if op1 is zero return op2
+	tst	r4,r4
+	bt	.L_ret_op2
+
+	! Equal numbers with opposite sign
+	mov	r4,r2
+	xor	r5,r2
+
+	cmp/eq	r0,r2
+	bt	.L_ret_zero
+
+	! if op2 is zero return op1
+	mov.l	.L_mask_fra,r2
+	tst	r5,r5
+
+	! Extract the mantissa
+	mov	r4,r0
+	SL(bt,	.L_ret_op1,
+	 and	r2,r5)
+
+	and	r2,r4
+
+	mov.l	.L_imp_bit,r2
+	and	r7,r0	! sign bit of op1
+
+	! Check for denormals
+	tst	r3,r3
+	bt	.L_norm_op1
+
+	! Attach the implicit bit
+	or	r2,r4
+	tst	r6,r6
+
+	bt	.L_norm_op2
+
+	or	r2,r5
+	tst	r0,r0
+
+	! operands are +ve or -ve??
+	bt	.L_ptv_op1
+
+	neg	r4,r4
+
+.L_ptv_op1:
+	tst	r1,r1
+	bt	.L_ptv_op2
+
+	neg	r5,r5
+
+! Test exponents for equality
+.L_ptv_op2:
+	cmp/eq	r3,r6
+	bt	.L_exp_eq
+
+! Make exponents of two arguments equal
+.L_exp_ne:
+	! r0, r1 contain sign bits.
+	! r4, r5 contain mantissas.
+	! r3, r6 contain exponents.
+	! r2, r7 scratch.
+
+	! Calculate result exponent.
+	mov	r6,r2
+	sub	r3,r2	! e2 - e1
+
+	cmp/pl	r2
+	mov	#23,r7
+
+	! e2 - e1 is -ve
+	bf	.L_exp_ne_1
+
+	mov	r6,r3 ! Result exp.
+	cmp/gt	r7,r2 ! e2-e1 > 23
+
+	mov	#1,r7
+	bt	.L_pack_op2_0
+
+	! Align the mantissa
+.L_loop_ne:
+	shar	r4
+
+	rotcr	r8
+	cmp/eq	r7,r2
+
+	add	#-1,r2
+	bf	.L_loop_ne
+
+	bt	.L_exp_eq
+
+! Exponent difference is too high.
+! Return op2 after placing pieces in proper place
+.L_pack_op2_0:
+	! If op1 is -ve
+	tst	r1,r1
+	bt	.L_pack_op2
+
+	neg	r5,r5
+
+! r6 has exponent
+! r5 has mantissa, r1 has sign
+.L_pack_op2:
+	mov.l	.L_nimp_bit,r2
+	mov	#23,r3
+
+	mov	r1,r0
+	
+	and	r2,r5
+	mov.l	@r15+,r8
+
+	or	r5,r0
+
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+	SHLL23 (r6)
+#else
+	shld	r3,r6
+#endif
+        rts
+	or	r6,r0
+
+! return op1. It is NAN or INF or op2 is zero.
+.L_ret_op1:
+	mov	r4,r0
+
+	rts
+	mov.l	@r15+,r8
+
+! return zero
+.L_ret_zero:
+	mov	#0,r0
+
+	rts
+	mov.l	@r15+,r8
+
+! return op2. It is NaN or INF or op1 is zero.
+.L_ret_op2:
+	mov	r5,r0
+
+	rts
+	mov.l	@r15+,r8
+
+! op2 is denormal. Normalize it.
+.L_norm_op2:
+	shll	r5
+	add	#-1,r6
+
+	tst	r2,r5
+	bt	.L_norm_op2
+
+	! Check sign
+	tst	r1,r1
+	bt	.L_norm_op2_2
+
+	neg	r5,r5
+
+.L_norm_op2_2:
+	add	#1,r6
+	cmp/eq	r3,r6
+
+	bf	.L_exp_ne
+	bt	.L_exp_eq
+
+! Normalize op1
+.L_norm_op1:
+	shll	r4
+	add	#-1,r3
+
+	tst	r2,r4
+	bt	.L_norm_op1
+
+	! Check sign
+	tst	r0,r0
+	bt	.L_norm_op1_1
+
+	neg	r4,r4
+
+.L_norm_op1_1:
+	! Adjust biasing
+	add	#1,r3
+
+	! Check op2 for denormalized value
+	tst	r6,r6
+	bt	.L_norm_op2
+
+	mov.l	.L_imp_bit,r2
+
+	tst	r1,r1	! Check sign
+	or	r2,r5	! Attach 24th bit
+
+	bt	.L_norm_op1_2
+
+	neg	r5,r5
+
+.L_norm_op1_2:
+	cmp/eq	r3,r6
+
+	bt	.L_exp_eq
+	bf	.L_exp_ne
+
+! op1 is NaN or Inf
+.L_inv_op1:
+	! Return op1 if it is NAN. 
+	! r2 is infinity
+	cmp/gt	r2,r4
+	bt	.L_ret_op1
+
+	! op1 is +/- INF
+	! If op2 is same return now.
+	cmp/eq	r4,r5
+	bt	.L_ret_op1
+
+	! return op2 if it is NAN
+	cmp/gt	r2,r5
+	bt	.L_ret_op2
+
+	! Check if op2 is inf
+	cmp/eq	r2,r6
+	bf	.L_ret_op1
+	
+	! Both op1 and op2 are infinities 
+	!of opp signs, or there is -NAN. Return a NAN.
+	mov.l	@r15+,r8
+	rts
+	mov	#-1,r0
+
+! Make unequal exponents equal.
+.L_exp_ne_1:
+	mov	#-25,r7
+	cmp/gt	r2,r7 ! -23 > e2 - e1
+
+	add	#1,r2
+	bf	.L_exp_ne_2
+
+	tst	r0,r0
+	bt	.L_pack_op1
+
+.L_pack_op1_0:
+	bra	.L_pack_op1
+	neg	r4,r4
+
+! Accumulate the shifted bits in r8
+.L_exp_ne_2:
+	! Shift with rounding
+	shar	r5
+	rotcr	r8
+
+	tst	r2,r2
+
+	add	#1,r2
+	bf	.L_exp_ne_2
+
+! Exponents of op1 and op2 are equal (or made so)
+! The mantissas are in r4-r5 and remaining bits in r8
+.L_exp_eq:
+	add	r5,r4 ! Add fractions.
+	mov.l	.L_sign_bit,r2
+
+	! Check for negative result
+	mov	#0,r0
+	tst	r2,r4
+
+	mov.l	.L_255,r5
+	bt	.L_post_add
+
+	negc	r8,r8
+	negc	r4,r4
+	or	r2,r0
+
+.L_post_add:
+	! Check for extra MSB
+	mov.l	.L_chk_25,r2
+
+	tst	r2,r4
+	bt	.L_imp_check
+
+	shar 	r4
+	rotcr	r8
+
+	add	#1,r3
+	cmp/ge	r5,r3
+
+	! Return Inf if exp > 254
+	bt	.L_ret_inf
+
+! Check for implicit (24th) bit in result
+.L_imp_check:
+        mov.l	.L_imp_bit,r2
+	tst	r2,r4
+
+	bf	.L_pack_op1
+
+! Result needs left shift
+.L_lft_shft:
+	shll	r8
+	rotcl	r4
+
+	add	#-1,r3
+	tst	r2,r4
+
+	bt	.L_lft_shft
+	
+! Pack the result after rounding
+.L_pack_op1:
+	! See if denormalized result is possible 
+	mov.l	.L_chk_25,r5
+	cmp/pl	r3
+
+	bf	.L_denorm_res
+
+	! Are there any bits shifted previously?
+	tst	r8,r8
+	bt	.L_pack_1
+
+	! Round
+	shll	r8
+	movt	r6
+
+	add	r6,r4
+
+	! If we are halfway between two numbers,
+	! round towards LSB = 0
+	tst	r8,r8
+
+	bf	.L_pack_1
+
+	shlr	r4
+	shll	r4
+
+.L_pack_1:
+	! Adjust extra MSB generated after rounding
+	tst	r4,r5
+	mov.l	.L_255,r2
+
+	bt	.L_pack_2
+	shar	r4
+
+	add	#1,r3 
+	cmp/ge	r2,r3	! Check for exp overflow
+
+	bt	.L_ret_inf
+	
+! Pack it finally
+.L_pack_2:
+	! Do not store implicit bit
+	mov.l	.L_nimp_bit,r2
+	mov	#23,r1
+
+	and	r2,r4
+
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+	SHLL23 (r3)
+#else
+	shld	r1,r3
+#endif
+	mov.l	@r15+,r8
+
+	or	r4,r0
+        rts
+	or	r3,r0
+
+! Return infinity
+.L_ret_inf:
+	mov.l	.L_pinf,r2
+
+	mov.l	@r15+,r8
+	rts
+	or	r2,r0
+
+! Result must be denormalized
+.L_denorm_res:
+	mov	#0,r2
+	
+! Denormalizing loop with rounding
+.L_den_1:
+	shar	r4
+	movt	r6
+
+	tst	r3,r3
+	bt	.L_den_2
+
+	! Increment the exponent
+	add	#1,r3
+
+	tst	r6,r6
+	bt	.L_den_0
+
+	! Count number of ON bits shifted
+	add	#1,r2
+
+.L_den_0:
+	bra	.L_den_1
+	nop
+
+! Apply rounding
+.L_den_2:
+	cmp/eq	r6,r1
+	bf	.L_den_3
+
+	add	r6,r4
+	mov	#1,r1
+
+	! If halfway between two numbers,
+	! round towards LSB = 0
+	cmp/eq	r2,r1
+	bf	.L_den_3
+
+	shar	r4
+	shll	r4
+
+.L_den_3:
+
+	mov.l	@r15+,r8
+	rts
+	or	r4,r0
+	
+	.align 2
+.L_imp_bit:
+        .long   0x00800000
+
+.L_nimp_bit:
+	.long	0xFF7FFFFF
+
+.L_mask_fra:
+        .long   0x007FFFFF
+
+.L_pinf:
+        .long   0x7F800000
+
+.L_sign_bit:
+	.long	0x80000000
+
+.L_bit_25:
+	.long	0x01000000
+
+.L_chk_25:
+        .long   0x7F000000
+
+.L_255:
+	.long	0x000000FF
+
+ENDFUNC (GLOBAL (addsf3))
+ENDFUNC (GLOBAL (subsf3))
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/divdf3.S gcc-4.5.0/gcc/config/sh/IEEE-754/divdf3.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/divdf3.S	1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/divdf3.S	2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,598 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!division of two double precision floating point numbers
+!Author:Aanchal Khanna
+!
+!Entry:
+!r4,r5:dividend
+!
+!r6,r7:divisor
+!
+!Exit:
+!r0,r1:quotient
+
+!Notes: dividend is passed in regs r4 and r5 and divisor is passed in regs 
+!r6 and r7, quotient is returned in regs r0 and r1. dividend is referred as op1
+!and divisor as op2.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+	.text
+	.align	5
+	.global	GLOBAL (divdf3)
+	FUNC (GLOBAL (divdf3))
+
+GLOBAL (divdf3):
+
+#ifdef  __LITTLE_ENDIAN__
+	mov	r4,r1
+	mov	r5,r4
+	mov	r1,r5
+
+        mov     r6,r1
+        mov     r7,r6
+        mov     r1,r7
+#endif
+	mov	r4,r2
+	mov.l	.L_inf,r1
+
+	and	r1,r2
+	mov.l   r8,@-r15
+
+	cmp/eq	r1,r2
+	mov     r6,r8
+
+	bt	.L_a_inv
+	and	r1,r8
+
+	cmp/eq	r1,r8
+	mov.l	.L_high_mant,r3
+
+	bf	.L_chk_zero
+	and	r6,r3
+
+	mov.l   .L_mask_sign,r8	
+	cmp/pl	r7
+
+	mov	r8,r0
+	bt	.L_ret_b	!op2=NaN,return op2
+
+	and	r4,r8
+	cmp/pl	r3
+
+	and	r6,r0
+	bt	.L_ret_b	!op2=NaN,return op2
+
+	xor     r8,r0           !op1=normal no,op2=Inf, return Zero
+	mov     #0,r1
+	
+#ifdef __LITTLE_ENDIAN__
+	mov	r0,r2
+	mov	r1,r0
+	mov	r2,r1
+#endif
+	rts
+	mov.l   @r15+,r8
+
+.L_ret_b:
+	mov	r7,r1
+	mov     r6,r0
+
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+
+	rts
+	mov.l   @r15+,r8
+
+.L_a_inv:
+	!chk if op1 is Inf or NaN
+	mov.l   .L_high_mant,r2
+	cmp/pl  r5
+
+	and	r4,r2
+	bt	.L_ret_a
+
+	and	r1,r8		!r1 contains infinity
+	cmp/pl	r2
+
+	bt	.L_ret_a
+	cmp/eq	r1,r8
+
+	mov	r1,DBLRH
+	add	DBLRH,DBLRH
+	bf	0f
+	mov	#-1,DBLRH	! Inf/Inf, return NaN.
+0:	div0s	r4,r6
+	mov.l   @r15+,r8	
+	rts
+	rotcr	DBLRH
+
+.L_ret_a:
+	!return op1
+	mov	r5,r1
+	mov	r4,r0
+
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+	rts
+        mov.l   @r15+,r8
+
+.L_chk_zero:
+	!chk if op1=0
+	mov.l   .L_mask_sign,r0
+        mov     r4,r3
+
+        and     r0,r3
+        shll    r4
+
+        and     r6,r0
+        shlr    r4
+
+        xor     r3,r0
+        shll    r6
+
+	shlr	r6
+	tst	r4,r4
+
+
+	bf      .L_op1_not_zero	
+	tst	r5,r5
+	
+        bf      .L_op1_not_zero
+	tst	r7,r7
+
+	mov.l   @r15+,r8
+	bf	.L_ret_zero
+
+	tst	r6,r6
+	bf	.L_ret_zero
+
+	rts
+	mov     #-1,DBLRH       !op1=op2=0, return NaN
+	
+.L_ret_zero:
+	!return zero
+	mov	r0,r1
+	rts
+#ifdef __LITTLE_ENDIAN__
+	mov	#0,r0
+#else
+	mov	#0,r1		!op1=0,op2=normal no,return zero
+#endif
+
+.L_norm_b:
+	!normalize op2
+        shll    r7
+        mov.l   .L_imp_bit,r3
+
+        rotcl   r6
+        tst     r3,r6
+
+        add     #-1,r8
+        bt      .L_norm_b
+
+        bra     .L_divide
+        add     #1,r8
+
+.L_op1_not_zero:
+	!op1!=0, chk if op2=0
+	tst	r7,r7	
+	mov	r1,r3
+	
+	mov	#0,r1
+	bf	.L_normal_nos
+
+	tst	r6,r6
+	bf      .L_normal_nos
+
+	mov.l   @r15+,r8
+	or	r3,r0
+
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+
+	rts
+	nop
+
+.L_normal_nos:
+	!op1 and op2 are normal nos
+	tst	r2,r2
+	mov	#-20,r1
+
+! The subsequent branch is for the upper compare
+! Shifting will not alter the result, for the
+! macro is declared with care.
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+	shld    r1,r2
+#else
+	SHLR20 (r2)
+#endif
+	bt	.L_norm_a	!normalize dividend
+	
+.L_chk_b:
+	mov.l	r9,@-r15
+	tst	r8,r8
+
+        mov.l   .L_high_mant,r9
+
+! The subsequent branch is for the upper compare
+! Shifting will not alter the result, for the
+! macro is declared with care.
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r1,r8
+#else
+        SHLR20 (r8)
+#endif
+				! T set -> normalize divisor
+	SL(bt,	.L_norm_b,
+	 and	r9,r4)
+
+.L_divide:
+	mov.l   .L_2047,r1
+	sub	r8,r2
+
+	mov.l	.L_1023,r8
+	and	r9,r6
+
+	!resultant exponent
+	add	r8,r2
+	!chk the exponent for overflow
+	cmp/ge	r1,r2
+	
+	mov.l	.L_imp_bit,r1
+	bt	.L_overflow
+	
+	mov	#0,r8
+	or	r1,r4
+	
+	or      r1,r6	
+	mov	#-24,r3
+
+	!chk if the divisor is 1(mantissa only)
+	cmp/eq	r8,r7
+	bf	.L_div2
+
+	cmp/eq	 r6,r1
+	bt	.L_den_one
+
+.L_div2:
+	!divide the mantissas
+	shll8	r4
+	mov	r5,r9
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r3,r9
+#else
+        SHLR24 (r9)
+#endif
+	shll8	r6
+
+	or	r9,r4
+	shll8   r5
+
+	mov	r7,r9
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r3,r9
+#else
+        SHLR24 (r9)
+#endif
+	mov	r8,r3
+	shll8	r7
+
+	or	r9,r6	
+	cmp/gt	r4,r6
+
+	mov	r3,r9
+	bt	.L_shift
+
+	cmp/eq	r4,r6
+	bf	.L_loop
+
+	cmp/gt	r5,r7
+	bf	.L_loop
+
+.L_shift:
+	add	#-1,r2
+	shll	r5
+	rotcl	r4
+
+.L_loop:
+	!actual division loop
+	cmp/gt	r6,r4
+	bt	.L_subtract
+
+	cmp/eq	r6,r4
+	bf	.L_skip
+
+	cmp/ge	r7,r5
+	bf	.L_skip
+
+.L_subtract:
+	clrt
+	subc	r7,r5
+	
+	or	r1,r8
+	subc	r6,r4
+
+.L_skip:
+	shlr	r1
+	shll	r5
+
+	rotcl	r4
+	cmp/eq	r1,r3
+
+	bf	.L_loop
+	mov.l	.L_imp_bit,r1
+
+	!chk if the division was for the higher word of the quotient
+	tst	r1,r9
+	bf	.L_chk_exp
+
+	mov	r8,r9
+	mov.l   .L_mask_sign,r1
+
+	!divide for the lower word of the quotient
+	bra	.L_loop
+	mov	r3,r8
+
+.L_chk_exp:
+	!chk if the result needs to be denormalized
+	cmp/gt	r2,r3
+	bf	.L_round
+	mov     #-53,r7
+
+.L_underflow:
+	!denormalize the result
+	add	#1,r2
+	cmp/gt	r2,r7
+
+	or      r4,r5           !remainder
+	add	#-2,r2
+
+	mov	#32,r4
+	bt      .L_return_zero
+
+	add	r2,r4
+	cmp/ge	r3,r4
+
+	mov	r2,r7
+	mov	r3,r1
+
+	mov     #-54,r2
+	bt	.L_denorm
+	mov	#-32,r7
+
+.L_denorm:
+	shlr	r8
+	rotcr	r1
+
+	shll	r8
+	add     #1,r7
+
+	shlr	r9
+	rotcr	r8
+
+	cmp/eq	r3,r7
+	bf	.L_denorm
+
+	mov	r4,r7
+	cmp/eq	r2,r4
+
+	bt	.L_break
+	mov     r3,r6
+
+	cmp/gt	r7,r3
+	bf	.L_break
+
+	mov	r2,r4
+	mov	r1,r6
+
+	mov	r3,r1
+	bt	.L_denorm
+
+.L_break:
+	mov     #0,r2
+
+	cmp/gt	r1,r2
+
+	addc	r2,r8
+	mov.l   .L_comp_1,r4
+
+	addc	r3,r9		
+	or	r9,r0
+
+	cmp/eq	r5,r3
+	bf	.L_return	
+
+	cmp/eq	r3,r6
+	mov.l	.L_mask_sign,r7
+
+	bf	.L_return
+	cmp/eq	r7,r1
+
+	bf	.L_return
+	and	r4,r8
+
+.L_return:
+	mov.l	@r15+,r9
+	mov     r8,r1
+
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+	rts
+	mov.l   @r15+,r8
+
+.L_norm_a:
+        !normalize op1
+        shll    r5
+        mov.l   .L_imp_bit,r3
+
+        rotcl   r4
+        tst     r3,r4
+
+        add     #-1,r2
+        bt      .L_norm_a
+
+        bra     .L_chk_b
+        add     #1,r2
+
+.L_overflow:
+	!overflow, return inf
+	mov.l   .L_inf,r2
+#ifdef __LITTLE_ENDIAN__
+	or	r2,r1	
+	mov	#0,r0
+#else
+	or	r2,r0
+	mov	#0,r1
+#endif
+        mov.l   @r15+,r9
+        rts
+        mov.l   @r15+,r8
+
+.L_den_one:
+	!denominator=1, result=numerator
+        mov     r4,r9
+        mov   	#-53,r7
+
+	cmp/ge	r2,r8
+	mov	r8,r4
+
+	mov	r5,r8
+	mov	r4,r3
+
+	!chk the exponent for underflow
+	SL(bt,	.L_underflow,
+	 mov     r4,r5)
+
+	mov.l	.L_high_mant,r7
+        bra     .L_pack
+	mov     #20,r6
+
+.L_return_zero:
+	!return zero
+	mov	r3,r1
+	mov.l	@r15+,r9
+
+	rts
+	mov.l   @r15+,r8
+
+.L_round:
+	!apply rounding
+	cmp/eq	r4,r6
+	bt	.L_lower
+
+	clrt
+	subc    r6,r4
+
+	bra     .L_rounding
+	mov	r4,r6
+	
+.L_lower:
+	clrt
+	subc	r7,r5
+	mov	r5,r6
+	
+.L_rounding:
+	!apply rounding
+	mov.l   .L_invert,r1
+	mov	r3,r4
+
+	movt	r3
+	clrt
+	
+	not	r3,r3
+	and	r1,r3	
+
+	addc	r3,r8
+	mov.l   .L_high_mant,r7
+
+	addc	r4,r9
+	cmp/eq	r4,r6
+
+	mov.l   .L_comp_1,r3
+	SL (bf,	.L_pack,
+	 mov     #20,r6)
+	and	r3,r8
+
+.L_pack:
+	!pack the result, r2=exponent,r0=sign,r8=lower mantissa, r9=higher mantissa
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+	shld    r6,r2
+#else
+        SHLL20 (r2)
+#endif
+	and	r7,r9
+
+	or	r2,r0
+	mov	r8,r1
+
+	or      r9,r0
+	mov.l	@r15+,r9
+
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+	rts
+	mov.l	@r15+,r8
+
+	.align	2
+
+.L_mask_sign:
+	.long	0x80000000
+.L_high_mant:
+	.long	0x000fffff
+.L_inf:
+	.long	0x7ff00000
+.L_1023:
+	.long	1023
+.L_2047:
+	.long	2047
+.L_imp_bit:
+	.long	0x00100000	
+.L_comp_1:
+	.long	0xfffffffe
+.L_invert:
+	.long	0x00000001
+
+ENDFUNC (GLOBAL (divdf3))
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/divsf3.S gcc-4.5.0/gcc/config/sh/IEEE-754/divsf3.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/divsf3.S	1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/divsf3.S	2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,404 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!divides two single precision floating point 
+
+! Author: Aanchal Khanna
+
+! Arguments: Dividend is in r4, divisor in r5
+! Result: r0
+
+! r4 and r5 are referred as op1 and op2 resp.
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+	.text
+	.align	5
+	.global	GLOBAL (divsf3)
+	FUNC (GLOBAL (divsf3))
+
+GLOBAL (divsf3):
+	mov.l	.L_mask_sign,r1
+	mov	r4,r3
+
+	xor	r5,r3
+	shll	r4
+
+	shlr	r4
+	mov.l	.L_inf,r2
+
+	and	r3,r1		!r1=resultant sign
+	mov	r4,r6
+
+	shll	r5
+	mov	#0,r0		
+
+	shlr	r5
+	and	r2,r6
+
+	cmp/eq	r2,r6
+	mov	r5,r7
+
+	and     r2,r7
+	bt	.L_op1_inv
+
+	cmp/eq	r2,r7
+	mov	#-23,r3
+
+	bt	.L_op2_inv
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+	SHLR23 (r6)
+	SHLR23 (r7)
+#else
+	shld	r3,r6
+	shld	r3,r7
+#endif
+
+	cmp/eq	r0,r4
+
+	bt	.L_op1_zero		!dividend=0
+	cmp/eq	r0,r6
+
+	mov.l   .L_imp_bit,r3
+	bt	.L_norm_op1		!normalize dividend
+.L_chk_op2:
+	cmp/eq	r0,r5
+	bt	.L_op2_zero		!divisor=0
+
+	cmp/eq	r0,r7
+	bt	.L_norm_op2		!normalize divisor
+
+.L_div1:
+	sub	r7,r6
+	add	#127,r6			!r6=resultant exponent
+
+	mov     r3,r7
+	mov.l	.L_mask_mant,r3
+
+	and	r3,r4
+	!chk exponent for overflow
+        mov.l   .L_255,r2
+
+	and     r3,r5
+	or	r7,r4
+
+	cmp/ge  r2,r6
+	or	r7,r5
+
+	bt	.L_return_inf
+	mov	r0,r2
+
+	cmp/eq  r4,r5
+	bf      .L_den_one
+
+	cmp/ge	r6,r0
+	!numerator=denominator, quotient=1, remainder=0
+	mov	r7,r2			
+
+	mov     r0,r4
+	!chk exponent for underflow
+	bt	.L_underflow
+        bra     .L_pack
+        nop
+
+.L_den_one:
+	!denominator=1, result=numerator
+
+	cmp/eq  r7,r5
+        bf      .L_divide
+
+	!chk exponent for underflow
+	cmp/ge  r6,r0
+        mov    r4,r2           
+
+        SL(bt,    .L_underflow,
+	 mov	r0,r4)
+	bra     .L_pack
+	nop
+
+.L_divide:
+	!dividing the mantissas r4<-dividend, r5<-divisor
+
+	cmp/hi	r4,r5
+	bf	.L_loop
+
+	shll	r4		! if mantissa(op1)< mantissa(op2)
+	add     #-1,r6		! shift left the numerator and decrease the exponent.
+
+.L_loop:
+	!division loop
+
+	cmp/ge	r5,r4
+	bf	.L_skip
+
+	or	r7,r2
+	sub	r5,r4
+
+.L_skip:
+	shlr	r7
+	shll	r4
+
+	cmp/eq	r0,r7
+	bf	.L_loop
+
+	!chk the exponent for underflow
+	cmp/ge  r6,r0
+	bt      .L_underflow
+	
+	!apply rounding
+	cmp/gt	r5,r4
+	bt	.L_round1
+
+	cmp/eq	r4,r5
+	bt	.L_round2
+
+.L_pack:
+	!pack the result, r1=sign, r2=quotient, r6=exponent
+
+	mov    #23,r4
+	and     r3,r2
+
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+	SHLL23 (r6)
+#else
+	shld	r4,r6
+#endif
+	or	r2,r1
+
+	or	r6,r1
+	mov	r1,r0	
+	
+	rts
+	nop
+
+.L_round1:
+	!Apply proper rounding
+
+        bra     .L_pack
+        add     #1,r2
+
+.L_round2:
+	!Apply proper rounding
+
+        mov.l   .L_comp_1,r5
+        bra     .L_pack
+        and     r5,r2
+
+.L_op1_inv:
+	!chk if op1 is Inf or NaN
+
+	mov.l	.L_mask_mant,r3
+	mov	r4,r6
+
+	and	r3,r6
+	cmp/hi	r0,r6
+
+	bt	.L_ret_op1
+	cmp/eq	r2,r7
+
+	SL(bf,	.L_ret_op1,
+	 mov	r1,r0)
+
+	rts
+	mov	#-1,r0	! 0/0, return NaN
+	
+.L_op2_inv:
+	!chk if op2 is Inf or NaN
+
+	mov.l	.L_mask_mant,r3
+	mov	r5,r7
+	
+	and	r3,r7
+	cmp/hi	r0,r7
+
+	bt	.L_ret_op2
+	mov	r1,r0
+	
+	rts
+	nop
+
+.L_op1_zero:
+	!op1 is zero. If op2 is zero, return NaN, else return zero
+
+	cmp/eq	r0,r5
+
+	bf	.L_ret_op1	
+
+	rts
+	mov	#-1,r0
+
+.L_op2_zero:
+	!op2 is zero, return Inf
+
+	rts
+	or	r2,r0
+
+.L_return_inf:
+	mov.l	.L_inf,r0
+	
+	rts
+	or	r1,r0
+
+.L_norm_op1:
+	!normalize dividend
+
+	shll	r4
+	tst	r2,r4
+	
+	add     #-1,r6
+	bt	.L_norm_op1
+
+	bra	.L_chk_op2
+	add	#1,r6
+
+.L_norm_op2:
+	!normalize divisor
+
+	shll	r5
+	tst	r2,r5
+	
+	add	#-1,r7
+	bt	.L_norm_op2
+
+	bra	.L_div1
+	add	#1,r7
+
+.L_underflow:
+	!denormalize the result
+
+	add	#1,r6
+	mov	#-24,r7
+
+	cmp/gt	r6,r7
+	mov	r2,r5
+
+	bt	.L_return_zero
+	add     #-1,r6
+
+	mov	#32,r3
+	neg	r6,r7
+
+	add	#1,r7
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+	shld	r6,r2
+#else
+	cmp/ge	r0,r6
+	bf	.L_mov_right
+
+.L_mov_left:
+	cmp/eq	r0,r6
+	bt	.L_out
+
+	shll	r2
+	bra	.L_mov_left
+	add	#-1,r6
+
+.L_mov_right:
+	cmp/eq	r0,r6
+	bt	.L_out
+
+	add	#1,r6
+	bra	.L_mov_right
+	shlr	r2
+	
+.L_out:
+#endif
+	sub	r7,r3
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+	shld	r3,r5
+#else
+	cmp/ge	r0,r3
+	bf	.L_mov_right_1
+
+.L_mov_left_1:
+	shll	r5
+	add	#-1,r3
+
+	cmp/eq	r0,r3
+	bf	.L_mov_left_1
+
+	bt	.L_out_1
+
+.L_mov_right_1:
+	cmp/eq	r0,r3
+	bt	.L_out_1
+
+	add	#1,r3
+	bra	.L_mov_right_1
+	shlr	r5
+
+.L_out_1:
+#endif
+	shlr	r2
+	addc	r0,r2
+
+	cmp/eq	r4,r0		!r4 contains the remainder
+	mov      r2,r0
+
+	mov.l	.L_mask_sign,r7
+	bf	.L_return
+
+	mov.l   .L_comp_1,r2
+	cmp/eq	r7,r5
+
+	bf	.L_return
+	and	r2,r0
+
+.L_return:
+	rts
+	or     r1,r0
+	
+.L_ret_op1:
+	rts
+	or	r4,r0
+
+.L_ret_op2:
+	rts
+	or	r5,r0
+
+.L_return_zero:
+	rts
+	or	r1,r0
+
+
+
+	.align	2
+.L_inf:
+	.long	0x7f800000
+.L_mask_sign:
+	.long	0x80000000
+.L_mask_mant:
+	.long	0x007fffff
+.L_imp_bit:
+	.long	0x00800000
+.L_comp_1:
+	.long	0xfffffffe
+.L_255:
+	.long	255
+
+ENDFUNC (GLOBAL (divsf3))
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/fixdfsi.S gcc-4.5.0/gcc/config/sh/IEEE-754/fixdfsi.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/fixdfsi.S	1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/fixdfsi.S	2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,200 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!conversion of double precision floating point number to signed integer
+!Author:Aanchal Khanna
+!
+!Entry:
+!r4,r5:operand
+!
+!Exit:
+!r0:result
+!
+!Note:argument is passed in regs r4 and r5, the result is returned in
+!reg r0.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+	.text
+	.align 	5
+	.global GLOBAL (fixdfsi)
+	FUNC (GLOBAL (fixdfsi))
+
+GLOBAL (fixdfsi):
+
+#ifdef  __LITTLE_ENDIAN__
+        mov     r4,r1
+        mov     r5,r4
+        mov     r1,r5
+
+#endif
+	mov.l	.L_p_inf,r2
+	mov     #-20,r1
+	
+	mov	r2,r7
+	mov.l   .L_1023,r3
+
+	and	r4,r2
+	shll    r4
+        
+	movt    r6		! r6 contains the sign bit
+	
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r1,r2		! r2 contains the exponent
+#else
+        SHLR20 (r2)
+#endif
+	 shlr    r4
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r1,r7
+#else
+        SHLR20 (r7)
+#endif
+	cmp/hi	r2,r3		! if exp < 1023,return 0
+	mov.l	.L_mask_high_mant,r1
+
+	SL(bt,	.L_epil,
+	 mov	#0,r0)
+	and	r4,r1		! r1 contains high mantissa
+
+	cmp/eq	r2,r7		! chk if exp is invalid
+	mov.l	.L_1053,r7
+
+	bt	.L_inv_exp
+	mov	#11,r0
+	
+	cmp/hi	r7,r2		! If exp > 1053,return maxint
+	sub     r2,r7
+
+	mov.l	.L_21bit,r2
+	SL(bt,	.L_ret_max,
+	 add	#1,r7)		! r7 contains the number of shifts
+
+	or	r2,r1
+	mov	r7,r3
+	shll8   r1
+
+	neg     r7,r7
+	shll2	r1
+
+        shll	r1
+	cmp/hi	r3,r0
+
+	!chk if the result can be made only from higher mantissa
+	SL(bt,	.L_lower_mantissa,
+	 mov	#21,r0)
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r7,r1
+#else
+.L_loop:
+        tst	r7,r7
+        bt      .L_break1
+        add     #1,r7
+        bra     .L_loop
+        shlr    r1
+
+.L_break1:
+#endif
+	tst	r6,r6
+	SL(bt,	.L_epil,
+	 mov	r1,r0)
+
+	rts
+	neg	r0,r0
+
+.L_lower_mantissa:
+	!result is made from lower mantissa also
+	neg	r0,r0
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r0,r5
+#else
+        SHLR21 (r5)
+#endif
+
+	or	r5,r1		!pack lower and higher mantissas
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r7,r1
+#else
+.L_sh_loop:
+	tst	r7,r7
+	bt	.L_break
+	add	#1,r7
+	bra	.L_sh_loop
+	shlr	r1
+
+.L_break:
+#endif
+	mov	r1,r0
+	bra	.L_chk_sign
+	nop
+
+.L_epil:
+	rts
+	nop
+
+.L_inv_exp:
+	cmp/hi	r0,r5
+	bt	.L_epil
+
+	cmp/hi	r0,r1		!compare high mantissa,r1
+	bt	.L_epil
+
+.L_ret_max:
+	mov.l   .L_maxint,r0
+	tst	r6,r6
+	bt	.L_epil
+
+	rts
+	add	#1,r0
+
+.L_chk_sign:
+	tst	r6,r6		!sign bit is set, number is -ve
+	bt	.L_epil
+	
+	rts
+	neg	r0,r0
+
+	.align	2
+
+.L_maxint:
+	.long	0x7fffffff
+.L_p_inf:
+	.long	0x7ff00000
+.L_mask_high_mant:
+	.long	0x000fffff
+.L_1023:
+	.long	0x000003ff
+.L_1053:
+	.long	1053
+.L_21bit:
+	.long	0x00100000
+
+ENDFUNC (GLOBAL (fixdfsi))
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/fixsfsi.S gcc-4.5.0/gcc/config/sh/IEEE-754/fixsfsi.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/fixsfsi.S	1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/fixsfsi.S	2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,165 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Conversion routine for float to integer
+
+! Author: Rakesh Kumar
+
+! Arguments: r4 (in floating point format)
+! Return: r0
+
+! r4 is referred as op1
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+	.text
+	.align 5
+	.global	GLOBAL (fixsfsi)
+	FUNC (GLOBAL (fixsfsi))
+
+GLOBAL (fixsfsi):
+	mov.l	.L_mask_sign,r7
+	mov	r4,r2
+
+	! Check for NaN
+	mov.l	.L_inf,r1
+	and	r7,r2
+
+	cmp/gt	r1,r2
+	mov	#127,r5
+
+	mov	r4,r3
+	SL(bt,	.L_epil,
+	 mov	#0,r0)
+
+	shll	r2
+	mov.l	.L_frac,r6
+
+	shlr16	r2
+	and	r6,r3	! r3 has fraction
+
+	shlr8	r2	! r2 has exponent
+	mov.l	.L_24bit,r1
+
+	! If exponent is less than 127, return 0
+	cmp/gt	r2,r5
+	or	r1,r3	! Set the implicit bit
+
+	mov.l	.L_157,r1
+	SL1(bt,	.L_epil,
+	 shll8	r3)
+
+	! If exponent is greater than 157,
+	! return the maximum/minimum integer
+	! value depending on the sign
+	cmp/gt	r1,r2
+	sub	r2,r1
+
+	mov.l	.L_sign,r2
+	SL(bt,	.L_ret_max,
+	 add	#1,r1)
+
+	and	r4,r2	! Sign in r2
+	neg	r1,r1
+
+	! Shift mantissa by exponent difference from 157
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+	shld	r1,r3
+#else
+        cmp/gt  r0,r1
+        bt      .L_mov_left
+
+.L_mov_right:
+        cmp/eq  r1,r0
+        bt      .L_ret
+
+        add     #1,r1
+        bra     .L_mov_right
+
+        shlr    r3
+
+.L_mov_left:
+        add     #-1,r1
+
+        shll    r3
+        cmp/eq  r1,r0
+
+        bf      .L_mov_left
+.L_ret:
+#endif
+	! If op1 is negative, negate the result
+	cmp/eq	r0,r2
+	SL(bf,	.L_negate,
+	 mov	r3,r0)
+
+! r0 has the appropriate value
+.L_epil:
+	rts
+	nop
+
+! Return the max/min integer value
+.L_ret_max:
+	and	r4,r2	! Sign in r2
+	mov.l	.L_max,r3
+
+	mov.l	.L_sign,r1
+	cmp/eq	r0,r2
+
+	mov	r3,r0
+	bt	.L_epil
+
+	! Negative number, return min int
+	rts
+	mov	r1,r0
+
+! Negate the result
+.L_negate:
+	rts
+	neg	r0,r0
+
+	.align 2
+.L_inf:
+	.long 0x7F800000
+
+.L_157:
+	.long 157
+
+.L_max:
+	.long 0x7FFFFFFF
+
+.L_frac:
+	.long 0x007FFFFF
+
+.L_sign:
+	.long 0x80000000
+
+.L_24bit:
+	.long 0x00800000
+
+.L_mask_sign:
+	.long 0x7FFFFFFF
+
+ENDFUNC (GLOBAL (fixsfsi))
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/fixunsdfsi.S gcc-4.5.0/gcc/config/sh/IEEE-754/fixunsdfsi.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/fixunsdfsi.S	1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/fixunsdfsi.S	2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,181 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!conversion of double precision floating point number to unsigned integer
+!Author:Aanchal Khanna
+!
+!Entry:
+!r4,r5:operand
+!
+!Exit:
+!r0:result
+!
+!Note:argument is passed in regs r4 and r5, the result is returned in
+!reg r0.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+	.text
+	.align 5
+	.global GLOBAL (fixunsdfsi)
+	FUNC (GLOBAL (fixunsdfsi))
+
+GLOBAL (fixunsdfsi):
+
+#ifdef  __LITTLE_ENDIAN__
+        mov     r4,r1
+        mov     r5,r4
+        mov     r1,r5
+#endif
+	mov.l	.L_p_inf,r2
+	mov     #-20,r1
+	
+	mov	r2,r7
+	mov.l   .L_1023,r3
+
+	and	r4,r2
+	shll    r4
+
+        movt    r6		! r6 contains the sign bit
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r1,r2           ! r2 contains the exponent
+#else
+        SHLR20 (r2)
+#endif
+	shlr    r4
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r1,r7
+#else
+        SHLR20 (r7)
+#endif
+	tst	r6,r6	
+	SL(bf,	.L_epil,
+	 mov	#0,r0)
+
+	cmp/hi	r2,r3		! if exp < 1023,return 0
+	mov.l	.L_high_mant,r1
+
+	SL(bt,	.L_epil,
+	 and	r4,r1)		! r1 contains high mantissa
+
+	cmp/eq	r2,r7		! chk if exp is invalid
+	mov.l	.L_1054,r7
+
+	bt	.L_inv_exp
+	mov	#11,r0
+	
+	cmp/hi	r7,r2		! If exp > 1054,return maxint
+	sub     r2,r7		!r7 contains the number of shifts
+
+	mov.l	.L_21bit,r2
+	bt	.L_ret_max
+
+	or	r2,r1
+	mov	r7,r3
+
+	shll8   r1
+	neg     r7,r7
+
+	shll2	r1
+
+        shll	r1
+	cmp/hi	r3,r0
+
+	SL(bt,	.L_lower_mant,
+	 mov	#21,r0)
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r7,r1
+#else
+.L_sh_loop:
+        tst	r7,r7
+        bt      .L_break
+        add     #1,r7
+        bra     .L_sh_loop
+        shlr    r1
+
+.L_break:
+#endif
+	rts
+	mov     r1,r0
+
+.L_lower_mant:
+	neg	r0,r0
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r0,r5
+#else
+        SHLR21 (r5)
+#endif
+	or	r5,r1		!pack lower and higher mantissas
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r7,r1
+#else
+.L_loop:
+        tst	r7,r7
+        bt      .L_break1
+        add     #1,r7
+        bra     .L_loop
+        shlr    r1
+
+.L_break1:
+#endif
+	mov	r1,r0
+.L_epil:
+	rts
+	nop
+
+.L_inv_exp:
+	cmp/hi	r0,r5
+	bt	.L_epil
+
+	cmp/hi	r0,r1		!compare high mantissa,r1
+	bt	.L_epil
+
+.L_ret_max:
+	mov.l   .L_maxint,r0
+
+	rts
+	nop
+
+	.align	2
+
+.L_maxint:
+	.long	0xffffffff
+.L_p_inf:
+	.long	0x7ff00000
+.L_high_mant:
+	.long	0x000fffff
+.L_1023:
+	.long	0x000003ff
+.L_1054:
+	.long	1054
+.L_21bit:
+	.long	0x00100000
+
+ENDFUNC (GLOBAL (fixunsdfsi))
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/fixunssfsi.S gcc-4.5.0/gcc/config/sh/IEEE-754/fixunssfsi.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/fixunssfsi.S	1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/fixunssfsi.S	2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,155 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Conversion from floating point to unsigned integer
+
+! Author: Rakesh Kumar
+
+! Argument: r4 (in floating point format)
+! Result: r0
+
+! For negative floating point numbers, it returns zero
+
+! The argument is referred as op1
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+	.text
+	.align 5
+	.global	GLOBAL (fixunssfsi)
+	FUNC (GLOBAL (fixunssfsi))
+
+GLOBAL (fixunssfsi):
+	mov.l	.L_sign,r0
+	mov	r4,r2
+
+	! Check for NaN
+	mov.l	.L_inf,r1
+	and	r4,r0
+
+	mov.l	.L_mask_sign,r7
+	mov	#127,r5
+
+	! Remove sign bit
+	cmp/eq	#0,r0
+	and	r7,r2
+
+	! If number is negative, return 0
+	! LIBGCC deviates from the standard in this regard.
+	mov	r4,r3
+	SL(bf,	.L_epil,
+	 mov	#0,r0)
+
+	mov.l	.L_frac,r6
+	cmp/gt	r1,r2
+
+	shll	r2
+	SL1(bt,	.L_epil,
+	 shlr16	r2)
+
+	shlr8	r2	! r2 has exponent
+	mov.l	.L_24bit,r1
+
+	and	r6,r3	! r3 has fraction
+	cmp/gt	r2,r5
+
+	! If exponent is less than 127, return 0
+	or	r1,r3
+	bt	.L_epil
+
+	! Process only if exponent is less than 158
+	mov.l	.L_158,r1
+	shll8	r3
+
+	cmp/gt	r1,r2
+	sub	r2,r1
+
+	neg	r1,r1
+	bt	.L_ret_max
+
+! Shift the mantissa by the exponent difference from 158
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+	shld	r1,r3
+#else
+	cmp/gt	r0,r1
+	bt	.L_mov_left
+
+.L_mov_right:
+	cmp/eq	r1,r0
+	bt	.L_ret
+
+	add	#1,r1
+	bra	.L_mov_right
+	shlr	r3
+
+.L_mov_left:
+	add	#-1,r1
+	
+	shll	r3
+	cmp/eq	r1,r0
+
+	bf	.L_mov_left
+
+.L_ret:	
+#endif
+	rts
+	mov	r3,r0
+
+! r0 already has appropriate value
+.L_epil:
+	rts
+	nop
+
+! Return the maximum unsigned integer value
+.L_ret_max:
+	mov.l	.L_max,r3
+
+	rts
+	mov	r3,r0
+
+	.align 2
+.L_inf:
+	.long 0x7F800000
+
+.L_158:
+	.long 158
+
+.L_max:
+	.long 0xFFFFFFFF
+
+.L_frac:
+	.long 0x007FFFFF
+
+.L_sign:
+	.long 0x80000000
+
+.L_24bit:
+	.long 0x00800000
+
+.L_mask_sign:
+	.long 0x7FFFFFFF
+
+ENDFUNC (GLOBAL (fixunssfsi))
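
The same extract-and-shift pattern, including the documented deviation
(negative input yields 0) and the NaN-returns-0 path, can be stated in C.
This is an illustrative sketch only; the constants 127, 150 and 158 come
from the literal pool above.

#include <stdint.h>
#include <string.h>

uint32_t fixunssfsi_sketch (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  if (bits & 0x80000000u)
    return 0;                          /* negative input: return 0 */
  int exp = (bits >> 23) & 0xff;
  if (exp == 255 && (bits & 0x007fffffu))
    return 0;                          /* NaN: the asm returns 0 */
  uint32_t frac = (bits & 0x007fffffu) | 0x00800000u;  /* implicit bit */
  if (exp < 127)
    return 0;                          /* magnitude below 1 */
  if (exp > 158)
    return 0xffffffffu;                /* too large (or +inf): saturate */
  /* value = frac * 2^(exp - 150); for exp <= 158 this fits in 32 bits.  */
  return exp >= 150 ? frac << (exp - 150) : frac >> (150 - exp);
}
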
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/floatsidf.S gcc-4.5.0/gcc/config/sh/IEEE-754/floatsidf.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/floatsidf.S	1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/floatsidf.S	2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,151 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!conversion of signed integer to double precision floating point number
+!Author:Rakesh Kumar
+!
+!Entry:
+!r4:operand 
+!
+!Exit:
+!r0,r1:result
+!
+!Note:argument is passed in reg r4 and the result is returned in 
+!regs r0 and r1.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+        .text
+        .align 5
+        .global GLOBAL (floatsidf)
+	FUNC (GLOBAL (floatsidf))
+
+GLOBAL (floatsidf):
+        mov.l   .L_sign,r0
+        mov     #0,r1
+
+	mov	r0,r2
+	tst	r4,r4 ! check r4 for zero
+
+	! Extract the sign
+	mov	r2,r3
+	SL(bt,	.L_ret_zero,
+	 and	r4,r0)
+
+	cmp/eq	r1,r0
+	not	r3,r3
+
+	mov	r1,r7
+	SL(bt,	.L_loop,
+	 and	r4,r3)
+
+	! Treat -2147483648 as a special case
+	cmp/eq	r1,r3
+	neg	r4,r4
+
+	bt	.L_ret_min	
+
+.L_loop:
+	shll	r4	
+	mov	r4,r5
+
+	and	r2,r5
+	cmp/eq	r1,r5
+	
+	add	#1,r7
+	bt	.L_loop
+
+	mov.l	.L_initial_exp,r6
+	not	r2,r2
+	
+	and	r2,r4
+	mov	#21,r3
+
+	sub	r7,r6
+	mov	r4,r1
+
+	mov	#20,r7
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r3,r1
+#else
+        SHLL21 (r1)
+#endif
+	mov	#-11,r2
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r7,r6	! Exponent in proper place
+#else
+        SHLL20 (r6)
+#endif
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r2,r4
+#else
+        SHLR11 (r4)
+#endif
+	or	r6,r0
+
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+	rts
+#ifdef __LITTLE_ENDIAN__
+	or	r4,r1
+#else
+	or	r4,r0
+#endif
+	
+.L_ret_zero:
+	rts
+	mov	#0,r0
+
+.L_ret_min:
+	mov.l	.L_min,r0
+	
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+	rts
+	nop
+
+	.align 2
+
+.L_initial_exp:
+	.long 0x0000041E
+
+.L_sign:
+	.long 0x80000000
+
+.L_min:
+	.long 0xC1E00000
+
+ENDFUNC (GLOBAL (floatsidf))
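
The normalize-and-pack sequence above, including the 0x41E initial exponent
and the -2147483648 special case, corresponds to the following C sketch
(names are illustrative; every int32_t is exactly representable in double,
so no rounding is needed):

#include <stdint.h>
#include <string.h>

double floatsidf_sketch (int32_t i)
{
  if (i == 0)
    return 0.0;
  uint64_t sign = (uint64_t) (i < 0) << 63;
  uint32_t mag = i < 0 ? -(uint32_t) i : (uint32_t) i;
  int shifts = 0;
  while (!(mag & 0x80000000u))          /* normalize: leading 1 to bit 31 */
    {
      mag <<= 1;
      shifts++;
    }
  uint64_t exp = (uint64_t) (0x41e - shifts) << 52;  /* 0x41E = bias + 31 */
  /* Drop the implicit 1; the remaining 31 bits fill the top of the
     52-bit fraction.  -2147483648 falls out naturally: shifts == 0,
     frac == 0, giving the 0xC1E00000 pattern from the literal pool.  */
  uint64_t frac = (uint64_t) (mag & 0x7fffffffu) << 21;
  uint64_t ubits = sign | exp | frac;
  double d;
  memcpy (&d, &ubits, sizeof d);
  return d;
}
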
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/floatsisf.S gcc-4.5.0/gcc/config/sh/IEEE-754/floatsisf.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/floatsisf.S	1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/floatsisf.S	2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,200 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Conversion of integer to floating point
+
+! Author: Rakesh Kumar
+
+! Argument: r4
+! Result: r0
+
+! r4 is referred as op1
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+        .text
+        .align 5
+        .global GLOBAL (floatsisf)
+        FUNC (GLOBAL (floatsisf))
+
+GLOBAL (floatsisf):
+	mov.l	.L_sign,r2
+	mov	#23,r6
+
+	! Check for zero
+	tst	r4,r4
+	mov.l	.L_24_bits,r7
+
+	! Extract sign
+	and	r4,r2
+	bt	.L_ret
+
+	! Negative ???
+	mov.l	.L_imp_bit,r5
+	cmp/pl	r4
+
+	not	r7,r3
+	bf	.L_neg
+
+	! Decide the direction for shifting
+	cmp/gt	r7,r4
+	mov	r4,r0
+
+	and	r5,r0
+	bt	.L_shr_0
+
+	! Number may already be in normalized form
+	cmp/eq	#0,r0
+	bf	.L_pack
+
+! Shift the bits to the left. Adjust the exponent
+.L_shl:
+	shll	r4
+	mov	r4,r0
+
+	and	r5,r0
+	cmp/eq	#0,r0
+
+	SL(bt,	.L_shl,
+	 add	#-1,r6)
+
+! Pack the value in floating point format.
+! r6 has unbiased exponent, r4 has mantissa, r2 has sign
+.L_pack:
+	mov	#23,r3
+	not	r5,r5
+
+	mov	r2,r0
+	add	#127,r6
+
+	and	r5,r4
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+	SHLL23 (r6)
+#else
+	shld	r3,r6
+#endif
+
+	or	r6,r0
+	rts
+	or	r4,r0
+
+! Negate the number
+.L_neg:
+	! Take care of -2147483648.
+	mov	r4,r0
+	shll	r0
+	
+	cmp/eq	#0,r0
+	SL(bt,	.L_ret_min,
+	 neg	r4,r4)
+
+        cmp/gt  r7,r4
+        bt	.L_shr_0
+
+	mov	r4,r0
+	and	r5,r0
+
+	cmp/eq	#0,r0
+	bf	.L_pack
+	bt	.L_shl
+	
+.L_shr_0:
+	mov	#0,r1
+
+! Shift right the number with rounding
+.L_shr:
+	shlr	r4
+	movt	r7
+
+	tst	r7,r7
+
+	! Count number of ON bits shifted
+	bt	.L_shr_1
+	add	#1,r1
+
+.L_shr_1:
+	mov	r4,r0
+	add	#1,r6
+
+	and	r3,r0
+	cmp/eq	#0,r0
+
+	! Add MSB of shifted bits
+	bf	.L_shr
+	add	r7,r4
+
+	tst	r7,r7
+	bt	.L_pack
+
+.L_pack1:
+	mov	#1,r0
+	cmp/eq	r1,r0
+
+	bt	.L_rnd
+	mov	r4,r0
+
+	! Rounding may have misplaced MSB. Adjust.
+	and	r3,r0
+	cmp/eq	#0,r0
+
+	bf	.L_shr
+	bt	.L_pack
+
+! If only MSB of shifted bits is ON, we are halfway
+! between two numbers. Round towards even LSB of
+! resultant mantissa.
+.L_rnd:
+	shlr	r4
+	bra	.L_pack
+	shll	r4
+
+.L_ret:
+	rts
+	mov	r4,r0
+
+! Return value for -2147483648
+.L_ret_min:
+	mov.l	.L_min_val,r0
+	rts
+	nop
+
+	.align 2
+.L_sign:
+	.long 0x80000000
+
+.L_imp_bit:
+	.long 0x00800000
+
+.L_24_bits:
+	.long 0x00FFFFFF
+
+.L_nsign:
+	.long 0x7FFFFFFF
+
+.L_min_val:
+	.long 0xCF000000
+
+ENDFUNC (GLOBAL (floatsisf))
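
Unlike the widening int-to-double case, int-to-float can lose up to 8 low
bits, which is what the shift/round loops above handle.  A hedged C model
of that round-to-nearest-even logic, using __builtin_clz in place of the
shift loops (names are illustrative):

#include <stdint.h>
#include <string.h>

float floatsisf_sketch (int32_t i)
{
  if (i == 0)
    return 0.0f;
  uint32_t sign = i < 0 ? 0x80000000u : 0u;
  uint32_t mag = i < 0 ? -(uint32_t) i : (uint32_t) i;
  int ms = 31 - __builtin_clz (mag);    /* position of the leading 1 */
  int exp = 127 + ms;
  uint32_t frac;
  if (ms <= 23)
    frac = mag << (23 - ms);            /* exact: no bits lost */
  else
    {
      int sh = ms - 23;                 /* 1..8 bits fall off the end */
      uint32_t rbits = mag & ((1u << sh) - 1);
      uint32_t half = 1u << (sh - 1);
      frac = mag >> sh;
      if (rbits > half || (rbits == half && (frac & 1)))
        frac++;                         /* round to nearest, ties to even */
      if (frac == (1u << 24))           /* carry out of the 24-bit fraction */
        {
          frac >>= 1;
          exp++;
        }
    }
  uint32_t bits = sign | ((uint32_t) exp << 23) | (frac & 0x007fffffu);
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}
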
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/floatunssidf.S gcc-4.5.0/gcc/config/sh/IEEE-754/floatunssidf.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/floatunssidf.S	1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/floatunssidf.S	2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,76 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!conversion of unsigned integer to double precision floating point number
+!Author:Rakesh Kumar
+!Rewritten for SH1 support: Joern Rennecke
+!
+!Entry:
+!r4:operand
+!
+!Exit:
+!r0,r1:result
+!
+!Note:argument is passed in reg r4 and the result is returned in
+!regs r0 and r1.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+        .text
+        .align 5
+        .global GLOBAL (floatunsidf)
+	FUNC (GLOBAL (floatunsidf))
+
+GLOBAL (floatunsidf):
+	mov.w	LOCAL(x41f0),DBLRH	! bias + 32
+	tst	r4,r4			! check for zero
+	bt	.L_ret_zero
+.L_loop:
+	shll	r4	
+	SL(bf,	.L_loop,
+	 add	#-16,DBLRH)
+
+	mov	r4,DBLRL
+
+        SHLL20 (DBLRL)
+
+        shll16	DBLRH ! put exponent in proper place
+
+        SHLR12 (r4)
+
+	rts
+	or	r4,DBLRH
+	
+.L_ret_zero:
+	mov	#0,r1
+	rts
+	mov	#0,r0
+
+LOCAL(x41f0):	.word	0x41f0
+	.align 2
+
+ENDFUNC (GLOBAL (floatunsidf))
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/floatunssisf.S gcc-4.5.0/gcc/config/sh/IEEE-754/floatunssisf.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/floatunssisf.S	1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/floatunssisf.S	2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,137 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Conversion of unsigned integer to floating point
+
+! Author: Rakesh Kumar
+
+! Argument: r4
+! Result: r0
+
+! r4 is referred as op1
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+        .text
+        .align 5
+        .global GLOBAL (floatunsisf)
+	FUNC (GLOBAL (floatunsisf))
+
+GLOBAL (floatunsisf):
+	tst	r4,r4
+	mov	#23,r6
+
+	mov.l	.L_set_24_bits,r7
+	SL(bt,	.L_return,
+	 not	r7,r3)
+
+	! Decide the direction for shifting
+	mov.l	.L_set_24_bit,r5
+	cmp/hi	r7,r4
+
+	not	r5,r2
+	SL(bt,	.L_shift_right,
+	 mov	#0,r7)
+
+	tst	r5,r4
+	
+	mov	#0,r0
+	bf	.L_pack_sf
+
+! Shift the bits to the left. Adjust the exponent
+.L_shift_left:
+	shll	r4
+	tst	r5,r4
+
+	add	#-1,r6
+	bt	.L_shift_left
+
+! Pack the value in floating point format.
+! r6 has unbiased exponent, r4 has mantissa
+.L_pack_sf:
+	mov	#23,r3
+	add	#127,r6
+
+	! Align the exponent
+	and	r2,r4
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+        SHLL23 (r6)
+#else
+	shld	r3,r6
+#endif
+
+	or	r6,r0
+	rts
+	or	r4,r0
+
+! Shift right the number with rounding
+.L_shift_right:
+	shlr	r4
+	rotcr	r7
+
+	tst	r4,r3
+	add	#1,r6
+
+	bf	.L_shift_right
+	
+	tst	r7,r7
+	bt	.L_sh_rt_1
+
+	shll	r7
+	movt	r1
+
+	add	r1,r4
+
+	tst	r7,r7
+	bf	.L_sh_rt_1
+
+	! Halfway between two numbers.
+	! Round towards LSB = 0
+	shlr	r4
+	shll	r4
+
+.L_sh_rt_1:
+	mov	r4,r0
+
+	! Rounding may have misplaced MSB. Adjust.
+	and	r3,r0
+	cmp/eq	#0,r0
+
+	bf	.L_shift_right
+	bt	.L_pack_sf
+
+.L_return:
+	rts
+	mov	r4,r0
+
+	.align 2
+.L_set_24_bit:
+	.long 0x00800000
+
+.L_set_24_bits:
+	.long 0x00FFFFFF
+
+ENDFUNC (GLOBAL (floatunsisf))
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/m3/adddf3.S gcc-4.5.0/gcc/config/sh/IEEE-754/m3/adddf3.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/m3/adddf3.S	1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/m3/adddf3.S	2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,587 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+! adddf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4-200 without FPU, but can also be used for SH3.
+! Numbers with the same sign are typically added in 37 cycles; the worst
+! case is 43 cycles, unless there is an overflow, in which case the
+! addition can take up to 47 cycles.
+! Normal numbers with different signs are added in 56 (57 for PIC) cycles
+! or less on SH4.
+! If one of the inputs is a denormal, the worst case is 59 (60 for PIC)
+! cycles. (Two denormal inputs are faster than normal inputs, and
+! denormal outputs don't slow down computation).
+! Subtraction takes two cycles to negate the second input and then drops
+! through to addition.
+
+/* If the input exponents of a difference of two normalized numbers
+   differ by more than one, the output does not need to be adjusted
+   by more than one bit position.  Hence, it makes sense to ensure that
+   the shifts by 0 & 1 are handled quickly to reduce average and worst
+   case times.  */
+FUNC(GLOBAL(adddf3))
+FUNC(GLOBAL(subdf3))
+	.global	GLOBAL(adddf3)
+	.global	GLOBAL(subdf3)
+LOCAL(denorm_arg1):
+	bt	LOCAL(inf_nan_arg0)
+	tst	r0,r2
+	bt/s	LOCAL(denorm_both)
+	shlr	r1
+	mov.l	LOCAL(x00100000),r3
+	bra	LOCAL(denorm_arg1_done)
+	 sub	r2,r3
+
+! Handle denorm addition here because otherwise the ordinary addition would
+! have to check for denormal results.
+! Denormal subtraction could also be done faster, but the denorm subtraction
+! path here is still one cycle faster than the one for normalized input
+! numbers, and 16 instructions shorter than the fastest version.
+! Here we also generate +0.0 + +0.0 -> +0.0 ; -0.0 + -0.0 -> -0.0
+LOCAL(denorm_both):
+	div0s	DBL0H,DBL1H
+	mov.l	LOCAL(x800fffff),r9
+	bt/s	LOCAL(denorm_sub)
+	and	r1,DBL1H
+	and	r9,DBL0H
+	mov.l	@r15+,r9
+	mov	DBL0L,DBLRL
+	mov	DBL0H,DBLRH
+	addc	DBL1L,DBLRL
+	mov.l	@r15+,r8
+	rts
+	 addc	DBL1H,DBLRH
+
+! N.B., since subtraction also generates +0.0 for subtraction of numbers
+! with identical fractions, this also covers the +0.0 + -0.0 -> +0.0 /
+! -0.0 + +0.0 -> +0.0 cases.
+LOCAL(denorm_sub):
+	mov	DBL0H,r8	! tentative result sign
+	and	r1,DBL0H
+	bra	LOCAL(sub_same_exp)
+	 addc	r1,r2	! exponent++, clear T
+
+LOCAL(inf_nan_arg0):
+	mov	DBL0L,DBLRL
+	bra	LOCAL(pop_r8_r9)
+	 mov	DBL0H,DBLRH
+
+LOCAL(ret_arg0):
+	mov.l	LOCAL(x800fffff),DBLRH
+	mov	DBL0L,DBLRL
+	mov	r2,r3
+LOCAL(ret_arg):
+	mov.l	@r15+,r9
+	and	r8,DBLRH
+	mov.l	@r15+,r8
+	rts
+	or	r3,DBLRH
+
+	.balign	4
+GLOBAL(subdf3):
+	cmp/pz	DBL1H
+	add	DBL1H,DBL1H
+	rotcr	DBL1H
+	nop
+
+GLOBAL(adddf3):
+	mov.l	LOCAL(x7ff00000),r0
+	mov	DBL0H,r2
+	mov.l	LOCAL(x001fffff),r1
+	mov	DBL1H,r3
+	mov.l	r8,@-r15
+	and	r0,r2
+	mov.l	r9,@-r15
+	and	r0,r3
+	cmp/hi	r2,r3
+	or	r0,DBL0H
+	or	r0,DBL1H
+	bt	LOCAL(arg1_gt)
+	tst	r0,r3
+	mov	#-20,r9
+	bt/s	LOCAL(denorm_arg1)
+	cmp/hs	r0,r2
+	bt	LOCAL(inf_nan_arg0)
+	sub	r2,r3
+LOCAL(denorm_arg1_done):	! r2 is tentative result exponent
+	shad	r9,r3
+	mov.w	LOCAL(m32),r9
+	mov	DBL0H,r8	! tentative result sign
+	and	r1,DBL0H	! arg0 fraction
+	mov	DBL1H,r0	! the 'other' sign
+	and	r1,DBL1H	! arg1 fraction
+	cmp/ge	r9,r3
+	mov	DBL1H,r1
+	bf/s	LOCAL(large_shift_arg1)
+	 shld	r3,DBL1H
+LOCAL(small_shift_arg1):
+	mov	DBL1L,r9
+	shld	r3,DBL1L
+	tst	r3,r3
+	add	#32,r3
+	bt/s	LOCAL(same_exp)
+	 div0s	r8,r0	! compare signs
+	shld	r3,r1
+
+	or	r1,DBL1L
+	bf/s	LOCAL(add)
+	shld	r3,r9
+	clrt
+	negc	r9,r9
+	mov.l	LOCAL(x001f0000),r3
+LOCAL(sub_high):
+	mov	DBL0L,DBLRL
+	subc	DBL1L,DBLRL
+	mov	DBL0H,DBLRH
+	bra	LOCAL(subtract_done)
+	 subc	DBL1H,DBLRH
+
+LOCAL(large_shift_arg1):
+	mov.w	LOCAL(d0),r9
+	add	#64,r3
+	cmp/pl	r3
+	shld	r3,r1
+	bf	LOCAL(ret_arg0)
+	cmp/hi	r9,DBL1L
+	mov	DBL1H,DBL1L
+	mov	r9,DBL1H
+	addc	r1,r9
+
+	div0s	r8,r0	! compare signs
+
+	bf	LOCAL(add)
+	clrt
+	mov.l	LOCAL(x001f0000),r3
+	bra	LOCAL(sub_high)
+	 negc	r9,r9
+
+LOCAL(add_clr_r9):
+	mov	#0,r9
+LOCAL(add):
+	mov.l	LOCAL(x00200000),r3
+	addc	DBL1L,DBL0L
+	addc	DBL1H,DBL0H
+	mov.l	LOCAL(x80000000),r1
+	tst	r3,DBL0H
+	mov.l	LOCAL(x7fffffff),r3
+	mov	DBL0L,r0
+	bt/s	LOCAL(no_carry)
+	and	r1,r8
+	tst	r9,r9
+	bf	LOCAL(add_one)
+	tst	#2,r0
+LOCAL(add_one):
+	subc	r9,r9
+	sett
+	mov	r0,DBLRL
+	addc	r9,DBLRL
+	mov	DBL0H,DBLRH
+	addc	r9,DBLRH
+	shlr	DBLRH
+	mov.l	LOCAL(x7ff00000),r3
+	add	r2,DBLRH
+	mov.l	@r15+,r9
+	rotcr	DBLRL
+	cmp/hi	r3,DBLRH
+LOCAL(add_done):
+	bt	LOCAL(inf)
+LOCAL(or_sign):
+	or	r8,DBLRH
+	rts
+	 mov.l	@r15+,r8
+
+LOCAL(inf):
+	bra	LOCAL(or_sign)
+	 mov	r3,DBLRH
+
+LOCAL(pos_difference_0):
+	tst	r3,DBL0H
+	mov	DBL0L,DBLRL
+	mov.l	LOCAL(x80000000),DBL0L
+	mov	DBL0H,DBLRH
+	mov.l	LOCAL(x00100000),DBL0H
+	bt/s	LOCAL(long_norm)
+	and	DBL0L,r8
+	bra	LOCAL(norm_loop)
+	 not	DBL0L,r3
+
+LOCAL(same_exp):
+	bf	LOCAL(add_clr_r9)
+	clrt
+LOCAL(sub_same_exp):
+	subc	DBL1L,DBL0L
+	mov.l	LOCAL(x001f0000),r3
+	subc	DBL1H,DBL0H
+	mov.w	LOCAL(d0),r9
+	bf	LOCAL(pos_difference_0)
+	clrt
+	negc	DBL0L,DBLRL
+	mov.l	LOCAL(x80000000),DBL0L
+	negc	DBL0H,DBLRH
+	mov.l	LOCAL(x00100000),DBL0H
+	tst	r3,DBLRH
+	not	r8,r8
+	bt/s	LOCAL(long_norm)
+	and	DBL0L,r8
+	bra	LOCAL(norm_loop)
+	 not	DBL0L,r3
+
+LOCAL(large_shift_arg0):
+	add	#64,r2
+
+	mov	#0,r9
+	cmp/pl	r2
+	shld	r2,r1
+	bf	LOCAL(ret_arg1_exp_r3)
+	cmp/hi	r9,DBL0L
+	mov	DBL0H,DBL0L
+	mov	r9,DBL0H
+	addc	r1,r9
+	div0s	r8,r0	! compare signs
+	mov	r3,r2	! tentative result exponent
+	bf	LOCAL(add)
+	clrt
+	negc	r9,r9
+	bra	LOCAL(subtract_arg0_arg1_done)
+	 mov	DBL1L,DBLRL
+
+LOCAL(arg1_gt):
+	tst	r0,r2
+	mov	#-20,r9
+	bt/s	LOCAL(denorm_arg0)
+	cmp/hs	r0,r3
+	bt	LOCAL(inf_nan_arg1)
+	sub	r3,r2
+LOCAL(denorm_arg0_done):
+	shad	r9,r2
+	mov.w	LOCAL(m32),r9
+	mov	DBL1H,r8	! tentative result sign
+	and	r1,DBL1H
+	mov	DBL0H,r0	! the 'other' sign
+	and	r1,DBL0H
+	cmp/ge	r9,r2
+	mov	DBL0H,r1
+	shld	r2,DBL0H
+	bf	LOCAL(large_shift_arg0)
+	mov	DBL0L,r9
+	shld	r2,DBL0L
+	add	#32,r2
+	mov.l	r3,@-r15
+	shld	r2,r1
+	mov	r2,r3
+	div0s	r8,r0		! compare signs
+	mov.l	@r15+,r2	! tentative result exponent
+	shld	r3,r9
+	bf/s	LOCAL(add)
+	or	r1,DBL0L
+	clrt
+	negc	r9,r9
+	mov	DBL1L,DBLRL
+LOCAL(subtract_arg0_arg1_done):
+	subc	DBL0L,DBLRL
+	mov	DBL1H,DBLRH
+	mov.l	LOCAL(x001f0000),r3
+	subc	DBL0H,DBLRH
+/* Since the exponents were different, the difference is positive.  */
+/* Fall through */
+LOCAL(subtract_done):
+/* First check if a shift by a few bits is sufficient.  This not only
+   speeds up this case, but also alleviates the need for considering
+   lower bits from r9 or rounding in the other code.
+   Moreover, by handling the upper 1+4 bits of the fraction here, long_norm
+   can assume that DBLRH fits into 16 (21 - 5) bits.  */
+	tst	r3,DBLRH
+	mov.l	LOCAL(x80000000),r3
+	mov.l	LOCAL(x00100000),DBL0H
+	bt/s	LOCAL(long_norm)
+	and	r3,r8
+	mov.l	LOCAL(x7fffffff),r3
+LOCAL(norm_loop):	! Well, this used to be a loop...
+	tst	DBL0H,DBLRH
+	sub	DBL0H,r2
+	bf	LOCAL(norm_round)
+	shll	r9
+	rotcl	DBLRL
+
+	rotcl	DBLRH
+
+	tst	DBL0H,DBLRH
+	sub	DBL0H,r2
+	bf	LOCAL(norm_round)
+	shll	DBLRL
+	rotcl	DBLRH
+	mov.l	@r15+,r9
+	cmp/gt	r2,DBL0H
+	sub	DBL0H,r2
+LOCAL(norm_loop_1):
+	bt	LOCAL(denorm0_n)
+	tst	DBL0H,DBLRH
+	bf	LOCAL(norm_pack)
+	shll	DBLRL
+	rotcl	DBLRH	! clears T
+	bra	LOCAL(norm_loop_1)
+	 subc	DBL0H,r2
+
+LOCAL(no_carry):
+	shlr	r0
+	mov.l	LOCAL(x000fffff),DBLRH
+	addc	r3,r9
+	mov.w	LOCAL(d0),DBL1H
+	mov	DBL0L,DBLRL
+	and	DBL0H,DBLRH	! mask out implicit 1
+	mov.l	LOCAL(x7ff00000),r3
+	addc	DBL1H,DBLRL
+	addc	r2,DBLRH
+	mov.l	@r15+,r9
+	add	DBL1H,DBLRH	! fraction overflow -> exp increase
+	bra	LOCAL(add_done)
+	 cmp/hi	r3,DBLRH
+
+LOCAL(denorm_arg0):
+	bt	LOCAL(inf_nan_arg1)
+	mov.l	LOCAL(x00100000),r2
+	shlr	r1
+	bra	LOCAL(denorm_arg0_done)
+	 sub	r3,r2
+
+LOCAL(inf_nan_arg1):
+	mov	DBL1L,DBLRL
+	bra	LOCAL(pop_r8_r9)
+	 mov	DBL1H,DBLRH
+
+LOCAL(ret_arg1_exp_r3):
+	mov.l	LOCAL(x800fffff),DBLRH
+	bra	LOCAL(ret_arg)
+	 mov	DBL1L,DBLRL
+
+#ifdef __pic__
+	.balign 8
+#endif
+LOCAL(m32):
+	.word	-32
+LOCAL(d0):
+	.word	0
+#ifndef __pic__
+	.balign 8
+#endif
+! Because we had several bits of cancellations, we know that r9 contains
+! only one bit.
+! We'll normalize by shifting words so that DBLRH:DBLRL contains
+! the fraction with 0 < DBLRH <= 0x1fffff, then we shift DBLRH:DBLRL
+! up by 21 minus the number of non-zero bits in DBLRH.
+LOCAL(long_norm):
+	tst	DBLRH,DBLRH
+	mov.w	LOCAL(xff),DBL0L
+	mov	#21,r3
+	bf	LOCAL(long_norm_highset)
+	mov.l	LOCAL(x02100000),DBL1L	! shift 32, implicit 1
+	tst	DBLRL,DBLRL
+	extu.w	DBLRL,DBL0H
+	bt	LOCAL(zero_or_ulp)
+	mov	DBLRL,DBLRH
+	cmp/hi	DBL0H,DBLRL
+	bf	0f
+	mov.l	LOCAL(x01100000),DBL1L	! shift 16, implicit 1
+	clrt
+	shlr16  DBLRH
+	xtrct	DBLRL,r9
+	mov     DBLRH,DBL0H
+LOCAL(long_norm_ulp_done):
+0:	mov	r9,DBLRL	! DBLRH:DBLRL == fraction; DBL0H == DBLRH
+	subc	DBL1L,r2
+	bt	LOCAL(denorm1_b)
+#ifdef __pic__
+	mov.l	LOCAL(c__clz_tab),DBL1H
+LOCAL(long_norm_lookup):
+	mov	r0,r9
+	mova	LOCAL(c__clz_tab),r0
+	add	DBL1H,r0
+#else
+	mov	r0,r9
+LOCAL(long_norm_lookup):
+	mov.l	LOCAL(c__clz_tab),r0
+#endif /* __pic__ */
+	cmp/hi	DBL0L,DBL0H
+	bf	0f
+	shlr8	DBL0H
+0:	mov.b	@(r0,DBL0H),r0
+	bf	0f
+	add	#-8,r3
+0:	mov.w	LOCAL(d20),DBL0L
+	mov	#-20,DBL0H
+	clrt
+	sub	r0,r3
+	mov	r9,r0
+	mov	r3,DBL1H
+	shld	DBL0L,DBL1H
+	subc	DBL1H,r2
+	!
+	bf	LOCAL(no_denorm)
+	shad	DBL0H,r2
+	bra	LOCAL(denorm1_done)
+	add	r2,r3
+	
+LOCAL(norm_round):
+	cmp/pz	r2
+	mov	#0,DBL1H
+	bf	LOCAL(denorm0_1)
+	or	r8,r2
+	mov	DBLRL,DBL1L
+	shlr	DBL1L
+	addc	r3,r9
+	mov.l	@r15+,r9
+	addc	DBL1H,DBLRL	! round to even
+	mov.l	@r15+,r8
+	rts
+	 addc	r2,DBLRH
+
+LOCAL(norm_pack):
+	add	r8,DBLRH
+	mov.l	@r15+,r8
+	rts
+	add	r2,DBLRH
+
+LOCAL(denorm0_1):
+	mov.l	@r15+,r9
+	mov	r8,DBL0L
+	mov.l	@r15+,r8
+LOCAL(denorm0_shift):
+	shlr	DBLRH
+	rotcr	DBLRL
+
+	rts
+	add	DBL0L,DBLRH
+
+LOCAL(denorm0_n):
+	mov	r8,DBL0L
+	addc	DBL0H,r2
+	mov.l	@r15+,r8
+	bf	LOCAL(denorm0_shift)
+	rts
+	add	DBL0L,DBLRH
+
+LOCAL(no_denorm):
+	add	r2,r8		! add (exponent - 1) to sign
+
+LOCAL(denorm1_done):
+	shld	r3,DBLRH
+	mov	DBLRL,DBL0L
+	shld	r3,DBLRL
+
+	add	r8,DBLRH	! add in sign and (exponent - 1)
+	mov.l	@r15+,r9
+	add	#-32,r3
+	mov.l	@r15+,r8
+	shld	r3,DBL0L
+
+	rts
+	add	DBL0L,DBLRH
+
+LOCAL(long_norm_highset):
+	mov.l	LOCAL(x00200000),DBL1L	! shift 1, implicit 1
+	shll	r9
+	rotcl	DBLRL
+	mov	DBLRH,DBL0H
+	rotcl	DBLRH	! clears T
+#ifdef __pic__
+	mov.l	LOCAL(c__clz_tab),DBL1H
+#else
+	mov	r0,r9
+#endif /* __pic__ */
+	subc	DBL1L,r2
+	add	#-1,r3
+	bf	LOCAL(long_norm_lookup)
+LOCAL(denorm1_a):
+	shlr	DBLRH
+	rotcr	DBLRL
+	mov.l	@r15+,r9
+	or	r8,DBLRH
+
+	rts
+	mov.l	@r15+,r8
+
+	.balign	4
+LOCAL(denorm1_b):
+	mov	#-20,DBL0L
+	shad	DBL0L,r2
+	mov	DBLRH,DBL0L
+	shld	r2,DBLRH
+	shld	r2,DBLRL
+	or	r8,DBLRH
+	mov.l	@r15+,r9
+	add	#32,r2
+	mov.l	@r15+,r8
+	shld	r2,DBL0L
+	rts
+	or	DBL0L,DBLRL
+
+LOCAL(zero_or_ulp):
+	tst	r9,r9
+	bf	LOCAL(long_norm_ulp_done)
+	! return +0.0
+LOCAL(pop_r8_r9):
+	mov.l	@r15+,r9
+	rts
+	mov.l	@r15+,r8
+
+LOCAL(d20):
+	.word	20
+LOCAL(xff):
+	.word 0xff
+	.balign	4
+LOCAL(x7ff00000):
+	.long	0x7ff00000
+LOCAL(x001fffff):
+	.long	0x001fffff
+LOCAL(x80000000):
+	.long	0x80000000
+LOCAL(x000fffff):
+	.long	0x000fffff
+LOCAL(x800fffff):
+	.long	0x800fffff
+LOCAL(x001f0000):
+	.long	0x001f0000
+LOCAL(x00200000):
+	.long	0x00200000
+LOCAL(x7fffffff):
+	.long	0x7fffffff
+LOCAL(x00100000):
+	.long	0x00100000
+LOCAL(x02100000):
+	.long	0x02100000
+LOCAL(x01100000):
+	.long	0x01100000
+LOCAL(c__clz_tab):
+#ifdef __pic__
+	.long	GLOBAL(clz_tab) - .
+#else
+	.long	GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(adddf3))
+ENDFUNC(GLOBAL(subdf3))
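
As the subdf3 entry above shows (cmp/pz; add; rotcr), subtraction merely
toggles the sign bit of the second operand and falls through to adddf3.
Expressed as a C sketch:

#include <stdint.h>
#include <string.h>

double subdf3_sketch (double a, double b)
{
  uint64_t bb;
  memcpy (&bb, &b, sizeof bb);
  bb ^= 1ULL << 63;            /* negate b by toggling its sign bit */
  memcpy (&b, &bb, sizeof b);
  return a + b;                /* stands in for the fall-through to adddf3 */
}
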
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/m3/addsf3.S gcc-4.5.0/gcc/config/sh/IEEE-754/m3/addsf3.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/m3/addsf3.S	1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/m3/addsf3.S	2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,290 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+! addsf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+
+#ifdef L_add_sub_sf3
+	.balign 4
+	.global GLOBAL(subsf3)
+	FUNC(GLOBAL(subsf3))
+	.global GLOBAL(addsf3)
+	FUNC(GLOBAL(addsf3))
+GLOBAL(subsf3):
+	cmp/pz	r5
+	add	r5,r5
+	rotcr	r5
+	.balign 4
+GLOBAL(addsf3):
+	mov.l	LOCAL(x7f800000),r3
+	mov	r4,r6
+	add	r6,r6
+	mov	r5,r7
+	add	r7,r7
+	mov	r4,r0
+	or	r3,r0
+	cmp/hi	r6,r7
+	mov	r5,r1
+	bf/s	LOCAL(r4_hs)
+	 or	r3,r1
+	cmp/eq	r5,r1
+	bt	LOCAL(ret_r5) /* sole Inf or NaN, return unchanged.  */
+	shll8	r0	! r4 fraction
+	shll8	r1	! r5 fraction
+	mov	r6,r3
+	mov	#-24,r2
+	mov	r7,r6
+	shld	r2,r6	! r5 exp
+	mov	r0,r7
+	shld	r2,r3	! r4 exp
+	tst	r6,r6
+	sub	r6,r3	! exp difference (negative or 0)
+	bt	LOCAL(denorm_r4)
+LOCAL(denorm_r4_done): ! r1: u1.31
+	shld	r3,r0	! Get 31 upper bits, including 8 guard bits
+	mov.l	LOCAL(xff000000),r2
+	add	#31,r3
+	mov.l	r5,@-r15 ! push result sign.
+	cmp/pl	r3	! r0 has no more than one bit set -> return arg 1
+	shld	r3,r7	! copy of lowest guard bit in r0 and lower guard bits
+	bf	LOCAL(ret_stack)
+	div0s	r4,r5
+	bf/s	LOCAL(add)
+	 cmp/pl	r7	/* Is LSB in r0 clear, but any lower guard bit set?  */
+	subc	r0,r1
+	mov.l	LOCAL(c__clz_tab),r7
+	tst	r2,r1
+	mov	#-24,r3
+	bf/s LOCAL(norm_r0)
+	 mov	r1,r0
+	extu.w	r1,r1
+	bra	LOCAL(norm_check2)
+	 cmp/eq	r0,r1
+LOCAL(ret_r5):
+	rts
+	mov	r5,r0
+LOCAL(ret_stack):
+	rts
+	mov.l	@r15+,r0
+
+/* We leave the numbers denormalized, but we change the bit position to be
+   consistent with normalized numbers.  This also removes the spurious
+   leading one that was inserted before.  */
+LOCAL(denorm_r4):
+	tst	r3,r3
+	bf/s	LOCAL(denorm_r4_done)
+	add	r0,r0
+	bra	LOCAL(denorm_r4_done)
+	add	r1,r1
+LOCAL(denorm_r5):
+	tst	r6,r6
+	add	r1,r1
+	bf	LOCAL(denorm_r5_done)
+	clrt
+	bra	LOCAL(denorm_r5_done)
+	add	r0,r0
+
+/* If the exponent differs by two or more, normalization is minimal, and
+   few guard bits are needed for an exact final result, so sticky guard
+   bit compression before subtraction (or add) works fine.
+   If the exponent differs by one, only one extra guard bit is generated,
+   and effectively no guard bit compression takes place.  */
+
+	.balign	4
+LOCAL(r4_hs):
+	cmp/eq	r4,r0
+	mov	#-24,r3
+	bt	LOCAL(inf_nan_arg0)
+	shld	r3,r7
+	shll8	r0
+	tst	r7,r7
+	shll8	r1
+	mov.l	LOCAL(xff000000),r2
+	bt/s	LOCAL(denorm_r5)
+	shld	r3,r6
+LOCAL(denorm_r5_done):
+	mov	r1,r3
+	subc	r6,r7
+	bf	LOCAL(same_exp)
+	shld	r7,r1	/* Get 31 upper bits.  */
+	add	#31,r7
+	mov.l	r4,@-r15 ! push result sign.
+	cmp/pl	r7
+	shld	r7,r3
+	bf	LOCAL(ret_stack)
+	div0s	r4,r5
+	bf/s	LOCAL(add)
+	 cmp/pl	r3	/* Is LSB in r1 clear, but any lower guard bit set?  */
+	subc	r1,r0
+	mov.l	LOCAL(c__clz_tab),r7
+LOCAL(norm_check):
+	tst	r2,r0
+	mov	#-24,r3
+	bf LOCAL(norm_r0)
+	extu.w	r0,r1
+	cmp/eq	r0,r1
+LOCAL(norm_check2):
+	mov	#-8,r3
+	bt LOCAL(norm_r0)
+	mov	#-16,r3
+LOCAL(norm_r0):
+	mov	r0,r1
+	shld	r3,r0
+#ifdef __pic__
+	add	r0,r7
+	mova  LOCAL(c__clz_tab),r0
+#endif
+	mov.b	@(r0,r7),r7
+	add	#25,r3
+	add	#-9+1,r6
+	mov	r1,r0
+	sub	r7,r3
+	mov.l	LOCAL(xbfffffff),r7
+	sub	r3,r6	/* generate exp-1  */
+	mov.w	LOCAL(d24),r2
+	cmp/pz	r6	/* check exp > 0  */
+	shld	r3,r0	/* Leading 1 becomes +1 exp adjustment.  */
+	bf	LOCAL(zero_denorm)
+LOCAL(denorm_done):
+	add	#30,r3
+	shld	r3,r1
+	mov.w   LOCAL(m1),r3
+	tst	r7,r1	! clear T if rounding up
+	shld	r2,r6
+	subc	r3,r0	! round - overflow will boost exp adjustment to 2.
+	mov.l	@r15+,r2
+	add	r6,r0	! overflow will generate inf
+	cmp/ge	r2,r3	! get sign into T
+	rts
+	rotcr	r0
+LOCAL(ret_r4):
+	rts
+	mov	r4,r0
+
+/* At worst, we are shifting the number back in place where an incoming
+   denormal was.  Thus, the shifts won't get out of range.  They still
+   might generate a zero fraction, but that's OK, that makes it 0.  */
+LOCAL(zero_denorm):
+	add	r6,r3
+	mov	r1,r0
+	mov	#0,r6	/* leading one will become free (except for rounding) */
+	bra	LOCAL(denorm_done)
+	shld	r3,r0
+
+/* Handle abs(r4) >= abs(r5), same exponents specially so we don't need
+   check for a zero fraction in the main path.  */
+LOCAL(same_exp):
+	div0s	r4,r5
+	mov.l	r4,@-r15
+	bf	LOCAL(add)
+	cmp/eq	r1,r0
+	mov.l	LOCAL(c__clz_tab),r7
+	bf/s	LOCAL(norm_check)
+	 sub	r1,r0
+	rts	! zero difference -> return +zero
+	mov.l	@r15+,r1
+
+/* r2: 0xff000000 */
+LOCAL(add):
+	addc	r1,r0
+	mov.w	LOCAL(x2ff),r7
+	shll8	r6
+	bf/s	LOCAL(no_carry)
+	shll16	r6
+	tst	r7,r0
+	shlr8	r0
+	mov.l	@r15+,r3	! discard saved sign
+	subc	r2,r0
+	sett
+	addc	r6,r0
+	cmp/hs	r2,r0
+	bt/s	LOCAL(inf)
+	 div0s	r7,r4 /* Copy sign.  */
+	rts
+	rotcr	r0
+LOCAL(inf):
+	mov	r6,r0
+	rts
+	rotcr	r0
+LOCAL(no_carry):
+	mov.w	LOCAL(m1),r3
+	tst	r6,r6
+	bt	LOCAL(denorm_add)
+	add	r0,r0
+	tst	r7,r0		! check if lower guard bit set or round to even
+	shlr8	r0
+	mov.l	@r15+,r1	! discard saved sign
+	subc	r3,r0	! round ; overflow -> exp++
+	cmp/ge	r4,r3	/* Copy sign.  */
+	add	r6,r0	! overflow -> inf
+	rts
+	rotcr	r0
+
+LOCAL(denorm_add):
+	cmp/ge	r4,r3	/* Copy sign.  */
+	shlr8	r0
+	mov.l	@r15+,r1	! discard saved sign
+	rts
+	rotcr	r0
+
+LOCAL(inf_nan_arg0):
+	cmp/eq	r5,r1
+	bf	LOCAL(ret_r4)
+	div0s	r4,r5		/* Both are inf or NaN, check signs.  */
+	bt	LOCAL(ret_nan)	/* inf - inf, or NaN.  */
+	mov	r4,r0		! same sign; return NaN if either is NaN.
+	rts
+	or	r5,r0
+LOCAL(ret_nan):
+	rts
+	mov	#-1,r0
+
+LOCAL(d24):
+	.word	24
+LOCAL(x2ff):
+	.word	0x2ff
+LOCAL(m1):
+	.word	-1
+	.balign	4
+LOCAL(x7f800000):
+	.long	0x7f800000
+LOCAL(xbfffffff):
+	.long	0xbfffffff
+LOCAL(xff000000):
+	.long	0xff000000
+LOCAL(xfe000000):
+	.long	0xfe000000
+LOCAL(c__clz_tab):
+#ifdef __pic__
+	.long	GLOBAL(clz_tab) - .
+#else
+	.long	GLOBAL(clz_tab)
+#endif
+
+	ENDFUNC(GLOBAL(addsf3))
+	ENDFUNC(GLOBAL(subsf3))
+#endif /* L_add_sub_sf3 */
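
The "sticky guard bit compression" discussed in the comments above amounts
to folding every shifted-out bit into the lowest surviving bit while
aligning the smaller fraction, so that a single sticky bit can later decide
round-to-nearest-even.  A hypothetical helper in C:

#include <stdint.h>

/* Shift frac right by diff places, ORing all lost bits into the new LSB
   (the sticky bit).  */
static uint32_t align_sticky (uint32_t frac, int diff)
{
  if (diff == 0)
    return frac;
  if (diff >= 32)
    return frac != 0;                   /* every bit becomes sticky */
  uint32_t lost = frac & ((1u << diff) - 1);
  return (frac >> diff) | (lost != 0);
}
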
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/m3/divdf3-rt.S gcc-4.5.0/gcc/config/sh/IEEE-754/m3/divdf3-rt.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/m3/divdf3-rt.S	1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/m3/divdf3-rt.S	2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,519 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+! divdf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke joern.rennecke@st.com
+
+/* This version is not quite finished, since I've found that I can
+   get better average performance with a slightly altered algorithm.
+   Still, if you want a version for hard real time, this version here might
+   be a good starting point, since it has effectively no conditional
+   branches in the path that deals with normal numbers
+   (branches with zero offset are effectively conditional execution),
+   and thus it has a uniform execution time in this path.  */
+
+/* y = 1/x  ; x in [1,2)
+   y0 = 1.5 - x/2 - tab[(1-x)*64] = y + d ; abs(d)/y <= 0x1.0c/256
+
+   y1 = y0 - ((y0) * x - 1) * y0  =  y-x*d^2
+   y2 = y1 - ((y1) * x - 1) * y1 =~= y-x^3*d^4
+
+   z0 = y2*a ;  a1 = a - z0*x /# 32 * 64 -> 64 bit #/
+   z1 = y2*a1 (round to nearest odd 0.5 ulp);
+   a2 = a1 - z1*x /# 32 * 64 -> 64 bit #/
+
+   z = a/x = z0 + z1 - 0.5 ulp + (a2 > 0) * ulp
+
+   Unless stated otherwise, multiplies can be done in 32 * 32 bit or less
+   with suitable scaling and/or top truncation.
+   x truncated to 20 bits is sufficient to calculate y0 or even y1.
+   Table entries are adjusted by about +128 to use full signed byte range.
+   This adjustment has been perturbed slightly to allow cse with the
+   shift count constant -26.
+   The threshold point for the shift adjust before rounding is found by
+   comparing the fractions, which is exact, unlike the top bit of y2.
+   Therefore, the top bit of y2 becomes slightly random after the adjustment
+   shift, but that's OK because this can happen only at the boundaries of
+   the interval, and the biasing of the error means that it can in fact happen
+   only at the bottom end.  And there, the carry propagation will make sure
+   that in the end we will have in effect an implicit 1 (or two when rounding
+   up...)  */
+/* If an exact result exists, it can have no more bits than the dividend.
+   Hence, we don't need to bother with the round-to-even tie breaker
+   unless the result is denormalized.  */
+/* 70 cycles through the main path for SH4-300.  Some cycles might be
+   saved by more careful register allocation.
+   122 cycles for SH4-200.  If execution time on SH4-200 is of concern,
+   a specially scheduled version makes sense.  */
+
+#define x_h r12
+#define yn  r3
+
+FUNC(GLOBAL(divdf3))
+ .global GLOBAL(divdf3)
+
+/* Adjust arg0 now, too.  We still have to come back to denorm_arg1_done,
+   since we heven't done any of the work yet that we do till the denorm_arg0
+   entry point.  We know that neither of the arguments is inf/nan, but
+   arg0 might be zero.  Check for that first to avoid having to establish an
+   rts return address.  */
+LOCAL(both_denorm):
+	mov.l	r9,@-r15
+	mov	DBL0H,r1
+	mov.l	r0,@-r15
+	shll2	r1
+	mov.w LOCAL(both_denorm_cleanup_off),r9
+	or	DBL0L,r1
+	tst	r1,r1
+	mov	DBL0H,r0
+	bf/s	LOCAL(zero_denorm_arg0_1)
+	shll2	r0
+	mov.l	@(4,r15),r9
+	add	#8,r15
+	bra	LOCAL(ret_inf_nan_0)
+	mov	r1,DBLRH
+
+LOCAL(both_denorm_cleanup):
+	mov.l	@r15+,r0
+	!
+	mov.l	@r15+,r9
+ mov	#12,r3
+ mov.l	LOCAL(xfffe2006),r1	! yn := (-1. << 17) + (0x80 << 6) ; shift #-26
+	bra	LOCAL(denorm_arg1_done)
+	!
+	add	r0,DBL0H
+
+/* Denorm handling leaves the incoming denorm argument with an exponent of +1
+   (implicit 1).  To leave the result exponent unaltered, the other
+   argument's exponent is adjusted by the shift count.  */
+
+	.balign 4
+LOCAL(arg0_tiny):
+	bsr	LOCAL(clz)
+	mov	DBL0L,r0
+	shll	DBL0H
+	add	#1,r0
+	mov	DBL0L,DBL0H
+	shld	r0,DBL0H
+	rotcr	DBL0H
+	tst	DBL0L,DBL0L	/* Check for divide of zero.  */
+	add	#-33,r0
+	shld	r0,DBL0L
+	bf/s	LOCAL(adjust_arg1_exp)
+	add	#64,r0
+LOCAL(return_0): /* Return 0 with appropriate sign.  */
+	mov.l	@r15+,r10
+	mov	#0,DBLRH
+	mov.l	@r15+,r9
+	bra	LOCAL(ret_inf_nan_0)
+	mov.l	@r15+,r8
+
+	.balign 4
+LOCAL(arg1_tiny):
+	bsr	LOCAL(clz)
+	mov	DBL1L,r0
+	shll	DBL1H
+	add	#1,r0
+	mov	DBL1L,DBL1H
+	shld	r0,DBL1H
+	rotcr	DBL1H
+	tst	DBL1L,DBL1L	/* Check for divide by zero.  */
+	add	#-33,r0
+	shld	r0,DBL1L
+	bf/s	LOCAL(adjust_arg0_exp)
+	add	#64,r0
+	mov	DBL0H,r0
+	add	r0,r0
+	tst	r0,r0	! 0 / 0 ?
+	mov	#-1,DBLRH
+	bf	LOCAL(return_inf)
+	!
+	bt	LOCAL(ret_inf_nan_0)
+	!
+
+	.balign 4
+LOCAL(zero_denorm_arg1):
+	not	DBL0H,r3
+	mov	DBL1H,r0
+	tst	r2,r3
+	shll2	r0
+	bt	LOCAL(early_inf_nan_arg0)
+	tst	r0,r0
+	mov.w	LOCAL(xff00),r12
+	bt/s	LOCAL(arg1_tiny)
+	sts.l	pr,@-r15
+	bsr	LOCAL(clz)
+	shlr2	r0
+	!
+	shll	DBL1H
+	mov	DBL1L,r3
+	shld	r0,DBL1H
+	shld	r0,DBL1L
+	rotcr	DBL1H
+	add	#-32,r0
+	shld	r0,r3
+	add	#32,r0
+	or	r3,DBL1H
+LOCAL(adjust_arg0_exp):
+	tst	r2,DBL0H
+	mov	#20,r3
+	shld	r3,r0
+	bt	LOCAL(both_denorm)
+	add	DBL0H,r0
+	div0s	r0,DBL0H	! Check for obvious overflow.
+	not	r0,r3		! Check for more subtle overflow - lest
+	bt	LOCAL(return_inf)
+	mov	r0,DBL0H
+	tst	r2,r3		! we mistake it for NaN later
+	mov	#12,r3
+	bf	LOCAL(denorm_arg1_done)
+LOCAL(return_inf): /* Return infinity with appropriate sign.  */
+	mov	#20,r3
+	mov	#-2,DBLRH
+	bra	LOCAL(ret_inf_nan_0)
+	shad	r3,DBLRH
+
+/* inf/n -> inf; inf/0 -> inf; inf/inf -> nan; inf/nan->nan  nan/x -> nan */
+LOCAL(inf_nan_arg0):
+	mov.l	@r15+,r10
+	mov.l	@r15+,r9
+	mov.l	@r15+,r8
+LOCAL(early_inf_nan_arg0):
+	not	DBL1H,r3
+	mov	DBL0H,DBLRH
+	tst	r2,r3	! both inf/nan?
+	add	DBLRH,DBLRH
+	bf	LOCAL(ret_inf_nan_0)
+	mov	#-1,DBLRH
+LOCAL(ret_inf_nan_0):
+	mov	#0,DBLRL
+	mov.l	@r15+,r12
+	div0s	DBL0H,DBL1H
+	rts
+	rotcr	DBLRH
+	
+/* Already handled: inf/x, nan/x .  Thus: x/inf -> 0; x/nan -> nan */
+	.balign	4
+LOCAL(inf_nan_arg1):
+	mov	DBL1H,r2
+	mov	#12,r1
+	shld	r1,r2
+	mov.l	@r15+,r10
+	mov	#0,DBLRL
+	mov.l	@r15+,r9
+	or	DBL1L,r2
+	mov.l	@r15+,r8
+	cmp/hi	DBLRL,r2
+	mov.l	@r15+,r12
+	subc	DBLRH,DBLRH
+	div0s	DBL0H,DBL1H
+	rts
+	rotcr	DBLRH
+	
+	.balign 4
+LOCAL(zero_denorm_arg0):
+	mov.w	LOCAL(denorm_arg0_done_off),r9
+	not	DBL1H,r1
+	mov	DBL0H,r0
+	tst	r2,r1
+	shll2	r0
+	bt	LOCAL(inf_nan_arg1)
+LOCAL(zero_denorm_arg0_1):
+	tst	r0,r0
+	mov.w	LOCAL(xff00),r12
+	bt/s	LOCAL(arg0_tiny)
+	sts.l	pr,@-r15
+	bsr	LOCAL(clz)
+	shlr2	r0
+	shll	DBL0H
+	mov	DBL0L,r12
+	shld	r0,DBL0H
+	shld	r0,DBL0L
+	rotcr	DBL0H
+	add	#-32,r0
+	shld	r0,r12
+	add	#32,r0
+	or	r12,DBL0H
+LOCAL(adjust_arg1_exp):
+	mov	#20,r12
+	shld	r12,r0
+	add	DBL1H,r0
+	div0s	r0,DBL1H	! Check for obvious underflow.
+	not	r0,r12		! Check for more subtle underflow - lest
+	bt	LOCAL(return_0)
+	mov	r0,DBL1H
+	tst	r2,r12		! we mistake it for NaN later
+	bt	LOCAL(return_0)
+	!
+	braf	r9
+	mov	#13,r0
+LOCAL(zero_denorm_arg1_dispatch):
+
+LOCAL(xff00):	.word 0xff00
+LOCAL(denorm_arg0_done_off):
+	.word LOCAL(denorm_arg0_done)-LOCAL(zero_denorm_arg1_dispatch)
+LOCAL(both_denorm_cleanup_off):
+	.word LOCAL(both_denorm_cleanup)-LOCAL(zero_denorm_arg1_dispatch)
+
+ .balign	8
+GLOBAL(divdf3):
+ mov.l	LOCAL(x7ff00000),r2
+ mov	#12,r3
+ mov.l	LOCAL(xfffe2006),r1	! yn := (-1. << 17) + (0x80 << 6) ; shift #-26
+ tst	r2,DBL1H
+ mov.l	r12,@-r15
+ bt	LOCAL(zero_denorm_arg1)
+
+LOCAL(denorm_arg1_done):
+ mov	DBL1H,x_h	! x_h live in r12
+ shld	r3,x_h	! x - 1 ; u0.20
+ mov	x_h,yn
+ mova	LOCAL(ytab),r0
+ mov.l	r8,@-r15
+ shld	r1,yn	! x-1 ; u26.6
+ mov.b	@(r0,yn),yn
+ mov	#6,r0
+ mov.l	r9,@-r15
+ mov	x_h,r8
+ mov.l	r10,@-r15
+ shlr16	x_h	! x - 1; u16.16	! x/2 - 0.5 ; u15.17
+ add	x_h,r1	! SH4-200 single-issues this insn
+ shld	r0,yn
+ sub	r1,yn	! yn := y0 ; u15.17
+ mov	DBL1L,r1
+ mov	#-20,r10
+ mul.l	yn,x_h	! r12 dead
+ swap.w	yn,r9
+ shld	r10,r1
+ sts	macl,r0	! y0 * (x-1) - n ; u-1.32
+ add	r9,r0	! y0 * x - 1     ; s-1.32
+ tst	r2,DBL0H
+ dmuls.l r0,yn
+ mov.w	LOCAL(d13),r0
+ or	r1,r8	! x  - 1; u0.32
+ add	yn,yn	! yn = y0 ; u14.18
+ bt	LOCAL(zero_denorm_arg0)
+
+LOCAL(denorm_arg0_done):	! This label must stay aligned.
+ sts	mach,r1	!      d0 ; s14.18
+ sub	r1,yn	! yn = y1 ; u14.18 ; <= 0x3fffc
+ mov	DBL0L,r12
+ shld	r0,yn	! yn = y1 ; u1.31 ; <= 0x7fff8000
+ mov.w	LOCAL(d12),r9
+ dmulu.l yn,r8
+ shld	r10,r12
+ mov	yn,r0
+ mov	DBL0H,r8
+ add	yn,yn	! yn = y1 ; u0.32 ; <= 0xffff0000
+ sts	mach,r1	! y1 * (x-1); u1.31
+ add	r0,r1	! y1 * x    ; u1.31
+ dmulu.l yn,r1
+ not	DBL0H,r10
+ shld	r9,r8
+ tst	r2,r10
+ or	r8,r12	! a - 1; u0.32
+ bt	LOCAL(inf_nan_arg0)
+ sts	mach,r1	! d1+yn; u1.31
+ sett		! adjust y2 so that it can be interpreted as s1.31
+ not	DBL1H,r10
+ subc	r1,yn	! yn := y2 ; u1.31 ; can be 0x7fffffff
+ mov.l	LOCAL(x001fffff),r9
+ dmulu.l yn,r12
+ tst	r2,r10
+ or	DBL1H,r2
+ bt	LOCAL(inf_nan_arg1)
+ mov.l	r11,@-r15
+ sts	mach,r11	! y2*(a-1) ; u1.31
+ add	yn,r11		! z0       ; u1.31
+ dmulu.l r11,DBL1L
+ mov.l	LOCAL(x40000000),DBLRH	! bias + 1
+ and	r9,r2		! x ; u12.20
+ cmp/hi	DBL0L,DBL1L
+ sts	macl,r8
+ mov	#-24,r12
+ sts	mach,r9 	! r9:r8 := z0 * DBL1L; u-19.64
+ subc	DBL1H,DBLRH
+ mul.l	r11,r2  	! (r9+macl):r8 == z0*x; u-19.64
+ shll	r8
+ add	DBL0H,DBLRH	! result sign/exponent + 1
+ mov	r8,r10
+ sts	macl,DBLRL
+ add	DBLRL,r9
+ rotcl	r9		! r9:r8 := z*x; u-20.63
+ shld	r12,r10
+ mov.l	LOCAL(x7fe00000),DBLRL
+ sub	DBL0L,r9	! r9:r8 := -a ; u-20.63
+ mov.l	LOCAL(x00200000),r12
+! FIXME: the following shift might lose the sign.
+ shll8	r9
+ or	r10,r9	! -a1 ; s-28.32
+ mov.l	LOCAL(x00100000),r10
+ dmuls.l r9,yn	! r3 dead
+ mov	DBL1H,r3
+ mov.l LOCAL(xfff00000),DBL0L
+ xor	DBL0H,r3	! calculate expected sign & bit20
+ div0s	r3,DBLRH
+ xor	DBLRH,r3
+ bt	LOCAL(ret_denorm_inf)
+ tst	DBLRL,DBLRH
+ bt	LOCAL(ret_denorm)
+ sub	r12,DBLRH ! calculate sign / exponent minus implicit 1
+ tst	r10,r3	! set T if a >= x
+ sts	mach,r12! -z1 ; s-27.32
+ bt	0f
+ add	r11,r11	! z0 ; u1.31 / u0.31
+0: mov	#6,r3
+ negc	r3,r10 ! shift count := a >= x ? -7 : -6; T := 1
+ shll8	r8	! r9:r8 := -a1 ; s-28.64
+ shad	r10,r12	! -z1 ; truncate to s-20.32 / s-21.32
+ rotcl	r12	! -z1 ; s-21.32 / s-22.32 / round to odd 0.5 ulp ; T := sign
+ add	#20,r10
+ dmulu.l r12,DBL1L ! r12 signed, DBL1L unsigned
+ and	DBL0L,DBLRH	! isolate sign / exponent
+ shld	r10,r9
+ mov	r8,r3
+ shld	r10,r8
+ sts	macl,DBL0L
+ sts	mach,DBLRL
+ add	#-32,r10
+ shld	r10,r3
+ mul.l r12,r2
+ bf	0f	! adjustment for signed/unsigned multiply
+ sub	DBL1L,DBLRL	! DBL1L dead
+0: shar	r12	! -z1 ; truncate to s-20.32 / s-21.32
+ sts	macl,DBL1L
+ or	r3,r9	! r9:r8 := -a1 ;             s-41.64/s-42.64
+ !
+ cmp/hi	r8,DBL0L
+ add	DBLRL,DBL1L ! DBL1L:DBL0L := -z1*x ; s-41.64/s-42.64
+ subc	DBL1L,r9
+ not	r12,DBLRL ! z1, truncated to s-20.32 / s-21.32
+ shll	r9	! T :=  a2 > 0
+ mov	r11,r2
+ mov	#21,r7
+ shld	r7,r11
+ addc	r11,DBLRL
+ mov.l	@r15+,r11
+ mov.l	@r15+,r10
+ mov	#-11,r7
+ mov.l	@r15+,r9
+ shld	r7,r2
+ mov.l	@r15+,r8
+ addc	r2,DBLRH
+ rts
+ mov.l	@r15+,r12
+
+LOCAL(ret_denorm):
+	tst	r10,DBLRH
+	bra	LOCAL(denorm_have_count)
+	movt	DBLRH	! calculate shift count (off by 2)
+
+LOCAL(ret_denorm_inf):
+	mov	DBLRH,r12
+	add	r12,r12
+	cmp/pz	r12
+	mov	#-21,DBLRL
+	bt	LOCAL(ret_inf_late)
+	shld	DBLRL,DBLRH
+LOCAL(denorm_have_count):
+	add	#-2,DBLRH
+/* FIXME */
+	bra	LOCAL(return_0)
+	mov.l	@r15+,r11
+
+LOCAL(ret_inf_late):
+	mov.l	@r15+,r11
+	!
+	mov.l	@r15+,r10
+	!
+	mov.l	@r15+,r9
+	bra	LOCAL(return_inf)
+	mov.l	@r15+,r8
+
+	.balign	4
+LOCAL(clz):
+	mov.l	r8,@-r15
+	extu.w	r0,r8
+	mov.l	r9,@-r15
+	cmp/eq	r0,r8
+	bt/s	0f
+	mov	#8-11,r9
+	xtrct	r0,r8
+	add	#16,r9
+0:	tst	r12,r8	! 0xff00
+	mov.l	LOCAL(c_clz_tab),r0
+	bt	0f
+	shlr8	r8
+0:	bt	0f
+	add	#8,r9
+0:
+#ifdef	__PIC__
+	add	r0,r8
+	mova	LOCAL(c_clz_tab),r0
+#endif
+	mov.b	@(r0,r8),r8
+	mov	r9,r0
+	mov.l	@r15+,r9
+	!
+	!
+	!
+	sub	r8,r0
+	mov.l	@r15+,r8
+	rts
+	lds.l	@r15+,pr
+
+!	We encode even some words as pc-relative data that would fit as an
+!	immediate in the instruction, in order to avoid some pipeline
+!	stalls on SH4-100 / SH4-200.
+LOCAL(d1):	.word 1
+LOCAL(d12):	.word 12
+LOCAL(d13):	.word 13
+
+	.balign 4
+LOCAL(x7ff00000): .long 0x7ff00000
+LOCAL(xfffe2006): .long 0xfffe2006
+LOCAL(x001fffff): .long 0x001fffff
+LOCAL(x40000000): .long 0x40000000
+LOCAL(x7fe00000): .long 0x7fe00000
+LOCAL(x00100000): .long 0x00100000
+LOCAL(x00200000): .long 0x00200000
+LOCAL(xfff00000): .long 0xfff00000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+        .long   GLOBAL(clz_tab) - .
+#else
+        .long   GLOBAL(clz_tab)
+#endif
+LOCAL(ytab):
+        .byte   120, 105,  91,  78,  66,  54,  43,  33
+        .byte    24,  15,   8,   0,  -5, -12, -17, -22
+        .byte   -27, -31, -34, -37, -40, -42, -44, -45
+        .byte   -46, -46, -47, -46, -46, -45, -44, -42
+        .byte   -41, -39, -36, -34, -31, -28, -24, -20
+        .byte   -17, -12,  -8,  -4,   0,   5,  10,  16
+        .byte    21,  27,  33,  39,  45,  52,  58,  65
+        .byte    72,  79,  86,  93, 101, 109, 116, 124
+ENDFUNC(GLOBAL(divdf3))
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/m3/divdf3.S gcc-4.5.0/gcc/config/sh/IEEE-754/m3/divdf3.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/m3/divdf3.S	1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/m3/divdf3.S	2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,608 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+! divdf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke joern.rennecke@st.com
+
+/* y = 1/x  ; x (- [1,2)
+   y0 = 1.5 - x/2 - tab[(x-1)*64] = y + d ; abs(d)/y <= 0x1.0c/256
+
+   y1 = y0 - ((y0) * x - 1) * y0  =  y-x*d^2
+   y2 = y1 - ((y1) * x - 1) * y1 =~= y-x^3*d^4
+
+   z0 = y2*a ;  a1 = a - z0*x /# 32 * 64 -> 64 bit #/
+   z1 = y2*a1 (round to nearest odd 0.5 ulp);
+   a2 = a1 - z1*x /# 32 * 64 -> 64 bit #/
+
+   z = a/x = z0 + z1 - 0.5 ulp + (a2 > 0) * ulp
+
+   Unless stated otherwise, multiplies can be done in 32 * 32 bit or less
+   with suitable scaling and/or top truncation.
+   We use a slightly modified algorithm here that checks if the lower
+   bits in z1 are sufficient to determine the outcome of rounding - in that
+   case a2 is not computed.
+   -z1 is computed in units of 1/128 ulp, with an error in the range
+   -0x3.e/128 .. +0 ulp.
+   Thus, after adding three, the result can be safely rounded for normal
+   numbers if any of the bits 5..2 is set, or if the highest guard bit
+   (bit 6 if y <1, otherwise bit 7) is set.
+   (Because of the way truncation works, we would be fine for an open
+    error interval of (-4/128..+1/128) ulp )
+   For denormal numbers, the rounding point lies higher, but it would be
+   quite cumbersome to calculate where exactly; it is sufficient if any
+   of the bits 7..3 is set.
+   x truncated to 20 bits is sufficient to calculate y0 or even y1.
+   Table entries are adjusted by about +128 to use full signed byte range.
+   This adjustment has been perturbed slightly to allow cse with the
+   shift count constant -26.
+   The threshold point for the shift adjust before rounding is found by
+   comparing the fractions, which is exact, unlike the top bit of y2.
+   Therefore, the top bit of y2 becomes slightly random after the adjustment
+   shift, but that's OK because this can happen only at the boundaries of
+   the interval, and the biasing of the error means that it can in fact happen
+   only at the bottom end.  And there, the carry propagation will make sure
+   that in the end we will have in effect an implicit 1 (or two when rounding
+   up...)  */
+/* If an exact result exists, it can have no more bits than the dividend.
+   Hence, we don't need to bother with the round-to-even tie breaker
+   unless the result is denormalized.  */
+/* 64 cycles through main path for sh4-300 (about 93.7% of normalized numbers),
+   82 for the path for rounding tie-breaking for normalized numbers
+   (including one branch mispredict).
+   Some cycles might be saved by more careful register allocation.  */
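+
+/* The same sequence as a C sketch, for reference: plain double arithmetic
+   stands in for the fixed-point steps, and a 48/17 - 32/17*x seed stands
+   in for the byte table below, so this shows the data flow rather than
+   the exact error bounds.  div_sketch is an illustrative name; the block
+   is kept under #if 0 and is never assembled.  */
+#if 0
+static double
+div_sketch (double a, double x)		/* x in [1,2) */
+{
+  double y = 48.0/17 - 32.0/17 * x;	/* seed; |y - 1/x| <= 1/17 */
+  y = y - (y * x - 1) * y;		/* y1; error roughly squared */
+  y = y - (y * x - 1) * y;		/* y2 */
+  double z0 = y * a;			/* initial quotient */
+  double a1 = a - z0 * x;		/* first remainder */
+  double z1 = y * a1;			/* quotient correction */
+  return z0 + z1;			/* a2 = a1 - z1*x picks the rounding */
+}
+#endif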
+
+#define x_h r12
+#define yn  r3
+
+FUNC(GLOBAL(divdf3))
+ .global GLOBAL(divdf3)
+
+/* Adjust arg0 now, too.  We still have to come back to denorm_arg1_done,
+   since we haven't done any of the work yet that we do till the denorm_arg0
+   entry point.  We know that neither of the arguments is inf/nan, but
+   arg0 might be zero.  Check for that first to avoid having to establish an
+   rts return address.  */
+LOCAL(both_denorm):
+	mov.l	r9,@-r15
+	mov	DBL0H,r1
+	mov.l	r0,@-r15
+	shll2	r1
+	mov.w LOCAL(both_denorm_cleanup_off),r9
+	or	DBL0L,r1
+	tst	r1,r1
+	mov	DBL0H,r0
+	bf/s	LOCAL(zero_denorm_arg0_1)
+	shll2	r0
+	mov.l	@(4,r15),r9
+	add	#8,r15
+	bra	LOCAL(ret_inf_nan_0)
+	mov	r1,DBLRH
+
+LOCAL(both_denorm_cleanup):
+	mov.l	@r15+,r0
+	!
+	mov.l	@r15+,r9
+ mov	#12,r3
+ mov.l	LOCAL(xfffe2006),r1	! yn := (-1. << 17) + (0x80 << 6) ; shift #-26
+	bra	LOCAL(denorm_arg1_done)
+	!
+	add	r0,DBL0H
+
+/* Denorm handling leaves the incoming denorm argument with an exponent of +1
+   (implicit 1).  To leave the result exponent unaltered, the other
+   argument's exponent is adjusted by the shift count.  */
+
+	.balign 4
+LOCAL(arg0_tiny):
+	bsr	LOCAL(clz)
+	mov	DBL0L,r0
+	shll	DBL0H
+	add	#1,r0
+	mov	DBL0L,DBL0H
+	shld	r0,DBL0H
+	rotcr	DBL0H
+	tst	DBL0L,DBL0L	/* Check for a zero dividend.  */
+	add	#-33,r0
+	shld	r0,DBL0L
+	bf/s	LOCAL(adjust_arg1_exp)
+	add	#64,r0
+LOCAL(return_0): /* Return 0 with appropriate sign.  */
+	mov.l	@r15+,r10
+	mov	#0,DBLRH
+	mov.l	@r15+,r9
+	bra	LOCAL(ret_inf_nan_0)
+	mov.l	@r15+,r8
+
+	.balign 4
+LOCAL(arg1_tiny):
+	bsr	LOCAL(clz)
+	mov	DBL1L,r0
+	shll	DBL1H
+	add	#1,r0
+	mov	DBL1L,DBL1H
+	shld	r0,DBL1H
+	rotcr	DBL1H
+	tst	DBL1L,DBL1L	/* Check for divide by zero.  */
+	add	#-33,r0
+	shld	r0,DBL1L
+	bf/s	LOCAL(adjust_arg0_exp)
+	add	#64,r0
+	mov	DBL0H,r0
+	add	r0,r0
+	tst	r0,r0	! 0 / 0 ?
+	mov	#-1,DBLRH
+	bf	LOCAL(return_inf)
+	!
+	bt	LOCAL(ret_inf_nan_0)
+	!
+
+	.balign 4
+LOCAL(zero_denorm_arg1):
+	not	DBL0H,r3
+	mov	DBL1H,r0
+	tst	r2,r3
+	shll2	r0
+	bt	LOCAL(early_inf_nan_arg0)
+	tst	r0,r0
+	mov.w	LOCAL(xff00),r12
+	bt/s	LOCAL(arg1_tiny)
+	sts.l	pr,@-r15
+	bsr	LOCAL(clz)
+	shlr2	r0
+	!
+	shll	DBL1H
+	mov	DBL1L,r3
+	shld	r0,DBL1H
+	shld	r0,DBL1L
+	rotcr	DBL1H
+	add	#-32,r0
+	shld	r0,r3
+	add	#32,r0
+	or	r3,DBL1H
+LOCAL(adjust_arg0_exp):
+	tst	r2,DBL0H
+	mov	#20,r3
+	shld	r3,r0
+	bt	LOCAL(both_denorm)
+	add	DBL0H,r0
+	div0s	r0,DBL0H	! Check for obvious overflow.
+	not	r0,r3		! Check for more subtle overflow - lest
+	bt	LOCAL(return_inf)
+	mov	r0,DBL0H
+	tst	r2,r3		! we mistake it for NaN later
+	mov	#12,r3
+	bf	LOCAL(denorm_arg1_done)
+LOCAL(return_inf): /* Return infinity with appropriate sign.  */
+	mov	#20,r3
+	mov	#-2,DBLRH
+	bra	LOCAL(ret_inf_nan_0)
+	shad	r3,DBLRH
+
+/* inf/n -> inf; inf/0 -> inf; inf/inf -> nan; inf/nan->nan  nan/x -> nan */
+LOCAL(inf_nan_arg0):
+	mov.l	@r15+,r10
+	mov.l	@r15+,r9
+	mov.l	@r15+,r8
+LOCAL(early_inf_nan_arg0):
+	not	DBL1H,r3
+	mov	DBL0H,DBLRH
+	tst	r2,r3	! both inf/nan?
+	add	DBLRH,DBLRH
+	bf	LOCAL(ret_inf_nan_0)
+	mov	#-1,DBLRH
+LOCAL(ret_inf_nan_0):
+	mov	#0,DBLRL
+	mov.l	@r15+,r12
+	div0s	DBL0H,DBL1H
+	rts
+	rotcr	DBLRH
+	
+/* Already handled: inf/x, nan/x .  Thus: x/inf -> 0; x/nan -> nan */
+	.balign	4
+LOCAL(inf_nan_arg1):
+	mov	DBL1H,r2
+	mov	#12,r1
+	shld	r1,r2
+	mov.l	@r15+,r10
+	mov	#0,DBLRL
+	mov.l	@r15+,r9
+	or	DBL1L,r2
+	mov.l	@r15+,r8
+	cmp/hi	DBLRL,r2
+	mov.l	@r15+,r12
+	subc	DBLRH,DBLRH
+	div0s	DBL0H,DBL1H
+	rts
+	rotcr	DBLRH
+	
+	.balign 4
+LOCAL(zero_denorm_arg0):
+	mov.w	LOCAL(denorm_arg0_done_off),r9
+	not	DBL1H,r1
+	mov	DBL0H,r0
+	tst	r2,r1
+	shll2	r0
+	bt	LOCAL(inf_nan_arg1)
+LOCAL(zero_denorm_arg0_1):
+	tst	r0,r0
+	mov.w	LOCAL(xff00),r12
+	bt/s	LOCAL(arg0_tiny)
+	sts.l	pr,@-r15
+	bsr	LOCAL(clz)
+	shlr2	r0
+	shll	DBL0H
+	mov	DBL0L,r12
+	shld	r0,DBL0H
+	shld	r0,DBL0L
+	rotcr	DBL0H
+	add	#-32,r0
+	shld	r0,r12
+	add	#32,r0
+	or	r12,DBL0H
+LOCAL(adjust_arg1_exp):
+	mov	#20,r12
+	shld	r12,r0
+	add	DBL1H,r0
+	div0s	r0,DBL1H	! Check for obvious underflow.
+	not	r0,r12		! Check for more subtle underflow - lest
+	bt	LOCAL(return_0)
+	mov	r0,DBL1H
+	tst	r2,r12		! we mistake it for NaN later
+	bt	LOCAL(return_0)
+	!
+	braf	r9
+	mov	#13,r0
+LOCAL(zero_denorm_arg1_dispatch):
+
+LOCAL(xff00):	.word 0xff00
+LOCAL(denorm_arg0_done_off):
+	.word LOCAL(denorm_arg0_done)-LOCAL(zero_denorm_arg1_dispatch)
+LOCAL(both_denorm_cleanup_off):
+	.word LOCAL(both_denorm_cleanup)-LOCAL(zero_denorm_arg1_dispatch)
+
+ .balign	8
+GLOBAL(divdf3):
+ mov.l	LOCAL(x7ff00000),r2
+ mov	#12,r3
+ mov.l	LOCAL(xfffe2006),r1	! yn := (-1. << 17) + (0x80 << 6) ; shift #-26
+ tst	r2,DBL1H
+ mov.l	r12,@-r15
+ bt	LOCAL(zero_denorm_arg1)
+
+LOCAL(denorm_arg1_done):
+ mov	DBL1H,x_h	! x_h live in r12
+ shld	r3,x_h	! x - 1 ; u0.20
+ mov	x_h,yn
+ mova	LOCAL(ytab),r0
+ mov.l	r8,@-r15
+ shld	r1,yn	! x-1 ; u26.6
+ mov.b	@(r0,yn),yn
+ mov	#6,r0
+ mov.l	r9,@-r15
+ mov	x_h,r8
+ mov.l	r10,@-r15
+ shlr16	x_h	! x - 1; u16.16	! x/2 - 0.5 ; u15.17
+ add	x_h,r1	! SH4-200 single-issues this insn
+ shld	r0,yn
+ sub	r1,yn	! yn := y0 ; u15.17
+ mov	DBL1L,r1
+ mov	#-20,r10
+ mul.l	yn,x_h	! r12 dead
+ swap.w	yn,r9
+ shld	r10,r1
+ sts	macl,r0	! y0 * (x-1) - n ; u-1.32
+ add	r9,r0	! y0 * x - 1     ; s-1.32
+ tst	r2,DBL0H
+ dmuls.l r0,yn
+ mov.w	LOCAL(d13),r0
+ or	r1,r8	! x  - 1; u0.32
+ add	yn,yn	! yn = y0 ; u14.18
+ bt	LOCAL(zero_denorm_arg0)
+
+LOCAL(denorm_arg0_done):
+ sts	mach,r1	!      d0 ; s14.18
+ sub	r1,yn	! yn = y1 ; u14.18 ; <= 0x3fffc
+ mov	DBL0L,r12
+ shld	r0,yn	! yn = y1 ; u1.31 ; <= 0x7fff8000
+ mov.w	LOCAL(d12),r9
+ dmulu.l yn,r8
+ shld	r10,r12
+ mov	yn,r0
+ mov	DBL0H,r8
+ add	yn,yn	! yn = y1 ; u0.32 ; <= 0xffff0000
+ sts	mach,r1	! y1 * (x-1); u1.31
+ add	r0,r1	! y1 * x    ; u1.31
+ dmulu.l yn,r1
+ not	DBL0H,r10
+ shld	r9,r8
+ tst	r2,r10
+ or	r8,r12	! a - 1; u0.32
+ bt	LOCAL(inf_nan_arg0)
+ sts	mach,r1	! d1+yn; u1.31
+ sett		! adjust y2 so that it can be interpreted as s1.31
+ not	DBL1H,r10
+ subc	r1,yn	! yn := y2 ; u1.31 ; can be 0x7fffffff
+ mov.l	LOCAL(x001fffff),r9
+ dmulu.l yn,r12
+ tst	r2,r10
+ or	DBL1H,r2
+ bt	LOCAL(inf_nan_arg1)
+ mov.l	r11,@-r15
+ sts	mach,r12	! y2*(a-1) ; u1.31
+ add	yn,r12		! z0       ; u1.31
+ dmulu.l r12,DBL1L
+ mov.l	LOCAL(x40000000),DBLRH ! bias + 1
+ and	r9,r2		! x ; u12.20
+ cmp/hi	DBL0L,DBL1L
+ sts	macl,r8
+ mov	#-24,r11
+ sts	mach,r9 	! r9:r8 := z0 * DBL1L; u-19.64
+ subc	DBL1H,DBLRH
+ mul.l	r12,r2  	! (r9+macl):r8 == z0*x; u-19.64
+ shll	r8
+ add	DBL0H,DBLRH	! result sign/exponent + 1
+ mov	r8,r10
+ sts	macl,DBLRL
+ add	DBLRL,r9
+ rotcl	r9		! r9:r8 := z*x; u-20.63
+ shld	r11,r10
+ mov.l	LOCAL(x7fe00000),DBLRL
+ sub	DBL0L,r9	! r9:r8 := -a ; u-20.63
+ cmp/pz	r9		! In corner cases this shift can lose ..
+ shll8	r9		!  .. the sign, so check it first.
+ mov.l	LOCAL(x00200000),r11
+ or	r10,r9	! -a1 ; s-28.32
+ mov.l	LOCAL(x00100000),r10
+ dmulu.l r9,yn	! sign for r9 is in T
+ xor	DBL0H,DBL1H	! calculate expected sign & bit20
+ mov.w	LOCAL(d120),DBL0H ! to test bits 6..4
+ xor	DBLRH,DBL1H
+ !
+ sts	mach,DBL0L	! -z1 ; s-27.32
+ bt 0f
+ sub	yn,DBL0L	! multiply adjust for -a1 negative; r3 dies here
+0:tst	r10,DBL1H		! set T if a >= x
+ mov.l LOCAL(xfff00000),r3
+ bt	0f
+ add	DBL0L,DBL0L	! z1 ; s-27.32 / s-28.32
+0:bt 0f
+ add	r12,r12	! z0 ; u1.31 / u0.31
+0:add	#6-64,DBL0L
+ and	r3,DBLRH	! isolate sign / exponent
+ tst	DBL0H,DBL0L
+ bf/s	LOCAL(exact)	! make the hot path taken for best branch prediction
+ cmp/pz	DBL1H
+
+! Unless we follow the next branch, we need to test which way the rounding
+! should go.
+! For normal numbers, we know that the result is not exact, so the sign
+! of the rest will be conclusive.
+! We generate a number that looks safely rounded so that denorm handling
+! can safely test the number twice.
+! r10:r8 == 0 will indicate if the number was exact, which can happen
+! when we come here for denormals to check a number that is close or
+! equal to a result in whole ulps.
+ bf	LOCAL(ret_denorm_inf)	! denorm or infinity, DBLRH has inverted sign
+ add	#64,DBL0L
+LOCAL(find_adjust): tst	r10,DBL1H ! set T if a >= x
+ mov	#-2,r10
+ addc	r10,r10
+ mov	DBL0L,DBLRL	! z1 ; s-27.32 / s-28.32 ; lower 4 bits unsafe.
+ shad	r10,DBLRL	! tentatively rounded z1 ; s-24.32
+ shll8	r8		! r9:r8 := -a1 ; s-28.64
+ clrt
+ dmuls.l DBLRL,DBL1L	! DBLRL signed, DBL1L unsigned
+ mov	r8,r10
+ shll16	r8		! r8  := lowpart  of -a1 ; s-44.48
+ xtrct	r9,r10		! r10 := highpart of -a1 ; s-44.48
+ !
+ sts	macl,r3
+ subc	r3,r8
+ sts	mach,r3
+ subc	r3,r10
+ cmp/pz	DBL1L
+ mul.l	DBLRL,r2
+ bt	0f
+ sub	DBLRL,r10	! adjust for signed/unsigned multiply
+0: mov.l	LOCAL(x7fe00000),DBLRL
+ mov	#-26,r2
+ sts	macl,r9
+ sub	r9,r10		! r10:r8 := -a2
+ add	#-64+16,DBL0L	! the denorm code negates this adj. for exact results
+ shld	r2,r10		! convert sign into adjustment in the range 32..63
+ sub	r10,DBL0L
+ cmp/pz	DBL1H
+
+ .balign 4
+LOCAL(exact):
+ bf	LOCAL(ret_denorm_inf)	! denorm or infinity, DBLRH has inverted sign
+ tst	DBLRL,DBLRH
+ bt	LOCAL(ret_denorm_inf)	! denorm, DBLRH has correct sign
+ mov	#-7,DBL1H
+ cmp/pz	DBL0L		! T is sign extension of z1
+ not	DBL0L,DBLRL
+ subc	r11,DBLRH	! calculate sign / exponent minus implicit 1 minus T
+ mov.l	@r15+,r11
+ mov.l	@r15+,r10
+ shad	DBL1H,DBLRL
+ mov.l	@r15+,r9
+ mov	#-11,DBL1H
+ mov	r12,r8		! z0 contributes to DBLRH and DBLRL
+ shld	DBL1H,r12
+ mov	#21,DBL1H
+ clrt
+ shld	DBL1H,r8
+ addc	r8,DBLRL
+ mov.l	@r15+,r8
+ addc	r12,DBLRH
+ rts
+ mov.l	@r15+,r12
+
+!	sign in DBLRH ^ DBL1H
+! If the last 7 bits are in the range 64..64+7, we might have an exact
+! value in the preceding bits - or we might not. For denorms, we need to
+! find out.
+! if r10:r8 is zero, we just have found out that there is an exact value.
+	.balign	4
+LOCAL(ret_denorm_inf):
+	mov	DBLRH,r3
+	add	r3,r3
+	div0s	DBL1H,r3
+	mov	#120,DBLRL
+	bt	LOCAL(ret_inf_late)
+	add	#64,DBL0L
+	tst	DBLRL,DBL0L
+	mov	#-21,DBLRL
+	bt	LOCAL(find_adjust)
+	or	r10,r8
+	tst	r8,r8		! check if find_adjust found an exact value.
+	shad	DBLRL,r3
+	bf	0f
+	add	#-16,DBL0L	! if yes, cancel adjustment
+0:	mov	#-8,DBLRL	! remove the three lowest (inexact) bits
+	and	DBLRL,DBL0L
+	add	#-2-11,r3	! shift count for denorm generation
+	mov	DBL0L,DBLRL
+	mov	#28,r2
+	mov.l	@r15+,r11
+	mov.l	@r15+,r10
+	shll2	DBLRL
+	mov.l	@r15+,r9
+	shld	r2,DBL0L
+	mov.l	@r15+,r8
+	mov	#-31,r2
+	cmp/ge	r2,r3
+	shll2	DBLRL
+	bt/s	0f
+	add	DBL0L,r12	! fraction in r12:DBLRL ; u1.63
+	negc	DBLRL,DBLRL	! T := DBLRL != 0
+	add	#31,r3
+	mov	r12,DBLRL
+	rotcl	DBLRL		! put in sticky bit
+	movt	r12
+	cmp/ge	r2,r3
+	bt/s	LOCAL(return_0_late)
+0:	div0s	DBL1H,DBLRH	! calculate sign
+	mov	r12,DBLRH
+	shld	r3,DBLRH
+	mov	DBLRL,r2
+	shld	r3,DBLRL
+	add	#32,r3
+	add	DBLRH,DBLRH
+	mov.l	LOCAL(x80000000),DBL1H
+	shld	r3,r12
+	rotcr	DBLRH		! combine sign with highpart
+	add	#-1,r3
+	shld	r3,r2
+	mov	#0,r3
+	rotl	r2
+	cmp/hi	DBL1H,r2
+	addc	r12,DBLRL
+	mov.l	@r15+,r12
+	rts
+	addc	r3,DBLRH
+
+LOCAL(ret_inf_late):
+	mov.l	@r15+,r11
+	mov.l	@r15+,r10
+	mov	DBLRH,DBL0H
+	mov.l	@r15+,r9
+	bra	LOCAL(return_inf)
+	mov.l	@r15+,r8
+
+LOCAL(return_0_late):
+	div0s	DBLRH,DBL1H
+	mov.l	@r15+,r12
+	mov	#0,DBLRH
+	rts
+	rotcr	DBLRH
+
+	.balign	4
+LOCAL(clz):
+	mov.l	r8,@-r15
+	extu.w	r0,r8
+	mov.l	r9,@-r15
+	cmp/eq	r0,r8
+	bt/s	0f
+	mov	#21,r9
+	xtrct	r0,r8
+	add	#-16,r9
+0:	tst	r12,r8	! 0xff00
+	mov.l	LOCAL(c_clz_tab),r0
+	bt	0f
+	shlr8	r8
+0:	bt	0f
+	add	#-8,r9
+0:
+#ifdef	__PIC__
+	add	r0,r8
+	mova	LOCAL(c_clz_tab),r0
+#endif
+	mov.b	@(r0,r8),r8
+	mov	r9,r0
+	mov.l	@r15+,r9
+	!
+	!
+	!
+	sub	r8,r0
+	mov.l	@r15+,r8
+	rts
+	lds.l	@r15+,pr
+
+!	We encode even some words as pc-relative data that would fit as an
+!	immediate in the instruction, in order to avoid some pipeline
+!	stalls on SH4-100 / SH4-200.
+LOCAL(d1):	.word 1
+LOCAL(d12):	.word 12
+LOCAL(d13):	.word 13
+LOCAL(d120):	.word 120
+
+	.balign 4
+LOCAL(x7ff00000): .long 0x7ff00000
+LOCAL(xfffe2006): .long 0xfffe2006
+LOCAL(x001fffff): .long 0x001fffff
+LOCAL(x40000000): .long 0x40000000
+LOCAL(x7fe00000): .long 0x7fe00000
+LOCAL(x00100000): .long 0x00100000
+LOCAL(x00200000): .long 0x00200000
+LOCAL(xfff00000): .long 0xfff00000
+LOCAL(x80000000): .long 0x80000000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+        .long   GLOBAL(clz_tab) - .
+#else
+        .long   GLOBAL(clz_tab)
+#endif
+LOCAL(ytab):
+        .byte   120, 105,  91,  78,  66,  54,  43,  33
+        .byte    24,  15,   8,   0,  -5, -12, -17, -22
+        .byte   -27, -31, -34, -37, -40, -42, -44, -45
+        .byte   -46, -46, -47, -46, -46, -45, -44, -42
+        .byte   -41, -39, -36, -34, -31, -28, -24, -20
+        .byte   -17, -12,  -8,  -4,   0,   5,  10,  16
+        .byte    21,  27,  33,  39,  45,  52,  58,  65
+        .byte    72,  79,  86,  93, 101, 109, 116, 124
+ENDFUNC(GLOBAL(divdf3))
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/m3/divsf3.S gcc-4.5.0/gcc/config/sh/IEEE-754/m3/divsf3.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/m3/divsf3.S	1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/m3/divsf3.S	2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,365 @@
+/* Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+! divsf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+
+! long 0th..3rd significant byte
+#ifdef __LITTLE_ENDIAN__
+#define L0SB	3
+#define L1SB	2
+#define L2SB	1
+#define L3SB	0
+#else
+#define L0SB	0
+#define L1SB	1
+#define L2SB	2
+#define L3SB	3
+#endif
+
+! clobbered: r0,r1,r2,r3,r6,r7,T (and for sh.md's purposes PR)
+!
+! Note: When the divisor is larger than the dividend, we have to adjust the
+! exponent down by one.  We do this automatically when subtracting the entire
+! exponent/fraction bitstring as an integer, by means of the borrow from
+! bit 23 to bit 24.
+! Note: non-denormal rounding of a division result cannot cause fraction
+! overflow / exponent change. (r4 > r5 : fraction must stay in (2..1] interval;
+! r4 < r5: having an extra bit of precision available, even the smallest
+! possible difference of the result from one is rounded in all rounding modes
+! to a fraction smaller than one.)
+! sh4-200: 59 cycles
+! sh4-300: 44 cycles
+! tab indent: exponent / sign computations
+! tab+space indent: fraction computation
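+
+/* The borrow effect described in the first note, as a C sketch (sign bits
+   are assumed cleared and both inputs normal; quot_exp_sketch is an
+   illustrative name, and the block is never assembled):  */
+#if 0
+#include <stdint.h>
+static uint32_t
+quot_exp_sketch (uint32_t a, uint32_t b)	/* raw float bit patterns */
+{
+  /* If frac(a) < frac(b), the subtraction borrows from bit 23 and
+     lowers the exponent by one, exactly as required.  The explicit
+     leading 1 added back later supplies the remaining +1.  */
+  return a - b + 0x3f000000;	/* bias minus explicit leading 1 */
+}
+#endif
+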
+FUNC(GLOBAL(divsf3))
+	.global GLOBAL(divsf3)
+	.balign	4
+GLOBAL(divsf3):
+	mov.l	LOCAL(x7f800000),r3
+	mov	#1,r2
+	mov	r4,r6
+	 shll8	 r6
+	mov	r5,r7
+	 shll8	 r7
+	rotr	r2
+	tst	r3,r4
+	or	r2,r6
+	bt/s	LOCAL(denorm_arg0)
+	or	r2,r7
+	tst	r3,r5
+	bt	LOCAL(denorm_arg1)
+	 shlr	 r6
+	mov.l	LOCAL(x3f000000),r3	! bias minus explicit leading 1
+	 div0u
+LOCAL(denorm_done):
+	 div1	 r7,r6
+	mov.l	r8,@-r15
+	 bt	 0f
+	 div1	r7,r6
+0:	mov.l	r9,@-r15
+	 div1	 r7,r6
+	add	r4,r3
+	 div1	 r7,r6
+	sub	r5,r3	! result sign/exponent minus 1 if no overflow/underflow
+	 div1	 r7,r6
+	or	r3,r2
+	 div1	 r7,r6
+	mov.w	LOCAL(xff00),r9
+	 div1	 r7,r6
+	mov.l	r2,@-r15 ! L0SB is 0xff iff denorm / infinity exp is computed
+	 div1	 r7,r6
+	mov.w	LOCAL(m23),r2
+	 div1	 r7,r6
+	mov	r4,r0
+	 div1	 r7,r6
+	 extu.b	 r6,r1
+	 and	 r9,r6
+	 swap.w	 r1,r1	! first 8 bits of result fraction in bit 23..16
+	 div1	 r7,r6
+	shld	r2,r0
+	 div1	 r7,r6
+	mov.b	r0,@(L3SB,r15)	! 0xff iff dividend was infinity / nan
+	 div1	 r7,r6
+	mov	r5,r0
+	 div1	 r7,r6
+	shld	r2,r0
+	 div1	 r7,r6
+	mov.b	r0,@(L2SB,r15)	! 0xff iff divisor was infinity / nan
+	 div1	 r7,r6
+	mov	r4,r0
+	 div1	 r7,r6
+	mov.w	LOCAL(m31),r2
+	 div1	 r7,r6
+	 extu.b	 r6,r8	! second 8 bits of result fraction in bit 7..0
+	 and	 r9,r6
+	mov.l	LOCAL(xff800000),r9
+	 div1	 r7,r6
+	xor	r5,r0	! msb := correct result sign
+	 div1	 r7,r6
+	xor	r3,r0	! xor with sign of result sign/exponent word
+	 div1	 r7,r6
+	shad	r2,r0
+	 div1	 r7,r6
+	mov.b	r0,@(L1SB,r15)	! 0xff	iff exponent over/underflows
+	and	r9,r3	! isolate sign / exponent
+	 mov.w	 LOCAL(xff01),r2
+	 div1	 r7,r6
+	swap.b	r8,r0	! second 8 bits of result fraction in bit 15..8
+	 div1	 r7,r6
+	or	r1,r0	! first 16 bits of result fraction in bit 23..8
+	 div1	 r7,r6
+	mov.w	LOCAL(m1),r9
+	 div1	 r7,r6
+	mov.l	@r15+,r8 ! load encoding of unusual exponent conditions
+	 and	 r6,r2	! rest | result lsb
+	 mov	 #0,r1
+	 bf	 0f	! bit below lsb clear -> no rounding
+	 cmp/hi	r1,r2
+0:	 extu.b	 r6,r1
+	 or	 r1,r0	! 24 bit result fraction with explicit leading 1
+	addc	r3,r0	! add in exponent / sign
+	cmp/str	r9,r8
+	! (no stall *here* for SH4-100 / SH4-200)
+	bt/s	LOCAL(inf_nan_denorm_zero)
+	mov.l	@r15+,r9
+	rts
+	mov.l	@r15+,r8
+
+/* The exponent adjustment for denormal numbers is done by leaving an
+   adjusted value in r3; r4/r5 are not changed.  */
+	.balign	4
+LOCAL(denorm_arg0):
+	mov.w	LOCAL(xff00),r1
+	sub	r2,r6	! 0x80000000 : remove implicit 1
+	tst	r6,r6
+	sts.l	pr,@-r15
+	bt	LOCAL(div_zero)
+	bsr	LOCAL(clz)
+	mov	r6,r0
+	shld	r0,r6
+	tst	r3,r5
+	mov.l	LOCAL(x3f800000),r3	! bias - 1 + 1
+	mov	#23,r1
+	shld	r1,r0
+	bt/s	LOCAL(denorm_arg1_2)
+	sub	r0,r3
+	 shlr	 r6
+	bra	LOCAL(denorm_done)
+	 div0u
+
+LOCAL(denorm_arg1):
+	mov.l	LOCAL(x3f000000),r3	! bias - 1
+LOCAL(denorm_arg1_2):
+	sub	r2,r7	! 0x80000000 : remove implicit 1
+	mov.w	LOCAL(xff00),r1
+	tst	r7,r7
+	sts.l	pr,@-r15
+	bt	LOCAL(div_by_zero)
+	bsr	LOCAL(clz)
+	mov	r7,r0
+	shld	r0,r7
+	add	#-1,r0
+	mov	#23,r1
+	shld	r1,r0
+	add	r0,r3
+	 shlr	 r6
+	bra	LOCAL(denorm_done)
+	 div0u
+
+	.balign	4
+LOCAL(inf_nan_denorm_zero):
+! r0 has the rounded result, r6 has the non-rounded lowest bits & rest.
+! the bit just below the LSB of r6 is available as ~Q
+
+! Alternative way to get at ~Q:
+! if rounding took place, ~Q must be set.
+! if the rest appears to be zero, ~Q must be set.
+! if the rest appears to be nonzero, but rounding didn't take place,
+! ~Q must be clear;  the apparent rest will then require adjusting to test if 
+! the actual rest is nonzero.
+	mov	r0,r2
+	not	r8,r0
+	tst	#0xff,r0
+	shlr8	r0
+	mov.l	@r15+,r8
+	bt/s	LOCAL(div_inf_or_nan)
+	tst	#0xff,r0
+	mov	r4,r0
+	bt	LOCAL(div_by_inf_or_nan)
+	add	r0,r0
+	mov	r5,r1
+	add	r1,r1
+	cmp/hi	r1,r0
+	mov	r6,r0
+	bt	LOCAL(overflow)
+	sub	r2,r0
+	exts.b	r0,r0	! -1 if rounding took place
+	shlr8	r6	! isolate div1-mangled rest
+	addc	r2,r0	! generate carry if rounding took place
+	shlr8	r7
+	sub	r3,r0	! pre-rounding fraction
+	bt	0f ! going directly to denorm_sticky would cause mispredicts
+	tst	r6,r6	! rest can only be zero if lost bit was set
+0:	add	r7,r6	! (T ? corrupt : reconstruct) actual rest
+	bt	0f
+	cmp/pl	r6
+0:	mov.w	LOCAL(m24),r1
+	addc	r0,r0	! put in sticky bit
+	add	#-1,r3
+	mov.l	LOCAL(x40000000),r6
+	add	r3,r3
+	mov	r0,r2
+	shad	r1,r3	! exponent ; s32.0
+	!
+	shld	r3,r0
+	add	#30,r3
+	cmp/pl	r3
+	shld	r3,r2
+	bf	LOCAL(zero_nan)	! return zero
+	rotl	r2
+	cmp/hi	r6,r2
+	mov	#0,r7
+	addc	r7,r0
+	div0s	r4,r5
+	rts
+	rotcr	r0
+	
+! ????
+! undo normal rounding (lowest bits still in r6). then do denormal rounding.
+	
+LOCAL(overflow):
+	mov.l	LOCAL(xff000000),r0
+	div0s	r4,r5
+	rts
+	rotcl	r0
+	
+LOCAL(div_inf_or_nan):
+	mov	r4,r0
+	bra	LOCAL(nan_if_t)
+	add	r0,r0
+	
+LOCAL(div_by_inf_or_nan):
+	mov.l	LOCAL(xff000000),r1
+	mov	#0,r0
+	mov	r5,r2
+	add	r2,r2
+	bra	LOCAL(nan_if_t)
+	cmp/hi	r1,r2
+
+
+
+! still need to check for divide by zero or divide by nan
+! r3: 0x7f800000
+	.balign	4
+LOCAL(div_zero):
+	mov	r5,r1
+	add	r1,r1
+	tst	r1,r1	! 0 / 0 -> nan
+	not	r5,r1
+	bt	LOCAL(nan)
+	add	r3,r3
+	cmp/hi	r3,r1	! 0 / nan -> nan (but 0 / inf -> 0)
+LOCAL(zero_nan):
+	mov	#0,r0
+LOCAL(nan_if_t):
+	bf	0f
+LOCAL(nan):
+	mov	#-1,r0
+0:	div0s	r4,r5	! compute sign
+	rts
+	rotcr	r0	! insert sign
+
+LOCAL(div_by_zero):
+	mov.l	LOCAL(xff000000),r0
+	mov	r5,r2
+	add	r2,r2
+	bra	LOCAL(nan_if_t)
+	cmp/hi	r0,r2
+	
+	.balign	4
+LOCAL(clz):
+	mov.l	r8,@-r15
+	extu.w	r0,r8
+	mov.l	r9,@-r15
+	cmp/eq	r0,r8
+	bt/s	0f
+	mov	#8-8,r9
+	xtrct	r0,r8
+	add	#16,r9
+0:	tst	r1,r8	! 0xff00
+	mov.l	LOCAL(c_clz_tab),r0
+	bt	0f
+	shlr8	r8
+0:	bt	0f
+	add	#8,r9
+0:
+#ifdef	__PIC__
+	add	r0,r8
+	mova	LOCAL(c_clz_tab),r0
+#endif
+	mov.b	@(r0,r8),r8
+	mov	r9,r0
+	mov.l	@r15+,r9
+	!
+	!
+	!
+	sub	r8,r0
+	mov.l	@r15+,r8
+	rts
+	lds.l	@r15+,pr
+
+!	We encode even some words as pc-relative data that would fit as an
+!	immediate in the instruction, in order to avoid some pipeline
+!	stalls on SH4-100 / SH4-200.
+LOCAL(m23):	.word -23
+LOCAL(m24):	.word -24
+LOCAL(m31):	.word -31
+LOCAL(xff01):	.word 0xff01
+	.balign	4
+LOCAL(xff000000):
+#ifdef __LITTLE_ENDIAN__
+LOCAL(xff00):	.word 0xff00
+LOCAL(m1):	.word -1
+#else
+LOCAL(m1):	.word -1
+LOCAL(xff00):	.word 0xff00
+#endif
+LOCAL(x7f800000): .long 0x7f800000
+LOCAL(x3f000000): .long 0x3f000000
+LOCAL(x3f800000): .long 0x3f800000
+LOCAL(xff800000): .long 0xff800000
+LOCAL(x40000000): .long 0x40000000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+        .long   GLOBAL(clz_tab) - .
+#else
+        .long   GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(divsf3))
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/m3/fixdfsi.S gcc-4.5.0/gcc/config/sh/IEEE-754/m3/fixdfsi.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/m3/fixdfsi.S	1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/m3/fixdfsi.S	2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,115 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+!! fixdfsi for Renesas SH / STMicroelectronics ST40 CPUs
+!! Contributed by Joern Rennecke joern.rennecke@st.com
+
+#ifdef L_fixdfsi
+	! What is a bit unusual about this implementation is that the
+	! sign bit influences the result for NANs: for a cleared sign bit, you
+	! get INT_MAX, for a set sign bit, you get INT_MIN.
+	! However, since the result for NANs is undefined, this should be no
+	! problem.
+	! N.B. This is scheduled both for SH4-200 and SH4-300
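+	!
+	! The conversion, as a C sketch (fix_sketch is an illustrative
+	! name; hi/lo are the raw high/low words, truncation is toward
+	! zero, and the NaN behaviour noted above falls out of the same
+	! exp - 0x413 test).  Never assembled:
+#if 0
+#include <stdint.h>
+static int32_t
+fix_sketch (uint32_t hi, uint32_t lo)
+{
+  int sh = (int) ((hi << 1) >> 21) - 0x413;	/* exp - (bias + 20) */
+  uint64_t frac = ((uint64_t) ((hi & 0xfffff) | 0x100000) << 32) | lo;
+  if (sh > 10)				/* |x| >= 2^31, inf or NaN */
+    return (int32_t) hi < 0 ? INT32_MIN : INT32_MAX;
+  uint32_t mag = sh < -31 ? 0 : (uint32_t) (frac >> (32 - sh));
+  return (int32_t) hi < 0 ? - (int32_t) mag : (int32_t) mag;
+}
+#endif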
+	.balign 4
+	.global GLOBAL(fixdfsi)
+	FUNC(GLOBAL(fixdfsi))
+	.balign	4
+GLOBAL(fixdfsi):
+	mov.w	LOCAL(x413),r1
+	mov	DBL0H,r0
+	shll	DBL0H
+	mov.l	LOCAL(mask),r3
+	mov	#-21,r2
+	shld	r2,DBL0H	! SH4-200 will start this insn in a new cycle
+	bt/s	LOCAL(neg)
+	sub	r1,DBL0H
+	cmp/pl	DBL0H		! SH4-200 will start this insn in a new cycle
+	and	r3,r0
+	bf/s	LOCAL(ignore_low)
+	addc	r3,r0	! uses T == 1; sets implicit 1
+	mov	#10,r2
+	shld	DBL0H,r0	! SH4-200 will start this insn in a new cycle
+	cmp/gt	r2,DBL0H
+	add	#-32,DBL0H
+	bt	LOCAL(retmax)
+	shld	DBL0H,DBL0L
+	rts
+	or	DBL0L,r0
+
+	.balign	8
+LOCAL(ignore_low):
+	mov	#-21,r2
+	cmp/gt	DBL0H,r2	! SH4-200 will start this insn in a new cycle
+	bf	0f		! SH4-200 will start this insn in a new cycle
+	mov	#-31,DBL0H	! results in 0 return
+0:	add	#1,r0
+	rts
+	shld	DBL0H,r0
+
+	.balign 4
+LOCAL(neg):
+	cmp/pl	DBL0H
+	and	r3,r0
+	bf/s	LOCAL(ignore_low_neg)
+	addc	r3,r0	! uses T == 1; sets implicit 1
+	mov	#10,r2
+	shld	DBL0H,r0	! SH4-200 will start this insn in a new cycle
+	cmp/gt	r2,DBL0H
+	add	#-32,DBL0H
+	bt	LOCAL(retmin)
+	shld	DBL0H,DBL0L
+	or	DBL0L,r0	! SH4-200 will start this insn in a new cycle
+	rts
+	neg	r0,r0
+
+	.balign 4
+LOCAL(ignore_low_neg):
+	mov	#-21,r2
+	cmp/gt	DBL0H,r2	! SH4-200 will start this insn in a new cycle
+	add	#1,r0
+	shld	DBL0H,r0
+	bf	0f
+	mov	#0,r0		! results in 0 return
+0:	rts
+	neg	r0,r0
+
+LOCAL(retmax):
+	mov	#-1,r0
+	rts
+	shlr	r0
+
+LOCAL(retmin):
+	mov	#1,r0
+	rts
+	rotr	r0
+
+LOCAL(x413): .word 0x413
+
+	.balign 4
+LOCAL(mask): .long 0x000fffff
+	ENDFUNC(GLOBAL(fixdfsi))
+#endif /* L_fixdfsi */
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/m3/fixunsdfsi.S gcc-4.5.0/gcc/config/sh/IEEE-754/m3/fixunsdfsi.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/m3/fixunsdfsi.S	1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/m3/fixunsdfsi.S	2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,82 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+!! fixunsdfsi for Renesas SH / STMicroelectronics ST40 CPUs
+!! Contributed by Joern Rennecke joern.rennecke@st.com
+
+#ifdef L_fixunsdfsi
+	! What is a bit unusual about this implementation is that the
+	! sign bit influences the result for NANs: for a cleared sign bit, you
+	! get UINT_MAX, for a set sign bit, you get 0.
+	! However, since the result for NANs is undefined, this should be no
+	! problem.
+	! N.B. This is scheduled both for SH4-200 and SH4-300
+	.balign 4
+	.global GLOBAL(fixunsdfsi)
+	FUNC(GLOBAL(fixunsdfsi))
+	.balign	4
+GLOBAL(fixunsdfsi):
+	mov.w	LOCAL(x413),r1	! bias + 20
+	mov	DBL0H,r0
+	shll	DBL0H
+	mov.l	LOCAL(mask),r3
+	mov	#-21,r2
+	shld	r2,DBL0H	! SH4-200 will start this insn in a new cycle
+	bt/s	LOCAL(ret0)
+	sub	r1,DBL0H
+	cmp/pl	DBL0H		! SH4-200 will start this insn in a new cycle
+	and	r3,r0
+	bf/s	LOCAL(ignore_low)
+	addc	r3,r0	! uses T == 1; sets implicit 1
+	mov	#11,r2
+	shld	DBL0H,r0	! SH4-200 will start this insn in a new cycle
+	cmp/gt	r2,DBL0H
+	add	#-32,DBL0H
+	bt	LOCAL(retmax)
+	shld	DBL0H,DBL0L
+	rts
+	or	DBL0L,r0
+
+	.balign	8
+LOCAL(ignore_low):
+	mov	#-21,r2
+	cmp/gt	DBL0H,r2	! SH4-200 will start this insn in a new cycle
+	add	#1,r0
+	bf	0f
+LOCAL(ret0): mov #0,r0		! results in 0 return
+0:	rts
+	shld	DBL0H,r0
+
+LOCAL(retmax):
+	rts
+	mov	#-1,r0
+
+LOCAL(x413): .word 0x413
+
+	.balign 4
+LOCAL(mask): .long 0x000fffff
+	ENDFUNC(GLOBAL(fixunsdfsi))
+#endif /* L_fixunsdfsi */
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/m3/floatsidf.S gcc-4.5.0/gcc/config/sh/IEEE-754/m3/floatsidf.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/m3/floatsidf.S	1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/m3/floatsidf.S	2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,103 @@
+/* Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+! floatsidf for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+
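+! The conversion, as a C sketch for reference (floatsidf_sketch is an
+! illustrative name; every int32_t fits in the 53-bit fraction, so no
+! rounding is needed, and the assembly folds the implicit-1 removal into
+! its 0x41200000 / 0xc1200000 constants).  Never assembled:
+#if 0
+#include <stdint.h>
+static uint64_t
+floatsidf_sketch (int32_t n)		/* returns raw IEEE double bits */
+{
+  if (n == 0)
+    return 0;
+  uint64_t sign = (uint64_t) (n < 0) << 63;
+  uint32_t mag = n < 0 ? - (uint32_t) n : (uint32_t) n;
+  int clz = __builtin_clz (mag);	/* the clz_tab lookup in the code */
+  uint64_t frac = (uint64_t) mag << (clz + 21);	/* msb -> bit 52 */
+  uint64_t exp = (uint64_t) (0x3ff + 31 - clz) << 52;
+  return sign | exp | (frac & (((uint64_t) 1 << 52) - 1));
+}
+#endif
+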
+FUNC(GLOBAL(floatsidf))
+	.global GLOBAL(floatsidf)
+	.balign	4
+GLOBAL(floatsidf):
+	tst	r4,r4
+	mov	r4,r1
+	bt	LOCAL(ret0)
+	cmp/pz	r4
+	bt	0f
+	neg	r4,r1
+0:	mov.l	LOCAL(c_clz_tab),r0
+	extu.w	r1,r5
+	mov.w	LOCAL(xff00),r3
+	cmp/eq	r1,r5
+	mov	#21,r2
+	bt	0f
+	mov	r1,r5
+	shlr16	r5
+	add	#-16,r2
+0:	tst	r3,r5	! 0xff00
+	bt	0f
+	shlr8	r5
+0:	bt	0f
+	add	#-8,r2
+0:
+#ifdef	__PIC__
+	add	r0,r5
+	mova	LOCAL(c_clz_tab),r0
+#endif
+	mov.b	@(r0,r5),r5
+	cmp/pz	r4
+	mov.l	LOCAL(x41200000),r3	! bias + 20 - implicit 1
+	bt	0f
+	mov.l	LOCAL(xc1200000),r3	! sign + bias + 20 - implicit 1
+0:	mov	r1,r0	! DBLRL & DBLRH
+	sub	r5,r2
+	mov	r2,r5
+	shld	r2,DBLRH
+	cmp/pz	r2
+	add	r3,DBLRH
+	add	#32,r2
+	shld	r2,DBLRL
+	bf	0f
+	mov.w	LOCAL(d0),DBLRL
+0:	mov	#20,r2
+	shld	r2,r5
+	rts
+	sub	r5,DBLRH
+LOCAL(ret0):
+	mov	#0,DBLRL
+	rts
+	mov	#0,DBLRH
+
+LOCAL(xff00):	.word 0xff00
+	.balign	4
+LOCAL(x41200000):
+#ifdef __LITTLE_ENDIAN__
+LOCAL(d0):	  .word 0
+		  .word 0x4120
+#else
+		  .word 0x4120
+LOCAL(d0):	  .word 0
+#endif
+LOCAL(xc1200000): .long 0xc1200000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+        .long   GLOBAL(clz_tab) - .
+#else
+        .long   GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(floatsidf))
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/m3/floatsisf.S gcc-4.5.0/gcc/config/sh/IEEE-754/m3/floatsisf.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/m3/floatsisf.S	1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/m3/floatsisf.S	2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,106 @@
+/* Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+! floatsisf for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+
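+! The conversion, as a C sketch for reference (floatsisf_sketch is an
+! illustrative name).  Unlike the double version, a 32-bit integer may
+! need more than the 24 fraction bits, so the shifted-out bits are
+! rounded to nearest, ties to even.  Never assembled:
+#if 0
+#include <stdint.h>
+static uint32_t
+floatsisf_sketch (int32_t n)		/* returns raw IEEE float bits */
+{
+  if (n == 0)
+    return 0;
+  uint32_t sign = n < 0 ? 0x80000000u : 0;
+  uint32_t mag = n < 0 ? - (uint32_t) n : (uint32_t) n;
+  int clz = __builtin_clz (mag);
+  uint32_t frac = mag << clz;			/* msb at bit 31 */
+  uint32_t r = sign | (uint32_t) (0x7f + 31 - clz) << 23
+	       | ((frac >> 8) & 0x7fffff);
+  uint32_t rest = frac << 24;			/* guard + sticky bits */
+  if (rest > 0x80000000u
+      || (rest == 0x80000000u && ((frac >> 8) & 1)))
+    r++;	/* a carry may ripple into the exponent; that is correct */
+  return r;
+}
+#endif
+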
+FUNC(GLOBAL(floatsisf))
+	.global GLOBAL(floatsisf)
+	.balign	4
+GLOBAL(floatsisf):
+	cmp/pz	r4
+	mov	r4,r5
+	bt	0f
+	neg	r4,r5
+0:	mov.l	LOCAL(c_clz_tab),r0
+	extu.w	r5,r1
+	mov.w	LOCAL(xff00),r3
+	cmp/eq	r5,r1
+	mov	#24,r2
+	bt	0f
+	mov	r5,r1
+	shlr16	r1
+	add	#-16,r2
+0:	tst	r3,r1	! 0xff00
+	bt	0f
+	shlr8	r1
+0:	bt	0f
+	add	#-8,r2
+0:
+#ifdef	__PIC__
+	add	r0,r1
+	mova	LOCAL(c_clz_tab),r0
+#endif
+	mov.b	@(r0,r1),r1
+	cmp/pz	r4
+	mov.l	LOCAL(x4a800000),r3	! bias + 23 - implicit 1
+	bt	0f
+	mov.l	LOCAL(xca800000),r3	! sign + bias + 23 - implicit 1
+0:	mov	r5,r0
+	sub	r1,r2
+	mov.l	LOCAL(x80000000),r1
+	shld	r2,r0
+	cmp/pz	r2
+	add	r3,r0
+	bt	LOCAL(noround)
+	add	#31,r2
+	shld	r2,r5
+	add	#-31,r2
+	rotl	r5
+	cmp/hi	r1,r5
+	mov	#0,r3
+	addc	r3,r0
+	mov	#23,r1
+	shld	r1,r2
+	rts
+	sub	r2,r0
+	.balign	8
+LOCAL(noround):
+	mov	#23,r1
+	tst	r4,r4
+	shld	r1,r2
+	bt	LOCAL(ret0)
+	rts
+	sub	r2,r0
+LOCAL(ret0):
+	rts
+	mov	#0,r0
+
+LOCAL(xff00):	.word 0xff00
+	.balign	4
+LOCAL(x4a800000): .long 0x4a800000
+LOCAL(xca800000): .long 0xca800000
+LOCAL(x80000000): .long 0x80000000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+        .long   GLOBAL(clz_tab) - .
+#else
+        .long   GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(floatsisf))
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/m3/floatunssidf.S gcc-4.5.0/gcc/config/sh/IEEE-754/m3/floatunssidf.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/m3/floatunssidf.S	1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/m3/floatunssidf.S	2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,96 @@
+/* Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+! floatunssidf for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+
+FUNC(GLOBAL(floatunsidf))
+	.global GLOBAL(floatunsidf)
+	.balign	4
+GLOBAL(floatunsidf):
+	mov.l	LOCAL(c_clz_tab),r0
+	extu.w	r4,r1
+	mov.w	LOCAL(0xff00),r3
+	cmp/eq	r4,r1
+	mov	#21,r2
+	bt	0f
+	mov	r4,r1
+	shlr16	r1
+	add	#-16,r2
+0:	tst	r3,r1	! 0xff00
+	bt	0f
+	shlr8	r1
+0:	bt	0f
+	add	#-8,r2
+0:
+#ifdef	__PIC__
+	add	r0,r1
+	mova	LOCAL(c_clz_tab),r0
+#endif
+	mov.b	@(r0,r1),r5
+	mov	r4,DBLRL
+	mov.l	LOCAL(x41200000),r3	! bias + 20 - implicit 1
+	tst	r4,r4
+	mov	r4,DBLRH
+	bt	LOCAL(ret0)
+	sub	r5,r2
+	mov	r2,r5
+	shld	r2,DBLRH
+	cmp/pz	r2
+	add	r3,DBLRH
+	add	#32,r2
+	shld	r2,DBLRL
+	bf	0f
+	mov.w	LOCAL(d0),DBLRL
+0:	mov	#20,r2
+	shld	r2,r5
+	rts
+	sub	r5,DBLRH
+LOCAL(ret0):
+	mov	r4,DBLRL
+	rts
+	mov	r4,DBLRH
+
+LOCAL(0xff00):	.word  0xff00
+	.balign	4
+LOCAL(x41200000):
+#ifdef __LITTLE_ENDIAN__
+LOCAL(d0):	  .word 0
+		  .word 0x4120
+#else
+		  .word 0x4120
+LOCAL(d0):	  .word 0
+#endif
+LOCAL(c_clz_tab):
+#ifdef __pic__
+        .long   GLOBAL(clz_tab) - .
+#else
+        .long   GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(floatunsidf))
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/m3/floatunssisf.S gcc-4.5.0/gcc/config/sh/IEEE-754/m3/floatunssisf.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/m3/floatunssisf.S	1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/m3/floatunssisf.S	2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,94 @@
+/* Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+! floatunssisf for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+
+FUNC(GLOBAL(floatunsisf))
+	.global GLOBAL(floatunsisf)
+	.balign	4
+GLOBAL(floatunsisf):
+	mov.l	LOCAL(c_clz_tab),r0
+	extu.w	r4,r1
+	mov.w	LOCAL(xff00),r3
+	cmp/eq	r4,r1
+	mov	#24,r2
+	bt	0f
+	mov	r4,r1
+	shlr16	r1
+	add	#-16,r2
+0:	tst	r3,r1	! 0xff00
+	bt	0f
+	shlr8	r1
+0:	bt	0f
+	add	#-8,r2
+0:
+#ifdef	__PIC__
+	add	r0,r1
+	mova	LOCAL(c_clz_tab),r0
+#endif
+	mov.b	@(r0,r1),r1
+	mov	r4,r0
+	mov.l	LOCAL(x4a800000),r3	! bias + 23 - implicit 1
+	tst	r4,r4
+	bt	LOCAL(ret0)
+	!
+	sub	r1,r2
+	mov.l	LOCAL(x80000000),r1
+	shld	r2,r0
+	cmp/pz	r2
+	add	r3,r0
+	bt	LOCAL(noround)
+	add	#31,r2
+	shld	r2,r4
+	rotl	r4
+	add	#-31,r2
+	cmp/hi	r1,r4
+	mov	#0,r3
+	addc	r3,r0
+LOCAL(noround):
+	mov	#23,r1
+	shld	r1,r2
+	rts
+	sub	r2,r0
+LOCAL(ret0):
+	rts
+	nop
+
+LOCAL(xff00):	.word 0xff00
+	.balign	4
+LOCAL(x4a800000): .long 0x4a800000
+LOCAL(x80000000): .long 0x80000000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+        .long   GLOBAL(clz_tab) - .
+#else
+        .long   GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(floatunsisf))
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/m3/muldf3.S gcc-4.5.0/gcc/config/sh/IEEE-754/m3/muldf3.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/m3/muldf3.S	1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/m3/muldf3.S	2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,486 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+! muldf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+! Normal numbers are multiplied in 53 or 54 cycles on SH4-200.
+
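+! The 53x53-bit fraction multiply is built from four 32x32->64-bit
+! partial products (the dmulu.l sequence below); a C sketch of that
+! accumulation, with illustrative names, never assembled:
+#if 0
+#include <stdint.h>
+static void
+mul64_sketch (uint32_t ah, uint32_t al, uint32_t bh, uint32_t bl,
+	      uint64_t *hi, uint64_t *lo)
+{
+  uint64_t ll = (uint64_t) al * bl;
+  uint64_t lh = (uint64_t) al * bh;
+  uint64_t hl = (uint64_t) ah * bl;
+  uint64_t hh = (uint64_t) ah * bh;
+  uint64_t mid = (ll >> 32) + (uint32_t) lh + (uint32_t) hl;
+  *hi = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);	/* top 64 bits */
+  *lo = (mid << 32) | (uint32_t) ll;	/* low bits fold into the sticky bit */
+}
+#endif
+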
+FUNC(GLOBAL(muldf3))
+	.global GLOBAL(muldf3)
+LOCAL(inf_nan_denorm_or_zero_a):
+	mov.l	r8,@-r15
+	sub	r3,DBL0H	! isolate high fraction
+	mov.l	@(4,r15),r8	! original DBL0H (with sign & exp)
+	sub	r3,r1		! 0x7ff00000
+	mov.l	LOCAL(x60000000),r3
+	shll16	r2		! 0xffff0000
+	!			  no stall here for sh4-200
+	!
+	tst	r1,r8
+	mov.l	r0,@-r15
+	bf	LOCAL(inf_nan_a)
+	tst	r1,r0		! test for DBL1 inf, nan or small
+	bt	LOCAL(ret_inf_nan_zero)
+LOCAL(normalize_arg):
+	tst	DBL0H,DBL0H
+	bf	LOCAL(normalize_arg53)
+	tst	DBL0L,DBL0L
+	bt	LOCAL(a_zero)
+	tst	r2,DBL0L
+	mov	DBL0L,DBL0H
+	bt	LOCAL(normalize_arg16)
+	shlr16	DBL0H
+	mov.w	LOCAL(m15),r2	! 1-16
+	bra	LOCAL(normalize_arg48)
+	shll16	DBL0L
+
+LOCAL(normalize_arg53):
+	tst	r2,DBL0H
+	mov	#1,r2
+	bt	LOCAL(normalize_arg48)
+	mov	DBL0H,r1
+	shlr16	r1
+	bra	LOCAL(normalize_DBL0H)
+	mov	#21-16,r3
+
+LOCAL(normalize_arg16):
+	mov.w	LOCAL(m31),r2 ! 1-32
+	mov	#0,DBL0L
+LOCAL(normalize_arg48):
+	mov	DBL0H,r1
+	mov	#21,r3
+LOCAL(normalize_DBL0H):
+	extu.b	r1,r8
+	mov.l	LOCAL(c__clz_tab),r0
+	cmp/eq	r8,r1
+	!
+	bt	0f
+	shlr8	r1
+0:
+#ifdef	__pic__
+	add	r0,r1
+
+	mova	LOCAL(c__clz_tab),r0
+
+#endif /* __pic__ */
+	mov.b	@(r0,r1),r8
+	mov	DBL0L,r1
+	mov.l	@r15+,r0
+	bt	0f
+	add	#-8,r3
+0:	clrt
+	sub	r8,r3
+	mov.w	LOCAL(d20),r8
+	shld	r3,DBL0H
+	shld	r3,DBL0L
+	sub	r3,r2
+	add	#-32,r3
+	shld	r3,r1
+	mov.l	LOCAL(x00100000),r3
+	or	r1,DBL0H
+	shld	r8,r2
+	mov.l	@r15+,r8
+	add	r2,DBL1H
+	mov.l	LOCAL(x001fffff),r2
+	dmulu.l	DBL0L,DBL1L
+	bra	LOCAL(arg_denorm_done)
+	or	r3,r0		! set implicit 1 bit
+
+LOCAL(a_zero):
+	mov.l	@(4,r15),r8
+	add	#8,r15
+LOCAL(zero):
+	mov	#0,DBLRH
+	bra	LOCAL(pop_ret)
+	mov	#0,DBLRL
+
+! both inf / nan -> result is nan if at least one is a nan, else inf.
+! DBL0 inf/nan, DBL1 zero   -> result is nan
+! DBL0 inf/nan, DBL1 finite -> result is DBL0 with sign adjustment
+LOCAL(inf_nan_a):
+	mov	r8,DBLRH
+	mov.l	@(4,r15),r8
+	add	#8,r15
+	tst	r1,r0	! arg1 inf/nan ?
+	mov	DBL0L,DBLRL
+	bt	LOCAL(both_inf_nan)
+	tst	DBL1L,DBL1L
+	mov	DBL1H,r1
+	bf	LOCAL(pop_ret)
+	add	r1,r1
+	tst	r1,r1
+	!
+	bf	LOCAL(pop_ret)
+LOCAL(nan):
+	mov	#-1,DBLRL
+	bra	LOCAL(pop_ret)
+	mov	#-1,DBLRH
+
+LOCAL(both_inf_nan):
+	or	DBL1L,DBLRL
+	bra	LOCAL(pop_ret)
+	or	DBL1H,DBLRH
+
+LOCAL(ret_inf_nan_zero):
+	tst	r1,r0
+	mov.l	@(4,r15),r8
+	or	DBL0L,DBL0H
+	bf/s	LOCAL(zero)
+	add	#8,r15
+	tst	DBL0H,DBL0H
+	bt	LOCAL(nan)
+LOCAL(inf_nan_b):
+	mov	DBL1L,DBLRL
+	mov	DBL1H,DBLRH
+LOCAL(pop_ret):
+	mov.l	@r15+,DBL0H
+	add	DBLRH,DBLRH
+
+
+	div0s	DBL0H,DBL1H
+
+	rts
+	rotcr	DBLRH
+
+	.balign	4
+/* Argument a has already been tested for being zero or denorm.
+   On the other side, we have to swap a and b so that we can share the
+   normalization code.
+   a: sign/exponent : @r15 fraction: DBL0H:DBL0L
+   b: sign/exponent: DBL1H fraction:    r0:DBL1L  */
+LOCAL(inf_nan_denorm_or_zero_b):
+	sub	r3,r1		! 0x7ff00000
+	mov.l	@r15,r2		! get original DBL0H
+	tst	r1,DBL1H
+	sub	r3,r0		! isolate high fraction
+	bf	LOCAL(inf_nan_b)
+	mov.l	DBL1H,@r15
+	mov	r0,DBL0H
+	mov.l	r8,@-r15
+	mov	r2,DBL1H
+	mov.l	LOCAL(0xffff0000),r2
+	mov.l	r1,@-r15
+	mov	DBL1L,r1
+	mov	DBL0L,DBL1L
+	bra	LOCAL(normalize_arg)
+	mov	r1,DBL0L
+
+LOCAL(d20):
+	.word	20
+LOCAL(m15):
+	.word	-15
+LOCAL(m31):
+	.word	-31
+LOCAL(xff):
+	.word	0xff
+
+	.balign	4
+LOCAL(0xffff0000): .long 0xffff0000
+
+	! calculate a (DBL0H:DBL0L) * b (DBL1H:DBL1L)
+	.balign	4
+GLOBAL(muldf3):
+	mov.l	LOCAL(xfff00000),r3
+	mov	DBL1H,r0
+	dmulu.l	DBL0L,DBL1L
+	mov.l	LOCAL(x7fe00000),r1
+	sub	r3,r0
+	mov.l	DBL0H,@-r15
+	sub	r3,DBL0H
+	tst	r1,DBL0H
+	or	r3,DBL0H
+	mov.l	LOCAL(x001fffff),r2
+	bt	LOCAL(inf_nan_denorm_or_zero_a)
+	tst	r1,r0
+	or	r3,r0		! r0:DBL1L    := b fraction ; u12.52
+	bt	LOCAL(inf_nan_denorm_or_zero_b) ! T clear on fall-through
+LOCAL(arg_denorm_done):
+	and	r2,r0		! r0:DBL1L    := b fraction ; u12.52
+	sts	macl,r3
+	sts	mach,r1
+	dmulu.l	DBL0L,r0
+	and	r2,DBL0H	! DBL0H:DBL0L := a fraction ; u12.52
+	mov.l	r8,@-r15
+	mov	#0,DBL0L
+	mov.l	r9,@-r15
+	sts	macl,r2
+	sts	mach,r8
+	dmulu.l	DBL0H,DBL1L
+	addc	r1,r2
+
+	addc	DBL0L,r8	! add T; clears T
+
+	sts	macl,r1
+	sts	mach,DBL1L
+	dmulu.l	DBL0H,r0
+	addc	r1,r2
+	mov.l	LOCAL(x7ff00000),DBL0H
+	addc	DBL1L,r8	! clears T
+	mov.l	@(8,r15),DBL1L	! a sign/exp w/fraction
+	sts	macl,DBLRL
+	sts	mach,DBLRH
+	and	DBL0H,DBL1L	! a exponent
+	mov.w	LOCAL(x200),r9
+	addc	r8,DBLRL
+	mov.l	LOCAL(x3ff00000),r8	! bias
+	addc	DBL0L,DBLRH	! add T
+	cmp/hi	DBL0L,r3	! 32 guard bits -> sticky: T := r3 != 0
+	movt	r3
+	tst	r9,DBLRH	! T := fraction < 2
+	or	r3,r2		! DBLRH:DBLRL:r2 := result fraction; u24.72
+	bt/s	LOCAL(shll12)
+	sub	r8,DBL1L
+	mov.l	LOCAL(x002fffff),r8
+	and	DBL1H,DBL0H	! b exponent
+	mov.l	LOCAL(x00100000),r9
+	add	DBL0H,DBL1L ! result exponent - 1
+	tst	r8,r2
+	mov.w	LOCAL(m20),r8
+	subc	DBL0L,r9
+	addc	r2,r9 ! r2 value is still needed for denormal rounding
+	mov.w	LOCAL(d11),DBL0L
+	rotcr	r9
+	clrt
+	shld	r8,r9
+	mov.w	LOCAL(m21),r8
+	mov	DBLRL,r3
+	shld	DBL0L,DBLRL
+	addc	r9,DBLRL
+	mov.l	@r15+,r9
+	shld	r8,r3
+	mov.l	@r15+,r8
+	shld	DBL0L,DBLRH
+	mov.l	@r15+,DBL0H
+	addc	r3,DBLRH
+	mov.l	LOCAL(x7ff00000),DBL0L
+	add	DBL1L,DBLRH	! implicit 1 adjusts exponent
+	mov.l	LOCAL(xffe00000),r3
+	cmp/hs	DBL0L,DBLRH
+	add	DBLRH,DBLRH
+	bt	LOCAL(ill_exp_11)
+	tst	r3,DBLRH
+	bt	LOCAL(denorm_exp0_11)
+	div0s	DBL0H,DBL1H
+	rts
+	rotcr	DBLRH
+
+
+LOCAL(shll12):
+	mov.l	LOCAL(x0017ffff),r8
+	extu.b	DBLRH,DBLRH	! remove implicit 1.
+	mov.l	LOCAL(x00080000),r9
+	and	DBL1H,DBL0H	! b exponent
+	add	DBL0H,DBL1L	! result exponent
+	tst	r8,r2		! rounding adjust for lower guard ...
+	mov.w	LOCAL(m19),r8
+	subc	DBL0L,r9	! ... bits and round to even; clear T
+	addc	r2,r9 ! r2 value is still needed for denormal rounding
+	mov.w	LOCAL(d12),DBL0L
+	rotcr	r9
+	clrt
+	shld	r8,r9
+	mov.w	LOCAL(m20),r8
+	mov	DBLRL,r3
+	shld	DBL0L,DBLRL
+	addc	r9,DBLRL
+	mov.l	@r15+,r9
+	shld	r8,r3
+	mov.l	@r15+,r8
+	shld	DBL0L,DBLRH
+	mov.l	LOCAL(x7ff00000),DBL0L
+	addc	r3,DBLRH
+	mov.l	@r15+,DBL0H
+	add	DBL1L,DBLRH
+	mov.l	LOCAL(xffe00000),r3
+	cmp/hs	DBL0L,DBLRH
+	add	DBLRH,DBLRH
+	bt	LOCAL(ill_exp_12)
+	tst	r3,DBLRH
+	bt	LOCAL(denorm_exp0_12)
+LOCAL(insert_sign):
+	div0s	DBL0H,DBL1H
+	rts
+	rotcr	DBLRH
+
+LOCAL(overflow):
+	mov	r3,DBLRH
+	mov	#0,DBLRL
+	bra	LOCAL(insert_sign)
+	mov.l	@r15+,r8
+
+LOCAL(denorm_exp0_11):
+	mov.l	r8,@-r15
+	mov	#-21,r8
+	mov.l	r9,@-r15
+	bra	LOCAL(denorm)
+	mov	#-2,DBL1L	! one for denormal, and one for sticky bit
+
+LOCAL(ill_exp_11):
+	mov	DBL1H,DBL1L
+	and	r3,DBL0L	! 0x7fe00000
+	add	DBL1L,DBL1L
+	mov.l	r8,@-r15
+	cmp/hi	DBL1L,DBL0L	! check if exp a was large
+	mov	#-20,DBL0L
+	bf	LOCAL(overflow)
+	mov	#-21,r8
+	mov	DBLRH,DBL1L
+	rotcr	DBL1L		! shift in negative sign
+	mov.l	r9,@-r15
+	shad	DBL0L,DBL1L	! exponent ; s32
+	bra	LOCAL(denorm)
+	add	#-2,DBL1L	! add one for denormal, and one for sticky bit
+
+LOCAL(denorm_exp0_12):
+	mov.l	r8,@-r15
+	mov	#-20,r8
+	mov.l	r9,@-r15
+	bra	LOCAL(denorm)
+	mov	#-2,DBL1L	! one for denormal, and one for sticky bit
+
+	.balign 4		! also aligns LOCAL(denorm)
+LOCAL(ill_exp_12):
+	and	r3,DBL0L	! 0x7fe00000
+	mov	DBL1H,DBL1L
+	add	DBL1L,DBL1L
+	mov.l	r8,@-r15
+	cmp/hi	DBL1L,DBL0L	! check if exp a was large
+	bf	LOCAL(overflow)
+	mov	DBLRH,DBL1L
+	rotcr	DBL1L		! shift in negative sign
+	mov	#-20,r8
+	shad	r8,DBL1L	! exponent ; s32
+	mov.l	r9,@-r15
+	add	#-2,DBL1L	! add one for denormal, and one for sticky bit
+LOCAL(denorm):
+	not	r3,r9		! 0x001fffff
+	mov.l	r10,@-r15
+	mov	r2,r10
+	shld	r8,r10	! 11 or 12 lower bit valid
+	and	r9,DBLRH ! Mask away vestiges of exponent.
+	add	#32,r8
+	sub	r3,DBLRH ! Make leading 1 explicit.
+	shld	r8,r2	! r10:r2 := unrounded result lowpart
+	shlr	DBLRH	! compensate for doubling at end of normal code
+	sub	DBLRL,r10	! reconstruct effect of previous rounding
+	exts.b	r10,r9
+	shad	r3,r10	! sign extension
+	mov	#0,r3
+	clrt
+	addc	r9,DBLRL	! Undo previous rounding.
+	mov.w	LOCAL(m32),r9
+	addc	r10,DBLRH
+	cmp/hi	r3,r2
+	rotcl	DBLRL	! fit in the rest of r2 as a sticky bit.
+	mov.l	@r15+,r10
+	rotcl	DBLRH
+	cmp/ge	r9,DBL1L
+	bt	LOCAL(small_norm_shift)
+	cmp/hi	r3,DBLRL
+	add	#32,DBL1L
+	movt	DBLRL
+	cmp/gt	r9,DBL1L
+	or	DBLRH,DBLRL
+	bt/s	LOCAL(small_norm_shift)
+	mov	r3,DBLRH
+	mov	r3,DBLRL	! exponent too negative to shift - return zero
+	mov.l	@r15+,r9
+	mov.l	@r15+,r8
+	div0s	DBL0H,DBL1H
+	rts
+	rotcr	DBLRH
+	.balign	4
+LOCAL(small_norm_shift):
+	mov	DBLRL,r2	! stash away guard bits
+	shld	DBL1L,DBLRL
+	mov	DBLRH,DBL0L
+	shld	DBL1L,DBLRH
+	mov.l	LOCAL(x7fffffff),r9
+	add	#32,DBL1L
+	shld	DBL1L,r2
+	shld	DBL1L,DBL0L
+	or	DBL0L,DBLRL
+	shlr	DBL0L
+	addc	r2,r9
+	mov.l	@r15+,r9
+	mov.l	@r15+,r8
+	addc	r3,DBLRL
+	addc	r3,DBLRH
+	div0s	DBL0H,DBL1H
+	add	DBLRH,DBLRH
+	rts
+	rotcr	DBLRH
+
+
+LOCAL(x200):
+	.word 0x200
+LOCAL(m19):
+	.word	-19
+LOCAL(m20):
+	.word	-20
+LOCAL(m21):
+	.word	-21
+LOCAL(m32):
+	.word	-32
+LOCAL(d11):
+	.word	11
+LOCAL(d12):
+	.word	12
+	.balign	4
+LOCAL(x60000000):
+	.long	0x60000000
+LOCAL(c__clz_tab):
+#ifdef __pic__
+	.long	GLOBAL(clz_tab) - .
+#else
+	.long	GLOBAL(clz_tab)
+#endif
+LOCAL(xfff00000):
+	.long	0xfff00000
+LOCAL(x7fffffff):
+	.long	0x7fffffff
+LOCAL(x00100000):
+	.long	0x00100000
+LOCAL(x7fe00000):
+	.long	0x7fe00000
+LOCAL(x001fffff):
+	.long	0x001fffff
+LOCAL(x7ff00000):
+	.long	0x7ff00000
+LOCAL(x3ff00000):
+	.long	0x3ff00000
+LOCAL(x002fffff):
+	.long	0x002fffff
+LOCAL(xffe00000):
+	.long	0xffe00000
+LOCAL(x0017ffff):
+	.long	0x0017ffff
+LOCAL(x00080000):
+	.long	0x00080000
+ENDFUNC(GLOBAL(muldf3))
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/m3/mulsf3.S gcc-4.5.0/gcc/config/sh/IEEE-754/m3/mulsf3.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/m3/mulsf3.S	1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/m3/mulsf3.S	2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,246 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+! mulsf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+
+	.balign 4
+	.global GLOBAL(mulsf3)
+	FUNC(GLOBAL(mulsf3))
+GLOBAL(mulsf3):
+	mov.l	LOCAL(x7f800000),r1
+	not	r4,r2
+	mov	r4,r3
+	not	r5,r0
+	tst	r1,r2
+	or	r1,r3
+	bt/s	LOCAL(inf_nan_arg0)
+	 tst	r1,r0
+	bt	LOCAL(inf_nan_arg1)
+	tst	r1,r5
+	mov	r1,r2
+	shll8	r3
+	or	r5,r1
+	bt/s	LOCAL(zero_denorm_arg1)
+	 shll8	r1
+	tst	r2,r4
+	bt	LOCAL(zero_denorm_arg0)
+	dmulu.l	r3,r1
+	mov	r4,r0
+	and	r2,r0
+LOCAL(arg_norm):
+	and	r5,r2
+	mov.l	LOCAL(x3f800000),r3
+	sts	mach,r1
+	sub	r3,r0
+	sts	macl,r3
+	add	r2,r0
+	cmp/pz	r1
+	mov.w	LOCAL(x100),r2
+	bf/s	LOCAL(norm_frac)
+	 tst	r3,r3
+	shll2	r1	/* Shift one up, replace leading 1 with 0.  */
+	shlr	r1
+	tst	r3,r3
+LOCAL(norm_frac):
+	mov.w	LOCAL(mx80),r3
+	bf	LOCAL(round_frac)
+	tst	r2,r1
+LOCAL(round_frac):
+	mov.l	LOCAL(xff000000),r2
+	subc	r3,r1	/* Even overflow gives right result: exp++, frac=0.  */
+	shlr8	r1
+	add	r1,r0
+	shll	r0
+	bt	LOCAL(ill_exp)
+	tst	r2,r0
+	bt	LOCAL(denorm0)
+	cmp/hs	r2,r0
+	bt	LOCAL(inf)
+LOCAL(insert_sign):
+	div0s	r4,r5
+	rts
+	rotcr	r0
+LOCAL(denorm0):
+	sub	r2,r0
+	bra	LOCAL(insert_sign)
+	 shlr	r0
+LOCAL(zero_denorm_arg1):
+	mov.l	LOCAL(x60000000),r2	/* Check exp0 >= -64	*/
+	add	r1,r1
+	tst	r1,r1	/* arg1 == 0 ? */
+	mov	#0,r0
+	bt	LOCAL(insert_sign) /* argument 1 is zero ==> return 0  */
+	tst	r4,r2
+	bt	LOCAL(insert_sign) /* exp0 < -64  ==> return 0 */
+	mov.l	LOCAL(c__clz_tab),r0
+	mov	r3,r2
+	mov	r1,r3
+	bra	LOCAL(arg_normalize)
+	mov	r2,r1
+LOCAL(zero_denorm_arg0):
+	mov.l	LOCAL(x60000000),r2	/* Check exp1 >= -64	*/
+	add	r3,r3
+	tst	r3,r3	/* arg0 == 0 ? */
+	mov	#0,r0
+	bt	LOCAL(insert_sign) /* argument 0 is zero ==> return 0  */
+	tst	r5,r2
+	bt	LOCAL(insert_sign) /* exp1 < -64  ==> return 0 */
+	mov.l	LOCAL(c__clz_tab),r0
+LOCAL(arg_normalize):
+	mov.l	r7,@-r15
+	extu.w	r3,r7
+	cmp/eq	r3,r7
+	mov.l	LOCAL(xff000000),r7
+	mov	#-8,r2
+	bt	0f
+	tst	r7,r3
+	mov	#-16,r2
+	bt	0f
+	mov	#-24,r2
+0:
+	mov	r3,r7
+	shld	r2,r7
+#ifdef __pic__
+	add	r0,r7
+	mova  LOCAL(c__clz_tab),r0
+#endif
+	mov.b	@(r0,r7),r0
+	add	#32,r2
+	mov	r2,r7
+	mov	#23,r2
+	sub	r0,r7
+	mov.l	LOCAL(x7f800000),r0
+	shld	r7,r3
+	shld	r2,r7
+	mov	r0,r2
+	and	r4,r0
+	sub	r7,r0
+	mov.l	@r15+,r7
+	bra	LOCAL(arg_norm)
+	 dmulu.l	r3,r1
+#if 0 /* This is slightly slower, but could be used if table lookup causes
+         cache thrashing.  */
+	bt	LOCAL(insert_sign) /* exp1 < -64  ==> return 0 */
+	mov.l	LOCAL(xff000000),r2
+	mov	r4,r0
+LOCAL(arg_normalize):
+	tst	r2,r3
+	bf	LOCAL(arg_bit_norm)
+LOCAL(arg_byte_loop):
+	tst	r2,r3
+	add	r2,r0
+	shll8	r3
+	bt	LOCAL(arg_byte_loop)
+	add	r4,r0
+LOCAL(arg_bit_norm):
+	mov.l	LOCAL(x7f800000),r2
+	rotl	r3
+LOCAL(arg_bit_loop):
+	add	r2,r0
+	bf/s	LOCAL(arg_bit_loop)
+	 rotl	r3
+	rotr	r3
+	rotr	r3
+	sub	r2,r0
+	bra	LOCAL(arg_norm)
+	 dmulu.l	r3,r1
+#endif /* 0 */
+LOCAL(inf):
+	bra	LOCAL(insert_sign)
+	 mov	r2,r0
+LOCAL(inf_nan_arg0):
+	bt	LOCAL(inf_nan_both)
+	add	r0,r0
+	cmp/eq	#-1,r0	/* arg1 zero? -> NAN */
+	bt	LOCAL(insert_sign)
+	mov	r4,r0
+LOCAL(inf_insert_sign):
+	bra	LOCAL(insert_sign)
+	 add	r0,r0
+LOCAL(inf_nan_both):
+	mov	r4,r0
+	bra	LOCAL(inf_insert_sign)
+	 or	r5,r0
+LOCAL(inf_nan_arg1):
+	mov	r2,r0
+	add	r0,r0
+	cmp/eq	#-1,r0	/* arg0 zero? */
+	bt	LOCAL(insert_sign)
+	bra	LOCAL(inf_insert_sign)
+	 mov	r5,r0
+LOCAL(ill_exp):
+	cmp/pz	r0
+	mov	#-24,r3
+	bt	LOCAL(inf)
+	add	r1,r1
+	mov	r0,r2
+	sub	r1,r2	! remove fraction to get back pre-rounding exponent.
+	sts	mach,r0
+	sts	macl,r1
+	shad	r3,r2
+	mov	r0,r3
+	shld	r2,r0
+	add	#32,r2
+	cmp/pz	r2
+	shld	r2,r3
+	bf	LOCAL(zero)
+	or	r1,r3
+	mov	#-1,r1
+	tst	r3,r3
+	mov.w	LOCAL(x100),r3
+	bf/s	LOCAL(denorm_round_up)
+	mov	#-0x80,r1
+	tst	r3,r0
+LOCAL(denorm_round_up):
+	mov	#-7,r3
+	subc	r1,r0
+	bra	LOCAL(insert_sign)
+	 shld	r3,r0
+LOCAL(zero):
+	bra	LOCAL(insert_sign)
+	 mov #0,r0
+LOCAL(x100):
+	.word	0x100
+LOCAL(mx80):
+	.word	-0x80
+	.balign	4
+LOCAL(x7f800000):
+	.long 0x7f800000
+LOCAL(x3f800000):
+	.long 0x3f800000
+LOCAL(xff000000):
+	.long	0xff000000
+LOCAL(x60000000):
+	.long	0x60000000
+LOCAL(c__clz_tab):
+#ifdef __pic__
+	.long	GLOBAL(clz_tab) - .
+#else
+	.long	GLOBAL(clz_tab)
+#endif
+	ENDFUNC(GLOBAL(mulsf3))
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/muldf3.S gcc-4.5.0/gcc/config/sh/IEEE-754/muldf3.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/muldf3.S	1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/muldf3.S	2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,601 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!multiplication of two double precision floating point numbers
+!Author:Aanchal Khanna
+!SH1 Support / Simplifications: Joern Rennecke
+!
+!Entry:
+!r4,r5:operand 1
+!
+!r6,r7:operand 2
+!
+!Exit:
+!r0,r1:result
+!
+!Notes: argument 1 is passed in regs r4 and r5, and argument 2 is passed in
+!regs r6 and r7; the result is returned in regs r0 and r1.  Operand 1 is
+!referred to as op1 and operand 2 as op2.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
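+
+! In C terms, this implements (a sketch only; GLOBAL() supplies libgcc's
+! usual name decoration):
+!	double __muldf3 (double a, double b);
+! with a in r4:r5, b in r6:r7 and the result in r0:r1, as noted above.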
+	.text
+	.align	5	
+	.global	GLOBAL (muldf3)
+	FUNC (GLOBAL (muldf3))
+
+GLOBAL (muldf3):
+
+#ifdef  __LITTLE_ENDIAN__
+        mov     r4,r1
+        mov     r5,r4
+        mov     r1,r5
+
+        mov     r6,r1
+        mov     r7,r6
+        mov     r1,r7
+#endif
+	mov.l	.L_mask_sign,r0
+	mov	r4,r2
+
+	and	r0,r2		
+	mov	#0,r1
+
+	shll	r4
+	and	r6,r0		
+	
+	xor     r2,r0		!r0 contains the result's sign bit
+	shlr	r4
+
+	mov.l   .L_inf,r2
+	shll	r6
+
+	mov	r4,r3
+	shlr	r6
+	
+.L_chk_a_inv:
+	!chk if op1 is Inf/NaN
+	and	r2,r3
+	mov.l	r8,@-r15
+
+	cmp/eq	r3,r2
+	mov.l	.L_mask_high_mant,r8
+
+	mov	r2,r3
+	bf	.L_chk_b_inv
+
+	mov	r8,r3
+	and	r4,r8
+
+	cmp/hi  r1,r8		
+	bt	.L_return_a	!op1 NaN, return op1
+
+	cmp/hi  r1,r5	
+	mov	r2,r8
+
+	bt      .L_return_a	!op1 NaN, return op1
+	and	r6,r8
+
+	cmp/eq	r8,r2		
+	and	r6,r3
+
+	bt      .L_b_inv
+	cmp/eq	r1,r6		
+
+	bf	.L_return_a	!op1 Inf, op2 normal number, return op1
+	cmp/eq	r1,r7
+
+	bf	.L_return_a	!op1 Inf, op2 normal number, return op1
+	mov.l   @r15+,r8	
+
+	rts
+	mov	#-1,DBLRH	!op1=Inf, op2=0, return NaN
+
+.L_b_inv:
+	!op2 is NaN/Inf
+	cmp/hi	r1,r7
+	mov	r1,r2
+
+	mov	r5,r1
+	bt	.L_return_b	!op2=NaN,return op2
+
+	cmp/hi	r2,r6
+	or	r4,r0
+
+	bt	.L_return_b	!op2=NaN,return op2
+	mov.l   @r15+,r8
+
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+
+	rts			!op1=Inf,op2=Inf,return Inf with sign
+	nop
+
+.L_chk_b_inv:
+	!Chk if op2 is NaN/Inf
+	and	r6,r2
+	cmp/eq	r3,r2
+
+	bf	.L_chk_a_for_zero
+	and	r6,r8
+
+	cmp/hi	r1,r8
+	bt	.L_return_b	 !op2=NaN,return op2
+
+	cmp/hi	r1,r7
+	bt	.L_return_b	 !op2=NaN,return op2
+
+	cmp/eq	r5,r1
+	bf      .L_return_b	 !op1=normal number,op2=Inf,return Inf
+
+	mov	r7,r1
+	cmp/eq	r4,r1
+
+	bf	.L_return_b	/* op1=normal number, op2=Inf,return Inf */
+	mov.l   @r15+,r8
+
+	rts
+	mov	#-1,DBLRH	!op1=0,op2=Inf,return NaN
+
+.L_return_a:
+	mov	r5,r1
+	or	r4,r0
+
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+
+	rts
+	mov.l   @r15+,r8
+
+.L_return_b:
+	mov	r7,r1
+	or	r6,r0	
+	
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+
+	rts
+	mov.l	@r15+,r8
+	
+.L_chk_a_for_zero:
+	!Chk if op1 is zero
+	cmp/eq	r1,r4
+	bf	.L_chk_b_for_zero
+	
+	cmp/eq	r1,r5
+	bf	.L_chk_b_for_zero
+
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+	rts
+	mov.l	@r15+,r8
+
+.L_chk_b_for_zero:
+	!op1=0,chk if op2 is zero
+        cmp/eq  r1,r6
+        mov	r1,r3
+	
+	mov.l   .L_inf,r1
+	bf      .L_normal_nos
+
+        cmp/eq  r3,r7
+        bf      .L_normal_nos
+
+	mov	r3,r1
+	mov.l   @r15+,r8
+
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+	rts
+	nop
+
+.L_normal_nos:
+	!op1 and op2 are normal nos
+	mov.l	r9,@-r15
+	mov	r4,r3
+
+	mov     #-20,r9	
+	and	r1,r3	
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r9,r2
+#else
+        SHLR20 (r2)
+#endif
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r9,r3
+#else
+        SHLR20 (r3)
+#endif
+	cmp/pl	r3
+
+	bf	.L_norm_a	!normalize op1
+.L_chk_b:	
+	cmp/pl	r2
+	bf	.L_norm_b	!normalize op2
+
+.L_mul1:
+	add	r3,r2
+	mov.l  .L_1023,r1
+	
+	!resultant exponent in r2
+	add     r1,r2
+	mov.l   .L_2047,r1	
+
+	!Chk the exponent for overflow
+	cmp/ge	r1,r2
+	and     r8,r4
+
+	bt	.L_return_inf
+	mov.l	.L_imp_bit,r1
+	
+	or	r1,r4		
+	and	r8,r6
+
+	or	r1,r6
+	clrt
+
+	!multiplying the mantissas
+	DMULU_SAVE
+	DMULUL	(r7,r5,r1) 	!bits 0-31 of product 	
+
+	DMULUH	(r3)
+	
+	DMULUL	(r4,r7,r8)
+
+	addc	r3,r8
+
+	DMULUH	(r3)
+
+	movt	r9
+	clrt
+
+	DMULUL	(r5,r6,r7)
+
+	addc	r7,r8		!bits 63-32 of product
+
+	movt	r7
+	add	r7,r9
+
+	DMULUH	(r7)
+
+	add	r7,r3
+
+	add	r9,r3
+	clrt
+
+	DMULUL	(r4,r6,r7)
+
+	addc	r7,r3		!bits 64-95 of product
+
+	DMULUH	(r7)
+	DMULU_RESTORE
+	
+	mov	#0,r5
+	addc	r5,r7		!bits 96-105 of product
+
+	cmp/eq	r5,r1
+	mov     #1,r4
+
+	bt	.L_skip
+	or	r4,r8
+.L_skip:
+	mov.l   .L_106_bit,r4
+	mov	r8,r9
+
+.L_chk_extra_msb:
+	!chk if extra MSB is generated
+	and     r7,r4
+	cmp/eq	r5,r4
+
+	mov     #12,r4
+	SL(bf,	.L_shift_rt_by_1,
+	 mov     #31,r5)
+	
+.L_pack_mantissa:
+	!scale the mantissa to 53 bits
+	mov	#-19,r6
+	mov.l	.L_mask_high_mant,r5
+
+        SHLRN (19, r6, r8)
+
+	and	r3,r5
+
+	shlr	r8
+	movt	r1
+
+        SHLLN (12, r4, r5)
+
+	add	#-1,r6
+
+	or	r5,r8		!lower bits of resulting mantissa
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r6,r3
+#else
+        SHLR20 (r3)
+#endif
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r4,r7
+#else
+        SHLL12 (r7)
+#endif
+	clrt
+
+	or	r7,r3		!higher bits of resulting mantissa
+	mov     #0,r7
+
+	!chk the exponent for underflow
+	cmp/ge	r2,r7
+	bt	.L_underflow
+
+	addc    r1,r8           !rounding
+	mov	r8,r1
+
+	addc	r7,r3		!rounding
+	mov.l	.L_mask_22_bit,r5
+
+	and	r3,r5
+	!chk if extra msb is generated after rounding
+	cmp/eq	r7,r5
+
+	mov.l	.L_mask_high_mant,r8
+	bt	.L_pack_result
+
+	add	#1,r2
+	mov.l	.L_2047,r6
+
+	cmp/ge	r6,r2
+
+	bt	.L_return_inf
+	shlr	r3
+
+	rotcr	r1
+
+.L_pack_result:
+	!pack the result, r2=exponent, r3=higher mantissa, r1=lower mantissa
+	!r0=sign bit
+	mov	#20,r6
+	and	r8,r3
+	
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r6,r2
+#else
+        SHLL20 (r2)
+#endif
+	or	r3,r0
+	
+	or      r2,r0
+	mov.l   @r15+,r9
+
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+	rts
+	mov.l   @r15+,r8
+
+.L_norm_a:
+	!normalize op1
+	shll	r5
+	mov.l	.L_imp_bit,r1
+
+	rotcl	r4
+	add	#-1,r3
+
+	tst	r1,r4
+	bt	.L_norm_a
+
+	bra	.L_chk_b
+	add	#1,r3
+
+.L_norm_b:
+	!normalize op2
+        shll    r7
+        mov.l   .L_imp_bit,r1
+
+        rotcl   r6
+        add     #-1,r2
+
+        tst     r1,r6
+        bt      .L_norm_b
+
+        bra     .L_mul1
+        add     #1,r2
+
+.L_shift_rt_by_1:
+	!adjust the extra msb
+
+	add     #1,r2           !add 1 to exponent
+	mov.l	.L_2047,r6
+
+	cmp/ge	r6,r2
+	mov	#20,r6
+
+	bt	.L_return_inf
+	shlr	r7		!r7 contains bit 96-105 of product
+
+	rotcr	r3		!r3 contains bit 64-95 of product
+
+	rotcr	r8		!r8 contains bit 32-63 of product
+	bra	.L_pack_mantissa
+
+	rotcr	r1		!r1 contains bit 31-0 of product
+
+.L_return_inf:
+	!return Inf
+	mov.l	.L_inf,r2
+	mov     #0,r1
+
+	or	r2,r0
+	mov.l   @r15+,r9
+
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+	rts
+	mov.l   @r15+,r8
+	
+.L_underflow:
+	!check if the result needs to be denormalized
+	mov	#-53,r1
+	add	#1,r2
+
+	cmp/gt	r2,r1
+	mov	#32,r4
+
+	add	#-2,r2
+	bt	.L_return_zero
+
+	add	r2,r4
+	mov	r7,r1
+	
+	cmp/ge	r7,r4
+	mov	r2,r6
+
+	mov	#-54,r2
+	bt	.L_denorm
+
+	mov	#-32,r6
+	
+.L_denorm:
+	!denormalize the result
+	shlr	r8
+	rotcr	r1	
+
+	shll	r8
+	add	#1,r6
+
+	shlr	r3
+	rotcr	r8
+
+	cmp/eq	r7,r6
+	bf	.L_denorm
+
+	mov	r4,r6
+	cmp/eq	r2,r4
+
+	bt	.L_break
+	mov	r7,r5
+
+	cmp/gt	r6,r7
+	bf	.L_break
+
+	mov	r2,r4
+	mov	r1,r5
+
+	mov	r7,r1
+	bt	.L_denorm
+
+.L_break:
+	mov	#0,r2
+
+	cmp/gt	r1,r2
+
+	addc	r2,r8
+	mov.l	.L_comp_1,r4
+	
+	addc	r7,r3
+	or	r3,r0
+
+	cmp/eq	r9,r7
+	bf	.L_return
+
+	cmp/eq	r7,r5
+	mov.l	.L_mask_sign,r6
+
+	bf	.L_return
+	cmp/eq	r1,r6
+	
+	bf	.L_return
+	and	r4,r8
+
+.L_return:
+	mov.l	@r15+,r9
+	mov	r8,r1
+
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+	rts
+	mov.l   @r15+,r8
+
+.L_return_zero:
+	mov.l	@r15+,r9
+	mov	r7,r1
+
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+
+	rts
+	mov.l	@r15+,r8
+
+	.align	2
+
+.L_mask_high_mant:
+	.long	0x000fffff
+.L_inf:
+	.long	0x7ff00000	
+.L_mask_sign:
+	.long	0x80000000
+.L_1023:
+	.long	-1023
+.L_2047:
+	.long	2047
+.L_imp_bit:
+	.long	0x00100000
+.L_mask_22_bit:
+	.long	0x00200000
+.L_106_bit:
+	.long	0x00000200
+.L_comp_1:
+	.long	0xfffffffe
+
+ENDFUNC (GLOBAL (muldf3))
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/IEEE-754/mulsf3.S gcc-4.5.0/gcc/config/sh/IEEE-754/mulsf3.S
--- gcc-4.5.0/gcc/config/sh/IEEE-754/mulsf3.S	1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/IEEE-754/mulsf3.S	2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,352 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Routine for multiplying two floating point numbers
+
+! Author: Rakesh Kumar
+
+! Arguments: r4 and r5
+! Result: r0
+
+! The arguments are referred to as op1 and op2
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
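+
+! In C terms (a sketch, following the same naming convention):
+!	float __mulsf3 (float a, float b);
+! with a in r4, b in r5 and the result in r0.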
+
+        .text
+        .align 5
+        .global GLOBAL (mulsf3)
+        FUNC (GLOBAL (mulsf3))
+
+GLOBAL (mulsf3):
+	! Extract the sign bits
+	mov.l	.L_sign,r3
+	mov	r3,r0
+
+	and	r4,r3		! sign bit for op1
+	mov.l	.L_sign_mask,r6
+
+	! Mask out the sign bit from op1 and op2
+	and	r5,r0		! sign bit for op2
+	mov.l	.L_inf,r2
+
+	and	r6,r4
+	xor	r3,r0		! Final sign in r0
+
+	and	r6,r5
+	tst	r4,r4
+
+	! Check for zero
+	mov	r5,r7
+	! Check op1 for zero
+	SL(bt,	.L_op1_zero,
+	 mov	r4,r6)
+
+	tst	r5,r5
+	bt	.L_op2_zero	! op2 is zero
+
+	! Extract the exponents
+	and	r2,r6		! Exponent of op1
+	cmp/eq	r2,r6
+
+	and	r2,r7
+	bt	.L_inv_op1	! op1 is NaN or Inf
+
+	mov.l	.L_mant,r3
+	cmp/eq	r2,r7
+
+	and	r3,r4	! Mantissa of op1
+	bt	.L_ret_op2	! op2 is NaN or Inf
+
+	and	r3,r5	! Mantissa of op2
+
+	mov	#-23,r3
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+	SHLR23 (r6)
+	SHLR23 (r7)
+#else
+	shld	r3,r6
+	shld	r3,r7
+#endif
+	! Check for denormals
+	mov.l	.L_24bit,r3
+	tst	r6,r6
+
+	bt	.L_norm_op1	! op1 is denormal
+	add	#-127,r6	! Unbias op1's exp
+
+	tst	r7,r7
+	bt	.L_norm_op2	! op2 is denormal
+
+	add	#-127,r7	! Unbias op2's exp
+
+.L_multiply:
+	add	r6,r7	! Final exponent in r7
+	mov.l	.L_24bit,r1
+
+	! set 24th bit of mantissas
+	mov	#127,r3
+	or	r1,r4
+
+	DMULU_SAVE
+
+	! Multiply
+	or	r1,r5
+	DMULUL	(r4,r5,r4)
+
+	DMULUH	(r5)
+
+	DMULU_RESTORE
+
+	mov.l	.L_16bit,r6
+
+	! Check for extra MSB generated
+	tst	r5,r6
+
+	mov.l	.L_255,r1
+	bf	.L_shift_by_1	! Adjust the extra MSB
+	
+! Normalize the result with rounding
+.L_epil:
+	! Bias the exponent
+	add	#127,r7
+	cmp/ge	r1,r7
+	
+	! Check exponent overflow and underflow
+	bt	.L_ret_inf
+
+	cmp/pl	r7
+	bf	.L_denorm
+
+.L_epil_0:
+	mov	#-23,r3
+	shll	r5
+	mov	#0,r6
+
+! Fit resultant mantissa in 24 bits
+! Apply default rounding
+.L_loop_epil_0:
+        tst	r3,r3
+	bt	.L_loop_epil_out
+
+	add	#1,r3
+	shlr	r4
+
+	bra	.L_loop_epil_0
+	rotcr	r6
+
+! Round mantissa
+.L_loop_epil_out:
+	shll8	r5
+	or	r5,r4
+
+	mov.l	.L_mant,r2
+	mov	#23,r3
+
+	! Check last bit shifted out of result
+	tst	r6,r6
+	bt	.L_epil_2
+
+	! Round
+	shll	r6
+	movt	r5
+
+	add	r5,r4
+
+	! If this is the only ON bit shifted
+	! Round towards LSB = 0
+	tst	r6,r6
+	bf	.L_epil_2
+
+	shlr	r4
+	shll	r4
+
+.L_epil_2:
+	! Rounding may have produced extra MSB.
+	mov.l	.L_25bit,r5
+	tst	r4,r5
+
+	bt	.L_epil_1
+
+	add	#1,r7
+	shlr	r4
+
+.L_epil_1:
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+	SHLL23 (r7)
+#else
+	shld	r3,r7
+#endif
+
+	and	r2,r4
+
+	or	r7,r4
+	rts
+	or	r4,r0
+
+.L_denorm:
+	mov	#0,r3
+
+.L_den_1:
+	shlr	r5
+	rotcr	r4
+
+	cmp/eq	r3,r7
+	bt	.L_epil_0
+
+	bra	.L_den_1
+	add	#1,r7
+	
+
+! Normalize the first argument
+.L_norm_op1:
+	shll	r4
+	tst	r3,r4
+
+	add	#-1,r6
+	bt	.L_norm_op1
+
+	! The biasing is by 126
+	add	#-126,r6
+	tst	r7,r7
+
+	bt      .L_norm_op2
+
+	bra	.L_multiply
+	add	#-127,r7
+
+! Normalize the second argument
+.L_norm_op2:
+	shll	r5
+	tst	r3,r5
+
+	add	#-1,r7
+	bt	.L_norm_op2
+
+	bra	.L_multiply
+	add	#-126,r7
+
+! op2 is zero. Check op1 for exceptional cases
+.L_op2_zero:
+	mov.l	.L_inf,r2
+	and	r2,r6
+
+	! Check whether op1 is Inf/NaN
+	cmp/eq	r2,r6
+	SL(bf,	.L_ret_op2,
+	 mov	#1,r1)
+
+	! Return NaN
+	rts
+	mov	#-1,r0
+
+! Adjust the extra MSB
+.L_shift_by_1:
+	shlr	r5
+	rotcr	r4
+
+	add	#1,r7		! Show the shift in exponent
+
+	cmp/gt	r3,r7
+	bf	.L_epil
+
+	! The resultant exponent is invalid
+	mov.l	.L_inf,r1
+	rts
+	or	r1,r0
+
+.L_ret_op1:
+	rts
+	or	r4,r0
+
+! op1 is zero. Check op2 for exceptional cases
+.L_op1_zero:
+	mov.l	.L_inf,r2
+	and	r2,r7
+	
+	! Check whether op2 is Inf/NaN
+	cmp/eq	r2,r7
+	SL(bf,	.L_ret_op1,
+	 mov	#1,r1)
+
+	! Return NaN
+	rts
+	mov	#-1,r0
+
+.L_inv_op1:
+	mov.l	.L_mant,r3
+	mov	r4,r6
+
+	and	r3,r6
+	tst	r6,r6
+
+	bf	.L_ret_op1	! op1 is NaN
+	! op1 is not NaN.  It is Inf
+
+	cmp/eq	r2,r7
+	bf	.L_ret_op1	! op2 has a valid exponent
+
+! op2 has an invalid exponent.  It could be Inf, -Inf, or NaN.
+! It doesn't make any difference.
+.L_ret_op2:
+	rts
+	or	r5,r0
+
+.L_ret_inf:
+	rts
+	or	r2,r0
+
+.L_ret_zero:
+	mov	#0,r2
+	rts
+	or	r2,r0
+
+	
+	.align 2
+.L_mant:
+	.long 0x007FFFFF
+
+.L_inf:
+	.long 0x7F800000
+
+.L_24bit:
+	.long 0x00800000
+
+.L_25bit:
+	.long 0x01000000
+
+.L_16bit:
+	.long 0x00008000
+
+.L_sign:
+	.long 0x80000000
+
+.L_sign_mask:
+	.long 0x7FFFFFFF
+
+.L_255:
+	.long 0x000000FF
+
+ENDFUNC (GLOBAL (mulsf3))
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/ieee-754-df.S gcc-4.5.0/gcc/config/sh/ieee-754-df.S
--- gcc-4.5.0/gcc/config/sh/ieee-754-df.S	1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/ieee-754-df.S	2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,789 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+!! libgcc software floating-point routines for Renesas SH /
+!! STMicroelectronics ST40 CPUs
+!! Contributed by J"orn Rennecke joern.rennecke@st.com
+
+#ifndef __SH_FPU_DOUBLE__
+
+#include "lib1funcs.h"
+#include "insn-constants.h"
+
+/* Double-precision floating-point emulation.
+   We handle NANs, +-infinity, and +-zero.
+   However, we assume that for NANs, the topmost bit of the fraction is set.  */
+
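+/* Names for the high and low words of the double arguments and the
+   result; the word order within a double depends on endianness.  */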
+#ifdef __LITTLE_ENDIAN__
+#define DBL0L r4
+#define DBL0H r5
+#define DBL1L r6
+#define DBL1H r7
+#define DBLRL r0
+#define DBLRH r1
+#else
+#define DBL0L r5
+#define DBL0H r4
+#define DBL1L r7
+#define DBL1H r6
+#define DBLRL r1
+#define DBLRH r0
+#endif
+
+#ifdef __SH_FPU_ANY__
+#define RETURN_R0_MAIN
+#define RETURN_R0 bra LOCAL(return_r0)
+#define RETURN_FR0
+LOCAL(return_r0): \
+ lds r0,fpul; \
+ rts; \
+ fsts fpul,fr0
+#define ARG_TO_R4 \
+ flds fr4,fpul; \
+ sts fpul,r4
+#else /* ! __SH_FPU_ANY__ */
+#define RETURN_R0_MAIN rts
+#define RETURN_R0 rts
+#define RETURN_FR0
+#define ARG_TO_R4
+#endif /* ! __SH_FPU_ANY__ */
+
+#ifdef L_nedf2
+/* -ffinite-math-only -mb inline version, T := r4:DF == r6:DF
+	cmp/eq	r5,r7
+	mov	r4,r0
+	bf	0f
+	cmp/eq	r4,r6
+	bt	0f
+	or	r6,r0
+	add	r0,r0
+	or	r5,r0
+	tst	r0,r0
+	0:			*/
+	.balign 4
+	.global GLOBAL(nedf2)
+	HIDDEN_FUNC(GLOBAL(nedf2))
+GLOBAL(nedf2):
+	cmp/eq	DBL0L,DBL1L
+	mov.l   LOCAL(c_DF_NAN_MASK),r1
+	bf LOCAL(ne)
+	cmp/eq	DBL0H,DBL1H
+	not	DBL0H,r0
+	bt	LOCAL(check_nan)
+	mov	DBL0H,r0
+	or	DBL1H,r0
+	add	r0,r0
+	rts
+	or	DBL0L,r0
+LOCAL(check_nan):
+	tst	r1,r0
+	rts
+	movt	r0
+LOCAL(ne):
+	rts
+	mov #1,r0
+	.balign 4
+LOCAL(c_DF_NAN_MASK):
+	.long DF_NAN_MASK
+	ENDFUNC(GLOBAL(nedf2))
+#endif /* L_nedf2 */
+
+#ifdef L_unorddf2
+	.balign 4
+	.global GLOBAL(unorddf2)
+	HIDDEN_FUNC(GLOBAL(unorddf2))
+GLOBAL(unorddf2):
+	mov.l	LOCAL(c_DF_NAN_MASK),r1
+	not	DBL0H,r0
+	tst	r1,r0
+	not	r6,r0
+	bt	LOCAL(unord)
+	tst	r1,r0
+LOCAL(unord):
+	rts
+	movt	r0
+	.balign	4
+LOCAL(c_DF_NAN_MASK):
+	.long DF_NAN_MASK
+	ENDFUNC(GLOBAL(unorddf2))
+#endif /* L_unorddf2 */
+
+#if defined(L_gtdf2t) || defined(L_gtdf2t_trap)
+#ifdef L_gtdf2t
+#define fun_label GLOBAL(gtdf2t)
+#else
+#define fun_label GLOBAL(gtdf2t_trap)
+#endif
+	.balign 4
+	.global fun_label
+	HIDDEN_FUNC(fun_label)
+fun_label:
+	/* If the raw values compare greater, the result is true, unless
+	   either of them is a NaN (but infinity is fine), or both values are
+	   +- zero.  Otherwise, the result is false.  */
+	mov.l	LOCAL(c_DF_NAN_MASK),r1
+	cmp/pz	DBL0H
+	not	DBL1H,r0
+	SLC(bf,	LOCAL(neg),
+	 tst	r1,r0)
+	mov	DBL0H,r0
+	bt	LOCAL(nan) /* return zero if DBL1 is NAN.  */
+	cmp/eq	DBL1H,DBL0H
+	bt	LOCAL(cmp_low)
+	cmp/gt	DBL1H,DBL0H
+	or	DBL1H,r0
+	SLC(bf,	LOCAL(check_nan),
+	 cmp/gt	DBL0H,r1)
+	add	r0,r0
+	bf	LOCAL(nan) /* return zero if DBL0 is NAN.  */
+	or	DBL0L,r0
+	rts
+	or	DBL1L,r0 /* non-zero unless both DBL0 and DBL1 are +-zero.  */
+LOCAL(cmp_low):
+	cmp/hi	DBL1L,DBL0L
+	rts
+	movt	r0
+LOCAL(neg):
+	SLI(tst	r1,r0)
+	bt	LOCAL(nan) /* return zero if DBL1 is NAN.  */
+	cmp/eq	DBL1H,DBL0H
+	SLC(bt,	LOCAL(neg_cmp_low),
+	 cmp/hi	DBL0L,DBL1L)
+	not	DBL0H,r0
+	tst	r1,r0
+	bt	LOCAL(nan) /* return zero if DBL0 is NAN.  */
+	cmp/hi	DBL0H,DBL1H
+	SLI(rts	!,)
+	SLI(movt r0 !,)
+LOCAL(neg_cmp_low):
+	SLI(cmp/hi	DBL0L,DBL1L)
+	rts
+	movt	r0
+LOCAL(check_nan):
+#ifdef L_gtdf2t
+LOCAL(nan):
+	rts
+	mov	#0,r0
+#else
+	SLI(cmp/gt DBL0H,r1)
+	bf	LOCAL(nan) /* return zero if DBL0 is NAN.  */
+	rts
+	mov	#0,r0
+LOCAL(nan):
+	mov	#0,r0
+	trapa	#0
+#endif
+	.balign	4
+LOCAL(c_DF_NAN_MASK):
+	.long DF_NAN_MASK
+	ENDFUNC(fun_label)
+#endif /* defined(L_gtdf2t) || defined(L_gtdf2t_trap) */
+
+#ifdef L_gedf2f
+	.balign 4
+	.global GLOBAL(gedf2f)
+	HIDDEN_FUNC(GLOBAL(gedf2f))
+GLOBAL(gedf2f):
+	/* If the raw values compare greater or equal, the result is
+	   true, unless either of them is a NaN, or both are the
+	   same infinity.  If both are +-zero, the result is true;
+	   otherwise, it is false.
+	   We use 0 as true and nonzero as false for this function.  */
+	mov.l	LOCAL(c_DF_NAN_MASK),r1
+	cmp/pz	DBL1H
+	not	DBL0H,r0
+	SLC(bf,	LOCAL(neg),
+	 tst	r1,r0)
+	mov	DBL0H,r0
+	bt	LOCAL(nan)
+	cmp/eq	DBL0H,DBL1H
+	bt	LOCAL(cmp_low)
+	cmp/gt	DBL0H,DBL1H
+	or	DBL1H,r0
+	SLC(bf,	LOCAL(check_nan),
+	 cmp/ge	r1,DBL1H)
+	add	r0,r0
+	bt	LOCAL(nan)
+	or	DBL0L,r0
+	rts
+	or	DBL1L,r0
+LOCAL(cmp_low):
+	cmp/hi	DBL0L,DBL1L
+#if defined(L_gedf2f) && defined(DELAYED_BRANCHES)
+LOCAL(nan): LOCAL(check_nan):
+#endif
+	rts
+	movt	r0
+#if defined(L_gedf2f) && ! defined(DELAYED_BRANCHES)
+LOCAL(check_nan):
+	SLI(cmp/ge	r1,DBL1H)
+LOCAL(nan):
+	rts
+	movt	r0
+#elif defined(L_gedf2f_trap)
+LOCAL(check_nan):
+	SLI(cmp/ge	r1,DBL1H)
+	bt	LOCAL(nan)
+	rts
+LOCAL(nan):
+	movt	r0
+	trapa	#0
+#endif /* L_gedf2f_trap */
+LOCAL(neg):
+	SLI(tst	r1,r0)
+	bt	LOCAL(nan)
+	cmp/eq	DBL0H,DBL1H
+	not	DBL1H,r0
+	SLC(bt,	LOCAL(neg_cmp_low),
+	 cmp/hi	DBL1L,DBL0L)
+	tst	r1,r0
+	bt	LOCAL(nan)
+	cmp/hi	DBL1H,DBL0H
+	SLI(rts !,)
+	SLI(movt	r0 !,)
+LOCAL(neg_cmp_low):
+	SLI(cmp/hi	DBL1L,DBL0L)
+	rts
+	movt	r0
+	.balign	4
+LOCAL(c_DF_NAN_MASK):
+	.long DF_NAN_MASK
+	ENDFUNC(GLOBAL(gedf2f))
+#endif /* L_gedf2f */
+
+#ifndef DYN_SHIFT /* SH1 / SH2 code */
+#ifdef L_extendsfdf2
+	.balign 4
+	.global GLOBAL(extendsfdf2)
+	FUNC(GLOBAL(extendsfdf2))
+GLOBAL(extendsfdf2):
+	ARG_TO_R4
+	mov.l	LOCAL(x7f800000),r3
+	mov	r4,DBLRL
+	tst	r3,r4
+	bt	LOCAL(zero_denorm)
+	mov.l	LOCAL(xe0000000),r2
+	rotr	DBLRL
+	rotr	DBLRL
+	rotr	DBLRL
+	and	r2,DBLRL
+	mov	r4,DBLRH
+	not	r4,r2
+	tst	r3,r2
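+	! 0x38000000 == (1023 - 127) << 20: rebias the SF exponent for DF.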
+	mov.l	LOCAL(x38000000),r2
+	bf	0f
+	add	r2,r2	! infinity / NaN adjustment
+0:	shll	DBLRH
+	shlr2	DBLRH
+	shlr2	DBLRH
+	add	DBLRH,DBLRH
+	rotcr	DBLRH
+	rts
+	add	r2,DBLRH
+LOCAL(zero_denorm):
+	mov.l	r4,@-r15
+	add	r4,r4
+	tst	r4,r4
+	bt	LOCAL(zero)
+	shlr8	r3	/* 0x007f8000 */
+	mov.w	LOCAL(x389),r2
+LOCAL(shift_byte):
+	tst	r3,r4
+	shll8	r4
+	SL(bt,	LOCAL(shift_byte),
+	 add	#-8,r2)
+LOCAL(shift_bit):
+	shll	r4
+	SL(bf,	LOCAL(shift_bit),
+	 add	#-1,r2)
+	mov	#0,DBLRL
+	mov	r4,DBLRH
+	mov.l	@r15+,r4
+	shlr8	DBLRH
+	shlr2	DBLRH
+	shlr	DBLRH
+	rotcr	DBLRL
+	cmp/gt	r4,DBLRH	! get sign
+	rotcr	DBLRH
+	rotcr	DBLRL
+	shll16	r2
+	shll8	r2
+	rts
+	add	r2,DBLRH
+LOCAL(zero):
+	mov.l	@r15+,DBLRH
+	rts
+	mov	#0,DBLRL
+LOCAL(x389):	.word 0x389
+	.balign	4
+LOCAL(x7f800000):
+	.long	0x7f800000
+LOCAL(x38000000):
+	.long	0x38000000
+LOCAL(xe0000000):
+	.long	0xe0000000
+	ENDFUNC(GLOBAL(extendsfdf2))
+#endif /* L_extendsfdf2 */
+
+#ifdef L_truncdfsf2
+	.balign 4
+	.global GLOBAL(truncdfsf2)
+	FUNC(GLOBAL(truncdfsf2))
+GLOBAL(truncdfsf2):
+	mov.l	LOCAL(x38000000),r3	! exponent adjustment DF -> SF
+	mov	DBL0H,r1
+	mov.l	LOCAL(x70000000),r2	! mask for out-of-range exponent bits
+	mov	DBL0H,r0
+	mov.l	DBL0L,@-r15
+	sub	r3,r1
+	tst	r2,r1
+	shll8	r0			!
+	shll2	r0			! Isolate highpart fraction.
+	shll2	r0			!
+	bf	LOCAL(ill_exp)
+	shll2	r1
+	mov.l	LOCAL(x2fffffff),r2 /* Fraction lsb | lower guard bits.  */
+	shll2	r1
+	mov.l	LOCAL(xff000000),r3
+	shlr8	r0
+	tst	r2,DBL0L /* Check if msb guard bit wants rounding up.  */
+	shlr16	DBL0L
+	shlr8	DBL0L
+	shlr2	DBL0L
+	SL1(bt,	LOCAL(add_frac),
+	 shlr2	DBL0L)
+	add	#1,DBL0L
+LOCAL(add_frac):
+	add	DBL0L,r0
+	mov.l	LOCAL(x01000000),r2
+	and	r3,r1
+	mov.l	@r15+,DBL0L
+	add	r1,r0
+	tst	r3,r0
+	bt	LOCAL(inf_denorm0)
+	cmp/hs	r3,r0
+LOCAL(denorm_noup_sh1):
+	bt	LOCAL(inf)
+	div0s	DBL0H,r2	/* copy orig. sign into T.  */
+RETURN_R0_MAIN
+	rotcr	r0
+RETURN_FR0
+LOCAL(inf_denorm0):	!  We might need to undo previous rounding.
+	mov.l	LOCAL(x2fffffff),r3 /* Old fraction lsb | lower guard bits.  */
+	tst	r1,r1
+	bf	LOCAL(inf)
+	add	#-1,r0
+	tst	r3,DBL0L /* Check if msb guard bit was rounded up.  */
+	mov.l	LOCAL(x5fffffff),r3 /* Fraction lsb | lower guard bits.  */
+	addc	r2,r0
+	shlr	r0
+	tst	r3,DBL0L /* Check if msb guard bit wants rounding up.  */
+#ifdef DELAYED_BRANCHES
+	bt/s	LOCAL(denorm_noup)
+#else
+	bt	LOCAL(denorm_noup_sh1)
+#endif
+	div0s	DBL0H,r2	/* copy orig. sign into T.  */
+	add	#1,r0
+LOCAL(denorm_noup):
+	RETURN_R0
+	rotcr	r0
+LOCAL(ill_exp):
+	div0s	DBL0H,r1
+	mov.l	LOCAL(x7ff80000),r2
+	add	r1,r1
+	bf	LOCAL(inf_nan)
+	mov.w	LOCAL(m32),r3 /* Handle denormal or zero.  */
+	shlr16	r1
+	exts.w	r1,r1
+	shll2	r1
+	add	r1,r1
+	shlr8	r1
+	exts.w	r1,r1
+	add	#-8,r1	/* Go from 9 to 1 guard bit in MSW.  */
+	cmp/gt	r3,r1
+	mov.l	@r15+,r3 /* DBL0L */
+	bf	LOCAL(zero)
+	mov.l	DBL0L, @-r15
+	shll8	DBL0L
+	rotcr	r0	/* Insert leading 1.  */
+	shlr16	r3
+	shll2	r3
+	add	r3,r3
+	shlr8	r3
+	cmp/pl	DBL0L	/* Check lower 23 guard bits if guard bit 23 is 0.  */
+	addc	r3,r0	/* Assemble fraction with compressed guard bits.  */
+	mov.l	@r15+,DBL0L
+	mov	#0,r2
+	neg	r1,r1
+LOCAL(denorm_loop):
+	shlr	r0
+	rotcl	r2
+	dt	r1
+	bf	LOCAL(denorm_loop)
+	tst	#2,r0
+	rotcl	r0
+	tst	r2,r2
+	rotcl	r0
+	xor	#3,r0
+	add	#3,r0	/* Even overflow gives the correct result.  */
+	shlr2	r0
+	div0s	r0,DBL0H
+	RETURN_R0
+	rotcr	r0
+LOCAL(zero):
+	mov	#0,r0
+	div0s	r0,DBL0H
+	RETURN_R0
+	rotcr	r0
+LOCAL(inf_nan):
+	not	DBL0H,r0
+	tst	r2,r0
+	mov.l	@r15+,DBL0L
+	bf	LOCAL(inf)
+	RETURN_R0
+	mov	#-1,r0	/* NAN */
+LOCAL(inf):	/* r2 must be positive here.  */
+	mov.l	LOCAL(xffe00000),r0
+	div0s	r2,DBL0H
+	RETURN_R0
+	rotcr	r0
+LOCAL(m32):
+	.word	-32
+	.balign	4
+LOCAL(x38000000):
+	.long	0x38000000
+LOCAL(x70000000):
+	.long	0x70000000
+LOCAL(x2fffffff):
+	.long	0x2fffffff
+LOCAL(x01000000):
+	.long	0x01000000
+LOCAL(xff000000):
+	.long	0xff000000
+LOCAL(x5fffffff):
+	.long	0x5fffffff
+LOCAL(x7ff80000):
+	.long	0x7ff80000
+LOCAL(xffe00000):
+	.long	0xffe00000
+	ENDFUNC(GLOBAL(truncdfsf2))
+#endif /*  L_truncdfsf2 */
+#ifdef L_add_sub_df3
+#include "IEEE-754/adddf3.S"
+#endif /* _add_sub_df3 */
+
+#ifdef L_muldf3
+#include "IEEE-754/muldf3.S"
+#endif /* L_muldf3 */
+
+#ifdef L_fixunsdfsi
+#include "IEEE-754/fixunsdfsi.S"
+#endif /* L_fixunsdfsi */
+
+#ifdef L_fixdfsi
+#include "IEEE-754/fixdfsi.S"
+#endif /* L_fixdfsi */
+
+#ifdef L_floatunssidf
+#include "IEEE-754/floatunssidf.S"
+#endif /* L_floatunssidf */
+
+#ifdef L_floatsidf
+#include "IEEE-754/floatsidf.S"
+#endif /* L_floatsidf */
+
+#ifdef L_divdf3
+#include "IEEE-754/divdf3.S"
+#endif /* L_divdf3 */
+#endif /* ! DYN_SHIFT */
+
+/* The actual arithmetic uses dynamic shift.  Supporting SH1 / SH2 here would
+   make this code too hard to maintain, so if you want to add SH1 / SH2
+   support, do it in a separate copy.  */
+#ifdef DYN_SHIFT
+#ifdef L_extendsfdf2
+	.balign 4
+	.global GLOBAL(extendsfdf2)
+	FUNC(GLOBAL(extendsfdf2))
+GLOBAL(extendsfdf2):
+	ARG_TO_R4
+	mov.l	LOCAL(x7f800000),r2
+	mov	#29,r3
+	mov	r4,DBLRL
+	not	r4,DBLRH
+	tst	r2,r4
+	shld	r3,DBLRL
+	bt	LOCAL(zero_denorm)
+	mov	#-3,r3
+	tst	r2,DBLRH
+	mov	r4,DBLRH
+	mov.l	LOCAL(x38000000),r2
+	bt/s	LOCAL(inf_nan)
+	 shll	DBLRH
+	shld	r3,DBLRH
+	rotcr	DBLRH
+	rts
+	add	r2,DBLRH
+	.balign	4
+LOCAL(inf_nan):
+	shld	r3,DBLRH
+	add	r2,r2
+	rotcr	DBLRH
+	rts
+	add	r2,DBLRH
+LOCAL(zero_denorm):
+	mov.l	r4,@-r15
+	add	r4,r4
+	tst	r4,r4
+	extu.w	r4,r2
+	bt	LOCAL(zero)
+	cmp/eq	r4,r2
+	extu.b	r4,r1
+	bf/s	LOCAL(three_bytes)
+	 mov.l	LOCAL(c__clz_tab),r0
+	cmp/eq	r4,r1
+	mov	#22,DBLRH
+	bt	LOCAL(one_byte)
+	shlr8	r2
+	mov	#14,DBLRH
+LOCAL(one_byte):
+#ifdef __pic__
+	add	r0,r2
+	mova  LOCAL(c__clz_tab),r0
+#endif
+	mov.b	@(r0,r2),r2
+	mov	#21,r3
+	mov.w	LOCAL(x0),DBLRL
+	sub	r2,DBLRH
+LOCAL(norm_shift):
+	shld	DBLRH,r4
+	mov.l	@r15+,r2
+	shld	r3,DBLRH
+	mov.l	LOCAL(xb7ffffff),r3
+	add	r4,DBLRH
+	cmp/pz	r2
+	mov	r2,r4
+	rotcr	DBLRH
+	rts
+	sub	r3,DBLRH
+LOCAL(three_bytes):
+	mov	r4,r2
+	shlr16	r2
+#ifdef __pic__
+	add	r0,r2
+	mova  LOCAL(c__clz_tab),r0
+#endif
+	mov.b	@(r0,r2),r2
+	mov	#21,r3
+	mov	#6-32,DBLRH
+	sub	r2,DBLRH
+	mov	r4,DBLRL
+	shld	DBLRH,DBLRL
+	bra	LOCAL(norm_shift)
+	add	#32,DBLRH
+LOCAL(zero):
+	rts	/* DBLRL has already been zeroed above.  */
+	mov.l @r15+,DBLRH
+LOCAL(x0):
+	.word 0
+	.balign	4
+LOCAL(x7f800000):
+	.long	0x7f800000
+LOCAL(x38000000):
+	.long	0x38000000
+LOCAL(xb7ffffff):
+	/* Flip sign back, do exponent adjustment, and remove leading one.  */
+	.long 0x80000000 + 0x38000000 - 1
+LOCAL(c__clz_tab):
+#ifdef __pic__
+	.long	GLOBAL(clz_tab) - .
+#else
+	.long	GLOBAL(clz_tab)
+#endif
+	ENDFUNC(GLOBAL(extendsfdf2))
+#endif /* L_extendsfdf2 */
+
+#ifdef L_truncdfsf2
+	.balign 4
+	.global GLOBAL(truncdfsf2)
+	FUNC(GLOBAL(truncdfsf2))
+GLOBAL(truncdfsf2):
+	mov.l	LOCAL(x38000000),r3
+	mov	DBL0H,r1
+	mov.l	LOCAL(x70000000),r2
+	mov	DBL0H,r0
+	sub	r3,r1
+	mov.l	DBL0L,@-r15
+	tst	r2,r1
+	mov	#12,r3
+	shld	r3,r0			! Isolate highpart fraction.
+	bf	LOCAL(ill_exp)
+	shll2	r1
+	mov.l	LOCAL(x2fffffff),r2 /* Fraction lsb | lower guard bits.  */
+	shll2	r1
+	mov.l	LOCAL(xff000000),r3
+	shlr8	r0
+	tst	r2,DBL0L /* Check if msb guard bit wants rounding up.  */
+	mov	#-28,r2
+	bt/s	LOCAL(add_frac)
+	 shld	r2,DBL0L
+	add	#1,DBL0L
+LOCAL(add_frac):
+	add	DBL0L,r0
+	mov.l	LOCAL(x01000000),r2
+	and	r3,r1
+	mov.l	@r15+,DBL0L
+	add	r1,r0
+	tst	r3,r0
+	bt	LOCAL(inf_denorm0)
+#if 0	// No point checking overflow -> infinity if we don't raise a signal.
+	cmp/hs	r3,r0
+	bt	LOCAL(inf)
+#endif
+	div0s	DBL0H,r2	/* copy orig. sign into T.  */
+	RETURN_R0_MAIN
+	rotcr	r0
+RETURN_FR0
+LOCAL(inf_denorm0):	! We might need to undo previous rounding.
+	mov.l	LOCAL(x2fffffff),r3 /* Old fraction lsb | lower guard bits.  */
+	tst	r1,r1
+	bf	LOCAL(inf)
+	add	#-1,r0
+	tst	r3,DBL0L /* Check if msb guard bit was rounded up.  */
+	mov.l	LOCAL(x5fffffff),r3 /* Fraction lsb | lower guard bits.  */
+	addc	r2,r0
+	shlr	r0
+	tst	r3,DBL0L /* Check if msb guard bit wants rounding up.  */
+	bt/s	LOCAL(denorm_noup)
+	 div0s	DBL0H,r2	/* copy orig. sign into T.  */
+	add	#1,r0
+LOCAL(denorm_noup):
+	RETURN_R0
+	rotcr	r0
+LOCAL(ill_exp):
+	div0s	DBL0H,r1
+	mov.l	LOCAL(x7ff80000),r2
+	add	r1,r1
+	bf	LOCAL(inf_nan)
+	mov.w	LOCAL(m32),r3 /* Handle denormal or zero.  */
+	mov	#-21,r2
+	shad	r2,r1
+	add	#-8,r1	/* Go from 9 to 1 guard bit in MSW.  */
+	cmp/gt	r3,r1
+	mov.l	@r15+,r3 /* DBL0L */
+	bf	LOCAL(zero)
+	mov.l	DBL0L, @-r15
+	shll8	DBL0L
+	rotcr	r0	/* Insert leading 1.  */
+	shld	r2,r3
+	cmp/pl	DBL0L	/* Check lower 23 guard bits if guard bit 23 is 0.  */
+	addc	r3,r0	/* Assemble fraction with compressed guard bits.  */
+	mov	r0,r2
+	shld	r1,r0
+	mov.l	@r15+,DBL0L
+	add	#32,r1
+	shld	r1,r2
+	tst	#2,r0
+	rotcl	r0
+	tst	r2,r2
+	rotcl	r0
+	xor	#3,r0
+	add	#3,r0	/* Even overflow gives the correct result.  */
+	shlr2	r0
+	div0s	r0,DBL0H
+	RETURN_R0
+	rotcr	r0
+LOCAL(zero):
+	mov	#0,r0
+	div0s	r0,DBL0H
+	RETURN_R0
+	rotcr	r0
+LOCAL(inf_nan):
+	not	DBL0H,r0
+	tst	r2,r0
+	mov.l	@r15+,DBL0L
+	bf	LOCAL(inf)
+	RETURN_R0
+	mov	#-1,r0	/* NAN */
+LOCAL(inf):	/* r2 must be positive here.  */
+	mov.l	LOCAL(xffe00000),r0
+	div0s	r2,DBL0H
+	RETURN_R0
+	rotcr	r0
+LOCAL(m32):
+	.word	-32
+	.balign	4
+LOCAL(x38000000):
+	.long	0x38000000
+LOCAL(x70000000):
+	.long	0x70000000
+LOCAL(x2fffffff):
+	.long	0x2fffffff
+LOCAL(x01000000):
+	.long	0x01000000
+LOCAL(xff000000):
+	.long	0xff000000
+LOCAL(x5fffffff):
+	.long	0x5fffffff
+LOCAL(x7ff80000):
+	.long	0x7ff80000
+LOCAL(xffe00000):
+	.long	0xffe00000
+	ENDFUNC(GLOBAL(truncdfsf2))
+#endif /* L_truncdfsf2 */
+
+
+#ifdef L_add_sub_df3
+#include "IEEE-754/m3/adddf3.S"
+#endif /* _add_sub_df3 */
+
+#ifdef L_muldf3
+#include "IEEE-754/m3/muldf3.S"
+#endif /* L_muldf3 */
+
+#ifdef L_fixunsdfsi
+#include "IEEE-754/m3/fixunsdfsi.S"
+#endif /* L_fixunsdfsi */
+
+#ifdef L_fixdfsi
+#include "IEEE-754/m3/fixdfsi.S"
+#endif /* L_fixdfsi */
+
+#ifdef L_floatunssidf
+#include "IEEE-754/m3/floatunssidf.S"
+#endif /* L_floatunssidf */
+
+#ifdef L_floatsidf
+#include "IEEE-754/m3/floatsidf.S"
+#endif /* L_floatsidf */
+
+#ifdef L_divdf3
+#include "IEEE-754/m3/divdf3.S"
+#endif /* L_divdf3 */
+#endif /* DYN_SHIFT */
+
+#endif /* __SH_FPU_DOUBLE__ */
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/ieee-754-sf.S gcc-4.5.0/gcc/config/sh/ieee-754-sf.S
--- gcc-4.5.0/gcc/config/sh/ieee-754-sf.S	1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/ieee-754-sf.S	2010-07-14 11:58:33.000000000 +0530
@@ -0,0 +1,697 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+!! libgcc software floating-point routines for Renesas SH /
+!! STMicroelectronics ST40 CPUs
+!! Contributed by J"orn Rennecke joern.rennecke@st.com
+
+#ifndef __SH_FPU_ANY__
+
+#include "lib1funcs.h"
+#include "insn-constants.h"
+
+/* Single-precision floating-point emulation.
+   We handle NANs, +-infinity, and +-zero.
+   However, we assume that for NANs, the topmost bit of the fraction is set.  */
+#ifdef L_nesf2
+/* -ffinite-math-only inline version, T := r4:SF == r5:SF
+	cmp/eq	r4,r5
+	mov	r4,r0
+	bt	0f
+	or	r5,r0
+	add	r0,r0
+	tst	r0,r0	! test for +0.0 == -0.0 ; -0.0 == +0.0
+	0:			*/
+	.balign 4
+	.global GLOBAL(nesf2)
+	HIDDEN_FUNC(GLOBAL(nesf2))
+GLOBAL(nesf2):
+        /* If the raw values are unequal, the result is unequal, unless
+	   both values are +-zero.
+	   If the raw values are equal, the result is equal, unless
+	   the values are NaN.  */
+	cmp/eq	r4,r5
+	mov.l   LOCAL(c_SF_NAN_MASK),r1
+	not	r4,r0
+	bt	LOCAL(check_nan)
+	mov	r4,r0
+	or	r5,r0
+	rts
+	add	r0,r0
+LOCAL(check_nan):
+	tst	r1,r0
+	rts
+	movt	r0
+	.balign 4
+LOCAL(c_SF_NAN_MASK):
+	.long SF_NAN_MASK
+	ENDFUNC(GLOBAL(nesf2))
+#endif /* L_nesf2 */
+
+#ifdef L_unordsf2
+	.balign 4
+	.global GLOBAL(unordsf2)
+	HIDDEN_FUNC(GLOBAL(unordsf2))
+GLOBAL(unordsf2):
+	mov.l	LOCAL(c_SF_NAN_MASK),r1
+	not	r4,r0
+	tst	r1,r0
+	not	r5,r0
+	bt	LOCAL(unord)
+	tst	r1,r0
+LOCAL(unord):
+	rts
+	movt	r0
+	.balign	4
+LOCAL(c_SF_NAN_MASK):
+	.long SF_NAN_MASK
+	ENDFUNC(GLOBAL(unordsf2))
+#endif /* L_unordsf2 */
+
+#if defined(L_gtsf2t) || defined(L_gtsf2t_trap)
+/* -ffinite-math-only inline version, T := r4:SF > r5:SF ? 0 : 1
+	cmp/pz	r4
+	mov	r4,r0
+	bf/s	0f
+	 cmp/hs	r5,r4
+	cmp/ge	r4,r5
+	or	r5,r0
+	bt	0f
+	add	r0,r0
+	tst	r0,r0
+	0:			*/
+#ifdef L_gtsf2t
+#define fun_label GLOBAL(gtsf2t)
+#else
+#define fun_label GLOBAL(gtsf2t_trap)
+#endif
+	.balign 4
+	.global fun_label
+	HIDDEN_FUNC(fun_label)
+fun_label:
+	/* If the raw values compare greater, the result is true, unless
+	   either of them is a NaN (but infinity is fine), or both values are
+	   +- zero.  Otherwise, the result is false.  */
+	mov.l	LOCAL(c_SF_NAN_MASK),r1
+	cmp/pz	r4
+	not	r5,r0
+	SLC(bf,	LOCAL(neg),
+	 tst	r1,r0)
+	mov	r4,r0
+	bt	LOCAL(nan)
+	cmp/gt	r5,r4
+	SLC(bf,	LOCAL(check_nan),
+	 cmp/gt	r4,r1)
+	bf	LOCAL(nan)
+	or	r5,r0
+	rts
+	add	r0,r0
+LOCAL(neg):
+	SLI(tst	r1,r0)
+	bt	LOCAL(nan)
+	not	r4,r0
+	tst	r1,r0
+	bt	LOCAL(nan)
+	cmp/hi	r4,r5
+#if defined(L_gtsf2t) && defined(DELAYED_BRANCHES)
+LOCAL(check_nan):
+#endif /* DELAYED_BRANCHES */
+	rts
+	movt	r0
+#ifdef L_gtsf2t
+LOCAL(check_nan):
+LOCAL(nan):
+	rts
+	mov	#0,r0
+#else /* ! L_gtsf2t */
+LOCAL(check_nan):
+	SLI(cmp/gt	r4,r1)
+	bf	LOCAL(nan)
+	rts
+	movt	r0
+LOCAL(nan):
+	mov	#0,r0
+	trapa	#0
+#endif /* ! L_gtsf2t */
+	.balign	4
+LOCAL(c_SF_NAN_MASK):
+	.long SF_NAN_MASK
+	ENDFUNC(fun_label)
+#endif /* L_gtsf2t || L_gtsf2t_trap */
+
+#if defined(L_gesf2f) || defined(L_gesf2f_trap)
+/* -ffinite-math-only inline version, T := r4:SF >= r5:SF
+	cmp/pz	r5
+	mov	r4,r0
+	bf/s	0f
+	 cmp/hs	r4,r5
+	cmp/ge	r5,r4
+	or	r5,r0
+	bt	0f
+	add	r0,r0
+	tst	r0,r0
+	0:			*/
+#ifdef L_gesf2f
+#define fun_label GLOBAL(gesf2f)
+#else
+#define fun_label GLOBAL(gesf2f_trap)
+#endif
+	.balign 4
+	.global fun_label
+	HIDDEN_FUNC(fun_label)
+fun_label:
+	/* If the raw values compare greater or equal, the result is
+	   true, unless either of them is a NaN.  If both are +-zero, the
+	   result is true; otherwise, it is false.
+	   We use 0 as true and nonzero as false for this function.  */
+	mov.l	LOCAL(c_SF_NAN_MASK),r1
+	cmp/pz	r5
+	not	r4,r0
+	SLC(bf,	LOCAL(neg),
+	 tst	r1,r0)
+	mov	r4,r0
+	bt	LOCAL(nan)
+	cmp/gt	r4,r5
+	SLC(bf,	LOCAL(check_nan),
+	 cmp/ge	r1,r5)
+	bt	LOCAL(nan)
+	or	r5,r0
+	rts
+	add	r0,r0
+LOCAL(neg):
+	SLI(tst	r1,r0)
+	bt	LOCAL(nan)
+	not	r5,r0
+	tst	r1,r0
+	bt	LOCAL(nan)
+	cmp/hi	r5,r4
+#if defined(L_gesf2f) && defined(DELAYED_BRANCHES)
+LOCAL(nan): LOCAL(check_nan):
+#endif
+	rts
+	movt	r0
+#if defined(L_gesf2f) && ! defined(DELAYED_BRANCHES)
+LOCAL(check_nan):
+	cmp/ge	r1,r5
+LOCAL(nan):
+	rts
+	movt	r0
+#endif /* ! DELAYED_BRANCHES */
+#ifdef L_gesf2f_trap
+LOCAL(check_nan):
+	SLI(cmp/ge	r1,r5)
+	bt	LOCAL(nan)
+	rts
+LOCAL(nan):
+	movt	r0
+	trapa	#0
+#endif /* L_gesf2f_trap */
+	.balign	4
+LOCAL(c_SF_NAN_MASK):
+	.long SF_NAN_MASK
+	ENDFUNC(fun_label)
+#endif /* L_gesf2f || L_gesf2f_trap */
+
+#ifndef DYN_SHIFT /* SH1 / SH2 code */
+#ifdef L_add_sub_sf3
+#include "IEEE-754/addsf3.S"
+#endif /* _add_sub_sf3 */
+
+#ifdef L_mulsf3
+#include "IEEE-754/mulsf3.S"
+#endif /* L_mulsf3 */
+
+#ifdef L_fixunssfsi
+#include "IEEE-754/fixunssfsi.S"
+#endif /* L_fixunssfsi */
+
+#ifdef L_fixsfsi
+#include "IEEE-754/fixsfsi.S"
+#endif /* L_fixsfsi */
+
+#ifdef L_floatunssisf
+#include "IEEE-754/floatunssisf.S"
+#endif /* L_floatunssisf */
+
+#ifdef L_floatsisf
+#include "IEEE-754/floatsisf.S"
+#endif /* L_floatsisf */
+
+#ifdef L_divsf3
+#include "IEEE-754/divsf3.S"
+#endif /* L_divsf3 */
+#endif /* ! DYN_SHIFT */
+
+/* The actual arithmetic uses dynamic shift.  Supporting SH1 / SH2 here would
+   make this code too hard to maintain, so if you want to add SH1 / SH2
+   support, do it in a separate copy.  */
+#ifdef DYN_SHIFT
+#ifdef L_add_sub_sf3
+#include "IEEE-754/m3/addsf3.S"
+#endif /* L_add_sub_sf3 */
+
+#ifdef L_mulsf3
+#include "IEEE-754/m3/mulsf3.S"
+#endif /* L_mulsf3 */
+
+#ifdef L_fixunssfsi
+	! What is a bit unusual about this implementation is that the
+	! sign bit influences the result for NANs: for cleared sign bit, you
+	! get UINT_MAX, for set sign bit, you get 0.
+	! However, since the result for NANs is undefined, this should be no
+	! problem.
+	! N.B. This is scheduled both for SH4-200 and SH4-300
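+	! Roughly, in C (a sketch only; u is the argument's bit pattern):
+	!   int e = ((int32_t) u >> 23) - 127;	/* sign bit rides along */
+	!   if ((int32_t) u >= 0x4f800000) return ~0u;	/* >= 2^32, or +NaN */
+	!   if (e < 0) return 0;	/* magnitude < 1, or sign bit set */
+	!   uint32_t f = (u & 0x007fffff) | 0x00800000;
+	!   return e >= 23 ? f << (e - 23) : f >> (23 - e);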
+	.balign 4
+	.global GLOBAL(fixunssfsi)
+	FUNC(GLOBAL(fixunssfsi))
+GLOBAL(fixunssfsi):
+	mov.l	LOCAL(max),r2
+	mov	#-23,r1
+	mov	r4,r0
+	shad	r1,r4
+	mov.l	LOCAL(mask),r1
+	add	#-127,r4
+	cmp/ge	r2,r0
+	or	r2,r0
+	bt	LOCAL(retmax)
+	cmp/pz	r4
+	and	r1,r0
+	bf	LOCAL(ret0)
+	add	#-23,r4
+	rts
+	shld	r4,r0
+LOCAL(ret0):
+LOCAL(retmax):
+	rts
+	subc	r0,r0
+	.balign 4
+LOCAL(mask):
+	.long	0x00ffffff
+LOCAL(max):
+	.long	0x4f800000
+	ENDFUNC(GLOBAL(fixunssfsi))
+#endif /* L_fixunssfsi */
+
+#ifdef L_fixsfsi
+	! What is a bit unusual about this implementation is that the
+	! sign bit influences the result for NANs: for cleared sign bit, you
+	! get INT_MAX, for set sign bit, you get INT_MIN.
+	! However, since the result for NANs is undefined, this should be no
+	! problem.
+	! N.B. This is scheduled both for SH4-200 and SH4-300
+	.balign 4
+	.global GLOBAL(fixsfsi)
+	FUNC(GLOBAL(fixsfsi))
+	.balign	4
+GLOBAL(fixsfsi):
+	mov	r4,r0
+	shll	r4
+	mov	#-24,r1
+	bt	LOCAL(neg)
+	mov.l	LOCAL(max),r2
+	shld	r1,r4
+	mov.l	LOCAL(mask),r1
+	add	#-127,r4
+	cmp/pz	r4
+	add	#-23,r4
+	bf	LOCAL(ret0)
+	cmp/gt	r0,r2
+	bf	LOCAL(retmax)
+	and	r1,r0
+	addc	r1,r0
+	rts
+	shld	r4,r0
+
+	.balign	4
+LOCAL(neg):
+	mov.l	LOCAL(min),r2
+	shld	r1,r4
+	mov.l	LOCAL(mask),r1
+	add	#-127,r4
+	cmp/pz	r4
+	add	#-23,r4
+	bf	LOCAL(ret0)
+	cmp/gt	r0,r2
+	bf	LOCAL(retmin)
+	and	r1,r0
+	addc	r1,r0
+	shld	r4,r0	! SH4-200 will start this insn on a new cycle
+	rts
+	neg	r0,r0
+
+	.balign	4
+LOCAL(ret0):
+	rts
+	mov	#0,r0
+
+LOCAL(retmax):
+	mov	#-1,r0
+	rts
+	shlr	r0
+
+LOCAL(retmin):
+	mov	#1,r0
+	rts
+	rotr	r0
+
+	.balign 4
+LOCAL(mask):
+	.long	0x007fffff
+LOCAL(max):
+	.long	0x4f000000
+LOCAL(min):
+	.long	0xcf000000
+	ENDFUNC(GLOBAL(fixsfsi))
+#endif /* L_fixsfsi */
+
+#ifdef L_floatunssisf
+#include "IEEE-754/m3/floatunssisf.S"
+#endif /* L_floatunssisf */
+
+#ifdef L_floatsisf
+#include "IEEE-754/m3/floatsisf.S"
+#endif /* L_floatsisf */
+
+#ifdef L_divsf3
+#include "IEEE-754/m3/divsf3.S"
+#endif /* L_divsf3 */
+
+#ifdef L_hypotf
+	.balign 4
+	.global GLOBAL(hypotf)
+	FUNC(GLOBAL(hypotf))
+GLOBAL(hypotf):
+/* This integer implementation takes 71 to 72 cycles in the main path.
+   This is a bit slower than the SH4 can manage with double-precision
+   hardware floating point - 57 cycles, or 69 with mode switches.  */
+ /* First, calculate x (r4) as the sum of the squares of the fractions -
+    the exponent is calculated separately in r3.
+    Then, calculate sqrt(x) for the fraction by reciproot iteration.
+    We get an 7.5 bit inital value using linear approximation with two slopes
+    We get a 7.5 bit initial value using linear approximation with two slopes
+    x (- [1. .. 2.)  y0 := 1.25 - x/4 - tab(x)   y (- (0.8 .. 1.0)
+    x (- [2. .. 4.)  y0 := 1.   - x/8 - tab(x)   y (- (0.5 .. 0.8)
+ x is represented with two bits before the point,
+ y with 0 bits before the binary point.
+ Thus, to calculate y0 := 1. - x/8 - tab(x), all you have to do is to shift x
+ right by 1, negate it, and subtract tab(x).  */
+
+ /* y1 := 1.5*y0 - 0.5 * (x * y0) * (y0 * y0)
+    z0 := x * y1
+    z1 := z0 + 0.5 * (y1 - (y1*y1) * z0) */
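+
+ /* In C terms, the scheme above is roughly (a sketch only; the assembly
+    below does all of this in fixed point, with the result exponent
+    tracked separately in r3):
+
+	y0 = (x < 2. ? 1.25 - x/4. : 1. - x/8.) - tab (x);   initial ~ 1/sqrt(x)
+	y1 = 1.5*y0 - 0.5 * (x * y0) * (y0 * y0);            refined 1/sqrt(x)
+	z0 = x * y1;                                         initial ~ sqrt(x)
+	z1 = z0 + 0.5 * (y1 - (y1*y1) * z0);                 refined sqrt(x)  */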
+
+	mov.l	LOCAL(xff000000),r1
+	add	r4,r4
+	mov	r4,r0
+	add	r5,r5
+	cmp/hs	r5,r4
+	sub	r5,r0
+	mov	#-24,r2
+	bf/s	LOCAL(r5_large)
+	shad	r2,r0
+	mov	r4,r3
+	shll8	r4
+	rotcr	r4
+	tst	#0xe0,r0
+	neg	r0,r0
+	bt	LOCAL(ret_abs_r3)
+	tst	r1,r5
+	shll8	r5
+	bt/s	LOCAL(denorm_r5)
+	cmp/hi	r3,r1
+	dmulu.l	r4,r4
+	bf	LOCAL(inf_nan)
+	rotcr	r5
+	shld	r0,r5
+LOCAL(denorm_r5_done):
+	sts	mach,r4
+	dmulu.l	r5,r5
+	mov.l	r6,@-r15
+	mov	#20,r6
+
+	sts	mach,r5
+LOCAL(add_frac):
+	mova	LOCAL(tab)-32,r0
+	mov.l	r7,@-r15
+	mov.w	LOCAL(x1380),r7
+	and	r1,r3
+	addc	r5,r4
+	mov.w	LOCAL(m25),r2	! -25
+	bf	LOCAL(frac_ok)
+	sub	r1,r3
+	rotcr	r4
+	cmp/eq	r1,r3	! did we generate infinity ?
+	bt	LOCAL(inf_nan)
+	shlr	r4
+	mov	r4,r1
+	shld	r2,r1
+	mov.b	@(r0,r1),r0
+	mov	r4,r1
+	shld	r6,r1
+	bra	LOCAL(frac_low2)
+	sub	r1,r7
+
+LOCAL(frac_ok):
+	mov	r4,r1
+	shld	r2,r1
+	mov.b	@(r0,r1),r1
+	cmp/pz	r4
+	mov	r4,r0
+	bt/s	LOCAL(frac_low)
+	shld	r6,r0
+	mov.w	LOCAL(xf80),r7
+	shlr	r0
+LOCAL(frac_low):
+	sub	r0,r7
+LOCAL(frac_low2):
+	mov.l	LOCAL(x40000080),r0 ! avoid denorm results near 1. << r3
+	sub	r1,r7	! {0.12}
+	mov.l	LOCAL(xfffe0000),r5 ! avoid rounding overflow near 4. << r3
+	swap.w	r7,r1	! {0.28}
+	dmulu.l	r1,r4 /* two issue cycles */
+	mulu.w	r7,r7  /* two issue cycles */
+	sts	mach,r2	! {0.26}
+	mov	r1,r7
+	shlr	r1
+	sts	macl,r6	! {0.24}
+	cmp/hi	r0,r4
+	shlr2	r2
+	bf	LOCAL(near_one)
+	shlr	r2	! {0.23} systematic error of linear approximation keeps y1 < 1
+	dmulu.l	r2,r6
+	cmp/hs	r5,r4
+	add	r7,r1	! {1.28}
+	bt	LOCAL(near_four)
+	shlr2	r1	! {1.26}
+	sts	mach,r0	! {0.15} x*y0^3 == {0.16} 0.5*x*y0^3
+	shlr2	r1	! {1.24}
+	shlr8	r1	! {1.16}
+	sett		! compensate for truncation of subtrahend, keep y1 < 1
+	subc	r0,r1   ! {0.16} y1;  max error about 3.5 ulp
+	swap.w	r1,r0
+	dmulu.l	r0,r4	! { 1.30 }
+	mulu.w	r1,r1
+	sts	mach,r2
+	shlr2	r0
+	sts	macl,r1
+	add	r2,r0
+	mov.l	LOCAL(xff000000),r6
+	add	r2,r0
+	dmulu.l	r1,r2
+	add	#127,r0
+	add	r6,r3	! precompensation for adding leading 1
+	sts	mach,r1
+	shlr	r3
+	mov.l	@r15+,r7
+	sub	r1,r0	! {0.31} max error about 50 ulp (+127)
+	mov.l	@r15+,r6
+	shlr8	r0	! {0.23} max error about 0.7 ulp
+	rts
+	add	r3,r0
+	
+LOCAL(r5_large):
+	mov	r5,r3
+	mov	#-31,r2
+	cmp/ge	r2,r0
+	shll8	r5
+	bf	LOCAL(ret_abs_r3)
+	rotcr	r5
+	tst	r1,r4
+	shll8	r4
+	bt/s	LOCAL(denorm_r4)
+	cmp/hi	r3,r1
+	dmulu.l	r5,r5
+	bf	LOCAL(inf_nan)
+	rotcr	r4
+LOCAL(denorm_r4_done):
+	shld	r0,r4
+	sts	mach,r5
+	dmulu.l	r4,r4
+	mov.l	r6,@-r15
+	mov	#20,r6
+	bra	LOCAL(add_frac)
+	sts	mach,r4
+
+LOCAL(near_one):
+	bra	LOCAL(assemble_sqrt)
+	mov	#0,r0
+LOCAL(near_four):
+	! exact round-to-nearest would add 255.  We add 256 for speed & compactness.
+	mov	r4,r0
+	shlr8	r0
+	add	#1,r0
+	tst	r0,r0
+	addc	r0,r3	! might generate infinity.
+LOCAL(assemble_sqrt):
+	mov.l	@r15+,r7
+	shlr	r3
+	mov.l	@r15+,r6
+	rts
+	add	r3,r0
+LOCAL(inf_nan):
+LOCAL(ret_abs_r3):
+	mov	r3,r0
+	rts
+	shlr	r0
+LOCAL(denorm_r5):
+	bf	LOCAL(inf_nan)
+	tst	r1,r4
+	bt	LOCAL(denorm_both)
+	dmulu.l	r4,r4
+	bra	LOCAL(denorm_r5_done)
+	shld	r0,r5
+LOCAL(denorm_r4):
+	bf	LOCAL(inf_nan)
+	tst	r1,r5
+	dmulu.l	r5,r5
+	bf	LOCAL(denorm_r4_done)
+LOCAL(denorm_both):	! normalize according to r3.
+	extu.w	r3,r2
+	mov.l	LOCAL(c__clz_tab),r0
+	cmp/eq	r3,r2
+	mov	#-8,r2
+	bt	0f
+	tst	r1,r3
+	mov	#-16,r2
+	bt	0f
+	mov	#-24,r2
+0:
+	shld	r2,r3
+	mov.l	r7,@-r15
+#ifdef __pic__
+	add	r0,r3
+	mova	 LOCAL(c__clz_tab),r0
+#endif
+	mov.b	@(r0,r3),r0
+	add	#32,r2
+	sub	r0,r2
+	shld	r2,r4
+	mov	r2,r7
+	dmulu.l	r4,r4
+	sts.l	pr,@-r15
+	mov	#1,r3
+	bsr	LOCAL(denorm_r5_done)
+	shld	r2,r5
+	mov.l	LOCAL(x01000000),r1
+	neg	r7,r2
+	lds.l	@r15+,pr
+	tst	r1,r0
+	mov.l	@r15+,r7
+	bt	0f
+	add	#1,r2
+	sub	r1,r0
+0:
+	rts
+	shld	r2,r0
+
+LOCAL(m25):
+	.word	-25
+LOCAL(x1380):
+	.word	0x1380
+LOCAL(xf80):
+	.word	0xf80
+	.balign	4
+LOCAL(xff000000):
+	.long	0xff000000
+LOCAL(x40000080):
+	.long	0x40000080
+LOCAL(xfffe0000):
+	.long	0xfffe0000
+LOCAL(x01000000):
+	.long	0x01000000
+LOCAL(c__clz_tab):
+#ifdef __pic__
+	.long	GLOBAL(clz_tab) - .
+#else
+	.long	GLOBAL(clz_tab)
+#endif
+
+/*
+#include <stdio.h>
+#include <math.h>
+
+double err(double x)
+{
+  return (x < 2. ? 1.25 - x/4. : 1. - x/8.) - 1./sqrt(x);
+}
+
+int
+main ()
+{
+  int i = 0;
+  double x, s, v;
+  double lx, hx;
+
+  s = 1./32.;
+  for (x = 1.; x < 4; x += s, i++)
+    {
+      lx = x;
+      hx = x + s - 1. / (1 << 30);
+      v = 0.5 * (err (lx) + err (hx));
+      printf ("%s% 4d%c",
+              (i & 7) == 0 ? "\t.byte\t" : "",
+              (int)(v * 4096 + 0.5) - 128,
+              (i & 7) == 7 ? '\n' : ',');
+    }
+  return 0;
+} */
+
+	.balign	4
+LOCAL(tab):
+	.byte	-113, -84, -57, -33, -11,   8,  26,  41
+	.byte	  55,  67,  78,  87,  94, 101, 106, 110
+	.byte	 113, 115, 115, 115, 114, 112, 109, 106
+	.byte	 101,  96,  91,  84,  77,  69,  61,  52
+	.byte	  51,  57,  63,  68,  72,  77,  80,  84
+	.byte	  87,  89,  91,  93,  95,  96,  97,  97
+	.byte	  97,  97,  97,  96,  95,  94,  93,  91
+	.byte	  89,  87,  84,  82,  79,  76,  72,  69
+	.byte	  65,  61,  57,  53,  49,  44,  39,  34
+	.byte	  29,  24,  19,  13,   8,   2,  -4, -10
+	.byte	 -17, -23, -29, -36, -43, -50, -57, -64
+	.byte	 -71, -78, -85, -93,-101,-108,-116,-124
+	ENDFUNC(GLOBAL(hypotf))
+#endif /* L_hypotf */
+#endif /* DYN_SHIFT */
+
+#endif /* __SH_FPU_ANY__ */
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/lib1funcs.asm gcc-4.5.0/gcc/config/sh/lib1funcs.asm
--- gcc-4.5.0/gcc/config/sh/lib1funcs.asm	2009-04-18 03:50:40.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/lib1funcs.asm	2010-07-14 11:58:33.000000000 +0530
@@ -3931,3 +3931,6 @@ GLOBAL(udiv_qrnnd_16):
 	ENDFUNC(GLOBAL(udiv_qrnnd_16))
 #endif /* !__SHMEDIA__ */
 #endif /* L_udiv_qrnnd_16 */
+
+#include "ieee-754-sf.S"
+#include "ieee-754-df.S"
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/lib1funcs.h gcc-4.5.0/gcc/config/sh/lib1funcs.h
--- gcc-4.5.0/gcc/config/sh/lib1funcs.h	2009-07-06 19:25:09.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/lib1funcs.h	2010-07-14 11:58:33.000000000 +0530
@@ -64,13 +64,151 @@ see the files COPYING3 and COPYING.RUNTI
 #endif /* !__LITTLE_ENDIAN__ */
 
 #ifdef __sh1__
+/* branch with two-argument delay slot insn */
 #define SL(branch, dest, in_slot, in_slot_arg2) \
 	in_slot, in_slot_arg2; branch dest
+/* branch with one-argument delay slot insn */
 #define SL1(branch, dest, in_slot) \
 	in_slot; branch dest
+/* branch with comparison in delay slot */
+#define SLC(branch, dest, in_slot, in_slot_arg2) \
+        branch dest; in_slot, in_slot_arg2
+/* comparison in a delay slot, at branch destination */
+#define SLI(in_slot, in_slot_arg2) in_slot, in_slot_arg2
+#define SLCMP(branch, cmp1, cmp1arg2, cmp2, cmp2arg2) \
+	branch .+6; bra .+6; cmp2, cmp2arg2; cmp1, cmp1arg2
+#define DMULU_SAVE \
+ mov.l r10,@-r15; \
+ mov.l r11,@-r15; \
+ mov.l r12,@-r15; \
+ mov.l r13,@-r15
+#define DMULUL(m1, m2, rl) \
+ swap.w m1,r12; \
+ mulu.w r12,m2; \
+ swap.w m2,r13; \
+ sts macl,r10; \
+ mulu.w r13,m1; \
+ clrt; \
+ sts macl,r11; \
+ mulu.w r12,r13; \
+ addc r11,r10; \
+ sts macl,r12; \
+ mulu.w m1,m2; \
+ movt r11; \
+ sts macl,rl; \
+ mov r10,r13; \
+ shll16 r13; \
+ addc r13,rl; \
+ xtrct r11,r10; \
+ addc r10,r12 \
+/* N.B. the carry is cleared here.  */
+#define DMULUH(rh) mov r12,rh
+#define DMULU_RESTORE \
+ mov.l @r15+,r13; \
+ mov.l @r15+,r12; \
+ mov.l @r15+,r11; \
+ mov.l @r15+,r10
 #else /* ! __sh1__ */
+/* branch with two-argument delay slot insn */
 #define SL(branch, dest, in_slot, in_slot_arg2) \
-	branch##.s dest; in_slot, in_slot_arg2
+	branch##/s dest; in_slot, in_slot_arg2
+/* branch with one-argument delay slot insn */
 #define SL1(branch, dest, in_slot) \
 	branch##/s dest; in_slot
+/* branch with comparison in delay slot */
+#define SLC(branch, dest, in_slot, in_slot_arg2) \
+        branch##/s dest; in_slot, in_slot_arg2
+/* comparison in a delay slot, at branch destination */
+#define SLI(in_slot, in_slot_arg)
+#define SLCMP(branch, cmp1, cmp1arg2, cmp2, cmp2arg2) \
+	branch##/s .+6; cmp1, cmp1arg2; cmp2, cmp2arg2
+#define DMULU_SAVE
+#define DMULUL(m1, m2, rl) dmulu.l m1,m2; sts macl,rl
+#define DMULUH(rh) sts mach,rh
+#define DMULU_RESTORE
 #endif /* !__sh1__ */
+
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+/* don't #define DYN_SHIFT */
+  #define SHLL4(REG)	\
+	shll2	REG;	\
+	shll2	REG
+
+  #define SHLR4(REG)	\
+	shlr2	REG;	\
+	shlr2	REG
+
+  #define SHLL6(REG)	\
+	shll2	REG;	\
+	shll2	REG;	\
+	shll2	REG
+
+  #define SHLR6(REG)	\
+	shlr2	REG;	\
+	shlr2	REG;	\
+	shlr2	REG
+
+  #define SHLL12(REG)	\
+	shll8	REG;	\
+	SHLL4 (REG)
+
+  #define SHLR12(REG)	\
+	shlr8	REG;	\
+	SHLR4 (REG)
+
+  #define SHLR19(REG)	\
+	shlr16	REG;	\
+	shlr2	REG;	\
+	shlr	REG
+
+  #define SHLL23(REG)	\
+	shll16	REG;	\
+	shlr	REG;	\
+	shll8	REG
+
+  #define SHLR24(REG)	\
+	shlr16	REG;	\
+	shlr8	REG
+
+  #define SHLR21(REG)	\
+	shlr16	REG;	\
+	shll2	REG;	\
+	add	REG,REG;\
+	shlr8	REG
+
+  #define SHLL21(REG)	\
+	shll16	REG;	\
+	SHLL4 (REG);	\
+	add	REG,REG
+
+  #define SHLR11(REG)	\
+	shlr8	REG;	\
+	shlr2	REG;	\
+	shlr	REG
+
+  #define SHLR22(REG)	\
+	shlr16	REG;	\
+	shll2	REG;	\
+	shlr8	REG
+
+  #define SHLR23(REG)	\
+	shlr16	REG;	\
+	add	REG,REG;\
+	shlr8	REG
+
+  #define SHLR20(REG)	\
+	shlr16	REG;	\
+	SHLR4 (REG)
+
+  #define SHLL20(REG)	\
+	shll16	REG;	\
+	SHLL4 (REG)
+#define SHLD_COUNT(N,COUNT)
+#define SHLRN(N,COUNT,REG) SHLR##N(REG)
+#define SHLLN(N,COUNT,REG) SHLL##N(REG)
+#else
+#define SHLD_COUNT(N,COUNT) mov #N,COUNT
+#define SHLRN(N,COUNT,REG) shld COUNT,REG
+#define SHLLN(N,COUNT,REG) shld COUNT,REG
+#define DYN_SHIFT 1
+#endif
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/predicates.md gcc-4.5.0/gcc/config/sh/predicates.md
--- gcc-4.5.0/gcc/config/sh/predicates.md	2009-06-25 11:46:11.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/predicates.md	2010-07-14 11:58:33.000000000 +0530
@@ -699,6 +699,33 @@
 (define_predicate "symbol_ref_operand"
   (match_code "symbol_ref"))
 
+(define_special_predicate "soft_fp_comparison_operand"
+  (match_code "subreg,reg")
+{
+  switch (GET_MODE (op))
+    {
+    default:
+      return 0;
+    case CC_FP_NEmode: case CC_FP_GTmode: case CC_FP_UNLTmode:
+      break;
+    }
+  return register_operand (op, mode);
+})
+
+(define_predicate "soft_fp_comparison_operator"
+  (match_code "eq, unle, ge")
+{
+  switch (GET_CODE (op))
+    {
+    default:
+      return 0;
+    case EQ:  mode = CC_FP_NEmode;    break;
+    case UNLE:        mode = CC_FP_GTmode;    break;
+    case GE:  mode = CC_FP_UNLTmode;  break;
+    }
+  return register_operand (XEXP (op, 0), mode);
+})
+
 ;; Same as target_reg_operand, except that label_refs and symbol_refs
 ;; are accepted before reload.
 
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/sh.c gcc-4.5.0/gcc/config/sh/sh.c
--- gcc-4.5.0/gcc/config/sh/sh.c	2010-03-01 04:53:50.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/sh.c	2010-07-14 11:58:33.000000000 +0530
@@ -110,6 +110,12 @@ static short cached_can_issue_more;
 /* Unique number for UNSPEC_BBR pattern.  */
 static unsigned int unspec_bbr_uid = 1;
 
+/* Saved operands from the last compare to use when we generate an scc
+   or bcc insn.  */
+
+rtx sh_compare_op0;
+rtx sh_compare_op1;
+
 /* Provides the class number of the smallest class containing
    reg number.  */
 
@@ -279,6 +285,7 @@ static int sh_arg_partial_bytes (CUMULAT
 			         tree, bool);
 static bool sh_scalar_mode_supported_p (enum machine_mode);
 static int sh_dwarf_calling_convention (const_tree);
+static void sh_expand_float_condop (rtx operands[4], rtx (*[2]) (rtx));
 static void sh_encode_section_info (tree, rtx, int);
 static int sh2a_function_vector_p (tree);
 static void sh_trampoline_init (rtx, tree, rtx);
@@ -536,6 +543,9 @@ static const struct attribute_spec sh_at
 /* Machine-specific symbol_ref flags.  */
 #define SYMBOL_FLAG_FUNCVEC_FUNCTION    (SYMBOL_FLAG_MACH_DEP << 0)
 
+#undef TARGET_MATCH_ADJUST
+#define TARGET_MATCH_ADJUST sh_match_adjust
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 \f
 /* Implement TARGET_HANDLE_OPTION.  */
@@ -2146,6 +2156,78 @@ sh_emit_cheap_store_flag (enum machine_m
   return gen_rtx_fmt_ee (code, VOIDmode, target, const0_rtx);
 }
 
+static rtx
+sh_soft_fp_cmp (int code, enum machine_mode op_mode, rtx op0, rtx op1)
+{
+  const char *name = NULL;
+  rtx (*fun) (rtx, rtx), addr, tmp, first, last, equiv;
+  int df = op_mode == DFmode;
+  enum machine_mode mode = VOIDmode; /* arbitrary init to shut up warning.  */
+
+  switch (code)
+    {
+    case EQ:
+      if (!flag_finite_math_only)
+	{
+	  name = df ? "__nedf2" : "__nesf2";
+	  fun = df ? gen_cmpnedf_i1 : gen_cmpnesf_i1;
+	  mode = CC_FP_NEmode;
+	  break;
+	} /* Fall through.  */
+    case UNEQ:
+      fun = gen_cmpuneq_sdf;
+      break;
+    case UNLE:
+      if (flag_finite_math_only && !df)
+	{
+	  fun = gen_cmplesf_i1_finite;
+	  break;
+	}
+      name = df ? "__gtdf2t" : "__gtsf2t";
+      fun = df ? gen_cmpgtdf_i1 : gen_cmpgtsf_i1;
+      mode = CC_FP_GTmode;
+      break;
+    case GE:
+      if (flag_finite_math_only && !df)
+	{
+	  tmp = op0; op0 = op1; op1 = tmp;
+	  fun = gen_cmplesf_i1_finite;
+	  break;
+	}
+      name = df ? "__gedf2f" : "__gesf2f";
+      fun = df ? gen_cmpunltdf_i1 : gen_cmpunltsf_i1;
+      mode = CC_FP_UNLTmode;
+      break;
+    case UNORDERED:
+      fun = gen_cmpun_sdf;
+      break;
+    default: gcc_unreachable ();
+    }
+
+  if (!name)
+    return fun (force_reg (op_mode, op0), force_reg (op_mode, op1));
+
+  tmp = gen_reg_rtx (mode);
+  addr = gen_reg_rtx (Pmode);
+  function_symbol (addr, name, SFUNC_STATIC);
+  first = emit_move_insn (gen_rtx_REG (op_mode, R4_REG), op0);
+  emit_move_insn (gen_rtx_REG (op_mode, R5_REG + df), op1);
+  last = emit_insn (fun (tmp, addr));
+  equiv = gen_rtx_fmt_ee (COMPARE, mode, op0, op1);
+  REG_NOTES (last) = gen_rtx_EXPR_LIST (REG_EQUAL, equiv, REG_NOTES (last));
+  /* Wrap the sequence in REG_LIBCALL / REG_RETVAL notes so that loop
+     invariant code motion can move it.  */
+/*
+  REG_NOTES (first) = gen_rtx_INSN_LIST (REG_LIBCALL, last, REG_NOTES (first));
+  REG_NOTES (last) = gen_rtx_INSN_LIST (REG_RETVAL, first, REG_NOTES (last));
+*/
+  /* Use fpcmp_i1 rather than cmpeqsi_t, so that the optimizers can grok
+     the computation.  */
+  return gen_rtx_SET (VOIDmode,
+		      gen_rtx_REG (SImode, T_REG),
+		      gen_rtx_fmt_ee (code, SImode, tmp, CONST0_RTX (mode)));
+}
+
 /* Called from the md file, set up the operands of a compare instruction.  */
 
 void
@@ -8601,6 +8683,57 @@ sh_fix_range (const char *const_str)
       str = comma + 1;
     }
 }
+
+/* Expand an sfunc operation taking NARGS MODE arguments, using generator
+   function FUN, which needs symbol NAME loaded into a register first.
+   Add a REG_EQUAL note using EQUIV.  */
+static void
+expand_sfunc_op (int nargs, enum machine_mode mode, rtx (*fun) (rtx, rtx),
+		 const char *name, rtx equiv, rtx *operands)
+{
+  int next_reg = FIRST_PARM_REG, i;
+  rtx addr, first = NULL_RTX, last, insn;
+
+  addr = gen_reg_rtx (Pmode);
+  function_symbol (addr, name, SFUNC_FREQUENT);
+  for ( i = 1; i <= nargs; i++)
+    {
+      insn = emit_move_insn (gen_rtx_REG (mode, next_reg), operands[i]);
+      if (!first)
+	first = insn;
+      next_reg += GET_MODE_SIZE (mode) / UNITS_PER_WORD;
+    }
+  last = emit_insn ((*fun) (operands[0], addr));
+  REG_NOTES (last) = gen_rtx_EXPR_LIST (REG_EQUAL, equiv, REG_NOTES (last));
+  /* Wrap the sequence in REG_LIBCALL / REG_RETVAL notes so that loop
+     invariant code motion can move it.  */
+/*
+  REG_NOTES (first) = gen_rtx_INSN_LIST (REG_LIBCALL, last, REG_NOTES (first));
+  REG_NOTES (last) = gen_rtx_INSN_LIST (REG_RETVAL, first, REG_NOTES (last));
+*/
+}
+
+/* Expand an sfunc unary operation taking a MODE argument, using generator
+   function FUN, which needs symbol NAME loaded into a register first.
+   Add a REG_EQUAL note using CODE.  */
+void
+expand_sfunc_unop (enum machine_mode mode, rtx (*fun) (rtx, rtx),
+		   const char *name, enum rtx_code code, rtx *operands)
+{
+  rtx equiv = gen_rtx_fmt_e (code, GET_MODE (operands[0]), operands[1]);
+  expand_sfunc_op (1, mode, fun, name, equiv, operands);
+}
+
+/* Expand an sfunc binary operation in MODE, using generator function FUN,
+   which needs symbol NAME loaded into a register first.
+   Add a REG_EQUAL note using CODE.  */
+void
+expand_sfunc_binop (enum machine_mode mode, rtx (*fun) (rtx, rtx),
+		    const char *name, enum rtx_code code, rtx *operands)
+{
+  rtx equiv = gen_rtx_fmt_ee (code, mode, operands[1], operands[2]);
+  expand_sfunc_op (2, mode, fun, name, equiv, operands);
+}
 \f
 /* Insert any deferred function attributes from earlier pragmas.  */
 static void
@@ -11429,11 +11562,10 @@ function_symbol (rtx target, const char 
 {
   rtx sym;
 
-  /* If this is not an ordinary function, the name usually comes from a
-     string literal or an sprintf buffer.  Make sure we use the same
+  /* The name usually comes from a string literal or an sprintf buffer.
+     Make sure we use the same
      string consistently, so that cse will be able to unify address loads.  */
-  if (kind != FUNCTION_ORDINARY)
-    name = IDENTIFIER_POINTER (get_identifier (name));
+  name = IDENTIFIER_POINTER (get_identifier (name));
   sym = gen_rtx_SYMBOL_REF (Pmode, name);
   SYMBOL_REF_FLAGS (sym) = SYMBOL_FLAG_FUNCTION;
   if (flag_pic)
@@ -11441,6 +11573,10 @@ function_symbol (rtx target, const char 
       {
       case FUNCTION_ORDINARY:
 	break;
+      case SFUNC_FREQUENT:
+	if (!optimize || optimize_size)
+	  break;
+	/* Fall through.  */
       case SFUNC_GOT:
 	{
 	  rtx reg = target ? target : gen_reg_rtx (Pmode);
@@ -11551,6 +11687,141 @@ sh_expand_t_scc (rtx operands[])
   return 1;
 }
 
+void
+sh_expand_float_cbranch (rtx operands[4])
+{
+  static rtx (*branches[]) (rtx) = { gen_branch_true, gen_branch_false };
+
+  sh_expand_float_condop (operands, branches);
+}
+
+void
+sh_expand_float_scc (rtx operands[4])
+{
+  static rtx (*movts[]) (rtx) = { gen_movt, gen_movnegt };
+
+  operands[3] = NULL_RTX;
+  sh_expand_float_condop (operands, movts);
+}
+
+/* The first element of USER is for positive logic, the second one for
+   negative logic.  */
+static void
+sh_expand_float_condop (rtx operands[4], rtx (*user[2]) (rtx))
+{
+  enum machine_mode mode = GET_MODE (operands[1]);
+  enum rtx_code comparison = GET_CODE (operands[0]);
+  int swap_operands = 0;
+
+  if (TARGET_SH1_SOFTFP_MODE (mode))
+    {
+      switch (comparison)
+	{
+	case NE:
+	  comparison = EQ;
+	  user++;
+	  break;
+	case LT:
+	  swap_operands = 1;	/* Fall through.  */
+	case GT:
+	  comparison = UNLE;
+	  user++;
+	  break;
+	case UNGT:
+	  swap_operands = 1;	/* Fall through.  */
+	case UNLT:
+	  comparison = GE;
+	  user++;
+	  break;
+	case UNGE:
+	  swap_operands = 1;
+	  comparison = UNLE;
+	  break;
+	case LE:
+	  swap_operands = 1;
+	  comparison = GE;	/* Fall through.  */
+	case EQ:
+	case UNEQ:
+	case GE:
+	case UNLE:
+	case UNORDERED:
+	  break;
+	case LTGT:
+	  comparison = UNEQ;
+	  user++;
+	  break;
+	case ORDERED:
+	  comparison = UNORDERED;
+	  user++;
+	  break;
+	  
+	default: gcc_unreachable ();
+	}
+    }
+  else /* SH2E .. SH4 Hardware floating point */
+    {
+      switch (comparison)
+	{
+	case NE:
+	  comparison = EQ;
+	  user++;
+	  break;
+	case LT:
+	  swap_operands = 1;	/* Fall through.  */
+	  comparison = GT;
+	case GT:
+	case EQ:
+	case LTGT:
+	case ORDERED:
+	  break;
+	case LE:
+	  if (flag_finite_math_only)
+	    {
+	      comparison = GT;
+	      user++;
+	      break;
+	    }
+	  swap_operands = 1;
+	  comparison = GE;	/* Fall through.  */
+	case GE:
+	  if (flag_finite_math_only)
+	    {
+	      swap_operands = 1;
+	      comparison = GT;
+	      user++;
+	      break;
+	    }
+	  break;
+	case UNGE:
+	  swap_operands = 1;	/* Fall through.  */
+	case UNLE:
+	  comparison = GT;
+	  user++;
+	  break;
+	case UNGT:
+	  swap_operands = 1;	/* Fall through.  */
+	case UNLT:
+	  comparison = GE;
+	  user++;
+	  break;
+	case UNEQ:
+	  comparison = LTGT;
+	  user++;
+	  break;
+	case UNORDERED:
+	  comparison = ORDERED;
+	  user++;
+	  break;
+
+	default: gcc_unreachable ();
+	}
+    }
+  sh_compare_op0 = operands[1+swap_operands];
+  sh_compare_op1 = operands[2-swap_operands];
+  sh_emit_compare_and_branch (&operands[3], comparison);
+  emit_jump_insn ((*user) (operands[3]));
+}
+
 /* INSN is an sfunc; return the rtx that describes the address used.  */
 static rtx
 extract_sfunc_addr (rtx insn)
@@ -12100,6 +12371,19 @@ sh_secondary_reload (bool in_p, rtx x, e
   return NO_REGS;
 }
 
+int
+sh_match_adjust (rtx x, int regno)
+{
+  /* On a WORDS_BIG_ENDIAN machine, point to the last register of a
+     multiple hard register group of scalar integer registers, so that
+     for example (reg:DI 0) and (reg:SI 1) will be considered the same
+     register.  */
+  if (WORDS_BIG_ENDIAN && GET_MODE_SIZE (GET_MODE (x)) > UNITS_PER_WORD
+      && regno < FIRST_PSEUDO_REGISTER)
+    regno += hard_regno_nregs[regno][GET_MODE (x)] - 1;
+  return regno;
+}
+
 enum sh_divide_strategy_e sh_div_strategy = SH_DIV_STRATEGY_DEFAULT;
 
 #include "gt-sh.h"
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/sh.h gcc-4.5.0/gcc/config/sh/sh.h
--- gcc-4.5.0/gcc/config/sh/sh.h	2009-12-01 03:08:46.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/sh.h	2010-07-14 16:13:59.000000000 +0530
@@ -175,6 +175,11 @@ do { \
 #define TARGET_FPU_DOUBLE \
   ((target_flags & MASK_SH4) != 0 || TARGET_SH2A_DOUBLE)
 
+#define TARGET_SH1_SOFTFP (TARGET_SH1 && !TARGET_FPU_DOUBLE)
+
+#define TARGET_SH1_SOFTFP_MODE(MODE) \
+  (TARGET_SH1_SOFTFP && (!TARGET_SH2E || (MODE) == DFmode))
+
 /* Nonzero if an FPU is available.  */
 #define TARGET_FPU_ANY (TARGET_SH2E || TARGET_FPU_DOUBLE)
 
@@ -321,6 +326,38 @@ do { \
 #define SUPPORT_ANY_SH5 \
   (SUPPORT_ANY_SH5_32MEDIA || SUPPORT_ANY_SH5_64MEDIA)
 
+/* Check if we have support for optimized software floating point using
+   dynamic shifts; in that case some function calls clobber fewer registers.  */
+#ifdef SUPPORT_SH3
+#define SUPPORT_SH3_OSFP 1
+#else
+#define SUPPORT_SH3_OSFP 0
+#endif
+
+#ifdef SUPPORT_SH3E
+#define SUPPORT_SH3E_OSFP 1
+#else
+#define SUPPORT_SH3E_OSFP 0
+#endif
+
+#if defined(SUPPORT_SH4_NOFPU) || SUPPORT_SH3_OSFP
+#define SUPPORT_SH4_NOFPU_OSFP 1
+#else
+#define SUPPORT_SH4_NOFPU_OSFP 0
+#endif
+
+#if defined(SUPPORT_SH4_SINGLE_ONLY) || SUPPORT_SH3E_OSFP
+#define SUPPORT_SH4_SINGLE_ONLY_OSFP 1
+#else
+#define SUPPORT_SH4_SINGLE_ONLY_OSFP 0
+#endif
+
+#define TARGET_OSFP (0 \
+ || (TARGET_SH3 && !TARGET_SH2E && SUPPORT_SH3_OSFP) \
+ || (TARGET_SH3E && SUPPORT_SH3E_OSFP) \
+ || (TARGET_HARD_SH4 && !TARGET_SH2E && SUPPORT_SH4_NOFPU_OSFP) \
+ || (TARGET_HARD_SH4 && TARGET_SH2E && SUPPORT_SH4_SINGLE_ONLY_OSFP))
+
 /* Reset all target-selection flags.  */
 #define MASK_ARCH (MASK_SH1 | MASK_SH2 | MASK_SH3 | MASK_SH_E | MASK_SH4 \
 		   | MASK_HARD_SH2A | MASK_HARD_SH2A_DOUBLE | MASK_SH4A \
@@ -2120,6 +2157,12 @@ struct sh_args {
 #define LIBGCC2_DOUBLE_TYPE_SIZE 64
 #endif
 
+#if defined(__SH2E__) || defined(__SH3E__) || defined( __SH4_SINGLE_ONLY__)
+#define LIBGCC2_DOUBLE_TYPE_SIZE 32
+#else
+#define LIBGCC2_DOUBLE_TYPE_SIZE 64
+#endif
+
 /* 'char' is signed by default.  */
 #define DEFAULT_SIGNED_CHAR  1
 
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/sh.md gcc-4.5.0/gcc/config/sh/sh.md
--- gcc-4.5.0/gcc/config/sh/sh.md	2009-11-22 04:21:07.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/sh.md	2010-07-14 18:47:04.000000000 +0530
@@ -106,6 +106,7 @@
   (DR0_REG	64)
   (DR2_REG	66)
   (DR4_REG	68)
+  (FR4_REG	68)
   (FR23_REG	87)
 
   (TR0_REG	128)
@@ -173,6 +174,15 @@
   (UNSPECV_WINDOW_END	10)
   (UNSPECV_CONST_END	11)
   (UNSPECV_EH_RETURN	12)
+  ;; NaN handling for software floating point:
+  ;; We require one bit specific for a precision to be set in all NaNs,
+  ;; so that we can test them with a not / tst sequence.
+  ;; ??? Ironically, this is the quiet bit for now, because that is the
+  ;; only bit set by __builtin_nan ("").
+  ;; ??? Should really use one bit lower and force it set by using
+  ;; a custom encoding function.
+  (SF_NAN_MASK         0x7fc00000)
+  (DF_NAN_MASK         0x7ff80000)
 ])
 
 ;; -------------------------------------------------------------------------
@@ -614,6 +624,14 @@
 	cmp/eq	%1,%0"
    [(set_attr "type" "mt_group")])
 
+(define_insn "fpcmp_i1"
+  [(set (reg:SI T_REG)
+	(match_operator:SI 1 "soft_fp_comparison_operator"
+	  [(match_operand 0 "soft_fp_comparison_operand" "r") (const_int 0)]))]
+  "TARGET_SH1_SOFTFP"
+  "tst	%0,%0"
+   [(set_attr "type" "mt_group")])
+
 (define_insn "cmpgtsi_t"
   [(set (reg:SI T_REG)
 	(gt:SI (match_operand:SI 0 "arith_reg_operand" "r,r")
@@ -1153,9 +1171,9 @@
 
 (define_insn "*movsicc_t_false"
   [(set (match_operand:SI 0 "arith_reg_dest" "=r,r")
-	(if_then_else (eq (reg:SI T_REG) (const_int 0))
-		      (match_operand:SI 1 "general_movsrc_operand" "r,I08")
-		      (match_operand:SI 2 "arith_reg_operand" "0,0")))]
+	(if_then_else:SI (eq (reg:SI T_REG) (const_int 0))
+			 (match_operand:SI 1 "general_movsrc_operand" "r,I08")
+			 (match_operand:SI 2 "arith_reg_operand" "0,0")))]
   "TARGET_PRETEND_CMOVE
    && (arith_reg_operand (operands[1], SImode)
        || (immediate_operand (operands[1], SImode)
@@ -1166,9 +1184,9 @@
 
 (define_insn "*movsicc_t_true"
   [(set (match_operand:SI 0 "arith_reg_dest" "=r,r")
-	(if_then_else (ne (reg:SI T_REG) (const_int 0))
-		      (match_operand:SI 1 "general_movsrc_operand" "r,I08")
-		      (match_operand:SI 2 "arith_reg_operand" "0,0")))]
+	(if_then_else:SI (ne (reg:SI T_REG) (const_int 0))
+			 (match_operand:SI 1 "general_movsrc_operand" "r,I08")
+			 (match_operand:SI 2 "arith_reg_operand" "0,0")))]
   "TARGET_PRETEND_CMOVE
    && (arith_reg_operand (operands[1], SImode)
        || (immediate_operand (operands[1], SImode)
@@ -6833,6 +6851,50 @@ label:
 \f
 ;; Conditional branch insns
 
+(define_expand "cmpun_sdf"
+  [(unordered (match_operand 0 "" "") (match_operand 1 "" ""))]
+  ""
+  "
+{
+  HOST_WIDE_INT mask;
+  switch (GET_MODE (operands[0]))
+    {
+    case SFmode:
+      mask = SF_NAN_MASK;
+      break;
+    case DFmode:
+      mask = DF_NAN_MASK;
+      break;
+    default:
+      FAIL;
+    }
+  emit_insn (gen_cmpunsf_i1 (operands[0], operands[1],
+                            force_reg (SImode, GEN_INT (mask))));
+  DONE;
+}")
+
+(define_expand "cmpuneq_sdf"
+  [(uneq (match_operand 0 "" "") (match_operand 1 "" ""))]
+  ""
+  "
+{
+  HOST_WIDE_INT mask;
+  switch (GET_MODE (operands[0]))
+    {
+    case SFmode:
+      mask = SF_NAN_MASK;
+      break;
+    case DFmode:
+      mask = DF_NAN_MASK;
+      break;
+    default:
+      FAIL;
+    }
+  emit_insn (gen_cmpuneqsf_i1 (operands[0], operands[1],
+                              force_reg (SImode, GEN_INT (mask))));
+  DONE;
+}")
+
 (define_expand "cbranchint4_media"
   [(set (pc)
 	(if_then_else (match_operator 0 "shmedia_cbranch_comparison_operator"
@@ -9340,6 +9402,19 @@ mov.l\\t1f,r0\\n\\
 
 
 
+(define_expand "sunle"
+  [(set (match_operand:SI 0 "arith_reg_operand" "")
+	(match_dup 1))]
+  "TARGET_SH1_SOFTFP"
+  "
+{
+  if (! currently_expanding_to_rtl)
+    FAIL;
+  sh_emit_compare_and_branch (operands, UNLE);
+  emit_insn (gen_movt (operands[0]));
+  DONE;
+}")
+
 ;; sne moves the complement of the T reg to DEST like this:
 ;;      cmp/eq ...
 ;;      mov    #-1,temp
@@ -9750,7 +9825,7 @@ mov.l\\t1f,r0\\n\\
   [(set (match_operand:SF 0 "arith_reg_operand" "")
 	(plus:SF (match_operand:SF 1 "arith_reg_operand" "")
 		 (match_operand:SF 2 "arith_reg_operand" "")))]
-  "TARGET_SH2E || TARGET_SHMEDIA_FPU"
+  "TARGET_SH2E || TARGET_SH3 || TARGET_SHMEDIA_FPU"
   "
 {
   if (TARGET_SH2E)
@@ -9758,6 +9833,12 @@ mov.l\\t1f,r0\\n\\
       expand_sf_binop (&gen_addsf3_i, operands);
       DONE;
     }
+  else if (TARGET_OSFP)
+    {
+      expand_sfunc_binop (SFmode, &gen_addsf3_i3, \"__addsf3\", PLUS,
+			  operands);
+      DONE;
+    }
 }")
 
 (define_insn "*addsf3_media"
@@ -9856,6 +9937,22 @@ mov.l\\t1f,r0\\n\\
 }"
   [(set_attr "type" "fparith_media")])
 
+(define_insn "addsf3_i3"
+  [(set (match_operand:SF 0 "arith_reg_dest" "=z")
+	(plus:SF (reg:SF R4_REG) (reg:SF R5_REG)))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R1_REG))
+   (clobber (reg:SI R2_REG))
+   (clobber (reg:SI R3_REG))
+   (clobber (reg:SI R6_REG))
+   (clobber (reg:SI R7_REG))
+   (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+  "TARGET_OSFP && ! TARGET_SH2E"
+  "jsr	@%1%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
+
 (define_insn "addsf3_i"
   [(set (match_operand:SF 0 "fp_arith_reg_operand" "=f")
 	(plus:SF (match_operand:SF 1 "fp_arith_reg_operand" "%0")
@@ -9870,7 +9967,7 @@ mov.l\\t1f,r0\\n\\
   [(set (match_operand:SF 0 "fp_arith_reg_operand" "")
 	(minus:SF (match_operand:SF 1 "fp_arith_reg_operand" "")
 		  (match_operand:SF 2 "fp_arith_reg_operand" "")))]
-  "TARGET_SH2E || TARGET_SHMEDIA_FPU"
+  "TARGET_SH2E || TARGET_SH3 || TARGET_SHMEDIA_FPU"
   "
 {
   if (TARGET_SH2E)
@@ -9878,6 +9975,12 @@ mov.l\\t1f,r0\\n\\
       expand_sf_binop (&gen_subsf3_i, operands);
       DONE;
     }
+  else if (TARGET_OSFP)
+    {
+      expand_sfunc_binop (SFmode, &gen_subsf3_i3, \"__subsf3\", MINUS,
+			  operands);
+      DONE;
+    }
 }")
 
 (define_insn "*subsf3_media"
@@ -9888,6 +9991,23 @@ mov.l\\t1f,r0\\n\\
   "fsub.s	%1, %2, %0"
   [(set_attr "type" "fparith_media")])
 
+(define_insn "subsf3_i3"
+  [(set (match_operand:SF 0 "arith_reg_dest" "=z")
+	(minus:SF (reg:SF R4_REG) (reg:SF R5_REG)))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R1_REG))
+   (clobber (reg:SI R2_REG))
+   (clobber (reg:SI R3_REG))
+   (clobber (reg:SI R5_REG))
+   (clobber (reg:SI R6_REG))
+   (clobber (reg:SI R7_REG))
+   (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+  "TARGET_OSFP && ! TARGET_SH2E"
+  "jsr	@%1%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
+
 (define_insn "subsf3_i"
   [(set (match_operand:SF 0 "fp_arith_reg_operand" "=f")
 	(minus:SF (match_operand:SF 1 "fp_arith_reg_operand" "0")
@@ -9900,11 +10020,32 @@ mov.l\\t1f,r0\\n\\
 
 (define_expand "mulsf3"
   [(set (match_operand:SF 0 "fp_arith_reg_operand" "")
-	(mult:SF (match_operand:SF 1 "fp_arith_reg_operand" "")
-		 (match_operand:SF 2 "fp_arith_reg_operand" "")))]
+        (mult:SF (match_operand:SF 1 "fp_arith_reg_operand" "")
+                 (match_operand:SF 2 "fp_arith_reg_operand" "")))]
   "TARGET_SH2E || TARGET_SHMEDIA_FPU"
   "")
 
+;(define_expand "mulsf3"
+;  [(set (match_operand:SF 0 "fp_arith_reg_operand" "")
+;        (mult:SF (match_operand:SF 1 "fp_arith_reg_operand" "")
+;                 (match_operand:SF 2 "fp_arith_reg_operand" "")))]
+;  "TARGET_SH2E || TARGET_SH3 || TARGET_SHMEDIA_FPU"
+;  "
+; { 
+;   if (TARGET_SH4 || TARGET_SH2A_SINGLE) 
+;     expand_sf_binop (&gen_mulsf3_i4, operands);
+;   else if (TARGET_SH2E)
+;     emit_insn (gen_mulsf3_ie (operands[0], operands[1], operands[2]));
+;  else if (TARGET_OSFP)
+;    { 
+;      expand_sfunc_binop (SFmode, &gen_mulsf3_i3, \"__mulsf3\", MULT,
+;                         operands);
+;      DONE;
+;    }
+;   if (! TARGET_SHMEDIA)
+;     DONE;
+; }")
+
 (define_insn "*mulsf3_media"
   [(set (match_operand:SF 0 "fp_arith_reg_operand" "=f")
 	(mult:SF (match_operand:SF 1 "fp_arith_reg_operand" "%f")
@@ -9944,6 +10085,22 @@ mov.l\\t1f,r0\\n\\
   [(set_attr "type" "fp")
    (set_attr "fp_mode" "single")])
 
+(define_insn "mulsf3_i3"
+  [(set (match_operand:SF 0 "arith_reg_dest" "=z")
+	(mult:SF (reg:SF R4_REG) (reg:SF R5_REG)))
+   (clobber (reg:SI MACH_REG))
+   (clobber (reg:SI MACL_REG))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R1_REG))
+   (clobber (reg:SI R2_REG))
+   (clobber (reg:SI R3_REG))
+   (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+  "TARGET_OSFP && ! TARGET_SH2E"
+  "jsr	@%1%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
+
 (define_insn "mac_media"
   [(set (match_operand:SF 0 "fp_arith_reg_operand" "=f")
 	(plus:SF (mult:SF (match_operand:SF 1 "fp_arith_reg_operand" "%f")
@@ -10104,6 +10261,149 @@ mov.l\\t1f,r0\\n\\
   "ftrc	%1,%0"
   [(set_attr "type" "fp")])
 
+(define_insn "cmpnesf_i1"
+  [(set (match_operand:CC_FP_NE 0 "register_operand" "=z")
+	(compare:CC_FP_NE (reg:SF R4_REG) (reg:SF R5_REG)))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R1_REG))
+   (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+  "TARGET_SH1 && ! TARGET_SH2E"
+  "jsr	@%1%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpgtsf_i1"
+  [(set (match_operand:CC_FP_GT 0 "register_operand" "=z")
+	(compare:CC_FP_GT (reg:SF R4_REG) (reg:SF R5_REG)))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R1_REG))
+   (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+  "TARGET_SH1 && ! TARGET_SH2E"
+  "jsr	@%1%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpunltsf_i1"
+  [(set (match_operand:CC_FP_UNLT 0 "register_operand" "=z")
+	(compare:CC_FP_UNLT (reg:SF R4_REG) (reg:SF R5_REG)))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R1_REG))
+   (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+  "TARGET_SH1 && ! TARGET_SH2E"
+  "jsr	@%1%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpeqsf_i1_finite"
+  [(set (reg:SI T_REG)
+	(eq:SI (match_operand:SF 0 "arith_reg_operand" "r,r,r")
+	       (match_operand:SF 1 "arith_reg_operand" "r,r,r")))
+   (clobber (match_scratch:SI 2 "=0,1,?r"))]
+  "TARGET_SH1 && ! TARGET_SH2E && flag_finite_math_only"
+  "*
+{
+  if (which_alternative == 0)
+     output_asm_insn (\"cmp/eq\t%0,%1\;or\t%1,%2\;bt\t0f\", operands);
+  else if (which_alternative == 1)
+     output_asm_insn (\"cmp/eq\t%0,%1\;or\t%0,%2\;bt\t0f\", operands);
+  else
+    output_asm_insn (\"cmp/eq\t%0,%1\;mov\t%0,%2\;bt\t0f\;or\t%1,%2\",
+		     operands);
+  return \"add\t%2,%2\;tst\t%2,%2\\n0:\";
+}"
+  [(set_attr "length" "10,10,12")])
+
+(define_insn "cmplesf_i1_finite"
+  [(set (reg:SI T_REG)
+	(le:SI (match_operand:SF 0 "arith_reg_operand" "r,r,r")
+	       (match_operand:SF 1 "arith_reg_operand" "r,r,r")))
+   (clobber (match_scratch:SI 2 "=0,1,r"))]
+  "TARGET_SH1 && ! TARGET_SH2E && flag_finite_math_only"
+  "*
+{
+  output_asm_insn (\"cmp/pz\t%0\", operands);
+  if (which_alternative == 2)
+    output_asm_insn (\"mov\t%0,%2\", operands);
+  if (TARGET_SH2)
+    output_asm_insn (\"bf/s\t0f\;cmp/hs\t%1,%0\;cmp/ge\t%0,%1\", operands);
+  else
+    output_asm_insn (\"bt\t1f\;bra\t0f\;cmp/hs\t%1,%0\\n1:\tcmp/ge\t%0,%1\",
+		     operands);
+  if (which_alternative == 1)
+    output_asm_insn (\"or\t%0,%2\", operands);
+  else
+    output_asm_insn (\"or\t%1,%2\", operands);
+  return \"bt\t0f\;add\t%2,%2\;tst\t%2,%2\\n0:\";
+}"
+  [(set_attr "length" "18,18,20")])
+
+(define_insn "cmpunsf_i1"
+  [(set (reg:SI T_REG)
+	(unordered:SI (match_operand:SF 0 "arith_reg_operand" "r,r")
+		      (match_operand:SF 1 "arith_reg_operand" "r,r")))
+   (use (match_operand:SI 2 "arith_reg_operand" "r,r"))
+   (clobber (match_scratch:SI 3 "=0,&r"))]
+  "TARGET_SH1 && ! TARGET_SH2E"
+  "not\t%0,%3\;tst\t%2,%3\;not\t%1,%3\;bt\t0f\;tst\t%2,%3\;0:"
+  [(set_attr "length" "10")])
+
+;; ??? This is a lot of code with a lot of branches; a library function
+;; might be better.
+(define_insn "cmpuneqsf_i1"
+  [(set (reg:SI T_REG)
+	(uneq:SI (match_operand:SF 0 "arith_reg_operand" "r")
+		 (match_operand:SF 1 "arith_reg_operand" "r")))
+   (use (match_operand:SI 2 "arith_reg_operand" "r"))
+   (clobber (match_scratch:SI 3 "=&r"))]
+  "TARGET_SH1 && ! TARGET_SH2E"
+  "*
+{
+  output_asm_insn (\"not\t%0,%3\;tst\t%2,%3\;not\t%1,%3\", operands);
+  output_asm_insn (\"bt\t0f\;tst\t%2,%3\;bt\t0f\;cmp/eq\t%0,%1\", operands);
+  output_asm_insn (\"mov\t%0,%3\;bt\t0f\;or\t%1,%3\", operands);
+  return \"add\t%3,%3\;tst\t%3,%3\\n0:\";
+}"
+  [(set_attr "length" "24")])
+
+(define_insn "movcc_fp_ne"
+  [(set (match_operand:CC_FP_NE 0 "general_movdst_operand"
+	    "=r,r,m")
+	(match_operand:CC_FP_NE 1 "general_movsrc_operand"
+	 "rI08,mr,r"))]
+  "TARGET_SH1"
+  "@
+	mov	%1,%0
+	mov.l	%1,%0
+	mov.l	%1,%0"
+  [(set_attr "type" "move,load,store")])
+
+(define_insn "movcc_fp_gt"
+  [(set (match_operand:CC_FP_GT 0 "general_movdst_operand"
+	    "=r,r,m")
+	(match_operand:CC_FP_GT 1 "general_movsrc_operand"
+	 "rI08,mr,r"))]
+  "TARGET_SH1"
+  "@
+	mov	%1,%0
+	mov.l	%1,%0
+	mov.l	%1,%0"
+  [(set_attr "type" "move,load,store")])
+
+(define_insn "movcc_fp_unlt"
+  [(set (match_operand:CC_FP_UNLT 0 "general_movdst_operand"
+	    "=r,r,m")
+	(match_operand:CC_FP_UNLT 1 "general_movsrc_operand"
+	 "rI08,mr,r"))]
+  "TARGET_SH1"
+  "@
+	mov	%1,%0
+	mov.l	%1,%0
+	mov.l	%1,%0"
+  [(set_attr "type" "move,load,store")])
+
 (define_insn "cmpgtsf_t"
   [(set (reg:SI T_REG)
 	(gt:SI (match_operand:SF 0 "fp_arith_reg_operand" "f")
@@ -10131,6 +10431,22 @@ mov.l\\t1f,r0\\n\\
   "* return output_ieee_ccmpeq (insn, operands);"
   [(set_attr "length" "4")])
 
+(define_insn "*cmpltgtsf_t"
+  [(set (reg:SI T_REG)
+	(ltgt:SI (match_operand:SF 0 "fp_arith_reg_operand" "f")
+		 (match_operand:SF 1 "fp_arith_reg_operand" "f")))]
+  "TARGET_SH2E && ! (TARGET_SH4 || TARGET_SH2A_SINGLE)"
+  "fcmp/gt\t%1,%0\;bt\t0f\;fcmp/gt\t%0,%1\\n0:"
+  [(set_attr "length" "6")])
+
+(define_insn "*cmporderedsf_t"
+  [(set (reg:SI T_REG)
+	(ordered:SI (match_operand:SF 0 "fp_arith_reg_operand" "f")
+		    (match_operand:SF 1 "fp_arith_reg_operand" "f")))]
+  "TARGET_SH2E && ! (TARGET_SH4 || TARGET_SH2A_SINGLE)"
+  "fcmp/eq\t%0,%0\;bf\t0f\;fcmp/eq\t%1,%1\\n0:"
+  [(set_attr "length" "6")])
+
 
 (define_insn "cmpgtsf_t_i4"
   [(set (reg:SI T_REG)
@@ -10163,6 +10479,26 @@ mov.l\\t1f,r0\\n\\
   [(set_attr "length" "4")
    (set_attr "fp_mode" "single")])
 
+(define_insn "*cmpltgtsf_t_4"
+  [(set (reg:SI T_REG)
+	(ltgt:SI (match_operand:SF 0 "fp_arith_reg_operand" "f")
+		 (match_operand:SF 1 "fp_arith_reg_operand" "f")))
+   (use (match_operand:PSI 2 "fpscr_operand" "c"))]
+  "TARGET_SH4 || TARGET_SH2A_SINGLE"
+  "fcmp/gt\t%1,%0\;bt\t0f\;fcmp/gt\t%0,%1\\n0:"
+  [(set_attr "length" "6")
+   (set_attr "fp_mode" "single")])
+
+(define_insn "*cmporderedsf_t_4"
+  [(set (reg:SI T_REG)
+	(ordered:SI (match_operand:SF 0 "fp_arith_reg_operand" "f")
+		    (match_operand:SF 1 "fp_arith_reg_operand" "f")))
+   (use (match_operand:PSI 2 "fpscr_operand" "c"))]
+  "TARGET_SH4 || TARGET_SH2A_SINGLE"
+  "fcmp/eq\t%0,%0\;bf\t0f\;fcmp/eq\t%1,%1\\n0:"
+  [(set_attr "length" "6")
+   (set_attr "fp_mode" "single")])
+
 (define_insn "cmpeqsf_media"
   [(set (match_operand:SI 0 "register_operand" "=r")
 	(eq:SI (match_operand:SF 1 "fp_arith_reg_operand" "f")
@@ -10411,11 +10747,39 @@ mov.l\\t1f,r0\\n\\
   [(set_attr "type" "fmove")
    (set_attr "fp_mode" "single")])
 
+(define_expand "abssc2"
+  [(set (match_operand:SF 0 "fp_arith_reg_operand" "")
+	(abs:SF (match_operand:SC 1 "fp_arith_reg_operand" "")))]
+  "TARGET_OSFP && ! TARGET_SH2E"
+  "
+{
+  expand_sfunc_unop (SCmode, &gen_abssc2_i3, \"__hypotf\", ABS, operands);
+  DONE;
+}")
+
+(define_insn "abssc2_i3"
+  [(set (match_operand:SF 0 "arith_reg_dest" "=z")
+	(abs:SF (reg:SC R4_REG)))
+   (clobber (reg:SI MACH_REG))
+   (clobber (reg:SI MACL_REG))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R1_REG))
+   (clobber (reg:SI R2_REG))
+   (clobber (reg:SI R3_REG))
+   (clobber (reg:SI R4_REG))
+   (clobber (reg:SI R5_REG))
+   (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+  "TARGET_OSFP && ! TARGET_SH2E"
+  "jsr	@%1%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
+
 (define_expand "adddf3"
   [(set (match_operand:DF 0 "fp_arith_reg_operand" "")
 	(plus:DF (match_operand:DF 1 "fp_arith_reg_operand" "")
 		 (match_operand:DF 2 "fp_arith_reg_operand" "")))]
-  "(TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU"
+  "TARGET_FPU_DOUBLE || TARGET_SH3"
   "
 {
   if (TARGET_SH4 || TARGET_SH2A_DOUBLE)
@@ -10423,6 +10787,12 @@ mov.l\\t1f,r0\\n\\
       expand_df_binop (&gen_adddf3_i, operands);
       DONE;
     }
+  else if (TARGET_SH3)
+    {
+      expand_sfunc_binop (DFmode, &gen_adddf3_i3_wrap, \"__adddf3\", PLUS,
+			  operands);
+      DONE;
+    }
 }")
 
 (define_insn "*adddf3_media"
@@ -10443,6 +10813,30 @@ mov.l\\t1f,r0\\n\\
   [(set_attr "type" "dfp_arith")
    (set_attr "fp_mode" "double")])
 
+(define_expand "adddf3_i3_wrap"
+  [(match_operand:DF 0 "" "") (match_operand:SI 1 "" "")]
+  "TARGET_SH3"
+  "
+{
+  emit_insn (gen_adddf3_i3 (operands[1]));
+  emit_move_insn (operands[0], gen_rtx_REG (DFmode, R0_REG));
+  DONE;
+}")
+
+(define_insn "adddf3_i3"
+  [(set (reg:DF R0_REG)
+	(plus:DF (reg:DF R4_REG) (reg:DF R6_REG)))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:DI R2_REG))
+   (clobber (reg:DF R4_REG))
+   (clobber (reg:DF R6_REG))
+   (use (match_operand:SI 0 "arith_reg_operand" "r"))]
+  "TARGET_SH3"
+  "jsr	@%0%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
+
 (define_expand "subdf3"
   [(set (match_operand:DF 0 "fp_arith_reg_operand" "")
 	(minus:DF (match_operand:DF 1 "fp_arith_reg_operand" "")
@@ -10479,7 +10873,7 @@ mov.l\\t1f,r0\\n\\
   [(set (match_operand:DF 0 "fp_arith_reg_operand" "")
 	(mult:DF (match_operand:DF 1 "fp_arith_reg_operand" "")
 		 (match_operand:DF 2 "fp_arith_reg_operand" "")))]
-  "(TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU"
+  "TARGET_FPU_DOUBLE || TARGET_SH3"
   "
 {
   if (TARGET_SH4 || TARGET_SH2A_DOUBLE)
@@ -10487,6 +10881,12 @@ mov.l\\t1f,r0\\n\\
       expand_df_binop (&gen_muldf3_i, operands);
       DONE;
     }
+  else if (TARGET_SH3)
+    {
+      expand_sfunc_binop (DFmode, &gen_muldf3_i3_wrap, \"__muldf3\", MULT,
+			  operands);
+      DONE;
+    }
 }")
 
 (define_insn "*muldf3_media"
@@ -10507,6 +10907,32 @@ mov.l\\t1f,r0\\n\\
   [(set_attr "type" "dfp_mul")
    (set_attr "fp_mode" "double")])
 
+(define_expand "muldf3_i3_wrap"
+  [(match_operand:DF 0 "" "") (match_operand:SI 1 "" "")]
+  "TARGET_SH3"
+  "
+{
+  emit_insn (gen_muldf3_i3 (operands[1]));
+  emit_move_insn (operands[0], gen_rtx_REG (DFmode, R0_REG));
+  DONE;
+}")
+
+(define_insn "muldf3_i3"
+  [(set (reg:DF R0_REG)
+	(mult:DF (reg:DF R4_REG) (reg:DF R6_REG)))
+   (clobber (reg:SI MACH_REG))
+   (clobber (reg:SI MACL_REG))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:DI R2_REG))
+   (clobber (reg:DF R4_REG))
+   (clobber (reg:DF R6_REG))
+   (use (match_operand:SI 0 "arith_reg_operand" "r"))]
+  "TARGET_SH3"
+  "jsr	@%0%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
+
 (define_expand "divdf3"
   [(set (match_operand:DF 0 "fp_arith_reg_operand" "")
 	(div:DF (match_operand:DF 1 "fp_arith_reg_operand" "")
@@ -10636,6 +11062,73 @@ mov.l\\t1f,r0\\n\\
 ;; 	      (use (match_dup 2))])
 ;;    (set (match_dup 0) (reg:SI FPUL_REG))])
 
+(define_insn "cmpnedf_i1"
+  [(set (match_operand:CC_FP_NE 0 "register_operand" "=z")
+	(compare:CC_FP_NE (reg:DF R4_REG) (reg:DF R6_REG)))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R1_REG))
+   (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+  "TARGET_SH1 && ! TARGET_SH2E"
+  "jsr	@%1%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpgtdf_i1"
+  [(set (match_operand:CC_FP_GT 0 "register_operand" "=z")
+	(compare:CC_FP_GT (reg:DF R4_REG) (reg:DF R6_REG)))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R1_REG))
+   (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+  "TARGET_SH1_SOFTFP"
+  "jsr	@%1%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpunltdf_i1"
+  [(set (match_operand:CC_FP_UNLT 0 "register_operand" "=z")
+	(compare:CC_FP_UNLT (reg:DF R4_REG) (reg:DF R6_REG)))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R1_REG))
+   (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+  "TARGET_SH1_SOFTFP"
+  "jsr	@%1%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpeqdf_i1_finite"
+  [(set (reg:SI T_REG)
+	(eq:SI (match_operand:DF 0 "arith_reg_operand" "r")
+	       (match_operand:DF 1 "arith_reg_operand" "r")))
+   (clobber (match_scratch:SI 2 "=&r"))]
+  "TARGET_SH1_SOFTFP && flag_finite_math_only"
+  "cmp/eq\t%R0,%R1\;mov\t%S0,%2\;bf\t0f\;cmp/eq\t%S0,%S1\;bt\t0f\;or\t%S1,%2\;add\t%2,%2\;or\t%R0,%2\;tst\t%2,%2\\n0:"
+  [(set_attr "length" "18")])
+
+(define_insn "cmpundf_i1"
+  [(set (reg:SI T_REG)
+	(unordered:SI (match_operand:DF 0 "arith_reg_operand" "r,r")
+		      (match_operand:DF 1 "arith_reg_operand" "r,r")))
+   (use (match_operand:SI 2 "arith_reg_operand" "r,r"))
+   (clobber (match_scratch:SI 3 "=0,&r"))]
+  "TARGET_SH1 && ! TARGET_SH2E"
+  "not\t%S0,%3\;tst\t%2,%3\;not\t%S1,%3\;bt\t0f\;tst\t%2,%3\;0:"
+  [(set_attr "length" "10")])
+
+;; ??? This is a lot of code with a lot of branches; a library function
+;; might be better.
+(define_insn "cmpuneqdf_i1"
+  [(set (reg:SI T_REG)
+	(uneq:SI (match_operand:DF 0 "arith_reg_operand" "r")
+		 (match_operand:DF 1 "arith_reg_operand" "r")))
+   (use (match_operand:SI 2 "arith_reg_operand" "r"))
+   (clobber (match_scratch:SI 3 "=&r"))]
+  "TARGET_SH1_SOFTFP"
+  "not\t%S0,%3\;tst\t%2,%3\;not\t%S1,%3\;bt\t0f\;tst\t%2,%3\;bt\t0f\;cmp/eq\t%R0,%R1\; bf\t0f\;cmp/eq\t%S0,%S1\;bt\t0f\;mov\t%S0,%3\;or\t%S1,%3\;add\t%3,%3\;or\t%R0,%3\;tst\t%3,%3\\n0:"
+  [(set_attr "length" "30")])
+
 (define_insn "cmpgtdf_t"
   [(set (reg:SI T_REG)
 	(gt:SI (match_operand:DF 0 "arith_reg_operand" "f")
@@ -10667,6 +11160,26 @@ mov.l\\t1f,r0\\n\\
   [(set_attr "length" "4")
    (set_attr "fp_mode" "double")])
 
+(define_insn "*cmpltgtdf_t"
+  [(set (reg:SI T_REG)
+	(ltgt:SI (match_operand:DF 0 "fp_arith_reg_operand" "f")
+		 (match_operand:DF 1 "fp_arith_reg_operand" "f")))
+   (use (match_operand:PSI 2 "fpscr_operand" "c"))]
+  "TARGET_SH4 || TARGET_SH2A_DOUBLE"
+  "fcmp/gt\t%1,%0\;bt\t0f\;fcmp/gt\t%0,%1\\n0:"
+  [(set_attr "length" "6")
+   (set_attr "fp_mode" "double")])
+
+(define_insn "*cmpordereddf_t_4"
+  [(set (reg:SI T_REG)
+	(ordered:SI (match_operand:DF 0 "fp_arith_reg_operand" "f")
+		    (match_operand:DF 1 "fp_arith_reg_operand" "f")))
+   (use (match_operand:PSI 2 "fpscr_operand" "c"))]
+  "TARGET_SH4 || TARGET_SH2A_SINGLE"
+  "fcmp/eq\t%0,%0\;bf\t0f\;fcmp/eq\t%1,%1\\n0:"
+  [(set_attr "length" "6")
+   (set_attr "fp_mode" "double")])
+
 (define_insn "cmpeqdf_media"
   [(set (match_operand:SI 0 "register_operand" "=r")
 	(eq:SI (match_operand:DF 1 "fp_arith_reg_operand" "f")
@@ -10835,16 +11348,82 @@ mov.l\\t1f,r0\\n\\
   [(set_attr "type" "fp")
    (set_attr "fp_mode" "double")])
 
+;; ??? In order to use this efficiently, we'd have to have an extra
+;; register class for r0 and r1 - and that would cause repercussions in
+;; register allocation elsewhere.  So just say we clobber r0 / r1, and
+;; that we can use an arbitrary target.
+(define_insn_and_split "extendsfdf2_i1"
+  [(set (match_operand:DF 0 "arith_reg_dest" "=r")
+	(float_extend:DF (reg:SF R4_REG)))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R0_REG))
+   (clobber (reg:SI R1_REG))
+   (clobber (reg:SI R2_REG))
+   (clobber (reg:SI R3_REG))
+   (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+  "TARGET_SH1_SOFTFP && !TARGET_SH2E"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0) (reg:DF R0_REG))]
+  "emit_insn (gen_extendsfdf2_i1_r0 (operands[1]));"
+  [(set_attr "type" "sfunc")])
+
+(define_insn "extendsfdf2_i1_r0"
+  [(set (reg:DF R0_REG) (float_extend:DF (reg:SF R4_REG)))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R2_REG))
+   (clobber (reg:SI R3_REG))
+   (use (match_operand:SI 0 "arith_reg_operand" "r"))]
+  "TARGET_SH1_SOFTFP && !TARGET_SH2E"
+  "jsr	@%0%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
+
+(define_insn_and_split "extendsfdf2_i2e"
+  [(set (match_operand:DF 0 "arith_reg_dest" "=r")
+	(float_extend:DF (reg:SF FR4_REG)))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R0_REG))
+   (clobber (reg:SI R1_REG))
+   (clobber (reg:SI R2_REG))
+   (clobber (reg:SI R3_REG))
+   (clobber (reg:SI R4_REG))
+   (clobber (reg:SI FPUL_REG))
+   (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+  "TARGET_SH1_SOFTFP && TARGET_SH2E"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0) (reg:DF R0_REG))]
+  "emit_insn (gen_extendsfdf2_i2e_r0 (operands[1]));"
+  [(set_attr "type" "sfunc")])
+
+(define_insn "extendsfdf2_i2e_r0"
+  [(set (reg:DF R0_REG) (float_extend:DF (reg:SF FR4_REG)))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R2_REG))
+   (clobber (reg:SI R3_REG))
+   (clobber (reg:SI R4_REG))
+   (clobber (reg:SI FPUL_REG))
+   (use (match_operand:SI 0 "arith_reg_operand" "r"))]
+  "TARGET_SH1_SOFTFP && TARGET_SH2E"
+  "jsr	@%0%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
+
 (define_expand "truncdfsf2"
   [(set (match_operand:SF 0 "fpul_operand" "")
-	(float_truncate:SF (match_operand:DF 1 "fp_arith_reg_operand" "")))]
-  "(TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU"
-  "
-{
+        (float_truncate:SF (match_operand:DF 1 "fp_arith_reg_operand" "")))]
+  "(TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU" 
+  "                      
+{                     
   if (TARGET_SH4 || TARGET_SH2A_DOUBLE)
     {
       emit_df_insn (gen_truncdfsf2_i4 (operands[0], operands[1],
-				       get_fpscr_rtx ()));
+                                       get_fpscr_rtx ()));
       DONE;
     }
 }")
@@ -10864,6 +11443,23 @@ mov.l\\t1f,r0\\n\\
   "fcnvds  %1,%0"
   [(set_attr "type" "fp")
    (set_attr "fp_mode" "double")])
+
+(define_insn "truncdfsf2_i2e"
+  [(set (match_operand:SF 0 "arith_reg_dest" "=w")
+	(float_truncate:SF (reg:DF R4_REG)))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI FPUL_REG))
+   (clobber (reg:SI R0_REG))
+   (clobber (reg:SI R1_REG))
+   (clobber (reg:SI R2_REG))
+   (clobber (reg:SI R3_REG))
+   (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+  "TARGET_SH1_SOFTFP && TARGET_SH2E"
+  "jsr	@%1%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
+
 \f
 ;; Bit field extract patterns.  These give better code for packed bitfields,
 ;; because they allow auto-increment addresses to be generated.
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/sh-modes.def gcc-4.5.0/gcc/config/sh/sh-modes.def
--- gcc-4.5.0/gcc/config/sh/sh-modes.def	2007-08-02 16:19:31.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/sh-modes.def	2010-07-14 11:58:33.000000000 +0530
@@ -22,6 +22,11 @@ PARTIAL_INT_MODE (SI);
 /* PDI mode is used to represent a function address in a target register.  */
 PARTIAL_INT_MODE (DI);
 
+/* For software floating point comparisons.  */
+CC_MODE (CC_FP_NE);
+CC_MODE (CC_FP_GT);
+CC_MODE (CC_FP_UNLT);
+
 /* Vector modes.  */
 VECTOR_MODE  (INT, QI, 2);    /*                 V2QI */
 VECTOR_MODES (INT, 4);        /*            V4QI V2HI */
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/sh.opt gcc-4.5.0/gcc/config/sh/sh.opt
--- gcc-4.5.0/gcc/config/sh/sh.opt	2009-07-20 13:07:37.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/sh.opt	2010-07-14 11:58:33.000000000 +0530
@@ -21,7 +21,7 @@
 ;; Used for various architecture options.
 Mask(SH_E)
 
-;; Set if the default precision of th FPU is single.
+;; Set if the default precision of the FPU is single.
 Mask(FPU_SINGLE)
 
 ;; Set if we should generate code using type 2A insns.
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/sh-protos.h gcc-4.5.0/gcc/config/sh/sh-protos.h
--- gcc-4.5.0/gcc/config/sh/sh-protos.h	2009-12-01 03:08:46.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/sh-protos.h	2010-07-14 11:58:33.000000000 +0530
@@ -25,8 +25,13 @@ along with GCC; see the file COPYING3.  
 #define GCC_SH_PROTOS_H
 
 enum sh_function_kind {
-  /* A function with normal C ABI  */
+  /* A function with normal C ABI, or an SH1..SH4 sfunc that may resolved via
+     a PLT.  */
   FUNCTION_ORDINARY,
+  /* A function that is a bit large to put it in every calling dso, but that's
+     typically used often enough so that calling via GOT makes sense for
+     speed.  */
+  SFUNC_FREQUENT,
   /* A special function that guarantees that some otherwise call-clobbered
      registers are not clobbered.  These can't go through the SH5 resolver,
      because it only saves argument passing registers.  */
@@ -116,6 +121,10 @@ extern void expand_sf_binop (rtx (*)(rtx
 extern void expand_df_unop (rtx (*)(rtx, rtx, rtx), rtx *);
 extern void expand_df_binop (rtx (*)(rtx, rtx, rtx, rtx), rtx *);
 extern void expand_fp_branch (rtx (*)(void), rtx (*)(void));
+extern void expand_sfunc_unop (enum machine_mode, rtx (*) (rtx, rtx),
+			       const char *, enum rtx_code code, rtx *);
+extern void expand_sfunc_binop (enum machine_mode, rtx (*) (rtx, rtx),
+				const char *, enum rtx_code code, rtx *);
 extern int sh_insn_length_adjustment (rtx);
 extern int sh_can_redirect_branch (rtx, rtx);
 extern void sh_expand_unop_v2sf (enum rtx_code, rtx, rtx);
@@ -133,6 +142,8 @@ extern struct rtx_def *get_fpscr_rtx (vo
 extern int sh_media_register_for_return (void);
 extern void sh_expand_prologue (void);
 extern void sh_expand_epilogue (bool);
+extern void sh_expand_float_cbranch (rtx operands[4]);
+extern void sh_expand_float_scc (rtx operands[4]);
 extern int sh_need_epilogue (void);
 extern void sh_set_return_address (rtx, rtx);
 extern int initial_elimination_offset (int, int);
@@ -176,6 +187,7 @@ struct secondary_reload_info;
 extern enum reg_class sh_secondary_reload (bool, rtx, enum reg_class,
 					   enum machine_mode,
 					   struct secondary_reload_info *);
+extern int sh_match_adjust (rtx, int);
 extern int sh2a_get_function_vector_number (rtx);
 extern int sh2a_is_function_vector_call (rtx);
 extern void sh_fix_range (const char *);
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/t-sh gcc-4.5.0/gcc/config/sh/t-sh
--- gcc-4.5.0/gcc/config/sh/t-sh	2009-08-23 03:13:07.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/t-sh	2010-07-14 17:23:41.000000000 +0530
@@ -26,6 +26,10 @@ LIB1ASMSRC = sh/lib1funcs.asm
 LIB1ASMFUNCS = _ashiftrt _ashiftrt_n _ashiftlt _lshiftrt _movmem \
   _movmem_i4 _mulsi3 _sdivsi3 _sdivsi3_i4 _udivsi3 _udivsi3_i4 _set_fpscr \
   _div_table _udiv_qrnnd_16 \
+  _nesf2 _nedf2 _gtsf2t _gtdf2t _gesf2f _gedf2f \
+  _add_sub_sf3 _mulsf3 _hypotf _muldf3 _add_sub_df3 _divdf3 \
+  _fixunssfsi _fixsfsi _fixunsdfsi _fixdfsi _floatunssisf _floatsisf \
+  _floatunssidf _floatsidf \
   $(LIB1ASMFUNCS_CACHE)
 LIB1ASMFUNCS_CACHE = _ic_invalidate _ic_invalidate_array
 
@@ -131,17 +135,17 @@ OPT_EXTRA_PARTS= libgcc-Os-4-200.a libgc
 EXTRA_MULTILIB_PARTS= $(IC_EXTRA_PARTS) $(OPT_EXTRA_PARTS)
 
 $(T)ic_invalidate_array_4-100.o: $(srcdir)/config/sh/lib1funcs.asm $(GCC_PASSES)
-	$(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -c -o $(T)ic_invalidate_array_4-100.o -DL_ic_invalidate_array -DWAYS=1 -DWAY_SIZE=0x2000 -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
+	$(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -I. -c -o $(T)ic_invalidate_array_4-100.o -DL_ic_invalidate_array -DWAYS=1 -DWAY_SIZE=0x2000 -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
 $(T)libic_invalidate_array_4-100.a: $(T)ic_invalidate_array_4-100.o $(GCC_PASSES)
 	$(AR_CREATE_FOR_TARGET) $(T)libic_invalidate_array_4-100.a $(T)ic_invalidate_array_4-100.o
 
 $(T)ic_invalidate_array_4-200.o: $(srcdir)/config/sh/lib1funcs.asm $(GCC_PASSES)
-	$(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -c -o $(T)ic_invalidate_array_4-200.o -DL_ic_invalidate_array -DWAYS=2 -DWAY_SIZE=0x2000 -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
+	$(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -I. -c -o $(T)ic_invalidate_array_4-200.o -DL_ic_invalidate_array -DWAYS=2 -DWAY_SIZE=0x2000 -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
 $(T)libic_invalidate_array_4-200.a: $(T)ic_invalidate_array_4-200.o $(GCC_PASSES)
 	$(AR_CREATE_FOR_TARGET) $(T)libic_invalidate_array_4-200.a $(T)ic_invalidate_array_4-200.o
 
 $(T)ic_invalidate_array_4a.o: $(srcdir)/config/sh/lib1funcs.asm $(GCC_PASSES)
-	$(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -c -o $(T)ic_invalidate_array_4a.o -DL_ic_invalidate_array -D__FORCE_SH4A__ -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
+	$(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -I. -c -o $(T)ic_invalidate_array_4a.o -DL_ic_invalidate_array -D__FORCE_SH4A__ -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
 $(T)libic_invalidate_array_4a.a: $(T)ic_invalidate_array_4a.o $(GCC_PASSES)
 	$(AR_CREATE_FOR_TARGET) $(T)libic_invalidate_array_4a.a $(T)ic_invalidate_array_4a.o
 

[-- Attachment #3: sh_softfp.patch --]
[-- Type: application/octet-stream, Size: 5477 bytes --]

diff -up -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config.gcc gcc-4.5.0/gcc/config.gcc
--- gcc-4.5.0/gcc/config.gcc	2010-04-07 16:04:00.000000000 +0530
+++ gcc-4.5.0/gcc/config.gcc	2010-07-14 18:43:09.000000000 +0530
@@ -2167,7 +2167,7 @@ sh-*-symbianelf* | sh[12346l]*-*-symbian
   sh-*-linux* | sh[2346lbe]*-*-linux* | \
   sh-*-netbsdelf* | shl*-*-netbsdelf* | sh5-*-netbsd* | sh5l*-*-netbsd* | \
    sh64-*-netbsd* | sh64l*-*-netbsd*)
-	tmake_file="${tmake_file} sh/t-sh sh/t-elf"
+	tmake_file="${tmake_file} sh/t-sh sh/t-elf sh/t-fprules-softfp soft-fp/t-softfp"
 	if test x${with_endian} = x; then
 		case ${target} in
 		sh[1234]*be-*-* | sh[1234]*eb-*-*) with_endian=big ;;
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/sfp-machine.h gcc-4.5.0/gcc/config/sh/sfp-machine.h
--- gcc-4.5.0/gcc/config/sh/sfp-machine.h	1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/sfp-machine.h	2010-07-14 18:45:09.000000000 +0530
@@ -0,0 +1,67 @@
+#define _FP_W_TYPE_SIZE        32
+#define _FP_W_TYPE             unsigned long
+#define _FP_WS_TYPE            signed long
+#define _FP_I_TYPE             long
+
+/* The type of the result of a floating point comparison.  This must
+   match `__libgcc_cmp_return__' in GCC for the target.  */
+typedef int __gcc_CMPtype __attribute__ ((mode (__libgcc_cmp_return__)));
+#define CMPtype __gcc_CMPtype
+
+#define _FP_MUL_MEAT_S(R,X,Y)                          \
+  _FP_MUL_MEAT_1_wide(_FP_WFRACBITS_S,R,X,Y,umul_ppmm)
+#define _FP_MUL_MEAT_D(R,X,Y)                          \
+  _FP_MUL_MEAT_2_wide(_FP_WFRACBITS_D,R,X,Y,umul_ppmm)
+#define _FP_MUL_MEAT_Q(R,X,Y)                          \
+  _FP_MUL_MEAT_4_wide(_FP_WFRACBITS_Q,R,X,Y,umul_ppmm)
+
+#define _FP_DIV_MEAT_S(R,X,Y)  _FP_DIV_MEAT_1_loop(S,R,X,Y)
+#define _FP_DIV_MEAT_D(R,X,Y)  _FP_DIV_MEAT_2_udiv(D,R,X,Y)
+#define _FP_DIV_MEAT_Q(R,X,Y)  _FP_DIV_MEAT_4_udiv(Q,R,X,Y)
+
+#define _FP_NANFRAC_S          ((_FP_QNANBIT_S << 1) - 1)
+#define _FP_NANFRAC_D          ((_FP_QNANBIT_D << 1) - 1), -1
+#define _FP_NANFRAC_Q          ((_FP_QNANBIT_Q << 1) - 1), -1, -1, -1
+#define _FP_NANSIGN_S          0
+#define _FP_NANSIGN_D          0
+#define _FP_NANSIGN_Q          0
+
+#define _FP_KEEPNANFRACP 1
+
+/* Someone please check this.  */
+#define _FP_CHOOSENAN(fs, wc, R, X, Y, OP)                     \
+  do {                                                         \
+    if ((_FP_FRAC_HIGH_RAW_##fs(X) & _FP_QNANBIT_##fs)         \
+       && !(_FP_FRAC_HIGH_RAW_##fs(Y) & _FP_QNANBIT_##fs))     \
+      {                                                                \
+       R##_s = Y##_s;                                          \
+       _FP_FRAC_COPY_##wc(R,Y);                                \
+      }                                                                \
+    else                                                       \
+      {                                                                \
+       R##_s = X##_s;                                          \
+       _FP_FRAC_COPY_##wc(R,X);                                \
+      }                                                                \
+    R##_c = FP_CLS_NAN;                                                \
+  } while (0)
+
+#define        __LITTLE_ENDIAN 1234
+#define        __BIG_ENDIAN    4321
+
+#if defined __BIG_ENDIAN__ || defined _BIG_ENDIAN
+# if defined __LITTLE_ENDIAN__ || defined _LITTLE_ENDIAN
+#  error "Both BIG_ENDIAN and LITTLE_ENDIAN defined!"
+# endif
+# define __BYTE_ORDER __BIG_ENDIAN
+#else
+# if defined __LITTLE_ENDIAN__ || defined _LITTLE_ENDIAN
+#  define __BYTE_ORDER __LITTLE_ENDIAN
+# else
+#  error "Cannot determine current byte order"
+# endif
+#endif
+
+/* Define ALIASNAME as a strong alias for NAME.  */
+# define strong_alias(name, aliasname) _strong_alias(name, aliasname)
+# define _strong_alias(name, aliasname) \
+  extern __typeof (name) aliasname __attribute__ ((alias (#name)));
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/t-fprules-softfp gcc-4.5.0/gcc/config/sh/t-fprules-softfp
--- gcc-4.5.0/gcc/config/sh/t-fprules-softfp	1970-01-01 05:30:00.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/t-fprules-softfp	2010-07-14 18:43:09.000000000 +0530
@@ -0,0 +1,6 @@
+softfp_float_modes := sf df
+softfp_int_modes := si di
+softfp_extensions := sfdf
+softfp_truncations := dfsf
+softfp_machine_header := sh/sfp-machine.h
+softfp_exclude_libgcc2 := y
diff -uprN -x'*.orig' -x'*.rej' -x'*patch*' gcc-4.5.0/gcc/config/sh/t-sh gcc-4.5.0/gcc/config/sh/t-sh
--- gcc-4.5.0/gcc/config/sh/t-sh	2009-08-23 03:13:07.000000000 +0530
+++ gcc-4.5.0/gcc/config/sh/t-sh	2010-07-14 18:43:09.000000000 +0530
@@ -31,24 +31,6 @@ LIB1ASMFUNCS_CACHE = _ic_invalidate _ic_
 
 TARGET_LIBGCC2_CFLAGS = -mieee
 
-# We want fine grained libraries, so use the new code to build the
-# floating point emulation libraries.
-FPBIT = fp-bit.c
-DPBIT = dp-bit.c
-
-dp-bit.c: $(srcdir)/config/fp-bit.c
-	echo '#ifdef __LITTLE_ENDIAN__' > dp-bit.c
-	echo '#define FLOAT_BIT_ORDER_MISMATCH' >>dp-bit.c
-	echo '#endif' 		>> dp-bit.c
-	cat $(srcdir)/config/fp-bit.c >> dp-bit.c
-
-fp-bit.c: $(srcdir)/config/fp-bit.c
-	echo '#define FLOAT' > fp-bit.c
-	echo '#ifdef __LITTLE_ENDIAN__' >> fp-bit.c
-	echo '#define FLOAT_BIT_ORDER_MISMATCH' >>fp-bit.c
-	echo '#endif' 		>> fp-bit.c
-	cat $(srcdir)/config/fp-bit.c >> fp-bit.c
-
 DEFAULT_ENDIAN = $(word 1,$(TM_ENDIAN_CONFIG))
 OTHER_ENDIAN = $(word 2,$(TM_ENDIAN_CONFIG))
 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: SH optimized software floating point routines
  2010-07-16 10:04   ` Naveen H. S
@ 2010-07-16 10:26     ` Joern Rennecke
  2010-07-22 23:10       ` Joseph S. Myers
  2010-07-16 14:01     ` Kaz Kojima
  2010-07-17 13:30     ` Joern Rennecke
  2 siblings, 1 reply; 30+ messages in thread
From: Joern Rennecke @ 2010-07-16 10:26 UTC (permalink / raw)
  To: Naveen H. S; +Cc: Kaz Kojima, gcc, Prafulla Thakare

Quoting "Naveen H. S" <Naveen.S@kpitcummins.com>:

> extendsfdf2 - gcc.c-torture/execute/conversion.c
> gcc.dg/torture/fp-int-convert-float.c, gcc.dg/pr28796-2.c

Note that some tests invoke undefined behaviour; I've also come across this
when doing optimized soft FP for ARCompact:

http://gcc.gnu.org/viewcvs/branches/arc-4_4-20090909-branch/gcc/testsuite/gcc.dg/torture/fp-int-convert.h?r1=151539&r2=151545
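
A minimal C sketch (not one of the testsuite files) of the kind of
conversion at issue: when the integral part of the value cannot be
represented in the target integer type, the conversion itself is
undefined behaviour (C99 6.3.1.4), so fp-bit, soft-fp and hand-written
assembly may all legitimately return different results:

#include <limits.h>

int
main (void)
{
  volatile float f = (float) INT_MAX * 4.0f; /* far outside int range */
  int i = (int) f;  /* undefined: integral part not representable */
  return i == 0;    /* different soft-FP implementations may disagree */
}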

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SH optimized software floating point routines
  2010-07-16 10:04   ` Naveen H. S
  2010-07-16 10:26     ` Joern Rennecke
@ 2010-07-16 14:01     ` Kaz Kojima
  2010-07-17 14:31       ` Joern Rennecke
  2010-07-17 13:30     ` Joern Rennecke
  2 siblings, 1 reply; 30+ messages in thread
From: Kaz Kojima @ 2010-07-16 14:01 UTC (permalink / raw)
  To: Naveen.S; +Cc: gcc, Prafulla.Thakare, amylaar

"Naveen H. S" <Naveen.S@kpitcummins.com> wrote:
>>> you can free to propose a complete and regtested patch for SH 
>>> assembly soft fp against trunk.
> 
> Please find attached the ported soft float patch "sh_softfloat.patch". 
> The original patch was posted at the following link by Joern RENNECKE.
> http://gcc.gnu.org/ml/gcc-patches/2006-09/msg00614.html

Your patches are for 4.5.0, and sh_softfloat.patch doesn't apply
to trunk cleanly.  Please provide patches against
svn trunk and the ChangeLog entries for them.

Here is an incomplete list of comments for sh_softfloat.patch:

*target*:
New macro TARGET_MATCH_ADJUST requires a doc patch.

reload.c:
config/sh/lib1funcs.asm:
config/sh/lib1funcs.h:

Copyright years of these files should be updated.

The copyright of the IEEE-754/* files should be GPLv3 with the
Runtime Library exception instead of v2, and 2010 should be added
to their copyright years.
Please see the one used in sh/lib1funcs.asm for example.

config/sh/sh.c:
>+/* Saved operands from the last compare to use when we generate an scc
>+   or bcc insn.  */
>+
>+rtx sh_compare_op0;
>+rtx sh_compare_op1;

It looks like sh_compare_op0 and sh_compare_op1 are set but
not used.

>+  REG_NOTES (last) = gen_rtx_EXPR_LIST (REG_EQUAL, equiv, REG_NOTES (last));

Use add_reg_note here.  Several other similar cases.
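
With the helper declared in rtl.h (which also picks the correct kind
of note list), the quoted line would become something like:

  add_reg_note (last, REG_EQUAL, equiv);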

> Please find attached the patch ""sh_softfp.patch" which implements basic
> support of soft-fp for SH target. There were no regressions found with 
> the patch. Please let us know if there should be any further improvements
> required for complete soft-fp support.

sh_softfp.patch looks basically OK to me, though I'm curious
about the numbers for fp-bit.c/soft-fp/softfloat.  Could you show us
some real speed & size numbers?

Regards,
	kaz

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: SH optimized software floating point routines
  2010-07-16 10:04   ` Naveen H. S
  2010-07-16 10:26     ` Joern Rennecke
  2010-07-16 14:01     ` Kaz Kojima
@ 2010-07-17 13:30     ` Joern Rennecke
  2010-07-19  0:59       ` Joern Rennecke
  2 siblings, 1 reply; 30+ messages in thread
From: Joern Rennecke @ 2010-07-17 13:30 UTC (permalink / raw)
  To: Naveen H. S; +Cc: Kaz Kojima, gcc, Prafulla Thakare

[-- Attachment #1: Type: text/plain, Size: 1335 bytes --]

Quoting "Naveen H. S" <Naveen.S@kpitcummins.com>:

> Hi,
>
>>> you can free to propose a complete and regtested patch for SH
>>> assembly soft fp against trunk.
>
> Please find attached the ported soft float patch "sh_softfloat.patch".
> The original patch was posted at the following link by Joern RENNECKE.
> http://gcc.gnu.org/ml/gcc-patches/2006-09/msg00614.html
>
> The following modifications have been done in the patch.
>
> sched-deps.c :- Hunk was not applied due to modifications in the
> current gcc source.
>
> t-sh :- divsf3, extendsfdf2 and truncdfsf2 routines are not included
> as they resulted in some regressions.
> divsf3 - c-c++-common/torture/complex-sign-mixed-div.c
> extendsfdf2 - gcc.c-torture/execute/conversion.c
> gcc.dg/torture/fp-int-convert-float.c, gcc.dg/pr28796-2.c
> gcc.dg/torture/type-generic-1.c

I've found some bugs in the SH[12] implementation of divsf3 / extendsfdf2.
What test case fails for truncdfsf2?

> sh.md :- cbranchsf4, cbranchdf4, cstoresf4, cstoredf4 instruction
> patterns are not included as they are already present in current source.
> Modifying the routines referring patch resulted in build failure.

Without optimized comparisons, conversions and division, the performance
will be heavily compromised.
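
To illustrate the point (a hypothetical fragment, not a benchmark from
this thread): on a soft-float target, every FP comparison and
arithmetic operation in a loop like the following expands to a
libcall, so the speed of those helpers dominates:

float
sum_below (const float *a, int n, float lim)
{
  float s = 0.0f;
  int i;

  for (i = 0; i < n; i++)
    if (a[i] < lim) /* expands to a call to __ltsf2 */
      s += a[i];    /* expands to a call to __addsf3 */
  return s;
}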

Could you please test the attached patch.

[-- Attachment #2: sh-softfp-20100717-1350 --]
[-- Type: text/plain, Size: 285198 bytes --]

Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	(revision 162269)
+++ gcc/doc/tm.texi	(working copy)
@@ -2753,6 +2753,10 @@ of the individual moves due to expected 
 forwarding logic, you can set @code{sri->extra_cost} to a negative amount.
 @end deftypefn
 
+@deftypefn {Target Hook} int TARGET_MATCH_ADJUST (rtx, @var{int})
+This hook is documented in @file{target.def} / @file{targhooks.c}.
+@end deftypefn
+
 @defmac SECONDARY_RELOAD_CLASS (@var{class}, @var{mode}, @var{x})
 @defmacx SECONDARY_INPUT_RELOAD_CLASS (@var{class}, @var{mode}, @var{x})
 @defmacx SECONDARY_OUTPUT_RELOAD_CLASS (@var{class}, @var{mode}, @var{x})
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	(revision 162269)
+++ gcc/doc/tm.texi.in	(working copy)
@@ -2753,6 +2753,8 @@ of the individual moves due to expected 
 forwarding logic, you can set @code{sri->extra_cost} to a negative amount.
 @end deftypefn
 
+@hook TARGET_MATCH_ADJUST
+
 @defmac SECONDARY_RELOAD_CLASS (@var{class}, @var{mode}, @var{x})
 @defmacx SECONDARY_INPUT_RELOAD_CLASS (@var{class}, @var{mode}, @var{x})
 @defmacx SECONDARY_OUTPUT_RELOAD_CLASS (@var{class}, @var{mode}, @var{x})
Index: gcc/targhooks.c
===================================================================
--- gcc/targhooks.c	(revision 162269)
+++ gcc/targhooks.c	(working copy)
@@ -67,6 +67,7 @@ along with GCC; see the file COPYING3.  
 #include "reload.h"
 #include "optabs.h"
 #include "recog.h"
+#include "regs.h"
 
 
 bool
@@ -906,6 +907,27 @@ default_secondary_reload (bool in_p ATTR
   return rclass;
 }
 
+/*  Given an rtx and its regno, return a regno value that shall be used for
+    purposes of comparison in operands_match_p.
+    Generally, we say that integer registers are subject to big-endian
+    adjustment.  This default target hook should generally work if the mode
+    of a register is a sufficient indication of whether this adjustment is
+    to take place; it will not work when software floating point is done
+    in integer registers.  */
+int
+default_match_adjust (rtx x, int regno)
+{
+  /* On a WORDS_BIG_ENDIAN machine, point to the last register of a
+     multiple hard register group of scalar integer registers, so that
+     for example (reg:DI 0) and (reg:SI 1) will be considered the same
+     register.  */
+  if (WORDS_BIG_ENDIAN && GET_MODE_SIZE (GET_MODE (x)) > UNITS_PER_WORD
+      && SCALAR_INT_MODE_P (GET_MODE (x))
+      && regno < FIRST_PSEUDO_REGISTER)
+    regno += hard_regno_nregs[regno][GET_MODE (x)] - 1;
+  return regno;
+}
+
 void
 default_target_option_override (void)
 {
Index: gcc/targhooks.h
===================================================================
--- gcc/targhooks.h	(revision 162269)
+++ gcc/targhooks.h	(working copy)
@@ -121,6 +121,7 @@ extern const reg_class_t *default_ira_co
 extern reg_class_t default_secondary_reload (bool, rtx, reg_class_t,
 					     enum machine_mode,
 					     secondary_reload_info *);
+extern int default_match_adjust (rtx, int);
 extern void default_target_option_override (void);
 extern void hook_void_bitmap (bitmap);
 extern bool default_handle_c_option (size_t, const char *, int);
Index: gcc/target.def
===================================================================
--- gcc/target.def	(revision 162269)
+++ gcc/target.def	(working copy)
@@ -1945,6 +1945,14 @@ DEFHOOK
   secondary_reload_info *sri),
  default_secondary_reload)
 
+/* Take an rtx and its regno, and return the regno for purposes of
+   checking a matching constraint.  */
+DEFHOOK
+(match_adjust,
+ "This hook is documented in @file{target.def} / @file{targhooks.c}.",
+ int, (rtx, int),
+ default_match_adjust)
+
 /* This target hook allows the backend to perform additional
    processing while initializing for variable expansion.  */
 DEFHOOK
Index: gcc/reload.c
===================================================================
--- gcc/reload.c	(revision 162269)
+++ gcc/reload.c	(working copy)
@@ -2216,14 +2216,8 @@ operands_match_p (rtx x, rtx y)
 	 multiple hard register group of scalar integer registers, so that
 	 for example (reg:DI 0) and (reg:SI 1) will be considered the same
 	 register.  */
-      if (WORDS_BIG_ENDIAN && GET_MODE_SIZE (GET_MODE (x)) > UNITS_PER_WORD
-	  && SCALAR_INT_MODE_P (GET_MODE (x))
-	  && i < FIRST_PSEUDO_REGISTER)
-	i += hard_regno_nregs[i][GET_MODE (x)] - 1;
-      if (WORDS_BIG_ENDIAN && GET_MODE_SIZE (GET_MODE (y)) > UNITS_PER_WORD
-	  && SCALAR_INT_MODE_P (GET_MODE (y))
-	  && j < FIRST_PSEUDO_REGISTER)
-	j += hard_regno_nregs[j][GET_MODE (y)] - 1;
+      i = targetm.match_adjust (x, i);
+      j = targetm.match_adjust (y, j);
 
       return i == j;
     }
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	(revision 162269)
+++ gcc/Makefile.in	(working copy)
@@ -2806,7 +2806,7 @@ opts-common.o : opts-common.c opts.h opt
 targhooks.o : targhooks.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TREE_H) \
    $(EXPR_H) $(TM_H) $(RTL_H) $(TM_P_H) $(FUNCTION_H) output.h $(TOPLEV_H) $(DIAGNOSTIC_CORE_H) \
    $(MACHMODE_H) $(TARGET_DEF_H) $(TARGET_H) $(GGC_H) gt-targhooks.h \
-   $(OPTABS_H) $(RECOG_H) reload.h hard-reg-set.h
+   $(OPTABS_H) $(RECOG_H) reload.h hard-reg-set.h $(REGS_H)
 
 bversion.h: s-bversion; @true
 s-bversion: BASE-VER
Index: gcc/config/sh/sh-protos.h
===================================================================
--- gcc/config/sh/sh-protos.h	(revision 162269)
+++ gcc/config/sh/sh-protos.h	(working copy)
@@ -25,8 +25,13 @@ along with GCC; see the file COPYING3.  
 #define GCC_SH_PROTOS_H
 
 enum sh_function_kind {
-  /* A function with normal C ABI  */
+  /* A function with normal C ABI, or an SH1..SH4 sfunc that may be resolved
+     via a PLT.  */
   FUNCTION_ORDINARY,
+  /* A function that is a bit too large to put in every calling DSO, but is
+     typically used often enough that calling via the GOT makes sense for
+     speed.  */
+  SFUNC_FREQUENT,
   /* A special function that guarantees that some otherwise call-clobbered
      registers are not clobbered.  These can't go through the SH5 resolver,
      because it only saves argument passing registers.  */
@@ -115,6 +120,10 @@ extern void expand_sf_binop (rtx (*)(rtx
 extern void expand_df_unop (rtx (*)(rtx, rtx, rtx), rtx *);
 extern void expand_df_binop (rtx (*)(rtx, rtx, rtx, rtx), rtx *);
 extern void expand_fp_branch (rtx (*)(void), rtx (*)(void));
+extern void expand_sfunc_unop (enum machine_mode, rtx (*) (rtx, rtx),
+			       const char *, enum rtx_code code, rtx *);
+extern void expand_sfunc_binop (enum machine_mode, rtx (*) (rtx, rtx),
+				const char *, enum rtx_code code, rtx *);
 extern int sh_insn_length_adjustment (rtx);
 extern int sh_can_redirect_branch (rtx, rtx);
 extern void sh_expand_unop_v2sf (enum rtx_code, rtx, rtx);
@@ -132,6 +141,8 @@ extern struct rtx_def *get_fpscr_rtx (vo
 extern int sh_media_register_for_return (void);
 extern void sh_expand_prologue (void);
 extern void sh_expand_epilogue (bool);
+extern void sh_expand_float_cbranch (rtx operands[4]);
+extern void sh_expand_float_scc (rtx operands[4]);
 extern int sh_need_epilogue (void);
 extern void sh_set_return_address (rtx, rtx);
 extern int initial_elimination_offset (int, int);
@@ -176,6 +187,7 @@ struct secondary_reload_info;
 extern reg_class_t sh_secondary_reload (bool, rtx, reg_class_t,
 					enum machine_mode,
 					struct secondary_reload_info *);
+extern int sh_match_adjust (rtx, int);
 extern int sh2a_get_function_vector_number (rtx);
 extern int sh2a_is_function_vector_call (rtx);
 extern void sh_fix_range (const char *);
Index: gcc/config/sh/lib1funcs.asm
===================================================================
--- gcc/config/sh/lib1funcs.asm	(revision 162269)
+++ gcc/config/sh/lib1funcs.asm	(working copy)
@@ -3931,3 +3931,6 @@ GLOBAL(udiv_qrnnd_16):
 	ENDFUNC(GLOBAL(udiv_qrnnd_16))
 #endif /* !__SHMEDIA__ */
 #endif /* L_udiv_qrnnd_16 */
+
+#include "ieee-754-sf.S"
+#include "ieee-754-df.S"
Index: gcc/config/sh/t-sh
===================================================================
--- gcc/config/sh/t-sh	(revision 162269)
+++ gcc/config/sh/t-sh	(working copy)
@@ -26,6 +26,10 @@ LIB1ASMSRC = sh/lib1funcs.asm
 LIB1ASMFUNCS = _ashiftrt _ashiftrt_n _ashiftlt _lshiftrt _movmem \
   _movmem_i4 _mulsi3 _sdivsi3 _sdivsi3_i4 _udivsi3 _udivsi3_i4 _set_fpscr \
   _div_table _udiv_qrnnd_16 \
+  _nesf2 _nedf2 _gtsf2t _gtdf2t _gesf2f _gedf2f _extendsfdf2  _truncdfsf2 \
+  _add_sub_sf3 _mulsf3 _hypotf _muldf3 _add_sub_df3 _divsf3 _divdf3 \
+  _fixunssfsi _fixsfsi _fixunsdfsi _fixdfsi _floatunssisf _floatsisf \
+  _floatunssidf _floatsidf \
   $(LIB1ASMFUNCS_CACHE)
 LIB1ASMFUNCS_CACHE = _ic_invalidate _ic_invalidate_array
 
@@ -120,7 +124,6 @@ $(T)crtn.o: $(srcdir)/config/sh/crtn.asm
 	$(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -c -o $(T)crtn.o -x assembler-with-cpp $(srcdir)/config/sh/crtn.asm
 
 $(out_object_file): gt-sh.h
-gt-sh.h : s-gtype ; @true
 
 # These are not suitable for COFF.
 # EXTRA_MULTILIB_PARTS= crt1.o crti.o crtn.o crtbegin.o crtend.o
@@ -131,17 +134,17 @@ OPT_EXTRA_PARTS= libgcc-Os-4-200.a libgc
 EXTRA_MULTILIB_PARTS= $(IC_EXTRA_PARTS) $(OPT_EXTRA_PARTS)
 
 $(T)ic_invalidate_array_4-100.o: $(srcdir)/config/sh/lib1funcs.asm $(GCC_PASSES)
-	$(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -c -o $(T)ic_invalidate_array_4-100.o -DL_ic_invalidate_array -DWAYS=1 -DWAY_SIZE=0x2000 -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
+	$(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -I. -c -o $(T)ic_invalidate_array_4-100.o -DL_ic_invalidate_array -DWAYS=1 -DWAY_SIZE=0x2000 -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
 $(T)libic_invalidate_array_4-100.a: $(T)ic_invalidate_array_4-100.o $(GCC_PASSES)
 	$(AR_CREATE_FOR_TARGET) $(T)libic_invalidate_array_4-100.a $(T)ic_invalidate_array_4-100.o
 
 $(T)ic_invalidate_array_4-200.o: $(srcdir)/config/sh/lib1funcs.asm $(GCC_PASSES)
-	$(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -c -o $(T)ic_invalidate_array_4-200.o -DL_ic_invalidate_array -DWAYS=2 -DWAY_SIZE=0x2000 -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
+	$(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -I. -c -o $(T)ic_invalidate_array_4-200.o -DL_ic_invalidate_array -DWAYS=2 -DWAY_SIZE=0x2000 -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
 $(T)libic_invalidate_array_4-200.a: $(T)ic_invalidate_array_4-200.o $(GCC_PASSES)
 	$(AR_CREATE_FOR_TARGET) $(T)libic_invalidate_array_4-200.a $(T)ic_invalidate_array_4-200.o
 
 $(T)ic_invalidate_array_4a.o: $(srcdir)/config/sh/lib1funcs.asm $(GCC_PASSES)
-	$(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -c -o $(T)ic_invalidate_array_4a.o -DL_ic_invalidate_array -D__FORCE_SH4A__ -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
+	$(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -I. -c -o $(T)ic_invalidate_array_4a.o -DL_ic_invalidate_array -D__FORCE_SH4A__ -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
 $(T)libic_invalidate_array_4a.a: $(T)ic_invalidate_array_4a.o $(GCC_PASSES)
 	$(AR_CREATE_FOR_TARGET) $(T)libic_invalidate_array_4a.a $(T)ic_invalidate_array_4a.o
 
Index: gcc/config/sh/sh.opt
===================================================================
--- gcc/config/sh/sh.opt	(revision 162269)
+++ gcc/config/sh/sh.opt	(working copy)
@@ -21,7 +21,7 @@
 ;; Used for various architecture options.
 Mask(SH_E)
 
-;; Set if the default precision of th FPU is single.
+;; Set if the default precision of the FPU is single.
 Mask(FPU_SINGLE)
 
 ;; Set if we should generate code using type 2A insns.
Index: gcc/config/sh/ieee-754-df.S
===================================================================
--- gcc/config/sh/ieee-754-df.S	(revision 0)
+++ gcc/config/sh/ieee-754-df.S	(revision 0)
@@ -0,0 +1,791 @@
+/* Copyright (C) 2004, 2006, 2010 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+!! libgcc software floating-point routines for Renesas SH /
+!! STMicroelectronics ST40 CPUs
+!! Contributed by J"orn Rennecke joern.rennecke@st.com
+
+#ifndef __SH_FPU_DOUBLE__
+
+#include "lib1funcs.h"
+#include "insn-constants.h"
+
+/* Double-precision floating-point emulation.
+   We handle NANs, +-infinity, and +-zero.
+   However, we assume that for NANs, the topmost bit of the fraction is set.  */
+
+#ifdef __LITTLE_ENDIAN__
+#define DBL0L r4
+#define DBL0H r5
+#define DBL1L r6
+#define DBL1H r7
+#define DBLRL r0
+#define DBLRH r1
+#else
+#define DBL0L r5
+#define DBL0H r4
+#define DBL1L r7
+#define DBL1H r6
+#define DBLRL r1
+#define DBLRH r0
+#endif
+
+#ifdef __SH_FPU_ANY__
+#define RETURN_R0_MAIN
+#define RETURN_R0 bra LOCAL(return_r0)
+#define RETURN_FR0
+LOCAL(return_r0): \
+ lds r0,fpul; \
+ rts; \
+ fsts fpul,fr0
+#define ARG_TO_R4 \
+ flds fr4,fpul; \
+ sts fpul,r4
+#else /* ! __SH_FPU_ANY__ */
+#define RETURN_R0_MAIN rts
+#define RETURN_R0 rts
+#define RETURN_FR0
+#define ARG_TO_R4
+#endif /* ! __SH_FPU_ANY__ */
+
+#ifdef L_nedf2
+/* -ffinite-math-only -mb inline version, T := r4:DF == r6:DF
+	cmp/eq	r5,r7
+	mov	r4,r0
+	bf	0f
+	cmp/eq	r4,r6
+	bt	0f
+	or	r6,r0
+	add	r0,r0
+	or	r5,r0
+	tst	r0,r0
+	0:			*/
+	.balign 4
+	.global GLOBAL(nedf2)
+	HIDDEN_FUNC(GLOBAL(nedf2))
+GLOBAL(nedf2):
+	cmp/eq	DBL0L,DBL1L
+	mov.l   LOCAL(c_DF_NAN_MASK),r1
+	bf LOCAL(ne)
+	cmp/eq	DBL0H,DBL1H
+	not	DBL0H,r0
+	bt	LOCAL(check_nan)
+	mov	DBL0H,r0
+	or	DBL1H,r0
+	add	r0,r0
+	rts
+	or	DBL0L,r0
+LOCAL(check_nan):
+	tst	r1,r0
+	rts
+	movt	r0
+LOCAL(ne):
+	rts
+	mov #1,r0
+	.balign 4
+LOCAL(c_DF_NAN_MASK):
+	.long DF_NAN_MASK
+	ENDFUNC(GLOBAL(nedf2))
+#endif /* L_nedf2 */
+
+#ifdef L_unorddf2
+	.balign 4
+	.global GLOBAL(unorddf2)
+	HIDDEN_FUNC(GLOBAL(unorddf2))
+GLOBAL(unorddf2):
+	mov.l	LOCAL(c_DF_NAN_MASK),r1
+	not	DBL0H,r0
+	tst	r1,r0
+	not	r6,r0
+	bt	LOCAL(unord)
+	tst	r1,r0
+LOCAL(unord):
+	rts
+	movt	r0
+	.balign	4
+LOCAL(c_DF_NAN_MASK):
+	.long DF_NAN_MASK
+	ENDFUNC(GLOBAL(unorddf2))
+#endif /* L_unorddf2 */
+
+#if defined(L_gtdf2t) || defined(L_gtdf2t_trap)
+#ifdef L_gtdf2t
+#define fun_label GLOBAL(gtdf2t)
+#else
+#define fun_label GLOBAL(gtdf2t_trap)
+#endif
+	.balign 4
+	.global fun_label
+	HIDDEN_FUNC(fun_label)
+fun_label:
+	/* If the raw values compare greater, the result is true, unless
+	   either of them is a nan (but infinity is fine), or both values
+	   are +- zero.  Otherwise, the result is false.  */
+	mov.l	LOCAL(c_DF_NAN_MASK),r1
+	cmp/pz	DBL0H
+	not	DBL1H,r0
+	SLC(bf,	LOCAL(neg),
+	 tst	r1,r0)
+	mov	DBL0H,r0
+	bt	LOCAL(nan) /* return zero if DBL1 is NAN.  */
+	cmp/eq	DBL1H,DBL0H
+	bt	LOCAL(cmp_low)
+	cmp/gt	DBL1H,DBL0H
+	or	DBL1H,r0
+	SLC(bf,	LOCAL(check_nan),
+	 cmp/gt	DBL0H,r1)
+	add	r0,r0
+	bf	LOCAL(nan) /* return zero if DBL0 is NAN.  */
+	or	DBL0L,r0
+	rts
+	or	DBL1L,r0 /* non-zero unless both DBL0 and DBL1 are +-zero.  */
+LOCAL(cmp_low):
+	cmp/hi	DBL1L,DBL0L
+	rts
+	movt	r0
+LOCAL(neg):
+	SLI(tst	r1,r0)
+	bt	LOCAL(nan) /* return zero if DBL1 is NAN.  */
+	cmp/eq	DBL1H,DBL0H
+	SLC(bt,	LOCAL(neg_cmp_low),
+	 cmp/hi	DBL0L,DBL1L)
+	not	DBL0H,r0
+	tst	r1,r0
+	bt	LOCAL(nan) /* return zero if DBL0 is NAN.  */
+	cmp/hi	DBL0H,DBL1H
+	SLI(rts	!,)
+	SLI(movt r0 !,)
+LOCAL(neg_cmp_low):
+	SLI(cmp/hi	DBL0L,DBL1L)
+	rts
+	movt	r0
+LOCAL(check_nan):
+#ifdef L_gtdf2t
+LOCAL(nan):
+	rts
+	mov	#0,r0
+#else
+	SLI(cmp/gt DBL0H,r1)
+	bf	LOCAL(nan) /* return zero if DBL0 is NAN.  */
+	rts
+	mov	#0,r0
+LOCAL(nan):
+	mov	#0,r0
+	trapa	#0
+#endif
+	.balign	4
+LOCAL(c_DF_NAN_MASK):
+	.long DF_NAN_MASK
+	ENDFUNC(fun_label)
+#endif /* defined(L_gtdf2t) || defined(L_gtdf2t_trap) */
+
+#ifdef L_gedf2f
+	.balign 4
+	.global GLOBAL(gedf2f)
+	HIDDEN_FUNC(GLOBAL(gedf2f))
+GLOBAL(gedf2f):
+	/* If the raw values compare greater or equal, the result is
+	   true, unless either of them is a nan, or both are the
+	   same infinity.  If both are -+zero, the result is true;
+	   otherwise, it is false.
+	   We use 0 as true and nonzero as false for this function.  */
+	mov.l	LOCAL(c_DF_NAN_MASK),r1
+	cmp/pz	DBL1H
+	not	DBL0H,r0
+	SLC(bf,	LOCAL(neg),
+	 tst	r1,r0)
+	mov	DBL0H,r0
+	bt	LOCAL(nan)
+	cmp/eq	DBL0H,DBL1H
+	bt	LOCAL(cmp_low)
+	cmp/gt	DBL0H,DBL1H
+	or	DBL1H,r0
+	SLC(bf,	LOCAL(check_nan),
+	 cmp/ge	r1,DBL1H)
+	add	r0,r0
+	bt	LOCAL(nan)
+	or	DBL0L,r0
+	rts
+	or	DBL1L,r0
+LOCAL(cmp_low):
+	cmp/hi	DBL0L,DBL1L
+#if defined(L_gedf2f) && defined(DELAYED_BRANCHES)
+LOCAL(nan): LOCAL(check_nan):
+#endif
+	rts
+	movt	r0
+#if defined(L_gedf2f) && ! defined(DELAYED_BRANCHES)
+LOCAL(check_nan):
+	SLI(cmp/ge	r1,DBL1H)
+LOCAL(nan):
+	rts
+	movt	r0
+#elif defined(L_gedf2f_trap)
+LOCAL(check_nan):
+	SLI(cmp/ge	r1,DBL1H)
+	bt	LOCAL(nan)
+	rts
+LOCAL(nan):
+	movt	r0
+	trapa	#0
+#endif /* L_gedf2f_trap */
+LOCAL(neg):
+	SLI(tst	r1,r0)
+	bt	LOCAL(nan)
+	cmp/eq	DBL0H,DBL1H
+	not	DBL1H,r0
+	SLC(bt,	LOCAL(neg_cmp_low),
+	 cmp/hi	DBL1L,DBL0L)
+	tst	r1,r0
+	bt	LOCAL(nan)
+	cmp/hi	DBL1H,DBL0H
+	SLI(rts !,)
+	SLI(movt	r0 !,)
+LOCAL(neg_cmp_low):
+	SLI(cmp/hi	DBL1L,DBL0L)
+	rts
+	movt	r0
+	.balign	4
+LOCAL(c_DF_NAN_MASK):
+	.long DF_NAN_MASK
+	ENDFUNC(GLOBAL(gedf2f))
+#endif /* L_gedf2f */
+
+#ifndef DYN_SHIFT /* SH1 / SH2 code */
+#ifdef L_extendsfdf2
+	.balign 4
+	.global GLOBAL(extendsfdf2)
+	FUNC(GLOBAL(extendsfdf2))
+GLOBAL(extendsfdf2):
+	ARG_TO_R4
+	mov.l	LOCAL(x7f800000),r3
+	mov	r4,DBLRL
+	tst	r3,r4
+	bt	LOCAL(zero_denorm)
+	mov.l	LOCAL(xe0000000),r2
+	rotr	DBLRL
+	rotr	DBLRL
+	rotr	DBLRL
+	and	r2,DBLRL
+	mov	r4,DBLRH
+	not	r4,r2
+	tst	r3,r2
+	mov.l	LOCAL(x38000000),r2
+	bf	0f
+	add	r2,r2	! infinity / NaN adjustment
+0:	shll	DBLRH
+	shlr2	DBLRH
+	shlr2	DBLRH
+	add	DBLRH,DBLRH
+	rotcr	DBLRH
+	rts
+	add	r2,DBLRH
+LOCAL(zero_denorm):
+	mov.l	r4,@-r15
+	add	r4,r4
+	tst	r4,r4
+	bt	LOCAL(zero)
+	mov.l	LOCAL(x00ff0000),r3
+	mov.w	LOCAL(x389),r2
+LOCAL(shift_byte):
+	tst	r3,r4
+	shll8	r4
+	SL(bt,	LOCAL(shift_byte),
+	 add	#-8,r2)
+LOCAL(shift_bit):
+	shll	r4
+	SL(bf,	LOCAL(shift_bit),
+	 add	#-1,r2)
+	mov	#0,DBLRL
+	mov	r4,DBLRH
+	mov.l	@r15+,r4
+	shlr8	DBLRH
+	shlr2	DBLRH
+	shlr	DBLRH
+	rotcr	DBLRL
+	cmp/gt	r4,DBLRH	! get sign
+	rotcr	DBLRH
+	rotcr	DBLRL
+	shll16	r2
+	shll8	r2
+	rts
+	add	r2,DBLRH
+LOCAL(zero):
+	mov.l	@r15+,DBLRH
+	rts
+	mov	#0,DBLRL
+LOCAL(x389):	.word 0x389
+	.balign	4
+LOCAL(x7f800000):
+	.long	0x7f800000
+LOCAL(x38000000):
+	.long	0x38000000
+LOCAL(xe0000000):
+	.long	0xe0000000
+LOCAL(x00ff0000):
+	.long	0x00ff0000
+	ENDFUNC(GLOBAL(extendsfdf2))
+#endif /* L_extendsfdf2 */
+
+#ifdef L_truncdfsf2
+	.balign 4
+	.global GLOBAL(truncdfsf2)
+	FUNC(GLOBAL(truncdfsf2))
+GLOBAL(truncdfsf2):
+	mov.l	LOCAL(x38000000),r3	! exponent adjustment DF -> SF
+	mov	DBL0H,r1
+	mov.l	LOCAL(x70000000),r2	! mask for out-of-range exponent bits
+	mov	DBL0H,r0
+	mov.l	DBL0L,@-r15
+	sub	r3,r1
+	tst	r2,r1
+	shll8	r0			!
+	shll2	r0			! Isolate highpart fraction.
+	shll2	r0			!
+	bf	LOCAL(ill_exp)
+	shll2	r1
+	mov.l	LOCAL(x2fffffff),r2 /* Fraction lsb | lower guard bits.  */
+	shll2	r1
+	mov.l	LOCAL(xff000000),r3
+	shlr8	r0
+	tst	r2,DBL0L /* Check if msb guard bit wants rounding up.  */
+	shlr16	DBL0L
+	shlr8	DBL0L
+	shlr2	DBL0L
+	SL1(bt,	LOCAL(add_frac),
+	 shlr2	DBL0L)
+	add	#1,DBL0L
+LOCAL(add_frac):
+	add	DBL0L,r0
+	mov.l	LOCAL(x01000000),r2
+	and	r3,r1
+	mov.l	@r15+,DBL0L
+	add	r1,r0
+	tst	r3,r0
+	bt	LOCAL(inf_denorm0)
+	cmp/hs	r3,r0
+LOCAL(denorm_noup_sh1):
+	bt	LOCAL(inf)
+	div0s	DBL0H,r2	/* copy orig. sign into T.  */
+RETURN_R0_MAIN
+	rotcr	r0
+RETURN_FR0
+LOCAL(inf_denorm0):	!  We might need to undo previous rounding.
+	mov.l	LOCAL(x2fffffff),r3 /* Old fraction lsb | lower guard bits.  */
+	tst	r1,r1
+	bf	LOCAL(inf)
+	add	#-1,r0
+	tst	r3,DBL0L /* Check if msb guard bit was rounded up.  */
+	mov.l	LOCAL(x5fffffff),r3 /* Fraction lsb | lower guard bits.  */
+	addc	r2,r0
+	shlr	r0
+	tst	r3,DBL0L /* Check if msb guard bit wants rounding up.  */
+#ifdef DELAYED_BRANCHES
+	bt/s	LOCAL(denorm_noup)
+#else
+	bt	LOCAL(denorm_noup_sh1)
+#endif
+	div0s	DBL0H,r2	/* copy orig. sign into T.  */
+	add	#1,r0
+LOCAL(denorm_noup):
+	RETURN_R0
+	rotcr	r0
+LOCAL(ill_exp):
+	div0s	DBL0H,r1
+	mov.l	LOCAL(x7ff80000),r2
+	add	r1,r1
+	bf	LOCAL(inf_nan)
+	mov.w	LOCAL(m32),r3 /* Handle denormal or zero.  */
+	shlr16	r1
+	exts.w	r1,r1
+	shll2	r1
+	add	r1,r1
+	shlr8	r1
+	exts.w	r1,r1
+	add	#-8,r1	/* Go from 9 to 1 guard bit in MSW.  */
+	cmp/gt	r3,r1
+	mov.l	@r15+,r3 /* DBL0L */
+	bf	LOCAL(zero)
+	mov.l	DBL0L, @-r15
+	shll8	DBL0L
+	rotcr	r0	/* Insert leading 1.  */
+	shlr16	r3
+	shll2	r3
+	add	r3,r3
+	shlr8	r3
+	cmp/pl	DBL0L	/* Check lower 23 guard bits if guard bit 23 is 0.  */
+	addc	r3,r0	/* Assemble fraction with compressed guard bits.  */
+	mov.l	@r15+,DBL0L
+	mov	#0,r2
+	neg	r1,r1
+LOCAL(denorm_loop):
+	shlr	r0
+	rotcl	r2
+	dt	r1
+	bf	LOCAL(denorm_loop)
+	tst	#2,r0
+	rotcl	r0
+	tst	r2,r2
+	rotcl	r0
+	xor	#3,r0
+	add	#3,r0	/* Even overflow gives the correct result.  */
+	shlr2	r0
+	div0s	r0,DBL0H
+	RETURN_R0
+	rotcr	r0
+LOCAL(zero):
+	mov	#0,r0
+	div0s	r0,DBL0H
+	RETURN_R0
+	rotcr	r0
+LOCAL(inf_nan):
+	not	DBL0H,r0
+	tst	r2,r0
+	mov.l	@r15+,DBL0L
+	bf	LOCAL(inf)
+	RETURN_R0
+	mov	#-1,r0	/* NAN */
+LOCAL(inf):	/* r2 must be positive here.  */
+	mov.l	LOCAL(xffe00000),r0
+	div0s	r2,DBL0H
+	RETURN_R0
+	rotcr	r0
+LOCAL(m32):
+	.word	-32
+	.balign	4
+LOCAL(x38000000):
+	.long	0x38000000
+LOCAL(x70000000):
+	.long	0x70000000
+LOCAL(x2fffffff):
+	.long	0x2fffffff
+LOCAL(x01000000):
+	.long	0x01000000
+LOCAL(xff000000):
+	.long	0xff000000
+LOCAL(x5fffffff):
+	.long	0x5fffffff
+LOCAL(x7ff80000):
+	.long	0x7ff80000
+LOCAL(xffe00000):
+	.long	0xffe00000
+	ENDFUNC(GLOBAL(truncdfsf2))
+#endif /*  L_truncdfsf2 */
+#ifdef L_add_sub_df3
+#include "IEEE-754/adddf3.S"
+#endif /* _add_sub_df3 */
+
+#ifdef L_muldf3
+#include "IEEE-754/muldf3.S"
+#endif /* L_muldf3 */
+
+#ifdef L_fixunsdfsi
+#include "IEEE-754/fixunsdfsi.S"
+#endif /* L_fixunsdfsi */
+
+#ifdef L_fixdfsi
+#include "IEEE-754/fixdfsi.S"
+#endif /* L_fixdfsi */
+
+#ifdef L_floatunssidf
+#include "IEEE-754/floatunssidf.S"
+#endif /* L_floatunssidf */
+
+#ifdef L_floatsidf
+#include "IEEE-754/floatsidf.S"
+#endif /* L_floatsidf */
+
+#ifdef L_divdf3
+#include "IEEE-754/divdf3.S"
+#endif /* L_divdf3 */
+#endif /* ! DYN_SHIFT */
+
+/* The actual arithmetic uses dynamic shift.  Supporting SH1 / SH2 here would
+   make this code too hard to maintain, so if you want to add SH1 / SH2
+   support, do it in a separate copy.  */
+#ifdef DYN_SHIFT
+#ifdef L_extendsfdf2
+	.balign 4
+	.global GLOBAL(extendsfdf2)
+	FUNC(GLOBAL(extendsfdf2))
+GLOBAL(extendsfdf2):
+	ARG_TO_R4
+	mov.l	LOCAL(x7f800000),r2
+	mov	#29,r3
+	mov	r4,DBLRL
+	not	r4,DBLRH
+	tst	r2,r4
+	shld	r3,DBLRL
+	bt	LOCAL(zero_denorm)
+	mov	#-3,r3
+	tst	r2,DBLRH
+	mov	r4,DBLRH
+	mov.l	LOCAL(x38000000),r2
+	bt/s	LOCAL(inf_nan)
+	 shll	DBLRH
+	shld	r3,DBLRH
+	rotcr	DBLRH
+	rts
+	add	r2,DBLRH
+	.balign	4
+LOCAL(inf_nan):
+	shld	r3,DBLRH
+	add	r2,r2
+	rotcr	DBLRH
+	rts
+	add	r2,DBLRH
+LOCAL(zero_denorm):
+	mov.l	r4,@-r15
+	add	r4,r4
+	tst	r4,r4
+	extu.w	r4,r2
+	bt	LOCAL(zero)
+	cmp/eq	r4,r2
+	extu.b	r4,r1
+	bf/s	LOCAL(three_bytes)
+	 mov.l	LOCAL(c__clz_tab),r0
+	cmp/eq	r4,r1
+	mov	#22,DBLRH
+	bt	LOCAL(one_byte)
+	shlr8	r2
+	mov	#14,DBLRH
+LOCAL(one_byte):
+#ifdef __pic__
+	add	r0,r2
+	mova  LOCAL(c__clz_tab),r0
+#endif
+	mov.b	@(r0,r2),r2
+	mov	#21,r3
+	mov.w	LOCAL(x0),DBLRL
+	sub	r2,DBLRH
+LOCAL(norm_shift):
+	shld	DBLRH,r4
+	mov.l	@r15+,r2
+	shld	r3,DBLRH
+	mov.l	LOCAL(xb7ffffff),r3
+	add	r4,DBLRH
+	cmp/pz	r2
+	mov	r2,r4
+	rotcr	DBLRH
+	rts
+	sub	r3,DBLRH
+LOCAL(three_bytes):
+	mov	r4,r2
+	shlr16	r2
+#ifdef __pic__
+	add	r0,r2
+	mova  LOCAL(c__clz_tab),r0
+#endif
+	mov.b	@(r0,r2),r2
+	mov	#21,r3
+	mov	#6-32,DBLRH
+	sub	r2,DBLRH
+	mov	r4,DBLRL
+	shld	DBLRH,DBLRL
+	bra	LOCAL(norm_shift)
+	add	#32,DBLRH
+LOCAL(zero):
+	rts	/* DBLRL has already been zeroed above.  */
+	mov.l @r15+,DBLRH
+LOCAL(x0):
+	.word 0
+	.balign	4
+LOCAL(x7f800000):
+	.long	0x7f800000
+LOCAL(x38000000):
+	.long	0x38000000
+LOCAL(xb7ffffff):
+	/* Flip sign back, do exponent adjustment, and remove leading one.  */
+	.long 0x80000000 + 0x38000000 - 1
+LOCAL(c__clz_tab):
+#ifdef __pic__
+	.long	GLOBAL(clz_tab) - .
+#else
+	.long	GLOBAL(clz_tab)
+#endif
+	ENDFUNC(GLOBAL(extendsfdf2))
+#endif /* L_extendsfdf2 */
+
+#ifdef L_truncdfsf2
+	.balign 4
+	.global GLOBAL(truncdfsf2)
+	FUNC(GLOBAL(truncdfsf2))
+GLOBAL(truncdfsf2):
+	mov.l	LOCAL(x38000000),r3
+	mov	DBL0H,r1
+	mov.l	LOCAL(x70000000),r2
+	mov	DBL0H,r0
+	sub	r3,r1
+	mov.l	DBL0L,@-r15
+	tst	r2,r1
+	mov	#12,r3
+	shld	r3,r0			! Isolate highpart fraction.
+	bf	LOCAL(ill_exp)
+	shll2	r1
+	mov.l	LOCAL(x2fffffff),r2 /* Fraction lsb | lower guard bits.  */
+	shll2	r1
+	mov.l	LOCAL(xff000000),r3
+	shlr8	r0
+	tst	r2,DBL0L /* Check if msb guard bit wants rounding up.  */
+	mov	#-28,r2
+	bt/s	LOCAL(add_frac)
+	 shld	r2,DBL0L
+	add	#1,DBL0L
+LOCAL(add_frac):
+	add	DBL0L,r0
+	mov.l	LOCAL(x01000000),r2
+	and	r3,r1
+	mov.l	@r15+,DBL0L
+	add	r1,r0
+	tst	r3,r0
+	bt	LOCAL(inf_denorm0)
+#if 0	// No point checking overflow -> infinity if we don't raise a signal.
+	cmp/hs	r3,r0
+	bt	LOCAL(inf)
+#endif
+	div0s	DBL0H,r2	/* copy orig. sign into T.  */
+	RETURN_R0_MAIN
+	rotcr	r0
+RETURN_FR0
+LOCAL(inf_denorm0):	! We might need to undo previous rounding.
+	mov.l	LOCAL(x2fffffff),r3 /* Old fraction lsb | lower guard bits.  */
+	tst	r1,r1
+	bf	LOCAL(inf)
+	add	#-1,r0
+	tst	r3,DBL0L /* Check if msb guard bit was rounded up.  */
+	mov.l	LOCAL(x5fffffff),r3 /* Fraction lsb | lower guard bits.  */
+	addc	r2,r0
+	shlr	r0
+	tst	r3,DBL0L /* Check if msb guard bit wants rounding up.  */
+	bt/s	LOCAL(denorm_noup)
+	 div0s	DBL0H,r2	/* copy orig. sign into T.  */
+	add	#1,r0
+LOCAL(denorm_noup):
+	RETURN_R0
+	rotcr	r0
+LOCAL(ill_exp):
+	div0s	DBL0H,r1
+	mov.l	LOCAL(x7ff80000),r2
+	add	r1,r1
+	bf	LOCAL(inf_nan)
+	mov.w	LOCAL(m32),r3 /* Handle denormal or zero.  */
+	mov	#-21,r2
+	shad	r2,r1
+	add	#-8,r1	/* Go from 9 to 1 guard bit in MSW.  */
+	cmp/gt	r3,r1
+	mov.l	@r15+,r3 /* DBL0L */
+	bf	LOCAL(zero)
+	mov.l	DBL0L, @-r15
+	shll8	DBL0L
+	rotcr	r0	/* Insert leading 1.  */
+	shld	r2,r3
+	cmp/pl	DBL0L	/* Check lower 23 guard bits if guard bit 23 is 0.  */
+	addc	r3,r0	/* Assemble fraction with compressed guard bits.  */
+	mov	r0,r2
+	shld	r1,r0
+	mov.l	@r15+,DBL0L
+	add	#32,r1
+	shld	r1,r2
+	tst	#2,r0
+	rotcl	r0
+	tst	r2,r2
+	rotcl	r0
+	xor	#3,r0
+	add	#3,r0	/* Even overflow gives the correct result.  */
+	shlr2	r0
+	div0s	r0,DBL0H
+	RETURN_R0
+	rotcr	r0
+LOCAL(zero):
+	mov	#0,r0
+	div0s	r0,DBL0H
+	RETURN_R0
+	rotcr	r0
+LOCAL(inf_nan):
+	not	DBL0H,r0
+	tst	r2,r0
+	mov.l	@r15+,DBL0L
+	bf	LOCAL(inf)
+	RETURN_R0
+	mov	#-1,r0	/* NAN */
+LOCAL(inf):	/* r2 must be positive here.  */
+	mov.l	LOCAL(xffe00000),r0
+	div0s	r2,DBL0H
+	RETURN_R0
+	rotcr	r0
+LOCAL(m32):
+	.word	-32
+	.balign	4
+LOCAL(x38000000):
+	.long	0x38000000
+LOCAL(x70000000):
+	.long	0x70000000
+LOCAL(x2fffffff):
+	.long	0x2fffffff
+LOCAL(x01000000):
+	.long	0x01000000
+LOCAL(xff000000):
+	.long	0xff000000
+LOCAL(x5fffffff):
+	.long	0x5fffffff
+LOCAL(x7ff80000):
+	.long	0x7ff80000
+LOCAL(xffe00000):
+	.long	0xffe00000
+	ENDFUNC(GLOBAL(truncdfsf2))
+#endif /* L_truncdfsf2 */
+
+
+#ifdef L_add_sub_df3
+#include "IEEE-754/m3/adddf3.S"
+#endif /* _add_sub_df3 */
+
+#ifdef L_muldf3
+#include "IEEE-754/m3/muldf3.S"
+#endif /* L_muldf3 */
+
+#ifdef L_fixunsdfsi
+#include "IEEE-754/m3/fixunsdfsi.S"
+#endif /* L_fixunsdfsi */
+
+#ifdef L_fixdfsi
+#include "IEEE-754/m3/fixdfsi.S"
+#endif /* L_fixdfsi */
+
+#ifdef L_floatunssidf
+#include "IEEE-754/m3/floatunssidf.S"
+#endif /* L_floatunssidf */
+
+#ifdef L_floatsidf
+#include "IEEE-754/m3/floatsidf.S"
+#endif /* L_floatsidf */
+
+#ifdef L_divdf3
+#include "IEEE-754/m3/divdf3.S"
+#endif /* L_divdf3 */
+#endif /* DYN_SHIFT */
+
+#endif /* __SH_FPU_DOUBLE__ */
Index: gcc/config/sh/predicates.md
===================================================================
--- gcc/config/sh/predicates.md	(revision 162269)
+++ gcc/config/sh/predicates.md	(working copy)
@@ -719,6 +719,33 @@ (define_predicate "shift_operator"
 (define_predicate "symbol_ref_operand"
   (match_code "symbol_ref"))
 
+(define_special_predicate "soft_fp_comparison_operand"
+  (match_code "subreg,reg")
+{
+  switch (GET_MODE (op))
+    {
+    default:
+      return 0;
+    case CC_FP_NEmode: case CC_FP_GTmode: case CC_FP_UNLTmode:
+      break;
+    }
+  return register_operand (op, mode);
+})
+
+(define_predicate "soft_fp_comparison_operator"
+  (match_code "eq, unle, ge")
+{
+  switch (GET_CODE (op))
+    {
+    default:
+      return 0;
+    case EQ:  mode = CC_FP_NEmode;    break;
+    case UNLE:        mode = CC_FP_GTmode;    break;
+    case GE:  mode = CC_FP_UNLTmode;  break;
+    }
+  return register_operand (XEXP (op, 0), mode);
+})
+
 ;; Same as target_reg_operand, except that label_refs and symbol_refs
 ;; are accepted before reload.
 
Index: gcc/config/sh/sh.c
===================================================================
--- gcc/config/sh/sh.c	(revision 162269)
+++ gcc/config/sh/sh.c	(working copy)
@@ -111,6 +111,12 @@ static short cached_can_issue_more;
 /* Unique number for UNSPEC_BBR pattern.  */
 static unsigned int unspec_bbr_uid = 1;
 
+/* Saved operands from the last compare to use when we generate an scc
+   or bcc insn.  */
+
+rtx sh_compare_op0;
+rtx sh_compare_op1;
+
 /* Provides the class number of the smallest class containing
    reg number.  */
 
@@ -284,6 +290,7 @@ static int sh_arg_partial_bytes (CUMULAT
 			         tree, bool);
 static bool sh_scalar_mode_supported_p (enum machine_mode);
 static int sh_dwarf_calling_convention (const_tree);
+static void sh_expand_float_condop (rtx *operands, rtx, rtx (*[2]) (rtx));
 static void sh_encode_section_info (tree, rtx, int);
 static int sh2a_function_vector_p (tree);
 static void sh_trampoline_init (rtx, tree, rtx);
@@ -551,6 +558,9 @@ static const struct attribute_spec sh_at
 /* Machine-specific symbol_ref flags.  */
 #define SYMBOL_FLAG_FUNCVEC_FUNCTION    (SYMBOL_FLAG_MACH_DEP << 0)
 
+#undef TARGET_MATCH_ADJUST
+#define TARGET_MATCH_ADJUST sh_match_adjust
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 \f
 /* Implement TARGET_HANDLE_OPTION.  */
@@ -2180,6 +2190,78 @@ sh_emit_cheap_store_flag (enum machine_m
   return gen_rtx_fmt_ee (code, VOIDmode, target, const0_rtx);
 }
 
+static rtx
+sh_soft_fp_cmp (int code, enum machine_mode op_mode, rtx op0, rtx op1)
+{
+  const char *name = NULL;
+  rtx (*fun) (rtx, rtx), addr, tmp, first, last, equiv;
+  int df = op_mode == DFmode;
+  enum machine_mode mode = CODE_FOR_nothing; /* shut up warning.  */
+
+  switch (code)
+    {
+    case EQ:
+      if (!flag_finite_math_only)
+	{
+	  name = df ? "__nedf2" : "__nesf2";
+	  fun = df ? gen_cmpnedf_i1 : gen_cmpnesf_i1;
+	  mode = CC_FP_NEmode;
+	  break;
+	} /* Fall through.  */
+    case UNEQ:
+      fun = gen_cmpuneq_sdf;
+      break;
+    case UNLE:
+      if (flag_finite_math_only && !df)
+	{
+	  fun = gen_cmplesf_i1_finite;
+	  break;
+	}
+      name = df ? "__gtdf2t" : "__gtsf2t";
+      fun = df ? gen_cmpgtdf_i1 : gen_cmpgtsf_i1;
+      mode = CC_FP_GTmode;
+      break;
+    case GE:
+      if (flag_finite_math_only && !df)
+	{
+	  tmp = op0; op0 = op1; op1 = tmp;
+	  fun = gen_cmplesf_i1_finite;
+	  break;
+	}
+      name = df ? "__gedf2f" : "__gesf2f";
+      fun = df ? gen_cmpunltdf_i1 : gen_cmpunltsf_i1;
+      mode = CC_FP_UNLTmode;
+      break;
+    case UNORDERED:
+      fun = gen_cmpun_sdf;
+      break;
+    default: gcc_unreachable ();
+    }
+
+  if (!name)
+    return fun (force_reg (op_mode, op0), force_reg (op_mode, op1));
+
+  tmp = gen_reg_rtx (mode);
+  addr = gen_reg_rtx (Pmode);
+  function_symbol (addr, name, SFUNC_STATIC);
+  first = emit_move_insn (gen_rtx_REG (op_mode, R4_REG), op0);
+  emit_move_insn (gen_rtx_REG (op_mode, R5_REG + df), op1);
+  last = emit_insn (fun (tmp, addr));
+  equiv = gen_rtx_fmt_ee (COMPARE, mode, op0, op1);
+  REG_NOTES (last) = gen_rtx_EXPR_LIST (REG_EQUAL, equiv, REG_NOTES (last));
+  /* Wrap the sequence in REG_LIBCALL / REG_RETVAL notes so that loop
+     invariant code motion can move it.  */
+/*
+  REG_NOTES (first) = gen_rtx_INSN_LIST (REG_LIBCALL, last, REG_NOTES (first));
+  REG_NOTES (last) = gen_rtx_INSN_LIST (REG_RETVAL, first, REG_NOTES (last));
+*/
+  /* Use fpcmp_i1 rather than cmpeqsi_t, so that the optimizers can grok
+     the computation.  */
+  return gen_rtx_SET (VOIDmode,
+		      gen_rtx_REG (SImode, T_REG),
+		      gen_rtx_fmt_ee (code, SImode, tmp, CONST0_RTX (mode)));
+}
+
 /* Called from the md file, set up the operands of a compare instruction.  */
 
 void
@@ -8662,6 +8744,57 @@ sh_fix_range (const char *const_str)
       str = comma + 1;
     }
 }
+
+/* Expand an sfunc operation taking NARGS MODE arguments, using generator
+   function FUN, which needs symbol NAME loaded into a register first.
+   Add a REG_EQUAL note using EQUIV.  */
+static void
+expand_sfunc_op (int nargs, enum machine_mode mode, rtx (*fun) (rtx, rtx),
+		 const char *name, rtx equiv, rtx *operands)
+{
+  int next_reg = FIRST_PARM_REG, i;
+  rtx addr, first = NULL_RTX, last, insn;
+
+  addr = gen_reg_rtx (Pmode);
+  function_symbol (addr, name, SFUNC_FREQUENT);
+  for ( i = 1; i <= nargs; i++)
+    {
+      insn = emit_move_insn (gen_rtx_REG (mode, next_reg), operands[i]);
+      if (!first)
+	first = insn;
+      next_reg += GET_MODE_SIZE (mode) / UNITS_PER_WORD;
+    }
+  last = emit_insn ((*fun) (operands[0], addr));
+  REG_NOTES (last) = gen_rtx_EXPR_LIST (REG_EQUAL, equiv, REG_NOTES (last));
+  /* Wrap the sequence in REG_LIBCALL / REG_RETVAL notes so that loop
+     invariant code motion can move it.  */
+/*
+  REG_NOTES (first) = gen_rtx_INSN_LIST (REG_LIBCALL, last, REG_NOTES (first));
+  REG_NOTES (last) = gen_rtx_INSN_LIST (REG_RETVAL, first, REG_NOTES (last));
+*/
+}
+
+/* Expand an sfunc unary operation taking one MODE argument, using generator
+   function FUN, which needs symbol NAME loaded into a register first.
+   Add a REG_EQUAL note using CODE.  */
+void
+expand_sfunc_unop (enum machine_mode mode, rtx (*fun) (rtx, rtx),
+		   const char *name, enum rtx_code code, rtx *operands)
+{
+  rtx equiv = gen_rtx_fmt_e (code, GET_MODE (operands[0]), operands[1]);
+  expand_sfunc_op (1, mode, fun, name, equiv, operands);
+}
+
+/* Expand an sfunc binary operation in MODE, using generator function FUN,
+   which needs symbol NAME loaded into a register first.
+   Add a REG_EQUAL note using CODE.  */
+void
+expand_sfunc_binop (enum machine_mode mode, rtx (*fun) (rtx, rtx),
+		    const char *name, enum rtx_code code, rtx *operands)
+{
+  rtx equiv = gen_rtx_fmt_ee (code, mode, operands[1], operands[2]);
+  expand_sfunc_op (2, mode, fun, name, equiv, operands);
+}
 \f
 /* Insert any deferred function attributes from earlier pragmas.  */
 static void
@@ -11593,11 +11726,10 @@ function_symbol (rtx target, const char 
 {
   rtx sym;
 
-  /* If this is not an ordinary function, the name usually comes from a
-     string literal or an sprintf buffer.  Make sure we use the same
+  /* The name usually comes from a string literal or an sprintf buffer.
+     Make sure we use the same
      string consistently, so that cse will be able to unify address loads.  */
-  if (kind != FUNCTION_ORDINARY)
-    name = IDENTIFIER_POINTER (get_identifier (name));
+  name = IDENTIFIER_POINTER (get_identifier (name));
   sym = gen_rtx_SYMBOL_REF (Pmode, name);
   SYMBOL_REF_FLAGS (sym) = SYMBOL_FLAG_FUNCTION;
   if (flag_pic)
@@ -11605,6 +11737,10 @@ function_symbol (rtx target, const char 
       {
       case FUNCTION_ORDINARY:
 	break;
+      case SFUNC_FREQUENT:
+	if (!optimize || optimize_size)
+	  break;
+	/* Fall through.  */
       case SFUNC_GOT:
 	{
 	  rtx reg = target ? target : gen_reg_rtx (Pmode);
@@ -11715,6 +11851,168 @@ sh_expand_t_scc (rtx operands[])
   return 1;
 }
 
+void
+sh_expand_float_cbranch (rtx operands[4])
+{
+  static rtx (*branches[]) (rtx) = { gen_branch_true, gen_branch_false };
+
+  sh_expand_float_condop (operands, operands[3], branches);
+}
+
+void
+sh_expand_float_scc (rtx operands[4])
+{
+  static rtx (*movts[]) (rtx) = { gen_movt, gen_movnegt };
+
+  sh_expand_float_condop (&operands[1], operands[0], movts);
+}
+
+/* The first element of USER is for positive logic, the second one for
+   negative logic.  */
+static void
+sh_expand_float_condop (rtx *operands, rtx dest, rtx (*user[2]) (rtx))
+{
+  enum machine_mode mode = GET_MODE (operands[1]);
+  enum rtx_code comparison = GET_CODE (operands[0]);
+  int swap_operands = 0;
+  rtx op0, op1;
+  rtx lab = NULL_RTX;
+
+  if (TARGET_SH1_SOFTFP_MODE (mode))
+    {
+      switch (comparison)
+	{
+	case NE:
+	  comparison = EQ;
+	  user++;
+	  break;
+	case LT:
+	  swap_operands = 1;	/* Fall through.  */
+	case GT:
+	  comparison = UNLE;
+	  user++;
+	  break;
+	case UNGT:
+	  swap_operands = 1;	/* Fall through.  */
+	case UNLT:
+	  comparison = GE;
+	  user++;
+	  break;
+	case UNGE:
+	  swap_operands = 1;
+	  comparison = UNLE;
+	  break;
+	case LE:
+	  swap_operands = 1;
+	  comparison = GE;	/* Fall through.  */
+	case EQ:
+	case UNEQ:
+	case GE:
+	case UNLE:
+	case UNORDERED:
+	  break;
+	case LTGT:
+	  comparison = UNEQ;
+	  user++;
+	  break;
+	case ORDERED:
+	  comparison = UNORDERED;
+	  user++;
+	  break;
+	  
+	default: gcc_unreachable ();
+	}
+    }
+  else /* SH2E .. SH4 Hardware floating point */
+    {
+      switch (comparison)
+	{
+	case LTGT:
+	  if (!flag_finite_math_only)
+	    break;
+	  /* Fall through.  */
+	case NE:
+	  comparison = EQ;
+	  user++;
+	  break;
+	case LT:
+	  swap_operands = 1;
+	  comparison = GT;	/* Fall through.  */
+	case GT:
+	case EQ:
+	case ORDERED:
+	  break;
+	case LE:
+	  swap_operands = 1;
+	  comparison = GE;	/* Fall through.  */
+	case GE:
+	  if (flag_finite_math_only)
+	    {
+	      swap_operands ^= 1;
+	      comparison = GT;
+	      user++;
+	      break;
+	    }
+	  break;
+	case UNGT:
+	  swap_operands = 1;	/* Fall through.  */
+	case UNLT:
+	  if (flag_finite_math_only)
+	    {
+	      swap_operands ^= 1;
+	      comparison = GT;
+	      break;
+	    }
+	  comparison = GE;
+	  user++;
+	  break;
+	case UNGE:
+	  swap_operands = 1;	/* Fall through.  */
+	case UNLE:
+	  comparison = GT;
+	  user++;
+	  break;
+	case UNEQ:
+	  if (flag_finite_math_only)
+	    {
+	      comparison = EQ;
+	      break;
+	    }
+	  comparison = LTGT;
+	  user++;
+	  break;
+	case UNORDERED:
+	  comparison = ORDERED;
+	  user++;
+	  break;
+
+	default: gcc_unreachable ();
+	}
+      operands[1] = force_reg (mode, operands[1]);
+      operands[2] = force_reg (mode, operands[2]);
+      if (comparison == GE)
+	{
+	  lab = gen_label_rtx ();
+	  sh_emit_scc_to_t (GT, operands[1+swap_operands],
+			    operands[2-swap_operands]);
+	  emit_jump_insn (gen_branch_true (lab));
+	  comparison = EQ;
+	}
+    }
+  op0 = operands[1+swap_operands];
+  op1 = operands[2-swap_operands];
+  if (GET_MODE_CLASS (mode) == MODE_FLOAT && TARGET_SH1_SOFTFP_MODE (mode))
+    emit_insn (sh_soft_fp_cmp (comparison, mode, op0, op1));
+  else
+    sh_emit_set_t_insn (gen_rtx_SET (VOIDmode, gen_rtx_REG (SImode, T_REG),
+				     gen_rtx_fmt_ee (comparison, SImode,
+						     op0, op1)),
+			mode);
+  if (lab)
+    emit_label (lab);
+  emit ((*user) (dest));
+}
+
 /* INSN is an sfunc; return the rtx that describes the address used.  */
 static rtx
 extract_sfunc_addr (rtx insn)
@@ -12266,6 +12564,19 @@ sh_secondary_reload (bool in_p, rtx x, r
   return NO_REGS;
 }
 
+int
+sh_match_adjust (rtx x, int regno)
+{
+  /* On a WORDS_BIG_ENDIAN machine, point to the last register of a
+     multiple hard register group of scalar integer registers, so that
+     for example (reg:DI 0) and (reg:SI 1) will be considered the same
+     register.  */
+  if (WORDS_BIG_ENDIAN && GET_MODE_SIZE (GET_MODE (x)) > UNITS_PER_WORD
+      && regno < FIRST_PSEUDO_REGISTER)
+    regno += hard_regno_nregs[regno][GET_MODE (x)] - 1;
+  return regno;
+}
+
 enum sh_divide_strategy_e sh_div_strategy = SH_DIV_STRATEGY_DEFAULT;
 
 #include "gt-sh.h"
Index: gcc/config/sh/sh.h
===================================================================
--- gcc/config/sh/sh.h	(revision 162269)
+++ gcc/config/sh/sh.h	(working copy)
@@ -183,6 +183,11 @@ do { \
 #define TARGET_FPU_DOUBLE \
   ((target_flags & MASK_SH4) != 0 || TARGET_SH2A_DOUBLE)
 
+#define TARGET_SH1_SOFTFP (TARGET_SH1 && !TARGET_FPU_DOUBLE)
+
+#define TARGET_SH1_SOFTFP_MODE(MODE) \
+  (TARGET_SH1_SOFTFP && (!TARGET_SH2E || (MODE) == DFmode))
+
 /* Nonzero if an FPU is available.  */
 #define TARGET_FPU_ANY (TARGET_SH2E || TARGET_FPU_DOUBLE)
 
@@ -329,6 +334,38 @@ do { \
 #define SUPPORT_ANY_SH5 \
   (SUPPORT_ANY_SH5_32MEDIA || SUPPORT_ANY_SH5_64MEDIA)
 
+/* Check if we have support for optimized software floating point using
+   dynamic shifts - then some function calls clobber fewer registers.  */
+#ifdef SUPPORT_SH3
+#define SUPPORT_SH3_OSFP 1
+#else
+#define SUPPORT_SH3_OSFP 0
+#endif
+
+#ifdef SUPPORT_SH3E
+#define SUPPORT_SH3E_OSFP 1
+#else
+#define SUPPORT_SH3E_OSFP 0
+#endif
+
+#if defined(SUPPORT_SH4_NOFPU) || defined(SUPPORT_SH3_OSFP)
+#define SUPPORT_SH4_NOFPU_OSFP 1
+#else
+#define SUPPORT_SH4_NOFPU_OSFP 0
+#endif
+
+#if defined(SUPPORT_SH4_SINGLE_ONLY) || defined (SUPPORT_SH3E_OSFP)
+#define SUPPORT_SH4_SINGLE_ONLY_OSFP 1
+#else
+#define SUPPORT_SH4_SINGLE_ONLY_OSFP 0
+#endif
+
+#define TARGET_OSFP (0 \
+ || (TARGET_SH3 && !TARGET_SH2E && SUPPORT_SH3_OSFP) \
+ || (TARGET_SH3E && SUPPORT_SH3E_OSFP) \
+ || (TARGET_HARD_SH4 && !TARGET_SH2E && SUPPORT_SH4_NOFPU_OSFP) \
+ || (TARGET_HARD_SH4 && TARGET_SH2E && SUPPORT_SH4_SINGLE_ONLY_OSFP))
+
 /* Reset all target-selection flags.  */
 #define MASK_ARCH (MASK_SH1 | MASK_SH2 | MASK_SH3 | MASK_SH_E | MASK_SH4 \
 		   | MASK_HARD_SH2A | MASK_HARD_SH2A_DOUBLE | MASK_SH4A \
@@ -2047,6 +2084,12 @@ struct sh_args {
 #define LIBGCC2_DOUBLE_TYPE_SIZE 64
 #endif
 
+#if defined(__SH2E__) || defined(__SH3E__) || defined( __SH4_SINGLE_ONLY__)
+#define LIBGCC2_DOUBLE_TYPE_SIZE 32
+#else
+#define LIBGCC2_DOUBLE_TYPE_SIZE 64
+#endif
+
 /* 'char' is signed by default.  */
 #define DEFAULT_SIGNED_CHAR  1
 
Index: gcc/config/sh/sh-modes.def
===================================================================
--- gcc/config/sh/sh-modes.def	(revision 162269)
+++ gcc/config/sh/sh-modes.def	(working copy)
@@ -22,6 +22,11 @@ PARTIAL_INT_MODE (SI);
 /* PDI mode is used to represent a function address in a target register.  */
 PARTIAL_INT_MODE (DI);
 
+/* For software floating point comparisons.  */
+CC_MODE (CC_FP_NE);
+CC_MODE (CC_FP_GT);
+CC_MODE (CC_FP_UNLT);
+
 /* Vector modes.  */
 VECTOR_MODE  (INT, QI, 2);    /*                 V2QI */
 VECTOR_MODES (INT, 4);        /*            V4QI V2HI */
Index: gcc/config/sh/lib1funcs.h
===================================================================
--- gcc/config/sh/lib1funcs.h	(revision 162269)
+++ gcc/config/sh/lib1funcs.h	(working copy)
@@ -64,13 +64,151 @@ see the files COPYING3 and COPYING.RUNTI
 #endif /* !__LITTLE_ENDIAN__ */
 
 #ifdef __sh1__
+/* branch with two-argument delay slot insn */
 #define SL(branch, dest, in_slot, in_slot_arg2) \
 	in_slot, in_slot_arg2; branch dest
+/* branch with one-argument delay slot insn */
 #define SL1(branch, dest, in_slot) \
 	in_slot; branch dest
+/* branch with comparison in delay slot */
+#define SLC(branch, dest, in_slot, in_slot_arg2) \
+        branch dest; in_slot, in_slot_arg2
+/* comparison in a delay slot, at branch destination */
+#define SLI(in_slot, in_slot_arg2) in_slot, in_slot_arg2
+#define SLCMP(branch, cmp1, cmp1arg2, cmp2, cmp2arg2) \
+	branch .+6; bra .+6; cmp2, cmp2arg2; cmp1, cmp1arg2
+#define DMULU_SAVE \
+ mov.l r10,@-r15; \
+ mov.l r11,@-r15; \
+ mov.l r12,@-r15; \
+ mov.l r13,@-r15
+#define DMULUL(m1, m2, rl) \
+ swap.w m1,r12; \
+ mulu.w r12,m2; \
+ swap.w m2,r13; \
+ sts macl,r10; \
+ mulu.w r13,m1; \
+ clrt; \
+ sts macl,r11; \
+ mulu.w r12,r13; \
+ addc r11,r10; \
+ sts macl,r12; \
+ mulu.w m1,m2; \
+ movt r11; \
+ sts macl,rl; \
+ mov r10,r13; \
+ shll16 r13; \
+ addc r13,rl; \
+ xtrct r11,r10; \
+ addc r10,r12 \
+/* N.B. the carry is cleared here.  */
+#define DMULUH(rh) mov r12,rh
+#define DMULU_RESTORE \
+ mov.l @r15+,r13; \
+ mov.l @r15+,r12; \
+ mov.l @r15+,r11; \
+ mov.l @r15+,r10
 #else /* ! __sh1__ */
+/* branch with two-argument delay slot insn */
 #define SL(branch, dest, in_slot, in_slot_arg2) \
-	branch##.s dest; in_slot, in_slot_arg2
+	branch##/s dest; in_slot, in_slot_arg2
+/* branch with one-argument delay slot insn */
 #define SL1(branch, dest, in_slot) \
 	branch##/s dest; in_slot
+/* branch with comparison in delay slot */
+#define SLC(branch, dest, in_slot, in_slot_arg2) \
+        branch##/s dest; in_slot, in_slot_arg2
+/* comparison in a delay slot, at branch destination */
+#define SLI(in_slot, in_slot_arg)
+#define SLCMP(branch, cmp1, cmp1arg2, cmp2, cmp2arg2) \
+	branch##/s .+6; cmp1, cmp1arg2; cmp2, cmp2arg2
+#define DMULU_SAVE
+#define DMULUL(m1, m2, rl) dmulu.l m1,m2; sts macl,rl
+#define DMULUH(rh) sts mach,rh
+#define DMULU_RESTORE
 #endif /* !__sh1__ */
+
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+/* don't #define DYN_SHIFT */
+  #define SHLL4(REG)	\
+	shll2	REG;	\
+	shll2	REG
+
+  #define SHLR4(REG)	\
+	shlr2	REG;	\
+	shlr2	REG
+
+  #define SHLL6(REG)	\
+	shll2	REG;	\
+	shll2	REG;	\
+	shll2	REG
+
+  #define SHLR6(REG)	\
+	shlr2	REG;	\
+	shlr2	REG;	\
+	shlr2	REG
+
+  #define SHLL12(REG)	\
+	shll8	REG;	\
+	SHLL4 (REG)
+
+  #define SHLR12(REG)	\
+	shlr8	REG;	\
+	SHLR4 (REG)
+
+  #define SHLR19(REG)	\
+	shlr16	REG;	\
+	shlr2	REG;	\
+	shlr	REG
+
+  #define SHLL23(REG)	\
+	shll16	REG;	\
+	shlr	REG;	\
+	shll8	REG
+
+  #define SHLR24(REG)	\
+	shlr16	REG;	\
+	shlr8	REG
+
+  #define SHLR21(REG)	\
+	shlr16	REG;	\
+	shll2	REG;	\
+	add	REG,REG;\
+	shlr8	REG
+
+  #define SHLL21(REG)	\
+	shll16	REG;	\
+	SHLL4 (REG);	\
+	add	REG,REG
+
+  #define SHLR11(REG)	\
+	shlr8	REG;	\
+	shlr2	REG;	\
+	shlr	REG
+
+  #define SHLR22(REG)	\
+	shlr16	REG;	\
+	shll2	REG;	\
+	shlr8	REG
+
+  #define SHLR23(REG)	\
+	shlr16	REG;	\
+	add	REG,REG;\
+	shlr8	REG
+
+  #define SHLR20(REG)	\
+	shlr16	REG;	\
+	SHLR4 (REG)
+
+  #define SHLL20(REG)	\
+	shll16	REG;	\
+	SHLL4 (REG)
+#define SHLD_COUNT(N,COUNT)
+#define SHLRN(N,COUNT,REG) SHLR##N(REG)
+#define SHLLN(N,COUNT,REG) SHLL##N(REG)
+#else
+#define SHLD_COUNT(N,COUNT) mov #N,COUNT
+#define SHLRN(N,COUNT,REG) shld COUNT,REG
+#define SHLLN(N,COUNT,REG) shld COUNT,REG
+#define DYN_SHIFT 1
+#endif
Index: gcc/config/sh/IEEE-754/m3/divsf3.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/divsf3.S	(revision 0)
+++ gcc/config/sh/IEEE-754/m3/divsf3.S	(revision 0)
@@ -0,0 +1,365 @@
+/* Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+! divsf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+
+! long 0th..3rd significant byte
+#ifdef __LITTLE_ENDIAN__
+#define L0SB	3
+#define L1SB	2
+#define L2SB	1
+#define L3SB	0
+#else
+#define L0SB	0
+#define L1SB	1
+#define L2SB	2
+#define L3SB	3
+#endif
+
+! clobbered: r0,r1,r2,r3,r6,r7,T (and for sh.md's purposes PR)
+!
+! Note: When the divisor is larger than the dividend, we have to adjust the
+! exponent down by one.  We do this automatically when subtracting the entire
+! exponent/fraction bitstring as an integer, by means of the borrow from
+! bit 23 to bit 24.
+! Note: non-denormal rounding of a division result cannot cause fraction
+! overflow / exponent change. (r4 > r5 : fraction must stay in (2..1] interval;
+! r4 < r5: having an extra bit of precision available, even the smallest
+! possible difference of the result from one is rounded in all rounding modes
+! to a fraction smaller than one.)
+! sh4-200: 59 cycles
+! sh4-300: 44 cycles
+! tab indent: exponent / sign computations
+! tab+space indent: fraction computation
+FUNC(GLOBAL(divsf3))
+	.global GLOBAL(divsf3)
+	.balign	4
+GLOBAL(divsf3):
+	mov.l	LOCAL(x7f800000),r3
+	mov	#1,r2
+	mov	r4,r6
+	 shll8	 r6
+	mov	r5,r7
+	 shll8	 r7
+	rotr	r2
+	tst	r3,r4
+	or	r2,r6
+	bt/s	LOCAL(denorm_arg0)
+	or	r2,r7
+	tst	r3,r5
+	bt	LOCAL(denorm_arg1)
+	 shlr	 r6
+	mov.l	LOCAL(x3f000000),r3	! bias minus explicit leading 1
+	 div0u
+LOCAL(denorm_done):
+	 div1	 r7,r6
+	mov.l	r8,@-r15
+	 bt	 0f
+	 div1	r7,r6
+0:	mov.l	r9,@-r15
+	 div1	 r7,r6
+	add	r4,r3
+	 div1	 r7,r6
+	sub	r5,r3	! result sign/exponent minus 1 if no overflow/underflow
+	 div1	 r7,r6
+	or	r3,r2
+	 div1	 r7,r6
+	mov.w	LOCAL(xff00),r9
+	 div1	 r7,r6
+	mov.l	r2,@-r15 ! L0SB is 0xff iff denorm / infinity exp is computed
+	 div1	 r7,r6
+	mov.w	LOCAL(m23),r2
+	 div1	 r7,r6
+	mov	r4,r0
+	 div1	 r7,r6
+	 extu.b	 r6,r1
+	 and	 r9,r6
+	 swap.w	 r1,r1	! first 8 bits of result fraction in bit 23..16
+	 div1	 r7,r6
+	shld	r2,r0
+	 div1	 r7,r6
+	mov.b	r0,@(L3SB,r15)	! 0xff iff dividend was infinity / nan
+	 div1	 r7,r6
+	mov	r5,r0
+	 div1	 r7,r6
+	shld	r2,r0
+	 div1	 r7,r6
+	mov.b	r0,@(L2SB,r15)	! 0xff iff divisor was infinity / nan
+	 div1	 r7,r6
+	mov	r4,r0
+	 div1	 r7,r6
+	mov.w	LOCAL(m31),r2
+	 div1	 r7,r6
+	 extu.b	 r6,r8	! second 8 bits of result fraction in bit 7..0
+	 and	 r9,r6
+	mov.l	LOCAL(xff800000),r9
+	 div1	 r7,r6
+	xor	r5,r0	! msb := correct result sign
+	 div1	 r7,r6
+	xor	r3,r0	! xor with sign of result sign/exponent word
+	 div1	 r7,r6
+	shad	r2,r0
+	 div1	 r7,r6
+	mov.b	r0,@(L1SB,r15)	! 0xff	iff exponent over/underflows
+	and	r9,r3	! isolate sign / exponent
+	 mov.w	 LOCAL(xff01),r2
+	 div1	 r7,r6
+	swap.b	r8,r0	! second 8 bits of result fraction in bit 15..8
+	 div1	 r7,r6
+	or	r1,r0	! first 16 bits of result fraction in bit 23..8
+	 div1	 r7,r6
+	mov.w	LOCAL(m1),r9
+	 div1	 r7,r6
+	mov.l	@r15+,r8 ! load encoding of unusual exponent conditions
+	 and	 r6,r2	! rest | result lsb
+	 mov	 #0,r1
+	 bf	 0f	! bit below lsb clear -> no rounding
+	 cmp/hi	r1,r2
+0:	 extu.b	 r6,r1
+	 or	 r1,r0	! 24 bit result fraction with explicit leading 1
+	addc	r3,r0	! add in exponent / sign
+	cmp/str	r9,r8
+	! (no stall *here* for SH4-100 / SH4-200)
+	bt/s	LOCAL(inf_nan_denorm_zero)
+	mov.l	@r15+,r9
+	rts
+	mov.l	@r15+,r8
+
+/* The exponent adjustment for denormal numbers is done by leaving an
+   adjusted value in r3; r4/r5 are not changed.  */
+	.balign	4
+LOCAL(denorm_arg0):
+	mov.w	LOCAL(xff00),r1
+	sub	r2,r6	! 0x80000000 : remove implicit 1
+	tst	r6,r6
+	sts.l	pr,@-r15
+	bt	LOCAL(div_zero)
+	bsr	LOCAL(clz)
+	mov	r6,r0
+	shld	r0,r6
+	tst	r3,r5
+	mov.l	LOCAL(x3f800000),r3	! bias - 1 + 1
+	mov	#23,r1
+	shld	r1,r0
+	bt/s	LOCAL(denorm_arg1_2)
+	sub	r0,r3
+	 shlr	 r6
+	bra	LOCAL(denorm_done)
+	 div0u
+
+LOCAL(denorm_arg1):
+	mov.l	LOCAL(x3f000000),r3	! bias - 1
+LOCAL(denorm_arg1_2):
+	sub	r2,r7	! 0x80000000 : remove implicit 1
+	mov.w	LOCAL(xff00),r1
+	tst	r7,r7
+	sts.l	pr,@-r15
+	bt	LOCAL(div_by_zero)
+	bsr	LOCAL(clz)
+	mov	r7,r0
+	shld	r0,r7
+	add	#-1,r0
+	mov	#23,r1
+	shld	r1,r0
+	add	r0,r3
+	 shlr	 r6
+	bra	LOCAL(denorm_done)
+	 div0u
+
+	.balign	4
+LOCAL(inf_nan_denorm_zero):
+! r0 has the rounded result, r6 has the non-rounded lowest bits & rest.
+! the bit just below the LSB of r6 is available as ~Q
+
+! Alternative way to get at ~Q:
+! if rounding took place, ~Q must be set.
+! if the rest appears to be zero, ~Q must be set.
+! if the rest appears to be nonzero, but rounding didn't take place,
+! ~Q must be clear;  the apparent rest will then require adjusting to test if 
+! the actual rest is nonzero.
+	mov	r0,r2
+	not	r8,r0
+	tst	#0xff,r0
+	shlr8	r0
+	mov.l	@r15+,r8
+	bt/s	LOCAL(div_inf_or_nan)
+	tst	#0xff,r0
+	mov	r4,r0
+	bt	LOCAL(div_by_inf_or_nan)
+	add	r0,r0
+	mov	r5,r1
+	add	r1,r1
+	cmp/hi	r1,r0
+	mov	r6,r0
+	bt	LOCAL(overflow)
+	sub	r2,r0
+	exts.b	r0,r0	! -1 if rounding took place
+	shlr8	r6	! isolate div1-mangled rest
+	addc	r2,r0	! generate carry if rounding took place
+	shlr8	r7
+	sub	r3,r0	! pre-rounding fraction
+	bt	0f ! going directly to denorm_sticky would cause mispredicts
+	tst	r6,r6	! rest can only be zero if lost bit was set
+0:	add	r7,r6	! (T ? corrupt : reconstruct) actual rest
+	bt	0f
+	cmp/pl	r6
+0:	mov.w	LOCAL(m24),r1
+	addc	r0,r0	! put in sticky bit
+	add	#-1,r3
+	mov.l	LOCAL(x40000000),r6
+	add	r3,r3
+	mov	r0,r2
+	shad	r1,r3	! exponent ; s32.0
+	!
+	shld	r3,r0
+	add	#30,r3
+	cmp/pl	r3
+	shld	r3,r2
+	bf	LOCAL(zero_nan)	! return zero
+	rotl	r2
+	cmp/hi	r6,r2
+	mov	#0,r7
+	addc	r7,r0
+	div0s	r4,r5
+	rts
+	rotcr	r0
+	
+! ????
+! undo normal rounding (lowest bits still in r6). then do denormal rounding.
+	
+LOCAL(overflow):
+	mov.l	LOCAL(xff000000),r0
+	div0s	r4,r5
+	rts
+	rotcl	r0
+	
+LOCAL(div_inf_or_nan):
+	mov	r4,r0
+	bra	LOCAL(nan_if_t)
+	add	r0,r0
+	
+LOCAL(div_by_inf_or_nan):
+	mov.l	LOCAL(xff000000),r1
+	mov	#0,r0
+	mov	r5,r2
+	add	r2,r2
+	bra	LOCAL(nan_if_t)
+	cmp/hi	r1,r2
+
+
+
+! still need to check for divide by zero or divide by nan
+! r3: 0x7f800000
+	.balign	4
+LOCAL(div_zero):
+	mov	r5,r1
+	add	r1,r1
+	tst	r1,r1	! 0 / 0 -> nan
+	not	r5,r1
+	bt	LOCAL(nan)
+	add	r3,r3
+	cmp/hi	r3,r1	! 0 / nan -> nan (but 0 / inf -> 0)
+LOCAL(zero_nan):
+	mov	#0,r0
+LOCAL(nan_if_t):
+	bf	0f
+LOCAL(nan):
+	mov	#-1,r0
+0:	div0s	r4,r5	! compute sign
+	rts
+	rotcr	r0	! insert sign
+
+LOCAL(div_by_zero):
+	mov.l	LOCAL(xff000000),r0
+	mov	r5,r2
+	add	r2,r2
+	bra	LOCAL(nan_if_t)
+	cmp/hi	r0,r2
+	
+	.balign	4
+LOCAL(clz):
+	mov.l	r8,@-r15
+	extu.w	r0,r8
+	mov.l	r9,@-r15
+	cmp/eq	r0,r8
+	bt/s	0f
+	mov	#8-8,r9
+	xtrct	r0,r8
+	add	#16,r9
+0:	tst	r1,r8	! 0xff00
+	mov.l	LOCAL(c_clz_tab),r0
+	bt	0f
+	shlr8	r8
+0:	bt	0f
+	add	#8,r9
+0:
+#ifdef	__PIC__
+	add	r0,r8
+	mova	LOCAL(c_clz_tab),r0
+#endif
+	mov.b	@(r0,r8),r8
+	mov	r9,r0
+	mov.l	@r15+,r9
+	!
+	!
+	!
+	sub	r8,r0
+	mov.l	@r15+,r8
+	rts
+	lds.l	@r15+,pr
+
+!	We encode even some words that would fit as immediates in the
+!	instruction as pc-relative loads, in order to avoid some pipeline
+!	stalls on SH4-100 / SH4-200.
+LOCAL(m23):	.word -23
+LOCAL(m24):	.word -24
+LOCAL(m31):	.word -31
+LOCAL(xff01):	.word 0xff01
+	.balign	4
+LOCAL(xff000000):
+#ifdef __LITTLE_ENDIAN__
+LOCAL(xff00):	.word 0xff00
+LOCAL(m1):	.word -1
+#else
+LOCAL(m1):	.word -1
+LOCAL(xff00):	.word 0xff00
+#endif
+LOCAL(x7f800000): .long 0x7f800000
+LOCAL(x3f000000): .long 0x3f000000
+LOCAL(x3f800000): .long 0x3f800000
+LOCAL(xff800000): .long 0xff800000
+LOCAL(x40000000): .long 0x40000000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+        .long   GLOBAL(clz_tab) - .
+#else
+        .long   GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(divsf3))
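
The borrow trick in the notes at the top of this file can be pictured in
plain C.  This is a minimal sketch for normal operands with exactly
representable quotients; rounding, denormals and inf/NaN handling are
omitted, and the helper name is illustrative only:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Sketch of divsf3's exponent computation: subtract the raw operand
   words under a bias of 126 << 23; the borrow out of bit 23 lowers the
   exponent field by one exactly when the quotient fraction is below 1,
   i.e. exactly when the quotient needs one extra normalizing shift.  */
static float div_sketch(float a, float b)
{
    uint32_t ia, ib, r;
    float res;
    memcpy(&ia, &a, sizeof ia);
    memcpy(&ib, &b, sizeof ib);
    uint32_t fa = (ia & 0x7fffff) | 0x800000;  /* explicit leading 1 */
    uint32_t fb = (ib & 0x7fffff) | 0x800000;
    /* Quotient fraction with its leading 1 at bit 23 either way.  */
    uint32_t q = (uint32_t)((fa < fb ? (uint64_t)fa << 24
                                     : (uint64_t)fa << 23) / fb);
    r = ((0x3f000000 + ia - ib) & 0xff800000)  /* sign/exponent only */
        + q;      /* the explicit leading 1 of q bumps the exponent */
    memcpy(&res, &r, sizeof res);
    return res;
}

int main(void)
{
    printf("%g %g %g\n", div_sketch(6.0f, 2.0f),
           div_sketch(1.5f, 3.0f), div_sketch(-1.0f, 4.0f));
    /* prints: 3 0.5 -0.25 */
    return 0;
}
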
Index: gcc/config/sh/IEEE-754/m3/divdf3.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/divdf3.S	(revision 0)
+++ gcc/config/sh/IEEE-754/m3/divdf3.S	(revision 0)
@@ -0,0 +1,608 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combined
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+! divdf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke joern.rennecke@st.com
+
+/* y = 1/x  ; x (- [1,2)
+   y0 = 1.5 - x/2 - tab[(1-x)*64] = y + d ; abs(d)/y <= 0x1.0c/256
+
+   y1 = y0 - ((y0) * x - 1) * y0  =  y-x*d^2
+   y2 = y1 - ((y1) * x - 1) * y1 =~= y-x^3*d^4
+
+   z0 = y2*a ;  a1 = a - z0*x /# 32 * 64 -> 64 bit #/
+   z1 = y2*a1 (round to nearest odd 0.5 ulp);
+   a2 = a1 - z1*x /# 32 * 64 -> 64 bit #/
+
+   z = a/x = z0 + z1 - 0.5 ulp + (a2 > 0) * ulp
+
+   Unless stated otherwise, multiplies can be done in 32 * 32 bit or less
+   with suitable scaling and/or top truncation.
+   We use a slightly modified algorithm here that checks if the lower
+   bits in z1 are sufficient to determine the outcome of rounding - in that
+   case a2 is not computed.
+   -z1 is computed in units of 1/128 ulp, with an error in the range
+   -0x3.e/128 .. +0 ulp.
+   Thus, after adding three, the result can be safely rounded for normal
+   numbers if any of the bits 5..2 is set, or if the highest guard bit
+   (bit 6 if y <1, otherwise bit 7) is set.
+   (Because of the way truncation works, we would be fine for an open
+    error interval of (-4/128..+1/128) ulp )
+   For denormal numbers, the rounding point lies higher, but it would be
+   quite cumbersome to calculate where exactly; it is sufficient if any
+   of the bits 7..3 is set.
+   x truncated to 20 bits is sufficient to calculate y0 or even y1.
+   Table entries are adjusted by about +128 to use full signed byte range.
+   This adjustment has been perturbed slightly to allow cse with the
+   shift count constant -26.
+   The threshold point for the shift adjust before rounding is found by
+   comparing the fractions, which is exact, unlike the top bit of y2.
+   Therefore, the top bit of y2 becomes slightly random after the adjustment
+   shift, but that's OK because this can happen only at the boundaries of
+   the interval, and the biasing of the error means that it can in fact happen
+   only at the bottom end.  And there, the carry propagation will make sure
+   that in the end we will have in effect an implicit 1 (or two when rounding
+   up...).  A C sketch of this reciprocal refinement follows this file.  */
+/* If an exact result exists, it can have no more bits than the dividend.
+   Hence, we don't need to bother with the round-to-even tie breaker
+   unless the result is denormalized.  */
+/* 64 cycles through main path for sh4-300 (about 93.7% of normalized numbers),
+   82 for the path for rounding tie-breaking for normalized numbers
+   (including one branch mispredict).
+   Some cycles might be saved by more careful register allocation.  */
+
+#define x_h r12
+#define yn  r3
+
+FUNC(GLOBAL(divdf3))
+ .global GLOBAL(divdf3)
+
+/* Adjust arg0 now, too.  We still have to come back to denorm_arg1_done,
+   since we haven't done any of the work yet that we do till the denorm_arg0
+   entry point.  We know that neither of the arguments is inf/nan, but
+   arg0 might be zero.  Check for that first to avoid having to establish an
+   rts return address.  */
+LOCAL(both_denorm):
+	mov.l	r9,@-r15
+	mov	DBL0H,r1
+	mov.l	r0,@-r15
+	shll2	r1
+	mov.w LOCAL(both_denorm_cleanup_off),r9
+	or	DBL0L,r1
+	tst	r1,r1
+	mov	DBL0H,r0
+	bf/s	LOCAL(zero_denorm_arg0_1)
+	shll2	r0
+	mov.l	@(4,r15),r9
+	add	#8,r15
+	bra	LOCAL(ret_inf_nan_0)
+	mov	r1,DBLRH
+
+LOCAL(both_denorm_cleanup):
+	mov.l	@r15+,r0
+	!
+	mov.l	@r15+,r9
+ mov	#12,r3
+ mov.l	LOCAL(xfffe2006),r1	! yn := (-1. << 17) + (0x80 << 6) ; shift #-26
+	bra	LOCAL(denorm_arg1_done)
+	!
+	add	r0,DBL0H
+
+/* Denorm handling leaves the incoming denorm argument with an exponent of +1
+   (implicit 1).  To leave the result exponent unaltered, the other
+   argument's exponent is adjusted by the shift count.  */
+
+	.balign 4
+LOCAL(arg0_tiny):
+	bsr	LOCAL(clz)
+	mov	DBL0L,r0
+	shll	DBL0H
+	add	#1,r0
+	mov	DBL0L,DBL0H
+	shld	r0,DBL0H
+	rotcr	DBL0H
+	tst	DBL0L,DBL0L	/* Check for a zero dividend.  */
+	add	#-33,r0
+	shld	r0,DBL0L
+	bf/s	LOCAL(adjust_arg1_exp)
+	add	#64,r0
+LOCAL(return_0): /* Return 0 with appropriate sign.  */
+	mov.l	@r15+,r10
+	mov	#0,DBLRH
+	mov.l	@r15+,r9
+	bra	LOCAL(ret_inf_nan_0)
+	mov.l	@r15+,r8
+
+	.balign 4
+LOCAL(arg1_tiny):
+	bsr	LOCAL(clz)
+	mov	DBL1L,r0
+	shll	DBL1H
+	add	#1,r0
+	mov	DBL1L,DBL1H
+	shld	r0,DBL1H
+	rotcr	DBL1H
+	tst	DBL1L,DBL1L	/* Check for divide by zero.  */
+	add	#-33,r0
+	shld	r0,DBL1L
+	bf/s	LOCAL(adjust_arg0_exp)
+	add	#64,r0
+	mov	DBL0H,r0
+	add	r0,r0
+	tst	r0,r0	! 0 / 0 ?
+	mov	#-1,DBLRH
+	bf	LOCAL(return_inf)
+	!
+	bt	LOCAL(ret_inf_nan_0)
+	!
+
+	.balign 4
+LOCAL(zero_denorm_arg1):
+	not	DBL0H,r3
+	mov	DBL1H,r0
+	tst	r2,r3
+	shll2	r0
+	bt	LOCAL(early_inf_nan_arg0)
+	tst	r0,r0
+	mov.w	LOCAL(xff00),r12
+	bt/s	LOCAL(arg1_tiny)
+	sts.l	pr,@-r15
+	bsr	LOCAL(clz)
+	shlr2	r0
+	!
+	shll	DBL1H
+	mov	DBL1L,r3
+	shld	r0,DBL1H
+	shld	r0,DBL1L
+	rotcr	DBL1H
+	add	#-32,r0
+	shld	r0,r3
+	add	#32,r0
+	or	r3,DBL1H
+LOCAL(adjust_arg0_exp):
+	tst	r2,DBL0H
+	mov	#20,r3
+	shld	r3,r0
+	bt	LOCAL(both_denorm)
+	add	DBL0H,r0
+	div0s	r0,DBL0H	! Check for obvious overflow.
+	not	r0,r3		! Check for more subtle overflow - lest
+	bt	LOCAL(return_inf)
+	mov	r0,DBL0H
+	tst	r2,r3		! we mistake it for NaN later
+	mov	#12,r3
+	bf	LOCAL(denorm_arg1_done)
+LOCAL(return_inf): /* Return infinity with appropriate sign.  */
+	mov	#20,r3
+	mov	#-2,DBLRH
+	bra	LOCAL(ret_inf_nan_0)
+	shad	r3,DBLRH
+
+/* inf/n -> inf; inf/0 -> inf; inf/inf -> nan; inf/nan->nan  nan/x -> nan */
+LOCAL(inf_nan_arg0):
+	mov.l	@r15+,r10
+	mov.l	@r15+,r9
+	mov.l	@r15+,r8
+LOCAL(early_inf_nan_arg0):
+	not	DBL1H,r3
+	mov	DBL0H,DBLRH
+	tst	r2,r3	! both inf/nan?
+	add	DBLRH,DBLRH
+	bf	LOCAL(ret_inf_nan_0)
+	mov	#-1,DBLRH
+LOCAL(ret_inf_nan_0):
+	mov	#0,DBLRL
+	mov.l	@r15+,r12
+	div0s	DBL0H,DBL1H
+	rts
+	rotcr	DBLRH
+	
+/* Already handled: inf/x, nan/x .  Thus: x/inf -> 0; x/nan -> nan */
+	.balign	4
+LOCAL(inf_nan_arg1):
+	mov	DBL1H,r2
+	mov	#12,r1
+	shld	r1,r2
+	mov.l	@r15+,r10
+	mov	#0,DBLRL
+	mov.l	@r15+,r9
+	or	DBL1L,r2
+	mov.l	@r15+,r8
+	cmp/hi	DBLRL,r2
+	mov.l	@r15+,r12
+	subc	DBLRH,DBLRH
+	div0s	DBL0H,DBL1H
+	rts
+	rotcr	DBLRH
+	
+	.balign 4
+LOCAL(zero_denorm_arg0):
+	mov.w	LOCAL(denorm_arg0_done_off),r9
+	not	DBL1H,r1
+	mov	DBL0H,r0
+	tst	r2,r1
+	shll2	r0
+	bt	LOCAL(inf_nan_arg1)
+LOCAL(zero_denorm_arg0_1):
+	tst	r0,r0
+	mov.w	LOCAL(xff00),r12
+	bt/s	LOCAL(arg0_tiny)
+	sts.l	pr,@-r15
+	bsr	LOCAL(clz)
+	shlr2	r0
+	shll	DBL0H
+	mov	DBL0L,r12
+	shld	r0,DBL0H
+	shld	r0,DBL0L
+	rotcr	DBL0H
+	add	#-32,r0
+	shld	r0,r12
+	add	#32,r0
+	or	r12,DBL0H
+LOCAL(adjust_arg1_exp):
+	mov	#20,r12
+	shld	r12,r0
+	add	DBL1H,r0
+	div0s	r0,DBL1H	! Check for obvious underflow.
+	not	r0,r12		! Check for more subtle underflow - lest
+	bt	LOCAL(return_0)
+	mov	r0,DBL1H
+	tst	r2,r12		! we mistake it for NaN later
+	bt	LOCAL(return_0)
+	!
+	braf	r9
+	mov	#13,r0
+LOCAL(zero_denorm_arg1_dispatch):
+
+LOCAL(xff00):	.word 0xff00
+LOCAL(denorm_arg0_done_off):
+	.word LOCAL(denorm_arg0_done)-LOCAL(zero_denorm_arg1_dispatch)
+LOCAL(both_denorm_cleanup_off):
+	.word LOCAL(both_denorm_cleanup)-LOCAL(zero_denorm_arg1_dispatch)
+
+ .balign	8
+GLOBAL(divdf3):
+ mov.l	LOCAL(x7ff00000),r2
+ mov	#12,r3
+ mov.l	LOCAL(xfffe2006),r1	! yn := (-1. << 17) + (0x80 << 6) ; shift #-26
+ tst	r2,DBL1H
+ mov.l	r12,@-r15
+ bt	LOCAL(zero_denorm_arg1)
+
+LOCAL(denorm_arg1_done):
+ mov	DBL1H,x_h	! x_h live in r12
+ shld	r3,x_h	! x - 1 ; u0.20
+ mov	x_h,yn
+ mova	LOCAL(ytab),r0
+ mov.l	r8,@-r15
+ shld	r1,yn	! x-1 ; u26.6
+ mov.b	@(r0,yn),yn
+ mov	#6,r0
+ mov.l	r9,@-r15
+ mov	x_h,r8
+ mov.l	r10,@-r15
+ shlr16	x_h	! x - 1; u16.16	! x/2 - 0.5 ; u15.17
+ add	x_h,r1	! SH4-200 single-issues this insn
+ shld	r0,yn
+ sub	r1,yn	! yn := y0 ; u15.17
+ mov	DBL1L,r1
+ mov	#-20,r10
+ mul.l	yn,x_h	! r12 dead
+ swap.w	yn,r9
+ shld	r10,r1
+ sts	macl,r0	! y0 * (x-1) - n ; u-1.32
+ add	r9,r0	! y0 * x - 1     ; s-1.32
+ tst	r2,DBL0H
+ dmuls.l r0,yn
+ mov.w	LOCAL(d13),r0
+ or	r1,r8	! x  - 1; u0.32
+ add	yn,yn	! yn = y0 ; u14.18
+ bt	LOCAL(zero_denorm_arg0)
+
+LOCAL(denorm_arg0_done):
+ sts	mach,r1	!      d0 ; s14.18
+ sub	r1,yn	! yn = y1 ; u14.18 ; <= 0x3fffc
+ mov	DBL0L,r12
+ shld	r0,yn	! yn = y1 ; u1.31 ; <= 0x7fff8000
+ mov.w	LOCAL(d12),r9
+ dmulu.l yn,r8
+ shld	r10,r12
+ mov	yn,r0
+ mov	DBL0H,r8
+ add	yn,yn	! yn = y1 ; u0.32 ; <= 0xffff0000
+ sts	mach,r1	! y1 * (x-1); u1.31
+ add	r0,r1	! y1 * x    ; u1.31
+ dmulu.l yn,r1
+ not	DBL0H,r10
+ shld	r9,r8
+ tst	r2,r10
+ or	r8,r12	! a - 1; u0.32
+ bt	LOCAL(inf_nan_arg0)
+ sts	mach,r1	! d1+yn; u1.31
+ sett		! adjust y2 so that it can be interpreted as s1.31
+ not	DBL1H,r10
+ subc	r1,yn	! yn := y2 ; u1.31 ; can be 0x7fffffff
+ mov.l	LOCAL(x001fffff),r9
+ dmulu.l yn,r12
+ tst	r2,r10
+ or	DBL1H,r2
+ bt	LOCAL(inf_nan_arg1)
+ mov.l	r11,@-r15
+ sts	mach,r12	! y2*(a-1) ; u1.31
+ add	yn,r12		! z0       ; u1.31
+ dmulu.l r12,DBL1L
+ mov.l	LOCAL(x40000000),DBLRH ! bias + 1
+ and	r9,r2		! x ; u12.20
+ cmp/hi	DBL0L,DBL1L
+ sts	macl,r8
+ mov	#-24,r11
+ sts	mach,r9 	! r9:r8 := z0 * DBL1L; u-19.64
+ subc	DBL1H,DBLRH
+ mul.l	r12,r2  	! (r9+macl):r8 == z0*x; u-19.64
+ shll	r8
+ add	DBL0H,DBLRH	! result sign/exponent + 1
+ mov	r8,r10
+ sts	macl,DBLRL
+ add	DBLRL,r9
+ rotcl	r9		! r9:r8 := z*x; u-20.63
+ shld	r11,r10
+ mov.l	LOCAL(x7fe00000),DBLRL
+ sub	DBL0L,r9	! r9:r8 := -a ; u-20.63
+ cmp/pz	r9		! In corner cases this shift can lose ..
+ shll8	r9		!  .. the sign, so check it first.
+ mov.l	LOCAL(x00200000),r11
+ or	r10,r9	! -a1 ; s-28.32
+ mov.l	LOCAL(x00100000),r10
+ dmulu.l r9,yn	! sign for r9 is in T
+ xor	DBL0H,DBL1H	! calculate expected sign & bit20
+ mov.w	LOCAL(d120),DBL0H ! to test bits 6..4
+ xor	DBLRH,DBL1H
+ !
+ sts	mach,DBL0L	! -z1 ; s-27.32
+ bt 0f
+ sub	yn,DBL0L	! multiply adjust for -a1 negative; r3 dies here
+0:tst	r10,DBL1H		! set T if a >= x
+ mov.l LOCAL(xfff00000),r3
+ bt	0f
+ add	DBL0L,DBL0L	! z1 ; s-27.32 / s-28.32
+0:bt 0f
+ add	r12,r12	! z0 ; u1.31 / u0.31
+0:add	#6-64,DBL0L
+ and	r3,DBLRH	! isolate sign / exponent
+ tst	DBL0H,DBL0L
+ bf/s	LOCAL(exact)	! make the hot path taken for best branch prediction
+ cmp/pz	DBL1H
+
+! Unless we follow the next branch, we need to test which way the rounding
+! should go.
+! For normal numbers, we know that the result is not exact, so the sign
+! of the rest will be conclusive.
+! We generate a number that looks safely rounded so that denorm handling
+! can safely test the number twice.
+! r10:r8 == 0 will indicate if the number was exact, which can happen
+! when we come here for denormals to check a number that is close or
+! equal to a result in whole ulps.
+ bf	LOCAL(ret_denorm_inf)	! denorm or infinity, DBLRH has inverted sign
+ add	#64,DBL0L
+LOCAL(find_adjust): tst	r10,DBL1H ! set T if a >= x
+ mov	#-2,r10
+ addc	r10,r10
+ mov	DBL0L,DBLRL	! z1 ; s-27.32 / s-28.32 ; lower 4 bits unsafe.
+ shad	r10,DBLRL	! tentatively rounded z1 ; s-24.32
+ shll8	r8		! r9:r8 := -a1 ; s-28.64
+ clrt
+ dmuls.l DBLRL,DBL1L	! DBLRL signed, DBL1L unsigned
+ mov	r8,r10
+ shll16	r8		! r8  := lowpart  of -a1 ; s-44.48
+ xtrct	r9,r10		! r10 := highpart of -a1 ; s-44.48
+ !
+ sts	macl,r3
+ subc	r3,r8
+ sts	mach,r3
+ subc	r3,r10
+ cmp/pz	DBL1L
+ mul.l	DBLRL,r2
+ bt	0f
+ sub	DBLRL,r10	! adjust for signed/unsigned multiply
+0: mov.l	LOCAL(x7fe00000),DBLRL
+ mov	#-26,r2
+ sts	macl,r9
+ sub	r9,r10		! r10:r8 := -a2
+ add	#-64+16,DBL0L	! the denorm code negates this adj. for exact results
+ shld	r2,r10		! convert sign into adjustment in the range 32..63
+ sub	r10,DBL0L
+ cmp/pz	DBL1H
+
+ .balign 4
+LOCAL(exact):
+ bf	LOCAL(ret_denorm_inf)	! denorm or infinity, DBLRH has inverted sign
+ tst	DBLRL,DBLRH
+ bt	LOCAL(ret_denorm_inf)	! denorm, DBLRH has correct sign
+ mov	#-7,DBL1H
+ cmp/pz	DBL0L		! T is sign extension of z1
+ not	DBL0L,DBLRL
+ subc	r11,DBLRH	! calculate sign / exponent minus implicit 1 minus T
+ mov.l	@r15+,r11
+ mov.l	@r15+,r10
+ shad	DBL1H,DBLRL
+ mov.l	@r15+,r9
+ mov	#-11,DBL1H
+ mov	r12,r8		! z0 contributes to DBLRH and DBLRL
+ shld	DBL1H,r12
+ mov	#21,DBL1H
+ clrt
+ shld	DBL1H,r8
+ addc	r8,DBLRL
+ mov.l	@r15+,r8
+ addc	r12,DBLRH
+ rts
+ mov.l	@r15+,r12
+
+!	sign in DBLRH ^ DBL1H
+! If the last 7 bits are in the range 64..64+7, we might have an exact
+! value in the preceding bits - or we might not. For denorms, we need to
+! find out.
+! if r10:r8 is zero, we just have found out that there is an exact value.
+	.balign	4
+LOCAL(ret_denorm_inf):
+	mov	DBLRH,r3
+	add	r3,r3
+	div0s	DBL1H,r3
+	mov	#120,DBLRL
+	bt	LOCAL(ret_inf_late)
+	add	#64,DBL0L
+	tst	DBLRL,DBL0L
+	mov	#-21,DBLRL
+	bt	LOCAL(find_adjust)
+	or	r10,r8
+	tst	r8,r8		! check if find_adjust found an exact value.
+	shad	DBLRL,r3
+	bf	0f
+	add	#-16,DBL0L	! if yes, cancel adjustment
+0:	mov	#-8,DBLRL	! remove the three lowest (inexact) bits
+	and	DBLRL,DBL0L
+	add	#-2-11,r3	! shift count for denorm generation
+	mov	DBL0L,DBLRL
+	mov	#28,r2
+	mov.l	@r15+,r11
+	mov.l	@r15+,r10
+	shll2	DBLRL
+	mov.l	@r15+,r9
+	shld	r2,DBL0L
+	mov.l	@r15+,r8
+	mov	#-31,r2
+	cmp/ge	r2,r3
+	shll2	DBLRL
+	bt/s	0f
+	add	DBL0L,r12	! fraction in r12:DBLRL ; u1.63
+	negc	DBLRL,DBLRL	! T := DBLRL != 0
+	add	#31,r3
+	mov	r12,DBLRL
+	rotcl	DBLRL		! put in sticky bit
+	movt	r12
+	cmp/ge	r2,r3
+	bt/s	LOCAL(return_0_late)
+0:	div0s	DBL1H,DBLRH	! calculate sign
+	mov	r12,DBLRH
+	shld	r3,DBLRH
+	mov	DBLRL,r2
+	shld	r3,DBLRL
+	add	#32,r3
+	add	DBLRH,DBLRH
+	mov.l	LOCAL(x80000000),DBL1H
+	shld	r3,r12
+	rotcr	DBLRH		! combine sign with highpart
+	add	#-1,r3
+	shld	r3,r2
+	mov	#0,r3
+	rotl	r2
+	cmp/hi	DBL1H,r2
+	addc	r12,DBLRL
+	mov.l	@r15+,r12
+	rts
+	addc	r3,DBLRH
+
+LOCAL(ret_inf_late):
+	mov.l	@r15+,r11
+	mov.l	@r15+,r10
+	mov	DBLRH,DBL0H
+	mov.l	@r15+,r9
+	bra	LOCAL(return_inf)
+	mov.l	@r15+,r8
+
+LOCAL(return_0_late):
+	div0s	DBLRH,DBL1H
+	mov.l	@r15+,r12
+	mov	#0,DBLRH
+	rts
+	rotcr	DBLRH
+
+	.balign	4
+LOCAL(clz):
+	mov.l	r8,@-r15
+	extu.w	r0,r8
+	mov.l	r9,@-r15
+	cmp/eq	r0,r8
+	bt/s	0f
+	mov	#21,r9
+	xtrct	r0,r8
+	add	#-16,r9
+0:	tst	r12,r8	! 0xff00
+	mov.l	LOCAL(c_clz_tab),r0
+	bt	0f
+	shlr8	r8
+0:	bt	0f
+	add	#-8,r9
+0:
+#ifdef	__PIC__
+	add	r0,r8
+	mova	LOCAL(c_clz_tab),r0
+#endif
+	mov.b	@(r0,r8),r8
+	mov	r9,r0
+	mov.l	@r15+,r9
+	!
+	!
+	!
+	sub	r8,r0
+	mov.l	@r15+,r8
+	rts
+	lds.l	@r15+,pr
+
+!	We encode even some words that would fit as immediates in the
+!	instruction as pc-relative loads, in order to avoid some pipeline
+!	stalls on SH4-100 / SH4-200.
+LOCAL(d1):	.word 1
+LOCAL(d12):	.word 12
+LOCAL(d13):	.word 13
+LOCAL(d120):	.word 120
+
+	.balign 4
+LOCAL(x7ff00000): .long 0x7ff00000
+LOCAL(xfffe2006): .long 0xfffe2006
+LOCAL(x001fffff): .long 0x001fffff
+LOCAL(x40000000): .long 0x40000000
+LOCAL(x7fe00000): .long 0x7fe00000
+LOCAL(x00100000): .long 0x00100000
+LOCAL(x00200000): .long 0x00200000
+LOCAL(xfff00000): .long 0xfff00000
+LOCAL(x80000000): .long 0x80000000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+        .long   GLOBAL(clz_tab) - .
+#else
+        .long   GLOBAL(clz_tab)
+#endif
+LOCAL(ytab):
+        .byte   120, 105,  91,  78,  66,  54,  43,  33
+        .byte    24,  15,   8,   0,  -5, -12, -17, -22
+        .byte   -27, -31, -34, -37, -40, -42, -44, -45
+        .byte   -46, -46, -47, -46, -46, -45, -44, -42
+        .byte   -41, -39, -36, -34, -31, -28, -24, -20
+        .byte   -17, -12,  -8,  -4,   0,   5,  10,  16
+        .byte    21,  27,  33,  39,  45,  52,  58,  65
+        .byte    72,  79,  86,  93, 101, 109, 116, 124
+ENDFUNC(GLOBAL(divdf3))
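
The reciprocal refinement described in the comment at the top of this
file can be tried out in plain C.  A minimal sketch: the table-corrected
seed y0 of the assembly is omitted, so a crude seed and more iterations
stand in for the two steps the real code needs:

#include <stdio.h>

/* Newton-Raphson refinement of y ~= 1/x for x in [1,2):
   y' = y - (y*x - 1)*y roughly squares the error on each step.  */
int main(void)
{
    double x = 1.37;              /* any value in [1,2) */
    double y = 1.5 - 0.5 * x;     /* seed without the tab[] correction */
    for (int i = 0; i < 5; i++)
        y = y - (y * x - 1.0) * y;
    printf("y   = %.17g\n1/x = %.17g\n", y, 1.0 / x);
    return 0;
}
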
Index: gcc/config/sh/IEEE-754/m3/floatunssisf.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/floatunssisf.S	(revision 0)
+++ gcc/config/sh/IEEE-754/m3/floatunssisf.S	(revision 0)
@@ -0,0 +1,94 @@
+/* Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combined
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+! floatunsisf for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+! (A C sketch of the conversion follows this file.)
+
+FUNC(GLOBAL(floatunsisf))
+	.global GLOBAL(floatunsisf)
+	.balign	4
+GLOBAL(floatunsisf):
+	mov.l	LOCAL(c_clz_tab),r0
+	extu.w	r4,r1
+	mov.w	LOCAL(xff00),r3
+	cmp/eq	r4,r1
+	mov	#24,r2
+	bt	0f
+	mov	r4,r1
+	shlr16	r1
+	add	#-16,r2
+0:	tst	r3,r1	! 0xff00
+	bt	0f
+	shlr8	r1
+0:	bt	0f
+	add	#-8,r2
+0:
+#ifdef	__PIC__
+	add	r0,r1
+	mova	LOCAL(c_clz_tab),r0
+#endif
+	mov.b	@(r0,r1),r1
+	mov	r4,r0
+	mov.l	LOCAL(x4a800000),r3	! bias + 23 - implicit 1
+	tst	r4,r4
+	bt	LOCAL(ret0)
+	!
+	sub	r1,r2
+	mov.l	LOCAL(x80000000),r1
+	shld	r2,r0
+	cmp/pz	r2
+	add	r3,r0
+	bt	LOCAL(noround)
+	add	#31,r2
+	shld	r2,r4
+	rotl	r4
+	add	#-31,r2
+	cmp/hi	r1,r4
+	mov	#0,r3
+	addc	r3,r0
+LOCAL(noround):
+	mov	#23,r1
+	shld	r1,r2
+	rts
+	sub	r2,r0
+LOCAL(ret0):
+	rts
+	nop
+
+LOCAL(xff00):	.word 0xff00
+	.balign	4
+LOCAL(x4a800000): .long 0x4a800000
+LOCAL(x80000000): .long 0x80000000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+        .long   GLOBAL(clz_tab) - .
+#else
+        .long   GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(floatunsisf))
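
In C terms, the routine above does roughly the following.  This is a
sketch only: the helper name is illustrative, __builtin_clz stands in
for the clz_tab helper, and ties are broken to even here, which may
differ in detail from the assembly's carry-based rounding:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* unsigned int -> float: place the leading 1 at bit 23, add the biased
   exponent, and round the shifted-out bits to nearest.  */
static uint32_t floatunsisf_bits(uint32_t a)
{
    if (a == 0)
        return 0;
    int lz = __builtin_clz(a);
    uint32_t exp = 158 - lz;          /* 127 + (31 - lz) */
    uint32_t frac, rest;
    if (lz >= 8) {                    /* no bits shifted out: exact */
        frac = a << (lz - 8);
        rest = 0;
    } else {
        frac = a >> (8 - lz);
        rest = a << (24 + lz);        /* lost bits, MSB-aligned */
    }
    uint32_t r = (exp << 23) | (frac & 0x7fffff);
    if (rest > 0x80000000u || (rest == 0x80000000u && (r & 1)))
        r++;                          /* round to nearest, ties to even */
    return r;
}

int main(void)
{
    float f;
    uint32_t bits = floatunsisf_bits(0xffffffffu);
    memcpy(&f, &bits, sizeof f);
    printf("%.9g\n", f);              /* 4294967296 (rounded up) */
    return 0;
}
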
Index: gcc/config/sh/IEEE-754/m3/floatunssidf.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/floatunssidf.S	(revision 0)
+++ gcc/config/sh/IEEE-754/m3/floatunssidf.S	(revision 0)
@@ -0,0 +1,96 @@
+/* Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+! floatunssidf for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+! (A C sketch of the shared clz_tab lookup follows this file.)
+
+FUNC(GLOBAL(floatunsidf))
+	.global GLOBAL(floatunsidf)
+	.balign	4
+GLOBAL(floatunsidf):
+	mov.l	LOCAL(c_clz_tab),r0
+	extu.w	r4,r1
+	mov.w	LOCAL(0xff00),r3
+	cmp/eq	r4,r1
+	mov	#21,r2
+	bt	0f
+	mov	r4,r1
+	shlr16	r1
+	add	#-16,r2
+0:	tst	r3,r1	! 0xff00
+	bt	0f
+	shlr8	r1
+0:	bt	0f
+	add	#-8,r2
+0:
+#ifdef	__PIC__
+	add	r0,r1
+	mova	LOCAL(c_clz_tab),r0
+#endif
+	mov.b	@(r0,r1),r5
+	mov	r4,DBLRL
+	mov.l	LOCAL(x41200000),r3	! bias + 20 - implicit 1
+	tst	r4,r4
+	mov	r4,DBLRH
+	bt	LOCAL(ret0)
+	sub	r5,r2
+	mov	r2,r5
+	shld	r2,DBLRH
+	cmp/pz	r2
+	add	r3,DBLRH
+	add	#32,r2
+	shld	r2,DBLRL
+	bf	0f
+	mov.w	LOCAL(d0),DBLRL
+0:	mov	#20,r2
+	shld	r2,r5
+	rts
+	sub	r5,DBLRH
+LOCAL(ret0):
+	mov	r4,DBLRL
+	rts
+	mov	r4,DBLRH
+
+LOCAL(xff00):	.word  0xff00
+	.balign	4
+LOCAL(x41200000):
+#ifdef __LITTLE_ENDIAN__
+LOCAL(d0):	  .word 0
+		  .word 0x4120
+#else
+		  .word 0x4120
+LOCAL(d0):	  .word 0
+#endif
+LOCAL(c_clz_tab):
+#ifdef __pic__
+        .long   GLOBAL(clz_tab) - .
+#else
+        .long   GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(floatunsidf))
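
All of these routines lean on the same table-driven count-leading-zeros
idiom around GLOBAL(clz_tab) (libgcc's __clz_tab, whose entry i is the
bit length of i).  A self-contained C sketch, rebuilding the table at
run time for clarity:

#include <stdio.h>

static unsigned char clz_tab[256];

static void init_clz_tab(void)
{
    /* clz_tab[i] = number of significant bits in i (0 for i == 0).  */
    for (int i = 1; i < 256; i++) {
        int bits = 0;
        for (int v = i; v; v >>= 1)
            bits++;
        clz_tab[i] = (unsigned char) bits;
    }
}

/* Count leading zeros the way the inline code above does: reduce the
   argument to its top non-zero byte with two compare/shift steps, then
   finish with a single table lookup.  */
static int clz32(unsigned int x)
{
    int base = 32;
    if (x >> 16) { x >>= 16; base -= 16; }
    if (x >> 8)  { x >>= 8;  base -= 8; }
    return base - clz_tab[x];         /* 32 for x == 0 */
}

int main(void)
{
    init_clz_tab();
    printf("%d %d %d\n", clz32(1), clz32(0x00ff0000), clz32(0x80000000));
    /* prints: 31 8 0 */
    return 0;
}
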
Index: gcc/config/sh/IEEE-754/m3/fixunsdfsi.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/fixunsdfsi.S	(revision 0)
+++ gcc/config/sh/IEEE-754/m3/fixunsdfsi.S	(revision 0)
@@ -0,0 +1,82 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combined
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+!! fixunsdfsi for Renesas SH / STMicroelectronics ST40 CPUs
+!! Contributed by Joern Rennecke joern.rennecke@st.com
+
+#ifdef L_fixunsdfsi
+	! What is a bit unusual about this implementation is that the
+	! sign bit influences the result for NANs: for cleared sign bit, you
+	! get INT_MAX, for set sign bit, you get INT_MIN.
+	! However, since the result for NANs is undefined, this should be no
+	! problem.
+	! N.B. This is scheduled both for SH4-200 and SH4-300.
+	! (A C sketch of the conversion follows this file.)
+	.balign 4
+	.global GLOBAL(fixunsdfsi)
+	FUNC(GLOBAL(fixunsdfsi))
+	.balign	4
+GLOBAL(fixunsdfsi):
+	mov.w	LOCAL(x413),r1	! bias + 20
+	mov	DBL0H,r0
+	shll	DBL0H
+	mov.l	LOCAL(mask),r3
+	mov	#-21,r2
+	shld	r2,DBL0H	! SH4-200 will start this insn in a new cycle
+	bt/s	LOCAL(ret0)
+	sub	r1,DBL0H
+	cmp/pl	DBL0H		! SH4-200 will start this insn in a new cycle
+	and	r3,r0
+	bf/s	LOCAL(ignore_low)
+	addc	r3,r0	! uses T == 1; sets implicit 1
+	mov	#11,r2
+	shld	DBL0H,r0	! SH4-200 will start this insn in a new cycle
+	cmp/gt	r2,DBL0H
+	add	#-32,DBL0H
+	bt	LOCAL(retmax)
+	shld	DBL0H,DBL0L
+	rts
+	or	DBL0L,r0
+
+	.balign	8
+LOCAL(ignore_low):
+	mov	#-21,r2
+	cmp/gt	DBL0H,r2	! SH4-200 will start this insn in a new cycle
+	add	#1,r0
+	bf	0f
+LOCAL(ret0): mov #0,r0		! results in 0 return
+0:	rts
+	shld	DBL0H,r0
+
+LOCAL(retmax):
+	rts
+	mov	#-1,r0
+
+LOCAL(x413): .word 0x413
+
+	.balign 4
+LOCAL(mask): .long 0x000fffff
+	ENDFUNC(GLOBAL(fixunsdfsi))
+#endif /* L_fixunsdfsi */
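
A C sketch of the conversion above, working on the raw bits the same
way: extract the unbiased exponent, reinsert the implicit 1, and shift
the fraction into place.  The helper name is illustrative; NaNs and
negative inputs are handled only loosely here (the real routine's NaN
result depends on the sign bit, as noted above):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

static uint32_t fixunsdfsi_sketch(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    int exp = (int)((bits >> 52) & 0x7ff) - 1075;  /* 1023 + 52 */
    uint64_t frac = (bits & 0xfffffffffffffULL) | (1ULL << 52);
    if (bits >> 63 || exp < -52)
        return 0;                     /* negative, tiny, or +/-0 */
    if (exp > -21)
        return 0xffffffffu;           /* >= 2^32: saturate */
    return (uint32_t)(frac >> -exp);  /* truncate toward zero */
}

int main(void)
{
    printf("%u %u %u\n", fixunsdfsi_sketch(3.9),
           fixunsdfsi_sketch(4294967295.0), fixunsdfsi_sketch(-1.0));
    /* prints: 3 4294967295 0 */
    return 0;
}
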
Index: gcc/config/sh/IEEE-754/m3/divdf3-rt.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/divdf3-rt.S	(revision 0)
+++ gcc/config/sh/IEEE-754/m3/divdf3-rt.S	(revision 0)
@@ -0,0 +1,519 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combined
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+! divdf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke joern.rennecke@st.com
+
+/* This version is not quite finished, since I've found that I can
+   get better average performance with a slightly altered algorithm.
+   Still, if you want a version for hard real time, this version here might
+   be a good starting point, since it has effectively no conditional
+   branches in the path that deals with normal numbers
+   (branches with zero offset are effectively conditional execution),
+   and thus it has a uniform execution time in this path.  */
+
+/* y = 1/x  ; x (- [1,2)
+   y0 = 1.5 - x/2 - tab[(1-x)*64] = y + d ; abs(d)/y <= 0x1.0c/256
+
+   y1 = y0 - ((y0) * x - 1) * y0  =  y-x*d^2
+   y2 = y1 - ((y1) * x - 1) * y1 =~= y-x^3*d^4
+
+   z0 = y2*a ;  a1 = a - z0*x /# 32 * 64 -> 64 bit #/
+   z1 = y2*a1 (round to nearest odd 0.5 ulp);
+   a2 = a1 - z1*x /# 32 * 64 -> 64 bit #/
+
+   z = a/x = z0 + z1 - 0.5 ulp + (a2 > 0) * ulp
+
+   Unless stated otherwise, multiplies can be done in 32 * 32 bit or less
+   with suitable scaling and/or top truncation.
+   x truncated to 20 bits is sufficient to calculate y0 or even y1.
+   Table entries are adjusted by about +128 to use full signed byte range.
+   This adjustment has been perturbed slightly to allow cse with the
+   shift count constant -26.
+   The threshold point for the shift adjust before rounding is found by
+   comparing the fractions, which is exact, unlike the top bit of y2.
+   Therefore, the top bit of y2 becomes slightly random after the adjustment
+   shift, but that's OK because this can happen only at the boundaries of
+   the interval, and the biasing of the error means that it can in fact happen
+   only at the bottom end.  And there, the carry propagation will make sure
+   that in the end we will have in effect an implicit 1 (or two when rounding
+   up...)  */
+/* If an exact result exists, it can have no more bits than the dividend.
+   Hence, we don't need to bother with the round-to-even tie breaker
+   unless the result is denormalized.  */
+/* 70 cycles through main path for sh4-300 .  Some cycles might be
+   saved by more careful register allocation.
+   122 cycles for sh4-200.  If execution time for sh4-200 is of concern,
+   a specially scheduled version makes sense.  */
+
+#define x_h r12
+#define yn  r3
+
+FUNC(GLOBAL(divdf3))
+ .global GLOBAL(divdf3)
+
+/* Adjust arg0 now, too.  We still have to come back to denorm_arg1_done,
+   since we haven't done any of the work yet that we do till the denorm_arg0
+   entry point.  We know that neither of the arguments is inf/nan, but
+   arg0 might be zero.  Check for that first to avoid having to establish an
+   rts return address.  */
+LOCAL(both_denorm):
+	mov.l	r9,@-r15
+	mov	DBL0H,r1
+	mov.l	r0,@-r15
+	shll2	r1
+	mov.w LOCAL(both_denorm_cleanup_off),r9
+	or	DBL0L,r1
+	tst	r1,r1
+	mov	DBL0H,r0
+	bf/s	LOCAL(zero_denorm_arg0_1)
+	shll2	r0
+	mov.l	@(4,r15),r9
+	add	#8,r15
+	bra	LOCAL(ret_inf_nan_0)
+	mov	r1,DBLRH
+
+LOCAL(both_denorm_cleanup):
+	mov.l	@r15+,r0
+	!
+	mov.l	@r15+,r9
+ mov	#12,r3
+ mov.l	LOCAL(xfffe2006),r1	! yn := (-1. << 17) + (0x80 << 6) ; shift #-26
+	bra	LOCAL(denorm_arg1_done)
+	!
+	add	r0,DBL0H
+
+/* Denorm handling leaves the incoming denorm argument with an exponent of +1
+   (implicit 1).  To leave the result exponent unaltered, the other
+   argument's exponent is adjusted by the shift count.  */
+
+	.balign 4
+LOCAL(arg0_tiny):
+	bsr	LOCAL(clz)
+	mov	DBL0L,r0
+	shll	DBL0H
+	add	#1,r0
+	mov	DBL0L,DBL0H
+	shld	r0,DBL0H
+	rotcr	DBL0H
+	tst	DBL0L,DBL0L	/* Check for a zero dividend.  */
+	add	#-33,r0
+	shld	r0,DBL0L
+	bf/s	LOCAL(adjust_arg1_exp)
+	add	#64,r0
+LOCAL(return_0): /* Return 0 with appropriate sign.  */
+	mov.l	@r15+,r10
+	mov	#0,DBLRH
+	mov.l	@r15+,r9
+	bra	LOCAL(ret_inf_nan_0)
+	mov.l	@r15+,r8
+
+	.balign 4
+LOCAL(arg1_tiny):
+	bsr	LOCAL(clz)
+	mov	DBL1L,r0
+	shll	DBL1H
+	add	#1,r0
+	mov	DBL1L,DBL1H
+	shld	r0,DBL1H
+	rotcr	DBL1H
+	tst	DBL1L,DBL1L	/* Check for divide by zero.  */
+	add	#-33,r0
+	shld	r0,DBL1L
+	bf/s	LOCAL(adjust_arg0_exp)
+	add	#64,r0
+	mov	DBL0H,r0
+	add	r0,r0
+	tst	r0,r0	! 0 / 0 ?
+	mov	#-1,DBLRH
+	bf	LOCAL(return_inf)
+	!
+	bt	LOCAL(ret_inf_nan_0)
+	!
+
+	.balign 4
+LOCAL(zero_denorm_arg1):
+	not	DBL0H,r3
+	mov	DBL1H,r0
+	tst	r2,r3
+	shll2	r0
+	bt	LOCAL(early_inf_nan_arg0)
+	tst	r0,r0
+	mov.w	LOCAL(xff00),r12
+	bt/s	LOCAL(arg1_tiny)
+	sts.l	pr,@-r15
+	bsr	LOCAL(clz)
+	shlr2	r0
+	!
+	shll	DBL1H
+	mov	DBL1L,r3
+	shld	r0,DBL1H
+	shld	r0,DBL1L
+	rotcr	DBL1H
+	add	#-32,r0
+	shld	r0,r3
+	add	#32,r0
+	or	r3,DBL1H
+LOCAL(adjust_arg0_exp):
+	tst	r2,DBL0H
+	mov	#20,r3
+	shld	r3,r0
+	bt	LOCAL(both_denorm)
+	add	DBL0H,r0
+	div0s	r0,DBL0H	! Check for obvious overflow.
+	not	r0,r3		! Check for more subtle overflow - lest
+	bt	LOCAL(return_inf)
+	mov	r0,DBL0H
+	tst	r2,r3		! we mistake it for NaN later
+	mov	#12,r3
+	bf	LOCAL(denorm_arg1_done)
+LOCAL(return_inf): /* Return infinity with appropriate sign.  */
+	mov	#20,r3
+	mov	#-2,DBLRH
+	bra	LOCAL(ret_inf_nan_0)
+	shad	r3,DBLRH
+
+/* inf/n -> inf; inf/0 -> inf; inf/inf -> nan; inf/nan->nan  nan/x -> nan */
+LOCAL(inf_nan_arg0):
+	mov.l	@r15+,r10
+	mov.l	@r15+,r9
+	mov.l	@r15+,r8
+LOCAL(early_inf_nan_arg0):
+	not	DBL1H,r3
+	mov	DBL0H,DBLRH
+	tst	r2,r3	! both inf/nan?
+	add	DBLRH,DBLRH
+	bf	LOCAL(ret_inf_nan_0)
+	mov	#-1,DBLRH
+LOCAL(ret_inf_nan_0):
+	mov	#0,DBLRL
+	mov.l	@r15+,r12
+	div0s	DBL0H,DBL1H
+	rts
+	rotcr	DBLRH
+	
+/* Already handled: inf/x, nan/x .  Thus: x/inf -> 0; x/nan -> nan */
+	.balign	4
+LOCAL(inf_nan_arg1):
+	mov	DBL1H,r2
+	mov	#12,r1
+	shld	r1,r2
+	mov.l	@r15+,r10
+	mov	#0,DBLRL
+	mov.l	@r15+,r9
+	or	DBL1L,r2
+	mov.l	@r15+,r8
+	cmp/hi	DBLRL,r2
+	mov.l	@r15+,r12
+	subc	DBLRH,DBLRH
+	div0s	DBL0H,DBL1H
+	rts
+	rotcr	DBLRH
+	
+	.balign 4
+LOCAL(zero_denorm_arg0):
+	mov.w	LOCAL(denorm_arg0_done_off),r9
+	not	DBL1H,r1
+	mov	DBL0H,r0
+	tst	r2,r1
+	shll2	r0
+	bt	LOCAL(inf_nan_arg1)
+LOCAL(zero_denorm_arg0_1):
+	tst	r0,r0
+	mov.w	LOCAL(xff00),r12
+	bt/s	LOCAL(arg0_tiny)
+	sts.l	pr,@-r15
+	bsr	LOCAL(clz)
+	shlr2	r0
+	shll	DBL0H
+	mov	DBL0L,r12
+	shld	r0,DBL0H
+	shld	r0,DBL0L
+	rotcr	DBL0H
+	add	#-32,r0
+	shld	r0,r12
+	add	#32,r0
+	or	r12,DBL0H
+LOCAL(adjust_arg1_exp):
+	mov	#20,r12
+	shld	r12,r0
+	add	DBL1H,r0
+	div0s	r0,DBL1H	! Check for obvious underflow.
+	not	r0,r12		! Check for more subtle underflow - lest
+	bt	LOCAL(return_0)
+	mov	r0,DBL1H
+	tst	r2,r12		! we mistake it for NaN later
+	bt	LOCAL(return_0)
+	!
+	braf	r9
+	mov	#13,r0
+LOCAL(zero_denorm_arg1_dispatch):
+
+LOCAL(xff00):	.word 0xff00
+LOCAL(denorm_arg0_done_off):
+	.word LOCAL(denorm_arg0_done)-LOCAL(zero_denorm_arg1_dispatch)
+LOCAL(both_denorm_cleanup_off):
+	.word LOCAL(both_denorm_cleanup)-LOCAL(zero_denorm_arg1_dispatch)
+
+ .balign	8
+GLOBAL(divdf3):
+ mov.l	LOCAL(x7ff00000),r2
+ mov	#12,r3
+ mov.l	LOCAL(xfffe2006),r1	! yn := (-1. << 17) + (0x80 << 6) ; shift #-26
+ tst	r2,DBL1H
+ mov.l	r12,@-r15
+ bt	LOCAL(zero_denorm_arg1)
+
+LOCAL(denorm_arg1_done):
+ mov	DBL1H,x_h	! x_h live in r12
+ shld	r3,x_h	! x - 1 ; u0.20
+ mov	x_h,yn
+ mova	LOCAL(ytab),r0
+ mov.l	r8,@-r15
+ shld	r1,yn	! x-1 ; u26.6
+ mov.b	@(r0,yn),yn
+ mov	#6,r0
+ mov.l	r9,@-r15
+ mov	x_h,r8
+ mov.l	r10,@-r15
+ shlr16	x_h	! x - 1; u16.16	! x/2 - 0.5 ; u15.17
+ add	x_h,r1	! SH4-200 single-issues this insn
+ shld	r0,yn
+ sub	r1,yn	! yn := y0 ; u15.17
+ mov	DBL1L,r1
+ mov	#-20,r10
+ mul.l	yn,x_h	! r12 dead
+ swap.w	yn,r9
+ shld	r10,r1
+ sts	macl,r0	! y0 * (x-1) - n ; u-1.32
+ add	r9,r0	! y0 * x - 1     ; s-1.32
+ tst	r2,DBL0H
+ dmuls.l r0,yn
+ mov.w	LOCAL(d13),r0
+ or	r1,r8	! x  - 1; u0.32
+ add	yn,yn	! yn = y0 ; u14.18
+ bt	LOCAL(zero_denorm_arg0)
+
+LOCAL(denorm_arg0_done):	! This label must stay aligned.
+ sts	mach,r1	!      d0 ; s14.18
+ sub	r1,yn	! yn = y1 ; u14.18 ; <= 0x3fffc
+ mov	DBL0L,r12
+ shld	r0,yn	! yn = y1 ; u1.31 ; <= 0x7fff8000
+ mov.w	LOCAL(d12),r9
+ dmulu.l yn,r8
+ shld	r10,r12
+ mov	yn,r0
+ mov	DBL0H,r8
+ add	yn,yn	! yn = y1 ; u0.32 ; <= 0xffff0000
+ sts	mach,r1	! y1 * (x-1); u1.31
+ add	r0,r1	! y1 * x    ; u1.31
+ dmulu.l yn,r1
+ not	DBL0H,r10
+ shld	r9,r8
+ tst	r2,r10
+ or	r8,r12	! a - 1; u0.32
+ bt	LOCAL(inf_nan_arg0)
+ sts	mach,r1	! d1+yn; u1.31
+ sett		! adjust y2 so that it can be interpreted as s1.31
+ not	DBL1H,r10
+ subc	r1,yn	! yn := y2 ; u1.31 ; can be 0x7fffffff
+ mov.l	LOCAL(x001fffff),r9
+ dmulu.l yn,r12
+ tst	r2,r10
+ or	DBL1H,r2
+ bt	LOCAL(inf_nan_arg1)
+ mov.l	r11,@-r15
+ sts	mach,r11	! y2*(a-1) ; u1.31
+ add	yn,r11		! z0       ; u1.31
+ dmulu.l r11,DBL1L
+ mov.l	LOCAL(x40000000),DBLRH	! bias + 1
+ and	r9,r2		! x ; u12.20
+ cmp/hi	DBL0L,DBL1L
+ sts	macl,r8
+ mov	#-24,r12
+ sts	mach,r9 	! r9:r8 := z0 * DBL1L; u-19.64
+ subc	DBL1H,DBLRH
+ mul.l	r11,r2  	! (r9+macl):r8 == z0*x; u-19.64
+ shll	r8
+ add	DBL0H,DBLRH	! result sign/exponent + 1
+ mov	r8,r10
+ sts	macl,DBLRL
+ add	DBLRL,r9
+ rotcl	r9		! r9:r8 := z*x; u-20.63
+ shld	r12,r10
+ mov.l	LOCAL(x7fe00000),DBLRL
+ sub	DBL0L,r9	! r9:r8 := -a ; u-20.63
+ mov.l	LOCAL(x00200000),r12
+! FIXME: the following shift might lose the sign.
+ shll8	r9
+ or	r10,r9	! -a1 ; s-28.32
+ mov.l	LOCAL(x00100000),r10
+ dmuls.l r9,yn	! r3 dead
+ mov	DBL1H,r3
+ mov.l LOCAL(xfff00000),DBL0L
+ xor	DBL0H,r3	! calculate expected sign & bit20
+ div0s	r3,DBLRH
+ xor	DBLRH,r3
+ bt	LOCAL(ret_denorm_inf)
+ tst	DBLRL,DBLRH
+ bt	LOCAL(ret_denorm)
+ sub	r12,DBLRH ! calculate sign / exponent minus implicit 1
+ tst	r10,r3	! set T if a >= x
+ sts	mach,r12! -z1 ; s-27.32
+ bt	0f
+ add	r11,r11	! z0 ; u1.31 / u0.31
+0: mov	#6,r3
+ negc	r3,r10 ! shift count := a >= x ? -7 : -6; T := 1
+ shll8	r8	! r9:r8 := -a1 ; s-28.64
+ shad	r10,r12	! -z1 ; truncate to s-20.32 / s-21.32
+ rotcl	r12	! -z1 ; s-21.32 / s-22.32 / round to odd 0.5 ulp ; T := sign
+ add	#20,r10
+ dmulu.l r12,DBL1L ! r12 signed, DBL1L unsigned
+ and	DBL0L,DBLRH	! isolate sign / exponent
+ shld	r10,r9
+ mov	r8,r3
+ shld	r10,r8
+ sts	macl,DBL0L
+ sts	mach,DBLRL
+ add	#-32,r10
+ shld	r10,r3
+ mul.l r12,r2
+ bf	0f	! adjustment for signed/unsigned multiply
+ sub	DBL1L,DBLRL	! DBL1L dead
+0: shar	r12	! -z1 ; truncate to s-20.32 / s-21.32
+ sts	macl,DBL1L
+ or	r3,r9	! r9:r8 := -a1 ;             s-41.64/s-42.64
+ !
+ cmp/hi	r8,DBL0L
+ add	DBLRL,DBL1L ! DBL1L:DBL0L := -z1*x ; s-41.64/s-42.64
+ subc	DBL1L,r9
+ not	r12,DBLRL ! z1, truncated to s-20.32 / s-21.32
+ shll	r9	! T :=  a2 > 0
+ mov	r11,r2
+ mov	#21,r7
+ shld	r7,r11
+ addc	r11,DBLRL
+ mov.l	@r15+,r11
+ mov.l	@r15+,r10
+ mov	#-11,r7
+ mov.l	@r15+,r9
+ shld	r7,r2
+ mov.l	@r15+,r8
+ addc	r2,DBLRH
+ rts
+ mov.l	@r15+,r12
+
+LOCAL(ret_denorm):
+	tst	r10,DBLRH
+	bra	LOCAL(denorm_have_count)
+	movt	DBLRH	! calculate shift count (off by 2)
+
+LOCAL(ret_denorm_inf):
+	mov	DBLRH,r12
+	add	r12,r12
+	cmp/pz	r12
+	mov	#-21,DBLRL
+	bt	LOCAL(ret_inf_late)
+	shld	DBLRL,DBLRH
+LOCAL(denorm_have_count):
+	add	#-2,DBLRH
+/* FIXME */
+	bra	LOCAL(return_0)
+	mov.l	@r15+,r11
+
+LOCAL(ret_inf_late):
+	mov.l	@r15+,r11
+	!
+	mov.l	@r15+,r10
+	!
+	mov.l	@r15+,r9
+	bra	LOCAL(return_inf)
+	mov.l	@r15+,r8
+
+	.balign	4
+LOCAL(clz):
+	mov.l	r8,@-r15
+	extu.w	r0,r8
+	mov.l	r9,@-r15
+	cmp/eq	r0,r8
+	bt/s	0f
+	mov	#8-11,r9
+	xtrct	r0,r8
+	add	#16,r9
+0:	tst	r12,r8	! 0xff00
+	mov.l	LOCAL(c_clz_tab),r0
+	bt	0f
+	shlr8	r8
+0:	bt	0f
+	add	#8,r9
+0:
+#ifdef	__PIC__
+	add	r0,r8
+	mova	LOCAL(c_clz_tab),r0
+#endif
+	mov.b	@(r0,r8),r8
+	mov	r9,r0
+	mov.l	@r15+,r9
+	!
+	!
+	!
+	sub	r8,r0
+	mov.l	@r15+,r8
+	rts
+	lds.l	@r15+,pr
+
+!	We encode even some words that would fit as immediates in the
+!	instruction as pc-relative loads, in order to avoid some pipeline
+!	stalls on SH4-100 / SH4-200.
+LOCAL(d1):	.word 1
+LOCAL(d12):	.word 12
+LOCAL(d13):	.word 13
+
+	.balign 4
+LOCAL(x7ff00000): .long 0x7ff00000
+LOCAL(xfffe2006): .long 0xfffe2006
+LOCAL(x001fffff): .long 0x001fffff
+LOCAL(x40000000): .long 0x40000000
+LOCAL(x7fe00000): .long 0x7fe00000
+LOCAL(x00100000): .long 0x00100000
+LOCAL(x00200000): .long 0x00200000
+LOCAL(xfff00000): .long 0xfff00000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+        .long   GLOBAL(clz_tab) - .
+#else
+        .long   GLOBAL(clz_tab)
+#endif
+LOCAL(ytab):
+        .byte   120, 105,  91,  78,  66,  54,  43,  33
+        .byte    24,  15,   8,   0,  -5, -12, -17, -22
+        .byte   -27, -31, -34, -37, -40, -42, -44, -45
+        .byte   -46, -46, -47, -46, -46, -45, -44, -42
+        .byte   -41, -39, -36, -34, -31, -28, -24, -20
+        .byte   -17, -12,  -8,  -4,   0,   5,  10,  16
+        .byte    21,  27,  33,  39,  45,  52,  58,  65
+        .byte    72,  79,  86,  93, 101, 109, 116, 124
+ENDFUNC(GLOBAL(divdf3))
Index: gcc/config/sh/IEEE-754/m3/addsf3.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/addsf3.S	(revision 0)
+++ gcc/config/sh/IEEE-754/m3/addsf3.S	(revision 0)
@@ -0,0 +1,290 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combined
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+! addsf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+
+#ifdef L_add_sub_sf3
+	.balign 4
+	.global GLOBAL(subsf3)
+	FUNC(GLOBAL(subsf3))
+	.global GLOBAL(addsf3)
+	FUNC(GLOBAL(addsf3))
+GLOBAL(subsf3):
+	cmp/pz	r5
+	add	r5,r5
+	rotcr	r5
+	.balign 4
+GLOBAL(addsf3):
+	mov.l	LOCAL(x7f800000),r3
+	mov	r4,r6
+	add	r6,r6
+	mov	r5,r7
+	add	r7,r7
+	mov	r4,r0
+	or	r3,r0
+	cmp/hi	r6,r7
+	mov	r5,r1
+	bf/s	LOCAL(r4_hs)
+	 or	r3,r1
+	cmp/eq	r5,r1
+	bt	LOCAL(ret_r5) /* sole Inf or NaN, return unchanged.  */
+	shll8	r0	! r4 fraction
+	shll8	r1	! r5 fraction
+	mov	r6,r3
+	mov	#-24,r2
+	mov	r7,r6
+	shld	r2,r6	! r5 exp
+	mov	r0,r7
+	shld	r2,r3	! r4 exp
+	tst	r6,r6
+	sub	r6,r3	! exp difference (negative or 0)
+	bt	LOCAL(denorm_r4)
+LOCAL(denorm_r4_done): ! r1: u1.31
+	shld	r3,r0	! Get 31 upper bits, including 8 guard bits
+	mov.l	LOCAL(xff000000),r2
+	add	#31,r3
+	mov.l	r5,@-r15 ! push result sign.
+	cmp/pl	r3	! r0 has no more than one bit set -> return arg 1
+	shld	r3,r7	! copy of lowest guard bit in r0 and lower guard bits
+	bf	LOCAL(ret_stack)
+	div0s	r4,r5
+	bf/s	LOCAL(add)
+	 cmp/pl	r7	/* Is LSB in r0 clear, but any lower guard bit set?  */
+	subc	r0,r1
+	mov.l	LOCAL(c__clz_tab),r7
+	tst	r2,r1
+	mov	#-24,r3
+	bf/s LOCAL(norm_r0)
+	 mov	r1,r0
+	extu.w	r1,r1
+	bra	LOCAL(norm_check2)
+	 cmp/eq	r0,r1
+LOCAL(ret_r5):
+	rts
+	mov	r5,r0
+LOCAL(ret_stack):
+	rts
+	mov.l	@r15+,r0
+
+/* We leave the numbers denormalized, but we change the bit position to be
+   consistent with normalized numbers.  This also removes the spurious
+   leading one that was inserted before.  */
+LOCAL(denorm_r4):
+	tst	r3,r3
+	bf/s	LOCAL(denorm_r4_done)
+	add	r0,r0
+	bra	LOCAL(denorm_r4_done)
+	add	r1,r1
+LOCAL(denorm_r5):
+	tst	r6,r6
+	add	r1,r1
+	bf	LOCAL(denorm_r5_done)
+	clrt
+	bra	LOCAL(denorm_r5_done)
+	add	r0,r0
+
+/* If the exponent differs by two or more, normalization is minimal, and
+   few guard bits are needed for an exact final result, so sticky guard
+   bit compression before subtraction (or addition) works fine.
+   If the exponent differs by one, only one extra guard bit is generated,
+   and effectively no guard bit compression takes place.  (A C sketch of
+   sticky-bit compression follows this file.)  */
+
+	.balign	4
+LOCAL(r4_hs):
+	cmp/eq	r4,r0
+	mov	#-24,r3
+	bt	LOCAL(inf_nan_arg0)
+	shld	r3,r7
+	shll8	r0
+	tst	r7,r7
+	shll8	r1
+	mov.l	LOCAL(xff000000),r2
+	bt/s	LOCAL(denorm_r5)
+	shld	r3,r6
+LOCAL(denorm_r5_done):
+	mov	r1,r3
+	subc	r6,r7
+	bf	LOCAL(same_exp)
+	shld	r7,r1	/* Get 31 upper bits.  */
+	add	#31,r7
+	mov.l	r4,@-r15 ! push result sign.
+	cmp/pl	r7
+	shld	r7,r3
+	bf	LOCAL(ret_stack)
+	div0s	r4,r5
+	bf/s	LOCAL(add)
+	 cmp/pl	r3	/* Is LSB in r1 clear, but any lower guard bit set?  */
+	subc	r1,r0
+	mov.l	LOCAL(c__clz_tab),r7
+LOCAL(norm_check):
+	tst	r2,r0
+	mov	#-24,r3
+	bf LOCAL(norm_r0)
+	extu.w	r0,r1
+	cmp/eq	r0,r1
+LOCAL(norm_check2):
+	mov	#-8,r3
+	bt LOCAL(norm_r0)
+	mov	#-16,r3
+LOCAL(norm_r0):
+	mov	r0,r1
+	shld	r3,r0
+#ifdef __pic__
+	add	r0,r7
+	mova  LOCAL(c__clz_tab),r0
+#endif
+	mov.b	@(r0,r7),r7
+	add	#25,r3
+	add	#-9+1,r6
+	mov	r1,r0
+	sub	r7,r3
+	mov.l	LOCAL(xbfffffff),r7
+	sub	r3,r6	/* generate exp-1  */
+	mov.w	LOCAL(d24),r2
+	cmp/pz	r6	/* check exp > 0  */
+	shld	r3,r0	/* Leading 1 becomes +1 exp adjustment.  */
+	bf	LOCAL(zero_denorm)
+LOCAL(denorm_done):
+	add	#30,r3
+	shld	r3,r1
+	mov.w   LOCAL(m1),r3
+	tst	r7,r1	! clear T if rounding up
+	shld	r2,r6
+	subc	r3,r0	! round - overflow will boost exp adjustment to 2.
+	mov.l	@r15+,r2
+	add	r6,r0	! overflow will generate inf
+	cmp/ge	r2,r3	! get sign into T
+	rts
+	rotcr	r0
+LOCAL(ret_r4):
+	rts
+	mov	r4,r0
+
+/* At worst, we are shifting the number back in place where an incoming
+   denormal was.  Thus, the shifts won't get out of range.  They still
+   might generate a zero fraction, but that's OK, that makes it 0.  */
+LOCAL(zero_denorm):
+	add	r6,r3
+	mov	r1,r0
+	mov	#0,r6	/* leading one will become free (except for rounding) */
+	bra	LOCAL(denorm_done)
+	shld	r3,r0
+
+/* Handle abs(r4) >= abs(r5), same exponents specially so we don't need
+   check for a zero fraction in the main path.  */
+LOCAL(same_exp):
+	div0s	r4,r5
+	mov.l	r4,@-r15
+	bf	LOCAL(add)
+	cmp/eq	r1,r0
+	mov.l	LOCAL(c__clz_tab),r7
+	bf/s	LOCAL(norm_check)
+	 sub	r1,r0
+	rts	! zero difference -> return +zero
+	mov.l	@r15+,r1
+
+/* r2: 0xff000000 */
+LOCAL(add):
+	addc	r1,r0
+	mov.w	LOCAL(x2ff),r7
+	shll8	r6
+	bf/s	LOCAL(no_carry)
+	shll16	r6
+	tst	r7,r0
+	shlr8	r0
+	mov.l	@r15+,r3	! discard saved sign
+	subc	r2,r0
+	sett
+	addc	r6,r0
+	cmp/hs	r2,r0
+	bt/s	LOCAL(inf)
+	 div0s	r7,r4 /* Copy sign.  */
+	rts
+	rotcr	r0
+LOCAL(inf):
+	mov	r6,r0
+	rts
+	rotcr	r0
+LOCAL(no_carry):
+	mov.w	LOCAL(m1),r3
+	tst	r6,r6
+	bt	LOCAL(denorm_add)
+	add	r0,r0
+	tst	r7,r0		! check if lower guard bit set or round to even
+	shlr8	r0
+	mov.l	@r15+,r1	! discard saved sign
+	subc	r3,r0	! round ; overflow -> exp++
+	cmp/ge	r4,r3	/* Copy sign.  */
+	add	r6,r0	! overflow -> inf
+	rts
+	rotcr	r0
+
+LOCAL(denorm_add):
+	cmp/ge	r4,r3	/* Copy sign.  */
+	shlr8	r0
+	mov.l	@r15+,r1	! discard saved sign
+	rts
+	rotcr	r0
+
+LOCAL(inf_nan_arg0):
+	cmp/eq	r5,r1
+	bf	LOCAL(ret_r4)
+	div0s	r4,r5		/* Both are inf or NaN, check signs.  */
+	bt	LOCAL(ret_nan)	/* inf - inf, or NaN.  */
+	mov	r4,r0		! same sign; return NaN if either is NaN.
+	rts
+	or	r5,r0
+LOCAL(ret_nan):
+	rts
+	mov	#-1,r0
+
+LOCAL(d24):
+	.word	24
+LOCAL(x2ff):
+	.word	0x2ff
+LOCAL(m1):
+	.word	-1
+	.balign	4
+LOCAL(x7f800000):
+	.long	0x7f800000
+LOCAL(xbfffffff):
+	.long	0xbfffffff
+LOCAL(xff000000):
+	.long	0xff000000
+LOCAL(xfe000000):
+	.long	0xfe000000
+LOCAL(c__clz_tab):
+#ifdef __pic__
+	.long	GLOBAL(clz_tab) - .
+#else
+	.long	GLOBAL(clz_tab)
+#endif
+
+	ENDFUNC(GLOBAL(addsf3))
+	ENDFUNC(GLOBAL(subsf3))
+#endif /* L_add_sub_sf3 */
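
The sticky-bit compression mentioned in the comments above can be
pictured in C as follows.  A minimal sketch (the function name is
illustrative): when aligning the smaller fraction for an add/sub, every
bit shifted out below the guard bits is OR-ed into a single sticky bit,
which is all that rounding ever needs to know about them:

#include <stdint.h>
#include <stdio.h>

/* Shift right by 1..31 bits, compressing all lost bits into bit 0.  */
static uint32_t align_with_sticky(uint32_t frac, int shift)
{
    uint32_t lost = frac << (32 - shift);  /* the bits that fall off */
    return (frac >> shift) | (lost != 0);
}

int main(void)
{
    /* 0x01000001 >> 8 loses a set bit, so the sticky bit is set.  */
    printf("%#x\n", align_with_sticky(0x01000001u, 8));
    /* prints: 0x10001 */
    return 0;
}
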
Index: gcc/config/sh/IEEE-754/m3/adddf3.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/adddf3.S	(revision 0)
+++ gcc/config/sh/IEEE-754/m3/adddf3.S	(revision 0)
@@ -0,0 +1,587 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combined
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+! adddf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4-200 without FPU, but can also be used for SH3.
+! Numbers with the same sign are typically added in 37 cycles; the worst
+! case is 43 cycles, unless there is an overflow, in which case the
+! addition can take up to 47 cycles.
+! Normal numbers with different sign are added in 56 (57 for PIC) cycles
+! or less on SH4.
+! If one of the inputs is a denormal, the worst case is 59 (60 for PIC)
+! cycles. (Two denormal inputs are faster than normal inputs, and
+! denormal outputs don't slow down computation).
+! Subtraction takes two cycles to negate the second input and then drops
+! through to addition.
+
+/* If the input exponents of a difference of two normalized numbers
+   differ by more than one, the output does not need to be adjusted
+   by more than one bit position: the subtrahend is then less than half
+   the minuend, so the difference retains more than half the minuend's
+   magnitude.  Hence, it makes sense to ensure that the shifts by 0 & 1
+   are handled quickly to reduce average and worst case times.  */
+FUNC(GLOBAL(adddf3))
+FUNC(GLOBAL(subdf3))
+	.global	GLOBAL(adddf3)
+	.global	GLOBAL(subdf3)
+LOCAL(denorm_arg1):
+	bt	LOCAL(inf_nan_arg0)
+	tst	r0,r2
+	bt/s	LOCAL(denorm_both)
+	shlr	r1
+	mov.l	LOCAL(x00100000),r3
+	bra	LOCAL(denorm_arg1_done)
+	 sub	r2,r3
+
+! Handle denorm addition here because otherwise the ordinary addition would
+! have to check for denormal results.
+! Denormal subtraction could also be done faster, but the denorm subtraction
+! path here is still one cycles faster than the one for normalized input
+! numbers, and 16 instructions shorter than the fastest version.
+! Here we also generate +0.0 + +0.0 -> +0.0 ; -0.0 + -0.0 -> -0.0
+LOCAL(denorm_both):
+	div0s	DBL0H,DBL1H
+	mov.l	LOCAL(x800fffff),r9
+	bt/s	LOCAL(denorm_sub)
+	and	r1,DBL1H
+	and	r9,DBL0H
+	mov.l	@r15+,r9
+	mov	DBL0L,DBLRL
+	mov	DBL0H,DBLRH
+	addc	DBL1L,DBLRL
+	mov.l	@r15+,r8
+	rts
+	 addc	DBL1H,DBLRH
+
+! N.B., since subtraction also generates +0.0 for subtraction of numbers
+! with identical fractions, this also covers the +0.0 + -0.0 -> +0.0 /
+! -0.0 + +0.0 -> +0.0 cases.
+LOCAL(denorm_sub):
+	mov	DBL0H,r8	! tentative result sign
+	and	r1,DBL0H
+	bra	LOCAL(sub_same_exp)
+	 addc	r1,r2	! exponent++, clear T
+
+LOCAL(inf_nan_arg0):
+	mov	DBL0L,DBLRL
+	bra	LOCAL(pop_r8_r9)
+	 mov	DBL0H,DBLRH
+
+LOCAL(ret_arg0):
+	mov.l	LOCAL(x800fffff),DBLRH
+	mov	DBL0L,DBLRL
+	mov	r2,r3
+LOCAL(ret_arg):
+	mov.l	@r15+,r9
+	and	r8,DBLRH
+	mov.l	@r15+,r8
+	rts
+	or	r3,DBLRH
+
+	.balign	4
+GLOBAL(subdf3):
+	cmp/pz	DBL1H
+	add	DBL1H,DBL1H
+	rotcr	DBL1H
+	nop
+
+GLOBAL(adddf3):
+	mov.l	LOCAL(x7ff00000),r0
+	mov	DBL0H,r2
+	mov.l	LOCAL(x001fffff),r1
+	mov	DBL1H,r3
+	mov.l	r8,@-r15
+	and	r0,r2
+	mov.l	r9,@-r15
+	and	r0,r3
+	cmp/hi	r2,r3
+	or	r0,DBL0H
+	or	r0,DBL1H
+	bt	LOCAL(arg1_gt)
+	tst	r0,r3
+	mov	#-20,r9
+	bt/s	LOCAL(denorm_arg1)
+	cmp/hs	r0,r2
+	bt	LOCAL(inf_nan_arg0)
+	sub	r2,r3
+LOCAL(denorm_arg1_done):	! r2 is tentative result exponent
+	shad	r9,r3
+	mov.w	LOCAL(m32),r9
+	mov	DBL0H,r8	! tentative result sign
+	and	r1,DBL0H	! arg0 fraction
+	mov	DBL1H,r0	! the 'other' sign
+	and	r1,DBL1H	! arg1 fraction
+	cmp/ge	r9,r3
+	mov	DBL1H,r1
+	bf/s	LOCAL(large_shift_arg1)
+	 shld	r3,DBL1H
+LOCAL(small_shift_arg1):
+	mov	DBL1L,r9
+	shld	r3,DBL1L
+	tst	r3,r3
+	add	#32,r3
+	bt/s	LOCAL(same_exp)
+	 div0s	r8,r0	! compare signs
+	shld	r3,r1
+
+	or	r1,DBL1L
+	bf/s	LOCAL(add)
+	shld	r3,r9
+	clrt
+	negc	r9,r9
+	mov.l	LOCAL(x001f0000),r3
+LOCAL(sub_high):
+	mov	DBL0L,DBLRL
+	subc	DBL1L,DBLRL
+	mov	DBL0H,DBLRH
+	bra	LOCAL(subtract_done)
+	 subc	DBL1H,DBLRH
+
+LOCAL(large_shift_arg1):
+	mov.w	LOCAL(d0),r9
+	add	#64,r3
+	cmp/pl	r3
+	shld	r3,r1
+	bf	LOCAL(ret_arg0)
+	cmp/hi	r9,DBL1L
+	mov	DBL1H,DBL1L
+	mov	r9,DBL1H
+	addc	r1,r9
+
+	div0s	r8,r0	! compare signs
+
+	bf	LOCAL(add)
+	clrt
+	mov.l	LOCAL(x001f0000),r3
+	bra	LOCAL(sub_high)
+	 negc	r9,r9
+
+LOCAL(add_clr_r9):
+	mov	#0,r9
+LOCAL(add):
+	mov.l	LOCAL(x00200000),r3
+	addc	DBL1L,DBL0L
+	addc	DBL1H,DBL0H
+	mov.l	LOCAL(x80000000),r1
+	tst	r3,DBL0H
+	mov.l	LOCAL(x7fffffff),r3
+	mov	DBL0L,r0
+	bt/s	LOCAL(no_carry)
+	and	r1,r8
+	tst	r9,r9
+	bf	LOCAL(add_one)
+	tst	#2,r0
+LOCAL(add_one):
+	subc	r9,r9
+	sett
+	mov	r0,DBLRL
+	addc	r9,DBLRL
+	mov	DBL0H,DBLRH
+	addc	r9,DBLRH
+	shlr	DBLRH
+	mov.l	LOCAL(x7ff00000),r3
+	add	r2,DBLRH
+	mov.l	@r15+,r9
+	rotcr	DBLRL
+	cmp/hi	r3,DBLRH
+LOCAL(add_done):
+	bt	LOCAL(inf)
+LOCAL(or_sign):
+	or	r8,DBLRH
+	rts
+	 mov.l	@r15+,r8
+
+LOCAL(inf):
+	bra	LOCAL(or_sign)
+	 mov	r3,DBLRH
+
+LOCAL(pos_difference_0):
+	tst	r3,DBL0H
+	mov	DBL0L,DBLRL
+	mov.l	LOCAL(x80000000),DBL0L
+	mov	DBL0H,DBLRH
+	mov.l	LOCAL(x00100000),DBL0H
+	bt/s	LOCAL(long_norm)
+	and	DBL0L,r8
+	bra	LOCAL(norm_loop)
+	 not	DBL0L,r3
+
+LOCAL(same_exp):
+	bf	LOCAL(add_clr_r9)
+	clrt
+LOCAL(sub_same_exp):
+	subc	DBL1L,DBL0L
+	mov.l	LOCAL(x001f0000),r3
+	subc	DBL1H,DBL0H
+	mov.w	LOCAL(d0),r9
+	bf	LOCAL(pos_difference_0)
+	clrt
+	negc	DBL0L,DBLRL
+	mov.l	LOCAL(x80000000),DBL0L
+	negc	DBL0H,DBLRH
+	mov.l	LOCAL(x00100000),DBL0H
+	tst	r3,DBLRH
+	not	r8,r8
+	bt/s	LOCAL(long_norm)
+	and	DBL0L,r8
+	bra	LOCAL(norm_loop)
+	 not	DBL0L,r3
+
+LOCAL(large_shift_arg0):
+	add	#64,r2
+
+	mov	#0,r9
+	cmp/pl	r2
+	shld	r2,r1
+	bf	LOCAL(ret_arg1_exp_r3)
+	cmp/hi	r9,DBL0L
+	mov	DBL0H,DBL0L
+	mov	r9,DBL0H
+	addc	r1,r9
+	div0s	r8,r0	! compare signs
+	mov	r3,r2	! tentative result exponent
+	bf	LOCAL(add)
+	clrt
+	negc	r9,r9
+	bra	LOCAL(subtract_arg0_arg1_done)
+	 mov	DBL1L,DBLRL
+
+LOCAL(arg1_gt):
+	tst	r0,r2
+	mov	#-20,r9
+	bt/s	LOCAL(denorm_arg0)
+	cmp/hs	r0,r3
+	bt	LOCAL(inf_nan_arg1)
+	sub	r3,r2
+LOCAL(denorm_arg0_done):
+	shad	r9,r2
+	mov.w	LOCAL(m32),r9
+	mov	DBL1H,r8	! tentative result sign
+	and	r1,DBL1H
+	mov	DBL0H,r0	! the 'other' sign
+	and	r1,DBL0H
+	cmp/ge	r9,r2
+	mov	DBL0H,r1
+	shld	r2,DBL0H
+	bf	LOCAL(large_shift_arg0)
+	mov	DBL0L,r9
+	shld	r2,DBL0L
+	add	#32,r2
+	mov.l	r3,@-r15
+	shld	r2,r1
+	mov	r2,r3
+	div0s	r8,r0		! compare signs
+	mov.l	@r15+,r2	! tentative result exponent
+	shld	r3,r9
+	bf/s	LOCAL(add)
+	or	r1,DBL0L
+	clrt
+	negc	r9,r9
+	mov	DBL1L,DBLRL
+LOCAL(subtract_arg0_arg1_done):
+	subc	DBL0L,DBLRL
+	mov	DBL1H,DBLRH
+	mov.l	LOCAL(x001f0000),r3
+	subc	DBL0H,DBLRH
+/* Since the exponents were different, the difference is positive.  */
+/* Fall through */
+LOCAL(subtract_done):
+/* First check if a shift by a few bits is sufficient.  This not only
+   speeds up this case, but also alleviates the need for considering
+   lower bits from r9 or rounding in the other code.
+   Moreover, by handling the upper 1+4 bits of the fraction here, long_norm
+   can assume that DBLRH fits into 16 (21 - 5) bits.  */
+	tst	r3,DBLRH
+	mov.l	LOCAL(x80000000),r3
+	mov.l	LOCAL(x00100000),DBL0H
+	bt/s	LOCAL(long_norm)
+	and	r3,r8
+	mov.l	LOCAL(x7fffffff),r3
+LOCAL(norm_loop):	! Well, this used to be a loop...
+	tst	DBL0H,DBLRH
+	sub	DBL0H,r2
+	bf	LOCAL(norm_round)
+	shll	r9
+	rotcl	DBLRL
+
+	rotcl	DBLRH
+
+	tst	DBL0H,DBLRH
+	sub	DBL0H,r2
+	bf	LOCAL(norm_round)
+	shll	DBLRL
+	rotcl	DBLRH
+	mov.l	@r15+,r9
+	cmp/gt	r2,DBL0H
+	sub	DBL0H,r2
+LOCAL(norm_loop_1):
+	bt	LOCAL(denorm0_n)
+	tst	DBL0H,DBLRH
+	bf	LOCAL(norm_pack)
+	shll	DBLRL
+	rotcl	DBLRH	! clears T
+	bra	LOCAL(norm_loop_1)
+	 subc	DBL0H,r2
+
+LOCAL(no_carry):
+	shlr	r0
+	mov.l	LOCAL(x000fffff),DBLRH
+	addc	r3,r9
+	mov.w	LOCAL(d0),DBL1H
+	mov	DBL0L,DBLRL
+	and	DBL0H,DBLRH	! mask out implicit 1
+	mov.l	LOCAL(x7ff00000),r3
+	addc	DBL1H,DBLRL
+	addc	r2,DBLRH
+	mov.l	@r15+,r9
+	add	DBL1H,DBLRH	! fraction overflow -> exp increase
+	bra	LOCAL(add_done)
+	 cmp/hi	r3,DBLRH
+
+LOCAL(denorm_arg0):
+	bt	LOCAL(inf_nan_arg1)
+	mov.l	LOCAL(x00100000),r2
+	shlr	r1
+	bra	LOCAL(denorm_arg0_done)
+	 sub	r3,r2
+
+LOCAL(inf_nan_arg1):
+	mov	DBL1L,DBLRL
+	bra	LOCAL(pop_r8_r9)
+	 mov	DBL1H,DBLRH
+
+LOCAL(ret_arg1_exp_r3):
+	mov.l	LOCAL(x800fffff),DBLRH
+	bra	LOCAL(ret_arg)
+	 mov	DBL1L,DBLRL
+
+#ifdef __pic__
+	.balign 8
+#endif
+LOCAL(m32):
+	.word	-32
+LOCAL(d0):
+	.word	0
+#ifndef __pic__
+	.balign 8
+#endif
+! Because we had several bits of cancellation, we know that r9 contains
+! only one bit.
+! We'll normalize by shifting words so that DBLRH:DBLRL contains
+! the fraction with 0 < DBLRH <= 0x1fffff, then we shift DBLRH:DBLRL
+! up by 21 minus the number of non-zero bits in DBLRH.
+LOCAL(long_norm):
+	tst	DBLRH,DBLRH
+	mov.w	LOCAL(xff),DBL0L
+	mov	#21,r3
+	bf	LOCAL(long_norm_highset)
+	mov.l	LOCAL(x02100000),DBL1L	! shift 32, implicit 1
+	tst	DBLRL,DBLRL
+	extu.w	DBLRL,DBL0H
+	bt	LOCAL(zero_or_ulp)
+	mov	DBLRL,DBLRH
+	cmp/hi	DBL0H,DBLRL
+	bf	0f
+	mov.l	LOCAL(x01100000),DBL1L	! shift 16, implicit 1
+	clrt
+	shlr16  DBLRH
+	xtrct	DBLRL,r9
+	mov     DBLRH,DBL0H
+LOCAL(long_norm_ulp_done):
+0:	mov	r9,DBLRL	! DBLRH:DBLRL == fraction; DBL0H == DBLRH
+	subc	DBL1L,r2
+	bt	LOCAL(denorm1_b)
+#ifdef __pic__
+	mov.l	LOCAL(c__clz_tab),DBL1H
+LOCAL(long_norm_lookup):
+	mov	r0,r9
+	mova	LOCAL(c__clz_tab),r0
+	add	DBL1H,r0
+#else
+	mov	r0,r9
+LOCAL(long_norm_lookup):
+	mov.l	LOCAL(c__clz_tab),r0
+#endif /* __pic__ */
+	cmp/hi	DBL0L,DBL0H
+	bf	0f
+	shlr8	DBL0H
+0:	mov.b	@(r0,DBL0H),r0
+	bf	0f
+	add	#-8,r3
+0:	mov.w	LOCAL(d20),DBL0L
+	mov	#-20,DBL0H
+	clrt
+	sub	r0,r3
+	mov	r9,r0
+	mov	r3,DBL1H
+	shld	DBL0L,DBL1H
+	subc	DBL1H,r2
+	!
+	bf	LOCAL(no_denorm)
+	shad	DBL0H,r2
+	bra	LOCAL(denorm1_done)
+	add	r2,r3
+	
+LOCAL(norm_round):
+	cmp/pz	r2
+	mov	#0,DBL1H
+	bf	LOCAL(denorm0_1)
+	or	r8,r2
+	mov	DBLRL,DBL1L
+	shlr	DBL1L
+	addc	r3,r9
+	mov.l	@r15+,r9
+	addc	DBL1H,DBLRL	! round to even
+	mov.l	@r15+,r8
+	rts
+	 addc	r2,DBLRH
+
+LOCAL(norm_pack):
+	add	r8,DBLRH
+	mov.l	@r15+,r8
+	rts
+	add	r2,DBLRH
+
+LOCAL(denorm0_1):
+	mov.l	@r15+,r9
+	mov	r8,DBL0L
+	mov.l	@r15+,r8
+LOCAL(denorm0_shift):
+	shlr	DBLRH
+	rotcr	DBLRL
+
+	rts
+	add	DBL0L,DBLRH
+
+LOCAL(denorm0_n):
+	mov	r8,DBL0L
+	addc	DBL0H,r2
+	mov.l	@r15+,r8
+	bf	LOCAL(denorm0_shift)
+	rts
+	add	DBL0L,DBLRH
+
+LOCAL(no_denorm):
+	add	r2,r8		! add (exponent - 1) to sign
+
+LOCAL(denorm1_done):
+	shld	r3,DBLRH
+	mov	DBLRL,DBL0L
+	shld	r3,DBLRL
+
+	add	r8,DBLRH	! add in sign and (exponent - 1)
+	mov.l	@r15+,r9
+	add	#-32,r3
+	mov.l	@r15+,r8
+	shld	r3,DBL0L
+
+	rts
+	add	DBL0L,DBLRH
+
+LOCAL(long_norm_highset):
+	mov.l	LOCAL(x00200000),DBL1L	! shift 1, implicit 1
+	shll	r9
+	rotcl	DBLRL
+	mov	DBLRH,DBL0H
+	rotcl	DBLRH	! clears T
+#ifdef __pic__
+	mov.l	LOCAL(c__clz_tab),DBL1H
+#else
+	mov	r0,r9
+#endif /* __pic__ */
+	subc	DBL1L,r2
+	add	#-1,r3
+	bf	LOCAL(long_norm_lookup)
+LOCAL(denorm1_a):
+	shlr	DBLRH
+	rotcr	DBLRL
+	mov.l	@r15+,r9
+	or	r8,DBLRH
+
+	rts
+	mov.l	@r15+,r8
+
+	.balign	4
+LOCAL(denorm1_b):
+	mov	#-20,DBL0L
+	shad	DBL0L,r2
+	mov	DBLRH,DBL0L
+	shld	r2,DBLRH
+	shld	r2,DBLRL
+	or	r8,DBLRH
+	mov.l	@r15+,r9
+	add	#32,r2
+	mov.l	@r15+,r8
+	shld	r2,DBL0L
+	rts
+	or	DBL0L,DBLRL
+
+LOCAL(zero_or_ulp):
+	tst	r9,r9
+	bf	LOCAL(long_norm_ulp_done)
+	! return +0.0
+LOCAL(pop_r8_r9):
+	mov.l	@r15+,r9
+	rts
+	mov.l	@r15+,r8
+
+LOCAL(d20):
+	.word	20
+LOCAL(xff):
+	.word 0xff
+	.balign	4
+LOCAL(x7ff00000):
+	.long	0x7ff00000
+LOCAL(x001fffff):
+	.long	0x001fffff
+LOCAL(x80000000):
+	.long	0x80000000
+LOCAL(x000fffff):
+	.long	0x000fffff
+LOCAL(x800fffff):
+	.long	0x800fffff
+LOCAL(x001f0000):
+	.long	0x001f0000
+LOCAL(x00200000):
+	.long	0x00200000
+LOCAL(x7fffffff):
+	.long	0x7fffffff
+LOCAL(x00100000):
+	.long	0x00100000
+LOCAL(x02100000):
+	.long	0x02100000
+LOCAL(x01100000):
+	.long	0x01100000
+LOCAL(c__clz_tab):
+#ifdef __pic__
+	.long	GLOBAL(clz_tab) - .
+#else
+	.long	GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(adddf3))
+ENDFUNC(GLOBAL(subdf3))
Index: gcc/config/sh/IEEE-754/m3/mulsf3.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/mulsf3.S	(revision 0)
+++ gcc/config/sh/IEEE-754/m3/mulsf3.S	(revision 0)
@@ -0,0 +1,246 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+! mulsf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+
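+! Outline: the 24x24 bit significand product is formed with a single
+! dmulu.l; the result is normalized by at most one bit, rounded to
+! nearest/even, and the biased exponents are combined, with separate
+! paths for zero, denormal, Inf and NaN inputs.
+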
+	.balign 4
+	.global GLOBAL(mulsf3)
+	FUNC(GLOBAL(mulsf3))
+GLOBAL(mulsf3):
+	mov.l	LOCAL(x7f800000),r1
+	not	r4,r2
+	mov	r4,r3
+	not	r5,r0
+	tst	r1,r2
+	or	r1,r3
+	bt/s	LOCAL(inf_nan_arg0)
+	 tst	r1,r0
+	bt	LOCAL(inf_nan_arg1)
+	tst	r1,r5
+	mov	r1,r2
+	shll8	r3
+	or	r5,r1
+	bt/s	LOCAL(zero_denorm_arg1)
+	 shll8	r1
+	tst	r2,r4
+	bt	LOCAL(zero_denorm_arg0)
+	dmulu.l	r3,r1
+	mov	r4,r0
+	and	r2,r0
+LOCAL(arg_norm):
+	and	r5,r2
+	mov.l	LOCAL(x3f800000),r3
+	sts	mach,r1
+	sub	r3,r0
+	sts	macl,r3
+	add	r2,r0
+	cmp/pz	r1
+	mov.w	LOCAL(x100),r2
+	bf/s	LOCAL(norm_frac)
+	 tst	r3,r3
+	shll2	r1	/* Shift one up, replace leading 1 with 0.  */
+	shlr	r1
+	tst	r3,r3
+LOCAL(norm_frac):
+	mov.w	LOCAL(mx80),r3
+	bf	LOCAL(round_frac)
+	tst	r2,r1
+LOCAL(round_frac):
+	mov.l	LOCAL(xff000000),r2
+	subc	r3,r1	/* Even overflow gives right result: exp++, frac=0.  */
+	shlr8	r1
+	add	r1,r0
+	shll	r0
+	bt	LOCAL(ill_exp)
+	tst	r2,r0
+	bt	LOCAL(denorm0)
+	cmp/hs	r2,r0
+	bt	LOCAL(inf)
+LOCAL(insert_sign):
+	div0s	r4,r5
+	rts
+	rotcr	r0
+LOCAL(denorm0):
+	sub	r2,r0
+	bra	LOCAL(insert_sign)
+	 shlr	r0
+LOCAL(zero_denorm_arg1):
+	mov.l	LOCAL(x60000000),r2	/* Check exp0 >= -64	*/
+	add	r1,r1
+	tst	r1,r1	/* arg1 == 0 ? */
+	mov	#0,r0
+	bt	LOCAL(insert_sign) /* argument 1 is zero ==> return 0  */
+	tst	r4,r2
+	bt	LOCAL(insert_sign) /* exp0 < -64  ==> return 0 */
+	mov.l	LOCAL(c__clz_tab),r0
+	mov	r3,r2
+	mov	r1,r3
+	bra	LOCAL(arg_normalize)
+	mov	r2,r1
+LOCAL(zero_denorm_arg0):
+	mov.l	LOCAL(x60000000),r2	/* Check exp1 >= -64	*/
+	add	r3,r3
+	tst	r3,r3	/* arg0 == 0 ? */
+	mov	#0,r0
+	bt	LOCAL(insert_sign) /* argument 0 is zero ==> return 0  */
+	tst	r5,r2
+	bt	LOCAL(insert_sign) /* exp1 < -64  ==> return 0 */
+	mov.l	LOCAL(c__clz_tab),r0
+LOCAL(arg_normalize):
+	mov.l	r7,@-r15
+	extu.w	r3,r7
+	cmp/eq	r3,r7
+	mov.l	LOCAL(xff000000),r7
+	mov	#-8,r2
+	bt	0f
+	tst	r7,r3
+	mov	#-16,r2
+	bt	0f
+	mov	#-24,r2
+0:
+	mov	r3,r7
+	shld	r2,r7
+#ifdef __pic__
+	add	r0,r7
+	mova  LOCAL(c__clz_tab),r0
+#endif
+	mov.b	@(r0,r7),r0
+	add	#32,r2
+	mov	r2,r7
+	mov	#23,r2
+	sub	r0,r7
+	mov.l	LOCAL(x7f800000),r0
+	shld	r7,r3
+	shld	r2,r7
+	mov	r0,r2
+	and	r4,r0
+	sub	r7,r0
+	mov.l	@r15+,r7
+	bra	LOCAL(arg_norm)
+	 dmulu.l	r3,r1
+#if 0 /* This is slightly slower, but could be used if table lookup causes
+         cache thrashing.  */
+	bt	LOCAL(insert_sign) /* exp1 < -64  ==> return 0 */
+	mov.l	LOCAL(xff000000),r2
+	mov	r4,r0
+LOCAL(arg_normalize):
+	tst	r2,r3
+	bf	LOCAL(arg_bit_norm)
+LOCAL(arg_byte_loop):
+	tst	r2,r3
+	add	r2,r0
+	shll8	r3
+	bt	LOCAL(arg_byte_loop)
+	add	r4,r0
+LOCAL(arg_bit_norm):
+	mov.l	LOCAL(x7f800000),r2
+	rotl	r3
+LOCAL(arg_bit_loop):
+	add	r2,r0
+	bf/s	LOCAL(arg_bit_loop)
+	 rotl	r3
+	rotr	r3
+	rotr	r3
+	sub	r2,r0
+	bra	LOCAL(arg_norm)
+	 dmulu.l	r3,r1
+#endif /* 0 */
+LOCAL(inf):
+	bra	LOCAL(insert_sign)
+	 mov	r2,r0
+LOCAL(inf_nan_arg0):
+	bt	LOCAL(inf_nan_both)
+	add	r0,r0
+	cmp/eq	#-1,r0	/* arg1 zero? -> NAN */
+	bt	LOCAL(insert_sign)
+	mov	r4,r0
+LOCAL(inf_insert_sign):
+	bra	LOCAL(insert_sign)
+	 add	r0,r0
+LOCAL(inf_nan_both):
+	mov	r4,r0
+	bra	LOCAL(inf_insert_sign)
+	 or	r5,r0
+LOCAL(inf_nan_arg1):
+	mov	r2,r0
+	add	r0,r0
+	cmp/eq	#-1,r0	/* arg0 zero? */
+	bt	LOCAL(insert_sign)
+	bra	LOCAL(inf_insert_sign)
+	 mov	r5,r0
+LOCAL(ill_exp):
+	cmp/pz	r0
+	mov	#-24,r3
+	bt	LOCAL(inf)
+	add	r1,r1
+	mov	r0,r2
+	sub	r1,r2	! remove fraction to get back pre-rounding exponent.
+	sts	mach,r0
+	sts	macl,r1
+	shad	r3,r2
+	mov	r0,r3
+	shld	r2,r0
+	add	#32,r2
+	cmp/pz	r2
+	shld	r2,r3
+	bf	LOCAL(zero)
+	or	r1,r3
+	mov	#-1,r1
+	tst	r3,r3
+	mov.w	LOCAL(x100),r3
+	bf/s	LOCAL(denorm_round_up)
+	mov	#-0x80,r1
+	tst	r3,r0
+LOCAL(denorm_round_up):
+	mov	#-7,r3
+	subc	r1,r0
+	bra	LOCAL(insert_sign)
+	 shld	r3,r0
+LOCAL(zero):
+	bra	LOCAL(insert_sign)
+	 mov #0,r0
+LOCAL(x100):
+	.word	0x100
+LOCAL(mx80):
+	.word	-0x80
+	.balign	4
+LOCAL(x7f800000):
+	.long 0x7f800000
+LOCAL(x3f800000):
+	.long 0x3f800000
+LOCAL(xff000000):
+	.long	0xff000000
+LOCAL(x60000000):
+	.long	0x60000000
+LOCAL(c__clz_tab):
+#ifdef __pic__
+	.long	GLOBAL(clz_tab) - .
+#else
+	.long	GLOBAL(clz_tab)
+#endif
+	ENDFUNC(GLOBAL(mulsf3))
Index: gcc/config/sh/IEEE-754/m3/floatsisf.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/floatsisf.S	(revision 0)
+++ gcc/config/sh/IEEE-754/m3/floatsisf.S	(revision 0)
@@ -0,0 +1,106 @@
+/* Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+! floatsisf for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+
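+! Outline: take the absolute value, locate its highest set bit via
+! GLOBAL(clz_tab), shift the significand into the fraction field, add a
+! constant combining sign, bias and the implicit-1 adjustment, correct
+! the exponent by the shift count, and round on the discarded bits.
+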
+FUNC(GLOBAL(floatsisf))
+	.global GLOBAL(floatsisf)
+	.balign	4
+GLOBAL(floatsisf):
+	cmp/pz	r4
+	mov	r4,r5
+	bt	0f
+	neg	r4,r5
+0:	mov.l	LOCAL(c_clz_tab),r0
+	extu.w	r5,r1
+	mov.w	LOCAL(xff00),r3
+	cmp/eq	r5,r1
+	mov	#24,r2
+	bt	0f
+	mov	r5,r1
+	shlr16	r1
+	add	#-16,r2
+0:	tst	r3,r1	! 0xff00
+	bt	0f
+	shlr8	r1
+0:	bt	0f
+	add	#-8,r2
+0:
+#ifdef	__PIC__
+	add	r0,r1
+	mova	LOCAL(c_clz_tab),r0
+#endif
+	mov.b	@(r0,r1),r1
+	cmp/pz	r4
+	mov.l	LOCAL(x4a800000),r3	! bias + 23 - implicit 1
+	bt	0f
+	mov.l	LOCAL(xca800000),r3	! sign + bias + 23 - implicit 1
+0:	mov	r5,r0
+	sub	r1,r2
+	mov.l	LOCAL(x80000000),r1
+	shld	r2,r0
+	cmp/pz	r2
+	add	r3,r0
+	bt	LOCAL(noround)
+	add	#31,r2
+	shld	r2,r5
+	add	#-31,r2
+	rotl	r5
+	cmp/hi	r1,r5
+	mov	#0,r3
+	addc	r3,r0
+	mov	#23,r1
+	shld	r1,r2
+	rts
+	sub	r2,r0
+	.balign	8
+LOCAL(noround):
+	mov	#23,r1
+	tst	r4,r4
+	shld	r1,r2
+	bt	LOCAL(ret0)
+	rts
+	sub	r2,r0
+LOCAL(ret0):
+	rts
+	mov	#0,r0
+
+LOCAL(xff00):	.word 0xff00
+	.balign	4
+LOCAL(x4a800000): .long 0x4a800000
+LOCAL(xca800000): .long 0xca800000
+LOCAL(x80000000): .long 0x80000000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+        .long   GLOBAL(clz_tab) - .
+#else
+        .long   GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(floatsisf))
Index: gcc/config/sh/IEEE-754/m3/muldf3.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/muldf3.S	(revision 0)
+++ gcc/config/sh/IEEE-754/m3/muldf3.S	(revision 0)
@@ -0,0 +1,486 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+! muldf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+! Normal numbers are multiplied in 53 or 54 cycles on SH4-200.
+
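+! The 53x53 bit fraction product is built from four dmulu.l 32x32->64
+! partial products accumulated with addc; the 32 low guard bits are
+! collapsed into a sticky bit for the final rounding.
+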
+FUNC(GLOBAL(muldf3))
+	.global GLOBAL(muldf3)
+LOCAL(inf_nan_denorm_or_zero_a):
+	mov.l	r8,@-r15
+	sub	r3,DBL0H	! isolate high fraction
+	mov.l	@(4,r15),r8	! original DBL0H (with sign & exp)
+	sub	r3,r1		! 0x7ff00000
+	mov.l	LOCAL(x60000000),r3
+	shll16	r2		! 0xffff0000
+	!			  no stall here for sh4-200
+	!
+	tst	r1,r8
+	mov.l	r0,@-r15
+	bf	LOCAL(inf_nan_a)
+	tst	r1,r0		! test for DBL1 inf, nan or small
+	bt	LOCAL(ret_inf_nan_zero)
+LOCAL(normalize_arg):
+	tst	DBL0H,DBL0H
+	bf	LOCAL(normalize_arg53)
+	tst	DBL0L,DBL0L
+	bt	LOCAL(a_zero)
+	tst	r2,DBL0L
+	mov	DBL0L,DBL0H
+	bt	LOCAL(normalize_arg16)
+	shlr16	DBL0H
+	mov.w	LOCAL(m15),r2	! 1-16
+	bra	LOCAL(normalize_arg48)
+	shll16	DBL0L
+
+LOCAL(normalize_arg53):
+	tst	r2,DBL0H
+	mov	#1,r2
+	bt	LOCAL(normalize_arg48)
+	mov	DBL0H,r1
+	shlr16	r1
+	bra	LOCAL(normalize_DBL0H)
+	mov	#21-16,r3
+
+LOCAL(normalize_arg16):
+	mov.w	LOCAL(m31),r2 ! 1-32
+	mov	#0,DBL0L
+LOCAL(normalize_arg48):
+	mov	DBL0H,r1
+	mov	#21,r3
+LOCAL(normalize_DBL0H):
+	extu.b	r1,r8
+	mov.l	LOCAL(c__clz_tab),r0
+	cmp/eq	r8,r1
+	!
+	bt	0f
+	shlr8	r1
+0:
+#ifdef	__pic__
+	add	r0,r1
+
+	mova	LOCAL(c__clz_tab),r0
+
+#endif /* __pic__ */
+	mov.b	@(r0,r1),r8
+	mov	DBL0L,r1
+	mov.l	@r15+,r0
+	bt	0f
+	add	#-8,r3
+0:	clrt
+	sub	r8,r3
+	mov.w	LOCAL(d20),r8
+	shld	r3,DBL0H
+	shld	r3,DBL0L
+	sub	r3,r2
+	add	#-32,r3
+	shld	r3,r1
+	mov.l	LOCAL(x00100000),r3
+	or	r1,DBL0H
+	shld	r8,r2
+	mov.l	@r15+,r8
+	add	r2,DBL1H
+	mov.l	LOCAL(x001fffff),r2
+	dmulu.l	DBL0L,DBL1L
+	bra	LOCAL(arg_denorm_done)
+	or	r3,r0		! set implicit 1 bit
+
+LOCAL(a_zero):
+	mov.l	@(4,r15),r8
+	add	#8,r15
+LOCAL(zero):
+	mov	#0,DBLRH
+	bra	LOCAL(pop_ret)
+	mov	#0,DBLRL
+
+! both inf / nan -> result is nan if at least one is a nan, else inf.
+! DBL0 inf/nan, DBL1 zero   -> result is nan
+! DBL0 inf/nan, DBL1 finite -> result is DBL0 with sign adjustment
+LOCAL(inf_nan_a):
+	mov	r8,DBLRH
+	mov.l	@(4,r15),r8
+	add	#8,r15
+	tst	r1,r0	! arg1 inf/nan ?
+	mov	DBL0L,DBLRL
+	bt	LOCAL(both_inf_nan)
+	tst	DBL1L,DBL1L
+	mov	DBL1H,r1
+	bf	LOCAL(pop_ret)
+	add	r1,r1
+	tst	r1,r1
+	!
+	bf	LOCAL(pop_ret)
+LOCAL(nan):
+	mov	#-1,DBLRL
+	bra	LOCAL(pop_ret)
+	mov	#-1,DBLRH
+
+LOCAL(both_inf_nan):
+	or	DBL1L,DBLRL
+	bra	LOCAL(pop_ret)
+	or	DBL1H,DBLRH
+
+LOCAL(ret_inf_nan_zero):
+	tst	r1,r0
+	mov.l	@(4,r15),r8
+	or	DBL0L,DBL0H
+	bf/s	LOCAL(zero)
+	add	#8,r15
+	tst	DBL0H,DBL0H
+	bt	LOCAL(nan)
+LOCAL(inf_nan_b):
+	mov	DBL1L,DBLRL
+	mov	DBL1H,DBLRH
+LOCAL(pop_ret):
+	mov.l	@r15+,DBL0H
+	add	DBLRH,DBLRH
+
+
+	div0s	DBL0H,DBL1H
+
+	rts
+	rotcr	DBLRH
+
+	.balign	4
+/* Argument a has already been tested for being zero or denorm.
+   On the other hand, we have to swap a and b so that we can share the
+   normalization code.
+   a: sign/exponent : @r15 fraction: DBL0H:DBL0L
+   b: sign/exponent: DBL1H fraction:    r0:DBL1L  */
+LOCAL(inf_nan_denorm_or_zero_b):
+	sub	r3,r1		! 0x7ff00000
+	mov.l	@r15,r2		! get original DBL0H
+	tst	r1,DBL1H
+	sub	r3,r0		! isolate high fraction
+	bf	LOCAL(inf_nan_b)
+	mov.l	DBL1H,@r15
+	mov	r0,DBL0H
+	mov.l	r8,@-r15
+	mov	r2,DBL1H
+	mov.l	LOCAL(0xffff0000),r2
+	mov.l	r1,@-r15
+	mov	DBL1L,r1
+	mov	DBL0L,DBL1L
+	bra	LOCAL(normalize_arg)
+	mov	r1,DBL0L
+
+LOCAL(d20):
+	.word	20
+LOCAL(m15):
+	.word	-15
+LOCAL(m31):
+	.word	-31
+LOCAL(xff):
+	.word	0xff
+
+	.balign	4
+LOCAL(0xffff0000): .word 0xffff0000
+
+	! calculate a (DBL0H:DBL0L) * b (DBL1H:DBL1L)
+	.balign	4
+GLOBAL(muldf3):
+	mov.l	LOCAL(xfff00000),r3
+	mov	DBL1H,r0
+	dmulu.l	DBL0L,DBL1L
+	mov.l	LOCAL(x7fe00000),r1
+	sub	r3,r0
+	mov.l	DBL0H,@-r15
+	sub	r3,DBL0H
+	tst	r1,DBL0H
+	or	r3,DBL0H
+	mov.l	LOCAL(x001fffff),r2
+	bt	LOCAL(inf_nan_denorm_or_zero_a)
+	tst	r1,r0
+	or	r3,r0		! r0:DBL1L    := b fraction ; u12.52
+	bt	LOCAL(inf_nan_denorm_or_zero_b) ! T clear on fall-through
+LOCAL(arg_denorm_done):
+	and	r2,r0		! r0:DBL1L    := b fraction ; u12.52
+	sts	macl,r3
+	sts	mach,r1
+	dmulu.l	DBL0L,r0
+	and	r2,DBL0H	! DBL0H:DBL0L := a fraction ; u12.52
+	mov.l	r8,@-r15
+	mov	#0,DBL0L
+	mov.l	r9,@-r15
+	sts	macl,r2
+	sts	mach,r8
+	dmulu.l	DBL0H,DBL1L
+	addc	r1,r2
+
+	addc	DBL0L,r8	! add T; clears T
+
+	sts	macl,r1
+	sts	mach,DBL1L
+	dmulu.l	DBL0H,r0
+	addc	r1,r2
+	mov.l	LOCAL(x7ff00000),DBL0H
+	addc	DBL1L,r8	! clears T
+	mov.l	@(8,r15),DBL1L	! a sign/exp w/fraction
+	sts	macl,DBLRL
+	sts	mach,DBLRH
+	and	DBL0H,DBL1L	! a exponent
+	mov.w	LOCAL(x200),r9
+	addc	r8,DBLRL
+	mov.l	LOCAL(x3ff00000),r8	! bias
+	addc	DBL0L,DBLRH	! add T
+	cmp/hi	DBL0L,r3	! 32 guard bits -> sticky: T := r3 != 0
+	movt	r3
+	tst	r9,DBLRH	! T := fraction < 2
+	or	r3,r2		! DBLRH:DBLRL:r2 := result fraction; u24.72
+	bt/s	LOCAL(shll12)
+	sub	r8,DBL1L
+	mov.l	LOCAL(x002fffff),r8
+	and	DBL1H,DBL0H	! b exponent
+	mov.l	LOCAL(x00100000),r9
+	add	DBL0H,DBL1L ! result exponent - 1
+	tst	r8,r2
+	mov.w	LOCAL(m20),r8
+	subc	DBL0L,r9
+	addc	r2,r9 ! r2 value is still needed for denormal rounding
+	mov.w	LOCAL(d11),DBL0L
+	rotcr	r9
+	clrt
+	shld	r8,r9
+	mov.w	LOCAL(m21),r8
+	mov	DBLRL,r3
+	shld	DBL0L,DBLRL
+	addc	r9,DBLRL
+	mov.l	@r15+,r9
+	shld	r8,r3
+	mov.l	@r15+,r8
+	shld	DBL0L,DBLRH
+	mov.l	@r15+,DBL0H
+	addc	r3,DBLRH
+	mov.l	LOCAL(x7ff00000),DBL0L
+	add	DBL1L,DBLRH	! implicit 1 adjusts exponent
+	mov.l	LOCAL(xffe00000),r3
+	cmp/hs	DBL0L,DBLRH
+	add	DBLRH,DBLRH
+	bt	LOCAL(ill_exp_11)
+	tst	r3,DBLRH
+	bt	LOCAL(denorm_exp0_11)
+	div0s	DBL0H,DBL1H
+	rts
+	rotcr	DBLRH
+
+
+LOCAL(shll12):
+	mov.l	LOCAL(x0017ffff),r8
+	extu.b	DBLRH,DBLRH	! remove implicit 1.
+	mov.l	LOCAL(x00080000),r9
+	and	DBL1H,DBL0H	! b exponent
+	add	DBL0H,DBL1L	! result exponent
+	tst	r8,r2		! rounding adjust for lower guard ...
+	mov.w	LOCAL(m19),r8
+	subc	DBL0L,r9	! ... bits and round to even; clear T
+	addc	r2,r9 ! r2 value is still needed for denormal rounding
+	mov.w	LOCAL(d12),DBL0L
+	rotcr	r9
+	clrt
+	shld	r8,r9
+	mov.w	LOCAL(m20),r8
+	mov	DBLRL,r3
+	shld	DBL0L,DBLRL
+	addc	r9,DBLRL
+	mov.l	@r15+,r9
+	shld	r8,r3
+	mov.l	@r15+,r8
+	shld	DBL0L,DBLRH
+	mov.l	LOCAL(x7ff00000),DBL0L
+	addc	r3,DBLRH
+	mov.l	@r15+,DBL0H
+	add	DBL1L,DBLRH
+	mov.l	LOCAL(xffe00000),r3
+	cmp/hs	DBL0L,DBLRH
+	add	DBLRH,DBLRH
+	bt	LOCAL(ill_exp_12)
+	tst	r3,DBLRH
+	bt	LOCAL(denorm_exp0_12)
+LOCAL(insert_sign):
+	div0s	DBL0H,DBL1H
+	rts
+	rotcr	DBLRH
+
+LOCAL(overflow):
+	mov	r3,DBLRH
+	mov	#0,DBLRL
+	bra	LOCAL(insert_sign)
+	mov.l	@r15+,r8
+
+LOCAL(denorm_exp0_11):
+	mov.l	r8,@-r15
+	mov	#-21,r8
+	mov.l	r9,@-r15
+	bra	LOCAL(denorm)
+	mov	#-2,DBL1L	! one for denormal, and one for sticky bit
+
+LOCAL(ill_exp_11):
+	mov	DBL1H,DBL1L
+	and	r3,DBL0L	! 0x7fe00000
+	add	DBL1L,DBL1L
+	mov.l	r8,@-r15
+	cmp/hi	DBL1L,DBL0L	! check if exp a was large
+	mov	#-20,DBL0L
+	bf	LOCAL(overflow)
+	mov	#-21,r8
+	mov	DBLRH,DBL1L
+	rotcr	DBL1L		! shift in negative sign
+	mov.l	r9,@-r15
+	shad	DBL0L,DBL1L	! exponent ; s32
+	bra	LOCAL(denorm)
+	add	#-2,DBL1L	! add one for denormal, and one for sticky bit
+
+LOCAL(denorm_exp0_12):
+	mov.l	r8,@-r15
+	mov	#-20,r8
+	mov.l	r9,@-r15
+	bra	LOCAL(denorm)
+	mov	#-2,DBL1L	! one for denormal, and one for sticky bit
+
+	.balign 4		! also aligns LOCAL(denorm)
+LOCAL(ill_exp_12):
+	and	r3,DBL0L	! 0x7fe00000
+	mov	DBL1H,DBL1L
+	add	DBL1L,DBL1L
+	mov.l	r8,@-r15
+	cmp/hi	DBL1L,DBL0L	! check if exp a was large
+	bf	LOCAL(overflow)
+	mov	DBLRH,DBL1L
+	rotcr	DBL1L		! shift in negative sign
+	mov	#-20,r8
+	shad	r8,DBL1L	! exponent ; s32
+	mov.l	r9,@-r15
+	add	#-2,DBL1L	! add one for denormal, and one for sticky bit
+LOCAL(denorm):
+	not	r3,r9		! 0x001fffff
+	mov.l	r10,@-r15
+	mov	r2,r10
+	shld	r8,r10	! 11 or 12 lower bit valid
+	and	r9,DBLRH ! Mask away vestiges of exponent.
+	add	#32,r8
+	sub	r3,DBLRH ! Make leading 1 explicit.
+	shld	r8,r2	! r10:r2 := unrounded result lowpart
+	shlr	DBLRH	! compensate for doubling at end of normal code
+	sub	DBLRL,r10	! reconstruct effect of previous rounding
+	exts.b	r10,r9
+	shad	r3,r10	! sign extension
+	mov	#0,r3
+	clrt
+	addc	r9,DBLRL	! Undo previous rounding.
+	mov.w	LOCAL(m32),r9
+	addc	r10,DBLRH
+	cmp/hi	r3,r2
+	rotcl	DBLRL	! fit in the rest of r2 as a sticky bit.
+	mov.l	@r15+,r10
+	rotcl	DBLRH
+	cmp/ge	r9,DBL1L
+	bt	LOCAL(small_norm_shift)
+	cmp/hi	r3,DBLRL
+	add	#32,DBL1L
+	movt	DBLRL
+	cmp/gt	r9,DBL1L
+	or	DBLRH,DBLRL
+	bt/s	LOCAL(small_norm_shift)
+	mov	r3,DBLRH
+	mov	r3,DBLRL	! exponent too negative to shift - return zero
+	mov.l	@r15+,r9
+	mov.l	@r15+,r8
+	div0s	DBL0H,DBL1H
+	rts
+	rotcr	DBLRH
+	.balign	4
+LOCAL(small_norm_shift):
+	mov	DBLRL,r2	! stash away guard bits
+	shld	DBL1L,DBLRL
+	mov	DBLRH,DBL0L
+	shld	DBL1L,DBLRH
+	mov.l	LOCAL(x7fffffff),r9
+	add	#32,DBL1L
+	shld	DBL1L,r2
+	shld	DBL1L,DBL0L
+	or	DBL0L,DBLRL
+	shlr	DBL0L
+	addc	r2,r9
+	mov.l	@r15+,r9
+	mov.l	@r15+,r8
+	addc	r3,DBLRL
+	addc	r3,DBLRH
+	div0s	DBL0H,DBL1H
+	add	DBLRH,DBLRH
+	rts
+	rotcr	DBLRH
+
+
+LOCAL(x200):
+	.word 0x200
+LOCAL(m19):
+	.word	-19
+LOCAL(m20):
+	.word	-20
+LOCAL(m21):
+	.word	-21
+LOCAL(m32):
+	.word	-32
+LOCAL(d11):
+	.word	11
+LOCAL(d12):
+	.word	12
+	.balign	4
+LOCAL(x60000000):
+	.long	0x60000000
+LOCAL(c__clz_tab):
+#ifdef __pic__
+	.long	GLOBAL(clz_tab) - .
+#else
+	.long	GLOBAL(clz_tab)
+#endif
+LOCAL(xfff00000):
+	.long	0xfff00000
+LOCAL(x7fffffff):
+	.long	0x7fffffff
+LOCAL(x00100000):
+	.long	0x00100000
+LOCAL(x7fe00000):
+	.long	0x7fe00000
+LOCAL(x001fffff):
+	.long	0x001fffff
+LOCAL(x7ff00000):
+	.long	0x7ff00000
+LOCAL(x3ff00000):
+	.long	0x3ff00000
+LOCAL(x002fffff):
+	.long	0x002fffff
+LOCAL(xffe00000):
+	.long	0xffe00000
+LOCAL(x0017ffff):
+	.long	0x0017ffff
+LOCAL(x00080000):
+	.long	0x00080000
+ENDFUNC(GLOBAL(muldf3))
Index: gcc/config/sh/IEEE-754/m3/floatsidf.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/floatsidf.S	(revision 0)
+++ gcc/config/sh/IEEE-754/m3/floatsidf.S	(revision 0)
@@ -0,0 +1,103 @@
+/* Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+! floatsidf for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+
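+! Outline: like floatsisf, but the significand lands in the DBLRH:DBLRL
+! pair; since any 32-bit integer fits exactly into the 53-bit fraction,
+! the conversion is exact and no rounding is needed.
+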
+FUNC(GLOBAL(floatsidf))
+	.global GLOBAL(floatsidf)
+	.balign	4
+GLOBAL(floatsidf):
+	tst	r4,r4
+	mov	r4,r1
+	bt	LOCAL(ret0)
+	cmp/pz	r4
+	bt	0f
+	neg	r4,r1
+0:	mov.l	LOCAL(c_clz_tab),r0
+	extu.w	r1,r5
+	mov.w	LOCAL(xff00),r3
+	cmp/eq	r1,r5
+	mov	#21,r2
+	bt	0f
+	mov	r1,r5
+	shlr16	r5
+	add	#-16,r2
+0:	tst	r3,r5	! 0xff00
+	bt	0f
+	shlr8	r5
+0:	bt	0f
+	add	#-8,r2
+0:
+#ifdef	__PIC__
+	add	r0,r5
+	mova	LOCAL(c_clz_tab),r0
+#endif
+	mov.b	@(r0,r5),r5
+	cmp/pz	r4
+	mov.l	LOCAL(x41200000),r3	! bias + 20 - implicit 1
+	bt	0f
+	mov.l	LOCAL(xc1200000),r3	! sign + bias + 20 - implicit 1
+0:	mov	r1,r0	! DBLRL & DBLRH
+	sub	r5,r2
+	mov	r2,r5
+	shld	r2,DBLRH
+	cmp/pz	r2
+	add	r3,DBLRH
+	add	#32,r2
+	shld	r2,DBLRL
+	bf	0f
+	mov.w	LOCAL(d0),DBLRL
+0:	mov	#20,r2
+	shld	r2,r5
+	rts
+	sub	r5,DBLRH
+LOCAL(ret0):
+	mov	#0,DBLRL
+	rts
+	mov	#0,DBLRH
+
+LOCAL(xff00):	.word 0xff00
+	.balign	4
+LOCAL(x41200000):
+#ifdef __LITTLE_ENDIAN__
+LOCAL(d0):	  .word 0
+		  .word 0x4120
+#else
+		  .word 0x4120
+LOCAL(d0):	  .word 0
+#endif
+LOCAL(xc1200000): .long 0xc1200000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+        .long   GLOBAL(clz_tab) - .
+#else
+        .long   GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(floatsidf))
Index: gcc/config/sh/IEEE-754/m3/fixdfsi.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/fixdfsi.S	(revision 0)
+++ gcc/config/sh/IEEE-754/m3/fixdfsi.S	(revision 0)
@@ -0,0 +1,115 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+!! fixdfsi for Renesas SH / STMicroelectronics ST40 CPUs
+!! Contributed by Joern Rennecke joern.rennecke@st.com
+
+#ifdef L_fixdfsi
+	! What is a bit unusual about this implementation is that the
+	! sign bit influences the result for NANs: for cleared sign bit, you
+	! get UINT_MAX, for set sign bit, you get 0.
+	! However, since the result for NANs is undefined, this should be no
+	! problem.
+	! N.B. This is scheduled both for SH4-200 and SH4-300
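+	! Outline: the biased exponent minus 0x413 (1023 + 20) gives the
+	! left shift count for the 21-bit high fraction (implicit 1
+	! included); DBL0L supplies the remaining bits, and counts above 10
+	! overflow to the maximum/minimum return values.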
+	.balign 4
+	.global GLOBAL(fixdfsi)
+	FUNC(GLOBAL(fixdfsi))
+	.balign	4
+GLOBAL(fixdfsi):
+	mov.w	LOCAL(x413),r1
+	mov	DBL0H,r0
+	shll	DBL0H
+	mov.l	LOCAL(mask),r3
+	mov	#-21,r2
+	shld	r2,DBL0H	! SH4-200 will start this insn in a new cycle
+	bt/s	LOCAL(neg)
+	sub	r1,DBL0H
+	cmp/pl	DBL0H		! SH4-200 will start this insn in a new cycle
+	and	r3,r0
+	bf/s	LOCAL(ignore_low)
+	addc	r3,r0	! uses T == 1; sets implicit 1
+	mov	#10,r2
+	shld	DBL0H,r0	! SH4-200 will start this insn in a new cycle
+	cmp/gt	r2,DBL0H
+	add	#-32,DBL0H
+	bt	LOCAL(retmax)
+	shld	DBL0H,DBL0L
+	rts
+	or	DBL0L,r0
+
+	.balign	8
+LOCAL(ignore_low):
+	mov	#-21,r2
+	cmp/gt	DBL0H,r2	! SH4-200 will start this insn in a new cycle
+	bf	0f		! SH4-200 will start this insn in a new cycle
+	mov	#-31,DBL0H	! results in 0 return
+0:	add	#1,r0
+	rts
+	shld	DBL0H,r0
+
+	.balign 4
+LOCAL(neg):
+	cmp/pl	DBL0H
+	and	r3,r0
+	bf/s	LOCAL(ignore_low_neg)
+	addc	r3,r0	! uses T == 1; sets implicit 1
+	mov	#10,r2
+	shld	DBL0H,r0	! SH4-200 will start this insn in a new cycle
+	cmp/gt	r2,DBL0H
+	add	#-32,DBL0H
+	bt	LOCAL(retmin)
+	shld	DBL0H,DBL0L
+	or	DBL0L,r0	! SH4-200 will start this insn in a new cycle
+	rts
+	neg	r0,r0
+
+	.balign 4
+LOCAL(ignore_low_neg):
+	mov	#-21,r2
+	cmp/gt	DBL0H,r2	! SH4-200 will start this insn in a new cycle
+	add	#1,r0
+	shld	DBL0H,r0
+	bf	0f
+	mov	#0,r0		! results in 0 return
+0:	rts
+	neg	r0,r0
+
+LOCAL(retmax):
+	mov	#-1,r0
+	rts
+	shlr	r0
+
+LOCAL(retmin):
+	mov	#1,r0
+	rts
+	rotr	r0
+
+LOCAL(x413): .word 0x413
+
+	.balign 4
+LOCAL(mask): .long 0x000fffff
+	ENDFUNC(GLOBAL(fixdfsi))
+#endif /* L_fixdfsi */
Index: gcc/config/sh/IEEE-754/divdf3.S
===================================================================
--- gcc/config/sh/IEEE-754/divdf3.S	(revision 0)
+++ gcc/config/sh/IEEE-754/divdf3.S	(revision 0)
@@ -0,0 +1,598 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!division of two double precision floating point numbers
+!Author:Aanchal Khanna
+!
+!Entry:
+!r4,r5:dividend
+!
+!r6,r7:divisor
+!
+!Exit:
+!r0,r1:quotient
+
+!Notes: dividend is passed in regs r4 and r5 and divisor is passed in regs 
+!r6 and r7, quotient is returned in regs r0 and r1. The dividend is
+!referred to as op1 and the divisor as op2.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
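+!
+!The mantissa quotient is computed by binary long division: each pass of
+!.L_loop compares the remainder (r4:r5) against the divisor (r6:r7),
+!subtracts and records a quotient bit in r8 when it fits, then shifts the
+!remainder left; the quotient-bit mask in r1 steps down a word at a time.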
+
+	.text
+	.align	5
+	.global	GLOBAL (divdf3)
+	FUNC (GLOBAL (divdf3))
+
+GLOBAL (divdf3):
+
+#ifdef  __LITTLE_ENDIAN__
+	mov	r4,r1
+	mov	r5,r4
+	mov	r1,r5
+
+        mov     r6,r1
+        mov     r7,r6
+        mov     r1,r7
+#endif
+	mov	r4,r2
+	mov.l	.L_inf,r1
+
+	and	r1,r2
+	mov.l   r8,@-r15
+
+	cmp/eq	r1,r2
+	mov     r6,r8
+
+	bt	.L_a_inv
+	and	r1,r8
+
+	cmp/eq	r1,r8
+	mov.l	.L_high_mant,r3
+
+	bf	.L_chk_zero
+	and	r6,r3
+
+	mov.l   .L_mask_sign,r8	
+	cmp/pl	r7
+
+	mov	r8,r0
+	bt	.L_ret_b	!op2=NaN,return op2
+
+	and	r4,r8
+	cmp/pl	r3
+
+	and	r6,r0
+	bt	.L_ret_b	!op2=NaN,return op2
+
+	xor     r8,r0           !op1=normal no,op2=Inf, return Zero
+	mov     #0,r1
+	
+#ifdef __LITTLE_ENDIAN__
+	mov	r0,r2
+	mov	r1,r0
+	mov	r2,r1
+#endif
+	rts
+	mov.l   @r15+,r8
+
+.L_ret_b:
+	mov	r7,r1
+	mov     r6,r0
+
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+
+	rts
+	mov.l   @r15+,r8
+
+.L_a_inv:
+	!chk if op1 is Inf or NaN
+	mov.l   .L_high_mant,r2
+	cmp/pl  r5
+
+	and	r4,r2
+	bt	.L_ret_a
+
+	and	r1,r8		!r1 contains infinity
+	cmp/pl	r2
+
+	bt	.L_ret_a
+	cmp/eq	r1,r8
+
+	mov	r1,DBLRH
+	add	DBLRH,DBLRH
+	bf	0f
+	mov	#-1,DBLRH	! Inf/Inf, return NaN.
+0:	div0s	r4,r6
+	mov.l   @r15+,r8	
+	rts
+	rotcr	DBLRH
+
+.L_ret_a:
+	!return op1
+	mov	r5,r1
+	mov	r4,r0
+
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+	rts
+        mov.l   @r15+,r8
+
+.L_chk_zero:
+	!chk if op1=0
+	mov.l   .L_mask_sign,r0
+        mov     r4,r3
+
+        and     r0,r3
+        shll    r4
+
+        and     r6,r0
+        shlr    r4
+
+        xor     r3,r0
+        shll    r6
+
+	shlr	r6
+	tst	r4,r4
+
+
+	bf      .L_op1_not_zero	
+	tst	r5,r5
+	
+        bf      .L_op1_not_zero
+	tst	r7,r7
+
+	mov.l   @r15+,r8
+	bf	.L_ret_zero
+
+	tst	r6,r6
+	bf	.L_ret_zero
+
+	rts
+	mov     #-1,DBLRH       !op1=op2=0, return NaN
+	
+.L_ret_zero:
+	!return zero
+	mov	r0,r1
+	rts
+#ifdef __LITTLE_ENDIAN__
+	mov	#0,r0
+#else
+	mov	#0,r1		!op1=0,op2=normal no,return zero
+#endif
+
+.L_norm_b:
+	!normalize op2
+        shll    r7
+        mov.l   .L_imp_bit,r3
+
+        rotcl   r6
+        tst     r3,r6
+
+        add     #-1,r8
+        bt      .L_norm_b
+
+        bra     .L_divide
+        add     #1,r8
+
+.L_op1_not_zero:
+	!op1!=0, chk if op2=0
+	tst	r7,r7	
+	mov	r1,r3
+	
+	mov	#0,r1
+	bf	.L_normal_nos
+
+	tst	r6,r6
+	bf      .L_normal_nos
+
+	mov.l   @r15+,r8
+	or	r3,r0
+
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+
+	rts
+	nop
+
+.L_normal_nos:
+	!op1 and op2 are normal nos
+	tst	r2,r2
+	mov	#-20,r1
+
+! The following branch depends on the compare above; the shift does not
+! alter its result, since the shift macro is written to preserve T.
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+	shld    r1,r2
+#else
+	SHLR20 (r2)
+#endif
+	bt	.L_norm_a	!normalize dividend
+	
+.L_chk_b:
+	mov.l	r9,@-r15
+	tst	r8,r8
+
+        mov.l   .L_high_mant,r9
+
+! The following branch depends on the compare above; the shift does not
+! alter its result, since the shift macro is written to preserve T.
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r1,r8
+#else
+        SHLR20 (r8)
+#endif
+				! T set -> normalize divisor
+	SL(bt,	.L_norm_b,
+	 and	r9,r4)
+
+.L_divide:
+	mov.l   .L_2047,r1
+	sub	r8,r2
+
+	mov.l	.L_1023,r8
+	and	r9,r6
+
+	!resultant exponent
+	add	r8,r2
+	!chk the exponent for overflow
+	cmp/ge	r1,r2
+	
+	mov.l	.L_imp_bit,r1
+	bt	.L_overflow
+	
+	mov	#0,r8
+	or	r1,r4
+	
+	or      r1,r6	
+	mov	#-24,r3
+
+	!chk if the divisor is 1(mantissa only)
+	cmp/eq	r8,r7
+	bf	.L_div2
+
+	cmp/eq	 r6,r1
+	bt	.L_den_one
+
+.L_div2:
+	!divide the mantissas
+	shll8	r4
+	mov	r5,r9
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r3,r9
+#else
+        SHLR24 (r9)
+#endif
+	shll8	r6
+
+	or	r9,r4
+	shll8   r5
+
+	mov	r7,r9
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r3,r9
+#else
+        SHLR24 (r9)
+#endif
+	mov	r8,r3
+	shll8	r7
+
+	or	r9,r6	
+	cmp/gt	r4,r6
+
+	mov	r3,r9
+	bt	.L_shift
+
+	cmp/eq	r4,r6
+	bf	.L_loop
+
+	cmp/gt	r5,r7
+	bf	.L_loop
+
+.L_shift:
+	add	#-1,r2
+	shll	r5
+	rotcl	r4
+
+.L_loop:
+	!actual division loop
+	cmp/gt	r6,r4
+	bt	.L_subtract
+
+	cmp/eq	r6,r4
+	bf	.L_skip
+
+	cmp/ge	r7,r5
+	bf	.L_skip
+
+.L_subtract:
+	clrt
+	subc	r7,r5
+	
+	or	r1,r8
+	subc	r6,r4
+
+.L_skip:
+	shlr	r1
+	shll	r5
+
+	rotcl	r4
+	cmp/eq	r1,r3
+
+	bf	.L_loop
+	mov.l	.L_imp_bit,r1
+
+	!chk if the division was for the higher word of the quotient
+	tst	r1,r9
+	bf	.L_chk_exp
+
+	mov	r8,r9
+	mov.l   .L_mask_sign,r1
+
+	!divide for the lower word of the quotient
+	bra	.L_loop
+	mov	r3,r8
+
+.L_chk_exp:
+	!chk if the result needs to be denormalized
+	cmp/gt	r2,r3
+	bf	.L_round
+	mov     #-53,r7
+
+.L_underflow:
+	!denormalize the result
+	add	#1,r2
+	cmp/gt	r2,r7
+
+	or      r4,r5           !remainder
+	add	#-2,r2
+
+	mov	#32,r4
+	bt      .L_return_zero
+
+	add	r2,r4
+	cmp/ge	r3,r4
+
+	mov	r2,r7
+	mov	r3,r1
+
+	mov     #-54,r2
+	bt	.L_denorm
+	mov	#-32,r7
+
+.L_denorm:
+	shlr	r8
+	rotcr	r1
+
+	shll	r8
+	add     #1,r7
+
+	shlr	r9
+	rotcr	r8
+
+	cmp/eq	r3,r7
+	bf	.L_denorm
+
+	mov	r4,r7
+	cmp/eq	r2,r4
+
+	bt	.L_break
+	mov     r3,r6
+
+	cmp/gt	r7,r3
+	bf	.L_break
+
+	mov	r2,r4
+	mov	r1,r6
+
+	mov	r3,r1
+	bt	.L_denorm
+
+.L_break:
+	mov     #0,r2
+
+	cmp/gt	r1,r2
+
+	addc	r2,r8
+	mov.l   .L_comp_1,r4
+
+	addc	r3,r9		
+	or	r9,r0
+
+	cmp/eq	r5,r3
+	bf	.L_return	
+
+	cmp/eq	r3,r6
+	mov.l	.L_mask_sign,r7
+
+	bf	.L_return
+	cmp/eq	r7,r1
+
+	bf	.L_return
+	and	r4,r8
+
+.L_return:
+	mov.l	@r15+,r9
+	mov     r8,r1
+
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+	rts
+	mov.l   @r15+,r8
+
+.L_norm_a:
+        !normalize op1
+        shll    r5
+        mov.l   .L_imp_bit,r3
+
+        rotcl   r4
+        tst     r3,r4
+
+        add     #-1,r2
+        bt      .L_norm_a
+
+        bra     .L_chk_b
+        add     #1,r2
+
+.L_overflow:
+	!overflow, return inf
+	mov.l   .L_inf,r2
+#ifdef __LITTLE_ENDIAN__
+	or	r2,r1	
+	mov	#0,r0
+#else
+	or	r2,r0
+	mov	#0,r1
+#endif
+        mov.l   @r15+,r9
+        rts
+        mov.l   @r15+,r8
+
+.L_den_one:
+	!denominator=1, result=numerator
+        mov     r4,r9
+        mov   	#-53,r7
+
+	cmp/ge	r2,r8
+	mov	r8,r4
+
+	mov	r5,r8
+	mov	r4,r3
+
+	!chk the exponent for underflow
+	SL(bt,	.L_underflow,
+	 mov     r4,r5)
+
+	mov.l	.L_high_mant,r7
+        bra     .L_pack
+	mov     #20,r6
+
+.L_return_zero:
+	!return zero
+	mov	r3,r1
+	mov.l	@r15+,r9
+
+	rts
+	mov.l   @r15+,r8
+
+.L_round:
+	!apply rounding
+	cmp/eq	r4,r6
+	bt	.L_lower
+
+	clrt
+	subc    r6,r4
+
+	bra     .L_rounding
+	mov	r4,r6
+	
+.L_lower:
+	clrt
+	subc	r7,r5
+	mov	r5,r6
+	
+.L_rounding:
+	!apply rounding
+	mov.l   .L_invert,r1
+	mov	r3,r4
+
+	movt	r3
+	clrt
+	
+	not	r3,r3
+	and	r1,r3	
+
+	addc	r3,r8
+	mov.l   .L_high_mant,r7
+
+	addc	r4,r9
+	cmp/eq	r4,r6
+
+	mov.l   .L_comp_1,r3
+	SL (bf,	.L_pack,
+	 mov     #20,r6)
+	and	r3,r8
+
+.L_pack:
+	!pack the result, r2=exponent,r0=sign,r8=lower mantissa, r9=higher mantissa
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+	shld    r6,r2
+#else
+        SHLL20 (r2)
+#endif
+	and	r7,r9
+
+	or	r2,r0
+	mov	r8,r1
+
+	or      r9,r0
+	mov.l	@r15+,r9
+
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+	rts
+	mov.l	@r15+,r8
+
+	.align	2
+
+.L_mask_sign:
+	.long	0x80000000
+.L_high_mant:
+	.long	0x000fffff
+.L_inf:
+	.long	0x7ff00000
+.L_1023:
+	.long	1023
+.L_2047:
+	.long	2047
+.L_imp_bit:
+	.long	0x00100000	
+.L_comp_1:
+	.long	0xfffffffe
+.L_invert:
+	.long	0x00000001
+
+ENDFUNC (GLOBAL (divdf3))
Index: gcc/config/sh/IEEE-754/floatunssisf.S
===================================================================
--- gcc/config/sh/IEEE-754/floatunssisf.S	(revision 0)
+++ gcc/config/sh/IEEE-754/floatunssisf.S	(revision 0)
@@ -0,0 +1,137 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Conversion of unsigned integer to floating point
+
+! Author: Rakesh Kumar
+
+! Argument: r4
+! Result: r0
+
+! r4 is referred as op1
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
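+
+! A 32-bit value may need up to 8 bits shifted out to fit the 24-bit
+! significand, so the shift-right path collects the discarded bits in r7
+! and rounds to nearest, ties to even.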
+
+        .text
+        .align 5
+        .global GLOBAL (floatunsisf)
+	FUNC (GLOBAL (floatunsisf))
+
+GLOBAL (floatunsisf):
+	tst	r4,r4
+	mov	#23,r6
+
+	mov.l	.L_set_24_bits,r7
+	SL(bt,	.L_return,
+	 not	r7,r3)
+
+	! Decide the direction for shifting
+	mov.l	.L_set_24_bit,r5
+	cmp/hi	r7,r4
+
+	not	r5,r2
+	SL(bt,	.L_shift_right,
+	 mov	#0,r7)
+
+	tst	r5,r4
+	
+	mov	#0,r0
+	bf	.L_pack_sf
+
+! Shift the bits to the left. Adjust the exponent
+.L_shift_left:
+	shll	r4
+	tst	r5,r4
+
+	add	#-1,r6
+	bt	.L_shift_left
+
+! Pack the value in floating point format.
+! r6 has unbiased exponent, r4 has mantissa
+.L_pack_sf:
+	mov	#23,r3
+	add	#127,r6
+
+	! Align the exponent
+	and	r2,r4
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+        SHLL23 (r6)
+#else
+	shld	r3,r6
+#endif
+
+	or	r6,r0
+	rts
+	or	r4,r0
+
+! Shift right the number with rounding
+.L_shift_right:
+	shlr	r4
+	rotcr	r7
+
+	tst	r4,r3
+	add	#1,r6
+
+	bf	.L_shift_right
+	
+	tst	r7,r7
+	bt	.L_sh_rt_1
+
+	shll	r7
+	movt	r1
+
+	add	r1,r4
+
+	tst	r7,r7
+	bf	.L_sh_rt_1
+
+	! Halfway between two numbers.
+	! Round towards LSB = 0
+	shlr	r4
+	shll	r4
+
+.L_sh_rt_1:
+	mov	r4,r0
+
+	! Rounding may have misplaced MSB. Adjust.
+	and	r3,r0
+	cmp/eq	#0,r0
+
+	bf	.L_shift_right
+	bt	.L_pack_sf
+
+.L_return:
+	rts
+	mov	r4,r0
+
+	.align 2
+.L_set_24_bit:
+	.long 0x00800000
+
+.L_set_24_bits:
+	.long 0x00FFFFFF
+
+ENDFUNC (GLOBAL (floatunsisf))
Index: gcc/config/sh/IEEE-754/fixunsdfsi.S
===================================================================
--- gcc/config/sh/IEEE-754/fixunsdfsi.S	(revision 0)
+++ gcc/config/sh/IEEE-754/fixunsdfsi.S	(revision 0)
@@ -0,0 +1,181 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!conversion of double precision floating point number to unsigned integer
+!Author:Aanchal Khanna
+!
+!Entry:
+!r4,r5:operand
+!
+!Exit:
+!r0:result
+!
+!Note:argument is passed in regs r4 and r5, the result is returned in
+!reg r0.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
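+
+!Outline: negative inputs, exponents below 1023 and NaNs return 0; +Inf
+!or exponents above 1054 return 0xffffffff; otherwise the high fraction
+!(with the implicit 1 set) is shifted up to bit 31 and back down by
+!1054 - exp, merging in low-word bits when fewer than 11 shifts remain.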
+
+	.text
+	.align 5
+	.global GLOBAL (fixunsdfsi)
+	FUNC (GLOBAL (fixunsdfsi))
+
+GLOBAL (fixunsdfsi):
+
+#ifdef  __LITTLE_ENDIAN__
+        mov     r4,r1
+        mov     r5,r4
+        mov     r1,r5
+#endif
+	mov.l	.L_p_inf,r2
+	mov     #-20,r1
+	
+	mov	r2,r7
+	mov.l   .L_1023,r3
+
+	and	r4,r2
+	shll    r4
+
+        movt    r6		! r6 contains the sign bit
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r1,r2           ! r2 contains the exponent
+#else
+        SHLR20 (r2)
+#endif
+	shlr    r4
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r1,r7
+#else
+        SHLR20 (r7)
+#endif
+	tst	r6,r6	
+	SL(bf,	.L_epil,
+	 mov	#0,r0)
+
+	cmp/hi	r2,r3		! if exp < 1023,return 0
+	mov.l	.L_high_mant,r1
+
+	SL(bt,	.L_epil,
+	 and	r4,r1)		! r1 contains high mantissa
+
+	cmp/eq	r2,r7		! chk if exp is invalid
+	mov.l	.L_1054,r7
+
+	bt	.L_inv_exp
+	mov	#11,r0
+	
+	cmp/hi	r7,r2		! If exp > 1054,return maxint
+	sub     r2,r7		!r7 contains the number of shifts
+
+	mov.l	.L_21bit,r2
+	bt	.L_ret_max
+
+	or	r2,r1
+	mov	r7,r3
+
+	shll8   r1
+	neg     r7,r7
+
+	shll2	r1
+
+        shll	r1
+	cmp/hi	r3,r0
+
+	SL(bt,	.L_lower_mant,
+	 mov	#21,r0)
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r7,r1
+#else
+.L_sh_loop:
+        tst	r7,r7
+        bt      .L_break
+        add     #1,r7
+        bra     .L_sh_loop
+        shlr    r1
+
+.L_break:
+#endif
+	rts
+	mov     r1,r0
+
+.L_lower_mant:
+	neg	r0,r0
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r0,r5
+#else
+        SHLR21 (r5)
+#endif
+	or	r5,r1		!pack lower and higher mantissas
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r7,r1
+#else
+.L_loop:
+        tst	r7,r7
+        bt      .L_break1
+        add     #1,r7
+        bra     .L_loop
+        shlr    r1
+
+.L_break1:
+#endif
+	mov	r1,r0
+.L_epil:
+	rts
+	nop
+
+.L_inv_exp:
+	cmp/hi	r0,r5
+	bt	.L_epil
+
+	cmp/hi	r0,r1		!compare high mantissa,r1
+	bt	.L_epil
+
+.L_ret_max:
+	mov.l   .L_maxint,r0
+
+	rts
+	nop
+
+	.align	2
+
+.L_maxint:
+	.long	0xffffffff
+.L_p_inf:
+	.long	0x7ff00000
+.L_high_mant:
+	.long	0x000fffff
+.L_1023:
+	.long	0x000003ff
+.L_1054:
+	.long	1054
+.L_21bit:
+	.long	0x00100000
+
+ENDFUNC (GLOBAL (fixunsdfsi))
Index: gcc/config/sh/IEEE-754/adddf3.S
===================================================================
--- gcc/config/sh/IEEE-754/adddf3.S	(revision 0)
+++ gcc/config/sh/IEEE-754/adddf3.S	(revision 0)
@@ -0,0 +1,799 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Routine for adding two double numbers
+
+! Author: Rakesh Kumar
+! SH1 Support by Joern Rennecke
+! Sticky Bit handling : Joern Rennecke
+
+! Arguments: r4-r5, r6-r7
+! Result: r0-r1
+
+! The value in r4-r5 is referred to as op1
+! and that in r6-r7 is referred to as op2
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
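+
+! Outline: the operands are split into sign, exponent and fraction; the
+! fraction of a negative operand is negated so that one two's-complement
+! add covers both addition and subtraction, with guard and sticky bits
+! collected in r2 for the final rounding.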
+
+	.text
+        .align 5
+	.global	GLOBAL (subdf3)
+	FUNC (GLOBAL (subdf3))
+        .global GLOBAL (adddf3)
+	FUNC (GLOBAL (adddf3))
+
+GLOBAL (subdf3):
+#ifdef __LITTLE_ENDIAN__
+	mov	r4,r1
+	mov	r6,r2
+
+	mov	r5,r4
+	mov	r7,r6
+
+	mov	r1,r5
+	mov	r2,r7
+#endif
+	mov.l	.L_sign,r2
+	bra	.L_adddf3_1
+	xor	r2,r6
+
+GLOBAL (adddf3):
+#ifdef	__LITTLE_ENDIAN__
+	mov	r4,r1
+	mov	r6,r2
+
+	mov	r5,r4
+	mov	r7,r6
+
+	mov	r1,r5
+	mov	r2,r7
+#endif
+	
+.L_adddf3_1:
+	mov.l	r8,@-r15
+	mov	r4,r1
+
+	mov.l 	.L_inf,r2
+	mov	r6,r3
+
+	mov.l	r9,@-r15
+	and	r2,r1		!Exponent of op1 in r1
+
+	mov.l	r10,@-r15
+	and	r2,r3		!Exponent of op2 in r3
+
+	! Check for Nan or Infinity
+	mov.l	.L_sign,r9
+	cmp/eq	r2,r1
+
+	mov	r9,r10
+	bt	.L_thread_inv_exp_op1
+
+	mov	r9,r0
+	cmp/eq	r2,r3
+! op1 has a valid exponent. We need not check it again.
+! Return op2 straight away.
+	and	r4,r9		!r9 has sign bit for op1
+	bt	.L_ret_op2
+
+	! Check for -ve zero
+	cmp/eq	r4,r0
+	and	r6,r10		!r10 has sign bit for op2
+
+	bt	.L_op1_nzero
+
+	cmp/eq	r6,r0
+	bt	.L_op2_nzero
+
+! Check for zero
+.L_non_zero:
+	tst	r4,r4
+	bt	.L_op1_zero
+
+	! op1 is not zero, check op2 for zero
+	tst	r6,r6
+	bt	.L_op2_zero
+
+! r1 and r3 have the masked-out exponents, r9 and r10 the signs
+.L_add:
+	mov.l	.L_high_mant,r8
+	mov	#-20,r2
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+	shld	r2,r1		! r1 now has exponent for op1 in its lower bits
+#else
+	SHLR20 (r1)
+#endif
+	and	r8,r6	! Higher bits of mantissa of op2
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+	shld	r2,r3		! r3 has exponent for op2 in its lower bits
+#else
+	SHLR20 (r3)
+#endif
+	and	r8,r4	! Higher bits of mantissa of op1
+
+	mov.l	.L_21bit,r8
+
+	tst	r1,r1
+	bt	.L_norm_op1
+
+	! Set the 21st bit.
+	or	r8,r4
+	tst	r3,r3
+
+	bt	.L_norm_op2
+	or	r8,r6
+
+! Check for negative mantissas. Make them positive by negation
+! r9 and r10 have signs of op1 and op2 respectively
+.L_neg_mant:
+	tst	r9,r9
+	bf	.L_neg_op1
+
+	tst	r10,r10
+	bf	.L_neg_op2
+
+.L_add_1:
+	cmp/ge	r1,r3
+
+	mov	r1,r0
+	bt	.L_op2_exp_greater
+
+	sub	r3,r0
+	! If exponent difference is greater than 54, the resultant exponent
+	! won't be changed. Return op1 straight away.
+	mov	#54,r2
+	cmp/gt	r2,r0
+
+	bt	.L_pack_op1
+
+	mov	r1,r3
+	clrt
+
+	cmp/eq	#0,r0
+	bt	.L_add_mant
+
+	! Shift left the first operand and apply the rest of the shifts to
+	! the second operand.
+	mov	#0,r2
+	shll	r5
+
+	rotcl	r4
+
+	add	#-1,r3
+	dt	r0
+
+	bt	.L_add_mant
+	dt	r0
+
+	bt	LOCAL(got_guard)
+	dt	r0
+
+	bt	LOCAL(got_sticky)
+
+! Shift the mantissa part of op2 so that both exponents are equal
+.L_shfrac_op2:
+	shar	r6
+	or	r7,r2	! sticky bit
+
+	rotcr	r7
+	dt	r0
+
+	bf	.L_shfrac_op2
+
+	shlr	r2
+
+	subc	r2,r2	! spread sticky bit across r2
+LOCAL(got_sticky):
+	shar	r6
+
+	rotcr	r7
+
+	rotcr	r2
+LOCAL(got_guard):
+	shar	r6
+
+	rotcr	r7
+
+	rotcr	r2
+
+
+! Add the positive mantissas and check for overflow by testing the
+! MSB of the result. In case of overflow, negate the result.
+.L_add_mant:
+	clrt
+	addc	r7,r5
+
+	mov	#0,r10	! Assume resultant to be positive
+	addc	r6,r4
+
+	cmp/pz	r4
+
+	bt	.L_mant_ptv
+	negc	r2,r2
+
+	negc	r5,r5
+
+	mov.l	.L_sign,r10 ! The assumption was wrong, result is negative
+	negc	r4,r4
+
+! 23rd bit in the high part of mantissa could be set.
+! In this case, right shift the mantissa.
+.L_mant_ptv:
+	mov.l	.L_23bit,r0
+
+	tst	r4,r0
+	bt	.L_mant_ptv_0
+
+	shlr	r4
+	rotcr	r5
+
+	add	#1,r3
+	bra	.L_mant_ptv_1
+	rotcr	r2
+
+.L_mant_ptv_0:
+	mov.l	.L_22bit,r0
+	tst	r4,r0
+
+	bt	.L_norm_mant
+
+.L_mant_ptv_1:
+	! Bit 22 of the resultant mantissa is set. Shift the mantissa right
+	! and add 1 to the exponent
+	add	#1,r3
+	shlr	r4
+	rotcr	r5
+	! The mantissa is already normalized. We don't need to
+	! spend any effort. Branch to epilogue. 
+	bra	.L_epil
+	rotcr	r2
+
+! Normalize operands
+.L_norm_op1:
+	shll	r5
+
+	rotcl	r4
+	add	#-1,r1
+
+	tst	r4,r8
+	bt	.L_norm_op1
+
+	tst	r3,r3
+	SL(bf,	.L_neg_mant,
+	 add	#1,r1)
+
+.L_norm_op2:
+	shll	r7
+
+	rotcl	r6
+	add	#-1,r3
+
+	tst	r6,r8
+	bt	.L_norm_op2
+
+	bra	.L_neg_mant
+	add	#1,r3
+
+! Negate the mantissa of op1
+.L_neg_op1:
+	clrt
+	negc	r5,r5
+
+	negc	r4,r4
+	tst	r10,r10
+
+	bt	.L_add_1
+
+! Negate the mantissa of op2
+.L_neg_op2:
+	clrt
+	negc	r7,r7
+
+	bra	.L_add_1
+	negc	r6,r6
+
+! Thread the jump to .L_inv_exp_op1
+.L_thread_inv_exp_op1:
+	bra	.L_inv_exp_op1
+	nop
+
+.L_ret_op2:
+	mov.l	@r15+,r10
+#ifdef	__LITTLE_ENDIAN__
+	mov	r6,r1
+#else
+	mov	r6,r0
+#endif
+
+	mov.l	@r15+,r9
+#ifdef	__LITTLE_ENDIAN__
+	mov	r7,r0
+#else
+	mov	r7,r1
+#endif
+
+	rts
+	mov.l	@r15+,r8
+
+.L_op1_nzero:
+	tst	r5,r5
+	bt	.L_ret_op2
+
+	! op1 is not zero. Check op2 for negative zero
+	cmp/eq	r6,r0
+	bf	.L_non_zero	! both op1 and op2 are not -0
+
+.L_op2_nzero:
+	tst	r7,r7
+	bf	.L_non_zero
+
+	mov.l	@r15+,r10
+#ifdef	__LITTLE_ENDIAN__
+	mov	r4,r1
+#else
+	mov	r4,r0	! op2 is -0, return op1
+#endif
+
+	mov.l	@r15+,r9
+#ifdef	__LITTLE_ENDIAN__
+	mov	r5,r0
+#else
+	mov	r5,r1
+#endif
+
+	rts
+	mov.l	@r15+,r8
+
+! High bit of op1 is known to be zero.
+! Check low bit. r2 contains 0x00000000
+.L_op1_zero:
+	tst	r5,r5
+	bt	.L_ret_op2
+
+	! op1 is not zero. Check high bit of op2
+	tst	r6,r6
+	bf	.L_add	! both op1 and op2 are not zero
+
+! op1 is not zero. High bit of op2 is known to be zero.
+! Check low bit of op2. r2 contains 0x00000000
+.L_op2_zero:
+	tst	r7,r7
+	bf	.L_add
+
+	mov.l	@r15+,r10
+#ifdef	__LITTLE_ENDIAN__
+	mov	r4,r1
+#else
+	mov	r4,r0	! op2 is zero, return op1
+#endif
+
+	mov.l	@r15+,r9
+#ifdef	__LITTLE_ENDIAN__
+	mov	r5,r0
+#else
+	mov	r5,r1
+#endif
+
+	rts
+	mov.l	@r15+,r8
+
+! exp (op1) is smaller or equal to exp (op2)
+! The same operations are performed in .L_add; refer to it for the
+! comments
+.L_op2_exp_greater:
+	mov	r3,r0
+	sub	r1,r0
+
+	mov	#54,r2
+	cmp/gt	r2,r0
+
+	bt	.L_pack_op2
+
+	cmp/eq	#0,r0
+	bt	.L_add_mant
+
+	mov	#0,r2
+	shll	r7
+	rotcl	r6
+	add	#-1,r0
+	add	#-1,r3
+
+	cmp/eq	#0,r0
+	bt	.L_add_mant
+.L_shfrac_op1:	
+        add     #-1,r0
+        shar    r4
+
+	rotcr	r5
+	rotcr	r2
+
+        cmp/eq  #0,r0
+        bf      .L_shfrac_op1
+
+	bra	.L_add_mant
+	nop
+
+! Return the value in op1
+.L_ret_op1:
+        mov.l   @r15+,r10
+#ifdef	__LITTLE_ENDIAN__
+	mov	r4,r1
+#else
+        mov     r4,r0
+#endif
+
+        mov.l   @r15+,r9
+#ifdef	__LITTLE_ENDIAN__
+	mov	r5,r0
+#else
+        mov     r5,r1
+#endif
+
+        rts
+        mov.l   @r15+,r8
+
+! r1 has exp, r9 has sign, r4 and r5 mantissa
+.L_pack_op1:
+	mov.l	.L_high_mant,r7
+	mov	r4,r0
+
+	tst	r9,r9
+	bt	.L_pack_op1_1
+
+	clrt
+	negc	r5,r5
+	negc	r0,r0
+
+.L_pack_op1_1:
+	and	r7,r0
+	mov	r1,r3
+
+	mov	#20,r2
+	mov	r5,r1
+
+	mov.l	@r15+,r10
+	or	r9,r0
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+	shld	r2,r3
+#else
+	SHLL20 (r3)
+#endif
+	mov.l	@r15+,r9
+
+	or	r3,r0
+#ifdef	__LITTLE_ENDIAN__
+	mov	r0,r2
+	mov	r1,r0
+	mov	r2,r1
+#endif
+	rts
+	mov.l	@r15+,r8
+
+!r2 has exp, r10 has sign, r6 and r7 mantissa
+.L_pack_op2:
+	mov.l	.L_high_mant,r9
+	mov	r6,r0
+
+	tst	r10,r10
+	bt	.L_pack_op2_1
+
+	clrt
+	negc	r7,r7
+	negc	r0,r0
+
+.L_pack_op2_1:
+	and	r9,r0
+	mov	r7,r1
+
+	mov	#20,r2
+	or	r10,r0
+
+	mov.l	@r15+,r10
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+	shld	r2,r3
+#else
+	SHLL20 (r3)
+#endif
+
+	mov.l	@r15+,r9
+
+	or	r3,r0
+#ifdef	__LITTLE_ENDIAN__
+	mov	r0,r2
+	mov	r1,r0
+	mov	r2,r1
+#endif
+	rts
+	mov.l	@r15+,r8
+
+! Normalize the mantissa by setting its 21st bit in the high part
+.L_norm_mant:
+	mov.l	.L_21bit,r0
+
+	tst	r4,r0
+	bf	.L_epil
+
+	tst	r4,r4
+	bf	.L_shift_till_1
+
+	tst	r5,r5
+	bf	.L_shift_till_1
+
+	! Mantissa is zero, return 0
+	mov.l	@r15+,r10
+	mov	#0,r0
+
+	mov.l	@r15+,r9
+	mov.l	@r15+,r8
+
+	rts
+	mov	#0,r1
+
+! A loop for making the 21st bit 1 in the high part of the resultant
+! mantissa. It is already ensured that a set bit is present in the mantissa
+.L_shift_till_1:
+	clrt
+	shll	r5
+
+	rotcl	r4
+	add	#-1,r3
+
+	tst	r4,r0
+	bt	.L_shift_till_1
+
+! Return the result. Mantissa is in r4-r5. Exponent is in r3
+! Sign bit in r10
+.L_epil:
+	cmp/pl	r3
+
+	bf	.L_denorm
+	mov.l	LOCAL(x7fffffff),r0
+
+	mov	r5,r1
+	shlr	r1
+
+	mov	#0,r1
+	addc	r0,r2
+
+! Check extra MSB here
+	mov.l	.L_22bit,r9
+	addc	r1,r5	! round to even
+
+	addc	r1,r4
+	tst	r9,r4
+
+	bf	.L_epil_1
+
+.L_epil_0:
+	mov.l	.L_21bit,r1
+
+	not	r1,r1
+	and	r1,r4
+
+	mov	r4,r0
+	or	r10,r0
+
+	mov.l	@r15+,r10
+	mov	#20,r2
+
+	mov.l	@r15+,r9
+	mov	r5,r1
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+	shld	r2,r3
+#else
+	SHLL20 (r3)
+#endif
+	or	r3,r0
+
+#ifdef	__LITTLE_ENDIAN__
+	mov	r0,r2
+	mov	r1,r0
+	mov	r2,r1
+#endif
+	rts
+	mov.l	@r15+,r8
+
+.L_epil_1:
+	shlr	r4
+	add	#1,r3
+	bra	.L_epil_0
+	rotcr	r5
+
+.L_denorm:
+	add	#-1,r3
+.L_denorm_1:
+	tst	r3,r3
+	bt	.L_denorm_2
+
+	shlr	r4
+	rotcr	r5
+
+	movt	r1
+	bra	.L_denorm_1
+	add	#1,r3
+
+.L_denorm_2:
+	clrt
+	mov	#0,r2
+	addc	r1,r5
+
+	addc	r2,r4
+	mov	r4,r0
+
+	or	r10,r0
+	mov.l	@r15+,r10
+
+	mov	r5,r1
+
+	mov.l	@r15+,r9
+#ifdef	__LITTLE_ENDIAN__
+	mov	r0,r2
+	mov	r1,r0
+	mov	r2,r1
+#endif
+	rts
+	mov.l	@r15+,r8
+
+! op1 is known to be positive infinity, and op2 is Inf. The sign
+! of op2 is not known. Return the appropriate value
+.L_op1_pinf_op2_inf:
+	mov.l	.L_sign,r0
+	tst	r6,r0
+
+	bt	.L_ret_op2_1
+
+	! op2 is negative infinity. Inf - Inf is being performed
+	mov.l	.L_inf,r0
+	mov.l	@r15+,r10
+	mov.l	@r15+,r9
+#ifdef	__LITTLE_ENDIAN__
+	mov	r0,r1
+#endif
+	mov.l	@r15+,r8
+
+	rts
+#ifdef	__LITTLE_ENDIAN__
+	mov	#1,r0
+#else
+	mov	#1,r1	! Any value here will return NaN
+#endif
+	
+.L_ret_op1_1:
+        mov.l   @r15+,r10
+#ifdef	__LITTLE_ENDIAN__
+	mov	r4,r1
+#else
+        mov     r4,r0
+#endif
+
+        mov.l   @r15+,r9
+#ifdef	__LITTLE_ENDIAN__
+	mov	r5,r0
+#else
+        mov     r5,r1
+#endif
+
+        rts
+        mov.l   @r15+,r8
+
+.L_ret_op2_1:
+	mov.l	@r15+,r10
+#ifdef	__LITTLE_ENDIAN__
+	mov	r6,r1
+#else
+	mov	r6,r0
+#endif
+
+	mov.l	@r15+,r9
+#ifdef	__LITTLE_ENDIAN__
+	mov	r7,r0
+#else
+	mov	r7,r1
+#endif
+
+	rts
+	mov.l	@r15+,r8
+
+! op1 is negative infinity. Check op2 for infinity or NaN
+.L_op1_ninf:
+	cmp/eq	r2,r3
+	bf	.L_ret_op1_1	! op2 is neither NaN nor Inf
+
+	mov.l	@r15+,r9
+	div0s	r4,r6		! different signs -> NaN
+	mov	r4,DBLRH
+	or	r6,DBLRH
+	mov.l	@r15+,r8
+	SL(bf, 0f,
+	 mov	r5,DBLRL)
+	mov	#-1,DBLRH	! return NaN.
+0:	rts
+	or	r7,DBLRL
+
+!r1 contains exponent for op1, r3 contains exponent for op2
+!r2 has .L_inf (+ve Inf)
+!op1 has an invalid exponent. It contains either NaN or Inf
+.L_inv_exp_op1:
+	! Check if op1 is NaN
+	cmp/pl	r5
+	bt	.L_ret_op1_1
+
+	mov.l	.L_high_mant,r0
+	and	r4,r0
+
+	cmp/pl	r0
+	bt	.L_ret_op1_1
+
+	! op1 is not NaN, it is infinity. Check its sign.
+	! If op2 is NaN, return op2
+	cmp/pz	r4
+
+	bf	.L_op1_ninf
+
+	! op2 is +ve infinity here
+	cmp/eq	r2,r3
+	bf	.L_ret_op1_1	! op2 is neither NaN nor Inf
+
+	! r2 is free now
+	mov.l	.L_high_mant,r0
+	tst	r6,r0		! op2 also has invalid exponent
+
+	bf	.L_ret_op2_1	! op2 has mantissa bits set: it is NaN, return op2
+
+	tst	r7,r7
+	bt	.L_op1_pinf_op2_inf	! op2 is Infinity, and op1 is +Infinity
+	!op2 is not infinity, it is NaN
+	bf	.L_ret_op2_1
+
+	.align 2	
+.L_high_mant:
+	.long 0x000FFFFF
+
+.L_21bits:
+	.long 0x001FFFFF
+
+.L_22bit:
+	.long 0x00200000
+
+.L_23bit:
+	.long 0x00400000
+
+.L_21bit:
+	.long 0x00100000
+
+.L_sign:
+	.long 0x80000000
+
+.L_inf:
+	.long 0x7ff00000
+
+LOCAL(x7fffffff): .long 0x7fffffff
+
+ENDFUNC (GLOBAL (subdf3))
+ENDFUNC (GLOBAL (adddf3))
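
[For reference, not part of the patch: the .L_epil path above implements
round-to-nearest-even -- the 0x7fffffff add turns the guard/sticky word plus
the mantissa LSB into a carry exactly when rounding up is required.  The same
decision in C; the helper name and the guard/sticky split are illustrative.]

  #include <stdint.h>

  /* Round-to-nearest-even: round up when the guard bit is set and
     either a lower bit was shifted out (sticky) or the LSB is odd.  */
  static uint64_t
  round_even (uint64_t mant, int guard, int sticky)
  {
    if (guard && (sticky || (mant & 1)))
      mant++;
    return mant;
  }
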
Index: gcc/config/sh/IEEE-754/floatsisf.S
===================================================================
--- gcc/config/sh/IEEE-754/floatsisf.S	(revision 0)
+++ gcc/config/sh/IEEE-754/floatsisf.S	(revision 0)
@@ -0,0 +1,200 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Conversion of integer to floating point
+
+! Author: Rakesh Kumar
+
+! Argument: r4
+! Result: r0
+
+! r4 is referred to as op1
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+        .text
+        .align 5
+        .global GLOBAL (floatsisf)
+        FUNC (GLOBAL (floatsisf))
+
+GLOBAL (floatsisf):
+	mov.l	.L_sign,r2
+	mov	#23,r6
+
+	! Check for zero
+	tst	r4,r4
+	mov.l	.L_24_bits,r7
+
+	! Extract sign
+	and	r4,r2
+	bt	.L_ret
+
+	! Negative?
+	mov.l	.L_imp_bit,r5
+	cmp/pl	r4
+
+	not	r7,r3
+	bf	.L_neg
+
+	! Decide the direction for shifting
+	cmp/gt	r7,r4
+	mov	r4,r0
+
+	and	r5,r0
+	bt	.L_shr_0
+
+	! Number may already be in normalized form
+	cmp/eq	#0,r0
+	bf	.L_pack
+
+! Shift the bits to the left. Adjust the exponent
+.L_shl:
+	shll	r4
+	mov	r4,r0
+
+	and	r5,r0
+	cmp/eq	#0,r0
+
+	SL(bt,	.L_shl,
+	 add	#-1,r6)
+
+! Pack the value in floating point format.
+! r6 has unbiased exponent, r4 has mantissa, r2 has sign
+.L_pack:
+	mov	#23,r3
+	not	r5,r5
+
+	mov	r2,r0
+	add	#127,r6
+
+	and	r5,r4
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+	SHLL23 (r6)
+#else
+	shld	r3,r6
+#endif
+
+	or	r6,r0
+	rts
+	or	r4,r0
+
+! Negate the number
+.L_neg:
+	! Take care for -2147483648.
+	mov	r4,r0
+	shll	r0
+	
+	cmp/eq	#0,r0
+	SL(bt,	.L_ret_min,
+	 neg	r4,r4)
+
+        cmp/gt  r7,r4
+        bt	.L_shr_0
+
+	mov	r4,r0
+	and	r5,r0
+
+	cmp/eq	#0,r0
+	bf	.L_pack
+	bt	.L_shl
+	
+.L_shr_0:
+	mov	#0,r1
+
+! Shift right the number with rounding
+.L_shr:
+	shlr	r4
+	movt	r7
+
+	tst	r7,r7
+
+	! Count number of ON bits shifted
+	bt	.L_shr_1
+	add	#1,r1
+
+.L_shr_1:
+	mov	r4,r0
+	add	#1,r6
+
+	and	r3,r0
+	cmp/eq	#0,r0
+
+	! Add MSB of shifted bits
+	bf	.L_shr
+	add	r7,r4
+
+	tst	r7,r7
+	bt	.L_pack
+
+.L_pack1:
+	mov	#1,r0
+	cmp/eq	r1,r0
+
+	bt	.L_rnd
+	mov	r4,r0
+
+	! Rounding may have misplaced MSB. Adjust.
+	and	r3,r0
+	cmp/eq	#0,r0
+
+	bf	.L_shr
+	bt	.L_pack
+
+! If only MSB of shifted bits is ON, we are halfway
+! between two numbers. Round towards even LSB of
+! resultant mantissa.
+.L_rnd:
+	shlr	r4
+	bra	.L_pack
+	shll	r4
+
+.L_ret:
+	rts
+	mov	r4,r0
+
+! Return value for -2147483648
+.L_ret_min:
+	mov.l	.L_min_val,r0
+	rts
+	nop
+
+	.align 2
+.L_sign:
+	.long 0x80000000
+
+.L_imp_bit:
+	.long 0x00800000
+
+.L_24_bits:
+	.long 0x00FFFFFF
+
+.L_nsign:
+	.long 0x7FFFFFFF
+
+.L_min_val:
+	.long 0xCF000000
+
+ENDFUNC (GLOBAL (floatsisf))
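
[For reference, not part of the patch: the flow above modelled in C, assuming
the usual IEEE-754 single format.  `my_floatsisf' and `pack_sf' are
illustrative names, and the right-shift path omits the rounding the assembly
performs.]

  #include <stdint.h>

  static uint32_t
  pack_sf (uint32_t sign, int exp, uint32_t mant)
  {
    return sign | ((uint32_t)(exp + 127) << 23) | (mant & 0x007FFFFF);
  }

  uint32_t
  my_floatsisf (int32_t x)
  {
    uint32_t sign = x < 0 ? 0x80000000u : 0;
    uint32_t m = x < 0 ? -(uint32_t)x : (uint32_t)x;  /* INT_MIN safe */
    int exp = 23;                    /* exponent if bit 23 is the MSB */
    if (m == 0)
      return 0;
    while (m > 0x00FFFFFF)           /* MSB above bit 23: shift right */
      { m >>= 1; exp++; }            /* (rounding omitted here) */
    while (!(m & 0x00800000))        /* MSB below bit 23: shift left */
      { m <<= 1; exp--; }
    return pack_sf (sign, exp, m);
  }
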
Index: gcc/config/sh/IEEE-754/muldf3.S
===================================================================
--- gcc/config/sh/IEEE-754/muldf3.S	(revision 0)
+++ gcc/config/sh/IEEE-754/muldf3.S	(revision 0)
@@ -0,0 +1,601 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!multiplication of two double precision floating point numbers
+!Author:Aanchal Khanna
+!SH1 Support / Simplifications: Joern Rennecke
+!
+!Entry:
+!r4,r5:operand 1
+!
+!r6,r7:operand 2
+!
+!Exit:
+!r0,r1:result
+!
+!Notes: argument 1 is passed in regs r4 and r5 and argument 2 is passed in regs
+!r6 and r7, result is returned in regs r0 and r1. operand 1 is referred as op1
+!and operand 2 as op2.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+	.text
+	.align	5	
+	.global	GLOBAL (muldf3)
+	FUNC (GLOBAL (muldf3))
+
+GLOBAL (muldf3):
+
+#ifdef  __LITTLE_ENDIAN__
+        mov     r4,r1
+        mov     r5,r4
+        mov     r1,r5
+
+        mov     r6,r1
+        mov     r7,r6
+        mov     r1,r7
+#endif
+	mov.l	.L_mask_sign,r0
+	mov	r4,r2
+
+	and	r0,r2		
+	mov	#0,r1
+
+	shll	r4
+	and	r6,r0		
+	
+	xor     r2,r0		!r0 contains the result's sign bit
+	shlr	r4
+
+	mov.l   .L_inf,r2
+	shll	r6
+
+	mov	r4,r3
+	shlr	r6
+	
+.L_chk_a_inv:
+	!chk if op1 is Inf/NaN
+	and	r2,r3
+	mov.l	r8,@-r15
+
+	cmp/eq	r3,r2
+	mov.l	.L_mask_high_mant,r8
+
+	mov	r2,r3
+	bf	.L_chk_b_inv
+
+	mov	r8,r3
+	and	r4,r8
+
+	cmp/hi  r1,r8		
+	bt	.L_return_a	!op1 NaN, return op1
+
+	cmp/hi  r1,r5	
+	mov	r2,r8
+
+	bt      .L_return_a	!op1 NaN, return op1
+	and	r6,r8
+
+	cmp/eq	r8,r2		
+	and	r6,r3
+
+	bt      .L_b_inv
+	cmp/eq	r1,r6		
+
+	bf	.L_return_a	!op1 Inf, op2 normal number: return op1
+	cmp/eq	r1,r7
+
+	bf	.L_return_a	!op1 Inf, op2 normal number: return op1
+	mov.l   @r15+,r8	
+
+	rts
+	mov	#-1,DBLRH	!op1=Inf, op2=0, return NaN
+
+.L_b_inv:
+	!op2 is NaN/Inf
+	cmp/hi	r1,r7
+	mov	r1,r2
+
+	mov	r5,r1
+	bt	.L_return_b	!op2=NaN,return op2
+
+	cmp/hi	r2,r6
+	or	r4,r0
+
+	bt	.L_return_b	!op2=NaN,return op2
+	mov.l   @r15+,r8
+
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+
+	rts			!op1=Inf,op2=Inf,return Inf with sign
+	nop
+
+.L_chk_b_inv:
+	!Chk if op2 is NaN/Inf
+	and	r6,r2
+	cmp/eq	r3,r2
+
+	bf	.L_chk_a_for_zero
+	and	r6,r8
+
+	cmp/hi	r1,r8
+	bt	.L_return_b	 !op2=NaN,return op2
+
+	cmp/hi	r1,r7
+	bt	.L_return_b	 !op2=NaN,return op2
+
+	cmp/eq	r5,r1
+	bf      .L_return_b	 !op1=normal number,op2=Inf,return Inf
+
+	mov	r7,r1
+	cmp/eq	r4,r1
+
+	bf	.L_return_b	/* op1=normal number, op2=Inf,return Inf */
+	mov.l   @r15+,r8
+
+	rts
+	mov	#-1,DBLRH	!op1=0,op2=Inf,return NaN
+
+.L_return_a:
+	mov	r5,r1
+	or	r4,r0
+
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+
+	rts
+	mov.l   @r15+,r8
+
+.L_return_b:
+	mov	r7,r1
+	or	r6,r0	
+	
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+
+	rts
+	mov.l	@r15+,r8
+	
+.L_chk_a_for_zero:
+	!Chk if op1 is zero
+	cmp/eq	r1,r4
+	bf	.L_chk_b_for_zero
+	
+	cmp/eq	r1,r5
+	bf	.L_chk_b_for_zero
+
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+	rts
+	mov.l	@r15+,r8
+
+.L_chk_b_for_zero:
+	!op1=0,chk if op2 is zero
+        cmp/eq  r1,r6
+        mov	r1,r3
+	
+	mov.l   .L_inf,r1
+	bf      .L_normal_nos
+
+        cmp/eq  r3,r7
+        bf      .L_normal_nos
+
+	mov	r3,r1
+	mov.l   @r15+,r8
+
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+	rts
+	nop
+
+.L_normal_nos:
+	!op1 and op2 are normal nos
+	mov.l	r9,@-r15
+	mov	r4,r3
+
+	mov     #-20,r9	
+	and	r1,r3	
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r9,r2
+#else
+        SHLR20 (r2)
+#endif
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r9,r3
+#else
+        SHLR20 (r3)
+#endif
+	cmp/pl	r3
+
+	bf	.L_norm_a	!normalize op1
+.L_chk_b:	
+	cmp/pl	r2
+	bf	.L_norm_b	!normalize op2
+
+.L_mul1:
+	add	r3,r2
+	mov.l  .L_1023,r1
+	
+	!resultant exponent in r2
+	add     r1,r2
+	mov.l   .L_2047,r1	
+
+	!Chk the exponent for overflow
+	cmp/ge	r1,r2
+	and     r8,r4
+
+	bt	.L_return_inf
+	mov.l	.L_imp_bit,r1
+	
+	or	r1,r4		
+	and	r8,r6
+
+	or	r1,r6
+	clrt
+
+	!multiplying the mantissas
+	DMULU_SAVE
+	DMULUL	(r7,r5,r1) 	!bits 0-31 of product 	
+
+	DMULUH	(r3)
+	
+	DMULUL	(r4,r7,r8)
+
+	addc	r3,r8
+
+	DMULUH	(r3)
+
+	movt	r9
+	clrt
+
+	DMULUL	(r5,r6,r7)
+
+	addc	r7,r8		!bits 63-32 of product
+
+	movt	r7
+	add	r7,r9
+
+	DMULUH	(r7)
+
+	add	r7,r3
+
+	add	r9,r3
+	clrt
+
+	DMULUL	(r4,r6,r7)
+
+	addc	r7,r3		!bits 64-95 of product
+
+	DMULUH	(r7)
+	DMULU_RESTORE
+	
+	mov	#0,r5
+	addc	r5,r7		!bits 96-105 of product
+
+	cmp/eq	r5,r1
+	mov     #1,r4
+
+	bt	.L_skip
+	or	r4,r8
+.L_skip:
+	mov.l   .L_106_bit,r4
+	mov	r8,r9
+
+.L_chk_extra_msb:
+	!chk if extra MSB is generated
+	and     r7,r4
+	cmp/eq	r5,r4
+
+	mov     #12,r4
+	SL(bf,	.L_shift_rt_by_1,
+	 mov     #31,r5)
+	
+.L_pack_mantissa:
+	!scale the mantissa to 53 bits
+	mov	#-19,r6
+	mov.l	.L_mask_high_mant,r5
+
+        SHLRN (19, r6, r8)
+
+	and	r3,r5
+
+	shlr	r8
+	movt	r1
+
+        SHLLN (12, r4, r5)
+
+	add	#-1,r6
+
+	or	r5,r8		!lower bits of resulting mantissa
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r6,r3
+#else
+        SHLR20 (r3)
+#endif
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r4,r7
+#else
+        SHLL12 (r7)
+#endif
+	clrt
+
+	or	r7,r3		!higher bits of resulting mantissa
+	mov     #0,r7
+
+	!chk the exponent for underflow
+	cmp/ge	r2,r7
+	bt	.L_underflow
+
+	addc    r1,r8           !rounding
+	mov	r8,r1
+
+	addc	r7,r3		!rounding
+	mov.l	.L_mask_22_bit,r5
+
+	and	r3,r5
+	!chk if extra msb is generated after rounding
+	cmp/eq	r7,r5
+
+	mov.l	.L_mask_high_mant,r8
+	bt	.L_pack_result
+
+	add	#1,r2
+	mov.l	.L_2047,r6
+
+	cmp/ge	r6,r2
+
+	bt	.L_return_inf
+	shlr	r3
+
+	rotcr	r1
+
+.L_pack_result:
+	!pack the result, r2=exponent, r3=higher mantissa, r1=lower mantissa
+	!r0=sign bit
+	mov	#20,r6
+	and	r8,r3
+	
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r6,r2
+#else
+        SHLL20 (r2)
+#endif
+	or	r3,r0
+	
+	or      r2,r0
+	mov.l   @r15+,r9
+
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+	rts
+	mov.l   @r15+,r8
+
+.L_norm_a:
+	!normalize op1
+	shll	r5
+	mov.l	.L_imp_bit,r1
+
+	rotcl	r4
+	add	#-1,r3
+
+	tst	r1,r4
+	bt	.L_norm_a
+
+	bra	.L_chk_b
+	add	#1,r3
+
+.L_norm_b:
+	!normalize op2
+        shll    r7
+        mov.l   .L_imp_bit,r1
+
+        rotcl   r6
+        add     #-1,r2
+
+        tst     r1,r6
+        bt      .L_norm_b
+
+        bra     .L_mul1
+        add     #1,r2
+
+.L_shift_rt_by_1:
+	!adjust the extra msb
+
+	add     #1,r2           !add 1 to exponent
+	mov.l	.L_2047,r6
+
+	cmp/ge	r6,r2
+	mov	#20,r6
+
+	bt	.L_return_inf
+	shlr	r7		!r7 contains bit 96-105 of product
+
+	rotcr	r3		!r3 contains bit 64-95 of product
+
+	rotcr	r8		!r8 contains bit 32-63 of product
+	bra	.L_pack_mantissa
+
+	rotcr	r1		!r1 contains bit 31-0 of product
+
+.L_return_inf:
+	!return Inf
+	mov.l	.L_inf,r2
+	mov     #0,r1
+
+	or	r2,r0
+	mov.l   @r15+,r9
+
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+	rts
+	mov.l   @r15+,r8
+	
+.L_underflow:
+	!check if the result needs to be denormalized
+	mov	#-53,r1
+	add	#1,r2
+
+	cmp/gt	r2,r1
+	mov	#32,r4
+
+	add	#-2,r2
+	bt	.L_return_zero
+
+	add	r2,r4
+	mov	r7,r1
+	
+	cmp/ge	r7,r4
+	mov	r2,r6
+
+	mov	#-54,r2
+	bt	.L_denorm
+
+	mov	#-32,r6
+	
+.L_denorm:
+	!denormalize the result
+	shlr	r8
+	rotcr	r1	
+
+	shll	r8
+	add	#1,r6
+
+	shlr	r3
+	rotcr	r8
+
+	cmp/eq	r7,r6
+	bf	.L_denorm
+
+	mov	r4,r6
+	cmp/eq	r2,r4
+
+	bt	.L_break
+	mov	r7,r5
+
+	cmp/gt	r6,r7
+	bf	.L_break
+
+	mov	r2,r4
+	mov	r1,r5
+
+	mov	r7,r1
+	bt	.L_denorm
+
+.L_break:
+	mov	#0,r2
+
+	cmp/gt	r1,r2
+
+	addc	r2,r8
+	mov.l	.L_comp_1,r4
+	
+	addc	r7,r3
+	or	r3,r0
+
+	cmp/eq	r9,r7
+	bf	.L_return
+
+	cmp/eq	r7,r5
+	mov.l	.L_mask_sign,r6
+
+	bf	.L_return
+	cmp/eq	r1,r6
+	
+	bf	.L_return
+	and	r4,r8
+
+.L_return:
+	mov.l	@r15+,r9
+	mov	r8,r1
+
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+	rts
+	mov.l   @r15+,r8
+
+.L_return_zero:
+	mov.l	@r15+,r9
+	mov	r7,r1
+
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+
+	rts
+	mov.l	@r15+,r8
+
+	.align	2
+
+.L_mask_high_mant:
+	.long	0x000fffff
+.L_inf:
+	.long	0x7ff00000	
+.L_mask_sign:
+	.long	0x80000000
+.L_1023:
+	.long	-1023
+.L_2047:
+	.long	2047
+.L_imp_bit:
+	.long	0x00100000
+.L_mask_22_bit:
+	.long	0x00200000
+.L_106_bit:
+	.long	0x00000200
+.L_comp_1:
+	.long	0xfffffffe
+
+ENDFUNC (GLOBAL (muldf3))
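
[For reference, not part of the patch: what the DMULU sequence above computes,
sketched in C with a 128-bit type.  This assumes a host compiler with
__int128; rounding and packing are elided.]

  #include <stdint.h>

  typedef unsigned __int128 u128;

  /* Multiply two 53-bit significands (implicit bit set) to a 105/106-bit
     product; if the extra MSB appears, shift once and bump the exponent.  */
  static uint64_t
  mul_mant53 (uint64_t a, uint64_t b, int *exp)
  {
    u128 p = (u128)a * b;
    if (p >> 105)
      { p >>= 1; (*exp)++; }
    return (uint64_t)(p >> 52);      /* 53-bit result before rounding */
  }
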
Index: gcc/config/sh/IEEE-754/fixdfsi.S
===================================================================
--- gcc/config/sh/IEEE-754/fixdfsi.S	(revision 0)
+++ gcc/config/sh/IEEE-754/fixdfsi.S	(revision 0)
@@ -0,0 +1,200 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!conversion of double precision floating point number to signed integer
+!Author:Aanchal Khanna
+!
+!Entry:
+!r4,r5:operand
+!
+!Exit:
+!r0:result
+!
+!Note:argument is passed in regs r4 and r5, the result is returned in
+!reg r0.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+	.text
+	.align 	5
+	.global GLOBAL (fixdfsi)
+	FUNC (GLOBAL (fixdfsi))
+
+GLOBAL (fixdfsi):
+
+#ifdef  __LITTLE_ENDIAN__
+        mov     r4,r1
+        mov     r5,r4
+        mov     r1,r5
+
+#endif
+	mov.l	.L_p_inf,r2
+	mov     #-20,r1
+	
+	mov	r2,r7
+	mov.l   .L_1023,r3
+
+	and	r4,r2
+	shll    r4
+        
+	movt    r6		! r6 contains the sign bit
+	
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r1,r2		! r2 contains the exponent
+#else
+        SHLR20 (r2)
+#endif
+	 shlr    r4
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r1,r7
+#else
+        SHLR20 (r7)
+#endif
+	cmp/hi	r2,r3		! if exp < 1023,return 0
+	mov.l	.L_mask_high_mant,r1
+
+	SL(bt,	.L_epil,
+	 mov	#0,r0)
+	and	r4,r1		! r1 contains high mantissa
+
+	cmp/eq	r2,r7		! chk if exp is invalid
+	mov.l	.L_1053,r7
+
+	bt	.L_inv_exp
+	mov	#11,r0
+	
+	cmp/hi	r7,r2		! If exp > 1053,return maxint
+	sub     r2,r7
+
+	mov.l	.L_21bit,r2
+	SL(bt,	.L_ret_max,
+	 add	#1,r7)		! r7 contains the number of shifts
+
+	or	r2,r1
+	mov	r7,r3
+	shll8   r1
+
+	neg     r7,r7
+	shll2	r1
+
+        shll	r1
+	cmp/hi	r3,r0
+
+	!chk if the result can be made only from higher mantissa
+	SL(bt,	.L_lower_mantissa,
+	 mov	#21,r0)
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r7,r1
+#else
+.L_loop:
+        tst	r7,r7
+        bt      .L_break1
+        add     #1,r7
+        bra     .L_loop
+        shlr    r1
+
+.L_break1:
+#endif
+	tst	r6,r6
+	SL(bt,	.L_epil,
+	 mov	r1,r0)
+
+	rts
+	neg	r0,r0
+
+.L_lower_mantissa:
+	!result is made from lower mantissa also
+	neg	r0,r0
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r0,r5
+#else
+        SHLR21 (r5)
+#endif
+
+	or	r5,r1		!pack lower and higher mantissas
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r7,r1
+#else
+.L_sh_loop:
+	tst	r7,r7
+	bt	.L_break
+	add	#1,r7
+	bra	.L_sh_loop
+	shlr	r1
+
+.L_break:
+#endif
+	mov	r1,r0
+	bra	.L_chk_sign
+	nop
+
+.L_epil:
+	rts
+	nop
+
+.L_inv_exp:
+	cmp/hi	r0,r5
+	bt	.L_epil
+
+	cmp/hi	r0,r1		!compare high mantissa,r1
+	bt	.L_epil
+
+.L_ret_max:
+	mov.l   .L_maxint,r0
+	tst	r6,r6
+	bt	.L_epil
+
+	rts
+	add	#1,r0
+
+.L_chk_sign:
+	tst	r6,r6		!sign bit is set, number is -ve
+	bt	.L_epil
+	
+	rts
+	neg	r0,r0
+
+	.align	2
+
+.L_maxint:
+	.long	0x7fffffff
+.L_p_inf:
+	.long	0x7ff00000
+.L_mask_high_mant:
+	.long	0x000fffff
+.L_1023:
+	.long	0x000003ff
+.L_1053:
+	.long	1053
+.L_21bit:
+	.long	0x00100000
+
+ENDFUNC (GLOBAL (fixdfsi))
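
[For reference, not part of the patch: the conversion above in C.  The bounds
1023 and 1053 are the ones the assembly tests; NaN handling is simplified and
`my_fixdfsi' is an illustrative name.]

  #include <stdint.h>

  int32_t
  my_fixdfsi (uint32_t hi, uint32_t lo)   /* high/low words of the double */
  {
    int exp = (hi >> 20) & 0x7FF;
    int neg = hi >> 31;
    if (exp < 1023)
      return 0;                          /* |x| < 1 */
    if (exp > 1053)                      /* too big: saturate */
      return neg ? INT32_MIN : INT32_MAX;
    uint64_t mant = (((uint64_t)(hi & 0x000FFFFF) | 0x00100000) << 32) | lo;
    int32_t r = (int32_t)(mant >> (52 - (exp - 1023)));
    return neg ? -r : r;
  }
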
Index: gcc/config/sh/IEEE-754/divsf3.S
===================================================================
--- gcc/config/sh/IEEE-754/divsf3.S	(revision 0)
+++ gcc/config/sh/IEEE-754/divsf3.S	(revision 0)
@@ -0,0 +1,398 @@
+/* Copyright (C) 2004, 2006, 2010 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!divides two single precision floating point numbers
+
+! Author: Aanchal Khanna
+
+! Arguments: Dividend is in r4, divisor in r5
+! Result: r0
+
+! r4 and r5 are referred to as op1 and op2 resp.
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+	.text
+	.align	5
+	.global	GLOBAL (divsf3)
+	FUNC (GLOBAL (divsf3))
+
+GLOBAL (divsf3):
+	mov.l	.L_mask_sign,r1
+	mov	r4,r3
+
+	xor	r5,r3
+	shll	r4
+
+	shlr	r4
+	mov.l	.L_inf,r2
+
+	and	r3,r1		!r1=resultant sign
+	mov	r4,r6
+
+	shll	r5
+	mov	#0,r0		
+
+	shlr	r5
+	and	r2,r6
+
+	cmp/eq	r2,r6
+	mov	r5,r7
+
+	and     r2,r7
+	bt	.L_op1_inv
+
+	cmp/eq	r2,r7
+	mov	#-23,r3
+
+	bt	.L_op2_inv
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+	SHLR23 (r6)
+	SHLR23 (r7)
+#else
+	shld	r3,r6
+	shld	r3,r7
+#endif
+
+	cmp/eq	r0,r4
+
+	bt	.L_op1_zero		!dividend=0
+	cmp/eq	r0,r6
+
+	mov.l   .L_imp_bit,r3
+	bt	.L_norm_op1		!normalize dividend
+.L_chk_op2:
+	cmp/eq	r0,r5
+	bt	.L_op2_zero		!divisor=0
+
+	cmp/eq	r0,r7
+	bt	.L_norm_op2		!normalize divisor
+
+.L_div1:
+	sub	r7,r6
+	add	#127,r6			!r6=resultant exponent
+
+	mov     r3,r7
+	mov.l	.L_mask_mant,r3
+
+	and	r3,r4
+	!chk exponent for overflow
+        mov.l   .L_255,r2
+
+	and     r3,r5
+	or	r7,r4
+
+	cmp/ge  r2,r6
+	or	r7,r5
+
+	bt	.L_return_inf
+	mov	r0,r2
+
+	cmp/eq  r4,r5
+	bf      .L_den_one
+
+	cmp/ge	r6,r0
+	!numerator=denominator, quotient=1, remainder=0
+	mov	r7,r2			
+
+	mov     r0,r4
+	!chk exponent for underflow
+	bt	.L_underflow
+        bra     .L_pack
+        nop
+
+.L_den_one:
+	!denominator=1, result=numerator
+
+	cmp/eq  r7,r5
+        bf      .L_divide
+
+	!chk exponent for underflow
+	cmp/ge  r6,r0
+        mov    r4,r2           
+
+        SL(bt,    .L_underflow,
+	 mov	r0,r4)
+	bra     .L_pack
+	nop
+
+.L_divide:
+	!dividing the mantissas r4<-dividend, r5<-divisor
+
+	cmp/hi	r4,r5
+	bf	.L_loop
+
+	shll	r4		! if mantissa(op1)< mantissa(op2)
+	add     #-1,r6		! shift left the numerator and decrease the exponent.
+
+.L_loop:
+	!division loop
+
+	cmp/ge	r5,r4
+	bf	.L_skip
+
+	or	r7,r2
+	sub	r5,r4
+
+.L_skip:
+	shlr	r7
+	shll	r4
+
+	cmp/eq	r0,r7
+	bf	.L_loop
+
+	!chk the exponent for underflow
+	cmp/ge  r6,r0
+	bt      .L_underflow
+	
+	!apply rounding
+	cmp/gt	r5,r4
+	bt	.L_round1
+
+	cmp/eq	r4,r5
+	bt	.L_round2
+
+.L_pack:
+	!pack the result, r1=sign, r2=quotient, r6=exponent
+
+	mov    #23,r4
+	and     r3,r2
+
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+	SHLL23 (r6)
+#else
+	shld	r4,r6
+#endif
+	or	r2,r1
+
+	or	r6,r1
+	mov	r1,r0	
+	
+	rts
+	nop
+
+.L_round1:
+	!Apply proper rounding
+
+        bra     .L_pack
+        add     #1,r2
+
+.L_round2:
+	!Apply proper rounding
+
+        mov.l   .L_comp_1,r5
+        bra     .L_pack
+        and     r5,r2
+
+.L_op1_inv:
+	!chk if op1 is Inf or NaN
+
+	mov.l	.L_mask_mant,r3
+	mov	r4,r6
+
+	and	r3,r6
+	cmp/hi	r0,r6
+
+	bt	.L_ret_NaN ! op1 is NaN, return NaN.
+	cmp/eq	r2,r7
+
+	SL(bf,	.L_return,
+	 mov	r4,r0) ! Inf/finite, return Inf
+
+	! Inf/Inf or Inf/NaN, return NaN
+.L_ret_NaN:
+	rts
+	mov	#-1,r0
+	
+.L_op2_inv:
+	!chk if op2 is Inf or NaN
+
+	mov.l	.L_mask_mant,r3
+	mov	r5,r7
+	
+	and	r3,r7
+	cmp/hi	r0,r7
+
+	bt	.L_ret_op2
+	mov	r1,r0
+	
+	rts
+	nop
+
+.L_op1_zero:
+	!op1 is zero. If op2 is zero, return NaN, else return zero
+
+	cmp/eq	r0,r5
+
+	bf	.L_return
+
+	rts
+	mov	#-1,r0
+
+.L_op2_zero:
+	!op2 is zero, return Inf
+
+	rts
+	or	r2,r0
+
+.L_return_inf:
+	mov.l	.L_inf,r0
+	
+	rts
+	or	r1,r0
+
+.L_norm_op1:
+	!normalize dividend
+
+	shll	r4
+	tst	r2,r4
+	
+	add     #-1,r6
+	bt	.L_norm_op1
+
+	bra	.L_chk_op2
+	add	#1,r6
+
+.L_norm_op2:
+	!normalize divisor
+
+	shll	r5
+	tst	r2,r5
+	
+	add	#-1,r7
+	bt	.L_norm_op2
+
+	bra	.L_div1
+	add	#1,r7
+
+.L_underflow:
+	!denormalize the result
+
+	add	#1,r6
+	mov	#-24,r7
+
+	cmp/gt	r6,r7
+	mov	r2,r5
+
+	bt	.L_return_zero
+	add     #-1,r6
+
+	mov	#32,r3
+	neg	r6,r7
+
+	add	#1,r7
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+	shld	r6,r2
+#else
+	cmp/ge	r0,r6
+	bf	.L_mov_right
+
+.L_mov_left:
+	cmp/eq	r0,r6
+	bt	.L_out
+
+	shll	r2
+	bra	.L_mov_left
+	add	#-1,r6
+
+.L_mov_right:
+	cmp/eq	r0,r6
+	bt	.L_out
+
+	add	#1,r6
+	bra	.L_mov_right
+	shlr	r2
+	
+.L_out:
+#endif
+	sub	r7,r3
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+	shld	r3,r5
+#else
+	cmp/ge	r0,r3
+	bf	.L_mov_right_1
+
+.L_mov_left_1:
+	shll	r5
+	add	#-1,r3
+
+	cmp/eq	r0,r3
+	bf	.L_mov_left_1
+
+	bt	.L_out_1
+
+.L_mov_right_1:
+	cmp/eq	r0,r3
+	bt	.L_out_1
+
+	add	#1,r3
+	bra	.L_mov_right_1
+	shlr	r5
+
+.L_out_1:
+#endif
+	shlr	r2
+	addc	r0,r2
+
+	cmp/eq	r4,r0		!r4 contains the remainder
+	mov      r2,r0
+
+	mov.l	.L_mask_sign,r7
+	bf	.L_return
+
+	mov.l   .L_comp_1,r2
+	cmp/eq	r7,r5
+
+	bf	.L_return
+	and	r2,r0
+
+.L_return:
+.L_return_zero:
+	rts
+	or     r1,r0
+	
+.L_ret_op2:
+	rts
+	or	r5,r0
+
+
+	.align	2
+.L_inf:
+	.long	0x7f800000
+.L_mask_sign:
+	.long	0x80000000
+.L_mask_mant:
+	.long	0x007fffff
+.L_imp_bit:
+	.long	0x00800000
+.L_comp_1:
+	.long	0xfffffffe
+.L_255:
+	.long	255
+
+ENDFUNC (GLOBAL (divsf3))
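
[For reference, not part of the patch: the shift-and-subtract loop at .L_loop
above, one quotient bit per iteration.  A sketch under the precondition the
assembly establishes before entering the loop.]

  #include <stdint.h>

  /* num, den: 24-bit significands with the implicit bit set, num >= den
     (the caller shifts num left and adjusts the exponent otherwise).  */
  static uint32_t
  div_mant24 (uint32_t num, uint32_t den)
  {
    uint32_t q = 0;
    for (uint32_t bit = 1u << 23; bit != 0; bit >>= 1)
      {
        if (num >= den)
          { q |= bit; num -= den; }
        num <<= 1;                   /* bring up the next dividend bit */
      }
    return q;                        /* rounding info is left in num */
  }
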
Index: gcc/config/sh/IEEE-754/fixunssfsi.S
===================================================================
--- gcc/config/sh/IEEE-754/fixunssfsi.S	(revision 0)
+++ gcc/config/sh/IEEE-754/fixunssfsi.S	(revision 0)
@@ -0,0 +1,155 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Conversion from floating point to unsigned integer
+
+! Author: Rakesh Kumar
+
+! Argument: r4 (in floating point format)
+! Result: r0
+
+! For negative floating point numbers, it returns zero
+
+! The argument is referred as op1
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+	.text
+	.align 5
+	.global	GLOBAL (fixunssfsi)
+	FUNC (GLOBAL (fixunssfsi))
+
+GLOBAL (fixunssfsi):
+	mov.l	.L_sign,r0
+	mov	r4,r2
+
+	! Check for NaN
+	mov.l	.L_inf,r1
+	and	r4,r0
+
+	mov.l	.L_mask_sign,r7
+	mov	#127,r5
+
+	! Remove sign bit
+	cmp/eq	#0,r0
+	and	r7,r2
+
+	! If number is negative, return 0
+	! LIBGCC deviates from standard in this regard.
+	mov	r4,r3
+	SL(bf,	.L_epil,
+	 mov	#0,r0)
+
+	mov.l	.L_frac,r6
+	cmp/gt	r1,r2
+
+	shll	r2
+	SL1(bt,	.L_epil,
+	 shlr16	r2)
+
+	shlr8	r2	! r2 has exponent
+	mov.l	.L_24bit,r1
+
+	and	r6,r3	! r3 has fraction
+	cmp/gt	r2,r5
+
+	! If exponent is less than 127, return 0
+	or	r1,r3
+	bt	.L_epil
+
+	! Process only if exponent is less than 158
+	mov.l	.L_158,r1
+	shll8	r3
+
+	cmp/gt	r1,r2
+	sub	r2,r1
+
+	neg	r1,r1
+	bt	.L_ret_max
+
+! Shift the mantissa with exponent difference from 158
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+	shld	r1,r3
+#else
+	cmp/gt	r0,r1
+	bt	.L_mov_left
+
+.L_mov_right:
+	cmp/eq	r1,r0
+	bt	.L_ret
+
+	add	#1,r1
+	bra	.L_mov_right
+	shlr	r3
+
+.L_mov_left:
+	add	#-1,r1
+	
+	shll	r3
+	cmp/eq	r1,r0
+
+	bf	.L_mov_left
+
+.L_ret:	
+#endif
+	rts
+	mov	r3,r0
+
+! r0 already has appropriate value
+.L_epil:
+	rts
+	nop
+
+! Return the maximum unsigned integer value
+.L_ret_max:
+	mov.l	.L_max,r3
+
+	rts
+	mov	r3,r0
+
+	.align 2
+.L_inf:
+	.long 0x7F800000
+
+.L_158:
+	.long 158
+
+.L_max:
+	.long 0xFFFFFFFF
+
+.L_frac:
+	.long 0x007FFFFF
+
+.L_sign:
+	.long 0x80000000
+
+.L_24bit:
+	.long 0x00800000
+
+.L_mask_sign:
+	.long 0x7FFFFFFF
+
+ENDFUNC (GLOBAL (fixunssfsi))
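
[For reference, not part of the patch: a C model of the path above.  158 =
127 + 31 is the largest exponent whose value still fits; as noted in the
header, negative inputs return 0.  NaN handling is simplified.]

  #include <stdint.h>

  uint32_t
  my_fixunssfsi (uint32_t f)           /* bit image of the float */
  {
    if (f & 0x80000000u)
      return 0;                        /* negative: return 0 */
    int exp = (f >> 23) & 0xFF;
    if (exp < 127)
      return 0;                        /* |x| < 1 */
    if (exp > 158)
      return 0xFFFFFFFFu;              /* too big: all ones */
    uint32_t m = ((f & 0x007FFFFF) | 0x00800000) << 8;  /* bits 31..8 */
    return m >> (158 - exp);
  }
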
Index: gcc/config/sh/IEEE-754/floatunssidf.S
===================================================================
--- gcc/config/sh/IEEE-754/floatunssidf.S	(revision 0)
+++ gcc/config/sh/IEEE-754/floatunssidf.S	(revision 0)
@@ -0,0 +1,76 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!conversion of unsigned integer to double precision floating point number
+!Author:Rakesh Kumar
+!Rewritten for SH1 support: Joern Rennecke
+!
+!Entry:
+!r4:operand
+!
+!Exit:
+!r0,r1:result
+!
+!Note:argument is passed in reg r4 and the result is returned in
+!regs r0 and r1.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+        .text
+        .align 5
+        .global GLOBAL (floatunsidf)
+	FUNC (GLOBAL (floatunsidf))
+
+GLOBAL (floatunsidf):
+	mov.w	LOCAL(x41f0),DBLRH	! bias + 32
+	tst	r4,r4			! check for zero
+	bt	.L_ret_zero
+.L_loop:
+	shll	r4	
+	SL(bf,	.L_loop,
+	 add	#-16,DBLRH)
+
+	mov	r4,DBLRL
+
+        SHLL20 (DBLRL)
+
+        shll16	DBLRH ! put exponent in proper place
+
+        SHLR12 (r4)
+
+	rts
+	or	r4,DBLRH
+	
+.L_ret_zero:
+	mov	#0,r1
+	rts
+	mov	#0,r0
+
+LOCAL(x41f0):	.word	0x41f0
+	.align 2
+
+ENDFUNC (GLOBAL (floatunsidf))
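
[For reference, not part of the patch: the same normalisation in C.  0x41F is
the biased exponent for 2^32, i.e. for the value before any shift; each left
shift, including the one that expels the implicit bit, decrements it.]

  #include <stdint.h>

  void
  my_floatunsidf (uint32_t x, uint32_t *hi, uint32_t *lo)
  {
    if (x == 0) { *hi = 0; *lo = 0; return; }
    int exp = 0x41F;                   /* 1023 + 32 */
    while (!(x & 0x80000000u))
      { x <<= 1; exp--; }
    x <<= 1; exp--;                    /* shift out the implicit bit */
    *hi = ((uint32_t)exp << 20) | (x >> 12);
    *lo = x << 20;
  }
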
Index: gcc/config/sh/IEEE-754/addsf3.S
===================================================================
--- gcc/config/sh/IEEE-754/addsf3.S	(revision 0)
+++ gcc/config/sh/IEEE-754/addsf3.S	(revision 0)
@@ -0,0 +1,535 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Add floating point numbers in r4, r5.
+
+! Author: Rakesh Kumar
+
+! Arguments are in r4, r5 and result in r0
+
+! Entry points: ___subsf3, ___addsf3
+
+! r4 and r5 are referred to as op1 and op2
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+	.text
+	.align 5
+        .global GLOBAL (subsf3)
+	.global	GLOBAL (addsf3)
+	FUNC (GLOBAL (subsf3))
+	FUNC (GLOBAL (addsf3))
+
+GLOBAL (subsf3):
+        mov.l   .L_sign_bit,r1
+        xor     r1,r5
+
+GLOBAL (addsf3):
+	mov.l	r8,@-r15
+	mov	r4,r3
+
+	mov.l	.L_pinf,r2
+	mov	#0,r8
+
+	and	r2,r3 ! op1's exponent.
+	mov	r5,r6
+
+	! Check NaN or Infinity
+	and	r2,r6 ! op2's exponent.
+	cmp/eq	r2,r3
+
+	! go if op1 is NaN or INF. 
+	mov.l	.L_sign_bit,r0
+	SL(bt,	.L_inv_op1,
+	 mov	#-23,r1)
+	
+	! Go if op2 is NaN/INF.
+	cmp/eq	r2,r6
+	mov	r0,r7
+	bt	.L_ret_op2
+
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+	SHLR23 (r3)
+#else
+	shld	r1,r3
+#endif
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+	SHLR23 (r6)
+#else
+	shld	r1,r6
+#endif
+
+	! Check for negative zero
+	cmp/eq	r0,r5
+
+	mov	r5,r1
+	SL(bt,	.L_ret_op1,
+	 and	r7,r1)
+
+	cmp/eq	r0,r4
+	bt	.L_ret_op2
+
+	! if op1 is zero return op2
+	tst	r4,r4
+	bt	.L_ret_op2
+
+	! Equal numbers with opposite sign
+	mov	r4,r2
+	xor	r5,r2
+
+	cmp/eq	r0,r2
+	bt	.L_ret_zero
+
+	! if op2 is zero return op1
+	mov.l	.L_mask_fra,r2
+	tst	r5,r5
+
+	! Extract the mantissa
+	mov	r4,r0
+	SL(bt,	.L_ret_op1,
+	 and	r2,r5)
+
+	and	r2,r4
+
+	mov.l	.L_imp_bit,r2
+	and	r7,r0	! sign bit of op1
+
+	! Check for denormals
+	tst	r3,r3
+	bt	.L_norm_op1
+
+	! Attach the implicit bit
+	or	r2,r4
+	tst	r6,r6
+
+	bt	.L_norm_op2
+
+	or	r2,r5
+	tst	r0,r0
+
+	! operands are +ve or -ve??
+	bt	.L_ptv_op1
+
+	neg	r4,r4
+
+.L_ptv_op1:
+	tst	r1,r1
+	bt	.L_ptv_op2
+
+	neg	r5,r5
+
+! Test exponents for equality
+.L_ptv_op2:
+	cmp/eq	r3,r6
+	bt	.L_exp_eq
+
+! Make exponents of two arguments equal
+.L_exp_ne:
+	! r0, r1 contain sign bits.
+	! r4, r5 contain mantissas.
+	! r3, r6 contain exponents.
+	! r2, r7 scratch.
+
+	! Calculate result exponent.
+	mov	r6,r2
+	sub	r3,r2	! e2 - e1
+
+	cmp/pl	r2
+	mov	#23,r7
+
+	! e2 - e1 is -ve
+	bf	.L_exp_ne_1
+
+	mov	r6,r3 ! Result exp.
+	cmp/gt	r7,r2 ! e2-e1 > 23
+
+	mov	#1,r7
+	bt	.L_pack_op2_0
+
+	! Align the mantissa
+.L_loop_ne:
+	shar	r4
+
+	rotcr	r8
+	cmp/eq	r7,r2
+
+	add	#-1,r2
+	bf	.L_loop_ne
+
+	bt	.L_exp_eq
+
+! Exponent difference is too high.
+! Return op2 after placing pieces in proper place
+.L_pack_op2_0:
+	! If op1 is -ve
+	tst	r1,r1
+	bt	.L_pack_op2
+
+	neg	r5,r5
+
+! r6 has exponent
+! r5 has mantissa, r1 has sign
+.L_pack_op2:
+	mov.l	.L_nimp_bit,r2
+	mov	#23,r3
+
+	mov	r1,r0
+	
+	and	r2,r5
+	mov.l	@r15+,r8
+
+	or	r5,r0
+
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+	SHLL23 (r6)
+#else
+	shld	r3,r6
+#endif
+        rts
+	or	r6,r0
+
+! return op1. It is NAN or INF or op2 is zero.
+.L_ret_op1:
+	mov	r4,r0
+
+	rts
+	mov.l	@r15+,r8
+
+! return zero
+.L_ret_zero:
+	mov	#0,r0
+
+	rts
+	mov.l	@r15+,r8
+
+! return op2. It is NaN or INF or op1 is zero.
+.L_ret_op2:
+	mov	r5,r0
+
+	rts
+	mov.l	@r15+,r8
+
+! op2 is denormal. Normalize it.
+.L_norm_op2:
+	shll	r5
+	add	#-1,r6
+
+	tst	r2,r5
+	bt	.L_norm_op2
+
+	! Check sign
+	tst	r1,r1
+	bt	.L_norm_op2_2
+
+	neg	r5,r5
+
+.L_norm_op2_2:
+	add	#1,r6
+	cmp/eq	r3,r6
+
+	bf	.L_exp_ne
+	bt	.L_exp_eq
+
+! Normalize op1
+.L_norm_op1:
+	shll	r4
+	add	#-1,r3
+
+	tst	r2,r4
+	bt	.L_norm_op1
+
+	! Check sign
+	tst	r0,r0
+	bt	.L_norm_op1_1
+
+	neg	r4,r4
+
+.L_norm_op1_1:
+	! Adjust biasing
+	add	#1,r3
+
+	! Check op2 for denormalized value
+	tst	r6,r6
+	bt	.L_norm_op2
+
+	mov.l	.L_imp_bit,r2
+
+	tst	r1,r1	! Check sign
+	or	r2,r5	! Attach 24th bit
+
+	bt	.L_norm_op1_2
+
+	neg	r5,r5
+
+.L_norm_op1_2:
+	cmp/eq	r3,r6
+
+	bt	.L_exp_eq
+	bf	.L_exp_ne
+
+! op1 is NaN or Inf
+.L_inv_op1:
+	! Return op1 if it is NAN. 
+	! r2 is infinity
+	cmp/gt	r2,r4
+	bt	.L_ret_op1
+
+	! op1 is +/- INF
+	! If op2 is same return now.
+	cmp/eq	r4,r5
+	bt	.L_ret_op1
+
+	! return op2 if it is NAN
+	cmp/gt	r2,r5
+	bt	.L_ret_op2
+
+	! Check if op2 is inf
+	cmp/eq	r2,r6
+	bf	.L_ret_op1
+	
+	! Both op1 and op2 are infinities
+	! of opposite signs, or there is a -NaN. Return a NaN.
+	mov.l	@r15+,r8
+	rts
+	mov	#-1,r0
+
+! Make unequal exponents equal.
+.L_exp_ne_1:
+	mov	#-25,r7
+	cmp/gt	r2,r7 ! -23 > e2 - e1
+
+	add	#1,r2
+	bf	.L_exp_ne_2
+
+	tst	r0,r0
+	bt	.L_pack_op1
+
+.L_pack_op1_0:
+	bra	.L_pack_op1
+	neg	r4,r4
+
+! Accumulate the shifted bits in r8
+.L_exp_ne_2:
+	! Shift with rounding
+	shar	r5
+	rotcr	r8
+
+	tst	r2,r2
+
+	add	#1,r2
+	bf	.L_exp_ne_2
+
+! Exponents of op1 and op2 are equal (or made so)
+! The mantissas are in r4-r5 and remaining bits in r8
+.L_exp_eq:
+	add	r5,r4 ! Add fractions.
+	mov.l	.L_sign_bit,r2
+
+	! Check for negative result
+	mov	#0,r0
+	tst	r2,r4
+
+	mov.l	.L_255,r5
+	bt	.L_post_add
+
+	negc	r8,r8
+	negc	r4,r4
+	or	r2,r0
+
+.L_post_add:
+	! Check for extra MSB
+	mov.l	.L_chk_25,r2
+
+	tst	r2,r4
+	bt	.L_imp_check
+
+	shar 	r4
+	rotcr	r8
+
+	add	#1,r3
+	cmp/ge	r5,r3
+
+	! Return Inf if exp > 254
+	bt	.L_ret_inf
+
+! Check for implicit (24th) bit in result
+.L_imp_check:
+        mov.l	.L_imp_bit,r2
+	tst	r2,r4
+
+	bf	.L_pack_op1
+
+! Result needs left shift
+.L_lft_shft:
+	shll	r8
+	rotcl	r4
+
+	add	#-1,r3
+	tst	r2,r4
+
+	bt	.L_lft_shft
+	
+! Pack the result after rounding
+.L_pack_op1:
+	! See if denormalized result is possible 
+	mov.l	.L_chk_25,r5
+	cmp/pl	r3
+
+	bf	.L_denorm_res
+
+	! Are there any bits shifted previously?
+	tst	r8,r8
+	bt	.L_pack_1
+
+	! Round
+	shll	r8
+	movt	r6
+
+	add	r6,r4
+
+	! If we are halfway between two numbers,
+	! round towards LSB = 0
+	tst	r8,r8
+
+	bf	.L_pack_1
+
+	shlr	r4
+	shll	r4
+
+.L_pack_1:
+	! Adjust extra MSB generated after rounding
+	tst	r4,r5
+	mov.l	.L_255,r2
+
+	bt	.L_pack_2
+	shar	r4
+
+	add	#1,r3 
+	cmp/ge	r2,r3	! Check for exp overflow
+
+	bt	.L_ret_inf
+	
+! Pack it finally
+.L_pack_2:
+	! Do not store implicit bit
+	mov.l	.L_nimp_bit,r2
+	mov	#23,r1
+
+	and	r2,r4
+
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+	SHLL23 (r3)
+#else
+	shld	r1,r3
+#endif
+	mov.l	@r15+,r8
+
+	or	r4,r0
+        rts
+	or	r3,r0
+
+! Return infinity
+.L_ret_inf:
+	mov.l	.L_pinf,r2
+
+	mov.l	@r15+,r8
+	rts
+	or	r2,r0
+
+! Result must be denormalized
+.L_denorm_res:
+	mov	#0,r2
+	
+! Denormalizing loop with rounding
+.L_den_1:
+	shar	r4
+	movt	r6
+
+	tst	r3,r3
+	bt	.L_den_2
+
+	! Increment the exponent
+	add	#1,r3
+
+	tst	r6,r6
+	bt	.L_den_0
+
+	! Count number of ON bits shifted
+	add	#1,r2
+
+.L_den_0:
+	bra	.L_den_1
+	nop
+
+! Apply rounding
+.L_den_2:
+	cmp/eq	r6,r1
+	bf	.L_den_3
+
+	add	r6,r4
+	mov	#1,r1
+
+	! If halfway between two numbers,
+	! round towards LSB = 0
+	cmp/eq	r2,r1
+	bf	.L_den_3
+
+	shar	r4
+	shll	r4
+
+.L_den_3:
+
+	mov.l	@r15+,r8
+	rts
+	or	r4,r0
+	
+	.align 2
+.L_imp_bit:
+        .long   0x00800000
+
+.L_nimp_bit:
+	.long	0xFF7FFFFF
+
+.L_mask_fra:
+        .long   0x007FFFFF
+
+.L_pinf:
+        .long   0x7F800000
+
+.L_sign_bit:
+	.long	0x80000000
+
+.L_bit_25:
+	.long	0x01000000
+
+.L_chk_25:
+        .long   0x7F000000
+
+.L_255:
+	.long	0x000000FF
+
+ENDFUNC (GLOBAL (addsf3))
+ENDFUNC (GLOBAL (subsf3))
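
[For reference, not part of the patch: the sign trick used above, in C --
significands of negative operands are negated to two's complement before the
add, and the result's sign is recovered from its MSB.  Alignment, rounding
and renormalisation are elided.]

  #include <stdint.h>

  /* a, b: 24-bit significands already aligned to a common exponent;
     sa, sb: operand signs; *sr receives the result sign.  */
  static uint32_t
  add_mant24 (uint32_t sa, uint32_t a, uint32_t sb, uint32_t b, uint32_t *sr)
  {
    int32_t s = (sa ? -(int32_t)a : (int32_t)a)
              + (sb ? -(int32_t)b : (int32_t)b);
    *sr = s < 0;
    return (uint32_t)(s < 0 ? -s : s);
  }
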
Index: gcc/config/sh/IEEE-754/mulsf3.S
===================================================================
--- gcc/config/sh/IEEE-754/mulsf3.S	(revision 0)
+++ gcc/config/sh/IEEE-754/mulsf3.S	(revision 0)
@@ -0,0 +1,352 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Routine for multiplying two floating point numbers
+
+! Author: Rakesh Kumar
+
+! Arguments: r4 and r5
+! Result: r0
+
+! The arguments are referred to as op1 and op2
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+        .text
+        .align 5
+        .global GLOBAL (mulsf3)
+        FUNC (GLOBAL (mulsf3))
+
+GLOBAL (mulsf3):
+	! Extract the sign bits
+	mov.l	.L_sign,r3
+	mov	r3,r0
+
+	and	r4,r3		! sign bit for op1
+	mov.l	.L_sign_mask,r6
+
+	! Mask out the sign bit from op1 and op2
+	and	r5,r0		! sign bit for op2
+	mov.l	.L_inf,r2
+
+	and	r6,r4
+	xor	r3,r0		! Final sign in r0
+
+	and	r6,r5
+	tst	r4,r4
+
+	! Check for zero
+	mov	r5,r7
+	! Check op1 for zero
+	SL(bt,	.L_op1_zero,
+	 mov	r4,r6)
+
+	tst	r5,r5
+	bt	.L_op2_zero	! op2 is zero
+
+	! Extract the exponents
+	and	r2,r6		! Exponent of op1
+	cmp/eq	r2,r6
+
+	and	r2,r7
+	bt	.L_inv_op1	! op1 is NaN or Inf
+
+	mov.l	.L_mant,r3
+	cmp/eq	r2,r7
+
+	and	r3,r4	! Mantissa of op1
+	bt	.L_ret_op2	! op2 is NaN or Inf
+
+	and	r3,r5	! Mantissa of op2
+
+	mov	#-23,r3
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+	SHLR23 (r6)
+	SHLR23 (r7)
+#else
+	shld	r3,r6
+	shld	r3,r7
+#endif
+	! Check for denormals
+	mov.l	.L_24bit,r3
+	tst	r6,r6
+
+	bt	.L_norm_op1	! op1 is denormal
+	add	#-127,r6	! Unbias op1's exp
+
+	tst	r7,r7
+	bt	.L_norm_op2	! op2 is denormal
+
+	add	#-127,r7	! Unbias op2's exp
+
+.L_multiply:
+	add	r6,r7	! Final exponent in r7
+	mov.l	.L_24bit,r1
+
+	! set 24th bit of mantissas
+	mov	#127,r3
+	or	r1,r4
+
+	DMULU_SAVE
+
+	! Multiply
+	or	r1,r5
+	DMULUL	(r4,r5,r4)
+
+	DMULUH	(r5)
+
+	DMULU_RESTORE
+
+	mov.l	.L_16bit,r6
+
+	! Check for extra MSB generated
+	tst	r5,r6
+
+	mov.l	.L_255,r1
+	bf	.L_shift_by_1	! Adjust the extra MSB
+	
+! Normalize the result with rounding
+.L_epil:
+	! Bias the exponent
+	add	#127,r7
+	cmp/ge	r1,r7
+	
+	! Check exponent overflow and underflow
+	bt	.L_ret_inf
+
+	cmp/pl	r7
+	bf	.L_denorm
+
+.L_epil_0:
+	mov	#-23,r3
+	shll	r5
+	mov	#0,r6
+
+! Fit resultant mantissa in 24 bits
+! Apply default rounding
+.L_loop_epil_0:
+        tst	r3,r3
+	bt	.L_loop_epil_out
+
+	add	#1,r3
+	shlr	r4
+
+	bra	.L_loop_epil_0
+	rotcr	r6
+
+! Round mantissa
+.L_loop_epil_out:
+	shll8	r5
+	or	r5,r4
+
+	mov.l	.L_mant,r2
+	mov	#23,r3
+
+	! Check last bit shifted out of result
+	tst	r6,r6
+	bt	.L_epil_2
+
+	! Round
+	shll	r6
+	movt	r5
+
+	add	r5,r4
+
+	! If this is the only ON bit shifted
+	! Round towards LSB = 0
+	tst	r6,r6
+	bf	.L_epil_2
+
+	shlr	r4
+	shll	r4
+
+.L_epil_2:
+	! Rounding may have produced extra MSB.
+	mov.l	.L_25bit,r5
+	tst	r4,r5
+
+	bt	.L_epil_1
+
+	add	#1,r7
+	shlr	r4
+
+.L_epil_1:
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+	SHLL23 (r7)
+#else
+	shld	r3,r7
+#endif
+
+	and	r2,r4
+
+	or	r7,r4
+	rts
+	or	r4,r0
+
+.L_denorm:
+	mov	#0,r3
+
+.L_den_1:
+	shlr	r5
+	rotcr	r4
+
+	cmp/eq	r3,r7
+	bt	.L_epil_0
+
+	bra	.L_den_1
+	add	#1,r7
+	
+
+! Normalize the first argument
+.L_norm_op1:
+	shll	r4
+	tst	r3,r4
+
+	add	#-1,r6
+	bt	.L_norm_op1
+
+	! The biasing is by 126
+	add	#-126,r6
+	tst	r7,r7
+
+	bt      .L_norm_op2
+
+	bra	.L_multiply
+	add	#-127,r7
+
+! Normalize the second argument
+.L_norm_op2:
+	shll	r5
+	tst	r3,r5
+
+	add	#-1,r7
+	bt	.L_norm_op2
+
+	bra	.L_multiply
+	add	#-126,r7
+
+! op2 is zero. Check op1 for exceptional cases
+.L_op2_zero:
+	mov.l	.L_inf,r2
+	and	r2,r6
+
+	! Check if op1 is deterministic
+	cmp/eq	r2,r6
+	SL(bf,	.L_ret_op2,
+	 mov	#1,r1)
+
+	! Return NaN
+	rts
+	mov	#-1,r0
+
+! Adjust the extra MSB
+.L_shift_by_1:
+	shlr	r5
+	rotcr	r4
+
+	add	#1,r7		! Show the shift in exponent
+
+	cmp/gt	r3,r7
+	bf	.L_epil
+
+	! The resultant exponent is invalid
+	mov.l	.L_inf,r1
+	rts
+	or	r1,r0
+
+.L_ret_op1:
+	rts
+	or	r4,r0
+
+! op1 is zero. Check op2 for exceptional cases
+.L_op1_zero:
+	mov.l	.L_inf,r2
+	and	r2,r7
+	
+	! Check if op2 is deterministic
+	cmp/eq	r2,r7
+	SL(bf,	.L_ret_op1,
+	 mov	#1,r1)
+
+	! Return NaN
+	rts
+	mov	#-1,r0
+
+.L_inv_op1:
+	mov.l	.L_mant,r3
+	mov	r4,r6
+
+	and	r3,r6
+	tst	r6,r6
+
+	bf	.L_ret_op1	! op1 is NaN
+	! op1 is not NaN. It is Inf
+
+	cmp/eq	r2,r7
+	bf	.L_ret_op1	! op2 has a valid exponent
+
+! op2 has an invalid exponent. It could be Inf, -Inf, or NaN.
+! It doesn't make any difference.
+.L_ret_op2:
+	rts
+	or	r5,r0
+
+.L_ret_inf:
+	rts
+	or	r2,r0
+
+.L_ret_zero:
+	mov	#0,r2
+	rts
+	or	r2,r0
+
+	
+	.align 2
+.L_mant:
+	.long 0x007FFFFF
+
+.L_inf:
+	.long 0x7F800000
+
+.L_24bit:
+	.long 0x00800000
+
+.L_25bit:
+	.long 0x01000000
+
+.L_16bit:
+	.long 0x00008000
+
+.L_sign:
+	.long 0x80000000
+
+.L_sign_mask:
+	.long 0x7FFFFFFF
+
+.L_255:
+	.long 0x000000FF
+
+ENDFUNC (GLOBAL (mulsf3))
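
[For reference, not part of the patch: the single-precision analogue of the
earlier muldf3 sketch -- the DMULU pair forms a 47/48-bit product of two
24-bit significands, and a set bit 47 means an extra MSB.]

  #include <stdint.h>

  static uint32_t
  mul_mant24 (uint32_t a, uint32_t b, int *exp)
  {
    uint64_t p = (uint64_t)a * b;      /* 47- or 48-bit product */
    if (p >> 47)
      { p >>= 1; (*exp)++; }
    return (uint32_t)(p >> 23);        /* 24-bit result before rounding */
  }
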
Index: gcc/config/sh/IEEE-754/floatsidf.S
===================================================================
--- gcc/config/sh/IEEE-754/floatsidf.S	(revision 0)
+++ gcc/config/sh/IEEE-754/floatsidf.S	(revision 0)
@@ -0,0 +1,151 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!conversion of signed integer to double precision floating point number
+!Author:Rakesh Kumar
+!
+!Entry:
+!r4:operand 
+!
+!Exit:
+!r0,r1:result
+!
+!Note:argument is passed in reg r4 and the result is returned in 
+!regs r0 and r1.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+        .text
+        .align 5
+        .global GLOBAL (floatsidf)
+	FUNC (GLOBAL (floatsidf))
+
+GLOBAL (floatsidf):
+        mov.l   .L_sign,r0
+        mov     #0,r1
+
+	mov	r0,r2
+	tst	r4,r4 ! check r4 for zero
+
+	! Extract the sign
+	mov	r2,r3
+	SL(bt,	.L_ret_zero,
+	 and	r4,r0)
+
+	cmp/eq	r1,r0
+	not	r3,r3
+
+	mov	r1,r7
+	SL(bt,	.L_loop,
+	 and	r4,r3)
+
+	! Treat -2147483648 as special case
+	cmp/eq	r1,r3
+	neg	r4,r4
+
+	bt	.L_ret_min	
+
+.L_loop:
+	shll	r4	
+	mov	r4,r5
+
+	and	r2,r5
+	cmp/eq	r1,r5
+	
+	add	#1,r7
+	bt	.L_loop
+
+	mov.l	.L_initial_exp,r6
+	not	r2,r2
+	
+	and	r2,r4
+	mov	#21,r3
+
+	sub	r7,r6
+	mov	r4,r1
+
+	mov	#20,r7
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r3,r1
+#else
+        SHLL21 (r1)
+#endif
+	mov	#-11,r2
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r7,r6	! Exponent in proper place
+#else
+        SHLL20 (r6)
+#endif
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r2,r4
+#else
+        SHLR11 (r4)
+#endif
+	or	r6,r0
+
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+	rts
+#ifdef __LITTLE_ENDIAN__
+	or	r4,r1
+#else
+	or	r4,r0
+#endif
+	
+.L_ret_zero:
+	rts
+	mov	#0,r0
+
+.L_ret_min:
+	mov.l	.L_min,r0
+	
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+	rts
+	nop
+
+	.align 2
+
+.L_initial_exp:
+	.long 0x0000041E
+
+.L_sign:
+	.long 0x80000000
+
+.L_min:
+	.long 0xC1E00000
+
+ENDFUNC (GLOBAL (floatsidf))
Index: gcc/config/sh/IEEE-754/fixsfsi.S
===================================================================
--- gcc/config/sh/IEEE-754/fixsfsi.S	(revision 0)
+++ gcc/config/sh/IEEE-754/fixsfsi.S	(revision 0)
@@ -0,0 +1,165 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Conversion routine for float to integer
+
+! Author: Rakesh Kumar
+
+! Arguments: r4 (in floating point format)
+! Return: r0
+
+! r4 is referred to as op1
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
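+
+/* A minimal C model of this routine, for illustration only; it is not
+   part of the library and the name is hypothetical.  f is the raw bit
+   pattern of the argument (INT32_MIN/INT32_MAX from <stdint.h>):
+
+	int32_t fixsfsi_model (uint32_t f)
+	{
+	  uint32_t abs = f & 0x7fffffffu;
+	  if (abs > 0x7f800000)		// NaN: result unspecified, use 0
+	    return 0;
+	  int exp = abs >> 23;		// biased exponent
+	  if (exp < 127)		// |x| < 1.0 truncates to 0
+	    return 0;
+	  if (exp > 157)		// too large: saturate by sign
+	    return f >> 31 ? INT32_MIN : INT32_MAX;
+	  uint32_t frac = ((f & 0x007fffff) | 0x00800000) << 8;
+	  uint32_t mag = frac >> (158 - exp);	// exponent difference
+	  return f >> 31 ? -(int32_t) mag : (int32_t) mag;
+	}
+*/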
+
+	.text
+	.align 5
+	.global	GLOBAL (fixsfsi)
+	FUNC (GLOBAL (fixsfsi))
+
+GLOBAL (fixsfsi):
+	mov.l	.L_mask_sign,r7
+	mov	r4,r2
+
+	! Check for NaN
+	mov.l	.L_inf,r1
+	and	r7,r2
+
+	cmp/gt	r1,r2
+	mov	#127,r5
+
+	mov	r4,r3
+	SL(bt,	.L_epil,
+	 mov	#0,r0)
+
+	shll	r2
+	mov.l	.L_frac,r6
+
+	shlr16	r2
+	and	r6,r3	! r3 has fraction
+
+	shlr8	r2	! r2 has exponent
+	mov.l	.L_24bit,r1
+
+	! If exponent is less than 127, return 0
+	cmp/gt	r2,r5
+	or	r1,r3	! Set the implicit bit
+
+	mov.l	.L_157,r1
+	SL1(bt,	.L_epil,
+	 shll8	r3)
+
+	! If exponent is greater than 157,
+	! return the maximum/minimum integer
+	! value, deduced from the sign
+	cmp/gt	r1,r2
+	sub	r2,r1
+
+	mov.l	.L_sign,r2
+	SL(bt,	.L_ret_max,
+	 add	#1,r1)
+
+	and	r4,r2	! Sign in r2
+	neg	r1,r1
+
+	! Shift mantissa by exponent difference from 157
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+	shld	r1,r3
+#else
+        cmp/gt  r0,r1
+        bt      .L_mov_left
+
+.L_mov_right:
+        cmp/eq  r1,r0
+        bt      .L_ret
+
+        add     #1,r1
+        bra     .L_mov_right
+
+        shlr    r3
+
+.L_mov_left:
+        add     #-1,r1
+
+        shll    r3
+        cmp/eq  r1,r0
+
+        bf      .L_mov_left
+.L_ret:
+#endif
+	! If op1 is negative, negate the result
+	cmp/eq	r0,r2
+	SL(bf,	.L_negate,
+	 mov	r3,r0)
+
+! r0 has the appropriate value
+.L_epil:
+	rts
+	nop
+
+! Return the max/min integer value
+.L_ret_max:
+	and	r4,r2	! Sign in r2
+	mov.l	.L_max,r3
+
+	mov.l	.L_sign,r1
+	cmp/eq	r0,r2
+
+	mov	r3,r0
+	bt	.L_epil
+
+	! Negative number, return min int
+	rts
+	mov	r1,r0
+
+! Negate the result
+.L_negate:
+	rts
+	neg	r0,r0
+
+	.align 2
+.L_inf:
+	.long 0x7F800000
+
+.L_157:
+	.long 157
+
+.L_max:
+	.long 0x7FFFFFFF
+
+.L_frac:
+	.long 0x007FFFFF
+
+.L_sign:
+	.long 0x80000000
+
+.L_24bit:
+	.long 0x00800000
+
+.L_mask_sign:
+	.long 0x7FFFFFFF
+
+ENDFUNC (GLOBAL (fixsfsi))
Index: gcc/config/sh/ieee-754-sf.S
===================================================================
--- gcc/config/sh/ieee-754-sf.S	(revision 0)
+++ gcc/config/sh/ieee-754-sf.S	(revision 0)
@@ -0,0 +1,697 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option) any
+later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING.  If not, write to
+the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+!! libgcc software floating-point routines for Renesas SH /
+!! STMicroelectronics ST40 CPUs
+!! Contributed by J"orn Rennecke joern.rennecke@st.com
+
+#ifndef __SH_FPU_ANY__
+
+#include "lib1funcs.h"
+#include "insn-constants.h"
+
+/* Single-precision floating-point emulation.
+   We handle NANs, +-infinity, and +-zero.
+   However, we assume that for NANs, the topmost bit of the fraction is set.  */
+#ifdef L_nesf2
+/* -ffinite-math-only inline version, T := r4:SF == r5:SF
+	cmp/eq	r4,r5
+	mov	r4,r0
+	bt	0f
+	or	r5,r0
+	add	r0,r0
+	tst	r0,r0	! test for +0.0 == -0.0 ; -0.0 == +0.0
+	0:			*/
+	.balign 4
+	.global GLOBAL(nesf2)
+	HIDDEN_FUNC(GLOBAL(nesf2))
+GLOBAL(nesf2):
+        /* If the raw values are unequal, the result is unequal, unless
+	   both values are +-zero.
+	   If the raw values are equal, the result is equal, unless
+	   the values are NaN.  */
+	cmp/eq	r4,r5
+	mov.l   LOCAL(c_SF_NAN_MASK),r1
+	not	r4,r0
+	bt	LOCAL(check_nan)
+	mov	r4,r0
+	or	r5,r0
+	rts
+	add	r0,r0
+LOCAL(check_nan):
+	tst	r1,r0
+	rts
+	movt	r0
+	.balign 4
+LOCAL(c_SF_NAN_MASK):
+	.long SF_NAN_MASK
+	ENDFUNC(GLOBAL(nesf2))
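+/* A C model of the comparison above, for illustration only (the name
+   is hypothetical); a nonzero return value means "unequal":
+
+	uint32_t nesf2_model (uint32_t a, uint32_t b)
+	{
+	  if (a != b)
+	    return (a | b) << 1;	// 0 only for +0.0 against -0.0
+	  return (~a & SF_NAN_MASK) == 0;	// equal bits: NaN != NaN
+	}
+*/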
+#endif /* L_nesf2 */
+
+#ifdef L_unordsf2
+	.balign 4
+	.global GLOBAL(unordsf2)
+	HIDDEN_FUNC(GLOBAL(unordsf2))
+GLOBAL(unordsf2):
+	mov.l	LOCAL(c_SF_NAN_MASK),r1
+	not	r4,r0
+	tst	r1,r0
+	not	r5,r0
+	bt	LOCAL(unord)
+	tst	r1,r0
+LOCAL(unord):
+	rts
+	movt	r0
+	.balign	4
+LOCAL(c_SF_NAN_MASK):
+	.long SF_NAN_MASK
+	ENDFUNC(GLOBAL(unordsf2))
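+/* In C terms this is, as an illustrative sketch (hypothetical name):
+
+	int unordsf2_model (uint32_t a, uint32_t b)
+	{
+	  // by convention, a value is a NaN here iff all SF_NAN_MASK
+	  // bits are set in it
+	  return (~a & SF_NAN_MASK) == 0 || (~b & SF_NAN_MASK) == 0;
+	}
+*/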
+#endif /* L_unordsf2 */
+
+#if defined(L_gtsf2t) || defined(L_gtsf2t_trap)
+/* -ffinite-math-only inline version, T := r4:SF > r5:SF ? 0 : 1
+	cmp/pz	r4
+	mov	r4,r0
+	bf/s	0f
+	 cmp/hs	r5,r4
+	cmp/ge	r4,r5
+	or	r5,r0
+	bt	0f
+	add	r0,r0
+	tst	r0,r0
+	0:			*/
+#ifdef L_gtsf2t
+#define fun_label GLOBAL(gtsf2t)
+#else
+#define fun_label GLOBAL(gtsf2t_trap)
+#endif
+	.balign 4
+	.global fun_label
+	HIDDEN_FUNC(fun_label)
+fun_label:
+	/* If the raw values compare greater, the result is true, unless
+	   either of them is a NaN (but infinity is fine), or both values
+	   are +- zero.  Otherwise, the result is false.  */
+	mov.l	LOCAL(c_SF_NAN_MASK),r1
+	cmp/pz	r4
+	not	r5,r0
+	SLC(bf,	LOCAL(neg),
+	 tst	r1,r0)
+	mov	r4,r0
+	bt	LOCAL(nan)
+	cmp/gt	r5,r4
+	SLC(bf,	LOCAL(check_nan),
+	 cmp/gt	r4,r1)
+	bf	LOCAL(nan)
+	or	r5,r0
+	rts
+	add	r0,r0
+LOCAL(neg):
+	SLI(tst	r1,r0)
+	bt	LOCAL(nan)
+	not	r4,r0
+	tst	r1,r0
+	bt	LOCAL(nan)
+	cmp/hi	r4,r5
+#if defined(L_gtsf2t) && defined(DELAYED_BRANCHES)
+LOCAL(check_nan):
+#endif /* DELAYED_BRANCHES */
+	rts
+	movt	r0
+#ifdef L_gtsf2t
+LOCAL(check_nan):
+LOCAL(nan):
+	rts
+	mov	#0,r0
+#else /* ! L_gtsf2t */
+LOCAL(check_nan):
+	SLI(cmp/gt	r4,r1)
+	bf	LOCAL(nan)
+	rts
+	movt	r0
+LOCAL(nan):
+	mov	#0,r0
+	trapa	#0
+#endif /* ! L_gtsf2t */
+	.balign	4
+LOCAL(c_SF_NAN_MASK):
+	.long SF_NAN_MASK
+	ENDFUNC(fun_label)
+#endif /* L_gtsf2t */
+
+#if defined(L_gesf2f) || defined(L_gesf2f_trap)
+/* -ffinite-math-only inline version, T := r4:SF >= r5:SF
+	cmp/pz	r5
+	mov	r4,r0
+	bf/s	0f
+	 cmp/hs	r4,r5
+	cmp/ge	r5,r4
+	or	r5,r0
+	bt	0f
+	add	r0,r0
+	tst	r0,r0
+	0:			*/
+#ifdef L_gesf2f
+#define fun_label GLOBAL(gesf2f)
+#else
+#define fun_label GLOBAL(gesf2f_trap)
+#endif
+	.balign 4
+	.global fun_label
+	HIDDEN_FUNC(fun_label)
+fun_label:
+	/* If the raw values compare greater or equal, the result is
+	   true, unless any of them is a nan.  If both are -+zero, the
+	   result is true; otherwise, it is false.
+	   We use 0 as true and nonzero as false for this function.  */
+	mov.l	LOCAL(c_SF_NAN_MASK),r1
+	cmp/pz	r5
+	not	r4,r0
+	SLC(bf,	LOCAL(neg),
+	 tst	r1,r0)
+	mov	r4,r0
+	bt	LOCAL(nan)
+	cmp/gt	r4,r5
+	SLC(bf,	LOCAL(check_nan),
+	 cmp/ge	r1,r5)
+	bt	LOCAL(nan)
+	or	r5,r0
+	rts
+	add	r0,r0
+LOCAL(neg):
+	SLI(tst	r1,r0)
+	bt	LOCAL(nan)
+	not	r5,r0
+	tst	r1,r0
+	bt	LOCAL(nan)
+	cmp/hi	r5,r4
+#if defined(L_gesf2f) && defined(DELAYED_BRANCHES)
+LOCAL(nan): LOCAL(check_nan):
+#endif
+	rts
+	movt	r0
+#if defined(L_gesf2f) && ! defined(DELAYED_BRANCHES)
+LOCAL(check_nan):
+	cmp/ge	r1,r5
+LOCAL(nan):
+	rts
+	movt	r0
+#endif /* ! DELAYED_BRANCHES */
+#ifdef L_gesf2f_trap
+LOCAL(check_nan):
+	SLI(cmp/ge	r1,r5)
+	bt	LOCAL(nan)
+	rts
+LOCAL(nan):
+	movt	r0
+	trapa	#0
+#endif /* L_gesf2f_trap */
+	.balign	4
+LOCAL(c_SF_NAN_MASK):
+	.long SF_NAN_MASK
+	ENDFUNC(fun_label)
+#endif /* L_gesf2f */
+
+#ifndef DYN_SHIFT /* SH1 / SH2 code */
+#ifdef L_add_sub_sf3
+#include "IEEE-754/addsf3.S"
+#endif /* _add_sub_sf3 */
+
+#ifdef L_mulsf3
+#include "IEEE-754/mulsf3.S"
+#endif /* L_mulsf3 */
+
+#ifdef L_fixunssfsi
+#include "IEEE-754/fixunssfsi.S"
+#endif /* L_fixunssfsi */
+
+#ifdef L_fixsfsi
+#include "IEEE-754/fixsfsi.S"
+#endif /* L_fixsfsi */
+
+#ifdef L_floatunssisf
+#include "IEEE-754/floatunssisf.S"
+#endif /* L_floatunssisf */
+
+#ifdef L_floatsisf
+#include "IEEE-754/floatsisf.S"
+#endif /* L_floatsisf */
+
+#ifdef L_divsf3
+#include "IEEE-754/divsf3.S"
+#endif /* L_divsf3 */
+#endif /* ! DYN_SHIFT */
+
+/* The actual arithmetic uses dynamic shift.  Supporting SH1 / SH2 here would
+   make this code too hard to maintain, so if you want to add SH1 / SH2
+   support, do it in a separate copy.  */
+#ifdef DYN_SHIFT
+#ifdef L_add_sub_sf3
+#include "IEEE-754/m3/addsf3.S"
+#endif /* L_add_sub_sf3 */
+
+#ifdef L_mulsf3
+#include "IEEE-754/m3/mulsf3.S"
+#endif /* L_mulsf3 */
+
+#ifdef L_fixunssfsi
+	! What is a bit unusual about this implementation is that the
+	! sign bit influences the result for NANs: for cleared sign bit, you
+	! get UINT_MAX, for set sign bit, you get 0.
+	! However, since the result for NANs is undefined, this should be no
+	! problem.
+	! N.B. This is scheduled both for SH4-200 and SH4-300
+	.balign 4
+	.global GLOBAL(fixunssfsi)
+	FUNC(GLOBAL(fixunssfsi))
+GLOBAL(fixunssfsi):
+	mov.l	LOCAL(max),r2
+	mov	#-23,r1
+	mov	r4,r0
+	shad	r1,r4
+	mov.l	LOCAL(mask),r1
+	add	#-127,r4
+	cmp/ge	r2,r0
+	or	r2,r0
+	bt	LOCAL(retmax)
+	cmp/pz	r4
+	and	r1,r0
+	bf	LOCAL(ret0)
+	add	#-23,r4
+	rts
+	shld	r4,r0
+LOCAL(ret0):
+LOCAL(retmax):
+	rts
+	subc	r0,r0
+	.balign 4
+LOCAL(mask):
+	.long	0x00ffffff
+LOCAL(max):
+	.long	0x4f800000
+	ENDFUNC(GLOBAL(fixunssfsi))
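+	! An illustrative C model (hypothetical name; f is the raw bit
+	! pattern).  The signed compare and shifts mirror the cmp/ge,
+	! shad and shld sequences above:
+	!
+	!	uint32_t fixunssfsi_model (uint32_t f)
+	!	{
+	!	  if ((int32_t) f >= 0x4f800000)  /* x >= 2^32, +Inf, +NaN */
+	!	    return 0xffffffffu;
+	!	  int exp = ((int32_t) f >> 23) - 127;  /* < 0 for x < 1 */
+	!	  if (exp < 0)
+	!	    return 0;
+	!	  uint32_t frac = (f | 0x00800000) & 0x00ffffffu;
+	!	  int sh = exp - 23;
+	!	  return sh >= 0 ? frac << sh : frac >> -sh;
+	!	}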
+#endif /* L_fixunssfsi */
+
+#ifdef L_fixsfsi
+	! What is a bit unusual about this implementation is that the
+	! sign bit influences the result for NANs: for cleared sign bit, you
+	! get INT_MAX, for set sign bit, you get INT_MIN.
+	! However, since the result for NANs is undefined, this should be no
+	! problem.
+	! N.B. This is scheduled both for SH4-200 and SH4-300
+	.balign 4
+	.global GLOBAL(fixsfsi)
+	FUNC(GLOBAL(fixsfsi))
+	.balign	4
+GLOBAL(fixsfsi):
+	mov	r4,r0
+	shll	r4
+	mov	#-24,r1
+	bt	LOCAL(neg)
+	mov.l	LOCAL(max),r2
+	shld	r1,r4
+	mov.l	LOCAL(mask),r1
+	add	#-127,r4
+	cmp/pz	r4
+	add	#-23,r4
+	bf	LOCAL(ret0)
+	cmp/gt	r0,r2
+	bf	LOCAL(retmax)
+	and	r1,r0
+	addc	r1,r0
+	rts
+	shld	r4,r0
+
+	.balign	4
+LOCAL(neg):
+	mov.l	LOCAL(min),r2
+	shld	r1,r4
+	mov.l	LOCAL(mask),r1
+	add	#-127,r4
+	cmp/pz	r4
+	add	#-23,r4
+	bf	LOCAL(ret0)
+	cmp/gt	r0,r2
+	bf	LOCAL(retmin)
+	and	r1,r0
+	addc	r1,r0
+	shld	r4,r0	! SH4-200 will start this insn on a new cycle
+	rts
+	neg	r0,r0
+
+	.balign	4
+LOCAL(ret0):
+	rts
+	mov	#0,r0
+
+LOCAL(retmax):
+	mov	#-1,r0
+	rts
+	shlr	r0
+
+LOCAL(retmin):
+	mov	#1,r0
+	rts
+	rotr	r0
+
+	.balign 4
+LOCAL(mask):
+	.long	0x007fffff
+LOCAL(max):
+	.long	0x4f000000
+LOCAL(min):
+	.long	0xcf000000
+	ENDFUNC(GLOBAL(fixsfsi))
+#endif /* L_fixsfsi */
+
+#ifdef L_floatunssisf
+#include "IEEE-754/m3/floatunssisf.S"
+#endif /* L_floatunssisf */
+
+#ifdef L_floatsisf
+#include "IEEE-754/m3/floatsisf.S"
+#endif /* L_floatsisf */
+
+#ifdef L_divsf3
+#include "IEEE-754/m3/divsf3.S"
+#endif /* L_divsf3 */
+
+#ifdef L_hypotf
+	.balign 4
+	.global GLOBAL(hypotf)
+	FUNC(GLOBAL(hypotf))
+GLOBAL(hypotf):
+/* This integer implementation takes 71 to 72 cycles in the main path.
+   This is a bit slower than the SH4 can manage using double precision
+   hardware floating point - 57 cycles, or 69 with mode switches.  */
+ /* First, calculate x (r4) as the sum of the square of the fractions -
+    the exponent is calculated separately in r3.
+    Then, calculate sqrt(x) for the fraction by reciproot iteration.
+    We get a 7.5 bit initial value using linear approximation with two slopes
+    that are powers of two.
+    x in [1. .. 2.)  y0 := 1.25 - x/4 - tab(x)   y in (0.8 .. 1.0)
+    x in [2. .. 4.)  y0 := 1.   - x/8 - tab(x)   y in (0.5 .. 0.8)
+ x is represented with two bits before the point,
+ y with 0 bits before the binary point.
+ Thus, to calculate y0 := 1. - x/8 - tab(x), all you have to do is to shift x
+ right by 1, negate it, and subtract tab(x).  */
+
+ /* y1 := 1.5*y0 - 0.5 * (x * y0) * (y0 * y0)
+    z0 := x * y1
+    z1 := z0 + 0.5 * (y1 - (y1*y1) * z0) */
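+
+ /* An illustrative C model of the initial estimate; the names are
+    hypothetical, and tab_correction stands for the lookup in
+    LOCAL(tab), which the generator program at the end of this file
+    produces:
+
+	double y0_estimate (double x)	// x in [1., 4.)
+	{
+	  double y = x < 2. ? 1.25 - x / 4. : 1. - x / 8.;
+	  return y - tab_correction (x);	// about 7.5 good bits
+	}
+ */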
+
+	mov.l	LOCAL(xff000000),r1
+	add	r4,r4
+	mov	r4,r0
+	add	r5,r5
+	cmp/hs	r5,r4
+	sub	r5,r0
+	mov	#-24,r2
+	bf/s	LOCAL(r5_large)
+	shad	r2,r0
+	mov	r4,r3
+	shll8	r4
+	rotcr	r4
+	tst	#0xe0,r0
+	neg	r0,r0
+	bt	LOCAL(ret_abs_r3)
+	tst	r1,r5
+	shll8	r5
+	bt/s	LOCAL(denorm_r5)
+	cmp/hi	r3,r1
+	dmulu.l	r4,r4
+	bf	LOCAL(inf_nan)
+	rotcr	r5
+	shld	r0,r5
+LOCAL(denorm_r5_done):
+	sts	mach,r4
+	dmulu.l	r5,r5
+	mov.l	r6,@-r15
+	mov	#20,r6
+
+	sts	mach,r5
+LOCAL(add_frac):
+	mova	LOCAL(tab)-32,r0
+	mov.l	r7,@-r15
+	mov.w	LOCAL(x1380),r7
+	and	r1,r3
+	addc	r5,r4
+	mov.w	LOCAL(m25),r2	! -25
+	bf	LOCAL(frac_ok)
+	sub	r1,r3
+	rotcr	r4
+	cmp/eq	r1,r3	! did we generate infinity ?
+	bt	LOCAL(inf_nan)
+	shlr	r4
+	mov	r4,r1
+	shld	r2,r1
+	mov.b	@(r0,r1),r0
+	mov	r4,r1
+	shld	r6,r1
+	bra	LOCAL(frac_low2)
+	sub	r1,r7
+
+LOCAL(frac_ok):
+	mov	r4,r1
+	shld	r2,r1
+	mov.b	@(r0,r1),r1
+	cmp/pz	r4
+	mov	r4,r0
+	bt/s	LOCAL(frac_low)
+	shld	r6,r0
+	mov.w	LOCAL(xf80),r7
+	shlr	r0
+LOCAL(frac_low):
+	sub	r0,r7
+LOCAL(frac_low2):
+	mov.l	LOCAL(x40000080),r0 ! avoid denorm results near 1. << r3
+	sub	r1,r7	! {0.12}
+	mov.l	LOCAL(xfffe0000),r5 ! avoid rounding overflow near 4. << r3
+	swap.w	r7,r1	! {0.28}
+	dmulu.l	r1,r4 /* two issue cycles */
+	mulu.w	r7,r7  /* two issue cycles */
+	sts	mach,r2	! {0.26}
+	mov	r1,r7
+	shlr	r1
+	sts	macl,r6	! {0.24}
+	cmp/hi	r0,r4
+	shlr2	r2
+	bf	LOCAL(near_one)
+	shlr	r2	! {0.23} systemic error of linear approximation keeps y1 < 1
+	dmulu.l	r2,r6
+	cmp/hs	r5,r4
+	add	r7,r1	! {1.28}
+	bt	LOCAL(near_four)
+	shlr2	r1	! {1.26}
+	sts	mach,r0	! {0.15} x*y0^3 == {0.16} 0.5*x*y0^3
+	shlr2	r1	! {1.24}
+	shlr8	r1	! {1.16}
+	sett		! compensate for truncation of subtrahend, keep y1 < 1
+	subc	r0,r1   ! {0.16} y1;  max error about 3.5 ulp
+	swap.w	r1,r0
+	dmulu.l	r0,r4	! { 1.30 }
+	mulu.w	r1,r1
+	sts	mach,r2
+	shlr2	r0
+	sts	macl,r1
+	add	r2,r0
+	mov.l	LOCAL(xff000000),r6
+	add	r2,r0
+	dmulu.l	r1,r2
+	add	#127,r0
+	add	r6,r3	! precompensation for adding leading 1
+	sts	mach,r1
+	shlr	r3
+	mov.l	@r15+,r7
+	sub	r1,r0	! {0.31} max error about 50 ulp (+127)
+	mov.l	@r15+,r6
+	shlr8	r0	! {0.23} max error about 0.7 ulp
+	rts
+	add	r3,r0
+	
+LOCAL(r5_large):
+	mov	r5,r3
+	mov	#-31,r2
+	cmp/ge	r2,r0
+	shll8	r5
+	bf	LOCAL(ret_abs_r3)
+	rotcr	r5
+	tst	r1,r4
+	shll8	r4
+	bt/s	LOCAL(denorm_r4)
+	cmp/hi	r3,r1
+	dmulu.l	r5,r5
+	bf	LOCAL(inf_nan)
+	rotcr	r4
+LOCAL(denorm_r4_done):
+	shld	r0,r4
+	sts	mach,r5
+	dmulu.l	r4,r4
+	mov.l	r6,@-r15
+	mov	#20,r6
+	bra	LOCAL(add_frac)
+	sts	mach,r4
+
+LOCAL(near_one):
+	bra	LOCAL(assemble_sqrt)
+	mov	#0,r0
+LOCAL(near_four):
+	! exact round-to-nearest would add 255.  We add 256 for speed & compactness.
+	mov	r4,r0
+	shlr8	r0
+	add	#1,r0
+	tst	r0,r0
+	addc	r0,r3	! might generate infinity.
+LOCAL(assemble_sqrt):
+	mov.l	@r15+,r7
+	shlr	r3
+	mov.l	@r15+,r6
+	rts
+	add	r3,r0
+LOCAL(inf_nan):
+LOCAL(ret_abs_r3):
+	mov	r3,r0
+	rts
+	shlr	r0
+LOCAL(denorm_r5):
+	bf	LOCAL(inf_nan)
+	tst	r1,r4
+	bt	LOCAL(denorm_both)
+	dmulu.l	r4,r4
+	bra	LOCAL(denorm_r5_done)
+	shld	r0,r5
+LOCAL(denorm_r4):
+	bf	LOCAL(inf_nan)
+	tst	r1,r5
+	dmulu.l	r5,r5
+	bf	LOCAL(denorm_r4_done)
+LOCAL(denorm_both):	! normalize according to r3.
+	extu.w	r3,r2
+	mov.l	LOCAL(c__clz_tab),r0
+	cmp/eq	r3,r2
+	mov	#-8,r2
+	bt	0f
+	tst	r1,r3
+	mov	#-16,r2
+	bt	0f
+	mov	#-24,r2
+0:
+	shld	r2,r3
+	mov.l	r7,@-r15
+#ifdef __pic__
+	add	r0,r3
+	mova	 LOCAL(c__clz_tab),r0
+#endif
+	mov.b	@(r0,r3),r0
+	add	#32,r2
+	sub	r0,r2
+	shld	r2,r4
+	mov	r2,r7
+	dmulu.l	r4,r4
+	sts.l	pr,@-r15
+	mov	#1,r3
+	bsr	LOCAL(denorm_r5_done)
+	shld	r2,r5
+	mov.l	LOCAL(x01000000),r1
+	neg	r7,r2
+	lds.l	@r15+,pr
+	tst	r1,r0
+	mov.l	@r15+,r7
+	bt	0f
+	add	#1,r2
+	sub	r1,r0
+0:
+	rts
+	shld	r2,r0
+
+LOCAL(m25):
+	.word	-25
+LOCAL(x1380):
+	.word	0x1380
+LOCAL(xf80):
+	.word	0xf80
+	.balign	4
+LOCAL(xff000000):
+	.long	0xff000000
+LOCAL(x40000080):
+	.long	0x40000080
+LOCAL(xfffe0000):
+	.long	0xfffe0000
+LOCAL(x01000000):
+	.long	0x01000000
+LOCAL(c__clz_tab):
+#ifdef __pic__
+	.long	GLOBAL(clz_tab) - .
+#else
+	.long	GLOBAL(clz_tab)
+#endif
+
+/*
+double err(double x)
+{
+  return (x < 2. ? 1.25 - x/4. : 1. - x/8.) - 1./sqrt(x);
+}
+
+int
+main ()
+{
+  int i = 0;
+  double x, s, v;
+  double lx, hx;
+
+  s = 1./32.;
+  for (x = 1.; x < 4; x += s, i++)
+    {
+      lx = x;
+      hx = x + s - 1. / (1 << 30);
+      v = 0.5 * (err (lx) + err (hx));
+      printf ("%s% 4d%c",
+              (i & 7) == 0 ? "\t.byte\t" : "",
+              (int)(v * 4096 + 0.5) - 128,
+              (i & 7) == 7 ? '\n' : ',');
+    }
+  return 0;
+} */
+
+	.balign	4
+LOCAL(tab):
+	.byte	-113, -84, -57, -33, -11,   8,  26,  41
+	.byte	  55,  67,  78,  87,  94, 101, 106, 110
+	.byte	 113, 115, 115, 115, 114, 112, 109, 106
+	.byte	 101,  96,  91,  84,  77,  69,  61,  52
+	.byte	  51,  57,  63,  68,  72,  77,  80,  84
+	.byte	  87,  89,  91,  93,  95,  96,  97,  97
+	.byte	  97,  97,  97,  96,  95,  94,  93,  91
+	.byte	  89,  87,  84,  82,  79,  76,  72,  69
+	.byte	  65,  61,  57,  53,  49,  44,  39,  34
+	.byte	  29,  24,  19,  13,   8,   2,  -4, -10
+	.byte	 -17, -23, -29, -36, -43, -50, -57, -64
+	.byte	 -71, -78, -85, -93,-101,-108,-116,-124
+	ENDFUNC(GLOBAL(hypotf))
+#endif /* L_hypotf */
+#endif /* DYN_SHIFT */
+
+#endif /* __SH_FPU_ANY__ */
Index: gcc/config/sh/sh.md
===================================================================
--- gcc/config/sh/sh.md	(revision 162269)
+++ gcc/config/sh/sh.md	(working copy)
@@ -107,6 +107,7 @@ (define_constants [
   (DR0_REG	64)
   (DR2_REG	66)
   (DR4_REG	68)
+  (FR4_REG	68)
   (FR23_REG	87)
 
   (TR0_REG	128)
@@ -174,6 +175,15 @@ (define_constants [
   (UNSPECV_WINDOW_END	10)
   (UNSPECV_CONST_END	11)
   (UNSPECV_EH_RETURN	12)
+  ;; NaN handling for software floating point:
+  ;; We require one bit specific for a precision to be set in all NaNs,
+  ;; so that we can test them with a not / tst sequence.
+  ;; ??? Ironically, this is the quiet bit for now, because that is the
+  ;; only bit set by __builtin_nan ("").
+  ;; ??? Should really use one bit lower and force it set by using
+  ;; a custom encoding function.
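+  ;; In C terms the test is, as a sketch (hypothetical helper):
+  ;;   int sf_is_nan (uint32_t x) { return (~x & SF_NAN_MASK) == 0; }
+  ;; i.e. "not Rx,Rn ; tst Rmask,Rn" sets T exactly for such NaNs.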
+  (SF_NAN_MASK         0x7fc00000)
+  (DF_NAN_MASK         0x7ff80000)
 ])
 
 ;; -------------------------------------------------------------------------
@@ -615,6 +625,14 @@ (define_insn "cmpeqsi_t"
 	cmp/eq	%1,%0"
    [(set_attr "type" "mt_group")])
 
+(define_insn "fpcmp_i1"
+  [(set (reg:SI T_REG)
+	(match_operator:SI 1 "soft_fp_comparison_operator"
+	  [(match_operand 0 "soft_fp_comparison_operand" "r") (const_int 0)]))]
+  "TARGET_SH1_SOFTFP"
+  "tst	%0,%0"
+   [(set_attr "type" "mt_group")])
+
 (define_insn "cmpgtsi_t"
   [(set (reg:SI T_REG)
 	(gt:SI (match_operand:SI 0 "arith_reg_operand" "r,r")
@@ -1154,9 +1172,9 @@ (define_insn_and_split "*movsicc_umin"
 
 (define_insn "*movsicc_t_false"
   [(set (match_operand:SI 0 "arith_reg_dest" "=r,r")
-	(if_then_else (eq (reg:SI T_REG) (const_int 0))
-		      (match_operand:SI 1 "general_movsrc_operand" "r,I08")
-		      (match_operand:SI 2 "arith_reg_operand" "0,0")))]
+	(if_then_else:SI (eq (reg:SI T_REG) (const_int 0))
+			 (match_operand:SI 1 "general_movsrc_operand" "r,I08")
+			 (match_operand:SI 2 "arith_reg_operand" "0,0")))]
   "TARGET_PRETEND_CMOVE
    && (arith_reg_operand (operands[1], SImode)
        || (immediate_operand (operands[1], SImode)
@@ -1167,9 +1185,9 @@ (define_insn "*movsicc_t_false"
 
 (define_insn "*movsicc_t_true"
   [(set (match_operand:SI 0 "arith_reg_dest" "=r,r")
-	(if_then_else (ne (reg:SI T_REG) (const_int 0))
-		      (match_operand:SI 1 "general_movsrc_operand" "r,I08")
-		      (match_operand:SI 2 "arith_reg_operand" "0,0")))]
+	(if_then_else:SI (ne (reg:SI T_REG) (const_int 0))
+			 (match_operand:SI 1 "general_movsrc_operand" "r,I08")
+			 (match_operand:SI 2 "arith_reg_operand" "0,0")))]
   "TARGET_PRETEND_CMOVE
    && (arith_reg_operand (operands[1], SImode)
        || (immediate_operand (operands[1], SImode)
@@ -6849,6 +6867,50 @@ (define_insn "stuff_delay_slot"
 \f
 ;; Conditional branch insns
 
+(define_expand "cmpun_sdf"
+  [(unordered (match_operand 0 "" "") (match_operand 1 "" ""))]
+  ""
+  "
+{
+  HOST_WIDE_INT mask;
+  switch (GET_MODE (operands[0]))
+    {
+    case SFmode:
+      mask = SF_NAN_MASK;
+      break;
+    case DFmode:
+      mask = DF_NAN_MASK;
+      break;
+    default:
+      FAIL;
+    }
+  emit_insn (gen_cmpunsf_i1 (operands[0], operands[1],
+                            force_reg (SImode, GEN_INT (mask))));
+  DONE;
+}")
+
+(define_expand "cmpuneq_sdf"
+  [(uneq (match_operand 0 "" "") (match_operand 1 "" ""))]
+  ""
+  "
+{
+  HOST_WIDE_INT mask;
+  switch (GET_MODE (operands[0]))
+    {
+    case SFmode:
+      mask = SF_NAN_MASK;
+      break;
+    case DFmode:
+      mask = DF_NAN_MASK;
+      break;
+    default:
+      FAIL;
+    }
+  emit_insn (gen_cmpuneqsf_i1 (operands[0], operands[1],
+                              force_reg (SImode, GEN_INT (mask))));
+  DONE;
+}")
+
 (define_expand "cbranchint4_media"
   [(set (pc)
 	(if_then_else (match_operator 0 "shmedia_cbranch_comparison_operator"
@@ -9355,6 +9417,19 @@ (define_expand "cstoredi4"
 
 
 
+(define_expand "sunle"
+  [(set (match_operand:SI 0 "arith_reg_operand" "")
+	(match_dup 1))]
+  "TARGET_SH1_SOFTFP"
+  "
+{
+  if (! currently_expanding_to_rtl)
+    FAIL;
+  sh_emit_compare_and_branch (operands, UNLE);
+  emit_insn (gen_movt (operands[0]));
+  DONE;
+}")
+
 ;; sne moves the complement of the T reg to DEST like this:
 ;;      cmp/eq ...
 ;;      mov    #-1,temp
@@ -9394,11 +9469,15 @@ (define_split
 (define_expand "cstoresf4"
   [(set (match_operand:SI 0 "register_operand" "=r")
 	(match_operator:SI 1 "sh_float_comparison_operator"
-	 [(match_operand:SF 2 "arith_operand" "")
-	  (match_operand:SF 3 "arith_operand" "")]))]
+	 [(match_operand:SF 2 "nonmemory_operand" "")
+	  (match_operand:SF 3 "nonmemory_operand" "")]))]
   "TARGET_SH2E || TARGET_SHMEDIA_FPU"
   "if (TARGET_SHMEDIA)
      {
+       if (!arith_operand (operands[2], SFmode))
+	 operands[2] = copy_to_mode_reg (SFmode, operands[2]);
+       if (!arith_operand (operands[3], SFmode))
+	 operands[3] = copy_to_mode_reg (SFmode, operands[3]);
        emit_insn (gen_cstore4_media (operands[0], operands[1],
 				     operands[2], operands[3]));
        DONE;
@@ -9407,18 +9486,22 @@ (define_expand "cstoresf4"
    if (! currently_expanding_to_rtl)
      FAIL;
    
-   sh_emit_compare_and_set (operands, SFmode);
+   sh_expand_float_scc (operands);
    DONE;
 ")
 
 (define_expand "cstoredf4"
   [(set (match_operand:SI 0 "register_operand" "=r")
 	(match_operator:SI 1 "sh_float_comparison_operator"
-	 [(match_operand:DF 2 "arith_operand" "")
-	  (match_operand:DF 3 "arith_operand" "")]))]
-  "(TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU"
+	 [(match_operand:DF 2 "nonmemory_operand" "")
+	  (match_operand:DF 3 "nonmemory_operand" "")]))]
+  "TARGET_SH1 || TARGET_SHMEDIA_FPU"
   "if (TARGET_SHMEDIA)
      {
+       if (!arith_operand (operands[2], DFmode))
+	 operands[2] = copy_to_mode_reg (DFmode, operands[2]);
+       if (!arith_operand (operands[3], DFmode))
+	 operands[3] = copy_to_mode_reg (DFmode, operands[3]);
        emit_insn (gen_cstore4_media (operands[0], operands[1],
 				     operands[2], operands[3]));
        DONE;
@@ -9427,7 +9510,7 @@ (define_expand "cstoredf4"
     if (! currently_expanding_to_rtl)
       FAIL;
    
-   sh_emit_compare_and_set (operands, DFmode);
+   sh_expand_float_scc (operands);
    DONE;
 ")
 
@@ -9765,7 +9848,7 @@ (define_expand "addsf3"
   [(set (match_operand:SF 0 "arith_reg_operand" "")
 	(plus:SF (match_operand:SF 1 "arith_reg_operand" "")
 		 (match_operand:SF 2 "arith_reg_operand" "")))]
-  "TARGET_SH2E || TARGET_SHMEDIA_FPU"
+  "TARGET_SH2E || TARGET_SH3 || TARGET_SHMEDIA_FPU"
   "
 {
   if (TARGET_SH2E)
@@ -9773,6 +9856,12 @@ (define_expand "addsf3"
       expand_sf_binop (&gen_addsf3_i, operands);
       DONE;
     }
+  else if (TARGET_OSFP)
+    {
+      expand_sfunc_binop (SFmode, &gen_addsf3_i3, \"__addsf3\", PLUS,
+			  operands);
+      DONE;
+    }
 }")
 
 (define_insn "*addsf3_media"
@@ -9871,6 +9960,22 @@ (define_insn_and_split "binary_sf_op1"
 }"
   [(set_attr "type" "fparith_media")])
 
+(define_insn "addsf3_i3"
+  [(set (match_operand:SF 0 "arith_reg_dest" "=z")
+	(plus:SF (reg:SF R4_REG) (reg:SF R5_REG)))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R1_REG))
+   (clobber (reg:SI R2_REG))
+   (clobber (reg:SI R3_REG))
+   (clobber (reg:SI R6_REG))
+   (clobber (reg:SI R7_REG))
+   (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+  "TARGET_OSFP && ! TARGET_SH2E"
+  "jsr	@%1%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
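+
+;; The sfunc above amounts to this C-level interface (a sketch of the
+;; calling convention only; __addsf3 itself is the usual libgcc entry
+;; point): float __addsf3 (float a /* r4 */, float b /* r5 */), with
+;; the result in r0 and r1-r3, r6, r7, T and PR clobbered.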
+
 (define_insn "addsf3_i"
   [(set (match_operand:SF 0 "fp_arith_reg_operand" "=f")
 	(plus:SF (match_operand:SF 1 "fp_arith_reg_operand" "%0")
@@ -9885,7 +9990,7 @@ (define_expand "subsf3"
   [(set (match_operand:SF 0 "fp_arith_reg_operand" "")
 	(minus:SF (match_operand:SF 1 "fp_arith_reg_operand" "")
 		  (match_operand:SF 2 "fp_arith_reg_operand" "")))]
-  "TARGET_SH2E || TARGET_SHMEDIA_FPU"
+  "TARGET_SH2E || TARGET_SH3 || TARGET_SHMEDIA_FPU"
   "
 {
   if (TARGET_SH2E)
@@ -9893,6 +9998,12 @@ (define_expand "subsf3"
       expand_sf_binop (&gen_subsf3_i, operands);
       DONE;
     }
+  else if (TARGET_OSFP)
+    {
+      expand_sfunc_binop (SFmode, &gen_subsf3_i3, \"__subsf3\", MINUS,
+			  operands);
+      DONE;
+    }
 }")
 
 (define_insn "*subsf3_media"
@@ -9903,6 +10014,23 @@ (define_insn "*subsf3_media"
   "fsub.s	%1, %2, %0"
   [(set_attr "type" "fparith_media")])
 
+(define_insn "subsf3_i3"
+  [(set (match_operand:SF 0 "arith_reg_dest" "=z")
+	(minus:SF (reg:SF R4_REG) (reg:SF R5_REG)))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R1_REG))
+   (clobber (reg:SI R2_REG))
+   (clobber (reg:SI R3_REG))
+   (clobber (reg:SI R5_REG))
+   (clobber (reg:SI R6_REG))
+   (clobber (reg:SI R7_REG))
+   (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+  "TARGET_OSFP && ! TARGET_SH2E"
+  "jsr	@%1%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
+
 (define_insn "subsf3_i"
   [(set (match_operand:SF 0 "fp_arith_reg_operand" "=f")
 	(minus:SF (match_operand:SF 1 "fp_arith_reg_operand" "0")
@@ -9915,11 +10043,32 @@ (define_insn "subsf3_i"
 
 (define_expand "mulsf3"
   [(set (match_operand:SF 0 "fp_arith_reg_operand" "")
-	(mult:SF (match_operand:SF 1 "fp_arith_reg_operand" "")
-		 (match_operand:SF 2 "fp_arith_reg_operand" "")))]
+        (mult:SF (match_operand:SF 1 "fp_arith_reg_operand" "")
+                 (match_operand:SF 2 "fp_arith_reg_operand" "")))]
   "TARGET_SH2E || TARGET_SHMEDIA_FPU"
   "")
 
+;(define_expand "mulsf3"
+;  [(set (match_operand:SF 0 "fp_arith_reg_operand" "")
+;        (mult:SF (match_operand:SF 1 "fp_arith_reg_operand" "")
+;                 (match_operand:SF 2 "fp_arith_reg_operand" "")))]
+;  "TARGET_SH2E || TARGET_SH3 || TARGET_SHMEDIA_FPU"
+;  "
+; { 
+;   if (TARGET_SH4 || TARGET_SH2A_SINGLE) 
+;     expand_sf_binop (&gen_mulsf3_i4, operands);
+;   else if (TARGET_SH2E)
+;     emit_insn (gen_mulsf3_ie (operands[0], operands[1], operands[2]));
+;  else if (TARGET_OSFP)
+;    { 
+;      expand_sfunc_binop (SFmode, &gen_mulsf3_i3, \"__mulsf3\", MULT,
+;                         operands);
+;      DONE;
+;    }
+;   if (! TARGET_SHMEDIA)
+;     DONE;
+; }")
+
 (define_insn "*mulsf3_media"
   [(set (match_operand:SF 0 "fp_arith_reg_operand" "=f")
 	(mult:SF (match_operand:SF 1 "fp_arith_reg_operand" "%f")
@@ -9959,6 +10108,22 @@ (define_insn "mulsf3_i4"
   [(set_attr "type" "fp")
    (set_attr "fp_mode" "single")])
 
+(define_insn "mulsf3_i3"
+  [(set (match_operand:SF 0 "arith_reg_dest" "=z")
+	(mult:SF (reg:SF R4_REG) (reg:SF R5_REG)))
+   (clobber (reg:SI MACH_REG))
+   (clobber (reg:SI MACL_REG))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R1_REG))
+   (clobber (reg:SI R2_REG))
+   (clobber (reg:SI R3_REG))
+   (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+  "TARGET_OSFP && ! TARGET_SH2E"
+  "jsr	@%1%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
+
 (define_insn "mac_media"
   [(set (match_operand:SF 0 "fp_arith_reg_operand" "=f")
 	(plus:SF (mult:SF (match_operand:SF 1 "fp_arith_reg_operand" "%f")
@@ -10119,6 +10284,149 @@ (define_insn "*fixsfsi"
   "ftrc	%1,%0"
   [(set_attr "type" "fp")])
 
+(define_insn "cmpnesf_i1"
+  [(set (match_operand:CC_FP_NE 0 "register_operand" "=z")
+	(compare:CC_FP_NE (reg:SF R4_REG) (reg:SF R5_REG)))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R1_REG))
+   (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+  "TARGET_SH1 && ! TARGET_SH2E"
+  "jsr	@%1%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpgtsf_i1"
+  [(set (match_operand:CC_FP_GT 0 "register_operand" "=z")
+	(compare:CC_FP_GT (reg:SF R4_REG) (reg:SF R5_REG)))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R1_REG))
+   (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+  "TARGET_SH1 && ! TARGET_SH2E"
+  "jsr	@%1%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpunltsf_i1"
+  [(set (match_operand:CC_FP_UNLT 0 "register_operand" "=z")
+	(compare:CC_FP_UNLT (reg:SF R4_REG) (reg:SF R5_REG)))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R1_REG))
+   (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+  "TARGET_SH1 && ! TARGET_SH2E"
+  "jsr	@%1%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpeqsf_i1_finite"
+  [(set (reg:SI T_REG)
+	(eq:SI (match_operand:SF 0 "arith_reg_operand" "r,r,r")
+	       (match_operand:SF 1 "arith_reg_operand" "r,r,r")))
+   (clobber (match_scratch:SI 2 "=0,1,?r"))]
+  "TARGET_SH1 && ! TARGET_SH2E && flag_finite_math_only"
+  "*
+{
+  if (which_alternative == 0)
+     output_asm_insn (\"cmp/eq\t%0,%1\;or\t%1,%2\;bt\t0f\", operands);
+  else if (which_alternative == 1)
+     output_asm_insn (\"cmp/eq\t%0,%1\;or\t%0,%2\;bt\t0f\", operands);
+  else
+    output_asm_insn (\"cmp/eq\t%0,%1\;mov\t%0,%2\;bt\t0f\;or\t%1,%2\",
+		     operands);
+  return \"add\t%2,%2\;tst\t%2,%2\\n0:\";
+}"
+  [(set_attr "length" "10,10,12")])
+
+(define_insn "cmplesf_i1_finite"
+  [(set (reg:SI T_REG)
+	(le:SI (match_operand:SF 0 "arith_reg_operand" "r,r,r")
+	       (match_operand:SF 1 "arith_reg_operand" "r,r,r")))
+   (clobber (match_scratch:SI 2 "=0,1,r"))]
+  "TARGET_SH1 && ! TARGET_SH2E && flag_finite_math_only"
+  "*
+{
+  output_asm_insn (\"cmp/pz\t%0\", operands);
+  if (which_alternative == 2)
+    output_asm_insn (\"mov\t%0,%2\", operands);
+  if (TARGET_SH2)
+    output_asm_insn (\"bf/s\t0f\;cmp/hs\t%1,%0\;cmp/ge\t%0,%1\", operands);
+  else
+    output_asm_insn (\"bt\t1f\;bra\t0f\;cmp/hs\t%1,%0\\n1:\tcmp/ge\t%0,%1\",
+		     operands);
+  if (which_alternative == 1)
+    output_asm_insn (\"or\t%0,%2\", operands);
+  else
+    output_asm_insn (\"or\t%1,%2\", operands);
+  return \"bt\t0f\;add\t%2,%2\;tst\t%2,%2\\n0:\";
+}"
+  [(set_attr "length" "18,18,20")])
+
+(define_insn "cmpunsf_i1"
+  [(set (reg:SI T_REG)
+	(unordered:SI (match_operand:SF 0 "arith_reg_operand" "r,r")
+		      (match_operand:SF 1 "arith_reg_operand" "r,r")))
+   (use (match_operand:SI 2 "arith_reg_operand" "r,r"))
+   (clobber (match_scratch:SI 3 "=0,&r"))]
+  "TARGET_SH1 && ! TARGET_SH2E"
+  "not\t%0,%3\;tst\t%2,%3\;not\t%1,%3\;bt\t0f\;tst\t%2,%3\;0:"
+  [(set_attr "length" "10")])
+
+;; ??? This is a lot of code with a lot of branches; a library function
+;; might be better.
+(define_insn "cmpuneqsf_i1"
+  [(set (reg:SI T_REG)
+	(uneq:SI (match_operand:SF 0 "arith_reg_operand" "r")
+		 (match_operand:SF 1 "arith_reg_operand" "r")))
+   (use (match_operand:SI 2 "arith_reg_operand" "r"))
+   (clobber (match_scratch:SI 3 "=&r"))]
+  "TARGET_SH1 && ! TARGET_SH2E"
+  "*
+{
+  output_asm_insn (\"not\t%0,%3\;tst\t%2,%3\;not\t%1,%3\", operands);
+  output_asm_insn (\"bt\t0f\;tst\t%2,%3\;bt\t0f\;cmp/eq\t%0,%1\", operands);
+  output_asm_insn (\"mov\t%0,%3\;bt\t0f\;or\t%1,%3\", operands);
+  return \"add\t%3,%3\;tst\t%3,%3\\n0:\";
+}"
+  [(set_attr "length" "24")])
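+
+;; A C sketch of what the sequence computes (hypothetical name; a and
+;; b are raw bit patterns, mask is the NaN mask passed in operand 2):
+;;   int uneq_model (uint32_t a, uint32_t b, uint32_t mask)
+;;   {
+;;     if ((~a & mask) == 0 || (~b & mask) == 0)  /* unordered */
+;;       return 1;
+;;     return a == b || ((a | b) << 1) == 0;  /* equal, or +0 vs -0 */
+;;   }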
+
+(define_insn "movcc_fp_ne"
+  [(set (match_operand:CC_FP_NE 0 "general_movdst_operand"
+	    "=r,r,m")
+	(match_operand:CC_FP_NE 1 "general_movsrc_operand"
+	 "rI08,mr,r"))]
+  "TARGET_SH1"
+  "@
+	mov	%1,%0
+	mov.l	%1,%0
+	mov.l	%1,%0"
+  [(set_attr "type" "move,load,store")])
+
+(define_insn "movcc_fp_gt"
+  [(set (match_operand:CC_FP_GT 0 "general_movdst_operand"
+	    "=r,r,m")
+	(match_operand:CC_FP_GT 1 "general_movsrc_operand"
+	 "rI08,mr,r"))]
+  "TARGET_SH1"
+  "@
+	mov	%1,%0
+	mov.l	%1,%0
+	mov.l	%1,%0"
+  [(set_attr "type" "move,load,store")])
+
+(define_insn "movcc_fp_unlt"
+  [(set (match_operand:CC_FP_UNLT 0 "general_movdst_operand"
+	    "=r,r,m")
+	(match_operand:CC_FP_UNLT 1 "general_movsrc_operand"
+	 "rI08,mr,r"))]
+  "TARGET_SH1"
+  "@
+	mov	%1,%0
+	mov.l	%1,%0
+	mov.l	%1,%0"
+  [(set_attr "type" "move,load,store")])
+
 (define_insn "cmpgtsf_t"
   [(set (reg:SI T_REG)
 	(gt:SI (match_operand:SF 0 "fp_arith_reg_operand" "f")
@@ -10146,6 +10454,22 @@ (define_insn "ieee_ccmpeqsf_t"
   "* return output_ieee_ccmpeq (insn, operands);"
   [(set_attr "length" "4")])
 
+(define_insn "*cmpltgtsf_t"
+  [(set (reg:SI T_REG)
+	(ltgt:SI (match_operand:SF 0 "fp_arith_reg_operand" "f")
+		 (match_operand:SF 1 "fp_arith_reg_operand" "f")))]
+  "TARGET_SH2E && ! (TARGET_SH4 || TARGET_SH2A_SINGLE)"
+  "fcmp/gt\t%1,%0\;bt\t0f\;fcmp/gt\t%0,%1\\n0:"
+  [(set_attr "length" "6")])
+
+(define_insn "*cmporderedsf_t"
+  [(set (reg:SI T_REG)
+	(ordered:SI (match_operand:SF 0 "fp_arith_reg_operand" "f")
+		    (match_operand:SF 1 "fp_arith_reg_operand" "f")))]
+  "TARGET_SH2E && ! (TARGET_SH4 || TARGET_SH2A_SINGLE)"
+  "fcmp/eq\t%0,%0\;bf\t0f\;fcmp/eq\t%1,%1\\n0:"
+  [(set_attr "length" "6")])
+
 
 (define_insn "cmpgtsf_t_i4"
   [(set (reg:SI T_REG)
@@ -10178,6 +10502,26 @@ (define_insn "*ieee_ccmpeqsf_t_4"
   [(set_attr "length" "4")
    (set_attr "fp_mode" "single")])
 
+(define_insn "*cmpltgtsf_t_4"
+  [(set (reg:SI T_REG)
+	(ltgt:SI (match_operand:SF 0 "fp_arith_reg_operand" "f")
+		 (match_operand:SF 1 "fp_arith_reg_operand" "f")))
+   (use (match_operand:PSI 2 "fpscr_operand" "c"))]
+  "TARGET_SH4 || TARGET_SH2A_SINGLE"
+  "fcmp/gt\t%1,%0\;bt\t0f\;fcmp/gt\t%0,%1\\n0:"
+  [(set_attr "length" "6")
+   (set_attr "fp_mode" "single")])
+
+(define_insn "*cmporderedsf_t_4"
+  [(set (reg:SI T_REG)
+	(ordered:SI (match_operand:SF 0 "fp_arith_reg_operand" "f")
+		    (match_operand:SF 1 "fp_arith_reg_operand" "f")))
+   (use (match_operand:PSI 2 "fpscr_operand" "c"))]
+  "TARGET_SH4 || TARGET_SH2A_SINGLE"
+  "fcmp/eq\t%0,%0\;bf\t0f\;fcmp/eq\t%1,%1\\n0:"
+  [(set_attr "length" "6")
+   (set_attr "fp_mode" "single")])
+
 (define_insn "cmpeqsf_media"
   [(set (match_operand:SI 0 "register_operand" "=r")
 	(eq:SI (match_operand:SF 1 "fp_arith_reg_operand" "f")
@@ -10213,18 +10557,24 @@ (define_insn "cmpunsf_media"
 (define_expand "cbranchsf4"
   [(set (pc)
 	(if_then_else (match_operator 0 "sh_float_comparison_operator"
-		       [(match_operand:SF 1 "arith_operand" "")
-			(match_operand:SF 2 "arith_operand" "")])
+		       [(match_operand:SF 1 "nonmemory_operand" "")
+			(match_operand:SF 2 "nonmemory_operand" "")])
 		      (match_operand 3 "" "")
 		      (pc)))]
-  "TARGET_SH2E || TARGET_SHMEDIA_FPU"
+  "TARGET_SH1 || TARGET_SHMEDIA_FPU"
   "
 {
   if (TARGET_SHMEDIA)
-    emit_jump_insn (gen_cbranchfp4_media (operands[0], operands[1], operands[2],
-					  operands[3]));
+    {
+      if (!arith_operand (operands[1], SFmode))
+	operands[1] = copy_to_mode_reg (SFmode, operands[1]);
+      if (!arith_operand (operands[2], SFmode))
+	operands[2] = copy_to_mode_reg (SFmode, operands[2]);
+      emit_jump_insn (gen_cbranchfp4_media (operands[0], operands[1],
+					    operands[2], operands[3]));
+    }
   else
-    sh_emit_compare_and_branch (operands, SFmode);
+    sh_expand_float_cbranch (operands);
   DONE;
 }")
 
@@ -10426,11 +10776,39 @@ (define_insn "abssf2_i"
   [(set_attr "type" "fmove")
    (set_attr "fp_mode" "single")])
 
+(define_expand "abssc2"
+  [(set (match_operand:SF 0 "fp_arith_reg_operand" "")
+	(abs:SF (match_operand:SC 1 "fp_arith_reg_operand" "")))]
+  "TARGET_OSFP && ! TARGET_SH2E"
+  "
+{
+  expand_sfunc_unop (SCmode, &gen_abssc2_i3, \"__hypotf\", ABS, operands);
+  DONE;
+}")
+
+(define_insn "abssc2_i3"
+  [(set (match_operand:SF 0 "arith_reg_dest" "=z")
+	(abs:SF (reg:SC R4_REG)))
+   (clobber (reg:SI MACH_REG))
+   (clobber (reg:SI MACL_REG))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R1_REG))
+   (clobber (reg:SI R2_REG))
+   (clobber (reg:SI R3_REG))
+   (clobber (reg:SI R4_REG))
+   (clobber (reg:SI R5_REG))
+   (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+  "TARGET_OSFP && ! TARGET_SH2E"
+  "jsr	@%1%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
+
 (define_expand "adddf3"
   [(set (match_operand:DF 0 "fp_arith_reg_operand" "")
 	(plus:DF (match_operand:DF 1 "fp_arith_reg_operand" "")
 		 (match_operand:DF 2 "fp_arith_reg_operand" "")))]
-  "(TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU"
+  "TARGET_FPU_DOUBLE || TARGET_SH3"
   "
 {
   if (TARGET_SH4 || TARGET_SH2A_DOUBLE)
@@ -10438,6 +10816,12 @@ (define_expand "adddf3"
       expand_df_binop (&gen_adddf3_i, operands);
       DONE;
     }
+  else if (TARGET_SH3)
+    {
+      expand_sfunc_binop (DFmode, &gen_adddf3_i3_wrap, \"__adddf3\", PLUS,
+			  operands);
+      DONE;
+    }
 }")
 
 (define_insn "*adddf3_media"
@@ -10458,6 +10842,30 @@ (define_insn "adddf3_i"
   [(set_attr "type" "dfp_arith")
    (set_attr "fp_mode" "double")])
 
+(define_expand "adddf3_i3_wrap"
+  [(match_operand:DF 0 "" "") (match_operand:SI 1 "" "")]
+  "TARGET_SH3"
+  "
+{
+  emit_insn (gen_adddf3_i3 (operands[1]));
+  emit_move_insn (operands[0], gen_rtx_REG (DFmode, R0_REG));
+  DONE;
+}")
+
+(define_insn "adddf3_i3"
+  [(set (reg:DF R0_REG)
+	(plus:DF (reg:DF R4_REG) (reg:DF R6_REG)))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:DI R2_REG))
+   (clobber (reg:DF R4_REG))
+   (clobber (reg:DF R6_REG))
+   (use (match_operand:SI 0 "arith_reg_operand" "r"))]
+  "TARGET_SH3"
+  "jsr	@%0%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
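+
+;; Calling convention sketch for the sfunc above: the DFmode arguments
+;; arrive in r4/r5 and r6/r7, the result comes back in r0/r1, i.e.
+;; roughly "double __adddf3 (double a, double b)" at the C level
+;; (__adddf3 is the usual libgcc entry point; the register mapping is
+;; what this pattern encodes).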
+
 (define_expand "subdf3"
   [(set (match_operand:DF 0 "fp_arith_reg_operand" "")
 	(minus:DF (match_operand:DF 1 "fp_arith_reg_operand" "")
@@ -10494,7 +10902,7 @@ (define_expand "muldf3"
   [(set (match_operand:DF 0 "fp_arith_reg_operand" "")
 	(mult:DF (match_operand:DF 1 "fp_arith_reg_operand" "")
 		 (match_operand:DF 2 "fp_arith_reg_operand" "")))]
-  "(TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU"
+  "TARGET_FPU_DOUBLE || TARGET_SH3"
   "
 {
   if (TARGET_SH4 || TARGET_SH2A_DOUBLE)
@@ -10502,6 +10910,12 @@ (define_expand "muldf3"
       expand_df_binop (&gen_muldf3_i, operands);
       DONE;
     }
+  else if (TARGET_SH3)
+    {
+      expand_sfunc_binop (DFmode, &gen_muldf3_i3_wrap, \"__muldf3\", MULT,
+			  operands);
+      DONE;
+    }
 }")
 
 (define_insn "*muldf3_media"
@@ -10522,6 +10936,32 @@ (define_insn "muldf3_i"
   [(set_attr "type" "dfp_mul")
    (set_attr "fp_mode" "double")])
 
+(define_expand "muldf3_i3_wrap"
+  [(match_operand:DF 0 "" "") (match_operand:SI 1 "" "")]
+  "TARGET_SH3"
+  "
+{
+  emit_insn (gen_muldf3_i3 (operands[1]));
+  emit_move_insn (operands[0], gen_rtx_REG (DFmode, R0_REG));
+  DONE;
+}")
+
+(define_insn "muldf3_i3"
+  [(set (reg:DF R0_REG)
+	(mult:DF (reg:DF R4_REG) (reg:DF R6_REG)))
+   (clobber (reg:SI MACH_REG))
+   (clobber (reg:SI MACL_REG))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:DI R2_REG))
+   (clobber (reg:DF R4_REG))
+   (clobber (reg:DF R6_REG))
+   (use (match_operand:SI 0 "arith_reg_operand" "r"))]
+  "TARGET_SH3"
+  "jsr	@%0%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
+
 (define_expand "divdf3"
   [(set (match_operand:DF 0 "fp_arith_reg_operand" "")
 	(div:DF (match_operand:DF 1 "fp_arith_reg_operand" "")
@@ -10651,6 +11091,73 @@ (define_insn "fix_truncdfsi2_i"
 ;; 	      (use (match_dup 2))])
 ;;    (set (match_dup 0) (reg:SI FPUL_REG))])
 
+(define_insn "cmpnedf_i1"
+  [(set (match_operand:CC_FP_NE 0 "register_operand" "=z")
+	(compare:CC_FP_NE (reg:DF R4_REG) (reg:DF R6_REG)))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R1_REG))
+   (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+  "TARGET_SH1_SOFTFP"
+  "jsr	@%1%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpgtdf_i1"
+  [(set (match_operand:CC_FP_GT 0 "register_operand" "=z")
+	(compare:CC_FP_GT (reg:DF R4_REG) (reg:DF R6_REG)))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R1_REG))
+   (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+  "TARGET_SH1_SOFTFP"
+  "jsr	@%1%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpunltdf_i1"
+  [(set (match_operand:CC_FP_UNLT 0 "register_operand" "=z")
+	(compare:CC_FP_UNLT (reg:DF R4_REG) (reg:DF R6_REG)))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R1_REG))
+   (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+  "TARGET_SH1_SOFTFP"
+  "jsr	@%1%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpeqdf_i1_finite"
+  [(set (reg:SI T_REG)
+	(eq:SI (match_operand:DF 0 "arith_reg_operand" "r")
+	       (match_operand:DF 1 "arith_reg_operand" "r")))
+   (clobber (match_scratch:SI 2 "=&r"))]
+  "TARGET_SH1_SOFTFP && flag_finite_math_only"
+  "cmp/eq\t%R0,%R1\;mov\t%S0,%2\;bf\t0f\;cmp/eq\t%S0,%S1\;bt\t0f\;or\t%S1,%2\;add\t%2,%2\;or\t%R0,%2\;tst\t%2,%2\\n0:"
+  [(set_attr "length" "18")])
+
+(define_insn "cmpundf_i1"
+  [(set (reg:SI T_REG)
+	(unordered:SI (match_operand:DF 0 "arith_reg_operand" "r,r")
+		      (match_operand:DF 1 "arith_reg_operand" "r,r")))
+   (use (match_operand:SI 2 "arith_reg_operand" "r,r"))
+   (clobber (match_scratch:SI 3 "=0,&r"))]
+  "TARGET_SH1 && ! TARGET_SH2E"
+  "not\t%S0,%3\;tst\t%2,%3\;not\t%S1,%3\;bt\t0f\;tst\t%2,%3\;0:"
+  [(set_attr "length" "10")])
+
+;; ??? This is a lot of code with a lot of branches; a library function
+;; might be better.
+(define_insn "cmpuneqdf_i1"
+  [(set (reg:SI T_REG)
+	(uneq:SI (match_operand:DF 0 "arith_reg_operand" "r")
+		 (match_operand:DF 1 "arith_reg_operand" "r")))
+   (use (match_operand:SI 2 "arith_reg_operand" "r"))
+   (clobber (match_scratch:SI 3 "=&r"))]
+  "TARGET_SH1_SOFTFP"
+  "not\t%S0,%3\;tst\t%2,%3\;not\t%S1,%3\;bt\t0f\;tst\t%2,%3\;bt\t0f\;cmp/eq\t%R0,%R1\; bf\t0f\;cmp/eq\t%S0,%S1\;bt\t0f\;mov\t%S0,%3\;or\t%S1,%3\;add\t%3,%3\;or\t%R0,%3\;tst\t%3,%3\\n0:"
+  [(set_attr "length" "30")])
+
 (define_insn "cmpgtdf_t"
   [(set (reg:SI T_REG)
 	(gt:SI (match_operand:DF 0 "arith_reg_operand" "f")
@@ -10682,6 +11189,26 @@ (define_insn "*ieee_ccmpeqdf_t"
   [(set_attr "length" "4")
    (set_attr "fp_mode" "double")])
 
+(define_insn "*cmpltgtdf_t"
+  [(set (reg:SI T_REG)
+	(ltgt:SI (match_operand:DF 0 "fp_arith_reg_operand" "f")
+		 (match_operand:DF 1 "fp_arith_reg_operand" "f")))
+   (use (match_operand:PSI 2 "fpscr_operand" "c"))]
+  "TARGET_SH4 || TARGET_SH2A_DOUBLE"
+  "fcmp/gt\t%1,%0\;bt\t0f\;fcmp/gt\t%0,%1\\n0:"
+  [(set_attr "length" "6")
+   (set_attr "fp_mode" "double")])
+
+(define_insn "*cmpordereddf_t_4"
+  [(set (reg:SI T_REG)
+	(ordered:SI (match_operand:DF 0 "fp_arith_reg_operand" "f")
+		    (match_operand:DF 1 "fp_arith_reg_operand" "f")))
+   (use (match_operand:PSI 2 "fpscr_operand" "c"))]
+  "TARGET_SH4 || TARGET_SH2A_SINGLE"
+  "fcmp/eq\t%0,%0\;bf\t0f\;fcmp/eq\t%1,%1\\n0:"
+  [(set_attr "length" "6")
+   (set_attr "fp_mode" "double")])
+
 (define_insn "cmpeqdf_media"
   [(set (match_operand:SI 0 "register_operand" "=r")
 	(eq:SI (match_operand:DF 1 "fp_arith_reg_operand" "f")
@@ -10717,18 +11244,24 @@ (define_insn "cmpundf_media"
 (define_expand "cbranchdf4"
   [(set (pc)
 	(if_then_else (match_operator 0 "sh_float_comparison_operator"
-		       [(match_operand:DF 1 "arith_operand" "")
-			(match_operand:DF 2 "arith_operand" "")])
+		       [(match_operand:DF 1 "nonmemory_operand" "")
+			(match_operand:DF 2 "nonmemory_operand" "")])
 		      (match_operand 3 "" "")
 		      (pc)))]
-  "(TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU"
+  "TARGET_SH1 || TARGET_SHMEDIA_FPU"
   "
 {
   if (TARGET_SHMEDIA)
-    emit_jump_insn (gen_cbranchfp4_media (operands[0], operands[1], operands[2],
-					  operands[3]));
+    {
+      if (!arith_operand (operands[1], DFmode))
+	operands[1] = copy_to_mode_reg (DFmode, operands[1]);
+      if (!arith_operand (operands[2], DFmode))
+	operands[2] = copy_to_mode_reg (DFmode, operands[2]);
+      emit_jump_insn (gen_cbranchfp4_media (operands[0], operands[1],
+					    operands[2], operands[3]));
+    }
   else
-    sh_emit_compare_and_branch (operands, DFmode);
+    sh_expand_float_cbranch (operands);
   DONE;
 }")
 
@@ -10850,16 +11383,82 @@ (define_insn "extendsfdf2_i4"
   [(set_attr "type" "fp")
    (set_attr "fp_mode" "double")])
 
+;; ??? In order to use this efficiently, we'd have to have an extra
+;; register class for r0 and r1 - and that would cause repercussions in
+;; register allocation elsewhere.  So just say we clobber r0 / r1, and
+;; that we can use an arbitrary target.
+(define_insn_and_split "extendsfdf2_i1"
+  [(set (match_operand:DF 0 "arith_reg_dest" "=r")
+	(float_extend:DF (reg:SF R4_REG)))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R0_REG))
+   (clobber (reg:SI R1_REG))
+   (clobber (reg:SI R2_REG))
+   (clobber (reg:SI R3_REG))
+   (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+  "TARGET_SH1_SOFTFP && !TARGET_SH2E"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0) (reg:DF R0_REG))]
+  "emit_insn (gen_extendsfdf2_i1_r0 (operands[1]));"
+  [(set_attr "type" "sfunc")])
+
+(define_insn "extendsfdf2_i1_r0"
+  [(set (reg:DF R0_REG) (float_extend:DF (reg:SF R4_REG)))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R2_REG))
+   (clobber (reg:SI R3_REG))
+   (use (match_operand:SI 0 "arith_reg_operand" "r"))]
+  "TARGET_SH1_SOFTFP && !TARGET_SH2E"
+  "jsr	@%0%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
+
+(define_insn_and_split "extendsfdf2_i2e"
+  [(set (match_operand:DF 0 "arith_reg_dest" "=r")
+	(float_extend:DF (reg:SF FR4_REG)))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R0_REG))
+   (clobber (reg:SI R1_REG))
+   (clobber (reg:SI R2_REG))
+   (clobber (reg:SI R3_REG))
+   (clobber (reg:SI R4_REG))
+   (clobber (reg:SI FPUL_REG))
+   (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+  "TARGET_SH1_SOFTFP && TARGET_SH2E"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0) (reg:DF R0_REG))]
+  "emit_insn (gen_extendsfdf2_i2e_r0 (operands[1]));"
+  [(set_attr "type" "sfunc")])
+
+(define_insn "extendsfdf2_i2e_r0"
+  [(set (reg:DF R0_REG) (float_extend:DF (reg:SF FR4_REG)))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R2_REG))
+   (clobber (reg:SI R3_REG))
+   (clobber (reg:SI R4_REG))
+   (clobber (reg:SI FPUL_REG))
+   (use (match_operand:SI 0 "arith_reg_operand" "r"))]
+  "TARGET_SH1_SOFTFP && TARGET_SH2E"
+  "jsr	@%0%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
+
 (define_expand "truncdfsf2"
   [(set (match_operand:SF 0 "fpul_operand" "")
-	(float_truncate:SF (match_operand:DF 1 "fp_arith_reg_operand" "")))]
-  "(TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU"
-  "
-{
+        (float_truncate:SF (match_operand:DF 1 "fp_arith_reg_operand" "")))]
+  "(TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU" 
+  "                      
+{                     
   if (TARGET_SH4 || TARGET_SH2A_DOUBLE)
     {
       emit_df_insn (gen_truncdfsf2_i4 (operands[0], operands[1],
-				       get_fpscr_rtx ()));
+                                       get_fpscr_rtx ()));
       DONE;
     }
 }")
@@ -10879,6 +11478,23 @@ (define_insn "truncdfsf2_i4"
   "fcnvds  %1,%0"
   [(set_attr "type" "fp")
    (set_attr "fp_mode" "double")])
+
+(define_insn "truncdfsf2_i2e"
+  [(set (match_operand:SF 0 "arith_reg_dest" "=w")
+	(float_truncate:SF (reg:DF R4_REG)))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI FPUL_REG))
+   (clobber (reg:SI R0_REG))
+   (clobber (reg:SI R1_REG))
+   (clobber (reg:SI R2_REG))
+   (clobber (reg:SI R3_REG))
+   (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+  "TARGET_SH1_SOFTFP && TARGET_SH2E"
+  "jsr	@%1%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
+
 \f
 ;; Bit field extract patterns.  These give better code for packed bitfields,
 ;; because they allow auto-increment addresses to be generated.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SH optimized software floating point routines
  2010-07-16 14:01     ` Kaz Kojima
@ 2010-07-17 14:31       ` Joern Rennecke
  2010-07-17 21:23         ` Kaz Kojima
  0 siblings, 1 reply; 30+ messages in thread
From: Joern Rennecke @ 2010-07-17 14:31 UTC (permalink / raw)
  To: Kaz Kojima; +Cc: Naveen.S, gcc, Prafulla.Thakare

Quoting Kaz Kojima <kkojima@rr.iij4u.or.jp>:

> sh_softfp.patch looks basically OK to me, though I'm curious
> with numbers for fp-bit.c/softfp/softfloat.  Could you show us
> some real speed&size numbers?

I don't have any sh[1234] hardware to run EEMBC tests on, but the runtime
difference of 'make check' on i686-pc-linux-gnu X sh-elf using a core2 quad
for fp-bit vs. softfloat (w/ compare/conversion/divsf) is two hours 4 minutes
versus 38 minutes.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SH optimized software floating point routines
  2010-07-17 14:31       ` Joern Rennecke
@ 2010-07-17 21:23         ` Kaz Kojima
  2010-07-19  9:23           ` Naveen H. S
  0 siblings, 1 reply; 30+ messages in thread
From: Kaz Kojima @ 2010-07-17 21:23 UTC (permalink / raw)
  To: amylaar; +Cc: Naveen.S, gcc, Prafulla.Thakare

Joern Rennecke <amylaar@spamcop.net> wrote:
> I don't have any sh[1234] hardware to EEMBC tests on, but the runtime
> difference of 'make check' on i686-pc-linux-gnu X sh-elf using a core2 quad
> for fp-bit vs. softfloat (w/ compare/conversion/divsf) is two hours 4 minutes
> versus 38 minutes.

Very impressive.  Thanks!
Reducing 124 minutes to 38 minutes is too attractive.

Regards,
	kaz

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: SH optimized software floating point routines
  2010-07-17 13:30     ` Joern Rennecke
@ 2010-07-19  0:59       ` Joern Rennecke
  2010-07-20 13:35         ` Kaz Kojima
  0 siblings, 1 reply; 30+ messages in thread
From: Joern Rennecke @ 2010-07-19  0:59 UTC (permalink / raw)
  To: Naveen H. S, Kaz Kojima, gcc; +Cc: Prafulla Thakare

[-- Attachment #1: Type: text/plain, Size: 286 bytes --]

I've found two bugs in truncdfsf2;
I've also added back a number of hunks that Naveen had dropped.

Note that most of the patch was prepared in 2006, so that is the proper
most recent copyright date for those files that haven't been touched
except to update the Copyright notice.

[-- Attachment #2: sh-softfp-20100718-2131 --]
[-- Type: text/plain, Size: 284140 bytes --]

TODO:

- Test & submit companion patches separately.
- Test.

2010-07-18  Joern Rennecke  <joern.rennecke@embecosm.com>

	* config/sh/IEEE-754/divsf3.S (divsf3):
	Fix sign for zero r4 input.
	Fix comments for NaN return.
	Remove some redundant code.
	* config/sh/ieee-754-df.S: Add comments on
	RETURN_R0_MAIN / RETURN_R0 / RETURN_FR0.
	(RETURN_FR0): Add missing backslash.
	[!DYN_SHIFT] (extendsfdf2) <zero_denorm>: Fix mask used in
	shift_byte loop.
	[!DYN_SHIFT] (extendsfdf2) <x00ff0000>: New constant.
	[!DYN_SHIFT] (truncdfsf2) <inf>: Fix returned value.
	[DYN_SHIFT] (truncdfsf2) <inf>: Likewise.
	[!DYN_SHIFT] (truncdfsf2) <xffe00000>: Remove now unused constant.
	[DYN_SHIFT] (truncdfsf2) <xffe00000>: Likewise.
	* config/sh/sh.c (sh_expand_float_condop): Changed parameters to
	allow separate passing of comparison operands and destination.
	Changed callers.
	Replace use of from_compare.
	Use emit instead of emit_jump_insn.
	(sh_soft_fp_cmp): Remove REG_LIBCALL / REG_RETVAL code.
	Use set_unique_reg_note.
	(expand_sfunc_op): Likewise.
	* config/sh/sh.md (cstoresf4): Add support for software floating point.
	(cstoredf4, cbranchsf4, cbranchdf4): Likewise.
	(cmpnedf_i1): Fix predicate.
	(truncdfsf2): Add TARGET_SH2E case.
	(mulsf3): Fix condition for emitting mulsf3_i3.
	* config/sh/IEEE-754/adddf3.S: Adjust NaN value returned for
	+inf + -inf to agree with DF_NAN_MASK mask.

	* config/sh/t-sh (gt-sh.h): Remove redundant rule.
	(LIB1ASMFUNCS): Add _unordsf2 and _unorddf2.

2006-09-02  J"orn Rennecke  <joern.rennecke@st.com>

	* targhooks.c (regs.h): #include.
	(default_match_adjust): New function.
	* targhooks.h (default_match_adjust): Declare.
	* reload.c (operands_match_p): Use targetm.match_adjust.
	* target.h (struct gcc_target): Add member match_adjust.
	* target-def.h (TARGET_MATCH_ADJUST): New macro.
	* Makefile.in (targhooks.o): Depend on $(REGS_H).
	* config/sh/sh-protos.h (sh_match_adjust): Declare.
	* config/sh/sh.c (TARGET_MATCH_ADJUST): Define as sh_match_adjust.
	(sh_match_adjust): New function.

2006-09-15  J"orn Rennecke  <joern.rennecke@st.com>

	* sched-deps.c (sched_analyze_2): When a likely spilled register
	is used, put in into a scheduling group with the insn that
	sets it and with all the insns in-between.

2006-09-02  J"orn Rennecke  <joern.rennecke@st.com>

	config/sh/t-sh: ($(T)ic_invalidate_array_4-100.o): Add -I. .
	($(T)ic_invalidate_array_4-200.o): Likewise.
	($(T)ic_invalidate_array_4a.o): Likewise.

2006-09-02  J"orn Rennecke  <joern.rennecke@st.com>

	* config/sh/sh.h (LIBGCC2_DOUBLE_TYPE_SIZE): Define

2006-09-02  J"orn Rennecke  <joern.rennecke@st.com>

	* sh.md (*movsicc_t_false, *movsicc_t_true): Add mode.

2006-09-02  J"orn Rennecke  <joern.rennecke@st.com>
	    Aanchal Khanna  <aanchalk@noida.hcltech.com>
	    Rakesh Kumar  <rakesh.kumar@noida.hcltech.com>

	* config/sh/sh-protos.h (sh_function_kind): New enumerator
	SFUNC_FREQUENT.
	(expand_sfunc_unop, expand_sfunc_binop): Declare.
	(sh_expand_float_cbranch): Likewise.
	* config/sh/lib1funcs.asm (ieee-754-sf.S, ieee-754-df.S): #include.
	* config/sh/t-sh (LIB1ASMFUNCS): Add _nesf2, _nedf2, _gtsf2t, _gtdf2t,
	_gesf2f, _gedf2f, _extendsfdf2, _truncdfsf2, _add_sub_sf3, _mulsf3,
	_hypotf, _muldf3, _add_sub_df3, _divsf3, _divdf3, _fixunssfsi,
	_fixsfsi, _fixunsdfsi, _fixdfsi, _floatunssisf, _floatsisf,
	_floatunssidf and _floatsidf.
	(FPBIT, DPBIT, dp-bit.c, fp-bit.c): Removed.
	* config/sh/ieee-754-df.S, config/sh/ieee-754-sf.S: New files.
	* config/sh/predicates.md (soft_fp_comparison_operand): New predicate.
	(soft_fp_comparison_operator): Likewise.
	* config/sh/sh.c (sh_soft_fp_cmp, expand_sfunc_op): New functions.
	(expand_sfunc_unop, expand_sfunc_binop): Likewise.
	(sh_expand_float_cbranch): Likewise.
	(sh_expand_float_condop, sh_expand_float_scc): Likewise.
	(from_compare): Add support for software floating point.
	(function_symbol): Always look up name.  Add SFUNC_FREQUENT case.
	* config/sh/sh.h (TARGET_SH1_SOFTFP): New macro.
	(TARGET_SH1_SOFTFP_MODE): Likewise.
	* config/sh/sh-modes.def (CC_FP_NE, CC_FP_GT, CC_FP_UNLT): New modes.
	* config/sh/lib1funcs.h (SLC, SLI, SLCMP, DMULU_SAVE): New macros.
	(DMULUL, DMULUH, DMULU_RESTORE, SHLL4, SHLR4, SHLL6, SHLR6): Likewise.
	(SHLL12, SHLR12, SHLR19, SHLL23, SHLR24, SHLR21, SHLL21): Likewise.
	(SHLR11, SHLR22, SHLR23, SHLR20, SHLL20, SHLD_COUNT, SHLRN): Likewise.
	(SHLLN, DYN_SHIFT): Likewise.
	(SUPPORT_SH3_OSFP, SUPPORT_SH3E_OSFP): Likewise.
	(SUPPORT_SH4_NOFPU_OSFP, SUPPORT_SH4_SINGLE_ONLY_OSFP): Likewise.
	(TARGET_OSFP): Likewise.
	* config/sh/IEEE-754/m3/divsf3.S: New file.
	* config/sh/IEEE-754/m3/divdf3.S: Likewise.
	* config/sh/IEEE-754/m3/floatunssisf.S: Likewise.
	* config/sh/IEEE-754/m3/floatunssidf.S: Likewise.
	* config/sh/IEEE-754/m3/fixunsdfsi.S: Likewise.
	* config/sh/IEEE-754/m3/divdf3-rt.S: Likewise.
	* config/sh/IEEE-754/m3/addsf3.S: Likewise.
	* config/sh/IEEE-754/m3/adddf3.S: Likewise.
	* config/sh/IEEE-754/m3/mulsf3.S: Likewise.
	* config/sh/IEEE-754/m3/muldf3.S: Likewise.
	* config/sh/IEEE-754/m3/floatsisf.S: Likewise.
	* config/sh/IEEE-754/m3/floatsidf.S: Likewise.
	* config/sh/IEEE-754/m3/fixdfsi.S: Likewise.
	* config/sh/IEEE-754/divdf3.S: Likewise.
	* config/sh/IEEE-754/floatunssisf.S: Likewise.
	* config/sh/IEEE-754/fixunsdfsi.S: Likewise.
	* config/sh/IEEE-754/adddf3.S: Likewise.
	* config/sh/IEEE-754/floatsisf.S: Likewise.
	* config/sh/IEEE-754/muldf3.S: Likewise.
	* config/sh/IEEE-754/fixdfsi.S: Likewise.
	* config/sh/IEEE-754/divsf3.S: Likewise.
	* config/sh/IEEE-754/fixunssfsi.S: Likewise.
	* config/sh/IEEE-754/floatunssidf.S: Likewise.
	* config/sh/IEEE-754/addsf3.S: Likewise.
	* config/sh/IEEE-754/mulsf3.S: Likewise.
	* config/sh/IEEE-754/floatsidf.S: Likewise.
	* config/sh/IEEE-754/fixsfsi.S: Likewise.
	* config/sh/sh.md (SF_NAN_MASK, DF_NAN_MASK, FR4_REG): New constants.
	(fpcmp_i1, addsf3_i3, subsf3_i3): New patterns.
	(mulsf3_i3, cmpnesf_i1, cmpgtsf_i1, cmpunltsf_i1): Likewise.
	(cmpeqsf_i1_finite, cmplesf_i1_finite, cmpunsf_i1): Likewise.
	(cmpuneqsf_i1, movcc_fp_ne, movcc_fp_gt, movcc_fp_unlt): Likewise.
	(cmpltgtsf_t, cmporderedsf_t, cmpltgtsf_t_4): Likewise.
	(cmporderedsf_t_4, abssc2, adddf3_i3_wrap, adddf3_i3): Likewise.
	(muldf3_i3_wrap, muldf3_i3, cmpnedf_i1, cmpgtdf_i1): Likewise.
	(cmpunltdf_i1, cmpeqdf_i1_finite, cmpundf_i1, cmpuneqdf_i1): Likewise.
	(cmpltgtdf_t, cmpordereddf_t_4, extendsfdf2_i1): Likewise.
	(extendsfdf2_i2e, extendsfdf2_i2e_r0, truncdfsf2_i2e): Likewise.
	(extendsfdf2_i1_r0, truncdfsf2_i1): Likewise.
	(cmpun_sdf, cmpuneq_sdf): Likewise.
	(addsf3, subsf3, mulsf3): Add support for software floating point.
	(adddf3, subdf3, muldf3, extendsfdf2, truncdfsf2): Likewise.
	(cmpsf, cmpdf): Don't enable for TARGET_SH2E.
	(movnegt): Match only one operand.  Changed user.

Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	(revision 162269)
+++ gcc/doc/tm.texi	(working copy)
@@ -2753,6 +2753,10 @@ of the individual moves due to expected 
 forwarding logic, you can set @code{sri->extra_cost} to a negative amount.
 @end deftypefn
 
+@deftypefn {Target Hook} int TARGET_MATCH_ADJUST (rtx, @var{int})
+This hook is documented in @file{target.def} / @file{targhooks.c}.
+@end deftypefn
+
 @defmac SECONDARY_RELOAD_CLASS (@var{class}, @var{mode}, @var{x})
 @defmacx SECONDARY_INPUT_RELOAD_CLASS (@var{class}, @var{mode}, @var{x})
 @defmacx SECONDARY_OUTPUT_RELOAD_CLASS (@var{class}, @var{mode}, @var{x})
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	(revision 162269)
+++ gcc/doc/tm.texi.in	(working copy)
@@ -2753,6 +2753,8 @@ of the individual moves due to expected 
 forwarding logic, you can set @code{sri->extra_cost} to a negative amount.
 @end deftypefn
 
+@hook TARGET_MATCH_ADJUST
+
 @defmac SECONDARY_RELOAD_CLASS (@var{class}, @var{mode}, @var{x})
 @defmacx SECONDARY_INPUT_RELOAD_CLASS (@var{class}, @var{mode}, @var{x})
 @defmacx SECONDARY_OUTPUT_RELOAD_CLASS (@var{class}, @var{mode}, @var{x})
Index: gcc/targhooks.c
===================================================================
--- gcc/targhooks.c	(revision 162269)
+++ gcc/targhooks.c	(working copy)
@@ -67,6 +67,7 @@ along with GCC; see the file COPYING3.  
 #include "reload.h"
 #include "optabs.h"
 #include "recog.h"
+#include "regs.h"
 
 
 bool
@@ -906,6 +907,27 @@ default_secondary_reload (bool in_p ATTR
   return rclass;
 }
 
+/*  Given an rtx and its regno, return a regno value that shall be used for
+    purposes of comparison in operands_match_p.
+    Generally, we say that integer registers are subject to big-endian
+    adjustment.  This default target hook should generally work if the mode
+    of a register is a sufficient indication of whether this adjustment is
+    to take place; this will not work when software floating point is done
+    in integer registers.  */
+int
+default_match_adjust (rtx x, int regno)
+{
+  /* On a WORDS_BIG_ENDIAN machine, point to the last register of a
+     multiple hard register group of scalar integer registers, so that
+     for example (reg:DI 0) and (reg:SI 1) will be considered the same
+     register.  */
+  if (WORDS_BIG_ENDIAN && GET_MODE_SIZE (GET_MODE (x)) > UNITS_PER_WORD
+      && SCALAR_INT_MODE_P (GET_MODE (x))
+      && regno < FIRST_PSEUDO_REGISTER)
+    regno += hard_regno_nregs[regno][GET_MODE (x)] - 1;
+  return regno;
+}
+
 void
 default_target_option_override (void)
 {
Index: gcc/targhooks.h
===================================================================
--- gcc/targhooks.h	(revision 162269)
+++ gcc/targhooks.h	(working copy)
@@ -121,6 +121,7 @@ extern const reg_class_t *default_ira_co
 extern reg_class_t default_secondary_reload (bool, rtx, reg_class_t,
 					     enum machine_mode,
 					     secondary_reload_info *);
+extern int default_match_adjust (rtx, int);
 extern void default_target_option_override (void);
 extern void hook_void_bitmap (bitmap);
 extern bool default_handle_c_option (size_t, const char *, int);
Index: gcc/target.def
===================================================================
--- gcc/target.def	(revision 162269)
+++ gcc/target.def	(working copy)
@@ -1945,6 +1945,14 @@ DEFHOOK
   secondary_reload_info *sri),
  default_secondary_reload)
 
+/* Take an rtx and its regno, and return the regno for purposes of
+   checking a matching constraint.  */
+DEFHOOK
+(match_adjust,
+ "This hook is documented in @file{target.def} / @file{targhooks.c}.",
+ int, (rtx, int),
+ default_match_adjust)
+
 /* This target hook allows the backend to perform additional
    processing while initializing for variable expansion.  */
 DEFHOOK
Index: gcc/reload.c
===================================================================
--- gcc/reload.c	(revision 162269)
+++ gcc/reload.c	(working copy)
@@ -2216,14 +2216,8 @@ operands_match_p (rtx x, rtx y)
 	 multiple hard register group of scalar integer registers, so that
 	 for example (reg:DI 0) and (reg:SI 1) will be considered the same
 	 register.  */
-      if (WORDS_BIG_ENDIAN && GET_MODE_SIZE (GET_MODE (x)) > UNITS_PER_WORD
-	  && SCALAR_INT_MODE_P (GET_MODE (x))
-	  && i < FIRST_PSEUDO_REGISTER)
-	i += hard_regno_nregs[i][GET_MODE (x)] - 1;
-      if (WORDS_BIG_ENDIAN && GET_MODE_SIZE (GET_MODE (y)) > UNITS_PER_WORD
-	  && SCALAR_INT_MODE_P (GET_MODE (y))
-	  && j < FIRST_PSEUDO_REGISTER)
-	j += hard_regno_nregs[j][GET_MODE (y)] - 1;
+      i = targetm.match_adjust (x, i);
+      j = targetm.match_adjust (y, j);
 
       return i == j;
     }
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	(revision 162269)
+++ gcc/Makefile.in	(working copy)
@@ -2806,7 +2806,7 @@ opts-common.o : opts-common.c opts.h opt
 targhooks.o : targhooks.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TREE_H) \
    $(EXPR_H) $(TM_H) $(RTL_H) $(TM_P_H) $(FUNCTION_H) output.h $(TOPLEV_H) $(DIAGNOSTIC_CORE_H) \
    $(MACHMODE_H) $(TARGET_DEF_H) $(TARGET_H) $(GGC_H) gt-targhooks.h \
-   $(OPTABS_H) $(RECOG_H) reload.h hard-reg-set.h
+   $(OPTABS_H) $(RECOG_H) reload.h hard-reg-set.h $(REGS_H)
 
 bversion.h: s-bversion; @true
 s-bversion: BASE-VER
Index: gcc/config/sh/sh-protos.h
===================================================================
--- gcc/config/sh/sh-protos.h	(revision 162269)
+++ gcc/config/sh/sh-protos.h	(working copy)
@@ -25,8 +25,13 @@ along with GCC; see the file COPYING3.  
 #define GCC_SH_PROTOS_H
 
 enum sh_function_kind {
-  /* A function with normal C ABI  */
+  /* A function with normal C ABI, or an SH1..SH4 sfunc that may be
+     resolved via a PLT.  */
   FUNCTION_ORDINARY,
+  /* A function that is a bit too large to put in every calling DSO, but
+     that is typically used often enough that calling it via the GOT makes
+     sense for speed.  */
+  SFUNC_FREQUENT,
   /* A special function that guarantees that some otherwise call-clobbered
      registers are not clobbered.  These can't go through the SH5 resolver,
      because it only saves argument passing registers.  */
@@ -115,6 +120,10 @@ extern void expand_sf_binop (rtx (*)(rtx
 extern void expand_df_unop (rtx (*)(rtx, rtx, rtx), rtx *);
 extern void expand_df_binop (rtx (*)(rtx, rtx, rtx, rtx), rtx *);
 extern void expand_fp_branch (rtx (*)(void), rtx (*)(void));
+extern void expand_sfunc_unop (enum machine_mode, rtx (*) (rtx, rtx),
+			       const char *, enum rtx_code code, rtx *);
+extern void expand_sfunc_binop (enum machine_mode, rtx (*) (rtx, rtx),
+				const char *, enum rtx_code code, rtx *);
 extern int sh_insn_length_adjustment (rtx);
 extern int sh_can_redirect_branch (rtx, rtx);
 extern void sh_expand_unop_v2sf (enum rtx_code, rtx, rtx);
@@ -132,6 +141,8 @@ extern struct rtx_def *get_fpscr_rtx (vo
 extern int sh_media_register_for_return (void);
 extern void sh_expand_prologue (void);
 extern void sh_expand_epilogue (bool);
+extern void sh_expand_float_cbranch (rtx operands[4]);
+extern void sh_expand_float_scc (rtx operands[4]);
 extern int sh_need_epilogue (void);
 extern void sh_set_return_address (rtx, rtx);
 extern int initial_elimination_offset (int, int);
@@ -176,6 +187,7 @@ struct secondary_reload_info;
 extern reg_class_t sh_secondary_reload (bool, rtx, reg_class_t,
 					enum machine_mode,
 					struct secondary_reload_info *);
+extern int sh_match_adjust (rtx, int);
 extern int sh2a_get_function_vector_number (rtx);
 extern int sh2a_is_function_vector_call (rtx);
 extern void sh_fix_range (const char *);
Index: gcc/config/sh/lib1funcs.asm
===================================================================
--- gcc/config/sh/lib1funcs.asm	(revision 162269)
+++ gcc/config/sh/lib1funcs.asm	(working copy)
@@ -3931,3 +3931,6 @@ GLOBAL(udiv_qrnnd_16):
 	ENDFUNC(GLOBAL(udiv_qrnnd_16))
 #endif /* !__SHMEDIA__ */
 #endif /* L_udiv_qrnnd_16 */
+
+#include "ieee-754-sf.S"
+#include "ieee-754-df.S"
Index: gcc/config/sh/t-sh
===================================================================
--- gcc/config/sh/t-sh	(revision 162269)
+++ gcc/config/sh/t-sh	(working copy)
@@ -25,30 +25,16 @@ sh-c.o: $(srcdir)/config/sh/sh-c.c \
 LIB1ASMSRC = sh/lib1funcs.asm
 LIB1ASMFUNCS = _ashiftrt _ashiftrt_n _ashiftlt _lshiftrt _movmem \
   _movmem_i4 _mulsi3 _sdivsi3 _sdivsi3_i4 _udivsi3 _udivsi3_i4 _set_fpscr \
-  _div_table _udiv_qrnnd_16 \
+  _div_table _udiv_qrnnd_16 _unordsf2 _unorddf2 \
+  _nesf2 _nedf2 _gtsf2t _gtdf2t _gesf2f _gedf2f _extendsfdf2  _truncdfsf2 \
+  _add_sub_sf3 _mulsf3 _hypotf _muldf3 _add_sub_df3 _divsf3 _divdf3 \
+  _fixunssfsi _fixsfsi _fixunsdfsi _fixdfsi _floatunssisf _floatsisf \
+  _floatunssidf _floatsidf \
   $(LIB1ASMFUNCS_CACHE)
 LIB1ASMFUNCS_CACHE = _ic_invalidate _ic_invalidate_array
 
 TARGET_LIBGCC2_CFLAGS = -mieee
 
-# We want fine grained libraries, so use the new code to build the
-# floating point emulation libraries.
-FPBIT = fp-bit.c
-DPBIT = dp-bit.c
-
-dp-bit.c: $(srcdir)/config/fp-bit.c
-	echo '#ifdef __LITTLE_ENDIAN__' > dp-bit.c
-	echo '#define FLOAT_BIT_ORDER_MISMATCH' >>dp-bit.c
-	echo '#endif' 		>> dp-bit.c
-	cat $(srcdir)/config/fp-bit.c >> dp-bit.c
-
-fp-bit.c: $(srcdir)/config/fp-bit.c
-	echo '#define FLOAT' > fp-bit.c
-	echo '#ifdef __LITTLE_ENDIAN__' >> fp-bit.c
-	echo '#define FLOAT_BIT_ORDER_MISMATCH' >>fp-bit.c
-	echo '#endif' 		>> fp-bit.c
-	cat $(srcdir)/config/fp-bit.c >> fp-bit.c
-
 DEFAULT_ENDIAN = $(word 1,$(TM_ENDIAN_CONFIG))
 OTHER_ENDIAN = $(word 2,$(TM_ENDIAN_CONFIG))
 
@@ -120,7 +106,6 @@ $(T)crtn.o: $(srcdir)/config/sh/crtn.asm
 	$(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -c -o $(T)crtn.o -x assembler-with-cpp $(srcdir)/config/sh/crtn.asm
 
 $(out_object_file): gt-sh.h
-gt-sh.h : s-gtype ; @true
 
 # These are not suitable for COFF.
 # EXTRA_MULTILIB_PARTS= crt1.o crti.o crtn.o crtbegin.o crtend.o
@@ -131,17 +116,17 @@ OPT_EXTRA_PARTS= libgcc-Os-4-200.a libgc
 EXTRA_MULTILIB_PARTS= $(IC_EXTRA_PARTS) $(OPT_EXTRA_PARTS)
 
 $(T)ic_invalidate_array_4-100.o: $(srcdir)/config/sh/lib1funcs.asm $(GCC_PASSES)
-	$(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -c -o $(T)ic_invalidate_array_4-100.o -DL_ic_invalidate_array -DWAYS=1 -DWAY_SIZE=0x2000 -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
+	$(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -I. -c -o $(T)ic_invalidate_array_4-100.o -DL_ic_invalidate_array -DWAYS=1 -DWAY_SIZE=0x2000 -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
 $(T)libic_invalidate_array_4-100.a: $(T)ic_invalidate_array_4-100.o $(GCC_PASSES)
 	$(AR_CREATE_FOR_TARGET) $(T)libic_invalidate_array_4-100.a $(T)ic_invalidate_array_4-100.o
 
 $(T)ic_invalidate_array_4-200.o: $(srcdir)/config/sh/lib1funcs.asm $(GCC_PASSES)
-	$(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -c -o $(T)ic_invalidate_array_4-200.o -DL_ic_invalidate_array -DWAYS=2 -DWAY_SIZE=0x2000 -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
+	$(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -I. -c -o $(T)ic_invalidate_array_4-200.o -DL_ic_invalidate_array -DWAYS=2 -DWAY_SIZE=0x2000 -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
 $(T)libic_invalidate_array_4-200.a: $(T)ic_invalidate_array_4-200.o $(GCC_PASSES)
 	$(AR_CREATE_FOR_TARGET) $(T)libic_invalidate_array_4-200.a $(T)ic_invalidate_array_4-200.o
 
 $(T)ic_invalidate_array_4a.o: $(srcdir)/config/sh/lib1funcs.asm $(GCC_PASSES)
-	$(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -c -o $(T)ic_invalidate_array_4a.o -DL_ic_invalidate_array -D__FORCE_SH4A__ -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
+	$(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -I. -c -o $(T)ic_invalidate_array_4a.o -DL_ic_invalidate_array -D__FORCE_SH4A__ -x assembler-with-cpp $(srcdir)/config/sh/lib1funcs.asm
 $(T)libic_invalidate_array_4a.a: $(T)ic_invalidate_array_4a.o $(GCC_PASSES)
 	$(AR_CREATE_FOR_TARGET) $(T)libic_invalidate_array_4a.a $(T)ic_invalidate_array_4a.o
 
Index: gcc/config/sh/sh.opt
===================================================================
--- gcc/config/sh/sh.opt	(revision 162269)
+++ gcc/config/sh/sh.opt	(working copy)
@@ -21,7 +21,7 @@
 ;; Used for various architecture options.
 Mask(SH_E)
 
-;; Set if the default precision of th FPU is single.
+;; Set if the default precision of the FPU is single.
 Mask(FPU_SINGLE)
 
 ;; Set if we should generate code using type 2A insns.
Index: gcc/config/sh/ieee-754-df.S
===================================================================
--- gcc/config/sh/ieee-754-df.S	(revision 0)
+++ gcc/config/sh/ieee-754-df.S	(revision 0)
@@ -0,0 +1,791 @@
+/* Copyright (C) 2004, 2006, 2010 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+!! libgcc software floating-point routines for Renesas SH /
+!! STMicroelectronics ST40 CPUs
+!! Contributed by J"orn Rennecke joern.rennecke@st.com
+
+#ifndef __SH_FPU_DOUBLE__
+
+#include "lib1funcs.h"
+#include "insn-constants.h"
+
+/* Double-precision floating-point emulation.
+   We handle NANs, +-infinity, and +-zero.
+   However, we assume that for NANs, the topmost bit of the fraction is set.  */
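+/* (Concretely: the high word of such a NaN, ignoring the sign bit, has all
+   exponent bits and the fraction msb set -- the DF_NAN_MASK pattern from
+   sh.md -- so the "not DBL0H,r0; tst mask,r0" idiom below sets T exactly
+   for these NaNs.)  */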
+
+#ifdef __LITTLE_ENDIAN__
+#define DBL0L r4
+#define DBL0H r5
+#define DBL1L r6
+#define DBL1H r7
+#define DBLRL r0
+#define DBLRH r1
+#else
+#define DBL0L r5
+#define DBL0H r4
+#define DBL1L r7
+#define DBL1H r6
+#define DBLRL r1
+#define DBLRH r0
+#endif
+
+/* The SH[123] ABI returns floats in r0; -m4-single returns them in fr0.
+   To abstract from this, a function that returns the single-precision
+   float value in r0 should use as in-line epilogue:
+     RETURN_R0_MAIN
+     <delay-slot insn>
+     RETURN_FR0
+   and may branch to that epilogue with:
+     RETURN_R0
+     <delay-slot insn> */
+#ifdef __SH_FPU_ANY__
+#define RETURN_R0_MAIN
+#define RETURN_R0 bra LOCAL(return_r0)
+#define RETURN_FR0 \
+LOCAL(return_r0): \
+ lds r0,fpul; \
+ rts; \
+ fsts fpul,fr0
+#define ARG_TO_R4 \
+ flds fr4,fpul; \
+ sts fpul,r4
+#else /* ! __SH_FPU_ANY__ */
+#define RETURN_R0_MAIN rts
+#define RETURN_R0 rts
+#define RETURN_FR0
+#define ARG_TO_R4
+#endif /* ! __SH_FPU_ANY__ */
+
+#ifdef L_nedf2
+/* -ffinite-math-only -mb inline version, T := r4:DF == r6:DF
+	cmp/eq	r5,r7
+	mov	r4,r0
+	bf	0f
+	cmp/eq	r4,r6
+	bt	0f
+	or	r6,r0
+	add	r0,r0
+	or	r5,r0
+	tst	r0,r0
+	0:			*/
+	.balign 4
+	.global GLOBAL(nedf2)
+	HIDDEN_FUNC(GLOBAL(nedf2))
+GLOBAL(nedf2):
+	cmp/eq	DBL0L,DBL1L
+	mov.l   LOCAL(c_DF_NAN_MASK),r1
+	bf LOCAL(ne)
+	cmp/eq	DBL0H,DBL1H
+	not	DBL0H,r0
+	bt	LOCAL(check_nan)
+	mov	DBL0H,r0
+	or	DBL1H,r0
+	add	r0,r0
+	rts
+	or	DBL0L,r0
+LOCAL(check_nan):
+	tst	r1,r0
+	rts
+	movt	r0
+LOCAL(ne):
+	rts
+	mov #1,r0
+	.balign 4
+LOCAL(c_DF_NAN_MASK):
+	.long DF_NAN_MASK
+	ENDFUNC(GLOBAL(nedf2))
+#endif /* L_nedf2 */
+
+#ifdef L_unorddf2
+	.balign 4
+	.global GLOBAL(unorddf2)
+	HIDDEN_FUNC(GLOBAL(unorddf2))
+GLOBAL(unorddf2):
+	mov.l	LOCAL(c_DF_NAN_MASK),r1
+	not	DBL0H,r0
+	tst	r1,r0
+	not	r6,r0
+	bt	LOCAL(unord)
+	tst	r1,r0
+LOCAL(unord):
+	rts
+	movt	r0
+	.balign	4
+LOCAL(c_DF_NAN_MASK):
+	.long DF_NAN_MASK
+	ENDFUNC(GLOBAL(unorddf2))
+#endif /* L_unorddf2 */
+
+#if defined(L_gtdf2t) || defined(L_gtdf2t_trap)
+#ifdef L_gtdf2t
+#define fun_label GLOBAL(gtdf2t)
+#else
+#define fun_label GLOBAL(gtdf2t_trap)
+#endif
+	.balign 4
+	.global fun_label
+	HIDDEN_FUNC(fun_label)
+fun_label:
+	/* If the raw values compare greater, the result is true, unless
+	   either of them is a NaN (but infinity is fine), or both values are
+	   +- zero.  Otherwise, the result is false.  */
+	mov.l	LOCAL(c_DF_NAN_MASK),r1
+	cmp/pz	DBL0H
+	not	DBL1H,r0
+	SLC(bf,	LOCAL(neg),
+	 tst	r1,r0)
+	mov	DBL0H,r0
+	bt	LOCAL(nan) /* return zero if DBL1 is NAN.  */
+	cmp/eq	DBL1H,DBL0H
+	bt	LOCAL(cmp_low)
+	cmp/gt	DBL1H,DBL0H
+	or	DBL1H,r0
+	SLC(bf,	LOCAL(check_nan),
+	 cmp/gt	DBL0H,r1)
+	add	r0,r0
+	bf	LOCAL(nan) /* return zero if DBL0 is NAN.  */
+	or	DBL0L,r0
+	rts
+	or	DBL1L,r0 /* non-zero unless both DBL0 and DBL1 are +-zero.  */
+LOCAL(cmp_low):
+	cmp/hi	DBL1L,DBL0L
+	rts
+	movt	r0
+LOCAL(neg):
+	SLI(tst	r1,r0)
+	bt	LOCAL(nan) /* return zero if DBL1 is NAN.  */
+	cmp/eq	DBL1H,DBL0H
+	SLC(bt,	LOCAL(neg_cmp_low),
+	 cmp/hi	DBL0L,DBL1L)
+	not	DBL0H,r0
+	tst	r1,r0
+	bt	LOCAL(nan) /* return zero if DBL0 is NAN.  */
+	cmp/hi	DBL0H,DBL1H
+	SLI(rts	!,)
+	SLI(movt r0 !,)
+LOCAL(neg_cmp_low):
+	SLI(cmp/hi	DBL0L,DBL1L)
+	rts
+	movt	r0
+LOCAL(check_nan):
+#ifdef L_gtdf2t
+LOCAL(nan):
+	rts
+	mov	#0,r0
+#else
+	SLI(cmp/gt DBL0H,r1)
+	bf	LOCAL(nan) /* return zero if DBL0 is NAN.  */
+	rts
+	mov	#0,r0
+LOCAL(nan):
+	mov	#0,r0
+	trapa	#0
+#endif
+	.balign	4
+LOCAL(c_DF_NAN_MASK):
+	.long DF_NAN_MASK
+	ENDFUNC(fun_label)
+#endif /* defined(L_gtdf2t) || defined(L_gtdf2t_trap) */
+
+#ifdef L_gedf2f
+	.balign 4
+	.global GLOBAL(gedf2f)
+	HIDDEN_FUNC(GLOBAL(gedf2f))
+GLOBAL(gedf2f):
+	/* If the raw values compare greater or equal, the result is
+	   true, unless any of them is a nan, or both are the
+	   same infinity.  If both are -+zero, the result is true;
+	   otherwise, it is false.
+	   We use 0 as true and nonzero as false for this function.  */
+	mov.l	LOCAL(c_DF_NAN_MASK),r1
+	cmp/pz	DBL1H
+	not	DBL0H,r0
+	SLC(bf,	LOCAL(neg),
+	 tst	r1,r0)
+	mov	DBL0H,r0
+	bt	LOCAL(nan)
+	cmp/eq	DBL0H,DBL1H
+	bt	LOCAL(cmp_low)
+	cmp/gt	DBL0H,DBL1H
+	or	DBL1H,r0
+	SLC(bf,	LOCAL(check_nan),
+	 cmp/ge	r1,DBL1H)
+	add	r0,r0
+	bt	LOCAL(nan)
+	or	DBL0L,r0
+	rts
+	or	DBL1L,r0
+LOCAL(cmp_low):
+	cmp/hi	DBL0L,DBL1L
+#if defined(L_gedf2f) && defined(DELAYED_BRANCHES)
+LOCAL(nan): LOCAL(check_nan):
+#endif
+	rts
+	movt	r0
+#if defined(L_gedf2f) && ! defined(DELAYED_BRANCHES)
+LOCAL(check_nan):
+	SLI(cmp/ge	r1,DBL1H)
+LOCAL(nan):
+	rts
+	movt	r0
+#elif defined(L_gedf2f_trap)
+LOCAL(check_nan):
+	SLI(cmp/ge	r1,DBL1H)
+	bt	LOCAL(nan)
+	rts
+LOCAL(nan):
+	movt	r0
+	trapa	#0
+#endif /* L_gedf2f_trap */
+LOCAL(neg):
+	SLI(tst	r1,r0)
+	bt	LOCAL(nan)
+	cmp/eq	DBL0H,DBL1H
+	not	DBL1H,r0
+	SLC(bt,	LOCAL(neg_cmp_low),
+	 cmp/hi	DBL1L,DBL0L)
+	tst	r1,r0
+	bt	LOCAL(nan)
+	cmp/hi	DBL1H,DBL0H
+	SLI(rts !,)
+	SLI(movt	r0 !,)
+LOCAL(neg_cmp_low):
+	SLI(cmp/hi	DBL1L,DBL0L)
+	rts
+	movt	r0
+	.balign	4
+LOCAL(c_DF_NAN_MASK):
+	.long DF_NAN_MASK
+	ENDFUNC(GLOBAL(gedf2f))
+#endif /* L_gedf2f */
+
+#ifndef DYN_SHIFT /* SH1 / SH2 code */
+#ifdef L_extendsfdf2
+	.balign 4
+	.global GLOBAL(extendsfdf2)
+	FUNC(GLOBAL(extendsfdf2))
+GLOBAL(extendsfdf2):
+	ARG_TO_R4
+	mov.l	LOCAL(x7f800000),r3
+	mov	r4,DBLRL
+	tst	r3,r4
+	bt	LOCAL(zero_denorm)
+	mov.l	LOCAL(xe0000000),r2
+	rotr	DBLRL
+	rotr	DBLRL
+	rotr	DBLRL
+	and	r2,DBLRL
+	mov	r4,DBLRH
+	not	r4,r2
+	tst	r3,r2
+	mov.l	LOCAL(x38000000),r2
+	bf	0f
+	add	r2,r2	! infinity / NaN adjustment
+0:	shll	DBLRH
+	shlr2	DBLRH
+	shlr2	DBLRH
+	add	DBLRH,DBLRH
+	rotcr	DBLRH
+	rts
+	add	r2,DBLRH
+LOCAL(zero_denorm):
+	mov.l	r4,@-r15
+	add	r4,r4
+	tst	r4,r4
+	bt	LOCAL(zero)
+	mov.l	LOCAL(x00ff0000),r3
+	mov.w	LOCAL(x389),r2
+LOCAL(shift_byte):
+	tst	r3,r4
+	shll8	r4
+	SL(bt,	LOCAL(shift_byte),
+	 add	#-8,r2)
+LOCAL(shift_bit):
+	shll	r4
+	SL(bf,	LOCAL(shift_bit),
+	 add	#-1,r2)
+	mov	#0,DBLRL
+	mov	r4,DBLRH
+	mov.l	@r15+,r4
+	shlr8	DBLRH
+	shlr2	DBLRH
+	shlr	DBLRH
+	rotcr	DBLRL
+	cmp/gt	r4,DBLRH	! get sign
+	rotcr	DBLRH
+	rotcr	DBLRL
+	shll16	r2
+	shll8	r2
+	rts
+	add	r2,DBLRH
+LOCAL(zero):
+	mov.l	@r15+,DBLRH
+	rts
+	mov	#0,DBLRL
+LOCAL(x389):	.word 0x389
+	.balign	4
+LOCAL(x7f800000):
+	.long	0x7f800000
+LOCAL(x38000000):
+	.long	0x38000000
+LOCAL(xe0000000):
+	.long	0xe0000000
+LOCAL(x00ff0000):
+	.long	0x00ff0000
+	ENDFUNC(GLOBAL(extendsfdf2))
+#endif /* L_extendsfdf2 */
+
+#ifdef L_truncdfsf2
+	.balign 4
+	.global GLOBAL(truncdfsf2)
+	FUNC(GLOBAL(truncdfsf2))
+GLOBAL(truncdfsf2):
+	mov.l	LOCAL(x38000000),r3	! exponent adjustment DF -> SF
+	mov	DBL0H,r1
+	mov.l	LOCAL(x70000000),r2	! mask for out-of-range exponent bits
+	mov	DBL0H,r0
+	mov.l	DBL0L,@-r15
+	sub	r3,r1
+	tst	r2,r1
+	shll8	r0			!
+	shll2	r0			! Isolate highpart fraction.
+	shll2	r0			!
+	bf	LOCAL(ill_exp)
+	shll2	r1
+	mov.l	LOCAL(x2fffffff),r2 /* Fraction lsb | lower guard bits.  */
+	shll2	r1
+	mov.l	LOCAL(xff000000),r3
+	shlr8	r0
+	tst	r2,DBL0L /* Check if msb guard bit wants rounding up.  */
+	shlr16	DBL0L
+	shlr8	DBL0L
+	shlr2	DBL0L
+	SL1(bt,	LOCAL(add_frac),
+	 shlr2	DBL0L)
+	add	#1,DBL0L
+LOCAL(add_frac):
+	add	DBL0L,r0
+	mov.l	LOCAL(x01000000),r2
+	and	r3,r1
+	mov.l	@r15+,DBL0L
+	add	r1,r0
+	tst	r3,r0
+	bt	LOCAL(inf_denorm0)
+	cmp/hs	r3,r0
+LOCAL(denorm_noup_sh1):
+	bt	LOCAL(inf)
+	div0s	DBL0H,r2	/* copy orig. sign into T.  */
+	RETURN_R0_MAIN
+	rotcr	r0
+RETURN_FR0
+LOCAL(inf_denorm0):	!  We might need to undo previous rounding.
+	mov.l	LOCAL(x2fffffff),r3 /* Old fraction lsb | lower guard bits.  */
+	tst	r1,r1
+	bf	LOCAL(inf)
+	add	#-1,r0
+	tst	r3,DBL0L /* Check if msb guard bit was rounded up.  */
+	mov.l	LOCAL(x5fffffff),r3 /* Fraction lsb | lower guard bits.  */
+	addc	r2,r0
+	shlr	r0
+	tst	r3,DBL0L /* Check if msb guard bit wants rounding up.  */
+#ifdef DELAYED_BRANCHES
+	bt/s	LOCAL(denorm_noup)
+#else
+	bt	LOCAL(denorm_noup_sh1)
+#endif
+	div0s	DBL0H,r2	/* copy orig. sign into T.  */
+	add	#1,r0
+LOCAL(denorm_noup):
+	RETURN_R0
+	rotcr	r0
+LOCAL(ill_exp):
+	div0s	DBL0H,r1
+	mov.l	LOCAL(x7ff80000),r2
+	add	r1,r1
+	bf	LOCAL(inf_nan)
+	mov.w	LOCAL(m32),r3 /* Handle denormal or zero.  */
+	shlr16	r1
+	exts.w	r1,r1
+	shll2	r1
+	add	r1,r1
+	shlr8	r1
+	exts.w	r1,r1
+	add	#-8,r1	/* Go from 9 to 1 guard bit in MSW.  */
+	cmp/gt	r3,r1
+	mov.l	@r15+,r3 /* DBL0L */
+	bf	LOCAL(zero)
+	mov.l	DBL0L, @-r15
+	shll8	DBL0L
+	rotcr	r0	/* Insert leading 1.  */
+	shlr16	r3
+	shll2	r3
+	add	r3,r3
+	shlr8	r3
+	cmp/pl	DBL0L	/* Check lower 23 guard bits if guard bit 23 is 0.  */
+	addc	r3,r0	/* Assemble fraction with compressed guard bits.  */
+	mov.l	@r15+,DBL0L
+	mov	#0,r2
+	neg	r1,r1
+LOCAL(denorm_loop):
+	shlr	r0
+	rotcl	r2
+	dt	r1
+	bf	LOCAL(denorm_loop)
+	tst	#2,r0
+	rotcl	r0
+	tst	r2,r2
+	rotcl	r0
+	xor	#3,r0
+	add	#3,r0	/* Even overflow gives the correct result.  */
+	shlr2	r0
+	div0s	r0,DBL0H
+	RETURN_R0
+	rotcr	r0
+LOCAL(zero):
+	mov	#0,r0
+	div0s	r0,DBL0H
+	RETURN_R0
+	rotcr	r0
+LOCAL(inf_nan):
+	not	DBL0H,r0
+	tst	r2,r0
+	mov.l	@r15+,DBL0L
+	bf	LOCAL(inf)
+	RETURN_R0
+	mov	#-1,r0	/* NAN */
+LOCAL(inf):	/* r2 must be positive here.  */
+	mov.l	LOCAL(xff000000),r0
+	div0s	r2,DBL0H
+	RETURN_R0
+	rotcr	r0
+LOCAL(m32):
+	.word	-32
+	.balign	4
+LOCAL(x38000000):
+	.long	0x38000000
+LOCAL(x70000000):
+	.long	0x70000000
+LOCAL(x2fffffff):
+	.long	0x2fffffff
+LOCAL(x01000000):
+	.long	0x01000000
+LOCAL(xff000000):
+	.long	0xff000000
+LOCAL(x5fffffff):
+	.long	0x5fffffff
+LOCAL(x7ff80000):
+	.long	0x7ff80000
+	ENDFUNC(GLOBAL(truncdfsf2))
+#endif /*  L_truncdfsf2 */
+#ifdef L_add_sub_df3
+#include "IEEE-754/adddf3.S"
+#endif /* _add_sub_df3 */
+
+#ifdef L_muldf3
+#include "IEEE-754/muldf3.S"
+#endif /* L_muldf3 */
+
+#ifdef L_fixunsdfsi
+#include "IEEE-754/fixunsdfsi.S"
+#endif /* L_fixunsdfsi */
+
+#ifdef L_fixdfsi
+#include "IEEE-754/fixdfsi.S"
+#endif /* L_fixdfsi */
+
+#ifdef L_floatunssidf
+#include "IEEE-754/floatunssidf.S"
+#endif /* L_floatunssidf */
+
+#ifdef L_floatsidf
+#include "IEEE-754/floatsidf.S"
+#endif /* L_floatsidf */
+
+#ifdef L_divdf3
+#include "IEEE-754/divdf3.S"
+#endif /* L_divdf3 */
+#endif /* ! DYN_SHIFT */
+
+/* The actual arithmetic uses dynamic shift.  Supporting SH1 / SH2 here would
+   make this code too hard to maintain, so if you want to add SH1 / SH2
+   support, do it in a separate copy.  */
+#ifdef DYN_SHIFT
+#ifdef L_extendsfdf2
+	.balign 4
+	.global GLOBAL(extendsfdf2)
+	FUNC(GLOBAL(extendsfdf2))
+GLOBAL(extendsfdf2):
+	ARG_TO_R4
+	mov.l	LOCAL(x7f800000),r2
+	mov	#29,r3
+	mov	r4,DBLRL
+	not	r4,DBLRH
+	tst	r2,r4
+	shld	r3,DBLRL
+	bt	LOCAL(zero_denorm)
+	mov	#-3,r3
+	tst	r2,DBLRH
+	mov	r4,DBLRH
+	mov.l	LOCAL(x38000000),r2
+	bt/s	LOCAL(inf_nan)
+	 shll	DBLRH
+	shld	r3,DBLRH
+	rotcr	DBLRH
+	rts
+	add	r2,DBLRH
+	.balign	4
+LOCAL(inf_nan):
+	shld	r3,DBLRH
+	add	r2,r2
+	rotcr	DBLRH
+	rts
+	add	r2,DBLRH
+LOCAL(zero_denorm):
+	mov.l	r4,@-r15
+	add	r4,r4
+	tst	r4,r4
+	extu.w	r4,r2
+	bt	LOCAL(zero)
+	cmp/eq	r4,r2
+	extu.b	r4,r1
+	bf/s	LOCAL(three_bytes)
+	 mov.l	LOCAL(c__clz_tab),r0
+	cmp/eq	r4,r1
+	mov	#22,DBLRH
+	bt	LOCAL(one_byte)
+	shlr8	r2
+	mov	#14,DBLRH
+LOCAL(one_byte):
+#ifdef __pic__
+	add	r0,r2
+	mova  LOCAL(c__clz_tab),r0
+#endif
+	mov.b	@(r0,r2),r2
+	mov	#21,r3
+	mov.w	LOCAL(x0),DBLRL
+	sub	r2,DBLRH
+LOCAL(norm_shift):
+	shld	DBLRH,r4
+	mov.l	@r15+,r2
+	shld	r3,DBLRH
+	mov.l	LOCAL(xb7ffffff),r3
+	add	r4,DBLRH
+	cmp/pz	r2
+	mov	r2,r4
+	rotcr	DBLRH
+	rts
+	sub	r3,DBLRH
+LOCAL(three_bytes):
+	mov	r4,r2
+	shlr16	r2
+#ifdef __pic__
+	add	r0,r2
+	mova  LOCAL(c__clz_tab),r0
+#endif
+	mov.b	@(r0,r2),r2
+	mov	#21,r3
+	mov	#6-32,DBLRH
+	sub	r2,DBLRH
+	mov	r4,DBLRL
+	shld	DBLRH,DBLRL
+	bra	LOCAL(norm_shift)
+	add	#32,DBLRH
+LOCAL(zero):
+	rts	/* DBLRL has already been zeroed above.  */
+	mov.l @r15+,DBLRH
+LOCAL(x0):
+	.word 0
+	.balign	4
+LOCAL(x7f800000):
+	.long	0x7f800000
+LOCAL(x38000000):
+	.long	0x38000000
+LOCAL(xb7ffffff):
+	/* Flip sign back, do exponent adjustment, and remove leading one.  */
+	.long 0x80000000 + 0x38000000 - 1
+LOCAL(c__clz_tab):
+#ifdef __pic__
+	.long	GLOBAL(clz_tab) - .
+#else
+	.long	GLOBAL(clz_tab)
+#endif
+	ENDFUNC(GLOBAL(extendsfdf2))
+#endif /* L_extendsfdf2 */
+
+#ifdef L_truncdfsf2
+	.balign 4
+	.global GLOBAL(truncdfsf2)
+	FUNC(GLOBAL(truncdfsf2))
+GLOBAL(truncdfsf2):
+	mov.l	LOCAL(x38000000),r3
+	mov	DBL0H,r1
+	mov.l	LOCAL(x70000000),r2
+	mov	DBL0H,r0
+	sub	r3,r1
+	mov.l	DBL0L,@-r15
+	tst	r2,r1
+	mov	#12,r3
+	shld	r3,r0			! Isolate highpart fraction.
+	bf	LOCAL(ill_exp)
+	shll2	r1
+	mov.l	LOCAL(x2fffffff),r2 /* Fraction lsb | lower guard bits.  */
+	shll2	r1
+	mov.l	LOCAL(xff000000),r3
+	shlr8	r0
+	tst	r2,DBL0L /* Check if msb guard bit wants rounding up.  */
+	mov	#-28,r2
+	bt/s	LOCAL(add_frac)
+	 shld	r2,DBL0L
+	add	#1,DBL0L
+LOCAL(add_frac):
+	add	DBL0L,r0
+	mov.l	LOCAL(x01000000),r2
+	and	r3,r1
+	mov.l	@r15+,DBL0L
+	add	r1,r0
+	tst	r3,r0
+	bt	LOCAL(inf_denorm0)
+#if 0	// No point checking overflow -> infinity if we don't raise a signal.
+	cmp/hs	r3,r0
+	bt	LOCAL(inf)
+#endif
+	div0s	DBL0H,r2	/* copy orig. sign into T.  */
+	RETURN_R0_MAIN
+	rotcr	r0
+RETURN_FR0
+LOCAL(inf_denorm0):	! We might need to undo previous rounding.
+	mov.l	LOCAL(x2fffffff),r3 /* Old fraction lsb | lower guard bits.  */
+	tst	r1,r1
+	bf	LOCAL(inf)
+	add	#-1,r0
+	tst	r3,DBL0L /* Check if msb guard bit was rounded up.  */
+	mov.l	LOCAL(x5fffffff),r3 /* Fraction lsb | lower guard bits.  */
+	addc	r2,r0
+	shlr	r0
+	tst	r3,DBL0L /* Check if msb guard bit wants rounding up.  */
+	bt/s	LOCAL(denorm_noup)
+	 div0s	DBL0H,r2	/* copy orig. sign into T.  */
+	add	#1,r0
+LOCAL(denorm_noup):
+	RETURN_R0
+	rotcr	r0
+LOCAL(ill_exp):
+	div0s	DBL0H,r1
+	mov.l	LOCAL(x7ff80000),r2
+	add	r1,r1
+	bf	LOCAL(inf_nan)
+	mov.w	LOCAL(m32),r3 /* Handle denormal or zero.  */
+	mov	#-21,r2
+	shad	r2,r1
+	add	#-8,r1	/* Go from 9 to 1 guard bit in MSW.  */
+	cmp/gt	r3,r1
+	mov.l	@r15+,r3 /* DBL0L */
+	bf	LOCAL(zero)
+	mov.l	DBL0L, @-r15
+	shll8	DBL0L
+	rotcr	r0	/* Insert leading 1.  */
+	shld	r2,r3
+	cmp/pl	DBL0L	/* Check lower 23 guard bits if guard bit 23 is 0.  */
+	addc	r3,r0	/* Assemble fraction with compressed guard bits.  */
+	mov	r0,r2
+	shld	r1,r0
+	mov.l	@r15+,DBL0L
+	add	#32,r1
+	shld	r1,r2
+	tst	#2,r0
+	rotcl	r0
+	tst	r2,r2
+	rotcl	r0
+	xor	#3,r0
+	add	#3,r0	/* Even overflow gives the correct result.  */
+	shlr2	r0
+	div0s	r0,DBL0H
+	RETURN_R0
+	rotcr	r0
+LOCAL(zero):
+	mov	#0,r0
+	div0s	r0,DBL0H
+	RETURN_R0
+	rotcr	r0
+LOCAL(inf_nan):
+	not	DBL0H,r0
+	tst	r2,r0
+	mov.l	@r15+,DBL0L
+	bf	LOCAL(inf)
+	RETURN_R0
+	mov	#-1,r0	/* NAN */
+LOCAL(inf):	/* r2 must be positive here.  */
+	mov.l	LOCAL(xff000000),r0
+	div0s	r2,DBL0H
+	RETURN_R0
+	rotcr	r0
+LOCAL(m32):
+	.word	-32
+	.balign	4
+LOCAL(x38000000):
+	.long	0x38000000
+LOCAL(x70000000):
+	.long	0x70000000
+LOCAL(x2fffffff):
+	.long	0x2fffffff
+LOCAL(x01000000):
+	.long	0x01000000
+LOCAL(xff000000):
+	.long	0xff000000
+LOCAL(x5fffffff):
+	.long	0x5fffffff
+LOCAL(x7ff80000):
+	.long	0x7ff80000
+	ENDFUNC(GLOBAL(truncdfsf2))
+#endif /* L_truncdfsf2 */
+
+
+#ifdef L_add_sub_df3
+#include "IEEE-754/m3/adddf3.S"
+#endif /* _add_sub_df3 */
+
+#ifdef L_muldf3
+#include "IEEE-754/m3/muldf3.S"
+#endif /* L_muldf3 */
+
+#ifdef L_fixunsdfsi
+#include "IEEE-754/m3/fixunsdfsi.S"
+#endif /* L_fixunsdfsi */
+
+#ifdef L_fixdfsi
+#include "IEEE-754/m3/fixdfsi.S"
+#endif /* L_fixdfsi */
+
+#ifdef L_floatunssidf
+#include "IEEE-754/m3/floatunssidf.S"
+#endif /* L_floatunssidf */
+
+#ifdef L_floatsidf
+#include "IEEE-754/m3/floatsidf.S"
+#endif /* L_floatsidf */
+
+#ifdef L_divdf3
+#include "IEEE-754/m3/divdf3.S"
+#endif /* L_divdf3 */
+#endif /* DYN_SHIFT */
+
+#endif /* __SH_FPU_DOUBLE__ */
Index: gcc/config/sh/predicates.md
===================================================================
--- gcc/config/sh/predicates.md	(revision 162269)
+++ gcc/config/sh/predicates.md	(working copy)
@@ -719,6 +719,33 @@ (define_predicate "shift_operator"
 (define_predicate "symbol_ref_operand"
   (match_code "symbol_ref"))
 
+(define_special_predicate "soft_fp_comparison_operand"
+  (match_code "subreg,reg")
+{
+  switch (GET_MODE (op))
+    {
+    default:
+      return 0;
+    case CC_FP_NEmode: case CC_FP_GTmode: case CC_FP_UNLTmode:
+      break;
+    }
+  return register_operand (op, mode);
+})
+
+(define_predicate "soft_fp_comparison_operator"
+  (match_code "eq, unle, ge")
+{
+  switch (GET_CODE (op))
+    {
+    default:
+      return 0;
+    case EQ:  mode = CC_FP_NEmode;    break;
+    case UNLE:        mode = CC_FP_GTmode;    break;
+    case GE:  mode = CC_FP_UNLTmode;  break;
+    }
+  return register_operand (XEXP (op, 0), mode);
+})
+
 ;; Same as target_reg_operand, except that label_refs and symbol_refs
 ;; are accepted before reload.
 
Index: gcc/config/sh/sh.c
===================================================================
--- gcc/config/sh/sh.c	(revision 162269)
+++ gcc/config/sh/sh.c	(working copy)
@@ -284,6 +284,7 @@ static int sh_arg_partial_bytes (CUMULAT
 			         tree, bool);
 static bool sh_scalar_mode_supported_p (enum machine_mode);
 static int sh_dwarf_calling_convention (const_tree);
+static void sh_expand_float_condop (rtx *operands, rtx, rtx (*[2]) (rtx));
 static void sh_encode_section_info (tree, rtx, int);
 static int sh2a_function_vector_p (tree);
 static void sh_trampoline_init (rtx, tree, rtx);
@@ -551,6 +552,9 @@ static const struct attribute_spec sh_at
 /* Machine-specific symbol_ref flags.  */
 #define SYMBOL_FLAG_FUNCVEC_FUNCTION    (SYMBOL_FLAG_MACH_DEP << 0)
 
+#undef TARGET_MATCH_ADJUST
+#define TARGET_MATCH_ADJUST sh_match_adjust
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 \f
 /* Implement TARGET_HANDLE_OPTION.  */
@@ -2180,6 +2184,72 @@ sh_emit_cheap_store_flag (enum machine_m
   return gen_rtx_fmt_ee (code, VOIDmode, target, const0_rtx);
 }
 
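+/* Emit RTL for a soft-fp comparison of OP0 with OP1 under comparison CODE
+   in OP_MODE.  Comparisons with a dedicated library routine load the
+   routine's address, move the operands into the parameter registers and
+   emit an sfunc call whose CC_FP_* mode result is then compared against
+   zero; the returned rtx computes the T-bit result.  */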
+static rtx
+sh_soft_fp_cmp (int code, enum machine_mode op_mode, rtx op0, rtx op1)
+{
+  const char *name = NULL;
+  rtx (*fun) (rtx, rtx), addr, tmp, last, equiv;
+  int df = op_mode == DFmode;
+  enum machine_mode mode = VOIDmode; /* shut up warning.  */
+
+  switch (code)
+    {
+    case EQ:
+      if (!flag_finite_math_only)
+	{
+	  name = df ? "__nedf2" : "__nesf2";
+	  fun = df ? gen_cmpnedf_i1 : gen_cmpnesf_i1;
+	  mode = CC_FP_NEmode;
+	  break;
+	} /* Fall through.  */
+    case UNEQ:
+      fun = gen_cmpuneq_sdf;
+      break;
+    case UNLE:
+      if (flag_finite_math_only && !df)
+	{
+	  fun = gen_cmplesf_i1_finite;
+	  break;
+	}
+      name = df ? "__gtdf2t" : "__gtsf2t";
+      fun = df ? gen_cmpgtdf_i1 : gen_cmpgtsf_i1;
+      mode = CC_FP_GTmode;
+      break;
+    case GE:
+      if (flag_finite_math_only && !df)
+	{
+	  tmp = op0; op0 = op1; op1 = tmp;
+	  fun = gen_cmplesf_i1_finite;
+	  break;
+	}
+      name = df ? "__gedf2f" : "__gesf2f";
+      fun = df ? gen_cmpunltdf_i1 : gen_cmpunltsf_i1;
+      mode = CC_FP_UNLTmode;
+      break;
+    case UNORDERED:
+      fun = gen_cmpun_sdf;
+      break;
+    default: gcc_unreachable ();
+    }
+
+  if (!name)
+    return fun (force_reg (op_mode, op0), force_reg (op_mode, op1));
+
+  tmp = gen_reg_rtx (mode);
+  addr = gen_reg_rtx (Pmode);
+  function_symbol (addr, name, SFUNC_STATIC);
+  emit_move_insn (gen_rtx_REG (op_mode, R4_REG), op0);
+  emit_move_insn (gen_rtx_REG (op_mode, R5_REG + df), op1);
+  last = emit_insn (fun (tmp, addr));
+  equiv = gen_rtx_fmt_ee (COMPARE, mode, op0, op1);
+  set_unique_reg_note (last, REG_EQUAL, equiv);
+  /* Use fpcmp_i1 rather than cmpeqsi_t, so that the optimizers can grok
+     the computation.  */
+  return gen_rtx_SET (VOIDmode,
+		      gen_rtx_REG (SImode, T_REG),
+		      gen_rtx_fmt_ee (code, SImode, tmp, CONST0_RTX (mode)));
+}
+
 /* Called from the md file, set up the operands of a compare instruction.  */
 
 void
@@ -8662,6 +8732,49 @@ sh_fix_range (const char *const_str)
       str = comma + 1;
     }
 }
+
+/* Expand an sfunc operation taking NARGS MODE arguments, using generator
+   function FUN, which needs symbol NAME loaded into a register first.
+   Add a REG_EQUAL note using EQUIV.  */
+static void
+expand_sfunc_op (int nargs, enum machine_mode mode, rtx (*fun) (rtx, rtx),
+		 const char *name, rtx equiv, rtx *operands)
+{
+  int next_reg = FIRST_PARM_REG, i;
+  rtx addr, last, insn;
+
+  addr = gen_reg_rtx (Pmode);
+  function_symbol (addr, name, SFUNC_FREQUENT);
+  for (i = 1; i <= nargs; i++)
+    {
+      emit_move_insn (gen_rtx_REG (mode, next_reg), operands[i]);
+      next_reg += GET_MODE_SIZE (mode) / UNITS_PER_WORD;
+    }
+  last = emit_insn ((*fun) (operands[0], addr));
+  set_unique_reg_note (last, REG_EQUAL, equiv);
+}
+
+/* Expand an sfunc unary operation taking one MODE argument, using generator
+   function FUN, which needs symbol NAME loaded into a register first.
+   Add a REG_EQUAL note using CODE.  */
+void
+expand_sfunc_unop (enum machine_mode mode, rtx (*fun) (rtx, rtx),
+		   const char *name, enum rtx_code code, rtx *operands)
+{
+  rtx equiv = gen_rtx_fmt_e (code, GET_MODE (operands[0]), operands[1]);
+  expand_sfunc_op (1, mode, fun, name, equiv, operands);
+}
+
+/* Expand an sfunc binary operation in MODE, using generator function FUN,
+   which needs symbol NAME loaded into a register first.
+   Add a REG_EQUAL note using CODE.  */
+void
+expand_sfunc_binop (enum machine_mode mode, rtx (*fun) (rtx, rtx),
+		    const char *name, enum rtx_code code, rtx *operands)
+{
+  rtx equiv = gen_rtx_fmt_ee (code, mode, operands[1], operands[2]);
+  expand_sfunc_op (2, mode, fun, name, equiv, operands);
+}
 \f
 /* Insert any deferred function attributes from earlier pragmas.  */
 static void
@@ -11593,11 +11706,10 @@ function_symbol (rtx target, const char 
 {
   rtx sym;
 
-  /* If this is not an ordinary function, the name usually comes from a
-     string literal or an sprintf buffer.  Make sure we use the same
+  /* The name usually comes from a string literal or an sprintf buffer.
+     Make sure we use the same
      string consistently, so that cse will be able to unify address loads.  */
-  if (kind != FUNCTION_ORDINARY)
-    name = IDENTIFIER_POINTER (get_identifier (name));
+  name = IDENTIFIER_POINTER (get_identifier (name));
   sym = gen_rtx_SYMBOL_REF (Pmode, name);
   SYMBOL_REF_FLAGS (sym) = SYMBOL_FLAG_FUNCTION;
   if (flag_pic)
@@ -11605,6 +11717,10 @@ function_symbol (rtx target, const char 
       {
       case FUNCTION_ORDINARY:
 	break;
+      case SFUNC_FREQUENT:
+	if (!optimize || optimize_size)
+	  break;
+	/* Fall through.  */
       case SFUNC_GOT:
 	{
 	  rtx reg = target ? target : gen_reg_rtx (Pmode);
@@ -11715,6 +11831,168 @@ sh_expand_t_scc (rtx operands[])
   return 1;
 }
 
+void
+sh_expand_float_cbranch (rtx operands[4])
+{
+  static rtx (*branches[]) (rtx) = { gen_branch_true, gen_branch_false };
+
+  sh_expand_float_condop (operands, operands[3], branches);
+}
+
+void
+sh_expand_float_scc (rtx operands[4])
+{
+  static rtx (*movts[]) (rtx) = { gen_movt, gen_movnegt };
+
+  sh_expand_float_condop (&operands[1], operands[0], movts);
+}
+
+/* The first element of USER is for positive logic, the second one for
+   negative logic.  */
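+/* For example, NE is expanded as EQ with the negative-logic user, and LT
+   as UNLE with swapped operands and the negative-logic user; in the
+   soft-fp case only EQ, UNEQ, GE, UNLE and UNORDERED are handled
+   directly.  */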
+static void
+sh_expand_float_condop (rtx *operands, rtx dest, rtx (*user[2]) (rtx))
+{
+  enum machine_mode mode = GET_MODE (operands[1]);
+  enum rtx_code comparison = GET_CODE (operands[0]);
+  int swap_operands = 0;
+  rtx op0, op1;
+  rtx lab = NULL_RTX;
+
+  if (TARGET_SH1_SOFTFP_MODE (mode))
+    {
+      switch (comparison)
+	{
+	case NE:
+	  comparison = EQ;
+	  user++;
+	  break;
+	case LT:
+	  swap_operands = 1;	/* Fall through.  */
+	case GT:
+	  comparison = UNLE;
+	  user++;
+	  break;
+	case UNGT:
+	  swap_operands = 1;	/* Fall through.  */
+	case UNLT:
+	  comparison = GE;
+	  user++;
+	  break;
+	case UNGE:
+	  swap_operands = 1;
+	  comparison = UNLE;
+	  break;
+	case LE:
+	  swap_operands = 1;
+	  comparison = GE;	/* Fall through.  */
+	case EQ:
+	case UNEQ:
+	case GE:
+	case UNLE:
+	case UNORDERED:
+	  break;
+	case LTGT:
+	  comparison = UNEQ;
+	  user++;
+	  break;
+	case ORDERED:
+	  comparison = UNORDERED;
+	  user++;
+	  break;
+
+	default: gcc_unreachable ();
+	}
+    }
+  else /* SH2E .. SH4 Hardware floating point */
+    {
+      switch (comparison)
+	{
+	case LTGT:
+	  if (!flag_finite_math_only)
+	    break;
+	  /* Fall through.  */
+	case NE:
+	  comparison = EQ;
+	  user++;
+	  break;
+	case LT:
+	  swap_operands = 1;
+	  comparison = GT;	/* Fall through.  */
+	case GT:
+	case EQ:
+	case ORDERED:
+	  break;
+	case LE:
+	  swap_operands = 1;
+	  comparison = GE;	/* Fall through.  */
+	case GE:
+	  if (flag_finite_math_only)
+	    {
+	      swap_operands ^= 1;
+	      comparison = GT;
+	      user++;
+	      break;
+	    }
+	  break;
+	case UNGT:
+	  swap_operands = 1;	/* Fall through.  */
+	case UNLT:
+	  if (flag_finite_math_only)
+	    {
+	      swap_operands ^= 1;
+	      comparison = GT;
+	      break;
+	    }
+	  comparison = GE;
+	  user++;
+	  break;
+	case UNGE:
+	  swap_operands = 1;	/* Fall through.  */
+	case UNLE:
+	  comparison = GT;
+	  user++;
+	  break;
+	case UNEQ:
+	  if (flag_finite_math_only)
+	    {
+	      comparison = EQ;
+	      break;
+	    }
+	  comparison = LTGT;
+	  user++;
+	  break;
+	case UNORDERED:
+	  comparison = ORDERED;
+	  user++;
+	  break;
+
+	default: gcc_unreachable ();
+	}
+      operands[1] = force_reg (mode, operands[1]);
+      operands[2] = force_reg (mode, operands[2]);
+      if (comparison == GE)
+	{
+	  lab = gen_label_rtx ();
+	  sh_emit_scc_to_t (GT, operands[1+swap_operands],
+			    operands[2-swap_operands]);
+	  emit_jump_insn (gen_branch_true (lab));
+	  comparison = EQ;
+	}
+    }
+  op0 = operands[1+swap_operands];
+  op1 = operands[2-swap_operands];
+  if (GET_MODE_CLASS (mode) == MODE_FLOAT && TARGET_SH1_SOFTFP_MODE (mode))
+    emit_insn (sh_soft_fp_cmp (comparison, mode, op0, op1));
+  else
+    sh_emit_set_t_insn (gen_rtx_SET (VOIDmode, gen_rtx_REG (SImode, T_REG),
+				     gen_rtx_fmt_ee (comparison, SImode,
+						     op0, op1)),
+			mode);
+  if (lab)
+    emit_label (lab);
+  emit ((*user) (dest));
+}
+
 /* INSN is an sfunc; return the rtx that describes the address used.  */
 static rtx
 extract_sfunc_addr (rtx insn)
@@ -12266,6 +12544,19 @@ sh_secondary_reload (bool in_p, rtx x, r
   return NO_REGS;
 }
 
+int
+sh_match_adjust (rtx x, int regno)
+{
+  /* On a WORDS_BIG_ENDIAN machine, point to the last register of a
+     multiple hard register group of scalar integer registers, so that
+     for example (reg:DI 0) and (reg:SI 1) will be considered the same
+     register.  */
+  if (WORDS_BIG_ENDIAN && GET_MODE_SIZE (GET_MODE (x)) > UNITS_PER_WORD
+      && regno < FIRST_PSEUDO_REGISTER)
+    regno += hard_regno_nregs[regno][GET_MODE (x)] - 1;
+  return regno;
+}
+
 enum sh_divide_strategy_e sh_div_strategy = SH_DIV_STRATEGY_DEFAULT;
 
 #include "gt-sh.h"
Index: gcc/config/sh/sh.h
===================================================================
--- gcc/config/sh/sh.h	(revision 162269)
+++ gcc/config/sh/sh.h	(working copy)
@@ -183,6 +183,11 @@ do { \
 #define TARGET_FPU_DOUBLE \
   ((target_flags & MASK_SH4) != 0 || TARGET_SH2A_DOUBLE)
 
+#define TARGET_SH1_SOFTFP (TARGET_SH1 && !TARGET_FPU_DOUBLE)
+
+#define TARGET_SH1_SOFTFP_MODE(MODE) \
+  (TARGET_SH1_SOFTFP && (!TARGET_SH2E || (MODE) == DFmode))
+
 /* Nonzero if an FPU is available.  */
 #define TARGET_FPU_ANY (TARGET_SH2E || TARGET_FPU_DOUBLE)
 
@@ -329,6 +334,38 @@ do { \
 #define SUPPORT_ANY_SH5 \
   (SUPPORT_ANY_SH5_32MEDIA || SUPPORT_ANY_SH5_64MEDIA)
 
+/* Check if we have support for optimized software floating point using
+   dynamic shifts - then some function calls clobber fewer registers.  */
+#ifdef SUPPORT_SH3
+#define SUPPORT_SH3_OSFP 1
+#else
+#define SUPPORT_SH3_OSFP 0
+#endif
+
+#ifdef SUPPORT_SH3E
+#define SUPPORT_SH3E_OSFP 1
+#else
+#define SUPPORT_SH3E_OSFP 0
+#endif
+
+#if defined(SUPPORT_SH4_NOFPU) || SUPPORT_SH3_OSFP
+#define SUPPORT_SH4_NOFPU_OSFP 1
+#else
+#define SUPPORT_SH4_NOFPU_OSFP 0
+#endif
+
+#if defined(SUPPORT_SH4_SINGLE_ONLY) || SUPPORT_SH3E_OSFP
+#define SUPPORT_SH4_SINGLE_ONLY_OSFP 1
+#else
+#define SUPPORT_SH4_SINGLE_ONLY_OSFP 0
+#endif
+
+#define TARGET_OSFP (0 \
+ || (TARGET_SH3 && !TARGET_SH2E && SUPPORT_SH3_OSFP) \
+ || (TARGET_SH3E && SUPPORT_SH3E_OSFP) \
+ || (TARGET_HARD_SH4 && !TARGET_SH2E && SUPPORT_SH4_NOFPU_OSFP) \
+ || (TARGET_HARD_SH4 && TARGET_SH2E && SUPPORT_SH4_SINGLE_ONLY_OSFP))
+
 /* Reset all target-selection flags.  */
 #define MASK_ARCH (MASK_SH1 | MASK_SH2 | MASK_SH3 | MASK_SH_E | MASK_SH4 \
 		   | MASK_HARD_SH2A | MASK_HARD_SH2A_DOUBLE | MASK_SH4A \
@@ -2047,6 +2084,12 @@ struct sh_args {
 #define LIBGCC2_DOUBLE_TYPE_SIZE 64
 #endif
 
+#if defined(__SH2E__) || defined(__SH3E__) || defined(__SH4_SINGLE_ONLY__)
+#define LIBGCC2_DOUBLE_TYPE_SIZE 32
+#else
+#define LIBGCC2_DOUBLE_TYPE_SIZE 64
+#endif
+
 /* 'char' is signed by default.  */
 #define DEFAULT_SIGNED_CHAR  1
 
Index: gcc/config/sh/sh-modes.def
===================================================================
--- gcc/config/sh/sh-modes.def	(revision 162269)
+++ gcc/config/sh/sh-modes.def	(working copy)
@@ -22,6 +22,11 @@ PARTIAL_INT_MODE (SI);
 /* PDI mode is used to represent a function address in a target register.  */
 PARTIAL_INT_MODE (DI);
 
+/* For software floating point comparisons.  */
+CC_MODE (CC_FP_NE);
+CC_MODE (CC_FP_GT);
+CC_MODE (CC_FP_UNLT);
+
 /* Vector modes.  */
 VECTOR_MODE  (INT, QI, 2);    /*                 V2QI */
 VECTOR_MODES (INT, 4);        /*            V4QI V2HI */
Index: gcc/config/sh/lib1funcs.h
===================================================================
--- gcc/config/sh/lib1funcs.h	(revision 162269)
+++ gcc/config/sh/lib1funcs.h	(working copy)
@@ -64,13 +64,151 @@ see the files COPYING3 and COPYING.RUNTI
 #endif /* !__LITTLE_ENDIAN__ */
 
 #ifdef __sh1__
+/* branch with two-argument delay slot insn */
 #define SL(branch, dest, in_slot, in_slot_arg2) \
 	in_slot, in_slot_arg2; branch dest
+/* branch with one-argument delay slot insn */
 #define SL1(branch, dest, in_slot) \
 	in_slot; branch dest
+/* branch with comparison in delay slot */
+#define SLC(branch, dest, in_slot, in_slot_arg2) \
+        branch dest; in_slot, in_slot_arg2
+/* comparison in a delay slot, at branch destination */
+#define SLI(in_slot, in_slot_arg2) in_slot, in_slot_arg2
+#define SLCMP(branch, cmp1, cmp1arg2, cmp2, cmp2arg2) \
+	branch .+6; bra .+6; cmp2, cmp2arg2; cmp1, cmp1arg2
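+/* I.e. on SH-1, which has no delayed conditional branches, SLC() places
+   the comparison after the branch for the fall-through path and SLI()
+   re-executes it at the branch target; on SH-2 and up (see below) the
+   comparison sits in the delay slot and SLI() expands to nothing.  */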
+#define DMULU_SAVE \
+ mov.l r10,@-r15; \
+ mov.l r11,@-r15; \
+ mov.l r12,@-r15; \
+ mov.l r13,@-r15
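+/* 32x32->64 bit unsigned multiply: SH-1 has no dmulu.l, so the product is
+   composed from four 16x16->32 mulu.w partial products; DMULUL leaves the
+   low word in rl and DMULUH then fetches the high word.  */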
+#define DMULUL(m1, m2, rl) \
+ swap.w m1,r12; \
+ mulu.w r12,m2; \
+ swap.w m2,r13; \
+ sts macl,r10; \
+ mulu.w r13,m1; \
+ clrt; \
+ sts macl,r11; \
+ mulu.w r12,r13; \
+ addc r11,r10; \
+ sts macl,r12; \
+ mulu.w m1,m2; \
+ movt r11; \
+ sts macl,rl; \
+ mov r10,r13; \
+ shll16 r13; \
+ addc r13,rl; \
+ xtrct r11,r10; \
+ addc r10,r12 \
+/* N.B. the carry is cleared here.  */
+#define DMULUH(rh) mov r12,rh
+#define DMULU_RESTORE \
+ mov.l @r15+,r13; \
+ mov.l @r15+,r12; \
+ mov.l @r15+,r11; \
+ mov.l @r15+,r10
 #else /* ! __sh1__ */
+/* branch with two-argument delay slot insn */
 #define SL(branch, dest, in_slot, in_slot_arg2) \
-	branch##.s dest; in_slot, in_slot_arg2
+	branch##/s dest; in_slot, in_slot_arg2
+/* branch with one-argument delay slot insn */
 #define SL1(branch, dest, in_slot) \
 	branch##/s dest; in_slot
+/* branch with comparison in delay slot */
+#define SLC(branch, dest, in_slot, in_slot_arg2) \
+        branch##/s dest; in_slot, in_slot_arg2
+/* comparison in a delay slot, at branch destination */
+#define SLI(in_slot, in_slot_arg)
+#define SLCMP(branch, cmp1, cmp1arg2, cmp2, cmp2arg2) \
+	branch##/s .+6; cmp1, cmp1arg2; cmp2, cmp2arg2
+#define DMULU_SAVE
+#define DMULUL(m1, m2, rl) dmulu.l m1,m2; sts macl,rl
+#define DMULUH(rh) sts mach,rh
+#define DMULU_RESTORE
 #endif /* !__sh1__ */
+
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+/* don't #define DYN_SHIFT */
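+/* Fixed-count shift sequences for CPUs without the dynamic shift insn
+   (shld), composed from the available 1/2/8/16-bit shifts.  */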
+  #define SHLL4(REG)	\
+	shll2	REG;	\
+	shll2	REG
+
+  #define SHLR4(REG)	\
+	shlr2	REG;	\
+	shlr2	REG
+
+  #define SHLL6(REG)	\
+	shll2	REG;	\
+	shll2	REG;	\
+	shll2	REG
+
+  #define SHLR6(REG)	\
+	shlr2	REG;	\
+	shlr2	REG;	\
+	shlr2	REG
+
+  #define SHLL12(REG)	\
+	shll8	REG;	\
+	SHLL4 (REG)
+
+  #define SHLR12(REG)	\
+	shlr8	REG;	\
+	SHLR4 (REG)
+
+  #define SHLR19(REG)	\
+	shlr16	REG;	\
+	shlr2	REG;	\
+	shlr	REG
+
+  #define SHLL23(REG)	\
+	shll16	REG;	\
+	shlr	REG;	\
+	shll8	REG
+
+  #define SHLR24(REG)	\
+	shlr16	REG;	\
+	shlr8	REG
+
+  #define SHLR21(REG)	\
+	shlr16	REG;	\
+	shll2	REG;	\
+	add	REG,REG;\
+	shlr8	REG
+
+  #define SHLL21(REG)	\
+	shll16	REG;	\
+	SHLL4 (REG);	\
+	add	REG,REG
+
+  #define SHLR11(REG)	\
+	shlr8	REG;	\
+	shlr2	REG;	\
+	shlr	REG
+
+  #define SHLR22(REG)	\
+	shlr16	REG;	\
+	shll2	REG;	\
+	shlr8	REG
+
+  #define SHLR23(REG)	\
+	shlr16	REG;	\
+	add	REG,REG;\
+	shlr8	REG
+
+  #define SHLR20(REG)	\
+	shlr16	REG;	\
+	SHLR4 (REG)
+
+  #define SHLL20(REG)	\
+	shll16	REG;	\
+	SHLL4 (REG)
+#define SHLD_COUNT(N,COUNT)
+#define SHLRN(N,COUNT,REG) SHLR##N(REG)
+#define SHLLN(N,COUNT,REG) SHLL##N(REG)
+#else
+#define SHLD_COUNT(N,COUNT) mov #N,COUNT
+#define SHLRN(N,COUNT,REG) shld COUNT,REG
+#define SHLLN(N,COUNT,REG) shld COUNT,REG
+#define DYN_SHIFT 1
+#endif
Index: gcc/config/sh/IEEE-754/m3/divsf3.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/divsf3.S	(revision 0)
+++ gcc/config/sh/IEEE-754/m3/divsf3.S	(revision 0)
@@ -0,0 +1,360 @@
+/* Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+! divsf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+
+! long 0th..3rd significant byte
+#ifdef __LITTLE_ENDIAN__
+#define L0SB	3
+#define L1SB	2
+#define L2SB	1
+#define L3SB	0
+#else
+#define L0SB	0
+#define L1SB	1
+#define L2SB	2
+#define L3SB	3
+#endif
+
+! clobbered: r0,r1,r2,r3,r6,r7,T (and for sh.md's purposes PR)
+!
+! Note: When the divisor is larger than the dividend, we have to adjust the
+! exponent down by one.  We do this automatically when subtracting the entire
+! exponent/fraction bitstring as an integer, by means of the borrow from
+! bit 23 to bit 24.
+! Note: non-denormal rounding of a division result cannot cause fraction
+! overflow / exponent change. (r4 > r5 : fraction must stay in (2..1] interval;
+! r4 < r5: having an extra bit of precision available, even the smallest
+! possible difference of the result from one is rounded in all rounding modes
+! to a fraction smaller than one.)
+! sh4-200: 59 cycles
+! sh4-300: 44 cycles
+! tab indent: exponent / sign computations
+! tab+space indent: fraction computation
+FUNC(GLOBAL(divsf3))
+	.global GLOBAL(divsf3)
+	.balign	4
+GLOBAL(divsf3):
+	mov.l	LOCAL(x7f800000),r3
+	mov	#1,r2
+	mov	r4,r6
+	 shll8	 r6
+	mov	r5,r7
+	 shll8	 r7
+	rotr	r2
+	tst	r3,r4
+	or	r2,r6
+	bt/s	LOCAL(denorm_arg0)
+	or	r2,r7
+	tst	r3,r5
+	bt	LOCAL(denorm_arg1)
+	 shlr	 r6
+	mov.l	LOCAL(x3f000000),r3	! bias minus explicit leading 1
+	 div0u
+LOCAL(denorm_done):
+	 div1	 r7,r6
+	mov.l	r8,@-r15
+	 bt	 0f
+	 div1	r7,r6
+0:	mov.l	r9,@-r15
+	 div1	 r7,r6
+	add	r4,r3
+	 div1	 r7,r6
+	sub	r5,r3	! result sign/exponent minus 1 if no overflow/underflow
+	 div1	 r7,r6
+	or	r3,r2
+	 div1	 r7,r6
+	mov.w	LOCAL(xff00),r9
+	 div1	 r7,r6
+	mov.l	r2,@-r15 ! L0SB is 0xff iff denorm / infinity exp is computed
+	 div1	 r7,r6
+	mov.w	LOCAL(m23),r2
+	 div1	 r7,r6
+	mov	r4,r0
+	 div1	 r7,r6
+	 extu.b	 r6,r1
+	 and	 r9,r6
+	 swap.w	 r1,r1	! first 8 bits of result fraction in bit 23..16
+	 div1	 r7,r6
+	shld	r2,r0
+	 div1	 r7,r6
+	mov.b	r0,@(L3SB,r15)	! 0xff iff dividend was infinity / nan
+	 div1	 r7,r6
+	mov	r5,r0
+	 div1	 r7,r6
+	shld	r2,r0
+	 div1	 r7,r6
+	mov.b	r0,@(L2SB,r15)	! 0xff iff divisor was infinity / nan
+	 div1	 r7,r6
+	mov	r4,r0
+	 div1	 r7,r6
+	mov.w	LOCAL(m31),r2
+	 div1	 r7,r6
+	 extu.b	 r6,r8	! second 8 bits of result fraction in bit 7..0
+	 and	 r9,r6
+	mov.l	LOCAL(xff800000),r9
+	 div1	 r7,r6
+	xor	r5,r0	! msb := correct result sign
+	 div1	 r7,r6
+	xor	r3,r0	! xor with sign of result sign/exponent word
+	 div1	 r7,r6
+	shad	r2,r0
+	 div1	 r7,r6
+	mov.b	r0,@(L1SB,r15)	! 0xff	iff exponent over/underflows
+	and	r9,r3	! isolate sign / exponent
+	 mov.w	 LOCAL(xff01),r2
+	 div1	 r7,r6
+	swap.b	r8,r0	! second 8 bits of result fraction in bit 15..8
+	 div1	 r7,r6
+	or	r1,r0	! first 16 bits of result fraction in bit 23..8
+	 div1	 r7,r6
+	mov.w	LOCAL(m1),r9
+	 div1	 r7,r6
+	mov.l	@r15+,r8 ! load encoding of unusual exponent conditions
+	 and	 r6,r2	! rest | result lsb
+	 mov	 #0,r1
+	 bf	 0f	! bit below lsb clear -> no rounding
+	 cmp/hi	r1,r2
+0:	 extu.b	 r6,r1
+	 or	 r1,r0	! 24 bit result fraction with explicit leading 1
+	addc	r3,r0	! add in exponent / sign
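+	! The cmp/str below sets T iff any byte of r8 is 0xff, i.e. iff any
+	! of the unusual exponent conditions recorded above holds.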
+	cmp/str	r9,r8
+	! (no stall *here* for SH4-100 / SH4-200)
+	bt/s	LOCAL(inf_nan_denorm_zero)
+	mov.l	@r15+,r9
+	rts
+	mov.l	@r15+,r8
+
+/* The exponent adjustment for denormal numbers is done by leaving an
+   adjusted value in r3; r4/r5 are not changed.  */
+	.balign	4
+LOCAL(denorm_arg0):
+	mov.w	LOCAL(xff00),r1
+	sub	r2,r6	! 0x80000000 : remove implicit 1
+	tst	r6,r6
+	sts.l	pr,@-r15
+	bt	LOCAL(div_zero)
+	bsr	LOCAL(clz)
+	mov	r6,r0
+	shld	r0,r6
+	tst	r3,r5
+	mov.l	LOCAL(x3f800000),r3	! bias - 1 + 1
+	mov	#23,r1
+	shld	r1,r0
+	bt/s	LOCAL(denorm_arg1_2)
+	sub	r0,r3
+	 shlr	 r6
+	bra	LOCAL(denorm_done)
+	 div0u
+
+LOCAL(denorm_arg1):
+	mov.l	LOCAL(x3f000000),r3	! bias - 1
+LOCAL(denorm_arg1_2):
+	sub	r2,r7	! 0x80000000 : remove implicit 1
+	mov.w	LOCAL(xff00),r1
+	tst	r7,r7
+	sts.l	pr,@-r15
+	bt	LOCAL(div_by_zero)
+	bsr	LOCAL(clz)
+	mov	r7,r0
+	shld	r0,r7
+	add	#-1,r0
+	mov	#23,r1
+	shld	r1,r0
+	add	r0,r3
+	 shlr	 r6
+	bra	LOCAL(denorm_done)
+	 div0u
+
+	.balign	4
+LOCAL(inf_nan_denorm_zero):
+! r0 has the rounded result, r6 has the non-rounded lowest bits & rest.
+! the bit just below the LSB of r6 is available as ~Q
+
+! Alternative way to get at ~Q:
+! if rounding took place, ~Q must be set.
+! if the rest appears to be zero, ~Q must be set.
+! if the rest appears to be nonzero, but rounding didn't take place,
+! ~Q must be clear;  the apparent rest will then require adjusting to test if 
+! the actual rest is nonzero.
+	mov	r0,r2
+	not	r8,r0
+	tst	#0xff,r0
+	shlr8	r0
+	mov.l	@r15+,r8
+	bt/s	LOCAL(div_inf_or_nan)
+	tst	#0xff,r0
+	mov	r4,r0
+	bt	LOCAL(div_by_inf_or_nan)
+	add	r0,r0
+	mov	r5,r1
+	add	r1,r1
+	cmp/hi	r1,r0
+	mov	r6,r0
+	bt	LOCAL(overflow)
+	sub	r2,r0
+	exts.b	r0,r0	! -1 if rounding took place
+	shlr8	r6	! isolate div1-mangled rest
+	addc	r2,r0	! generate carry if rounding took place
+	shlr8	r7
+	sub	r3,r0	! pre-rounding fraction
+	bt	0f ! going directly to denorm_sticky would cause mispredicts
+	tst	r6,r6	! rest can only be zero if lost bit was set
+0:	add	r7,r6	! (T ? corrupt : reconstruct) actual rest
+	bt	0f
+	cmp/pl	r6
+0:	mov.w	LOCAL(m24),r1
+	addc	r0,r0	! put in sticky bit
+	add	#-1,r3
+	mov.l	LOCAL(x40000000),r6
+	add	r3,r3
+	mov	r0,r2
+	shad	r1,r3	! exponent ; s32.0
+	!
+	shld	r3,r0
+	add	#30,r3
+	cmp/pl	r3
+	shld	r3,r2
+	bf	LOCAL(zero_nan)	! return zero
+	rotl	r2
+	cmp/hi	r6,r2
+	mov	#0,r7
+	addc	r7,r0
+	div0s	r4,r5
+	rts
+	rotcr	r0
+	
+! ????
+! undo normal rounding (lowest bits still in r6). then do denormal rounding.
+	
+LOCAL(overflow):
+	mov.l	LOCAL(xff000000),r0
+	div0s	r4,r5
+	rts
+	rotcl	r0
+	
+LOCAL(div_inf_or_nan):
+	mov	r4,r0
+	bra	LOCAL(nan_if_t)
+	add	r0,r0
+	
+LOCAL(div_by_inf_or_nan):
+	mov.l	LOCAL(xff000000),r1
+	mov	#0,r0
+	mov	r5,r2
+	add	r2,r2
+	bra	LOCAL(nan_if_t)
+	cmp/hi	r1,r2
+
+
+
+! still need to check for divide by zero or divide by nan
+! r3: 0x7f800000
+	.balign	4
+LOCAL(div_zero):
+	mov	r5,r1
+	add	r1,r1
+	tst	r1,r1	! 0 / 0 -> nan
+	not	r5,r1
+	bt	LOCAL(nan)
+	add	r3,r3
+	cmp/hi	r3,r1	! 0 / nan -> nan (but 0 / inf -> 0)
+LOCAL(zero_nan):
+	mov	#0,r0
+LOCAL(nan_if_t):
+	bf	0f
+LOCAL(nan):
+	mov	#-1,r0
+0:	div0s	r4,r5	! compute sign
+	rts
+	rotcr	r0	! insert sign
+
+LOCAL(div_by_zero):
+	mov.l	LOCAL(xff000000),r0
+	mov	r5,r2
+	add	r2,r2
+	bra	LOCAL(nan_if_t)
+	cmp/hi	r0,r2
+	
+	.balign	4
+LOCAL(clz):
+	mov.l	r8,@-r15
+	extu.w	r0,r8
+	mov.l	r9,@-r15
+	cmp/eq	r0,r8
+	bt/s	0f
+	mov	#8-8,r9
+	xtrct	r0,r8
+	add	#16,r9
+0:	tst	r1,r8	! 0xff00
+	mov.l	LOCAL(c_clz_tab),r0
+	bt	0f
+	shlr8	r8
+0:	bt	0f
+	add	#8,r9
+0:
+#ifdef	__PIC__
+	add	r0,r8
+	mova	LOCAL(c_clz_tab),r0
+#endif
+	mov.b	@(r0,r8),r8
+	mov	r9,r0
+	mov.l	@r15+,r9
+	!
+	!
+	!
+	sub	r8,r0
+	mov.l	@r15+,r8
+	rts
+	lds.l	@r15+,pr
+
+!	We encode even some words as pc-relative constants that would fit
+!	as immediates in the instruction, in order to avoid some pipeline
+!	stalls on SH4-100 / SH4-200.
+LOCAL(m23):	.word -23
+LOCAL(m24):	.word -24
+LOCAL(m31):	.word -31
+LOCAL(xff01):	.word 0xff01
+	.balign	4
+LOCAL(xff000000):
+#ifdef __LITTLE_ENDIAN__
+LOCAL(xff00):	.word 0xff00
+LOCAL(m1):	.word -1
+#else
+LOCAL(m1):	.word -1
+LOCAL(xff00):	.word 0xff00
+#endif
+LOCAL(x7f800000): .long 0x7f800000
+LOCAL(x3f000000): .long 0x3f000000
+LOCAL(x3f800000): .long 0x3f800000
+LOCAL(xff800000): .long 0xff800000
+LOCAL(x40000000): .long 0x40000000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+        .long   GLOBAL(clz_tab) - .
+#else
+        .long   GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(divsf3))
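
[Not part of the patch] The LOCAL(clz) helper above - repeated in most of
these files - is a table-driven count-leading-zeros: the argument is
narrowed to one byte with at most two tests, then a 256-entry table
supplies the bit count.  A rough, self-contained C model, assuming
clz_tab holds the number of significant bits of each byte value (as
libgcc's __clz_tab does):

#include <stdio.h>

/* Model of clz_tab: tab[i] = number of significant bits in i,
   i.e. floor(log2(i)) + 1, with tab[0] = 0.  */
static unsigned char tab[256];

static void init_tab (void)
{
  for (int i = 1; i < 256; i++)
    for (int b = i; b; b >>= 1)
      tab[i]++;
}

/* Count leading zeros of x the way the assembly does: narrow to a
   byte with at most two tests, then one table lookup.  */
static int clz32 (unsigned int x)
{
  int bits = 32;
  if (x >> 16) { x >>= 16; bits -= 16; }
  if (x >> 8)  { x >>= 8;  bits -= 8; }
  return bits - tab[x];
}

int main (void)
{
  init_tab ();
  printf ("%d %d %d\n", clz32 (1), clz32 (0x80000000u), clz32 (0xff00));
  /* prints: 31 0 16 */
  return 0;
}

(The assembly variants return a shift count rather than the raw clz
value, but the narrowing/table technique is the same.)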
Index: gcc/config/sh/IEEE-754/m3/divdf3.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/divdf3.S	(revision 0)
+++ gcc/config/sh/IEEE-754/m3/divdf3.S	(revision 0)
@@ -0,0 +1,603 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+! divdf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke joern.rennecke@st.com
+
+/* y = 1/x  ; x (- [1,2)
+   y0 = 1.5 - x/2 - tab[(1-x)*64] = y + d ; abs(d)/y <= 0x1.0c/256
+
+   y1 = y0 - ((y0) * x - 1) * y0  =  y-x*d^2
+   y2 = y1 - ((y1) * x - 1) * y1 =~= y-x^3*d^4
+
+   z0 = y2*a ;  a1 = a - z0*x /# 32 * 64 -> 64 bit #/
+   z1 = y2*a1 (round to nearest odd 0.5 ulp);
+   a2 = a1 - z1*x /# 32 * 64 -> 64 bit #/
+
+   z = a/x = z0 + z1 - 0.5 ulp + (a2 > 0) * ulp
+
+   Unless stated otherwise, multiplies can be done in 32 * 32 bit or less
+   with suitable scaling and/or top truncation.
+   We use a slightly modified algorithm here that checks if the lower
+   bits in z1 are sufficient to determine the outcome of rounding - in that
+   case a2 is not computed.
+   -z1 is computed in units of 1/128 ulp, with an error in the range
+   -0x3.e/128 .. +0 ulp.
+   Thus, after adding three, the result can be safely rounded for normal
+   numbers if any of the bits 5..2 is set, or if the highest guard bit
+   (bit 6 if y <1, otherwise bit 7) is set.
+   (Because of the way truncation works, we would be fine for an open
+    error interval of (-4/128..+1/128) ulp )
+   For denormal numbers, the rounding point lies higher, but it would be
+   quite cumbersome to calculate where exactly; it is sufficient if any
+   of the bits 7..3 is set.
+   x truncated to 20 bits is sufficient to calculate y0 or even y1.
+   Table entries are adjusted by about +128 to use full signed byte range.
+   This adjustment has been perturbed slightly to allow cse with the
+   shift count constant -26.
+   The threshold point for the shift adjust before rounding is found by
+   comparing the fractions, which is exact, unlike the top bit of y2.
+   Therefore, the top bit of y2 becomes slightly random after the adjustment
+   shift, but that's OK because this can happen only at the boundaries of
+   the interval, and the biasing of the error means that it can in fact happen
+   only at the bottom end.  And there, the carry propagation will make sure
+   that in the end we will have in effect an implicit 1 (or two when rounding
+   up...)  */
+/* If an exact result exists, it can have no more bits than the dividend.
+   Hence, we don't need to bother with the round-to-even tie breaker
+   unless the result is denormalized.  */
+/* 64 cycles through main path for sh4-300 (about 93.7% of normalized numbers),
+   82 for the path for rounding tie-breaking for normalized numbers
+   (including one branch mispredict).
+   Some cycles might be saved by more careful register allocation.  */
+
+#define x_h r12
+#define yn  r3
+
+FUNC(GLOBAL(divdf3))
+ .global GLOBAL(divdf3)
+
+/* Adjust arg0 now, too.  We still have to come back to denorm_arg1_done,
+   since we haven't done any of the work yet that we do till the denorm_arg0
+   entry point.  We know that neither of the arguments is inf/nan, but
+   arg0 might be zero.  Check for that first to avoid having to establish an
+   rts return address.  */
+LOCAL(both_denorm):
+	mov.l	r9,@-r15
+	mov	DBL0H,r1
+	mov.l	r0,@-r15
+	shll2	r1
+	mov.w LOCAL(both_denorm_cleanup_off),r9
+	or	DBL0L,r1
+	tst	r1,r1
+	mov	DBL0H,r0
+	bf/s	LOCAL(zero_denorm_arg0_1)
+	shll2	r0
+	mov.l	@(4,r15),r9
+	add	#8,r15
+	bra	LOCAL(ret_inf_nan_0)
+	mov	r1,DBLRH
+
+LOCAL(both_denorm_cleanup):
+	mov.l	@r15+,r0
+	!
+	mov.l	@r15+,r9
+ mov	#12,r3
+ mov.l	LOCAL(xfffe2006),r1	! yn := (-1. << 17) + (0x80 << 6) ; shift #-26
+	bra	LOCAL(denorm_arg1_done)
+	!
+	add	r0,DBL0H
+
+/* Denorm handling leaves the incoming denorm argument with an exponent of +1
+   (implicit 1).  To leave the result exponent unaltered, the other
+   argument's exponent is adjusted by the shift count.  */
+
+	.balign 4
+LOCAL(arg0_tiny):
+	bsr	LOCAL(clz)
+	mov	DBL0L,r0
+	shll	DBL0H
+	add	#1,r0
+	mov	DBL0L,DBL0H
+	shld	r0,DBL0H
+	rotcr	DBL0H
+	tst	DBL0L,DBL0L	/* Check for a zero dividend.  */
+	add	#-33,r0
+	shld	r0,DBL0L
+	bf/s	LOCAL(adjust_arg1_exp)
+	add	#64,r0
+LOCAL(return_0): /* Return 0 with appropriate sign.  */
+	mov.l	@r15+,r10
+	mov	#0,DBLRH
+	mov.l	@r15+,r9
+	bra	LOCAL(ret_inf_nan_0)
+	mov.l	@r15+,r8
+
+	.balign 4
+LOCAL(arg1_tiny):
+	bsr	LOCAL(clz)
+	mov	DBL1L,r0
+	shll	DBL1H
+	add	#1,r0
+	mov	DBL1L,DBL1H
+	shld	r0,DBL1H
+	rotcr	DBL1H
+	tst	DBL1L,DBL1L	/* Check for divide by zero.  */
+	add	#-33,r0
+	shld	r0,DBL1L
+	bf/s	LOCAL(adjust_arg0_exp)
+	add	#64,r0
+	mov	DBL0H,r0
+	add	r0,r0
+	tst	r0,r0	! 0 / 0 ?
+	mov	#-1,DBLRH
+	bf	LOCAL(return_inf)
+	!
+	bt	LOCAL(ret_inf_nan_0)
+	!
+
+	.balign 4
+LOCAL(zero_denorm_arg1):
+	not	DBL0H,r3
+	mov	DBL1H,r0
+	tst	r2,r3
+	shll2	r0
+	bt	LOCAL(early_inf_nan_arg0)
+	tst	r0,r0
+	mov.w	LOCAL(xff00),r12
+	bt/s	LOCAL(arg1_tiny)
+	sts.l	pr,@-r15
+	bsr	LOCAL(clz)
+	shlr2	r0
+	!
+	shll	DBL1H
+	mov	DBL1L,r3
+	shld	r0,DBL1H
+	shld	r0,DBL1L
+	rotcr	DBL1H
+	add	#-32,r0
+	shld	r0,r3
+	add	#32,r0
+	or	r3,DBL1H
+LOCAL(adjust_arg0_exp):
+	tst	r2,DBL0H
+	mov	#20,r3
+	shld	r3,r0
+	bt	LOCAL(both_denorm)
+	add	DBL0H,r0
+	div0s	r0,DBL0H	! Check for obvious overflow.
+	not	r0,r3		! Check for more subtle overflow - lest
+	bt	LOCAL(return_inf)
+	mov	r0,DBL0H
+	tst	r2,r3		! we mistake it for NaN later
+	mov	#12,r3
+	bf	LOCAL(denorm_arg1_done)
+LOCAL(return_inf): /* Return infinity with appropriate sign.  */
+	mov	#20,r3
+	mov	#-2,DBLRH
+	bra	LOCAL(ret_inf_nan_0)
+	shad	r3,DBLRH
+
+/* inf/n -> inf; inf/0 -> inf; inf/inf -> nan; inf/nan->nan  nan/x -> nan */
+LOCAL(inf_nan_arg0):
+	mov.l	@r15+,r10
+	mov.l	@r15+,r9
+	mov.l	@r15+,r8
+LOCAL(early_inf_nan_arg0):
+	not	DBL1H,r3
+	mov	DBL0H,DBLRH
+	tst	r2,r3	! both inf/nan?
+	add	DBLRH,DBLRH
+	bf	LOCAL(ret_inf_nan_0)
+	mov	#-1,DBLRH
+LOCAL(ret_inf_nan_0):
+	mov	#0,DBLRL
+	mov.l	@r15+,r12
+	div0s	DBL0H,DBL1H
+	rts
+	rotcr	DBLRH
+	
+/* Already handled: inf/x, nan/x .  Thus: x/inf -> 0; x/nan -> nan */
+	.balign	4
+LOCAL(inf_nan_arg1):
+	mov	DBL1H,r2
+	mov	#12,r1
+	shld	r1,r2
+	mov.l	@r15+,r10
+	mov	#0,DBLRL
+	mov.l	@r15+,r9
+	or	DBL1L,r2
+	mov.l	@r15+,r8
+	cmp/hi	DBLRL,r2
+	mov.l	@r15+,r12
+	subc	DBLRH,DBLRH
+	div0s	DBL0H,DBL1H
+	rts
+	rotcr	DBLRH
+	
+	.balign 4
+LOCAL(zero_denorm_arg0):
+	mov.w	LOCAL(denorm_arg0_done_off),r9
+	not	DBL1H,r1
+	mov	DBL0H,r0
+	tst	r2,r1
+	shll2	r0
+	bt	LOCAL(inf_nan_arg1)
+LOCAL(zero_denorm_arg0_1):
+	tst	r0,r0
+	mov.w	LOCAL(xff00),r12
+	bt/s	LOCAL(arg0_tiny)
+	sts.l	pr,@-r15
+	bsr	LOCAL(clz)
+	shlr2	r0
+	shll	DBL0H
+	mov	DBL0L,r12
+	shld	r0,DBL0H
+	shld	r0,DBL0L
+	rotcr	DBL0H
+	add	#-32,r0
+	shld	r0,r12
+	add	#32,r0
+	or	r12,DBL0H
+LOCAL(adjust_arg1_exp):
+	mov	#20,r12
+	shld	r12,r0
+	add	DBL1H,r0
+	div0s	r0,DBL1H	! Check for obvious underflow.
+	not	r0,r12		! Check for more subtle underflow - lest
+	bt	LOCAL(return_0)
+	mov	r0,DBL1H
+	tst	r2,r12		! we mistake it for NaN later
+	bt	LOCAL(return_0)
+	!
+	braf	r9
+	mov	#13,r0
+LOCAL(zero_denorm_arg1_dispatch):
+
+LOCAL(xff00):	.word 0xff00
+LOCAL(denorm_arg0_done_off):
+	.word LOCAL(denorm_arg0_done)-LOCAL(zero_denorm_arg1_dispatch)
+LOCAL(both_denorm_cleanup_off):
+	.word LOCAL(both_denorm_cleanup)-LOCAL(zero_denorm_arg1_dispatch)
+
+ .balign	8
+GLOBAL(divdf3):
+ mov.l	LOCAL(x7ff00000),r2
+ mov	#12,r3
+ mov.l	LOCAL(xfffe2006),r1	! yn := (-1. << 17) + (0x80 << 6) ; shift #-26
+ tst	r2,DBL1H
+ mov.l	r12,@-r15
+ bt	LOCAL(zero_denorm_arg1)
+
+LOCAL(denorm_arg1_done):
+ mov	DBL1H,x_h	! x_h live in r12
+ shld	r3,x_h	! x - 1 ; u0.20
+ mov	x_h,yn
+ mova	LOCAL(ytab),r0
+ mov.l	r8,@-r15
+ shld	r1,yn	! x-1 ; u26.6
+ mov.b	@(r0,yn),yn
+ mov	#6,r0
+ mov.l	r9,@-r15
+ mov	x_h,r8
+ mov.l	r10,@-r15
+ shlr16	x_h	! x - 1; u16.16	! x/2 - 0.5 ; u15.17
+ add	x_h,r1	! SH4-200 single-issues this insn
+ shld	r0,yn
+ sub	r1,yn	! yn := y0 ; u15.17
+ mov	DBL1L,r1
+ mov	#-20,r10
+ mul.l	yn,x_h	! r12 dead
+ swap.w	yn,r9
+ shld	r10,r1
+ sts	macl,r0	! y0 * (x-1) - n ; u-1.32
+ add	r9,r0	! y0 * x - 1     ; s-1.32
+ tst	r2,DBL0H
+ dmuls.l r0,yn
+ mov.w	LOCAL(d13),r0
+ or	r1,r8	! x  - 1; u0.32
+ add	yn,yn	! yn = y0 ; u14.18
+ bt	LOCAL(zero_denorm_arg0)
+
+LOCAL(denorm_arg0_done):
+ sts	mach,r1	!      d0 ; s14.18
+ sub	r1,yn	! yn = y1 ; u14.18 ; <= 0x3fffc
+ mov	DBL0L,r12
+ shld	r0,yn	! yn = y1 ; u1.31 ; <= 0x7fff8000
+ mov.w	LOCAL(d12),r9
+ dmulu.l yn,r8
+ shld	r10,r12
+ mov	yn,r0
+ mov	DBL0H,r8
+ add	yn,yn	! yn = y1 ; u0.32 ; <= 0xffff0000
+ sts	mach,r1	! y1 * (x-1); u1.31
+ add	r0,r1	! y1 * x    ; u1.31
+ dmulu.l yn,r1
+ not	DBL0H,r10
+ shld	r9,r8
+ tst	r2,r10
+ or	r8,r12	! a - 1; u0.32
+ bt	LOCAL(inf_nan_arg0)
+ sts	mach,r1	! d1+yn; u1.31
+ sett		! adjust y2 so that it can be interpreted as s1.31
+ not	DBL1H,r10
+ subc	r1,yn	! yn := y2 ; u1.31 ; can be 0x7fffffff
+ mov.l	LOCAL(x001fffff),r9
+ dmulu.l yn,r12
+ tst	r2,r10
+ or	DBL1H,r2
+ bt	LOCAL(inf_nan_arg1)
+ mov.l	r11,@-r15
+ sts	mach,r12	! y2*(a-1) ; u1.31
+ add	yn,r12		! z0       ; u1.31
+ dmulu.l r12,DBL1L
+ mov.l	LOCAL(x40000000),DBLRH ! bias + 1
+ and	r9,r2		! x ; u12.20
+ cmp/hi	DBL0L,DBL1L
+ sts	macl,r8
+ mov	#-24,r11
+ sts	mach,r9 	! r9:r8 := z0 * DBL1L; u-19.64
+ subc	DBL1H,DBLRH
+ mul.l	r12,r2  	! (r9+macl):r8 == z0*x; u-19.64
+ shll	r8
+ add	DBL0H,DBLRH	! result sign/exponent + 1
+ mov	r8,r10
+ sts	macl,DBLRL
+ add	DBLRL,r9
+ rotcl	r9		! r9:r8 := z*x; u-20.63
+ shld	r11,r10
+ mov.l	LOCAL(x7fe00000),DBLRL
+ sub	DBL0L,r9	! r9:r8 := -a ; u-20.63
+ cmp/pz	r9		! In corner cases this shift can lose ..
+ shll8	r9		!  .. the sign, so check it first.
+ mov.l	LOCAL(x00200000),r11
+ or	r10,r9	! -a1 ; s-28.32
+ mov.l	LOCAL(x00100000),r10
+ dmulu.l r9,yn	! sign for r9 is in T
+ xor	DBL0H,DBL1H	! calculate expected sign & bit20
+ mov.w	LOCAL(d120),DBL0H ! to test bits 6..4
+ xor	DBLRH,DBL1H
+ !
+ sts	mach,DBL0L	! -z1 ; s-27.32
+ bt 0f
+ sub	yn,DBL0L	! multiply adjust for -a1 negative; r3 dies here
+0:tst	r10,DBL1H		! set T if a >= x
+ mov.l LOCAL(xfff00000),r3
+ bt	0f
+ add	DBL0L,DBL0L	! z1 ; s-27.32 / s-28.32
+0:bt 0f
+ add	r12,r12	! z0 ; u1.31 / u0.31
+0:add	#6-64,DBL0L
+ and	r3,DBLRH	! isolate sign / exponent
+ tst	DBL0H,DBL0L
+ bf/s	LOCAL(exact)	! make the hot path taken for best branch prediction
+ cmp/pz	DBL1H
+
+! Unless we follow the next branch, we need to test which way the rounding
+! should go.
+! For normal numbers, we know that the result is not exact, so the sign
+! of the rest will be conclusive.
+! We generate a number that looks safely rounded so that denorm handling
+! can safely test the number twice.
+! r10:r8 == 0 will indicate if the number was exact, which can happen
+! when we come here for denormals to check a number that is close or
+! equal to a result in whole ulps.
+ bf	LOCAL(ret_denorm_inf)	! denorm or infinity, DBLRH has inverted sign
+ add	#64,DBL0L
+LOCAL(find_adjust): tst	r10,DBL1H ! set T if a >= x
+ mov	#-2,r10
+ addc	r10,r10
+ mov	DBL0L,DBLRL	! z1 ; s-27.32 / s-28.32 ; lower 4 bits unsafe.
+ shad	r10,DBLRL	! tentatively rounded z1 ; s-24.32
+ shll8	r8		! r9:r8 := -a1 ; s-28.64
+ clrt
+ dmuls.l DBLRL,DBL1L	! DBLRL signed, DBL1L unsigned
+ mov	r8,r10
+ shll16	r8		! r8  := lowpart  of -a1 ; s-44.48
+ xtrct	r9,r10		! r10 := highpart of -a1 ; s-44.48
+ !
+ sts	macl,r3
+ subc	r3,r8
+ sts	mach,r3
+ subc	r3,r10
+ cmp/pz	DBL1L
+ mul.l	DBLRL,r2
+ bt	0f
+ sub	DBLRL,r10	! adjust for signed/unsigned multiply
+0: mov.l	LOCAL(x7fe00000),DBLRL
+ mov	#-26,r2
+ sts	macl,r9
+ sub	r9,r10		! r10:r8 := -a2
+ add	#-64+16,DBL0L	! the denorm code negates this adj. for exact results
+ shld	r2,r10		! convert sign into adjustment in the range 32..63
+ sub	r10,DBL0L
+ cmp/pz	DBL1H
+
+ .balign 4
+LOCAL(exact):
+ bf	LOCAL(ret_denorm_inf)	! denorm or infinity, DBLRH has inverted sign
+ tst	DBLRL,DBLRH
+ bt	LOCAL(ret_denorm_inf)	! denorm, DBLRH has correct sign
+ mov	#-7,DBL1H
+ cmp/pz	DBL0L		! T is sign extension of z1
+ not	DBL0L,DBLRL
+ subc	r11,DBLRH	! calculate sign / exponent minus implicit 1 minus T
+ mov.l	@r15+,r11
+ mov.l	@r15+,r10
+ shad	DBL1H,DBLRL
+ mov.l	@r15+,r9
+ mov	#-11,DBL1H
+ mov	r12,r8		! z0 contributes to DBLRH and DBLRL
+ shld	DBL1H,r12
+ mov	#21,DBL1H
+ clrt
+ shld	DBL1H,r8
+ addc	r8,DBLRL
+ mov.l	@r15+,r8
+ addc	r12,DBLRH
+ rts
+ mov.l	@r15+,r12
+
+!	sign in DBLRH ^ DBL1H
+! If the last 7 bits are in the range 64..64+7, we might have an exact
+! value in the preceding bits - or we might not. For denorms, we need to
+! find out.
+! if r10:r8 is zero, we just have found out that there is an exact value.
+	.balign	4
+LOCAL(ret_denorm_inf):
+	mov	DBLRH,r3
+	add	r3,r3
+	div0s	DBL1H,r3
+	mov	#120,DBLRL
+	bt	LOCAL(ret_inf_late)
+	add	#64,DBL0L
+	tst	DBLRL,DBL0L
+	mov	#-21,DBLRL
+	bt	LOCAL(find_adjust)
+	or	r10,r8
+	tst	r8,r8		! check if find_adjust found an exact value.
+	shad	DBLRL,r3
+	bf	0f
+	add	#-16,DBL0L	! if yes, cancel adjustment
+0:	mov	#-8,DBLRL	! remove the three lowest (inexact) bits
+	and	DBLRL,DBL0L
+	add	#-2-11,r3	! shift count for denorm generation
+	mov	DBL0L,DBLRL
+	mov	#28,r2
+	mov.l	@r15+,r11
+	mov.l	@r15+,r10
+	shll2	DBLRL
+	mov.l	@r15+,r9
+	shld	r2,DBL0L
+	mov.l	@r15+,r8
+	mov	#-31,r2
+	cmp/ge	r2,r3
+	shll2	DBLRL
+	bt/s	0f
+	add	DBL0L,r12	! fraction in r12:DBLRL ; u1.63
+	negc	DBLRL,DBLRL	! T := DBLRL != 0
+	add	#31,r3
+	mov	r12,DBLRL
+	rotcl	DBLRL		! put in sticky bit
+	movt	r12
+	cmp/ge	r2,r3
+	bt/s	LOCAL(return_0_late)
+0:	div0s	DBL1H,DBLRH	! calculate sign
+	mov	r12,DBLRH
+	shld	r3,DBLRH
+	mov	DBLRL,r2
+	shld	r3,DBLRL
+	add	#32,r3
+	add	DBLRH,DBLRH
+	mov.l	LOCAL(x80000000),DBL1H
+	shld	r3,r12
+	rotcr	DBLRH		! combine sign with highpart
+	add	#-1,r3
+	shld	r3,r2
+	mov	#0,r3
+	rotl	r2
+	cmp/hi	DBL1H,r2
+	addc	r12,DBLRL
+	mov.l	@r15+,r12
+	rts
+	addc	r3,DBLRH
+
+LOCAL(ret_inf_late):
+	mov.l	@r15+,r11
+	mov.l	@r15+,r10
+	mov	DBLRH,DBL0H
+	mov.l	@r15+,r9
+	bra	LOCAL(return_inf)
+	mov.l	@r15+,r8
+
+LOCAL(return_0_late):
+	div0s	DBLRH,DBL1H
+	mov.l	@r15+,r12
+	mov	#0,DBLRH
+	rts
+	rotcr	DBLRH
+
+	.balign	4
+LOCAL(clz):
+	mov.l	r8,@-r15
+	extu.w	r0,r8
+	mov.l	r9,@-r15
+	cmp/eq	r0,r8
+	bt/s	0f
+	mov	#21,r9
+	xtrct	r0,r8
+	add	#-16,r9
+0:	tst	r12,r8	! 0xff00
+	mov.l	LOCAL(c_clz_tab),r0
+	bt	0f
+	shlr8	r8
+0:	bt	0f
+	add	#-8,r9
+0:
+#ifdef	__PIC__
+	add	r0,r8
+	mova	LOCAL(c_clz_tab),r0
+#endif
+	mov.b	@(r0,r8),r8
+	mov	r9,r0
+	mov.l	@r15+,r9
+	!
+	!
+	!
+	sub	r8,r0
+	mov.l	@r15+,r8
+	rts
+	lds.l	@r15+,pr
+
+!	We encode even some words as pc-relative constants that would fit
+!	as immediates in the instruction, in order to avoid some pipeline
+!	stalls on SH4-100 / SH4-200.
+LOCAL(d1):	.word 1
+LOCAL(d12):	.word 12
+LOCAL(d13):	.word 13
+LOCAL(d120):	.word 120
+
+	.balign 4
+LOCAL(x7ff00000): .long 0x7ff00000
+LOCAL(xfffe2006): .long 0xfffe2006
+LOCAL(x001fffff): .long 0x001fffff
+LOCAL(x40000000): .long 0x40000000
+LOCAL(x7fe00000): .long 0x7fe00000
+LOCAL(x00100000): .long 0x00100000
+LOCAL(x00200000): .long 0x00200000
+LOCAL(xfff00000): .long 0xfff00000
+LOCAL(x80000000): .long 0x80000000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+        .long   GLOBAL(clz_tab) - .
+#else
+        .long   GLOBAL(clz_tab)
+#endif
+LOCAL(ytab):
+        .byte   120, 105,  91,  78,  66,  54,  43,  33
+        .byte    24,  15,   8,   0,  -5, -12, -17, -22
+        .byte   -27, -31, -34, -37, -40, -42, -44, -45
+        .byte   -46, -46, -47, -46, -46, -45, -44, -42
+        .byte   -41, -39, -36, -34, -31, -28, -24, -20
+        .byte   -17, -12,  -8,  -4,   0,   5,  10,  16
+        .byte    21,  27,  33,  39,  45,  52,  58,  65
+        .byte    72,  79,  86,  93, 101, 109, 116, 124
+ENDFUNC(GLOBAL(divdf3))
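
[Not part of the patch] The fixed-point scheme in the header comment of
divdf3.S is easier to see in plain floating point.  A minimal C sketch of
the two Newton-Raphson steps, using only the crude linear seed
y0 = 1.5 - x/2; the byte table above merely sharpens that seed to
abs(d)/y <= 0x1.0c/256, which is what lets two steps reach the full
31..32 fraction bits:

#include <stdio.h>

/* Reciprocal of x in [1,2) by Newton-Raphson, as in the header
   comment: y(n+1) = y(n) - (y(n)*x - 1)*y(n).  */
static double recip (double x)
{
  double y = 1.5 - x / 2;	/* seed; only a few bits without the table */
  y = y - (y * x - 1) * y;	/* y1: relative error roughly squares */
  y = y - (y * x - 1) * y;	/* y2: squares again */
  return y;
}

int main (void)
{
  double x = 1.4142;
  printf ("%.12f vs. %.12f\n", recip (x), 1 / x);
  return 0;
}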
Index: gcc/config/sh/IEEE-754/m3/floatunssisf.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/floatunssisf.S	(revision 0)
+++ gcc/config/sh/IEEE-754/m3/floatunssisf.S	(revision 0)
@@ -0,0 +1,89 @@
+/* Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+! floatunsisf for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+
+FUNC(GLOBAL(floatunsisf))
+	.global GLOBAL(floatunsisf)
+	.balign	4
+GLOBAL(floatunsisf):
+	mov.l	LOCAL(c_clz_tab),r0
+	extu.w	r4,r1
+	mov.w	LOCAL(xff00),r3
+	cmp/eq	r4,r1
+	mov	#24,r2
+	bt	0f
+	mov	r4,r1
+	shlr16	r1
+	add	#-16,r2
+0:	tst	r3,r1	! 0xff00
+	bt	0f
+	shlr8	r1
+0:	bt	0f
+	add	#-8,r2
+0:
+#ifdef	__PIC__
+	add	r0,r1
+	mova	LOCAL(c_clz_tab),r0
+#endif
+	mov.b	@(r0,r1),r1
+	mov	r4,r0
+	mov.l	LOCAL(x4a800000),r3	! bias + 23 - implicit 1
+	tst	r4,r4
+	bt	LOCAL(ret0)
+	!
+	sub	r1,r2
+	mov.l	LOCAL(x80000000),r1
+	shld	r2,r0
+	cmp/pz	r2
+	add	r3,r0
+	bt	LOCAL(noround)
+	add	#31,r2
+	shld	r2,r4
+	rotl	r4
+	add	#-31,r2
+	cmp/hi	r1,r4
+	mov	#0,r3
+	addc	r3,r0
+LOCAL(noround):
+	mov	#23,r1
+	shld	r1,r2
+	rts
+	sub	r2,r0
+LOCAL(ret0):
+	rts
+	nop
+
+LOCAL(xff00):	.word 0xff00
+	.balign	4
+LOCAL(x4a800000): .long 0x4a800000
+LOCAL(x80000000): .long 0x80000000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+        .long   GLOBAL(clz_tab) - .
+#else
+        .long   GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(floatunsisf))
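
[Not part of the patch] A portable C model of the conversion performed
above (IEEE round to nearest, ties to even); the assembly obtains the
same fraction with the clz helper, two shifts and a single addc, letting
a rounding carry propagate into the exponent field.  Whether the
rotl/cmp-hi trick breaks ties exactly like the explicit test below is
not verified here.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

static float u32_to_f32 (uint32_t v)
{
  if (v == 0)
    return 0.0f;
  int exp = 31 - __builtin_clz (v);	/* position of the leading 1 */
  uint32_t frac;
  if (exp <= 23)
    frac = v << (23 - exp);		/* exact, no rounding needed */
  else
    {
      int sh = exp - 23;
      frac = v >> sh;
      uint32_t rest = v << (32 - sh);	/* the bits shifted out */
      /* Round to nearest, ties to even.  */
      if (rest > 0x80000000u || (rest == 0x80000000u && (frac & 1)))
	frac++;				/* a carry here bumps the exponent */
    }
  uint32_t bits = ((uint32_t) (exp + 127) << 23) + (frac - (1u << 23));
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}

int main (void)
{
  uint32_t t[] = { 1, 0x00ffffffu, 0x01000001u, 0xffffffffu };
  for (unsigned i = 0; i < sizeof t / sizeof *t; i++)
    printf ("%u -> %a (cast: %a)\n", t[i], u32_to_f32 (t[i]), (float) t[i]);
  return 0;
}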
Index: gcc/config/sh/IEEE-754/m3/floatunssidf.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/floatunssidf.S	(revision 0)
+++ gcc/config/sh/IEEE-754/m3/floatunssidf.S	(revision 0)
@@ -0,0 +1,91 @@
+/* Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+! floatunssidf for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+
+FUNC(GLOBAL(floatunsidf))
+	.global GLOBAL(floatunsidf)
+	.balign	4
+GLOBAL(floatunsidf):
+	mov.l	LOCAL(c_clz_tab),r0
+	extu.w	r4,r1
+	mov.w	LOCAL(xff00),r3
+	cmp/eq	r4,r1
+	mov	#21,r2
+	bt	0f
+	mov	r4,r1
+	shlr16	r1
+	add	#-16,r2
+0:	tst	r3,r1	! 0xff00
+	bt	0f
+	shlr8	r1
+0:	bt	0f
+	add	#-8,r2
+0:
+#ifdef	__PIC__
+	add	r0,r1
+	mova	LOCAL(c_clz_tab),r0
+#endif
+	mov.b	@(r0,r1),r5
+	mov	r4,DBLRL
+	mov.l	LOCAL(x41200000),r3	! bias + 20 - implicit 1
+	tst	r4,r4
+	mov	r4,DBLRH
+	bt	LOCAL(ret0)
+	sub	r5,r2
+	mov	r2,r5
+	shld	r2,DBLRH
+	cmp/pz	r2
+	add	r3,DBLRH
+	add	#32,r2
+	shld	r2,DBLRL
+	bf	0f
+	mov.w	LOCAL(d0),DBLRL
+0:	mov	#20,r2
+	shld	r2,r5
+	rts
+	sub	r5,DBLRH
+LOCAL(ret0):
+	mov	r4,DBLRL
+	rts
+	mov	r4,DBLRH
+
+LOCAL(xff00):	.word  0xff00
+	.balign	4
+LOCAL(x41200000):
+#ifdef __LITTLE_ENDIAN__
+LOCAL(d0):	  .word 0
+		  .word 0x4120
+#else
+		  .word 0x4120
+LOCAL(d0):	  .word 0
+#endif
+LOCAL(c_clz_tab):
+#ifdef __pic__
+        .long   GLOBAL(clz_tab) - .
+#else
+        .long   GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(floatunsidf))
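
[Not part of the patch] The double-precision variant needs no rounding
path at all: any 32-bit integer fits in the 53-bit fraction.  A C model
under the same assumptions as the single-precision sketch above:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

static double u32_to_f64 (uint32_t v)
{
  if (v == 0)
    return 0.0;
  int exp = 31 - __builtin_clz (v);
  uint64_t frac = (uint64_t) v << (52 - exp);	/* leading 1 at bit 52 */
  uint64_t bits = ((uint64_t) (exp + 1023) << 52)
		  + (frac - ((uint64_t) 1 << 52));
  double d;
  memcpy (&d, &bits, sizeof d);
  return d;
}

int main (void)
{
  printf ("%.17g vs. %.17g\n", u32_to_f64 (0xffffffffu),
	  (double) 0xffffffffu);
  return 0;
}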
Index: gcc/config/sh/IEEE-754/m3/fixunsdfsi.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/fixunsdfsi.S	(revision 0)
+++ gcc/config/sh/IEEE-754/m3/fixunsdfsi.S	(revision 0)
@@ -0,0 +1,77 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+!! fixunsdfsi for Renesas SH / STMicroelectronics ST40 CPUs
+!! Contributed by Joern Rennecke joern.rennecke@st.com
+
+#ifdef L_fixunsdfsi
+	! What is a bit unusual about this implementation is that the
+	! sign bit influences the result for NaNs: for a cleared sign bit, you
+	! get INT_MAX; for a set sign bit, you get INT_MIN.
+	! However, since the result for NaNs is undefined, this should be no
+	! problem.
+	! N.B. This is scheduled both for SH4-200 and SH4-300
+	.balign 4
+	.global GLOBAL(fixunsdfsi)
+	FUNC(GLOBAL(fixunsdfsi))
+	.balign	4
+GLOBAL(fixunsdfsi):
+	mov.w	LOCAL(x413),r1	! bias + 20
+	mov	DBL0H,r0
+	shll	DBL0H
+	mov.l	LOCAL(mask),r3
+	mov	#-21,r2
+	shld	r2,DBL0H	! SH4-200 will start this insn in a new cycle
+	bt/s	LOCAL(ret0)
+	sub	r1,DBL0H
+	cmp/pl	DBL0H		! SH4-200 will start this insn in a new cycle
+	and	r3,r0
+	bf/s	LOCAL(ignore_low)
+	addc	r3,r0	! uses T == 1; sets implicit 1
+	mov	#11,r2
+	shld	DBL0H,r0	! SH4-200 will start this insn in a new cycle
+	cmp/gt	r2,DBL0H
+	add	#-32,DBL0H
+	bt	LOCAL(retmax)
+	shld	DBL0H,DBL0L
+	rts
+	or	DBL0L,r0
+
+	.balign	8
+LOCAL(ignore_low):
+	mov	#-21,r2
+	cmp/gt	DBL0H,r2	! SH4-200 will start this insn in a new cycle
+	add	#1,r0
+	bf	0f
+LOCAL(ret0): mov #0,r0		! results in 0 return
+0:	rts
+	shld	DBL0H,r0
+
+LOCAL(retmax):
+	rts
+	mov	#-1,r0
+
+LOCAL(x413): .word 0x413
+
+	.balign 4
+LOCAL(mask): .long 0x000fffff
+	ENDFUNC(GLOBAL(fixunsdfsi))
+#endif /* L_fixunsdfsi */
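
[Not part of the patch] A C model of the truncating conversion: the same
exponent test and shifts, done on the bit pattern.  The NaN corner case
described in the comment above is simplified here - any negative sign
(NaN or not) yields 0, and a positive out-of-range value saturates:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

static uint32_t f64_to_u32 (double d)
{
  uint64_t bits;
  memcpy (&bits, &d, sizeof bits);
  int exp = (int) ((bits >> 52) & 0x7ff) - 1023;	/* unbiased */
  if ((bits >> 63) || exp < 0)
    return 0;				/* negative or |d| < 1 */
  if (exp > 31)
    return 0xffffffffu;			/* too large: saturate */
  uint64_t frac = (bits & 0x000fffffffffffffull) | (1ull << 52);
  return (uint32_t) (frac >> (52 - exp));
}

int main (void)
{
  printf ("%u %u %u\n", f64_to_u32 (3.9), f64_to_u32 (-1.0),
	  f64_to_u32 (4294967295.0));	/* prints: 3 0 4294967295 */
  return 0;
}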
Index: gcc/config/sh/IEEE-754/m3/divdf3-rt.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/divdf3-rt.S	(revision 0)
+++ gcc/config/sh/IEEE-754/m3/divdf3-rt.S	(revision 0)
@@ -0,0 +1,514 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+! divdf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke joern.rennecke@st.com
+
+/* This version is not quite finished, since I've found that I can
+   get better average performance with a slightly altered algorithm.
+   Still, if you want a version for hard real time, this version here might
+   be a good starting point, since it has effectively no conditional
+   branches in the path that deals with normal numbers
+   (branches with zero offset are effectively conditional execution),
+   and thus it has a uniform execution time in this path.  */
+
+/* y = 1/x  ; x (- [1,2)
+   y0 = 1.5 - x/2 - tab[(1-x)*64] = y + d ; abs(d)/y <= 0x1.0c/256
+
+   y1 = y0 - ((y0) * x - 1) * y0  =  y-x*d^2
+   y2 = y1 - ((y1) * x - 1) * y1 =~= y-x^3*d^4
+
+   z0 = y2*a ;  a1 = a - z0*x /# 32 * 64 -> 64 bit #/
+   z1 = y2*a1 (round to nearest odd 0.5 ulp);
+   a2 = a1 - z1*x /# 32 * 64 -> 64 bit #/
+
+   z = a/x = z0 + z1 - 0.5 ulp + (a2 > 0) * ulp
+
+   Unless stated otherwise, multiplies can be done in 32 * 32 bit or less
+   with suitable scaling and/or top truncation.
+   x truncated to 20 bits is sufficient to calculate y0 or even y1.
+   Table entries are adjusted by about +128 to use full signed byte range.
+   This adjustment has been perturbed slightly to allow cse with the
+   shift count constant -26.
+   The threshold point for the shift adjust before rounding is found by
+   comparing the fractions, which is exact, unlike the top bit of y2.
+   Therefore, the top bit of y2 becomes slightly random after the adjustment
+   shift, but that's OK because this can happen only at the boundaries of
+   the interval, and the biasing of the error means that it can in fact happen
+   only at the bottom end.  And there, the carry propagation will make sure
+   that in the end we will have in effect an implicit 1 (or two when rounding
+   up...)  */
+/* If an exact result exists, it can have no more bits than the dividend.
+   Hence, we don't need to bother with the round-to-even tie breaker
+   unless the result is denormalized.  */
+/* 70 cycles through main path for sh4-300.  Some cycles might be
+   saved by more careful register allocation.
+   122 cycles for sh4-200.  If execution time for sh4-200 is of concern,
+   a specially scheduled version makes sense.  */
+
+#define x_h r12
+#define yn  r3
+
+FUNC(GLOBAL(divdf3))
+ .global GLOBAL(divdf3)
+
+/* Adjust arg0 now, too.  We still have to come back to denorm_arg1_done,
+   since we haven't done any of the work yet that we do till the denorm_arg0
+   entry point.  We know that neither of the arguments is inf/nan, but
+   arg0 might be zero.  Check for that first to avoid having to establish an
+   rts return address.  */
+LOCAL(both_denorm):
+	mov.l	r9,@-r15
+	mov	DBL0H,r1
+	mov.l	r0,@-r15
+	shll2	r1
+	mov.w LOCAL(both_denorm_cleanup_off),r9
+	or	DBL0L,r1
+	tst	r1,r1
+	mov	DBL0H,r0
+	bf/s	LOCAL(zero_denorm_arg0_1)
+	shll2	r0
+	mov.l	@(4,r15),r9
+	add	#8,r15
+	bra	LOCAL(ret_inf_nan_0)
+	mov	r1,DBLRH
+
+LOCAL(both_denorm_cleanup):
+	mov.l	@r15+,r0
+	!
+	mov.l	@r15+,r9
+ mov	#12,r3
+ mov.l	LOCAL(xfffe2006),r1	! yn := (-1. << 17) + (0x80 << 6) ; shift #-26
+	bra	LOCAL(denorm_arg1_done)
+	!
+	add	r0,DBL0H
+
+/* Denorm handling leaves the incoming denorm argument with an exponent of +1
+   (implicit 1).  To leave the result exponent unaltered, the other
+   argument's exponent is adjusted by the shift count.  */
+
+	.balign 4
+LOCAL(arg0_tiny):
+	bsr	LOCAL(clz)
+	mov	DBL0L,r0
+	shll	DBL0H
+	add	#1,r0
+	mov	DBL0L,DBL0H
+	shld	r0,DBL0H
+	rotcr	DBL0H
+	tst	DBL0L,DBL0L	/* Check for a zero dividend.  */
+	add	#-33,r0
+	shld	r0,DBL0L
+	bf/s	LOCAL(adjust_arg1_exp)
+	add	#64,r0
+LOCAL(return_0): /* Return 0 with appropriate sign.  */
+	mov.l	@r15+,r10
+	mov	#0,DBLRH
+	mov.l	@r15+,r9
+	bra	LOCAL(ret_inf_nan_0)
+	mov.l	@r15+,r8
+
+	.balign 4
+LOCAL(arg1_tiny):
+	bsr	LOCAL(clz)
+	mov	DBL1L,r0
+	shll	DBL1H
+	add	#1,r0
+	mov	DBL1L,DBL1H
+	shld	r0,DBL1H
+	rotcr	DBL1H
+	tst	DBL1L,DBL1L	/* Check for divide by zero.  */
+	add	#-33,r0
+	shld	r0,DBL1L
+	bf/s	LOCAL(adjust_arg0_exp)
+	add	#64,r0
+	mov	DBL0H,r0
+	add	r0,r0
+	tst	r0,r0	! 0 / 0 ?
+	mov	#-1,DBLRH
+	bf	LOCAL(return_inf)
+	!
+	bt	LOCAL(ret_inf_nan_0)
+	!
+
+	.balign 4
+LOCAL(zero_denorm_arg1):
+	not	DBL0H,r3
+	mov	DBL1H,r0
+	tst	r2,r3
+	shll2	r0
+	bt	LOCAL(early_inf_nan_arg0)
+	tst	r0,r0
+	mov.w	LOCAL(xff00),r12
+	bt/s	LOCAL(arg1_tiny)
+	sts.l	pr,@-r15
+	bsr	LOCAL(clz)
+	shlr2	r0
+	!
+	shll	DBL1H
+	mov	DBL1L,r3
+	shld	r0,DBL1H
+	shld	r0,DBL1L
+	rotcr	DBL1H
+	add	#-32,r0
+	shld	r0,r3
+	add	#32,r0
+	or	r3,DBL1H
+LOCAL(adjust_arg0_exp):
+	tst	r2,DBL0H
+	mov	#20,r3
+	shld	r3,r0
+	bt	LOCAL(both_denorm)
+	add	DBL0H,r0
+	div0s	r0,DBL0H	! Check for obvious overflow.
+	not	r0,r3		! Check for more subtle overflow - lest
+	bt	LOCAL(return_inf)
+	mov	r0,DBL0H
+	tst	r2,r3		! we mistake it for NaN later
+	mov	#12,r3
+	bf	LOCAL(denorm_arg1_done)
+LOCAL(return_inf): /* Return infinity with appropriate sign.  */
+	mov	#20,r3
+	mov	#-2,DBLRH
+	bra	LOCAL(ret_inf_nan_0)
+	shad	r3,DBLRH
+
+/* inf/n -> inf; inf/0 -> inf; inf/inf -> nan; inf/nan->nan  nan/x -> nan */
+LOCAL(inf_nan_arg0):
+	mov.l	@r15+,r10
+	mov.l	@r15+,r9
+	mov.l	@r15+,r8
+LOCAL(early_inf_nan_arg0):
+	not	DBL1H,r3
+	mov	DBL0H,DBLRH
+	tst	r2,r3	! both inf/nan?
+	add	DBLRH,DBLRH
+	bf	LOCAL(ret_inf_nan_0)
+	mov	#-1,DBLRH
+LOCAL(ret_inf_nan_0):
+	mov	#0,DBLRL
+	mov.l	@r15+,r12
+	div0s	DBL0H,DBL1H
+	rts
+	rotcr	DBLRH
+	
+/* Already handled: inf/x, nan/x .  Thus: x/inf -> 0; x/nan -> nan */
+	.balign	4
+LOCAL(inf_nan_arg1):
+	mov	DBL1H,r2
+	mov	#12,r1
+	shld	r1,r2
+	mov.l	@r15+,r10
+	mov	#0,DBLRL
+	mov.l	@r15+,r9
+	or	DBL1L,r2
+	mov.l	@r15+,r8
+	cmp/hi	DBLRL,r2
+	mov.l	@r15+,r12
+	subc	DBLRH,DBLRH
+	div0s	DBL0H,DBL1H
+	rts
+	rotcr	DBLRH
+	
+	.balign 4
+LOCAL(zero_denorm_arg0):
+	mov.w	LOCAL(denorm_arg0_done_off),r9
+	not	DBL1H,r1
+	mov	DBL0H,r0
+	tst	r2,r1
+	shll2	r0
+	bt	LOCAL(inf_nan_arg1)
+LOCAL(zero_denorm_arg0_1):
+	tst	r0,r0
+	mov.w	LOCAL(xff00),r12
+	bt/s	LOCAL(arg0_tiny)
+	sts.l	pr,@-r15
+	bsr	LOCAL(clz)
+	shlr2	r0
+	shll	DBL0H
+	mov	DBL0L,r12
+	shld	r0,DBL0H
+	shld	r0,DBL0L
+	rotcr	DBL0H
+	add	#-32,r0
+	shld	r0,r12
+	add	#32,r0
+	or	r12,DBL0H
+LOCAL(adjust_arg1_exp):
+	mov	#20,r12
+	shld	r12,r0
+	add	DBL1H,r0
+	div0s	r0,DBL1H	! Check for obvious underflow.
+	not	r0,r12		! Check for more subtle underflow - lest
+	bt	LOCAL(return_0)
+	mov	r0,DBL1H
+	tst	r2,r12		! we mistake it for NaN later
+	bt	LOCAL(return_0)
+	!
+	braf	r9
+	mov	#13,r0
+LOCAL(zero_denorm_arg1_dispatch):
+
+LOCAL(xff00):	.word 0xff00
+LOCAL(denorm_arg0_done_off):
+	.word LOCAL(denorm_arg0_done)-LOCAL(zero_denorm_arg1_dispatch)
+LOCAL(both_denorm_cleanup_off):
+	.word LOCAL(both_denorm_cleanup)-LOCAL(zero_denorm_arg1_dispatch)
+
+ .balign	8
+GLOBAL(divdf3):
+ mov.l	LOCAL(x7ff00000),r2
+ mov	#12,r3
+ mov.l	LOCAL(xfffe2006),r1	! yn := (-1. << 17) + (0x80 << 6) ; shift #-26
+ tst	r2,DBL1H
+ mov.l	r12,@-r15
+ bt	LOCAL(zero_denorm_arg1)
+
+LOCAL(denorm_arg1_done):
+ mov	DBL1H,x_h	! x_h live in r12
+ shld	r3,x_h	! x - 1 ; u0.20
+ mov	x_h,yn
+ mova	LOCAL(ytab),r0
+ mov.l	r8,@-r15
+ shld	r1,yn	! x-1 ; u26.6
+ mov.b	@(r0,yn),yn
+ mov	#6,r0
+ mov.l	r9,@-r15
+ mov	x_h,r8
+ mov.l	r10,@-r15
+ shlr16	x_h	! x - 1; u16.16	! x/2 - 0.5 ; u15.17
+ add	x_h,r1	! SH4-200 single-issues this insn
+ shld	r0,yn
+ sub	r1,yn	! yn := y0 ; u15.17
+ mov	DBL1L,r1
+ mov	#-20,r10
+ mul.l	yn,x_h	! r12 dead
+ swap.w	yn,r9
+ shld	r10,r1
+ sts	macl,r0	! y0 * (x-1) - n ; u-1.32
+ add	r9,r0	! y0 * x - 1     ; s-1.32
+ tst	r2,DBL0H
+ dmuls.l r0,yn
+ mov.w	LOCAL(d13),r0
+ or	r1,r8	! x  - 1; u0.32
+ add	yn,yn	! yn = y0 ; u14.18
+ bt	LOCAL(zero_denorm_arg0)
+
+LOCAL(denorm_arg0_done):	! This label must stay aligned.
+ sts	mach,r1	!      d0 ; s14.18
+ sub	r1,yn	! yn = y1 ; u14.18 ; <= 0x3fffc
+ mov	DBL0L,r12
+ shld	r0,yn	! yn = y1 ; u1.31 ; <= 0x7fff8000
+ mov.w	LOCAL(d12),r9
+ dmulu.l yn,r8
+ shld	r10,r12
+ mov	yn,r0
+ mov	DBL0H,r8
+ add	yn,yn	! yn = y1 ; u0.32 ; <= 0xffff0000
+ sts	mach,r1	! y1 * (x-1); u1.31
+ add	r0,r1	! y1 * x    ; u1.31
+ dmulu.l yn,r1
+ not	DBL0H,r10
+ shld	r9,r8
+ tst	r2,r10
+ or	r8,r12	! a - 1; u0.32
+ bt	LOCAL(inf_nan_arg0)
+ sts	mach,r1	! d1+yn; u1.31
+ sett		! adjust y2 so that it can be interpreted as s1.31
+ not	DBL1H,r10
+ subc	r1,yn	! yn := y2 ; u1.31 ; can be 0x7fffffff
+ mov.l	LOCAL(x001fffff),r9
+ dmulu.l yn,r12
+ tst	r2,r10
+ or	DBL1H,r2
+ bt	LOCAL(inf_nan_arg1)
+ mov.l	r11,@-r15
+ sts	mach,r11	! y2*(a-1) ; u1.31
+ add	yn,r11		! z0       ; u1.31
+ dmulu.l r11,DBL1L
+ mov.l	LOCAL(x40000000),DBLRH	! bias + 1
+ and	r9,r2		! x ; u12.20
+ cmp/hi	DBL0L,DBL1L
+ sts	macl,r8
+ mov	#-24,r12
+ sts	mach,r9 	! r9:r8 := z0 * DBL1L; u-19.64
+ subc	DBL1H,DBLRH
+ mul.l	r11,r2  	! (r9+macl):r8 == z0*x; u-19.64
+ shll	r8
+ add	DBL0H,DBLRH	! result sign/exponent + 1
+ mov	r8,r10
+ sts	macl,DBLRL
+ add	DBLRL,r9
+ rotcl	r9		! r9:r8 := z*x; u-20.63
+ shld	r12,r10
+ mov.l	LOCAL(x7fe00000),DBLRL
+ sub	DBL0L,r9	! r9:r8 := -a ; u-20.63
+ mov.l	LOCAL(x00200000),r12
+! FIXME: the following shift might lose the sign.
+ shll8	r9
+ or	r10,r9	! -a1 ; s-28.32
+ mov.l	LOCAL(x00100000),r10
+ dmuls.l r9,yn	! r3 dead
+ mov	DBL1H,r3
+ mov.l LOCAL(xfff00000),DBL0L
+ xor	DBL0H,r3	! calculate expected sign & bit20
+ div0s	r3,DBLRH
+ xor	DBLRH,r3
+ bt	LOCAL(ret_denorm_inf)
+ tst	DBLRL,DBLRH
+ bt	LOCAL(ret_denorm)
+ sub	r12,DBLRH ! calculate sign / exponent minus implicit 1
+ tst	r10,r3	! set T if a >= x
+ sts	mach,r12! -z1 ; s-27.32
+ bt	0f
+ add	r11,r11	! z0 ; u1.31 / u0.31
+0: mov	#6,r3
+ negc	r3,r10 ! shift count := a >= x ? -7 : -6; T := 1
+ shll8	r8	! r9:r8 := -a1 ; s-28.64
+ shad	r10,r12	! -z1 ; truncate to s-20.32 / s-21.32
+ rotcl	r12	! -z1 ; s-21.32 / s-22.32 / round to odd 0.5 ulp ; T := sign
+ add	#20,r10
+ dmulu.l r12,DBL1L ! r12 signed, DBL1L unsigned
+ and	DBL0L,DBLRH	! isolate sign / exponent
+ shld	r10,r9
+ mov	r8,r3
+ shld	r10,r8
+ sts	macl,DBL0L
+ sts	mach,DBLRL
+ add	#-32,r10
+ shld	r10,r3
+ mul.l r12,r2
+ bf	0f	! adjustment for signed/unsigned multiply
+ sub	DBL1L,DBLRL	! DBL1L dead
+0: shar	r12	! -z1 ; truncate to s-20.32 / s-21.32
+ sts	macl,DBL1L
+ or	r3,r9	! r9:r8 := -a1 ;             s-41.64/s-42.64
+ !
+ cmp/hi	r8,DBL0L
+ add	DBLRL,DBL1L ! DBL1L:DBL0L := -z1*x ; s-41.64/s-42.64
+ subc	DBL1L,r9
+ not	r12,DBLRL ! z1, truncated to s-20.32 / s-21.32
+ shll	r9	! T :=  a2 > 0
+ mov	r11,r2
+ mov	#21,r7
+ shld	r7,r11
+ addc	r11,DBLRL
+ mov.l	@r15+,r11
+ mov.l	@r15+,r10
+ mov	#-11,r7
+ mov.l	@r15+,r9
+ shld	r7,r2
+ mov.l	@r15+,r8
+ addc	r2,DBLRH
+ rts
+ mov.l	@r15+,r12
+
+LOCAL(ret_denorm):
+	tst	r10,DBLRH
+	bra	LOCAL(denorm_have_count)
+	movt	DBLRH	! calculate shift count (off by 2)
+
+LOCAL(ret_denorm_inf):
+	mov	DBLRH,r12
+	add	r12,r12
+	cmp/pz	r12
+	mov	#-21,DBLRL
+	bt	LOCAL(ret_inf_late)
+	shld	DBLRL,DBLRH
+LOCAL(denorm_have_count):
+	add	#-2,DBLRH
+/* FIXME */
+	bra	LOCAL(return_0)
+	mov.l	@r15+,r11
+
+LOCAL(ret_inf_late):
+	mov.l	@r15+,r11
+	!
+	mov.l	@r15+,r10
+	!
+	mov.l	@r15+,r9
+	bra	LOCAL(return_inf)
+	mov.l	@r15+,r8
+
+	.balign	4
+LOCAL(clz):
+	mov.l	r8,@-r15
+	extu.w	r0,r8
+	mov.l	r9,@-r15
+	cmp/eq	r0,r8
+	bt/s	0f
+	mov	#8-11,r9
+	xtrct	r0,r8
+	add	#16,r9
+0:	tst	r12,r8	! 0xff00
+	mov.l	LOCAL(c_clz_tab),r0
+	bt	0f
+	shlr8	r8
+0:	bt	0f
+	add	#8,r9
+0:
+#ifdef	__PIC__
+	add	r0,r8
+	mova	LOCAL(c_clz_tab),r0
+#endif
+	mov.b	@(r0,r8),r8
+	mov	r9,r0
+	mov.l	@r15+,r9
+	!
+	!
+	!
+	sub	r8,r0
+	mov.l	@r15+,r8
+	rts
+	lds.l	@r15+,pr
+
+!	We encode even some words as pc-relative constants that would fit
+!	as immediates in the instruction, in order to avoid some pipeline
+!	stalls on SH4-100 / SH4-200.
+LOCAL(d1):	.word 1
+LOCAL(d12):	.word 12
+LOCAL(d13):	.word 13
+
+	.balign 4
+LOCAL(x7ff00000): .long 0x7ff00000
+LOCAL(xfffe2006): .long 0xfffe2006
+LOCAL(x001fffff): .long 0x001fffff
+LOCAL(x40000000): .long 0x40000000
+LOCAL(x7fe00000): .long 0x7fe00000
+LOCAL(x00100000): .long 0x00100000
+LOCAL(x00200000): .long 0x00200000
+LOCAL(xfff00000): .long 0xfff00000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+        .long   GLOBAL(clz_tab) - .
+#else
+        .long   GLOBAL(clz_tab)
+#endif
+LOCAL(ytab):
+        .byte   120, 105,  91,  78,  66,  54,  43,  33
+        .byte    24,  15,   8,   0,  -5, -12, -17, -22
+        .byte   -27, -31, -34, -37, -40, -42, -44, -45
+        .byte   -46, -46, -47, -46, -46, -45, -44, -42
+        .byte   -41, -39, -36, -34, -31, -28, -24, -20
+        .byte   -17, -12,  -8,  -4,   0,   5,  10,  16
+        .byte    21,  27,  33,  39,  45,  52,  58,  65
+        .byte    72,  79,  86,  93, 101, 109, 116, 124
+ENDFUNC(GLOBAL(divdf3))
Index: gcc/config/sh/IEEE-754/m3/addsf3.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/addsf3.S	(revision 0)
+++ gcc/config/sh/IEEE-754/m3/addsf3.S	(revision 0)
@@ -0,0 +1,285 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+! addsf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+
+#ifdef L_add_sub_sf3
+	.balign 4
+	.global GLOBAL(subsf3)
+	FUNC(GLOBAL(subsf3))
+	.global GLOBAL(addsf3)
+	FUNC(GLOBAL(addsf3))
+GLOBAL(subsf3):
+	cmp/pz	r5
+	add	r5,r5
+	rotcr	r5
+	.balign 4
+GLOBAL(addsf3):
+	mov.l	LOCAL(x7f800000),r3
+	mov	r4,r6
+	add	r6,r6
+	mov	r5,r7
+	add	r7,r7
+	mov	r4,r0
+	or	r3,r0
+	cmp/hi	r6,r7
+	mov	r5,r1
+	bf/s	LOCAL(r4_hs)
+	 or	r3,r1
+	cmp/eq	r5,r1
+	bt	LOCAL(ret_r5) /* sole Inf or NaN, return unchanged.  */
+	shll8	r0	! r4 fraction
+	shll8	r1	! r5 fraction
+	mov	r6,r3
+	mov	#-24,r2
+	mov	r7,r6
+	shld	r2,r6	! r5 exp
+	mov	r0,r7
+	shld	r2,r3	! r4 exp
+	tst	r6,r6
+	sub	r6,r3	! exp difference (negative or 0)
+	bt	LOCAL(denorm_r4)
+LOCAL(denorm_r4_done): ! r1: u1.31
+	shld	r3,r0	! Get 31 upper bits, including 8 guard bits
+	mov.l	LOCAL(xff000000),r2
+	add	#31,r3
+	mov.l	r5,@-r15 ! push result sign.
+	cmp/pl	r3	! r0 has no more than one bit set -> return arg 1
+	shld	r3,r7	! copy of lowest guard bit in r0 and lower guard bits
+	bf	LOCAL(ret_stack)
+	div0s	r4,r5
+	bf/s	LOCAL(add)
+	 cmp/pl	r7	/* Is LSB in r0 clear, but any lower guard bit set?  */
+	subc	r0,r1
+	mov.l	LOCAL(c__clz_tab),r7
+	tst	r2,r1
+	mov	#-24,r3
+	bf/s LOCAL(norm_r0)
+	 mov	r1,r0
+	extu.w	r1,r1
+	bra	LOCAL(norm_check2)
+	 cmp/eq	r0,r1
+LOCAL(ret_r5):
+	rts
+	mov	r5,r0
+LOCAL(ret_stack):
+	rts
+	mov.l	@r15+,r0
+
+/* We leave the numbers denormalized, but we change the bit position to be
+   consistent with normalized numbers.  This also removes the spurious
+   leading one that was inserted before.  */
+LOCAL(denorm_r4):
+	tst	r3,r3
+	bf/s	LOCAL(denorm_r4_done)
+	add	r0,r0
+	bra	LOCAL(denorm_r4_done)
+	add	r1,r1
+LOCAL(denorm_r5):
+	tst	r6,r6
+	add	r1,r1
+	bf	LOCAL(denorm_r5_done)
+	clrt
+	bra	LOCAL(denorm_r5_done)
+	add	r0,r0
+
+/* If the exponent differs by two or more, normalization is minimal, and
+   few guard bits are needed for an exact final result, so sticky guard
+   bit compression before subtraction (or add) works fine.
+   If the exponent differs by one, only one extra guard bit is generated,
+   and effectively no guard bit compression takes place.  */
+
+	.balign	4
+LOCAL(r4_hs):
+	cmp/eq	r4,r0
+	mov	#-24,r3
+	bt	LOCAL(inf_nan_arg0)
+	shld	r3,r7
+	shll8	r0
+	tst	r7,r7
+	shll8	r1
+	mov.l	LOCAL(xff000000),r2
+	bt/s	LOCAL(denorm_r5)
+	shld	r3,r6
+LOCAL(denorm_r5_done):
+	mov	r1,r3
+	subc	r6,r7
+	bf	LOCAL(same_exp)
+	shld	r7,r1	/* Get 31 upper bits.  */
+	add	#31,r7
+	mov.l	r4,@-r15 ! push result sign.
+	cmp/pl	r7
+	shld	r7,r3
+	bf	LOCAL(ret_stack)
+	div0s	r4,r5
+	bf/s	LOCAL(add)
+	 cmp/pl	r3	/* Is LSB in r1 clear, but any lower guard bit set?  */
+	subc	r1,r0
+	mov.l	LOCAL(c__clz_tab),r7
+LOCAL(norm_check):
+	tst	r2,r0
+	mov	#-24,r3
+	bf LOCAL(norm_r0)
+	extu.w	r0,r1
+	cmp/eq	r0,r1
+LOCAL(norm_check2):
+	mov	#-8,r3
+	bt LOCAL(norm_r0)
+	mov	#-16,r3
+LOCAL(norm_r0):
+	mov	r0,r1
+	shld	r3,r0
+#ifdef __pic__
+	add	r0,r7
+	mova  LOCAL(c__clz_tab),r0
+#endif
+	mov.b	@(r0,r7),r7
+	add	#25,r3
+	add	#-9+1,r6
+	mov	r1,r0
+	sub	r7,r3
+	mov.l	LOCAL(xbfffffff),r7
+	sub	r3,r6	/* generate exp-1  */
+	mov.w	LOCAL(d24),r2
+	cmp/pz	r6	/* check exp > 0  */
+	shld	r3,r0	/* Leading 1 becomes +1 exp adjustment.  */
+	bf	LOCAL(zero_denorm)
+LOCAL(denorm_done):
+	add	#30,r3
+	shld	r3,r1
+	mov.w   LOCAL(m1),r3
+	tst	r7,r1	! clear T if rounding up
+	shld	r2,r6
+	subc	r3,r0	! round - overflow will boost exp adjustment to 2.
+	mov.l	@r15+,r2
+	add	r6,r0	! overflow will generate inf
+	cmp/ge	r2,r3	! get sign into T
+	rts
+	rotcr	r0
+LOCAL(ret_r4):
+	rts
+	mov	r4,r0
+
+/* At worst, we are shifting the number back in place where an incoming
+   denormal was.  Thus, the shifts won't get out of range.  They still
+   might generate a zero fraction, but that's OK, that makes it 0.  */
+LOCAL(zero_denorm):
+	add	r6,r3
+	mov	r1,r0
+	mov	#0,r6	/* leading one will become free (except for rounding) */
+	bra	LOCAL(denorm_done)
+	shld	r3,r0
+
+/* Handle abs(r4) >= abs(r5), same exponents specially so we don't need
+   check for a zero fraction in the main path.  */
+LOCAL(same_exp):
+	div0s	r4,r5
+	mov.l	r4,@-r15
+	bf	LOCAL(add)
+	cmp/eq	r1,r0
+	mov.l	LOCAL(c__clz_tab),r7
+	bf/s	LOCAL(norm_check)
+	 sub	r1,r0
+	rts	! zero difference -> return +zero
+	mov.l	@r15+,r1
+
+/* r2: 0xff000000 */
+LOCAL(add):
+	addc	r1,r0
+	mov.w	LOCAL(x2ff),r7
+	shll8	r6
+	bf/s	LOCAL(no_carry)
+	shll16	r6
+	tst	r7,r0
+	shlr8	r0
+	mov.l	@r15+,r3	! discard saved sign
+	subc	r2,r0
+	sett
+	addc	r6,r0
+	cmp/hs	r2,r0
+	bt/s	LOCAL(inf)
+	 div0s	r7,r4 /* Copy sign.  */
+	rts
+	rotcr	r0
+LOCAL(inf):
+	mov	r6,r0
+	rts
+	rotcr	r0
+LOCAL(no_carry):
+	mov.w	LOCAL(m1),r3
+	tst	r6,r6
+	bt	LOCAL(denorm_add)
+	add	r0,r0
+	tst	r7,r0		! check if lower guard bit set or round to even
+	shlr8	r0
+	mov.l	@r15+,r1	! discard saved sign
+	subc	r3,r0	! round ; overflow -> exp++
+	cmp/ge	r4,r3	/* Copy sign.  */
+	add	r6,r0	! overflow -> inf
+	rts
+	rotcr	r0
+
+LOCAL(denorm_add):
+	cmp/ge	r4,r3	/* Copy sign.  */
+	shlr8	r0
+	mov.l	@r15+,r1	! discard saved sign
+	rts
+	rotcr	r0
+
+LOCAL(inf_nan_arg0):
+	cmp/eq	r5,r1
+	bf	LOCAL(ret_r4)
+	div0s	r4,r5		/* Both are inf or NaN, check signs.  */
+	bt	LOCAL(ret_nan)	/* inf - inf, or NaN.  */
+	mov	r4,r0		! same sign; return NaN if either is NaN.
+	rts
+	or	r5,r0
+LOCAL(ret_nan):
+	rts
+	mov	#-1,r0
+
+LOCAL(d24):
+	.word	24
+LOCAL(x2ff):
+	.word	0x2ff
+LOCAL(m1):
+	.word	-1
+	.balign	4
+LOCAL(x7f800000):
+	.long	0x7f800000
+LOCAL(xbfffffff):
+	.long	0xbfffffff
+LOCAL(xff000000):
+	.long	0xff000000
+LOCAL(xfe000000):
+	.long	0xfe000000
+LOCAL(c__clz_tab):
+#ifdef __pic__
+	.long	GLOBAL(clz_tab) - .
+#else
+	.long	GLOBAL(clz_tab)
+#endif
+
+	ENDFUNC(GLOBAL(addsf3))
+	ENDFUNC(GLOBAL(subsf3))
+#endif /* L_add_sub_sf3 */
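
[Not part of the patch] The "sticky guard bit compression" mentioned in
the comment before LOCAL(r4_hs) above amounts to ORing every shifted-out
bit into the lowest kept position: for the final rounding decision only
"were the lost bits nonzero" matters, not their exact value.  A small
illustration (align_with_sticky is a name made up for this sketch):

#include <stdint.h>
#include <stdio.h>

static uint32_t align_with_sticky (uint32_t frac, int sh)
{
  if (sh == 0)
    return frac;			/* nothing lost */
  uint32_t kept = frac >> sh;
  uint32_t lost = frac << (32 - sh);	/* the bits shifted out */
  return kept | (lost != 0);		/* fold them into bit 0 */
}

int main (void)
{
  /* Different lost bits, same compressed value: both round alike.  */
  printf ("%#x %#x\n", align_with_sticky (0x80000201u, 8),
	  align_with_sticky (0x800002ffu, 8));	/* prints 0x800003 twice */
  return 0;
}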
Index: gcc/config/sh/IEEE-754/m3/adddf3.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/adddf3.S	(revision 0)
+++ gcc/config/sh/IEEE-754/m3/adddf3.S	(revision 0)
@@ -0,0 +1,582 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+! adddf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4-200 without FPU, but can also be used for SH3.
+! Numbers with same sign are added in typically 37 cycles, worst case is
+! 43 cycles, unless there is an overflow, in which case the addition can
+! take up to takes 47 cycles.
+! Normal numbers with different sign are added in 56 (57 for PIC) cycles
+! or less on SH4.
+! If one of the inputs is a denormal, the worst case is 59 (60 for PIC)
+! cycles. (Two denormal inputs are faster than normal inputs, and
+! denormal outputs don't slow down computation).
+! Subtraction takes two cycles to negate the second input and then drops
+! through to addition.
+
+/* If the input exponents of a difference of two normalized numbers
+   differ by more than one, the output does not need to be adjusted
+   by more than one bit position.  Hence, it makes sense to ensure that
+   the shifts by 0 & 1 are handled quickly to reduce average and worst
+   case times.  */
+FUNC(GLOBAL(adddf3))
+FUNC(GLOBAL(subdf3))
+	.global	GLOBAL(adddf3)
+	.global	GLOBAL(subdf3)
+LOCAL(denorm_arg1):
+	bt	LOCAL(inf_nan_arg0)
+	tst	r0,r2
+	bt/s	LOCAL(denorm_both)
+	shlr	r1
+	mov.l	LOCAL(x00100000),r3
+	bra	LOCAL(denorm_arg1_done)
+	 sub	r2,r3
+
+! Handle denorm addition here because otherwise the ordinary addition would
+! have to check for denormal results.
+! Denormal subtraction could also be done faster, but the denorm subtraction
+! path here is still one cycle faster than the one for normalized input
+! numbers, and 16 instructions shorter than the fastest version.
+! Here we also generate +0.0 + +0.0 -> +0.0 ; -0.0 + -0.0 -> -0.0
+LOCAL(denorm_both):
+	div0s	DBL0H,DBL1H
+	mov.l	LOCAL(x800fffff),r9
+	bt/s	LOCAL(denorm_sub)
+	and	r1,DBL1H
+	and	r9,DBL0H
+	mov.l	@r15+,r9
+	mov	DBL0L,DBLRL
+	mov	DBL0H,DBLRH
+	addc	DBL1L,DBLRL
+	mov.l	@r15+,r8
+	rts
+	 addc	DBL1H,DBLRH
+
+! N.B., since subtraction also generates +0.0 for subtraction of numbers
+! with identical fractions, this also covers the +0.0 + -0.0 -> +0.0 /
+! -0.0 + +0.0 -> +0.0 cases.
+LOCAL(denorm_sub):
+	mov	DBL0H,r8	! tentative result sign
+	and	r1,DBL0H
+	bra	LOCAL(sub_same_exp)
+	 addc	r1,r2	! exponent++, clear T
+
+LOCAL(inf_nan_arg0):
+	mov	DBL0L,DBLRL
+	bra	LOCAL(pop_r8_r9)
+	 mov	DBL0H,DBLRH
+
+LOCAL(ret_arg0):
+	mov.l	LOCAL(x800fffff),DBLRH
+	mov	DBL0L,DBLRL
+	mov	r2,r3
+LOCAL(ret_arg):
+	mov.l	@r15+,r9
+	and	r8,DBLRH
+	mov.l	@r15+,r8
+	rts
+	or	r3,DBLRH
+
+	.balign	4
+GLOBAL(subdf3):
+	cmp/pz	DBL1H
+	add	DBL1H,DBL1H
+	rotcr	DBL1H
+	nop
+
+GLOBAL(adddf3):
+	mov.l	LOCAL(x7ff00000),r0
+	mov	DBL0H,r2
+	mov.l	LOCAL(x001fffff),r1
+	mov	DBL1H,r3
+	mov.l	r8,@-r15
+	and	r0,r2
+	mov.l	r9,@-r15
+	and	r0,r3
+	cmp/hi	r2,r3
+	or	r0,DBL0H
+	or	r0,DBL1H
+	bt	LOCAL(arg1_gt)
+	tst	r0,r3
+	mov	#-20,r9
+	bt/s	LOCAL(denorm_arg1)
+	cmp/hs	r0,r2
+	bt	LOCAL(inf_nan_arg0)
+	sub	r2,r3
+LOCAL(denorm_arg1_done):	! r2 is tentative result exponent
+	shad	r9,r3
+	mov.w	LOCAL(m32),r9
+	mov	DBL0H,r8	! tentative result sign
+	and	r1,DBL0H	! arg0 fraction
+	mov	DBL1H,r0	! the 'other' sign
+	and	r1,DBL1H	! arg1 fraction
+	cmp/ge	r9,r3
+	mov	DBL1H,r1
+	bf/s	LOCAL(large_shift_arg1)
+	 shld	r3,DBL1H
+LOCAL(small_shift_arg1):
+	mov	DBL1L,r9
+	shld	r3,DBL1L
+	tst	r3,r3
+	add	#32,r3
+	bt/s	LOCAL(same_exp)
+	 div0s	r8,r0	! compare signs
+	shld	r3,r1
+
+	or	r1,DBL1L
+	bf/s	LOCAL(add)
+	shld	r3,r9
+	clrt
+	negc	r9,r9
+	mov.l	LOCAL(x001f0000),r3
+LOCAL(sub_high):
+	mov	DBL0L,DBLRL
+	subc	DBL1L,DBLRL
+	mov	DBL0H,DBLRH
+	bra	LOCAL(subtract_done)
+	 subc	DBL1H,DBLRH
+
+LOCAL(large_shift_arg1):
+	mov.w	LOCAL(d0),r9
+	add	#64,r3
+	cmp/pl	r3
+	shld	r3,r1
+	bf	LOCAL(ret_arg0)
+	cmp/hi	r9,DBL1L
+	mov	DBL1H,DBL1L
+	mov	r9,DBL1H
+	addc	r1,r9
+
+	div0s	r8,r0	! compare signs
+
+	bf	LOCAL(add)
+	clrt
+	mov.l	LOCAL(x001f0000),r3
+	bra	LOCAL(sub_high)
+	 negc	r9,r9
+
+LOCAL(add_clr_r9):
+	mov	#0,r9
+LOCAL(add):
+	mov.l	LOCAL(x00200000),r3
+	addc	DBL1L,DBL0L
+	addc	DBL1H,DBL0H
+	mov.l	LOCAL(x80000000),r1
+	tst	r3,DBL0H
+	mov.l	LOCAL(x7fffffff),r3
+	mov	DBL0L,r0
+	bt/s	LOCAL(no_carry)
+	and	r1,r8
+	tst	r9,r9
+	bf	LOCAL(add_one)
+	tst	#2,r0
+LOCAL(add_one):
+	subc	r9,r9
+	sett
+	mov	r0,DBLRL
+	addc	r9,DBLRL
+	mov	DBL0H,DBLRH
+	addc	r9,DBLRH
+	shlr	DBLRH
+	mov.l	LOCAL(x7ff00000),r3
+	add	r2,DBLRH
+	mov.l	@r15+,r9
+	rotcr	DBLRL
+	cmp/hi	r3,DBLRH
+LOCAL(add_done):
+	bt	LOCAL(inf)
+LOCAL(or_sign):
+	or	r8,DBLRH
+	rts
+	 mov.l	@r15+,r8
+
+LOCAL(inf):
+	bra	LOCAL(or_sign)
+	 mov	r3,DBLRH
+
+LOCAL(pos_difference_0):
+	tst	r3,DBL0H
+	mov	DBL0L,DBLRL
+	mov.l	LOCAL(x80000000),DBL0L
+	mov	DBL0H,DBLRH
+	mov.l	LOCAL(x00100000),DBL0H
+	bt/s	LOCAL(long_norm)
+	and	DBL0L,r8
+	bra	LOCAL(norm_loop)
+	 not	DBL0L,r3
+
+LOCAL(same_exp):
+	bf	LOCAL(add_clr_r9)
+	clrt
+LOCAL(sub_same_exp):
+	subc	DBL1L,DBL0L
+	mov.l	LOCAL(x001f0000),r3
+	subc	DBL1H,DBL0H
+	mov.w	LOCAL(d0),r9
+	bf	LOCAL(pos_difference_0)
+	clrt
+	negc	DBL0L,DBLRL
+	mov.l	LOCAL(x80000000),DBL0L
+	negc	DBL0H,DBLRH
+	mov.l	LOCAL(x00100000),DBL0H
+	tst	r3,DBLRH
+	not	r8,r8
+	bt/s	LOCAL(long_norm)
+	and	DBL0L,r8
+	bra	LOCAL(norm_loop)
+	 not	DBL0L,r3
+
+LOCAL(large_shift_arg0):
+	add	#64,r2
+
+	mov	#0,r9
+	cmp/pl	r2
+	shld	r2,r1
+	bf	LOCAL(ret_arg1_exp_r3)
+	cmp/hi	r9,DBL0L
+	mov	DBL0H,DBL0L
+	mov	r9,DBL0H
+	addc	r1,r9
+	div0s	r8,r0	! compare signs
+	mov	r3,r2	! tentative result exponent
+	bf	LOCAL(add)
+	clrt
+	negc	r9,r9
+	bra	LOCAL(subtract_arg0_arg1_done)
+	 mov	DBL1L,DBLRL
+
+LOCAL(arg1_gt):
+	tst	r0,r2
+	mov	#-20,r9
+	bt/s	LOCAL(denorm_arg0)
+	cmp/hs	r0,r3
+	bt	LOCAL(inf_nan_arg1)
+	sub	r3,r2
+LOCAL(denorm_arg0_done):
+	shad	r9,r2
+	mov.w	LOCAL(m32),r9
+	mov	DBL1H,r8	! tentative result sign
+	and	r1,DBL1H
+	mov	DBL0H,r0	! the 'other' sign
+	and	r1,DBL0H
+	cmp/ge	r9,r2
+	mov	DBL0H,r1
+	shld	r2,DBL0H
+	bf	LOCAL(large_shift_arg0)
+	mov	DBL0L,r9
+	shld	r2,DBL0L
+	add	#32,r2
+	mov.l	r3,@-r15
+	shld	r2,r1
+	mov	r2,r3
+	div0s	r8,r0		! compare signs
+	mov.l	@r15+,r2	! tentative result exponent
+	shld	r3,r9
+	bf/s	LOCAL(add)
+	or	r1,DBL0L
+	clrt
+	negc	r9,r9
+	mov	DBL1L,DBLRL
+LOCAL(subtract_arg0_arg1_done):
+	subc	DBL0L,DBLRL
+	mov	DBL1H,DBLRH
+	mov.l	LOCAL(x001f0000),r3
+	subc	DBL0H,DBLRH
+/* Since the exponents were different, the difference is positive.  */
+/* Fall through */
+LOCAL(subtract_done):
+/* First check if a shift by a few bits is sufficient.  This not only
+   speeds up this case, but also alleviates the need for considering
+   lower bits from r9 or rounding in the other code.
+   Moreover, by handling the upper 1+4 bits of the fraction here, long_norm
+   can assume that DBLRH fits into 16 bits.  */
+	tst	r3,DBLRH
+	mov.l	LOCAL(x80000000),r3
+	mov.l	LOCAL(x00100000),DBL0H
+	bt/s	LOCAL(long_norm)
+	and	r3,r8
+	mov.l	LOCAL(x7fffffff),r3
+LOCAL(norm_loop):	! Well, this used to be a loop...
+	tst	DBL0H,DBLRH
+	sub	DBL0H,r2
+	bf	LOCAL(norm_round)
+	shll	r9
+	rotcl	DBLRL
+
+	rotcl	DBLRH
+
+	tst	DBL0H,DBLRH
+	sub	DBL0H,r2
+	bf	LOCAL(norm_round)
+	shll	DBLRL
+	rotcl	DBLRH
+	mov.l	@r15+,r9
+	cmp/gt	r2,DBL0H
+	sub	DBL0H,r2
+LOCAL(norm_loop_1):
+	bt	LOCAL(denorm0_n)
+	tst	DBL0H,DBLRH
+	bf	LOCAL(norm_pack)
+	shll	DBLRL
+	rotcl	DBLRH	! clears T
+	bra	LOCAL(norm_loop_1)
+	 subc	DBL0H,r2
+
+LOCAL(no_carry):
+	shlr	r0
+	mov.l	LOCAL(x000fffff),DBLRH
+	addc	r3,r9
+	mov.w	LOCAL(d0),DBL1H
+	mov	DBL0L,DBLRL
+	and	DBL0H,DBLRH	! mask out implicit 1
+	mov.l	LOCAL(x7ff00000),r3
+	addc	DBL1H,DBLRL
+	addc	r2,DBLRH
+	mov.l	@r15+,r9
+	add	DBL1H,DBLRH	! fraction overflow -> exp increase
+	bra	LOCAL(add_done)
+	 cmp/hi	r3,DBLRH
+
+LOCAL(denorm_arg0):
+	bt	LOCAL(inf_nan_arg1)
+	mov.l	LOCAL(x00100000),r2
+	shlr	r1
+	bra	LOCAL(denorm_arg0_done)
+	 sub	r3,r2
+
+LOCAL(inf_nan_arg1):
+	mov	DBL1L,DBLRL
+	bra	LOCAL(pop_r8_r9)
+	 mov	DBL1H,DBLRH
+
+LOCAL(ret_arg1_exp_r3):
+	mov.l	LOCAL(x800fffff),DBLRH
+	bra	LOCAL(ret_arg)
+	 mov	DBL1L,DBLRL
+
+#ifdef __pic__
+	.balign 8
+#endif
+LOCAL(m32):
+	.word	-32
+LOCAL(d0):
+	.word	0
+#ifndef __pic__
+	.balign 8
+#endif
+! Because we had several bits of cancellation, we know that r9 contains
+! at most one set bit.
+! We'll normalize by shifting words so that DBLRH:DBLRL contains
+! the fraction with 0 < DBLRH <= 0x1fffff, then we shift DBLRH:DBLRL
+! up by 21 minus the number of non-zero bits in DBLRH.
+LOCAL(long_norm):
+	tst	DBLRH,DBLRH
+	mov.w	LOCAL(xff),DBL0L
+	mov	#21,r3
+	bf	LOCAL(long_norm_highset)
+	mov.l	LOCAL(x02100000),DBL1L	! shift 32, implicit 1
+	tst	DBLRL,DBLRL
+	extu.w	DBLRL,DBL0H
+	bt	LOCAL(zero_or_ulp)
+	mov	DBLRL,DBLRH
+	cmp/hi	DBL0H,DBLRL
+	bf	0f
+	mov.l	LOCAL(x01100000),DBL1L	! shift 16, implicit 1
+	clrt
+	shlr16  DBLRH
+	xtrct	DBLRL,r9
+	mov     DBLRH,DBL0H
+LOCAL(long_norm_ulp_done):
+0:	mov	r9,DBLRL	! DBLRH:DBLRL == fraction; DBL0H == DBLRH
+	subc	DBL1L,r2
+	bt	LOCAL(denorm1_b)
+#ifdef __pic__
+	mov.l	LOCAL(c__clz_tab),DBL1H
+LOCAL(long_norm_lookup):
+	mov	r0,r9
+	mova	LOCAL(c__clz_tab),r0
+	add	DBL1H,r0
+#else
+	mov	r0,r9
+LOCAL(long_norm_lookup):
+	mov.l	LOCAL(c__clz_tab),r0
+#endif /* __pic__ */
+	cmp/hi	DBL0L,DBL0H
+	bf	0f
+	shlr8	DBL0H
+0:	mov.b	@(r0,DBL0H),r0
+	bf	0f
+	add	#-8,r3
+0:	mov.w	LOCAL(d20),DBL0L
+	mov	#-20,DBL0H
+	clrt
+	sub	r0,r3
+	mov	r9,r0
+	mov	r3,DBL1H
+	shld	DBL0L,DBL1H
+	subc	DBL1H,r2
+	!
+	bf	LOCAL(no_denorm)
+	shad	DBL0H,r2
+	bra	LOCAL(denorm1_done)
+	add	r2,r3
+	
+LOCAL(norm_round):
+	cmp/pz	r2
+	mov	#0,DBL1H
+	bf	LOCAL(denorm0_1)
+	or	r8,r2
+	mov	DBLRL,DBL1L
+	shlr	DBL1L
+	addc	r3,r9
+	mov.l	@r15+,r9
+	addc	DBL1H,DBLRL	! round to even
+	mov.l	@r15+,r8
+	rts
+	 addc	r2,DBLRH
+
+LOCAL(norm_pack):
+	add	r8,DBLRH
+	mov.l	@r15+,r8
+	rts
+	add	r2,DBLRH
+
+LOCAL(denorm0_1):
+	mov.l	@r15+,r9
+	mov	r8,DBL0L
+	mov.l	@r15+,r8
+LOCAL(denorm0_shift):
+	shlr	DBLRH
+	rotcr	DBLRL
+
+	rts
+	add	DBL0L,DBLRH
+
+LOCAL(denorm0_n):
+	mov	r8,DBL0L
+	addc	DBL0H,r2
+	mov.l	@r15+,r8
+	bf	LOCAL(denorm0_shift)
+	rts
+	add	DBL0L,DBLRH
+
+LOCAL(no_denorm):
+	add	r2,r8		! add (exponent - 1) to sign
+
+LOCAL(denorm1_done):
+	shld	r3,DBLRH
+	mov	DBLRL,DBL0L
+	shld	r3,DBLRL
+
+	add	r8,DBLRH	! add in sign and (exponent - 1)
+	mov.l	@r15+,r9
+	add	#-32,r3
+	mov.l	@r15+,r8
+	shld	r3,DBL0L
+
+	rts
+	add	DBL0L,DBLRH
+
+LOCAL(long_norm_highset):
+	mov.l	LOCAL(x00200000),DBL1L	! shift 1, implicit 1
+	shll	r9
+	rotcl	DBLRL
+	mov	DBLRH,DBL0H
+	rotcl	DBLRH	! clears T
+#ifdef __pic__
+	mov.l	LOCAL(c__clz_tab),DBL1H
+#else
+	mov	r0,r9
+#endif /* __pic__ */
+	subc	DBL1L,r2
+	add	#-1,r3
+	bf	LOCAL(long_norm_lookup)
+LOCAL(denorm1_a):
+	shlr	DBLRH
+	rotcr	DBLRL
+	mov.l	@r15+,r9
+	or	r8,DBLRH
+
+	rts
+	mov.l	@r15+,r8
+
+	.balign	4
+LOCAL(denorm1_b):
+	mov	#-20,DBL0L
+	shad	DBL0L,r2
+	mov	DBLRH,DBL0L
+	shld	r2,DBLRH
+	shld	r2,DBLRL
+	or	r8,DBLRH
+	mov.l	@r15+,r9
+	add	#32,r2
+	mov.l	@r15+,r8
+	shld	r2,DBL0L
+	rts
+	or	DBL0L,DBLRL
+
+LOCAL(zero_or_ulp):
+	tst	r9,r9
+	bf	LOCAL(long_norm_ulp_done)
+	! return +0.0
+LOCAL(pop_r8_r9):
+	mov.l	@r15+,r9
+	rts
+	mov.l	@r15+,r8
+
+LOCAL(d20):
+	.word	20
+LOCAL(xff):
+	.word 0xff
+	.balign	4
+LOCAL(x7ff00000):
+	.long	0x7ff00000
+LOCAL(x001fffff):
+	.long	0x001fffff
+LOCAL(x80000000):
+	.long	0x80000000
+LOCAL(x000fffff):
+	.long	0x000fffff
+LOCAL(x800fffff):
+	.long	0x800fffff
+LOCAL(x001f0000):
+	.long	0x001f0000
+LOCAL(x00200000):
+	.long	0x00200000
+LOCAL(x7fffffff):
+	.long	0x7fffffff
+LOCAL(x00100000):
+	.long	0x00100000
+LOCAL(x02100000):
+	.long	0x02100000
+LOCAL(x01100000):
+	.long	0x01100000
+LOCAL(c__clz_tab):
+#ifdef __pic__
+	.long	GLOBAL(clz_tab) - .
+#else
+	.long	GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(adddf3))
+ENDFUNC(GLOBAL(subdf3))
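
[ For reference, here is a minimal C model of the clz_tab-driven
  normalization in LOCAL(long_norm) above.  All names are made up for
  illustration, and guard/sticky handling is omitted. ]

#include <stdint.h>
#include <stdio.h>

/* libgcc-style table: for a byte b > 0, tab[b] is the index of the
   highest set bit of b, plus one.  */
static unsigned char tab[256];

static void init_tab (void)
{
  for (int i = 1; i < 256; i++)
    for (int j = i; j; j >>= 1)
      tab[i]++;
}

/* Normalize a fraction held in hi:lo with 0 < hi <= 0x1fffff so that
   the leading 1 lands in bit 20 of hi; the shift count is 21 minus
   the number of significant bits in hi, as the comment above says.  */
static void norm (uint32_t *hi, uint32_t *lo, int *exp)
{
  uint32_t h = *hi;
  int bits = h >> 16 ? tab[h >> 16] + 16
	   : h >> 8  ? tab[h >> 8] + 8
	   : tab[h];
  int shift = 21 - bits;
  if (shift)
    {
      *hi = (h << shift) | (*lo >> (32 - shift));
      *lo <<= shift;
    }
  *exp -= shift;
}

int main (void)
{
  init_tab ();
  uint32_t hi = 0x123, lo = 0x456789ab;
  int exp = 1000;
  norm (&hi, &lo, &exp);
  printf ("%x:%08x exp %d\n", hi, lo, exp);  /* 123456:789ab000 exp 988 */
  return 0;
}
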
Index: gcc/config/sh/IEEE-754/m3/mulsf3.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/mulsf3.S	(revision 0)
+++ gcc/config/sh/IEEE-754/m3/mulsf3.S	(revision 0)
@@ -0,0 +1,241 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+! mulsf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+
+	.balign 4
+	.global GLOBAL(mulsf3)
+	FUNC(GLOBAL(mulsf3))
+GLOBAL(mulsf3):
+	mov.l	LOCAL(x7f800000),r1
+	not	r4,r2
+	mov	r4,r3
+	not	r5,r0
+	tst	r1,r2
+	or	r1,r3
+	bt/s	LOCAL(inf_nan_arg0)
+	 tst	r1,r0
+	bt	LOCAL(inf_nan_arg1)
+	tst	r1,r5
+	mov	r1,r2
+	shll8	r3
+	or	r5,r1
+	bt/s	LOCAL(zero_denorm_arg1)
+	 shll8	r1
+	tst	r2,r4
+	bt	LOCAL(zero_denorm_arg0)
+	dmulu.l	r3,r1
+	mov	r4,r0
+	and	r2,r0
+LOCAL(arg_norm):
+	and	r5,r2
+	mov.l	LOCAL(x3f800000),r3
+	sts	mach,r1
+	sub	r3,r0
+	sts	macl,r3
+	add	r2,r0
+	cmp/pz	r1
+	mov.w	LOCAL(x100),r2
+	bf/s	LOCAL(norm_frac)
+	 tst	r3,r3
+	shll2	r1	/* Shift one up, replace leading 1 with 0.  */
+	shlr	r1
+	tst	r3,r3
+LOCAL(norm_frac):
+	mov.w	LOCAL(mx80),r3
+	bf	LOCAL(round_frac)
+	tst	r2,r1
+LOCAL(round_frac):
+	mov.l	LOCAL(xff000000),r2
+	subc	r3,r1	/* Even overflow gives right result: exp++, frac=0.  */
+	shlr8	r1
+	add	r1,r0
+	shll	r0
+	bt	LOCAL(ill_exp)
+	tst	r2,r0
+	bt	LOCAL(denorm0)
+	cmp/hs	r2,r0
+	bt	LOCAL(inf)
+LOCAL(insert_sign):
+	div0s	r4,r5
+	rts
+	rotcr	r0
+LOCAL(denorm0):
+	sub	r2,r0
+	bra	LOCAL(insert_sign)
+	 shlr	r0
+LOCAL(zero_denorm_arg1):
+	mov.l	LOCAL(x60000000),r2	/* Check exp0 >= -64	*/
+	add	r1,r1
+	tst	r1,r1	/* arg1 == 0 ? */
+	mov	#0,r0
+	bt	LOCAL(insert_sign) /* argument 1 is zero ==> return 0  */
+	tst	r4,r2
+	bt	LOCAL(insert_sign) /* exp0 < -64  ==> return 0 */
+	mov.l	LOCAL(c__clz_tab),r0
+	mov	r3,r2
+	mov	r1,r3
+	bra	LOCAL(arg_normalize)
+	mov	r2,r1
+LOCAL(zero_denorm_arg0):
+	mov.l	LOCAL(x60000000),r2	/* Check exp1 >= -64	*/
+	add	r3,r3
+	tst	r3,r3	/* arg0 == 0 ? */
+	mov	#0,r0
+	bt	LOCAL(insert_sign) /* argument 0 is zero ==> return 0  */
+	tst	r5,r2
+	bt	LOCAL(insert_sign) /* exp1 < -64  ==> return 0 */
+	mov.l	LOCAL(c__clz_tab),r0
+LOCAL(arg_normalize):
+	mov.l	r7,@-r15
+	extu.w	r3,r7
+	cmp/eq	r3,r7
+	mov.l	LOCAL(xff000000),r7
+	mov	#-8,r2
+	bt	0f
+	tst	r7,r3
+	mov	#-16,r2
+	bt	0f
+	mov	#-24,r2
+0:
+	mov	r3,r7
+	shld	r2,r7
+#ifdef __pic__
+	add	r0,r7
+	mova  LOCAL(c__clz_tab),r0
+#endif
+	mov.b	@(r0,r7),r0
+	add	#32,r2
+	mov	r2,r7
+	mov	#23,r2
+	sub	r0,r7
+	mov.l	LOCAL(x7f800000),r0
+	shld	r7,r3
+	shld	r2,r7
+	mov	r0,r2
+	and	r4,r0
+	sub	r7,r0
+	mov.l	@r15+,r7
+	bra	LOCAL(arg_norm)
+	 dmulu.l	r3,r1
+#if 0 /* This is slightly slower, but could be used if table lookup causes
+         cache thrashing.  */
+	bt	LOCAL(insert_sign) /* exp1 < -64  ==> return 0 */
+	mov.l	LOCAL(xff000000),r2
+	mov	r4,r0
+LOCAL(arg_normalize):
+	tst	r2,r3
+	bf	LOCAL(arg_bit_norm)
+LOCAL(arg_byte_loop):
+	tst	r2,r3
+	add	r2,r0
+	shll8	r3
+	bt	LOCAL(arg_byte_loop)
+	add	r4,r0
+LOCAL(arg_bit_norm):
+	mov.l	LOCAL(x7f800000),r2
+	rotl	r3
+LOCAL(arg_bit_loop):
+	add	r2,r0
+	bf/s	LOCAL(arg_bit_loop)
+	 rotl	r3
+	rotr	r3
+	rotr	r3
+	sub	r2,r0
+	bra	LOCAL(arg_norm)
+	 dmulu.l	r3,r1
+#endif /* 0 */
+LOCAL(inf):
+	bra	LOCAL(insert_sign)
+	 mov	r2,r0
+LOCAL(inf_nan_arg0):
+	bt	LOCAL(inf_nan_both)
+	add	r0,r0
+	cmp/eq	#-1,r0	/* arg1 zero? -> NAN */
+	bt	LOCAL(insert_sign)
+	mov	r4,r0
+LOCAL(inf_insert_sign):
+	bra	LOCAL(insert_sign)
+	 add	r0,r0
+LOCAL(inf_nan_both):
+	mov	r4,r0
+	bra	LOCAL(inf_insert_sign)
+	 or	r5,r0
+LOCAL(inf_nan_arg1):
+	mov	r2,r0
+	add	r0,r0
+	cmp/eq	#-1,r0	/* arg0 zero? */
+	bt	LOCAL(insert_sign)
+	bra	LOCAL(inf_insert_sign)
+	 mov	r5,r0
+LOCAL(ill_exp):
+	cmp/pz	r0
+	mov	#-24,r3
+	bt	LOCAL(inf)
+	add	r1,r1
+	mov	r0,r2
+	sub	r1,r2	! remove fraction to get back pre-rounding exponent.
+	sts	mach,r0
+	sts	macl,r1
+	shad	r3,r2
+	mov	r0,r3
+	shld	r2,r0
+	add	#32,r2
+	cmp/pz	r2
+	shld	r2,r3
+	bf	LOCAL(zero)
+	or	r1,r3
+	mov	#-1,r1
+	tst	r3,r3
+	mov.w	LOCAL(x100),r3
+	bf/s	LOCAL(denorm_round_up)
+	mov	#-0x80,r1
+	tst	r3,r0
+LOCAL(denorm_round_up):
+	mov	#-7,r3
+	subc	r1,r0
+	bra	LOCAL(insert_sign)
+	 shld	r3,r0
+LOCAL(zero):
+	bra	LOCAL(insert_sign)
+	 mov #0,r0
+LOCAL(x100):
+	.word	0x100
+LOCAL(mx80):
+	.word	-0x80
+	.balign	4
+LOCAL(x7f800000):
+	.long 0x7f800000
+LOCAL(x3f800000):
+	.long 0x3f800000
+LOCAL(xff000000):
+	.long	0xff000000
+LOCAL(x60000000):
+	.long	0x60000000
+LOCAL(c__clz_tab):
+#ifdef __pic__
+	.long	GLOBAL(clz_tab) - .
+#else
+	.long	GLOBAL(clz_tab)
+#endif
+	ENDFUNC(GLOBAL(mulsf3))
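
[ The normal-number path above is, in effect, the following C model:
  exact 24x24-bit product via dmulu.l, one-bit normalization, then
  round to nearest-even.  A sketch only -- no NaN/Inf/zero/denormal
  handling, and the function name is made up. ]

#include <stdint.h>
#include <stdio.h>

static uint32_t mulsf3_normals (uint32_t a, uint32_t b)
{
  uint32_t sign = (a ^ b) & 0x80000000u;
  int32_t exp = (int32_t)((a >> 23) & 0xff)
	      + (int32_t)((b >> 23) & 0xff) - 127;
  uint64_t fa = (a & 0x007fffff) | 0x00800000;	/* implicit 1 */
  uint64_t fb = (b & 0x007fffff) | 0x00800000;
  uint64_t prod = fa * fb;		/* exact; fits in 48 bits */
  if (prod & (1ULL << 47))		/* significand product in [2,4) */
    exp++;
  else
    prod <<= 1;				/* leading 1 now at bit 47 */
  uint32_t frac = (uint32_t)(prod >> 24);
  uint32_t rest = (uint32_t)(prod & 0x00ffffff);
  if (rest > 0x800000 || (rest == 0x800000 && (frac & 1)))
    frac++;				/* round to nearest-even */
  if (frac & 0x01000000)		/* rounding overflowed the fraction */
    { frac >>= 1; exp++; }
  return sign | ((uint32_t)exp << 23) | (frac & 0x007fffff);
}

int main (void)
{
  union { float f; uint32_t u; } x = { 1.5f }, y = { 2.5f }, r;
  r.u = mulsf3_normals (x.u, y.u);
  printf ("%g\n", r.f);			/* 3.75 */
  return 0;
}
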
Index: gcc/config/sh/IEEE-754/m3/floatsisf.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/floatsisf.S	(revision 0)
+++ gcc/config/sh/IEEE-754/m3/floatsisf.S	(revision 0)
@@ -0,0 +1,101 @@
+/* Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+! floatsisf for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+
+FUNC(GLOBAL(floatsisf))
+	.global GLOBAL(floatsisf)
+	.balign	4
+GLOBAL(floatsisf):
+	cmp/pz	r4
+	mov	r4,r5
+	bt	0f
+	neg	r4,r5
+0:	mov.l	LOCAL(c_clz_tab),r0
+	extu.w	r5,r1
+	mov.w	LOCAL(xff00),r3
+	cmp/eq	r5,r1
+	mov	#24,r2
+	bt	0f
+	mov	r5,r1
+	shlr16	r1
+	add	#-16,r2
+0:	tst	r3,r1	! 0xff00
+	bt	0f
+	shlr8	r1
+0:	bt	0f
+	add	#-8,r2
+0:
+#ifdef	__PIC__
+	add	r0,r1
+	mova	LOCAL(c_clz_tab),r0
+#endif
+	mov.b	@(r0,r1),r1
+	cmp/pz	r4
+	mov.l	LOCAL(x4a800000),r3	! bias + 23 - implicit 1
+	bt	0f
+	mov.l	LOCAL(xca800000),r3	! sign + bias + 23 - implicit 1
+0:	mov	r5,r0
+	sub	r1,r2
+	mov.l	LOCAL(x80000000),r1
+	shld	r2,r0
+	cmp/pz	r2
+	add	r3,r0
+	bt	LOCAL(noround)
+	add	#31,r2
+	shld	r2,r5
+	add	#-31,r2
+	rotl	r5
+	cmp/hi	r1,r5
+	mov	#0,r3
+	addc	r3,r0
+	mov	#23,r1
+	shld	r1,r2
+	rts
+	sub	r2,r0
+	.balign	8
+LOCAL(noround):
+	mov	#23,r1
+	tst	r4,r4
+	shld	r1,r2
+	bt	LOCAL(ret0)
+	rts
+	sub	r2,r0
+LOCAL(ret0):
+	rts
+	mov	#0,r0
+
+LOCAL(xff00):	.word 0xff00
+	.balign	4
+LOCAL(x4a800000): .long 0x4a800000
+LOCAL(xca800000): .long 0xca800000
+LOCAL(x80000000): .long 0x80000000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+        .long   GLOBAL(clz_tab) - .
+#else
+        .long   GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(floatsisf))
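
[ In C terms, the routine above does the following: take |n|, find its
  leading 1 (the assembly uses the clz_tab lookup; __builtin_clz stands
  in for it here), shift so that 24 significant bits remain, and round
  to nearest-even when bits were shifted out.  A sketch; the function
  name is made up. ]

#include <stdint.h>
#include <stdio.h>

static uint32_t float_si_sf (int32_t n)
{
  if (n == 0)
    return 0;
  uint32_t sign = n < 0 ? 0x80000000u : 0;
  uint32_t v = n < 0 ? -(uint32_t)n : (uint32_t)n;
  int bits = 32 - __builtin_clz (v);
  int32_t exp = 127 + bits - 1;
  uint32_t frac;
  if (bits <= 24)
    frac = v << (24 - bits);		/* exact, no rounding needed */
  else
    {
      int sh = bits - 24;
      uint32_t rest = v & ((1u << sh) - 1), half = 1u << (sh - 1);
      frac = v >> sh;
      if (rest > half || (rest == half && (frac & 1)))
	frac++;				/* round to nearest-even */
      if (frac & 0x01000000)
	{ frac >>= 1; exp++; }
    }
  return sign | ((uint32_t)exp << 23) | (frac & 0x007fffff);
}

int main (void)
{
  union { float f; uint32_t u; } r;
  r.u = float_si_sf (16777217);		/* 2^24 + 1: rounds to 2^24 */
  printf ("%.1f\n", r.f);
  return 0;
}
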
Index: gcc/config/sh/IEEE-754/m3/muldf3.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/muldf3.S	(revision 0)
+++ gcc/config/sh/IEEE-754/m3/muldf3.S	(revision 0)
@@ -0,0 +1,481 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+! muldf3 for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+! Normal numbers are multiplied in 53 or 54 cycles on SH4-200.
+
+FUNC(GLOBAL(muldf3))
+	.global GLOBAL(muldf3)
+LOCAL(inf_nan_denorm_or_zero_a):
+	mov.l	r8,@-r15
+	sub	r3,DBL0H	! isolate high fraction
+	mov.l	@(4,r15),r8	! original DBL0H (with sign & exp)
+	sub	r3,r1		! 0x7ff00000
+	mov.l	LOCAL(x60000000),r3
+	shll16	r2		! 0xffff0000
+	!			  no stall here for sh4-200
+	!
+	tst	r1,r8
+	mov.l	r0,@-r15
+	bf	LOCAL(inf_nan_a)
+	tst	r1,r0		! test for DBL1 inf, nan or small
+	bt	LOCAL(ret_inf_nan_zero)
+LOCAL(normalize_arg):
+	tst	DBL0H,DBL0H
+	bf	LOCAL(normalize_arg53)
+	tst	DBL0L,DBL0L
+	bt	LOCAL(a_zero)
+	tst	r2,DBL0L
+	mov	DBL0L,DBL0H
+	bt	LOCAL(normalize_arg16)
+	shlr16	DBL0H
+	mov.w	LOCAL(m15),r2	! 1-16
+	bra	LOCAL(normalize_arg48)
+	shll16	DBL0L
+
+LOCAL(normalize_arg53):
+	tst	r2,DBL0H
+	mov	#1,r2
+	bt	LOCAL(normalize_arg48)
+	mov	DBL0H,r1
+	shlr16	r1
+	bra	LOCAL(normalize_DBL0H)
+	mov	#21-16,r3
+
+LOCAL(normalize_arg16):
+	mov.w	LOCAL(m31),r2 ! 1-32
+	mov	#0,DBL0L
+LOCAL(normalize_arg48):
+	mov	DBL0H,r1
+	mov	#21,r3
+LOCAL(normalize_DBL0H):
+	extu.b	r1,r8
+	mov.l	LOCAL(c__clz_tab),r0
+	cmp/eq	r8,r1
+	!
+	bt	0f
+	shlr8	r1
+0:
+#ifdef	__pic__
+	add	r0,r1
+
+	mova	LOCAL(c__clz_tab),r0
+
+#endif /* __pic__ */
+	mov.b	@(r0,r1),r8
+	mov	DBL0L,r1
+	mov.l	@r15+,r0
+	bt	0f
+	add	#-8,r3
+0:	clrt
+	sub	r8,r3
+	mov.w	LOCAL(d20),r8
+	shld	r3,DBL0H
+	shld	r3,DBL0L
+	sub	r3,r2
+	add	#-32,r3
+	shld	r3,r1
+	mov.l	LOCAL(x00100000),r3
+	or	r1,DBL0H
+	shld	r8,r2
+	mov.l	@r15+,r8
+	add	r2,DBL1H
+	mov.l	LOCAL(x001fffff),r2
+	dmulu.l	DBL0L,DBL1L
+	bra	LOCAL(arg_denorm_done)
+	or	r3,r0		! set implicit 1 bit
+
+LOCAL(a_zero):
+	mov.l	@(4,r15),r8
+	add	#8,r15
+LOCAL(zero):
+	mov	#0,DBLRH
+	bra	LOCAL(pop_ret)
+	mov	#0,DBLRL
+
+! both inf / nan -> result is NaN if at least one is NaN, else Inf.
+! DBL0 inf/nan, DBL1 zero   -> result is NaN
+! DBL0 inf/nan, DBL1 finite -> result is DBL0 with sign adjustment
+LOCAL(inf_nan_a):
+	mov	r8,DBLRH
+	mov.l	@(4,r15),r8
+	add	#8,r15
+	tst	r1,r0	! arg1 inf/nan ?
+	mov	DBL0L,DBLRL
+	bt	LOCAL(both_inf_nan)
+	tst	DBL1L,DBL1L
+	mov	DBL1H,r1
+	bf	LOCAL(pop_ret)
+	add	r1,r1
+	tst	r1,r1
+	!
+	bf	LOCAL(pop_ret)
+LOCAL(nan):
+	mov	#-1,DBLRL
+	bra	LOCAL(pop_ret)
+	mov	#-1,DBLRH
+
+LOCAL(both_inf_nan):
+	or	DBL1L,DBLRL
+	bra	LOCAL(pop_ret)
+	or	DBL1H,DBLRH
+
+LOCAL(ret_inf_nan_zero):
+	tst	r1,r0
+	mov.l	@(4,r15),r8
+	or	DBL0L,DBL0H
+	bf/s	LOCAL(zero)
+	add	#8,r15
+	tst	DBL0H,DBL0H
+	bt	LOCAL(nan)
+LOCAL(inf_nan_b):
+	mov	DBL1L,DBLRL
+	mov	DBL1H,DBLRH
+LOCAL(pop_ret):
+	mov.l	@r15+,DBL0H
+	add	DBLRH,DBLRH
+
+
+	div0s	DBL0H,DBL1H
+
+	rts
+	rotcr	DBLRH
+
+	.balign	4
+/* Argument a has already been tested for being zero or denorm.
+   For b, on the other hand, we have to swap a and b so that we can
+   share the normalization code.
+   a: sign/exponent : @r15 fraction: DBL0H:DBL0L
+   b: sign/exponent: DBL1H fraction:    r0:DBL1L  */
+LOCAL(inf_nan_denorm_or_zero_b):
+	sub	r3,r1		! 0x7ff00000
+	mov.l	@r15,r2		! get original DBL0H
+	tst	r1,DBL1H
+	sub	r3,r0		! isolate high fraction
+	bf	LOCAL(inf_nan_b)
+	mov.l	DBL1H,@r15
+	mov	r0,DBL0H
+	mov.l	r8,@-r15
+	mov	r2,DBL1H
+	mov.l	LOCAL(0xffff0000),r2
+	mov.l	r1,@-r15
+	mov	DBL1L,r1
+	mov	DBL0L,DBL1L
+	bra	LOCAL(normalize_arg)
+	mov	r1,DBL0L
+
+LOCAL(d20):
+	.word	20
+LOCAL(m15):
+	.word	-15
+LOCAL(m31):
+	.word	-31
+LOCAL(xff):
+	.word	0xff
+
+	.balign	4
+LOCAL(0xffff0000): .long 0xffff0000
+
+	! calculate a (DBL0H:DBL0L) * b (DBL1H:DBL1L)
+	.balign	4
+GLOBAL(muldf3):
+	mov.l	LOCAL(xfff00000),r3
+	mov	DBL1H,r0
+	dmulu.l	DBL0L,DBL1L
+	mov.l	LOCAL(x7fe00000),r1
+	sub	r3,r0
+	mov.l	DBL0H,@-r15
+	sub	r3,DBL0H
+	tst	r1,DBL0H
+	or	r3,DBL0H
+	mov.l	LOCAL(x001fffff),r2
+	bt	LOCAL(inf_nan_denorm_or_zero_a)
+	tst	r1,r0
+	or	r3,r0		! r0:DBL1L    := b fraction ; u12.52
+	bt	LOCAL(inf_nan_denorm_or_zero_b) ! T clear on fall-through
+LOCAL(arg_denorm_done):
+	and	r2,r0		! r0:DBL1L    := b fraction ; u12.52
+	sts	macl,r3
+	sts	mach,r1
+	dmulu.l	DBL0L,r0
+	and	r2,DBL0H	! DBL0H:DBL0L := a fraction ; u12.52
+	mov.l	r8,@-r15
+	mov	#0,DBL0L
+	mov.l	r9,@-r15
+	sts	macl,r2
+	sts	mach,r8
+	dmulu.l	DBL0H,DBL1L
+	addc	r1,r2
+
+	addc	DBL0L,r8	! add T; clears T
+
+	sts	macl,r1
+	sts	mach,DBL1L
+	dmulu.l	DBL0H,r0
+	addc	r1,r2
+	mov.l	LOCAL(x7ff00000),DBL0H
+	addc	DBL1L,r8	! clears T
+	mov.l	@(8,r15),DBL1L	! a sign/exp w/fraction
+	sts	macl,DBLRL
+	sts	mach,DBLRH
+	and	DBL0H,DBL1L	! a exponent
+	mov.w	LOCAL(x200),r9
+	addc	r8,DBLRL
+	mov.l	LOCAL(x3ff00000),r8	! bias
+	addc	DBL0L,DBLRH	! add T
+	cmp/hi	DBL0L,r3	! 32 guard bits -> sticky: T := r3 != 0
+	movt	r3
+	tst	r9,DBLRH	! T := fraction < 2
+	or	r3,r2		! DBLRH:DBLRL:r2 := result fraction; u24.72
+	bt/s	LOCAL(shll12)
+	sub	r8,DBL1L
+	mov.l	LOCAL(x002fffff),r8
+	and	DBL1H,DBL0H	! b exponent
+	mov.l	LOCAL(x00100000),r9
+	add	DBL0H,DBL1L ! result exponent - 1
+	tst	r8,r2
+	mov.w	LOCAL(m20),r8
+	subc	DBL0L,r9
+	addc	r2,r9 ! r2 value is still needed for denormal rounding
+	mov.w	LOCAL(d11),DBL0L
+	rotcr	r9
+	clrt
+	shld	r8,r9
+	mov.w	LOCAL(m21),r8
+	mov	DBLRL,r3
+	shld	DBL0L,DBLRL
+	addc	r9,DBLRL
+	mov.l	@r15+,r9
+	shld	r8,r3
+	mov.l	@r15+,r8
+	shld	DBL0L,DBLRH
+	mov.l	@r15+,DBL0H
+	addc	r3,DBLRH
+	mov.l	LOCAL(x7ff00000),DBL0L
+	add	DBL1L,DBLRH	! implicit 1 adjusts exponent
+	mov.l	LOCAL(xffe00000),r3
+	cmp/hs	DBL0L,DBLRH
+	add	DBLRH,DBLRH
+	bt	LOCAL(ill_exp_11)
+	tst	r3,DBLRH
+	bt	LOCAL(denorm_exp0_11)
+	div0s	DBL0H,DBL1H
+	rts
+	rotcr	DBLRH
+
+
+LOCAL(shll12):
+	mov.l	LOCAL(x0017ffff),r8
+	extu.b	DBLRH,DBLRH	! remove implicit 1.
+	mov.l	LOCAL(x00080000),r9
+	and	DBL1H,DBL0H	! b exponent
+	add	DBL0H,DBL1L	! result exponent
+	tst	r8,r2		! rounding adjust for lower guard ...
+	mov.w	LOCAL(m19),r8
+	subc	DBL0L,r9	! ... bits and round to even; clear T
+	addc	r2,r9 ! r2 value is still needed for denormal rounding
+	mov.w	LOCAL(d12),DBL0L
+	rotcr	r9
+	clrt
+	shld	r8,r9
+	mov.w	LOCAL(m20),r8
+	mov	DBLRL,r3
+	shld	DBL0L,DBLRL
+	addc	r9,DBLRL
+	mov.l	@r15+,r9
+	shld	r8,r3
+	mov.l	@r15+,r8
+	shld	DBL0L,DBLRH
+	mov.l	LOCAL(x7ff00000),DBL0L
+	addc	r3,DBLRH
+	mov.l	@r15+,DBL0H
+	add	DBL1L,DBLRH
+	mov.l	LOCAL(xffe00000),r3
+	cmp/hs	DBL0L,DBLRH
+	add	DBLRH,DBLRH
+	bt	LOCAL(ill_exp_12)
+	tst	r3,DBLRH
+	bt	LOCAL(denorm_exp0_12)
+LOCAL(insert_sign):
+	div0s	DBL0H,DBL1H
+	rts
+	rotcr	DBLRH
+
+LOCAL(overflow):
+	mov	r3,DBLRH
+	mov	#0,DBLRL
+	bra	LOCAL(insert_sign)
+	mov.l	@r15+,r8
+
+LOCAL(denorm_exp0_11):
+	mov.l	r8,@-r15
+	mov	#-21,r8
+	mov.l	r9,@-r15
+	bra	LOCAL(denorm)
+	mov	#-2,DBL1L	! one for denormal, and one for sticky bit
+
+LOCAL(ill_exp_11):
+	mov	DBL1H,DBL1L
+	and	r3,DBL0L	! 0x7fe00000
+	add	DBL1L,DBL1L
+	mov.l	r8,@-r15
+	cmp/hi	DBL1L,DBL0L	! check if exp a was large
+	mov	#-20,DBL0L
+	bf	LOCAL(overflow)
+	mov	#-21,r8
+	mov	DBLRH,DBL1L
+	rotcr	DBL1L		! shift in negative sign
+	mov.l	r9,@-r15
+	shad	DBL0L,DBL1L	! exponent ; s32
+	bra	LOCAL(denorm)
+	add	#-2,DBL1L	! add one for denormal, and one for sticky bit
+
+LOCAL(denorm_exp0_12):
+	mov.l	r8,@-r15
+	mov	#-20,r8
+	mov.l	r9,@-r15
+	bra	LOCAL(denorm)
+	mov	#-2,DBL1L	! one for denormal, and one for sticky bit
+
+	.balign 4		! also aligns LOCAL(denorm)
+LOCAL(ill_exp_12):
+	and	r3,DBL0L	! 0x7fe00000
+	mov	DBL1H,DBL1L
+	add	DBL1L,DBL1L
+	mov.l	r8,@-r15
+	cmp/hi	DBL1L,DBL0L	! check if exp a was large
+	bf	LOCAL(overflow)
+	mov	DBLRH,DBL1L
+	rotcr	DBL1L		! shift in negative sign
+	mov	#-20,r8
+	shad	r8,DBL1L	! exponent ; s32
+	mov.l	r9,@-r15
+	add	#-2,DBL1L	! add one for denormal, and one for sticky bit
+LOCAL(denorm):
+	not	r3,r9		! 0x001fffff
+	mov.l	r10,@-r15
+	mov	r2,r10
+	shld	r8,r10	! 11 or 12 lower bit valid
+	and	r9,DBLRH ! Mask away vestiges of exponent.
+	add	#32,r8
+	sub	r3,DBLRH ! Make leading 1 explicit.
+	shld	r8,r2	! r10:r2 := unrounded result lowpart
+	shlr	DBLRH	! compensate for doubling at end of normal code
+	sub	DBLRL,r10	! reconstruct effect of previous rounding
+	exts.b	r10,r9
+	shad	r3,r10	! sign extension
+	mov	#0,r3
+	clrt
+	addc	r9,DBLRL	! Undo previous rounding.
+	mov.w	LOCAL(m32),r9
+	addc	r10,DBLRH
+	cmp/hi	r3,r2
+	rotcl	DBLRL	! fold in the rest of r2 as a sticky bit.
+	mov.l	@r15+,r10
+	rotcl	DBLRH
+	cmp/ge	r9,DBL1L
+	bt	LOCAL(small_norm_shift)
+	cmp/hi	r3,DBLRL
+	add	#32,DBL1L
+	movt	DBLRL
+	cmp/gt	r9,DBL1L
+	or	DBLRH,DBLRL
+	bt/s	LOCAL(small_norm_shift)
+	mov	r3,DBLRH
+	mov	r3,DBLRL	! exponent too negative to shift - return zero
+	mov.l	@r15+,r9
+	mov.l	@r15+,r8
+	div0s	DBL0H,DBL1H
+	rts
+	rotcr	DBLRH
+	.balign	4
+LOCAL(small_norm_shift):
+	mov	DBLRL,r2	! stash away guard bits
+	shld	DBL1L,DBLRL
+	mov	DBLRH,DBL0L
+	shld	DBL1L,DBLRH
+	mov.l	LOCAL(x7fffffff),r9
+	add	#32,DBL1L
+	shld	DBL1L,r2
+	shld	DBL1L,DBL0L
+	or	DBL0L,DBLRL
+	shlr	DBL0L
+	addc	r2,r9
+	mov.l	@r15+,r9
+	mov.l	@r15+,r8
+	addc	r3,DBLRL
+	addc	r3,DBLRH
+	div0s	DBL0H,DBL1H
+	add	DBLRH,DBLRH
+	rts
+	rotcr	DBLRH
+
+
+LOCAL(x200):
+	.word 0x200
+LOCAL(m19):
+	.word	-19
+LOCAL(m20):
+	.word	-20
+LOCAL(m21):
+	.word	-21
+LOCAL(m32):
+	.word	-32
+LOCAL(d11):
+	.word	11
+LOCAL(d12):
+	.word	12
+	.balign	4
+LOCAL(x60000000):
+	.long	0x60000000
+LOCAL(c__clz_tab):
+#ifdef __pic__
+	.long	GLOBAL(clz_tab) - .
+#else
+	.long	GLOBAL(clz_tab)
+#endif
+LOCAL(xfff00000):
+	.long	0xfff00000
+LOCAL(x7fffffff):
+	.long	0x7fffffff
+LOCAL(x00100000):
+	.long	0x00100000
+LOCAL(x7fe00000):
+	.long	0x7fe00000
+LOCAL(x001fffff):
+	.long	0x001fffff
+LOCAL(x7ff00000):
+	.long	0x7ff00000
+LOCAL(x3ff00000):
+	.long	0x3ff00000
+LOCAL(x002fffff):
+	.long	0x002fffff
+LOCAL(xffe00000):
+	.long	0xffe00000
+LOCAL(x0017ffff):
+	.long	0x0017ffff
+LOCAL(x00080000):
+	.long	0x00080000
+ENDFUNC(GLOBAL(muldf3))
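
[ The four dmulu.l instructions above build the exact 128-bit product
  of the two fractions from 32x32->64 partial products.  A C model of
  that accumulation (the struct and function names are made up): ]

#include <stdint.h>
#include <stdio.h>

typedef struct { uint64_t hi, lo; } u128;

static u128 mul64x64 (uint64_t a, uint64_t b)
{
  uint32_t ah = a >> 32, al = (uint32_t)a;
  uint32_t bh = b >> 32, bl = (uint32_t)b;
  uint64_t ll = (uint64_t)al * bl;	/* dmulu.l DBL0L,DBL1L */
  uint64_t lh = (uint64_t)al * bh;	/* dmulu.l DBL0L,r0    */
  uint64_t hl = (uint64_t)ah * bl;	/* dmulu.l DBL0H,DBL1L */
  uint64_t hh = (uint64_t)ah * bh;	/* dmulu.l DBL0H,r0    */
  uint64_t mid = lh + (ll >> 32);	/* cannot wrap: lh <= (2^32-1)^2 */
  uint64_t mid2 = mid + hl;		/* may wrap; carry handled below */
  u128 r;
  r.lo = (mid2 << 32) | (ll & 0xffffffffu);
  r.hi = hh + (mid2 >> 32) + (mid2 < hl ? 1ULL << 32 : 0);
  return r;
}

int main (void)
{
  /* Square the largest 53-bit significand.  */
  u128 p = mul64x64 (0x001fffffffffffffULL, 0x001fffffffffffffULL);
  printf ("%016llx %016llx\n",
	  (unsigned long long)p.hi, (unsigned long long)p.lo);
  return 0;
}
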
Index: gcc/config/sh/IEEE-754/m3/floatsidf.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/floatsidf.S	(revision 0)
+++ gcc/config/sh/IEEE-754/m3/floatsidf.S	(revision 0)
@@ -0,0 +1,98 @@
+/* Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+! floatsidf for the Renesas SH / STMicroelectronics ST40 CPUs.
+! Contributed by Joern Rennecke
+! joern.rennecke@st.com
+!
+! This code is optimized for SH4 without FPU, but can also be used for SH3.
+
+FUNC(GLOBAL(floatsidf))
+	.global GLOBAL(floatsidf)
+	.balign	4
+GLOBAL(floatsidf):
+	tst	r4,r4
+	mov	r4,r1
+	bt	LOCAL(ret0)
+	cmp/pz	r4
+	bt	0f
+	neg	r4,r1
+0:	mov.l	LOCAL(c_clz_tab),r0
+	extu.w	r1,r5
+	mov.w	LOCAL(xff00),r3
+	cmp/eq	r1,r5
+	mov	#21,r2
+	bt	0f
+	mov	r1,r5
+	shlr16	r5
+	add	#-16,r2
+0:	tst	r3,r5	! 0xff00
+	bt	0f
+	shlr8	r5
+0:	bt	0f
+	add	#-8,r2
+0:
+#ifdef	__PIC__
+	add	r0,r5
+	mova	LOCAL(c_clz_tab),r0
+#endif
+	mov.b	@(r0,r5),r5
+	cmp/pz	r4
+	mov.l	LOCAL(x41200000),r3	! bias + 20 - implicit 1
+	bt	0f
+	mov.l	LOCAL(xc1200000),r3	! sign + bias + 20 - implicit 1
+0:	mov	r1,r0	! DBLRL & DBLRH
+	sub	r5,r2
+	mov	r2,r5
+	shld	r2,DBLRH
+	cmp/pz	r2
+	add	r3,DBLRH
+	add	#32,r2
+	shld	r2,DBLRL
+	bf	0f
+	mov.w	LOCAL(d0),DBLRL
+0:	mov	#20,r2
+	shld	r2,r5
+	rts
+	sub	r5,DBLRH
+LOCAL(ret0):
+	mov	#0,DBLRL
+	rts
+	mov	#0,DBLRH
+
+LOCAL(xff00):	.word 0xff00
+	.balign	4
+LOCAL(x41200000):
+#ifdef __LITTLE_ENDIAN__
+LOCAL(d0):	  .word 0
+		  .word 0x4120
+#else
+		  .word 0x4120
+LOCAL(d0):	  .word 0
+#endif
+LOCAL(xc1200000): .long 0xc1200000
+LOCAL(c_clz_tab):
+#ifdef __pic__
+        .long   GLOBAL(clz_tab) - .
+#else
+        .long   GLOBAL(clz_tab)
+#endif
+ENDFUNC(GLOBAL(floatsidf))
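
[ Unlike floatsisf, this conversion is always exact: any 32-bit integer
  fits in the 52-bit fraction of a double, so there is no rounding path.
  A C model of the shift-and-pack done above (the name is made up): ]

#include <stdint.h>
#include <stdio.h>

static void float_si_df (int32_t n, uint32_t *hi, uint32_t *lo)
{
  if (n == 0)
    { *hi = *lo = 0; return; }
  uint32_t sign = n < 0 ? 0x80000000u : 0;
  uint64_t v = n < 0 ? (uint64_t)-(int64_t)n : (uint64_t)n;
  int bits = 64 - __builtin_clzll (v);
  int exp = 1023 + bits - 1;
  v <<= 53 - bits;			/* leading 1 to bit 52; exact */
  *hi = sign | ((uint32_t)exp << 20) | ((uint32_t)(v >> 32) & 0xfffff);
  *lo = (uint32_t)v;
}

int main (void)
{
  uint32_t hi, lo;
  float_si_df (-123456789, &hi, &lo);
  printf ("%08x %08x\n", hi, lo);	/* the bits of -123456789.0 */
  return 0;
}
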
Index: gcc/config/sh/IEEE-754/m3/fixdfsi.S
===================================================================
--- gcc/config/sh/IEEE-754/m3/fixdfsi.S	(revision 0)
+++ gcc/config/sh/IEEE-754/m3/fixdfsi.S	(revision 0)
@@ -0,0 +1,110 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+!! fixdfsi for Renesas SH / STMicroelectronics ST40 CPUs
+!! Contributed by J"orn Rennecke joern.rennecke@st.com
+
+#ifdef L_fixdfsi
+	! What is a bit unusal about this implementation is that the
+	! sign bit influences the result for NANs: for cleared sign bit, you
+	! get UINT_MAX, for set sign bit, you get 0.
+	! However, since the result for NANs is undefined, this should be no
+	! problem.
+	! N.B. This is scheduled both for SH4-200 and SH4-300
+	.balign 4
+	.global GLOBAL(fixdfsi)
+	FUNC(GLOBAL(fixdfsi))
+	.balign	4
+GLOBAL(fixdfsi):
+	mov.w	LOCAL(x413),r1
+	mov	DBL0H,r0
+	shll	DBL0H
+	mov.l	LOCAL(mask),r3
+	mov	#-21,r2
+	shld	r2,DBL0H	! SH4-200 will start this insn in a new cycle
+	bt/s	LOCAL(neg)
+	sub	r1,DBL0H
+	cmp/pl	DBL0H		! SH4-200 will start this insn in a new cycle
+	and	r3,r0
+	bf/s	LOCAL(ignore_low)
+	addc	r3,r0	! uses T == 1; sets implicit 1
+	mov	#10,r2
+	shld	DBL0H,r0	! SH4-200 will start this insn in a new cycle
+	cmp/gt	r2,DBL0H
+	add	#-32,DBL0H
+	bt	LOCAL(retmax)
+	shld	DBL0H,DBL0L
+	rts
+	or	DBL0L,r0
+
+	.balign	8
+LOCAL(ignore_low):
+	mov	#-21,r2
+	cmp/gt	DBL0H,r2	! SH4-200 will start this insn in a new cycle
+	bf	0f		! SH4-200 will start this insn in a new cycle
+	mov	#-31,DBL0H	! results in 0 return
+0:	add	#1,r0
+	rts
+	shld	DBL0H,r0
+
+	.balign 4
+LOCAL(neg):
+	cmp/pl	DBL0H
+	and	r3,r0
+	bf/s	LOCAL(ignore_low_neg)
+	addc	r3,r0	! uses T == 1; sets implicit 1
+	mov	#10,r2
+	shld	DBL0H,r0	! SH4-200 will start this insn in a new cycle
+	cmp/gt	r2,DBL0H
+	add	#-32,DBL0H
+	bt	LOCAL(retmin)
+	shld	DBL0H,DBL0L
+	or	DBL0L,r0	! SH4-200 will start this insn in a new cycle
+	rts
+	neg	r0,r0
+
+	.balign 4
+LOCAL(ignore_low_neg):
+	mov	#-21,r2
+	cmp/gt	DBL0H,r2	! SH4-200 will start this insn in a new cycle
+	add	#1,r0
+	shld	DBL0H,r0
+	bf	0f
+	mov	#0,r0		! results in 0 return
+0:	rts
+	neg	r0,r0
+
+LOCAL(retmax):
+	mov	#-1,r0
+	rts
+	shlr	r0
+
+LOCAL(retmin):
+	mov	#1,r0
+	rts
+	rotr	r0
+
+LOCAL(x413): .word 0x413
+
+	.balign 4
+LOCAL(mask): .long 0x000fffff
+	ENDFUNC(GLOBAL(fixdfsi))
+#endif /* L_fixdfsi */
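
[ The constant 0x413 above is the bias 1023 plus 20, so the adjusted
  exponent directly gives the shift amount for the 21-bit high fraction.
  A C sketch of the truncation: it saturates on overflow like the code
  above, but does not reproduce the sign-dependent NaN quirk, and the
  name is made up. ]

#include <stdint.h>

static int32_t fix_df_si (uint32_t hi, uint32_t lo)
{
  int32_t e = (int32_t)((hi >> 20) & 0x7ff) - 0x413;	/* exponent - 20 */
  uint32_t frac = (hi & 0x000fffff) | 0x00100000;	/* implicit 1 */
  uint32_t mag;
  if (e < -20)
    return 0;				/* |x| < 1 */
  if (e > 10)				/* |x| >= 2^31: saturate */
    return (int32_t)hi < 0 ? INT32_MIN : INT32_MAX;
  if (e > 0)
    mag = (frac << e) | (lo >> (32 - e));
  else
    mag = frac >> -e;
  return (int32_t)hi < 0 ? -(int32_t)mag : (int32_t)mag;
}
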
Index: gcc/config/sh/IEEE-754/divdf3.S
===================================================================
--- gcc/config/sh/IEEE-754/divdf3.S	(revision 0)
+++ gcc/config/sh/IEEE-754/divdf3.S	(revision 0)
@@ -0,0 +1,593 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!Division of two double precision floating point numbers
+!Author: Aanchal Khanna
+!
+!Entry:
+!r4,r5:dividend
+!
+!r6,r7:divisor
+!
+!Exit:
+!r0,r1:quotient
+
+!Notes: the dividend is passed in regs r4 and r5 and the divisor in regs
+!r6 and r7; the quotient is returned in regs r0 and r1.  The dividend is
+!referred to as op1 and the divisor as op2.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+	.text
+	.align	5
+	.global	GLOBAL (divdf3)
+	FUNC (GLOBAL (divdf3))
+
+GLOBAL (divdf3):
+
+#ifdef  __LITTLE_ENDIAN__
+	mov	r4,r1
+	mov	r5,r4
+	mov	r1,r5
+
+        mov     r6,r1
+        mov     r7,r6
+        mov     r1,r7
+#endif
+	mov	r4,r2
+	mov.l	.L_inf,r1
+
+	and	r1,r2
+	mov.l   r8,@-r15
+
+	cmp/eq	r1,r2
+	mov     r6,r8
+
+	bt	.L_a_inv
+	and	r1,r8
+
+	cmp/eq	r1,r8
+	mov.l	.L_high_mant,r3
+
+	bf	.L_chk_zero
+	and	r6,r3
+
+	mov.l   .L_mask_sign,r8	
+	cmp/pl	r7
+
+	mov	r8,r0
+	bt	.L_ret_b	!op2=NaN,return op2
+
+	and	r4,r8
+	cmp/pl	r3
+
+	and	r6,r0
+	bt	.L_ret_b	!op2=NaN,return op2
+
+	xor     r8,r0           !op1=normal no,op2=Inf, return Zero
+	mov     #0,r1
+	
+#ifdef __LITTLE_ENDIAN__
+	mov	r0,r2
+	mov	r1,r0
+	mov	r2,r1
+#endif
+	rts
+	mov.l   @r15+,r8
+
+.L_ret_b:
+	mov	r7,r1
+	mov     r6,r0
+
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+
+	rts
+	mov.l   @r15+,r8
+
+.L_a_inv:
+	!chk if op1 is Inf or NaN
+	mov.l   .L_high_mant,r2
+	cmp/pl  r5
+
+	and	r4,r2
+	bt	.L_ret_a
+
+	and	r1,r8		!r1 contains infinity
+	cmp/pl	r2
+
+	bt	.L_ret_a
+	cmp/eq	r1,r8
+
+	mov	r1,DBLRH
+	add	DBLRH,DBLRH
+	bf	0f
+	mov	#-1,DBLRH	! Inf/Inf, return NaN.
+0:	div0s	r4,r6
+	mov.l   @r15+,r8	
+	rts
+	rotcr	DBLRH
+
+.L_ret_a:
+	!return op1
+	mov	r5,r1
+	mov	r4,r0
+
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+	rts
+        mov.l   @r15+,r8
+
+.L_chk_zero:
+	!chk if op1=0
+	mov.l   .L_mask_sign,r0
+        mov     r4,r3
+
+        and     r0,r3
+        shll    r4
+
+        and     r6,r0
+        shlr    r4
+
+        xor     r3,r0
+        shll    r6
+
+	shlr	r6
+	tst	r4,r4
+
+
+	bf      .L_op1_not_zero	
+	tst	r5,r5
+	
+        bf      .L_op1_not_zero
+	tst	r7,r7
+
+	mov.l   @r15+,r8
+	bf	.L_ret_zero
+
+	tst	r6,r6
+	bf	.L_ret_zero
+
+	rts
+	mov     #-1,DBLRH       !op1=op2=0, return NaN
+	
+.L_ret_zero:
+	!return zero
+	mov	r0,r1
+	rts
+#ifdef __LITTLE_ENDIAN__
+	mov	#0,r0
+#else
+	mov	#0,r1		!op1=0,op2=normal no,return zero
+#endif
+
+.L_norm_b:
+	!normalize op2
+        shll    r7
+        mov.l   .L_imp_bit,r3
+
+        rotcl   r6
+        tst     r3,r6
+
+        add     #-1,r8
+        bt      .L_norm_b
+
+        bra     .L_divide
+        add     #1,r8
+
+.L_op1_not_zero:
+	!op1!=0, chk if op2=0
+	tst	r7,r7	
+	mov	r1,r3
+	
+	mov	#0,r1
+	bf	.L_normal_nos
+
+	tst	r6,r6
+	bf      .L_normal_nos
+
+	mov.l   @r15+,r8
+	or	r3,r0
+
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+
+	rts
+	nop
+
+.L_normal_nos:
+	!op1 and op2 are normal nos
+	tst	r2,r2
+	mov	#-20,r1
+
+! The branch below uses the T bit set by the compare above.
+! The shift does not alter T, since the macro is defined
+! with care to preserve it.
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+	shld    r1,r2
+#else
+	SHLR20 (r2)
+#endif
+	bt	.L_norm_a	!normalize dividend
+	
+.L_chk_b:
+	mov.l	r9,@-r15
+	tst	r8,r8
+
+        mov.l   .L_high_mant,r9
+
+! The branch below uses the T bit set by the compare above.
+! The shift does not alter T, since the macro is defined
+! with care to preserve it.
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r1,r8
+#else
+        SHLR20 (r8)
+#endif
+				! T set -> normalize divisor
+	SL(bt,	.L_norm_b,
+	 and	r9,r4)
+
+.L_divide:
+	mov.l   .L_2047,r1
+	sub	r8,r2
+
+	mov.l	.L_1023,r8
+	and	r9,r6
+
+	!resultant exponent
+	add	r8,r2
+	!chk the exponent for overflow
+	cmp/ge	r1,r2
+	
+	mov.l	.L_imp_bit,r1
+	bt	.L_overflow
+	
+	mov	#0,r8
+	or	r1,r4
+	
+	or      r1,r6	
+	mov	#-24,r3
+
+	!chk if the divisor is 1(mantissa only)
+	cmp/eq	r8,r7
+	bf	.L_div2
+
+	cmp/eq	 r6,r1
+	bt	.L_den_one
+
+.L_div2:
+	!divide the mantissas
+	shll8	r4
+	mov	r5,r9
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r3,r9
+#else
+        SHLR24 (r9)
+#endif
+	shll8	r6
+
+	or	r9,r4
+	shll8   r5
+
+	mov	r7,r9
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r3,r9
+#else
+        SHLR24 (r9)
+#endif
+	mov	r8,r3
+	shll8	r7
+
+	or	r9,r6	
+	cmp/gt	r4,r6
+
+	mov	r3,r9
+	bt	.L_shift
+
+	cmp/eq	r4,r6
+	bf	.L_loop
+
+	cmp/gt	r5,r7
+	bf	.L_loop
+
+.L_shift:
+	add	#-1,r2
+	shll	r5
+	rotcl	r4
+
+.L_loop:
+	!actual division loop
+	cmp/gt	r6,r4
+	bt	.L_subtract
+
+	cmp/eq	r6,r4
+	bf	.L_skip
+
+	cmp/ge	r7,r5
+	bf	.L_skip
+
+.L_subtract:
+	clrt
+	subc	r7,r5
+	
+	or	r1,r8
+	subc	r6,r4
+
+.L_skip:
+	shlr	r1
+	shll	r5
+
+	rotcl	r4
+	cmp/eq	r1,r3
+
+	bf	.L_loop
+	mov.l	.L_imp_bit,r1
+
+	!chk if the division was for the higher word of the quotient
+	tst	r1,r9
+	bf	.L_chk_exp
+
+	mov	r8,r9
+	mov.l   .L_mask_sign,r1
+
+	!divide for the lower word of the quotient
+	bra	.L_loop
+	mov	r3,r8
+
+.L_chk_exp:
+	!chk if the result needs to be denormalized
+	cmp/gt	r2,r3
+	bf	.L_round
+	mov     #-53,r7
+
+.L_underflow:
+	!denormalize the result
+	add	#1,r2
+	cmp/gt	r2,r7
+
+	or      r4,r5           !remainder
+	add	#-2,r2
+
+	mov	#32,r4
+	bt      .L_return_zero
+
+	add	r2,r4
+	cmp/ge	r3,r4
+
+	mov	r2,r7
+	mov	r3,r1
+
+	mov     #-54,r2
+	bt	.L_denorm
+	mov	#-32,r7
+
+.L_denorm:
+	shlr	r8
+	rotcr	r1
+
+	shll	r8
+	add     #1,r7
+
+	shlr	r9
+	rotcr	r8
+
+	cmp/eq	r3,r7
+	bf	.L_denorm
+
+	mov	r4,r7
+	cmp/eq	r2,r4
+
+	bt	.L_break
+	mov     r3,r6
+
+	cmp/gt	r7,r3
+	bf	.L_break
+
+	mov	r2,r4
+	mov	r1,r6
+
+	mov	r3,r1
+	bt	.L_denorm
+
+.L_break:
+	mov     #0,r2
+
+	cmp/gt	r1,r2
+
+	addc	r2,r8
+	mov.l   .L_comp_1,r4
+
+	addc	r3,r9		
+	or	r9,r0
+
+	cmp/eq	r5,r3
+	bf	.L_return	
+
+	cmp/eq	r3,r6
+	mov.l	.L_mask_sign,r7
+
+	bf	.L_return
+	cmp/eq	r7,r1
+
+	bf	.L_return
+	and	r4,r8
+
+.L_return:
+	mov.l	@r15+,r9
+	mov     r8,r1
+
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+	rts
+	mov.l   @r15+,r8
+
+.L_norm_a:
+        !normalize op1
+        shll    r5
+        mov.l   .L_imp_bit,r3
+
+        rotcl   r4
+        tst     r3,r4
+
+        add     #-1,r2
+        bt      .L_norm_a
+
+        bra     .L_chk_b
+        add     #1,r2
+
+.L_overflow:
+	!overflow, return inf
+	mov.l   .L_inf,r2
+#ifdef __LITTLE_ENDIAN__
+	or	r2,r1	
+	mov	#0,r0
+#else
+	or	r2,r0
+	mov	#0,r1
+#endif
+        mov.l   @r15+,r9
+        rts
+        mov.l   @r15+,r8
+
+.L_den_one:
+	!denominator=1, result=numerator
+        mov     r4,r9
+        mov   	#-53,r7
+
+	cmp/ge	r2,r8
+	mov	r8,r4
+
+	mov	r5,r8
+	mov	r4,r3
+
+	!chk the exponent for underflow
+	SL(bt,	.L_underflow,
+	 mov     r4,r5)
+
+	mov.l	.L_high_mant,r7
+        bra     .L_pack
+	mov     #20,r6
+
+.L_return_zero:
+	!return zero
+	mov	r3,r1
+	mov.l	@r15+,r9
+
+	rts
+	mov.l   @r15+,r8
+
+.L_round:
+	!apply rounding
+	cmp/eq	r4,r6
+	bt	.L_lower
+
+	clrt
+	subc    r6,r4
+
+	bra     .L_rounding
+	mov	r4,r6
+	
+.L_lower:
+	clrt
+	subc	r7,r5
+	mov	r5,r6
+	
+.L_rounding:
+	!apply rounding
+	mov.l   .L_invert,r1
+	mov	r3,r4
+
+	movt	r3
+	clrt
+	
+	not	r3,r3
+	and	r1,r3	
+
+	addc	r3,r8
+	mov.l   .L_high_mant,r7
+
+	addc	r4,r9
+	cmp/eq	r4,r6
+
+	mov.l   .L_comp_1,r3
+	SL (bf,	.L_pack,
+	 mov     #20,r6)
+	and	r3,r8
+
+.L_pack:
+	!pack the result, r2=exponent,r0=sign,r8=lower mantissa, r9=higher mantissa
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+	shld    r6,r2
+#else
+        SHLL20 (r2)
+#endif
+	and	r7,r9
+
+	or	r2,r0
+	mov	r8,r1
+
+	or      r9,r0
+	mov.l	@r15+,r9
+
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+	rts
+	mov.l	@r15+,r8
+
+	.align	2
+
+.L_mask_sign:
+	.long	0x80000000
+.L_high_mant:
+	.long	0x000fffff
+.L_inf:
+	.long	0x7ff00000
+.L_1023:
+	.long	1023
+.L_2047:
+	.long	2047
+.L_imp_bit:
+	.long	0x00100000	
+.L_comp_1:
+	.long	0xfffffffe
+.L_invert:
+	.long	0x00000001
+
+ENDFUNC (GLOBAL (divdf3))
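
[ .L_loop above is classic one-bit-at-a-time restoring division of the
  53-bit significands; in C, with 64-bit arithmetic instead of the
  register pairs the assembly uses, it amounts to the sketch below
  (the name is made up): ]

#include <stdint.h>

/* num and den hold significands with the implicit 1 at bit 52.  */
static uint64_t div_sig (uint64_t num, uint64_t den, int *exp, int *sticky)
{
  uint64_t q = 0;
  if (num < den)			/* .L_shift: keep quotient in [1,2) */
    { num <<= 1; (*exp)--; }
  for (uint64_t bit = 1ULL << 52; bit; bit >>= 1)
    {
      if (num >= den)
	{ num -= den; q |= bit; }	/* .L_subtract */
      num <<= 1;			/* .L_skip */
    }
  *sticky = num != 0;			/* a remainder drives .L_round */
  return q;				/* 53 bits, bit 52 set */
}
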
Index: gcc/config/sh/IEEE-754/floatunssisf.S
===================================================================
--- gcc/config/sh/IEEE-754/floatunssisf.S	(revision 0)
+++ gcc/config/sh/IEEE-754/floatunssisf.S	(revision 0)
@@ -0,0 +1,132 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Conversion of unsigned integer to floating point
+
+! Author: Rakesh Kumar
+
+! Argument: r4
+! Result: r0
+
+! r4 is referred to as op1
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+        .text
+        .align 5
+        .global GLOBAL (floatunsisf)
+	FUNC (GLOBAL (floatunsisf))
+
+GLOBAL (floatunsisf):
+	tst	r4,r4
+	mov	#23,r6
+
+	mov.l	.L_set_24_bits,r7
+	SL(bt,	.L_return,
+	 not	r7,r3)
+
+	! Decide the direction for shifting
+	mov.l	.L_set_24_bit,r5
+	cmp/hi	r7,r4
+
+	not	r5,r2
+	SL(bt,	.L_shift_right,
+	 mov	#0,r7)
+
+	tst	r5,r4
+	
+	mov	#0,r0
+	bf	.L_pack_sf
+
+! Shift the bits to the left. Adjust the exponent
+.L_shift_left:
+	shll	r4
+	tst	r5,r4
+
+	add	#-1,r6
+	bt	.L_shift_left
+
+! Pack the value in floating point format.
+! r6 has unbiased exponent, r4 has mantissa
+.L_pack_sf:
+	mov	#23,r3
+	add	#127,r6
+
+	! Align the exponent
+	and	r2,r4
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+        SHLL23 (r6)
+#else
+	shld	r3,r6
+#endif
+
+	or	r6,r0
+	rts
+	or	r4,r0
+
+! Shift right the number with rounding
+.L_shift_right:
+	shlr	r4
+	rotcr	r7
+
+	tst	r4,r3
+	add	#1,r6
+
+	bf	.L_shift_right
+	
+	tst	r7,r7
+	bt	.L_sh_rt_1
+
+	shll	r7
+	movt	r1
+
+	add	r1,r4
+
+	tst	r7,r7
+	bf	.L_sh_rt_1
+
+	! Halfway between two numbers.
+	! Round towards LSB = 0
+	shlr	r4
+	shll	r4
+
+.L_sh_rt_1:
+	mov	r4,r0
+
+	! Rounding may have misplaced MSB. Adjust.
+	and	r3,r0
+	cmp/eq	#0,r0
+
+	bf	.L_shift_right
+	bt	.L_pack_sf
+
+.L_return:
+	rts
+	mov	r4,r0
+
+	.align 2
+.L_set_24_bit:
+	.long 0x00800000
+
+.L_set_24_bits:
+	.long 0x00FFFFFF
+
+ENDFUNC (GLOBAL (floatunsisf))
Index: gcc/config/sh/IEEE-754/fixunsdfsi.S
===================================================================
--- gcc/config/sh/IEEE-754/fixunsdfsi.S	(revision 0)
+++ gcc/config/sh/IEEE-754/fixunsdfsi.S	(revision 0)
@@ -0,0 +1,176 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!Conversion of a double precision floating point number to unsigned integer
+!Author: Aanchal Khanna
+!
+!Entry:
+!r4,r5:operand
+!
+!Exit:
+!r0:result
+!
+!Note: the argument is passed in regs r4 and r5; the result is returned
+!in reg r0.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+	.text
+	.align 5
+	.global GLOBAL (fixunsdfsi)
+	FUNC (GLOBAL (fixunsdfsi))
+
+GLOBAL (fixunsdfsi):
+
+#ifdef  __LITTLE_ENDIAN__
+        mov     r4,r1
+        mov     r5,r4
+        mov     r1,r5
+#endif
+	mov.l	.L_p_inf,r2
+	mov     #-20,r1
+	
+	mov	r2,r7
+	mov.l   .L_1023,r3
+
+	and	r4,r2
+	shll    r4
+
+        movt    r6		! r6 contains the sign bit
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r1,r2           ! r2 contains the exponent
+#else
+        SHLR20 (r2)
+#endif
+	shlr    r4
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r1,r7
+#else
+        SHLR20 (r7)
+#endif
+	tst	r6,r6	
+	SL(bf,	.L_epil,
+	 mov	#0,r0)
+
+	cmp/hi	r2,r3		! if exp < 1023,return 0
+	mov.l	.L_high_mant,r1
+
+	SL(bt,	.L_epil,
+	 and	r4,r1)		! r1 contains high mantissa
+
+	cmp/eq	r2,r7		! chk if exp is invalid
+	mov.l	.L_1054,r7
+
+	bt	.L_inv_exp
+	mov	#11,r0
+	
+	cmp/hi	r7,r2		! If exp > 1054,return maxint
+	sub     r2,r7		!r7 contains the number of shifts
+
+	mov.l	.L_21bit,r2
+	bt	.L_ret_max
+
+	or	r2,r1
+	mov	r7,r3
+
+	shll8   r1
+	neg     r7,r7
+
+	shll2	r1
+
+        shll	r1
+	cmp/hi	r3,r0
+
+	SL(bt,	.L_lower_mant,
+	 mov	#21,r0)
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r7,r1
+#else
+.L_sh_loop:
+        tst	r7,r7
+        bt      .L_break
+        add     #1,r7
+        bra     .L_sh_loop
+        shlr    r1
+
+.L_break:
+#endif
+	rts
+	mov     r1,r0
+
+.L_lower_mant:
+	neg	r0,r0
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r0,r5
+#else
+        SHLR21 (r5)
+#endif
+	or	r5,r1		!pack lower and higher mantissas
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r7,r1
+#else
+.L_loop:
+        tst	r7,r7
+        bt      .L_break1
+        add     #1,r7
+        bra     .L_loop
+        shlr    r1
+
+.L_break1:
+#endif
+	mov	r1,r0
+.L_epil:
+	rts
+	nop
+
+.L_inv_exp:
+	cmp/hi	r0,r5
+	bt	.L_epil
+
+	cmp/hi	r0,r1		!compare high mantissa,r1
+	bt	.L_epil
+
+.L_ret_max:
+	mov.l   .L_maxint,r0
+
+	rts
+	nop
+
+	.align	2
+
+.L_maxint:
+	.long	0xffffffff
+.L_p_inf:
+	.long	0x7ff00000
+.L_high_mant:
+	.long	0x000fffff
+.L_1023:
+	.long	0x000003ff
+.L_1054:
+	.long	1054
+.L_21bit:
+	.long	0x00100000
+
+ENDFUNC (GLOBAL (fixunsdfsi))
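
[ A C model of the range handling above: negative inputs and |x| < 1
  give 0, and exponents past 1054 (= 1023 + 31) saturate to 0xffffffff,
  which is also the path a positive-signed NaN takes.  The name is
  made up. ]

#include <stdint.h>

static uint32_t fixuns_df_si (uint32_t hi, uint32_t lo)
{
  if (hi & 0x80000000u)
    return 0;				/* negative -> 0 */
  int32_t e = (int32_t)(hi >> 20) - 1023;	/* unbiased exponent */
  if (e < 0)
    return 0;				/* |x| < 1 */
  if (e > 31)
    return 0xffffffffu;			/* saturate; also Inf/NaN */
  uint64_t f = (((uint64_t)(hi & 0xfffff) << 32) | lo) | (1ULL << 52);
  return (uint32_t)(f >> (52 - e));	/* truncate toward zero */
}
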
Index: gcc/config/sh/IEEE-754/adddf3.S
===================================================================
--- gcc/config/sh/IEEE-754/adddf3.S	(revision 0)
+++ gcc/config/sh/IEEE-754/adddf3.S	(revision 0)
@@ -0,0 +1,786 @@
+/* Copyright (C) 2004, 2006, 2010 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Routine for adding two double numbers
+
+! Author: Rakesh Kumar
+! SH1 Support by Joern Rennecke
+! Sticky Bit handling : Joern Rennecke
+
+! Arguments: r4-r5, r6-r7
+! Result: r0-r1
+
+! The value in r4-r5 is referred to as op1
+! and that in r6-r7 is referred to as op2
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+	.text
+        .align 5
+	.global	GLOBAL (subdf3)
+	FUNC (GLOBAL (subdf3))
+        .global GLOBAL (adddf3)
+	FUNC (GLOBAL (adddf3))
+
+GLOBAL (subdf3):
+#ifdef __LITTLE_ENDIAN__
+	mov	r4,r1
+	mov	r6,r2
+
+	mov	r5,r4
+	mov	r7,r6
+
+	mov	r1,r5
+	mov	r2,r7
+#endif
+	mov.l	.L_sign,r2
+	bra	.L_adddf3_1
+	xor	r2,r6
+
+GLOBAL (adddf3):
+#ifdef	__LITTLE_ENDIAN__
+	mov	r4,r1
+	mov	r6,r2
+
+	mov	r5,r4
+	mov	r7,r6
+
+	mov	r1,r5
+	mov	r2,r7
+#endif
+	
+.L_adddf3_1:
+	mov.l	r8,@-r15
+	mov	r4,r1
+
+	mov.l 	.L_inf,r2
+	mov	r6,r3
+
+	mov.l	r9,@-r15
+	and	r2,r1		!Exponent of op1 in r1
+
+	mov.l	r10,@-r15
+	and	r2,r3		!Exponent of op2 in r3
+
+	! Check for Nan or Infinity
+	mov.l	.L_sign,r9
+	cmp/eq	r2,r1
+
+	mov	r9,r10
+	bt	.L_thread_inv_exp_op1
+
+	mov	r9,r0
+	cmp/eq	r2,r3
+! op1 has a valid exponent. We need not check it again.
+! Return op2 straight away.
+	and	r4,r9		!r9 has sign bit for op1
+	bt	.L_ret_op2
+
+	! Check for -ve zero
+	cmp/eq	r4,r0
+	and	r6,r10		!r10 has sign bit for op2
+
+	bt	.L_op1_nzero
+
+	cmp/eq	r6,r0
+	bt	.L_op2_nzero
+
+! Check for zero
+.L_non_zero:
+	tst	r4,r4
+	bt	.L_op1_zero
+
+	! op1 is not zero, check op2 for zero
+	tst	r6,r6
+	bt	.L_op2_zero
+
+! r1 and r3 have the masked-out exponents, r9 and r10 the signs
+.L_add:
+	mov.l	.L_high_mant,r8
+	mov	#-20,r2
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+	shld	r2,r1		! r1 now has exponent for op1 in its lower bits
+#else
+	SHLR20 (r1)
+#endif
+	and	r8,r6	! Higher bits of mantissa of op2
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+	shld	r2,r3		! r3 has exponent for op2 in its lower bits
+#else
+	SHLR20 (r3)
+#endif
+	and	r8,r4	! Higher bits of mantissa of op1
+
+	mov.l	.L_21bit,r8
+
+	tst	r1,r1
+	bt	.L_norm_op1
+
+	! Set the 21st bit.
+	or	r8,r4
+	tst	r3,r3
+
+	bt	.L_norm_op2
+	or	r8,r6
+
+! Check for negative mantissas. Make them positive by negation
+! r9 and r10 have signs of op1 and op2 respectively
+.L_neg_mant:
+	tst	r9,r9
+	bf	.L_neg_op1
+
+	tst	r10,r10
+	bf	.L_neg_op2
+
+.L_add_1:
+	cmp/ge	r1,r3
+
+	mov	r1,r0
+	bt	.L_op2_exp_greater
+
+	sub	r3,r0
+	! If exponent difference is greater than 54, the resultant exponent
+	! won't be changed. Return op1 straight away.
+	mov	#54,r2
+	cmp/gt	r2,r0
+
+	bt	.L_pack_op1
+
+	mov	r1,r3
+	clrt
+
+	cmp/eq	#0,r0
+	bt	.L_add_mant
+
+	! Shift left the first operand and apply rest of shifts to second operand.
+	mov	#0,r2
+	shll	r5
+
+	rotcl	r4
+
+	add	#-1,r3
+	dt	r0
+
+	bt	.L_add_mant
+	dt	r0
+
+	bt	LOCAL(got_guard)
+	dt	r0
+
+	bt	LOCAL(got_sticky)
+
+! Shift the mantissa part of op2 so that both exponents are equal
+.L_shfrac_op2:
+	shar	r6
+	or	r7,r2	! sticky bit
+
+	rotcr	r7
+	dt	r0
+
+	bf	.L_shfrac_op2
+
+	shlr	r2
+
+	subc	r2,r2	! spread sticky bit across r2
+LOCAL(got_sticky):
+	shar	r6
+
+	rotcr	r7
+
+	rotcr	r2
+LOCAL(got_guard):
+	shar	r6
+
+	rotcr	r7
+
+	rotcr	r2
+
+
+! Add the positive mantissas and check for overflow by checking the
+! MSB of the result.  In case of overflow, negate the result.
+.L_add_mant:
+	clrt
+	addc	r7,r5
+
+	mov	#0,r10	! Assume resultant to be positive
+	addc	r6,r4
+
+	cmp/pz	r4
+
+	bt	.L_mant_ptv
+	negc	r2,r2
+
+	negc	r5,r5
+
+	mov.l	.L_sign,r10 ! The assumption was wrong, result is negative
+	negc	r4,r4
+
+! 23rd bit in the high part of mantissa could be set.
+! In this case, right shift the mantissa.
+.L_mant_ptv:
+	mov.l	.L_23bit,r0
+
+	tst	r4,r0
+	bt	.L_mant_ptv_0
+
+	shlr	r4
+	rotcr	r5
+
+	add	#1,r3
+	bra	.L_mant_ptv_1
+	rotcr	r2
+
+.L_mant_ptv_0:
+	mov.l	.L_22bit,r0
+	tst	r4,r0
+
+	bt	.L_norm_mant
+
+.L_mant_ptv_1:
+	! The 22nd bit of the result mantissa is set.  Shift right the
+	! mantissa and add 1 to the exponent.
+	add	#1,r3
+	shlr	r4
+	rotcr	r5
+	! The mantissa is already normalized. We don't need to
+	! spend any effort. Branch to epilogue. 
+	bra	.L_epil
+	rotcr	r2
+
+! Normalize operands
+.L_norm_op1:
+	shll	r5
+
+	rotcl	r4
+	add	#-1,r1
+
+	tst	r4,r8
+	bt	.L_norm_op1
+
+	tst	r3,r3
+	SL(bf,	.L_neg_mant,
+	 add	#1,r1)
+
+.L_norm_op2:
+	shll	r7
+
+	rotcl	r6
+	add	#-1,r3
+
+	tst	r6,r8
+	bt	.L_norm_op2
+
+	bra	.L_neg_mant
+	add	#1,r3
+
+! Negate the mantissa of op1
+.L_neg_op1:
+	clrt
+	negc	r5,r5
+
+	negc	r4,r4
+	tst	r10,r10
+
+	bt	.L_add_1
+
+! Negate the mantissa of op2
+.L_neg_op2:
+	clrt
+	negc	r7,r7
+
+	bra	.L_add_1
+	negc	r6,r6
+
+! Thread the jump to .L_inv_exp_op1
+.L_thread_inv_exp_op1:
+	bra	.L_inv_exp_op1
+	nop
+
+.L_ret_op2:
+	mov.l	@r15+,r10
+#ifdef	__LITTLE_ENDIAN__
+	mov	r6,r1
+#else
+	mov	r6,r0
+#endif
+
+	mov.l	@r15+,r9
+#ifdef	__LITTLE_ENDIAN__
+	mov	r7,r0
+#else
+	mov	r7,r1
+#endif
+
+	rts
+	mov.l	@r15+,r8
+
+.L_op1_nzero:
+	tst	r5,r5
+	bt	.L_ret_op2
+
+	! op1 is not zero. Check op2 for negative zero
+	cmp/eq	r6,r0
+	bf	.L_non_zero	! both op1 and op2 are not -0
+
+.L_op2_nzero:
+	tst	r7,r7
+	bf	.L_non_zero
+
+	mov.l	@r15+,r10
+#ifdef	__LITTLE_ENDIAN__
+	mov	r4,r1
+#else
+	mov	r4,r0	! op2 is -0, return op1
+#endif
+
+	mov.l	@r15+,r9
+#ifdef	__LITTLE_ENDIAN__
+	mov	r5,r0
+#else
+	mov	r5,r1
+#endif
+
+	rts
+	mov.l	@r15+,r8
+
+! High bit of op1 is known to be zero.
+! Check low bit. r2 contains 0x00000000
+.L_op1_zero:
+	tst	r5,r5
+	bt	.L_ret_op2
+
+	! op1 is not zero. Check high bit of op2
+	tst	r6,r6
+	bf	.L_add	! both op1 and op2 are not zero
+
+! op1 is not zero. High bit of op2 is known to be zero.
+! Check low bit of op2. r2 contains 0x00000000
+.L_op2_zero:
+	tst	r7,r7
+	bf	.L_add
+
+	mov.l	@r15+,r10
+#ifdef	__LITTLE_ENDIAN__
+	mov	r4,r1
+#else
+	mov	r4,r0	! op2 is zero, return op1
+#endif
+
+	mov.l	@r15+,r9
+#ifdef	__LITTLE_ENDIAN__
+	mov	r5,r0
+#else
+	mov	r5,r1
+#endif
+
+	rts
+	mov.l	@r15+,r8
+
+! exp (op1) is smaller or equal to exp (op2)
+! The same operations are performed in .L_add; see there for comments.
+.L_op2_exp_greater:
+	mov	r3,r0
+	sub	r1,r0
+
+	mov	#54,r2
+	cmp/gt	r2,r0
+
+	bt	.L_pack_op2
+
+	cmp/eq	#0,r0
+	bt	.L_add_mant
+
+	mov	#0,r2
+	shll	r7
+	rotcl	r6
+	add	#-1,r0
+	add	#-1,r3
+
+	cmp/eq	#0,r0
+	bt	.L_add_mant
+.L_shfrac_op1:	
+        add     #-1,r0
+        shar    r4
+
+	rotcr	r5
+	rotcr	r2
+
+        cmp/eq  #0,r0
+        bf      .L_shfrac_op1
+
+	bra	.L_add_mant
+	nop
+
+! Return the value in op1
+.L_ret_op1:
+        mov.l   @r15+,r10
+#ifdef	__LITTLE_ENDIAN__
+	mov	r4,r1
+#else
+        mov     r4,r0
+#endif
+
+        mov.l   @r15+,r9
+#ifdef	__LITTLE_ENDIAN__
+	mov	r5,r0
+#else
+        mov     r5,r1
+#endif
+
+        rts
+        mov.l   @r15+,r8
+
+! r1 has exp, r9 has sign, r4 and r5 mantissa
+.L_pack_op1:
+	mov.l	.L_high_mant,r7
+	mov	r4,r0
+
+	tst	r9,r9
+	bt	.L_pack_op1_1
+
+	clrt
+	negc	r5,r5
+	negc	r0,r0
+
+.L_pack_op1_1:
+	and	r7,r0
+	mov	r1,r3
+
+	mov	#20,r2
+	mov	r5,r1
+
+	mov.l	@r15+,r10
+	or	r9,r0
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+	shld	r2,r3
+#else
+	SHLL20 (r3)
+#endif
+	mov.l	@r15+,r9
+
+	or	r3,r0
+#ifdef	__LITTLE_ENDIAN__
+	mov	r0,r2
+	mov	r1,r0
+	mov	r2,r1
+#endif
+	rts
+	mov.l	@r15+,r8
+
+!r3 has exp, r10 has sign, r6 and r7 mantissa
+.L_pack_op2:
+	mov.l	.L_high_mant,r9
+	mov	r6,r0
+
+	tst	r10,r10
+	bt	.L_pack_op2_1
+
+	clrt
+	negc	r7,r7
+	negc	r0,r0
+
+.L_pack_op2_1:
+	and	r9,r0
+	mov	r7,r1
+
+	mov	#20,r2
+	or	r10,r0
+
+	mov.l	@r15+,r10
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+	shld	r2,r3
+#else
+	SHLL20 (r3)
+#endif
+
+	mov.l	@r15+,r9
+
+	or	r3,r0
+#ifdef	__LITTLE_ENDIAN__
+	mov	r0,r2
+	mov	r1,r0
+	mov	r2,r1
+#endif
+	rts
+	mov.l	@r15+,r8
+
+! Normalize the mantissa by setting its 21 bit in high part
+.L_norm_mant:
+	mov.l	.L_21bit,r0
+
+	tst	r4,r0
+	bf	.L_epil
+
+	tst	r4,r4
+	bf	.L_shift_till_1
+
+	tst	r5,r5
+	bf	.L_shift_till_1
+
+	! Mantissa is zero, return 0
+	mov.l	@r15+,r10
+	mov	#0,r0
+
+	mov.l	@r15+,r9
+	mov.l	@r15+,r8
+
+	rts
+	mov	#0,r1
+
+! A loop that shifts until the 21st bit of the high part of the result
+! mantissa is set.  The mantissa is already known to be non-zero.
+.L_shift_till_1:
+	clrt
+	shll	r5
+
+	rotcl	r4
+	add	#-1,r3
+
+	tst	r4,r0
+	bt	.L_shift_till_1
+
+! Return the result. Mantissa is in r4-r5. Exponent is in r3
+! Sign bit in r10
+.L_epil:
+	cmp/pl	r3
+
+	bf	.L_denorm
+	mov.l	LOCAL(x7fffffff),r0
+
+	mov	r5,r1
+	shlr	r1
+
+	mov	#0,r1
+	addc	r0,r2
+
+! Check extra MSB here
+	mov.l	.L_22bit,r9
+	addc	r1,r5	! round to even
+
+	addc	r1,r4
+	tst	r9,r4
+
+	bf	.L_epil_1
+
+.L_epil_0:
+	mov.l	.L_21bit,r1
+
+	not	r1,r1
+	and	r1,r4
+
+	mov	r4,r0
+	or	r10,r0
+
+	mov.l	@r15+,r10
+	mov	#20,r2
+
+	mov.l	@r15+,r9
+	mov	r5,r1
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+	shld	r2,r3
+#else
+	SHLL20 (r3)
+#endif
+	or	r3,r0
+
+#ifdef	__LITTLE_ENDIAN__
+	mov	r0,r2
+	mov	r1,r0
+	mov	r2,r1
+#endif
+	rts
+	mov.l	@r15+,r8
+
+.L_epil_1:
+	shlr	r4
+	add	#1,r3
+	bra	.L_epil_0
+	rotcr	r5
+
+.L_denorm:
+	add	#-1,r3
+.L_denorm_1:
+	tst	r3,r3
+	bt	.L_denorm_2
+
+	shlr	r4
+	rotcr	r5
+
+	movt	r1
+	bra	.L_denorm_1
+	add	#1,r3
+
+.L_denorm_2:
+	clrt
+	mov	#0,r2
+	addc	r1,r5
+
+	addc	r2,r4
+	mov	r4,r0
+
+	or	r10,r0
+	mov.l	@r15+,r10
+
+	mov	r5,r1
+
+	mov.l	@r15+,r9
+#ifdef	__LITTLE_ENDIAN__
+	mov	r0,r2
+	mov	r1,r0
+	mov	r2,r1
+#endif
+	rts
+	mov.l	@r15+,r8
+
+! op1 is known to be positive infinity, and op2 is Inf. The sign
+! of op2 is not known. Return the appropriate value
+.L_op1_pinf_op2_inf:
+	mov.l	.L_sign,r0
+	tst	r6,r0
+
+	bt	.L_ret_op2_1
+
+	! op2 is negative infinity. Inf - Inf is being performed
+	mov.l	.L_inf,r0
+	mov.l	@r15+,r10
+	mov.l	@r15+,r9
+	mov.l	@r15+,r8
+	rts
+	mov	#-1,DBLRH	! return NaN.
+	
+.L_ret_op1_1:
+        mov.l   @r15+,r10
+#ifdef	__LITTLE_ENDIAN__
+	mov	r4,r1
+#else
+        mov     r4,r0
+#endif
+
+        mov.l   @r15+,r9
+#ifdef	__LITTLE_ENDIAN__
+	mov	r5,r0
+#else
+        mov     r5,r1
+#endif
+
+        rts
+        mov.l   @r15+,r8
+
+.L_ret_op2_1:
+	mov.l	@r15+,r10
+#ifdef	__LITTLE_ENDIAN__
+	mov	r6,r1
+#else
+	mov	r6,r0
+#endif
+
+	mov.l	@r15+,r9
+#ifdef	__LITTLE_ENDIAN__
+	mov	r7,r0
+#else
+	mov	r7,r1
+#endif
+
+	rts
+	mov.l	@r15+,r8
+
+! op1 is negative infinity. Check op2 for infinity or Nan
+.L_op1_ninf:
+	cmp/eq	r2,r3
+	bf	.L_ret_op1_1	! op2 is neither Nan nor Inf
+
+	mov.l	@r15+,r9
+	div0s	r4,r6		! different signs -> NaN
+	mov	r4,DBLRH
+	or	r6,DBLRH
+	mov.l	@r15+,r8
+	SL(bf, 0f,
+	 mov	r5,DBLRL)
+	mov	#-1,DBLRH	! return NaN.
+0:	rts
+	or	r7,DBLRL
+
+!r1 contains exponent for op1, r3 contains exponent for op2
+!r2 has .L_inf (+ve Inf)
+!op1 has invalid exponent. Either it contains Nan or Inf
+.L_inv_exp_op1:
+	! Check if a is Nan
+	cmp/pl	r5
+	bt	.L_ret_op1_1
+
+	mov.l	.L_high_mant,r0
+	and	r4,r0
+
+	cmp/pl	r0
+	bt	.L_ret_op1_1
+
+	! op1 is not Nan. It is infinity. Check the sign of it.
+	! If op2 is Nan, return op2
+	cmp/pz	r4
+
+	bf	.L_op1_ninf
+
+	! op2 is +ve infinity here
+	cmp/eq	r2,r3
+	bf	.L_ret_op1_1	! op2 is neither NaN nor Inf
+
+	! r2 is free now
+	mov.l	.L_high_mant,r0
+	tst	r6,r0		! op2 also has an invalid exponent; test its high mantissa
+
+	bf	.L_ret_op2_1	! branch if op2 is NaN
+
+	tst	r7,r7
+	bt	.L_op1_pinf_op2_inf	! op2 is Infinity, and op1 is +Infinity
+	!op2 is not infinity; it is NaN
+	bf	.L_ret_op2_1
+
+	.align 2	
+.L_high_mant:
+	.long 0x000FFFFF
+
+.L_21bits:
+	.long 0x001FFFFF
+
+.L_22bit:
+	.long 0x00200000
+
+.L_23bit:
+	.long 0x00400000
+
+.L_21bit:
+	.long 0x00100000
+
+.L_sign:
+	.long 0x80000000
+
+.L_inf:
+	.long 0x7ff00000
+
+LOCAL(x7fffffff): .long 0x7fffffff
+
+ENDFUNC (GLOBAL (subdf3))
+ENDFUNC (GLOBAL (adddf3))
Index: gcc/config/sh/IEEE-754/floatsisf.S
===================================================================
--- gcc/config/sh/IEEE-754/floatsisf.S	(revision 0)
+++ gcc/config/sh/IEEE-754/floatsisf.S	(revision 0)
@@ -0,0 +1,195 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Conversion of integer to floating point
+
+! Author: Rakesh Kumar
+
+! Argument: r4
+! Result: r0
+
+! r4 is referred to as op1
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
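+
+! A rough C model of the conversion (illustrative only; v is the int32_t
+! argument, and the code below also special-cases -2147483648):
+!
+!	uint32_t sign = 0, mag = v, exp = 127 + 23, lost = 0;
+!	if (v == 0) return 0;
+!	if (v < 0) { sign = 0x80000000; mag = 0 - mag; }
+!	while (mag < 0x00800000) { mag <<= 1; exp--; }	/* normalize left */
+!	while (mag > 0x00FFFFFF)			/* normalize right */
+!	  { lost = (lost >> 1) | ((mag & 1) << 31); mag >>= 1; exp++; }
+!	if (lost > 0x80000000 || (lost == 0x80000000 && (mag & 1)))
+!	  mag++;					/* nearest, ties to even */
+!	if (mag > 0x00FFFFFF) { mag >>= 1; exp++; }	/* carry from rounding */
+!	return sign | (exp << 23) | (mag & 0x007FFFFF);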
+
+        .text
+        .align 5
+        .global GLOBAL (floatsisf)
+        FUNC (GLOBAL (floatsisf))
+
+GLOBAL (floatsisf):
+	mov.l	.L_sign,r2
+	mov	#23,r6
+
+	! Check for zero
+	tst	r4,r4
+	mov.l	.L_24_bits,r7
+
+	! Extract sign
+	and	r4,r2
+	bt	.L_ret
+
+	! Negative?
+	mov.l	.L_imp_bit,r5
+	cmp/pl	r4
+
+	not	r7,r3
+	bf	.L_neg
+
+	! Decide the direction for shifting
+	cmp/gt	r7,r4
+	mov	r4,r0
+
+	and	r5,r0
+	bt	.L_shr_0
+
+	! Number may already be in normalized form
+	cmp/eq	#0,r0
+	bf	.L_pack
+
+! Shift the bits to the left. Adjust the exponent
+.L_shl:
+	shll	r4
+	mov	r4,r0
+
+	and	r5,r0
+	cmp/eq	#0,r0
+
+	SL(bt,	.L_shl,
+	 add	#-1,r6)
+
+! Pack the value in floating point format.
+! r6 has unbiased exponent, r4 has mantissa, r2 has sign
+.L_pack:
+	mov	#23,r3
+	not	r5,r5
+
+	mov	r2,r0
+	add	#127,r6
+
+	and	r5,r4
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+	SHLL23 (r6)
+#else
+	shld	r3,r6
+#endif
+
+	or	r6,r0
+	rts
+	or	r4,r0
+
+! Negate the number
+.L_neg:
+	! Take care of -2147483648.
+	mov	r4,r0
+	shll	r0
+	
+	cmp/eq	#0,r0
+	SL(bt,	.L_ret_min,
+	 neg	r4,r4)
+
+        cmp/gt  r7,r4
+        bt	.L_shr_0
+
+	mov	r4,r0
+	and	r5,r0
+
+	cmp/eq	#0,r0
+	bf	.L_pack
+	bt	.L_shl
+	
+.L_shr_0:
+	mov	#0,r1
+
+! Shift right the number with rounding
+.L_shr:
+	shlr	r4
+	movt	r7
+
+	tst	r7,r7
+
+	! Count number of ON bits shifted
+	bt	.L_shr_1
+	add	#1,r1
+
+.L_shr_1:
+	mov	r4,r0
+	add	#1,r6
+
+	and	r3,r0
+	cmp/eq	#0,r0
+
+	! Add MSB of shifted bits
+	bf	.L_shr
+	add	r7,r4
+
+	tst	r7,r7
+	bt	.L_pack
+
+.L_pack1:
+	mov	#1,r0
+	cmp/eq	r1,r0
+
+	bt	.L_rnd
+	mov	r4,r0
+
+	! Rounding may have misplaced MSB. Adjust.
+	and	r3,r0
+	cmp/eq	#0,r0
+
+	bf	.L_shr
+	bt	.L_pack
+
+! If only MSB of shifted bits is ON, we are halfway
+! between two numbers. Round towards even LSB of
+! resultant mantissa.
+.L_rnd:
+	shlr	r4
+	bra	.L_pack
+	shll	r4
+
+.L_ret:
+	rts
+	mov	r4,r0
+
+! Return value for -2147483648
+.L_ret_min:
+	mov.l	.L_min_val,r0
+	rts
+	nop
+
+	.align 2
+.L_sign:
+	.long 0x80000000
+
+.L_imp_bit:
+	.long 0x00800000
+
+.L_24_bits:
+	.long 0x00FFFFFF
+
+.L_nsign:
+	.long 0x7FFFFFFF
+
+.L_min_val:
+	.long 0xCF000000
+
+ENDFUNC (GLOBAL (floatsisf))
Index: gcc/config/sh/IEEE-754/muldf3.S
===================================================================
--- gcc/config/sh/IEEE-754/muldf3.S	(revision 0)
+++ gcc/config/sh/IEEE-754/muldf3.S	(revision 0)
@@ -0,0 +1,596 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!multiplication of two double precision floating point numbers
+!Author:Aanchal Khanna
+!SH1 Support / Simplifications: Joern Rennecke
+!
+!Entry:
+!r4,r5:operand 1
+!
+!r6,r7:operand 2
+!
+!Exit:
+!r0,r1:result
+!
+!Notes: argument 1 is passed in regs r4 and r5 and argument 2 is passed in regs
+!r6 and r7; the result is returned in regs r0 and r1. Operand 1 is referred to
+!as op1 and operand 2 as op2.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
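+
+! The 53x53-bit mantissa product below is assembled from four 32x32->64
+! partial products (the DMULUL/DMULUH macros).  A rough C model of the
+! accumulation (illustrative only; ah:al and bh:bl are the mantissas
+! with the implicit bit set in the high halves):
+!
+!	uint64_t ll = (uint64_t) al * bl;
+!	uint64_t lh = (uint64_t) al * bh;
+!	uint64_t hl = (uint64_t) ah * bl;
+!	uint64_t hh = (uint64_t) ah * bh;
+!	uint64_t mid = (ll >> 32) + (uint32_t) lh + (uint32_t) hl;
+!	uint64_t hi  = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
+!	/* 106-bit product: hi, then (uint32_t) mid, then (uint32_t) ll;
+!	   result exponent is e1 + e2 - 1023, renormalized if bit 105 is 1 */
+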
+	.text
+	.align	5	
+	.global	GLOBAL (muldf3)
+	FUNC (GLOBAL (muldf3))
+
+GLOBAL (muldf3):
+
+#ifdef  __LITTLE_ENDIAN__
+        mov     r4,r1
+        mov     r5,r4
+        mov     r1,r5
+
+        mov     r6,r1
+        mov     r7,r6
+        mov     r1,r7
+#endif
+	mov.l	.L_mask_sign,r0
+	mov	r4,r2
+
+	and	r0,r2		
+	mov	#0,r1
+
+	shll	r4
+	and	r6,r0		
+	
+	xor     r2,r0		!r0 contains the result's sign bit
+	shlr	r4
+
+	mov.l   .L_inf,r2
+	shll	r6
+
+	mov	r4,r3
+	shlr	r6
+	
+.L_chk_a_inv:
+	!chk if op1 is Inf/NaN
+	and	r2,r3
+	mov.l	r8,@-r15
+
+	cmp/eq	r3,r2
+	mov.l	.L_mask_high_mant,r8
+
+	mov	r2,r3
+	bf	.L_chk_b_inv
+
+	mov	r8,r3
+	and	r4,r8
+
+	cmp/hi  r1,r8		
+	bt	.L_return_a	!op1 NaN, return op1
+
+	cmp/hi  r1,r5	
+	mov	r2,r8
+
+	bt      .L_return_a	!op1 NaN, return op1
+	and	r6,r8
+
+	cmp/eq	r8,r2		
+	and	r6,r3
+
+	bt      .L_b_inv
+	cmp/eq	r1,r6		
+
+	bf	.L_return_a	!op1 Inf, op2 normal number: return op1
+	cmp/eq	r1,r7
+
+	bf	.L_return_a	!op1 Inf, op2 normal number: return op1
+	mov.l   @r15+,r8	
+
+	rts
+	mov	#-1,DBLRH	!op1=Inf, op2=0, return NaN
+
+.L_b_inv:
+	!op2 is NaN/Inf
+	cmp/hi	r1,r7
+	mov	r1,r2
+
+	mov	r5,r1
+	bt	.L_return_b	!op2=NaN,return op2
+
+	cmp/hi	r2,r6
+	or	r4,r0
+
+	bt	.L_return_b	!op2=NaN,return op2
+	mov.l   @r15+,r8
+
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+
+	rts			!op1=Inf,op2=Inf,return Inf with sign
+	nop
+
+.L_chk_b_inv:
+	!Chk if op2 is NaN/Inf
+	and	r6,r2
+	cmp/eq	r3,r2
+
+	bf	.L_chk_a_for_zero
+	and	r6,r8
+
+	cmp/hi	r1,r8
+	bt	.L_return_b	 !op2=NaN,return op2
+
+	cmp/hi	r1,r7
+	bt	.L_return_b	 !op2=NaN,return op2
+
+	cmp/eq	r5,r1
+	bf      .L_return_b	 !op1=normal number,op2=Inf,return Inf
+
+	mov	r7,r1
+	cmp/eq	r4,r1
+
+	bf	.L_return_b	!op1=normal number, op2=Inf, return Inf
+	mov.l   @r15+,r8
+
+	rts
+	mov	#-1,DBLRH	!op1=0,op2=Inf,return NaN
+
+.L_return_a:
+	mov	r5,r1
+	or	r4,r0
+
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+
+	rts
+	mov.l   @r15+,r8
+
+.L_return_b:
+	mov	r7,r1
+	or	r6,r0	
+	
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+
+	rts
+	mov.l	@r15+,r8
+	
+.L_chk_a_for_zero:
+	!Chk if op1 is zero
+	cmp/eq	r1,r4
+	bf	.L_chk_b_for_zero
+	
+	cmp/eq	r1,r5
+	bf	.L_chk_b_for_zero
+
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+	rts
+	mov.l	@r15+,r8
+
+.L_chk_b_for_zero:
+	!op1=0,chk if op2 is zero
+        cmp/eq  r1,r6
+        mov	r1,r3
+	
+	mov.l   .L_inf,r1
+	bf      .L_normal_nos
+
+        cmp/eq  r3,r7
+        bf      .L_normal_nos
+
+	mov	r3,r1
+	mov.l   @r15+,r8
+
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+	rts
+	nop
+
+.L_normal_nos:
+	!op1 and op2 are normal nos
+	mov.l	r9,@-r15
+	mov	r4,r3
+
+	mov     #-20,r9	
+	and	r1,r3	
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r9,r2
+#else
+        SHLR20 (r2)
+#endif
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r9,r3
+#else
+        SHLR20 (r3)
+#endif
+	cmp/pl	r3
+
+	bf	.L_norm_a	!normalize op1
+.L_chk_b:	
+	cmp/pl	r2
+	bf	.L_norm_b	!normalize op2
+
+.L_mul1:
+	add	r3,r2
+	mov.l  .L_1023,r1
+	
+	!resultant exponent in r2
+	add     r1,r2
+	mov.l   .L_2047,r1	
+
+	!Chk the exponent for overflow
+	cmp/ge	r1,r2
+	and     r8,r4
+
+	bt	.L_return_inf
+	mov.l	.L_imp_bit,r1
+	
+	or	r1,r4		
+	and	r8,r6
+
+	or	r1,r6
+	clrt
+
+	!multiplying the mantissas
+	DMULU_SAVE
+	DMULUL	(r7,r5,r1) 	!bits 0-31 of product 	
+
+	DMULUH	(r3)
+	
+	DMULUL	(r4,r7,r8)
+
+	addc	r3,r8
+
+	DMULUH	(r3)
+
+	movt	r9
+	clrt
+
+	DMULUL	(r5,r6,r7)
+
+	addc	r7,r8		!bits 63-32 of product
+
+	movt	r7
+	add	r7,r9
+
+	DMULUH	(r7)
+
+	add	r7,r3
+
+	add	r9,r3
+	clrt
+
+	DMULUL	(r4,r6,r7)
+
+	addc	r7,r3		!bits 64-95 of product
+
+	DMULUH	(r7)
+	DMULU_RESTORE
+	
+	mov	#0,r5
+	addc	r5,r7		!bits 96-105 of product
+
+	cmp/eq	r5,r1
+	mov     #1,r4
+
+	bt	.L_skip
+	or	r4,r8
+.L_skip:
+	mov.l   .L_106_bit,r4
+	mov	r8,r9
+
+.L_chk_extra_msb:
+	!chk if extra MSB is generated
+	and     r7,r4
+	cmp/eq	r5,r4
+
+	mov     #12,r4
+	SL(bf,	.L_shift_rt_by_1,
+	 mov     #31,r5)
+	
+.L_pack_mantissa:
+	!scale the mantissa to 53 bits
+	mov	#-19,r6
+	mov.l	.L_mask_high_mant,r5
+
+        SHLRN (19, r6, r8)
+
+	and	r3,r5
+
+	shlr	r8
+	movt	r1
+
+        SHLLN (12, r4, r5)
+
+	add	#-1,r6
+
+	or	r5,r8		!lower bits of resulting mantissa
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r6,r3
+#else
+        SHLR20 (r3)
+#endif
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r4,r7
+#else
+        SHLL12 (r7)
+#endif
+	clrt
+
+	or	r7,r3		!higher bits of resulting mantissa
+	mov     #0,r7
+
+	!chk the exponent for underflow
+	cmp/ge	r2,r7
+	bt	.L_underflow
+
+	addc    r1,r8           !rounding
+	mov	r8,r1
+
+	addc	r7,r3		!rounding
+	mov.l	.L_mask_22_bit,r5
+
+	and	r3,r5
+	!chk if extra msb is generated after rounding
+	cmp/eq	r7,r5
+
+	mov.l	.L_mask_high_mant,r8
+	bt	.L_pack_result
+
+	add	#1,r2
+	mov.l	.L_2047,r6
+
+	cmp/ge	r6,r2
+
+	bt	.L_return_inf
+	shlr	r3
+
+	rotcr	r1
+
+.L_pack_result:
+	!pack the result, r2=exponent, r3=higher mantissa, r1=lower mantissa
+	!r0=sign bit
+	mov	#20,r6
+	and	r8,r3
+	
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r6,r2
+#else
+        SHLL20 (r2)
+#endif
+	or	r3,r0
+	
+	or      r2,r0
+	mov.l   @r15+,r9
+
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+	rts
+	mov.l   @r15+,r8
+
+.L_norm_a:
+	!normalize op1
+	shll	r5
+	mov.l	.L_imp_bit,r1
+
+	rotcl	r4
+	add	#-1,r3
+
+	tst	r1,r4
+	bt	.L_norm_a
+
+	bra	.L_chk_b
+	add	#1,r3
+
+.L_norm_b:
+	!normalize op2
+        shll    r7
+        mov.l   .L_imp_bit,r1
+
+        rotcl   r6
+        add     #-1,r2
+
+        tst     r1,r6
+        bt      .L_norm_b
+
+        bra     .L_mul1
+        add     #1,r2
+
+.L_shift_rt_by_1:
+	!adjust the extra msb
+
+	add     #1,r2           !add 1 to exponent
+	mov.l	.L_2047,r6
+
+	cmp/ge	r6,r2
+	mov	#20,r6
+
+	bt	.L_return_inf
+	shlr	r7		!r7 contains bit 96-105 of product
+
+	rotcr	r3		!r3 contains bit 64-95 of product
+
+	rotcr	r8		!r8 contains bit 32-63 of product
+	bra	.L_pack_mantissa
+
+	rotcr	r1		!r1 contains bit 31-0 of product
+
+.L_return_inf:
+	!return Inf
+	mov.l	.L_inf,r2
+	mov     #0,r1
+
+	or	r2,r0
+	mov.l   @r15+,r9
+
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+	rts
+	mov.l   @r15+,r8
+	
+.L_underflow:
+	!check if the result needs to be denormalized
+	mov	#-53,r1
+	add	#1,r2
+
+	cmp/gt	r2,r1
+	mov	#32,r4
+
+	add	#-2,r2
+	bt	.L_return_zero
+
+	add	r2,r4
+	mov	r7,r1
+	
+	cmp/ge	r7,r4
+	mov	r2,r6
+
+	mov	#-54,r2
+	bt	.L_denorm
+
+	mov	#-32,r6
+	
+.L_denorm:
+	!denormalize the result
+	shlr	r8
+	rotcr	r1	
+
+	shll	r8
+	add	#1,r6
+
+	shlr	r3
+	rotcr	r8
+
+	cmp/eq	r7,r6
+	bf	.L_denorm
+
+	mov	r4,r6
+	cmp/eq	r2,r4
+
+	bt	.L_break
+	mov	r7,r5
+
+	cmp/gt	r6,r7
+	bf	.L_break
+
+	mov	r2,r4
+	mov	r1,r5
+
+	mov	r7,r1
+	bt	.L_denorm
+
+.L_break:
+	mov	#0,r2
+
+	cmp/gt	r1,r2
+
+	addc	r2,r8
+	mov.l	.L_comp_1,r4
+	
+	addc	r7,r3
+	or	r3,r0
+
+	cmp/eq	r9,r7
+	bf	.L_return
+
+	cmp/eq	r7,r5
+	mov.l	.L_mask_sign,r6
+
+	bf	.L_return
+	cmp/eq	r1,r6
+	
+	bf	.L_return
+	and	r4,r8
+
+.L_return:
+	mov.l	@r15+,r9
+	mov	r8,r1
+
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+	rts
+	mov.l   @r15+,r8
+
+.L_return_zero:
+	mov.l	@r15+,r9
+	mov	r7,r1
+
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+
+	rts
+	mov.l	@r15+,r8
+
+	.align	2
+
+.L_mask_high_mant:
+	.long	0x000fffff
+.L_inf:
+	.long	0x7ff00000	
+.L_mask_sign:
+	.long	0x80000000
+.L_1023:
+	.long	-1023
+.L_2047:
+	.long	2047
+.L_imp_bit:
+	.long	0x00100000
+.L_mask_22_bit:
+	.long	0x00200000
+.L_106_bit:
+	.long	0x00000200
+.L_comp_1:
+	.long	0xfffffffe
+
+ENDFUNC (GLOBAL (muldf3))
Index: gcc/config/sh/IEEE-754/fixdfsi.S
===================================================================
--- gcc/config/sh/IEEE-754/fixdfsi.S	(revision 0)
+++ gcc/config/sh/IEEE-754/fixdfsi.S	(revision 0)
@@ -0,0 +1,195 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!conversion of double precision floating point number to signed integer
+!Author:Aanchal Khanna
+!
+!Entry:
+!r4,r5:operand
+!
+!Exit:
+!r0:result
+!
+!Note:argument is passed in regs r4 and r5, the result is returned in
+!reg r0.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
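+
+! A rough C model of the truncating conversion (illustrative only; the
+! NaN path follows .L_inv_exp in the code below):
+!
+!	int32_t fix (uint32_t hi, uint32_t lo)	/* raw IEEE-754 halves */
+!	{
+!	  int exp = (hi >> 20) & 0x7FF;
+!	  if (exp < 1023) return 0;		/* |x| < 1 */
+!	  if (exp > 1053) return (hi >> 31) ? INT_MIN : INT_MAX;
+!	  uint64_t mant = (((uint64_t) (hi & 0xFFFFF) | 0x100000) << 32) | lo;
+!	  uint32_t r = (uint32_t) (mant >> (1075 - exp));	/* truncate */
+!	  return (hi >> 31) ? -(int32_t) r : (int32_t) r;
+!	}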
+
+	.text
+	.align 	5
+	.global GLOBAL (fixdfsi)
+	FUNC (GLOBAL (fixdfsi))
+
+GLOBAL (fixdfsi):
+
+#ifdef  __LITTLE_ENDIAN__
+        mov     r4,r1
+        mov     r5,r4
+        mov     r1,r5
+
+#endif
+	mov.l	.L_p_inf,r2
+	mov     #-20,r1
+	
+	mov	r2,r7
+	mov.l   .L_1023,r3
+
+	and	r4,r2
+	shll    r4
+        
+	movt    r6		! r6 contains the sign bit
+	
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r1,r2		! r2 contains the exponent
+#else
+        SHLR20 (r2)
+#endif
+	shlr	r4
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r1,r7
+#else
+        SHLR20 (r7)
+#endif
+	cmp/hi	r2,r3		! if exp < 1023,return 0
+	mov.l	.L_mask_high_mant,r1
+
+	SL(bt,	.L_epil,
+	 mov	#0,r0)
+	and	r4,r1		! r1 contains high mantissa
+
+	cmp/eq	r2,r7		! chk if exp is invalid
+	mov.l	.L_1053,r7
+
+	bt	.L_inv_exp
+	mov	#11,r0
+	
+	cmp/hi	r7,r2		! If exp > 1053,return maxint
+	sub     r2,r7
+
+	mov.l	.L_21bit,r2
+	SL(bt,	.L_ret_max,
+	 add	#1,r7)		! r7 contains the number of shifts
+
+	or	r2,r1
+	mov	r7,r3
+	shll8   r1
+
+	neg     r7,r7
+	shll2	r1
+
+        shll	r1
+	cmp/hi	r3,r0
+
+	!chk if the result can be made only from higher mantissa
+	SL(bt,	.L_lower_mantissa,
+	 mov	#21,r0)
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r7,r1
+#else
+.L_loop:
+        tst	r7,r7
+        bt      .L_break1
+        add     #1,r7
+        bra     .L_loop
+        shlr    r1
+
+.L_break1:
+#endif
+	tst	r6,r6
+	SL(bt,	.L_epil,
+	 mov	r1,r0)
+
+	rts
+	neg	r0,r0
+
+.L_lower_mantissa:
+	!result is made from lower mantissa also
+	neg	r0,r0
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r0,r5
+#else
+        SHLR21 (r5)
+#endif
+
+	or	r5,r1		!pack lower and higher mantissas
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r7,r1
+#else
+.L_sh_loop:
+	tst	r7,r7
+	bt	.L_break
+	add	#1,r7
+	bra	.L_sh_loop
+	shlr	r1
+
+.L_break:
+#endif
+	mov	r1,r0
+	bra	.L_chk_sign
+	nop
+
+.L_epil:
+	rts
+	nop
+
+.L_inv_exp:
+	cmp/hi	r0,r5
+	bt	.L_epil
+
+	cmp/hi	r0,r1		!compare high mantissa,r1
+	bt	.L_epil
+
+.L_ret_max:
+	mov.l   .L_maxint,r0
+	tst	r6,r6
+	bt	.L_epil
+
+	rts
+	add	#1,r0
+
+.L_chk_sign:
+	tst	r6,r6		!sign bit is set, number is -ve
+	bt	.L_epil
+	
+	rts
+	neg	r0,r0
+
+	.align	2
+
+.L_maxint:
+	.long	0x7fffffff
+.L_p_inf:
+	.long	0x7ff00000
+.L_mask_high_mant:
+	.long	0x000fffff
+.L_1023:
+	.long	0x000003ff
+.L_1053:
+	.long	1053
+.L_21bit:
+	.long	0x00100000
+
+ENDFUNC (GLOBAL (fixdfsi))
Index: gcc/config/sh/IEEE-754/divsf3.S
===================================================================
--- gcc/config/sh/IEEE-754/divsf3.S	(revision 0)
+++ gcc/config/sh/IEEE-754/divsf3.S	(revision 0)
@@ -0,0 +1,393 @@
+/* Copyright (C) 2004, 2006, 2010 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!divides two single precision floating point numbers
+
+! Author: Aanchal Khanna
+
+! Arguments: Dividend is in r4, divisor in r5
+! Result: r0
+
+! r4 and r5 are referred to as op1 and op2 respectively.
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
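+
+! The quotient loop below is classic restoring division, producing one
+! quotient bit per iteration, MSB first.  A rough C model (illustrative
+! only; m1 and m2 are the 24-bit mantissas with the implicit bit set,
+! m1 doubled beforehand if m1 < m2, and rounding follows .L_round1/2):
+!
+!	uint32_t q = 0, rem = m1, bit;
+!	for (bit = 1 << 23; bit != 0; bit >>= 1)
+!	  {
+!	    if (rem >= m2) { q |= bit; rem -= m2; }
+!	    rem <<= 1;
+!	  }
+!	/* comparing rem with m2 now decides the rounding of the last bit */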
+
+	.text
+	.align	5
+	.global	GLOBAL (divsf3)
+	FUNC (GLOBAL (divsf3))
+
+GLOBAL (divsf3):
+	mov.l	.L_mask_sign,r1
+	mov	r4,r3
+
+	xor	r5,r3
+	shll	r4
+
+	shlr	r4
+	mov.l	.L_inf,r2
+
+	and	r3,r1		!r1=resultant sign
+	mov	r4,r6
+
+	shll	r5
+	mov	#0,r0		
+
+	shlr	r5
+	and	r2,r6
+
+	cmp/eq	r2,r6
+	mov	r5,r7
+
+	and     r2,r7
+	bt	.L_op1_inv
+
+	cmp/eq	r2,r7
+	mov	#-23,r3
+
+	bt	.L_op2_inv
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+	SHLR23 (r6)
+	SHLR23 (r7)
+#else
+	shld	r3,r6
+	shld	r3,r7
+#endif
+
+	cmp/eq	r0,r4
+
+	bt	.L_op1_zero		!dividend=0
+	cmp/eq	r0,r6
+
+	mov.l   .L_imp_bit,r3
+	bt	.L_norm_op1		!normalize dividend
+.L_chk_op2:
+	cmp/eq	r0,r5
+	bt	.L_op2_zero		!divisor=0
+
+	cmp/eq	r0,r7
+	bt	.L_norm_op2		!normalize divisor
+
+.L_div1:
+	sub	r7,r6
+	add	#127,r6			!r6=resultant exponent
+
+	mov     r3,r7
+	mov.l	.L_mask_mant,r3
+
+	and	r3,r4
+	!chk exponent for overflow
+        mov.l   .L_255,r2
+
+	and     r3,r5
+	or	r7,r4
+
+	cmp/ge  r2,r6
+	or	r7,r5
+
+	bt	.L_return_inf
+	mov	r0,r2
+
+	cmp/eq  r4,r5
+	bf      .L_den_one
+
+	cmp/ge	r6,r0
+	!numerator=denominator, quotient=1, remainder=0
+	mov	r7,r2			
+
+	mov     r0,r4
+	!chk exponent for underflow
+	bt	.L_underflow
+        bra     .L_pack
+        nop
+
+.L_den_one:
+	!denominator=1, result=numerator
+
+	cmp/eq  r7,r5
+        bf      .L_divide
+
+	!chk exponent for underflow
+	cmp/ge  r6,r0
+        mov    r4,r2           
+
+        SL(bt,    .L_underflow,
+	 mov	r0,r4)
+	bra     .L_pack
+	nop
+
+.L_divide:
+	!dividing the mantissas r4<-dividend, r5<-divisor
+
+	cmp/hi	r4,r5
+	bf	.L_loop
+
+	shll	r4		! if mantissa(op1)< mantissa(op2)
+	add     #-1,r6		! shift left the numerator and decrease the exponent.
+
+.L_loop:
+	!division loop
+
+	cmp/ge	r5,r4
+	bf	.L_skip
+
+	or	r7,r2
+	sub	r5,r4
+
+.L_skip:
+	shlr	r7
+	shll	r4
+
+	cmp/eq	r0,r7
+	bf	.L_loop
+
+	!chk the exponent for underflow
+	cmp/ge  r6,r0
+	bt      .L_underflow
+	
+	!apply rounding
+	cmp/gt	r5,r4
+	bt	.L_round1
+
+	cmp/eq	r4,r5
+	bt	.L_round2
+
+.L_pack:
+	!pack the result, r1=sign, r2=quotient, r6=exponent
+
+	mov    #23,r4
+	and     r3,r2
+
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+	SHLL23 (r6)
+#else
+	shld	r4,r6
+#endif
+	or	r2,r1
+
+	or	r6,r1
+	mov	r1,r0	
+	
+	rts
+	nop
+
+.L_round1:
+	!Apply proper rounding
+
+        bra     .L_pack
+        add     #1,r2
+
+.L_round2:
+	!Apply proper rounding
+
+        mov.l   .L_comp_1,r5
+        bra     .L_pack
+        and     r5,r2
+
+.L_op1_inv:
+	!chk if op1 is Inf or NaN
+
+	mov.l	.L_mask_mant,r3
+	mov	r4,r6
+
+	and	r3,r6
+	cmp/hi	r0,r6
+
+	bt	.L_ret_NaN ! op1 is NaN, return NaN.
+	cmp/eq	r2,r7
+
+	SL(bf,	.L_return,
+	 mov	r4,r0) ! Inf/finite, return Inf
+
+	! Inf/Inf or Inf/NaN, return NaN
+.L_ret_NaN:
+	rts
+	mov	#-1,r0
+	
+.L_op2_inv:
+	!chk if op2 is Inf or NaN
+
+	mov.l	.L_mask_mant,r3
+	mov	r5,r7
+	
+	and	r3,r7
+	cmp/hi	r0,r7
+
+	bt	.L_ret_op2
+	mov	r1,r0
+	
+	rts
+	nop
+
+.L_op1_zero:
+	!op1 is zero. If op2 is zero, return NaN, else return zero
+
+	cmp/eq	r0,r5
+
+	bf	.L_return
+
+	rts
+	mov	#-1,r0
+
+.L_op2_zero:
+	!op2 is zero, return Inf
+
+	rts
+	or	r2,r0
+
+.L_return_inf:
+	mov.l	.L_inf,r0
+	
+	rts
+	or	r1,r0
+
+.L_norm_op1:
+	!normalize dividend
+
+	shll	r4
+	tst	r2,r4
+	
+	add     #-1,r6
+	bt	.L_norm_op1
+
+	bra	.L_chk_op2
+	add	#1,r6
+
+.L_norm_op2:
+	!normalize divisor
+
+	shll	r5
+	tst	r2,r5
+	
+	add	#-1,r7
+	bt	.L_norm_op2
+
+	bra	.L_div1
+	add	#1,r7
+
+.L_underflow:
+	!denormalize the result
+
+	add	#1,r6
+	mov	#-24,r7
+
+	cmp/gt	r6,r7
+	mov	r2,r5
+
+	bt	.L_return_zero
+	add     #-1,r6
+
+	mov	#32,r3
+	neg	r6,r7
+
+	add	#1,r7
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+	shld	r6,r2
+#else
+	cmp/ge	r0,r6
+	bf	.L_mov_right
+
+.L_mov_left:
+	cmp/eq	r0,r6
+	bt	.L_out
+
+	shll	r2
+	bra	.L_mov_left
+	add	#-1,r6
+
+.L_mov_right:
+	cmp/eq	r0,r6
+	bt	.L_out
+
+	add	#1,r6
+	bra	.L_mov_right
+	shlr	r2
+	
+.L_out:
+#endif
+	sub	r7,r3
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+	shld	r3,r5
+#else
+	cmp/ge	r0,r3
+	bf	.L_mov_right_1
+
+.L_mov_left_1:
+	shll	r5
+	add	#-1,r3
+
+	cmp/eq	r0,r3
+	bf	.L_mov_left_1
+
+	bt	.L_out_1
+
+.L_mov_right_1:
+	cmp/eq	r0,r3
+	bt	.L_out_1
+
+	add	#1,r3
+	bra	.L_mov_right_1
+	shlr	r5
+
+.L_out_1:
+#endif
+	shlr	r2
+	addc	r0,r2
+
+	cmp/eq	r4,r0		!r4 contains the remainder
+	mov      r2,r0
+
+	mov.l	.L_mask_sign,r7
+	bf	.L_return
+
+	mov.l   .L_comp_1,r2
+	cmp/eq	r7,r5
+
+	bf	.L_return
+	and	r2,r0
+
+.L_return:
+.L_return_zero:
+	rts
+	or     r1,r0
+	
+.L_ret_op2:
+	rts
+	or	r5,r0
+
+
+	.align	2
+.L_inf:
+	.long	0x7f800000
+.L_mask_sign:
+	.long	0x80000000
+.L_mask_mant:
+	.long	0x007fffff
+.L_imp_bit:
+	.long	0x00800000
+.L_comp_1:
+	.long	0xfffffffe
+.L_255:
+	.long	255
+
+ENDFUNC (GLOBAL (divsf3))
Index: gcc/config/sh/IEEE-754/fixunssfsi.S
===================================================================
--- gcc/config/sh/IEEE-754/fixunssfsi.S	(revision 0)
+++ gcc/config/sh/IEEE-754/fixunssfsi.S	(revision 0)
@@ -0,0 +1,150 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Conversion from floating point to unsigned integer
+
+! Author: Rakesh Kumar
+
+! Argument: r4 (in floating point format)
+! Result: r0
+
+! For negative floating point numbers, it returns zero
+
+! The argument is referred to as op1
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
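+
+! A rough C model (illustrative only; the code below returns 0 for NaN,
+! which this sketch folds into the too-large case):
+!
+!	uint32_t fixuns (uint32_t f)	/* raw IEEE-754 bits */
+!	{
+!	  int exp = (f >> 23) & 0xFF;
+!	  if (f & 0x80000000) return 0;		/* negative: return 0 */
+!	  if (exp < 127) return 0;		/* |x| < 1 */
+!	  if (exp > 158) return 0xFFFFFFFF;	/* too large, or Inf */
+!	  uint32_t mant = ((f & 0x007FFFFF) | 0x00800000) << 8;
+!	  return mant >> (158 - exp);		/* truncate */
+!	}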
+
+	.text
+	.align 5
+	.global	GLOBAL (fixunssfsi)
+	FUNC (GLOBAL (fixunssfsi))
+
+GLOBAL (fixunssfsi):
+	mov.l	.L_sign,r0
+	mov	r4,r2
+
+	! Check for NaN
+	mov.l	.L_inf,r1
+	and	r4,r0
+
+	mov.l	.L_mask_sign,r7
+	mov	#127,r5
+
+	! Remove sign bit
+	cmp/eq	#0,r0
+	and	r7,r2
+
+	! If number is negative, return 0
+	! LIBGCC deviates from the standard in this regard.
+	mov	r4,r3
+	SL(bf,	.L_epil,
+	 mov	#0,r0)
+
+	mov.l	.L_frac,r6
+	cmp/gt	r1,r2
+
+	shll	r2
+	SL1(bt,	.L_epil,
+	 shlr16	r2)
+
+	shlr8	r2	! r2 has exponent
+	mov.l	.L_24bit,r1
+
+	and	r6,r3	! r3 has fraction
+	cmp/gt	r2,r5
+
+	! If exponent is less than 127, return 0
+	or	r1,r3
+	bt	.L_epil
+
+	! Process only if exponent is not greater than 158
+	mov.l	.L_158,r1
+	shll8	r3
+
+	cmp/gt	r1,r2
+	sub	r2,r1
+
+	neg	r1,r1
+	bt	.L_ret_max
+
+! Shift the mantissa by the exponent difference from 158
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+	shld	r1,r3
+#else
+	cmp/gt	r0,r1
+	bt	.L_mov_left
+
+.L_mov_right:
+	cmp/eq	r1,r0
+	bt	.L_ret
+
+	add	#1,r1
+	bra	.L_mov_right
+	shlr	r3
+
+.L_mov_left:
+	add	#-1,r1
+	
+	shll	r3
+	cmp/eq	r1,r0
+
+	bf	.L_mov_left
+
+.L_ret:	
+#endif
+	rts
+	mov	r3,r0
+
+! r0 already has appropriate value
+.L_epil:
+	rts
+	nop
+
+! Return the maximum unsigned integer value
+.L_ret_max:
+	mov.l	.L_max,r3
+
+	rts
+	mov	r3,r0
+
+	.align 2
+.L_inf:
+	.long 0x7F800000
+
+.L_158:
+	.long 158
+
+.L_max:
+	.long 0xFFFFFFFF
+
+.L_frac:
+	.long 0x007FFFFF
+
+.L_sign:
+	.long 0x80000000
+
+.L_24bit:
+	.long 0x00800000
+
+.L_mask_sign:
+	.long 0x7FFFFFFF
+
+ENDFUNC (GLOBAL (fixunssfsi))
Index: gcc/config/sh/IEEE-754/floatunssidf.S
===================================================================
--- gcc/config/sh/IEEE-754/floatunssidf.S	(revision 0)
+++ gcc/config/sh/IEEE-754/floatunssidf.S	(revision 0)
@@ -0,0 +1,71 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!conversion of unsigned integer to double precision floating point number
+!Author:Rakesh Kumar
+!Rewritten for SH1 support: Joern Rennecke
+!
+!Entry:
+!r4:operand
+!
+!Exit:
+!r0,r1:result
+!
+!Note:argument is passed in reg r4 and the result is returned in
+!regs r0 and r1.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
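+
+! Every 32-bit unsigned integer is exactly representable, so no rounding
+! is needed.  A rough C model (illustrative only; the code below builds
+! the two halves directly in DBLRH:DBLRL):
+!
+!	uint64_t toud (uint32_t x)
+!	{
+!	  uint32_t exp = 1023 + 32;	/* as if the leading 1 were bit 32 */
+!	  if (x == 0) return 0;
+!	  while (!(x & 0x80000000)) { x <<= 1; exp--; }
+!	  x <<= 1; exp--;	/* the leading 1 becomes the implicit bit */
+!	  return ((uint64_t) exp << 52) | ((uint64_t) x << 20);
+!	}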
+
+        .text
+        .align 5
+        .global GLOBAL (floatunsidf)
+	FUNC (GLOBAL (floatunsidf))
+
+GLOBAL (floatunsidf):
+	mov.w	LOCAL(x41f0),DBLRH	! bias + 32
+	tst	r4,r4			! check for zero
+	bt	.L_ret_zero
+.L_loop:
+	shll	r4	
+	SL(bf,	.L_loop,
+	 add	#-16,DBLRH)
+
+	mov	r4,DBLRL
+
+        SHLL20 (DBLRL)
+
+        shll16	DBLRH ! put exponent in proper place
+
+        SHLR12 (r4)
+
+	rts
+	or	r4,DBLRH
+	
+.L_ret_zero:
+	mov	#0,r1
+	rts
+	mov	#0,r0
+
+LOCAL(x41f0):	.word	0x41f0
+	.align 2
+
+ENDFUNC (GLOBAL (floatunsidf))
Index: gcc/config/sh/IEEE-754/addsf3.S
===================================================================
--- gcc/config/sh/IEEE-754/addsf3.S	(revision 0)
+++ gcc/config/sh/IEEE-754/addsf3.S	(revision 0)
@@ -0,0 +1,530 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Add floating point numbers in r4, r5.
+
+! Author: Rakesh Kumar
+
+! Arguments are in r4, r5 and result in r0
+
+! Entry points: ___subsf3, ___addsf3
+
+! r4 and r5 are referred to as op1 and op2
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
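+
+! A rough C outline of the generic path (illustrative only; the code
+! below folds each sign into a two's-complement mantissa, keeps the
+! bits shifted out during alignment in r8, and rounds to nearest even):
+!
+!	int32_t m1 = sign1 ? -(frac1 | IMP) : (frac1 | IMP);	/* IMP = 1 << 23 */
+!	int32_t m2 = sign2 ? -(frac2 | IMP) : (frac2 | IMP);
+!	if (e1 < e2) { m1 >>= e2 - e1; e1 = e2; }	/* arithmetic shifts; */
+!	else         { m2 >>= e1 - e2; }		/* sticky bits kept   */
+!	int32_t sum = m1 + m2;
+!	uint32_t sign = 0;
+!	if (sum < 0) { sum = -sum; sign = 0x80000000; }
+!	/* renormalize sum to 24 bits, round, repack with exponent e1 */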
+
+	.text
+	.align 5
+        .global GLOBAL (subsf3)
+	.global	GLOBAL (addsf3)
+	FUNC (GLOBAL (subsf3))
+	FUNC (GLOBAL (addsf3))
+
+GLOBAL (subsf3):
+        mov.l   .L_sign_bit,r1
+        xor     r1,r5
+
+GLOBAL (addsf3):
+	mov.l	r8,@-r15
+	mov	r4,r3
+
+	mov.l	.L_pinf,r2
+	mov	#0,r8
+
+	and	r2,r3 ! op1's exponent.
+	mov	r5,r6
+
+	! Check NaN or Infinity
+	and	r2,r6 ! op2's exponent.
+	cmp/eq	r2,r3
+
+	! go if op1 is NaN or INF. 
+	mov.l	.L_sign_bit,r0
+	SL(bt,	.L_inv_op1,
+	 mov	#-23,r1)
+	
+	! Go if op2 is NaN/INF.
+	cmp/eq	r2,r6
+	mov	r0,r7
+	bt	.L_ret_op2
+
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+	SHLR23 (r3)
+#else
+	shld	r1,r3
+#endif
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+	SHLR23 (r6)
+#else
+	shld	r1,r6
+#endif
+
+	! Check for negative zero
+	cmp/eq	r0,r5
+
+	mov	r5,r1
+	SL(bt,	.L_ret_op1,
+	 and	r7,r1)
+
+	cmp/eq	r0,r4
+	bt	.L_ret_op2
+
+	! if op1 is zero return op2
+	tst	r4,r4
+	bt	.L_ret_op2
+
+	! Equal numbers with opposite sign
+	mov	r4,r2
+	xor	r5,r2
+
+	cmp/eq	r0,r2
+	bt	.L_ret_zero
+
+	! if op2 is zero return op1
+	mov.l	.L_mask_fra,r2
+	tst	r5,r5
+
+	! Extract the mantissa
+	mov	r4,r0
+	SL(bt,	.L_ret_op1,
+	 and	r2,r5)
+
+	and	r2,r4
+
+	mov.l	.L_imp_bit,r2
+	and	r7,r0	! sign bit of op1
+
+	! Check for denormals
+	tst	r3,r3
+	bt	.L_norm_op1
+
+	! Attach the implicit bit
+	or	r2,r4
+	tst	r6,r6
+
+	bt	.L_norm_op2
+
+	or	r2,r5
+	tst	r0,r0
+
+	! operands are +ve or -ve??
+	bt	.L_ptv_op1
+
+	neg	r4,r4
+
+.L_ptv_op1:
+	tst	r1,r1
+	bt	.L_ptv_op2
+
+	neg	r5,r5
+
+! Test exponents for equality
+.L_ptv_op2:
+	cmp/eq	r3,r6
+	bt	.L_exp_eq
+
+! Make exponents of two arguments equal
+.L_exp_ne:
+	! r0, r1 contain sign bits.
+	! r4, r5 contain mantissas.
+	! r3, r6 contain exponents.
+	! r2, r7 scratch.
+
+	! Calculate result exponent.
+	mov	r6,r2
+	sub	r3,r2	! e2 - e1
+
+	cmp/pl	r2
+	mov	#23,r7
+
+	! e2 - e1 is -ve
+	bf	.L_exp_ne_1
+
+	mov	r6,r3 ! Result exp.
+	cmp/gt	r7,r2 ! e2-e1 > 23
+
+	mov	#1,r7
+	bt	.L_pack_op2_0
+
+	! Align the mantissa
+.L_loop_ne:
+	shar	r4
+
+	rotcr	r8
+	cmp/eq	r7,r2
+
+	add	#-1,r2
+	bf	.L_loop_ne
+
+	bt	.L_exp_eq
+
+! Exponent difference is too high.
+! Return op2 after placing pieces in proper place
+.L_pack_op2_0:
+	! If op1 is -ve
+	tst	r1,r1
+	bt	.L_pack_op2
+
+	neg	r5,r5
+
+! r6 has exponent
+! r5 has mantissa, r1 has sign
+.L_pack_op2:
+	mov.l	.L_nimp_bit,r2
+	mov	#23,r3
+
+	mov	r1,r0
+	
+	and	r2,r5
+	mov.l	@r15+,r8
+
+	or	r5,r0
+
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+	SHLL23 (r6)
+#else
+	shld	r3,r6
+#endif
+        rts
+	or	r6,r0
+
+! return op1. It is NAN or INF or op2 is zero.
+.L_ret_op1:
+	mov	r4,r0
+
+	rts
+	mov.l	@r15+,r8
+
+! return zero
+.L_ret_zero:
+	mov	#0,r0
+
+	rts
+	mov.l	@r15+,r8
+
+! return op2. It is NaN or INF or op1 is zero.
+.L_ret_op2:
+	mov	r5,r0
+
+	rts
+	mov.l	@r15+,r8
+
+! op2 is denormal. Normalize it.
+.L_norm_op2:
+	shll	r5
+	add	#-1,r6
+
+	tst	r2,r5
+	bt	.L_norm_op2
+
+	! Check sign
+	tst	r1,r1
+	bt	.L_norm_op2_2
+
+	neg	r5,r5
+
+.L_norm_op2_2:
+	add	#1,r6
+	cmp/eq	r3,r6
+
+	bf	.L_exp_ne
+	bt	.L_exp_eq
+
+! Normalize op1
+.L_norm_op1:
+	shll	r4
+	add	#-1,r3
+
+	tst	r2,r4
+	bt	.L_norm_op1
+
+	! Check sign
+	tst	r0,r0
+	bt	.L_norm_op1_1
+
+	neg	r4,r4
+
+.L_norm_op1_1:
+	! Adjust biasing
+	add	#1,r3
+
+	! Check op2 for denormalized value
+	tst	r6,r6
+	bt	.L_norm_op2
+
+	mov.l	.L_imp_bit,r2
+
+	tst	r1,r1	! Check sign
+	or	r2,r5	! Attach 24th bit
+
+	bt	.L_norm_op1_2
+
+	neg	r5,r5
+
+.L_norm_op1_2:
+	cmp/eq	r3,r6
+
+	bt	.L_exp_eq
+	bf	.L_exp_ne
+
+! op1 is NaN or Inf
+.L_inv_op1:
+	! Return op1 if it is NAN. 
+	! r2 is infinity
+	cmp/gt	r2,r4
+	bt	.L_ret_op1
+
+	! op1 is +/- INF
+	! If op2 is same return now.
+	cmp/eq	r4,r5
+	bt	.L_ret_op1
+
+	! return op2 if it is NAN
+	cmp/gt	r2,r5
+	bt	.L_ret_op2
+
+	! Check if op2 is inf
+	cmp/eq	r2,r6
+	bf	.L_ret_op1
+	
+	! Both op1 and op2 are infinities
+	! of opposite signs, or there is a -NAN. Return a NAN.
+	mov.l	@r15+,r8
+	rts
+	mov	#-1,r0
+
+! Make unequal exponents equal.
+.L_exp_ne_1:
+	mov	#-25,r7
+	cmp/gt	r2,r7 ! -23 > e2 - e1
+
+	add	#1,r2
+	bf	.L_exp_ne_2
+
+	tst	r0,r0
+	bt	.L_pack_op1
+
+.L_pack_op1_0:
+	bra	.L_pack_op1
+	neg	r4,r4
+
+! Accumulate the shifted bits in r8
+.L_exp_ne_2:
+	! Shift with rounding
+	shar	r5
+	rotcr	r8
+
+	tst	r2,r2
+
+	add	#1,r2
+	bf	.L_exp_ne_2
+
+! Exponents of op1 and op2 are equal (or made so)
+! The mantissas are in r4-r5 and remaining bits in r8
+.L_exp_eq:
+	add	r5,r4 ! Add fractions.
+	mov.l	.L_sign_bit,r2
+
+	! Check for negative result
+	mov	#0,r0
+	tst	r2,r4
+
+	mov.l	.L_255,r5
+	bt	.L_post_add
+
+	negc	r8,r8
+	negc	r4,r4
+	or	r2,r0
+
+.L_post_add:
+	! Check for extra MSB
+	mov.l	.L_chk_25,r2
+
+	tst	r2,r4
+	bt	.L_imp_check
+
+	shar 	r4
+	rotcr	r8
+
+	add	#1,r3
+	cmp/ge	r5,r3
+
+	! Return Inf if exp > 254
+	bt	.L_ret_inf
+
+! Check for implicit (24th) bit in result
+.L_imp_check:
+        mov.l	.L_imp_bit,r2
+	tst	r2,r4
+
+	bf	.L_pack_op1
+
+! Result needs left shift
+.L_lft_shft:
+	shll	r8
+	rotcl	r4
+
+	add	#-1,r3
+	tst	r2,r4
+
+	bt	.L_lft_shft
+	
+! Pack the result after rounding
+.L_pack_op1:
+	! See if denormalized result is possible 
+	mov.l	.L_chk_25,r5
+	cmp/pl	r3
+
+	bf	.L_denorm_res
+
+	! Are there any bits shifted previously?
+	tst	r8,r8
+	bt	.L_pack_1
+
+	! Round
+	shll	r8
+	movt	r6
+
+	add	r6,r4
+
+	! If we are halfway between two numbers,
+	! round towards LSB = 0
+	tst	r8,r8
+
+	bf	.L_pack_1
+
+	shlr	r4
+	shll	r4
+
+.L_pack_1:
+	! Adjust extra MSB generated after rounding
+	tst	r4,r5
+	mov.l	.L_255,r2
+
+	bt	.L_pack_2
+	shar	r4
+
+	add	#1,r3 
+	cmp/ge	r2,r3	! Check for exp overflow
+
+	bt	.L_ret_inf
+	
+! Pack it finally
+.L_pack_2:
+	! Do not store implicit bit
+	mov.l	.L_nimp_bit,r2
+	mov	#23,r1
+
+	and	r2,r4
+
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+	SHLL23 (r3)
+#else
+	shld	r1,r3
+#endif
+	mov.l	@r15+,r8
+
+	or	r4,r0
+        rts
+	or	r3,r0
+
+! Return infinity
+.L_ret_inf:
+	mov.l	.L_pinf,r2
+
+	mov.l	@r15+,r8
+	rts
+	or	r2,r0
+
+! Result must be denormalized
+.L_denorm_res:
+	mov	#0,r2
+	
+! Denormalizing loop with rounding
+.L_den_1:
+	shar	r4
+	movt	r6
+
+	tst	r3,r3
+	bt	.L_den_2
+
+	! Increment the exponent
+	add	#1,r3
+
+	tst	r6,r6
+	bt	.L_den_0
+
+	! Count number of ON bits shifted
+	add	#1,r2
+
+.L_den_0:
+	bra	.L_den_1
+	nop
+
+! Apply rounding
+.L_den_2:
+	cmp/eq	r6,r1
+	bf	.L_den_3
+
+	add	r6,r4
+	mov	#1,r1
+
+	! If halfway between two numbers,
+	! round towards LSB = 0
+	cmp/eq	r2,r1
+	bf	.L_den_3
+
+	shar	r4
+	shll	r4
+
+.L_den_3:
+
+	mov.l	@r15+,r8
+	rts
+	or	r4,r0
+	
+	.align 2
+.L_imp_bit:
+        .long   0x00800000
+
+.L_nimp_bit:
+	.long	0xFF7FFFFF
+
+.L_mask_fra:
+        .long   0x007FFFFF
+
+.L_pinf:
+        .long   0x7F800000
+
+.L_sign_bit:
+	.long	0x80000000
+
+.L_bit_25:
+	.long	0x01000000
+
+.L_chk_25:
+        .long   0x7F000000
+
+.L_255:
+	.long	0x000000FF
+
+ENDFUNC (GLOBAL (addsf3))
+ENDFUNC (GLOBAL (subsf3))
Index: gcc/config/sh/IEEE-754/mulsf3.S
===================================================================
--- gcc/config/sh/IEEE-754/mulsf3.S	(revision 0)
+++ gcc/config/sh/IEEE-754/mulsf3.S	(revision 0)
@@ -0,0 +1,347 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Routine for multiplying two floating point numbers
+
+! Author: Rakesh Kumar
+
+! Arguments: r4 and r5
+! Result: r0
+
+! The arguments are referred to as op1 and op2
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
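+
+! The mantissa product is a single 32x32->64 multiply (the DMULUL/DMULUH
+! macros).  A rough C model (illustrative only; m1 and m2 are the 24-bit
+! mantissas with the implicit bit set, e1 and e2 the biased exponents):
+!
+!	uint64_t p = (uint64_t) m1 * m2;	/* 47 or 48 significant bits */
+!	int e = e1 + e2 - 127;			/* result, still biased */
+!	if (p & (1ULL << 47)) { p >>= 1; e++; }	/* extra MSB generated */
+!	/* bit 46 is now the implicit 1; keep 24 bits and round to nearest
+!	   even on the bits below, as .L_loop_epil_0 onward does */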
+
+        .text
+        .align 5
+        .global GLOBAL (mulsf3)
+        FUNC (GLOBAL (mulsf3))
+
+GLOBAL (mulsf3):
+	! Extract the sign bits
+	mov.l	.L_sign,r3
+	mov	r3,r0
+
+	and	r4,r3		! sign bit for op1
+	mov.l	.L_sign_mask,r6
+
+	! Mask out the sign bit from op1 and op2
+	and	r5,r0		! sign bit for op2
+	mov.l	.L_inf,r2
+
+	and	r6,r4
+	xor	r3,r0		! Final sign in r0
+
+	and	r6,r5
+	tst	r4,r4
+
+	! Check for zero
+	mov	r5,r7
+	! Check op1 for zero
+	SL(bt,	.L_op1_zero,
+	 mov	r4,r6)
+
+	tst	r5,r5
+	bt	.L_op2_zero	! op2 is zero
+
+	! Extract the exponents
+	and	r2,r6		! Exponent of op1
+	cmp/eq	r2,r6
+
+	and	r2,r7
+	bt	.L_inv_op1	! op1 is NaN or Inf
+
+	mov.l	.L_mant,r3
+	cmp/eq	r2,r7
+
+	and	r3,r4	! Mantissa of op1
+	bt	.L_ret_op2	! op2 is NaN or Inf
+
+	and	r3,r5	! Mantissa of op2
+
+	mov	#-23,r3
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+	SHLR23 (r6)
+	SHLR23 (r7)
+#else
+	shld	r3,r6
+	shld	r3,r7
+#endif
+	! Check for denormals
+	mov.l	.L_24bit,r3
+	tst	r6,r6
+
+	bt	.L_norm_op1	! op1 is denormal
+	add	#-127,r6	! Unbias op1's exp
+
+	tst	r7,r7
+	bt	.L_norm_op2	! op2 is denormal
+
+	add	#-127,r7	! Unbias op2's exp
+
+.L_multiply:
+	add	r6,r7	! Final exponent in r7
+	mov.l	.L_24bit,r1
+
+	! set 24th bit of mantissas
+	mov	#127,r3
+	or	r1,r4
+
+	DMULU_SAVE
+
+	! Multiply
+	or	r1,r5
+	DMULUL	(r4,r5,r4)
+
+	DMULUH	(r5)
+
+	DMULU_RESTORE
+
+	mov.l	.L_16bit,r6
+
+	! Check for extra MSB generated
+	tst	r5,r6
+
+	mov.l	.L_255,r1
+	bf	.L_shift_by_1	! Adjust the extra MSB
+	
+! Normalize the result with rounding
+.L_epil:
+	! Bias the exponent
+	add	#127,r7
+	cmp/ge	r1,r7
+	
+	! Check exponent overflow and underflow
+	bt	.L_ret_inf
+
+	cmp/pl	r7
+	bf	.L_denorm
+
+.L_epil_0:
+	mov	#-23,r3
+	shll	r5
+	mov	#0,r6
+
+! Fit resultant mantissa in 24 bits
+! Apply default rounding
+.L_loop_epil_0:
+        tst	r3,r3
+	bt	.L_loop_epil_out
+
+	add	#1,r3
+	shlr	r4
+
+	bra	.L_loop_epil_0
+	rotcr	r6
+
+! Round mantissa
+.L_loop_epil_out:
+	shll8	r5
+	or	r5,r4
+
+	mov.l	.L_mant,r2
+	mov	#23,r3
+
+	! Check last bit shifted out of result
+	tst	r6,r6
+	bt	.L_epil_2
+
+	! Round
+	shll	r6
+	movt	r5
+
+	add	r5,r4
+
+	! If this is the only ON bit shifted
+	! Round towards LSB = 0
+	tst	r6,r6
+	bf	.L_epil_2
+
+	shlr	r4
+	shll	r4
+
+.L_epil_2:
+	! Rounding may have produced extra MSB.
+	mov.l	.L_25bit,r5
+	tst	r4,r5
+
+	bt	.L_epil_1
+
+	add	#1,r7
+	shlr	r4
+
+.L_epil_1:
+#if defined (__sh1__) || defined (__sh2__) || defined (__SH2E__)
+	SHLL23 (r7)
+#else
+	shld	r3,r7
+#endif
+
+	and	r2,r4
+
+	or	r7,r4
+	rts
+	or	r4,r0
+
+.L_denorm:
+	mov	#0,r3
+
+.L_den_1:
+	shlr	r5
+	rotcr	r4
+
+	cmp/eq	r3,r7
+	bt	.L_epil_0
+
+	bra	.L_den_1
+	add	#1,r7
+	
+
+! Normalize the first argument
+.L_norm_op1:
+	shll	r4
+	tst	r3,r4
+
+	add	#-1,r6
+	bt	.L_norm_op1
+
+	! The biasing is by 126
+	add	#-126,r6
+	tst	r7,r7
+
+	bt      .L_norm_op2
+
+	bra	.L_multiply
+	add	#-127,r7
+
+! Normalize the second argument
+.L_norm_op2:
+	shll	r5
+	tst	r3,r5
+
+	add	#-1,r7
+	bt	.L_norm_op2
+
+	bra	.L_multiply
+	add	#-126,r7
+
+! op2 is zero. Check op1 for exceptional cases
+.L_op2_zero:
+	mov.l	.L_inf,r2
+	and	r2,r6
+
+	! Check if op1 is Inf/NaN (Inf * 0 = NaN)
+	cmp/eq	r2,r6
+	SL(bf,	.L_ret_op2,
+	 mov	#1,r1)
+
+	! Return NaN
+	rts
+	mov	#-1,r0
+
+! Adjust the extra MSB
+.L_shift_by_1:
+	shlr	r5
+	rotcr	r4
+
+	add	#1,r7		! Show the shift in exponent
+
+	cmp/gt	r3,r7
+	bf	.L_epil
+
+	! The resultant exponent is invalid
+	mov.l	.L_inf,r1
+	rts
+	or	r1,r0
+
+.L_ret_op1:
+	rts
+	or	r4,r0
+
+! op1 is zero. Check op2 for exceptional cases
+.L_op1_zero:
+	mov.l	.L_inf,r2
+	and	r2,r7
+	
+	! Check if op2 is Inf/NaN (0 * Inf = NaN)
+	cmp/eq	r2,r7
+	SL(bf,	.L_ret_op1,
+	 mov	#1,r1)
+
+	! Return NaN
+	rts
+	mov	#-1,r0
+
+.L_inv_op1:
+	mov.l	.L_mant,r3
+	mov	r4,r6
+
+	and	r3,r6
+	tst	r6,r6
+
+	bf	.L_ret_op1	! op1 is NaN
+	! op1 is not NaN. It is Inf
+
+	cmp/eq	r2,r7
+	bf	.L_ret_op1	! op2 has a valid exponent
+
+! op2 has an invalid exponent. It could be Inf, -Inf, or NaN.
+! It doesn't make any difference.
+.L_ret_op2:
+	rts
+	or	r5,r0
+
+.L_ret_inf:
+	rts
+	or	r2,r0
+
+.L_ret_zero:
+	mov	#0,r2
+	rts
+	or	r2,r0
+
+	
+	.align 2
+.L_mant:
+	.long 0x007FFFFF
+
+.L_inf:
+	.long 0x7F800000
+
+.L_24bit:
+	.long 0x00800000
+
+.L_25bit:
+	.long 0x01000000
+
+.L_16bit:
+	.long 0x00008000
+
+.L_sign:
+	.long 0x80000000
+
+.L_sign_mask:
+	.long 0x7FFFFFFF
+
+.L_255:
+	.long 0x000000FF
+
+ENDFUNC (GLOBAL (mulsf3))
Index: gcc/config/sh/IEEE-754/floatsidf.S
===================================================================
--- gcc/config/sh/IEEE-754/floatsidf.S	(revision 0)
+++ gcc/config/sh/IEEE-754/floatsidf.S	(revision 0)
@@ -0,0 +1,146 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+!conversion of signed integer to double precision floating point number
+!Author:Rakesh Kumar
+!
+!Entry:
+!r4:operand 
+!
+!Exit:
+!r0,r1:result
+!
+!Note:argument is passed in reg r4 and the result is returned in 
+!regs r0 and r1.
+!
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
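+
+! A rough C model (illustrative only; the code below special-cases
+! INT_MIN and assembles the two result words directly):
+!
+!	uint64_t tod (int32_t v)
+!	{
+!	  uint64_t sign = 0;
+!	  uint32_t x = v, exp = 1023 + 32;
+!	  if (v == 0) return 0;
+!	  if (v < 0) { sign = 1ULL << 63; x = 0 - x; }
+!	  while (!(x & 0x80000000)) { x <<= 1; exp--; }
+!	  x <<= 1; exp--;	/* the leading 1 becomes the implicit bit */
+!	  return sign | ((uint64_t) exp << 52) | ((uint64_t) x << 20);
+!	}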
+
+        .text
+        .align 5
+        .global GLOBAL (floatsidf)
+	FUNC (GLOBAL (floatsidf))
+
+GLOBAL (floatsidf):
+        mov.l   .L_sign,r0
+        mov     #0,r1
+
+	mov	r0,r2
+	tst	r4,r4 ! check r4 for zero
+
+	! Extract the sign
+	mov	r2,r3
+	SL(bt,	.L_ret_zero,
+	 and	r4,r0)
+
+	cmp/eq	r1,r0
+	not	r3,r3
+
+	mov	r1,r7
+	SL(bt,	.L_loop,
+	 and	r4,r3)
+
+	! Treat -2147483648 as special case
+	cmp/eq	r1,r3
+	neg	r4,r4
+
+	bt	.L_ret_min	
+
+.L_loop:
+	shll	r4	
+	mov	r4,r5
+
+	and	r2,r5
+	cmp/eq	r1,r5
+	
+	add	#1,r7
+	bt	.L_loop
+
+	mov.l	.L_initial_exp,r6
+	not	r2,r2
+	
+	and	r2,r4
+	mov	#21,r3
+
+	sub	r7,r6
+	mov	r4,r1
+
+	mov	#20,r7
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r3,r1
+#else
+        SHLL21 (r1)
+#endif
+	mov	#-11,r2
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r7,r6	! Exponent in proper place
+#else
+        SHLL20 (r6)
+#endif
+
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+        shld    r2,r4
+#else
+        SHLR11 (r4)
+#endif
+	or	r6,r0
+
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+	rts
+#ifdef __LITTLE_ENDIAN__
+	or	r4,r1
+#else
+	or	r4,r0
+#endif
+	
+.L_ret_zero:
+	rts
+	mov	#0,r0
+
+.L_ret_min:
+	mov.l	.L_min,r0
+	
+#ifdef __LITTLE_ENDIAN__
+        mov     r0,r2
+        mov     r1,r0
+        mov     r2,r1
+#endif
+	rts
+	nop
+
+	.align 2
+
+.L_initial_exp:
+	.long 0x0000041E
+
+.L_sign:
+	.long 0x80000000
+
+.L_min:
+	.long 0xC1E00000
+
+ENDFUNC (GLOBAL (floatsidf))
Index: gcc/config/sh/IEEE-754/fixsfsi.S
===================================================================
--- gcc/config/sh/IEEE-754/fixsfsi.S	(revision 0)
+++ gcc/config/sh/IEEE-754/fixsfsi.S	(revision 0)
@@ -0,0 +1,160 @@
+/* Copyright (C) 2004 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+! Conversion routine for float to integer
+
+! Author: Rakesh Kumar
+
+! Arguments: r4 (in floating point format)
+! Return: r0
+
+! r4 is referred to as op1
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
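+
+! A rough C model (illustrative only; the NaN path follows the code
+! below, which returns 0 for NaNs):
+!
+!	int32_t fix (uint32_t f)	/* raw IEEE-754 bits */
+!	{
+!	  int exp = (f >> 23) & 0xFF;
+!	  uint32_t r;
+!	  if (exp < 127) return 0;			/* |x| < 1 */
+!	  if (exp > 157) return (f >> 31) ? INT_MIN : INT_MAX;
+!	  r = ((f & 0x007FFFFF) | 0x00800000) << 8;	/* leading 1 at bit 31 */
+!	  r >>= 158 - exp;				/* truncate */
+!	  return (f >> 31) ? -(int32_t) r : (int32_t) r;
+!	}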
+
+	.text
+	.align 5
+	.global	GLOBAL (fixsfsi)
+	FUNC (GLOBAL (fixsfsi))
+
+GLOBAL (fixsfsi):
+	mov.l	.L_mask_sign,r7
+	mov	r4,r2
+
+	! Check for NaN
+	mov.l	.L_inf,r1
+	and	r7,r2
+
+	cmp/gt	r1,r2
+	mov	#127,r5
+
+	mov	r4,r3
+	SL(bt,	.L_epil,
+	 mov	#0,r0)
+
+	shll	r2
+	mov.l	.L_frac,r6
+
+	shlr16	r2
+	and	r6,r3	! r3 has fraction
+
+	shlr8	r2	! r2 has exponent
+	mov.l	.L_24bit,r1
+
+	! If exponent is less than 127, return 0
+	cmp/gt	r2,r5
+	or	r1,r3	! Set the implicit bit
+
+	mov.l	.L_157,r1
+	SL1(bt,	.L_epil,
+	 shll8	r3)
+
+	! If exponent is greater than 157,
+	! return the maximum/minimum integer
+	! value, depending on the sign
+	cmp/gt	r1,r2
+	sub	r2,r1
+
+	mov.l	.L_sign,r2
+	SL(bt,	.L_ret_max,
+	 add	#1,r1)
+
+	and	r4,r2	! Sign in r2
+	neg	r1,r1
+
+	! Shift mantissa by exponent difference from 157
+#if !defined (__sh1__) && !defined (__sh2__) && !defined (__SH2E__)
+	shld	r1,r3
+#else
+        cmp/gt  r0,r1
+        bt      .L_mov_left
+
+.L_mov_right:
+        cmp/eq  r1,r0
+        bt      .L_ret
+
+        add     #1,r1
+        bra     .L_mov_right
+
+        shlr    r3
+
+.L_mov_left:
+        add     #-1,r1
+
+        shll    r3
+        cmp/eq  r1,r0
+
+        bf      .L_mov_left
+.L_ret:
+#endif
+	! If op1 is negative, negate the result
+	cmp/eq	r0,r2
+	SL(bf,	.L_negate,
+	 mov	r3,r0)
+
+! r0 has the appropriate value
+.L_epil:
+	rts
+	nop
+
+! Return the max/min integer value
+.L_ret_max:
+	and	r4,r2	! Sign in r2
+	mov.l	.L_max,r3
+
+	mov.l	.L_sign,r1
+	cmp/eq	r0,r2
+
+	mov	r3,r0
+	bt	.L_epil
+
+	! Negative number, return min int
+	rts
+	mov	r1,r0
+
+! Negate the result
+.L_negate:
+	rts
+	neg	r0,r0
+
+	.align 2
+.L_inf:
+	.long 0x7F800000
+
+.L_157:
+	.long 157
+
+.L_max:
+	.long 0x7FFFFFFF
+
+.L_frac:
+	.long 0x007FFFFF
+
+.L_sign:
+	.long 0x80000000
+
+.L_24bit:
+	.long 0x00800000
+
+.L_mask_sign:
+	.long 0x7FFFFFFF
+
+ENDFUNC (GLOBAL (fixsfsi))
Index: gcc/config/sh/ieee-754-sf.S
===================================================================
--- gcc/config/sh/ieee-754-sf.S	(revision 0)
+++ gcc/config/sh/ieee-754-sf.S	(revision 0)
@@ -0,0 +1,692 @@
+/* Copyright (C) 2004, 2006 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+!! libgcc software floating-point routines for Renesas SH /
+!! STMicroelectronics ST40 CPUs
+!! Contributed by J"orn Rennecke joern.rennecke@st.com
+
+#ifndef __SH_FPU_ANY__
+
+#include "lib1funcs.h"
+#include "insn-constants.h"
+
+/* Single-precision floating-point emulation.
+   We handle NaNs, +-infinity, and +-zero.
+   However, we assume that for NaNs, the topmost bit of the fraction is set.  */
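+
+/* A rough C model of the raw-bits comparison trick used below
+   (an illustrative sketch only; SF_NAN_MASK comes from insn-constants.h
+   and is assumed to be exponent-all-ones plus the top fraction bit).
+   Unequal raw bits mean unequal values unless both are +-0.0; equal
+   raw bits mean equal values unless they are a NaN:
+
+	int ne (uint32_t a, uint32_t b)
+	{
+	  if (a != b)
+	    return ((a | b) << 1) != 0;
+	  return (a & SF_NAN_MASK) == SF_NAN_MASK;
+	}
+*/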
+#ifdef L_nesf2
+/* -ffinite-math-only inline version, T := r4:SF == r5:SF
+	cmp/eq	r4,r5
+	mov	r4,r0
+	bt	0f
+	or	r5,r0
+	add	r0,r0
+	tst	r0,r0	! test for +0.0 == -0.0 ; -0.0 == +0.0
+	0:			*/
+	.balign 4
+	.global GLOBAL(nesf2)
+	HIDDEN_FUNC(GLOBAL(nesf2))
+GLOBAL(nesf2):
+        /* If the raw values are unequal, the result is unequal, unless
+	   both values are +-zero.
+	   If the raw values are equal, the result is equal, unless
+	   the values are NaN.  */
+	cmp/eq	r4,r5
+	mov.l   LOCAL(c_SF_NAN_MASK),r1
+	not	r4,r0
+	bt	LOCAL(check_nan)
+	mov	r4,r0
+	or	r5,r0
+	rts
+	add	r0,r0
+LOCAL(check_nan):
+	tst	r1,r0
+	rts
+	movt	r0
+	.balign 4
+LOCAL(c_SF_NAN_MASK):
+	.long SF_NAN_MASK
+	ENDFUNC(GLOBAL(nesf2))
+#endif /* L_nesf2 */
+
+#ifdef L_unordsf2
+	.balign 4
+	.global GLOBAL(unordsf2)
+	HIDDEN_FUNC(GLOBAL(unordsf2))
+GLOBAL(unordsf2):
+	mov.l	LOCAL(c_SF_NAN_MASK),r1
+	not	r4,r0
+	tst	r1,r0
+	not	r5,r0
+	bt	LOCAL(unord)
+	tst	r1,r0
+LOCAL(unord):
+	rts
+	movt	r0
+	.balign	4
+LOCAL(c_SF_NAN_MASK):
+	.long SF_NAN_MASK
+	ENDFUNC(GLOBAL(unordsf2))
+#endif /* L_unordsf2 */
+
+#if defined(L_gtsf2t) || defined(L_gtsf2t_trap)
+/* -ffinite-math-only inline version, T := r4:SF > r5:SF ? 0 : 1
+	cmp/pz	r4
+	mov	r4,r0
+	bf/s	0f
+	 cmp/hs	r5,r4
+	cmp/ge	r4,r5
+	or	r5,r0
+	bt	0f
+	add	r0,r0
+	tst	r0,r0
+	0:			*/
+#ifdef L_gtsf2t
+#define fun_label GLOBAL(gtsf2t)
+#else
+#define fun_label GLOBAL(gtsf2t_trap)
+#endif
+	.balign 4
+	.global fun_label
+	HIDDEN_FUNC(fun_label)
+fun_label:
+	/* If the raw values compare greater, the result is true, unless
+	   either of them is a NaN (but infinity is fine), or both values are
+	   +- zero.  Otherwise, the result is false.  */
+	mov.l	LOCAL(c_SF_NAN_MASK),r1
+	cmp/pz	r4
+	not	r5,r0
+	SLC(bf,	LOCAL(neg),
+	 tst	r1,r0)
+	mov	r4,r0
+	bt	LOCAL(nan)
+	cmp/gt	r5,r4
+	SLC(bf,	LOCAL(check_nan),
+	 cmp/gt	r4,r1)
+	bf	LOCAL(nan)
+	or	r5,r0
+	rts
+	add	r0,r0
+LOCAL(neg):
+	SLI(tst	r1,r0)
+	bt	LOCAL(nan)
+	not	r4,r0
+	tst	r1,r0
+	bt	LOCAL(nan)
+	cmp/hi	r4,r5
+#if defined(L_gtsf2t) && defined(DELAYED_BRANCHES)
+LOCAL(check_nan):
+#endif /* DELAYED_BRANCHES */
+	rts
+	movt	r0
+#ifdef L_gtsf2t
+LOCAL(check_nan):
+LOCAL(nan):
+	rts
+	mov	#0,r0
+#else /* ! L_gtsf2t */
+LOCAL(check_nan):
+	SLI(cmp/gt	r4,r1)
+	bf	LOCAL(nan)
+	rts
+	movt	r0
+LOCAL(nan):
+	mov	#0,r0
+	trapa	#0
+#endif /* ! L_gtsf2t */
+	.balign	4
+LOCAL(c_SF_NAN_MASK):
+	.long SF_NAN_MASK
+	ENDFUNC(fun_label)
+#endif /* L_gtsf2t */
+
+#if defined(L_gesf2f) || defined(L_gesf2f_trap)
+/* -ffinite-math-only inline version, T := r4:SF >= r5:SF
+	cmp/pz	r5
+	mov	r4,r0
+	bf/s	0f
+	 cmp/hs	r4,r5
+	cmp/ge	r5,r4
+	or	r5,r0
+	bt	0f
+	add	r0,r0
+	tst	r0,r0
+	0:			*/
+#ifdef L_gesf2f
+#define fun_label GLOBAL(gesf2f)
+#else
+#define fun_label GLOBAL(gesf2f_trap)
+#endif
+	.balign 4
+	.global fun_label
+	HIDDEN_FUNC(fun_label)
+fun_label:
+	/* If the raw values compare greater or equal, the result is
+	   true, unless either of them is a NaN.  If both are +-zero, the
+	   result is true; otherwise, it is false.
+	   We use 0 as true and nonzero as false for this function.  */
+	mov.l	LOCAL(c_SF_NAN_MASK),r1
+	cmp/pz	r5
+	not	r4,r0
+	SLC(bf,	LOCAL(neg),
+	 tst	r1,r0)
+	mov	r4,r0
+	bt	LOCAL(nan)
+	cmp/gt	r4,r5
+	SLC(bf,	LOCAL(check_nan),
+	 cmp/ge	r1,r5)
+	bt	LOCAL(nan)
+	or	r5,r0
+	rts
+	add	r0,r0
+LOCAL(neg):
+	SLI(tst	r1,r0)
+	bt	LOCAL(nan)
+	not	r5,r0
+	tst	r1,r0
+	bt	LOCAL(nan)
+	cmp/hi	r5,r4
+#if defined(L_gesf2f) && defined(DELAYED_BRANCHES)
+LOCAL(nan): LOCAL(check_nan):
+#endif
+	rts
+	movt	r0
+#if defined(L_gesf2f) && ! defined(DELAYED_BRANCHES)
+LOCAL(check_nan):
+	cmp/ge	r1,r5
+LOCAL(nan):
+	rts
+	movt	r0
+#endif /* ! DELAYED_BRANCHES */
+#ifdef L_gesf2f_trap
+LOCAL(check_nan):
+	SLI(cmp/ge	r1,r5)
+	bt	LOCAL(nan)
+	rts
+LOCAL(nan):
+	movt	r0
+	trapa	#0
+#endif /* L_gesf2f_trap */
+	.balign	4
+LOCAL(c_SF_NAN_MASK):
+	.long SF_NAN_MASK
+	ENDFUNC(fun_label)
+#endif /* L_gesf2f */
+
+#ifndef DYN_SHIFT /* SH1 / SH2 code */
+#ifdef L_add_sub_sf3
+#include "IEEE-754/addsf3.S"
+#endif /* _add_sub_sf3 */
+
+#ifdef L_mulsf3
+#include "IEEE-754/mulsf3.S"
+#endif /* L_mulsf3 */
+
+#ifdef L_fixunssfsi
+#include "IEEE-754/fixunssfsi.S"
+#endif /* L_fixunssfsi */
+
+#ifdef L_fixsfsi
+#include "IEEE-754/fixsfsi.S"
+#endif /* L_fixsfsi */
+
+#ifdef L_floatunssisf
+#include "IEEE-754/floatunssisf.S"
+#endif /* L_floatunssisf */
+
+#ifdef L_floatsisf
+#include "IEEE-754/floatsisf.S"
+#endif /* L_floatsisf */
+
+#ifdef L_divsf3
+#include "IEEE-754/divsf3.S"
+#endif /* L_divsf3 */
+#endif /* ! DYN_SHIFT */
+
+/* The actual arithmetic uses dynamic shift.  Supporting SH1 / SH2 here would
+   make this code too hard to maintain, so if you want to add SH1 / SH2
+   support, do it in a separate copy.  */
+#ifdef DYN_SHIFT
+#ifdef L_add_sub_sf3
+#include "IEEE-754/m3/addsf3.S"
+#endif /* L_add_sub_sf3 */
+
+#ifdef L_mulsf3
+#include "IEEE-754/m3/mulsf3.S"
+#endif /* L_mulsf3 */
+
+#ifdef L_fixunssfsi
+	! What is a bit unusual about this implementation is that the
+	! sign bit influences the result for NANs: for cleared sign bit, you
+	! get UINT_MAX, for set sign bit, you get 0.
+	! However, since the result for NANs is undefined, this should be no
+	! problem.
+	! N.B. This is scheduled both for SH4-200 and SH4-300
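+	!
+	! For reference, a C model of the conversion below (a sketch, not
+	! part of the build; the SF argument arrives as its raw bits in r4,
+	! and the constants match LOCAL(mask) / LOCAL(max)):
+	!
+	!	unsigned fixunssfsi (unsigned x)
+	!	{
+	!	  if ((int) x >= 0x4f800000)	/. >= 2^32, +Inf or +NaN ./
+	!	    return ~0u;
+	!	  int e = ((int) x >> 23) - 127; /. < 0 if x < 1 or sign set ./
+	!	  if (e < 0)
+	!	    return 0;
+	!	  unsigned frac = (x & 0x007fffff) | 0x00800000;
+	!	  return e >= 23 ? frac << (e - 23) : frac >> (23 - e);
+	!	}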
+	.balign 4
+	.global GLOBAL(fixunssfsi)
+	FUNC(GLOBAL(fixunssfsi))
+GLOBAL(fixunssfsi):
+	mov.l	LOCAL(max),r2
+	mov	#-23,r1
+	mov	r4,r0
+	shad	r1,r4
+	mov.l	LOCAL(mask),r1
+	add	#-127,r4
+	cmp/ge	r2,r0
+	or	r2,r0
+	bt	LOCAL(retmax)
+	cmp/pz	r4
+	and	r1,r0
+	bf	LOCAL(ret0)
+	add	#-23,r4
+	rts
+	shld	r4,r0
+LOCAL(ret0):
+LOCAL(retmax):
+	rts
+	subc	r0,r0
+	.balign 4
+LOCAL(mask):
+	.long	0x00ffffff
+LOCAL(max):
+	.long	0x4f800000
+	ENDFUNC(GLOBAL(fixunssfsi))
+#endif /* L_fixunssfsi */
+
+#ifdef L_fixsfsi
+	! What is a bit unusual about this implementation is that the
+	! sign bit influences the result for NANs: for cleared sign bit, you
+	! get INT_MAX, for set sign bit, you get INT_MIN.
+	! However, since the result for NANs is undefined, this should be no
+	! problem.
+	! N.B. This is scheduled both for SH4-200 and SH4-300
+	.balign 4
+	.global GLOBAL(fixsfsi)
+	FUNC(GLOBAL(fixsfsi))
+	.balign	4
+GLOBAL(fixsfsi):
+	mov	r4,r0
+	shll	r4
+	mov	#-24,r1
+	bt	LOCAL(neg)
+	mov.l	LOCAL(max),r2
+	shld	r1,r4
+	mov.l	LOCAL(mask),r1
+	add	#-127,r4
+	cmp/pz	r4
+	add	#-23,r4
+	bf	LOCAL(ret0)
+	cmp/gt	r0,r2
+	bf	LOCAL(retmax)
+	and	r1,r0
+	addc	r1,r0
+	rts
+	shld	r4,r0
+
+	.balign	4
+LOCAL(neg):
+	mov.l	LOCAL(min),r2
+	shld	r1,r4
+	mov.l	LOCAL(mask),r1
+	add	#-127,r4
+	cmp/pz	r4
+	add	#-23,r4
+	bf	LOCAL(ret0)
+	cmp/gt	r0,r2
+	bf	LOCAL(retmin)
+	and	r1,r0
+	addc	r1,r0
+	shld	r4,r0	! SH4-200 will start this insn on a new cycle
+	rts
+	neg	r0,r0
+
+	.balign	4
+LOCAL(ret0):
+	rts
+	mov	#0,r0
+
+LOCAL(retmax):
+	mov	#-1,r0
+	rts
+	shlr	r0
+
+LOCAL(retmin):
+	mov	#1,r0
+	rts
+	rotr	r0
+
+	.balign 4
+LOCAL(mask):
+	.long	0x007fffff
+LOCAL(max):
+	.long	0x4f000000
+LOCAL(min):
+	.long	0xcf000000
+	ENDFUNC(GLOBAL(fixsfsi))
+#endif /* L_fixsfsi */
+
+#ifdef L_floatunssisf
+#include "IEEE-754/m3/floatunssisf.S"
+#endif /* L_floatunssisf */
+
+#ifdef L_floatsisf
+#include "IEEE-754/m3/floatsisf.S"
+#endif /* L_floatsisf */
+
+#ifdef L_divsf3
+#include "IEEE-754/m3/divsf3.S"
+#endif /* L_divsf3 */
+
+#ifdef L_hypotf
+	.balign 4
+	.global GLOBAL(hypotf)
+	FUNC(GLOBAL(hypotf))
+GLOBAL(hypotf):
+/* This integer implementation takes 71 to 72 cycles in the main path.
+   This is a bit slower than the SH4 can do this computation using double
+   precision hardware floating point - 57 cycles, or 69 with mode switches.  */
+ /* First, calculate x (r4) as the sum of the squares of the fractions -
+    the exponent is calculated separately in r3.
+    Then, calculate sqrt(x) for the fraction by reciproot iteration.
+    We get a 7.5 bit initial value using a linear approximation with two
+    slopes that are powers of two.
+    x (- [1. .. 2.)  y0 := 1.25 - x/4 - tab(x)   y (- (0.8 .. 1.0)
+    x (- [2. .. 4.)  y0 := 1.   - x/8 - tab(x)   y (- (0.5 .. 0.8)
+ x is represented with two bits before the binary point,
+ y with 0 bits before the binary point.
+ Thus, to calculate y0 := 1. - x/8 - tab(x), all you have to do is shift x
+ right by 1, negate it, and subtract tab(x).  */
+
+ /* y1 := 1.5*y0 - 0.5 * (x * y0) * (y0 * y0)
+    z0 := x * y1
+    z1 := z0 + 0.5 * (y1 - (y1*y1) * z0) */
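+
+ /* In plain C the estimate and refinement read (a sketch; tab() is the
+    correction table generated by the program near the end of this file,
+    y approximates 1/sqrt(x) and z approximates sqrt(x)):
+
+	y0 = (x < 2. ? 1.25 - x/4. : 1. - x/8.) - tab(x);
+	y1 = 1.5*y0 - 0.5*(x*y0)*(y0*y0);
+	z0 = x*y1;
+	z1 = z0 + 0.5*(y1 - (y1*y1)*z0);
+
+    The fixed-point code below tracks the same quantities; the {i.f}
+    comments give the position of the binary point.  */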
+
+	mov.l	LOCAL(xff000000),r1
+	add	r4,r4
+	mov	r4,r0
+	add	r5,r5
+	cmp/hs	r5,r4
+	sub	r5,r0
+	mov	#-24,r2
+	bf/s	LOCAL(r5_large)
+	shad	r2,r0
+	mov	r4,r3
+	shll8	r4
+	rotcr	r4
+	tst	#0xe0,r0
+	neg	r0,r0
+	bt	LOCAL(ret_abs_r3)
+	tst	r1,r5
+	shll8	r5
+	bt/s	LOCAL(denorm_r5)
+	cmp/hi	r3,r1
+	dmulu.l	r4,r4
+	bf	LOCAL(inf_nan)
+	rotcr	r5
+	shld	r0,r5
+LOCAL(denorm_r5_done):
+	sts	mach,r4
+	dmulu.l	r5,r5
+	mov.l	r6,@-r15
+	mov	#20,r6
+
+	sts	mach,r5
+LOCAL(add_frac):
+	mova	LOCAL(tab)-32,r0
+	mov.l	r7,@-r15
+	mov.w	LOCAL(x1380),r7
+	and	r1,r3
+	addc	r5,r4
+	mov.w	LOCAL(m25),r2	! -25
+	bf	LOCAL(frac_ok)
+	sub	r1,r3
+	rotcr	r4
+	cmp/eq	r1,r3	! did we generate infinity ?
+	bt	LOCAL(inf_nan)
+	shlr	r4
+	mov	r4,r1
+	shld	r2,r1
+	mov.b	@(r0,r1),r0
+	mov	r4,r1
+	shld	r6,r1
+	bra	LOCAL(frac_low2)
+	sub	r1,r7
+
+LOCAL(frac_ok):
+	mov	r4,r1
+	shld	r2,r1
+	mov.b	@(r0,r1),r1
+	cmp/pz	r4
+	mov	r4,r0
+	bt/s	LOCAL(frac_low)
+	shld	r6,r0
+	mov.w	LOCAL(xf80),r7
+	shlr	r0
+LOCAL(frac_low):
+	sub	r0,r7
+LOCAL(frac_low2):
+	mov.l	LOCAL(x40000080),r0 ! avoid denorm results near 1. << r3
+	sub	r1,r7	! {0.12}
+	mov.l	LOCAL(xfffe0000),r5 ! avoid rounding overflow near 4. << r3
+	swap.w	r7,r1	! {0.28}
+	dmulu.l	r1,r4 /* two issue cycles */
+	mulu.w	r7,r7  /* two issue cycles */
+	sts	mach,r2	! {0.26}
+	mov	r1,r7
+	shlr	r1
+	sts	macl,r6	! {0.24}
+	cmp/hi	r0,r4
+	shlr2	r2
+	bf	LOCAL(near_one)
+	shlr	r2	! {0.23} systemic error of linear approximation keeps y1 < 1
+	dmulu.l	r2,r6
+	cmp/hs	r5,r4
+	add	r7,r1	! {1.28}
+	bt	LOCAL(near_four)
+	shlr2	r1	! {1.26}
+	sts	mach,r0	! {0.15} x*y0^3 == {0.16} 0.5*x*y0^3
+	shlr2	r1	! {1.24}
+	shlr8	r1	! {1.16}
+	sett		! compensate for truncation of subtrahend, keep y1 < 1
+	subc	r0,r1   ! {0.16} y1;  max error about 3.5 ulp
+	swap.w	r1,r0
+	dmulu.l	r0,r4	! { 1.30 }
+	mulu.w	r1,r1
+	sts	mach,r2
+	shlr2	r0
+	sts	macl,r1
+	add	r2,r0
+	mov.l	LOCAL(xff000000),r6
+	add	r2,r0
+	dmulu.l	r1,r2
+	add	#127,r0
+	add	r6,r3	! precompensation for adding leading 1
+	sts	mach,r1
+	shlr	r3
+	mov.l	@r15+,r7
+	sub	r1,r0	! {0.31} max error about 50 ulp (+127)
+	mov.l	@r15+,r6
+	shlr8	r0	! {0.23} max error about 0.7 ulp
+	rts
+	add	r3,r0
+	
+LOCAL(r5_large):
+	mov	r5,r3
+	mov	#-31,r2
+	cmp/ge	r2,r0
+	shll8	r5
+	bf	LOCAL(ret_abs_r3)
+	rotcr	r5
+	tst	r1,r4
+	shll8	r4
+	bt/s	LOCAL(denorm_r4)
+	cmp/hi	r3,r1
+	dmulu.l	r5,r5
+	bf	LOCAL(inf_nan)
+	rotcr	r4
+LOCAL(denorm_r4_done):
+	shld	r0,r4
+	sts	mach,r5
+	dmulu.l	r4,r4
+	mov.l	r6,@-r15
+	mov	#20,r6
+	bra	LOCAL(add_frac)
+	sts	mach,r4
+
+LOCAL(near_one):
+	bra	LOCAL(assemble_sqrt)
+	mov	#0,r0
+LOCAL(near_four):
+	! exact round-to-nearest would add 255.  We add 256 for speed & compactness.
+	mov	r4,r0
+	shlr8	r0
+	add	#1,r0
+	tst	r0,r0
+	addc	r0,r3	! might generate infinity.
+LOCAL(assemble_sqrt):
+	mov.l	@r15+,r7
+	shlr	r3
+	mov.l	@r15+,r6
+	rts
+	add	r3,r0
+LOCAL(inf_nan):
+LOCAL(ret_abs_r3):
+	mov	r3,r0
+	rts
+	shlr	r0
+LOCAL(denorm_r5):
+	bf	LOCAL(inf_nan)
+	tst	r1,r4
+	bt	LOCAL(denorm_both)
+	dmulu.l	r4,r4
+	bra	LOCAL(denorm_r5_done)
+	shld	r0,r5
+LOCAL(denorm_r4):
+	bf	LOCAL(inf_nan)
+	tst	r1,r5
+	dmulu.l	r5,r5
+	bf	LOCAL(denorm_r4_done)
+LOCAL(denorm_both):	! normalize according to r3.
+	extu.w	r3,r2
+	mov.l	LOCAL(c__clz_tab),r0
+	cmp/eq	r3,r2
+	mov	#-8,r2
+	bt	0f
+	tst	r1,r3
+	mov	#-16,r2
+	bt	0f
+	mov	#-24,r2
+0:
+	shld	r2,r3
+	mov.l	r7,@-r15
+#ifdef __pic__
+	add	r0,r3
+	mova	 LOCAL(c__clz_tab),r0
+#endif
+	mov.b	@(r0,r3),r0
+	add	#32,r2
+	sub	r0,r2
+	shld	r2,r4
+	mov	r2,r7
+	dmulu.l	r4,r4
+	sts.l	pr,@-r15
+	mov	#1,r3
+	bsr	LOCAL(denorm_r5_done)
+	shld	r2,r5
+	mov.l	LOCAL(x01000000),r1
+	neg	r7,r2
+	lds.l	@r15+,pr
+	tst	r1,r0
+	mov.l	@r15+,r7
+	bt	0f
+	add	#1,r2
+	sub	r1,r0
+0:
+	rts
+	shld	r2,r0
+
+LOCAL(m25):
+	.word	-25
+LOCAL(x1380):
+	.word	0x1380
+LOCAL(xf80):
+	.word	0xf80
+	.balign	4
+LOCAL(xff000000):
+	.long	0xff000000
+LOCAL(x40000080):
+	.long	0x40000080
+LOCAL(xfffe0000):
+	.long	0xfffe0000
+LOCAL(x01000000):
+	.long	0x01000000
+LOCAL(c__clz_tab):
+#ifdef __pic__
+	.long	GLOBAL(clz_tab) - .
+#else
+	.long	GLOBAL(clz_tab)
+#endif
+
+/*
+double err(double x)
+{
+  return (x < 2. ? 1.25 - x/4. : 1. - x/8.) - 1./sqrt(x);
+}
+
+int
+main ()
+{
+  int i = 0;
+  double x, s, v;
+  double lx, hx;
+
+  s = 1./32.;
+  for (x = 1.; x < 4; x += s, i++)
+    {
+      lx = x;
+      hx = x + s - 1. / (1 << 30);
+      v = 0.5 * (err (lx) + err (hx));
+      printf ("%s% 4d%c",
+              (i & 7) == 0 ? "\t.byte\t" : "",
+              (int)(v * 4096 + 0.5) - 128,
+              (i & 7) == 7 ? '\n' : ',');
+    }
+  return 0;
+} */
+
+	.balign	4
+LOCAL(tab):
+	.byte	-113, -84, -57, -33, -11,   8,  26,  41
+	.byte	  55,  67,  78,  87,  94, 101, 106, 110
+	.byte	 113, 115, 115, 115, 114, 112, 109, 106
+	.byte	 101,  96,  91,  84,  77,  69,  61,  52
+	.byte	  51,  57,  63,  68,  72,  77,  80,  84
+	.byte	  87,  89,  91,  93,  95,  96,  97,  97
+	.byte	  97,  97,  97,  96,  95,  94,  93,  91
+	.byte	  89,  87,  84,  82,  79,  76,  72,  69
+	.byte	  65,  61,  57,  53,  49,  44,  39,  34
+	.byte	  29,  24,  19,  13,   8,   2,  -4, -10
+	.byte	 -17, -23, -29, -36, -43, -50, -57, -64
+	.byte	 -71, -78, -85, -93,-101,-108,-116,-124
+	ENDFUNC(GLOBAL(hypotf))
+#endif /* L_hypotf */
+#endif /* DYN_SHIFT */
+
+#endif /* __SH_FPU_ANY__ */
Index: gcc/config/sh/sh.md
===================================================================
--- gcc/config/sh/sh.md	(revision 162269)
+++ gcc/config/sh/sh.md	(working copy)
@@ -107,6 +107,7 @@ (define_constants [
   (DR0_REG	64)
   (DR2_REG	66)
   (DR4_REG	68)
+  (FR4_REG	68)
   (FR23_REG	87)
 
   (TR0_REG	128)
@@ -174,6 +175,16 @@ (define_constants [
   (UNSPECV_WINDOW_END	10)
   (UNSPECV_CONST_END	11)
   (UNSPECV_EH_RETURN	12)
+
+  ;; NaN handling for software floating point:
+  ;; We require one precision-specific bit to be set in all NaNs,
+  ;; so that we can test for them with a not / tst sequence.
+  ;; ??? Ironically, this is the quiet bit for now, because that is the
+  ;; only bit set by __builtin_nan ("").
+  ;; ??? Should really use one bit lower and force it set by using
+  ;; a custom encoding function.
+  (SF_NAN_MASK         0x7fc00000)
+  (DF_NAN_MASK         0x7ff80000)
 ])
 
 ;; -------------------------------------------------------------------------
@@ -615,6 +626,14 @@ (define_insn "cmpeqsi_t"
 	cmp/eq	%1,%0"
    [(set_attr "type" "mt_group")])
 
+(define_insn "fpcmp_i1"
+  [(set (reg:SI T_REG)
+	(match_operator:SI 1 "soft_fp_comparison_operator"
+	  [(match_operand 0 "soft_fp_comparison_operand" "r") (const_int 0)]))]
+  "TARGET_SH1_SOFTFP"
+  "tst	%0,%0"
+   [(set_attr "type" "mt_group")])
+
 (define_insn "cmpgtsi_t"
   [(set (reg:SI T_REG)
 	(gt:SI (match_operand:SI 0 "arith_reg_operand" "r,r")
@@ -1154,9 +1173,9 @@ (define_insn_and_split "*movsicc_umin"
 
 (define_insn "*movsicc_t_false"
   [(set (match_operand:SI 0 "arith_reg_dest" "=r,r")
-	(if_then_else (eq (reg:SI T_REG) (const_int 0))
-		      (match_operand:SI 1 "general_movsrc_operand" "r,I08")
-		      (match_operand:SI 2 "arith_reg_operand" "0,0")))]
+	(if_then_else:SI (eq (reg:SI T_REG) (const_int 0))
+			 (match_operand:SI 1 "general_movsrc_operand" "r,I08")
+			 (match_operand:SI 2 "arith_reg_operand" "0,0")))]
   "TARGET_PRETEND_CMOVE
    && (arith_reg_operand (operands[1], SImode)
        || (immediate_operand (operands[1], SImode)
@@ -1167,9 +1186,9 @@ (define_insn "*movsicc_t_false"
 
 (define_insn "*movsicc_t_true"
   [(set (match_operand:SI 0 "arith_reg_dest" "=r,r")
-	(if_then_else (ne (reg:SI T_REG) (const_int 0))
-		      (match_operand:SI 1 "general_movsrc_operand" "r,I08")
-		      (match_operand:SI 2 "arith_reg_operand" "0,0")))]
+	(if_then_else:SI (ne (reg:SI T_REG) (const_int 0))
+			 (match_operand:SI 1 "general_movsrc_operand" "r,I08")
+			 (match_operand:SI 2 "arith_reg_operand" "0,0")))]
   "TARGET_PRETEND_CMOVE
    && (arith_reg_operand (operands[1], SImode)
        || (immediate_operand (operands[1], SImode)
@@ -6849,6 +6868,50 @@ (define_insn "stuff_delay_slot"
 \f
 ;; Conditional branch insns
 
+(define_expand "cmpun_sdf"
+  [(unordered (match_operand 0 "" "") (match_operand 1 "" ""))]
+  ""
+  "
+{
+  HOST_WIDE_INT mask;
+  switch (GET_MODE (operands[0]))
+    {
+    case SFmode:
+      mask = SF_NAN_MASK;
+      break;
+    case DFmode:
+      mask = DF_NAN_MASK;
+      break;
+    default:
+      FAIL;
+    }
+  emit_insn (gen_cmpunsf_i1 (operands[0], operands[1],
+                            force_reg (SImode, GEN_INT (mask))));
+  DONE;
+}")
+
+(define_expand "cmpuneq_sdf"
+  [(uneq (match_operand 0 "" "") (match_operand 1 "" ""))]
+  ""
+  "
+{
+  HOST_WIDE_INT mask;
+  switch (GET_MODE (operands[0]))
+    {
+    case SFmode:
+      mask = SF_NAN_MASK;
+      break;
+    case DFmode:
+      mask = DF_NAN_MASK;
+      break;
+    default:
+      FAIL;
+    }
+  emit_insn (gen_cmpuneqsf_i1 (operands[0], operands[1],
+                              force_reg (SImode, GEN_INT (mask))));
+  DONE;
+}")
+
 (define_expand "cbranchint4_media"
   [(set (pc)
 	(if_then_else (match_operator 0 "shmedia_cbranch_comparison_operator"
@@ -9394,11 +9457,15 @@ (define_split
 (define_expand "cstoresf4"
   [(set (match_operand:SI 0 "register_operand" "=r")
 	(match_operator:SI 1 "sh_float_comparison_operator"
-	 [(match_operand:SF 2 "arith_operand" "")
-	  (match_operand:SF 3 "arith_operand" "")]))]
-  "TARGET_SH2E || TARGET_SHMEDIA_FPU"
+	 [(match_operand:SF 2 "nonmemory_operand" "")
+	  (match_operand:SF 3 "nonmemory_operand" "")]))]
+  "TARGET_SH1 || TARGET_SHMEDIA_FPU"
   "if (TARGET_SHMEDIA)
      {
+       if (!arith_operand (operands[2], SFmode))
+	 operands[2] = copy_to_mode_reg (SFmode, operands[2]);
+       if (!arith_operand (operands[3], SFmode))
+	 operands[3] = copy_to_mode_reg (SFmode, operands[3]);
        emit_insn (gen_cstore4_media (operands[0], operands[1],
 				     operands[2], operands[3]));
        DONE;
@@ -9407,18 +9474,22 @@ (define_expand "cstoresf4"
    if (! currently_expanding_to_rtl)
      FAIL;
    
-   sh_emit_compare_and_set (operands, SFmode);
+   sh_expand_float_scc (operands);
    DONE;
 ")
 
 (define_expand "cstoredf4"
   [(set (match_operand:SI 0 "register_operand" "=r")
 	(match_operator:SI 1 "sh_float_comparison_operator"
-	 [(match_operand:DF 2 "arith_operand" "")
-	  (match_operand:DF 3 "arith_operand" "")]))]
-  "(TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU"
+	 [(match_operand:DF 2 "nonmemory_operand" "")
+	  (match_operand:DF 3 "nonmemory_operand" "")]))]
+  "TARGET_SH1 || TARGET_SHMEDIA_FPU"
   "if (TARGET_SHMEDIA)
      {
+       if (!arith_operand (operands[2], DFmode))
+	 operands[2] = copy_to_mode_reg (DFmode, operands[2]);
+       if (!arith_operand (operands[3], DFmode))
+	 operands[3] = copy_to_mode_reg (DFmode, operands[3]);
        emit_insn (gen_cstore4_media (operands[0], operands[1],
 				     operands[2], operands[3]));
        DONE;
@@ -9427,7 +9498,7 @@ (define_expand "cstoredf4"
     if (! currently_expanding_to_rtl)
       FAIL;
    
-   sh_emit_compare_and_set (operands, DFmode);
+   sh_expand_float_scc (operands);
    DONE;
 ")
 
@@ -9765,7 +9836,7 @@ (define_expand "addsf3"
   [(set (match_operand:SF 0 "arith_reg_operand" "")
 	(plus:SF (match_operand:SF 1 "arith_reg_operand" "")
 		 (match_operand:SF 2 "arith_reg_operand" "")))]
-  "TARGET_SH2E || TARGET_SHMEDIA_FPU"
+  "TARGET_SH2E || TARGET_SH3 || TARGET_SHMEDIA_FPU"
   "
 {
   if (TARGET_SH2E)
@@ -9773,6 +9844,12 @@ (define_expand "addsf3"
       expand_sf_binop (&gen_addsf3_i, operands);
       DONE;
     }
+  else if (TARGET_OSFP)
+    {
+      expand_sfunc_binop (SFmode, &gen_addsf3_i3, \"__addsf3\", PLUS,
+			  operands);
+      DONE;
+    }
 }")
 
 (define_insn "*addsf3_media"
@@ -9871,6 +9948,22 @@ (define_insn_and_split "binary_sf_op1"
 }"
   [(set_attr "type" "fparith_media")])
 
+(define_insn "addsf3_i3"
+  [(set (match_operand:SF 0 "arith_reg_dest" "=z")
+	(plus:SF (reg:SF R4_REG) (reg:SF R5_REG)))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R1_REG))
+   (clobber (reg:SI R2_REG))
+   (clobber (reg:SI R3_REG))
+   (clobber (reg:SI R6_REG))
+   (clobber (reg:SI R7_REG))
+   (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+  "TARGET_OSFP && ! TARGET_SH2E"
+  "jsr	@%1%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
+
 (define_insn "addsf3_i"
   [(set (match_operand:SF 0 "fp_arith_reg_operand" "=f")
 	(plus:SF (match_operand:SF 1 "fp_arith_reg_operand" "%0")
@@ -9885,7 +9978,7 @@ (define_expand "subsf3"
   [(set (match_operand:SF 0 "fp_arith_reg_operand" "")
 	(minus:SF (match_operand:SF 1 "fp_arith_reg_operand" "")
 		  (match_operand:SF 2 "fp_arith_reg_operand" "")))]
-  "TARGET_SH2E || TARGET_SHMEDIA_FPU"
+  "TARGET_SH2E || TARGET_SH3 || TARGET_SHMEDIA_FPU"
   "
 {
   if (TARGET_SH2E)
@@ -9893,6 +9986,12 @@ (define_expand "subsf3"
       expand_sf_binop (&gen_subsf3_i, operands);
       DONE;
     }
+  else if (TARGET_OSFP)
+    {
+      expand_sfunc_binop (SFmode, &gen_subsf3_i3, \"__subsf3\", MINUS,
+			  operands);
+      DONE;
+    }
 }")
 
 (define_insn "*subsf3_media"
@@ -9903,6 +10002,23 @@ (define_insn "*subsf3_media"
   "fsub.s	%1, %2, %0"
   [(set_attr "type" "fparith_media")])
 
+(define_insn "subsf3_i3"
+  [(set (match_operand:SF 0 "arith_reg_dest" "=z")
+	(minus:SF (reg:SF R4_REG) (reg:SF R5_REG)))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R1_REG))
+   (clobber (reg:SI R2_REG))
+   (clobber (reg:SI R3_REG))
+   (clobber (reg:SI R5_REG))
+   (clobber (reg:SI R6_REG))
+   (clobber (reg:SI R7_REG))
+   (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+  "TARGET_OSFP && ! TARGET_SH2E"
+  "jsr	@%1%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
+
 (define_insn "subsf3_i"
   [(set (match_operand:SF 0 "fp_arith_reg_operand" "=f")
 	(minus:SF (match_operand:SF 1 "fp_arith_reg_operand" "0")
@@ -9915,10 +10031,15 @@ (define_insn "subsf3_i"
 
 (define_expand "mulsf3"
   [(set (match_operand:SF 0 "fp_arith_reg_operand" "")
-	(mult:SF (match_operand:SF 1 "fp_arith_reg_operand" "")
-		 (match_operand:SF 2 "fp_arith_reg_operand" "")))]
-  "TARGET_SH2E || TARGET_SHMEDIA_FPU"
-  "")
+        (mult:SF (match_operand:SF 1 "fp_arith_reg_operand" "")
+                 (match_operand:SF 2 "fp_arith_reg_operand" "")))]
+  "TARGET_SH2E || TARGET_SH3 || TARGET_SHMEDIA_FPU"
+  "if (TARGET_SH1_SOFTFP_MODE (SFmode))
+     {
+       expand_sfunc_binop (SFmode, &gen_mulsf3_i3, \"__mulsf3\", MULT,
+			   operands);
+       DONE;
+     }")
 
 (define_insn "*mulsf3_media"
   [(set (match_operand:SF 0 "fp_arith_reg_operand" "=f")
@@ -9959,6 +10080,22 @@ (define_insn "mulsf3_i4"
   [(set_attr "type" "fp")
    (set_attr "fp_mode" "single")])
 
+(define_insn "mulsf3_i3"
+  [(set (match_operand:SF 0 "arith_reg_dest" "=z")
+	(mult:SF (reg:SF R4_REG) (reg:SF R5_REG)))
+   (clobber (reg:SI MACH_REG))
+   (clobber (reg:SI MACL_REG))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R1_REG))
+   (clobber (reg:SI R2_REG))
+   (clobber (reg:SI R3_REG))
+   (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+  "TARGET_OSFP && ! TARGET_SH2E"
+  "jsr	@%1%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
+
 (define_insn "mac_media"
   [(set (match_operand:SF 0 "fp_arith_reg_operand" "=f")
 	(plus:SF (mult:SF (match_operand:SF 1 "fp_arith_reg_operand" "%f")
@@ -10119,6 +10256,149 @@ (define_insn "*fixsfsi"
   "ftrc	%1,%0"
   [(set_attr "type" "fp")])
 
+(define_insn "cmpnesf_i1"
+  [(set (match_operand:CC_FP_NE 0 "register_operand" "=z")
+	(compare:CC_FP_NE (reg:SF R4_REG) (reg:SF R5_REG)))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R1_REG))
+   (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+  "TARGET_SH1 && ! TARGET_SH2E"
+  "jsr	@%1%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpgtsf_i1"
+  [(set (match_operand:CC_FP_GT 0 "register_operand" "=z")
+	(compare:CC_FP_GT (reg:SF R4_REG) (reg:SF R5_REG)))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R1_REG))
+   (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+  "TARGET_SH1 && ! TARGET_SH2E"
+  "jsr	@%1%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpunltsf_i1"
+  [(set (match_operand:CC_FP_UNLT 0 "register_operand" "=z")
+	(compare:CC_FP_UNLT (reg:SF R4_REG) (reg:SF R5_REG)))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R1_REG))
+   (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+  "TARGET_SH1 && ! TARGET_SH2E"
+  "jsr	@%1%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpeqsf_i1_finite"
+  [(set (reg:SI T_REG)
+	(eq:SI (match_operand:SF 0 "arith_reg_operand" "r,r,r")
+	       (match_operand:SF 1 "arith_reg_operand" "r,r,r")))
+   (clobber (match_scratch:SI 2 "=0,1,?r"))]
+  "TARGET_SH1 && ! TARGET_SH2E && flag_finite_math_only"
+  "*
+{
+  if (which_alternative == 0)
+     output_asm_insn (\"cmp/eq\t%0,%1\;or\t%1,%2\;bt\t0f\", operands);
+  else if (which_alternative == 1)
+     output_asm_insn (\"cmp/eq\t%0,%1\;or\t%0,%2\;bt\t0f\", operands);
+  else
+    output_asm_insn (\"cmp/eq\t%0,%1\;mov\t%0,%2\;bt\t0f\;or\t%1,%2\",
+		     operands);
+  return \"add\t%2,%2\;tst\t%2,%2\\n0:\";
+}"
+  [(set_attr "length" "10,10,12")])
+
+(define_insn "cmplesf_i1_finite"
+  [(set (reg:SI T_REG)
+	(le:SI (match_operand:SF 0 "arith_reg_operand" "r,r,r")
+	       (match_operand:SF 1 "arith_reg_operand" "r,r,r")))
+   (clobber (match_scratch:SI 2 "=0,1,r"))]
+  "TARGET_SH1 && ! TARGET_SH2E && flag_finite_math_only"
+  "*
+{
+  output_asm_insn (\"cmp/pz\t%0\", operands);
+  if (which_alternative == 2)
+    output_asm_insn (\"mov\t%0,%2\", operands);
+  if (TARGET_SH2)
+    output_asm_insn (\"bf/s\t0f\;cmp/hs\t%1,%0\;cmp/ge\t%0,%1\", operands);
+  else
+    output_asm_insn (\"bt\t1f\;bra\t0f\;cmp/hs\t%1,%0\\n1:\tcmp/ge\t%0,%1\",
+		     operands);
+  if (which_alternative == 1)
+    output_asm_insn (\"or\t%0,%2\", operands);
+  else
+    output_asm_insn (\"or\t%1,%2\", operands);
+  return \"bt\t0f\;add\t%2,%2\;tst\t%2,%2\\n0:\";
+}"
+  [(set_attr "length" "18,18,20")])
+
+(define_insn "cmpunsf_i1"
+  [(set (reg:SI T_REG)
+	(unordered:SI (match_operand:SF 0 "arith_reg_operand" "r,r")
+		      (match_operand:SF 1 "arith_reg_operand" "r,r")))
+   (use (match_operand:SI 2 "arith_reg_operand" "r,r"))
+   (clobber (match_scratch:SI 3 "=0,&r"))]
+  "TARGET_SH1 && ! TARGET_SH2E"
+  "not\t%0,%3\;tst\t%2,%3\;not\t%1,%3\;bt\t0f\;tst\t%2,%3\;0:"
+  [(set_attr "length" "10")])
+
+;; ??? This is a lot of code with a lot of branches; a library function
+;; might be better.
+(define_insn "cmpuneqsf_i1"
+  [(set (reg:SI T_REG)
+	(uneq:SI (match_operand:SF 0 "arith_reg_operand" "r")
+		 (match_operand:SF 1 "arith_reg_operand" "r")))
+   (use (match_operand:SI 2 "arith_reg_operand" "r"))
+   (clobber (match_scratch:SI 3 "=&r"))]
+  "TARGET_SH1 && ! TARGET_SH2E"
+  "*
+{
+  output_asm_insn (\"not\t%0,%3\;tst\t%2,%3\;not\t%1,%3\", operands);
+  output_asm_insn (\"bt\t0f\;tst\t%2,%3\;bt\t0f\;cmp/eq\t%0,%1\", operands);
+  output_asm_insn (\"mov\t%0,%3\;bt\t0f\;or\t%1,%3\", operands);
+  return \"add\t%3,%3\;tst\t%3,%3\\n0:\";
+}"
+  [(set_attr "length" "24")])
+
+(define_insn "movcc_fp_ne"
+  [(set (match_operand:CC_FP_NE 0 "general_movdst_operand"
+	    "=r,r,m")
+	(match_operand:CC_FP_NE 1 "general_movsrc_operand"
+	 "rI08,mr,r"))]
+  "TARGET_SH1"
+  "@
+	mov	%1,%0
+	mov.l	%1,%0
+	mov.l	%1,%0"
+  [(set_attr "type" "move,load,store")])
+
+(define_insn "movcc_fp_gt"
+  [(set (match_operand:CC_FP_GT 0 "general_movdst_operand"
+	    "=r,r,m")
+	(match_operand:CC_FP_GT 1 "general_movsrc_operand"
+	 "rI08,mr,r"))]
+  "TARGET_SH1"
+  "@
+	mov	%1,%0
+	mov.l	%1,%0
+	mov.l	%1,%0"
+  [(set_attr "type" "move,load,store")])
+
+(define_insn "movcc_fp_unlt"
+  [(set (match_operand:CC_FP_UNLT 0 "general_movdst_operand"
+	    "=r,r,m")
+	(match_operand:CC_FP_UNLT 1 "general_movsrc_operand"
+	 "rI08,mr,r"))]
+  "TARGET_SH1"
+  "@
+	mov	%1,%0
+	mov.l	%1,%0
+	mov.l	%1,%0"
+  [(set_attr "type" "move,load,store")])
+
 (define_insn "cmpgtsf_t"
   [(set (reg:SI T_REG)
 	(gt:SI (match_operand:SF 0 "fp_arith_reg_operand" "f")
@@ -10146,6 +10426,22 @@ (define_insn "ieee_ccmpeqsf_t"
   "* return output_ieee_ccmpeq (insn, operands);"
   [(set_attr "length" "4")])
 
+(define_insn "*cmpltgtsf_t"
+  [(set (reg:SI T_REG)
+	(ltgt:SI (match_operand:SF 0 "fp_arith_reg_operand" "f")
+		 (match_operand:SF 1 "fp_arith_reg_operand" "f")))]
+  "TARGET_SH2E && ! (TARGET_SH4 || TARGET_SH2A_SINGLE)"
+  "fcmp/gt\t%1,%0\;bt\t0f\;fcmp/gt\t%0,%1\\n0:"
+  [(set_attr "length" "6")])
+
+(define_insn "*cmporderedsf_t"
+  [(set (reg:SI T_REG)
+	(ordered:SI (match_operand:SF 0 "fp_arith_reg_operand" "f")
+		    (match_operand:SF 1 "fp_arith_reg_operand" "f")))]
+  "TARGET_SH2E && ! (TARGET_SH4 || TARGET_SH2A_SINGLE)"
+  "fcmp/eq\t%0,%0\;bf\t0f\;fcmp/eq\t%1,%1\\n0:"
+  [(set_attr "length" "6")])
+
 
 (define_insn "cmpgtsf_t_i4"
   [(set (reg:SI T_REG)
@@ -10178,6 +10474,26 @@ (define_insn "*ieee_ccmpeqsf_t_4"
   [(set_attr "length" "4")
    (set_attr "fp_mode" "single")])
 
+(define_insn "*cmpltgtsf_t_4"
+  [(set (reg:SI T_REG)
+	(ltgt:SI (match_operand:SF 0 "fp_arith_reg_operand" "f")
+		 (match_operand:SF 1 "fp_arith_reg_operand" "f")))
+   (use (match_operand:PSI 2 "fpscr_operand" "c"))]
+  "TARGET_SH4 || TARGET_SH2A_SINGLE"
+  "fcmp/gt\t%1,%0\;bt\t0f\;fcmp/gt\t%0,%1\\n0:"
+  [(set_attr "length" "6")
+   (set_attr "fp_mode" "single")])
+
+(define_insn "*cmporderedsf_t_4"
+  [(set (reg:SI T_REG)
+	(ordered:SI (match_operand:SF 0 "fp_arith_reg_operand" "f")
+		    (match_operand:SF 1 "fp_arith_reg_operand" "f")))
+   (use (match_operand:PSI 2 "fpscr_operand" "c"))]
+  "TARGET_SH4 || TARGET_SH2A_SINGLE"
+  "fcmp/eq\t%0,%0\;bf\t0f\;fcmp/eq\t%1,%1\\n0:"
+  [(set_attr "length" "6")
+   (set_attr "fp_mode" "single")])
+
 (define_insn "cmpeqsf_media"
   [(set (match_operand:SI 0 "register_operand" "=r")
 	(eq:SI (match_operand:SF 1 "fp_arith_reg_operand" "f")
@@ -10213,18 +10529,24 @@ (define_insn "cmpunsf_media"
 (define_expand "cbranchsf4"
   [(set (pc)
 	(if_then_else (match_operator 0 "sh_float_comparison_operator"
-		       [(match_operand:SF 1 "arith_operand" "")
-			(match_operand:SF 2 "arith_operand" "")])
+		       [(match_operand:SF 1 "nonmemory_operand" "")
+			(match_operand:SF 2 "nonmemory_operand" "")])
 		      (match_operand 3 "" "")
 		      (pc)))]
-  "TARGET_SH2E || TARGET_SHMEDIA_FPU"
+  "TARGET_SH1 || TARGET_SHMEDIA_FPU"
   "
 {
   if (TARGET_SHMEDIA)
-    emit_jump_insn (gen_cbranchfp4_media (operands[0], operands[1], operands[2],
-					  operands[3]));
+    {
+      if (!arith_operand (operands[1], SFmode))
+	operands[1] = copy_to_mode_reg (SFmode, operands[1]);
+      if (!arith_operand (operands[2], SFmode))
+	operands[2] = copy_to_mode_reg (SFmode, operands[2]);
+      emit_jump_insn (gen_cbranchfp4_media (operands[0], operands[1],
+					    operands[2], operands[3]));
+    }
   else
-    sh_emit_compare_and_branch (operands, SFmode);
+    sh_expand_float_cbranch (operands);
   DONE;
 }")
 
@@ -10426,11 +10748,39 @@ (define_insn "abssf2_i"
   [(set_attr "type" "fmove")
    (set_attr "fp_mode" "single")])
 
+(define_expand "abssc2"
+  [(set (match_operand:SF 0 "fp_arith_reg_operand" "")
+	(abs:SF (match_operand:SC 1 "fp_arith_reg_operand" "")))]
+  "TARGET_OSFP && ! TARGET_SH2E"
+  "
+{
+  expand_sfunc_unop (SCmode, &gen_abssc2_i3, \"__hypotf\", ABS, operands);
+  DONE;
+}")
+
+(define_insn "abssc2_i3"
+  [(set (match_operand:SF 0 "arith_reg_dest" "=z")
+	(abs:SF (reg:SC R4_REG)))
+   (clobber (reg:SI MACH_REG))
+   (clobber (reg:SI MACL_REG))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R1_REG))
+   (clobber (reg:SI R2_REG))
+   (clobber (reg:SI R3_REG))
+   (clobber (reg:SI R4_REG))
+   (clobber (reg:SI R5_REG))
+   (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+  "TARGET_OSFP && ! TARGET_SH2E"
+  "jsr	@%1%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
+
 (define_expand "adddf3"
   [(set (match_operand:DF 0 "fp_arith_reg_operand" "")
 	(plus:DF (match_operand:DF 1 "fp_arith_reg_operand" "")
 		 (match_operand:DF 2 "fp_arith_reg_operand" "")))]
-  "(TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU"
+  "TARGET_FPU_DOUBLE || TARGET_SH3"
   "
 {
   if (TARGET_SH4 || TARGET_SH2A_DOUBLE)
@@ -10438,6 +10788,12 @@ (define_expand "adddf3"
       expand_df_binop (&gen_adddf3_i, operands);
       DONE;
     }
+  else if (TARGET_SH3)
+    {
+      expand_sfunc_binop (DFmode, &gen_adddf3_i3_wrap, \"__adddf3\", PLUS,
+			  operands);
+      DONE;
+    }
 }")
 
 (define_insn "*adddf3_media"
@@ -10458,6 +10814,30 @@ (define_insn "adddf3_i"
   [(set_attr "type" "dfp_arith")
    (set_attr "fp_mode" "double")])
 
+(define_expand "adddf3_i3_wrap"
+  [(match_operand:DF 0 "" "") (match_operand:SI 1 "" "")]
+  "TARGET_SH3"
+  "
+{
+  emit_insn (gen_adddf3_i3 (operands[1]));
+  emit_move_insn (operands[0], gen_rtx_REG (DFmode, R0_REG));
+  DONE;
+}")
+
+(define_insn "adddf3_i3"
+  [(set (reg:DF R0_REG)
+	(plus:DF (reg:DF R4_REG) (reg:DF R6_REG)))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:DI R2_REG))
+   (clobber (reg:DF R4_REG))
+   (clobber (reg:DF R6_REG))
+   (use (match_operand:SI 0 "arith_reg_operand" "r"))]
+  "TARGET_SH3"
+  "jsr	@%0%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
+
 (define_expand "subdf3"
   [(set (match_operand:DF 0 "fp_arith_reg_operand" "")
 	(minus:DF (match_operand:DF 1 "fp_arith_reg_operand" "")
@@ -10494,7 +10874,7 @@ (define_expand "muldf3"
   [(set (match_operand:DF 0 "fp_arith_reg_operand" "")
 	(mult:DF (match_operand:DF 1 "fp_arith_reg_operand" "")
 		 (match_operand:DF 2 "fp_arith_reg_operand" "")))]
-  "(TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU"
+  "TARGET_FPU_DOUBLE || TARGET_SH3"
   "
 {
   if (TARGET_SH4 || TARGET_SH2A_DOUBLE)
@@ -10502,6 +10882,12 @@ (define_expand "muldf3"
       expand_df_binop (&gen_muldf3_i, operands);
       DONE;
     }
+  else if (TARGET_SH3)
+    {
+      expand_sfunc_binop (DFmode, &gen_muldf3_i3_wrap, \"__muldf3\", MULT,
+			  operands);
+      DONE;
+    }
 }")
 
 (define_insn "*muldf3_media"
@@ -10522,6 +10908,32 @@ (define_insn "muldf3_i"
   [(set_attr "type" "dfp_mul")
    (set_attr "fp_mode" "double")])
 
+(define_expand "muldf3_i3_wrap"
+  [(match_operand:DF 0 "" "") (match_operand:SI 1 "" "")]
+  "TARGET_SH3"
+  "
+{
+  emit_insn (gen_muldf3_i3 (operands[1]));
+  emit_move_insn (operands[0], gen_rtx_REG (DFmode, R0_REG));
+  DONE;
+}")
+
+(define_insn "muldf3_i3"
+  [(set (reg:DF R0_REG)
+	(mult:DF (reg:DF R4_REG) (reg:DF R6_REG)))
+   (clobber (reg:SI MACH_REG))
+   (clobber (reg:SI MACL_REG))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:DI R2_REG))
+   (clobber (reg:DF R4_REG))
+   (clobber (reg:DF R6_REG))
+   (use (match_operand:SI 0 "arith_reg_operand" "r"))]
+  "TARGET_SH3"
+  "jsr	@%0%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
+
 (define_expand "divdf3"
   [(set (match_operand:DF 0 "fp_arith_reg_operand" "")
 	(div:DF (match_operand:DF 1 "fp_arith_reg_operand" "")
@@ -10651,6 +11063,73 @@ (define_insn "fix_truncdfsi2_i"
 ;; 	      (use (match_dup 2))])
 ;;    (set (match_dup 0) (reg:SI FPUL_REG))])
 
+(define_insn "cmpnedf_i1"
+  [(set (match_operand:CC_FP_NE 0 "register_operand" "=z")
+	(compare:CC_FP_NE (reg:DF R4_REG) (reg:DF R6_REG)))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R1_REG))
+   (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+  "TARGET_SH1_SOFTFP"
+  "jsr	@%1%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpgtdf_i1"
+  [(set (match_operand:CC_FP_GT 0 "register_operand" "=z")
+	(compare:CC_FP_GT (reg:DF R4_REG) (reg:DF R6_REG)))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R1_REG))
+   (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+  "TARGET_SH1_SOFTFP"
+  "jsr	@%1%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpunltdf_i1"
+  [(set (match_operand:CC_FP_UNLT 0 "register_operand" "=z")
+	(compare:CC_FP_UNLT (reg:DF R4_REG) (reg:DF R6_REG)))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R1_REG))
+   (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+  "TARGET_SH1_SOFTFP"
+  "jsr	@%1%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "cmpeqdf_i1_finite"
+  [(set (reg:SI T_REG)
+	(eq:SI (match_operand:DF 0 "arith_reg_operand" "r")
+	       (match_operand:DF 1 "arith_reg_operand" "r")))
+   (clobber (match_scratch:SI 2 "=&r"))]
+  "TARGET_SH1_SOFTFP && flag_finite_math_only"
+  "cmp/eq\t%R0,%R1\;mov\t%S0,%2\;bf\t0f\;cmp/eq\t%S0,%S1\;bt\t0f\;or\t%S1,%2\;add\t%2,%2\;or\t%R0,%2\;tst\t%2,%2\\n0:"
+  [(set_attr "length" "18")])
+
+(define_insn "cmpundf_i1"
+  [(set (reg:SI T_REG)
+	(unordered:SI (match_operand:DF 0 "arith_reg_operand" "r,r")
+		      (match_operand:DF 1 "arith_reg_operand" "r,r")))
+   (use (match_operand:SI 2 "arith_reg_operand" "r,r"))
+   (clobber (match_scratch:SI 3 "=0,&r"))]
+  "TARGET_SH1 && ! TARGET_SH2E"
+  "not\t%S0,%3\;tst\t%2,%3\;not\t%S1,%3\;bt\t0f\;tst\t%2,%3\;0:"
+  [(set_attr "length" "10")])
+
+;; ??? This is a lot of code with a lot of branches; a library function
+;; might be better.
+(define_insn "cmpuneqdf_i1"
+  [(set (reg:SI T_REG)
+	(uneq:SI (match_operand:DF 0 "arith_reg_operand" "r")
+		 (match_operand:DF 1 "arith_reg_operand" "r")))
+   (use (match_operand:SI 2 "arith_reg_operand" "r"))
+   (clobber (match_scratch:SI 3 "=&r"))]
+  "TARGET_SH1_SOFTFP"
+  "not\t%S0,%3\;tst\t%2,%3\;not\t%S1,%3\;bt\t0f\;tst\t%2,%3\;bt\t0f\;cmp/eq\t%R0,%R1\; bf\t0f\;cmp/eq\t%S0,%S1\;bt\t0f\;mov\t%S0,%3\;or\t%S1,%3\;add\t%3,%3\;or\t%R0,%3\;tst\t%3,%3\\n0:"
+  [(set_attr "length" "30")])
+
 (define_insn "cmpgtdf_t"
   [(set (reg:SI T_REG)
 	(gt:SI (match_operand:DF 0 "arith_reg_operand" "f")
@@ -10682,6 +11161,26 @@ (define_insn "*ieee_ccmpeqdf_t"
   [(set_attr "length" "4")
    (set_attr "fp_mode" "double")])
 
+(define_insn "*cmpltgtdf_t"
+  [(set (reg:SI T_REG)
+	(ltgt:SI (match_operand:DF 0 "fp_arith_reg_operand" "f")
+		 (match_operand:DF 1 "fp_arith_reg_operand" "f")))
+   (use (match_operand:PSI 2 "fpscr_operand" "c"))]
+  "TARGET_SH4 || TARGET_SH2A_DOUBLE"
+  "fcmp/gt\t%1,%0\;bt\t0f\;fcmp/gt\t%0,%1\\n0:"
+  [(set_attr "length" "6")
+   (set_attr "fp_mode" "double")])
+
+(define_insn "*cmpordereddf_t_4"
+  [(set (reg:SI T_REG)
+	(ordered:SI (match_operand:DF 0 "fp_arith_reg_operand" "f")
+		    (match_operand:DF 1 "fp_arith_reg_operand" "f")))
+   (use (match_operand:PSI 2 "fpscr_operand" "c"))]
+  "TARGET_SH4 || TARGET_SH2A_SINGLE"
+  "fcmp/eq\t%0,%0\;bf\t0f\;fcmp/eq\t%1,%1\\n0:"
+  [(set_attr "length" "6")
+   (set_attr "fp_mode" "double")])
+
 (define_insn "cmpeqdf_media"
   [(set (match_operand:SI 0 "register_operand" "=r")
 	(eq:SI (match_operand:DF 1 "fp_arith_reg_operand" "f")
@@ -10717,18 +11216,24 @@ (define_insn "cmpundf_media"
 (define_expand "cbranchdf4"
   [(set (pc)
 	(if_then_else (match_operator 0 "sh_float_comparison_operator"
-		       [(match_operand:DF 1 "arith_operand" "")
-			(match_operand:DF 2 "arith_operand" "")])
+		       [(match_operand:DF 1 "nonmemory_operand" "")
+			(match_operand:DF 2 "nonmemory_operand" "")])
 		      (match_operand 3 "" "")
 		      (pc)))]
-  "(TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU"
+  "TARGET_SH1 || TARGET_SHMEDIA_FPU"
   "
 {
   if (TARGET_SHMEDIA)
-    emit_jump_insn (gen_cbranchfp4_media (operands[0], operands[1], operands[2],
-					  operands[3]));
+    {
+      if (!arith_operand (operands[1], DFmode))
+	operands[1] = copy_to_mode_reg (DFmode, operands[1]);
+      if (!arith_operand (operands[2], DFmode))
+	operands[2] = copy_to_mode_reg (DFmode, operands[2]);
+      emit_jump_insn (gen_cbranchfp4_media (operands[0], operands[1],
+					    operands[2], operands[3]));
+    }
   else
-    sh_emit_compare_and_branch (operands, DFmode);
+    sh_expand_float_cbranch (operands);
   DONE;
 }")
 
@@ -10823,7 +11328,7 @@ (define_insn "absdf2_i"
 (define_expand "extendsfdf2"
   [(set (match_operand:DF 0 "fp_arith_reg_operand" "")
 	(float_extend:DF (match_operand:SF 1 "fpul_operand" "")))]
-  "(TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU"
+  "TARGET_SH1 || TARGET_SHMEDIA_FPU"
   "
 {
   if (TARGET_SH4 || TARGET_SH2A_DOUBLE)
@@ -10832,6 +11337,18 @@ (define_expand "extendsfdf2"
 					get_fpscr_rtx ()));
       DONE;
     }
+  else if (TARGET_SH2E)
+    {
+      expand_sfunc_unop (SFmode, &gen_extendsfdf2_i2e, \"__extendsfdf2\",
+			 FLOAT_EXTEND, operands);
+      DONE;
+    }
+  else if (TARGET_SH1)
+    {
+      expand_sfunc_unop (SFmode, &gen_extendsfdf2_i1, \"__extendsfdf2\",
+			 FLOAT_EXTEND, operands);
+      DONE;
+    }
 }")
 
 (define_insn "*extendsfdf2_media"
@@ -10850,16 +11367,94 @@ (define_insn "extendsfdf2_i4"
   [(set_attr "type" "fp")
    (set_attr "fp_mode" "double")])
 
+;; ??? In order to use this efficiently, we'd have to have an extra
+;; register class for r0 and r1 - and that would cause repercussions in
+;; register allocation elsewhere.  So just say we clobber r0 / r1, and
+;; that we can use an arbitrary target.
+(define_insn_and_split "extendsfdf2_i1"
+  [(set (match_operand:DF 0 "arith_reg_dest" "=r")
+	(float_extend:DF (reg:SF R4_REG)))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R0_REG))
+   (clobber (reg:SI R1_REG))
+   (clobber (reg:SI R2_REG))
+   (clobber (reg:SI R3_REG))
+   (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+  "TARGET_SH1_SOFTFP && !TARGET_SH2E"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0) (reg:DF R0_REG))]
+  "emit_insn (gen_extendsfdf2_i1_r0 (operands[1]));"
+  [(set_attr "type" "sfunc")])
+
+(define_insn "extendsfdf2_i1_r0"
+  [(set (reg:DF R0_REG) (float_extend:DF (reg:SF R4_REG)))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R2_REG))
+   (clobber (reg:SI R3_REG))
+   (use (match_operand:SI 0 "arith_reg_operand" "r"))]
+  "TARGET_SH1_SOFTFP && !TARGET_SH2E"
+  "jsr	@%0%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
+
+(define_insn_and_split "extendsfdf2_i2e"
+  [(set (match_operand:DF 0 "arith_reg_dest" "=r")
+	(float_extend:DF (reg:SF FR4_REG)))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R0_REG))
+   (clobber (reg:SI R1_REG))
+   (clobber (reg:SI R2_REG))
+   (clobber (reg:SI R3_REG))
+   (clobber (reg:SI R4_REG))
+   (clobber (reg:SI FPUL_REG))
+   (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+  "TARGET_SH1_SOFTFP && TARGET_SH2E"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0) (reg:DF R0_REG))]
+  "emit_insn (gen_extendsfdf2_i2e_r0 (operands[1]));"
+  [(set_attr "type" "sfunc")])
+
+(define_insn "extendsfdf2_i2e_r0"
+  [(set (reg:DF R0_REG) (float_extend:DF (reg:SF FR4_REG)))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R2_REG))
+   (clobber (reg:SI R3_REG))
+   (clobber (reg:SI R4_REG))
+   (clobber (reg:SI FPUL_REG))
+   (use (match_operand:SI 0 "arith_reg_operand" "r"))]
+  "TARGET_SH1_SOFTFP && TARGET_SH2E"
+  "jsr	@%0%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
+
 (define_expand "truncdfsf2"
   [(set (match_operand:SF 0 "fpul_operand" "")
-	(float_truncate:SF (match_operand:DF 1 "fp_arith_reg_operand" "")))]
-  "(TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU"
-  "
-{
+        (float_truncate:SF (match_operand:DF 1 "fp_arith_reg_operand" "")))]
+  "TARGET_SH1 || TARGET_SHMEDIA_FPU" 
+  "                      
+{                     
   if (TARGET_SH4 || TARGET_SH2A_DOUBLE)
     {
       emit_df_insn (gen_truncdfsf2_i4 (operands[0], operands[1],
-				       get_fpscr_rtx ()));
+                                       get_fpscr_rtx ()));
+      DONE;
+    }
+  else if (TARGET_SH2E)
+    {
+      expand_sfunc_unop (DFmode, &gen_truncdfsf2_i2e, \"__truncdfsf2\",
+			 FLOAT_TRUNCATE, operands);
+      DONE;
+    }
+  else if (TARGET_SH1)
+    {
+      expand_sfunc_unop (DFmode, &gen_truncdfsf2_i1, \"__truncdfsf2\",
+			 FLOAT_TRUNCATE, operands);
       DONE;
     }
 }")
@@ -10879,6 +11474,37 @@ (define_insn "truncdfsf2_i4"
   "fcnvds  %1,%0"
   [(set_attr "type" "fp")
    (set_attr "fp_mode" "double")])
+
+(define_insn "truncdfsf2_i1"
+  [(set (match_operand:SF 0 "arith_reg_dest" "=z")
+       (float_truncate:SF (reg:DF R4_REG)))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R1_REG))
+   (clobber (reg:SI R2_REG))
+   (clobber (reg:SI R3_REG))
+   (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+  "TARGET_SH1_SOFTFP && !TARGET_SH2E"
+  "jsr @%1%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
+
+(define_insn "truncdfsf2_i2e"
+  [(set (match_operand:SF 0 "arith_reg_dest" "=w")
+	(float_truncate:SF (reg:DF R4_REG)))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI FPUL_REG))
+   (clobber (reg:SI R0_REG))
+   (clobber (reg:SI R1_REG))
+   (clobber (reg:SI R2_REG))
+   (clobber (reg:SI R3_REG))
+   (use (match_operand:SI 1 "arith_reg_operand" "r"))]
+  "TARGET_SH1_SOFTFP && TARGET_SH2E"
+  "jsr	@%1%#"
+  [(set_attr "type" "sfunc")
+   (set_attr "needs_delay_slot" "yes")])
+
 \f
 ;; Bit field extract patterns.  These give better code for packed bitfields,
 ;; because they allow auto-increment addresses to be generated.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: SH optimized software floating point routines
  2010-07-17 21:23         ` Kaz Kojima
@ 2010-07-19  9:23           ` Naveen H. S
  0 siblings, 0 replies; 30+ messages in thread
From: Naveen H. S @ 2010-07-19  9:23 UTC (permalink / raw)
  To: amylaar; +Cc: gcc, Prafulla Thakare, Kaz Kojima

Hi.

Thank you for the modified patch. 

I have applied the patch to the gcc-4.5 sources and am checking for
regressions on SH[1234].
I will run some more tests on the modified (patched) toolchain.

I will share the test results after the regression and other tests are
complete.

Regards,
Naveen


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SH optimized software floating point routines
  2010-07-19  0:59       ` Joern Rennecke
@ 2010-07-20 13:35         ` Kaz Kojima
  2010-07-21 11:25           ` Christian Bruel
                             ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Kaz Kojima @ 2010-07-20 13:35 UTC (permalink / raw)
  To: joern.rennecke; +Cc: Naveen.S, gcc, Prafulla.Thakare

Joern Rennecke <joern.rennecke@embecosm.com> wrote:
> I've found two bugs in truncdfsf2;
> I've also added back a number of hunks that Naveen had dropped.
> 
> Note that most of the patch was prepared in 2006, so that is the
> proper most recent copyright date for those files that haven't been
> touched apart from updating the Copyright notice.

I've got some regressions with "make check" on sh4-unknown-linux-gnu.
It looks like all of them fail with undefined references to
__unorddf2/__unordsf2 when -mieee is enabled.
I'm trying the attached patch over sh-softfp-20100718-2131 patch.
All regressions go away with it on cross sh4-unknown-linux-gnu,
though the native bootstrap will take a few days more.

BTW, it looks like the softfp __unord?f2 routines check for signaling
NaNs only.  This makes __builtin_isnan return false for quiet NaNs,
for which the current fp-bit ones return true when -mieee is enabled.
Perhaps that change of behavior is OK for software FP.
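
To make that concrete: under soft float, __builtin_isnan (x) boils down
to a call like the following (a sketch; __unordsf2 follows the usual
libgcc convention of returning nonzero iff either argument is a NaN):

  extern int __unordsf2 (float, float);

  int my_isnanf (float x)
  {
    return __unordsf2 (x, x) != 0;
  }

so a __unordsf2 that only recognizes one NaN flavor directly changes
what __builtin_isnan reports.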

Regards,
	kaz
--

diff -uprN ORIG/trunk/gcc/config/sh/ieee-754-df.S trunk/gcc/config/sh/ieee-754-df.S
--- ORIG/trunk/gcc/config/sh/ieee-754-df.S	2010-07-20 11:39:29.000000000 +0900
+++ trunk/gcc/config/sh/ieee-754-df.S	2010-07-20 11:36:15.000000000 +0900
@@ -23,11 +23,11 @@ see the files COPYING3 and COPYING.RUNTI
 !! STMicroelectronics ST40 CPUs
 !! Contributed by J"orn Rennecke joern.rennecke@st.com
 
-#ifndef __SH_FPU_DOUBLE__
-
 #include "lib1funcs.h"
 #include "insn-constants.h"
 
+#ifndef __SH_FPU_DOUBLE__
+
 /* Double-precision floating-point emulation.
    We handle NANs, +-infinity, and +-zero.
    However, we assume that for NANs, the topmost bit of the fraction is set.  */
@@ -123,7 +123,7 @@ GLOBAL(unorddf2):
 	mov.l	LOCAL(c_DF_NAN_MASK),r1
 	not	DBL0H,r0
 	tst	r1,r0
-	not	r6,r0
+	not	DBL1H,r0
 	bt	LOCAL(unord)
 	tst	r1,r0
 LOCAL(unord):
@@ -788,4 +788,52 @@ LOCAL(x7ff80000):
 #endif /* L_divdf3 */
 #endif /* DYN_SHIFT */
 
+#else /* __SH_FPU_DOUBLE__ */
+
+#ifdef L_unorddf2
+	.balign 4
+	.global GLOBAL(unorddf2)
+	FUNC(GLOBAL(unorddf2))
+GLOBAL(unorddf2):
+	flds	fr4,fpul
+	sts	fpul,r4
+	shll	r4
+	mov.l	LOCAL(c_DF_QNAN_MASK),r1
+	shlr	r4
+	cmp/eq	r4,r1
+	bf/s	LOCAL(unord_check_qnan0)
+	flds	fr5,fpul
+	sts	fpul,r5
+	tst	r5,r5
+	bt	LOCAL(unord_next)
+LOCAL(unord_check_qnan0):
+	not	r4,r0
+	tst	r1,r0
+	bt	LOCAL(unord)
+LOCAL(unord_next):
+	flds	fr6,fpul
+	sts	fpul,r6
+	shll	r6
+	shlr	r6
+	cmp/eq	r6,r1
+	bf/s	LOCAL(unord_check_qnan1)
+	flds	fr7,fpul
+	sts	fpul,r7
+	tst	r7,r7
+	bt	LOCAL(unord_fail)	
+LOCAL(unord_check_qnan1):
+	not	r6,r0
+	tst	r1,r0
+LOCAL(unord):
+	rts
+	movt	r0
+LOCAL(unord_fail):
+	rts
+	mov	#0,r0
+	.balign	4
+LOCAL(c_DF_QNAN_MASK):
+	.long 0x7ff00000
+	ENDFUNC(GLOBAL(unorddf2))
+#endif /* L_unorddf2 */
+
 #endif /* __SH_FPU_DOUBLE__ */
diff -uprN ORIG/trunk/gcc/config/sh/ieee-754-sf.S trunk/gcc/config/sh/ieee-754-sf.S
--- ORIG/trunk/gcc/config/sh/ieee-754-sf.S	2010-07-20 11:39:30.000000000 +0900
+++ trunk/gcc/config/sh/ieee-754-sf.S	2010-07-20 11:35:58.000000000 +0900
@@ -23,11 +23,11 @@ see the files COPYING3 and COPYING.RUNTI
 !! STMicroelectronics ST40 CPUs
 !! Contributed by J"orn Rennecke joern.rennecke@st.com
 
-#ifndef __SH_FPU_ANY__
-
 #include "lib1funcs.h"
 #include "insn-constants.h"
 
+#ifndef __SH_FPU_ANY__
+
 /* Single-precision floating-point emulation.
    We handle NANs, +-infinity, and +-zero.
    However, we assume that for NANs, the topmost bit of the fraction is set.  */
@@ -689,4 +689,42 @@ LOCAL(tab):
 #endif /* L_hypotf */
 #endif /* DYN_SHIFT */
 
+#else /* __SH_FPU_ANY__ */
+
+#ifdef L_unordsf2
+	.balign 4
+	.global GLOBAL(unordsf2)
+	FUNC(GLOBAL(unordsf2))
+GLOBAL(unordsf2):
+	flds	fr5,fpul
+	sts	fpul,r4
+	shll	r4
+	mov.l	LOCAL(c_SF_QNAN_MASK),r1
+	shlr	r4
+	cmp/eq	r4,r1
+	bt/s	LOCAL(unord_next)
+	not	r4,r0
+	tst	r1,r0
+	bt	LOCAL(unord)
+LOCAL(unord_next):
+	flds	fr4,fpul
+	sts	fpul,r5
+	shll	r5
+	shlr	r5
+	cmp/eq	r5,r1
+	bt/s	LOCAL(unord_fail)
+	not	r5,r0
+	tst	r1,r0
+LOCAL(unord):
+	rts
+	movt	r0
+LOCAL(unord_fail):
+	rts
+	mov	#0,r0
+	.balign	4
+LOCAL(c_SF_QNAN_MASK):
+	.long 0x7f800000
+	ENDFUNC(GLOBAL(unordsf2))
+#endif /* L_unordsf2 */
+
 #endif /* __SH_FPU_ANY__ */

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SH optimized software floating point routines
  2010-07-20 13:35         ` Kaz Kojima
@ 2010-07-21 11:25           ` Christian Bruel
  2010-07-22  7:22             ` Christian Bruel
  2010-07-22  0:45           ` Kaz Kojima
  2010-07-22  6:41           ` Joern Rennecke
  2 siblings, 1 reply; 30+ messages in thread
From: Christian Bruel @ 2010-07-21 11:25 UTC (permalink / raw)
  To: Kaz Kojima; +Cc: joern.rennecke, Naveen.S, gcc, Prafulla.Thakare

[-- Attachment #1: Type: text/plain, Size: 790 bytes --]

Hi Kaz,

Kaz Kojima wrote:

> 
> BTW, it looks like the softfp __unord?f2 routines check for signaling
> NaNs only.  This makes __builtin_isnan return false for quiet NaNs,
> for which the current fp-bit ones return true when -mieee is enabled.
> Perhaps that change of behavior is OK for software FP.

I use the attached patch to handle the QNaNs in the assembly soft-fp.
It needs to be updated for trunk (and the dates in the changelogs
refreshed).  Will do.

Cheers

Christian

2010-04-20  Christian Bruel  <christian.bruel@st.com>

	* gcc.dg/builtins-nan.c: New test.

2010-04-20  Christian Bruel  <christian.bruel@st.com>

	* config/sh/ieee-754-df.S (nedf2f): Don't check Qbit for NaNs.
	* config/sh/ieee-754-sf.S (nesf2f): Likewise.
	* config/sh/sh.md (cmpunsf_i1, cmpundf_i1): Likewise. Clobber R2.





[-- Attachment #2: qnans.patch --]
[-- Type: text/plain, Size: 4377 bytes --]

2010-04-20  Christian Bruel  <christian.bruel@st.com>

	* gcc.dg/builtins-nan.c: New test.

2010-04-20  Christian Bruel  <christian.bruel@st.com>

	* config/sh/ieee-754-df.S (nedf2f): Don't check Qbit for NaNs.
	* config/sh/ieee-754-sf.S (nesf2f): Likewise.
	* config/sh/sh.md (cmpunsf_i1, cmpundf_i1): Likewise. Clobber R2.

Index: gcc/config/sh/ieee-754-df.S
===================================================================
--- gcc/config/sh/ieee-754-df.S	(revision 1352)
+++ gcc/config/sh/ieee-754-df.S	(revision 1373)
@@ -88,11 +88,12 @@
 	HIDDEN_FUNC(GLOBAL(nedf2f))
 GLOBAL(nedf2f):
 	cmp/eq	DBL0L,DBL1L
+	bf.s 	LOCAL(ne)
+	mov     #1,r0
+	cmp/eq	DBL0H,DBL1H
 	mov.l   LOCAL(c_DF_NAN_MASK),r1
-	bf LOCAL(ne)
-	cmp/eq	DBL0H,DBL1H
-	not	DBL0H,r0
-	bt	LOCAL(check_nan)
+	bt.s	LOCAL(check_nan)
+	not	DBL0H,r0	
 	mov	DBL0H,r0
 	or	DBL1H,r0
 	add	r0,r0
@@ -100,11 +101,17 @@
 	or	DBL0L,r0
 LOCAL(check_nan):
 	tst	r1,r0
-	rts
+	bt.s 	LOCAL(nan)
+	mov	#12,r2
+	shll16  r2
+	xor 	r2,r1
+	tst 	r1,r0
+LOCAL(nan):	
 	movt	r0
 LOCAL(ne):
 	rts
-	mov #1,r0
+	nop
+	
 	.balign 4
 LOCAL(c_DF_NAN_MASK):
 	.long DF_NAN_MASK

Index: gcc/config/sh/ieee-754-sf.S
===================================================================
--- gcc/config/sh/ieee-754-sf.S	(revision 1352)
+++ gcc/config/sh/ieee-754-sf.S	(revision 1373)
@@ -55,19 +55,27 @@
 	   the values are NaN.  */
 	cmp/eq	r4,r5
 	mov.l   LOCAL(c_SF_NAN_MASK),r1
+	bt.s	LOCAL(check_nan)
 	not	r4,r0
-	bt	LOCAL(check_nan)
 	mov	r4,r0
 	or	r5,r0
 	rts
 	add	r0,r0
 LOCAL(check_nan):
 	tst	r1,r0
+	bt.s 	LOCAL(nan)
+	mov	#96,r2
+	shll16  r2
+	xor 	r2,r1
+	tst	r1,r0	
+LOCAL(nan):		
 	rts
 	movt	r0
+	
 	.balign 4
 LOCAL(c_SF_NAN_MASK):
 	.long SF_NAN_MASK
+LOCAL(c_SF_SNAN_MASK):
 	ENDFUNC(GLOBAL(nesf2f))
 #endif /* L_nesf2f */
 
Index: gcc/config/sh/sh.md
===================================================================
--- gcc/config/sh/sh.md	(revision 1352)
+++ gcc/config/sh/sh.md	(revision 1373)
@@ -11182,6 +11182,7 @@
     (clobber (reg:SI T_REG))
     (clobber (reg:SI PR_REG))
     (clobber (reg:SI R1_REG))
+    (clobber (reg:SI R2_REG))
     (use (match_operand:SI 1 "arith_reg_operand" "r"))]
    "TARGET_SH1 && ! TARGET_SH2E"
    "jsr	@%1%#"
@@ -11257,13 +11258,18 @@
 
  (define_insn "cmpunsf_i1"
    [(set (reg:SI T_REG)
- 	(unordered:SI (match_operand:SF 0 "arith_reg_operand" "r,r")
- 		      (match_operand:SF 1 "arith_reg_operand" "r,r")))
-    (use (match_operand:SI 2 "arith_reg_operand" "r,r"))
-    (clobber (match_scratch:SI 3 "=0,&r"))]
+ 	(unordered:SI (match_operand:SF 0 "arith_reg_operand" "r")
+ 		      (match_operand:SF 1 "arith_reg_operand" "r")))
+    (use (match_operand:SI 2 "arith_reg_operand" "r"))
+    (clobber (match_scratch:SI 3 "=&r"))]
    "TARGET_SH1 && ! TARGET_SH2E"
-   "not\t%0,%3\;tst\t%2,%3\;not\t%1,%3\;bt\t0f\;tst\t%2,%3\;0:"
-   [(set_attr "length" "10")])
+   "not\t%0,%3\;tst\t%2,%3\;bt.s\t0f
+\tnot\t%1,%3\;tst\t%2,%3\;bt.s\t0f
+\tmov\t#96,%3\;shll16\t%3\;xor\t%3,%2
+\tnot\t%0,%3\;tst\t%2,%3\;bt.s\t0f
+\tnot\t%1,%3\;tst\t%2,%3
+0:"
+   [(set_attr "length" "28")])
 
  ;; ??? This is a lot of code with a lot of branches; a library function
  ;; might be better.
@@ -11967,6 +11973,7 @@
    (clobber (reg:SI T_REG))
    (clobber (reg:SI PR_REG))
    (clobber (reg:SI R1_REG))
+   (clobber (reg:SI R2_REG))
    (use (match_operand:SI 1 "arith_reg_operand" "r"))]
   "TARGET_SH1_SOFTFP"
   "jsr	@%1%#"
@@ -12008,13 +12015,18 @@
 
 (define_insn "cmpundf_i1"
   [(set (reg:SI T_REG)
-	(unordered:SI (match_operand:DF 0 "arith_reg_operand" "r,r")
-		      (match_operand:DF 1 "arith_reg_operand" "r,r")))
-   (use (match_operand:SI 2 "arith_reg_operand" "r,r"))
-   (clobber (match_scratch:SI 3 "=0,&r"))]
+	(unordered:SI (match_operand:DF 0 "arith_reg_operand" "r")
+		      (match_operand:DF 1 "arith_reg_operand" "r")))
+   (use (match_operand:SI 2 "arith_reg_operand" "r"))
+   (clobber (match_scratch:SI 3 "=&r"))]
   "TARGET_SH1 && ! TARGET_SH2E"
-  "not\t%S0,%3\;tst\t%2,%3\;not\t%S1,%3\;bt\t0f\;tst\t%2,%3\;0:"
-  [(set_attr "length" "10")])
+   "not\t%S0,%3\;tst\t%2,%3\;bt.s\t0f
+  \tnot\t%S1,%3\;tst\t%2,%3\;bt.s\t0f
+  \tmov\t#12,%3\;shll16\t%3\;xor\t%3,%2
+  \tnot\t%S0,%3\;tst\t%2,%3\;bt.s\t0f
+  \tnot\t%S1,%3\;tst\t%2,%3
+0:"
+  [(set_attr "length" "28")])
 
 ;; ??? This is a lot of code with a lot of branches; a library function
 ;; might be better.

[-- Attachment #3: buitins-nan.c --]
[-- Type: text/plain, Size: 724 bytes --]

/* { dg-do run } */
/* { dg-options "-mieee" { target sh*-*-* } } */

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

static int lisnan(double v)
{
  return (v != v);
}

static int lisnanf(float v)
{
  return (v != v);
}

int main(void)
{
  double d;
  float f;

  /* double */
  d = __builtin_nans("");
  if (! lisnan(d))
    abort();

  if (! __builtin_isnan(d))
    abort();

  d = __builtin_nan("");
  if (! lisnan(d))
    abort();

  if (! __builtin_isnan(d))
    abort();

  /* float */
  f = __builtin_nansf("");
  if (! lisnanf(f))
    abort();

  if (! __builtin_isnanf(f))
    abort();

  f = __builtin_nanf("");
  if (! lisnanf(f))
    abort();

  if (! __builtin_isnanf(f))
    abort();

  exit (0);
}

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SH optimized software floating point routines
  2010-07-20 13:35         ` Kaz Kojima
  2010-07-21 11:25           ` Christian Bruel
@ 2010-07-22  0:45           ` Kaz Kojima
  2010-07-22  6:41           ` Joern Rennecke
  2 siblings, 0 replies; 30+ messages in thread
From: Kaz Kojima @ 2010-07-22  0:45 UTC (permalink / raw)
  To: joern.rennecke; +Cc: Naveen.S, gcc, Prafulla.Thakare

> I'm trying the attached patch over sh-softfp-20100718-2131 patch.
> All regressions go away with it on cross sh4-unknown-linux-gnu,
> though the native bootstrap will take a few days more.

There are a few warnings in bootstrap:

../trunk/gcc/config/sh/sh.c: In function 'sh_soft_fp_cmp':
../trunk/gcc/config/sh/sh.c:2193:8: error: enum conversion in initialization is invalid in C++ [-Werror=c++-compat]
../trunk/gcc/config/sh/sh.c:2248:3: error: enum conversion when passing argument 1 of 'gen_rtx_fmt_ee_stat' is invalid in C++ [-Werror=c++-compat]
./genrtl.h:24:1: note: expected 'enum rtx_code' but argument is of type 'int'
../trunk/gcc/config/sh/sh.c: In function 'expand_sfunc_op':
../trunk/gcc/config/sh/sh.c:8744:19: error: variable 'insn' set but not used [-Werror=unused-but-set-variable]
cc1: all warnings being treated as errors

The attached is a fixup for them.

Unfortunately, the bootstrap has failed with a stage2/stage3
comparison failure, though that might be unrelated to the softfp
patches.  I'll try again with an older revision.

Regards,
	kaz
--

--- ORIG/trunk/gcc/config/sh/sh.c	2010-07-19 10:58:36.000000000 +0900
+++ trunk/gcc/config/sh/sh.c	2010-07-21 06:45:18.000000000 +0900
@@ -2185,12 +2185,13 @@ sh_emit_cheap_store_flag (enum machine_m
 }
 
 static rtx
-sh_soft_fp_cmp (int code, enum machine_mode op_mode, rtx op0, rtx op1)
+sh_soft_fp_cmp (enum rtx_code code, enum machine_mode op_mode, rtx op0,
+		rtx op1)
 {
   const char *name = NULL;
   rtx (*fun) (rtx, rtx), addr, tmp, last, equiv;
   int df = op_mode == DFmode;
-  enum machine_mode mode = CODE_FOR_nothing; /* shut up warning.  */
+  enum machine_mode mode = MAX_MACHINE_MODE; /* shut up warning.  */
 
   switch (code)
     {
@@ -8741,13 +8742,13 @@ expand_sfunc_op (int nargs, enum machine
 		 const char *name, rtx equiv, rtx *operands)
 {
   int next_reg = FIRST_PARM_REG, i;
-  rtx addr, last, insn;
+  rtx addr, last;
 
   addr = gen_reg_rtx (Pmode);
   function_symbol (addr, name, SFUNC_FREQUENT);
   for ( i = 1; i <= nargs; i++)
     {
-      insn = emit_move_insn (gen_rtx_REG (mode, next_reg), operands[i]);
+      emit_move_insn (gen_rtx_REG (mode, next_reg), operands[i]);
       next_reg += GET_MODE_SIZE (mode) / UNITS_PER_WORD;
     }
   last = emit_insn ((*fun) (operands[0], addr));

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SH optimized software floating point routines
  2010-07-20 13:35         ` Kaz Kojima
  2010-07-21 11:25           ` Christian Bruel
  2010-07-22  0:45           ` Kaz Kojima
@ 2010-07-22  6:41           ` Joern Rennecke
  2010-07-22 12:23             ` Kaz Kojima
  2 siblings, 1 reply; 30+ messages in thread
From: Joern Rennecke @ 2010-07-22  6:41 UTC (permalink / raw)
  To: Kaz Kojima; +Cc: Naveen.S, gcc, Prafulla.Thakare

Quoting Kaz Kojima <kkojima@rr.iij4u.or.jp>:

> I've got some regressions with "make check" on sh4-unknown-linux-gnu.
> It looks like all of them fail with undefined references to
> __unorddf2/__unordsf2 when -mieee is enabled.

That's a bug, then; we shouldn't use a library function there,
but the cmpordered[sd]f_t_4 patterns.
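
(In C terms those patterns just exploit the fact that NaN != NaN; a
sketch of the test they implement:

  static int ordered (double a, double b)
  {
    return a == a && b == b;  /* false only when a NaN is involved */
  }

which is what the fcmp/eq-with-itself sequences compute.)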

> I'm trying the attached patch over sh-softfp-20100718-2131 patch.
> All regressions go away with it on cross sh4-unknown-linux-gnu,
> though the native bootstrap will take a few days more.

But it's really the instruction expansion that needs to be fixed.

> BTW, it looks like the softfp __unord?f2 routines check signaling NaNs
> only.  This makes __builtin_isnan return false for quiet NaNs, for
> which the current fp-bit ones return true when -mieee is enabled.
> Perhaps that change of behavior might be OK for software FP.

The SH port so far has been using fp-bit.c, which does not actually
support floating point signals, and neither does this optimized software
floating point.
So in essence, we only have quiet NaNs.  We might as well choose a bit
pattern that is easy to process, to keep code size down and improve
performance.  Having a mantissa bit set that is adjacent to the exponent
makes for easier testing.
There is precedent for having the signalling bit in different places and
with different values (i.e. some implementations have 1 == signalling,
others 0 == signalling).
So we could say that the bit two below the exponent is the signalling bit,
and is active-low.  Thus 0xffffffff in the high or only word is a quiet
NaN.
Tests that feed specific NaN hex values could be disabled or fed modified
values for the SH[123].
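
To make the mask idea concrete, a minimal C sketch for SFmode (my
illustration, untested; the mask value 0x7fc00000, all-ones exponent
plus the adjacent mantissa bit, is an assumption rather than the actual
SF_NAN_MASK definition).  As far as I can tell, this is the predicate
the not/tst pairs in ieee-754-sf.S compute:

static int
sf_nan_by_mask (unsigned int x)
{
  /* Treat x as NaN iff every mask bit is set in the raw pattern;
     0xffffffff passes, infinities (adjacent mantissa bit clear)
     do not.  */
  return (x & 0x7fc00000u) == 0x7fc00000u;
}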

OTOH for unorddf / unordsf support with sh4, you would want to keep the
distinction between signalling / quiet NaNs.
(Although I doubt many use signalling support, considering the cost when
you take a trap on every floating point instruction.)

The sh.md cmpordered* patterns should do the right thing there; we just
have to keep emitting them.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SH optimized software floating point routines
  2010-07-21 11:25           ` Christian Bruel
@ 2010-07-22  7:22             ` Christian Bruel
  2010-07-22  7:37               ` Joern Rennecke
  2010-07-22 13:58               ` Christian Bruel
  0 siblings, 2 replies; 30+ messages in thread
From: Christian Bruel @ 2010-07-22  7:22 UTC (permalink / raw)
  To: Kaz Kojima; +Cc: joern.rennecke, Naveen.S, gcc, Prafulla.Thakare

[-- Attachment #1: Type: text/plain, Size: 695 bytes --]

Christian Bruel wrote:
> Hi Kaz,
> 
> Kaz Kojima wrote:
> 
>>
>> BTW, it looks like the softfp __unord?f2 routines check signaling NaNs
>> only.  This makes __builtin_isnan return false for quiet NaNs, for
>> which the current fp-bit ones return true when -mieee is enabled.
>> Perhaps that change of behavior might be OK for software FP.
> 
> I use the attached patch to handle the QNaNs in the assembly soft-fp.
> It needs updating for trunk (including the dates in the changelogs). Will do.

Edited to apply on top of Joern's latest patch.  Certainly not optimal,
but it fixes the QNaN checks for builtins and inlined unordered
comparisons for -mieee or -fno-finite-math-only.

Best Regards

Christian



[-- Attachment #2: qnans_trunk.patch --]
[-- Type: text/plain, Size: 4848 bytes --]

2010-07-22  Christian Bruel  <christian.bruel@st.com>

	* gcc.dg/builtins-nan.c: New test.

2010-07-22  Christian Bruel  <christian.bruel@st.com>

	* config/sh/ieee-754-df.S (nedf2f): Don't check Qbit for NaNs.
	* config/sh/ieee-754-sf.S (nesf2f): Likewise.
	* config/sh/sh.md (cmpunsf_i1, cmpundf_i1): Likewise. 
	(cmpnesf_i1, cmpnedf_i1): Clobber R2.

diff '--exclude=.svn' '--exclude=*.rej' '--exclude=*~' -ubrN gnu_trunk.ref/gcc/gcc/config/sh/ieee-754-df.S gnu_trunk/gcc/gcc/config/sh/ieee-754-df.S
--- gnu_trunk.ref/gcc/gcc/config/sh/ieee-754-df.S	2010-07-21 18:04:17.000000000 +0200
+++ gnu_trunk/gcc/gcc/config/sh/ieee-754-df.S	2010-07-21 18:09:10.000000000 +0200
@@ -92,11 +92,12 @@
 	HIDDEN_FUNC(GLOBAL(nedf2))
 GLOBAL(nedf2):
 	cmp/eq	DBL0L,DBL1L
-	mov.l   LOCAL(c_DF_NAN_MASK),r1
-	bf LOCAL(ne)
+	bf.s 	LOCAL(ne)
+	mov     #1,r0
 	cmp/eq	DBL0H,DBL1H
+	mov.l   LOCAL(c_DF_NAN_MASK),r1
+	bt.s	LOCAL(check_nan)
 	not	DBL0H,r0
-	bt	LOCAL(check_nan)
 	mov	DBL0H,r0
 	or	DBL1H,r0
 	add	r0,r0
@@ -104,11 +105,17 @@
 	or	DBL0L,r0
 LOCAL(check_nan):
 	tst	r1,r0
-	rts
+	bt.s 	LOCAL(nan)
+	mov	#12,r2
+	shll16  r2
+	xor 	r2,r1
+	tst 	r1,r0
+LOCAL(nan):	
 	movt	r0
 LOCAL(ne):
 	rts
-	mov #1,r0
+	nop
+	
 	.balign 4
 LOCAL(c_DF_NAN_MASK):
 	.long DF_NAN_MASK
diff '--exclude=.svn' '--exclude=*.rej' '--exclude=*~' -ubrN gnu_trunk.ref/gcc/gcc/config/sh/ieee-754-sf.S gnu_trunk/gcc/gcc/config/sh/ieee-754-sf.S
--- gnu_trunk.ref/gcc/gcc/config/sh/ieee-754-sf.S	2010-07-21 18:04:18.000000000 +0200
+++ gnu_trunk/gcc/gcc/config/sh/ieee-754-sf.S	2010-07-21 18:09:10.000000000 +0200
@@ -51,13 +51,19 @@
 	cmp/eq	r4,r5
 	mov.l   LOCAL(c_SF_NAN_MASK),r1
 	not	r4,r0
-	bt	LOCAL(check_nan)
+	bt.s	LOCAL(check_nan)
 	mov	r4,r0
 	or	r5,r0
 	rts
 	add	r0,r0
 LOCAL(check_nan):
 	tst	r1,r0
+ 	bt.s 	LOCAL(nan)
+ 	mov	#96,r2
+ 	shll16  r2
+ 	xor 	r2,r1
+ 	tst	r1,r0	
+ LOCAL(nan):			
 	rts
 	movt	r0
 	.balign 4
diff '--exclude=.svn' '--exclude=*.rej' '--exclude=*~' -ubrN gnu_trunk.ref/gcc/gcc/config/sh/sh.md gnu_trunk/gcc/gcc/config/sh/sh.md
--- gnu_trunk.ref/gcc/gcc/config/sh/sh.md	2010-07-21 18:06:25.000000000 +0200
+++ gnu_trunk/gcc/gcc/config/sh/sh.md	2010-07-22 09:13:12.000000000 +0200
@@ -10262,6 +10262,7 @@
    (clobber (reg:SI T_REG))
    (clobber (reg:SI PR_REG))
    (clobber (reg:SI R1_REG))
+   (clobber (reg:SI R2_REG))
    (use (match_operand:SI 1 "arith_reg_operand" "r"))]
   "TARGET_SH1 && ! TARGET_SH2E"
   "jsr	@%1%#"
@@ -10337,13 +10338,18 @@
 
 (define_insn "cmpunsf_i1"
   [(set (reg:SI T_REG)
-	(unordered:SI (match_operand:SF 0 "arith_reg_operand" "r,r")
-		      (match_operand:SF 1 "arith_reg_operand" "r,r")))
-   (use (match_operand:SI 2 "arith_reg_operand" "r,r"))
-   (clobber (match_scratch:SI 3 "=0,&r"))]
+  	(unordered:SI (match_operand:SF 0 "arith_reg_operand" "r")
+  		      (match_operand:SF 1 "arith_reg_operand" "r")))
+     (use (match_operand:SI 2 "arith_reg_operand" "r"))
+     (clobber (match_scratch:SI 3 "=&r"))]
   "TARGET_SH1 && ! TARGET_SH2E"
-  "not\t%0,%3\;tst\t%2,%3\;not\t%1,%3\;bt\t0f\;tst\t%2,%3\;0:"
-  [(set_attr "length" "10")])
+    "not\t%0,%3\;tst\t%2,%3\;bt.s\t0f
+    \tnot\t%1,%3\;tst\t%2,%3\;bt.s\t0f
+    \tmov\t#96,%3\;shll16\t%3\;xor\t%3,%2
+    \tnot\t%0,%3\;tst\t%2,%3\;bt.s\t0f
+    \tnot\t%1,%3\;tst\t%2,%3
+     0:"
+    [(set_attr "length" "28")])
 
 ;; ??? This is a lot of code with a lot of branches; a library function
 ;; might be better.
@@ -11069,6 +11075,7 @@
    (clobber (reg:SI T_REG))
    (clobber (reg:SI PR_REG))
    (clobber (reg:SI R1_REG))
+   (clobber (reg:SI R2_REG))
    (use (match_operand:SI 1 "arith_reg_operand" "r"))]
   "TARGET_SH1_SOFTFP"
   "jsr	@%1%#"
@@ -11093,6 +11100,7 @@
    (clobber (reg:SI T_REG))
    (clobber (reg:SI PR_REG))
    (clobber (reg:SI R1_REG))
+   (clobber (reg:SI R2_REG))
    (use (match_operand:SI 1 "arith_reg_operand" "r"))]
   "TARGET_SH1_SOFTFP"
   "jsr	@%1%#"
@@ -11110,13 +11118,18 @@
 
 (define_insn "cmpundf_i1"
   [(set (reg:SI T_REG)
-	(unordered:SI (match_operand:DF 0 "arith_reg_operand" "r,r")
-		      (match_operand:DF 1 "arith_reg_operand" "r,r")))
-   (use (match_operand:SI 2 "arith_reg_operand" "r,r"))
-   (clobber (match_scratch:SI 3 "=0,&r"))]
+	(unordered:SI (match_operand:DF 0 "arith_reg_operand" "r")
+		      (match_operand:DF 1 "arith_reg_operand" "r")))
+   (use (match_operand:SI 2 "arith_reg_operand" "r"))
+   (clobber (match_scratch:SI 3 "=&r"))]
   "TARGET_SH1 && ! TARGET_SH2E"
-  "not\t%S0,%3\;tst\t%2,%3\;not\t%S1,%3\;bt\t0f\;tst\t%2,%3\;0:"
-  [(set_attr "length" "10")])
+   "not\t%S0,%3\;tst\t%2,%3\;bt.s\t0f
+  \tnot\t%S1,%3\;tst\t%2,%3\;bt.s\t0f
+  \tmov\t#12,%3\;shll16\t%3\;xor\t%3,%2
+  \tnot\t%S0,%3\;tst\t%2,%3\;bt.s\t0f
+  \tnot\t%S1,%3\;tst\t%2,%3
+0:"
+  [(set_attr "length" "28")])
 
 ;; ??? This is a lot of code with a lot of branches; a library function
 ;; might be better.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SH optimized software floating point routines
  2010-07-22  7:22             ` Christian Bruel
@ 2010-07-22  7:37               ` Joern Rennecke
  2010-07-22 11:58                 ` Christian Bruel
  2010-07-22 13:58               ` Christian Bruel
  1 sibling, 1 reply; 30+ messages in thread
From: Joern Rennecke @ 2010-07-22  7:37 UTC (permalink / raw)
  To: Christian Bruel; +Cc: Kaz Kojima, Naveen.S, gcc, Prafulla.Thakare

[-- Attachment #1: Type: text/plain, Size: 412 bytes --]

Quoting Christian Bruel <christian.bruel@st.com>:

> Edited to apply on top of Joern's latest patch.  Certainly not optimal,
> but it fixes the QNaN checks for builtins and inlined unordered
> comparisons for -mieee or -fno-finite-math-only.

You are still on the wrong track; as I said in my earlier message, we
should not emit the library call for SH4 in the first place.

Please try the attached patch instead.

[-- Attachment #2: sh-softfp-predicate-fix --]
[-- Type: text/plain, Size: 665 bytes --]

--- predicates.md-20100718	2010-07-22 08:17:37.273500678 +0100
+++ predicates.md	2010-07-22 08:28:57.257502902 +0100
@@ -575,9 +575,10 @@
 ;; UNORDERED is only supported on SHMEDIA.
 
 (define_predicate "sh_float_comparison_operator"
-  (ior (match_operand 0 "ordered_comparison_operator")
-       (and (match_test "TARGET_SHMEDIA")
-	    (match_code "unordered"))))
+  (if_then_else (match_test "TARGET_SHMEDIA")
+		(ior (match_operand 0 "ordered_comparison_operator")
+		     (match_code "unordered"))
+		(match_operand 0 "comparison_operator")))
 
 (define_predicate "shmedia_cbranch_comparison_operator"
   (ior (match_operand 0 "equality_comparison_operator")

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SH optimized software floating point routines
  2010-07-22  7:37               ` Joern Rennecke
@ 2010-07-22 11:58                 ` Christian Bruel
  2010-07-22 14:25                   ` Joern Rennecke
  0 siblings, 1 reply; 30+ messages in thread
From: Christian Bruel @ 2010-07-22 11:58 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: Kaz Kojima, Naveen.S, gcc, Prafulla.Thakare

Joern Rennecke wrote:
> Quoting Christian Bruel <christian.bruel@st.com>:
> 
>> Edited to apply on top of Joern's latest patch.  Certainly not optimal,
>> but it fixes the QNaN checks for builtins and inlined unordered
>> comparisons for -mieee or -fno-finite-math-only.
> 
> You are still on the wrong track; as I said in my earlier message, we
> should not emit the library call for SH4 in the first place.
> 
> 
> Please try the attached patch instead.
> 

Hello, sorry for the mails that crossed.

I think we are dealing with two different problems here that have the
same root.  The original one was the undefined __unorddf2/__unordsf2
regression, for which you said that the library functions should not be
called.  I agree, and my patch is not exclusive with yours in this regard.

I was dealing with functional issues in the SNaN bit checking in the
cmpun_ patterns (in addition to the floating point comparison
functions), which is exposed by the regression test that I provided
(for -m4-nofpu -mieee).

About the other part of your answer, not supporting SNaNs in fp-bit.c
is a possibility that I didn't consider in my fix.  This restriction is
quite a surprise to me because, as far as NaNs are concerned, it is not
what I would guess from the implementation of fp-bit.c's isnan function,
which does check for both CLASS_SNAN and CLASS_QNAN.

See for example the result of

static int misnanf(float v)
{
   return (v != v);
}

called with either a QNaN or a SNaN.  IMO the assembly model should have
the same semantics as the C model, which is not the case today.

Using -fsignaling-nans and eventually putting #ifdef __SUPPORT_SNAN__
around the checking doesn't change anything, since the same call is made
to the floating point comparison function, which really needs to check
for both formats.  If you are concerned about the extra cycles needed in
the nesf2f implementation (which is nothing anyway compared to the C
model), we could certainly provide a specialized one just for
-fsignaling-nans.

Best Regards

Christian

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SH optimized software floating point routines
  2010-07-22  6:41           ` Joern Rennecke
@ 2010-07-22 12:23             ` Kaz Kojima
  2010-07-23 13:45               ` Kaz Kojima
  0 siblings, 1 reply; 30+ messages in thread
From: Kaz Kojima @ 2010-07-22 12:23 UTC (permalink / raw)
  To: joern.rennecke; +Cc: Naveen.S, gcc, Prafulla.Thakare

Joern Rennecke <joern.rennecke@embecosm.com> wrote:
> That's a bug, then; we shouldn't use a library function there,
> but the cmpordered[sd]f_t_4 patterns.

Argh, I missed that the required patterns are already incorporated
in your patch.  I'll test it again with sh-softfp-predicate-fix
when the tests for 4.5.1-rc are done.  Thanks!

Regards,
	kaz

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SH optimized software floating point routines
  2010-07-22  7:22             ` Christian Bruel
  2010-07-22  7:37               ` Joern Rennecke
@ 2010-07-22 13:58               ` Christian Bruel
  2010-07-22 16:14                 ` Joern Rennecke
  2010-07-22 16:23                 ` Joern Rennecke
  1 sibling, 2 replies; 30+ messages in thread
From: Christian Bruel @ 2010-07-22 13:58 UTC (permalink / raw)
  To: Kaz Kojima; +Cc: joern.rennecke, Naveen.S, gcc, Prafulla.Thakare

[-- Attachment #1: Type: text/plain, Size: 1220 bytes --]

oops, resending it with a small typo fix (a branch became delayed :-().

Just in case we agree that SNaNs and QNaNs should not be exclusive and
should mimic the C model, here is a synthetic illustrative test case:

Compile with
sh-superh-elf-gcc -O2 -mieee -m4-nofpu snan.c snan2.c -g -o l.u ; 
sh-superh-elf-run l.u ; echo $?

Original 4.6 fp-bit C model:
OK

Using the ieee-sf.S implementation:
FAIL

Using the ieee-sf.S + this patch
OK

Same for sh4-linux.

Best Regards,

Christian



Christian Bruel wrote:
> Christian Bruel wrote:
>> Hi Kaz,
>>
>> Kaz Kojima wrote:
>>
>>> BTW, it looks like the softfp __unord?f2 routines check signaling NaNs
>>> only.  This makes __builtin_isnan return false for quiet NaNs, for
>>> which the current fp-bit ones return true when -mieee is enabled.
>>> Perhaps that change of behavior might be OK for software FP.
>> I use the attached patch to handle the QNaNs in the assembly soft-fp.
>> It needs updating for trunk (including the dates in the changelogs). Will do.
> 
> Edited to apply on top of Joern's latest patch.  Certainly not optimal,
> but it fixes the QNaN checks for builtins and inlined unordered
> comparisons for -mieee or -fno-finite-math-only.
> 
> Best Regards
> 
> Christian
> 
> 
> 

[-- Attachment #2: qnans_trunk.patch --]
[-- Type: text/plain, Size: 4357 bytes --]

diff '--exclude=.svn' '--exclude=*.rej' '--exclude=*~' -ubrN gnu_trunk.ref/gcc/gcc/config/sh/ieee-754-df.S gnu_trunk/gcc/gcc/config/sh/ieee-754-df.S
--- gnu_trunk.ref/gcc/gcc/config/sh/ieee-754-df.S	2010-07-21 18:04:17.949950000 +0200
+++ gnu_trunk/gcc/gcc/config/sh/ieee-754-df.S	2010-07-21 18:09:10.602376000 +0200
@@ -92,11 +92,12 @@
 	HIDDEN_FUNC(GLOBAL(nedf2))
 GLOBAL(nedf2):
 	cmp/eq	DBL0L,DBL1L
-	mov.l   LOCAL(c_DF_NAN_MASK),r1
-	bf LOCAL(ne)
+	bf.s 	LOCAL(ne)
+	mov     #1,r0
 	cmp/eq	DBL0H,DBL1H
+	mov.l   LOCAL(c_DF_NAN_MASK),r1
+	bt.s	LOCAL(check_nan)
 	not	DBL0H,r0
-	bt	LOCAL(check_nan)
 	mov	DBL0H,r0
 	or	DBL1H,r0
 	add	r0,r0
@@ -104,11 +105,17 @@
 	or	DBL0L,r0
 LOCAL(check_nan):
 	tst	r1,r0
-	rts
+	bt.s 	LOCAL(nan)
+	mov	#12,r2
+	shll16  r2
+	xor 	r2,r1
+	tst 	r1,r0
+LOCAL(nan):	
 	movt	r0
 LOCAL(ne):
 	rts
-	mov #1,r0
+	nop
+	
 	.balign 4
 LOCAL(c_DF_NAN_MASK):
 	.long DF_NAN_MASK
diff '--exclude=.svn' '--exclude=*.rej' '--exclude=*~' -ubrN gnu_trunk.ref/gcc/gcc/config/sh/ieee-754-sf.S gnu_trunk/gcc/gcc/config/sh/ieee-754-sf.S
--- gnu_trunk.ref/gcc/gcc/config/sh/ieee-754-sf.S	2010-07-22 14:21:50.606831000 +0200
+++ gnu_trunk/gcc/gcc/config/sh/ieee-754-sf.S	2010-07-22 15:30:17.928097000 +0200
@@ -58,6 +58,12 @@
 	add	r0,r0
 LOCAL(check_nan):
 	tst	r1,r0
+ 	bt.s 	LOCAL(nan)
+ 	mov	#96,r2
+ 	shll16  r2
+ 	xor 	r2,r1
+ 	tst	r1,r0	
+ LOCAL(nan):			
 	rts
 	movt	r0
 	.balign 4
diff '--exclude=.svn' '--exclude=*.rej' '--exclude=*~' -ubrN gnu_trunk.ref/gcc/gcc/config/sh/sh.md gnu_trunk/gcc/gcc/config/sh/sh.md
--- gnu_trunk.ref/gcc/gcc/config/sh/sh.md	2010-07-21 18:06:25.978547000 +0200
+++ gnu_trunk/gcc/gcc/config/sh/sh.md	2010-07-22 09:13:12.599669000 +0200
@@ -10262,6 +10262,7 @@
    (clobber (reg:SI T_REG))
    (clobber (reg:SI PR_REG))
    (clobber (reg:SI R1_REG))
+   (clobber (reg:SI R2_REG))
    (use (match_operand:SI 1 "arith_reg_operand" "r"))]
   "TARGET_SH1 && ! TARGET_SH2E"
   "jsr	@%1%#"
@@ -10337,13 +10338,18 @@
 
 (define_insn "cmpunsf_i1"
   [(set (reg:SI T_REG)
-	(unordered:SI (match_operand:SF 0 "arith_reg_operand" "r,r")
-		      (match_operand:SF 1 "arith_reg_operand" "r,r")))
-   (use (match_operand:SI 2 "arith_reg_operand" "r,r"))
-   (clobber (match_scratch:SI 3 "=0,&r"))]
+  	(unordered:SI (match_operand:SF 0 "arith_reg_operand" "r")
+  		      (match_operand:SF 1 "arith_reg_operand" "r")))
+     (use (match_operand:SI 2 "arith_reg_operand" "r"))
+     (clobber (match_scratch:SI 3 "=&r"))]
   "TARGET_SH1 && ! TARGET_SH2E"
-  "not\t%0,%3\;tst\t%2,%3\;not\t%1,%3\;bt\t0f\;tst\t%2,%3\;0:"
-  [(set_attr "length" "10")])
+    "not\t%0,%3\;tst\t%2,%3\;bt.s\t0f
+    \tnot\t%1,%3\;tst\t%2,%3\;bt.s\t0f
+    \tmov\t#96,%3\;shll16\t%3\;xor\t%3,%2
+    \tnot\t%0,%3\;tst\t%2,%3\;bt.s\t0f
+    \tnot\t%1,%3\;tst\t%2,%3
+     0:"
+    [(set_attr "length" "28")])
 
 ;; ??? This is a lot of code with a lot of branches; a library function
 ;; might be better.
@@ -11069,6 +11075,7 @@
    (clobber (reg:SI T_REG))
    (clobber (reg:SI PR_REG))
    (clobber (reg:SI R1_REG))
+   (clobber (reg:SI R2_REG))
    (use (match_operand:SI 1 "arith_reg_operand" "r"))]
   "TARGET_SH1_SOFTFP"
   "jsr	@%1%#"
@@ -11093,6 +11100,7 @@
    (clobber (reg:SI T_REG))
    (clobber (reg:SI PR_REG))
    (clobber (reg:SI R1_REG))
+   (clobber (reg:SI R2_REG))
    (use (match_operand:SI 1 "arith_reg_operand" "r"))]
   "TARGET_SH1_SOFTFP"
   "jsr	@%1%#"
@@ -11110,13 +11118,18 @@
 
 (define_insn "cmpundf_i1"
   [(set (reg:SI T_REG)
-	(unordered:SI (match_operand:DF 0 "arith_reg_operand" "r,r")
-		      (match_operand:DF 1 "arith_reg_operand" "r,r")))
-   (use (match_operand:SI 2 "arith_reg_operand" "r,r"))
-   (clobber (match_scratch:SI 3 "=0,&r"))]
+	(unordered:SI (match_operand:DF 0 "arith_reg_operand" "r")
+		      (match_operand:DF 1 "arith_reg_operand" "r")))
+   (use (match_operand:SI 2 "arith_reg_operand" "r"))
+   (clobber (match_scratch:SI 3 "=&r"))]
   "TARGET_SH1 && ! TARGET_SH2E"
-  "not\t%S0,%3\;tst\t%2,%3\;not\t%S1,%3\;bt\t0f\;tst\t%2,%3\;0:"
-  [(set_attr "length" "10")])
+   "not\t%S0,%3\;tst\t%2,%3\;bt.s\t0f
+  \tnot\t%S1,%3\;tst\t%2,%3\;bt.s\t0f
+  \tmov\t#12,%3\;shll16\t%3\;xor\t%3,%2
+  \tnot\t%S0,%3\;tst\t%2,%3\;bt.s\t0f
+  \tnot\t%S1,%3\;tst\t%2,%3
+0:"
+  [(set_attr "length" "28")])
 
 ;; ??? This is a lot of code with a lot of branches; a library function
 ;; might be better.

[-- Attachment #3: snan.c --]
[-- Type: text/plain, Size: 401 bytes --]

#include <stdlib.h>

extern int misnanf(float v);
extern int eqnf(float f3, float f4);

int main(void)
{
  float f1 = __builtin_nansf("");
  float f2 = __builtin_nanf("");
  float f3 = 2.0;
  float f4 = 3.3;

  if (! misnanf (f1))
    abort();

  if (! misnanf (f2))
    abort();

  if (misnanf (f3))
    abort();

  if (!eqnf (f3, f4))
    abort();

  if (eqnf (f4, f4))
    abort();

  return 0;
}


[-- Attachment #4: snan2.c --]
[-- Type: text/plain, Size: 97 bytes --]

int eqnf(float f3, float f4)
{
  return f3 != f4;
}

int misnanf(float v)
{
  return (v != v);
}

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SH optimized software floating point routines
  2010-07-22 11:58                 ` Christian Bruel
@ 2010-07-22 14:25                   ` Joern Rennecke
  0 siblings, 0 replies; 30+ messages in thread
From: Joern Rennecke @ 2010-07-22 14:25 UTC (permalink / raw)
  To: Christian Bruel; +Cc: Kaz Kojima, Naveen.S, gcc, Prafulla.Thakare

Quoting Christian Bruel <christian.bruel@st.com>:

> About the other part of your answer, not supporting SNaNs in fp-bit.c
> is a possibility that I didn't consider in my fix.  This restriction is
> quite a surprise to me because, as far as NaNs are concerned, it is not
> what I would guess from the implementation of fp-bit.c's isnan function,
> which does check for both CLASS_SNAN and CLASS_QNAN.

Well, it looks like a classic top-down implementation, carving up the
problem into little sub-problems, and then not implementing some of
these, so that the case distinction between CLASS_SNAN and CLASS_QNAN
becomes pointless.

> See for example the result of
>
> static int misnanf(float v)
> {
>   return (v != v);
> }
>
> called with either a QNaN or a SNaN.  IMO the assembly model should have
> the same semantics as the C model, which is not the case today.

I would consider the exact bit patterns used for NaNs an implementation
detail, which the user should not need to care about.
We only implement QNaNs.  fp-bit.c recognizes all NaN patterns, but treats
them all as QNaNs.

> Using -fsignaling-nans and eventually putting #ifdef __SUPPORT_SNAN__
> around the checking doesn't change anything, since the same call is made
> to the floating point comparison function, which really needs to check
> for both formats.

Considering that the signals don't work, wouldn't a better implementation
of -fsignaling-nans be to issue a diagnostic when using it for a software
floating point ABI, in sh.h OVERRIDE_OPTIONS?
And somehow make using __builtin_nans / __builtin_nansf give a
diagnostic, too.

Unless you want to go further and really implement the signals.
I suppose you could use config/soft-fp for that.
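
For illustration, a hedged sketch of what such a diagnostic could look
like in the override-options code (TARGET_FPU_ANY is my guess at the
right predicate here, and the wording is mine; untested):

  /* SNaN semantics cannot be honoured without FP trap support;
     warn rather than silently accept the option.  */
  if (flag_signaling_nans && !TARGET_FPU_ANY)
    warning (0, "-fsignaling-nans is not supported "
	     "with software floating point");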

> If you are concerned about the extra cycles needed

Both cycles and bytes.

> in the nesf2f implementation (which is nothing anyway compared to the C
> model),

fp-bit is so slow that it can't be taken seriously as a benchmark for
software floating point emulation speed.  The point of having a
hand-optimized assembly version is that you actually can show reasonable
performance for code with light fpu usage, compared to a processor with
hardware floating point (which needs more die space and power, and might
not clock as high as the fpu-less version).
IIRC some EEMBC benchmarks are in that class, i.e. with the hand-optimized
software floating point they run several times faster than with fp-bit,
but going all the way to hardware floating point then gives diminishing
returns.

> we could certainly provide a specialized one just for
> -fsignaling-nans.

You'd also have to handle the other comparisons.  grep for F_NAN_MASK
in ieee-754-sf.S / ieee-754-df.S.

The original intent was that the faster & more compact NaN check would
be available for all the software emulation code, although I used a more
inclusive check if I saw it could be done with the same cycle count.
I can't remember if I ended up using the mask check anywhere but in
ieee-754-sf.S / ieee-754-df.S .

If you want all possible IEEE NaN patterns to be honoured, someone should
check all these checks in the config/sh/IEEE-754/m3 directory...

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SH optimized software floating point routines
  2010-07-22 13:58               ` Christian Bruel
@ 2010-07-22 16:14                 ` Joern Rennecke
  2010-07-23  9:32                   ` Christian Bruel
  2010-07-22 16:23                 ` Joern Rennecke
  1 sibling, 1 reply; 30+ messages in thread
From: Joern Rennecke @ 2010-07-22 16:14 UTC (permalink / raw)
  To: Christian Bruel; +Cc: Kaz Kojima, Naveen.S, gcc, Prafulla.Thakare

Quoting Christian Bruel <christian.bruel@st.com>:

> Using the ieee-sf.S + this patch
> OK

Is this only a proof-of-concept, because you only change the ne[sd]f2  
implementation?  And you go out of your way to only accept a restricted
set of values.  Plus, the overuse of the arithmetic unit hurts SH4-100 /
SH4-200 instruction pairing.

AFAICT you need only one cycle penalty, in the check_nan path:

GLOBAL(nesf2):
         /* If the raw values are unequal, the result is unequal, unless
            both values are +-zero.
            If the raw values are equal, the result is equal, unless
            the values are NaN.  */
         cmp/eq  r4,r5
         mov.l   LOCAL(inf2),r1
         bt/s     LOCAL(check_nan)
         mov     r4,r0
         or      r5,r0
         rts
         add     r0,r0
LOCAL(check_nan):
         add     r0,r0
         cmp/hi  r1,r0
         rts
         movt    r0
         .balign 4
LOCAL(inf2):
         .long 0xff000000
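
In C terms, the check_nan path boils down to this (my reading of the
code above, equally untested): after shifting out the sign bit, x is
NaN iff the rest compares unsigned-greater than an all-ones exponent
with a zero mantissa:

static int
sf_isnan (unsigned int x)
{
  /* Exponent all ones and mantissa nonzero, in one compare.  */
  return (x << 1) > 0xff000000u;
}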

You could even save four bytes by putting the check_nan label into the
delay slot, but I'm not sure if that'll discomfit any branch  
prediction mechanism.

Disclaimer: I've not tested this code.

For the DFmode case, what about NaNs denoted by the low word, e.g.
0x7ff00000 00000001?

If so, the DFmode code could become something like this:

GLOBAL(nedf2):
         cmp/eq  DBL0L,DBL1L
         mov.l   LOCAL(inf2),r1
         bf LOCAL(ne)
         cmp/eq  DBL0H,DBL1H
         bt/s    LOCAL(check_nan)
         mov     DBL0H,r0
         or      DBL1H,r0

         add     r0,r0
         rts
         or      DBL0L,r0
LOCAL(check_nan):
         tst     DBL0L,DBL0L
         add     r0,r0
         subc    r1,r0
         mov     #-1,r0
         rts
         negc    r0,r0
LOCAL(ne):
         rts
         mov #1,r0
         .balign 4
LOCAL(inf2):
         .long 0xffe00000
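
Again in C terms, the tst/subc/negc sequence in that check_nan path
computes (my reading, untested): with the raw words already known
equal, the operand is NaN iff the exponent is all ones and the
mantissa, counting the low word, is nonzero:

static int
df_isnan (unsigned int hi, unsigned int lo)
{
  /* The (lo != 0) carry-in matches the tst/subc pairing above;
     infinities (zero mantissa) fail the compare.  */
  return (hi << 1) + (lo != 0) > 0xffe00000u;
}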

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SH optimized software floating point routines
  2010-07-22 13:58               ` Christian Bruel
  2010-07-22 16:14                 ` Joern Rennecke
@ 2010-07-22 16:23                 ` Joern Rennecke
  1 sibling, 0 replies; 30+ messages in thread
From: Joern Rennecke @ 2010-07-22 16:23 UTC (permalink / raw)
  To: gcc

Quoting Christian Bruel <christian.bruel@st.com>:

> oops, resending it with a small typo fix (a branch became delayed :-().

For an actual patch, you need to use the SL* macros from
config/sh/lib1funcs.h because the SH1 does not have delayed branches.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: SH optimized software floating point routines
  2010-07-16 10:26     ` Joern Rennecke
@ 2010-07-22 23:10       ` Joseph S. Myers
  2010-07-23  2:00         ` Joern Rennecke
  0 siblings, 1 reply; 30+ messages in thread
From: Joseph S. Myers @ 2010-07-22 23:10 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: Naveen H. S, Kaz Kojima, gcc, Prafulla Thakare

On Fri, 16 Jul 2010, Joern Rennecke wrote:

> Quoting "Naveen H. S" <Naveen.S@kpitcummins.com>:
> 
> > extendsfdf2 - gcc.c-torture/execute/conversion.c
> > gcc.dg/torture/fp-int-convert-float.c, gcc.dg/pr28796-2.c
> 
> Note that some tests invoke undefined behaviour; I've also come across this
> when doing optimized soft FP for ARCompact:
> 
> http://gcc.gnu.org/viewcvs/branches/arc-4_4-20090909-branch/gcc/testsuite/gcc.dg/torture/fp-int-convert.h?r1=151539&r2=151545

That diff does not appear to relate to undefined behavior.  GCC considers 
these out-of-range conversions to yield an unspecified value, possibly 
raising an exception, as per Annex F, and does not take the liberty of 
optimizing on the basis of them being undefined when not in an IEEE mode.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: SH optimized software floating point routines
  2010-07-22 23:10       ` Joseph S. Myers
@ 2010-07-23  2:00         ` Joern Rennecke
  2010-07-23  9:02           ` Joseph S. Myers
  0 siblings, 1 reply; 30+ messages in thread
From: Joern Rennecke @ 2010-07-23  2:00 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: Naveen H. S, Kaz Kojima, gcc, Prafulla Thakare

Quoting "Joseph S. Myers" <joseph@codesourcery.com>:

> That diff does not appear to relate to undefined behavior.  GCC considers
> these out-of-range conversions to yield an unspecified value, possibly
> raising an exception, as per Annex F, and does not take the liberty of
> optimizing on the basis of them being undefined when not in an IEEE mode.

Well, still, the test is wrong in possibly raising an exception there,
with no provisions to ignore the exception or catch any signal raised.

For the ARCompact, in order to test the floating point emulation better,
I had (they are still there in #if 0 /* DEBUG */ blocks) small wrappers
for each function to evaluate it once with the hand-optimized version,
and once with fp-bit.c, and abort on getting different values.
Now, fp-bit generally tries to yield some value that the programmer thought
might mean something, whereas the hand-optimized version treats computations
of unspecified values as irrelevant.

Considering:

GLOBAL(fixunsdfsi):
         mov.w   LOCAL(x413),r1  ! bias + 20
         mov     DBL0H,r0
         shll    DBL0H
         mov.l   LOCAL(mask),r3
         mov     #-21,r2
         shld    r2,DBL0H        ! SH4-200 will start this insn in a new cycle
         bt/s    LOCAL(ret0)
         sub     r1,DBL0H
         cmp/pl  DBL0H           ! SH4-200 will start this insn in a new cycle
         and     r3,r0
         bf/s    LOCAL(ignore_low)
         addc    r3,r0   ! uses T == 1; sets implict 1
         mov     #11,r2
         shld    DBL0H,r0        ! SH4-200 will start this insn in a new cycle
         cmp/gt  r2,DBL0H
         add     #-32,DBL0H
         bt      LOCAL(retmax)
         shld    DBL0H,DBL0L
         rts
         or      DBL0L,r0

and:

__fixunsdfsi:
         bbit0 DBL0H,30,.Lret0or1
         lsr r2,DBL0H,20
         bmsk_s DBL0H,DBL0H,19
         sub_s r2,r2,19; 0x3ff+20-0x400
         neg_s r3,r2
         btst_s r3,10
         bset_s DBL0H,DBL0H,20
#ifdef __LITTLE_ENDIAN__
         mov.ne DBL0L,DBL0H
         asl DBL0H,DBL0H,r2
#else
         asl.eq DBL0H,DBL0H,r2
         lsr.ne DBL0H,DBL0H,r3
#endif
         lsr DBL0L,DBL0L,r3
         j_s.d [blink]
         add.eq r0,r0,r1
.Lret0:
         j_s.d [blink]
         mov_l r0,0
.Lret0or1:
         add_s DBL0H,DBL0H,0x100000
         lsr_s DBL0H,DBL0H,30
         j_s.d [blink]
         bmsk_l r0,DBL0H,0

You can see that an SH4-300 can perform software floating point
fixunsdfsi in ten cycles, and the SH4-400 (SH4-200 sans FPU)
and ARC700 in twelve.

Adding any code in order to compute nice, fluffy values for
unspecified results would cause a significant performance degradation.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: SH optimized software floating point routines
  2010-07-23  2:00         ` Joern Rennecke
@ 2010-07-23  9:02           ` Joseph S. Myers
  0 siblings, 0 replies; 30+ messages in thread
From: Joseph S. Myers @ 2010-07-23  9:02 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: Naveen H. S, Kaz Kojima, gcc, Prafulla Thakare

On Thu, 22 Jul 2010, Joern Rennecke wrote:

> Quoting "Joseph S. Myers" <joseph@codesourcery.com>:
> 
> > That diff does not appear to relate to undefined behavior.  GCC considers
> > these out-of-range conversions to yield an unspecified value, possibly
> > raising an exception, as per Annex F, and does not take the liberty of
> > optimizing on the basis of them being undefined when not in an IEEE mode.
> 
> Well, still, the test is wrong in possibly raising an exception there,
> with no provisions to ignore the exception or catch any signal raised.

The expectation is that floating-point exceptions will not trap by 
default, again in accordance with Annex F even when not in an IEEE mode.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SH optimized software floating point routines
  2010-07-22 16:14                 ` Joern Rennecke
@ 2010-07-23  9:32                   ` Christian Bruel
  0 siblings, 0 replies; 30+ messages in thread
From: Christian Bruel @ 2010-07-23  9:32 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: Kaz Kojima, Naveen.S, gcc, Prafulla.Thakare

Joern Rennecke wrote:
> Quoting Christian Bruel <christian.bruel@st.com>:
> 
>> Using the ieee-sf.S + this patch
>> OK
> 
> Is this only a proof-of-concept, because you only change the ne[sd]f2  
> implementation?  

I also changed the unordered comparison patterns (cmpunsf_i1,
cmpundf_i1).  But yes, the other functions that would need the same kind
of check are unordsf2 and all the comparisons (gtsf2, gesf2f, ...)
for floats and doubles.
But I will only consider those after/if we all agree that this needs to
be done instead of keeping the current QNaN-only restrictions.

> And you go out of your way to only accept a restricted
> set of values.

This holds for the original optimized implementation as well; for example,
I don't think that 0x7f800001 was caught.  In fact, implementing the
isnan check correctly, without a restricted set of values, makes the
original discussion pointless, since the Q/S bits are a subset of all
possible encodings with any nonzero fractional part.

> Plus, the overuse of the arithmetic unit hurts SH4-100 /
> SH4-200 instruction pairing.
>
> AFAICT you need only one cycle penalty, in the check_nan path:
> 
> GLOBAL(nesf2):
>          /* If the raw values are unequal, the result is unequal, unless
>             both values are +-zero.
>             If the raw values are equal, the result is equal, unless
>             the values are NaN.  */
>          cmp/eq  r4,r5
>          mov.l   LOCAL(inf2),r1
>          bt/s     LOCAL(check_nan)
>          mov     r4,r0
>          or      r5,r0
>          rts
>          add     r0,r0
> LOCAL(check_nan):
>          add     r0,r0
>          cmp/hi  r1,r0
>          rts
>          movt    r0
>          .balign 4
> LOCAL(inf2):
>          .long 0xff000000
> 
> You could even save four bytes by putting the check_nan label into the
> delay slot, but I'm not sure if that'll discomfit any branch  
> prediction mechanism.

Thanks a lot for this one.  It should fix the original problem on the
restricted set of values as well.  The cmpund patterns fix should
probably have similar checks.

> 
> Disclaimer: I've not tested this code.
> 
> For the DFmode case, what about NaNs denoted by the low word, e.g.
> 0x7ff00000 00000001?
> 
> If so, the DFmode code could become something like this:
> 
> GLOBAL(nedf2):
>          cmp/eq  DBL0L,DBL1L
>          mov.l   LOCAL(inf2),r1
>          bf LOCAL(ne)
>          cmp/eq  DBL0H,DBL1H
>          bt/s    LOCAL(check_nan)
>          mov     DBL0H,r0
>          or      DBL1H,r0
> 
>          add     r0,r0
>          rts
>          or      DBL0L,r0
> LOCAL(check_nan):
>          tst     DBL0L,DBL0L
>          add     r0,r0
>          subc    r1,r0
>          mov     #-1,r0
>          rts
>          negc    r0,r0
> LOCAL(ne):
>          rts
>          mov #1,r0
>          .balign 4
> LOCAL(inf2):
>          .long 0xffe00000

 > For an actual patch, you need to use the SL* macros from
 > config/sh/lib1funcs.h because the SH1 does not have delayed branches.

OK, thanks

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SH optimized software floating point routines
  2010-07-22 12:23             ` Kaz Kojima
@ 2010-07-23 13:45               ` Kaz Kojima
  2010-08-04 10:03                 ` Naveen H. S
  0 siblings, 1 reply; 30+ messages in thread
From: Kaz Kojima @ 2010-07-23 13:45 UTC (permalink / raw)
  To: joern.rennecke; +Cc: christian.bruel, Naveen.S, gcc, Prafulla.Thakare

> Joern Rennecke <joern.rennecke@embecosm.com> wrote:
>> That's a bug, then; we shouldn't use a library function there,
>> but the cmpordered[sd]f_t_4 patterns.
> 
> Argh, I missed that the required patterns are already incorporated
> in your patch.  I'll test it again with sh-softfp-predicate-fix
> when the tests for 4.5.1-rc are done.  Thanks!

I've tested sh-softfp-20100718-2131 + sh-softfp-predicate-fix
on -m1, -m2, -m3, -m3 -ml, -m2a on sh-elf, sh4-linux and
sh64-linux.  sh64-linux required the first hunk of the attached
patch to build.  The test with -m3 -ml shows some regressions
which appear to be solved by the second hunk of the patch.
Now all test results look clean.

For the NaN issue, I'd like to wait for a full patch from Christian.
I'm curious again about the numbers with/without it.

Regards,
	kaz
--
	* config/sh/sh.md (cstoresf4): Fix typos.
	* config/sh/ieee-754-df.S (unorddf2): Use DBL1H instead of r6.

diff -upr ORIG/trunk/gcc/config/sh/sh.md trunk/gcc/config/sh/sh.md
--- ORIG/trunk/gcc/config/sh/sh.md	Wed Jul 21 08:12:23 2010
+++ trunk/gcc/config/sh/sh.md	Thu Jul 22 10:36:36 2010
@@ -9462,10 +9462,10 @@ mov.l\\t1f,r0\\n\\
   "TARGET_SH1 || TARGET_SHMEDIA_FPU"
   "if (TARGET_SHMEDIA)
      {
-       if (!arith_operand (operands[2], DFmode))
-	 operands[2] = copy_to_mode_reg (DFmode, operands[2]);
-       if (!arith_operand (operands[3], DFmode))
-	 operands[3] = copy_to_mode_reg (DFmode, operands[3]);
+       if (!arith_operand (operands[2], SFmode))
+	 operands[2] = copy_to_mode_reg (SFmode, operands[2]);
+       if (!arith_operand (operands[3], SFmode))
+	 operands[3] = copy_to_mode_reg (SFmode, operands[3]);
        emit_insn (gen_cstore4_media (operands[0], operands[1],
 				     operands[2], operands[3]));
        DONE;
diff -uprN ORIG/trunk/gcc/config/sh/ieee-754-df.S trunk/gcc/config/sh/ieee-754-df.S
--- ORIG/trunk/gcc/config/sh/ieee-754-df.S	2010-07-20 11:39:29.000000000 +0900
+++ trunk/gcc/config/sh/ieee-754-df.S	2010-07-22 13:16:07.000000000 +0900
@@ -123,7 +123,7 @@ GLOBAL(unorddf2):
 	mov.l	LOCAL(c_DF_NAN_MASK),r1
 	not	DBL0H,r0
 	tst	r1,r0
-	not	r6,r0
+	not	DBL1H,r0
 	bt	LOCAL(unord)
 	tst	r1,r0
 LOCAL(unord):

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: SH optimized software floating point routines
  2010-07-23 13:45               ` Kaz Kojima
@ 2010-08-04 10:03                 ` Naveen H. S
  2010-08-04 11:36                   ` Kaz Kojima
  0 siblings, 1 reply; 30+ messages in thread
From: Naveen H. S @ 2010-08-04 10:03 UTC (permalink / raw)
  To: Kaz Kojima, joern.rennecke; +Cc: christian.bruel, gcc, Prafulla Thakare

Hi,

>> I've tested sh-softfp-20100718-2131 + sh-softfp-predicate-fix
>> on -m1, -m2, -m3, -m3 -ml, -m2a on sh-elf, sh4-linux and
>> sh64-linux.

The SH toolchain was built with the following patches and regression
testing is complete.
1. sh-softfp-20100718-2131
2. sh-softfp-predicate-fix
3. Patch by Kaz Kojima-san at the following link:
http://gcc.gnu.org/ml/gcc/2010-07/msg00352.html

However, there were some regressions compared to a fresh toolchain.
The following list summarizes the regressions for each target.

m1, m2, m2a-nofpu 
gcc.dg/pr28796-2.c
gcc.dg/torture/type-generic-1.c

m2e, m3e, m2a-single-only
gcc.c-torture/execute/ieee/fp-cmp-4l.c
gcc.c-torture/execute/ieee/fp-cmp-8l.c              
gcc.dg/builtins-43.c
gcc.dg/pr28796-2.c
gcc.dg/torture/type-generic-1.c

m3, m4-nofpu, m4-single-only, m4a-nofpu, m4a-single-only
gcc.c-torture/execute/20000605-1.c (-O0)
gcc.c-torture/execute/20060420-1.c (-Os) 
gcc.c-torture/execute/loop-ivopts-1.c
gcc.c-torture/execute/pr39228.c (-O0)
gcc.dg/pr28796-2.c
gcc.dg/torture/type-generic-1.c
gcc.dg/pr41963.c
c-c++-common/torture/complex-sign-mixed-mul.c
gcc.target/sh/sh4a-fprun.c

>> Now all test results look clean

Please let me know whether these regressions are known and expected,
or whether I am missing something in the patches that would solve them.

Thanks & Regards,
Naveen


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SH optimized software floating point routines
  2010-08-04 10:03                 ` Naveen H. S
@ 2010-08-04 11:36                   ` Kaz Kojima
  0 siblings, 0 replies; 30+ messages in thread
From: Kaz Kojima @ 2010-08-04 11:36 UTC (permalink / raw)
  To: Naveen.S; +Cc: joern.rennecke, christian.bruel, gcc, Prafulla.Thakare

"Naveen H. S" <Naveen.S@kpitcummins.com> wrote:
> The SH toolchain was built with the following patches and regression
> testing is complete.
> 1. sh-softfp-20100718-2131
> 2. sh-softfp-predicate-fix
> 3. Patch by Kaz Kojima-san at the following link:
> http://gcc.gnu.org/ml/gcc/2010-07/msg00352.html

Thanks for testing.

> However, there were some regressions compared to a fresh toolchain.
> The following list summarizes the regressions for each target.
[snip]
>>> Now all test results look clean
> 
> Please let me know whether these regressions are known and okay?

"clean" might be a bad choice of words.  I meant that results
are expected i.e. regressions I saw are NaN issues we discussed
in the list.

Although your list of arches and regressions is different from
mine, some regressions you've caught

> gcc.c-torture/execute/20000605-1.c (-O0)
> gcc.c-torture/execute/20060420-1.c (-Os) 
> gcc.c-torture/execute/loop-ivopts-1.c
> gcc.c-torture/execute/pr39228.c(-O0)
> gcc.dg/pr41963.c
> c-c++-common/torture/complex-sign-mixed-mul.c
> gcc.target/sh/sh4a-fprun.c

don't look like NaN issues and we should know what is going on
in these cases.

Regards,
	kaz

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2010-08-04 11:36 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-06-10 12:01 SH optimized software floating point routines Naveen H. S
2010-06-14  4:50 ` Kaz Kojima
2010-06-14  7:27   ` Joern Rennecke
2010-07-16 10:04   ` Naveen H. S
2010-07-16 10:26     ` Joern Rennecke
2010-07-22 23:10       ` Joseph S. Myers
2010-07-23  2:00         ` Joern Rennecke
2010-07-23  9:02           ` Joseph S. Myers
2010-07-16 14:01     ` Kaz Kojima
2010-07-17 14:31       ` Joern Rennecke
2010-07-17 21:23         ` Kaz Kojima
2010-07-19  9:23           ` Naveen H. S
2010-07-17 13:30     ` Joern Rennecke
2010-07-19  0:59       ` Joern Rennecke
2010-07-20 13:35         ` Kaz Kojima
2010-07-21 11:25           ` Christian Bruel
2010-07-22  7:22             ` Christian Bruel
2010-07-22  7:37               ` Joern Rennecke
2010-07-22 11:58                 ` Christian Bruel
2010-07-22 14:25                   ` Joern Rennecke
2010-07-22 13:58               ` Christian Bruel
2010-07-22 16:14                 ` Joern Rennecke
2010-07-23  9:32                   ` Christian Bruel
2010-07-22 16:23                 ` Joern Rennecke
2010-07-22  0:45           ` Kaz Kojima
2010-07-22  6:41           ` Joern Rennecke
2010-07-22 12:23             ` Kaz Kojima
2010-07-23 13:45               ` Kaz Kojima
2010-08-04 10:03                 ` Naveen H. S
2010-08-04 11:36                   ` Kaz Kojima
