From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-return-101805-listarch-gcc=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 4507 invoked by alias); 31 Aug 2004 22:46:50 -0000
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Archive: <http://gcc.gnu.org/ml/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <http://gcc.gnu.org/ml/>
Sender: gcc-owner@gcc.gnu.org
Received: (qmail 4472 invoked from network); 31 Aug 2004 22:46:47 -0000
Received: from unknown (HELO palrel11.hp.com) (156.153.255.246)
  by sourceware.org with SMTP; 31 Aug 2004 22:46:47 -0000
Received: from smtp2.ptp.hp.com (smtp2.ptp.hp.com [15.1.28.240])
	by palrel11.hp.com (Postfix) with ESMTP id 5FC5DFFA2
	for <gcc@gcc.gnu.org>; Tue, 31 Aug 2004 15:46:47 -0700 (PDT)
Received: from hpsje.cup.hp.com (hpsje.cup.hp.com [15.244.96.221])
	by smtp2.ptp.hp.com (Postfix) with ESMTP id 353F91089
	for <gcc@gcc.gnu.org>; Tue, 31 Aug 2004 15:46:47 -0700 (PDT)
Received: (from sje@localhost) by hpsje.cup.hp.com (8.9.3 (PHNE_24419+JAGae58098)/8.7.3 TIS Messaging 5.0) id PAA26407 for gcc@gcc.gnu.org; Tue, 31 Aug 2004 15:46:47 -0700 (PDT)
Date: Tue, 31 Aug 2004 23:20:00 -0000
From: Steve Ellcey <sje@cup.hp.com>
Message-Id: <200408312246.PAA26407@hpsje.cup.hp.com>
To: gcc@gcc.gnu.org
Subject: IA64 floating point division question
Reply-To: sje@cup.hp.com
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-SW-Source: 2004-08/txt/msg01695.txt.bz2


I have been experimenting with the IA64 floating point division code
sequence.  Currently the code sequence for floating point division is
expanded late and thus isn't scheduled very well.  The reason for this
(as I understood it) was that we access some registers using multiple
modes by creating operands for existing registers with different modes.

So I tried to address this by using more temporary registers during the
code sequence and accessing each of them only in a single mode.  I got
that to work for divsf3_internal_thr (which is a define_insn_and_split)
and things looked good but the splitting of the division into multiple
instructions was still happening late in code generation and I wasn't
getting any improvement in my scheduling.  I saw that I had "&&
reload_completed" in the define_insn_and_split so I tried removing that
but then I got:

y.c: In function `foo':
y.c:9: error: unrecognizable insn:
(insn 36 19 37 0 (parallel [
            (set (reg:SF 351)
                (div:SF (const_int 1 [0x1])
                    (reg:SF 350 [ b ])))
            (set (scratch:BI)
                (unspec:BI [
                        (reg:SF 349 [ a ])
                        (reg:SF 350 [ b ])
                    ] 14))
            (use (const_int 1 [0x1]))
        ]) -1 (nil)
    (expr_list:REG_UNUSED (scratch:BI)
        (expr_list:REG_UNUSED (scratch:BI)
            (nil))))
y.c:9: internal compiler error: in extract_insn, at recog.c:2037


This instruction was recognized and expanded when I had "&&
reload_completed" in the define_insn_and_split so I don't understand why
it is not recognized now.  Is removing "&& reload_completed" what I need
to do to allow this instruction to be split up earlier?  I am sure there
is something basic I don't understand about the machine description
setup but I don't know what it is.  Is it related to the predication?

Any help?

Steve Ellcey
sje@cup.hp.com


Here is my new divsf3_internal_thr instruction (without the "&&
reload_completed"):

(define_insn_and_split "divsf3_internal_thr"
  [(set (match_operand:SF 0 "fr_register_operand" "=&f")
	(div:SF (match_operand:SF 1 "fr_register_operand" "f")
		(match_operand:SF 2 "fr_register_operand" "f")))
   (clobber (match_scratch:XF 3 "=&f"))
   (clobber (match_scratch:XF 4 "=&f"))
   (clobber (match_scratch:XF 5 "=&f"))
   (clobber (match_scratch:SF 6 "=&f"))
   (clobber (match_scratch:XF 7 "=f"))
   (clobber (match_scratch:BI 8 "=c"))]
  ""
  "#"
  ""
  [(parallel [(set (match_dup 0) (div:SF (const_int 1) (match_dup 2)))
	      (set (match_dup 8) (unspec:BI [(match_dup 1) (match_dup 2)]
					    UNSPEC_FR_RECIP_APPROX))
	      (use (const_int 1))])
   (cond_exec (ne (match_dup 8) (const_int 0))
     (parallel [(set (match_dup 3)
		     (minus:XF (match_dup 9)
			       (mult:XF (float_extend:XF (match_dup 2))
                                        (float_extend:XF (match_dup 0)))))
		(use (const_int 1))]))
   (cond_exec (ne (match_dup 8) (const_int 0))
     (parallel [(set (match_dup 4)
		     (plus:XF (mult:XF (match_dup 3) (match_dup 3))
			      (match_dup 3)))
		(use (const_int 1))]))
   (cond_exec (ne (match_dup 8) (const_int 0))
     (parallel [(set (match_dup 5)
		     (plus:XF (mult:XF (match_dup 4)
                                       (float_extend:XF (match_dup 0)))
			      (float_extend:XF (match_dup 0))))
		(use (const_int 1))]))
   (cond_exec (ne (match_dup 8) (const_int 0))
     (parallel [(set (match_dup 6)
		     (float_truncate:SF
		       (mult:XF (float_extend:XF (match_dup 1))
                                (match_dup 5))))
		(use (const_int 1))]))
   (cond_exec (ne (match_dup 8) (const_int 0))
     (parallel [(set (match_dup 7)
		     (minus:XF (float_extend:XF (match_dup 1))
			       (mult:XF (float_extend:XF (match_dup 2))
                                        (float_extend:XF (match_dup 6)))))
		(use (const_int 1))]))
   (cond_exec (ne (match_dup 8) (const_int 0))
     (parallel [(set (match_dup 0)
                     (float_truncate:SF
                       (plus:XF (mult:XF (match_dup 7) (match_dup 5))
                                (float_extend:XF (match_dup 6)))))
                (use (const_int 1))]))
  ]
{
  operands[9] = CONST1_RTX (XFmode);
}
  [(set_attr "predicable" "no")])
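
For anyone following the arithmetic in the split above, here is a rough
Python sketch of the seven steps, written as ordinary floating point
operations.  It is only an illustration, not part of the machine
description.  The names approx_recip and div_sketch are made up for this
example, the 8-bit reciprocal estimate is an assumed stand-in for the
hardware approximation, everything is done in Python doubles rather
than in SF/XF modes, and the predicate in operand 8 is ignored (every
refinement step always runs).

```python
import math

def approx_recip(b, bits=8):
    """Hypothetical stand-in for the reciprocal approximation:
    1/b with its mantissa rounded to `bits` bits."""
    m, e = math.frexp(1.0 / b)      # 1/b == m * 2**e with 0.5 <= |m| < 1
    scale = 1 << bits
    return math.ldexp(round(m * scale) / scale, e)

def div_sketch(a, b):
    """Follow the data flow of the split pattern, one statement per insn."""
    y0 = approx_recip(b)    # operand 0: initial estimate of 1/b
    e = 1.0 - b * y0        # operand 3: 1 - b*y0, the error of the estimate
    e2 = e * e + e          # operand 4: e + e^2
    y1 = e2 * y0 + y0       # operand 5: y0 * (1 + e + e^2), refined 1/b
    q = a * y1              # operand 6: first quotient estimate
    r = a - b * q           # operand 7: remainder a - b*q
    return r * y1 + q       # operand 0: quotient corrected by the remainder

if __name__ == "__main__":
    for a, b in [(1.0, 3.0), (355.0, 113.0), (-7.5, 0.3)]:
        print(a, b, div_sketch(a, b), a / b)
```

Each statement depends only on values computed by earlier ones, which is
why the sequence needs one temporary per intermediate value if every
register is to be accessed in a single mode.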