From: Mohamed Shafi
Date: Mon, 05 Oct 2009 14:03:00 -0000
Subject: Re: How to split 40bit data types load/store?
To: Richard Henderson
Cc: GCC

2009/9/14 Richard Henderson:
> On 09/14/2009 07:24 AM, Mohamed Shafi wrote:
>>
>> Hello all,
>>
>> I am doing a port for a 32bit target in GCC 4.4.0. I have to support a
>> 40bit data type (_Accum) in the port. The target has 40bit registers, which
>> are GPRs and work as 32bit registers in other modes. The load and store for
>> _Accum happen in two steps: the lower 32 bits in one instruction and the
>> upper 8 bits in the next instruction. I want to split the instruction
>> after reload.
>> I tried to have a pattern (for load) like this:
>>
>> (define_insn "fn_load_ext_sa"
>>  [(set (unspec:SA [(match_operand:DA 0 "register_operand" "")]
>>                   UNSPEC_FN_EXT)
>>        (match_operand:SA 1 "memory_operand" ""))]
>>
>> (define_insn "fn_load_sa"
>>  [(set (unspec:SA [(match_operand:DA 0 "register_operand" "")]
>>                    UNSPEC_FN)
>>        (match_operand:SA 1 "memory_operand" ""))]
>
> Unspec on the left-hand-side isn't something that's supposed to happen, and
> is more than likely the cause of your problems.  Try moving the unspec to
> the right-hand-side like:
>
>  (set (reg:SI reg) (mem:SI addr))
>
>  (set (reg:SA reg)
>       (unspec:SA [(reg:SI reg) (mem:QI addr)]
>                  UNSPEC_ACCUM_INSERT))
>
> and
>
>  (set (mem:SI addr) (reg:SI reg))
>
>  (set (mem:QI addr)
>       (unspec:QI [(reg:SA reg)]
>                  UNSPEC_ACCUM_EXTRACT))
>
> Note that after reload it's perfectly acceptable for a hard register to
> appear with the different SI and SA modes.
>
> It's probably not too hard to define this with zero_extract sequences
> instead of unspecs, but given that these only appear after reload, it may
> not be worth the effort.
>

I was able to implement this with unspecs. But now it seems that I need to
split the pattern before reload as well, so I am thinking of removing this
and doing the split before reload instead.

The issue is that there is no register-indirect addressing mode for
accessing the upper eight bits of the 40bit register; the only addressing
mode supported for accessing this section is (SP+offset). So my idea was to
allow register-indirect addresses here anyway and fix them up at reload
time, in secondary reload, with the help of a scratch register and a
scratch memory location.
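A plain-C behavioral model (not GCC back-end code) may help pin down what the two halves of the accumulator contain and what Henderson's suggested UNSPEC_ACCUM_INSERT and UNSPEC_ACCUM_EXTRACT operations would compute. It is a sketch under a stated assumption: the 40-bit value is held in the low 40 bits of a uint64_t. The names accum40, ACCUM40_MAX, accum_lo, accum_extract_hi and accum_insert are hypothetical, invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: the 40-bit accumulator is kept in the low 40 bits
   of a uint64_t; bits 40..63 are always zero. */
typedef uint64_t accum40;

#define ACCUM40_MAX ((UINT64_C(1) << 40) - 1)

/* Low 32 bits: the part moved by an ordinary 32bit (SImode/SAmode)
   load or store. */
static uint32_t accum_lo(accum40 r)
{
  assert(r <= ACCUM40_MAX);
  return (uint32_t) r;
}

/* Upper 8 bits: analogous to UNSPEC_ACCUM_EXTRACT yielding a QImode value
   that is then stored to memory. */
static uint8_t accum_extract_hi(accum40 r)
{
  assert(r <= ACCUM40_MAX);
  return (uint8_t) (r >> 32);
}

/* Analogous to UNSPEC_ACCUM_INSERT: keep the 32-bit low half already in the
   register and insert the 8-bit upper part loaded from memory. */
static accum40 accum_insert(uint32_t lo, uint8_t hi)
{
  return ((accum40) hi << 32) | lo;
}
```

Splitting a value with accum_lo and accum_extract_hi and recombining it with accum_insert round-trips, which is the property the two-instruction load and store sequences have to preserve.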
But it seems that in GCC it is not possible to have both a scratch memory
location and a scratch register for the same operation. Am I right?

So what I did was to implement this at the define_expand stage itself. The
idea is that for

    load (R0), D0

the expander generates the following sequence:

    load     (R0), D0              // 32bit mode, SAmode move
    load     (R0+4), scratch_reg   // 32bit mode, SAmode
    store    scratch_reg, (SP+off) // 32bit mode, SAmode
    load.ext (SP+off), D0.u8

and similarly for store. Here are the patterns that I used for this purpose:

(define_expand "movda"
  [(set (match_operand:DA 0 "nonimmediate_operand" "")
        (match_operand:DA 1 "nonimmediate_operand" ""))]
  ""
  "{
    if (MEM_P (operands[1]) && REG_P (XEXP (operands[1], 0))
        && XEXP (operands[1], 0) != virtual_stack_vars_rtx)
      {
        rtx lo_half, hi_half;
        rtx scratch_mem, scratch_reg, subreg;

        gcc_assert (can_create_pseudo_p ());

        scratch_reg = gen_reg_rtx (SAmode);
        scratch_mem = assign_stack_temp (SAmode, GET_MODE_SIZE (SAmode), 0);
        subreg = gen_rtx_SUBREG (SAmode, operands[0], 0);
        lo_half = adjust_address (operands[1], SAmode, 0);
        hi_half = adjust_address (operands[1], SAmode, 4);

        emit_insn (gen_rtx_SET (SAmode, subreg, lo_half));
        emit_insn (gen_rtx_SET (SAmode, scratch_reg, hi_half));
        emit_insn (gen_rtx_SET (SAmode, scratch_mem, scratch_reg));
        emit_insn (gen_load_reg_ext (operands[0], scratch_mem));
        DONE;
      }
    /* and similarly for store operation */
  }"
)

(define_insn "load_reg_ext"
  [(set (subreg:SA (zero_extract:DA (match_operand:DA 0 "register_operand" "=d")
                                    (const_int 8)
                                    (const_int 24)) 4)
        (match_operand:SA 1 "memory_operand" "Sd3"))]

(define_insn "store_reg_ext"
  [(set (match_operand:SA 0 "memory_operand" "=Sd3")
        (zero_extract:SA (match_operand:DA 1 "register_operand" "d")
                         (const_int 8)
                         (const_int 24)))]

(define_insn "*movsa_internal"
  [(set (match_operand:SA 0 "nonimmediate_operand" "=m,d,d")
        (match_operand:SA 1 "nonimmediate_operand" "d,m,d"))]

By default, -fomit-frame-pointer is passed to the compiler.
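The four-instruction load sequence can be checked with a toy simulation in C. Everything here is a hypothetical sketch, not the target's real semantics: memory is modeled as 32-bit words indexed by word (so the message's (R0+4) becomes the next word), D0 is a uint64_t, a 32bit load is assumed to leave D0's upper 8 bits untouched, and load.ext is assumed to take the upper byte from the low 8 bits of the stack word. The names MEM_WORDS, struct machine, load_lo, load_ext_sp and load_accum are invented for illustration; load_ext_sp takes only an SP offset to model the (SP+offset)-only restriction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy machine for illustration only: memory is an array of
   32-bit words and every address is a word index, so the byte offset +4
   used in the message becomes +1 here. */
#define MEM_WORDS 64

struct machine {
  uint32_t mem[MEM_WORDS];
  uint32_t sp;   /* stack pointer (word index into mem) */
  uint64_t d0;   /* 40-bit accumulator D0, low 40 bits used */
};

/* "load (Rn), D0" in 32bit mode: replaces the low 32 bits of D0.
   Assumption: the upper 8 bits of D0 are left unchanged. */
static void load_lo(struct machine *m, uint32_t addr)
{
  assert(addr < MEM_WORDS);
  m->d0 = (m->d0 & ~(uint64_t) 0xFFFFFFFFu) | m->mem[addr];
}

/* "load.ext (SP+off), D0.u8": the only way to reach the upper 8 bits.
   It takes an SP offset rather than an address, modeling the fact that
   only (SP+offset) addressing exists for this access.
   Assumption: the upper byte comes from the low 8 bits of the stack word. */
static void load_ext_sp(struct machine *m, uint32_t off)
{
  uint32_t addr = m->sp + off;

  assert(addr < MEM_WORDS);
  m->d0 = (m->d0 & 0xFFFFFFFFu) | ((uint64_t) (m->mem[addr] & 0xFFu) << 32);
}

/* The four-step expansion of "load (R0), D0":
     load     (R0), D0              low 32 bits
     load     (R0+4), scratch_reg   word holding the upper 8 bits
     store    scratch_reg, (SP+off)
     load.ext (SP+off), D0.u8       upper 8 bits via the stack slot */
static void load_accum(struct machine *m, uint32_t r0, uint32_t off)
{
  uint32_t scratch_reg;

  load_lo(m, r0);

  assert(r0 + 1 < MEM_WORDS);
  scratch_reg = m->mem[r0 + 1];

  assert(m->sp + off < MEM_WORDS);
  m->mem[m->sp + off] = scratch_reg;   /* spill to the scratch stack slot */

  load_ext_sp(m, off);
}
```

In this model, executing only load_lo (the first of the four steps) leaves D0's upper 8 bits unchanged, so the loaded value is incomplete; all four steps are needed to assemble the full 40-bit value.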
Without optimization the compiler generates the expected output, but with
optimization that is not the case, so it seems that the patterns I have
written above are not correct. For a simple function like the following:

_Accum foo (_Accum *a)
{
  _Accum b = *a;
  return b;
}

with optimization enabled the compiler generates only

    load (R0), D0   // 32bit mode, SAmode move

which is the first instruction of the expected four-instruction sequence.
How can I write the patterns properly?

Regards
Shafi