From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 4083 invoked by alias); 23 Apr 2015 07:02:49 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 4069 invoked by uid 89); 23 Apr 2015 07:02:48 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00,SPF_PASS,T_RP_MATCHES_RCVD autolearn=ham version=3.3.2 X-HELO: eusmtp01.atmel.com Received: from eusmtp01.atmel.com (HELO eusmtp01.atmel.com) (212.144.249.242) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted) ESMTPS; Thu, 23 Apr 2015 07:02:47 +0000 Received: from apsmtp01.atmel.com (10.168.254.30) by eusmtp01.atmel.com (10.161.101.30) with Microsoft SMTP Server id 14.2.347.0; Thu, 23 Apr 2015 09:02:36 +0200 Received: from PENCHT02.corp.atmel.com (10.168.5.162) by apsmtp01.corp.atmel.com (10.168.254.30) with Microsoft SMTP Server (TLS) id 14.2.347.0; Thu, 23 Apr 2015 15:03:42 +0800 Received: from atmel.com (10.168.5.13) by cas-ap.atmel.com (10.168.5.162) with Microsoft SMTP Server (TLS) id 14.2.342.3; Thu, 23 Apr 2015 15:02:38 +0800 Date: Thu, 23 Apr 2015 07:02:00 -0000 From: Senthil Kumar Selvaraj To: Georg-Johann Lay CC: , Subject: Re: [patch,avr] Fix PR 65657 - read from __memx address space tramples arguments to function call Message-ID: <20150423070424.GA1915@atmel.com> References: <20150416064347.GA31491@atmel.com> <552F7A8D.1080709@gjlay.de> <20150416092802.GC31491@atmel.com> <552FE79E.2040509@gjlay.de> <20150417074655.GA3186@atmel.com> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Disposition: inline In-Reply-To: <20150417074655.GA3186@atmel.com> User-Agent: Mutt/1.5.23 (2014-03-12) X-IsSubscribed: yes X-SW-Source: 2015-04/txt/msg01382.txt.bz2 On Fri, Apr 17, 2015 at 01:16:58PM +0530, Senthil Kumar Selvaraj wrote: > On Thu, Apr 16, 2015 at 06:47:26PM +0200, Georg-Johann Lay wrote: > > Am 04/16/2015 um 11:28 AM schrieb Senthil Kumar Selvaraj: > > >On Thu, Apr 16, 2015 at 11:02:05AM +0200, Georg-Johann Lay wrote: > > >>Am 04/16/2015 um 08:43 AM schrieb Senthil Kumar Selvaraj: > > >>>This patch fixes PR 65657. > > >> > > >>The following artifact appears to be PR63633. > > >> > > > > > >I did see that one - unfortunately, that fix won't help here. IIUC, you > > >check if input/output operand hard regs are in the clobber list, > > >and if yes, you generate pseudos to save and restore clobbered hard > > >regs. > > > > > >In this case, the reg is actually clobbered by a different insn (one > > > > Arrgh, yes... > > > > >that loads the next argument to the function). So unless I blindly generate pseudos for > > >all regs in the clobber list, the clobbering will still happen. > > > > > >FWIW, I did try saving/restoring all clobbered regs, and it did fix the > > >problem - just that it appeared like a (worse) hack to me. Aren't we > > >manually replicating what RA/reload should be doing? > > > > As it appears, we'll have to do it by hand. The attaches patch is just a > > sketch that indicates how the problem could be approached. Notice the new > > assertions in the split expanders; they will throw ICE until the fix is > > actually installed. > > > > The critical insn are generated in movMM expander and are changed to have no > > clobbers (SCRATCHes actually). An a later pass, when appropriate life info > > can be made available, run yet another avr pass that > > > > 1a) Save-restore needed hard regs around the insn. > > > > 2a) Kick out hard regs overlapping the clobbers, e.g. in set_src, into new > > pseudos. Maybe that could happen due to some hard regs progagation, or we > > can use a new predicate similar combine_pseudo_register_operand. > > > > 3) Replace scratch -> hard regs for all scratch_operands. > > > > 2b) Restore respective hard regs from their pseudos. > > > > 1b) Restore respective hard regs from their pseudos. > > > > > > And maybe we can get better code by allocating things like address register > > by hand and get better code then. > > > > When I implemented some of the libgcc insns I tried to express the operand > > by means of constraints, e.h. for (reg:HI 22) and let register allocator do > > the job. > > > > The resulting code was functional but *horrific*. > > > > The register allocator is not yet ready to generate efficient code in such > > demanding situations... > > > > > > > >What do you think? > > > > > > > IMO sooner or later we'll need such an infrastructure; maybe also for > > non-mov insn that are implemented by transparent libcalls like divmod, mul, > > etc. > > I'm curious how other embedded targets handle this though - arm, for > example? Surely they should have some libcalls/builtins which need > specific hard regs? I should check that out. > > > > > >>>When cfgexpand.c expands a function call, it first figures out the > > >>>registers in which the arguments will go, followed by expansion of the > > >>>arguments themselves (right to left). It later emits mov insns to set > > >>>the precomputed registers with the expanded RTXes. > > >>> > > >>>If one of the arguments is a __memx char pointer dereference, the mov > > >>>eventually expands to gen_xload_A (for certain cases), which > > >>>clobbers R22, R21 and Z. This causes problems if one of those > > >>>clobbered registers was precomputed to hold another argument. > > >>> > > >>>In general, call expansion does not appear to take clobbers into account - > > > > We had been warned that using hard regs is evil... But without that > > technique the code quality would decrease way too much. > > Oh ok - do you know some place where this is documented or was > discussed? > > > > >>>when it checks for argument overlap, the RTX (args[i].value) is only a MEM > > >>>in QImode for the memx deref - the clobber shows up when it eventually > > >>>calls emit_move_insn, at which point, it is too late. > > > > Such situations could only be handled by a target hook which allowed to > > expand specific trees by hand... Such a hook could cater for insn that must > > use hard registers. > > Yes, I guess so :( > > > > >>>This does not happen for a int pointer dereference - turns out that > > >>>precompute_register_parameters does a copy_to_mode_reg if the > > >>>cost of args[i].value is more than COSTS_N_INSNS(1) i.e., it creates a > > >>>pseudo and later assigns the pseudo to the arg register. This is done > > >>>before any moves to arg registers is done, so other arguments are not > > >>>overwritten. > > >>> > > >>>Doing the same thing - providing a better cost estimate for a MEM rtx in > > >>>the non-default address space, makes this problem go away, and that is > > >>>what this patch does. Regression testing does not show any new failures. > > > > Can you tell something about overall code quality? If it is not > > significantly worse then I'd propose to apply your rtx-costs solution soon. > > The full fix will take more time to work it out. > > > > For this piece of code - > > void foo ( char c, unsigned int d); > void readx (const char __memx *x) > { > foo (*x, 0xABCD); > } > > the buggy code (for readx) looks like this > > mov r18,r22 > mov r25,r23 > ldi r22,lo8(-51) > ldi r23,lo8(-85) > mov r30,r18 > mov r31,r25 > mov r21,r24 > call __xload_1 > mov r24,r22 > jmp foo > > With the patch, the code looks like this > > movw r30,r22 > mov r21,r24 > call __xload_1 > mov r24,r22 > ldi r22,lo8(-51) > ldi r23,lo8(-85) > jmp foo > > The code turns out to be better in this case, as the loads to r22,r23 > get done later. > > The dump for cfgexpand is where the change originates - the memx dereference is > saved in a pseudo before the moves to argument registers start. > > Before: > > (insn 2 4 3 2 (set (reg/v/f:PSI 43 [ x ]) > (reg:PSI 22 r22 [ x ])) foo.c:4 -1 > (nil)) > (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG) > (insn 6 3 7 2 (set (reg:HI 22 r22) > (const_int -21555 [0xffffffffffffabcd])) foo.c:5 -1 > (nil)) > (insn 7 6 8 2 (parallel [ > (set (reg:QI 24 r24) > (mem:QI (reg/v/f:PSI 43 [ x ]) [0 *x_2(D)+0 S1 A8 AS7])) > (clobber (reg:QI 22 r22)) > (clobber (reg:QI 21 r21)) > (clobber (reg:HI 30 r30)) > ]) foo.c:5 -1 > (nil)) > (call_insn/j 8 7 9 2 (parallel [ > (call (mem:HI (symbol_ref:HI ("foo") [flags 0x41] ) [0 foo S2 A8]) > (const_int 0 [0])) > (use (const_int 1 [0x1])) > ]) foo.c:5 -1 > (nil) > (expr_list:REG_EQUAL (use (reg:QI 24 r24)) > (expr_list:REG_NONNEG (use (reg:HI 22 r22)) > (nil)))) > (barrier 9 8 0) > > > After: > > (insn 2 4 3 2 (set (reg/v/f:PSI 43 [ x ]) > (reg:PSI 22 r22 [ x ])) foo.c:4 -1 > (nil)) > (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG) > (insn 6 3 7 2 (parallel [ > (set (reg:QI 44) > (mem:QI (reg/v/f:PSI 43 [ x ]) [0 *x_2(D)+0 S1 A8 AS7])) > (clobber (reg:QI 22 r22)) > (clobber (reg:QI 21 r21)) > (clobber (reg:HI 30 r30)) > ]) foo.c:5 -1 > (nil)) > (insn 7 6 8 2 (set (reg:HI 22 r22) > (const_int -21555 [0xffffffffffffabcd])) foo.c:5 -1 > (nil)) > (insn 8 7 9 2 (set (reg:QI 24 r24) > (reg:QI 44)) foo.c:5 -1 > (nil)) > (call_insn/j 9 8 10 2 (parallel [ > (call (mem:HI (symbol_ref:HI ("foo") [flags 0x41] ) [0 foo S2 A8]) > (const_int 0 [0])) > (use (const_int 1 [0x1])) > ]) foo.c:5 -1 > (nil) > (expr_list:REG_EQUAL (use (reg:QI 24 r24)) > (expr_list:REG_NONNEG (use (reg:HI 22 r22)) > (nil)))) > (barrier 10 9 0) > > Without the patch, and with the arguments > reversed, like this > > void foo ( unsigned int d, char c); > void readx (const char __memx *x) > { > foo (0xABCD, *x); > } > > the compiler elides the dereference itself - it just emits > > ldi r24,lo8(-51) > ldi r25,lo8(-85) > jmp foo > > Scary :). That's because expand has > > (insn 6 3 7 2 (parallel [ > (set (reg:QI 22 r22) > (mem:QI (reg/v/f:PSI 43 [ x ]) [0 *x_2(D)+0 S1 A8 AS7])) > (clobber (reg:QI 22 r22)) > (clobber (reg:QI 21 r21)) > (clobber (reg:HI 30 r30)) > ]) foo.c:5 -1 > (nil)) > > but this, I guess, will not happen if the fix for PR63633 was applied. > > With my cost patch, I get > > movw r30,r22 > mov r21,r24 > call __xload_1 > ldi r24,lo8(-51) > ldi r25,lo8(-85) > jmp foo > > For the testcase posted originally for the bug, > > > extern void writeFlashByte(uint8_t byte, uint16_t handle); > > void writeBuf(const uint8_t __memx *from, const uint8_t __memx *to) > { > uint16_t handle = ((uint16_t)(((__uint24)to) & 0xFFFFUL)); > writeFlashByte(*from++, handle++); > } > > > > Before: > > mov r27,r22 > mov r26,r23 > mov r21,r24 > mov r24,r20 > movw r22,r18 > mov r30,r27 > mov r31,r26 > call __xload_1 > mov r24,r22 > jmp writeFlashByte > > After: > > push r16 > push r17 > movw r16,r18 > mov r18,r20 > movw r30,r22 > mov r21,r24 > call __xload_1 > mov r24,r22 > movw r22,r16 > pop r17 > pop r16 > jmp writeFlashByte > > Looks like worse code, but reload had to choose call saved regs (r16 and r17) for zero_extendpsisi's > output operand, as we don't allow REG_X (or greater) for SI mode values > in avr_hard_regno_mode_ok. > > Looks reasonable to me - what do you guys say? I realize Johann is working on eliminating hard regs usage in favor of register classes/constraints instead, but I think this patch that sets the correct cost for MEM rtx in non-default address space does nothing "wrong", even if it is only a tangential fix to the problem. What do you guys say? > > Regards > Senthil > > > > Johann > >