From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 29384 invoked by alias); 17 Apr 2015 07:45:31 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 27222 invoked by uid 89); 17 Apr 2015 07:45:29 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.8 required=5.0 tests=AWL,BAYES_00,SPF_PASS,T_RP_MATCHES_RCVD autolearn=ham version=3.3.2 X-HELO: DVREDG02.corp.atmel.com Received: from nasmtp01.atmel.com (HELO DVREDG02.corp.atmel.com) (192.199.1.246) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted) ESMTPS; Fri, 17 Apr 2015 07:45:28 +0000 Received: from apsmtp01.atmel.com (10.168.254.31) by DVREDG02.corp.atmel.com (10.42.103.31) with Microsoft SMTP Server (TLS) id 14.2.347.0; Fri, 17 Apr 2015 01:45:18 -0600 Received: from PENCHT01.corp.atmel.com (10.168.5.161) by apsmtp01.atmel.com (10.168.254.31) with Microsoft SMTP Server (TLS) id 14.2.347.0; Fri, 17 Apr 2015 15:45:38 +0800 Received: from atmel.com (10.168.5.13) by cas-ap.atmel.com (10.168.5.161) with Microsoft SMTP Server (TLS) id 14.2.342.3; Fri, 17 Apr 2015 15:45:14 +0800 Date: Fri, 17 Apr 2015 07:45:00 -0000 From: Senthil Kumar Selvaraj To: Georg-Johann Lay CC: , Subject: Re: [patch,avr] Fix PR 65657 - read from __memx address space tramples arguments to function call Message-ID: <20150417074655.GA3186@atmel.com> References: <20150416064347.GA31491@atmel.com> <552F7A8D.1080709@gjlay.de> <20150416092802.GC31491@atmel.com> <552FE79E.2040509@gjlay.de> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Disposition: inline In-Reply-To: <552FE79E.2040509@gjlay.de> User-Agent: Mutt/1.5.23 (2014-03-12) X-IsSubscribed: yes X-SW-Source: 2015-04/txt/msg00853.txt.bz2 On Thu, Apr 16, 2015 at 06:47:26PM +0200, Georg-Johann Lay wrote: > Am 04/16/2015 um 11:28 AM schrieb Senthil Kumar Selvaraj: > >On Thu, Apr 16, 2015 at 11:02:05AM +0200, Georg-Johann Lay wrote: > >>Am 04/16/2015 um 08:43 AM schrieb Senthil Kumar Selvaraj: > >>>This patch fixes PR 65657. > >> > >>The following artifact appears to be PR63633. > >> > > > >I did see that one - unfortunately, that fix won't help here. IIUC, you > >check if input/output operand hard regs are in the clobber list, > >and if yes, you generate pseudos to save and restore clobbered hard > >regs. > > > >In this case, the reg is actually clobbered by a different insn (one > > Arrgh, yes... > > >that loads the next argument to the function). So unless I blindly generate pseudos for > >all regs in the clobber list, the clobbering will still happen. > > > >FWIW, I did try saving/restoring all clobbered regs, and it did fix the > >problem - just that it appeared like a (worse) hack to me. Aren't we > >manually replicating what RA/reload should be doing? > > As it appears, we'll have to do it by hand. The attaches patch is just a > sketch that indicates how the problem could be approached. Notice the new > assertions in the split expanders; they will throw ICE until the fix is > actually installed. > > The critical insn are generated in movMM expander and are changed to have no > clobbers (SCRATCHes actually). An a later pass, when appropriate life info > can be made available, run yet another avr pass that > > 1a) Save-restore needed hard regs around the insn. > > 2a) Kick out hard regs overlapping the clobbers, e.g. in set_src, into new > pseudos. Maybe that could happen due to some hard regs progagation, or we > can use a new predicate similar combine_pseudo_register_operand. > > 3) Replace scratch -> hard regs for all scratch_operands. > > 2b) Restore respective hard regs from their pseudos. > > 1b) Restore respective hard regs from their pseudos. > > > And maybe we can get better code by allocating things like address register > by hand and get better code then. > > When I implemented some of the libgcc insns I tried to express the operand > by means of constraints, e.h. for (reg:HI 22) and let register allocator do > the job. > > The resulting code was functional but *horrific*. > > The register allocator is not yet ready to generate efficient code in such > demanding situations... > > > > >What do you think? > > > > IMO sooner or later we'll need such an infrastructure; maybe also for > non-mov insn that are implemented by transparent libcalls like divmod, mul, > etc. I'm curious how other embedded targets handle this though - arm, for example? Surely they should have some libcalls/builtins which need specific hard regs? I should check that out. > > >>>When cfgexpand.c expands a function call, it first figures out the > >>>registers in which the arguments will go, followed by expansion of the > >>>arguments themselves (right to left). It later emits mov insns to set > >>>the precomputed registers with the expanded RTXes. > >>> > >>>If one of the arguments is a __memx char pointer dereference, the mov > >>>eventually expands to gen_xload_A (for certain cases), which > >>>clobbers R22, R21 and Z. This causes problems if one of those > >>>clobbered registers was precomputed to hold another argument. > >>> > >>>In general, call expansion does not appear to take clobbers into account - > > We had been warned that using hard regs is evil... But without that > technique the code quality would decrease way too much. Oh ok - do you know some place where this is documented or was discussed? > > >>>when it checks for argument overlap, the RTX (args[i].value) is only a MEM > >>>in QImode for the memx deref - the clobber shows up when it eventually > >>>calls emit_move_insn, at which point, it is too late. > > Such situations could only be handled by a target hook which allowed to > expand specific trees by hand... Such a hook could cater for insn that must > use hard registers. Yes, I guess so :( > > >>>This does not happen for a int pointer dereference - turns out that > >>>precompute_register_parameters does a copy_to_mode_reg if the > >>>cost of args[i].value is more than COSTS_N_INSNS(1) i.e., it creates a > >>>pseudo and later assigns the pseudo to the arg register. This is done > >>>before any moves to arg registers is done, so other arguments are not > >>>overwritten. > >>> > >>>Doing the same thing - providing a better cost estimate for a MEM rtx in > >>>the non-default address space, makes this problem go away, and that is > >>>what this patch does. Regression testing does not show any new failures. > > Can you tell something about overall code quality? If it is not > significantly worse then I'd propose to apply your rtx-costs solution soon. > The full fix will take more time to work it out. > For this piece of code - void foo ( char c, unsigned int d); void readx (const char __memx *x) { foo (*x, 0xABCD); } the buggy code (for readx) looks like this mov r18,r22 mov r25,r23 ldi r22,lo8(-51) ldi r23,lo8(-85) mov r30,r18 mov r31,r25 mov r21,r24 call __xload_1 mov r24,r22 jmp foo With the patch, the code looks like this movw r30,r22 mov r21,r24 call __xload_1 mov r24,r22 ldi r22,lo8(-51) ldi r23,lo8(-85) jmp foo The code turns out to be better in this case, as the loads to r22,r23 get done later. The dump for cfgexpand is where the change originates - the memx dereference is saved in a pseudo before the moves to argument registers start. Before: (insn 2 4 3 2 (set (reg/v/f:PSI 43 [ x ]) (reg:PSI 22 r22 [ x ])) foo.c:4 -1 (nil)) (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG) (insn 6 3 7 2 (set (reg:HI 22 r22) (const_int -21555 [0xffffffffffffabcd])) foo.c:5 -1 (nil)) (insn 7 6 8 2 (parallel [ (set (reg:QI 24 r24) (mem:QI (reg/v/f:PSI 43 [ x ]) [0 *x_2(D)+0 S1 A8 AS7])) (clobber (reg:QI 22 r22)) (clobber (reg:QI 21 r21)) (clobber (reg:HI 30 r30)) ]) foo.c:5 -1 (nil)) (call_insn/j 8 7 9 2 (parallel [ (call (mem:HI (symbol_ref:HI ("foo") [flags 0x41] ) [0 foo S2 A8]) (const_int 0 [0])) (use (const_int 1 [0x1])) ]) foo.c:5 -1 (nil) (expr_list:REG_EQUAL (use (reg:QI 24 r24)) (expr_list:REG_NONNEG (use (reg:HI 22 r22)) (nil)))) (barrier 9 8 0) After: (insn 2 4 3 2 (set (reg/v/f:PSI 43 [ x ]) (reg:PSI 22 r22 [ x ])) foo.c:4 -1 (nil)) (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG) (insn 6 3 7 2 (parallel [ (set (reg:QI 44) (mem:QI (reg/v/f:PSI 43 [ x ]) [0 *x_2(D)+0 S1 A8 AS7])) (clobber (reg:QI 22 r22)) (clobber (reg:QI 21 r21)) (clobber (reg:HI 30 r30)) ]) foo.c:5 -1 (nil)) (insn 7 6 8 2 (set (reg:HI 22 r22) (const_int -21555 [0xffffffffffffabcd])) foo.c:5 -1 (nil)) (insn 8 7 9 2 (set (reg:QI 24 r24) (reg:QI 44)) foo.c:5 -1 (nil)) (call_insn/j 9 8 10 2 (parallel [ (call (mem:HI (symbol_ref:HI ("foo") [flags 0x41] ) [0 foo S2 A8]) (const_int 0 [0])) (use (const_int 1 [0x1])) ]) foo.c:5 -1 (nil) (expr_list:REG_EQUAL (use (reg:QI 24 r24)) (expr_list:REG_NONNEG (use (reg:HI 22 r22)) (nil)))) (barrier 10 9 0) Without the patch, and with the arguments reversed, like this void foo ( unsigned int d, char c); void readx (const char __memx *x) { foo (0xABCD, *x); } the compiler elides the dereference itself - it just emits ldi r24,lo8(-51) ldi r25,lo8(-85) jmp foo Scary :). That's because expand has (insn 6 3 7 2 (parallel [ (set (reg:QI 22 r22) (mem:QI (reg/v/f:PSI 43 [ x ]) [0 *x_2(D)+0 S1 A8 AS7])) (clobber (reg:QI 22 r22)) (clobber (reg:QI 21 r21)) (clobber (reg:HI 30 r30)) ]) foo.c:5 -1 (nil)) but this, I guess, will not happen if the fix for PR63633 was applied. With my cost patch, I get movw r30,r22 mov r21,r24 call __xload_1 ldi r24,lo8(-51) ldi r25,lo8(-85) jmp foo For the testcase posted originally for the bug, extern void writeFlashByte(uint8_t byte, uint16_t handle); void writeBuf(const uint8_t __memx *from, const uint8_t __memx *to) { uint16_t handle = ((uint16_t)(((__uint24)to) & 0xFFFFUL)); writeFlashByte(*from++, handle++); } Before: mov r27,r22 mov r26,r23 mov r21,r24 mov r24,r20 movw r22,r18 mov r30,r27 mov r31,r26 call __xload_1 mov r24,r22 jmp writeFlashByte After: push r16 push r17 movw r16,r18 mov r18,r20 movw r30,r22 mov r21,r24 call __xload_1 mov r24,r22 movw r22,r16 pop r17 pop r16 jmp writeFlashByte Looks like worse code, but reload had to choose call saved regs (r16 and r17) for zero_extendpsisi's output operand, as we don't allow REG_X (or greater) for SI mode values in avr_hard_regno_mode_ok. Looks reasonable to me - what do you guys say? Regards Senthil > > Johann >