public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Senthil Kumar Selvaraj <senthil_kumar.selvaraj@atmel.com>
To: Georg-Johann Lay <avr@gjlay.de>
Cc: <gcc-patches@gcc.gnu.org>, <chertykov@gmail.com>
Subject: Re: [patch,avr] Fix PR 65657 - read from __memx address space tramples arguments to function call
Date: Fri, 17 Apr 2015 07:45:00 -0000	[thread overview]
Message-ID: <20150417074655.GA3186@atmel.com> (raw)
In-Reply-To: <552FE79E.2040509@gjlay.de>

On Thu, Apr 16, 2015 at 06:47:26PM +0200, Georg-Johann Lay wrote:
> Am 04/16/2015 um 11:28 AM schrieb Senthil Kumar Selvaraj:
> >On Thu, Apr 16, 2015 at 11:02:05AM +0200, Georg-Johann Lay wrote:
> >>Am 04/16/2015 um 08:43 AM schrieb Senthil Kumar Selvaraj:
> >>>This patch fixes PR 65657.
> >>
> >>The following artifact appears to be PR63633.
> >>
> >
> >I did see that one - unfortunately, that fix won't help here. IIUC, you
> >check if input/output operand hard regs are in the clobber list,
> >and if yes, you generate pseudos to save and restore clobbered hard
> >regs.
> >
> >In this case, the reg is actually clobbered by a different insn (one
> 
> Arrgh, yes...
> 
> >that loads the next argument to the function). So unless I blindly generate pseudos for
> >all regs in the clobber list, the clobbering will still happen.
> >
> >FWIW, I did try saving/restoring all clobbered regs, and it did fix the
> >problem - just that it appeared like a (worse) hack to me. Aren't we
> >manually replicating what RA/reload should be doing?
> 
> As it appears, we'll have to do it by hand.  The attaches patch is just a
> sketch that indicates how the problem could be approached.  Notice the new
> assertions in the split expanders; they will throw ICE until the fix is
> actually installed.
> 
> The critical insn are generated in movMM expander and are changed to have no
> clobbers (SCRATCHes actually).  An a later pass, when appropriate life info
> can be made available, run yet another avr pass that
> 
> 1a) Save-restore needed hard regs around the insn.
> 
> 2a) Kick out hard regs overlapping the clobbers, e.g. in set_src, into new
> pseudos.  Maybe that could happen due to some hard regs progagation, or we
> can use a new predicate similar combine_pseudo_register_operand.
> 
> 3) Replace scratch -> hard regs for all scratch_operands.
> 
> 2b) Restore respective hard regs from their pseudos.
> 
> 1b) Restore respective hard regs from their pseudos.
> 
> 
> And maybe we can get better code by allocating things like address register
> by hand and get better code then.
> 
> When I implemented some of the libgcc insns I tried to express the operand
> by means of constraints, e.h. for (reg:HI 22) and let register allocator do
> the job.
> 
> The resulting code was functional but *horrific*.
> 
> The register allocator is not yet ready to generate efficient code in such
> demanding situations...
> 
> >
> >What do you think?
> >
> 
> IMO sooner or later we'll need such an infrastructure; maybe also for
> non-mov insn that are implemented by transparent libcalls like divmod, mul,
> etc.

I'm curious how other embedded targets handle this though - arm, for
example? Surely they should have some libcalls/builtins which need 
specific hard regs? I should check that out.

> 
> >>>When cfgexpand.c expands a function call, it first figures out the
> >>>registers in which the arguments will go, followed by expansion of the
> >>>arguments themselves (right to left). It later emits mov insns to set
> >>>the precomputed registers with the expanded RTXes.
> >>>
> >>>If one of the arguments is a __memx char pointer dereference, the mov
> >>>eventually expands to gen_xload<mode>_A (for certain cases), which
> >>>clobbers R22, R21 and Z.  This causes problems if one of those
> >>>clobbered registers was precomputed to hold another argument.
> >>>
> >>>In general, call expansion does not appear to take clobbers into account -
> 
> We had been warned that using hard regs is evil...  But without that
> technique the code quality would decrease way too much.

Oh ok - do you know some place where this is documented or was
discussed?
> 
> >>>when it checks for argument overlap, the RTX (args[i].value) is only a MEM
> >>>in QImode for the memx deref - the clobber shows up when it eventually
> >>>calls emit_move_insn, at which point, it is too late.
> 
> Such situations could only be handled by a target hook which allowed to
> expand specific trees by hand...  Such a hook could cater for insn that must
> use hard registers.

Yes, I guess so :(
> 
> >>>This does not happen for a int pointer dereference - turns out that
> >>>precompute_register_parameters does a copy_to_mode_reg if the
> >>>cost of args[i].value is more than COSTS_N_INSNS(1) i.e., it creates a
> >>>pseudo and later assigns the pseudo to the arg register. This is done
> >>>before any moves to arg registers is done, so other arguments are not
> >>>overwritten.
> >>>
> >>>Doing the same thing - providing a better cost estimate for a MEM rtx in
> >>>the non-default address space, makes this problem go away, and that is
> >>>what this patch does. Regression testing does not show any new failures.
> 
> Can you tell something about overall code quality? If it is not
> significantly worse then I'd propose to apply your rtx-costs solution soon.
> The full fix will take more time to work it out.
> 

For this piece of code - 

void foo ( char c, unsigned int d);
void readx (const char __memx *x)
{
    foo (*x, 0xABCD);
}

the buggy code (for readx) looks like this

	mov r18,r22
	mov r25,r23
	ldi r22,lo8(-51)
	ldi r23,lo8(-85)
	mov r30,r18
	mov r31,r25
	mov r21,r24
	call __xload_1
	mov r24,r22
	jmp foo

With the patch, the code looks like this

	movw r30,r22
	mov r21,r24
	call __xload_1
	mov r24,r22
	ldi r22,lo8(-51)
	ldi r23,lo8(-85)
	jmp foo

The code turns out to be better in this case, as the loads to r22,r23
get done later.

The dump for cfgexpand is where the change originates - the memx dereference is 
saved in a pseudo before the moves to argument registers start.

Before:

(insn 2 4 3 2 (set (reg/v/f:PSI 43 [ x ])
        (reg:PSI 22 r22 [ x ])) foo.c:4 -1
     (nil))
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 7 2 (set (reg:HI 22 r22)
        (const_int -21555 [0xffffffffffffabcd])) foo.c:5 -1
     (nil))
(insn 7 6 8 2 (parallel [
            (set (reg:QI 24 r24)
                (mem:QI (reg/v/f:PSI 43 [ x ]) [0 *x_2(D)+0 S1 A8 AS7]))
            (clobber (reg:QI 22 r22))
            (clobber (reg:QI 21 r21))
            (clobber (reg:HI 30 r30))
        ]) foo.c:5 -1
     (nil))
(call_insn/j 8 7 9 2 (parallel [
            (call (mem:HI (symbol_ref:HI ("foo") [flags 0x41]  <function_decl 0x7f7129589900 foo>) [0 foo S2 A8])
                (const_int 0 [0]))
            (use (const_int 1 [0x1]))
        ]) foo.c:5 -1
     (nil)
    (expr_list:REG_EQUAL (use (reg:QI 24 r24))
        (expr_list:REG_NONNEG (use (reg:HI 22 r22))
            (nil))))
(barrier 9 8 0)


After:

(insn 2 4 3 2 (set (reg/v/f:PSI 43 [ x ])
        (reg:PSI 22 r22 [ x ])) foo.c:4 -1
     (nil))
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 7 2 (parallel [
            (set (reg:QI 44)
                (mem:QI (reg/v/f:PSI 43 [ x ]) [0 *x_2(D)+0 S1 A8 AS7]))
            (clobber (reg:QI 22 r22))
            (clobber (reg:QI 21 r21))
            (clobber (reg:HI 30 r30))
        ]) foo.c:5 -1
     (nil))
(insn 7 6 8 2 (set (reg:HI 22 r22)
        (const_int -21555 [0xffffffffffffabcd])) foo.c:5 -1
     (nil))
(insn 8 7 9 2 (set (reg:QI 24 r24)
        (reg:QI 44)) foo.c:5 -1
     (nil))
(call_insn/j 9 8 10 2 (parallel [
            (call (mem:HI (symbol_ref:HI ("foo") [flags 0x41]  <function_decl 0x7fa986a0f900 foo>) [0 foo S2 A8])
                (const_int 0 [0]))
            (use (const_int 1 [0x1]))
        ]) foo.c:5 -1
     (nil)
    (expr_list:REG_EQUAL (use (reg:QI 24 r24))
        (expr_list:REG_NONNEG (use (reg:HI 22 r22))
            (nil))))
(barrier 10 9 0)

Without the patch, and with the arguments
reversed, like this

void foo (  unsigned int d, char c);
void readx (const char __memx *x)
{
    foo (0xABCD, *x);
}

the compiler elides the dereference itself - it just emits

	ldi r24,lo8(-51)
	ldi r25,lo8(-85)
	jmp foo

Scary :). That's because expand has

(insn 6 3 7 2 (parallel [
            (set (reg:QI 22 r22)
                (mem:QI (reg/v/f:PSI 43 [ x ]) [0 *x_2(D)+0 S1 A8 AS7]))
            (clobber (reg:QI 22 r22))
            (clobber (reg:QI 21 r21))
            (clobber (reg:HI 30 r30))
        ]) foo.c:5 -1
     (nil))

but this, I guess, will not happen if the fix for PR63633 was applied.

With my cost patch, I get

	movw r30,r22
	mov r21,r24
	call __xload_1
	ldi r24,lo8(-51)
	ldi r25,lo8(-85)
	jmp foo

For the testcase posted originally for the bug,
<snip>

extern void writeFlashByte(uint8_t byte, uint16_t handle);

void writeBuf(const uint8_t __memx *from, const uint8_t __memx *to)
{
    uint16_t handle = ((uint16_t)(((__uint24)to) & 0xFFFFUL));
    writeFlashByte(*from++, handle++);
}

</snip>

Before:

	mov r27,r22
	mov r26,r23
	mov r21,r24
	mov r24,r20
	movw r22,r18
	mov r30,r27
	mov r31,r26
	call __xload_1
	mov r24,r22
	jmp writeFlashByte

After:

	push r16
	push r17
	movw r16,r18
	mov r18,r20
	movw r30,r22
	mov r21,r24
	call __xload_1
	mov r24,r22
	movw r22,r16
	pop r17
	pop r16
	jmp writeFlashByte

Looks like worse code, but reload had to choose call saved regs (r16 and r17) for zero_extendpsisi's 
output operand, as we don't allow REG_X (or greater) for SI mode values
in avr_hard_regno_mode_ok.

Looks reasonable to me - what do you guys say?

Regards
Senthil
> 
> Johann
> 

  parent reply	other threads:[~2015-04-17  7:45 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-16  6:42 Senthil Kumar Selvaraj
2015-04-16  9:02 ` Georg-Johann Lay
2015-04-16  9:26   ` Senthil Kumar Selvaraj
2015-04-16 16:47     ` Georg-Johann Lay
2015-04-16 16:49       ` Georg-Johann Lay
2015-04-16 16:55       ` Denis Chertykov
2015-04-17  7:45       ` Senthil Kumar Selvaraj [this message]
2015-04-17 11:34         ` Denis Chertykov
2015-04-23  7:02         ` Senthil Kumar Selvaraj

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150417074655.GA3186@atmel.com \
    --to=senthil_kumar.selvaraj@atmel.com \
    --cc=avr@gjlay.de \
    --cc=chertykov@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).