From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-395283-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 80619 invoked by alias); 16 Apr 2015 16:47:37 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 79914 invoked by uid 89); 16 Apr 2015 16:47:36 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.5 required=5.0 tests=AWL,BAYES_00,RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.2
X-HELO: mo4-p00-ob.smtp.rzone.de
Received: from mo4-p00-ob.smtp.rzone.de (HELO mo4-p00-ob.smtp.rzone.de) (81.169.146.161) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-GCM-SHA384 encrypted) ESMTPS; Thu, 16 Apr 2015 16:47:35 +0000
X-RZG-AUTH: :LXoWVUeid/7A29J/hMvvT3ol15ykJcYwTPbBBR62PQx1xqvTHw==
X-RZG-CLASS-ID: mo00
Received: from [192.168.0.22] (ip5b43a95f.dynamic.kabel-deutschland.de [91.67.169.95])	by smtp.strato.de (RZmta 37.5 DYNA|AUTH)	with ESMTPSA id j0388fr3GGlQIKD	(using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA (curve secp521r1 with 521 ECDH bits, eq. 15360 bits RSA))	(Client did not present a certificate);	Thu, 16 Apr 2015 18:47:26 +0200 (CEST)
Message-ID: <552FE79E.2040509@gjlay.de>
Date: Thu, 16 Apr 2015 16:47:00 -0000
From: Georg-Johann Lay <avr@gjlay.de>
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:31.0) Gecko/20100101 Thunderbird/31.6.0
MIME-Version: 1.0
To: Senthil Kumar Selvaraj <senthil_kumar.selvaraj@atmel.com>
CC: gcc-patches@gcc.gnu.org, chertykov@gmail.com
Subject: Re: [patch,avr] Fix PR 65657 - read from __memx address space tramples arguments to function call
References: <20150416064347.GA31491@atmel.com> <552F7A8D.1080709@gjlay.de> <20150416092802.GC31491@atmel.com>
In-Reply-To: <20150416092802.GC31491@atmel.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
X-IsSubscribed: yes
X-SW-Source: 2015-04/txt/msg00823.txt.bz2

Am 04/16/2015 um 11:28 AM schrieb Senthil Kumar Selvaraj:
> On Thu, Apr 16, 2015 at 11:02:05AM +0200, Georg-Johann Lay wrote:
>> Am 04/16/2015 um 08:43 AM schrieb Senthil Kumar Selvaraj:
>>> This patch fixes PR 65657.
>>
>> The following artifact appears to be PR63633.
>>
>
> I did see that one - unfortunately, that fix won't help here. IIUC, you
> check if input/output operand hard regs are in the clobber list,
> and if yes, you generate pseudos to save and restore clobbered hard
> regs.
>
> In this case, the reg is actually clobbered by a different insn (one

Arrgh, yes...

> that loads the next argument to the function). So unless I blindly generate pseudos for
> all regs in the clobber list, the clobbering will still happen.
>
> FWIW, I did try saving/restoring all clobbered regs, and it did fix the
> problem - just that it appeared like a (worse) hack to me. Aren't we
> manually replicating what RA/reload should be doing?

As it appears, we'll have to do it by hand.  The attaches patch is just a 
sketch that indicates how the problem could be approached.  Notice the new 
assertions in the split expanders; they will throw ICE until the fix is 
actually installed.

The critical insn are generated in movMM expander and are changed to have no 
clobbers (SCRATCHes actually).  An a later pass, when appropriate life info can 
be made available, run yet another avr pass that

1a) Save-restore needed hard regs around the insn.

2a) Kick out hard regs overlapping the clobbers, e.g. in set_src, into new 
pseudos.  Maybe that could happen due to some hard regs progagation, or we can 
use a new predicate similar combine_pseudo_register_operand.

3) Replace scratch -> hard regs for all scratch_operands.

2b) Restore respective hard regs from their pseudos.

1b) Restore respective hard regs from their pseudos.


And maybe we can get better code by allocating things like address register by 
hand and get better code then.

When I implemented some of the libgcc insns I tried to express the operand by 
means of constraints, e.h. for (reg:HI 22) and let register allocator do the job.

The resulting code was functional but *horrific*.

The register allocator is not yet ready to generate efficient code in such 
demanding situations...

>
> What do you think?
>

IMO sooner or later we'll need such an infrastructure; maybe also for non-mov 
insn that are implemented by transparent libcalls like divmod, mul, etc.

>>> When cfgexpand.c expands a function call, it first figures out the
>>> registers in which the arguments will go, followed by expansion of the
>>> arguments themselves (right to left). It later emits mov insns to set
>>> the precomputed registers with the expanded RTXes.
>>>
>>> If one of the arguments is a __memx char pointer dereference, the mov
>>> eventually expands to gen_xload<mode>_A (for certain cases), which
>>> clobbers R22, R21 and Z.  This causes problems if one of those
>>> clobbered registers was precomputed to hold another argument.
>>>
>>> In general, call expansion does not appear to take clobbers into account -

We had been warned that using hard regs is evil...  But without that technique 
the code quality would decrease way too much.

>>> when it checks for argument overlap, the RTX (args[i].value) is only a MEM
>>> in QImode for the memx deref - the clobber shows up when it eventually
>>> calls emit_move_insn, at which point, it is too late.

Such situations could only be handled by a target hook which allowed to expand 
specific trees by hand...  Such a hook could cater for insn that must use hard 
registers.

>>> This does not happen for a int pointer dereference - turns out that
>>> precompute_register_parameters does a copy_to_mode_reg if the
>>> cost of args[i].value is more than COSTS_N_INSNS(1) i.e., it creates a
>>> pseudo and later assigns the pseudo to the arg register. This is done
>>> before any moves to arg registers is done, so other arguments are not
>>> overwritten.
>>>
>>> Doing the same thing - providing a better cost estimate for a MEM rtx in
>>> the non-default address space, makes this problem go away, and that is
>>> what this patch does. Regression testing does not show any new failures.

Can you tell something about overall code quality? If it is not significantly 
worse then I'd propose to apply your rtx-costs solution soon.  The full fix 
will take more time to work it out.


Johann