public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* Modifying ARM code generator for elimination of 8bit writes - need help
@ 2006-05-28 20:23 Wolfgang Mües
  2006-05-30 20:04 ` Paul Brook
  0 siblings, 1 reply; 54+ messages in thread
From: Wolfgang Mües @ 2006-05-28 20:23 UTC (permalink / raw)
  To: gcc

Hello,

I am trying to port big C/C++ programs (see www.dslinux.org) to the 
nintendo DS console.

The console has 4 Mbytes internal memory, and 32 MBytes external
memory which is *not* 8bit writable (only 16 and 32 bits). CPU is an ARM 
946. Using the external memory for ROM(XIP) and the internal memory for 
data, linux in console mode is possible, but graphical environments are 
very limited...

The idea to overcome this problem is to
a) activate data cache in writeback mode for the external memory.
b) modify the gcc code generator. "strb" opcode is transformed to 
"swpb". swpb will load the cache because of the read-modify-write,
and at cache writeback time, the whole cached half-line will be written 
back, eliminating the 8bit write problem. 

I have proven the solution with an assembler program, but I think I need 
some help modifying the compiler....

I found arm.md and the moveqi insns, but because of the different 
addressing modes of strb and swpb, its not easy to make the change.
And there must be a compiler option for this, too.

Could somebody please tell me how to implement this change?

regards

Wolfgang
-- 
We're back to the times when men were men 
and wrote their own device drivers.

(Linus Torvalds)

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-05-28 20:23 Modifying ARM code generator for elimination of 8bit writes - need help Wolfgang Mües
@ 2006-05-30 20:04 ` Paul Brook
  2006-05-30 21:47   ` Daniel Jacobowitz
  2006-05-31 20:49   ` Wolfgang Mües
  0 siblings, 2 replies; 54+ messages in thread
From: Paul Brook @ 2006-05-30 20:04 UTC (permalink / raw)
  To: gcc; +Cc: Wolfgang Mües

> I found arm.md and the moveqi insns, but because of the different
> addressing modes of strb and swpb, its not easy to make the change.
> And there must be a compiler option for this, too.
>
> Could somebody please tell me how to implement this change?

Short answer is probably not.

There are a couple of complications that spring to mind. The different 
addressing modes and the fact that swp clobbers a register are the most 
immediate ones.

You'll need to modify at least the movqi insn patterns, memory constraints and 
the legitimate address stuff. I'm not sure about the clobber, that might need 
additional reload-related machinery.

Paul

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-05-30 20:04 ` Paul Brook
@ 2006-05-30 21:47   ` Daniel Jacobowitz
  2006-05-30 23:14     ` Paul Brook
  2006-05-31 20:40     ` Wolfgang Mües
  2006-05-31 20:49   ` Wolfgang Mües
  1 sibling, 2 replies; 54+ messages in thread
From: Daniel Jacobowitz @ 2006-05-30 21:47 UTC (permalink / raw)
  To: Paul Brook; +Cc: gcc, Wolfgang Mües

On Tue, May 30, 2006 at 09:03:54PM +0100, Paul Brook wrote:
> > I found arm.md and the moveqi insns, but because of the different
> > addressing modes of strb and swpb, its not easy to make the change.
> > And there must be a compiler option for this, too.
> >
> > Could somebody please tell me how to implement this change?
> 
> Short answer is probably not.
> 
> There are a couple of complications that spring to mind. The different 
> addressing modes and the fact that swp clobbers a register are the most 
> immediate ones.
> 
> You'll need to modify at least the movqi insn patterns, memory constraints and 
> the legitimate address stuff. I'm not sure about the clobber, that might need 
> additional reload-related machinery.

I suspect it would be better to make GCC do halfword stores instead
(read/modify/write).

-- 
Daniel Jacobowitz
CodeSourcery

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-05-30 21:47   ` Daniel Jacobowitz
@ 2006-05-30 23:14     ` Paul Brook
  2006-05-31 20:40     ` Wolfgang Mües
  1 sibling, 0 replies; 54+ messages in thread
From: Paul Brook @ 2006-05-30 23:14 UTC (permalink / raw)
  To: gcc; +Cc: Daniel Jacobowitz, Wolfgang Mües

> > There are a couple of complications that spring to mind. The different
> > addressing modes and the fact that swp clobbers a register are the most
> > immediate ones.
> >
> > You'll need to modify at least the movqi insn patterns, memory
> > constraints and the legitimate address stuff. I'm not sure about the
> > clobber, that might need additional reload-related machinery.
>
> I suspect it would be better to make GCC do halfword stores instead
> (read/modify/write).

Does gcc have mechanisms for doing this, or would you have to hide it all 
inside the movqi pattern?
If the latter I don't see much difference in terms of implementation. strb and 
strh use different addressing modes :-)

Whether ldrh/strh are faster than swpb probably depends whether your cpu 
supports unaligned loads. It also means byte stores are no longer atomic, 
though I don't know whether that is important.

Paul

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-05-30 21:47   ` Daniel Jacobowitz
  2006-05-30 23:14     ` Paul Brook
@ 2006-05-31 20:40     ` Wolfgang Mües
  1 sibling, 0 replies; 54+ messages in thread
From: Wolfgang Mües @ 2006-05-31 20:40 UTC (permalink / raw)
  To: Paul Brook, gcc

On Tuesday 30 May 2006 23:47, Daniel Jacobowitz wrote:
> On Tue, May 30, 2006 at 09:03:54PM +0100, Paul Brook wrote:
> > > I found arm.md and the moveqi insns, but because of the different
> > > addressing modes of strb and swpb, its not easy to make the
> > > change. And there must be a compiler option for this, too.
> > >
> > > Could somebody please tell me how to implement this change?
> >
> > Short answer is probably not.
> >
> > There are a couple of complications that spring to mind. The
> > different addressing modes and the fact that swp clobbers a
> > register are the most immediate ones.
> >
> > You'll need to modify at least the movqi insn patterns, memory
> > constraints and the legitimate address stuff. I'm not sure about
> > the clobber, that might need additional reload-related machinery.
>
> I suspect it would be better to make GCC do halfword stores instead
> (read/modify/write).

Hmmm... I have thought about that. But how do the compiler know if the 
byte address is even or odd? Every time testing the LSB of the address 
and making conditional statements is no joke...

regards

Wolfgang
-- 
We're back to the times when men were men 
and wrote their own device drivers.

(Linus Torvalds)

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-05-30 20:04 ` Paul Brook
  2006-05-30 21:47   ` Daniel Jacobowitz
@ 2006-05-31 20:49   ` Wolfgang Mües
  2006-05-31 20:59     ` Paul Brook
                       ` (2 more replies)
  1 sibling, 3 replies; 54+ messages in thread
From: Wolfgang Mües @ 2006-05-31 20:49 UTC (permalink / raw)
  To: gcc

Paul,

thank you for commenting...

On Tuesday 30 May 2006 22:03, Paul Brook wrote:
> > I found arm.md and the moveqi insns, but because of the different
> > addressing modes of strb and swpb, its not easy to make the change.
> > And there must be a compiler option for this, too.
> >
> > Could somebody please tell me how to implement this change?
>
> Short answer is probably not.
>
> There are a couple of complications that spring to mind. The
> different addressing modes and the fact that swp clobbers a register
> are the most immediate ones.
>
> You'll need to modify at least the movqi insn patterns, memory
> constraints and the legitimate address stuff. I'm not sure about the
> clobber, that might need additional reload-related machinery.

For the first shot, I have changed

> (define_insn "*arm_movqi_insn"
>   [(set (match_operand:QI 0 "nonimmediate_operand" "=r,r,r,m")
> 	(match_operand:QI 1 "general_operand" "rI,K,m,r"))]
>   "TARGET_ARM
>    && (   register_operand (operands[0], QImode)
>
>        || register_operand (operands[1], QImode))"
>
>   "@
>    mov%?\\t%0, %1
>    mvn%?\\t%0, #%B1
>    ldr%?b\\t%0, %1
>    str%?b\\t%1, %0"
>   [(set_attr "type" "*,*,load1,store1")
>    (set_attr "predicable" "yes")]
> )

into

> (define_insn "*arm_movqi_insn"
>   [(set (match_operand:QI 0 "nonimmediate_operand" "=r,r,r,Q")
> 	(match_operand:QI 1 "general_operand" "rI,K,m,+r"))]
>   "TARGET_ARM
>    && (   register_operand (operands[0], QImode)
>
>        || register_operand (operands[1], QImode))"
>
>   "@
>    mov%?\\t%0, %1
>    mvn%?\\t%0, #%B1
>    ldr%?b\\t%0, %1
>    swp%?b\\t%1, %1, [%M0]"
>   [(set_attr "type" "*,*,load1,store1")
>    (set_attr "predicable" "yes")]
> )

Changing "m" to "Q", narrowing the address modes
Changing "r" to "+r", (register is globbered)
and of course making the swpb call..

Gcc compiles, but does a segfault while compiling ARM programs.

regards

Wolfgang












-- 
We're back to the times when men were men 
and wrote their own device drivers.

(Linus Torvalds)

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-05-31 20:49   ` Wolfgang Mües
@ 2006-05-31 20:59     ` Paul Brook
  2006-06-01 14:18     ` Rask Ingemann Lambertsen
  2006-06-04 21:36     ` Rask Ingemann Lambertsen
  2 siblings, 0 replies; 54+ messages in thread
From: Paul Brook @ 2006-05-31 20:59 UTC (permalink / raw)
  To: gcc; +Cc: Wolfgang Mües

> > (define_insn "*arm_movqi_insn"
> >   [(set (match_operand:QI 0 "nonimmediate_operand" "=r,r,r,Q")
> > 	(match_operand:QI 1 "general_operand" "rI,K,m,+r"))]
> Changing "m" to "Q", narrowing the address modes
> Changing "r" to "+r", (register is globbered)

This is wrong. the "+" applies to the operand, not the individual alternative 
(thus should go at the start of the constraint string).

Also, I doubt you can just say "movqi clobbers its input".

I expect you still have to effectively do the same as if you were doing a 
read-modify-write implementation. Otherwise reload will get very upset with 
you. swpb effectively just implements that in a single instruction rather 
than having to do it manually.
See the alpha backend for an example of how this can be implemented.

Paul

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-05-31 20:49   ` Wolfgang Mües
  2006-05-31 20:59     ` Paul Brook
@ 2006-06-01 14:18     ` Rask Ingemann Lambertsen
  2006-06-02  6:24       ` Wolfgang Mües
  2006-06-04 21:36     ` Rask Ingemann Lambertsen
  2 siblings, 1 reply; 54+ messages in thread
From: Rask Ingemann Lambertsen @ 2006-06-01 14:18 UTC (permalink / raw)
  To: Wolfgang Mües; +Cc: gcc

On Wed, May 31, 2006 at 10:49:35PM +0200, Wolfgang Mües wrote:

> > (define_insn "*arm_movqi_insn"
> >   [(set (match_operand:QI 0 "nonimmediate_operand" "=r,r,r,Q")
> > 	(match_operand:QI 1 "general_operand" "rI,K,m,+r"))]
> >   "TARGET_ARM
> >    && (   register_operand (operands[0], QImode)
> >
> >        || register_operand (operands[1], QImode))"
> >
> >   "@
> >    mov%?\\t%0, %1
> >    mvn%?\\t%0, #%B1
> >    ldr%?b\\t%0, %1
> >    swp%?b\\t%1, %1, [%M0]"
> >   [(set_attr "type" "*,*,load1,store1")
> >    (set_attr "predicable" "yes")]
> > )
> 
> Changing "m" to "Q", narrowing the address modes
> Changing "r" to "+r", (register is globbered)
> and of course making the swpb call..

I think you will need to remove the '+' as already suggested and add
(clobber (match_scratch:QI "=X,X,X,1")) to tell GCC that the register
allocated to operand 1 is clobbered by the instruction for this particular
alternative. You will also have to modify any code which expands this
pattern accordingly.

-- 
Rask Ingemann Lambertsen

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-06-01 14:18     ` Rask Ingemann Lambertsen
@ 2006-06-02  6:24       ` Wolfgang Mües
  2006-06-02  7:24         ` Rask Ingemann Lambertsen
  0 siblings, 1 reply; 54+ messages in thread
From: Wolfgang Mües @ 2006-06-02  6:24 UTC (permalink / raw)
  To: gcc

Rask,

On Thursday 01 June 2006 16:13, Rask Ingemann Lambertsen wrote:
> I think you will need to remove the '+' as already suggested and add
> (clobber (match_scratch:QI "=X,X,X,1")) to tell GCC that the register
> allocated to operand 1 is clobbered by the instruction for this
> particular alternative.

Using 

(define_insn "*arm_movqi_insn"
  [(set (match_operand:QI 0 "nonimmediate_operand" "=r,r,r,m")
	(match_operand:QI 1 "general_operand" "rI,K,m,r"))
	(clobber (match_scratch:QI 2 "=X,X,X,1"))]
  "TARGET_ARM
   && (   register_operand (operands[0], QImode)
       || register_operand (operands[1], QImode))"
  "@
   mov%?\\t%0, %1
   mvn%?\\t%0, #%B1
   ldr%?b\\t%0, %1
   str%?b\\t%1, %0"
  [(set_attr "type" "*,*,load1,store1")
   (set_attr "predicable" "yes")]
)

(_only_ adding the clobber statement),
I get

> /data1/home/wolfgang/Projekte/DSO/devkitpro/buildscripts/newlib-1.14.
>0/newlib/li bc/argz/argz_create_sep.c: In function 'argz_create_sep':
> /data1/home/wolfgang/Projekte/DSO/devkitpro/buildscripts/newlib-1.14.
>0/newlib/li bc/argz/argz_create_sep.c:60: error: unrecognizable insn:
> (insn 192 21 24 0
> /data1/home/wolfgang/Projekte/DSO/devkitpro/buildscripts/newli
> b-1.14.0/newlib/libc/argz/argz_create_sep.c:29 (set (reg:QI 1 r1)
> (reg:QI 4 r4)) -1 (nil)
>     (nil))
> /data1/home/wolfgang/Projekte/DSO/devkitpro/buildscripts/newlib-1.14.
>0/newlib/li bc/argz/argz_create_sep.c:60: internal compiler error: in
> extract_insn, at recog .c:2020

What do you mean with

> You will also have to modify any code which 
> expands this pattern accordingly.

I will use this weekend to digg deeper into the documentation...

thank you for your help so far...

Wolfgang
-- 
We're back to the times when men were men 
and wrote their own device drivers.

(Linus Torvalds)

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-06-02  6:24       ` Wolfgang Mües
@ 2006-06-02  7:24         ` Rask Ingemann Lambertsen
  2006-06-04 10:31           ` Wolfgang Mües
  2006-06-17  8:52           ` Rask Ingemann Lambertsen
  0 siblings, 2 replies; 54+ messages in thread
From: Rask Ingemann Lambertsen @ 2006-06-02  7:24 UTC (permalink / raw)
  To: Wolfgang Mües; +Cc: gcc

On Fri, Jun 02, 2006 at 08:23:49AM +0200, Wolfgang Mües wrote:
> Rask,
> 
> (_only_ adding the clobber statement),
> I get

> >0/newlib/li bc/argz/argz_create_sep.c:60: error: unrecognizable insn:
> (insn 192 21 24 0 (set (reg:QI 1 r1) (reg:QI 4 r4)) -1
>      (nil) (nil))
> 
> What do you mean with
> 
> > You will also have to modify any code which 
> > expands this pattern accordingly.

The rest of the ARM backend presently assumes that the pattern has the form

(set (operand:QI 0) (operand:QI 1))

but now we've changed it to

(parallel [(set (operand:QI 0) (operand:QI 1))
	   (clobber (operand:QI 2))
])

so that's why you get "unrecognizable insn" errors now. Any place which
intended to generate an *arm_movqi_insn has to add a clobber also. For a
start, this means the "movqi" pattern.

There may be a faster way of seeing if the modification is going to work for
the DS at all. I noticed from the output template "swp%?b\\t%1, %1, [%M0]"
that "swp" takes three operands. I don't know ARM assembler, but you may be
able to choose to always clobber a specific register. Make it a fixed register
(see FIXED_REGISTERS), refer to this register directly in the output template
and don't add a clobber to the movqi patterns. IMHO, that's an acceptable hack
at an experimental stage. If the resulting code runs correctly on the DS, you
can then undo the FIXED_REGISTERS change and add the clobber statements.

-- 
Rask Ingemann Lambertsen

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-06-02  7:24         ` Rask Ingemann Lambertsen
@ 2006-06-04 10:31           ` Wolfgang Mües
  2006-06-04 11:25             ` Paul Brook
  2006-06-04 20:10             ` Rask Ingemann Lambertsen
  2006-06-17  8:52           ` Rask Ingemann Lambertsen
  1 sibling, 2 replies; 54+ messages in thread
From: Wolfgang Mües @ 2006-06-04 10:31 UTC (permalink / raw)
  To: gcc

Hello Rask,

On Friday 02 June 2006 09:24, Rask Ingemann Lambertsen wrote:
> There may be a faster way of seeing if the modification is going to
> work for the DS at all. I noticed from the output template
> "swp%?b\\t%1, %1, [%M0]" that "swp" takes three operands. I don't
> know ARM assembler, but you may be able to choose to always clobber a
> specific register. Make it a fixed register (see FIXED_REGISTERS),
> refer to this register directly in the output template and don't add
> a clobber to the movqi patterns. IMHO, that's an acceptable hack at
> an experimental stage. If the resulting code runs correctly on the
> DS, you can then undo the FIXED_REGISTERS change and add the clobber
> statements.

I have tried this. No luck. Problem is the lack of addressing modes for 
the swp instruction. Only a simple pointer in a register (no offset, no 
auto-increment is allowed).

After reading most of the gcc rtl documentation (and forgetting way too 
much..) I came up to the following conclusion:

Splitting the insn

(define_insn "*arm_movqi_insn"
  [(set (match_operand:QI 0 "nonimmediate_operand" "=r,r,r,m")
	(match_operand:QI 1 "general_operand" "rI,K,m,r"))]

into 4 different insns:

(define_insn "*arm_movqi_insn"
  [(set (match_operand:QI 0 "register_operand" "")
         (match_operand:QI 1 "register_operand" ""))]

(define_insn "*arm_movnqi_insn"
  [(set (match_operand:QI 0 "register_operand" "")
         (match_operand:QI 1 "constant_operand" ""))]

(define_insn "*arm_loadqi_insn"
  [(set (match_operand:QI 0 "register_operand" "")
         (match_operand:QI 1 "memory_operand" ""))]

(define_insn "*arm_storeqi_insn"
  [(set (match_operand:QI 0 "memory_operand" "")
         (match_operand:QI 1 "register_operand" ""))]

This should give the same function as before, but I then I can do

(define_insn "*arm_storeqi_insn"
  [(set (match_operand:QI 0 "simple_memory_operand" "")
         (match_operand:QI 1 "register_operand" ""))]

etc

to limit the addressing modes of the store insn to the limits of the 
swpb instruction.

And then I can recode the 

(define_expand "movqi"
  [(set (match_operand:QI 0 "general_operand" "")
        (match_operand:QI 1 "general_operand" ""))]

to cope with the movqi requirements defined in the gcc manual.

Hmmm... wondering who all these xxx_operand functions are defined, and 
where they are documented...

Is this the right way to go?

regards

Wolfgang
-- 
We're back to the times when men were men 
and wrote their own device drivers.

(Linus Torvalds)

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-06-04 10:31           ` Wolfgang Mües
@ 2006-06-04 11:25             ` Paul Brook
  2006-06-04 15:26               ` Wolfgang Mües
  2006-06-04 21:01               ` Rask Ingemann Lambertsen
  2006-06-04 20:10             ` Rask Ingemann Lambertsen
  1 sibling, 2 replies; 54+ messages in thread
From: Paul Brook @ 2006-06-04 11:25 UTC (permalink / raw)
  To: gcc; +Cc: Wolfgang Mües

On Sunday 04 June 2006 11:31, Wolfgang Mües wrote:
> Hello Rask,
>
> On Friday 02 June 2006 09:24, Rask Ingemann Lambertsen wrote:
> > There may be a faster way of seeing if the modification is going to
> > work for the DS at all. I noticed from the output template
> > "swp%?b\\t%1, %1, [%M0]" that "swp" takes three operands. I don't
> > know ARM assembler, but you may be able to choose to always clobber a
> > specific register. Make it a fixed register (see FIXED_REGISTERS),
> > refer to this register directly in the output template and don't add
> > a clobber to the movqi patterns. IMHO, that's an acceptable hack at
> > an experimental stage. If the resulting code runs correctly on the
> > DS, you can then undo the FIXED_REGISTERS change and add the clobber
> > statements.
>
> I have tried this. No luck. Problem is the lack of addressing modes for
> the swp instruction. Only a simple pointer in a register (no offset, no
> auto-increment is allowed).
>
> After reading most of the gcc rtl documentation (and forgetting way too
> much..) I came up to the following conclusion:
>
> Splitting the insn
>
> (define_insn "*arm_movqi_insn"
>   [(set (match_operand:QI 0 "nonimmediate_operand" "=r,r,r,m")
> 	(match_operand:QI 1 "general_operand" "rI,K,m,r"))]
>
> into 4 different insns:

No. This is completely the wrong approach. You should just change the valid 
QImode memory addresses, adding a new constraint if neccessary. You also need 
to tweak the reload legitimate address bits to obey the new restrictions.

For the record these hacks are unlikely to ever be acceptable in mainline gcc. 
They're relatively invasive changes who's only purpose is to support 
fundamentally broken hardware.

Paul

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-06-04 11:25             ` Paul Brook
@ 2006-06-04 15:26               ` Wolfgang Mües
  2006-06-04 15:57                 ` Paul Brook
  2006-06-04 21:22                 ` Rask Ingemann Lambertsen
  2006-06-04 21:01               ` Rask Ingemann Lambertsen
  1 sibling, 2 replies; 54+ messages in thread
From: Wolfgang Mües @ 2006-06-04 15:26 UTC (permalink / raw)
  To: gcc

Paul,

On Sunday 04 June 2006 13:24, Paul Brook wrote:
> On Sunday 04 June 2006 11:31, Wolfgang Mües wrote:
> > Splitting the insn
> >
> > (define_insn "*arm_movqi_insn"
> >   [(set (match_operand:QI 0 "nonimmediate_operand" "=r,r,r,m")
> > 	(match_operand:QI 1 "general_operand" "rI,K,m,r"))]
> >
> > into 4 different insns:
>
> No. This is completely the wrong approach.

Why? I am learning.

> You should just change the valid QImode memory addresses, adding a new 
> constraint if neccessary. 

Hmmmm... I have tried this. I have changed the operand constraint from 
"m" to "Q". But these constraints are only used to select the right 
alternative inside the insn, not which insn is invoked. It might be 
possible to modify "nonimmediate_operand"
into something else, to select this insn only if the address is fitting 
in a single register, without offset or increment.

But this will not give me the freedom to allocate a temporary register. 
According to the manual, mov insns are not supposed to clobber a 
register. I suppose I will have to allocate these registers in 

(define_expand "movqi"
  [(set (match_operand:QI 0 "general_operand" "")
        (match_operand:QI 1 "general_operand" ""))]

So I have to narrow down the constraint "nonimmediate_operand", so that 
every memory address not fitting in a single register will not invoke 
arm_movqi_insn.

Please correct me if I'm wrong. This is my first encounter with the 
inner contents of gcc. I may have completely missed your point.

> You also need to tweak the reload legitimate address bits to obey the
> new restrictions.

Can you show me what you mean here? What to do where?

> For the record these hacks are unlikely to ever be acceptable in
> mainline gcc. They're relatively invasive changes who's only purpose
> is to support fundamentally broken hardware.

Paul, this is clear to me. Homebrew software on the DS is not so 
important to justify such a change in mainline gcc. A patch will be 
fine.

Its a big amount of - sometimes frustrating - work for a gcc newbie to 
make this change. I am doing this only because I know it's the only 
solution, and to turn the command line only DS linux into some nice 
PDA/browser/wireless client machine.

regards

Wolfgang

-- 
We're back to the times when men were men 
and wrote their own device drivers.

(Linus Torvalds)

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-06-04 15:26               ` Wolfgang Mües
@ 2006-06-04 15:57                 ` Paul Brook
  2006-06-05 10:07                   ` Wolfgang Mües
  2006-06-04 21:22                 ` Rask Ingemann Lambertsen
  1 sibling, 1 reply; 54+ messages in thread
From: Paul Brook @ 2006-06-04 15:57 UTC (permalink / raw)
  To: gcc; +Cc: Wolfgang Mües

On Sunday 04 June 2006 16:26, Wolfgang Mües wrote:
> Paul,
>
> On Sunday 04 June 2006 13:24, Paul Brook wrote:
> > On Sunday 04 June 2006 11:31, Wolfgang Mües wrote:
> > > Splitting the insn
> > >
> > > (define_insn "*arm_movqi_insn"
> > >   [(set (match_operand:QI 0 "nonimmediate_operand" "=r,r,r,m")
> > > 	(match_operand:QI 1 "general_operand" "rI,K,m,r"))]
> > >
> > > into 4 different insns:
> >
> > No. This is completely the wrong approach.
>
> Why? I am learning.

Because then you have several different patterns for the same operation.
The different variants of movsi should be part of the same pattern so that the 
compiler can change its mind which variant it wants to use.

> > You should just change the valid QImode memory addresses, adding a new
> > constraint if neccessary.
>
> Hmmmm... I have tried this. I have changed the operand constraint from
> "m" to "Q". But these constraints are only used to select the right
> alternative inside the insn, not which insn is invoked. It might be
> possible to modify "nonimmediate_operand"
> into something else, to select this insn only if the address is fitting
> in a single register, without offset or increment.
>
> But this will not give me the freedom to allocate a temporary register.
> According to the manual, mov insns are not supposed to clobber a
> register. I suppose I will have to allocate these registers in
>
> (define_expand "movqi"
>   [(set (match_operand:QI 0 "general_operand" "")
>         (match_operand:QI 1 "general_operand" ""))]
>
> So I have to narrow down the constraint "nonimmediate_operand", so that
> every memory address not fitting in a single register will not invoke
> arm_movqi_insn.

You're confusing constraints and predicates. general_operand is the predicate.
The predicate says under which conditions the insn will match.
The constraints tell regalooc/reload how to make sure the operands of the 
instruction are valid.

To simplify greatly the predicates are used for pattern matching during early 
RTL generation and the constraints are used in the later regalloc/reload 
stages.

Tightening the predicates isn't sufficient (and may not even be neccessary). 
You need to set the constraints so that the compiler knows *how* to fix 
invalid instructions.

The compilcation is that while constraints give sufficient information for the 
compiler to generate correct code they don't help generating good code.  
There are often non-obvious target specific ways of reloading invalid 
addresses. So reload has additional hooks (eg. GO_IF_LEGITIMATE_ADDRESS) to 
provice clever ways of fixing invalid operands. Generally speaking it's up to 
the target backend to make sure the code generated by these satisfies the 
appropriate constraints, and gcc will ICE if they aren't consistent.

I'm afraid I can't really be more specific that this. That would require me 
implementing your changes and figuring out what's going wrong.

Paul

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-06-04 10:31           ` Wolfgang Mües
  2006-06-04 11:25             ` Paul Brook
@ 2006-06-04 20:10             ` Rask Ingemann Lambertsen
  1 sibling, 0 replies; 54+ messages in thread
From: Rask Ingemann Lambertsen @ 2006-06-04 20:10 UTC (permalink / raw)
  To: Wolfgang Mües; +Cc: gcc

On Sun, Jun 04, 2006 at 12:31:08PM +0200, Wolfgang Mües wrote:
> Hello Rask,
> 
> On Friday 02 June 2006 09:24, Rask Ingemann Lambertsen wrote:
> > There may be a faster way of seeing if the modification is going to
> > work for the DS at all. I noticed from the output template
> > "swp%?b\\t%1, %1, [%M0]" that "swp" takes three operands. I don't
> > know ARM assembler, but you may be able to choose to always clobber a
> > specific register. Make it a fixed register (see FIXED_REGISTERS),
> > refer to this register directly in the output template and don't add
> > a clobber to the movqi patterns. IMHO, that's an acceptable hack at
> > an experimental stage. If the resulting code runs correctly on the
> > DS, you can then undo the FIXED_REGISTERS change and add the clobber
> > statements.
> 
> I have tried this. No luck. Problem is the lack of addressing modes for 
> the swp instruction. Only a simple pointer in a register (no offset, no 
> auto-increment is allowed).

What about the "Q" constraint? From gcc/config/arm/constraints.md:
"In ARM state an address that is a single base register." Reload knows how
to fix up addresses in a number of cases that happen often, and I believe
this is one of them.

> Hmmm... wondering who all these xxx_operand functions are defined, and 
> where they are documented...

They are documented in section "Machine-Independent Predicates" of the
manual. I haven't checked where they are defined, but 

	grep -e '^.*_operand_p' -e '^.*_address_p' gcc/*.c

should tell you where. Note that strict_memory_address_p() is just a special
case of memory_address_p() which checks for specific hard registers as
base/index registers if the target has such requirements. This is used
starting with the reload pass.

> Is this the right way to go?

I'm pretty sure it is best to have one insn pattern with several
alternatives. Reload might change the instruction enough that it ends up
using a different alternative. For example, using a register alternative for
a constant operand because the constant isn't a valid immediate operand.

-- 
Rask Ingemann Lambertsen

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-06-04 11:25             ` Paul Brook
  2006-06-04 15:26               ` Wolfgang Mües
@ 2006-06-04 21:01               ` Rask Ingemann Lambertsen
  2006-06-05  0:12                 ` Dave Murphy
  2006-06-05 10:07                 ` Richard Earnshaw
  1 sibling, 2 replies; 54+ messages in thread
From: Rask Ingemann Lambertsen @ 2006-06-04 21:01 UTC (permalink / raw)
  To: Paul Brook; +Cc: gcc, Wolfgang Mües

On Sun, Jun 04, 2006 at 12:24:53PM +0100, Paul Brook wrote:

> For the record these hacks are unlikely to ever be acceptable in mainline gcc. 
> They're relatively invasive changes who's only purpose is to support 
> fundamentally broken hardware.

We don't yet know if they'll be invasive. There's a good chance that they
won't be more than a few new insn patterns, a secondary output reload
to provide a scratch register and a pair of new options.

There are other targets with targets specific options to work around this or
that bug, quirk, defect or errata. In this case, why would two options
-mno-byte-writes and -mbyte-writes, with the latter being the default, be
unlikely to be acceptable? In particular, the MT port has these two options:

-mbacc
	Use byte loads and stores when generating code.

-mno-bacc
	Do not use byte loads and stores when generating code.

-- 
Rask Ingemann Lambertsen

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-06-04 15:26               ` Wolfgang Mües
  2006-06-04 15:57                 ` Paul Brook
@ 2006-06-04 21:22                 ` Rask Ingemann Lambertsen
  1 sibling, 0 replies; 54+ messages in thread
From: Rask Ingemann Lambertsen @ 2006-06-04 21:22 UTC (permalink / raw)
  To: Wolfgang Mües; +Cc: gcc

On Sun, Jun 04, 2006 at 05:26:42PM +0200, Wolfgang Mües wrote:
> Paul,
> 
> On Sunday 04 June 2006 13:24, Paul Brook wrote:
> 
> > You should just change the valid QImode memory addresses, adding a new 
> > constraint if neccessary. 
> 
> Hmmmm... I have tried this. I have changed the operand constraint from 
> "m" to "Q". But these constraints are only used to select the right 
> alternative inside the insn, not which insn is invoked. It might be 
> possible to modify "nonimmediate_operand"
> into something else, to select this insn only if the address is fitting 
> in a single register, without offset or increment.

There's no need to modify "nonimmediate_operand" or use another predicate.
It matches a memory operand, which is fine.

> But this will not give me the freedom to allocate a temporary register. 

See TARGET_SECONDARY_RELOAD in the section "Register Classes".

> According to the manual, mov insns are not supposed to clobber a 
> register.

I can't find any statement to that effect. Where does it say so? But I can
see why it would be a problem when reload_in_progress.
(It would make it difficult to convert the m68k backend from cc0 to CC_REG.)

-- 
Rask Ingemann Lambertsen

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-05-31 20:49   ` Wolfgang Mües
  2006-05-31 20:59     ` Paul Brook
  2006-06-01 14:18     ` Rask Ingemann Lambertsen
@ 2006-06-04 21:36     ` Rask Ingemann Lambertsen
  2006-06-05 12:20       ` Wolfgang Mües
  2 siblings, 1 reply; 54+ messages in thread
From: Rask Ingemann Lambertsen @ 2006-06-04 21:36 UTC (permalink / raw)
  To: Wolfgang Mües; +Cc: gcc

On Wed, May 31, 2006 at 10:49:35PM +0200, Wolfgang Mües wrote:

> > (define_insn "*arm_movqi_insn"
> >   [(set (match_operand:QI 0 "nonimmediate_operand" "=r,r,r,m")
> > 	(match_operand:QI 1 "general_operand" "rI,K,m,r"))]
> >   "TARGET_ARM
> >    && (   register_operand (operands[0], QImode)
> >
> >        || register_operand (operands[1], QImode))"
> >
> >   "@
> >    mov%?\\t%0, %1
> >    mvn%?\\t%0, #%B1
> >    ldr%?b\\t%0, %1
> >    str%?b\\t%1, %0"
> >   [(set_attr "type" "*,*,load1,store1")
> >    (set_attr "predicable" "yes")]
> > )

I think you should go back to this (i.e. the unmodified version) and only
change the "m" into "Q" in the fourth alternative of operand 0. See if that
works, i.e. generates addresses that are valid for the swp instruction. If
it does, then begin to add other changes.

-- 
Rask Ingemann Lambertsen

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes -  need help
  2006-06-04 21:01               ` Rask Ingemann Lambertsen
@ 2006-06-05  0:12                 ` Dave Murphy
  2006-06-05 11:31                   ` Wolfgang Mües
  2006-06-05 10:07                 ` Richard Earnshaw
  1 sibling, 1 reply; 54+ messages in thread
From: Dave Murphy @ 2006-06-05  0:12 UTC (permalink / raw)
  To: Rask Ingemann Lambertsen; +Cc: Paul Brook, gcc, Wolfgang Mües

Rask Ingemann Lambertsen wrote:
> There are other targets with targets specific options to work around this or
> that bug, quirk, defect or errata. In this case, why would two options
> -mno-byte-writes and -mbyte-writes, with the latter being the default, be
> unlikely to be acceptable? In particular, the MT port has these two options:
>
> -mbacc
> 	Use byte loads and stores when generating code.
>
> -mno-bacc
> 	Do not use byte loads and stores when generating code.
I was just about to ask about this very thing since I'm quite sure that 
there would be interest in adding this to devkitARM.

 How much work would it be to implement these switches? I assume that 
the toolchain would need multilibs for these options in order to use 
newlib etc.

Dave

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes -  need help
  2006-06-04 21:01               ` Rask Ingemann Lambertsen
  2006-06-05  0:12                 ` Dave Murphy
@ 2006-06-05 10:07                 ` Richard Earnshaw
  2006-06-05 11:47                   ` Wolfgang Mües
  1 sibling, 1 reply; 54+ messages in thread
From: Richard Earnshaw @ 2006-06-05 10:07 UTC (permalink / raw)
  To: Rask Ingemann Lambertsen; +Cc: Paul Brook, gcc, Wolfgang Mües

On Sun, 2006-06-04 at 22:01, Rask Ingemann Lambertsen wrote:
> On Sun, Jun 04, 2006 at 12:24:53PM +0100, Paul Brook wrote:
> 
> > For the record these hacks are unlikely to ever be acceptable in mainline gcc. 
> > They're relatively invasive changes who's only purpose is to support 
> > fundamentally broken hardware.
> 
> We don't yet know if they'll be invasive. There's a good chance that they
> won't be more than a few new insn patterns, a secondary output reload
> to provide a scratch register and a pair of new options.
> 
I'm confident right now that these will be too invasive to include in
mainline.

> There are other targets with targets specific options to work around this or
> that bug, quirk, defect or errata. In this case, why would two options
> -mno-byte-writes and -mbyte-writes, with the latter being the default, be
> unlikely to be acceptable? In particular, the MT port has these two options:

The changes that tend to get incorporated into the compiler are to work
around bugs in the CPU, not bugs in some H/W developer's use of the
CPU.  The former affect all users of the processor, the latter only that
one case.

If we started putting in hacks for the latter the compiler back-ends
would become unmaintainable in almost no time at all.

Sorry, it's just not going to happen.

R.

PS.  Using swp is a bad idea IMO, this instruction is *very* slow on
some CPU implementations because of the way it interacts with caches. 
It certainly isn't suitable as a generic solution to this sort of issue.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-06-04 15:57                 ` Paul Brook
@ 2006-06-05 10:07                   ` Wolfgang Mües
  2006-06-05 12:25                     ` Paul Brook
  0 siblings, 1 reply; 54+ messages in thread
From: Wolfgang Mües @ 2006-06-05 10:07 UTC (permalink / raw)
  To: gcc, Paul Brook

Paul,

On Sunday 04 June 2006 17:57, Paul Brook wrote:
> Because then you have several different patterns for the same
> operation. The different variants of movsi should be part of the same
> pattern so that the compiler can change its mind which variant it
> wants to use.

Together with the comments of Rask Ingemann (Thanks, Rask!), I 
understand now what you mean.

But regarding the fact that swpb() needs a temporary register - or 
alternative - clobber the input register - how can I model this 
behaviour in a single insn? 

> You're confusing constraints and predicates. general_operand is the
> predicate. The predicate says under which conditions the insn will
> match. The constraints tell regalooc/reload how to make sure the
> operands of the instruction are valid.

Yes, my wording was incorrect. But I know already the difference from 
the manual.

> Tightening the predicates isn't sufficient (and may not even be
> neccessary). You need to set the constraints so that the compiler
> knows *how* to fix invalid instructions.

And if I have 4 different constraints in a single insn, and only one of 
them is needing a temporary register, how do I model this?
This may be the biggest problem. And because byte writes are so common, 
it deserves a good implementation. I can't waste a temporary register 
for each load/store.

> The compilcation is that while constraints give sufficient
> information for the compiler to generate correct code they don't help
> generating good code. There are often non-obvious target specific
> ways of reloading invalid addresses. So reload has additional hooks
> (eg. GO_IF_LEGITIMATE_ADDRESS) to provice clever ways of fixing
> invalid operands.

I will look into this region of code to understand what's going on 
there.

Thanks, Paul.

regards

Wolfgang
-- 
We're back to the times when men were men 
and wrote their own device drivers.

(Linus Torvalds)

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-06-05  0:12                 ` Dave Murphy
@ 2006-06-05 11:31                   ` Wolfgang Mües
  0 siblings, 0 replies; 54+ messages in thread
From: Wolfgang Mües @ 2006-06-05 11:31 UTC (permalink / raw)
  To: Dave Murphy; +Cc: Rask Ingemann Lambertsen, Paul Brook, gcc

Hello Dave ;-)

On Monday 05 June 2006 02:12, Dave Murphy wrote:
> I was just about to ask about this very thing since I'm quite sure
> that there would be interest in adding this to devkitARM.

You are following the process in dslinux, don't you?

In fact, devkitARM is my current build environment. The first thing that 
will happen is a patch to devkitARM.

>  How much work would it be to implement these switches?

Good question ;-)

>  I assume that the toolchain would need multilibs for these options in
>  order to use newlib etc.

I have not looked into library issues now. Compiler comes first.
We will need an asm macro for 8bit writes to the hardware registers.

And the devkitARM libraries *must* implement writeback caching for the 
GBA slot ROM area.

regards
Wolfgang

-- 
We're back to the times when men were men 
and wrote their own device drivers.

(Linus Torvalds)

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-06-05 10:07                 ` Richard Earnshaw
@ 2006-06-05 11:47                   ` Wolfgang Mües
  0 siblings, 0 replies; 54+ messages in thread
From: Wolfgang Mües @ 2006-06-05 11:47 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: Rask Ingemann Lambertsen, Paul Brook, gcc

Richard,

On Monday 05 June 2006 12:06, Richard Earnshaw wrote:
> I'm confident right now that these will be too invasive to include in
> mainline.

As said before, this is OK for me. 

> The changes that tend to get incorporated into the compiler are to
> work around bugs in the CPU, not bugs in some H/W developer's use of
> the CPU.  The former affect all users of the processor, the latter
> only that one case.
>
> If we started putting in hacks for the latter the compiler back-ends
> would become unmaintainable in almost no time at all.

Agreed.

> PS.  Using swp is a bad idea IMO, this instruction is *very* slow on
> some CPU implementations because of the way it interacts with caches.

Yes, swp forces a cache load. But in this particular case, forcing a 
cache load is the ONLY way to circumvent the hardware problem.
If there is a block write, cache loads are forced only each 32 byte 
accesses.

Other possible solutions:

a) code a 16bit read-modify-write. This will also cause a cache load,
   and will need much more code, because it will have to look at the
   LSB of the address to know where to insert the byte into the word.

b) use the protection unit and make a data abort for a write to that 
memory region. This has the advantage of affecting ONLY the critical 
memory region (not all the other ones), but the disadvantages are big:
all memory writes are affected, and a data abort handler is very slow. 
This solution was implemented before, it was 100 times slower than 
native access. Unusable.

regards

Wolfgang
-- 
We're back to the times when men were men 
and wrote their own device drivers.

(Linus Torvalds)

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-06-04 21:36     ` Rask Ingemann Lambertsen
@ 2006-06-05 12:20       ` Wolfgang Mües
  2006-06-05 12:22         ` Richard Earnshaw
                           ` (2 more replies)
  0 siblings, 3 replies; 54+ messages in thread
From: Wolfgang Mües @ 2006-06-05 12:20 UTC (permalink / raw)
  To: Rask Ingemann Lambertsen; +Cc: gcc

Hello,

my first little success...

in arm.h, I have changed

> /* Output the address of an operand.  */
> #define ARM_PRINT_OPERAND_ADDRESS(STREAM, X)			\
> {									\
>     int is_minus = GET_CODE (X) == MINUS;					\
> 									\
>     if (GET_CODE (X) == REG)						\
>       asm_fprintf (STREAM, "[%r, #0]", REGNO (X)); 				\

into

> /* Output the address of an operand.  */
> #define ARM_PRINT_OPERAND_ADDRESS(STREAM, X)			\
> {									\
>     int is_minus = GET_CODE (X) == MINUS;					\
> 									\
>     if (GET_CODE (X) == REG)						\
>       asm_fprintf (STREAM, "[%r]", REGNO (X));				\

I don't know why the form "[%r, #0]" was coded before, because the 
assembler understands "[%r]" very well for all instructions. The form 
"[%r]" has a wider usage because it covers swp too.

On Sunday 04 June 2006 23:36, Rask Ingemann Lambertsen wrote:
> On Wed, May 31, 2006 at 10:49:35PM +0200, Wolfgang Mües wrote:
> > > (define_insn "*arm_movqi_insn"
> > >   [(set (match_operand:QI 0 "nonimmediate_operand" "=r,r,r,m")
> > > 	(match_operand:QI 1 "general_operand" "rI,K,m,r"))]

> I think you should go back to this (i.e. the unmodified version) and
> only change the "m" into "Q" in the fourth alternative of operand 0.
> See if that works, i.e. generates addresses that are valid for the
> swp instruction.

No, that doesn't work:

> ../../../gcc-4.0.2/gcc/unwind-dw2-fde.c: In function
> __register_frame_info_table_bases':
> ../../../gcc-4.0.2/gcc/unwind-dw2-fde.c:146: error: insn does not
> satisfy its constraints: (insn 63 28 29 0
> ../../../gcc-4.0.2/gcc/unwind-dw2-fde.c:136 (set (mem/s/j:QI (plus:SI
> (reg/v/f:SI 1 r1 [orig:102 ob ] [102]) (const_int 16 [0x10])) [0 S1
> A32])
>         (reg:QI 12 ip)) 155 {*arm_movqi_insn} (nil)
>     (nil))
> ../../../gcc-4.0.2/gcc/unwind-dw2-fde.c:146: internal compiler error:
> in reload_ cse_simplify_operands, at postreload.c:391

Also, I wonder what the "Q" constraint really means:

from the GCC manual:

> Q
> A memory reference where the exact address is in a single register
> (``m'' is preferable for asm statements)

but in arm.h:

> /* For the ARM, `Q' means that this is a memory operand that is just
>    an offset from a register.
> #define EXTRA_CONSTRAINT_STR_ARM(OP, C, STR)			\
>    ((C) == 'Q') ? (GET_CODE (OP) == MEM					\
> 		 && GET_CODE (XEXP (OP, 0)) == REG) :			\

Obviously, GCC tries to implement REG+CONSTANT with Q.

Maybe I must define a new constraint?

regards

Wolfgang
-- 
We're back to the times when men were men 
and wrote their own device drivers.

(Linus Torvalds)

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes -  need help
  2006-06-05 12:20       ` Wolfgang Mües
@ 2006-06-05 12:22         ` Richard Earnshaw
  2006-06-05 15:10         ` Rask Ingemann Lambertsen
  2006-06-06  9:45         ` Richard Sandiford
  2 siblings, 0 replies; 54+ messages in thread
From: Richard Earnshaw @ 2006-06-05 12:22 UTC (permalink / raw)
  To: Wolfgang Mües; +Cc: Rask Ingemann Lambertsen, gcc

On Mon, 2006-06-05 at 12:47, Wolfgang Mües wrote:

> > /* Output the address of an operand.  */
> > #define ARM_PRINT_OPERAND_ADDRESS(STREAM, X)			\
> > {									\
> >     int is_minus = GET_CODE (X) == MINUS;					\
> > 									\
> >     if (GET_CODE (X) == REG)						\
> >       asm_fprintf (STREAM, "[%r]", REGNO (X));				\
> 
> I don't know why the form "[%r, #0]" was coded before, because the 
> assembler understands "[%r]" very well for all instructions. The form 
> "[%r]" has a wider usage because it covers swp too.

Historical reasons.  In the past gas would incorrectly assemble '[rn]'
as '[rn], #0' which was unpredictable if rn appeared in the load/store
target.  However, this bug was fixed several years ago now, so it's
probably safe to switch back to the short form.

R.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-06-05 10:07                   ` Wolfgang Mües
@ 2006-06-05 12:25                     ` Paul Brook
  0 siblings, 0 replies; 54+ messages in thread
From: Paul Brook @ 2006-06-05 12:25 UTC (permalink / raw)
  To: Wolfgang Mües; +Cc: gcc

> > Tightening the predicates isn't sufficient (and may not even be
> > neccessary). You need to set the constraints so that the compiler
> > knows *how* to fix invalid instructions.
>
> And if I have 4 different constraints in a single insn, and only one of
> them is needing a temporary register, how do I model this?
> This may be the biggest problem. And because byte writes are so common,
> it deserves a good implementation. I can't waste a temporary register
> for each load/store.

You use "X" constraints on the scratch for the other alternatives.
Not that this means your "*arm_movqi" pattern now has three operands, so 
doesn't mathc the named "movqi" pattern. You'll probably have to make the 
movqi expander allocate a pseudo for the scratch, and also implement 
reload_inq/reload_outqi for the cases where no new pseudos are allowed. As I 
mentioned before the Alpha backend is probably a good reference for these.

Paul

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-06-05 12:20       ` Wolfgang Mües
  2006-06-05 12:22         ` Richard Earnshaw
@ 2006-06-05 15:10         ` Rask Ingemann Lambertsen
  2006-06-06  6:11           ` Wolfgang Mües
  2006-06-06  9:45         ` Richard Sandiford
  2 siblings, 1 reply; 54+ messages in thread
From: Rask Ingemann Lambertsen @ 2006-06-05 15:10 UTC (permalink / raw)
  To: Wolfgang Mües; +Cc: gcc, gcc-patches

On Mon, Jun 05, 2006 at 01:47:10PM +0200, Wolfgang Mües wrote:

> I don't know why the form "[%r, #0]" was coded before, because the 
> assembler understands "[%r]" very well for all instructions. The form 
> "[%r]" has a wider usage because it covers swp too.

Does GCC happen to accept "[%r, #0]" for swp?

> Also, I wonder what the "Q" constraint really means:
> 
> from the GCC manual:
> 
> > Q
> > A memory reference where the exact address is in a single register
> > (``m'' is preferable for asm statements)
> 
> but in arm.h:
> 
> > /* For the ARM, `Q' means that this is a memory operand that is just
> >    an offset from a register.
> > #define EXTRA_CONSTRAINT_STR_ARM(OP, C, STR)			\
> >    ((C) == 'Q') ? (GET_CODE (OP) == MEM					\
> > 		 && GET_CODE (XEXP (OP, 0)) == REG) :			\

I think the comment in arm.h is wrong. The manual seems to agree with the
code.

> Obviously, GCC tries to implement REG+CONSTANT with Q.
> 
> Maybe I must define a new constraint?

I tried 'V' instead, but it looks as if reload completely ignores the
meaning of the constraint. There is already a comment in arm.md about that.
It should be investigated further.

Meanwhile, I changed arm_legitimate_address_p() to enforce the correct
address form. This hurts byte loads too, though.

Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 114119)
+++ gcc/config/arm/arm.c	(working copy)
@@ -3509,6 +3509,9 @@
   if (arm_address_register_rtx_p (x, strict_p))
     return 1;
 
+  if (TARGET_ARM && TARGET_SWP_BYTE_WRITES && mode == QImode && outer == SET)
+    return 0;
+
   use_ldrd = (TARGET_LDRD
 	      && (mode == DImode
 		  || (mode == DFmode && (TARGET_SOFT_FLOAT || TARGET_VFP))));

Index: gcc/config/arm/arm.opt
===================================================================
--- gcc/config/arm/arm.opt	(revision 114119)
+++ gcc/config/arm/arm.opt	(working copy)
@@ -153,3 +153,7 @@
 mwords-little-endian
 Target Report RejectNegative Mask(LITTLE_WORDS)
 Assume big endian bytes, little endian words
+
+mswp-byte-writes
+Target Report Mask(SWP_BYTE_WRITES)
+Use the swp instruction for byte writes

Index: gcc/config/arm/arm.md
===================================================================
--- gcc/config/arm/arm.md	(revision 114119)
+++ gcc/config/arm/arm.md	(working copy)
@@ -5158,7 +5158,7 @@
 (define_insn "*arm_movqi_insn"
   [(set (match_operand:QI 0 "nonimmediate_operand" "=r,r,r,m")
 	(match_operand:QI 1 "general_operand" "rI,K,m,r"))]
-  "TARGET_ARM
+  "TARGET_ARM && !TARGET_SWP_BYTE_WRITES
    && (   register_operand (operands[0], QImode)
        || register_operand (operands[1], QImode))"
   "@
@@ -5170,6 +5170,44 @@
    (set_attr "predicable" "yes")]
 )
 
+; This is for the Nintendo DS external RAM.
+(define_insn "*arm_movqi_insn_swp"
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=r,r,r,Q")
+	(match_operand:QI 1 "general_operand" "rI,K,m,r"))]
+  "TARGET_ARM && TARGET_SWP_BYTE_WRITES
+   && (   register_operand (operands[0], QImode)
+       || register_operand (operands[1], QImode))"
+  "@
+   mov%?\\t%0, %1
+   mvn%?\\t%0, #%B1
+   ldr%?b\\t%0, %1
+   swp%?b\\t%1, %1, %0\;ldr%?b\\t%1, %0"
+  [(set_attr "type" "*,*,load1,store1")
+   (set_attr "predicable" "yes")]
+)
+
+(define_insn "*arm_movqi_insn_swp_clobber"
+  [(set (match_operand:QI 0 "memory_operand" "=Q")
+        (match_operand:QI 1 "register_operand" "r"))
+   (clobber (match_operand:QI 2 "register_operand" "=r"))]
+  "TARGET_ARM && TARGET_SWP_BYTE_WRITES"
+  "swp%?b\\t%2, %1, %0"
+  [(set_attr "type" "store1")
+   (set_attr "predicable" "yes")]
+)
+
+; Avoid reading the stored value back if we have a spare register.
+(define_peephole2
+  [(match_scratch:QI 2 "r")
+   (set (match_operand:QI 0 "memory_operand" "")
+        (match_operand:QI 1 "register_operand" ""))]
+  "TARGET_ARM && TARGET_SWP_BYTE_WRITES"
+  [(parallel [
+    (set (match_dup 0) (match_dup 1))
+    (clobber (match_dup 2))]
+  )]
+)
+
 (define_insn "*thumb_movqi_insn"
   [(set (match_operand:QI 0 "nonimmediate_operand" "=l,l,m,*r,*h,l")
 	(match_operand:QI 1 "general_operand"      "l, m,l,*h,*r,I"))]

This seems to work as intended on a small test case:

struct foobar
{
	int  i1;
	char c1;
	char c2;
	int  i2;
};

void bytewritetest (struct foobar *x)
{
	x->i2 = x->i1;
	x->i1 = x->c1 + x->c2;
	x->c2 ^= x->c1;
}

With just -O2:

bytewritetest:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	ldrb	r3, [r0, #5]	@ zero_extendqisi2
	ldrb	r2, [r0, #4]	@ zero_extendqisi2
	ldr	ip, [r0, #0]
	eor	r1, r3, r2
	add	r3, r3, r2
	@ lr needed for prologue
	strb	r1, [r0, #5]
	str	ip, [r0, #8]
	str	r3, [r0, #0]
	bx	lr

With -O2 -mswp-byte-writes:

bytewritetest:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	str	lr, [sp, #-4]!
	add	r2, r0, #4
	add	lr, r0, #5
	ldrb	r3, [lr, #0]	@ zero_extendqisi2
	ldrb	r1, [r2, #0]	@ zero_extendqisi2
	eor	r2, r1, r3
	add	r3, r3, r1
	ldr	ip, [r0, #0]
	str	r3, [r0, #0]
	swpb	r3, r2, [lr, #0]
	str	ip, [r0, #8]
	ldr	pc, [sp], #4


The register allocator chooses to use the lr register, in turn causing link
register save alimination to fail, which doesn't help.

-- 
Rask Ingemann Lambertsen

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-06-05 15:10         ` Rask Ingemann Lambertsen
@ 2006-06-06  6:11           ` Wolfgang Mües
  2006-06-06 19:37             ` Rask Ingemann Lambertsen
  0 siblings, 1 reply; 54+ messages in thread
From: Wolfgang Mües @ 2006-06-06  6:11 UTC (permalink / raw)
  To: gcc; +Cc: Rask Ingemann Lambertsen, gcc-patches

Rask,

On Monday 05 June 2006 16:16, Rask Ingemann Lambertsen wrote:
> On Mon, Jun 05, 2006 at 01:47:10PM +0200, Wolfgang Mües wrote:
> Does GCC happen to accept "[%r, #0]" for swp?

No. But no problem here to change that.

> I think the comment in arm.h is wrong. The manual seems to agree with
> the code.

Just to make it easy for beginners...

> I tried 'V' instead, but it looks as if reload completely ignores the
> meaning of the constraint. There is already a comment in arm.md about
> that. It should be investigated further.

Hmmm... I have searched 'Q' in the arm files. Not used in arm.md, only 
for some variants of arm (cirrus). Maybe only implemented for them?

> Meanwhile, I changed arm_legitimate_address_p() to enforce the
> correct address form. This hurts byte loads too, though.

I assume there is no way to tell the direction in 
arm_legitimate_address_p() ? Hmmm.

> Index: gcc/config/arm/arm.opt
> ===================================================================
> --- gcc/config/arm/arm.opt	(revision 114119)
> +++ gcc/config/arm/arm.opt	(working copy)
> @@ -153,3 +153,7 @@
>  mwords-little-endian
>  Target Report RejectNegative Mask(LITTLE_WORDS)
>  Assume big endian bytes, little endian words
> +
> +mswp-byte-writes
> +Target Report Mask(SWP_BYTE_WRITES)
> +Use the swp instruction for byte writes

In my environment (gcc 4.0.2), this is different. But I was able to find 
the definitions in arm.h and implement these changes. Easyer than 
expected...

(The DSLINUX team is not using gcc 4.1 because of compile problems with 
the 2.6.14er kernel).

> +   swp%?b\\t%1, %1, %0\;ldr%?b\\t%1, %0"

You should get a price for cleverness here!

> +; Avoid reading the stored value back if we have a spare register.
> +(define_peephole2
> +  [(match_scratch:QI 2 "r")
> +   (set (match_operand:QI 0 "memory_operand" "")
> +        (match_operand:QI 1 "register_operand" ""))]
> +  "TARGET_ARM && TARGET_SWP_BYTE_WRITES"
> +  [(parallel [
> +    (set (match_dup 0) (match_dup 1))
> +    (clobber (match_dup 2))]
> +  )]
> +)

As far as I can tell now, this works good. But I think there are many 
cases in which the source operand is not needed after the store. Is 
there a possibility to clobber the source operand and not using another 
register?

Hmmm. Most of the code I have seen in the first tests have no problem 
with this extra register...it's available.

> With -O2 -mswp-byte-writes:
>
> bytewritetest:
> 	@ args = 0, pretend = 0, frame = 0
> 	@ frame_needed = 0, uses_anonymous_args = 0
> 	str	lr, [sp, #-4]!
> 	add	r2, r0, #4
> 	add	lr, r0, #5
> 	ldrb	r3, [lr, #0]	@ zero_extendqisi2
> 	ldrb	r1, [r2, #0]	@ zero_extendqisi2
> 	eor	r2, r1, r3
> 	add	r3, r3, r1
> 	ldr	ip, [r0, #0]
> 	str	r3, [r0, #0]
> 	swpb	r3, r2, [lr, #0]
> 	str	ip, [r0, #8]
> 	ldr	pc, [sp], #4
>
>
> The register allocator chooses to use the lr register, in turn
> causing link register save alimination to fail, which doesn't help.

I can't understand this without explanation... is it bad?

Rask, thank you very much for your work.

regards
Wolfgang
-- 
We're back to the times when men were men 
and wrote their own device drivers.

(Linus Torvalds)

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-06-05 12:20       ` Wolfgang Mües
  2006-06-05 12:22         ` Richard Earnshaw
  2006-06-05 15:10         ` Rask Ingemann Lambertsen
@ 2006-06-06  9:45         ` Richard Sandiford
  2006-06-06 18:49           ` Rask Ingemann Lambertsen
  2 siblings, 1 reply; 54+ messages in thread
From: Richard Sandiford @ 2006-06-06  9:45 UTC (permalink / raw)
  To: Wolfgang Mües; +Cc: Rask Ingemann Lambertsen, gcc

Wolfgang Mües <wolfgang@iksw-muees.de> writes:
> On Sunday 04 June 2006 23:36, Rask Ingemann Lambertsen wrote:
>> On Wed, May 31, 2006 at 10:49:35PM +0200, Wolfgang Mües wrote:
>> > > (define_insn "*arm_movqi_insn"
>> > >   [(set (match_operand:QI 0 "nonimmediate_operand" "=r,r,r,m")
>> > > 	(match_operand:QI 1 "general_operand" "rI,K,m,r"))]
>
>> I think you should go back to this (i.e. the unmodified version) and
>> only change the "m" into "Q" in the fourth alternative of operand 0.
>> See if that works, i.e. generates addresses that are valid for the
>> swp instruction.
>
> No, that doesn't work:
>
>> ../../../gcc-4.0.2/gcc/unwind-dw2-fde.c: In function
>> __register_frame_info_table_bases':
>> ../../../gcc-4.0.2/gcc/unwind-dw2-fde.c:146: error: insn does not
>> satisfy its constraints: (insn 63 28 29 0
>> ../../../gcc-4.0.2/gcc/unwind-dw2-fde.c:136 (set (mem/s/j:QI (plus:SI
>> (reg/v/f:SI 1 r1 [orig:102 ob ] [102]) (const_int 16 [0x10])) [0 S1
>> A32])
>>         (reg:QI 12 ip)) 155 {*arm_movqi_insn} (nil)
>>     (nil))
>> ../../../gcc-4.0.2/gcc/unwind-dw2-fde.c:146: internal compiler error:
>> in reload_ cse_simplify_operands, at postreload.c:391

This is just a guess, but the insn above might be an output reload.
Reload instructions are not themselves reloaded.  In other words,
if gcc needs to reload a QImode register out to a memory location,
it will assume that any m<-r move is OK; it will not restrict the
reload to Q<-r.  This is by design.  You can:

  (1) Trap this in the movqi expander, if you can fix up the general
      case without clobbering additional registers (unlikely).
  (2) Define a reload_outqi pattern to handle general m<-r moves.
      You then get a scratch register to play with.  (This interface
      has changed (and improved) since I last used it.)
  (3) Modify reload so that output memory operands are legitimised
      differently (only if you're brave).
  (4) Restrict QImode addresses to 'Q'.

It looks downthread like you've already decided to do (4).  I just
wasn't sure from the thread whether you realised that output reloads
might be a specific problem.  Sorry in advance if this was just noise. ;)

Richard

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-06-06  9:45         ` Richard Sandiford
@ 2006-06-06 18:49           ` Rask Ingemann Lambertsen
  2006-06-08 18:22             ` Rask Ingemann Lambertsen
  0 siblings, 1 reply; 54+ messages in thread
From: Rask Ingemann Lambertsen @ 2006-06-06 18:49 UTC (permalink / raw)
  To: Wolfgang Mües, gcc, richard

On Tue, Jun 06, 2006 at 10:39:46AM +0100, Richard Sandiford wrote:
> Wolfgang Mües <wolfgang@iksw-muees.de> writes:
> >> ../../../gcc-4.0.2/gcc/unwind-dw2-fde.c: In function
> >> __register_frame_info_table_bases':

> >> ../../../gcc-4.0.2/gcc/unwind-dw2-fde.c:146: error: insn does not
> >> satisfy its constraints:
[cut]
> >> ../../../gcc-4.0.2/gcc/unwind-dw2-fde.c:146: internal compiler error:
> >> in reload_ cse_simplify_operands, at postreload.c:391
> 
> This is just a guess, but the insn above might be an output reload.

It is, in a peculiar (and not useful) way. Diffing the greg dump against
the lreg dump shows (using the example code I posted):

+(insn:HI 25 17 38 2 (set (reg:QI 3 r3)
+        (reg:QI 3 r3 [110])) 158 {*arm_movqi_insn_swp} (nil)
+    (nil))

-(insn:HI 25 17 36 2 (set (mem/s:QI (plus:SI (reg/v/f:SI 101 [ x ])
+(insn 38 25 36 2 (set (mem/s:QI (plus:SI (reg/v/f:SI 0 r0 [orig:101 x ]
+ [101])
                 (const_int 5 [0x5])) [0 <variable>.c2+0 S1 A8])
-        (subreg:QI (reg:SI 110) 0)) 158 {*arm_movqi_insn_swp} (nil)
-    (expr_list:REG_DEAD (reg:SI 110)
-        (expr_list:REG_DEAD (reg/v/f:SI 101 [ x ])
-            (nil))))
+        (reg:QI 3 r3)) 158 {*arm_movqi_insn_swp} (nil)
+    (nil))

I.e. change insn 25 to a nop and then add insn 38 as essentially a duplicate
of the original insn 25. I don't think reload was supposed to do that. The
documentation as well as comments in reload.c suggest that the address would
be loaded into a register. I'm running the code in a debugger to find out
why it doesn't happen.

> Reload instructions are not themselves reloaded.  In other words,
> if gcc needs to reload a QImode register out to a memory location,
> it will assume that any m<-r move is OK; it will not restrict the
> reload to Q<-r.  This is by design.  You can:
> 
>   (1) Trap this in the movqi expander, if you can fix up the general
>       case without clobbering additional registers (unlikely).
>   (2) Define a reload_outqi pattern to handle general m<-r moves.
>       You then get a scratch register to play with.  (This interface
>       has changed (and improved) since I last used it.)

This option stands a reasonable chance of being the end result. It just adds
more hair than I'd like to. It is one more thing that could go wrong, and so
on.

>   (3) Modify reload so that output memory operands are legitimised
>       differently (only if you're brave).
>   (4) Restrict QImode addresses to 'Q'.
> 
> It looks downthread like you've already decided to do (4).

Only as a temporary measure.

> I just
> wasn't sure from the thread whether you realised that output reloads
> might be a specific problem.

I was not aware of this. It is only the second time I've seen postreload
complain about unsatisfied constraints. Thanks for pointing out this
problem.

-- 
Rask Ingemann Lambertsen

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-06-06  6:11           ` Wolfgang Mües
@ 2006-06-06 19:37             ` Rask Ingemann Lambertsen
  2006-06-07  5:35               ` Wolfgang Mües
  0 siblings, 1 reply; 54+ messages in thread
From: Rask Ingemann Lambertsen @ 2006-06-06 19:37 UTC (permalink / raw)
  To: Wolfgang Mües; +Cc: gcc

On Tue, Jun 06, 2006 at 07:42:20AM +0200, Wolfgang Mües wrote:
> Rask,
> 
> On Monday 05 June 2006 16:16, Rask Ingemann Lambertsen wrote:

> > I think the comment in arm.h is wrong. The manual seems to agree with
> > the code.
> 
> Just to make it easy for beginners...

In mainline GCC, it is defined like this in arm/constraints.md:

(define_memory_constraint "Q"
 "@internal
  In ARM state an address that is a single base register."
 (and (match_code "mem")
      (match_test "REG_P (XEXP (op, 0))")))

> Hmmm... I have searched 'Q' in the arm files. Not used in arm.md, only 
> for some variants of arm (cirrus). Maybe only implemented for them?
> 
> I assume there is no way to tell the direction in 
> arm_legitimate_address_p() ? Hmmm.

There isn't. arm_legitimate_address_p() just implements the macro
GO_IF_LEGITIMATE_ADDRESS(MODE, X, LABEL). The only trick I know here is to
use a different mode for special addresses. I'm writing an Intel 8086
backend which uses

#define FUNCTION_MODE PQImode

to let GO_IF_LEGITIMATE_ADDRESS handle function addresses specially. I just
can't think of a way of using such a trick in this case.

> > +   swp%?b\\t%1, %1, %0\;ldr%?b\\t%1, %0"
> 
> You should get a price for cleverness here!

Thanks! Indeed it looks good until you think of volatile variables.
 
> > +; Avoid reading the stored value back if we have a spare register.
> > +(define_peephole2
> > +  [(match_scratch:QI 2 "r")
[snip]
> 
> As far as I can tell now, this works good. But I think there are many 
> cases in which the source operand is not needed after the store. Is 
> there a possibility to clobber the source operand and not using another 
> register?

I don't know if (match_scratch ...) might reuse the source operand. It can
be attempted more specifically with an additional peephole definition:

(define_peephole2
  [(set (match_operand:QI 0 "memory_operand" "")
        (match_operand:QI 1 "register_operand" ""))]
  "TARGET_ARM && TARGET_SWP_BYTE_WRITES && peep2_reg_dead_p (1, operands[1])"
  [(parallel
    [(set (match_dup 0) (match_dup 1))
     (clobber (match_dup 1))]
  )]
)

Yet another register which stands a good chance of being reusable is the
register containing the address. This can be covered as well (assuming the
match_scratch version doesn't do this):

(define_peephole2
  [(set (mem:QI (match_operand 0 "pmode_register_operand" ""))
        (match_operand:QI 1 "register_operand" ""))]
  "TARGET_ARM && TARGET_SWP_BYTE_WRITES && peep2_reg_dead_p (1, operands[0])"
  [(parallel
    [(set (mem:QI (match_dup 0)) (match_dup 1))
     (clobber (match_dup 0))]
  )]
)

I haven't tested these two peephole definitions. I can't think of any
preferred order. It'll be your call, I guess.

> > The register allocator chooses to use the lr register, in turn
> > causing link register save alimination to fail, which doesn't help.
> 
> I can't understand this without explanation... is it bad?

GCC now needs more registers to hold addresses. The increased number of
registers used disable som ARM specific optimizations of the function
prologue and epilogue. This is bad, but only because the code becomes
larger and slower. But I think that bytewritetest() suffers relatively
much because it is a small function, using only a few registers to begin
with.

-- 
Rask Ingemann Lambertsen

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-06-06 19:37             ` Rask Ingemann Lambertsen
@ 2006-06-07  5:35               ` Wolfgang Mües
  2006-06-07  9:38                 ` Richard Earnshaw
  2006-07-20 15:14                 ` Rask Ingemann Lambertsen
  0 siblings, 2 replies; 54+ messages in thread
From: Wolfgang Mües @ 2006-06-07  5:35 UTC (permalink / raw)
  To: Rask Ingemann Lambertsen; +Cc: gcc

Rask,

On Tuesday 06 June 2006 21:33, Rask Ingemann Lambertsen wrote:
> > > +   swp%?b\\t%1, %1, %0\;ldr%?b\\t%1, %0"
> >
> > You should get a price for cleverness here!
>
> Thanks! Indeed it looks good until you think of volatile variables.

Because volatile variables can change their values from another thread, 
and the readback will be false. Oh.

gcc knows the volatile attribute here, I assume?

> > As far as I can tell now, this works good. But I think there are
> > many cases in which the source operand is not needed after the
> > store. Is there a possibility to clobber the source operand and not
> > using another register?
>
> I don't know if (match_scratch ...) might reuse the source operand.
> It can be attempted more specifically with an additional peephole
> definition:
>
> (define_peephole2
>   [(set (match_operand:QI 0 "memory_operand" "")
>         (match_operand:QI 1 "register_operand" ""))]
>   "TARGET_ARM && TARGET_SWP_BYTE_WRITES && peep2_reg_dead_p (1,
> operands[1])" [(parallel
>     [(set (match_dup 0) (match_dup 1))
>      (clobber (match_dup 1))]
>   )]
> )

I will try this.

> Yet another register which stands a good chance of being reusable is
> the register containing the address.

Yes, but that is not allowed according to the specification of the swp 
instruction. The address register must be different from the other 2 
registers. Is there any chance of gcc violating this constraint? 

regards
Wolfgang
-- 
We're back to the times when men were men 
and wrote their own device drivers.

(Linus Torvalds)

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes -  need help
  2006-06-07  5:35               ` Wolfgang Mües
@ 2006-06-07  9:38                 ` Richard Earnshaw
  2006-07-20 15:14                 ` Rask Ingemann Lambertsen
  1 sibling, 0 replies; 54+ messages in thread
From: Richard Earnshaw @ 2006-06-07  9:38 UTC (permalink / raw)
  To: Wolfgang Mües; +Cc: Rask Ingemann Lambertsen, gcc

On Wed, 2006-06-07 at 06:22, Wolfgang Mües wrote:
> Rask,
> 
> On Tuesday 06 June 2006 21:33, Rask Ingemann Lambertsen wrote:
> > > > +   swp%?b\\t%1, %1, %0\;ldr%?b\\t%1, %0"
> > >
> > > You should get a price for cleverness here!
> >
> > Thanks! Indeed it looks good until you think of volatile variables.
> 
> Because volatile variables can change their values from another thread, 
> and the readback will be false. Oh.
> 

It's quite simple.  Declaring a volatile char will be unpredictable in
your compilation environment (in the same way as volatile short was on
pre-v4 ARM devices).

However, for some uses (eg device drivers) you could explicitly use an
asm that did a ldrb/strb sequence (if some parts of your memory system
support that.

R.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-06-06 18:49           ` Rask Ingemann Lambertsen
@ 2006-06-08 18:22             ` Rask Ingemann Lambertsen
  2006-06-08 21:11               ` Wolfgang Mües
  0 siblings, 1 reply; 54+ messages in thread
From: Rask Ingemann Lambertsen @ 2006-06-08 18:22 UTC (permalink / raw)
  To: Wolfgang Mües, gcc, richard

On Tue, Jun 06, 2006 at 08:27:10PM +0200, Rask Ingemann Lambertsen wrote:
> On Tue, Jun 06, 2006 at 10:39:46AM +0100, Richard Sandiford wrote:

> > This is just a guess, but the insn above might be an output reload.
> 
> It is, in a peculiar (and not useful) way. Diffing the greg dump against
> the lreg dump shows (using the example code I posted):
> 
> +(insn:HI 25 17 38 2 (set (reg:QI 3 r3)
> +        (reg:QI 3 r3 [110])) 158 {*arm_movqi_insn_swp} (nil)
> +    (nil))
> 
> -(insn:HI 25 17 36 2 (set (mem/s:QI (plus:SI (reg/v/f:SI 101 [ x ])
> +(insn 38 25 36 2 (set (mem/s:QI (plus:SI (reg/v/f:SI 0 r0 [orig:101 x ]
> + [101])
>                  (const_int 5 [0x5])) [0 <variable>.c2+0 S1 A8])
> -        (subreg:QI (reg:SI 110) 0)) 158 {*arm_movqi_insn_swp} (nil)
> -    (expr_list:REG_DEAD (reg:SI 110)
> -        (expr_list:REG_DEAD (reg/v/f:SI 101 [ x ])
> -            (nil))))
> +        (reg:QI 3 r3)) 158 {*arm_movqi_insn_swp} (nil)
> +    (nil))
> 
> I.e. change insn 25 to a nop and then add insn 38 as essentially a duplicate
> of the original insn 25. I don't think reload was supposed to do that.

The reason this happens is because reload decides that the cheapest
alternative is the first one, which is reg->reg. Then of course it has to
make an output reload to copy the value to memory...

I have not yet looked into why reload thinks the reg->reg alternative is the
cheapest one. That'll be tomorrow at the earliest.

Meanwhile, to ensure that reload decides upon the alternative we want,
delete the *arm_movqi_insn_swp pattern and use these two instead:

(define_insn "*arm_movqi_insn_to_reg"
  [(set (match_operand:QI 0 "register_operand" "=r,r,r")
        (match_operand:QI 1 "general_operand" "rI,K,m"))]
  "TARGET_ARM && TARGET_SWP_BYTE_WRITES
   && (   register_operand (operands[0], QImode)
       || register_operand (operands[1], QImode))"
  "@
   mov%?\\t%0, %1
   mvn%?\\t%0, #%B1
   ldr%?b\\t%0, %1"
  [(set_attr "type" "*,*,load1")
   (set_attr "predicable" "yes")]
)

(define_insn "*arm_movqi_insn_swp"
  [(set (match_operand:QI 0 "memory_operand" "=Q")
        (match_operand:QI 1 "nonmemory_operand" "r"))]
  "TARGET_ARM && TARGET_SWP_BYTE_WRITES
   && (   register_operand (operands[0], QImode)
       || register_operand (operands[1], QImode))"
  "swp%?b\\t%1, %1, %0\;ldr%?b\\t%1, %0"
  [(set_attr "type" "store1")
   (set_attr "predicable" "yes")]
)

Also, undo the change to arm_legitimate_address_p() arm.c.

Btw, why are there calls to register_operand() in the insn conditions?

With the changes above, I now get this (-O2 -swp-byte-writes):

bytewritetest:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	ldrb	r3, [r0, #5]	@ zero_extendqisi2
	ldrb	r2, [r0, #4]	@ zero_extendqisi2
	eor	r1, r3, r2
	add	r3, r3, r2
	add	r2, r0, #5
	ldr	ip, [r0, #0]
	swpb	r1, r1, [r2, #0]
	@ lr needed for prologue
	str	ip, [r0, #8]
	str	r3, [r0, #0]
	bx	lr

That's a lot better. It is now only one instruction longer than without
-swp-byte-writes.

-- 
Rask Ingemann Lambertsen

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-06-08 18:22             ` Rask Ingemann Lambertsen
@ 2006-06-08 21:11               ` Wolfgang Mües
  2006-06-09 15:08                 ` Rask Ingemann Lambertsen
  0 siblings, 1 reply; 54+ messages in thread
From: Wolfgang Mües @ 2006-06-08 21:11 UTC (permalink / raw)
  To: gcc; +Cc: Rask Ingemann Lambertsen, richard

Rask,

On Thursday 08 June 2006 20:12, Rask Ingemann Lambertsen wrote:

> Also, undo the change to arm_legitimate_address_p() arm.c.

Hmmm....

> arm-elf-gcc -g -mswp-byte-writes -Wall -O2 -fomit-frame-pointer
> -ffast-math -mthumb-interwork -isystem
> /usr/lib/devkitpro/libnds/include -mcpu=arm9tdmi -mtune=arm9tdmi
> -DARM9 -S arm9_main.c -o arm9_main.S arm9_main.c: In function 'test':
> arm9_main.c:20: error: unable to generate reloads for:
> (insn:HI 20 21 22 1 arm9_main.c:16 (set (mem/v:QI (post_inc:SI
> (reg/v/f:SI 3 r3 [orig:102 p ] [102])) [0 S1 A8]) (subreg/s/u:QI
> (reg:SI 2 r2 [orig:103 c.36 ] [103]) 0)) 157 {*arm_movqi_insn_swp}
> (nil) (expr_list:REG_INC (reg/v/f:SI 3 r3 [orig:102 p ] [102])
> (nil)))
> arm9_main.c:20: internal compiler error: in find_reloads, at
> reload.c:3720

void test(void)
{
	static unsigned char c = 20;
	volatile unsigned char * p;
	int i;

	p = (volatile unsigned char *) 0x08000000;
	for (i = 0; i < 1000; i++)
		*p++ = c;

	c = 40;
	c = c;
}

Without the change in arm_legitimate_address_p, we get post increment 
pointer into swpb. The non-working 'Q' constraint....

regards
Wolfgang
-- 
We're back to the times when men were men 
and wrote their own device drivers.

(Linus Torvalds)

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-06-08 21:11               ` Wolfgang Mües
@ 2006-06-09 15:08                 ` Rask Ingemann Lambertsen
  0 siblings, 0 replies; 54+ messages in thread
From: Rask Ingemann Lambertsen @ 2006-06-09 15:08 UTC (permalink / raw)
  To: Wolfgang Mües; +Cc: gcc, richard

On Thu, Jun 08, 2006 at 11:02:12PM +0200, Wolfgang Mües wrote:
> Rask,
> 
> > arm-elf-gcc -g -mswp-byte-writes -Wall -O2 -fomit-frame-pointer
> > -ffast-math -mthumb-interwork -isystem
> > /usr/lib/devkitpro/libnds/include -mcpu=arm9tdmi -mtune=arm9tdmi
> > -DARM9 -S arm9_main.c -o arm9_main.S arm9_main.c: In function 'test':
> > arm9_main.c:20: error: unable to generate reloads for:
> > (insn:HI 20 21 22 1 arm9_main.c:16 (set (mem/v:QI (post_inc:SI
> > (reg/v/f:SI 3 r3 [orig:102 p ] [102])) [0 S1 A8]) (subreg/s/u:QI
> > (reg:SI 2 r2 [orig:103 c.36 ] [103]) 0)) 157 {*arm_movqi_insn_swp}
> > (nil) (expr_list:REG_INC (reg/v/f:SI 3 r3 [orig:102 p ] [102])
> > (nil)))
> > arm9_main.c:20: internal compiler error: in find_reloads, at
> > reload.c:3720
> 
> void test(void)
[...]
> 
> Without the change in arm_legitimate_address_p, we get post increment 
> pointer into swpb. The non-working 'Q' constraint....

The constraint works, but reload doesn't figure out how to fix up the
operand. It does compile for me because reload fixes it:

Reloads for insn # 17
Reload 0: reload_in (SI) = (post_inc:SI (reg/v/f:SI 2 r2 [orig:102 p ] [102]))
        reload_out (VOID) = (post_inc:SI (reg/v/f:SI 2 r2 [orig:102 p ] [102]))
        GENERAL_REGS, RELOAD_FOR_INPUT (opnum = 0), inc by 1
        reload_in_reg: (post_inc:SI (reg/v/f:SI 2 r2 [orig:102 p ] [102]))
        reload_reg_rtx: (reg:SI 3 r3)

test:
	@ Function supports interworking.
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	ldr	r0, .L8
	@ lr needed for prologue
	ldrb	r1, [r0, #0]	@ zero_extendqisi2
	mov	r2, #134217728
.L2:
	mov	r3, r2
	swpb	ip, r1, [r3, #0]
	ldr	r3, .L8+4
	add	r2, r2, #1
	cmp	r2, r3
	bne	.L2
	mov	r3, #40
	swpb	r2, r3, [r0, #0]
	bx	lr

Reload probably got smarter since the version you use. The post_inc address
first appears during the life1 pass. It should be possible to prevent it
from appearing with an appropriate predicate. Try adding this:

; Match (mem (reg ...)), just like the Q constraint.
(define_predicate "Q_memory_operand"
  (and (match_code "mem")
       (match_code "reg" "0"))
)

Then change the *swp* patterns to use Q_memory_operand instead of
memory_operand. I then get

test:
	@ Function supports interworking.
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	ldr	r0, .L8
	@ lr needed for prologue
	ldrb	r1, [r0, #0]	@ zero_extendqisi2
	mov	r2, #134217728
.L2:
	ldr	r3, .L8+4
	swpb	ip, r1, [r2, #0]
	add	r2, r2, #1
	cmp	r2, r3
	bne	.L2
	mov	r3, #40
	swpb	r2, r3, [r0, #0]
	bx	lr

which is an improvement.

-- 
Rask Ingemann Lambertsen

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-06-02  7:24         ` Rask Ingemann Lambertsen
  2006-06-04 10:31           ` Wolfgang Mües
@ 2006-06-17  8:52           ` Rask Ingemann Lambertsen
  2006-07-19  7:49             ` Wolfgang Mües
  1 sibling, 1 reply; 54+ messages in thread
From: Rask Ingemann Lambertsen @ 2006-06-17  8:52 UTC (permalink / raw)
  To: Wolfgang Mües; +Cc: gcc, gcc-patches

On Fri, Jun 02, 2006 at 09:24:17AM +0200, Rask Ingemann Lambertsen wrote:

> The rest of the ARM backend presently assumes that the pattern has the form
> 
> (set (operand:QI 0) (operand:QI 1))
> 
> but now we've changed it to
> 
> (parallel [(set (operand:QI 0) (operand:QI 1))
> 	   (clobber (operand:QI 2))
> ])
> 
> so that's why you get "unrecognizable insn" errors now. Any place which
> intended to generate an *arm_movqi_insn has to add a clobber also. For a
> start, this means the "movqi" pattern.

I've now implemented it. This brings a small improvement to the code
generated for bytewritetest:

bytewritetest:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	ldrb	r3, [r0, #5]	@ zero_extendqisi2
	ldrb	ip, [r0, #4]	@ zero_extendqisi2
	ldr	r2, [r0, #0]
	add	r1, r3, ip
	str	r2, [r0, #8]
	str	r1, [r0], #5	<--
	eor	r3, r3, ip
	swpb	r2, r3, [r0]
	@ lr needed for prologue
	bx	lr

Exactly the same number of instructions as without -mswp-byte-writes because
of postincrement. Basicly, it pays off to get the insn expanded correctly to
begin with, rather than leaving it to reload to fix it up later. This should
work fine with volatile variables because there is no need to read back from
memory. The peephole optimizations are gone for the same reason. I do wonder
if the ability to reuse the input register as a scratch register has been
preserved, though.

Compiling unwind-dw2-fde.c, I noticed that the code produced for
__register_frame_info_table_bases() differs more than expected:

__register_frame_info_table_bases:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
 1	stmfd	sp!, {r4, lr}
 2	mov	lr, #0
 3	str	lr, [r1, #16]
 4	ldrb	ip, [r1, #16]	@ zero_extendqisi2
 5	orr	ip, ip, #2
 6	strb	ip, [r1, #16]
 7	ldr	r4, .L28
 8	ldrh	ip, [r1, #16]
 9	ldr	lr, [r4, #0]
10	orr	ip, ip, #2032
11	str	r0, [r1, #12]
12	orr	ip, ip, #8
13	mvn	r0, #0
14	strh	ip, [r1, #16]	@ movhi
15	str	lr, [r1, #20]
16	str	r0, [r1, #0]
17	str	r1, [r4, #0]
18	stmib	r1, {r2, r3}	@ phole stm
19	ldmfd	sp!, {r4, pc}

vs.

__register_frame_info_table_bases:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
 2	mov	ip, #0
 3	str	ip, [r1, #16]
 1	str	lr, [sp, #-4]!
 4	ldrb	lr, [r1, #16]	@ zero_extendqisi2
11	str	r0, [r1, #12]
 5	orr	lr, lr, #2
13	mvn	r0, #0
 6a	add	ip, r1, #16
16+18?	stmia	r1, {r0, r2, r3}	@ phole stm
 6b	swpb	r3, lr, [ip]
 7	ldr	r0, .L28
 8	ldrh	r3, [r1, #16]
 9	ldr	r2, [r0, #0]
10	orr	r3, r3, #2032
12	orr	r3, r3, #8
14	strh	r3, [r1, #16]	@ movhi
15	str	r2, [r1, #20]
17	str	r1, [r0, #0]
19	ldr	pc, [sp], #4

But the swp version seems to be equivalent, doesn't it?

I'm not sure that the reload_outqi expander will correctly handle
cases where reload spills a register to memory. If the memory address
doesn't have the right form, it becomes more complicated.

Index: gcc/config/arm/arm.h
===================================================================
--- gcc/config/arm/arm.h	(revision 114119)
+++ gcc/config/arm/arm.h	(working copy)
@@ -1094,6 +1094,8 @@
    ? vfp_secondary_reload_class (MODE, X)			\
    : TARGET_ARM							\
    ? (((MODE) == HImode && ! arm_arch4 && true_regnum (X) == -1) \
+   || ((MODE) == QImode && TARGET_ARM && TARGET_SWP_BYTE_WRITES	\
+       && true_regnum (X) == -1)				\
     ? GENERAL_REGS : NO_REGS)					\
    : THUMB_SECONDARY_OUTPUT_RELOAD_CLASS (CLASS, MODE, X))
 
Index: gcc/config/arm/arm.opt
===================================================================
--- gcc/config/arm/arm.opt	(revision 114119)
+++ gcc/config/arm/arm.opt	(working copy)
@@ -153,3 +153,7 @@
 mwords-little-endian
 Target Report RejectNegative Mask(LITTLE_WORDS)
 Assume big endian bytes, little endian words
+
+mswp-byte-writes
+Target Report Mask(SWP_BYTE_WRITES)
+Use the swp instruction for byte writes. The default is to use str
Index: gcc/config/arm/predicates.md
===================================================================
--- gcc/config/arm/predicates.md	(revision 114119)
+++ gcc/config/arm/predicates.md	(working copy)
@@ -125,6 +125,14 @@
 			 || (GET_CODE (op) == REG
 			     && REGNO (op) >= FIRST_PSEUDO_REGISTER)))")))
 
+;; Match register operands or memory operands of the form (mem (reg ...)),
+;; as permitted by the "Q" memory constraint.
+(define_predicate "reg_or_Qmem_operand"
+  (ior (match_operand 0 "register_operand")
+       (and (match_code "mem")
+	    (match_code "reg" "0")))
+)
+
 ;; True for valid operands for the rhs of an floating point insns.
 ;;   Allows regs or certain consts on FPA, just regs for everything else.
 (define_predicate "arm_float_rhs_operand"
Index: gcc/config/arm/arm.md
===================================================================
--- gcc/config/arm/arm.md	(revision 114119)
+++ gcc/config/arm/arm.md	(working copy)
@@ -5151,6 +5151,16 @@
       emit_insn (gen_movsi (operands[0], operands[1]));
       DONE;
     }
+  if (TARGET_ARM && TARGET_SWP_BYTE_WRITES)
+    {
+      /* Ensure that operands[0] is (mem (reg ...)) if a memory operand. */
+      if (MEM_P (operands[0]) && !REG_P (XEXP (operands[0], 0)))
+	    operands[0]
+	      = replace_equiv_address (operands[0],
+				       copy_to_reg (XEXP (operands[0], 0)));
+      emit_insn (gen__arm_movqi_insn_swp (operands[0], operands[1]));
+      DONE;
+    }
   "
 )
 
@@ -5158,7 +5168,7 @@
 (define_insn "*arm_movqi_insn"
   [(set (match_operand:QI 0 "nonimmediate_operand" "=r,r,r,m")
 	(match_operand:QI 1 "general_operand" "rI,K,m,r"))]
-  "TARGET_ARM
+  "TARGET_ARM && !TARGET_SWP_BYTE_WRITES
    && (   register_operand (operands[0], QImode)
        || register_operand (operands[1], QImode))"
   "@
@@ -5170,6 +5180,31 @@
    (set_attr "predicable" "yes")]
 )
 
+;; This is primarily a hack for the Nintendo DS external RAM.
+(define_insn "_arm_movqi_insn_swp"
+  [(set (match_operand:QI 0 "reg_or_Qmem_operand" "=r,r,r,Q")
+	(match_operand:QI 1 "general_operand" "rI,K,m,r"))
+        (clobber (match_scratch:QI 2 "=X,X,X,r"))]
+  "TARGET_ARM && TARGET_SWP_BYTE_WRITES
+   && (   register_operand (operands[0], QImode)
+       || register_operand (operands[1], QImode))"
+  "@
+   mov%?\\t%0, %1
+   mvn%?\\t%0, #%B1
+   ldr%?b\\t%0, %1
+   swp%?b\\t%2, %1, [%|%m0]"
+  [(set_attr "type" "*,*,load1,store1")
+   (set_attr "predicable" "yes")]
+)
+
+;; The earlyclobber is required by default_secondary_reload() in targhooks.c.
+(define_expand "reload_outqi"
+  [(set (match_operand:QI 0 "memory_operand" "=Q")
+	(match_operand:QI 1 "register_operand" "r"))
+   (clobber (match_operand:QI 2 "register_operand" "=&r"))]
+  "TARGET_ARM && TARGET_SWP_BYTE_WRITES"
+)
+
 (define_insn "*thumb_movqi_insn"
   [(set (match_operand:QI 0 "nonimmediate_operand" "=l,l,m,*r,*h,l")
 	(match_operand:QI 1 "general_operand"      "l, m,l,*h,*r,I"))]


-- 
Rask Ingemann Lambertsen

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-06-17  8:52           ` Rask Ingemann Lambertsen
@ 2006-07-19  7:49             ` Wolfgang Mües
  2006-07-19 12:09               ` Rask Ingemann Lambertsen
  0 siblings, 1 reply; 54+ messages in thread
From: Wolfgang Mües @ 2006-07-19  7:49 UTC (permalink / raw)
  To: Rask Ingemann Lambertsen; +Cc: gcc

Hello,

after getting a "working" version of the gcc 4.0.2 with the Nintendo 
8-bit-write problem, I was busy the last weeks trying to adapt the 
linux system (replacing I/O with writeb() macros, removing strb 
assembler calls).

However, it turned out that the sources of the linux kernel are a far 
more demanding test than every single small test case.

I have tried my very best to implement the last patch from Rask (thank 
you very much!). There was one place I was not shure I have coded the
right solution:

Rasks patch (gcc 4.2.x):

> +;; Match register operands or memory operands of the form (mem (reg
> ...)), +;; as permitted by the "Q" memory constraint.
> +(define_predicate "reg_or_Qmem_operand"
> +  (ior (match_operand 0 "register_operand")
> +       (and (match_code "mem")
> +	    (match_code "reg" "0")))
> +)
> +

My patch (without the second operand for match_code):

> ;; Match register operands or memory operands of the form (mem (reg
> ...)), ;; as permitted by the "Q" memory constraint.
> (define_predicate "reg_or_Qmem_operand"
>   (ior (match_operand 0 "register_operand")
>        (and (match_code "mem")
> 	    (match_test "GET_CODE (XEXP (op, 0)) == REG")))
> )

Is this the right substitution?

If I compile the linux kernel with this patch, many files get compiled 
without problems, but in fs/vfat/namei.c I get:

> fs/vfat/namei.c: In function 'vfat_add_entry':
> fs/vfat/namei.c:694: error: unrecognizable insn:
> (insn 2339 2338 2340 188 (set (mem/s/j:QI (reg:SI 14 lr) [0
> <variable>.attr+0 S1 A8]) (reg:QI 12 ip)) -1 (nil)
>     (nil))
> fs/vfat/namei.c:694: internal compiler error: in extract_insn, at
> recog.c:2020 Please submit a full bug report,

I can't see what is going on here...

regards

Wolfgang


The full patch of Rask is appended below:

> Index: gcc/config/arm/arm.h
> ===================================================================
> --- gcc/config/arm/arm.h	(revision 114119)
> +++ gcc/config/arm/arm.h	(working copy)
> @@ -1094,6 +1094,8 @@
>     ? vfp_secondary_reload_class (MODE, X)			\
>
>     : TARGET_ARM							\
>
>     ? (((MODE) == HImode && ! arm_arch4 && true_regnum (X) == -1) \
> +   || ((MODE) == QImode && TARGET_ARM && TARGET_SWP_BYTE_WRITES	\
> +       && true_regnum (X) == -1)				\
>      ? GENERAL_REGS : NO_REGS)					\
>
>     : THUMB_SECONDARY_OUTPUT_RELOAD_CLASS (CLASS, MODE, X))
>
> Index: gcc/config/arm/arm.opt
> ===================================================================
> --- gcc/config/arm/arm.opt	(revision 114119)
> +++ gcc/config/arm/arm.opt	(working copy)
> @@ -153,3 +153,7 @@
>  mwords-little-endian
>  Target Report RejectNegative Mask(LITTLE_WORDS)
>  Assume big endian bytes, little endian words
> +
> +mswp-byte-writes
> +Target Report Mask(SWP_BYTE_WRITES)
> +Use the swp instruction for byte writes. The default is to use str
> Index: gcc/config/arm/predicates.md
> ===================================================================
> --- gcc/config/arm/predicates.md	(revision 114119)
> +++ gcc/config/arm/predicates.md	(working copy)
> @@ -125,6 +125,14 @@
>
>  			 || (GET_CODE (op) == REG
>
>  			     && REGNO (op) >= FIRST_PSEUDO_REGISTER)))")))
>
> +;; Match register operands or memory operands of the form (mem (reg
> ...)), +;; as permitted by the "Q" memory constraint.
> +(define_predicate "reg_or_Qmem_operand"
> +  (ior (match_operand 0 "register_operand")
> +       (and (match_code "mem")
> +	    (match_code "reg" "0")))
> +)
> +
>  ;; True for valid operands for the rhs of an floating point insns.
>  ;;   Allows regs or certain consts on FPA, just regs for everything
> else. (define_predicate "arm_float_rhs_operand"
> Index: gcc/config/arm/arm.md
> ===================================================================
> --- gcc/config/arm/arm.md	(revision 114119)
> +++ gcc/config/arm/arm.md	(working copy)
> @@ -5151,6 +5151,16 @@
>        emit_insn (gen_movsi (operands[0], operands[1]));
>        DONE;
>      }
> +  if (TARGET_ARM && TARGET_SWP_BYTE_WRITES)
> +    {
> +      /* Ensure that operands[0] is (mem (reg ...)) if a memory
> operand. */ +      if (MEM_P (operands[0]) && !REG_P (XEXP
> (operands[0], 0))) +	    operands[0]
> +	      = replace_equiv_address (operands[0],
> +				       copy_to_reg (XEXP (operands[0], 0)));
> +      emit_insn (gen__arm_movqi_insn_swp (operands[0],
> operands[1])); +      DONE;
> +    }
>    "
>  )
>
> @@ -5158,7 +5168,7 @@
>  (define_insn "*arm_movqi_insn"
>    [(set (match_operand:QI 0 "nonimmediate_operand" "=r,r,r,m")
>  	(match_operand:QI 1 "general_operand" "rI,K,m,r"))]
> -  "TARGET_ARM
> +  "TARGET_ARM && !TARGET_SWP_BYTE_WRITES
>     && (   register_operand (operands[0], QImode)
>
>         || register_operand (operands[1], QImode))"
>
>    "@
> @@ -5170,6 +5180,31 @@
>     (set_attr "predicable" "yes")]
>  )
>
> +;; This is primarily a hack for the Nintendo DS external RAM.
> +(define_insn "_arm_movqi_insn_swp"
> +  [(set (match_operand:QI 0 "reg_or_Qmem_operand" "=r,r,r,Q")
> +	(match_operand:QI 1 "general_operand" "rI,K,m,r"))
> +        (clobber (match_scratch:QI 2 "=X,X,X,r"))]
> +  "TARGET_ARM && TARGET_SWP_BYTE_WRITES
> +   && (   register_operand (operands[0], QImode)
> +       || register_operand (operands[1], QImode))"
> +  "@
> +   mov%?\\t%0, %1
> +   mvn%?\\t%0, #%B1
> +   ldr%?b\\t%0, %1
> +   swp%?b\\t%2, %1, [%|%m0]"
> +  [(set_attr "type" "*,*,load1,store1")
> +   (set_attr "predicable" "yes")]
> +)
> +
> +;; The earlyclobber is required by default_secondary_reload() in
> targhooks.c. +(define_expand "reload_outqi"
> +  [(set (match_operand:QI 0 "memory_operand" "=Q")
> +	(match_operand:QI 1 "register_operand" "r"))
> +   (clobber (match_operand:QI 2 "register_operand" "=&r"))]
> +  "TARGET_ARM && TARGET_SWP_BYTE_WRITES"
> +)
> +
>  (define_insn "*thumb_movqi_insn"
>    [(set (match_operand:QI 0 "nonimmediate_operand" "=l,l,m,*r,*h,l")
>  	(match_operand:QI 1 "general_operand"      "l, m,l,*h,*r,I"))]

-- 
We're back to the times when men were men 
and wrote their own device drivers.

(Linus Torvalds)

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-07-19  7:49             ` Wolfgang Mües
@ 2006-07-19 12:09               ` Rask Ingemann Lambertsen
  2006-07-19 13:12                 ` Rask Ingemann Lambertsen
  2006-07-20 10:31                 ` Wolfgang Mües
  0 siblings, 2 replies; 54+ messages in thread
From: Rask Ingemann Lambertsen @ 2006-07-19 12:09 UTC (permalink / raw)
  To: Wolfgang Mües; +Cc: gcc

On Wed, Jul 19, 2006 at 07:52:32AM +0200, Wolfgang Mües wrote:
> Hello,
> 
> after getting a "working" version of the gcc 4.0.2 with the Nintendo 
> 8-bit-write problem, I was busy the last weeks trying to adapt the 
> linux system (replacing I/O with writeb() macros, removing strb 
> assembler calls).

It is good to hear from you again.

> However, it turned out that the sources of the linux kernel are a far 
> more demanding test than every single small test case.

Such is life. Since the ARM backend has many arm_expand_fubar() functions,
it wouldn't too surpricing if one of them would now generate invalid insns.

> I have tried my very best to implement the last patch from Rask (thank 
> you very much!). There was one place I was not shure I have coded the
> right solution:

> > +	    (match_code "reg" "0")))
> 
> My patch (without the second operand for match_code):
> 
> > 	    (match_test "GET_CODE (XEXP (op, 0)) == REG")))
> 
> Is this the right substitution?

I believe it is.

> If I compile the linux kernel with this patch, many files get compiled 
> without problems, but in fs/vfat/namei.c I get:
> 
> > fs/vfat/namei.c: In function 'vfat_add_entry':
> > fs/vfat/namei.c:694: error: unrecognizable insn:
> > (insn 2339 2338 2340 188 (set (mem/s/j:QI (reg:SI 14 lr) [0
> > <variable>.attr+0 S1 A8]) (reg:QI 12 ip)) -1 (nil)
> >     (nil))
> > fs/vfat/namei.c:694: internal compiler error: in extract_insn, at
> > recog.c:2020 Please submit a full bug report,
> 
> I can't see what is going on here...

The (clobber ...) part is missing. The first thing to do is to compile with
-fdump-rtl-all and see which pass creates this invalid insn. Grep is your
friend.

I've spotted a function named emit_set_insn() in arm.c. It might be the
problem, because it uses gen_rtx_SET() directly. Thus, if Y is a general
operand, the resulting insn should match one of the move patterns, but the
(clobber ...) expression needed for "_arm_movqi_insn_swp" will be missing.
For example, arm_split_constant() contains these potentially problematic
lines:

              emit_set_insn (target, GEN_INT (val));
               rtx temp = subtargets ? gen_reg_rtx (mode) : target;
              emit_set_insn (temp, GEN_INT (val));

There are other functions which call emit_set_insn(). I have no idea which
one, if any, of these calls is causing the problem. It could also be one of
the RTL passes.

The function named emit_move_insn() ought to do the trick here, but
is perhaps a bit heavyweight for this purpose. Anyway, try this patch
(untested), which should plug this particular hole:

Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 115021)
+++ gcc/config/arm/arm.c	(working copy)
@@ -709,12 +709,15 @@
   TLS_LE32
 };
 
-/* Emit an insn that's a simple single-set.  Both the operands must be known
-   to be valid.  */
+/* Emit an insn that's a simple single-set if Y isn't a general operand.
+   Both the operands must be known to be valid.  */
 inline static rtx
 emit_set_insn (rtx x, rtx y)
 {
-  return emit_insn (gen_rtx_SET (VOIDmode, x, y));
+  if (general_operand (y, GET_MODE (x))
+    return emit_move_insn (x, y);
+  else
+    return emit_insn (gen_rtx_SET (VOIDmode, x, y));
 }
 
 /* Return the number of bits set in VALUE.  */

-- 
Rask Ingemann Lambertsen

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-07-19 12:09               ` Rask Ingemann Lambertsen
@ 2006-07-19 13:12                 ` Rask Ingemann Lambertsen
  2006-07-20 10:31                 ` Wolfgang Mües
  1 sibling, 0 replies; 54+ messages in thread
From: Rask Ingemann Lambertsen @ 2006-07-19 13:12 UTC (permalink / raw)
  To: Wolfgang Mües; +Cc: gcc

On Wed, Jul 19, 2006 at 01:24:59PM +0200, Rask Ingemann Lambertsen wrote:
> 
> The function named emit_move_insn() ought to do the trick here, but
> is perhaps a bit heavyweight for this purpose. Anyway, try this patch
> (untested), which should plug this particular hole:

There was an unbalanced parantheses. Here's an updated patch:

Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 115021)
+++ gcc/config/arm/arm.c	(working copy)
@@ -709,12 +709,15 @@
   TLS_LE32
 };
 
-/* Emit an insn that's a simple single-set.  Both the operands must be known
-   to be valid.  */
+/* Emit an insn that's a simple single-set if Y isn't a general operand.
+   Both the operands must be known to be valid.  */
 inline static rtx
 emit_set_insn (rtx x, rtx y)
 {
-  return emit_insn (gen_rtx_SET (VOIDmode, x, y));
+  if (general_operand (y, GET_MODE (x)))
+    return emit_move_insn (x, y);
+  else
+    return emit_insn (gen_rtx_SET (VOIDmode, x, y));
 }
 
 /* Return the number of bits set in VALUE.  */

-- 
Rask Ingemann Lambertsen

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-07-19 12:09               ` Rask Ingemann Lambertsen
  2006-07-19 13:12                 ` Rask Ingemann Lambertsen
@ 2006-07-20 10:31                 ` Wolfgang Mües
  2006-07-20 13:09                   ` Dave Korn
  2006-07-20 14:38                   ` Rask Ingemann Lambertsen
  1 sibling, 2 replies; 54+ messages in thread
From: Wolfgang Mües @ 2006-07-20 10:31 UTC (permalink / raw)
  To: Rask Ingemann Lambertsen; +Cc: gcc

Hello Rask,

On Wednesday 19 July 2006 13:24, Rask Ingemann Lambertsen wrote:
> I've spotted a function named emit_set_insn() in arm.c. It might be
> the problem, because it uses gen_rtx_SET() directly.

But it's not the only function which uses gen_rtx_SET. There are also
much places with

>     emit_constant_insn (cond,
> 	gen_rtx_SET (VOIDmode, target, source));

Isn't it better to replace gen_rtx_SET?

regards

Wolfgang
-- 
We're back to the times when men were men 
and wrote their own device drivers.

(Linus Torvalds)

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-07-20 10:31                 ` Wolfgang Mües
@ 2006-07-20 13:09                   ` Dave Korn
  2006-07-20 13:28                     ` Richard Earnshaw
  2006-07-20 15:58                     ` Ian Lance Taylor
  2006-07-20 14:38                   ` Rask Ingemann Lambertsen
  1 sibling, 2 replies; 54+ messages in thread
From: Dave Korn @ 2006-07-20 13:09 UTC (permalink / raw)
  To: 'Wolfgang Mües', 'Rask Ingemann Lambertsen'; +Cc: gcc

On 20 July 2006 07:03, Wolfgang Mües wrote:

> Hello Rask,
> 
> On Wednesday 19 July 2006 13:24, Rask Ingemann Lambertsen wrote:
>> I've spotted a function named emit_set_insn() in arm.c. It might be
>> the problem, because it uses gen_rtx_SET() directly.
> 
> But it's not the only function which uses gen_rtx_SET. There are also
> much places with
> 
>>     emit_constant_insn (cond,
>> 	gen_rtx_SET (VOIDmode, target, source));
> 
> Isn't it better to replace gen_rtx_SET?
> 

  Is there any generic advice available as to when and why one should use
emit_insn (gen_rtx_SET (....)) as opposed to emit_move_insn (...)?

  (There are other similar pairs of functions with seemingly overlapping
functionality that I find equally confusing.  For instance, I *think* that you
should only use gen_rtx_REG when you want to create a new pseudo, and that in
the general case of wanting an rtx that refers to an existing pseudo or a hard
reg you're supposed to use gen_reg_rtx which takes care of sharing and
everything for you... but I haven't seen any solid documentation that backs up
my hypothesis.)


    cheers,
      DaveK
-- 
Can't think of a witty .sigline today....

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: Modifying ARM code generator for elimination of 8bit writes -  need help
  2006-07-20 13:09                   ` Dave Korn
@ 2006-07-20 13:28                     ` Richard Earnshaw
  2006-07-20 15:58                     ` Ian Lance Taylor
  1 sibling, 0 replies; 54+ messages in thread
From: Richard Earnshaw @ 2006-07-20 13:28 UTC (permalink / raw)
  To: Dave Korn
  Cc: 'Wolfgang Mües', 'Rask Ingemann Lambertsen', gcc

On Thu, 2006-07-20 at 12:04, Dave Korn wrote:
> On 20 July 2006 07:03, Wolfgang Mües wrote:
> 
> > Hello Rask,
> > 
> > On Wednesday 19 July 2006 13:24, Rask Ingemann Lambertsen wrote:
> >> I've spotted a function named emit_set_insn() in arm.c. It might be
> >> the problem, because it uses gen_rtx_SET() directly.
> > 
> > But it's not the only function which uses gen_rtx_SET. There are also
> > much places with
> > 
> >>     emit_constant_insn (cond,
> >> 	gen_rtx_SET (VOIDmode, target, source));
> > 
> > Isn't it better to replace gen_rtx_SET?
> > 
> 
>   Is there any generic advice available as to when and why one should use
> emit_insn (gen_rtx_SET (....)) as opposed to emit_move_insn (...)?

emit_move_insn will validate that the operands and try to make things
work if they don't satisfy the predicates for mov<mode>; ultimately it
will end up calling the gen_mov<mode> insn to do the right thing. 
emit_insn (gen_rtx_set... just takes the two operands and forms a set
insn, so the operands can be anything that's valid in a set, for
example:

	op0 = reg
	op1 = plus(reg, const_int)

you can only use emit_insn (gen_rtx_set(...)) when you know that the
result is a valid insn (for the current context).

R.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-07-20 10:31                 ` Wolfgang Mües
  2006-07-20 13:09                   ` Dave Korn
@ 2006-07-20 14:38                   ` Rask Ingemann Lambertsen
  2006-07-21 11:40                     ` Rask Ingemann Lambertsen
  1 sibling, 1 reply; 54+ messages in thread
From: Rask Ingemann Lambertsen @ 2006-07-20 14:38 UTC (permalink / raw)
  To: Wolfgang Mües; +Cc: gcc

On Thu, Jul 20, 2006 at 08:02:45AM +0200, Wolfgang Mües wrote:
> But it's not the only function which uses gen_rtx_SET. There are also
> much places with
> 
> >     emit_constant_insn (cond,
> > 	gen_rtx_SET (VOIDmode, target, source));
> 
> Isn't it better to replace gen_rtx_SET?

gen_rtx_SET() is OK if, for example, the source is a PLUS, MINUS, ASHIFT,
etc. expression instead of REG, MEM or CONST.

Anyway, disregard the patch to emit_set_insn(). It causes an endless loop
between the movsi expander and arm_gen_constant().

There are two ways that the clobberless movqi insns are generated. One is
the "storeinthi" expander (and some similiar ones around it):

;; Subroutine to store a half word integer constant into memory.
(define_expand "storeinthi"
  [(set (match_operand 0 "" "")
        (match_operand 1 "" ""))
   (set (match_dup 3) (match_dup 2))]
  "TARGET_ARM"
[...]
    operands[3] = adjust_address (op0, QImode, 1);
    operands[0] = adjust_address (operands[0], QImode, 0);
    operands[2] = gen_lowpart (QImode, operands[2]);
    operands[1] = gen_lowpart (QImode, operands[1]);
  }"
)

This function isn't used with -march=armv4 or better, so a workaround is to
change -march=armv3 to -march=armv4 if the CPU supports that.

The other way is the "reload_outqi" pattern. First, the one I posted was
missing a (parallel [...]) and would produce an insn which didn't match
"_arm_movqi_insn_swp". Second, GCC uses it for reg->stack moves for pseudo
registers allocated on the stack, so the address wouldn't have satisfied the
"Q" constraint in many cases. It should look like this:

;; The earlyclobber is required by default_secondary_reload() in targhooks.c.
;; We may be asked to generate reg->stack moves from what was reg->reg moves.
;; This requires both a QImode scratch register to trash and a SImode scratch
;; register to hold the address. Since we can get only one scratch register,
;; we ask for a DImode scratch register and split it up.
(define_expand "reload_outqi"
  [(clobber (match_operand:QI 0 "memory_operand" "=Q"))
   (clobber (match_operand:DI 2 "register_operand" "=&r"))
   (set (match_dup 4) (match_dup 5))
   (parallel [
   (set (match_dup 6)
	(match_operand:QI 1 "register_operand" "r"))
   (clobber (match_dup 3))]
  )]
  "TARGET_ARM && TARGET_SWP_BYTE_WRITES"
{
  operands[3] = simplify_gen_subreg (QImode, operands[2], DImode, 0);
  operands[4] = simplify_gen_subreg (Pmode,  operands[2], DImode, 4);
  /* If necessary, reload the address. */
  if (REG_P (XEXP (operands[0], 0)))
    {
      operands[5] = operands[4];
      operands[6] = operands[0];
    }
  else
    {
      operands[5] = XEXP (operands[0], 0);
      operands[6] = gen_rtx_MEM (QImode, operands[4]);
    }
})

(Is it OK to use gen_rtx_MEM() during reload? Should I use
replace_equiv_address() instead?) 

-- 
Rask Ingemann Lambertsen

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-06-07  5:35               ` Wolfgang Mües
  2006-06-07  9:38                 ` Richard Earnshaw
@ 2006-07-20 15:14                 ` Rask Ingemann Lambertsen
  2006-07-21 14:20                   ` Rask Ingemann Lambertsen
  1 sibling, 1 reply; 54+ messages in thread
From: Rask Ingemann Lambertsen @ 2006-07-20 15:14 UTC (permalink / raw)
  To: Wolfgang Mües; +Cc: gcc

On Wed, Jun 07, 2006 at 07:22:31AM +0200, Wolfgang Mües wrote:
> On Tuesday 06 June 2006 21:33, Rask Ingemann Lambertsen wrote:
> 
> > Yet another register which stands a good chance of being reusable is
> > the register containing the address.
> 
> Yes, but that is not allowed according to the specification of the swp 
> instruction. The address register must be different from the other 2 
> registers. Is there any chance of gcc violating this constraint? 

Yes. :-(

$ awk 'BEGIN { FS="[][\t ,]"; } $2 ~ /^swp/ && ($3 == $5 || $3 == $8 || \
$5 == $8) { print; }' fs/vfat/namei.s
        swpb    r6, r3, [r6]

The only fix I can think of is to mark the scratch register as
early-clobbered, but we preferably don't want to do that, because then it
can't be identical to the input register. Actually, this may work, but I
can't test it right now because the previous recompile hasn't finished yet:

;; This is primarily a hack for the Nintendo DS external RAM.
(define_insn "_arm_movqi_insn_swp"
  [(set (match_operand:QI 0 "reg_or_Qmem_operand" "=r,r,r,Q,Q")
	(match_operand:QI 1 "general_operand" "rI,K,m,r,r"))
        (clobber (match_scratch:QI 2 "=X,X,X,1,&r"))]
  "TARGET_ARM && TARGET_SWP_BYTE_WRITES
   && (   register_operand (operands[0], QImode)
       || register_operand (operands[1], QImode))"
  "@
   mov%?\\t%0, %1
   mvn%?\\t%0, #%B1
   ldr%?b\\t%0, %1
   swp%?b\\t%1, %1, [%|%m0]
   swp%?b\\t%2, %1, [%|%m0]"
  [(set_attr "type" "*,*,load1,store1,store1")
   (set_attr "predicable" "yes")]
)

-- 
Rask Ingemann Lambertsen

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-07-20 13:09                   ` Dave Korn
  2006-07-20 13:28                     ` Richard Earnshaw
@ 2006-07-20 15:58                     ` Ian Lance Taylor
  2006-07-20 21:20                       ` Dave Korn
  1 sibling, 1 reply; 54+ messages in thread
From: Ian Lance Taylor @ 2006-07-20 15:58 UTC (permalink / raw)
  To: Dave Korn
  Cc: 'Wolfgang Mües', 'Rask Ingemann Lambertsen', gcc

"Dave Korn" <dave.korn@artimi.com> writes:

>   Is there any generic advice available as to when and why one should use
> emit_insn (gen_rtx_SET (....)) as opposed to emit_move_insn (...)?

It depends on where you are in the compiler.  You can only use
gen_rtx_SET if you know that the resulting insn will be recognized.  I
think it is usually cleaner to use gen_INSNNAME, but in some cases
that is inconvenient because there are different possible insns
involved (depending on processor type, etc.).

emit_move_insn should be used to move values around.  The middle end
only calls gen_rtx_SET if it is going to then call recog to make sure
the result is valid.

>   (There are other similar pairs of functions with seemingly overlapping
> functionality that I find equally confusing.  For instance, I *think* that you
> should only use gen_rtx_REG when you want to create a new pseudo, and that in
> the general case of wanting an rtx that refers to an existing pseudo or a hard
> reg you're supposed to use gen_reg_rtx which takes care of sharing and
> everything for you... but I haven't seen any solid documentation that backs up
> my hypothesis.)

You should normally use gen_reg_rtx to create a new pseudo-register.
gen_rtx_REG is normally used to create a reference to a hard register.
Using gen_rtx_REG to refer to an existing pseudo-register will
normally do the wrong thing.  If there is any code which does that, it
needs a very hard look.

Ian

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-07-20 15:58                     ` Ian Lance Taylor
@ 2006-07-20 21:20                       ` Dave Korn
  0 siblings, 0 replies; 54+ messages in thread
From: Dave Korn @ 2006-07-20 21:20 UTC (permalink / raw)
  To: 'Ian Lance Taylor'
  Cc: 'Wolfgang Mues', 'Rask Ingemann Lambertsen', gcc

On 20 July 2006 16:25, Ian Lance Taylor wrote:

> "Dave Korn" <dave.korn@artimi.com> writes:
> 
>>   Is there any generic advice available as to when and why one should use
>> emit_insn (gen_rtx_SET (....)) as opposed to emit_move_insn (...)?
> 
> It depends on where you are in the compiler.  You can only use
> gen_rtx_SET if you know that the resulting insn will be recognized.  I
> think it is usually cleaner to use gen_INSNNAME, but in some cases
> that is inconvenient because there are different possible insns
> involved (depending on processor type, etc.).

  Ah, so simple register/memory/constant moves with valid addressing modes are
(in general, on most cpus) going to be ok, but out-of-range constants,
complicated mem or subreg expressions, or complex (non-folded/non-foldable)
compound operands would make it barf IIUYC.

> emit_move_insn should be used to move values around.  The middle end
> only calls gen_rtx_SET if it is going to then call recog to make sure
> the result is valid.
> 
>>   (There are other similar pairs of functions with seemingly overlapping
>> functionality that I find equally confusing.  For instance, I *think* that
>> you should only use gen_rtx_REG when you want to create a new pseudo, and
>> that in the general case of wanting an rtx that refers to an existing
>> pseudo or a hard reg you're supposed to use gen_reg_rtx which takes care
>> of sharing and everything for you... but I haven't seen any solid
>> documentation that backs up my hypothesis.)
> 
> You should normally use gen_reg_rtx to create a new pseudo-register.
> gen_rtx_REG is normally used to create a reference to a hard register.

  Ah, I got it 100% the wrong way around!  I'm afraid I wrote the question off
the top of my head without doing my homework first.

> Using gen_rtx_REG to refer to an existing pseudo-register will
> normally do the wrong thing.  If there is any code which does that, it
> needs a very hard look.

  I'm not saying that there is any; I'm afraid I wrote the question off the
top of my head without checking I'd got the two the right way round.

  Thanks to you and R. for your explanations; it's not too complicated to get
to grips with, but it's one of those things that falls under the category of
unwritten knowledge that you only pick up by (sometimes bitter) experience.


    cheers,
      DaveK
-- 
Can't think of a witty .sigline today....

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-07-20 14:38                   ` Rask Ingemann Lambertsen
@ 2006-07-21 11:40                     ` Rask Ingemann Lambertsen
  0 siblings, 0 replies; 54+ messages in thread
From: Rask Ingemann Lambertsen @ 2006-07-21 11:40 UTC (permalink / raw)
  To: Wolfgang Mües; +Cc: gcc

On Thu, Jul 20, 2006 at 03:27:49PM +0200, Rask Ingemann Lambertsen wrote:

> (define_expand "reload_outqi"
>   [(clobber (match_operand:QI 0 "memory_operand" "=Q"))
>    (clobber (match_operand:DI 2 "register_operand" "=&r"))
>    (set (match_dup 4) (match_dup 5))
>    (parallel [
>    (set (match_dup 6)
> 	(match_operand:QI 1 "register_operand" "r"))
>    (clobber (match_dup 3))]
>   )]
[...]

I should perhaps explain how this works. Let's say operand 0 is (mem:QI
(reg:SI 0)), operand 1 is (reg:QI 1) and operand 2 is (reg:DI 2). We then
get:

(clobber (mem:QI (reg:SI 0)))
(clobber (reg:DI 2))
(set (reg:SI 3) (reg:SI 3))	; {*arm_movsi_insn}, optimized away later.
(parallel [
	(set (mem:QI (reg:SI 0)) (reg:QI 1))
	(clobber (reg:QI 2))
])				; {_arm_movqi_insn_swp}

If operand 0 is (mem:QI (plus:SI (reg:SI 0) (const_int 16))), we get:

(clobber (mem:QI (reg:SI 0)))
(clobber (reg:DI 2))
(set (reg:SI 3)
     (plus:SI (reg:SI 0) (const_int 16))) ; {*arm_addsi3}
(parallel [
	(set (mem:QI (reg:SI 3)) (reg:QI 1))
	(clobber (reg:QI 2))
])				; {_arm_movqi_insn_swp}

I'll rewrite it to make clearer what is going on. Also, the two clobber
expressions have no purpose in the insn stream. They only exist because all
external operands must be declared using match_operand somewhere in the RTL
template and RTL offers no good way of doing that in a case like this one.

-- 
Rask Ingemann Lambertsen

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-07-20 15:14                 ` Rask Ingemann Lambertsen
@ 2006-07-21 14:20                   ` Rask Ingemann Lambertsen
  2006-08-05 19:03                     ` Wolfgang Mües
  0 siblings, 1 reply; 54+ messages in thread
From: Rask Ingemann Lambertsen @ 2006-07-21 14:20 UTC (permalink / raw)
  To: Wolfgang Mües; +Cc: gcc

On Thu, Jul 20, 2006 at 04:37:41PM +0200, Rask Ingemann Lambertsen wrote:
> ;; This is primarily a hack for the Nintendo DS external RAM.
> (define_insn "_arm_movqi_insn_swp"
>   [(set (match_operand:QI 0 "reg_or_Qmem_operand" "=r,r,r,Q,Q")
> 	(match_operand:QI 1 "general_operand" "rI,K,m,r,r"))
>         (clobber (match_scratch:QI 2 "=X,X,X,1,&r"))]
>   "TARGET_ARM && TARGET_SWP_BYTE_WRITES
>    && (   register_operand (operands[0], QImode)
>        || register_operand (operands[1], QImode))"
>   "@
>    mov%?\\t%0, %1
>    mvn%?\\t%0, #%B1
>    ldr%?b\\t%0, %1
>    swp%?b\\t%1, %1, [%|%m0]
>    swp%?b\\t%2, %1, [%|%m0]"
>   [(set_attr "type" "*,*,load1,store1,store1")
>    (set_attr "predicable" "yes")]
> )

I found that this peephole optimization improves the code a whole lot:

;; The register allocator is often stupid. Try to change
;;	mov	r2, r1
;;	swpb	r2, r2, [r0]
;; into
;;	swpb	r2, r1, [r0]
;; (and pretend it is just another way of allocating a scratch register).
(define_peephole2
  [(parallel
  [(set (match_operand:QI 2 "register_operand")
	(match_operand:QI 1 "register_operand"))
   (clobber (match_scratch:QI 3))])
   (parallel [
   (set (match_operand:QI 0 "memory_operand") (match_dup 2))
   (clobber (match_dup 2))])]
  "TARGET_ARM && TARGET_SWP_BYTE_WRITES"
  [(parallel
  [(set (match_dup 0) (match_dup 1))
   (clobber (match_dup 2))])]
)

Another way of improving the code was to swap the order of the two last
alternatives of _arm_movqi_insn_swp. There are a few differences in the
generated code, shown with "1,&r" to the left and "&r,1" to the right:

.L92:                             .L92:
        ldr     r2, [fp, #-144] |         ldr     r1, [fp, #-144]
        ldr     r3, [fp, #-152]           ldr     r3, [fp, #-152]
        cmp     r2, #0          |         cmp     r1, #0
        add     r2, r3, #2                add     r2, r3, #2
        ldreq   r0, [fp, #-144] |         moveq   r0, r1

Above, reload from memory [fp, #-144] for no apparent reason.

.L141:                                                          .L141:
        ldr     r0, [fp, #-152] |         ldr     r2, [fp, #-152]
        sub     r3, r0, #2      |         sub     r3, r2, #2
        cmp     r5, r3                    cmp     r5, r3
        beq     .L142                     beq     .L142
        cmp     r5, #0                    cmp     r5, #0
        movne   r2, r0          |         beq     .L144
        bne     .L146           |         b       .L146
        b       .L144           <

Some sort of register allocation mismatch.

        beq     .L160           |         beq     .L155
        cmp     r0, #44                   cmp     r0, #44
        cmpne   r0, #59                   cmpne   r0, #59
        beq     .L160           |         beq     .L155
        cmp     r0, #61                   cmp     r0, #61
        cmpne   r0, #43                   cmpne   r0, #43
        bne     .L158                     bne     .L158
                                > .L155:
                                >         mov     ip, #95
                                >         str     r8, [fp, #-120]
                                >         mov     r0, #1
                                >         swpb    r2, ip, [r6]
                                >         b       .L159
.L160:                            .L160:
        mov     r3, #95                   mov     r3, #95
        str     r8, [fp, #-120]           str     r8, [fp, #-120]
        mov     r0, #1                    mov     r0, #1
        swpb    r1, r3, [r6]              swpb    r1, r3, [r6]
        b       .L159                     b       .L159

Code duplication, presumably because of the different register allocation.

        ldr     lr, [fp, #-104]           ldr     lr, [fp, #-104]
        ldrb    r2, [r1, ip]              ldrb    r2, [r1, ip]
        add     r3, r1, lr                add     r3, r1, lr
        swpb    r2, r2, [r3]    |         swpb    lr, r2, [r3]
        ldr     r2, [fp, #-132]           ldr     r2, [fp, #-132]
        add     r1, r1, #1                add     r1, r1, #1
                                >         ldr     lr, [fp, #-104]
        add     r2, r2, #1                add     r2, r2, #1
        cmp     r1, r0                    cmp     r1, r0
        str     r2, [fp, #-132]           str     r2, [fp, #-132]
        add     r3, lr, r1                add     r3, lr, r1

Here, the register allocator is just plain stupid in not using the best
alternative. I suspect this is because only reload allocates scratch
registers and doesn't realize that the input register dies in this insn.

        ldr     r2, [fp, #-184] |         ldr     r5, [fp, #-184]
        strh    r3, [r4, #22]             strh    r3, [r4, #22]
        strh    r3, [r4, #14]             strh    r3, [r4, #14]
        mov     r0, r2, asr #16 <
        ldrh    r2, [fp, #-48]            ldrh    r2, [fp, #-48]
        mov     r1, #0                    mov     r1, #0
                                >         mov     r0, r5, asr #16
        add     r3, r4, #13               add     r3, r4, #13
        strh    r2, [r4, #24]             strh    r2, [r4, #24]
        strh    r2, [r4, #18]             strh    r2, [r4, #18]
        strh    r2, [r4, #16]             strh    r2, [r4, #16]
        swpb    ip, r1, [r3]    |         swpb    r6, r1, [r3]
        strh    r0, [r4, #20]             strh    r0, [r4, #20]
        str     r1, [r4, #28]             str     r1, [r4, #28]
        ldr     lr, [fp, #-184] |         strh    r5, [r4, #26]
        strh    lr, [r4, #26]   <

Again, needless reload from memory [fp, #-184]. One more example omitted.

-- 
Rask Ingemann Lambertsen

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-07-21 14:20                   ` Rask Ingemann Lambertsen
@ 2006-08-05 19:03                     ` Wolfgang Mües
  2006-08-06  0:05                       ` Rask Ingemann Lambertsen
  0 siblings, 1 reply; 54+ messages in thread
From: Wolfgang Mües @ 2006-08-05 19:03 UTC (permalink / raw)
  To: Rask Ingemann Lambertsen; +Cc: gcc

Rask,

On Friday 21 July 2006 15:26, Rask Ingemann Lambertsen wrote:
> I found that this peephole optimization improves the code a whole
> lot:

Done.

> Another way of improving the code was to swap the order of the two
> last alternatives of _arm_movqi_insn_swp.

Done.

Anyway, the problems with reload continues...error: unrecognizable insn

First, I have had a problem with loading a register with a constant.
(no clobber). I have solved this problem by adding

> (define_insn "_arm_movqi_insn_const"
>   [(set (match_operand:QI 0 "register_operand" "=r")
> 	(match_operand:QI 1 "const_int_operand" ""))]
>   "TARGET_ARM && TARGET_SWP_BYTE_WRITES
>    && (   register_operand (operands[0], QImode))"
>   "@
>    mov%?\\t%0, %1"
>   [(set_attr "type" "*")
>    (set_attr "predicable" "yes")]
> )

I am very shure that this does only cure the symptoms, and it will 
better to fix this in the reload stage, but at least, it worked, and I 
was able to compile the whole linux kernel!

After testing that the kernel is running, I have tried to compile 
uCLinux. And there is the next problem....

> ../ncurses/./base/lib_set_term.c: In function '_nc_setupscreen':
> ../ncurses/./base/lib_set_term.c:470: error: unrecognizable insn:
> (insn 1199 1198 696 37 ../ncurses/./base/lib_set_term.c:429 (parallel
> [ (set (mem/s/j:QI (reg/f:SI 3 r3 [491]) [0 <variable>._clear+0 S1
> A8]                                                                  
>          ) (reg:QI 0 r0))
>             (clobber (subreg:QI (reg:DI 11 fp) 0))
>         ]) -1 (nil)
>     (nil))
> ../ncurses/./base/lib_set_term.c:470: internal compiler error: in
> extract_insn,                                                        
>                     at recog.c:2020 P

The source code line is:

>    newscr->_clear = TRUE;

Obviously, TRUE is loaded in r0, but I don't know why this construct 
(storing a byte into a struct member referenced by a pointer) is not
evaluated.

I fear that these problems are creating an endless story, and sorry for 
generating traffic on this list, because I'm still no gcc expert...

On the other hand, the compiler now has generated code from hundreds of 
files, and maybe I'm very near to success now.

regards

Wolfgang
-- 
We're back to the times when men were men 
and wrote their own device drivers.

(Linus Torvalds)

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-08-05 19:03                     ` Wolfgang Mües
@ 2006-08-06  0:05                       ` Rask Ingemann Lambertsen
  2006-08-12 12:40                         ` Wolfgang Mües
  0 siblings, 1 reply; 54+ messages in thread
From: Rask Ingemann Lambertsen @ 2006-08-06  0:05 UTC (permalink / raw)
  To: Wolfgang Mües; +Cc: gcc

On Sat, Aug 05, 2006 at 09:03:34PM +0200, Wolfgang Mües wrote:

> First, I have had a problem with loading a register with a constant.
> (no clobber). I have solved this problem by adding
> 
> > (define_insn "_arm_movqi_insn_const"
[cut]
> 
> I am very shure that this does only cure the symptoms, and it will 
> better to fix this in the reload stage, but at least, it worked, and I 
> was able to compile the whole linux kernel!

Yes, it only cures the symptom, but it could take a lot of time to find the
cause, and the gain is small, so I think it is OK to leave it like this for
now.
 
> After testing that the kernel is running, I have tried to compile 
> uCLinux. And there is the next problem....
> 
> > ../ncurses/./base/lib_set_term.c: In function '_nc_setupscreen':
> > ../ncurses/./base/lib_set_term.c:470: error: unrecognizable insn:
> > (insn 1199 1198 696 37 ../ncurses/./base/lib_set_term.c:429 (parallel
> > [ (set (mem/s/j:QI (reg/f:SI 3 r3 [491]) [0 <variable>._clear+0 S1
> > A8]                                                                  
> >          ) (reg:QI 0 r0))
> >             (clobber (subreg:QI (reg:DI 11 fp) 0))
> >         ]) -1 (nil)
> >     (nil))
> > ../ncurses/./base/lib_set_term.c:470: internal compiler error: in
> > extract_insn,                                                        
> >                     at recog.c:2020 P

This insn was generated from the "reload_outqi" pattern. I don't completely
understand why it isn't recognized. The (subreg:QI (reg:DI 11 fp) 0) part
won't be matched by (match_scratch ...), but simplify_gen_subreg() should
have simplified it to (reg:QI 11 fp) since this is one of the main purposes
of having simplify_(gen_)subreg() in the first place. Try changing

   operands[3] = simplify_gen_subreg (QImode, operands[2], DImode, 0);

into

   operands[3] = gen_rtx_REG (QImode, REGNO (operands[2]));

(in "reload_outqi") and see if that works.

> I fear that these problems are creating an endless story, and sorry for 
> generating traffic on this list, because I'm still no gcc expert...

You shouldn't be sorry about that. GCC provides a good, solid foundation
for learning something new every day.

> On the other hand, the compiler now has generated code from hundreds of 
> files, and maybe I'm very near to success now.

I think so too.

-- 
Rask Ingemann Lambertsen

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-08-06  0:05                       ` Rask Ingemann Lambertsen
@ 2006-08-12 12:40                         ` Wolfgang Mües
  2006-10-02 10:42                           ` Wolfgang Mües
  0 siblings, 1 reply; 54+ messages in thread
From: Wolfgang Mües @ 2006-08-12 12:40 UTC (permalink / raw)
  To: Rask Ingemann Lambertsen; +Cc: gcc

Rask,

On Sunday 06 August 2006 02:05, Rask Ingemann Lambertsen wrote:
> Yes, it only cures the symptom, but it could take a lot of time to
> find the cause, and the gain is small, so I think it is OK to leave
> it like this for now.
OK.

> This insn was generated from the "reload_outqi" pattern. I don't
> completely understand why it isn't recognized. The (subreg:QI (reg:DI
> 11 fp) 0) part won't be matched by (match_scratch ...), but
> simplify_gen_subreg() should have simplified it to (reg:QI 11 fp)
> since this is one of the main purposes of having
> simplify_(gen_)subreg() in the first place. Try changing
>
>    operands[3] = simplify_gen_subreg (QImode, operands[2], DImode,
> 0);
>
> into
>
>    operands[3] = gen_rtx_REG (QImode, REGNO (operands[2]));
>
> (in "reload_outqi") and see if that works.

Yes, it works. Kernel and userland are compiling now. I can't find any 
errors in the generated code. Many thanks!

regards

Wolfgang
-- 
We're back to the times when men were men 
and wrote their own device drivers.

(Linus Torvalds)

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-08-12 12:40                         ` Wolfgang Mües
@ 2006-10-02 10:42                           ` Wolfgang Mües
  2006-10-20 17:27                             ` Rask Ingemann Lambertsen
  0 siblings, 1 reply; 54+ messages in thread
From: Wolfgang Mües @ 2006-10-02 10:42 UTC (permalink / raw)
  To: gcc

Now it's time to give a big "thank you" to all persons involved, 
ecpecially Rask Ingemann Lambertsen with his invaluable help.

As I started this project, I feared that I would never succeed, and 
now ... the modified compiler is used about 3 month now, and DSLINUX 
with this crude modification is working fine with 36 MBytes RAM 
available, and has a good future now.

During the last months, 3 issues have come up, all with invalid insns, 
but with my new-developed knowledge of the arm code generator, I was 
able to resolve them.

So very much thanks to all involved, and keep up the good work!

Wolfgang Mües

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Modifying ARM code generator for elimination of 8bit writes - need help
  2006-10-02 10:42                           ` Wolfgang Mües
@ 2006-10-20 17:27                             ` Rask Ingemann Lambertsen
  0 siblings, 0 replies; 54+ messages in thread
From: Rask Ingemann Lambertsen @ 2006-10-20 17:27 UTC (permalink / raw)
  To: Wolfgang Mües; +Cc: gcc

On Mon, Oct 02, 2006 at 12:42:04PM +0200, Wolfgang Mües wrote:
> Now it's time to give a big "thank you" to all persons involved, 
> ecpecially Rask Ingemann Lambertsen with his invaluable help.
> 
> As I started this project, I feared that I would never succeed, and 
> now ... the modified compiler is used about 3 month now, and DSLINUX 
> with this crude modification is working fine with 36 MBytes RAM 
> available, and has a good future now.

   I'm glad to hear that it works. I have a few improvements I'd like to
make, but for the next few months, my focus will be on getting my 16-bit x86
into GCC 4.3.

   You can take a look at the relout_outqi pattern. It uses two scratch
registers: One for the address and one as a scratch register for swpb. It is
worth investigating if you can safely trash the input operand, since the
reason we have an output reload is because a pseudo register got a stack
slot instead of a hard register. I.e. reload shouldn't expect the value in
operand 1 to be preserved for use in a later insn. But I think a reload
expert should comment on this.

-- 
Rask Ingemann Lambertsen

^ permalink raw reply	[flat|nested] 54+ messages in thread

end of thread, other threads:[~2006-10-20 12:35 UTC | newest]

Thread overview: 54+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-05-28 20:23 Modifying ARM code generator for elimination of 8bit writes - need help Wolfgang Mües
2006-05-30 20:04 ` Paul Brook
2006-05-30 21:47   ` Daniel Jacobowitz
2006-05-30 23:14     ` Paul Brook
2006-05-31 20:40     ` Wolfgang Mües
2006-05-31 20:49   ` Wolfgang Mües
2006-05-31 20:59     ` Paul Brook
2006-06-01 14:18     ` Rask Ingemann Lambertsen
2006-06-02  6:24       ` Wolfgang Mües
2006-06-02  7:24         ` Rask Ingemann Lambertsen
2006-06-04 10:31           ` Wolfgang Mües
2006-06-04 11:25             ` Paul Brook
2006-06-04 15:26               ` Wolfgang Mües
2006-06-04 15:57                 ` Paul Brook
2006-06-05 10:07                   ` Wolfgang Mües
2006-06-05 12:25                     ` Paul Brook
2006-06-04 21:22                 ` Rask Ingemann Lambertsen
2006-06-04 21:01               ` Rask Ingemann Lambertsen
2006-06-05  0:12                 ` Dave Murphy
2006-06-05 11:31                   ` Wolfgang Mües
2006-06-05 10:07                 ` Richard Earnshaw
2006-06-05 11:47                   ` Wolfgang Mües
2006-06-04 20:10             ` Rask Ingemann Lambertsen
2006-06-17  8:52           ` Rask Ingemann Lambertsen
2006-07-19  7:49             ` Wolfgang Mües
2006-07-19 12:09               ` Rask Ingemann Lambertsen
2006-07-19 13:12                 ` Rask Ingemann Lambertsen
2006-07-20 10:31                 ` Wolfgang Mües
2006-07-20 13:09                   ` Dave Korn
2006-07-20 13:28                     ` Richard Earnshaw
2006-07-20 15:58                     ` Ian Lance Taylor
2006-07-20 21:20                       ` Dave Korn
2006-07-20 14:38                   ` Rask Ingemann Lambertsen
2006-07-21 11:40                     ` Rask Ingemann Lambertsen
2006-06-04 21:36     ` Rask Ingemann Lambertsen
2006-06-05 12:20       ` Wolfgang Mües
2006-06-05 12:22         ` Richard Earnshaw
2006-06-05 15:10         ` Rask Ingemann Lambertsen
2006-06-06  6:11           ` Wolfgang Mües
2006-06-06 19:37             ` Rask Ingemann Lambertsen
2006-06-07  5:35               ` Wolfgang Mües
2006-06-07  9:38                 ` Richard Earnshaw
2006-07-20 15:14                 ` Rask Ingemann Lambertsen
2006-07-21 14:20                   ` Rask Ingemann Lambertsen
2006-08-05 19:03                     ` Wolfgang Mües
2006-08-06  0:05                       ` Rask Ingemann Lambertsen
2006-08-12 12:40                         ` Wolfgang Mües
2006-10-02 10:42                           ` Wolfgang Mües
2006-10-20 17:27                             ` Rask Ingemann Lambertsen
2006-06-06  9:45         ` Richard Sandiford
2006-06-06 18:49           ` Rask Ingemann Lambertsen
2006-06-08 18:22             ` Rask Ingemann Lambertsen
2006-06-08 21:11               ` Wolfgang Mües
2006-06-09 15:08                 ` Rask Ingemann Lambertsen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).