Date: Wed, 12 Feb 2014 09:37:00 -0000
Message-ID: <CAO9OKOP=Tw+uDaHvZXofhXoHZr=m=jpKYH19vvx3jcH8sWm00Q@mail.gmail.com>
Subject: m68k optimisation for beginners?
From: Fredrik Olsson <peylow@gmail.com>
To: gcc@gcc.gnu.org

Hi.

I would like to get started improving code generation for a backend.
Any pointers, especially to good documentation, are welcome.

As an example, consider this C function for a reference-counted type:
void TCRelease(TCTypeRef tc) {
  if (--tc->retainCount == 0) {
    if (tc->destroy) {
      tc->destroy(tc);
    }
    free((void *)tc);
  }
}
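The snippet depends on a TCTypeRef definition that the post leaves out. To reproduce the listing with an m68k cross-compiler (for example with -O2 -S; the post does not say which options were used), something like the following self-contained version is needed. The struct layout and the countingDestroy helper are hypothetical, added only so the code compiles and can be exercised.

```c
#include <stdlib.h>

/* Hypothetical definition: the post does not show TCType. The listing reads
   retainCount with 16-bit moves at offset 0; where destroy lands (offset 4
   in the listing) depends on the real type and the target's alignment. */
typedef struct TCType *TCTypeRef;

struct TCType {
    unsigned short retainCount;   /* reference count, 16 bits as in the listing */
    void (*destroy)(TCTypeRef);   /* optional finaliser; NULL means none */
};

void TCRelease(TCTypeRef tc) {
    if (--tc->retainCount == 0) {   /* last reference dropped */
        if (tc->destroy) {
            tc->destroy(tc);
        }
        free((void *)tc);
    }
}

/* Illustrative finaliser for trying the function out: counts its calls. */
static int destroyCalls = 0;

static void countingDestroy(TCTypeRef tc) {
    (void)tc;
    destroyCalls++;
}
```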

The generated m68k asm is this:
_TCRelease:
    move.l %a2,-(%sp)
    move.l 8(%sp),%a2
    move.w (%a2),%d0  ; Question 1:
    subq.w #1,%d0
    move.w %d0,(%a2)
    jne .L7
    move.l 4(%a2),%a0  ; Question 2:
    cmp.w #0,%a0
    jeq .L9
    move.l %a2,-(%sp)   ; Question 3:
    jsr (%a0)
    addq.l #4,%sp
.L9:
    move.l %a2,8(%sp)
    move.l (%sp)+,%a2
    jra _free
.L7:
    move.l (%sp)+,%a2
    rts

Question 1:
This could be done as a single instruction, "subq.w #1,(%a2)": the
result in %d0 is never used again, and subtracting directly in memory
also updates the status flags. That would save 4 bytes and 8 cycles on
a 68000.
How would I attack this problem? With a peephole optimisation, or is
it that GCC is not aware that the instruction updates the flags?
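The rewrite in Question 1 has the shape of a classic peephole pattern: a load, a register subtract and a store back to the same address collapse into one read-modify-write subtract when the register is dead afterwards. GCC can express such patterns, for example, as define_peephole2 entries in the machine description; the C below is only a toy illustration of the matching and liveness check, with a made-up instruction model (Op, Insn, peephole and the other names are hypothetical). The sizes and cycle counts are the 68000 figures for the (%aN) addressing forms used in the listing.

```c
#include <stdbool.h>

/* Toy instruction model, hypothetical and far simpler than GCC's RTL. */
typedef enum { LOAD_W, SUBQ_W_REG, STORE_W, SUBQ_W_MEM, BRANCH_NE } Op;

typedef struct {
    Op  op;
    int d;   /* data register number, or -1 if unused */
    int a;   /* address register number for the (%aN) operand, or -1 */
    int q;   /* quick immediate for the subq forms */
} Insn;

typedef struct { int bytes, cycles; } Cost;

/* 68000 size and cycle count of each form with the (%aN) addressing mode. */
static Cost cost_of(Op op) {
    switch (op) {
    case LOAD_W:     return (Cost){ 2,  8 };  /* move.w (%aN),%dN */
    case SUBQ_W_REG: return (Cost){ 2,  4 };  /* subq.w #q,%dN    */
    case STORE_W:    return (Cost){ 2,  8 };  /* move.w %dN,(%aN) */
    case SUBQ_W_MEM: return (Cost){ 2, 12 };  /* subq.w #q,(%aN)  */
    default:         return (Cost){ 0,  0 };  /* branch: unchanged, not counted */
    }
}

static Cost total_cost(const Insn *code, int n) {
    Cost t = { 0, 0 };
    for (int i = 0; i < n; i++) {
        Cost c = cost_of(code[i].op);
        t.bytes += c.bytes;
        t.cycles += c.cycles;
    }
    return t;
}

static bool reads_d(const Insn *in, int d) {
    return (in->op == SUBQ_W_REG || in->op == STORE_W) && in->d == d;
}

static bool writes_d(const Insn *in, int d) {
    return (in->op == LOAD_W || in->op == SUBQ_W_REG) && in->d == d;
}

/* %dN is dead after code[i] if it is overwritten before being read, or if the
   sequence ends first (toy assumption: no data register is live on exit). */
static bool d_dead_after(const Insn *code, int n, int i, int d) {
    for (int j = i + 1; j < n; j++) {
        if (reads_d(&code[j], d)) return false;
        if (writes_d(&code[j], d)) return true;
    }
    return true;
}

/* Folds "move.w (%aN),%dM; subq.w #q,%dM; move.w %dM,(%aN)" into
   "subq.w #q,(%aN)" when %dM is dead afterwards. N and Z still reflect the
   stored value, so a following jne/jeq is unaffected; X, V and C can differ,
   so a real pass must check which flags are consumed.
   Rewrites code in place and returns the new length. */
static int peephole(Insn *code, int n) {
    int out = 0;
    for (int i = 0; i < n; i++) {
        const Insn *x = &code[i];
        if (i + 2 < n
            && x[0].op == LOAD_W
            && x[1].op == SUBQ_W_REG && x[1].d == x[0].d
            && x[2].op == STORE_W && x[2].d == x[0].d && x[2].a == x[0].a
            && d_dead_after(code, n, i + 2, x[0].d)) {
            Insn folded = { SUBQ_W_MEM, -1, x[0].a, x[1].q };
            code[out++] = folded;
            i += 2;  /* with the loop's i++, skips all three folded insns */
            continue;
        }
        code[out++] = code[i];
    }
    return out;
}
```

Fed the listing's sequence, the fold removes two instructions and accounts for the 4 bytes and 8 cycles quoted above; if a later instruction still reads %d0, the pattern is left alone.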

Question 2:
Loading into a temporary data register instead, with
"move.l 4(%a2),%d0", would update the status register and allow the
branch without the compare-with-immediate instruction. This obviously
requires an extra "move.l %d0,%a0" when the branch is not taken, so
that the indirect call can be made. That still saves 2 bytes, and
8 cycles in the worst case (12 cycles in the best case).
Is this a peephole optimisation? Or is it, for instance, about
providing accurate instruction costs?
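The rewrite in Question 2 relies on two 68000 rules: a MOVE to a data register sets N and Z from the moved value, while a MOVEA to an address register leaves the condition codes untouched. The toy model below (hypothetical names; only %d0, %a0 and the Z flag are modelled, and it is not an emulator) runs both sequences and checks that they take the same branch and leave the same pointer in %a0 on the fall-through path.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy 68000 state: one data register, one address register, the Z flag. */
typedef struct {
    uint32_t d0, a0;
    bool z;
} Cpu;

/* move.l <src>,%d0: a MOVE to a data register sets Z from the value. */
static void move_l_to_d0(Cpu *c, uint32_t v) { c->d0 = v; c->z = (v == 0); }

/* movea.l <src>,%a0: MOVEA never changes the condition codes. */
static void movea_l_to_a0(Cpu *c, uint32_t v) { c->a0 = v; }

/* cmp.w #0,%a0 (CMPA.W): the word immediate is sign-extended and compared
   with all 32 bits of %a0, so Z is set exactly when %a0 is zero. */
static void cmpa_w_zero(Cpu *c) { c->z = (c->a0 == 0); }

/* Emitted: movea.l 4(%a2),%a0 ; cmp.w #0,%a0 ; jeq .L9
   Returns true if the branch is taken; *a0_out gets %a0 on fall-through. */
static bool gcc_sequence(uint32_t destroy, uint32_t *a0_out) {
    Cpu c = { 0, 0, false };
    movea_l_to_a0(&c, destroy);
    cmpa_w_zero(&c);
    if (c.z) return true;              /* jeq .L9 */
    *a0_out = c.a0;
    return false;
}

/* Proposed: move.l 4(%a2),%d0 ; jeq .L9 ; move.l %d0,%a0 */
static bool proposed_sequence(uint32_t destroy, uint32_t *a0_out) {
    Cpu c = { 0, 0, false };
    move_l_to_d0(&c, destroy);
    if (c.z) return true;              /* jeq .L9, using flags from the move */
    movea_l_to_a0(&c, c.d0);           /* extra copy only on fall-through */
    *a0_out = c.a0;
    return false;
}
```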

Question 3:
Saving %a2 on the stack is only needed if this code path is taken.
Is this even worth bothering with? And is this something that moving
the m68k target from reload to LRA would solve?
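Saving a callee-saved register only on the path that actually needs it is usually called shrink-wrapping, and GCC has an -fshrink-wrap option for it; whether it applies here on m68k is exactly what is being asked. One way to experiment at the source level is to move the rare path into its own out-of-line function, so that the common path has nothing to keep alive across a call. A sketch, assuming a hypothetical TCType layout (TCReleaseSlow and countingDestroy are made-up names):

```c
#include <stdlib.h>

/* Hypothetical layout; the post does not show the real TCType. */
typedef struct TCType *TCTypeRef;

struct TCType {
    unsigned short retainCount;
    void (*destroy)(TCTypeRef);
};

/* Rare path: runs only when the last reference goes away. Kept out of line
   so that TCRelease itself has no value to keep alive across a call. */
__attribute__((noinline)) static void TCReleaseSlow(TCTypeRef tc) {
    if (tc->destroy) {
        tc->destroy(tc);
    }
    free((void *)tc);
}

/* Common path: decrement and test only; the call is in tail position. */
void TCRelease(TCTypeRef tc) {
    if (--tc->retainCount == 0) {
        TCReleaseSlow(tc);
    }
}

/* Illustrative finaliser for trying the functions out: counts its calls. */
static int destroyCalls = 0;

static void countingDestroy(TCTypeRef tc) {
    (void)tc;
    destroyCalls++;
}
```

Whether the split actually removes the %a2 save depends on the compiler version and options; comparing the -S output of both versions is the way to check.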

// Fredrik Olsson