From: Johan Danielsson
To: gcc-help@gcc.gnu.org
Date: Wed, 22 Jan 2014 20:57:00 -0000
Subject: optimization issues in gcc 4,8

Hello,

I'm looking at moving GCC from 4.5 to 4.8 for a custom CPU. For the
most part GCC 4.8 produces better code (smaller, which is mostly what
matters in my case). There are, however, some parts where it does
worse. Trying to isolate some of these cases, I've come across the
following.

I have this function (which is really part of a larger context),
multiplying two signed 32-bit integers and returning a 64-bit signed
integer:

    long long mul_32_64(int a, int b)
    {
        unsigned int lo;
        unsigned int hi;
        __asm__("muls.w %0, %2, %3\n"
                "mulshi.w %1, %2, %3"
                : "=&r"(lo), "=r"(hi)
                : "r"(a), "r"(b));
        return (long long)(((unsigned long long)hi << 32) | lo);
    }

On this CPU, r0..r3 are used for arguments and are also temporary
registers. GCC 4.5.3 (-Os) produces the following assembler:

    mul_32_64:
        psh.h    1,4            ; save r4
        mov.h    r3,r0
        muls.w   r4, r1, r2     ; mul a and b, put result in r4
        mulshi.w r2, r1, r2     ; ... and the upper half in r2
        sw.h     0(r3),r4       ; store low part to return value
        sw.h     4(r0),r2       ; ... and high
        popr.h   1,4            ; restore r4 and return

This is close to optimal, except that if it had used r3 instead of r4
it would not have had to save any registers at all. GCC 4.7.3 gives
the same result.

Now this is what GCC 4.8.2 thinks about this:

    mul_32_64:
        psh.h    6,24
        mov.h    r3,r0
        muls.w   r4, r1, r2
        mulshi.w r1, r1, r2
        mov.h    r2,r4
        ldi.h    r4,0
        mov.h    r5,r4
        mov.h    r4,r1
        mov.h    r6,r5
        mov.h    r7,r1
        mov.h    r9,r5
        mov.h    r8,r2
        sw.h     0(r3),r2
        sw.h     4(r3),r1
        popr.h   6,24

Clearly it has given up on optimizing this. Looking at the RTL output,
the pass that removes all the register shuffling in the older versions
is fwprop, but in 4.8 that pass does nothing here.

So my question is really what is causing this. Is it a bug in GCC 4.8,
is there some misconfiguration in the backend that only hits some
functions, or is it something else? How do I debug this?

/Johan
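
For reference, the operation the inline asm is meant to perform is a plain widening signed multiply. The sketch below is not from the original code: the names mul_32_64_ref and mul_32_64_halves are hypothetical, and it assumes a 32-bit int and a 64-bit long long. The second function rebuilds the result from two 32-bit halves with the same shift-and-or expression as the asm version, which may make it easier to compare what each GCC version generates for that expression alone.

```c
/* Hypothetical reference helper (not in the original code): the same
   signed 32x32 -> 64 multiply, written without inline asm.
   Assumes int is 32 bits and long long is 64 bits. */
long long mul_32_64_ref(int a, int b)
{
    /* Widen one operand first so the multiply is done in 64 bits. */
    return (long long)a * b;
}

/* Hypothetical variant that splits the product into two 32-bit halves
   and recombines them with the same shift-and-or as the asm version. */
long long mul_32_64_halves(int a, int b)
{
    unsigned long long p = (unsigned long long)((long long)a * b);
    unsigned int lo = (unsigned int)p;          /* low 32 bits  */
    unsigned int hi = (unsigned int)(p >> 32);  /* high 32 bits */
    return (long long)(((unsigned long long)hi << 32) | lo);
}
```

Whether the extra moves in the 4.8.2 output come from this recombination step or from something in the backend is not established here; the two helpers only give a smaller test case to compile with each compiler version.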