From: Kicer
To: David Brown
Cc: Andrew Haley, gcc-help@gcc.gnu.org
Subject: Re: problems with optimisation
Date: Fri, 28 Dec 2012 17:14:00 -0000
Message-ID: <4179792.vI8coZ6zEV@kicer>
In-Reply-To: <50DDC9F7.9070606@westcontrol.com>
References: <3594412.lfrBexjLtS@kicer> <50DDB877.9000806@redhat.com> <50DDC9F7.9070606@westcontrol.com>

On Friday, 28 December 2012 17:33:59 David Brown wrote:
> On 28/12/12 16:19, Andrew Haley wrote:
> > With -O2 there's much less difference:
> >
> > bar():                 bar():
> >
> > .LFB14:                .LFB14:
> > .cfi_startproc         .cfi_startproc
> > movl $3, %edx          movl $3, %edx
> > in %dx, %al            in %dx, %al
> > movb $6, %dl        |  movb $4, %dl
> > movl %eax, %ecx        movl %eax, %ecx
> > in %dx, %al            in %dx, %al
> >                     >  movb $6, %dl
> >                     >  movl %eax, %edi
> >                     >  in %dx, %al
> > movb $7, %dl           movb $7, %dl
> > movl %eax, %esi        movl %eax, %esi
> >                     >  andl $1, %edi
> > in %dx, %al            in %dx, %al
> > movl %eax, %edi     |  movl %eax, %r8d
> >                     >  movsbl %sil, %esi
> > movb $8, %dl           movb $8, %dl
> > subb %dil, %cl      |  subb %r8b, %cl
> > in %dx, %al            in %dx, %al
> > andl $16, %esi      |  addl %edi, %ecx
> >                     >  testb $16, %sil
> > setne %dl              setne %dl
> >                     >  andl $1, %esi
> > addl %edx, %ecx        addl %edx, %ecx
> >                     >  subb %sil, %cl
> > testb $16, %al         testb $16, %al
> > setne %al              setne %al
> > subb %al, %cl          subb %al, %cl
> > movl %ecx, %eax        movl %ecx, %eax
> > ret                    ret
> >
> > Without inlining GCC can't tell what your program is doing, and by using
> > -Os you're preventing GCC from inlining.
> >
> > Andrew.
>
> There are normally good reasons for picking -Os rather than -O2 for
> small microcontrollers (the OP is targeting AVRs, which typically have
> quite small program flash memories).
>
> So the solution here is to manually declare the various functions as
> "inline" (or at least "static", so that the compiler will inline them
> automatically). Very often, code that manipulates bits is horrible on a
> target like the AVR if the function is not inline, and the compiler has
> the bit number(s) as variables - but with inline code generation and
> constant folding, you end up with only an instruction or two for
> compile-time constant bit numbers.
>
> (To the OP) - also note that there can be significant differences in the
> types of code generation and optimisations for different backends.
> I assume you posted x86 assembly because you thought it would be more
> familiar to people on this list, but I think it would be more important
> to show the real assembly from the target you are using as you might see
> different optimisations or missed optimisations.
>
> Finally, there is a mailing list dedicated to gcc on the avr - it might
> be worth posting there too, especially if you think the issue is
> avr-specific.
>
> David

David: you are right - I used x86 due to its popularity ;)

In my real case I'm observing weird things (speaking of inline):

1. When I use -Os and inline functions, gcc doesn't inline the code
(and AFAIR it generates a warning that it won't inline because the code
would grow). The code looks funny then:

    00000044 <_ZNK7OneWire14InterruptBasedILt56ELh4EE10releaseBusEv.isra.0.1569.1517>:
      44:   bc 98           cbi     0x17, 4 ; 23
      46:   08 95           ret

plus a few calls like:

    rcall   .-262           ; 0x44 <_ZNK7OneWire14InterruptBasedILt56ELh4EE10releaseBusEv.isra.0.1569.1517>

Those calls are completely useless, as 'cbi' could be placed instead of
them, and the whole function actually consists of one instruction (not
counting ret). This is quite important for me as I lose a certain number
of clock cycles here :)

2. When I use -Os and the always_inline attribute, I get messy code like
in my first message (the program gets bigger by 70% and uses 2-3x more
stack, which is half of the available memory).

It's hard to post the whole AVR program here as it's big, and it's
difficult to produce a smaller example, because it only gets messy when
the program gets bigger.

Andrew: it's inconvenient to use -O2, as -Os produces a program whose
size is 30% of the -O2 result.

regards

-- 
Michał Walenciak
gmail.com kicer86
http://kicer.sileman.net.pl
gg: 3729519
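
For reference, a minimal host-runnable sketch of the pattern discussed in this thread: a bit-clearing helper whose bit number is a compile-time constant, marked with GCC's always_inline attribute. Everything named here (fake_ddr, Pin, release_bus, demo_release_bit4) is hypothetical and not taken from the poster's code; fake_ddr is an ordinary variable standing in for a memory-mapped AVR register so the example can run on a PC. Whether avr-gcc actually reduces such a call to a single cbi depends on the target, the register address, and the compiler version, so treat this as an illustration rather than a guaranteed result.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a memory-mapped I/O register, so the sketch
// runs on a host. On a real AVR this would be a hardware register.
static volatile std::uint8_t fake_ddr = 0xFF;

// The bit number is a template parameter, so the mask (1u << Bit) is a
// compile-time constant that the compiler can fold.
template <std::uint8_t Bit>
struct Pin {
    static_assert(Bit < 8, "bit index must fit an 8-bit register");

    // GCC-specific attribute: asks for inlining even when the optimisation
    // level (for example -Os) would otherwise decline to inline.
    static inline __attribute__((always_inline)) void release_bus() {
        fake_ddr = static_cast<std::uint8_t>(fake_ddr & ~(1u << Bit));
    }
};

// Example caller: clears bit 4 and returns the resulting register value.
inline std::uint8_t demo_release_bit4() {
    fake_ddr = 0xFF;
    Pin<4>::release_bus();
    assert((fake_ddr & (1u << 4)) == 0);  // bit 4 is now cleared
    return fake_ddr;
}
```

Comparing the disassembly of such a caller with and without the attribute at -Os is one way to check whether the rcall/ret pair shown above disappears.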