From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-help-return-51819-listarch-gcc-help=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 783 invoked by alias); 28 Dec 2012 17:14:53 -0000
Received: (qmail 772 invoked by uid 22791); 28 Dec 2012 17:14:52 -0000
X-SWARE-Spam-Status: No, hits=-3.6 required=5.0	tests=AWL,BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,KHOP_RCVD_TRUST,KHOP_THREADED,RCVD_IN_DNSWL_LOW,RCVD_IN_HOSTKARMA_YE,TW_OV
X-Spam-Check-By: sourceware.org
Received: from mail-ea0-f176.google.com (HELO mail-ea0-f176.google.com) (209.85.215.176)    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Fri, 28 Dec 2012 17:14:44 +0000
Received: by mail-ea0-f176.google.com with SMTP id d13so4486815eaa.21        for <gcc-help@gcc.gnu.org>; Fri, 28 Dec 2012 09:14:42 -0800 (PST)
X-Received: by 10.14.205.198 with SMTP id j46mr88067221eeo.27.1356714882575;        Fri, 28 Dec 2012 09:14:42 -0800 (PST)
Received: from kicer.localnet (095160139237.rudaslaska.vectranet.pl. [95.160.139.237])        by mx.google.com with ESMTPS id f49sm66917302eep.12.2012.12.28.09.14.41        (version=SSLv3 cipher=OTHER);        Fri, 28 Dec 2012 09:14:42 -0800 (PST)
From: Kicer <kicer86@gmail.com>
To: David Brown <david@westcontrol.com>
Cc: Andrew Haley <aph@redhat.com>, gcc-help@gcc.gnu.org
Subject: Re: problems with optimisation
Date: Fri, 28 Dec 2012 17:14:00 -0000
Message-ID: <4179792.vI8coZ6zEV@kicer>
User-Agent: KMail/4.8.5 (Linux/3.6.5-desktop-1.mga3; KDE/4.8.5; x86_64; ; )
In-Reply-To: <50DDC9F7.9070606@westcontrol.com>
References: <3594412.lfrBexjLtS@kicer> <50DDB877.9000806@redhat.com> <50DDC9F7.9070606@westcontrol.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"
X-IsSubscribed: yes
Mailing-List: contact gcc-help-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-help.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-help/>
List-Post: <mailto:gcc-help@gcc.gnu.org>
List-Help: <mailto:gcc-help-help@gcc.gnu.org>
Sender: gcc-help-owner@gcc.gnu.org
X-SW-Source: 2012-12/txt/msg00147.txt.bz2

On Friday, 28 December 2012 17:33:59, David Brown wrote:
> On 28/12/12 16:19, Andrew Haley wrote:
> > With -O2 there's much less difference:
> >
> > bar():								bar():
> >
> > .LFB14:								.LFB14:
> > 	.cfi_startproc							.cfi_startproc
> > 	movl	$3, %edx						movl	$3, %edx
> > 	in %dx, %al							in %dx, %al
> >
> > 	movb	$6, %dl					      |		movb	$4, %dl
> > 	movl	%eax, %ecx						movl	%eax, %ecx
> > 	in %dx, %al							in %dx, %al
> >
> > 							      >		movb	$6, %dl
> > 							      >		movl	%eax, %edi
> > 							      >		in %dx, %al
> >
> > 	movb	$7, %dl							movb	$7, %dl
> > 	movl	%eax, %esi						movl	%eax, %esi
> >
> > 							      >		andl	$1, %edi
> >
> > 	in %dx, %al							in %dx, %al
> >
> > 	movl	%eax, %edi				      |		movl	%eax, %r8d
> >
> > 							      >		movsbl	%sil, %esi
> >
> > 	movb	$8, %dl							movb	$8, %dl
> > 	subb	%dil, %cl				      |		subb	%r8b, %cl
> > 	in %dx, %al							in %dx, %al
> >
> > 	andl	$16, %esi				      |		addl	%edi, %ecx
> >
> > 							      >		testb	$16, %sil
> >
> > 	setne	%dl							setne	%dl
> >
> > 							      >		andl	$1, %esi
> >
> > 	addl	%edx, %ecx						addl	%edx, %ecx
> >
> > 							      >		subb	%sil, %cl
> >
> > 	testb	$16, %al						testb	$16, %al
> > 	setne	%al							setne	%al
> > 	subb	%al, %cl						subb	%al, %cl
> > 	movl	%ecx, %eax						movl	%ecx, %eax
> > 	ret								ret
> >
> > Without inlining GCC can't tell what your program is doing, and by using
> > -Os you're preventing GCC from inlining.
> >=20
> > Andrew.
>
> There are normally good reasons for picking -Os rather than -O2 for
> small microcontrollers (the OP is targeting AVRs, which typically have
> quite small program flash memories).
>
> So the solution here is to manually declare the various functions as
> "inline" (or at least "static", so that the compiler will inline them
> automatically).  Very often, code that manipulates bits is horrible on a
> target like the AVR if the function is not inline, and the compiler has
> the bit number(s) as variables - but with inline code generation and
> constant folding, you end up with only an instruction or two for
> compile-time constant bit numbers.
>
> (To the OP) - also note that there can be significant differences in the
> types of code generation and optimisations for different backends.  I
> assume you posted x86 assembly because you thought it would be more
> familiar to people on this list, but I think it would be more important
> to show the real assembly from the target you are using as you might see
> different optimisations or missed optimisations.
>
> Finally, there is a mailing list dedicated to gcc on the avr - it might
> be worth posting there too, especially if you think the issue is
> avr-specific.
>
> David

David: you are right - I used x86 due to its popularity ;)

In my real case I'm observing weird things (speaking of inlining):

1. When I use -Os and inline functions in my code, gcc doesn't inline the
code (and, AFAIR, it generates a warning that it won't inline because the
code would grow).
The code looks funny then:

00000044 <_ZNK7OneWire14InterruptBasedILt56ELh4EE10releaseBusEv.isra.0.1569.1517>:
  44:	bc 98       	cbi	0x17, 4	; 23
  46:	08 95       	ret


plus a few calls like:
rcall	.-262    	; 0x44 <_ZNK7OneWire14InterruptBasedILt56ELh4EE10releaseBusEv.isra.0.1569.1517>


Those calls are completely useless, as 'cbi' could be placed instead of
them, and the whole function actually consists of 1 instruction (apart from
ret). This is quite important for me as I lose a certain number of clock
ticks here :)

2. When I use -Os and the always_inline attribute, I get messy code like in
my first message (the program gets bigger by 70% and uses 2-3x more stack,
which is half of the available memory).
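
For reference, the symbol in point 1 demangles to
OneWire::InterruptBased<(unsigned short)56, (unsigned char)4>::releaseBus() const.
Below is a minimal sketch of what such a member could look like with the
always_inline attribute from point 2. Everything in the class body is an
assumption made for illustration, not the real code: the fake_io array is a
host-side stand-in for the AVR I/O registers, and the direction register is
assumed to sit one address below the port address given as the first
template argument.

```cpp
#include <cstdint>

// Hypothetical stand-in for the AVR register space so the sketch runs on a
// host; on a real AVR these would be memory-mapped I/O registers.
static volatile std::uint8_t fake_io[0x60];

namespace OneWire {

// PortAddr and Pin are compile-time constants, so the bit mask below can be
// folded into a constant when the call is inlined.
template <std::uint16_t PortAddr, std::uint8_t Pin>
struct InterruptBased {
    // always_inline is a GCC attribute; it asks for inlining even at -Os.
    __attribute__((always_inline)) inline void releaseBus() const {
        // Assumption: the direction register is at PortAddr - 1.
        // Clearing the pin's bit makes the pin an input (bus released).
        fake_io[PortAddr - 1] = static_cast<std::uint8_t>(
            fake_io[PortAddr - 1] & ~(1u << Pin));
    }
};

}  // namespace OneWire
```

With both template arguments constant, an inlined call like this can in
principle reduce to a single bit-clear instruction such as the 'cbi' shown
above; whether gcc actually does that under -Os depends on the gcc version
and on the surrounding code.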


It's hard to post the whole AVR program here as it's big, and it's difficult
to produce a smaller example, because it only gets messy when the program
gets bigger.

Andrew: it's inconvenient to use -O2, as -Os produces a program whose size
is 30% of -O2's result.

regards

-- 
Michał Walenciak
gmail.com kicer86
http://kicer.sileman.net.pl
gg: 3729519