From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-help-return-22762-listarch-gcc-help=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 20489 invoked by alias); 6 Feb 2006 10:18:19 -0000
Received: (qmail 20479 invoked by uid 22791); 6 Feb 2006 10:18:18 -0000
X-Spam-Check-By: sourceware.org
Received: from Unknown (HELO mxout5.netvision.net.il) (194.90.9.29)     by sourceware.org (qpsmtpd/0.31) with ESMTP; Mon, 06 Feb 2006 10:18:16 +0000
Received: from [192.168.0.202] ([62.0.88.5]) by mxout5.netvision.net.il  (Sun Java System Messaging Server 6.1 HotFix 0.11 (built Jan 28 2005))  with ESMTPA id <0IU9004ZOGM2UN10@mxout5.netvision.net.il> for  gcc-help@gcc.gnu.org; Mon, 06 Feb 2006 12:18:03 +0200 (IST)
Date: Mon, 06 Feb 2006 10:18:00 -0000
From: Yaro Pollak <yarop@altair-semi.com>
Subject: Re: Unaligned access to packed structs on ppc405
In-reply-to: <200602052102.k15L2BD28758@makai.watson.ibm.com>
To: David Edelsohn <dje@watson.ibm.com>, gcc-help@gcc.gnu.org
Message-id: <43E72258.708@altair-semi.com>
MIME-version: 1.0
Content-type: text/plain; charset=ISO-8859-1; format=flowed
Content-transfer-encoding: 7BIT
References: <4D87F853B8020F4888896B1507DC0F09026798@mail2.netezza.com>  <200602052102.k15L2BD28758@makai.watson.ibm.com>
User-Agent: Thunderbird 1.5 (Windows/20051201)
X-IsSubscribed: yes
Mailing-List: contact gcc-help-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Archive: <http://gcc.gnu.org/ml/gcc-help/>
List-Post: <mailto:gcc-help@gcc.gnu.org>
List-Help: <mailto:gcc-help-help@gcc.gnu.org>
Sender: gcc-help-owner@gcc.gnu.org
X-SW-Source: 2006-02/txt/msg00049.txt.bz2

What seems odd to me is that accesses to packed structures are inherently 
less efficient than accesses to non-packed structures.
In my example, the 3 lbz instructions instead of one lwz require 3 
memory accesses instead of 1, that is, a penalty of 2 extra memory accesses 
over the slow bus. In addition, there is an extra penalty when 
the bit field crosses a byte boundary (as in my example), where GCC must 
generate extra code to "or" those bytes together, which, BTW, in my opinion 
contradicts what you wrote earlier:

David Edelsohn wrote:
" The lbz has to do with the size and the packed alignment. With the 
packed structure, GCC chooses the smallest memory access that covers the 
bitfield. Once GCC has chosen bytes, it cannot merge the loads together. 
If the structure were not declared packed, GCC would use wider loads 
with masking, and then determine that the loads refer to the same object."

In my case it shouldn't have chosen a byte access, because a single byte 
doesn't cover a bitfield that spans a byte boundary. I don't know whether 
what GCC does is "Right", and I guess that if it was implemented in 4.1, 
somebody decided that it was "Right". But if the generated code has 3 times 
the instruction count and 3 times the memory accesses, for no apparent 
reason, then I can't see why anyone would want this behavior. The code 
produced by 4.0.1 for the same structure, when it is not accessed through a 
pointer, is just fine, so why break it like that? Something just doesn't 
seem right, I'm sorry.
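
To make the comparison concrete, here is a minimal sketch of the kind of 
test case I mean. It is not the struct from my original report: the names 
hdr_packed, hdr_plain, read_b_packed and read_b_plain are made up for 
illustration, and the 4/12/16 bit split is just an assumption chosen so 
that field b crosses a byte boundary. Both structs are meant to have the 
same layout; only the packed attribute (and with it the alignment) differs.

```c
/* Hypothetical example: two structs with the same bitfield layout,
 * one declared packed and one not.  Field b is 12 bits wide and starts
 * at bit 4, so it crosses a byte boundary. */

struct __attribute__((packed)) hdr_packed {
    unsigned int a : 4;
    unsigned int b : 12;
    unsigned int c : 16;
};

struct hdr_plain {
    unsigned int a : 4;
    unsigned int b : 12;
    unsigned int c : 16;
};

/* Both accessors go through a pointer, so the compiler only knows the
 * alignment implied by the pointed-to type. */
unsigned int read_b_packed(const struct hdr_packed *p)
{
    return p->b;
}

unsigned int read_b_plain(const struct hdr_plain *p)
{
    return p->b;
}
```

Compiling this with something like "gcc -O2 -S" for a PowerPC target and 
comparing the two accessors should show whether the packed version is 
read with byte loads and the plain version with a single word load. The 
exact instructions will depend on the GCC version and target options, so 
I am not claiming any particular output here.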

I think I can summarize it by saying that if it's less efficient then 
there is no justification for it.

Yaro


David Edelsohn wrote:
>>>>>> John Yates writes:
>
> John> Do I read this correctly?  Are you truly saying that two structs
> John> with identical layout will trigger different code sequences just
> John> because one was declared packed?
>
> 	Yes.  Why is that strange?  attribute packed assigns the smallest
> possible alignment so that the compiler composes the layout of the
> structure or bitfield in the most compact form possible.  Even if the
> layout produced is the same, the smaller alignment is carried around with
> the fields and causes the compiler to use more conservative access
> operations. 
>
> David
>
>
>