From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <5583FFEE.6060106@redhat.com>
Date: Fri, 19 Jun 2015 11:41:00 -0000
From: Nicholas Clifton <nickc@redhat.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0
MIME-Version: 1.0
To: Tristan Gingold <gingold@adacore.com>
CC: binutils@sourceware.org, gdb-patches@sourceware.org
Subject: Re: RFC: Prevent disassembly beyond symbolic boundaries
References: <87lhfhynoz.fsf@redhat.com> <3D81F97D-90EA-4769-8381-514BB6E81E3F@adacore.com>
In-Reply-To: <3D81F97D-90EA-4769-8381-514BB6E81E3F@adacore.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
X-SW-Source: 2015-06/txt/msg00403.txt.bz2

Hi Tristan,

>>   This will disassemble as:
>>
>>     0000000000000000 <foo>:
>>        0:   24 2f                   and    $0x2f,%al
>>        2:   83 0f ba                orl    $0xffffffba,(%rdi)
>>
>>     0000000000000003 <bar>:
>>        3:   0f ba e2 03             bt     $0x3,%edx
>>
>>   Note how the instruction decoded at address 0x2 has stolen two bytes
>>   from "bar", but these bytes are also decoded (correctly this time) as
>>   part of the first instruction of bar.

> I am curious.  Why do you think it was a problem ?

Strangely enough, this actually causes regressions with the perf tool's 
testsuite:

   https://bugzilla.redhat.com/show_bug.cgi?id=1054767

What happens is that perf test 21 runs objdump on a binary, *parses* 
this output and compares that to the actual bytes in the binary. 
Because of the overrun feature shown above you actually get more bytes 
displayed in objdump's output than actually exist in the binary and so 
the perf test fails.
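
As a rough illustration of the mismatch (this is not the perf test's 
actual code, which I have not reproduced here), the following Python 
sketch parses the listing quoted above and compares the number of bytes 
displayed with the number of distinct addresses they cover.  The regex 
and the names LISTING, INSN and parse are made up for this example and 
assume the space-separated layout shown in this mail.

```python
import re

# The listing quoted earlier in this thread, as objdump printed it.
LISTING = """\
0000000000000000 <foo>:
   0:   24 2f                   and    $0x2f,%al
   2:   83 0f ba                orl    $0xffffffba,(%rdi)

0000000000000003 <bar>:
   3:   0f ba e2 03             bt     $0x3,%edx
"""

# One instruction line: hex address, colon, hex byte pairs, then a mnemonic.
INSN = re.compile(r"^\s*([0-9a-f]+):\s+((?:[0-9a-f]{2} )+)\s*\S")

def parse(listing):
    """Return (address, byte_count) for every instruction line."""
    out = []
    for line in listing.splitlines():
        m = INSN.match(line)
        if m:  # symbol headers and blank lines do not match
            out.append((int(m.group(1), 16), len(m.group(2).split())))
    return out

insns = parse(LISTING)
displayed = sum(n for _, n in insns)                         # bytes printed
covered = {a + i for a, n in insns for i in range(n)}        # real addresses
print(displayed, len(covered))  # prints "9 7"
```

Nine bytes are displayed but only seven addresses (0x0 to 0x6) are 
covered, because the bytes at 0x3 and 0x4 appear twice.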


> Even if there is a symbol in the middle of an instruction, I'd like
> to understand what the processor will execute.

Except that even the currently displayed disassembly is not what the 
processor would execute.  In the example above the processor would 
execute the ORL instruction starting at address 0x2, but it would not 
continue on to execute the BT instruction at address 0x3.  Instead it 
would start decoding from address 0x5, whatever instruction that might be...
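
To spell that out, here is a small Python sketch that follows 
sequential execution through the listing quoted above.  The 
instruction lengths come straight from that listing; the LENGTHS table 
and the fall_through helper are hypothetical names used only for 
illustration, and branches are ignored.

```python
# Lengths taken from the quoted listing: address -> instruction length.
LENGTHS = {0x0: 2, 0x2: 3, 0x3: 4}

def fall_through(start, steps):
    """Follow straight-line execution: the next instruction starts
    where the current one ends."""
    pc, visited = start, []
    for _ in range(steps):
        visited.append(pc)
        if pc not in LENGTHS:
            break  # nothing was decoded at this address in the listing
        pc += LENGTHS[pc]
    return visited

print([hex(a) for a in fall_through(0x0, 3)])  # prints ['0x0', '0x2', '0x5']
```

Address 0x3 is never visited when falling through from foo, so the BT 
line only describes what happens if something jumps to bar directly.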


>  Before the proposed
> change, it was possible, but after it isn't easy anymore.

True - but this only matters if the processor would execute from that 
piece of memory.  What if the byte(s) are actually data (e.g. a 
constant pool)?  Then it would make more sense to display the bytes as 
just byte values.

The point being that if there is a symbol that is in the middle of an 
instruction then something hinky is going on.  Either the symbol is 
misplaced or the instruction is not really an instruction or else an 
assembly programmer is being extra super clever and hiding data inside 
instructions.

How about a tweak to the patch then?  What if the -D option 
(disassemble all) disables this feature, so that the disassembled 
instruction is displayed as before, whilst the -d option (disassemble 
code) leaves it enabled?  Then if you want to see bytes as instructions 
you can use the -D option (possibly combined with -j), but if you want 
a disassembly that is more likely to contain only real instructions, 
use the -d option.  (Obviously the patch would need to be extended with 
an update to the documentation too).
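
To make the proposal concrete, here is a Python sketch of the intended 
behaviour.  It is only a model, not the patch itself: the function 
clip_to_symbols, its arguments and the "insn"/"bytes" labels are all 
invented for this example, and it assumes that an instruction cut 
short by a symbol is shown as raw byte values.

```python
def clip_to_symbols(insns, symbols, disassemble_all):
    """Return (address, length, kind) rows; kind is "insn" or "bytes".

    insns:   (address, length) pairs as a naive decoder produces them
    symbols: addresses at which a symbol is defined
    disassemble_all: True models -D, False models -d
    """
    rows = []
    for addr, length in insns:
        # Symbols that fall strictly inside this instruction's bytes.
        inside = [s for s in symbols if addr < s < addr + length]
        if inside and not disassemble_all:
            # Stop at the first boundary; show the remainder as raw bytes.
            rows.append((addr, min(inside) - addr, "bytes"))
        else:
            rows.append((addr, length, "insn"))
    return rows

insns = [(0x0, 2), (0x2, 3), (0x3, 4)]   # from the quoted listing
symbols = [0x0, 0x3]                     # foo and bar

print(clip_to_symbols(insns, symbols, disassemble_all=False))
print(clip_to_symbols(insns, symbols, disassemble_all=True))
```

In the -d model the row at 0x2 shrinks to a single raw byte, so the 
rows add up to the seven bytes that are really there; in the -D model 
the output is the same as today.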

Cheers
   Nick