public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
From: Edd Barrett <vext01@gmail.com>
To: gcc@gcc.gnu.org
Subject: GCC's switch table code generation
Date: Thu, 08 Oct 2009 01:25:00 -0000	[thread overview]
Message-ID: <a6e8c1d0910071750p7f72a1c3u76be3c29943ea13d@mail.gmail.com> (raw)

Hi there,

I am new here, so I wish to take a second to introduce myself and what
I am working on, then ultimately why I am posting on this list. Also I
will apologise if this is the wrong place to post this query. If this
is so, perhaps you could point me in the right direction.

I have just started a PhD on reverse engineering executable code at
the University of Kent in England. I arrived at this position after
taking an keen interest in operating systems and programming languages
during my undergraduate degree, where I eventually wrote an
experimental compiler. So basically I am quite new to compiler
technology, but not completely clueless.

One of the first things that became apparent when reviewing literature
on reverse engineering executable code, was that computer scientists
consider the code generated by switch statements to be awkward to draw
CFG's for. Following on from this, I have started to disassemble some
simple mock up programs. I have discovered:

 * GCC will sometimes just use simple conditional jumps, notably with -O0.
 * at -O1 and greater, depending upon the size of the case table, GCC
sometimes embeds a indirect jump table in the data segment of the
binary and uses arithmetic to lookup the address to jump to.

Next i scripted the creation of a sparse switch statement with little
clusters of cases next to each other, but with nothing in-between. The
exact test case was a switch statement (comparing an integer) with a
(differing) printf and break statement on each arm at positions:

0-9
300-319
9000-9040


What I expected was one indirect jump table for each of the 3 clusters
and some form of conditional as to which table to look in. What I got
was very different. In-fact no indirect jump tables were used at-all.
A series of conditional jumps were used. What I am confused about was
the selection of comparisons used to construct the switch. I started
to draw a tree for these findings, but obviously the tree is huge, so
I only did a bit (I need to make a tool to draw these trees really):

http://students.dec.bournemouth.ac.uk/ebarrett/files/switch-tree.png

I would be really interested to know how GCC:
 * Decides whether or not to embed tables in the data segment of the binary.
 * Selects the comparisons in the above tree.

Perhaps someone knows of a good paper/book on this topic which could
be of use, or a relevant section of code in the GCC sources (at the
moment I am overwhelmed by the source tree).

For the record, all of my tests have switched on a variable which can
not be known at compile time (argv[1]). This was to stop the optimiser
being clever and removing the switch table altogether.

My environment:
OpenBSD-4.6-current/i386
GCC-4.2.4

Sorry for the huge email.

-- 
Best Regards
Edd Barrett

http://students.dec.bournemouth.ac.uk/ebarrett

             reply	other threads:[~2009-10-08  0:50 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-10-08  1:25 Edd Barrett [this message]
2009-10-08  6:49 ` Ian Lance Taylor
2009-10-08 11:21   ` Edd Barrett

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a6e8c1d0910071750p7f72a1c3u76be3c29943ea13d@mail.gmail.com \
    --to=vext01@gmail.com \
    --cc=gcc@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).