From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugs-return-296292-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 30650 invoked by alias); 10 Oct 2009 01:15:46 -0000
Received: (qmail 30588 invoked by uid 48); 10 Oct 2009 01:15:32 -0000
Date: Sat, 10 Oct 2009 01:15:00 -0000
Message-ID: <20091010011532.30587.qmail@sourceware.org>
X-Bugzilla-Reason: CC
References: <bug-41351-18186@http.gcc.gnu.org/bugzilla/>
Subject: [Bug libstdc++/41351] std::rotate on RAI does not conform to ISO complexity requirement
In-Reply-To: <bug-41351-18186@http.gcc.gnu.org/bugzilla/>
Reply-To: gcc-bugzilla@gcc.gnu.org
To: gcc-bugs@gcc.gnu.org
From: "potswa at mac dot com" <gcc-bugzilla@gcc.gnu.org>
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2009-10/txt/msg00854.txt.bz2


------- Comment #32 from potswa at mac dot com  2009-10-10 01:15 -------
I have little experience in this field, so you would probably be a better judge
of the best alternative to complete revision. My suggested code is a complete
rewrite, based on creating a new algorithm from first principles. There's no
going halfway there's the version I posted, without special cases for
left/right by one, but that's not so much safer as merely inferior.

Now that I've reread the forward iterator version and realized that it's the
same as my algorithm, less my right-rotating case, countdown-style loops, and
std::copy calls, I can see that its performance is nearly as good. If you like,
you can benchmark it against the bidirectional and RAI cases.

If you like the gain from simply deleting the bidirectional and RAI functions,
you might check in that change. It is ironic that the highest performing (at
least for large n) algorithm is the one which never gets used (as forward
iterators are nonexistent among the STL classes).

As currently written, the RAI algorithm accesses memory in long strides which
are likely to thrash the cache and cause the worst possible memory bottleneck
for large and coprime k. The bidirectional algorithm accesses memory in linear
fashion, but reads and writes each location twice and furthermore performs up
to twice as many swaps as necessary.

The forward iterator algorithm touches memory locations mostly only once and
always sequentially, locations which are touched repeatedly are spatially local
and hence cached, and it always performs the minimum number of swaps. Of the
current code, it gets the most fundamentals correct.

My impression is that any new code is inherently "untrustworthy." I haven't
benchmarked this suggestion, but I'm not really keen to check it in myself ;v)
. Feel free if you guys like


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41351