From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 30650 invoked by alias); 10 Oct 2009 01:15:46 -0000 Received: (qmail 30588 invoked by uid 48); 10 Oct 2009 01:15:32 -0000 Date: Sat, 10 Oct 2009 01:15:00 -0000 Message-ID: <20091010011532.30587.qmail@sourceware.org> X-Bugzilla-Reason: CC References: Subject: [Bug libstdc++/41351] std::rotate on RAI does not conform to ISO complexity requirement In-Reply-To: Reply-To: gcc-bugzilla@gcc.gnu.org To: gcc-bugs@gcc.gnu.org From: "potswa at mac dot com" Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-bugs-owner@gcc.gnu.org X-SW-Source: 2009-10/txt/msg00854.txt.bz2 ------- Comment #32 from potswa at mac dot com 2009-10-10 01:15 ------- I have little experience in this field, so you would probably be a better judge of the best alternative to complete revision. My suggested code is a complete rewrite, based on creating a new algorithm from first principles. There's no going halfway… there's the version I posted, without special cases for left/right by one, but that's not so much safer as merely inferior. Now that I've reread the forward iterator version and realized that it's the same as my algorithm, less my right-rotating case, countdown-style loops, and std::copy calls, I can see that its performance is nearly as good. If you like, you can benchmark it against the bidirectional and RAI cases. If you like the gain from simply deleting the bidirectional and RAI functions, you might check in that change. It is ironic that the highest performing (at least for large n) algorithm is the one which never gets used (as forward iterators are nonexistent among the STL classes). As currently written, the RAI algorithm accesses memory in long strides which are likely to thrash the cache and cause the worst possible memory bottleneck for large and coprime k. The bidirectional algorithm accesses memory in linear fashion, but reads and writes each location twice and furthermore performs up to twice as many swaps as necessary. The forward iterator algorithm touches memory locations mostly only once and always sequentially, locations which are touched repeatedly are spatially local and hence cached, and it always performs the minimum number of swaps. Of the current code, it gets the most fundamentals correct. My impression is that any new code is inherently "untrustworthy." I haven't benchmarked this suggestion, but I'm not really keen to check it in myself ;v) . Feel free if you guys like… -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41351