From: "rguenther at suse dot de"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/53726] [4.8 Regression] aes test performance drop for eembc_2_0_peak_32
Date: Thu, 21 Jun 2012 08:47:00 -0000

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53726

--- Comment #18 from rguenther at suse dot de 2012-06-21 08:46:11 UTC ---
On Wed, 20 Jun 2012, hjl.tools at gmail dot com wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53726
>
> --- Comment #17 from H.J. Lu 2012-06-20 15:36:09 UTC ---
> (In reply to comment #16)
> > But I am not sure if a good library implementation shouldn't always be
> > preferable to a byte-wise copy.  We could at least try to envision a way
> > to retain and use the knowledge that the size is at most 8 when expanding
> > the memcpy (with AVX we could use a masked store for example - quite fancy).
>
> string/memory functions in libc can be much faster than the ones generated
> by GCC unless the size is very small, PR 43052.

Yes.  The question is what "very small" is and how we can possibly detect
"very small".  For this testcase we can derive an upper bound on the size,
which is 8, but the size is not constant.  I think unless we know we can
expand the variable-size memcpy inline with, say, three CPU instructions,
there is no reason not to call memcpy.  Thus, if the CPU could do

  tem  = unaligned-load-8-bytes-from-src-and-ignore-faults;
  mask = generate mask from size
  store-unaligned-8-bytes-with-mask

then expanding the memcpy call inline would be a win, I suppose.  AVX has
VMASKMOV, but I'm not sure using that for sizes <= 16 bytes is profitable.
Note that from the specs of VMASKMOV it seems the memory operands need to
be aligned and the mask does not support byte granularity, which would
leave us with inline expanding only the case of an at most 2-byte memcpy.

Of course, there is currently no way to record an upper bound on the size
(we do not retain value-range information - but we of course should).
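
For illustration only, a rough C sketch of the load/mask/store sequence
above, written with the SSE2 byte-granular masked store MASKMOVDQU
(_mm_maskmoveu_si128) rather than the AVX VMASKMOV forms, since VMASKMOV
lacks byte granularity.  The helper name is made up, the unaligned 16-byte
load is assumed not to fault past the end of src, and MASKMOVDQU's
non-temporal store hint would probably make this a loss for small copies
in practice - it only shows the shape of code an expander would have to emit,
not what GCC actually generates:

/* Illustrative sketch (not GCC output): byte-masked copy of n <= 16 bytes.
   Assumes reading 16 bytes from src cannot fault.  */

#include <emmintrin.h>

static void
copy_up_to_16 (void *dst, const void *src, unsigned int n)
{
  /* tem = unaligned-load-16-bytes-from-src (must not fault).  */
  __m128i data = _mm_loadu_si128 ((const __m128i *) src);

  /* mask = generate mask from size: byte i is selected iff i < n.  */
  const __m128i idx = _mm_setr_epi8 (0, 1, 2, 3, 4, 5, 6, 7,
                                     8, 9, 10, 11, 12, 13, 14, 15);
  __m128i mask = _mm_cmpgt_epi8 (_mm_set1_epi8 ((char) n), idx);

  /* store-unaligned-bytes-with-mask: MASKMOVDQU writes only the bytes of
     data whose corresponding mask byte has its MSB set.  */
  _mm_maskmoveu_si128 (data, mask, (char *) dst);
}

For the testcase here the derived bound is 8, so a scalar 8-byte
read-modify-write would also do, but the same questions about faulting
loads and store granularity apply.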