From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugs-return-422564-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 8284 invoked by alias); 18 May 2013 14:04:55 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 8189 invoked by uid 48); 18 May 2013 14:04:47 -0000
From: "msharov at users dot sourceforge.net" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/23684] Combine stores for non strict alignment targets
Date: Sat, 18 May 2013 14:04:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: rtl-optimization
X-Bugzilla-Version: 4.1.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: enhancement
X-Bugzilla-Who: msharov at users dot sourceforge.net
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID: <bug-23684-4-FvuVEprd0S@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-23684-4@http.gcc.gnu.org/bugzilla/>
References: <bug-23684-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2013-05/txt/msg01237.txt.bz2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23684

--- Comment #12 from msharov at users dot sourceforge.net ---
I'd like to add that this is not some corner case; this is a very common issue.
In my own projects, the compiler's inability to combine stores is the single
largest reason for using inline assembly and raw casts. Pretty much every time
I have an object 8 or 16 bytes in size, I end up writing a zeroing ctor, copy
ctor, and operator= that use full-object memory access. That's a cast to uint64_t
for 8 bytes, and movups/movaps for 16 bytes. It also shows up when writing raw
protocol data, such as X calls, where it is very common to write several
constants in succession. The last time I checked, forcing whole-object moves in
these cases resulted in a project-wide code size reduction of about 10%. Unfortunately, it
also causes a variety of aliasing pessimizations, so I also have to test
including or not including each of the above functions to get the smallest code
size. It would be a very big deal if the optimizer could do this.
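
To illustrate the pattern described above, here is a minimal sketch of an
8-byte object with field-by-field and whole-object versions of the same
operations. The type Rect16 and all function names are hypothetical, made up
for this example. It uses memcpy rather than a raw uint64_t cast, on the
assumption that a fixed-size memcpy is lowered to a single 64-bit move by an
optimizing compiler on a non-strict-alignment target; that assumption should
be checked against the generated assembly for the compiler in use.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical 8-byte object; the name and fields are illustrative only.
struct Rect16 {
    std::uint16_t x, y, w, h;
};
static_assert(sizeof(Rect16) == 8, "expected four 16-bit fields, no padding");

// Field-by-field zeroing: four separate 16-bit stores unless the
// optimizer combines them into one wider store.
inline void zero_fieldwise(Rect16& r) {
    r.x = 0;
    r.y = 0;
    r.w = 0;
    r.h = 0;
}

// Whole-object zeroing: one 64-bit value copied over the object.
// memcpy stands in for the raw cast and avoids the aliasing problem.
inline void zero_whole(Rect16& r) {
    const std::uint64_t zero = 0;
    std::memcpy(&r, &zero, sizeof r);
}

// Whole-object copy through a 64-bit temporary.
inline void copy_whole(Rect16& dst, const Rect16& src) {
    std::uint64_t bits;
    std::memcpy(&bits, &src, sizeof bits);
    std::memcpy(&dst, &bits, sizeof bits);
}

// Small self-check: both zeroing variants must leave identical bytes.
inline void check_equivalence() {
    Rect16 a{1, 2, 3, 4};
    Rect16 b{5, 6, 7, 8};
    zero_fieldwise(a);
    zero_whole(b);
    assert(std::memcmp(&a, &b, sizeof a) == 0);
}
```

Both zeroing functions produce the same bytes; the difference, if any, is only
in how many store instructions the compiler emits, which is what this bug is
about. Comparing the output of -S for the two variants shows whether the
stores were combined.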