From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=Jbdw=IQ=nextmovesoftware.com=roger@sourceware.org>
Received: from server.nextmovesoftware.com (server.nextmovesoftware.com [162.254.253.69])
	by sourceware.org (Postfix) with ESMTPS id 1D9E93858D20
	for <gcc-patches@gcc.gnu.org>; Sat,  6 Jan 2024 13:30:37 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 1D9E93858D20
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=nextmovesoftware.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=nextmovesoftware.com
ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 1D9E93858D20
Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=162.254.253.69
ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1704547839; cv=none;
	b=td4VyW04nLGp06JyxyQXwhUn9qu/zOeoWWFzILNb3r/ALYcYwyz3JkW3mXBotdnFX6Qda3uD+RAwj8LiaQnGciw70G5d8gP/keYpwYWTWlxS8sCQXFDu4I/nGIKTLjU3/9lAXdeJCtva8/eZ1/j+m+5tkC6/cOQdzUGimDLSmUc=
ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key;
	t=1704547839; c=relaxed/simple;
	bh=yMWn6scKye5Smd2b79WpDxNW+6QtOl8QM6aDhbQBZh0=;
	h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=lAhctngD6wfOHexsyhqflCz55CzCF6szizMmnLxjjfonBoNH2IEiTX5A7Q2fPWTgtQIddaUNW/rL/Zn9e0s7b6rTx4p0Nb8m/+C1LZrGocpJ9w3KaKIppHyxwiqZ+MTFzncb5RkJe2nqT678/oYHOMYPziDVKtkRKO/dAnaiIFc=
ARC-Authentication-Results: i=1; server2.sourceware.org
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
	d=nextmovesoftware.com; s=default; h=Content-Type:MIME-Version:Message-ID:
	Date:Subject:Cc:To:From:Sender:Reply-To:Content-Transfer-Encoding:Content-ID:
	Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc
	:Resent-Message-ID:In-Reply-To:References:List-Id:List-Help:List-Unsubscribe:
	List-Subscribe:List-Post:List-Owner:List-Archive;
	bh=uyDPpzm35YW5rFFGBN0TK9WGLcfhjO7FBv1kbnTRPJs=; b=oWSRUhZp8S2u8Nvoe4bjo6c+B9
	ealzmQPNJw9zaNkGnx5C/PoHBm1hMgJhnmW5FlongGZqQ48inF0Afh3Ye1gR1zNFpd66A13QY4wDe
	R3EPlA/Vp467ekwPKYRY72E+RQbjkesRMhVQlGzCjBio1QtqwFY3goFf78Ijkbv1SDEJ6M8uh8mcj
	4sPsTHecrA1taUR0JWa6GlU7Iqrzs9aN4O31XNMsbX8tkBVJkq6LXIlpabPqAi2xeoxGgFi46cEl+
	ThW58QyjxxwuBkTVlkqHX8oqnVTbiKz+0yNOBmrn243boQI0eecUUZP3We7/tm4YaBce1aUvkJM6X
	jy/W9svA==;
Received: from host109-154-238-190.range109-154.btcentralplus.com ([109.154.238.190]:59937 helo=Dell)
	by server.nextmovesoftware.com with esmtpsa  (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
	(Exim 4.96.2)
	(envelope-from <roger@nextmovesoftware.com>)
	id 1rM6l2-0000aR-1g;
	Sat, 06 Jan 2024 08:30:36 -0500
From: "Roger Sayle" <roger@nextmovesoftware.com>
To: <gcc-patches@gcc.gnu.org>
Cc: "'Uros Bizjak'" <ubizjak@gmail.com>
Subject: [x86 PATCH] PR target/113231: Improved costs in Scalar-To-Vector (STV) pass.
Date: Sat, 6 Jan 2024 13:30:34 -0000
Message-ID: <03c401da40a4$8819fe30$984dfa90$@nextmovesoftware.com>
MIME-Version: 1.0
Content-Type: multipart/mixed;
	boundary="----=_NextPart_000_03C5_01DA40A4.881A7360"
X-Mailer: Microsoft Outlook 16.0
Thread-Index: AdpApDbB5NHW6ssNQtCM9MBaqshtwQ==
Content-Language: en-gb
X-AntiAbuse: This header was added to track abuse, please include it with any abuse report
X-AntiAbuse: Primary Hostname - server.nextmovesoftware.com
X-AntiAbuse: Original Domain - gcc.gnu.org
X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]
X-AntiAbuse: Sender Address Domain - nextmovesoftware.com
X-Get-Message-Sender-Via: server.nextmovesoftware.com: authenticated_id: roger@nextmovesoftware.com
X-Authenticated-Sender: server.nextmovesoftware.com: roger@nextmovesoftware.com
X-Source: 
X-Source-Args: 
X-Source-Dir: 
X-Spam-Status: No, score=-11.4 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,KAM_SHORT,LIKELY_SPAM_BODY,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

This is a multipart message in MIME format.

------=_NextPart_000_03C5_01DA40A4.881A7360
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: 7bit


This patch improves the cost/gain calculation used during the i386 backend's
SImode/DImode scalar-to-vector (STV) conversion pass.  The current code
handles loads and stores, but doesn't consider that converting other
scalar operations with a memory destination requires an explicit load
before, and an explicit store after, the vector equivalent.

To ease the review, the significant change looks like:

         /* For operations on memory operands, include the overhead
            of explicit load and store instructions.  */
         if (MEM_P (dst))
           igain += !optimize_insn_for_size_p ()
                    ? (m * (ix86_cost->int_load[2]
                            + ix86_cost->int_store[2])
                       - (ix86_cost->sse_load[sse_cost_idx] +
                          ix86_cost->sse_store[sse_cost_idx]))
                    : -COSTS_N_BYTES (8);

The patch itself, however, is complicated by a change in indentation,
which leads to a number of lines with only whitespace changes.
For architectures where integer load/store costs are the same as
vector load/store costs, there should be no change except with -Os/-Oz.
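
As a rough illustration of the adjustment quoted above, here is a
standalone sketch of the same arithmetic.  The struct toy_costs, the
macro TOY_COSTS_N_BYTES and the function mem_dst_gain are hypothetical
names invented for this example, not GCC code; the cost fields merely
stand in for the ix86_cost entries the patch reads, and the macro is
assumed to scale sizes the way COSTS_N_BYTES does.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the few ix86_cost fields the snippet reads;
   the real cost tables are much larger.  */
struct toy_costs {
  int int_load;   /* stands in for ix86_cost->int_load[2] */
  int int_store;  /* stands in for ix86_cost->int_store[2] */
  int sse_load;   /* stands in for ix86_cost->sse_load[sse_cost_idx] */
  int sse_store;  /* stands in for ix86_cost->sse_store[sse_cost_idx] */
};

/* Assumption: size costs are scaled at two units per byte.  */
#define TOY_COSTS_N_BYTES(n) ((n) * 2)

/* Adjustment added to igain for a non-store operation whose destination
   is memory.  M is the scalar multiplier used by the surrounding code
   (assumed here to be 2 for DImode and 1 for SImode).  */
static int
mem_dst_gain (const struct toy_costs *c, int m, bool for_size)
{
  if (for_size)
    /* Fixed size penalty for the extra explicit load and store.  */
    return -TOY_COSTS_N_BYTES (8);
  /* Scalar load+store cost saved, minus vector load+store cost added.  */
  return m * (c->int_load + c->int_store) - (c->sse_load + c->sse_store);
}
```

With equal integer and vector load/store costs and m == 1, the
speed-tuned adjustment is zero, so only the size-tuned branch changes
the outcome.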

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2024-01-06  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
        PR target/113231
        * config/i386/i386-features.cc (compute_convert_gain): Include
        the overhead of explicit load and store (movd) instructions when
        converting non-store scalar operations with memory destinations.

gcc/testsuite/ChangeLog
        PR target/113231
        * gcc.target/i386/pr113231.c: New test case.


Thanks again,
Roger
--


------=_NextPart_000_03C5_01DA40A4.881A7360
Content-Type: text/plain;
	name="patchvc.txt"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment;
	filename="patchvc.txt"

diff --git a/gcc/config/i386/i386-features.cc =
b/gcc/config/i386/i386-features.cc=0A=
index 4ae3e75..3677aef 100644=0A=
--- a/gcc/config/i386/i386-features.cc=0A=
+++ b/gcc/config/i386/i386-features.cc=0A=
@@ -563,183 +563,195 @@ general_scalar_chain::compute_convert_gain ()=0A=
       else if (MEM_P (src) && REG_P (dst))=0A=
 	igain +=3D m * ix86_cost->int_load[2] - =
ix86_cost->sse_load[sse_cost_idx];=0A=
       else=0A=
-	switch (GET_CODE (src))=0A=
-	  {=0A=
-	  case ASHIFT:=0A=
-	  case ASHIFTRT:=0A=
-	  case LSHIFTRT:=0A=
-	    if (m =3D=3D 2)=0A=
-	      {=0A=
-		if (INTVAL (XEXP (src, 1)) >=3D 32)=0A=
-		  igain +=3D ix86_cost->add;=0A=
-		/* Gain for extend highpart case.  */=0A=
-		else if (GET_CODE (XEXP (src, 0)) =3D=3D ASHIFT)=0A=
-		  igain +=3D ix86_cost->shift_const - ix86_cost->sse_op;=0A=
-		else=0A=
-		  igain +=3D ix86_cost->shift_const;=0A=
-	      }=0A=
-=0A=
-	    igain +=3D ix86_cost->shift_const - ix86_cost->sse_op;=0A=
+	{=0A=
+	  /* For operations on memory operands, include the overhead=0A=
+	     of explicit load and store instructions.  */=0A=
+	  if (MEM_P (dst))=0A=
+	    igain +=3D !optimize_insn_for_size_p ()=0A=
+		     ? (m * (ix86_cost->int_load[2]=0A=
+			     + ix86_cost->int_store[2])=0A=
+			- (ix86_cost->sse_load[sse_cost_idx] +=0A=
+			   ix86_cost->sse_store[sse_cost_idx]))=0A=
+		     : -COSTS_N_BYTES (8);=0A=
 =0A=
-	    if (CONST_INT_P (XEXP (src, 0)))=0A=
-	      igain -=3D vector_const_cost (XEXP (src, 0));=0A=
-	    break;=0A=
+	  switch (GET_CODE (src))=0A=
+	    {=0A=
+	    case ASHIFT:=0A=
+	    case ASHIFTRT:=0A=
+	    case LSHIFTRT:=0A=
+	      if (m =3D=3D 2)=0A=
+		{=0A=
+		  if (INTVAL (XEXP (src, 1)) >=3D 32)=0A=
+		    igain +=3D ix86_cost->add;=0A=
+		  /* Gain for extend highpart case.  */=0A=
+		  else if (GET_CODE (XEXP (src, 0)) =3D=3D ASHIFT)=0A=
+		    igain +=3D ix86_cost->shift_const - ix86_cost->sse_op;=0A=
+		  else=0A=
+		    igain +=3D ix86_cost->shift_const;=0A=
+		}=0A=
 =0A=
-	  case ROTATE:=0A=
-	  case ROTATERT:=0A=
-	    igain +=3D m * ix86_cost->shift_const;=0A=
-	    if (TARGET_AVX512VL)=0A=
-	      igain -=3D ix86_cost->sse_op;=0A=
-	    else if (smode =3D=3D DImode)=0A=
-	      {=0A=
-		int bits =3D INTVAL (XEXP (src, 1));=0A=
-		if ((bits & 0x0f) =3D=3D 0)=0A=
-		  igain -=3D ix86_cost->sse_op;=0A=
-		else if ((bits & 0x07) =3D=3D 0)=0A=
-		  igain -=3D 2 * ix86_cost->sse_op;=0A=
-		else=0A=
-		  igain -=3D 3 * ix86_cost->sse_op;=0A=
-	      }=0A=
-	    else if (INTVAL (XEXP (src, 1)) =3D=3D 16)=0A=
-	      igain -=3D ix86_cost->sse_op;=0A=
-	    else=0A=
-	      igain -=3D 2 * ix86_cost->sse_op;=0A=
-	    break;=0A=
+	      igain +=3D ix86_cost->shift_const - ix86_cost->sse_op;=0A=
 =0A=
-	  case AND:=0A=
-	  case IOR:=0A=
-	  case XOR:=0A=
-	  case PLUS:=0A=
-	  case MINUS:=0A=
-	    igain +=3D m * ix86_cost->add - ix86_cost->sse_op;=0A=
-	    /* Additional gain for andnot for targets without BMI.  */=0A=
-	    if (GET_CODE (XEXP (src, 0)) =3D=3D NOT=0A=
-		&& !TARGET_BMI)=0A=
-	      igain +=3D m * ix86_cost->add;=0A=
-=0A=
-	    if (CONST_INT_P (XEXP (src, 0)))=0A=
-	      igain -=3D vector_const_cost (XEXP (src, 0));=0A=
-	    if (CONST_INT_P (XEXP (src, 1)))=0A=
-	      igain -=3D vector_const_cost (XEXP (src, 1));=0A=
-	    if (MEM_P (XEXP (src, 1)))=0A=
-	      {=0A=
-		if (optimize_insn_for_size_p ())=0A=
-		  igain -=3D COSTS_N_BYTES (m =3D=3D 2 ? 3 : 5);=0A=
-		else=0A=
-		  igain +=3D m * ix86_cost->int_load[2]=0A=
-			   - ix86_cost->sse_load[sse_cost_idx];=0A=
-	      }=0A=
-	    break;=0A=
+	      if (CONST_INT_P (XEXP (src, 0)))=0A=
+		igain -=3D vector_const_cost (XEXP (src, 0));=0A=
+	      break;=0A=
 =0A=
-	  case NEG:=0A=
-	  case NOT:=0A=
-	    igain -=3D ix86_cost->sse_op + COSTS_N_INSNS (1);=0A=
+	    case ROTATE:=0A=
+	    case ROTATERT:=0A=
+	      igain +=3D m * ix86_cost->shift_const;=0A=
+	      if (TARGET_AVX512VL)=0A=
+		igain -=3D ix86_cost->sse_op;=0A=
+	      else if (smode =3D=3D DImode)=0A=
+		{=0A=
+		  int bits =3D INTVAL (XEXP (src, 1));=0A=
+		  if ((bits & 0x0f) =3D=3D 0)=0A=
+		    igain -=3D ix86_cost->sse_op;=0A=
+		  else if ((bits & 0x07) =3D=3D 0)=0A=
+		    igain -=3D 2 * ix86_cost->sse_op;=0A=
+		  else=0A=
+		    igain -=3D 3 * ix86_cost->sse_op;=0A=
+		}=0A=
+	      else if (INTVAL (XEXP (src, 1)) =3D=3D 16)=0A=
+		igain -=3D ix86_cost->sse_op;=0A=
+	      else=0A=
+		igain -=3D 2 * ix86_cost->sse_op;=0A=
+	      break;=0A=
 =0A=
-	    if (GET_CODE (XEXP (src, 0)) !=3D ABS)=0A=
-	      {=0A=
+	    case AND:=0A=
+	    case IOR:=0A=
+	    case XOR:=0A=
+	    case PLUS:=0A=
+	    case MINUS:=0A=
+	      igain +=3D m * ix86_cost->add - ix86_cost->sse_op;=0A=
+	      /* Additional gain for andnot for targets without BMI.  */=0A=
+	      if (GET_CODE (XEXP (src, 0)) =3D=3D NOT=0A=
+		  && !TARGET_BMI)=0A=
 		igain +=3D m * ix86_cost->add;=0A=
-		break;=0A=
-	      }=0A=
-	    /* FALLTHRU */=0A=
-=0A=
-	  case ABS:=0A=
-	  case SMAX:=0A=
-	  case SMIN:=0A=
-	  case UMAX:=0A=
-	  case UMIN:=0A=
-	    /* We do not have any conditional move cost, estimate it as a=0A=
-	       reg-reg move.  Comparisons are costed as adds.  */=0A=
-	    igain +=3D m * (COSTS_N_INSNS (2) + ix86_cost->add);=0A=
-	    /* Integer SSE ops are all costed the same.  */=0A=
-	    igain -=3D ix86_cost->sse_op;=0A=
-	    break;=0A=
 =0A=
-	  case COMPARE:=0A=
-	    if (XEXP (src, 1) !=3D const0_rtx)=0A=
-	      {=0A=
-		/* cmp vs. pxor;pshufd;ptest.  */=0A=
-		igain +=3D COSTS_N_INSNS (m - 3);=0A=
-	      }=0A=
-	    else if (GET_CODE (XEXP (src, 0)) !=3D AND)=0A=
-	      {=0A=
-		/* test vs. pshufd;ptest.  */=0A=
-		igain +=3D COSTS_N_INSNS (m - 2);=0A=
-	      }=0A=
-	    else if (GET_CODE (XEXP (XEXP (src, 0), 0)) !=3D NOT)=0A=
-	      {=0A=
-		/* and;test vs. pshufd;ptest.  */=0A=
-		igain +=3D COSTS_N_INSNS (2 * m - 2);=0A=
-	      }=0A=
-	    else if (TARGET_BMI)=0A=
-	      {=0A=
-		/* andn;test vs. pandn;pshufd;ptest.  */=0A=
-		igain +=3D COSTS_N_INSNS (2 * m - 3);=0A=
-	      }=0A=
-	    else=0A=
-	      {=0A=
-		/* not;and;test vs. pandn;pshufd;ptest.  */=0A=
-		igain +=3D COSTS_N_INSNS (3 * m - 3);=0A=
-	      }=0A=
-	    break;=0A=
+	      if (CONST_INT_P (XEXP (src, 0)))=0A=
+		igain -=3D vector_const_cost (XEXP (src, 0));=0A=
+	      if (CONST_INT_P (XEXP (src, 1)))=0A=
+		igain -=3D vector_const_cost (XEXP (src, 1));=0A=
+	      if (MEM_P (XEXP (src, 1)))=0A=
+		{=0A=
+		  if (optimize_insn_for_size_p ())=0A=
+		    igain -=3D COSTS_N_BYTES (m =3D=3D 2 ? 3 : 5);=0A=
+		  else=0A=
+		    igain +=3D m * ix86_cost->int_load[2]=0A=
+			     - ix86_cost->sse_load[sse_cost_idx];=0A=
+		}=0A=
+	      break;=0A=
 =0A=
-	  case CONST_INT:=0A=
-	    if (REG_P (dst))=0A=
-	      {=0A=
-		if (optimize_insn_for_size_p ())=0A=
-		  {=0A=
-		    /* xor (2 bytes) vs. xorps (3 bytes).  */=0A=
-		    if (src =3D=3D const0_rtx)=0A=
-		      igain -=3D COSTS_N_BYTES (1);=0A=
-		    /* movdi_internal vs. movv2di_internal.  */=0A=
-		    /* =3D> mov (5 bytes) vs. movaps (7 bytes).  */=0A=
-		    else if (x86_64_immediate_operand (src, SImode))=0A=
-		      igain -=3D COSTS_N_BYTES (2);=0A=
-		    else=0A=
-		      /* ??? Larger immediate constants are placed in the=0A=
-			 constant pool, where the size benefit/impact of=0A=
-			 STV conversion is affected by whether and how=0A=
-			 often each constant pool entry is shared/reused.=0A=
-			 The value below is empirically derived from the=0A=
-			 CSiBE benchmark (and the optimal value may drift=0A=
-			 over time).  */=0A=
-		      igain +=3D COSTS_N_BYTES (0);=0A=
-		  }=0A=
-		else=0A=
-		  {=0A=
-		    /* DImode can be immediate for TARGET_64BIT=0A=
-		       and SImode always.  */=0A=
-		    igain +=3D m * COSTS_N_INSNS (1);=0A=
-		    igain -=3D vector_const_cost (src);=0A=
-		  }=0A=
-	      }=0A=
-	    else if (MEM_P (dst))=0A=
-	      {=0A=
-		igain +=3D (m * ix86_cost->int_store[2]=0A=
-			  - ix86_cost->sse_store[sse_cost_idx]);=0A=
-		igain -=3D vector_const_cost (src);=0A=
-	      }=0A=
-	    break;=0A=
+	    case NEG:=0A=
+	    case NOT:=0A=
+	      igain -=3D ix86_cost->sse_op + COSTS_N_INSNS (1);=0A=
 =0A=
-	  case VEC_SELECT:=0A=
-	    if (XVECEXP (XEXP (src, 1), 0, 0) =3D=3D const0_rtx)=0A=
-	      {=0A=
-		// movd (4 bytes) replaced with movdqa (4 bytes).=0A=
-		if (!optimize_insn_for_size_p ())=0A=
-		  igain +=3D ix86_cost->sse_to_integer - ix86_cost->xmm_move;=0A=
-	      }=0A=
-	    else=0A=
-	      {=0A=
-		// pshufd; movd replaced with pshufd.=0A=
-		if (optimize_insn_for_size_p ())=0A=
-		  igain +=3D COSTS_N_BYTES (4);=0A=
-		else=0A=
-		  igain +=3D ix86_cost->sse_to_integer;=0A=
-	      }=0A=
-	    break;=0A=
+	      if (GET_CODE (XEXP (src, 0)) !=3D ABS)=0A=
+		{=0A=
+		  igain +=3D m * ix86_cost->add;=0A=
+		  break;=0A=
+		}=0A=
+	      /* FALLTHRU */=0A=
+=0A=
+	    case ABS:=0A=
+	    case SMAX:=0A=
+	    case SMIN:=0A=
+	    case UMAX:=0A=
+	    case UMIN:=0A=
+	      /* We do not have any conditional move cost, estimate it as a=0A=
+		 reg-reg move.  Comparisons are costed as adds.  */=0A=
+	      igain +=3D m * (COSTS_N_INSNS (2) + ix86_cost->add);=0A=
+	      /* Integer SSE ops are all costed the same.  */=0A=
+	      igain -=3D ix86_cost->sse_op;=0A=
+	      break;=0A=
 =0A=
-	  default:=0A=
-	    gcc_unreachable ();=0A=
-	  }=0A=
+	    case COMPARE:=0A=
+	      if (XEXP (src, 1) !=3D const0_rtx)=0A=
+		{=0A=
+		  /* cmp vs. pxor;pshufd;ptest.  */=0A=
+		  igain +=3D COSTS_N_INSNS (m - 3);=0A=
+		}=0A=
+	      else if (GET_CODE (XEXP (src, 0)) !=3D AND)=0A=
+		{=0A=
+		  /* test vs. pshufd;ptest.  */=0A=
+		  igain +=3D COSTS_N_INSNS (m - 2);=0A=
+		}=0A=
+	      else if (GET_CODE (XEXP (XEXP (src, 0), 0)) !=3D NOT)=0A=
+		{=0A=
+		  /* and;test vs. pshufd;ptest.  */=0A=
+		  igain +=3D COSTS_N_INSNS (2 * m - 2);=0A=
+		}=0A=
+	      else if (TARGET_BMI)=0A=
+		{=0A=
+		  /* andn;test vs. pandn;pshufd;ptest.  */=0A=
+		  igain +=3D COSTS_N_INSNS (2 * m - 3);=0A=
+		}=0A=
+	      else=0A=
+		{=0A=
+		  /* not;and;test vs. pandn;pshufd;ptest.  */=0A=
+		  igain +=3D COSTS_N_INSNS (3 * m - 3);=0A=
+		}=0A=
+	      break;=0A=
+=0A=
+	    case CONST_INT:=0A=
+	      if (REG_P (dst))=0A=
+		{=0A=
+		  if (optimize_insn_for_size_p ())=0A=
+		    {=0A=
+		      /* xor (2 bytes) vs. xorps (3 bytes).  */=0A=
+		      if (src =3D=3D const0_rtx)=0A=
+			igain -=3D COSTS_N_BYTES (1);=0A=
+		      /* movdi_internal vs. movv2di_internal.  */=0A=
+		      /* =3D> mov (5 bytes) vs. movaps (7 bytes).  */=0A=
+		      else if (x86_64_immediate_operand (src, SImode))=0A=
+			igain -=3D COSTS_N_BYTES (2);=0A=
+		      else=0A=
+			/* ??? Larger immediate constants are placed in the=0A=
+			   constant pool, where the size benefit/impact of=0A=
+			   STV conversion is affected by whether and how=0A=
+			   often each constant pool entry is shared/reused.=0A=
+			   The value below is empirically derived from the=0A=
+			   CSiBE benchmark (and the optimal value may drift=0A=
+			   over time).  */=0A=
+			igain +=3D COSTS_N_BYTES (0);=0A=
+		    }=0A=
+		  else=0A=
+		    {=0A=
+		      /* DImode can be immediate for TARGET_64BIT=0A=
+			 and SImode always.  */=0A=
+		      igain +=3D m * COSTS_N_INSNS (1);=0A=
+		      igain -=3D vector_const_cost (src);=0A=
+		    }=0A=
+		}=0A=
+	      else if (MEM_P (dst))=0A=
+		{=0A=
+		  igain +=3D (m * ix86_cost->int_store[2]=0A=
+			    - ix86_cost->sse_store[sse_cost_idx]);=0A=
+		  igain -=3D vector_const_cost (src);=0A=
+		}=0A=
+	      break;=0A=
+=0A=
+	    case VEC_SELECT:=0A=
+	      if (XVECEXP (XEXP (src, 1), 0, 0) =3D=3D const0_rtx)=0A=
+		{=0A=
+		  // movd (4 bytes) replaced with movdqa (4 bytes).=0A=
+		  if (!optimize_insn_for_size_p ())=0A=
+		    igain +=3D ix86_cost->sse_to_integer - ix86_cost->xmm_move;=0A=
+		}=0A=
+	      else=0A=
+		{=0A=
+		  // pshufd; movd replaced with pshufd.=0A=
+		  if (optimize_insn_for_size_p ())=0A=
+		    igain +=3D COSTS_N_BYTES (4);=0A=
+		  else=0A=
+		    igain +=3D ix86_cost->sse_to_integer;=0A=
+		}=0A=
+	      break;=0A=
+=0A=
+	    default:=0A=
+	      gcc_unreachable ();=0A=
+	    }=0A=
+	}=0A=
 =0A=
       if (igain !=3D 0 && dump_file)=0A=
 	{=0A=
diff --git a/gcc/testsuite/gcc.target/i386/pr113231.c =
b/gcc/testsuite/gcc.target/i386/pr113231.c=0A=
new file mode 100644=0A=
index 0000000..f9dcd9a=0A=
--- /dev/null=0A=
+++ b/gcc/testsuite/gcc.target/i386/pr113231.c=0A=
@@ -0,0 +1,8 @@=0A=
+/* { dg-do compile } */=0A=
+/* { dg-options "-Os" } */=0A=
+=0A=
+void foo(int *i) { *i *=3D 2; }=0A=
+void bar(int *i) { *i <<=3D 2; }=0A=
+void baz(int *i) { *i >>=3D 2; }=0A=
+=0A=
+/* { dg-final { scan-assembler-not "movd" } } */=0A=

------=_NextPart_000_03C5_01DA40A4.881A7360--