public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed
* [PATCH] Add x86 SSE5 instructions to the GNU binary utilities
@ 2007-08-31 21:46 Michael Meissner
  2007-09-07  2:27 ` Alan Modra
  2007-09-13 20:11 ` rajagopal, dwarak
  0 siblings, 2 replies; 17+ messages in thread
From: Michael Meissner @ 2007-08-31 21:46 UTC (permalink / raw)
  To: binutils, christophe.harle, dwarak.rajagopal; +Cc: michael.meissner

[-- Attachment #1: Type: text/plain, Size: 1543 bytes --]

On August 30th, AMD announced the new SSE5 instruction set that will be
part of future processors.  The programmer's guide for these new
instructions is located at:
http://developer.amd.com/assets/sse5_43479_BDAPMU_3-00_8-27-07.pdf

The enclosed patch adds support for the SSE5 instructions to the assembler and
disassembler.  I had to gzip the patch file to get it small enough to be able
to post (i386-tbl.h and the new test are too big).  Is it ok to apply this
patch?  If not, what changes would people prefer?

It turned out the old move from test register instructions that were in the
I386 and I486 processors but is not in modern processors overlaps with the SSE5
encoding (0x0f24).  These can be disambiguated because the move from test
register instruction always uses MODRM encoding with the upper 2 bits set (for
register/register encoding), and none of the SSE5 instructions set the upper 2
bits.

I needed 3 extra opcode_modifier bits to implement SSE5, and only two bits are
available without going to long long.  I modified the structures to add an
opcode_modifier2 field, and I changed the i386-gen.c program so that existing
lines in the i386-opc.tbl file will be handled without modifying them.  I can
modify all of the instruction lines to add the 0 field if people would prefer.

Note, I had originally posted this to the bug-bintutils@gnu.org, instead of
binutils@sourceware.org.  Sorry about the duplicate post.

-- 
Michael Meissner, AMD
90 Central Street, MS 83-29, Boxborough, MA, 01719, USA
michael.meissner@amd.com

[-- Attachment #2: sse5-binutils.patch.gz --]
[-- Type: application/x-gzip, Size: 70833 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Add x86 SSE5 instructions to the GNU binary utilities
  2007-08-31 21:46 [PATCH] Add x86 SSE5 instructions to the GNU binary utilities Michael Meissner
@ 2007-09-07  2:27 ` Alan Modra
  2007-09-13 20:11 ` rajagopal, dwarak
  1 sibling, 0 replies; 17+ messages in thread
From: Alan Modra @ 2007-09-07  2:27 UTC (permalink / raw)
  To: Michael Meissner; +Cc: binutils, christophe.harle, dwarak.rajagopal

On Fri, Aug 31, 2007 at 01:54:52PM -0400, Michael Meissner wrote:
> I needed 3 extra opcode_modifier bits to implement SSE5, and only two bits are
> available without going to long long.

Hmm, I see only the following combinations of your 3 new bits are
actually used:

 0
 Drex
 Drex|Drexv
 Drexc

So, you could encode in 2 bits..

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH] Add x86 SSE5 instructions to the GNU binary  utilities
  2007-08-31 21:46 [PATCH] Add x86 SSE5 instructions to the GNU binary utilities Michael Meissner
  2007-09-07  2:27 ` Alan Modra
@ 2007-09-13 20:11 ` rajagopal, dwarak
  2007-09-13 20:30   ` H.J. Lu
  2007-09-13 20:38   ` H.J. Lu
  1 sibling, 2 replies; 17+ messages in thread
From: rajagopal, dwarak @ 2007-09-13 20:11 UTC (permalink / raw)
  To: binutils; +Cc: Meissner, Michael, Harle, Christophe

[-- Attachment #1: Type: text/plain, Size: 2332 bytes --]

The enclosed patch adds support for the SSE5 instructions to the
assembler and disassembler. 

I have made changes to the original patch so that it uses bitfields (the
new infrastructure changes which H.J had checked in last week) for
cpu_flag, opcode_modifier and operand_types for the new instructions. 

Is this ok?

Thanks,
Dwarak



> -----Original Message-----
> From: Michael Meissner [mailto:michael.meissner@amd.com]
> Sent: Friday, August 31, 2007 12:55 PM
> To: binutils@sourceware.org; Harle, Christophe; rajagopal, dwarak
> Cc: Meissner, Michael
> Subject: [PATCH] Add x86 SSE5 instructions to the GNU binary utilities
> 
> On August 30th, AMD announced the new SSE5 instruction set that will
be
> part of future processors.  The programmer's guide for these new
> instructions is located at:
> http://developer.amd.com/assets/sse5_43479_BDAPMU_3-00_8-27-07.pdf
> 
> The enclosed patch adds support for the SSE5 instructions to the
assembler
> and
> disassembler.  I had to gzip the patch file to get it small enough to
be
> able
> to post (i386-tbl.h and the new test are too big).  Is it ok to apply
this
> patch?  If not, what changes would people prefer?
> 
> It turned out the old move from test register instructions that were
in
> the
> I386 and I486 processors but is not in modern processors overlaps with
the
> SSE5
> encoding (0x0f24).  These can be disambiguated because the move from
test
> register instruction always uses MODRM encoding with the upper 2 bits
set
> (for
> register/register encoding), and none of the SSE5 instructions set the
> upper 2
> bits.
> 
> I needed 3 extra opcode_modifier bits to implement SSE5, and only two
bits
> are
> available without going to long long.  I modified the structures to
add an
> opcode_modifier2 field, and I changed the i386-gen.c program so that
> existing
> lines in the i386-opc.tbl file will be handled without modifying them.
I
> can
> modify all of the instruction lines to add the 0 field if people would
> prefer.
> 
> Note, I had originally posted this to the bug-bintutils@gnu.org,
instead
> of
> binutils@sourceware.org.  Sorry about the duplicate post.
> 
> --
> Michael Meissner, AMD
> 90 Central Street, MS 83-29, Boxborough, MA, 01719, USA
> michael.meissner@amd.com

[-- Attachment #2: Changelog --]
[-- Type: application/octet-stream, Size: 6468 bytes --]

<binutls changes>
2007-08-31  Michael Meissner  <michael.meissner@amd.com>
	    Dwarakanath Rajagopal  <dwarak.rajagopal@amd.com>

	* NEWS: Add note about SSE5 being added on i386.

<gas changes>
2007-08-31  Michael Meissner  <michael.meissner@amd.com>
	    Dwarakanath Rajagopal  <dwarak.rajagopal@amd.com>
	    Tony Linthicum  <tony.linthicum@amd.com>

	* NEWS: Add SSE5 support to NEWS file.
	
	* config/tc-i386.h (drex_byte): Add fields to allow process_drex
	and build_modrm_byte to communicate.
	(DREX_OC0): New SSE5 macro.
	(DREX_OC0_MASK): Ditto.
	(DREX_OC1): Ditto.
	(DREX_OC1_MASK): Ditto.
	(DREX_XMEM_X1_X2_X2): Ditto.
	(DREX_X1_XMEM_X2_X2): Ditto.
	(DREX_X1_XMEM_X2_X1: Ditto.
	(DREX_X1_X2_XMEM_X1: Ditto.
	(DREX_XMEM_X1_X2): Ditto.
	(DREX_X1_XMEM_X2): Ditto.
	(drex_byte): New structure to describe the DREX byte.

	* config/tc-i386.c (process_drex): New function to handle SSE5
	DREX bits.
	(build_modrm_byte): Use the information cached away in
	process_drex in the case of DREX instructions.
	(i386_insn): Add drex field.
	(pi): Add debugging of drex field.
	(md_assemble): Treat SSE5 like SSE3 in instructions with an
	immediate byte.  Move REX field to DREX if this is a DREX
	instruction.
	(process_operands): Add SSE5 support.
	(build_modrm_byte): Ditto.
	(output_insn): Ditto.
	(cpu_arch): Ditto.
	(i386_align_code): Ditto.

<gas/testsuite changes>
2007-08-31  Michael Meissner  <michael.meissner@amd.com>
	    Dwarakanath Rajagopal  <dwarak.rajagopal@amd.com>
	    Tony Linthicum  <tony.linthicum@amd.com>

	* gas/i386/i386.exp (x86-64-sse5): Run SSE5 64-bit tests.
	(sse5): Do not currently run 32-bit  tests.

	* gas/i386/x86-64-sse5.s: New file for SSE5 support.
	* gas/i386/x86-64-sse5.d: Ditto.

<opcodes changes>
2007-08-31  Michael Meissner  <michael.meissner@amd.com>
	    Dwarakanath Rajagopal  <dwarak.rajagopal@amd.com>
	    Tony Linthicum  <tony.linthicum@amd.com>

	* i386-opc.h (CpuSSE5):  New macro.
	(i386_cpu_flags): Add Drex, Drexv and Drexc.

	* i386-gen.c (cpu_flag_init): Add CPU_SSE5_FLAGS.
	(operand_type_init): Add CpuSSE5.
	(opcode_modifiers): Add Drex, Drexv and Drexc.
	(i386_opcode_modifier): Ditto.

	* i386-opc.tbl (fmaddps,fmaddpd,fmaddss,fmaddsd): Define SSE5
	instructions here.
	(fmsubps,fmsubpd,fmsubss,fmsubsd): Ditto.
	(fnmaddps,fnmaddpd,fnmaddss,fnmaddsd): Ditto.
	(fnmsubps,fnmsubpd,fnmsubss,fnmsubsd): Ditto.
	(pmacssww,pmacsww,pmacsswd,pmacswd): Ditto.
	(pmacssdd,pmacsdd,pmacssdql,pmacssdqh): Ditto.
	(pmacsdql,pmacsdqh,pmadcsswd,pmadcswd): Ditto.
	(phaddbw,phaddbd,phaddbq,phaddwd): Ditto.
	(phaddwq,phadddq,phaddubw,phaddubd): Ditto.
	(phaddubq,phadduwd,phadduwq,phaddudq): Ditto.
	(phsubbw,phsubwd,phsubdq): Ditto.
	(pcmov,pperm,permps,permpd): Ditto.
	(protb,protw,protd,protq): Ditto.
	(pshlb,pshlw,pshld,pshlq): Ditto.
	(pshab,pshaw,pshad,pshaq): Ditto.
	(comps,comeqps,comltps,comungeps,comleps,comungtps): Ditto.
	(comunordps,comneps,comneqps,comnltps,comugeps): Ditto.
	(comnleps,comugtps,comordps,comueqps,comultps): Ditto.
	(comngeps,comuleps,comngtps,comfalseps,comuneps): Ditto.
	(comuneqps,comunltps,comgeps,comunleps,comgtps,comtrueps): Ditto.
	(compd,comeqpd,comltpd,comungepd,comlepd,comungtpd,comunordpd): Ditto.
	(comnepd,comneqpd,comnltpd,comugepd,comnlepd,comugtpd): Ditto.
	(comordpd,comueqpd,comultpd,comngepd,comulepd,comngtpd): Ditto.
	(comfalsepd,comunepd,comuneqpd,comunltpd,comgepd): Ditto.
	(comunlepd,comgtpd,comtruepd): Ditto.
	(comss,comeqss,comltss,comungess,comless,comungtss,comunordss): Ditto.
	(comness,comneqss,comnltss,comugess,comnless,comugtss): Ditto.
	(comordss,comueqss,comultss,comngess,comuless,comngtss): Ditto.
	(comfalsess,comuness,comuneqss,comunltss,comgess): Ditto.
	(comunless,comgtss,comtruess): Ditto.
	(comsd,comeqsd,comltsd,comungesd,comlesd,comungtsd,comunordsd): Ditto.
	(comnesd,comneqsd,comnltsd,comugesd,comnlesd,comugtsd): Ditto.
	(comordsd,comueqsd,comultsd,comngesd,comulesd,comngtsd): Ditto.
	(comfalsesd,comunesd,comuneqsd,comunltsd,comgesd): Ditto.
	(comunlesd,comgtsd,comtruesd): Ditto.
	(pcomub,pcomltub,pcomleub,pcomgtub,pcomgeub,pcomequb): Ditto.
	(pcomnequb,pcomneub): Ditto.
	(pcomuw,pcomltuw,pcomleuw,pcomgtuw,pcomgeuw,pcomequw): Ditto.
	(pcomnequw,pcomneuw): Ditto.
	(pcomud,pcomltud,pcomleud,pcomgtud,pcomgeud,pcomequd): Ditto.
	(pcomnequd,pcomneud): Ditto.
	(pcomuq,pcomltuq,pcomleuq,pcomgtuq,pcomgeuq,pcomequq): Ditto.
	(pcomnequq,pcomneuq): Ditto.
	(pcomb,pcomltb,pcomleb,pcomgtb,pcomgeb,pcomeqb): Ditto.
	(pcomneqb,pcomneb): Ditto.
	(pcomw,pcomltw,pcomlew,pcomgtw,pcomgew,pcomeqw): Ditto.
	(pcomneqw,pcomnew): Ditto.
	(pcomd,pcomltd,pcomled,pcomgtd,pcomged,pcomeqd): Ditto.
	(pcomneqd,pcomned): Ditto.
	(pcomq,pcomltq,pcomleq,pcomgtq,pcomgeq): Ditto.
	(pcomeqq,pcomneqq,pcomneq): Ditto.
	(pcomtrueb, pcomtruew, pcomtrued, pcomtrueq): Ditto.
	(pcomtrueub, pcomtrueuw, pcomtrueud, pcomtrueuq): Ditto.
	(pcomfalseb, pcomfalsew, pcomfalsed, pcomfalseq): Ditto.
	(pcomfalseub, pcomfalseuw, pcomfalseud, pcomfalseuq): Ditto.
	(frczps,frczpd,frczss,frczsd): Ditto.
	(cvtph2ps,cvtps2ph): Ditto.

	* i386-tbl.h: Regenerate from i386-opc.tbl.

	* i386-dis.c (libiberty.h): Include to get ARRAY_SIZE.
	(dis386_move_test): New disassembly support for move from test
	register instruction that overlaps with SSE5 instructions.
	(print_insn): Add support for special casing the i386/i486 move
	from test register instruction that overlaps with the SSE5
	0x0f24 4 operand instructions.
	(OP_DREX_ICMP): New macros for SSE5 DREX handling.
	(OP_DREX_FCMP): Ditto.
	(OP_E_extended): Rename from OP_E, add additional argument to skip
	the DREX byte.
	(OP_E): Call OP_E_extended.
	(DREX_REG_MEMORY): New macros for drex handling.
	(DREX_REG_UNKNOWN): Ditto.
	(DREX4_OC1): Ditto.
	(DREX4_NO_OC0): Ditto.
	(DREX4_MASK): Ditto.
	(three_byte_table): Add SSE5 instructions.
	(print_drex_arg): New function to print a DREX register or memory
	reference.
	(OP_DREX4): New function for handling DREX 4 argument ops.
	(OP_DREX3): New function for handling DREX 3 argument ops.
	(twobyte_has_modrm): 0f{25,7a,7b} all use the modrm byte.
	(THREE_BYTE_SSE5_0F{24,25,7A,7B}): New macros for initializing 3
	byte opcode support for SSE5 instructions.
	(OPC_EXT_{44,45}): Remove OPC_EXT_$$ and rename OPC_EXT_45 
	to OPC_EXT_44.
	(dis386_twobyte): Add SSE5 24/25/7a/7b support.
	(three_byte_table): Add rows for describing SSE5 instructions.

	* Makefile.am (i386-dis.lo): Add $(INCDIR)/libiberty.h.

	* Makefile.in (i386-dis.lo): Add $(INCDIR)/libiberty.h.

[-- Attachment #3: binutils-sse5.patch.gz --]
[-- Type: application/x-gzip, Size: 70193 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Add x86 SSE5 instructions to the GNU binary  utilities
  2007-09-13 20:11 ` rajagopal, dwarak
@ 2007-09-13 20:30   ` H.J. Lu
  2007-09-13 22:31     ` rajagopal, dwarak
  2007-09-13 20:38   ` H.J. Lu
  1 sibling, 1 reply; 17+ messages in thread
From: H.J. Lu @ 2007-09-13 20:30 UTC (permalink / raw)
  To: rajagopal, dwarak; +Cc: binutils, Meissner, Michael, Harle, Christophe

On Thu, Sep 13, 2007 at 03:08:54PM -0500, rajagopal, dwarak wrote:
> The enclosed patch adds support for the SSE5 instructions to the
> assembler and disassembler. 
> 
> I have made changes to the original patch so that it uses bitfields (the
> new infrastructure changes which H.J had checked in last week) for
> cpu_flag, opcode_modifier and operand_types for the new instructions. 
> 
> Is this ok?
> 

Here are my comments on assembler. I will take a look at disassembler
later.


H.J.
---
diff -purwN --exclude=autom4te.cache --exclude=po --exclude=intl src/binutils/NEWS src-sse5/binutils/NEWS
--- src/binutils/NEWS	2007-08-30 08:47:35.000000000 -0500
+      assert (i.imm_operands == 0
+	      && (i.operands <= 2
+		  || ((i.tm.cpu_flags.bitfield.cpusse5) != 0
+		      && i.operands <= 3)));

Please use

|| (i.tm.cpu_flags.bitfield.cpusse5
    && i.operands <= 3)
 
       exp = &im_expressions[i.imm_operands++];
       i.op[i.operands].imms = exp;
@@ -2338,7 +2349,14 @@ md_assemble (line)
 	}
     }
 
-  if (i.rex != 0)
+  /* If the instruction has the DREX attribute (aka SSE5), don't emit a
+     REX prefix.  */
+  if ((i.tm.opcode_modifier.drex | i.tm.opcode_modifier.drexc) != 0)

Please use

if (i.tm.opcode_modifier.drex || i.tm.opcode_modifier.drexc)

+      /* Case 2: 4 operand insn, dest = src1, src3 = memory.  */
+      else if (i.types[0].bitfield.regxmm != 0
+	       && i.types[1].bitfield.regxmm != 0
+	       && (i.types[2].bitfield.regxmm | operand_type_check (i.types[2], anymem)) != 0

Did you mean

(i.types[2].bitfield.regxmm
|| operand_type_check (i.types[2], anymem))

+      /* Case 3: 4 operand insn, dest = src1, src2 = memory.  */
+      else if (i.types[0].bitfield.regxmm != 0
+	       && (operand_type_check (i.types[1], anymem)) != 0

Need for () on operand_type_check.

+      /* Case 5: 4 operand insn, dest = src3, src2 = memory.  */
+      else if (i.types[0].bitfield.regxmm != 0
+	       && (i.types[1].bitfield.regxmm | operand_type_check (i.types[1], anymem)) != 0

See above

+  else if ((i.tm.opcode_modifier.drex) && !(i.tm.opcode_modifier.drexv) 

No need for ().

+	   && i.operands == 4)
+    {
+      /* Case 1: 4 operand insn, dest = src1, src3 = reg/mem.  */
+      if ((i.types[0].bitfield.regxmm) != 0
+	  && (i.types[1].bitfield.regxmm | operand_type_check(i.types[1], anymem)) != 0

See above.

+	  && i.types[2].bitfield.regxmm != 0
+	  && i.types[3].bitfield.regxmm != 0
+	  && i.op[0].regs->reg_num == i.op[3].regs->reg_num
+	  && i.op[0].regs->reg_flags == i.op[3].regs->reg_flags)
+	{
+	  /* clear the arguments that are stored in drex */
+	  UINTS_CLEAR (i.types[0]);
+	  UINTS_CLEAR (i.types[3]);
+	  i.reg_operands -= 2;
+
+	  /* Specify the modrm encoding and remember the register including the
+	     high bit normally stored in the REX byte.  */
+	  i.drex.modrm_reg = 2;
+	  i.drex.modrm_regmem = 1;
+	  i.drex.reg = (i.op[3].regs->reg_num
+			+ ((i.op[3].regs->reg_flags & RegRex) ? 8 : 0));
+	}
+
+      else
+	as_bad (_("Incorrect operands for the '%s' instruction"), i.tm.name);
+    }
+
+  /* SSE5 3 operand instructions that the result is a register, being either
+     operand can be a memory operand, using OC0 to note which one is the
+     memory.  */
+  else if (i.tm.opcode_modifier.drex && i.tm.opcode_modifier.drexv

Use a separate line for i.tm.opcode_modifier.drexv

+  else if ((i.tm.opcode_modifier.drexc) != 0 && i.operands == 4)

No need for ().

+    {
+      /* Case 1: 4 operand insn, src1 = reg/memory. */
+      if (operand_type_check (i.types[0], imm) != 0
+	  && (i.types[1].bitfield.regxmm 
+	      | operand_type_check (i.types[1], anymem)) != 0

See above.

+	 instruction.  */
+      else if ((i.types[0].bitfield.regxmm
+		| operand_type_check (i.types[0], anymem)) != 0

See above

   const seg_entry *default_seg = 0;
 
+  /* Handle all of the DREX munging that SSE5 needs.  */
+  if (i.tm.opcode_modifier.drex | i.tm.opcode_modifier.drexv 
+      | i.tm.opcode_modifier.drexc)

Use || and one line for each condition.

+     GAS as if this is a 2 operand instruction.  */
+  if ((i.tm.opcode_modifier.drex | i.tm.opcode_modifier.drexv 
+       | i.tm.opcode_modifier.drexc) != 0 && i.reg_operands == 2)

See above

 
+ 	  /* This has been precalculated for SSE5 instructions that have a DREX
+ 	     field earlier in process_drex.  */
+ 	  if ((i.tm.opcode_modifier.drex | i.tm.opcode_modifier.drexv 
+	       | i.tm.opcode_modifier.drexc) != 0)

See above.

 	  unsigned int op;
 
+	  	  /* This has been precalculated for SSE5 instructions that have a DREX

This line is too lone.

+	     field earlier in process_drex.  */
+	  if ((i.tm.opcode_modifier.drex | i.tm.opcode_modifier.drexv 
+	       | i.tm.opcode_modifier.drexc) != 0)
+	    {

See above.

-      if (i.tm.extension_opcode != None)
+      if (i.tm.extension_opcode != None
+	  && (i.tm.opcode_modifier.drex | i.tm.opcode_modifier.drexv 
+	      | i.tm.opcode_modifier.drexc) == 0)

See above.

 	i.rm.reg = i.tm.extension_opcode;
     }
   return default_seg;
@@ -4569,10 +4947,11 @@ output_insn (void)
       int opc_3b;
 
       /* All opcodes on i386 have either 1 or 2 bytes.  SSSE3 and
-	 SSE4 instructions have 3 bytes.  We may use one more higher
+	 SSE4 and SSE5 instructions have 3 bytes.  We may use one more higher

This line is too long.

+	     multiple formats.  */
+	  if (i.tm.opcode_modifier.drex && i.tm.opcode_modifier.drexv 

One line for each condition.
 
+      /* Write the DREX byte if needed.  */
+      if (i.tm.opcode_modifier.drex | i.tm.opcode_modifier.drexc)

See above.

+	  /* Encode the OC0 bit if this encoding has multiple formats.  */
+	  if ((i.tm.opcode_modifier.drex | i.tm.opcode_modifier.drexv) 
+	      && DREX_OC0 (i.tm.extension_opcode))

See above

+	    *p |= DREX_OC0_MASK;
+	}
+

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Add x86 SSE5 instructions to the GNU binary  utilities
  2007-09-13 20:11 ` rajagopal, dwarak
  2007-09-13 20:30   ` H.J. Lu
@ 2007-09-13 20:38   ` H.J. Lu
  2007-09-13 20:43     ` H.J. Lu
                       ` (2 more replies)
  1 sibling, 3 replies; 17+ messages in thread
From: H.J. Lu @ 2007-09-13 20:38 UTC (permalink / raw)
  To: rajagopal, dwarak; +Cc: binutils, Meissner, Michael, Harle, Christophe

On Thu, Sep 13, 2007 at 03:08:54PM -0500, rajagopal, dwarak wrote:
> The enclosed patch adds support for the SSE5 instructions to the
> assembler and disassembler. 
> 
> I have made changes to the original patch so that it uses bitfields (the
> new infrastructure changes which H.J had checked in last week) for
> cpu_flag, opcode_modifier and operand_types for the new instructions. 
> 

Did SSE5 reuse the same opcode for move test registers? If not,
please don't remove OPC_EXT_45. You can use

{
  /* OPC_EXT_45 */
  { THREE_BYTE_SSE5_0F7A }
  { "movL",          { Td, Rd } },
}


H.J.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Add x86 SSE5 instructions to the GNU binary  utilities
  2007-09-13 20:38   ` H.J. Lu
@ 2007-09-13 20:43     ` H.J. Lu
  2007-09-13 20:44     ` rajagopal, dwarak
  2007-09-13 20:45     ` Meissner, Michael
  2 siblings, 0 replies; 17+ messages in thread
From: H.J. Lu @ 2007-09-13 20:43 UTC (permalink / raw)
  To: rajagopal, dwarak; +Cc: binutils, Meissner, Michael, Harle, Christophe

On Thu, Sep 13, 2007 at 01:38:21PM -0700, H.J. Lu wrote:
> On Thu, Sep 13, 2007 at 03:08:54PM -0500, rajagopal, dwarak wrote:
> > The enclosed patch adds support for the SSE5 instructions to the
> > assembler and disassembler. 
> > 
> > I have made changes to the original patch so that it uses bitfields (the
> > new infrastructure changes which H.J had checked in last week) for
> > cpu_flag, opcode_modifier and operand_types for the new instructions. 
> > 
> 
> Did SSE5 reuse the same opcode for move test registers? If not,
> please don't remove OPC_EXT_45. You can use
> 
> {
>   /* OPC_EXT_45 */
>   { THREE_BYTE_SSE5_0F7A }
>   { "movL",          { Td, Rd } },
> }
> 

BTW, please put the patch inline and don't include the generated
files. It is easier to review.

Thanks.


H.J.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH] Add x86 SSE5 instructions to the GNU binary  utilities
  2007-09-13 20:38   ` H.J. Lu
  2007-09-13 20:43     ` H.J. Lu
@ 2007-09-13 20:44     ` rajagopal, dwarak
  2007-09-13 20:56       ` H.J. Lu
  2007-09-13 20:45     ` Meissner, Michael
  2 siblings, 1 reply; 17+ messages in thread
From: rajagopal, dwarak @ 2007-09-13 20:44 UTC (permalink / raw)
  To: H.J. Lu; +Cc: binutils, Meissner, Michael, Harle, Christophe

Yes SSE5 uses the same opcode (0f 24) for move test registers. The only
way to disambiguate is that the 3rd opcode does not have upper 2 bits
set. 

It is better to do this test in print_insn() than in get_valid_dis386()
because the only way to disambiguate this case is by using the 3rd
opcode and not modrm bits. 
That is why I removed OPC_EXT_45.

Also, thanks for the comments on the assembler. I'll make the
appropriate changes to the patch.

Thanks,
- Dwarak


> -----Original Message-----
> From: H.J. Lu [mailto:hjl@lucon.org]
> Sent: Thursday, September 13, 2007 3:38 PM
> To: rajagopal, dwarak
> Cc: binutils@sourceware.org; Meissner, Michael; Harle, Christophe
> Subject: Re: [PATCH] Add x86 SSE5 instructions to the GNU binary
utilities
> 
> On Thu, Sep 13, 2007 at 03:08:54PM -0500, rajagopal, dwarak wrote:
> > The enclosed patch adds support for the SSE5 instructions to the
> > assembler and disassembler.
> >
> > I have made changes to the original patch so that it uses bitfields
(the
> > new infrastructure changes which H.J had checked in last week) for
> > cpu_flag, opcode_modifier and operand_types for the new
instructions.
> >
> 
> Did SSE5 reuse the same opcode for move test registers? If not,
> please don't remove OPC_EXT_45. You can use
> 
> {
>   /* OPC_EXT_45 */
>   { THREE_BYTE_SSE5_0F7A }
>   { "movL",          { Td, Rd } },
> }
> 
> 
> H.J.
> 



^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH] Add x86 SSE5 instructions to the GNU binary  utilities
  2007-09-13 20:38   ` H.J. Lu
  2007-09-13 20:43     ` H.J. Lu
  2007-09-13 20:44     ` rajagopal, dwarak
@ 2007-09-13 20:45     ` Meissner, Michael
  2 siblings, 0 replies; 17+ messages in thread
From: Meissner, Michael @ 2007-09-13 20:45 UTC (permalink / raw)
  To: H.J. Lu, rajagopal, dwarak; +Cc: binutils, Harle, Christophe

The problem is move test register is/was:
	0f 24 modrm

Where the top 2 bits of modrm are 11 to encode register/register move
(move test was only defined for registers).  The SSE5 4 argument
instructions use:
	0f 24 opcode

There are no SSE5 instructions with the 2 bits sets in the 0f 24 opcode
field.  You either have to duplicate a lot of 3 byte or MODRM if you do
it in a table fashion like you suggest.  It is much simpler to just do
the test, and reuse the rest of the dissembler for modrm (in the case of
move test) or 3 byte opcode (in the case of SSE5).

--
Michael Meissner
AMD, MS 83-29
90 Central Street
Boxborough, MA 01719

> -----Original Message-----
> From: H.J. Lu [mailto:hjl@lucon.org]
> Sent: Thursday, September 13, 2007 4:38 PM
> To: rajagopal, dwarak
> Cc: binutils@sourceware.org; Meissner, Michael; Harle, Christophe
> Subject: Re: [PATCH] Add x86 SSE5 instructions to the GNU binary
utilities
> 
> On Thu, Sep 13, 2007 at 03:08:54PM -0500, rajagopal, dwarak wrote:
> > The enclosed patch adds support for the SSE5 instructions to the
> > assembler and disassembler.
> >
> > I have made changes to the original patch so that it uses bitfields
(the
> > new infrastructure changes which H.J had checked in last week) for
> > cpu_flag, opcode_modifier and operand_types for the new
instructions.
> >
> 
> Did SSE5 reuse the same opcode for move test registers? If not,
> please don't remove OPC_EXT_45. You can use
> 
> {
>   /* OPC_EXT_45 */
>   { THREE_BYTE_SSE5_0F7A }
>   { "movL",          { Td, Rd } },
> }
> 
> 
> H.J.
> 



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Add x86 SSE5 instructions to the GNU binary  utilities
  2007-09-13 20:44     ` rajagopal, dwarak
@ 2007-09-13 20:56       ` H.J. Lu
  2007-09-13 21:52         ` rajagopal, dwarak
  0 siblings, 1 reply; 17+ messages in thread
From: H.J. Lu @ 2007-09-13 20:56 UTC (permalink / raw)
  To: rajagopal, dwarak; +Cc: binutils, Meissner, Michael, Harle, Christophe

On Thu, Sep 13, 2007 at 03:44:19PM -0500, rajagopal, dwarak wrote:
> Yes SSE5 uses the same opcode (0f 24) for move test registers. The only
> way to disambiguate is that the 3rd opcode does not have upper 2 bits
> set. 
> 
> It is better to do this test in print_insn() than in get_valid_dis386()
> because the only way to disambiguate this case is by using the 3rd
> opcode and not modrm bits. 
> That is why I removed OPC_EXT_45.
> 

OPC_EXT_45 is

  {
    /* OPC_EXT_45 */
    { "(bad)",          { XX } }, 
    { "movL",           { Td, Rd } }, 
  },

    { "(bad)",          { XX } }, 

is for instructions whose 3rd opcode has upper 2 bits cleared.
You just replace it with a table. Everything should work.


H.J.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH] Add x86 SSE5 instructions to the GNU binary  utilities
  2007-09-13 20:56       ` H.J. Lu
@ 2007-09-13 21:52         ` rajagopal, dwarak
  2007-09-13 22:44           ` H.J. Lu
  0 siblings, 1 reply; 17+ messages in thread
From: rajagopal, dwarak @ 2007-09-13 21:52 UTC (permalink / raw)
  To: H.J. Lu; +Cc: binutils, Meissner, Michael, Harle, Christophe

I tried this and it does not work.

  {
    /* OPC_EXT_45 */
    { THREE_BYTE_SSE5_0F24 },
    { "movL",		{ Td, Rd } },
  },

In addition to adding a case in get_valid_dis386() for handling
IS_3BYTE_OPCODE, it does not work.

To make it work I have to duplicate the modrm bit calculations which is
done for IS_3BYTE_OPCODE again for THREE_BYTE_SSE5_0F24 in print_insn.
Also I would need to pass an extra argument with the third opcode to
get_valid_dis386 to make it work so that it can index the
THREE_BYTE_SSE5_0F24 table correctly.

I felt it was easier and better to deal with this disambiguation in
print_insn() itself (as it is just 5 lines of code).

-Dwarak



> -----Original Message-----
> From: H.J. Lu [mailto:hjl@lucon.org]
> Sent: Thursday, September 13, 2007 3:56 PM
> To: rajagopal, dwarak
> Cc: binutils@sourceware.org; Meissner, Michael; Harle, Christophe
> Subject: Re: [PATCH] Add x86 SSE5 instructions to the GNU binary
utilities
> 
> On Thu, Sep 13, 2007 at 03:44:19PM -0500, rajagopal, dwarak wrote:
> > Yes SSE5 uses the same opcode (0f 24) for move test registers. The
only
> > way to disambiguate is that the 3rd opcode does not have upper 2
bits
> > set.
> >
> > It is better to do this test in print_insn() than in
get_valid_dis386()
> > because the only way to disambiguate this case is by using the 3rd
> > opcode and not modrm bits.
> > That is why I removed OPC_EXT_45.
> >
> 
> OPC_EXT_45 is
> 
>   {
>     /* OPC_EXT_45 */
>     { "(bad)",          { XX } },
>     { "movL",           { Td, Rd } },
>   },
> 
>     { "(bad)",          { XX } },
> 
> is for instructions whose 3rd opcode has upper 2 bits cleared.
> You just replace it with a table. Everything should work.
> 
> 
> H.J.
> 



^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH] Add x86 SSE5 instructions to the GNU binary  utilities
  2007-09-13 20:30   ` H.J. Lu
@ 2007-09-13 22:31     ` rajagopal, dwarak
  2007-09-13 22:47       ` H.J. Lu
  0 siblings, 1 reply; 17+ messages in thread
From: rajagopal, dwarak @ 2007-09-13 22:31 UTC (permalink / raw)
  To: H.J. Lu; +Cc: binutils, Meissner, Michael, Harle, Christophe

[-- Attachment #1: Type: text/plain, Size: 6906 bytes --]

The patch does not contain the generated files (i386-init.h and
i386-tbl.h). I have also fixed the coding typos you mentioned in the
assembler.

Is this ok?

Thanks,
- Dwarak

> -----Original Message-----
> From: H.J. Lu [mailto:hjl@lucon.org]
> Sent: Thursday, September 13, 2007 3:30 PM
> To: rajagopal, dwarak
> Cc: binutils@sourceware.org; Meissner, Michael; Harle, Christophe
> Subject: Re: [PATCH] Add x86 SSE5 instructions to the GNU binary
utilities
> 
> On Thu, Sep 13, 2007 at 03:08:54PM -0500, rajagopal, dwarak wrote:
> > The enclosed patch adds support for the SSE5 instructions to the
> > assembler and disassembler.
> >
> > I have made changes to the original patch so that it uses bitfields
(the
> > new infrastructure changes which H.J had checked in last week) for
> > cpu_flag, opcode_modifier and operand_types for the new
instructions.
> >
> > Is this ok?
> >
> 
> Here are my comments on assembler. I will take a look at disassembler
> later.
> 
> 
> H.J.
> ---
> diff -purwN --exclude=autom4te.cache --exclude=po --exclude=intl
> src/binutils/NEWS src-sse5/binutils/NEWS
> --- src/binutils/NEWS	2007-08-30 08:47:35.000000000 -0500
> +      assert (i.imm_operands == 0
> +	      && (i.operands <= 2
> +		  || ((i.tm.cpu_flags.bitfield.cpusse5) != 0
> +		      && i.operands <= 3)));
> 
> Please use
> 
> || (i.tm.cpu_flags.bitfield.cpusse5
>     && i.operands <= 3)
> 
>        exp = &im_expressions[i.imm_operands++];
>        i.op[i.operands].imms = exp;
> @@ -2338,7 +2349,14 @@ md_assemble (line)
>  	}
>      }
> 
> -  if (i.rex != 0)
> +  /* If the instruction has the DREX attribute (aka SSE5), don't emit
a
> +     REX prefix.  */
> +  if ((i.tm.opcode_modifier.drex | i.tm.opcode_modifier.drexc) != 0)
> 
> Please use
> 
> if (i.tm.opcode_modifier.drex || i.tm.opcode_modifier.drexc)
> 
> +      /* Case 2: 4 operand insn, dest = src1, src3 = memory.  */
> +      else if (i.types[0].bitfield.regxmm != 0
> +	       && i.types[1].bitfield.regxmm != 0
> +	       && (i.types[2].bitfield.regxmm | operand_type_check
> (i.types[2], anymem)) != 0
> 
> Did you mean
> 
> (i.types[2].bitfield.regxmm
> || operand_type_check (i.types[2], anymem))
> 
> +      /* Case 3: 4 operand insn, dest = src1, src2 = memory.  */
> +      else if (i.types[0].bitfield.regxmm != 0
> +	       && (operand_type_check (i.types[1], anymem)) != 0
> 
> Need for () on operand_type_check.
> 
> +      /* Case 5: 4 operand insn, dest = src3, src2 = memory.  */
> +      else if (i.types[0].bitfield.regxmm != 0
> +	       && (i.types[1].bitfield.regxmm | operand_type_check
> (i.types[1], anymem)) != 0
> 
> See above
> 
> +  else if ((i.tm.opcode_modifier.drex) &&
!(i.tm.opcode_modifier.drexv)
> 
> No need for ().
> 
> +	   && i.operands == 4)
> +    {
> +      /* Case 1: 4 operand insn, dest = src1, src3 = reg/mem.  */
> +      if ((i.types[0].bitfield.regxmm) != 0
> +	  && (i.types[1].bitfield.regxmm |
operand_type_check(i.types[1],
> anymem)) != 0
> 
> See above.
> 
> +	  && i.types[2].bitfield.regxmm != 0
> +	  && i.types[3].bitfield.regxmm != 0
> +	  && i.op[0].regs->reg_num == i.op[3].regs->reg_num
> +	  && i.op[0].regs->reg_flags == i.op[3].regs->reg_flags)
> +	{
> +	  /* clear the arguments that are stored in drex */
> +	  UINTS_CLEAR (i.types[0]);
> +	  UINTS_CLEAR (i.types[3]);
> +	  i.reg_operands -= 2;
> +
> +	  /* Specify the modrm encoding and remember the register
including
> the
> +	     high bit normally stored in the REX byte.  */
> +	  i.drex.modrm_reg = 2;
> +	  i.drex.modrm_regmem = 1;
> +	  i.drex.reg = (i.op[3].regs->reg_num
> +			+ ((i.op[3].regs->reg_flags & RegRex) ? 8 : 0));
> +	}
> +
> +      else
> +	as_bad (_("Incorrect operands for the '%s' instruction"),
> i.tm.name);
> +    }
> +
> +  /* SSE5 3 operand instructions that the result is a register, being
> either
> +     operand can be a memory operand, using OC0 to note which one is
the
> +     memory.  */
> +  else if (i.tm.opcode_modifier.drex && i.tm.opcode_modifier.drexv
> 
> Use a separate line for i.tm.opcode_modifier.drexv
> 
> +  else if ((i.tm.opcode_modifier.drexc) != 0 && i.operands == 4)
> 
> No need for ().
> 
> +    {
> +      /* Case 1: 4 operand insn, src1 = reg/memory. */
> +      if (operand_type_check (i.types[0], imm) != 0
> +	  && (i.types[1].bitfield.regxmm
> +	      | operand_type_check (i.types[1], anymem)) != 0
> 
> See above.
> 
> +	 instruction.  */
> +      else if ((i.types[0].bitfield.regxmm
> +		| operand_type_check (i.types[0], anymem)) != 0
> 
> See above
> 
>    const seg_entry *default_seg = 0;
> 
> +  /* Handle all of the DREX munging that SSE5 needs.  */
> +  if (i.tm.opcode_modifier.drex | i.tm.opcode_modifier.drexv
> +      | i.tm.opcode_modifier.drexc)
> 
> Use || and one line for each condition.
> 
> +     GAS as if this is a 2 operand instruction.  */
> +  if ((i.tm.opcode_modifier.drex | i.tm.opcode_modifier.drexv
> +       | i.tm.opcode_modifier.drexc) != 0 && i.reg_operands == 2)
> 
> See above
> 
> 
> + 	  /* This has been precalculated for SSE5 instructions that have
a
> DREX
> + 	     field earlier in process_drex.  */
> + 	  if ((i.tm.opcode_modifier.drex | i.tm.opcode_modifier.drexv
> +	       | i.tm.opcode_modifier.drexc) != 0)
> 
> See above.
> 
>  	  unsigned int op;
> 
> +	  	  /* This has been precalculated for SSE5 instructions
that
> have a DREX
> 
> This line is too lone.
> 
> +	     field earlier in process_drex.  */
> +	  if ((i.tm.opcode_modifier.drex | i.tm.opcode_modifier.drexv
> +	       | i.tm.opcode_modifier.drexc) != 0)
> +	    {
> 
> See above.
> 
> -      if (i.tm.extension_opcode != None)
> +      if (i.tm.extension_opcode != None
> +	  && (i.tm.opcode_modifier.drex | i.tm.opcode_modifier.drexv
> +	      | i.tm.opcode_modifier.drexc) == 0)
> 
> See above.
> 
>  	i.rm.reg = i.tm.extension_opcode;
>      }
>    return default_seg;
> @@ -4569,10 +4947,11 @@ output_insn (void)
>        int opc_3b;
> 
>        /* All opcodes on i386 have either 1 or 2 bytes.  SSSE3 and
> -	 SSE4 instructions have 3 bytes.  We may use one more higher
> +	 SSE4 and SSE5 instructions have 3 bytes.  We may use one more
> higher
> 
> This line is too long.
> 
> +	     multiple formats.  */
> +	  if (i.tm.opcode_modifier.drex && i.tm.opcode_modifier.drexv
> 
> One line for each condition.
> 
> +      /* Write the DREX byte if needed.  */
> +      if (i.tm.opcode_modifier.drex | i.tm.opcode_modifier.drexc)
> 
> See above.
> 
> +	  /* Encode the OC0 bit if this encoding has multiple formats.
*/
> +	  if ((i.tm.opcode_modifier.drex | i.tm.opcode_modifier.drexv)
> +	      && DREX_OC0 (i.tm.extension_opcode))
> 
> See above
> 
> +	    *p |= DREX_OC0_MASK;
> +	}
> +
> 


[-- Attachment #2: Changelog --]
[-- Type: application/octet-stream, Size: 6468 bytes --]

<binutls changes>
2007-08-31  Michael Meissner  <michael.meissner@amd.com>
	    Dwarakanath Rajagopal  <dwarak.rajagopal@amd.com>

	* NEWS: Add note about SSE5 being added on i386.

<gas changes>
2007-08-31  Michael Meissner  <michael.meissner@amd.com>
	    Dwarakanath Rajagopal  <dwarak.rajagopal@amd.com>
	    Tony Linthicum  <tony.linthicum@amd.com>

	* NEWS: Add SSE5 support to NEWS file.
	
	* config/tc-i386.h (drex_byte): Add fields to allow process_drex
	and build_modrm_byte to communicate.
	(DREX_OC0): New SSE5 macro.
	(DREX_OC0_MASK): Ditto.
	(DREX_OC1): Ditto.
	(DREX_OC1_MASK): Ditto.
	(DREX_XMEM_X1_X2_X2): Ditto.
	(DREX_X1_XMEM_X2_X2): Ditto.
	(DREX_X1_XMEM_X2_X1: Ditto.
	(DREX_X1_X2_XMEM_X1: Ditto.
	(DREX_XMEM_X1_X2): Ditto.
	(DREX_X1_XMEM_X2): Ditto.
	(drex_byte): New structure to describe the DREX byte.

	* config/tc-i386.c (process_drex): New function to handle SSE5
	DREX bits.
	(build_modrm_byte): Use the information cached away in
	process_drex in the case of DREX instructions.
	(i386_insn): Add drex field.
	(pi): Add debugging of drex field.
	(md_assemble): Treat SSE5 like SSE3 in instructions with an
	immediate byte.  Move REX field to DREX if this is a DREX
	instruction.
	(process_operands): Add SSE5 support.
	(build_modrm_byte): Ditto.
	(output_insn): Ditto.
	(cpu_arch): Ditto.
	(i386_align_code): Ditto.

<gas/testsuite changes>
2007-08-31  Michael Meissner  <michael.meissner@amd.com>
	    Dwarakanath Rajagopal  <dwarak.rajagopal@amd.com>
	    Tony Linthicum  <tony.linthicum@amd.com>

	* gas/i386/i386.exp (x86-64-sse5): Run SSE5 64-bit tests.
	(sse5): Do not currently run 32-bit  tests.

	* gas/i386/x86-64-sse5.s: New file for SSE5 support.
	* gas/i386/x86-64-sse5.d: Ditto.

<opcodes changes>
2007-08-31  Michael Meissner  <michael.meissner@amd.com>
	    Dwarakanath Rajagopal  <dwarak.rajagopal@amd.com>
	    Tony Linthicum  <tony.linthicum@amd.com>

	* i386-opc.h (CpuSSE5):  New macro.
	(i386_cpu_flags): Add Drex, Drexv and Drexc.

	* i386-gen.c (cpu_flag_init): Add CPU_SSE5_FLAGS.
	(operand_type_init): Add CpuSSE5.
	(opcode_modifiers): Add Drex, Drexv and Drexc.
	(i386_opcode_modifier): Ditto.

	* i386-opc.tbl (fmaddps,fmaddpd,fmaddss,fmaddsd): Define SSE5
	instructions here.
	(fmsubps,fmsubpd,fmsubss,fmsubsd): Ditto.
	(fnmaddps,fnmaddpd,fnmaddss,fnmaddsd): Ditto.
	(fnmsubps,fnmsubpd,fnmsubss,fnmsubsd): Ditto.
	(pmacssww,pmacsww,pmacsswd,pmacswd): Ditto.
	(pmacssdd,pmacsdd,pmacssdql,pmacssdqh): Ditto.
	(pmacsdql,pmacsdqh,pmadcsswd,pmadcswd): Ditto.
	(phaddbw,phaddbd,phaddbq,phaddwd): Ditto.
	(phaddwq,phadddq,phaddubw,phaddubd): Ditto.
	(phaddubq,phadduwd,phadduwq,phaddudq): Ditto.
	(phsubbw,phsubwd,phsubdq): Ditto.
	(pcmov,pperm,permps,permpd): Ditto.
	(protb,protw,protd,protq): Ditto.
	(pshlb,pshlw,pshld,pshlq): Ditto.
	(pshab,pshaw,pshad,pshaq): Ditto.
	(comps,comeqps,comltps,comungeps,comleps,comungtps): Ditto.
	(comunordps,comneps,comneqps,comnltps,comugeps): Ditto.
	(comnleps,comugtps,comordps,comueqps,comultps): Ditto.
	(comngeps,comuleps,comngtps,comfalseps,comuneps): Ditto.
	(comuneqps,comunltps,comgeps,comunleps,comgtps,comtrueps): Ditto.
	(compd,comeqpd,comltpd,comungepd,comlepd,comungtpd,comunordpd): Ditto.
	(comnepd,comneqpd,comnltpd,comugepd,comnlepd,comugtpd): Ditto.
	(comordpd,comueqpd,comultpd,comngepd,comulepd,comngtpd): Ditto.
	(comfalsepd,comunepd,comuneqpd,comunltpd,comgepd): Ditto.
	(comunlepd,comgtpd,comtruepd): Ditto.
	(comss,comeqss,comltss,comungess,comless,comungtss,comunordss): Ditto.
	(comness,comneqss,comnltss,comugess,comnless,comugtss): Ditto.
	(comordss,comueqss,comultss,comngess,comuless,comngtss): Ditto.
	(comfalsess,comuness,comuneqss,comunltss,comgess): Ditto.
	(comunless,comgtss,comtruess): Ditto.
	(comsd,comeqsd,comltsd,comungesd,comlesd,comungtsd,comunordsd): Ditto.
	(comnesd,comneqsd,comnltsd,comugesd,comnlesd,comugtsd): Ditto.
	(comordsd,comueqsd,comultsd,comngesd,comulesd,comngtsd): Ditto.
	(comfalsesd,comunesd,comuneqsd,comunltsd,comgesd): Ditto.
	(comunlesd,comgtsd,comtruesd): Ditto.
	(pcomub,pcomltub,pcomleub,pcomgtub,pcomgeub,pcomequb): Ditto.
	(pcomnequb,pcomneub): Ditto.
	(pcomuw,pcomltuw,pcomleuw,pcomgtuw,pcomgeuw,pcomequw): Ditto.
	(pcomnequw,pcomneuw): Ditto.
	(pcomud,pcomltud,pcomleud,pcomgtud,pcomgeud,pcomequd): Ditto.
	(pcomnequd,pcomneud): Ditto.
	(pcomuq,pcomltuq,pcomleuq,pcomgtuq,pcomgeuq,pcomequq): Ditto.
	(pcomnequq,pcomneuq): Ditto.
	(pcomb,pcomltb,pcomleb,pcomgtb,pcomgeb,pcomeqb): Ditto.
	(pcomneqb,pcomneb): Ditto.
	(pcomw,pcomltw,pcomlew,pcomgtw,pcomgew,pcomeqw): Ditto.
	(pcomneqw,pcomnew): Ditto.
	(pcomd,pcomltd,pcomled,pcomgtd,pcomged,pcomeqd): Ditto.
	(pcomneqd,pcomned): Ditto.
	(pcomq,pcomltq,pcomleq,pcomgtq,pcomgeq): Ditto.
	(pcomeqq,pcomneqq,pcomneq): Ditto.
	(pcomtrueb, pcomtruew, pcomtrued, pcomtrueq): Ditto.
	(pcomtrueub, pcomtrueuw, pcomtrueud, pcomtrueuq): Ditto.
	(pcomfalseb, pcomfalsew, pcomfalsed, pcomfalseq): Ditto.
	(pcomfalseub, pcomfalseuw, pcomfalseud, pcomfalseuq): Ditto.
	(frczps,frczpd,frczss,frczsd): Ditto.
	(cvtph2ps,cvtps2ph): Ditto.

	* i386-tbl.h: Regenerate from i386-opc.tbl.

	* i386-dis.c (libiberty.h): Include to get ARRAY_SIZE.
	(dis386_move_test): New disassembly support for move from test
	register instruction that overlaps with SSE5 instructions.
	(print_insn): Add support for special casing the i386/i486 move
	from test register instruction that overlaps with the SSE5
	0x0f24 4 operand instructions.
	(OP_DREX_ICMP): New macros for SSE5 DREX handling.
	(OP_DREX_FCMP): Ditto.
	(OP_E_extended): Rename from OP_E, add additional argument to skip
	the DREX byte.
	(OP_E): Call OP_E_extended.
	(DREX_REG_MEMORY): New macros for drex handling.
	(DREX_REG_UNKNOWN): Ditto.
	(DREX4_OC1): Ditto.
	(DREX4_NO_OC0): Ditto.
	(DREX4_MASK): Ditto.
	(three_byte_table): Add SSE5 instructions.
	(print_drex_arg): New function to print a DREX register or memory
	reference.
	(OP_DREX4): New function for handling DREX 4 argument ops.
	(OP_DREX3): New function for handling DREX 3 argument ops.
	(twobyte_has_modrm): 0f{25,7a,7b} all use the modrm byte.
	(THREE_BYTE_SSE5_0F{24,25,7A,7B}): New macros for initializing 3
	byte opcode support for SSE5 instructions.
	(OPC_EXT_{44,45}): Remove OPC_EXT_$$ and rename OPC_EXT_45 
	to OPC_EXT_44.
	(dis386_twobyte): Add SSE5 24/25/7a/7b support.
	(three_byte_table): Add rows for describing SSE5 instructions.

	* Makefile.am (i386-dis.lo): Add $(INCDIR)/libiberty.h.

	* Makefile.in (i386-dis.lo): Add $(INCDIR)/libiberty.h.

[-- Attachment #3: binutils-sse5.patch.gz --]
[-- Type: application/x-gzip, Size: 48578 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Add x86 SSE5 instructions to the GNU binary  utilities
  2007-09-13 21:52         ` rajagopal, dwarak
@ 2007-09-13 22:44           ` H.J. Lu
  0 siblings, 0 replies; 17+ messages in thread
From: H.J. Lu @ 2007-09-13 22:44 UTC (permalink / raw)
  To: rajagopal, dwarak; +Cc: binutils, Meissner, Michael, Harle, Christophe

On Thu, Sep 13, 2007 at 04:51:35PM -0500, rajagopal, dwarak wrote:
> I tried this and it does not work.
> 
>   {
>     /* OPC_EXT_45 */
>     { THREE_BYTE_SSE5_0F24 },
>     { "movL",		{ Td, Rd } },
>   },
> 
> In addition to adding a case in get_valid_dis386() for handling
> IS_3BYTE_OPCODE, it does not work.
> 
> To make it work I have to duplicate the modrm bit calculations which is
> done for IS_3BYTE_OPCODE again for THREE_BYTE_SSE5_0F24 in print_insn.
> Also I would need to pass an extra argument with the third opcode to
> get_valid_dis386 to make it work so that it can index the
> THREE_BYTE_SSE5_0F24 table correctly.
> 

Will this patch work for you.


H.J.
----
2007-09-13  H.J. Lu  <hongjiu.lu@intel.com>

	* i386-dis.c (get_valid_dis386): Take a pointer to
	disassemble_info.  Handle IS_3BYTE_OPCODE.
	(print_insn): Updated.  Don't handle IS_3BYTE_OPCODE.

--- opcodes/i386-dis.c.3byte	2007-08-31 16:45:21.000000000 -0700
+++ opcodes/i386-dis.c	2007-09-13 15:34:00.000000000 -0700
@@ -3629,7 +3629,7 @@ with the -M switch (multiple options sho
 /* Get a pointer to struct dis386 with a valid name.  */
 
 static const struct dis386 *
-get_valid_dis386 (const struct dis386 *dp)
+get_valid_dis386 (const struct dis386 *dp, disassemble_info *info)
 {
   int index;
 
@@ -3678,6 +3678,15 @@ get_valid_dis386 (const struct dis386 *d
       dp = &x86_64_table[dp->op[1].bytemode][index];
       break;
 
+    case IS_3BYTE_OPCODE:
+      FETCH_DATA (info, codep + 2);
+      index = *codep++;
+      dp = &three_byte_table[dp->op[1].bytemode][index];
+      modrm.mod = (*codep >> 6) & 3;
+      modrm.reg = (*codep >> 3) & 7;
+      modrm.rm = *codep & 7;
+      break;
+
     case USE_OPC_EXT_TABLE:
       index = modrm.mod == 0x3 ? 1 : 0;
       dp = &opc_ext_table[dp->op[1].bytemode][index];
@@ -3696,7 +3705,7 @@ get_valid_dis386 (const struct dis386 *d
   if (dp->name != NULL)
     return dp;
   else
-    return get_valid_dis386 (dp);
+    return get_valid_dis386 (dp, info);
 }
 
 static int
@@ -3897,11 +3906,6 @@ print_insn (bfd_vma pc, disassemble_info
       dp = &dis386_twobyte[threebyte];
       need_modrm = twobyte_has_modrm[*codep];
       codep++;
-      if (dp->name == NULL && dp->op[0].bytemode == IS_3BYTE_OPCODE)
-	{
-	  FETCH_DATA (info, codep + 2);
-	  op = *codep++;
-	}
     }
   else
     {
@@ -3964,14 +3968,7 @@ print_insn (bfd_vma pc, disassemble_info
 	}
     }
 
-  if (dp->name == NULL && dp->op[0].bytemode == IS_3BYTE_OPCODE)
-    {
-      dp = &three_byte_table[dp->op[1].bytemode][op];
-      modrm.mod = (*codep >> 6) & 3;
-      modrm.reg = (*codep >> 3) & 7;
-      modrm.rm = *codep & 7;
-    }
-  else if (need_modrm)
+  if (need_modrm)
     {
       FETCH_DATA (info, codep + 1);
       modrm.mod = (*codep >> 6) & 3;
@@ -3985,7 +3982,7 @@ print_insn (bfd_vma pc, disassemble_info
     }
   else
     {
-      dp = get_valid_dis386 (dp);
+      dp = get_valid_dis386 (dp, info);
       if (dp != NULL && putop (dp->name, sizeflag) == 0)
         {
 	  for (i = 0; i < MAX_OPERANDS; ++i)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Add x86 SSE5 instructions to the GNU binary utilities
  2007-09-13 22:31     ` rajagopal, dwarak
@ 2007-09-13 22:47       ` H.J. Lu
  2007-09-14  2:51         ` rajagopal, dwarak
  0 siblings, 1 reply; 17+ messages in thread
From: H.J. Lu @ 2007-09-13 22:47 UTC (permalink / raw)
  To: rajagopal, dwarak; +Cc: binutils, Meissner, Michael, Harle, Christophe

On Thu, Sep 13, 2007 at 05:30:33PM -0500, rajagopal, dwarak wrote:
> The patch does not contain the generated files (i386-init.h and
> i386-tbl.h). I have also fixed the coding typos you mentioned in the
> assembler.
> 
> Is this ok?
> 

Please try my OPC_EXT_45 suggestion on top of

http://sourceware.org/ml/binutils/2007-09/msg00172.html

When I wrote get_valid_dis386, I forgot to handle IS_3BYTE_OPCODE.
With this, we don't need to special case IS_3BYTE_OPCODE in
print_insn.

Thanks.


H.J.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH] Add x86 SSE5 instructions to the GNU binary  utilities
  2007-09-13 22:47       ` H.J. Lu
@ 2007-09-14  2:51         ` rajagopal, dwarak
  2007-09-14 15:00           ` H.J. Lu
  0 siblings, 1 reply; 17+ messages in thread
From: rajagopal, dwarak @ 2007-09-14  2:51 UTC (permalink / raw)
  To: H.J. Lu; +Cc: binutils, Meissner, Michael, Harle, Christophe

[-- Attachment #1: Type: text/plain, Size: 1029 bytes --]

Yes this fixes the issue. Thanks.
The enclosed patch now applies over your patch
(http://sourceware.org/ml/binutils/2007-09/msg00172.html).

Ok?

Thanks,
- Dwarak


> -----Original Message-----
> From: H.J. Lu [mailto:hjl@lucon.org]
> Sent: Thursday, September 13, 2007 5:47 PM
> To: rajagopal, dwarak
> Cc: binutils@sourceware.org; Meissner, Michael; Harle, Christophe
> Subject: Re: [PATCH] Add x86 SSE5 instructions to the GNU binary
utilities
> 
> On Thu, Sep 13, 2007 at 05:30:33PM -0500, rajagopal, dwarak wrote:
> > The patch does not contain the generated files (i386-init.h and
> > i386-tbl.h). I have also fixed the coding typos you mentioned in the
> > assembler.
> >
> > Is this ok?
> >
> 
> Please try my OPC_EXT_45 suggestion on top of
> 
> http://sourceware.org/ml/binutils/2007-09/msg00172.html
> 
> When I wrote get_valid_dis386, I forgot to handle IS_3BYTE_OPCODE.
> With this, we don't need to special case IS_3BYTE_OPCODE in
> print_insn.
> 
> Thanks.
> 
> 
> H.J.
> 


[-- Attachment #2: binutils-sse5.patch.gz --]
[-- Type: application/x-gzip, Size: 48013 bytes --]

[-- Attachment #3: Changelog --]
[-- Type: application/octet-stream, Size: 6391 bytes --]

<binutls changes>
2007-08-31  Michael Meissner  <michael.meissner@amd.com>
	    Dwarakanath Rajagopal  <dwarak.rajagopal@amd.com>

	* NEWS: Add note about SSE5 being added on i386.

<gas changes>
2007-08-31  Michael Meissner  <michael.meissner@amd.com>
	    Dwarakanath Rajagopal  <dwarak.rajagopal@amd.com>
	    Tony Linthicum  <tony.linthicum@amd.com>

	* NEWS: Add SSE5 support to NEWS file.
	
	* config/tc-i386.h (drex_byte): Add fields to allow process_drex
	and build_modrm_byte to communicate.
	(DREX_OC0): New SSE5 macro.
	(DREX_OC0_MASK): Ditto.
	(DREX_OC1): Ditto.
	(DREX_OC1_MASK): Ditto.
	(DREX_XMEM_X1_X2_X2): Ditto.
	(DREX_X1_XMEM_X2_X2): Ditto.
	(DREX_X1_XMEM_X2_X1: Ditto.
	(DREX_X1_X2_XMEM_X1: Ditto.
	(DREX_XMEM_X1_X2): Ditto.
	(DREX_X1_XMEM_X2): Ditto.
	(drex_byte): New structure to describe the DREX byte.

	* config/tc-i386.c (process_drex): New function to handle SSE5
	DREX bits.
	(build_modrm_byte): Use the information cached away in
	process_drex in the case of DREX instructions.
	(i386_insn): Add drex field.
	(pi): Add debugging of drex field.
	(md_assemble): Treat SSE5 like SSE3 in instructions with an
	immediate byte.  Move REX field to DREX if this is a DREX
	instruction.
	(process_operands): Add SSE5 support.
	(build_modrm_byte): Ditto.
	(output_insn): Ditto.
	(cpu_arch): Ditto.
	(i386_align_code): Ditto.

<gas/testsuite changes>
2007-08-31  Michael Meissner  <michael.meissner@amd.com>
	    Dwarakanath Rajagopal  <dwarak.rajagopal@amd.com>
	    Tony Linthicum  <tony.linthicum@amd.com>

	* gas/i386/i386.exp (x86-64-sse5): Run SSE5 64-bit tests.
	(sse5): Do not currently run 32-bit  tests.

	* gas/i386/x86-64-sse5.s: New file for SSE5 support.
	* gas/i386/x86-64-sse5.d: Ditto.

<opcodes changes>
2007-08-31  Michael Meissner  <michael.meissner@amd.com>
	    Dwarakanath Rajagopal  <dwarak.rajagopal@amd.com>
	    Tony Linthicum  <tony.linthicum@amd.com>

	* i386-opc.h (CpuSSE5):  New macro.
	(i386_cpu_flags): Add Drex, Drexv and Drexc.

	* i386-gen.c (cpu_flag_init): Add CPU_SSE5_FLAGS.
	(operand_type_init): Add CpuSSE5.
	(opcode_modifiers): Add Drex, Drexv and Drexc.
	(i386_opcode_modifier): Ditto.

	* i386-opc.tbl (fmaddps,fmaddpd,fmaddss,fmaddsd): Define SSE5
	instructions here.
	(fmsubps,fmsubpd,fmsubss,fmsubsd): Ditto.
	(fnmaddps,fnmaddpd,fnmaddss,fnmaddsd): Ditto.
	(fnmsubps,fnmsubpd,fnmsubss,fnmsubsd): Ditto.
	(pmacssww,pmacsww,pmacsswd,pmacswd): Ditto.
	(pmacssdd,pmacsdd,pmacssdql,pmacssdqh): Ditto.
	(pmacsdql,pmacsdqh,pmadcsswd,pmadcswd): Ditto.
	(phaddbw,phaddbd,phaddbq,phaddwd): Ditto.
	(phaddwq,phadddq,phaddubw,phaddubd): Ditto.
	(phaddubq,phadduwd,phadduwq,phaddudq): Ditto.
	(phsubbw,phsubwd,phsubdq): Ditto.
	(pcmov,pperm,permps,permpd): Ditto.
	(protb,protw,protd,protq): Ditto.
	(pshlb,pshlw,pshld,pshlq): Ditto.
	(pshab,pshaw,pshad,pshaq): Ditto.
	(comps,comeqps,comltps,comungeps,comleps,comungtps): Ditto.
	(comunordps,comneps,comneqps,comnltps,comugeps): Ditto.
	(comnleps,comugtps,comordps,comueqps,comultps): Ditto.
	(comngeps,comuleps,comngtps,comfalseps,comuneps): Ditto.
	(comuneqps,comunltps,comgeps,comunleps,comgtps,comtrueps): Ditto.
	(compd,comeqpd,comltpd,comungepd,comlepd,comungtpd,comunordpd): Ditto.
	(comnepd,comneqpd,comnltpd,comugepd,comnlepd,comugtpd): Ditto.
	(comordpd,comueqpd,comultpd,comngepd,comulepd,comngtpd): Ditto.
	(comfalsepd,comunepd,comuneqpd,comunltpd,comgepd): Ditto.
	(comunlepd,comgtpd,comtruepd): Ditto.
	(comss,comeqss,comltss,comungess,comless,comungtss,comunordss): Ditto.
	(comness,comneqss,comnltss,comugess,comnless,comugtss): Ditto.
	(comordss,comueqss,comultss,comngess,comuless,comngtss): Ditto.
	(comfalsess,comuness,comuneqss,comunltss,comgess): Ditto.
	(comunless,comgtss,comtruess): Ditto.
	(comsd,comeqsd,comltsd,comungesd,comlesd,comungtsd,comunordsd): Ditto.
	(comnesd,comneqsd,comnltsd,comugesd,comnlesd,comugtsd): Ditto.
	(comordsd,comueqsd,comultsd,comngesd,comulesd,comngtsd): Ditto.
	(comfalsesd,comunesd,comuneqsd,comunltsd,comgesd): Ditto.
	(comunlesd,comgtsd,comtruesd): Ditto.
	(pcomub,pcomltub,pcomleub,pcomgtub,pcomgeub,pcomequb): Ditto.
	(pcomnequb,pcomneub): Ditto.
	(pcomuw,pcomltuw,pcomleuw,pcomgtuw,pcomgeuw,pcomequw): Ditto.
	(pcomnequw,pcomneuw): Ditto.
	(pcomud,pcomltud,pcomleud,pcomgtud,pcomgeud,pcomequd): Ditto.
	(pcomnequd,pcomneud): Ditto.
	(pcomuq,pcomltuq,pcomleuq,pcomgtuq,pcomgeuq,pcomequq): Ditto.
	(pcomnequq,pcomneuq): Ditto.
	(pcomb,pcomltb,pcomleb,pcomgtb,pcomgeb,pcomeqb): Ditto.
	(pcomneqb,pcomneb): Ditto.
	(pcomw,pcomltw,pcomlew,pcomgtw,pcomgew,pcomeqw): Ditto.
	(pcomneqw,pcomnew): Ditto.
	(pcomd,pcomltd,pcomled,pcomgtd,pcomged,pcomeqd): Ditto.
	(pcomneqd,pcomned): Ditto.
	(pcomq,pcomltq,pcomleq,pcomgtq,pcomgeq): Ditto.
	(pcomeqq,pcomneqq,pcomneq): Ditto.
	(pcomtrueb, pcomtruew, pcomtrued, pcomtrueq): Ditto.
	(pcomtrueub, pcomtrueuw, pcomtrueud, pcomtrueuq): Ditto.
	(pcomfalseb, pcomfalsew, pcomfalsed, pcomfalseq): Ditto.
	(pcomfalseub, pcomfalseuw, pcomfalseud, pcomfalseuq): Ditto.
	(frczps,frczpd,frczss,frczsd): Ditto.
	(cvtph2ps,cvtps2ph): Ditto.

	* i386-tbl.h: Regenerate from i386-opc.tbl.

	* i386-dis.c (libiberty.h): Include to get ARRAY_SIZE.
	(dis386_move_test): New disassembly support for move from test
	register instruction that overlaps with SSE5 instructions.
	(print_insn): Add support for special casing the i386/i486 move
	from test register instruction that overlaps with the SSE5
	0x0f24 4 operand instructions.
	(OP_DREX_ICMP): New macros for SSE5 DREX handling.
	(OP_DREX_FCMP): Ditto.
	(OP_E_extended): Rename from OP_E, add additional argument to skip
	the DREX byte.
	(OP_E): Call OP_E_extended.
	(DREX_REG_MEMORY): New macros for drex handling.
	(DREX_REG_UNKNOWN): Ditto.
	(DREX4_OC1): Ditto.
	(DREX4_NO_OC0): Ditto.
	(DREX4_MASK): Ditto.
	(three_byte_table): Add SSE5 instructions.
	(print_drex_arg): New function to print a DREX register or memory
	reference.
	(OP_DREX4): New function for handling DREX 4 argument ops.
	(OP_DREX3): New function for handling DREX 3 argument ops.
	(twobyte_has_modrm): 0f{25,7a,7b} all use the modrm byte.
	(THREE_BYTE_SSE5_0F{24,25,7A,7B}): New macros for initializing 3
	byte opcode support for SSE5 instructions.
	(dis386_twobyte): Add SSE5 24/25/7a/7b support.
	(three_byte_table): Add rows for describing SSE5 instructions.

	* Makefile.am (i386-dis.lo): Add $(INCDIR)/libiberty.h.

	* Makefile.in (i386-dis.lo): Add $(INCDIR)/libiberty.h.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Add x86 SSE5 instructions to the GNU binary  utilities
  2007-09-14  2:51         ` rajagopal, dwarak
@ 2007-09-14 15:00           ` H.J. Lu
  2007-09-14 16:00             ` rajagopal, dwarak
  0 siblings, 1 reply; 17+ messages in thread
From: H.J. Lu @ 2007-09-14 15:00 UTC (permalink / raw)
  To: rajagopal, dwarak; +Cc: binutils, Meissner, Michael, Harle, Christophe

On Thu, Sep 13, 2007 at 09:51:00PM -0500, rajagopal, dwarak wrote:
> Yes this fixes the issue. Thanks.
> The enclosed patch now applies over your patch
> (http://sourceware.org/ml/binutils/2007-09/msg00172.html).
> 

See my comments below.


H.J.
----
+  fprintf (stdout, "  drex:  reg %d rex 0x%x\n", x->drex.reg, x->drex.rex);

Line too long

+  /* If the instruction has the DREX attribute (aka SSE5), don't emit a
+     REX prefix.  */
+  if ((i.tm.opcode_modifier.drex || i.tm.opcode_modifier.drexc) != 0)

No need for () and != 0.

+  /* SSE5 4 operand instructions must have the destination the same as one of

Line too long

+     the inputs.  Figure out the destination register and cache it away in the

Line too long

+	       && i.types[1].bitfield.regxmm != 0
+	       && (i.types[2].bitfield.regxmm 
+		   || operand_type_check (i.types[2], anymem)) != 0

No need for != 0.

+	  /* Specify the modrm encoding for memory addressing.  Include the

Line too long

+	     high order bit that is normally stored in the REX byte in the

Line too long

+      /* Case 5: 4 operand insn, dest = src3, src2 = memory.  */
+      else if (i.types[0].bitfield.regxmm != 0
+	       && (i.types[1].bitfield.regxmm 
+		   || operand_type_check (i.types[1], anymem)) != 0

No need for != 0. Please remove != 0 from all of

(A || B) != 0

in your patch.

+
+	  /* Specify the modrm encoding and remember the register including the

Line too long

+	  /* Specify the modrm encoding and remember the register including the

Line too long. You have many long lines. Please make sure your change
is less than 72 columns.


+     encoded in the DREX byte. */
+  else if (i.tm.opcode_modifier.drex && !i.tm.opcode_modifier.drexv 
+	   && i.operands == 4)

One line for each condition. Please double check other places for
this.

+    {
+      /* Case 1: 4 operand insn, dest = src1, src3 = reg/mem.  */
+      if ((i.types[0].bitfield.regxmm) != 0

No need for ().

+	  && (i.types[1].bitfield.regxmm 
+	      || operand_type_check(i.types[1], anymem)) != 0

See above.

@@ -996,8 +1026,8 @@ static const struct dis386 dis386_twobyt
   { OPC_EXT_41 },
   { OPC_EXT_42 },
   { OPC_EXT_43 },
-  { OPC_EXT_44 },
-  { "(bad)",		{ XX } },
+  { OPC_EXT_44 }, /* also movL {Td, Rd} in 386/486 */

Don't change it.
Keep

    { OPC_EXT_44 },

+  { THREE_BYTE_SSE5_0F25 },

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH] Add x86 SSE5 instructions to the GNU binary  utilities
  2007-09-14 15:00           ` H.J. Lu
@ 2007-09-14 16:00             ` rajagopal, dwarak
  2007-09-14 16:32               ` H.J. Lu
  0 siblings, 1 reply; 17+ messages in thread
From: rajagopal, dwarak @ 2007-09-14 16:00 UTC (permalink / raw)
  To: H.J. Lu; +Cc: binutils, Meissner, Michael, Harle, Christophe

[-- Attachment #1: Type: text/plain, Size: 3159 bytes --]

I thought the restriction was 80 columns instead of 72.
Nevertheless, I have fixed all my changes to 72 columns. Also, I have
fixed all other comments you pointed out.

Ok?

Thanks,
Dwarak

> -----Original Message-----
> From: H.J. Lu [mailto:hjl@lucon.org]
> Sent: Friday, September 14, 2007 8:50 AM
> To: rajagopal, dwarak
> Cc: binutils@sourceware.org; Meissner, Michael; Harle, Christophe
> Subject: Re: [PATCH] Add x86 SSE5 instructions to the GNU binary
utilities
> 
> On Thu, Sep 13, 2007 at 09:51:00PM -0500, rajagopal, dwarak wrote:
> > Yes this fixes the issue. Thanks.
> > The enclosed patch now applies over your patch
> > (http://sourceware.org/ml/binutils/2007-09/msg00172.html).
> >
> 
> See my comments below.
> 
> 
> H.J.
> ----
> +  fprintf (stdout, "  drex:  reg %d rex 0x%x\n", x->drex.reg, x-
> >drex.rex);
> 
> Line too long
> 
> +  /* If the instruction has the DREX attribute (aka SSE5), don't emit
a
> +     REX prefix.  */
> +  if ((i.tm.opcode_modifier.drex || i.tm.opcode_modifier.drexc) != 0)
> 
> No need for () and != 0.
> 
> +  /* SSE5 4 operand instructions must have the destination the same
as
> one of
> 
> Line too long
> 
> +     the inputs.  Figure out the destination register and cache it
away
> in the
> 
> Line too long
> 
> +	       && i.types[1].bitfield.regxmm != 0
> +	       && (i.types[2].bitfield.regxmm
> +		   || operand_type_check (i.types[2], anymem)) != 0
> 
> No need for != 0.
> 
> +	  /* Specify the modrm encoding for memory addressing.  Include
the
> 
> Line too long
> 
> +	     high order bit that is normally stored in the REX byte in
the
> 
> Line too long
> 
> +      /* Case 5: 4 operand insn, dest = src3, src2 = memory.  */
> +      else if (i.types[0].bitfield.regxmm != 0
> +	       && (i.types[1].bitfield.regxmm
> +		   || operand_type_check (i.types[1], anymem)) != 0
> 
> No need for != 0. Please remove != 0 from all of
> 
> (A || B) != 0
> 
> in your patch.
> 
> +
> +	  /* Specify the modrm encoding and remember the register
including
> the
> 
> Line too long
> 
> +	  /* Specify the modrm encoding and remember the register
including
> the
> 
> Line too long. You have many long lines. Please make sure your change
> is less than 72 columns.
> 
> 
> +     encoded in the DREX byte. */
> +  else if (i.tm.opcode_modifier.drex && !i.tm.opcode_modifier.drexv
> +	   && i.operands == 4)
> 
> One line for each condition. Please double check other places for
> this.
> 
> +    {
> +      /* Case 1: 4 operand insn, dest = src1, src3 = reg/mem.  */
> +      if ((i.types[0].bitfield.regxmm) != 0
> 
> No need for ().
> 
> +	  && (i.types[1].bitfield.regxmm
> +	      || operand_type_check(i.types[1], anymem)) != 0
> 
> See above.
> 
> @@ -996,8 +1026,8 @@ static const struct dis386 dis386_twobyt
>    { OPC_EXT_41 },
>    { OPC_EXT_42 },
>    { OPC_EXT_43 },
> -  { OPC_EXT_44 },
> -  { "(bad)",		{ XX } },
> +  { OPC_EXT_44 }, /* also movL {Td, Rd} in 386/486 */
> 
> Don't change it.
> Keep
> 
>     { OPC_EXT_44 },
> 
> +  { THREE_BYTE_SSE5_0F25 },
> 


[-- Attachment #2: Changelog --]
[-- Type: application/octet-stream, Size: 6391 bytes --]

<binutls changes>
2007-08-31  Michael Meissner  <michael.meissner@amd.com>
	    Dwarakanath Rajagopal  <dwarak.rajagopal@amd.com>

	* NEWS: Add note about SSE5 being added on i386.

<gas changes>
2007-08-31  Michael Meissner  <michael.meissner@amd.com>
	    Dwarakanath Rajagopal  <dwarak.rajagopal@amd.com>
	    Tony Linthicum  <tony.linthicum@amd.com>

	* NEWS: Add SSE5 support to NEWS file.
	
	* config/tc-i386.h (drex_byte): Add fields to allow process_drex
	and build_modrm_byte to communicate.
	(DREX_OC0): New SSE5 macro.
	(DREX_OC0_MASK): Ditto.
	(DREX_OC1): Ditto.
	(DREX_OC1_MASK): Ditto.
	(DREX_XMEM_X1_X2_X2): Ditto.
	(DREX_X1_XMEM_X2_X2): Ditto.
	(DREX_X1_XMEM_X2_X1: Ditto.
	(DREX_X1_X2_XMEM_X1: Ditto.
	(DREX_XMEM_X1_X2): Ditto.
	(DREX_X1_XMEM_X2): Ditto.
	(drex_byte): New structure to describe the DREX byte.

	* config/tc-i386.c (process_drex): New function to handle SSE5
	DREX bits.
	(build_modrm_byte): Use the information cached away in
	process_drex in the case of DREX instructions.
	(i386_insn): Add drex field.
	(pi): Add debugging of drex field.
	(md_assemble): Treat SSE5 like SSE3 in instructions with an
	immediate byte.  Move REX field to DREX if this is a DREX
	instruction.
	(process_operands): Add SSE5 support.
	(build_modrm_byte): Ditto.
	(output_insn): Ditto.
	(cpu_arch): Ditto.
	(i386_align_code): Ditto.

<gas/testsuite changes>
2007-08-31  Michael Meissner  <michael.meissner@amd.com>
	    Dwarakanath Rajagopal  <dwarak.rajagopal@amd.com>
	    Tony Linthicum  <tony.linthicum@amd.com>

	* gas/i386/i386.exp (x86-64-sse5): Run SSE5 64-bit tests.
	(sse5): Do not currently run 32-bit  tests.

	* gas/i386/x86-64-sse5.s: New file for SSE5 support.
	* gas/i386/x86-64-sse5.d: Ditto.

<opcodes changes>
2007-08-31  Michael Meissner  <michael.meissner@amd.com>
	    Dwarakanath Rajagopal  <dwarak.rajagopal@amd.com>
	    Tony Linthicum  <tony.linthicum@amd.com>

	* i386-opc.h (CpuSSE5):  New macro.
	(i386_cpu_flags): Add Drex, Drexv and Drexc.

	* i386-gen.c (cpu_flag_init): Add CPU_SSE5_FLAGS.
	(operand_type_init): Add CpuSSE5.
	(opcode_modifiers): Add Drex, Drexv and Drexc.
	(i386_opcode_modifier): Ditto.

	* i386-opc.tbl (fmaddps,fmaddpd,fmaddss,fmaddsd): Define SSE5
	instructions here.
	(fmsubps,fmsubpd,fmsubss,fmsubsd): Ditto.
	(fnmaddps,fnmaddpd,fnmaddss,fnmaddsd): Ditto.
	(fnmsubps,fnmsubpd,fnmsubss,fnmsubsd): Ditto.
	(pmacssww,pmacsww,pmacsswd,pmacswd): Ditto.
	(pmacssdd,pmacsdd,pmacssdql,pmacssdqh): Ditto.
	(pmacsdql,pmacsdqh,pmadcsswd,pmadcswd): Ditto.
	(phaddbw,phaddbd,phaddbq,phaddwd): Ditto.
	(phaddwq,phadddq,phaddubw,phaddubd): Ditto.
	(phaddubq,phadduwd,phadduwq,phaddudq): Ditto.
	(phsubbw,phsubwd,phsubdq): Ditto.
	(pcmov,pperm,permps,permpd): Ditto.
	(protb,protw,protd,protq): Ditto.
	(pshlb,pshlw,pshld,pshlq): Ditto.
	(pshab,pshaw,pshad,pshaq): Ditto.
	(comps,comeqps,comltps,comungeps,comleps,comungtps): Ditto.
	(comunordps,comneps,comneqps,comnltps,comugeps): Ditto.
	(comnleps,comugtps,comordps,comueqps,comultps): Ditto.
	(comngeps,comuleps,comngtps,comfalseps,comuneps): Ditto.
	(comuneqps,comunltps,comgeps,comunleps,comgtps,comtrueps): Ditto.
	(compd,comeqpd,comltpd,comungepd,comlepd,comungtpd,comunordpd): Ditto.
	(comnepd,comneqpd,comnltpd,comugepd,comnlepd,comugtpd): Ditto.
	(comordpd,comueqpd,comultpd,comngepd,comulepd,comngtpd): Ditto.
	(comfalsepd,comunepd,comuneqpd,comunltpd,comgepd): Ditto.
	(comunlepd,comgtpd,comtruepd): Ditto.
	(comss,comeqss,comltss,comungess,comless,comungtss,comunordss): Ditto.
	(comness,comneqss,comnltss,comugess,comnless,comugtss): Ditto.
	(comordss,comueqss,comultss,comngess,comuless,comngtss): Ditto.
	(comfalsess,comuness,comuneqss,comunltss,comgess): Ditto.
	(comunless,comgtss,comtruess): Ditto.
	(comsd,comeqsd,comltsd,comungesd,comlesd,comungtsd,comunordsd): Ditto.
	(comnesd,comneqsd,comnltsd,comugesd,comnlesd,comugtsd): Ditto.
	(comordsd,comueqsd,comultsd,comngesd,comulesd,comngtsd): Ditto.
	(comfalsesd,comunesd,comuneqsd,comunltsd,comgesd): Ditto.
	(comunlesd,comgtsd,comtruesd): Ditto.
	(pcomub,pcomltub,pcomleub,pcomgtub,pcomgeub,pcomequb): Ditto.
	(pcomnequb,pcomneub): Ditto.
	(pcomuw,pcomltuw,pcomleuw,pcomgtuw,pcomgeuw,pcomequw): Ditto.
	(pcomnequw,pcomneuw): Ditto.
	(pcomud,pcomltud,pcomleud,pcomgtud,pcomgeud,pcomequd): Ditto.
	(pcomnequd,pcomneud): Ditto.
	(pcomuq,pcomltuq,pcomleuq,pcomgtuq,pcomgeuq,pcomequq): Ditto.
	(pcomnequq,pcomneuq): Ditto.
	(pcomb,pcomltb,pcomleb,pcomgtb,pcomgeb,pcomeqb): Ditto.
	(pcomneqb,pcomneb): Ditto.
	(pcomw,pcomltw,pcomlew,pcomgtw,pcomgew,pcomeqw): Ditto.
	(pcomneqw,pcomnew): Ditto.
	(pcomd,pcomltd,pcomled,pcomgtd,pcomged,pcomeqd): Ditto.
	(pcomneqd,pcomned): Ditto.
	(pcomq,pcomltq,pcomleq,pcomgtq,pcomgeq): Ditto.
	(pcomeqq,pcomneqq,pcomneq): Ditto.
	(pcomtrueb, pcomtruew, pcomtrued, pcomtrueq): Ditto.
	(pcomtrueub, pcomtrueuw, pcomtrueud, pcomtrueuq): Ditto.
	(pcomfalseb, pcomfalsew, pcomfalsed, pcomfalseq): Ditto.
	(pcomfalseub, pcomfalseuw, pcomfalseud, pcomfalseuq): Ditto.
	(frczps,frczpd,frczss,frczsd): Ditto.
	(cvtph2ps,cvtps2ph): Ditto.

	* i386-tbl.h: Regenerate from i386-opc.tbl.

	* i386-dis.c (libiberty.h): Include to get ARRAY_SIZE.
	(dis386_move_test): New disassembly support for move from test
	register instruction that overlaps with SSE5 instructions.
	(print_insn): Add support for special casing the i386/i486 move
	from test register instruction that overlaps with the SSE5
	0x0f24 4 operand instructions.
	(OP_DREX_ICMP): New macros for SSE5 DREX handling.
	(OP_DREX_FCMP): Ditto.
	(OP_E_extended): Rename from OP_E, add additional argument to skip
	the DREX byte.
	(OP_E): Call OP_E_extended.
	(DREX_REG_MEMORY): New macros for drex handling.
	(DREX_REG_UNKNOWN): Ditto.
	(DREX4_OC1): Ditto.
	(DREX4_NO_OC0): Ditto.
	(DREX4_MASK): Ditto.
	(three_byte_table): Add SSE5 instructions.
	(print_drex_arg): New function to print a DREX register or memory
	reference.
	(OP_DREX4): New function for handling DREX 4 argument ops.
	(OP_DREX3): New function for handling DREX 3 argument ops.
	(twobyte_has_modrm): 0f{25,7a,7b} all use the modrm byte.
	(THREE_BYTE_SSE5_0F{24,25,7A,7B}): New macros for initializing 3
	byte opcode support for SSE5 instructions.
	(dis386_twobyte): Add SSE5 24/25/7a/7b support.
	(three_byte_table): Add rows for describing SSE5 instructions.

	* Makefile.am (i386-dis.lo): Add $(INCDIR)/libiberty.h.

	* Makefile.in (i386-dis.lo): Add $(INCDIR)/libiberty.h.

[-- Attachment #3: binutils-sse5.patch.gz --]
[-- Type: application/x-gzip, Size: 48339 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Add x86 SSE5 instructions to the GNU binary  utilities
  2007-09-14 16:00             ` rajagopal, dwarak
@ 2007-09-14 16:32               ` H.J. Lu
  0 siblings, 0 replies; 17+ messages in thread
From: H.J. Lu @ 2007-09-14 16:32 UTC (permalink / raw)
  To: rajagopal, dwarak; +Cc: binutils, Meissner, Michael, Harle, Christophe

On Fri, Sep 14, 2007 at 10:45:22AM -0500, rajagopal, dwarak wrote:
> I thought the restriction was 80 columns instead of 72.
> Nevertheless, I have fixed all my changes to 72 columns. Also, I have
> fixed all other comments you pointed out.
> 
> Ok?
> 

-      if (i.tm.extension_opcode != None)
+      if (i.tm.extension_opcode != None
+	  && (i.tm.opcode_modifier.drex 
+	      || i.tm.opcode_modifier.drexv 
+	      || i.tm.opcode_modifier.drexc) == 0)

Please change it to !() instead of == 0.  OK with this change.

Thanks.


H.J.


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2007-09-14 16:14 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-08-31 21:46 [PATCH] Add x86 SSE5 instructions to the GNU binary utilities Michael Meissner
2007-09-07  2:27 ` Alan Modra
2007-09-13 20:11 ` rajagopal, dwarak
2007-09-13 20:30   ` H.J. Lu
2007-09-13 22:31     ` rajagopal, dwarak
2007-09-13 22:47       ` H.J. Lu
2007-09-14  2:51         ` rajagopal, dwarak
2007-09-14 15:00           ` H.J. Lu
2007-09-14 16:00             ` rajagopal, dwarak
2007-09-14 16:32               ` H.J. Lu
2007-09-13 20:38   ` H.J. Lu
2007-09-13 20:43     ` H.J. Lu
2007-09-13 20:44     ` rajagopal, dwarak
2007-09-13 20:56       ` H.J. Lu
2007-09-13 21:52         ` rajagopal, dwarak
2007-09-13 22:44           ` H.J. Lu
2007-09-13 20:45     ` Meissner, Michael

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).