From mboxrd@z Thu Jan  1 00:00:00 1970
From: Toshiyasu Morita
To: law@cygnus.com
Cc: egcs@cygnus.com
Subject: Re: GCC 2.7.2.3 good, EGCS 1.0.3 bad for x86 subtract then test
Date: Thu, 24 Dec 1998 01:11:00 -0000
Message-id: <199812240911.BAA06099@netcom8.netcom.com>
References: <12174.914475308@hurl.cygnus.com>
X-SW-Source: 1998-12/msg00898.html

>> This is solved by a patch Jeff sent me just yesterday.
>
> Yea.  But it seems to generally create worse code -- too aggressive about
> copying the source to the destination.  Some tweaking of the heuristic for
> when to copy the source to the destination may be needed to make the change
> generally useful.
>
> jeff

The patch seems to help some on the SH4; I see it now loading values
directly into fr0 occasionally where it didn't before:

4873:graph3d.ii  ****   min_z = M*(poly.vertices[0].sx-width2) + N*(poly.vertices[0].sy-height2) + O;
 1567 0308 57A5     mov.l   @(20,r10),r7      <- bad!
 1569 030a 7104     add     #4,r1
 1571 030c 475A     lds     r7,fpul           <- bad!
 1573 030e F418     fmov.s  @r1,fr4
 1575 0310 F12D     float   fpul,fr1          <- bad!
 1577 0312 71FC     add     #-4,r1
 1579 0314 F218     fmov.s  @r1,fr2
 1580 0316 52A6     mov.l   @(24,r10),r2
 1581 0318 F211     fsub    fr1,fr2
 1582 031a 425A     lds     r2,fpul
 1583 031c F12D     float   fpul,fr1
 1585 031e F54C     fmov    fr4,fr5
 1587 0320 F35C     fmov    fr5,fr3
 1588 0322 E044     mov     #68,r0
 1589 0324 F311     fsub    fr1,fr3
 1590 0326 F0E6     fmov.s  @(r0,r14),fr0
 1591 0328 F302     fmul    fr0,fr3
 1592 032a E040     mov     #64,r0
 1593 032c F0E6     fmov.s  @(r0,r14),fr0     <- here
 1594 032e F32E     fmac    fr0,fr2,fr3       <- here
 1595 0330 E048     mov     #72,r0
 1596 0332 F0E6     fmov.s  @(r0,r14),fr0
 1597 0334 F300     fadd    fr0,fr3
...
 1639 0374 F0E6     fmov.s  @(r0,r14),fr0
 1640 0376 FC2E     fmac    fr0,fr2,fr12

...but the majority of the cases seem to be missed:

 1731 03f8 FA19     fmov.s  @r1+,fr10         <- here
 1733 03fa F522     fmul    fr2,fr5
 1734 03fc F08C     fmov    fr8,fr0
4911:graph3d.ii  ****   Q1 = poly.plane.m_cam2tex->m21;
 1736 03fe F619     fmov.s  @r1+,fr6          <- here
 1738 0400 F53E     fmac    fr0,fr3,fr5
 1739 0402 7308     add     #8,r3
4912:graph3d.ii  ****   Q2 = poly.plane.m_cam2tex->m22;
 1741 0404 F719     fmov.s  @r1+,fr7
 1743 0406 F138     fmov.s  @r3,fr1
 1744 0408 F0AC     fmov    fr10,fr0          <- here
4913:graph3d.ii  ****   Q3 = poly.plane.m_cam2tex->m23;
4914:graph3d.ii  ****   Q4 = -(Q1*poly.plane.v_cam2tex->x + Q2*poly.plane.v_cam2tex->y + Q3*poly.plane.v_cam2tex->z);
 1746 040a F47C     fmov    fr7,fr4
 1748 040c F51E     fmac    fr0,fr1,fr5       <- here
 1750 040e F422     fmul    fr2,fr4
 1751 0410 F06C     fmov    fr6,fr0           <- here
 1752 0412 F43E     fmac    fr0,fr3,fr4       <- here
 1754 0414 FB18     fmov.s  @r1,fr11          <- here
4915:graph3d.ii  ****
4916:graph3d.ii  ****   if (Scan::tmap2)
 1756 0416 D26A     mov.l   L464,r2
 1758 0418 F0BC     fmov    fr11,fr0          <- here
 1759 041a F41E     fmac    fr0,fr1,fr4       <- here
...
4939:graph3d.ii  ****   J1 = P1*Camera::inv_aspect + P4*M;
 1842 04a8 E040     mov     #64,r0
 1843           L585:
 1844 04aa F2E6     fmov.s  @(r0,r14),fr2
 1845 04ac F05C     fmov    fr5,fr0
 1846 04ae D13F     mov.l   L459,r1
 1847 04b0 F022     fmul    fr2,fr0           <- here
 1848 04b2 F118     fmov.s  @r1,fr1
 1849 04b4 F20C     fmov    fr0,fr2           <- here
 1850 04b6 F08C     fmov    fr8,fr0
 1851 04b8 F21E     fmac    fr0,fr1,fr2       <- here
 1852 04ba E04C     mov     #76,r0
 1853 04bc FE27     fmov.s  fr2,@(r0,r14)
4940:graph3d.ii  ****   J2 = P2*Camera::inv_aspect + P4*N;
 1855 04be E044     mov     #68,r0
 1856 04c0 F2E6     fmov.s  @(r0,r14),fr2     <- here
 1857 04c2 F05C     fmov    fr5,fr0
 1858 04c4 F022     fmul    fr2,fr0           <- here
 1859 04c6 F20C     fmov    fr0,fr2           <- here
 1860 04c8 F09C     fmov    fr9,fr0
 1861 04ca F21E     fmac    fr0,fr1,fr2       <- here
 1862 04cc E050     mov     #80,r0
 1863 04ce FE27     fmov.s  fr2,@(r0,r14)
...
4942:graph3d.ii  ****   K1 = Q1*Camera::inv_aspect + Q4*M;
 1871 04da E040     mov     #64,r0
 1872 04dc F2E6     fmov.s  @(r0,r14),fr2     <- here
 1873 04de F04C     fmov    fr4,fr0
 1874 04e0 F022     fmul    fr2,fr0           <- here
 1875 04e2 F20C     fmov    fr0,fr2           <- here
 1876 04e4 F06C     fmov    fr6,fr0
 1877 04e6 F21E     fmac    fr0,fr1,fr2
 1878 04e8 E058     mov     #88,r0
 1879 04ea FE27     fmov.s  fr2,@(r0,r14)
4943:graph3d.ii  ****   K2 = Q2*Camera::inv_aspect + Q4*N;
 1881 04ec E044     mov     #68,r0
 1882 04ee F2E6     fmov.s  @(r0,r14),fr2     <- here
 1883 04f0 F04C     fmov    fr4,fr0
 1884 04f2 F022     fmul    fr2,fr0           <- here
 1885 04f4 F20C     fmov    fr0,fr2           <- here
 1886 04f6 F07C     fmov    fr7,fr0
 1887 04f8 F21E     fmac    fr0,fr1,fr2       <- here
 1888 04fa E05C     mov     #92,r0
...

Sometimes these sequences are justified because the value is reused; e.g.:

    fmov    @rm,fr9
    fmov    fr9,fr0
    fmac    fr0,frm,frn
    ... (fr0 clobbered)
    fmov    fr9,fr0
    fmac    fr0,frm,frn

However, it would be nice if gcc could recognize this and convert it to:

    fmov    @rm,fr0
    fmov    fr0,fr9
    fmac    fr0,frm,frn
    ... (fr0 clobbered)
    fmov    fr9,fr0
    fmac    fr0,frm,frn

because this sequence gives the scheduler more freedom to move the second
instruction (fmov fr0,fr9): the first fmac no longer depends on it, so it
can be moved below the fmac instruction if necessary.

The file graph3d.ii is from my stress suite at
ftp://shell14.ba.best.com/pub.t/~tm2/stress-1.4.tar.gz .
The file matrix.i from the stress suite elicits the same behavior from egcs.

Toshi