From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-258703-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 30352 invoked by alias); 19 Mar 2010 13:26:35 -0000
Received: (qmail 30342 invoked by uid 22791); 19 Mar 2010 13:26:35 -0000
X-SWARE-Spam-Status: No, hits=-1.9 required=5.0 	tests=AWL,BAYES_00,SPF_FAIL
X-Spam-Check-By: sourceware.org
Received: from mx20.gnu.org (HELO mx20.gnu.org) (199.232.41.8)     by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Fri, 19 Mar 2010 13:26:31 +0000
Received: from mail.codesourcery.com ([38.113.113.100]) 	by mx20.gnu.org with esmtp (Exim 4.60) 	(envelope-from <joseph@codesourcery.com>) 	id 1NscDc-0000el-Pv 	for gcc-patches@gcc.gnu.org; Fri, 19 Mar 2010 09:26:29 -0400
Received: (qmail 16161 invoked from network); 19 Mar 2010 13:26:26 -0000
Received: from unknown (HELO digraph.polyomino.org.uk) (joseph@127.0.0.2)   by mail.codesourcery.com with ESMTPA; 19 Mar 2010 13:26:26 -0000
Received: from jsm28 (helo=localhost) 	by digraph.polyomino.org.uk with local-esmtp (Exim 4.69) 	(envelope-from <joseph@codesourcery.com>) 	id 1NscDZ-00082V-0D; Fri, 19 Mar 2010 13:26:25 +0000
Date: Fri, 19 Mar 2010 13:44:00 -0000
From: "Joseph S. Myers" <joseph@codesourcery.com>
To: Shujing Zhao <pearly.zhao@oracle.com>
cc: GCC Patches <gcc-patches@gcc.gnu.org>,      Paolo Carlini <paolo.carlini@oracle.com>
Subject: Re: [PATCH PR/42686] Align the help text output
In-Reply-To: <4BA35936.8070304@oracle.com>
Message-ID: <Pine.LNX.4.64.1003191301180.30074@digraph.polyomino.org.uk>
References: <4B863559.4000407@oracle.com> <Pine.LNX.4.64.1002251547410.30254@digraph.polyomino.org.uk>  <4B976D51.8010705@oracle.com> <Pine.LNX.4.64.1003101656110.12539@digraph.polyomino.org.uk>  <4B9A1E65.4080506@oracle.com> <Pine.LNX.4.64.1003121651160.31790@digraph.polyomino.org.uk>  <4B9DFA37.9010303@oracle.com> <Pine.LNX.4.64.1003151052520.31811@digraph.polyomino.org.uk>  <4BA07D89.3040406@oracle.com> <Pine.LNX.4.64.1003171201450.2148@digraph.polyomino.org.uk>  <4BA2010C.9030103@oracle.com> <Pine.LNX.4.64.1003181650570.19807@digraph.polyomino.org.uk>  <4BA35936.8070304@oracle.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-detected-operating-system: by mx20.gnu.org: GNU/Linux 2.6, seldom 2.4 (older, 4)
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
X-SW-Source: 2010-03/txt/msg00872.txt.bz2

On Fri, 19 Mar 2010, Shujing Zhao wrote:

> > Why this increment?  Characters may have width 0 (combining characters) or
> > more than 1 and it's not clear why adding 1 should be the correct thing to
> > do here.
> It is changed to *length--* at the new logic.
> +      colwidth += wcwidth (wc[0]);
> +      length--;
> +      if (length == 0)
> +       len = remaining;
> I have tried to used length = length - nbytes to get the new length. But if
> nbytes > 1 and the character is the last character of the string (length will
> be 0 if it decreases nbytes), it would not assign the last variable *len* to
> *len_bk*, then the character that maybe put into the ROOM would not be
> printed. That is to say, it presumes the nbytes is 1. If (nbytes > 1), it
> would be handled specially (just to update the value of len_bk). At the other
> conditionals, len_bk is always kept consistently with len.

I'm afraid none of this makes any sense to me, because I have no idea what 
"length", "len" and "len_bk" are meant to be, and how they differ.  You 
need clearer variable names.  I think you also need comments at various 
points stating what the values various variables mean at those points, and 
maybe an overview comment about the algorithm used.  It would be best to 
avoid ambiguous terms such as "length" at all points in variable names and 
comments, instead talking about numbers of bytes difference between two 
points, or similar.

For example, the present algorithm in opts.c:wrap_help might be described 
thus:

/* Look for a suitable point at which to break HELP.  If there is one 
   after at most ROOM columns, take the latest such point.  Otherwise, 
   take the first such point in the string, or the end of the string if it 
   cannot be broken.  */

If the new algorithm is still like that, you may just need to update 
variable names (and explain what the interface is for indicating both how 
much of the string to print, and how much (spaces) to skip before starting 
on the next line).

The key variables to explain are LEN and I.  At each iteration in the 
present loop, something like this might explain them.

/* The best candidate point found so far to break HELP is after LEN 
   bytes, where either LEN < I, or LEN == REMAINING if the string cannot 
   be broken within the first I bytes.  If we are outside ROOM columns and 
   have found somewhere to break the string, that is the best point 
   available; otherwise, breaking after I bytes will be better if that is 
   a valid point to break.  */

Now, things will be a bit more complicated in that you need to track the 
current column width as well as the current number of bytes of the string 
that take up that width.  But I think the basic idea is still the same - 
it is just that "length", "len" and "len_bk" are not useful variable names 
to distinguish the different quantities being tracked and so make clear 
whether particular arithmetic you are doing on them makes logical sense.  
I remain very suspicious of anything modifying a byte counter by 1 in this 
context; if you are ever modifying a byte counter by something other than 
the number of bytes in the relevant multibyte character I think you are 
confused and need to sort out the logic of the code so you don't need to 
do so.

-- 
Joseph S. Myers
joseph@codesourcery.com