public inbox for cygwin@cygwin.com
* regex_t internals: can we use re_magic to tell whether a regex has been regcomp'd?
@ 2011-08-10 18:09 Fischer, Matthew L
  2011-08-11  7:43 ` Corinna Vinschen
  0 siblings, 1 reply; 2+ messages in thread
From: Fischer, Matthew L @ 2011-08-10 18:09 UTC (permalink / raw)
  To: cygwin

We are porting code from Linux that is attempting to determine whether a regular expression has been properly regcomp'd and not freed.  The code from Linux is looking into the buffer inside regex_t.   On Cygwin, the "buffer" (not the same field name) is hidden inside re_guts which has a comment that dissuades us from using it for this purpose.  However, from looking at the Cygwin implementation, it looks like if re_magic is != 0 then the regexp is valid and has been regcomp'd and not regfree'd.  Is this interpretation correct?

The porting mechanism in the code below seems to work well, but we're not sure whether re_magic is the best solution for Cygwin.  Is the method below the best and, more importantly, the safest option?

bool regexValid()
{
#ifdef __CYGWIN__
    // Cygwin: regex_t has no "buffer" member, so test re_magic instead
    return (m_reg.re_magic != 0);
#else
    // original Linux code
    return m_reg.buffer != NULL;
#endif
}

Structures:
------------------------------------

Linux regex_t - typedef'd to struct re_pattern_buffer:

struct re_pattern_buffer
{
  /* Space that holds the compiled pattern.  It is declared as
     `unsigned char *' because its elements are sometimes used as
     array indexes.  */
  unsigned char *__REPB_PREFIX(buffer);
...
}
typedef struct re_pattern_buffer regex_t;

------------------------------------


Cygwin regex_t:

On Cygwin, the malloc'd space is down in "re_guts" which has a great comment: 
typedef struct {
        int re_magic;
        size_t re_nsub;         /* number of parenthesized subexpressions */
#ifdef __CYGWIN__
        const char *re_endp;    /* end pointer for REG_PEND */
#else
        __const char *re_endp;  /* end pointer for REG_PEND */
#endif
        struct re_guts *re_g;   /* none of your business :-) */
} regex_t;


--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple


* Re: regex_t internals: can we use re_magic to tell whether a regex has been regcomp'd?
  2011-08-10 18:09 regex_t internals: can we use re_magic to tell whether a regex has been regcomp'd? Fischer, Matthew L
@ 2011-08-11  7:43 ` Corinna Vinschen
  0 siblings, 0 replies; 2+ messages in thread
From: Corinna Vinschen @ 2011-08-11  7:43 UTC (permalink / raw)
  To: cygwin

On Aug 10 19:07, Fischer, Matthew L wrote:
> We are porting code from Linux that is attempting to determine whether
> a regular expression has been properly regcomp'd and not freed.  The
> code from Linux is looking into the buffer inside regex_t.   On

Which is kind of scary, IMHO.

Using the internals of the regex_t structure other than the ones blessed
by the POSIX standard is a sure way to write non-portable code.  See
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/regex.h.html
The only officially documented member of regex_t is re_nsub.

So, why does the code check the internals at all?  Why is it important
that something has been allocated or not?  Shouldn't the application
code be happy to rely solely on the return value of regcomp?

> Cygwin, the "buffer" (not the same field name) is hidden inside
> re_guts which has a comment that dissuades us from using it for this
> purpose.

Rightfully.  Please note that the regex code is *not* Cygwin-specific.
This code is actually FreeBSD code, with only minor changes to port it
to Cygwin, plus an extension to allow the GNU \< and \> expressions.

> However, from looking at the Cygwin implementation, it looks
> like if re_magic is != 0 then the regexp is valid and has been
> regcomp'd and not regfree'd.  Is this interpretation correct?

Well, I never actually examined the guts of regcomp/regfree more than
necessary, but it seems you're right.  No guarantees, though.  IMHO, if
the application code has to check the internals of the regex_t structure
to know if it called regfree on it, it's a bug in the application.
Rather than doing that, it should keep track of its regcomp/regfree
calls by using an external state variable.
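
A sketch of what such an external state variable could look like in C++ follows.  The class name Regex, the m_compiled flag and the method names are all assumptions made for illustration; the only library calls used are regcomp, regexec and regfree.

```cpp
#include <regex.h>
#include <cstddef>

// Hypothetical wrapper: class and member names are illustrative only.
class Regex {
public:
    Regex() : m_compiled(false) {}
    ~Regex() { reset(); }

    // Compile a new pattern, releasing any previously compiled one first.
    bool compile(const char *pattern, int cflags = REG_EXTENDED)
    {
        reset();
        m_compiled = (regcomp(&m_reg, pattern, cflags) == 0);
        return m_compiled;
    }

    // Free the compiled pattern, if there is one.
    void reset()
    {
        if (m_compiled) {
            regfree(&m_reg);
            m_compiled = false;
        }
    }

    // Answers "compiled and not freed?" without touching regex_t internals.
    bool regexValid() const { return m_compiled; }

    bool matches(const char *text) const
    {
        return m_compiled && regexec(&m_reg, text, 0, NULL, 0) == 0;
    }

private:
    Regex(const Regex &);             // non-copyable: regex_t owns memory
    Regex &operator=(const Regex &);

    regex_t m_reg;
    bool m_compiled;  // the external state variable
};
```

With this, regexValid() behaves the same on Linux, Cygwin and FreeBSD, because it depends only on the flag the application itself maintains.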


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
