public inbox for libc-locales@sourceware.org
 help / color / mirror / Atom feed
* Crash in gconv_db.c
@ 2019-11-11 16:05 Abhidnya Joshi
  2019-11-11 16:19 ` Florian Weimer
  0 siblings, 1 reply; 17+ messages in thread
From: Abhidnya Joshi @ 2019-11-11 16:05 UTC (permalink / raw)
  To: libc-locales; +Cc: abhidnya chirmule

Hi,

We use glibc version glibc-2.17-78.el7.src.rpm on centos.

We encountered a problem recently in our product where we get crash in
libc stack in gconv.
The crash is seen randomly after few days during character encoding
(UTF-8 to UTF-16LE). The functions getting called are:

iconv_open --> __gconv_open--> __gconv_find_transform -->
find_derivation --> increment_counter

In increment_counter, we get nstep = 2. In the second iteration inside
increment_counter when step[0] is under processing,
we see below values in it:
shlib_handle is NULL, _modname = NULL, __counter = 1 after condition
check in (increment-counter), from_name UTF-8 to to_name INTENAL

{__shlib_handle = 0x0, __modname = 0x0, __counter = 1, __from_name =
0x7f028853b890 "ISO-10646/UTF8/", __to_name = 0x7f028abca431
"INTERNAL",
  __fct = 0x7f028aa73ce0 <__gconv_transform_utf8_internal>,
__btowc_fct = 0x7f028aa714b0 <__gconv_btwoc_ascii>, __init_fct = 0x0,
__end_fct = 0x0, __min_needed_from = 1,
  __max_needed_from = 6, __min_needed_to = 4, __max_needed_to = 4,
__stateful = 0, __data = 0x0}


The stack looks something like this:

(gdb) bt
#0  0x00007f028aa6f31a in increment_counter (nsteps=2,
steps=0x7f028853b140) at gconv_db.c:393
#1  find_derivation (toset=toset@entry=0x7f0251ceb990 "UTF-16LE//",
toset_expand=0x0, fromset=fromset@entry=0x7f0251ceb970 "UTF-8//",
    fromset_expand=fromset_expand@entry=0x7f028853a798
"ISO-10646/UTF8/", handle=handle@entry=0x7f0251ceb900,
nsteps=nsteps@entry=0x7f0251ceb910) at gconv_db.c:426
#2  0x00007f028aa6ff61 in __gconv_find_transform
(toset=toset@entry=0x7f0251ceb990 "UTF-16LE//",
fromset=fromset@entry=0x7f0251ceb970 "UTF-8//",
handle=handle@entry=0x7f0251ceb900,
    nsteps=nsteps@entry=0x7f0251ceb910, flags=flags@entry=0) at gconv_db.c:755
#3  0x00007f028aa6e7ea in __gconv_open
(toset=toset@entry=0x7f0251ceb990 "UTF-16LE//",
fromset=fromset@entry=0x7f0251ceb970 "UTF-8//",
handle=handle@entry=0x7f0251ceb9c0,
    flags=flags@entry=0) at gconv_open.c:173
#4  0x00007f028aa6e371 in iconv_open (tocode=0x7f0251ceb990
"UTF-16LE//", fromcode=0x7f0251ceb970 "UTF-8//") at iconv_open.c:71

(Pasting last 4 frames out of 27)

The gconv_steps looked like below:

(gdb) p *(&step[1])
$4 = {__shlib_handle = 0x7f028853b240, __modname = 0x7f028853b270
"/usr/lib64/gconv/UTF-16.so", __counter = 1, __from_name =
0x7f028abca431 "INTERNAL",
  __to_name = 0x7f028853b220 "UTF-16LE//", __fct = 0x59e0be5c0391534f,
__btowc_fct = 0xa7e5bbe252f1534f, __init_fct = 0x59e0be5c1c91534f,
__end_fct = 0x59e0be5c03b1534f,
  __min_needed_from = 4, __max_needed_from = 4, __min_needed_to = 2,
__max_needed_to = 4, __stateful = 0, __data = 0x9bd320}
(gdb) p *(&step[0])
$5 = {__shlib_handle = 0x0, __modname = 0x0, __counter = 1,
__from_name = 0x7f028853b890 "ISO-10646/UTF8/", __to_name =
0x7f028abca431 "INTERNAL",
  __fct = 0x7f028aa73ce0 <__gconv_transform_utf8_internal>,
__btowc_fct = 0x7f028aa714b0 <__gconv_btwoc_ascii>, __init_fct = 0x0,
__end_fct = 0x0, __min_needed_from = 1,
  __max_needed_from = 6, __min_needed_to = 4, __max_needed_to = 4,
__stateful = 0, __data = 0x0}
(gdb)

And locals were:
(gdb) f 0
#0  0x00007f028aa6f31a in increment_counter (nsteps=2,
steps=0x7f028853b140) at gconv_db.c:393
393     in gconv_db.c
(gdb) info locals
init_fct = 0xa9a7d3f2ddf12978
step = 0x7f028853b140
cnt = 0
result = 0
(gdb)

Coming back to the core, if we examine frame 0, it gave problem at
below assembly code:

   0x00007f028aa6f2e1 <+209>:   mov    0x20(%rax),%r15
   0x00007f028aa6f2e5 <+213>:   mov    0x28(%rax),%rax
   0x00007f028aa6f2e9 <+217>:   movq   $0x0,0x30(%rbx)
   0x00007f028aa6f2f1 <+225>:   mov    %rdx,0x28(%rbx)
   0x00007f028aa6f2f5 <+229>:   mov    %r15,0x38(%rbx)
   0x00007f028aa6f2f9 <+233>:   mov    %rax,0x40(%rbx)
   0x00007f028aa6f2fd <+237>:   ror    $0x11,%r15
   0x00007f028aa6f301 <+241>:   xor    %fs:0x30,%r15
   0x00007f028aa6f30a <+250>:   test   %r15,%r15
   0x00007f028aa6f30d <+253>:   je     0x7f028aa6f332 <find_derivation+290>
   0x00007f028aa6f30f <+255>:   mov    %r15,%rdi
   0x00007f028aa6f312 <+258>:   callq  0x7f028ab7f7b0
<__GI__dl_mcount_wrapper_check>
   0x00007f028aa6f317 <+263>:   mov    %rbx,%rdi
=> 0x00007f028aa6f31a <+266>:   callq  *%r15
   0x00007f028aa6f31d <+269>:   mov    0x30(%rbx),%rax
   0x00007f028aa6f321 <+273>:   xor    %fs:0x30,%rax
   0x00007f028aa6f32a <+282>:   rol    $0x11,%rax
   0x00007f028aa6f32e <+286>:   mov    %rax,0x30(%rbx)
   0x00007f028aa6f332 <+290>:   sub    $0x1,%r12
   0x00007f028aa6f336 <+294>:   sub    $0x68,%rbx
   0x00007f028aa6f33a <+298>:   test   %r12,%r12
   0x00007f028aa6f33d <+301>:   je     0x7f028aa6f35f <find_derivation+335>


The questions here are:
1. why step[0] has counter 1? this means it was 0, got incremented via
increment_counter and hence getting inside

              DL_CALL_FCT (init_fct, (step));

2. When step gets initialized, counter is never 0. Under which
condition this can become 0?
3. Please let me know what to debug more to understand this.

Thanks
Abhidnya

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Crash in gconv_db.c
  2019-11-11 16:05 Crash in gconv_db.c Abhidnya Joshi
@ 2019-11-11 16:19 ` Florian Weimer
  2019-11-11 16:47   ` Abhidnya Joshi
  0 siblings, 1 reply; 17+ messages in thread
From: Florian Weimer @ 2019-11-11 16:19 UTC (permalink / raw)
  To: Abhidnya Joshi; +Cc: libc-locales

* Abhidnya Joshi:

> The questions here are:
> 1. why step[0] has counter 1? this means it was 0, got incremented via
> increment_counter and hence getting inside
>
>               DL_CALL_FCT (init_fct, (step));
>
> 2. When step gets initialized, counter is never 0. Under which
> condition this can become 0?
> 3. Please let me know what to debug more to understand this.

First, we should rule out that this isn't the result of unrelated heap
corruption.  Do you have reproducer?  Can you run under valgrind or
built with Address Sanitizer?

Thanks,
Florian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Crash in gconv_db.c
  2019-11-11 16:19 ` Florian Weimer
@ 2019-11-11 16:47   ` Abhidnya Joshi
       [not found]     ` <CALmqtCVBCb2vJ+XNb6WZa1csNZaisLmqoG5nTn-QUU0MO=UbPw@mail.gmail.com>
  0 siblings, 1 reply; 17+ messages in thread
From: Abhidnya Joshi @ 2019-11-11 16:47 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-locales

This is not reproducible. Its happening in customer environment. I
tried sample iconv program just to understand the flow. But when I
execute this, it takes little different path.

I am afraid I do not understand iconv details much. What else I can do
to detect corruption?

Thanks
Abhidnya

On Mon, Nov 11, 2019 at 9:49 PM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Abhidnya Joshi:
>
> > The questions here are:
> > 1. why step[0] has counter 1? this means it was 0, got incremented via
> > increment_counter and hence getting inside
> >
> >               DL_CALL_FCT (init_fct, (step));
> >
> > 2. When step gets initialized, counter is never 0. Under which
> > condition this can become 0?
> > 3. Please let me know what to debug more to understand this.
>
> First, we should rule out that this isn't the result of unrelated heap
> corruption.  Do you have reproducer?  Can you run under valgrind or
> built with Address Sanitizer?
>
> Thanks,
> Florian
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Crash in gconv_db.c
       [not found]     ` <CALmqtCVBCb2vJ+XNb6WZa1csNZaisLmqoG5nTn-QUU0MO=UbPw@mail.gmail.com>
@ 2019-11-14 17:34       ` Florian Weimer
  2019-11-19 14:05         ` Abhidnya Joshi
  0 siblings, 1 reply; 17+ messages in thread
From: Florian Weimer @ 2019-11-14 17:34 UTC (permalink / raw)
  To: Abhidnya Joshi; +Cc: libc-locales

* Abhidnya Joshi:

> Is there a possibility of integer overflow? I can see that counter
> keeps increasing even after icon_open, iconv and iconv_close is used
> in a sequence but multiple times for encoding.

There could be.  The reference counter handling is strange.  I haven't
seen this particular bug, but I wouldn't be surprised if it existed.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Crash in gconv_db.c
  2019-11-14 17:34       ` Florian Weimer
@ 2019-11-19 14:05         ` Abhidnya Joshi
  2019-11-28 18:47           ` Abhidnya Joshi
  0 siblings, 1 reply; 17+ messages in thread
From: Abhidnya Joshi @ 2019-11-19 14:05 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-locales

I tried to experiment again. When I run conversions via readdir, I can
see counter increasing as below:

(gdb) c
Continuing.
[Switching to Thread 0x7f936d472700 (LWP 16095)]

Breakpoint 1, find_derivation (toset=toset@entry=0x7f936d46e990
"UTF-16LE//", toset_expand=0x0, fromset=fromset@entry=0x7f936d46e970
"UTF-8//",
    fromset_expand=fromset_expand@entry=0x7f9392c17ae8
"ISO-10646/UTF8/", handle=handle@entry=0x7f936d46e900,
nsteps=nsteps@entry=0x7f936d46e910) at gconv_db.c:426
426     gconv_db.c: No such file or directory.
(gdb) s
0x00007f93948e62be in increment_counter (nsteps=2,
steps=0x7f9392c18500) at gconv_db.c:410
410     in gconv_db.c
(gdb) p *(&(steps[0]))
$1 = {__shlib_handle = 0x0, __modname = 0x0, __counter = 1782250,
__from_name = 0x7f9392c18c50 "ISO-10646/UTF8/", __to_name =
0x7f9394a41431 "INTERNAL",
  __fct = 0x7f93948eace0 <__gconv_transform_utf8_internal>,
__btowc_fct = 0x7f93948e84b0 <__gconv_btwoc_ascii>, __init_fct = 0x0,
__end_fct = 0x0, __min_needed_from = 1,
  __max_needed_from = 6, __min_needed_to = 4, __max_needed_to = 4,
__stateful = 0, __data = 0x0}
(gdb) q


After some time when I stopped readdirs, I saw counter back to very low count:
(gdb) b increment_counter
Breakpoint 1 at 0x7f93948e62be
(gdb) c
Continuing.
[Switching to Thread 0x7f938e0e6700 (LWP 22822)]

Breakpoint 1, find_derivation (toset=toset@entry=0x7f938e0e3160
"UTF-8//", toset_expand=0x7f9392c17ae8 "ISO-10646/UTF8/",
fromset=fromset@entry=0x7f938e0e3140 "WCHAR_T//",
    fromset_expand=fromset_expand@entry=0x7f9392c17a2a "INTERNAL",
handle=handle@entry=0x7f938e0e30d0,
nsteps=nsteps@entry=0x7f938e0e30e0) at gconv_db.c:426
426     gconv_db.c: No such file or directory.
(gdb) s
0x00007f93948e62be in increment_counter (nsteps=1, steps=0x6a8d90) at
gconv_db.c:410
410     in gconv_db.c
(gdb) p *(&(steps[0]))
$1 = {__shlib_handle = 0x0, __modname = 0x0, __counter = 26,
__from_name = 0x6a8e00 "INTERNAL", __to_name = 0x6a8e20
"ISO-10646/UTF8/",
  __fct = 0x7f93948ea200 <__gconv_transform_internal_utf8>,
__btowc_fct = 0x0, __init_fct = 0x0, __end_fct = 0x0,
__min_needed_from = 4, __max_needed_from = 4, __min_needed_to = 1,
  __max_needed_to = 6, __stateful = 0, __data = 0x0}

I have few questions after looking at this behaviour:
1. Is there a chance of __counter getting integer overflow and getting
reset to 0 (after 2^32)?
2. When do we see steps getting freed? Is it via free_derivation?

Thanks
Abhidnya

On Thu, Nov 14, 2019 at 11:04 PM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Abhidnya Joshi:
>
> > Is there a possibility of integer overflow? I can see that counter
> > keeps increasing even after icon_open, iconv and iconv_close is used
> > in a sequence but multiple times for encoding.
>
> There could be.  The reference counter handling is strange.  I haven't
> seen this particular bug, but I wouldn't be surprised if it existed.
>
> Thanks,
> Florian
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Crash in gconv_db.c
  2019-11-19 14:05         ` Abhidnya Joshi
@ 2019-11-28 18:47           ` Abhidnya Joshi
  2019-12-12 15:58             ` Florian Weimer
  0 siblings, 1 reply; 17+ messages in thread
From: Abhidnya Joshi @ 2019-11-28 18:47 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-locales

Hi Florian,

In increment_counter function, do we have to handle integer overflow like this?


if (step->__counter == INT_MAX)
{
step->__counter = 1
}
if (step->__counter++ == 0)
{ ....
}

I do not understand what will be impact of this on different "steps".
If step->__counter gets set to 0 and step->__modname is NULL
then we see the segfault as mentioned earlier.

Can you please suggest a way out to check integer overflow along with
other checks which are necessary to consider?

Thanks
Abhidnya

On Tue, Nov 19, 2019 at 7:35 PM Abhidnya Joshi
<abhidnyachirmule@gmail.com> wrote:
>
> I tried to experiment again. When I run conversions via readdir, I can
> see counter increasing as below:
>
> (gdb) c
> Continuing.
> [Switching to Thread 0x7f936d472700 (LWP 16095)]
>
> Breakpoint 1, find_derivation (toset=toset@entry=0x7f936d46e990
> "UTF-16LE//", toset_expand=0x0, fromset=fromset@entry=0x7f936d46e970
> "UTF-8//",
>     fromset_expand=fromset_expand@entry=0x7f9392c17ae8
> "ISO-10646/UTF8/", handle=handle@entry=0x7f936d46e900,
> nsteps=nsteps@entry=0x7f936d46e910) at gconv_db.c:426
> 426     gconv_db.c: No such file or directory.
> (gdb) s
> 0x00007f93948e62be in increment_counter (nsteps=2,
> steps=0x7f9392c18500) at gconv_db.c:410
> 410     in gconv_db.c
> (gdb) p *(&(steps[0]))
> $1 = {__shlib_handle = 0x0, __modname = 0x0, __counter = 1782250,
> __from_name = 0x7f9392c18c50 "ISO-10646/UTF8/", __to_name =
> 0x7f9394a41431 "INTERNAL",
>   __fct = 0x7f93948eace0 <__gconv_transform_utf8_internal>,
> __btowc_fct = 0x7f93948e84b0 <__gconv_btwoc_ascii>, __init_fct = 0x0,
> __end_fct = 0x0, __min_needed_from = 1,
>   __max_needed_from = 6, __min_needed_to = 4, __max_needed_to = 4,
> __stateful = 0, __data = 0x0}
> (gdb) q
>
>
> After some time when I stopped readdirs, I saw counter back to very low count:
> (gdb) b increment_counter
> Breakpoint 1 at 0x7f93948e62be
> (gdb) c
> Continuing.
> [Switching to Thread 0x7f938e0e6700 (LWP 22822)]
>
> Breakpoint 1, find_derivation (toset=toset@entry=0x7f938e0e3160
> "UTF-8//", toset_expand=0x7f9392c17ae8 "ISO-10646/UTF8/",
> fromset=fromset@entry=0x7f938e0e3140 "WCHAR_T//",
>     fromset_expand=fromset_expand@entry=0x7f9392c17a2a "INTERNAL",
> handle=handle@entry=0x7f938e0e30d0,
> nsteps=nsteps@entry=0x7f938e0e30e0) at gconv_db.c:426
> 426     gconv_db.c: No such file or directory.
> (gdb) s
> 0x00007f93948e62be in increment_counter (nsteps=1, steps=0x6a8d90) at
> gconv_db.c:410
> 410     in gconv_db.c
> (gdb) p *(&(steps[0]))
> $1 = {__shlib_handle = 0x0, __modname = 0x0, __counter = 26,
> __from_name = 0x6a8e00 "INTERNAL", __to_name = 0x6a8e20
> "ISO-10646/UTF8/",
>   __fct = 0x7f93948ea200 <__gconv_transform_internal_utf8>,
> __btowc_fct = 0x0, __init_fct = 0x0, __end_fct = 0x0,
> __min_needed_from = 4, __max_needed_from = 4, __min_needed_to = 1,
>   __max_needed_to = 6, __stateful = 0, __data = 0x0}
>
> I have few questions after looking at this behaviour:
> 1. Is there a chance of __counter getting integer overflow and getting
> reset to 0 (after 2^32)?
> 2. When do we see steps getting freed? Is it via free_derivation?
>
> Thanks
> Abhidnya
>
> On Thu, Nov 14, 2019 at 11:04 PM Florian Weimer <fweimer@redhat.com> wrote:
> >
> > * Abhidnya Joshi:
> >
> > > Is there a possibility of integer overflow? I can see that counter
> > > keeps increasing even after icon_open, iconv and iconv_close is used
> > > in a sequence but multiple times for encoding.
> >
> > There could be.  The reference counter handling is strange.  I haven't
> > seen this particular bug, but I wouldn't be surprised if it existed.
> >
> > Thanks,
> > Florian
> >

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Crash in gconv_db.c
  2019-11-28 18:47           ` Abhidnya Joshi
@ 2019-12-12 15:58             ` Florian Weimer
  2019-12-20 15:05               ` Abhidnya Joshi
  0 siblings, 1 reply; 17+ messages in thread
From: Florian Weimer @ 2019-12-12 15:58 UTC (permalink / raw)
  To: Abhidnya Joshi; +Cc: libc-locales

* Abhidnya Joshi:

> In increment_counter function, do we have to handle integer overflow like this?
>
>
> if (step->__counter == INT_MAX)
> {
> step->__counter = 1
> }
> if (step->__counter++ == 0)
> { ....
> }
>
> I do not understand what will be impact of this on different "steps".
> If step->__counter gets set to 0 and step->__modname is NULL
> then we see the segfault as mentioned earlier.

The steps array has reference counts in the array *elements*, which is
wrong.  We ran into this a couple of months ago when we tried to fix a
memory leak.  This is the reason why I have limited confidence in the
correctness of the reference counting.

I looked at what it would take to reproduce the overflow with repeated
iconv_open calls, and I estimate that 1.2 TiB of RAM are required, so I
have not reproduced that yet.

But there could be more subtle ways to trigger counter overflow, like
uselocale followed by pthread_exit.  Without a reproducer, we are in the
dark here unfortunately.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Crash in gconv_db.c
  2019-12-12 15:58             ` Florian Weimer
@ 2019-12-20 15:05               ` Abhidnya Joshi
  2019-12-20 15:16                 ` Florian Weimer
  0 siblings, 1 reply; 17+ messages in thread
From: Abhidnya Joshi @ 2019-12-20 15:05 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-locales

Hi Florian,

Good News! We are able to reproduce it using continuous readdirs.
(Where reply involved encoding to UTF-16)
The stack is exactly same as shared earlier. It did overflowed counter.

Do you have any suggestion on the fix? We can test the fix using our test.

Thanks and Regards
Abhidnya Joshi

On Thu, Dec 12, 2019 at 9:28 PM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Abhidnya Joshi:
>
> > In increment_counter function, do we have to handle integer overflow like this?
> >
> >
> > if (step->__counter == INT_MAX)
> > {
> > step->__counter = 1
> > }
> > if (step->__counter++ == 0)
> > { ....
> > }
> >
> > I do not understand what will be impact of this on different "steps".
> > If step->__counter gets set to 0 and step->__modname is NULL
> > then we see the segfault as mentioned earlier.
>
> The steps array has reference counts in the array *elements*, which is
> wrong.  We ran into this a couple of months ago when we tried to fix a
> memory leak.  This is the reason why I have limited confidence in the
> correctness of the reference counting.
>
> I looked at what it would take to reproduce the overflow with repeated
> iconv_open calls, and I estimate that 1.2 TiB of RAM are required, so I
> have not reproduced that yet.
>
> But there could be more subtle ways to trigger counter overflow, like
> uselocale followed by pthread_exit.  Without a reproducer, we are in the
> dark here unfortunately.
>
> Thanks,
> Florian
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Crash in gconv_db.c
  2019-12-20 15:05               ` Abhidnya Joshi
@ 2019-12-20 15:16                 ` Florian Weimer
  2019-12-20 15:40                   ` Abhidnya Joshi
  0 siblings, 1 reply; 17+ messages in thread
From: Florian Weimer @ 2019-12-20 15:16 UTC (permalink / raw)
  To: Abhidnya Joshi; +Cc: libc-locales

* Abhidnya Joshi:

> Good News! We are able to reproduce it using continuous readdirs.
> (Where reply involved encoding to UTF-16)
> The stack is exactly same as shared earlier. It did overflowed counter.
>
> Do you have any suggestion on the fix? We can test the fix using our test.

glibc's readdir does not perform UTF-16 conversion.  Could you come up
with an actual reproducer?  Then we can likely fix this bug.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Crash in gconv_db.c
  2019-12-20 15:16                 ` Florian Weimer
@ 2019-12-20 15:40                   ` Abhidnya Joshi
  2019-12-20 15:43                     ` Florian Weimer
  0 siblings, 1 reply; 17+ messages in thread
From: Abhidnya Joshi @ 2019-12-20 15:40 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-locales

Its not glibc readdir. It just readdir call on wire (SMB client and
server). We internally call iconv_open in our dir listing.

Thanks and Regards
Abhidnya Joshi

On Fri, Dec 20, 2019 at 8:46 PM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Abhidnya Joshi:
>
> > Good News! We are able to reproduce it using continuous readdirs.
> > (Where reply involved encoding to UTF-16)
> > The stack is exactly same as shared earlier. It did overflowed counter.
> >
> > Do you have any suggestion on the fix? We can test the fix using our test.
>
> glibc's readdir does not perform UTF-16 conversion.  Could you come up
> with an actual reproducer?  Then we can likely fix this bug.
>
> Thanks,
> Florian
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Crash in gconv_db.c
  2019-12-20 15:40                   ` Abhidnya Joshi
@ 2019-12-20 15:43                     ` Florian Weimer
  2019-12-20 15:53                       ` Abhidnya Joshi
  0 siblings, 1 reply; 17+ messages in thread
From: Florian Weimer @ 2019-12-20 15:43 UTC (permalink / raw)
  To: Abhidnya Joshi; +Cc: libc-locales

* Abhidnya Joshi:

> Its not glibc readdir. It just readdir call on wire (SMB client and
> server). We internally call iconv_open in our dir listing.

Okay.  Can you somehow capture the glibc API calls and perhaps build a
reproducer that way?

Do you call readdir_r, by chance?

Thanks,
Florian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Crash in gconv_db.c
  2019-12-20 15:43                     ` Florian Weimer
@ 2019-12-20 15:53                       ` Abhidnya Joshi
  2019-12-20 16:09                         ` Florian Weimer
  0 siblings, 1 reply; 17+ messages in thread
From: Abhidnya Joshi @ 2019-12-20 15:53 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-locales

DO you want to capture those as part of core or any other debug enable
during test run?
The stack looks like this.

(gdb) bt
#0  0x00007f028aa6f31a in increment_counter (nsteps=2,
steps=0x7f028853b140) at gconv_db.c:393
#1  find_derivation (toset=toset@entry=0x7f0251ceb990 "UTF-16LE//",
toset_expand=0x0, fromset=fromset@entry=0x7f0251ceb970 "UTF-8//",
    fromset_expand=fromset_expand@entry=0x7f028853a798
"ISO-10646/UTF8/", handle=handle@entry=0x7f0251ceb900,
nsteps=nsteps@entry=0x7f0251ceb910) at gconv_db.c:426
#2  0x00007f028aa6ff61 in __gconv_find_transform
(toset=toset@entry=0x7f0251ceb990 "UTF-16LE//",
fromset=fromset@entry=0x7f0251ceb970 "UTF-8//",
handle=handle@entry=0x7f0251ceb900,
    nsteps=nsteps@entry=0x7f0251ceb910, flags=flags@entry=0) at gconv_db.c:755
#3  0x00007f028aa6e7ea in __gconv_open
(toset=toset@entry=0x7f0251ceb990 "UTF-16LE//",
fromset=fromset@entry=0x7f0251ceb970 "UTF-8//",
handle=handle@entry=0x7f0251ceb9c0,
    flags=flags@entry=0) at gconv_open.c:173
#4  0x00007f028aa6e371 in iconv_open (tocode=0x7f0251ceb990
"UTF-16LE//", fromcode=0x7f0251ceb970 "UTF-8//") at iconv_open.c:71

(Pasting last 4 frames out of 27)

The gconv_steps looked like below:

(gdb) p *(&step[1])
$4 = {__shlib_handle = 0x7f028853b240, __modname = 0x7f028853b270
"/usr/lib64/gconv/UTF-16.so", __counter = 1, __from_name =
0x7f028abca431 "INTERNAL",
  __to_name = 0x7f028853b220 "UTF-16LE//", __fct = 0x59e0be5c0391534f,
__btowc_fct = 0xa7e5bbe252f1534f, __init_fct = 0x59e0be5c1c91534f,
__end_fct = 0x59e0be5c03b1534f,
  __min_needed_from = 4, __max_needed_from = 4, __min_needed_to = 2,
__max_needed_to = 4, __stateful = 0, __data = 0x9bd320}
(gdb) p *(&step[0])
$5 = {__shlib_handle = 0x0, __modname = 0x0, __counter = 1,
__from_name = 0x7f028853b890 "ISO-10646/UTF8/", __to_name =
0x7f028abca431 "INTERNAL",
  __fct = 0x7f028aa73ce0 <__gconv_transform_utf8_internal>,
__btowc_fct = 0x7f028aa714b0 <__gconv_btwoc_ascii>, __init_fct = 0x0,
__end_fct = 0x0, __min_needed_from = 1,
  __max_needed_from = 6, __min_needed_to = 4, __max_needed_to = 4,
__stateful = 0, __data = 0x0}
(gdb)

Thanks and Regards
Abhidnya Joshi

On Fri, Dec 20, 2019 at 9:13 PM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Abhidnya Joshi:
>
> > Its not glibc readdir. It just readdir call on wire (SMB client and
> > server). We internally call iconv_open in our dir listing.
>
> Okay.  Can you somehow capture the glibc API calls and perhaps build a
> reproducer that way?
>
> Do you call readdir_r, by chance?
>
> Thanks,
> Florian
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Crash in gconv_db.c
  2019-12-20 15:53                       ` Abhidnya Joshi
@ 2019-12-20 16:09                         ` Florian Weimer
  2019-12-20 16:42                           ` Abhidnya Joshi
  0 siblings, 1 reply; 17+ messages in thread
From: Florian Weimer @ 2019-12-20 16:09 UTC (permalink / raw)
  To: Abhidnya Joshi; +Cc: libc-locales

* Abhidnya Joshi:

> The gconv_steps looked like below:
>
> (gdb) p *(&step[1])
> $4 = {__shlib_handle = 0x7f028853b240, __modname = 0x7f028853b270
> "/usr/lib64/gconv/UTF-16.so", __counter = 1, __from_name =
> 0x7f028abca431 "INTERNAL",
>   __to_name = 0x7f028853b220 "UTF-16LE//", __fct = 0x59e0be5c0391534f,
> __btowc_fct = 0xa7e5bbe252f1534f, __init_fct = 0x59e0be5c1c91534f,
> __end_fct = 0x59e0be5c03b1534f,
>   __min_needed_from = 4, __max_needed_from = 4, __min_needed_to = 2,
> __max_needed_to = 4, __stateful = 0, __data = 0x9bd320}
> (gdb) p *(&step[0])
> $5 = {__shlib_handle = 0x0, __modname = 0x0, __counter = 1,
> __from_name = 0x7f028853b890 "ISO-10646/UTF8/", __to_name =
> 0x7f028abca431 "INTERNAL",
>   __fct = 0x7f028aa73ce0 <__gconv_transform_utf8_internal>,
> __btowc_fct = 0x7f028aa714b0 <__gconv_btwoc_ascii>, __init_fct = 0x0,
> __end_fct = 0x0, __min_needed_from = 1,
>   __max_needed_from = 6, __min_needed_to = 4, __max_needed_to = 4,
> __stateful = 0, __data = 0x0}
> (gdb)

Since both __counter values are 1, I'm not sure there is a counter
overflow?

What I meant is that it would be nice to have a sequence of glibc API
calls which clearly demonstrate this behavior.  I can't reproduce this
with an obvious sequence of iconv calls in a loop.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Crash in gconv_db.c
  2019-12-20 16:09                         ` Florian Weimer
@ 2019-12-20 16:42                           ` Abhidnya Joshi
  2019-12-20 16:47                             ` Florian Weimer
  0 siblings, 1 reply; 17+ messages in thread
From: Abhidnya Joshi @ 2019-12-20 16:42 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-locales

Hi Florian,

step[0] is a problem. When we tried to reproduce this, during runs, we
did check counter of step[0] in between. It actually got increased to
2^32 and then got set back to 0.
This is only with first entry of step which is for some INTERNAL thing.

I am not sure I understand the way steps are calculated using
find_derivation but this is what I see as the sequence of APIs.
iconv_open called to convert to UTF-16 followed by iconv and then
iconv_close. If you want more details about withing iconv_open, which
APIs are getting called in our environment,
I will try to collect that using GDB.

We use this sequence for each name which is part of readdir reply. Its
actually not at all specific to readdir. encoding is used to send any
chars on wire as part of SMB protocol.
Because with readdir, it is easy to reproduce. Customer too saw issues
during readdir.

Thanks and regards
Abhidnya Joshi

On Fri, Dec 20, 2019 at 9:39 PM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Abhidnya Joshi:
>
> > The gconv_steps looked like below:
> >
> > (gdb) p *(&step[1])
> > $4 = {__shlib_handle = 0x7f028853b240, __modname = 0x7f028853b270
> > "/usr/lib64/gconv/UTF-16.so", __counter = 1, __from_name =
> > 0x7f028abca431 "INTERNAL",
> >   __to_name = 0x7f028853b220 "UTF-16LE//", __fct = 0x59e0be5c0391534f,
> > __btowc_fct = 0xa7e5bbe252f1534f, __init_fct = 0x59e0be5c1c91534f,
> > __end_fct = 0x59e0be5c03b1534f,
> >   __min_needed_from = 4, __max_needed_from = 4, __min_needed_to = 2,
> > __max_needed_to = 4, __stateful = 0, __data = 0x9bd320}
> > (gdb) p *(&step[0])
> > $5 = {__shlib_handle = 0x0, __modname = 0x0, __counter = 1,
> > __from_name = 0x7f028853b890 "ISO-10646/UTF8/", __to_name =
> > 0x7f028abca431 "INTERNAL",
> >   __fct = 0x7f028aa73ce0 <__gconv_transform_utf8_internal>,
> > __btowc_fct = 0x7f028aa714b0 <__gconv_btwoc_ascii>, __init_fct = 0x0,
> > __end_fct = 0x0, __min_needed_from = 1,
> >   __max_needed_from = 6, __min_needed_to = 4, __max_needed_to = 4,
> > __stateful = 0, __data = 0x0}
> > (gdb)
>
> Since both __counter values are 1, I'm not sure there is a counter
> overflow?
>
> What I meant is that it would be nice to have a sequence of glibc API
> calls which clearly demonstrate this behavior.  I can't reproduce this
> with an obvious sequence of iconv calls in a loop.
>
> Thanks,
> Florian
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Crash in gconv_db.c
  2019-12-20 16:42                           ` Abhidnya Joshi
@ 2019-12-20 16:47                             ` Florian Weimer
  2019-12-20 17:18                               ` Abhidnya Joshi
  0 siblings, 1 reply; 17+ messages in thread
From: Florian Weimer @ 2019-12-20 16:47 UTC (permalink / raw)
  To: Abhidnya Joshi; +Cc: libc-locales

* Abhidnya Joshi:

> step[0] is a problem. When we tried to reproduce this, during runs, we
> did check counter of step[0] in between. It actually got increased to
> 2^32 and then got set back to 0.
> This is only with first entry of step which is for some INTERNAL thing.

I may have found something.

Would you please check if on the affected system, ther is a
gconv-modules.cache file somewhere under /usr?  Fedora uses
/usr/lib64/gconv/gconv-modules.cache, for example.  Or do you set the
GCONV_PATH environment variable on the affected system?

In my initial tests, the counter was stuck at 1 due to the gconv cache,
but that requires the gconv-modules.cache file being present at the
expected path, and GCONV_PATH must not be set.  In this case, you should
be able to work around this issue by running iconvconfig (or not setting
GCONV_PATH).

I cannot yet reproduce the crash, but I definitely see the counter
wrap-around.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Crash in gconv_db.c
  2019-12-20 16:47                             ` Florian Weimer
@ 2019-12-20 17:18                               ` Abhidnya Joshi
  2019-12-20 18:39                                 ` Florian Weimer
  0 siblings, 1 reply; 17+ messages in thread
From: Abhidnya Joshi @ 2019-12-20 17:18 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-locales

I checked for gconv-modules.cache. I do not see that at all in
/usr/lib64/gconv  .
Also the environment variable is not set. Is there a possibility that
cache is not at all getting enabled?

Thanks and regards
Abhidnya Joshi

On Fri, Dec 20, 2019 at 10:17 PM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Abhidnya Joshi:
>
> > step[0] is a problem. When we tried to reproduce this, during runs, we
> > did check counter of step[0] in between. It actually got increased to
> > 2^32 and then got set back to 0.
> > This is only with first entry of step which is for some INTERNAL thing.
>
> I may have found something.
>
> Would you please check if on the affected system, ther is a
> gconv-modules.cache file somewhere under /usr?  Fedora uses
> /usr/lib64/gconv/gconv-modules.cache, for example.  Or do you set the
> GCONV_PATH environment variable on the affected system?
>
> In my initial tests, the counter was stuck at 1 due to the gconv cache,
> but that requires the gconv-modules.cache file being present at the
> expected path, and GCONV_PATH must not be set.  In this case, you should
> be able to work around this issue by running iconvconfig (or not setting
> GCONV_PATH).
>
> I cannot yet reproduce the crash, but I definitely see the counter
> wrap-around.
>
> Thanks,
> Florian
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Crash in gconv_db.c
  2019-12-20 17:18                               ` Abhidnya Joshi
@ 2019-12-20 18:39                                 ` Florian Weimer
  0 siblings, 0 replies; 17+ messages in thread
From: Florian Weimer @ 2019-12-20 18:39 UTC (permalink / raw)
  To: Abhidnya Joshi; +Cc: libc-locales

* Abhidnya Joshi:

> I checked for gconv-modules.cache. I do not see that at all in
> /usr/lib64/gconv  .
> Also the environment variable is not set. Is there a possibility that
> cache is not at all getting enabled?

Yes, distributions usually run iconvconfig during the installation to
create the cache.

I'm pretty sure that enabling the cache will paper over this bug,
otherwise it would have been reported before.  With caching enable, the
counters should remain constant at 1.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2019-12-20 18:39 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-11 16:05 Crash in gconv_db.c Abhidnya Joshi
2019-11-11 16:19 ` Florian Weimer
2019-11-11 16:47   ` Abhidnya Joshi
     [not found]     ` <CALmqtCVBCb2vJ+XNb6WZa1csNZaisLmqoG5nTn-QUU0MO=UbPw@mail.gmail.com>
2019-11-14 17:34       ` Florian Weimer
2019-11-19 14:05         ` Abhidnya Joshi
2019-11-28 18:47           ` Abhidnya Joshi
2019-12-12 15:58             ` Florian Weimer
2019-12-20 15:05               ` Abhidnya Joshi
2019-12-20 15:16                 ` Florian Weimer
2019-12-20 15:40                   ` Abhidnya Joshi
2019-12-20 15:43                     ` Florian Weimer
2019-12-20 15:53                       ` Abhidnya Joshi
2019-12-20 16:09                         ` Florian Weimer
2019-12-20 16:42                           ` Abhidnya Joshi
2019-12-20 16:47                             ` Florian Weimer
2019-12-20 17:18                               ` Abhidnya Joshi
2019-12-20 18:39                                 ` Florian Weimer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).