AT&T Laboratories
Cambridge

gcc v2.96-94 (re-)throw bug

Introduction

Apologies for the large amount of code. It may be difficult to come up with a simple test program that demonstrates this problem, which has shown itself only in multi-threaded CORBA applications.

I am by no means sure that this is a compiler bug, but would note that the code has run successfully for a year or so with egcs-1.1.2-24 (RH6.1), egcs-1.1.2-30 (RH6.2), gcc-2.96-54 (RH7.0), and for a few weeks with gcc-2.95.3 (RH7.1). I am documenting it here in case it rings any bells with anyone.

The problem is that when an exception is thrown it is apparently not caught as expected. Instead, a piece of unrelated code in another module is executed, followed by a SEGV.

Code (1)

The following code is executed in the main thread.

cerr << "before resolve\n";
  try {
    CORBA::Object_var obj = nameServiceUtil::resolve(orb, nameServiceName);
cerr << "after resolve\n";
    throw CosNaming::NamingContext::NotFound();

    StationDirectory_var existing = StationDirectory::_narrow(obj);
cerr << "after narrow\n";
    if (!CORBA::is_nil(existing)) {
cerr << "after is_nil\n";

      if (!existing->_non_existent()) {
        cerr << "Existing " << nameServiceName << " is alive - aborting.\n";
        throw AlreadyRunning();
      }
cerr << "non-existent\n";

    } else {
      cerr << "Existing " << nameServiceName
           << " not a StationDirectory? - aborting.\n";
      throw AlreadyRunning();
    }

  } catch (CosNaming::NamingContext::NotFound& e) {

    // no existing stationDirectory - OK!
    cerr << "No existing stationDirectory - OK!\n";

  } catch (CORBA::COMM_FAILURE& e) {

    cerr << "Existing " << nameServiceName << " is dead - continuing.\n";

  } catch (AlreadyRunning& e) {

cerr << "re-throw already running\n";
    //throw;

  } catch (...) {
    cerr << "Unknown exception checking existing " << nameServiceName
         << " - aborting.\n";
    //throw;
  }

  //
  // Read the persistent data file, then open it for writing.
  //

cerr << "before lock\n";
  omni_mutex_lock l(lock);
cerr << "after lock\n";

  sequenceNum = 1;
  checkpointNeeded = true;
cerr << "before read\n";
  persistentDataFile.read(this);
cerr << "after read\n";

Code (2)

Here are parts of PersistentDataFile::read():

void PersistentDataFile::read(Reader* r)
{
cerr << "PersistentDataFile::read() entry" << endl;
  ifs.open(active.c_str());

  if (ifs) {
    int line = 1;

    try {
      ...
    } catch (IOError& e) {

cerr << "PersistentDataFile::read() IOError" << endl;
      e.reason = e.reason + " in " + active + ".";
      throw;

    } catch (ParseError& e) {

cerr << "PersistentDataFile::read() ParseError" << endl;
      char lineStr[10];
      sprintf(lineStr,"%d",line);
      e.reason = e.reason + " at line " + lineStr + " in " + active + ".";
      throw;

    } catch (...) {
cerr << "PersistentDataFile::read() catch" << endl;
    }

cerr << "PersistentDataFile::read() report" << endl;
    cerr << timestamp() << "Read persistent data file " << active
         << " successfully." << endl;
  }
}
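
The IOError and ParseError handlers above rely on a bare throw re-throwing the exception object that was caught by reference, so that the text appended to e.reason is visible to whichever outer handler finally catches it. A minimal sketch of that pattern in isolation follows. ParseError here is a hypothetical stand-in for the real class, and parse_line(), read_file() and outer_reason() are names invented for illustration; std::to_string replaces the sprintf call for brevity. It shows the expected behaviour on a conforming compiler, not the fault described on this page.

```cpp
#include <string>

// Hypothetical stand-in for the ParseError type used in Code (2).
struct ParseError {
    std::string reason;
};

// Hypothetical parser step that always fails, to drive the handler.
void parse_line(int line) {
    (void)line;
    throw ParseError{"bad token"};
}

// Same handler shape as PersistentDataFile::read(): catch by reference,
// append context to the message, then re-throw with a bare `throw;`.
void read_file(const std::string& active) {
    int line = 3;
    try {
        parse_line(line);
    } catch (ParseError& e) {
        e.reason = e.reason + " at line " + std::to_string(line) +
                   " in " + active + ".";
        throw;  // re-throws the current exception object, not a copy
    }
}

// Returns the message seen by the outer handler.
std::string outer_reason() {
    try {
        read_file("data.txt");
    } catch (ParseError& e) {
        return e.reason;
    }
    return "no exception";
}
```

Calling outer_reason() should return the original reason with the line number and file name appended, because the outer handler receives the object that the inner handler modified.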

Program output

Run

gdb ./factory
GNU gdb 5.0rh-5 Red Hat Linux 7.1
Copyright 2001 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...
(gdb) run
Starting program: /home/bphone/bphone-server-2.1.9/src/factory/./factory 
[New Thread 1024 (LWP 31697)]
[New Thread 2049 (LWP 31698)]
Delayed SIGSTOP caught for LWP 31698.
[New Thread 1026 (LWP 31699)]
Delayed SIGSTOP caught for LWP 31699.
[New Thread 2051 (LWP 31700)]
Delayed SIGSTOP caught for LWP 31700.
[New Thread 3076 (LWP 31701)]
Delayed SIGSTOP caught for LWP 31701.
before resolve
len 32 str /project/bphone/stationDirectory
n 0 project
n 1 bphone
n 2 stationDirectory
n 3 buf 0x8086938
PersistentDataFile::read() report

Fri Jul 27 00:26:44 2001

Read persistent data file 
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1024 (LWP 31697)]
0x00001ee1 in ?? ()

Back-trace

(gdb) bt
#0  0x00001ee1 in ?? ()
#1  0x4038f831 in find_exception_handler (pc=0x805e19f, table=0x80721c8, eh_info=0x808aa88, 
    rethrow=1, cleanup=0xbffff5dc) at ../../gcc/libgcc2.c:3168
#2  0x4038fa82 in throw_helper (eh=0x807cfb0, pc=0x805e1e5, my_udata=0xbffff7e0, 
    offset_p=0xbffff7dc) at ../../gcc/libgcc2.c:3168
#3  0x4038ff6f in __rethrow (index=0x8070ebc) at ../../gcc/libgcc2.c:3168
#4  0x0805e1e6 in nameServiceUtil::resolve (orb=0x8085120, name=0xbffff950)
    at /opt/bphone/include/nameServiceUtil.h:98
#5  0x080586be in StationDirectory_i::StationDirectory_i (this=0x80862d0, __in_chrg=1, 
    orb_=0x8085120, boa_=0x80850c0, persistentDataFileName=0xbffffa10, checkpointPeriod_=900, 
    nameServiceName=0xbffffa20, purgeSessions_=false) at StationDirectory_i.cc:29
#6  0x0805b951 in main (argc=1, argv=0xbffffabc) at main.cc:90
#7  0x403ec617 in __libc_start_main (main=0x805b588 <main>, argc=1, ubp_av=0xbffffabc, 
    init=0x80515a0 <_init>, fini=0x8067fe4 <_fini>, rtld_fini=0x4000db24 <_dl_fini>, 
    stack_end=0xbffffaac) at ../sysdeps/generic/libc-start.c:129
(gdb) up
#1  0x4038f831 in find_exception_handler (pc=0x805e19f, table=0x80721c8, eh_info=0x808aa88, 
    rethrow=1, cleanup=0xbffff5dc) at ../../gcc/libgcc2.c:3168
3168    ../../gcc/libgcc2.c: No such file or directory.
        in ../../gcc/libgcc2.c
(gdb) 
#2  0x4038fa82 in throw_helper (eh=0x807cfb0, pc=0x805e1e5, my_udata=0xbffff7e0, 
    offset_p=0xbffff7dc) at ../../gcc/libgcc2.c:3168
3168    in ../../gcc/libgcc2.c
(gdb) 
#3  0x4038ff6f in __rethrow (index=0x8070ebc) at ../../gcc/libgcc2.c:3168
3168    in ../../gcc/libgcc2.c
(gdb) 
#4  0x0805e1e6 in nameServiceUtil::resolve (orb=0x8085120, name=0xbffff950)
    at /opt/bphone/include/nameServiceUtil.h:98
98          return nameServiceRoot->resolve(cosNamingName);
Current language:  auto; currently c++
(gdb) up
#5  0x080586be in StationDirectory_i::StationDirectory_i (this=0x80862d0, __in_chrg=1, 
    orb_=0x8085120, boa_=0x80850c0, persistentDataFileName=0xbffffa10, checkpointPeriod_=900, 
    nameServiceName=0xbffffa20, purgeSessions_=false) at StationDirectory_i.cc:29
29          CORBA::Object_var obj = nameServiceUtil::resolve(orb, nameServiceName);

Discussion

From the back-trace it appears that an exception was being propagated (frame #3, __rethrow) through nameServiceUtil::resolve() at nameServiceUtil.h:98, which was called from StationDirectory_i.cc:29. However, the program output shows that part of PersistentDataFile::read() has been executed, and this is not called until StationDirectory_i.cc:81.

I found that adding or removing extra debug output (e.g. the n 0 project and similar lines) did not affect whether the SEGV occurred, but did change which pieces of code were erroneously executed.
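
As a possible starting point for a smaller test case, the sketch below strips the situation down to its control flow: an exception unwinds through a frame that needs a cleanup, and must then be matched against a chain of handlers shaped like the one in Code (1). Everything in it is an assumption made for illustration. NotFound and AlreadyRunning are empty stand-ins for the real classes, Guard, remote_resolve(), resolve_like() and which_handler() are invented names, and whether the real resolve() frame has such a cleanup is a guess, since nameServiceUtil.h is not shown. It is single-threaded and is not known to reproduce the fault, which has only been seen in multi-threaded CORBA applications.

```cpp
// Hypothetical stand-ins for the exception types named in Code (1).
struct NotFound {};
struct AlreadyRunning {};

// Local object whose destructor forces a cleanup during unwinding.
struct Guard {
    int* counter;
    ~Guard() { ++*counter; }
};

// Hypothetical stand-in for the remote call made inside resolve().
void remote_resolve() {
    throw NotFound();
}

// The exception passes through this frame, running Guard's destructor.
void resolve_like(int* cleanups) {
    Guard g{cleanups};
    remote_resolve();
}

// Handler chain shaped like Code (1). Returns a code identifying the
// handler that ran: 1 NotFound, 2 AlreadyRunning, 3 catch-all, 0 none.
int which_handler(int* cleanups) {
    try {
        resolve_like(cleanups);
        return 0;
    } catch (NotFound&) {
        return 1;
    } catch (AlreadyRunning&) {
        return 2;
    } catch (...) {
        return 3;
    }
}
```

On a correct compiler which_handler() should return 1 and the cleanup counter should be incremented exactly once per call. Any other result, or a crash, would point at the exception-handling machinery rather than at application code. A natural next step would be to call it repeatedly from several threads.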


27/7/01 wfl@uk.research.att.com

Copyright © 2001 AT&T Laboratories Cambridge