[FIXED] r279 - stop/restart doesn't work
#1
When performing stop/restart on r274 MCS responds with:

...
[<time>] TICK THREAD STOPPED

And then nothing else happens...
#2
Are you on windows or on linux? Can you work with your platform's debugger? Can you inspect the process to see where the individual threads are blocked?
#3
I'm on Linux. I'm not very familiar with debugging but I'm guessing I should use gdb. Can you guide me further?
#4
Well, Linux is the difficult one for me, but I'll try. I only have experience with Ubuntu Server, so your distro might differ.
First compile MCServer in debug mode (the default for the GNUmakefile; run "make clean" before switching modes; instructions are in the COMPILING file). Then get MCServer into the situation you describe: deadlocked while stopping. Then, on another console:
1, Find out the MCServer PID, using "ps uxaw | grep MCS". It should list the processes containing MCS in their cmdline, so find the MCServer one and the PID is the number in the second column.
2, Start gdb with parameters path-to-MCServer PID ("gdb ./MCServer 8093"). GDB spits out a few lines of info, "Reading symbols from ~/MCServer...done." should be one of them. If it says "(no debugging symbols found)" in the MCServer line, then you didn't compile for debugging - retry (you need to make clean)
3, GDB now is waiting for your commands. Try "info threads" first to see all the threads; normally MCServer uses 5 - 7 threads.
4, For each thread in the table, use two commands: first "thread N" (where N is the thread number, the first column in the table), then "bt". This prints the backtrace (call stack) showing where the thread is currently executing.
5, Post all the output here, I'll have a look.

For me, this is what I get:
Code:
(gdb) info threads
  5 Thread 0xb75e0b70 (LWP 8094)  0xb7899430 in __kernel_vsyscall ()
  4 Thread 0xb6ddfb70 (LWP 8095)  0xb7899430 in __kernel_vsyscall ()
  3 Thread 0xb65deb70 (LWP 8096)  0xb7899430 in __kernel_vsyscall ()
  2 Thread 0xb53feb70 (LWP 8098)  0xb7899430 in __kernel_vsyscall ()
* 1 Thread 0xb75e2b30 (LWP 8093)  0xb7899430 in __kernel_vsyscall ()
(gdb) thread 1
[Switching to thread 1 (Thread 0xb75e2b30 (LWP 8093))]#0  0xb7899430 in __kernel_vsyscall ()
(gdb) bt
#0  0xb7899430 in __kernel_vsyscall ()
#1  0xb7749245 in sem_wait@@GLIBC_2.1 () from /lib/tls/i686/cmov/libpthread.so.0
#2  0x08157873 in cEvents::Wait (this=0x8f10db8) at WebServer/cEvents.cpp:99
#3  0x0815c96a in webserver::Stop (this=0x8f10d88) at WebServer/WebServer.cpp:402
#4  0x080b5998 in ~cWebAdmin (this=0x8f10b60, __in_chrg=<value optimized out>) at source/cWebAdmin.cpp:69
#5  0x080a60f5 in cRoot::Start (this=0xbfe20808) at source/cRoot.cpp:145
#6  0x080cb500 in main (argc=1, argv=0xbfe20954) at source/main.cpp:79
(gdb) thread 2
[Switching to thread 2 (Thread 0xb53feb70 (LWP 8098))]#0  0xb7899430 in __kernel_vsyscall ()
(gdb) bt
#0  0xb7899430 in __kernel_vsyscall ()
#1  0xb774a168 in accept () from /lib/tls/i686/cmov/libpthread.so.0
#2  0x080ae9ed in cSocket::Accept (this=0x8f12548) at source/cSocket.cpp:240
#3  0x080a9997 in cServer::StartListenClient (this=0x8f12a48) at source/cServer.cpp:259
#4  0x080a8db7 in cServer::ServerListenThread (a_Args=0x8f12a48) at source/cServer.cpp:95
#5  0x080b37cb in cThread::MyThread (a_Param=0xb5c119b0) at source/cThread.cpp:114
#6  0xb774296e in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#7  0xb76b0a4e in clone () from /lib/tls/i686/cmov/libc.so.6
(gdb) thread 3
[Switching to thread 3 (Thread 0xb65deb70 (LWP 8096))]#0  0xb7899430 in __kernel_vsyscall ()
(gdb) bt
#0  0xb7899430 in __kernel_vsyscall ()
#1  0xb7749245 in sem_wait@@GLIBC_2.1 () from /lib/tls/i686/cmov/libpthread.so.0
#2  0x08128e27 in cEvent::Wait (this=0x8f142bc) at source/cEvent.cpp:93
#3  0x0811b182 in cChunkGenerator::Execute (this=0x8f14290) at source/cChunkGenerator.cpp:120
#4  0x0813ad20 in cIsThread::thrExecute (iParam=0x8f14290) at source/cIsThread.h:60
#5  0xb774296e in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#6  0xb76b0a4e in clone () from /lib/tls/i686/cmov/libc.so.6
(gdb) thread 4
[Switching to thread 4 (Thread 0xb6ddfb70 (LWP 8095))]#0  0xb7899430 in __kernel_vsyscall ()
(gdb) bt
#0  0xb7899430 in __kernel_vsyscall ()
#1  0xb7749245 in sem_wait@@GLIBC_2.1 () from /lib/tls/i686/cmov/libpthread.so.0
#2  0x08128e27 in cEvent::Wait (this=0x8f1421c) at source/cEvent.cpp:93
#3  0x080944fb in cWorldStorage::Execute (this=0x8f141e0) at source/WorldStorage.cpp:247
#4  0x0813ad20 in cIsThread::thrExecute (iParam=0x8f141e0) at source/cIsThread.h:60
#5  0xb774296e in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#6  0xb76b0a4e in clone () from /lib/tls/i686/cmov/libc.so.6
(gdb) thread 5
[Switching to thread 5 (Thread 0xb75e0b70 (LWP 8094))]#0  0xb7899430 in __kernel_vsyscall ()
(gdb) bt
#0  0xb7899430 in __kernel_vsyscall ()
#1  0xb774a168 in accept () from /lib/tls/i686/cmov/libpthread.so.0
#2  0x0815840d in SocketServer::Accept (this=0x8f10d98) at WebServer/Socket.cpp:228
#3  0x0815c988 in webserver::Begin (this=0x8f10d88) at WebServer/WebServer.cpp:410
#4  0x080b7836 in cWebAdmin::ListenThread (lpParam=0x8f10b60) at source/cWebAdmin.cpp:322
#5  0xb774296e in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#6  0xb76b0a4e in clone () from /lib/tls/i686/cmov/libc.so.6
(gdb)

6, To quit gdb, use "q" command; it will ask if it's okay to detach from the MCServer process, answer "y".
#5
Here's some output from gdb on r279. I had to log the output to a file (set logging on), which doesn't record the commands given. I hope you can use it anyway.

Code:
Id   Target Id         Frame
  5    Thread 0xb7411b40 (LWP 4032) "MCServer" 0xb7713424 in __kernel_vsyscall
    ()
  4    Thread 0xb6c10b40 (LWP 4033) "MCServer" 0xb7713424 in __kernel_vsyscall
    ()
  3    Thread 0xb640fb40 (LWP 4034) "MCServer" 0xb7713424 in __kernel_vsyscall
    ()
  2    Thread 0xb4ff3b40 (LWP 4036) "MCServer" 0xb7713424 in __kernel_vsyscall
    ()
* 1    Thread 0xb7413700 (LWP 4031) "MCServer" 0xb7713424 in __kernel_vsyscall
    ()
[Switching to thread 1 (Thread 0xb7413700 (LWP 4031))]
#0  0xb7713424 in __kernel_vsyscall ()
#0  0xb7713424 in __kernel_vsyscall ()
#1  0xb75c3c95 in sem_wait@@GLIBC_2.1 () from /lib/libpthread.so.0
#2  0x08144567 in cEvents::Wait (this=0xb5b00498) at WebServer/cEvents.cpp:99
#3  0x081471c4 in webserver::Stop (this=0xb5b00468)
    at WebServer/WebServer.cpp:402
#4  0x080d7b85 in cWebAdmin::~cWebAdmin (this=0x82fcdf0,
    __in_chrg=<optimized out>) at source/cWebAdmin.cpp:69
#5  0x080bc1cb in cRoot::Start (this=0xbf9c5a58) at source/cRoot.cpp:145
#6  0x080a8258 in main (argc=1, argv=0xbf9c5b84) at source/main.cpp:79
[Switching to thread 2 (Thread 0xb4ff3b40 (LWP 4036))]
#0  0xb7713424 in __kernel_vsyscall ()
#0  0xb7713424 in __kernel_vsyscall ()
#1  0xb75c4bd8 in accept () from /lib/libpthread.so.0
#2  0x081326cd in cSocket::Accept (this=0x82fcac8) at source/cSocket.cpp:240
#3  0x08134acc in cServer::StartListenClient (this=0x82fc758)
    at source/cServer.cpp:259
#4  0x0813411f in cServer::ServerListenThread (a_Args=0x82fc758)
    at source/cServer.cpp:95
#5  0x080a854d in cThread::MyThread (a_Param=0x82fd868)
    at source/cThread.cpp:114
#6  0xb75bdd4c in start_thread () from /lib/libpthread.so.0
#7  0xb74fd03e in clone () from /lib/libc.so.6
[Switching to thread 3 (Thread 0xb640fb40 (LWP 4034))]
#0  0xb7713424 in __kernel_vsyscall ()
#0  0xb7713424 in __kernel_vsyscall ()
#1  0xb75c3c95 in sem_wait@@GLIBC_2.1 () from /lib/libpthread.so.0
#2  0x080c1253 in cEvent::Wait (this=0x82fe74c) at source/cEvent.cpp:93
#3  0x080d7760 in cChunkGenerator::Execute (this=0x82fe720)
    at source/cChunkGenerator.cpp:120
#4  0x080dc27c in cIsThread::thrExecute (iParam=0x82fe720)
    at source/cIsThread.h:60
#5  0xb75bdd4c in start_thread () from /lib/libpthread.so.0
#6  0xb74fd03e in clone () from /lib/libc.so.6
[Switching to thread 4 (Thread 0xb6c10b40 (LWP 4033))]
#0  0xb7713424 in __kernel_vsyscall ()
#0  0xb7713424 in __kernel_vsyscall ()
#1  0xb75c3c95 in sem_wait@@GLIBC_2.1 () from /lib/libpthread.so.0
#2  0x080c1253 in cEvent::Wait (this=0x82fe6ac) at source/cEvent.cpp:93
#3  0x0809ce91 in cWorldStorage::Execute (this=0x82fe670)
    at source/WorldStorage.cpp:247
#4  0x080dc27c in cIsThread::thrExecute (iParam=0x82fe670)
    at source/cIsThread.h:60
#5  0xb75bdd4c in start_thread () from /lib/libpthread.so.0
#6  0xb74fd03e in clone () from /lib/libc.so.6
[Switching to thread 5 (Thread 0xb7411b40 (LWP 4032))]
#0  0xb7713424 in __kernel_vsyscall ()
#0  0xb7713424 in __kernel_vsyscall ()
#1  0xb75c4bd8 in accept () from /lib/libpthread.so.0
#2  0x081492d4 in SocketServer::Accept (this=0xb5b00478)
    at WebServer/Socket.cpp:228
#3  0x081471e2 in webserver::Begin (this=0xb5b00468)
    at WebServer/WebServer.cpp:410
#4  0x080d9879 in cWebAdmin::ListenThread (lpParam=0x82fcdf0)
    at source/cWebAdmin.cpp:322
#5  0xb75bdd4c in start_thread () from /lib/libpthread.so.0
#6  0xb74fd03e in clone () from /lib/libc.so.6

When the debug session ended MCS wrote the following (this wasn't shown earlier):

[...]
[11:55:06] TICK THREAD STOPPED
[11:56:10] Waiting for the world storage to finish saving
[11:56:10] --- Stopped Log ---
#6
Seems more or less the same as mine. It's waiting for the webserver to finish, and for some reason the webserver doesn't take the hint. Thanks for the dump, I'll see if I can do anything about it.
#7
I think I found it. On Linux, just closing a socket file descriptor in one thread doesn't wake up another thread that is blocked in accept() or select() on the same socket. The fix is to call shutdown() on that socket fd before closing it.
Fixed in rev 283.
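
For anyone hitting the same thing, here is a minimal sketch of the shutdown-before-close pattern described above. It is not the actual rev 283 change; `OpenListener` and `StopListenerCleanly` are made-up names for illustration, and it relies on the Linux behaviour that shutdown() on a listening socket makes a blocked accept() return with an error (other platforms may behave differently).

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical helper: opens a TCP listening socket on a free loopback port.
int OpenListener()
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  // let the kernel pick a free port
    if (bind(fd, reinterpret_cast<sockaddr *>(&addr), sizeof(addr)) != 0 ||
        listen(fd, 4) != 0)
    {
        close(fd);
        return -1;
    }
    return fd;
}

// Hypothetical demo: returns true if a thread blocked in accept() wakes up
// once the socket is shut down (assumes Linux semantics).
bool StopListenerCleanly()
{
    int fd = OpenListener();
    if (fd < 0) return false;

    std::atomic<bool> woke{false};
    std::thread acceptor([&] {
        int client = accept(fd, nullptr, nullptr);  // blocks until a client connects or the socket is shut down
        if (client >= 0) close(client);
        woke = true;
    });

    // Give the thread time to block in accept().
    std::this_thread::sleep_for(std::chrono::milliseconds(100));

    shutdown(fd, SHUT_RDWR);  // on Linux this wakes the blocked accept()
    acceptor.join();          // with close() alone, this join could hang
    close(fd);                // release the descriptor once the thread is done
    return woke;
}
```

Calling `StopListenerCleanly()` from a test program should return true on Linux. The sleep is only there to make it likely the thread is already blocked; a real server would not need it.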
#8
Yeah, it's fixed! :)

I've tried stress-testing the restart-command using the following command line (on Linux):
echo restart | ./MCServer

Output from MCS shows that the restart command is executed successfully. But it then goes into an endless loop, printing the following several times a second:
[12:31:10] Server console command: ""

At the same time memory consumption rises.

I don't think this is an important bug to fix but because of the memory consumption I thought you should know....

EDIT: I just tried the same with r235 and there was no endless loop. The above observation could be related to the implemented multithreading...
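
The cause isn't confirmed in this thread, but the symptom would be consistent with a console loop that keeps reading after stdin reaches end-of-file: with the pipe above, the input closes after one line, and a read that ignores the stream state would then see an empty command every time. A minimal sketch of a loop that stops at EOF, assuming a made-up `ReadCommands` helper (this is not MCServer's actual console code):

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads console commands until the stream ends and
// returns the commands it saw. The loop condition is the stream state, so
// EOF ends the loop instead of producing an endless series of "" commands.
std::vector<std::string> ReadCommands(std::istream & a_Input)
{
    std::vector<std::string> commands;
    std::string line;
    while (std::getline(a_Input, line))
    {
        if (line.empty())
        {
            continue;  // ignore blank lines rather than treating them as commands
        }
        commands.push_back(line);
    }
    return commands;
}
```

Passing `std::cin` would give the behaviour wanted for the piped case: one "restart" command is read, then the loop exits when the pipe closes.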