02-07-2018, 09:31 PM
That's very unlikely to be the cause. The error messages don't come from the deadlock detector doing its work; they come from the cleanup phase, when the deadlock detector is being terminated as part of the server shutdown. Let me explain in more detail:
Our deadlock detector works by checking whether each world's clock (game tick) gets incremented at least once during a pre-set interval (20 seconds by default). If it doesn't, the detector considers this a deadlock and aborts the server. At the time of the abort, it also writes some useful information to the console, including which mutex (CriticalSection / CS in our Windows-originated terminology) is locked by which thread - this usually points directly at the deadlock loop. Because there's no generic way to enumerate all mutexes used by a program, we use manual registration - hand-selected mutexes get registered with the deadlock detector upon their creation, and unregistered upon their destruction.
The messages you get mean that the detector is being destroyed while some of the mutexes haven't unregistered yet. Since the deadlock detector is destroyed very late in the server shutdown sequence, I believe this has no negative impact on the data (nothing should get lost in the world saves or player saves), even if it does point to some problem in the code. It also means I have no idea what could be causing this, besides some really wild theories about exceptions.
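To make the mechanism a bit more concrete, here is a minimal sketch of the idea. It is not the actual code from src/DeadlockDetect.cpp: the names DeadlockDetectSketch, TrackedCS, CheckWorldAge and StillTracked are made up for illustration, it handles a single world only, and it assumes the check is called once per second.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a mutex that registers itself with the detector.
struct TrackedCS
{
    std::string name;
};

// Illustrative sketch only; single world, no threading, no real abort.
class DeadlockDetectSketch
{
public:
    explicit DeadlockDetectSketch(int intervalSec) : m_IntervalSec(intervalSec)
    {
        assert(intervalSec > 0);
    }

    // Manual registration: called when a hand-selected mutex is created.
    void TrackCS(const TrackedCS * cs) { m_Tracked.push_back(cs); }

    // Called when that mutex is destroyed.
    void UntrackCS(const TrackedCS * cs)
    {
        m_Tracked.erase(std::remove(m_Tracked.begin(), m_Tracked.end(), cs), m_Tracked.end());
    }

    // Assumed to be called once per second with the world's current tick count.
    // Returns true once the tick count has not changed for intervalSec checks in a row.
    bool CheckWorldAge(std::int64_t age)
    {
        if (age != m_LastAge)
        {
            // The world ticked since the last check, so reset the stall counter.
            m_LastAge = age;
            m_NumStalledSec = 0;
            return false;
        }
        ++m_NumStalledSec;
        return (m_NumStalledSec >= m_IntervalSec);
    }

    // Names of mutexes that are still registered. A non-empty result at
    // detector shutdown corresponds to the kind of messages discussed here.
    std::vector<std::string> StillTracked() const
    {
        std::vector<std::string> names;
        for (const TrackedCS * cs : m_Tracked)
        {
            names.push_back(cs->name);
        }
        return names;
    }

private:
    int m_IntervalSec;
    int m_NumStalledSec = 0;
    std::int64_t m_LastAge = -1;
    std::vector<const TrackedCS *> m_Tracked;
};
```

In this sketch, a mutex that never calls UntrackCS before the detector goes away simply stays in the list, which is the situation the shutdown messages report; it doesn't by itself say anything about a deadlock having happened.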
Since you're building your own executable, you could probably help us out by compiling in Debug mode (-DCMAKE_BUILD_TYPE=Debug passed to cmake) and running the server under gdb within the Amazon instance, setting a breakpoint on the code that prints the error messages (src/DeadlockDetect.cpp, line 42) and getting a stacktrace of all the threads (thread apply all bt in gdb) when that breakpoint is hit.