mod_db and request timeouts

Poster Content
nk4um Moderator
Posts: 498
September 17, 2007 19:00Re: Watchdog aspect?
Hi Tom,

you raise a very interesting point and one which we must consider for NetKernel 4. My initial thoughts where that this might be quite hard to achieve. Certainly Java makes practically gaining some of this information quite tricky. However there isn''t anything in the NetKernel abstraction at the moment that can "oversee" the full sub-request tree at the application level. The closest thing that we have in NetKernel 4 is a kernel listener which receives all request events from the kernel and can do as it wishes with that data. However that cannot be layered in in the way you suggest, it is more of a global thing. This is certainly food for thought.... cheers
nk4um Moderator
Posts: 498
September 17, 2007 18:55Re: Check this one off
Yes that documentation got checked in a bit to soon. Thank you for catching that and also thank you for your very kind words.

I''m glad we have this one sorted as, although you say it is minor, I feel it is the kind of thing that might put off someone evaluating NetKernel for serious enterprise applications so it is great that you brought it to our attention.

Cheers,
Tony
nk4um User
Posts: 111
September 17, 2007 18:30Check this one off
Works great, thanks!  The only issue is that the doc file (doc/doc_mod_db_guide.xml) is slightly mangled where it looks like you started to add in a paragraph about the queryTimeoutMethod tag in the config file but created the distribution archive before saving what you wrote and it has an unbalanced tag.

I''ll also take this opportunity to compliment you on your fantastic level of support; rare is the vendor who provides patches for a relatively minor issue so fast.
nk4um User
Posts: 47
September 17, 2007 18:22Watchdog aspect?
Is there a more general principle that could be abstracted out of this? Could a "watchdog aspect" be created which would cancel an errant thread when it exceeded its resource limits (time, memory, calls, etc)? I guess I''m imagining something like the security aspects which could just be wrapped around any URI. Is this feasible?
nk4um Moderator
Posts: 498
September 17, 2007 10:38The last 5%
Hi J,

I''ve had a think about this and implemented an updated whereby you can specify an queryTimeoutMethod on the configuration. If you specify TIMER rather than SET it will use the java.util.Timer to place a timeout task on the statement to call cancel() after the specified period of time. (The timer is cancelled when the query completes) Specifying SET (the default) will use the statement.setQueryTimeout().

Let me know how you get on.
nk4um Moderator
Posts: 770
September 15, 2007 09:19Watchdog?
The sqlXXXX accessor could have a watchdog thread for cases where only Statement.cancel() works?
nk4um Moderator
Posts: 498
September 15, 2007 09:10Excellent, let me think about postgres
I''m glad that sorted oracle. It''s interest what you say about the way postgres handles it. Let me give it some thought over the weekend and get back to you. It hopefully won''t require a kernel change...
nk4um User
Posts: 111
September 14, 2007 23:39
Thanks, I installed it and it works like a charm for oracle connections, which is 95% of our concern.  It might make more sense for the "default" default timeout to be set to something other than zero - maybe 60 or the deadlock detection timeout.  Setting it on a per-statement basis could make for more overall consistency, but I doubt it would provide any additional real benefit.

However, the postgresql jdbc driver doesn''t implement setTimeout (it doesn''t even let you know that its not implemented).  So this implementation is no help there.   I have an idea for a way to solve it, but it''s pretty far reaching and probably only applicable to this one case.  Basically allow a request to register (and unregister) an interrupt callback that the scheduler would call before or instead of Thread.stop.  I think this would work because the postgresql driver does implement Statement.cancel() (it sends a cancel request to the server over the same connection, and the server then responds).  So if mod-db setup an interrupt callback to cancel the currently executing statement instead of timing out, then deadlock detection and manually killing the thread would work normally.  The downside is that this is a low-level kernel change (I''d think in the scheduler, but I don''t know where to go in there) and such things should not be taken lightly. 
nk4um Moderator
Posts: 498
September 14, 2007 10:38defaultQueryTimeout
I have made a change to the database module to support a default query timeout on the configuration with a <defaultQueryTimeout/> tag. It defaults to no timeout as it behaved previously. This change will be available in the next NetKernel 3.3 alpha - watch posts to Randy''s thread:

http://www.1060.org/forum/topic/281

Please let me know if this is adequate for you. We have the possibility of adding an overridable timeout value onto each query but at the moment I feel this isn''t necessary and just adds complexity.

Cheers,
Tony
nk4um User
Posts: 111
September 13, 2007 23:02
In practice the problem is mostly an annoyance to developers; by the time code makes it out to production hopefully we have eliminated these types of aberrant queries.  We also have a watchdog on our production servers to externally kill and restart netkernel if it does become unresponsive such as in a case like this.  Anecdotally, this happens "occasionally".

However, the issue does make some of our developers wary about deploying more mission-critical services on netkernel and I''d prefer to be able to say "we can recover easily if the problem does come up" than "if we make no mistakes then it won''t happen."
nk4um Moderator
Posts: 498
September 13, 2007 22:31A solution is needed
Hi J,

thank you. Yes I understand how this is a limitation that needs to be sorted out. I have raised a bugxter bug to track it:

http://www.1060.org/bugxter/main/edit_bug?id=134

I am unsure why NetKernel and the underlying java thread interrupt/termination mechanism is unable to stop a pending jdbc request. I have seen a similar problem with network IO, however in that situation we have the ability to set the network timeout parameters.

We''ll look into how to do this for JDBC. Can you let me know how urgent this is for you and your project?

Cheers,
Tony
nk4um User
Posts: 111
September 13, 2007 19:02mod_db and request timeouts
I was experimenting (accidentally at first, then intentionally) with long-running sql queries and discovered a troubling behavior: sql queries cannot be interrupted in any way.  They cannot be killed from the control panel (the response is "request was killed but is not responding"), if they are detected as deadlocked the behavior is probably similar.  Initiating a restart (hot or cold) or initiating a kernel shutdown (for an automatic external restart) only makes things worse, because these processes wait for existing requests to complete - which they never will - and the entire system becomes completely unresponsive.  The only solution at this point is to kill the server externally and messily.

Its pretty trivial to write a sql query that will take years to run to completion; while it is obviously best to not write these kinds of queries, there are plenty of ways they could accidentally sneak in, and it only takes one to get the system into an unrecoverable state.  I''ve tried this using the oracle thin jdbc driver and the postgresql jdbc driver with similar results.

So the simple question: how do I set a timeout on the sql query?  There doesn''t appear to be any way to do this either at the statement level or from the RDBMSconfig.xml configuration file. 

Deeper question: why can''t the system kill these requests?  Is it a bug, or is there something intrinsic about jdbc operations that prevents them from being killed?  And is there any other way I could recover a system in this state?

Thanks
-J