|
nk4um Moderator
Posts: 498
|
Hi Tom,
you raise a very interesting point and one which we must consider for NetKernel 4. My initial thoughts where that this might
be quite hard to achieve. Certainly Java makes practically gaining some of this information quite tricky. However there isn''t
anything in the NetKernel abstraction at the moment that can "oversee" the full sub-request tree at the application level.
The closest thing that we have in NetKernel 4 is a kernel listener which receives all request events from the kernel and can
do as it wishes with that data. However that cannot be layered in in the way you suggest, it is more of a global thing. This
is certainly food for thought.... cheers
|
|
nk4um Moderator
Posts: 498
|
Yes that documentation got checked in a bit to soon. Thank you for catching that and also thank you for your very kind words.
I''m glad we have this one sorted as, although you say it is minor, I feel it is the kind of thing that might put off someone
evaluating NetKernel for serious enterprise applications so it is great that you brought it to our attention.
Cheers, Tony
|
|
nk4um User
Posts: 111
|
Works great, thanks! The only issue is that the doc file (doc/doc_mod_db_guide.xml) is slightly mangled where it looks like
you started to add in a paragraph about the queryTimeoutMethod tag in the config file but created the distribution archive
before saving what you wrote and it has an unbalanced tag.
I''ll also take this opportunity to compliment you on your fantastic level of support; rare is the vendor who provides patches
for a relatively minor issue so fast.
|
|
nk4um User
Posts: 47
|
Is there a more general principle that could be abstracted out of this? Could a "watchdog aspect" be created which would cancel
an errant thread when it exceeded its resource limits (time, memory, calls, etc)? I guess I''m imagining something like the
security aspects which could just be wrapped around any URI. Is this feasible?
|
|
nk4um Moderator
Posts: 498
|
2007-09-17T10:38:57.000ZSeptember 17, 2007 10:38The last 5%
Hi J,
I''ve had a think about this and implemented an updated whereby you can specify an queryTimeoutMethod on the configuration.
If you specify TIMER rather than SET it will use the java.util.Timer to place a timeout task on the statement to call cancel()
after the specified period of time. (The timer is cancelled when the query completes) Specifying SET (the default) will use
the statement.setQueryTimeout().
Let me know how you get on.
|
|
nk4um Moderator
Posts: 770
|
2007-09-15T09:19:53.000ZSeptember 15, 2007 09:19Watchdog?
The sqlXXXX accessor could have a watchdog thread for cases where only Statement.cancel() works?
|
|
nk4um Moderator
Posts: 498
|
I''m glad that sorted oracle. It''s interest what you say about the way postgres handles it. Let me give it some thought over
the weekend and get back to you. It hopefully won''t require a kernel change...
|
|
nk4um User
Posts: 111
|
2007-09-14T23:39:30.000ZSeptember 14, 2007 23:39
Thanks, I installed it and it works like a charm for oracle connections, which is 95% of our concern. It might make more
sense for the "default" default timeout to be set to something other than zero - maybe 60 or the deadlock detection timeout.
Setting it on a per-statement basis could make for more overall consistency, but I doubt it would provide any additional real
benefit.
However, the postgresql jdbc driver doesn''t implement setTimeout (it doesn''t even let you know that its not implemented).
So this implementation is no help there. I have an idea for a way to solve it, but it''s pretty far reaching and probably
only applicable to this one case. Basically allow a request to register (and unregister) an interrupt callback that the scheduler
would call before or instead of Thread.stop. I think this would work because the postgresql driver does implement Statement.cancel()
(it sends a cancel request to the server over the same connection, and the server then responds). So if mod-db setup an interrupt
callback to cancel the currently executing statement instead of timing out, then deadlock detection and manually killing the
thread would work normally. The downside is that this is a low-level kernel change (I''d think in the scheduler, but I don''t
know where to go in there) and such things should not be taken lightly.
|
|
nk4um Moderator
Posts: 498
|
I have made a change to the database module to support a default query timeout on the configuration with a <defaultQueryTimeout/>
tag. It defaults to no timeout as it behaved previously. This change will be available in the next NetKernel 3.3 alpha - watch
posts to Randy''s thread: http://www.1060.org/forum/topic/281Please let me know if this is adequate for you. We have the possibility of adding an overridable timeout value onto each query
but at the moment I feel this isn''t necessary and just adds complexity. Cheers, Tony
|
|
nk4um User
Posts: 111
|
2007-09-13T23:02:31.000ZSeptember 13, 2007 23:02
In practice the problem is mostly an annoyance to developers; by the time code makes it out to production hopefully we have
eliminated these types of aberrant queries. We also have a watchdog on our production servers to externally kill and restart
netkernel if it does become unresponsive such as in a case like this. Anecdotally, this happens "occasionally".
However, the issue does make some of our developers wary about deploying more mission-critical services on netkernel and I''d
prefer to be able to say "we can recover easily if the problem does come up" than "if we make no mistakes then it won''t happen."
|
|
nk4um Moderator
Posts: 498
|
Hi J, thank you. Yes I understand how this is a limitation that needs to be sorted out. I have raised a bugxter bug to track it: http://www.1060.org/bugxter/main/edit_bug?id=134I am unsure why NetKernel and the underlying java thread interrupt/termination mechanism is unable to stop a pending jdbc
request. I have seen a similar problem with network IO, however in that situation we have the ability to set the network timeout
parameters. We''ll look into how to do this for JDBC. Can you let me know how urgent this is for you and your project? Cheers, Tony
|
|
nk4um User
Posts: 111
|
I was experimenting (accidentally at first, then intentionally) with long-running sql queries and discovered a troubling behavior:
sql queries cannot be interrupted in any way. They cannot be killed from the control panel (the response is "request was
killed but is not responding"), if they are detected as deadlocked the behavior is probably similar. Initiating a restart
(hot or cold) or initiating a kernel shutdown (for an automatic external restart) only makes things worse, because these processes
wait for existing requests to complete - which they never will - and the entire system becomes completely unresponsive. The
only solution at this point is to kill the server externally and messily.
Its pretty trivial to write a sql query that will take years to run to completion; while it is obviously best to not write these kinds of queries, there are plenty of ways they could accidentally sneak in, and it only takes one to get the
system into an unrecoverable state. I''ve tried this using the oracle thin jdbc driver and the postgresql jdbc driver with
similar results.
So the simple question: how do I set a timeout on the sql query? There doesn''t appear to be any way to do this either at
the statement level or from the RDBMSconfig.xml configuration file.
Deeper question: why can''t the system kill these requests? Is it a bug, or is there something intrinsic about jdbc operations
that prevents them from being killed? And is there any other way I could recover a system in this state?
Thanks -J
|