Wednesday, August 27, 2014

Killing Hung Threads in Heritrix

Since I started working with Heritrix, I have regularly run into hung threads: sometimes a job is shown as running but it doesn't do anything. The solution that came to mind was to stop or kill these threads without affecting the rest of the running process.

In this post, I list the steps I used to enable jdb on Heritrix and use it to kill a specific thread.

  1. Modify the bin/heritrix script so that the JVM accepts a jdb attach on port 50100:
       JAVA_OPTS="${JAVA_OPTS} -Xmx256m -Xrunjdwp:transport=dt_socket,address=50100,server=y,suspend=n"
  2. Run Heritrix:
       bin/heritrix -a username:password
  3. Run jdb and attach it to that port:
       jdb -attach 50100
  4. List the available thread groups with threadgroups, the threads of a specific group with threads <threadgroup>, or all available threads with threads:
       threads
  5. Select the thread that you want to kill. This is the tricky part: you need to compare the Heritrix web interface with the thread names to determine which thread should be killed. Example:
       (org.archive.crawler.framework.ToeThread)0xe4b ToeThread #24: http://humsci.stanford.edu/robots.txt running
  6. Kill the thread from jdb in four steps, based on this blog post:
    • thread <thread_id>
    • suspend <thread_id>
    • step
    • kill <thread_id> new java.lang.Exception()
     For the thread above (id 0xe4b), the session looks like this:
       > thread 0xe4b
       ToeThread #24: [1] suspend 0xe4b
       ToeThread #24: [1] step
       > Step completed: "thread=ToeThread #24: ", org.archive.crawler.frontier.WorkQueueFrontier.findEligibleURI(), line=729 bci=492
       ToeThread #24: [1] kill 0xe4b new java.lang.Exception()
       killing thread: ToeThread #24:
       ToeThread #24: [1] instance of org.archive.crawler.framework.ToeThread(name='ToeThread #24: ', id=3659) killed
  7. Go to the job's web interface; the crawl log now contains a line like the following:
       2014-08-27T01:31:42.982Z SEVERE Fatal exception in ToeThread #24: (in thread 'ToeThread #24: ')
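
A side note on the JAVA_OPTS line above: -Xrunjdwp is the legacy spelling of the JDWP option, and JVMs since Java 5 accept the same settings through -agentlib:jdwp. The snippet below is only a sketch of that variant, assuming bin/heritrix passes JAVA_OPTS through to the JVM unchanged. JDWP_PORT and JDWP_OPTS are hypothetical variable names introduced here to keep the port in one place; 50100 is the port used above.

```shell
# Sketch: same debug settings as above, using the -agentlib:jdwp syntax.
# JDWP_PORT and JDWP_OPTS are hypothetical names, not part of Heritrix.
JDWP_PORT="${JDWP_PORT:-50100}"

# server=y  : the JVM listens for a debugger (jdb) to attach
# suspend=n : the JVM starts normally instead of waiting for the debugger
JDWP_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=${JDWP_PORT}"

# Append to whatever JAVA_OPTS already holds, keeping the heap setting from above.
JAVA_OPTS="${JAVA_OPTS:-} -Xmx256m ${JDWP_OPTS}"
export JAVA_OPTS

# Print the result so it can be checked before starting Heritrix.
echo "${JAVA_OPTS}"
```

With this in place, the attach command stays the same: jdb -attach 50100.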
