In this post, I list the steps I used to attach jdb to Heritrix and use it to kill a specific thread.
- Modify the bin/heritrix script so that jdb can attach to the Heritrix JVM.
- Run Heritrix
- Run jdb
- List the available thread groups with threadgroups, list the threads of a specific group with threads <thread_group>, or list all the available threads with threads.
- Select the thread that you want to kill. This is the tricky part: you need to compare the Heritrix web interface with the thread names to determine which thread should be killed.
- Kill the thread from jdb with the following four steps, based on this blog post:
- thread <thread_id>
- suspend <thread_id>
- step
- kill <thread_id> new java.lang.Exception()
- If you go to the job web interface, you will find the following line in the crawl log.
JAVA_OPTS="${JAVA_OPTS} -Xmx256m -Xrunjdwp:transport=dt_socket,address=50100,server=y,suspend=n"
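On Java 5 and later, the same debug agent can also be enabled with -agentlib:jdwp, which takes the same sub-options as -Xrunjdwp. A sketch of the equivalent line, assuming the rest of bin/heritrix is unchanged and port 50100 is still the one you want:

```shell
# Same JDWP settings as the line above, written with -agentlib:jdwp
# instead of the older -Xrunjdwp spelling.
# ${JAVA_OPTS:-} keeps any options that were already set (empty if none).
JAVA_OPTS="${JAVA_OPTS:-} -Xmx256m -agentlib:jdwp=transport=dt_socket,address=50100,server=y,suspend=n"
```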
bin/heritrix -a username:password
jdb -attach 50100
threads
Example: (org.archive.crawler.framework.ToeThread)0xe4b ToeThread #24: http://humsci.stanford.edu/robots.txt running
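When there are many ToeThreads, picking the id out of the threads listing by eye gets tedious. Here is a minimal sketch of a helper that does it, assuming you have copied the threads output into a file or a pipe and that each line looks like the example above. The function name find_toe_thread_id is made up for this post; it is not part of jdb or Heritrix.

```shell
# Hypothetical helper: read jdb `threads` output on stdin and print the
# hex id of every ToeThread whose line contains the given text
# (for example, part of the URL shown in the Heritrix web interface).
find_toe_thread_id() {
    pattern=$1
    grep -F 'ToeThread' | grep -F -- "$pattern" |
        # Keep only the 0x... id that follows the "(class name)" prefix.
        sed -n 's/^[^)]*)\(0x[0-9a-fA-F]*\) .*/\1/p'
}

# Usage with the example line from this post:
threads_line='(org.archive.crawler.framework.ToeThread)0xe4b ToeThread #24: http://humsci.stanford.edu/robots.txt running'
printf '%s\n' "$threads_line" | find_toe_thread_id 'humsci.stanford.edu'
# prints: 0xe4b
```

The printed id is what you pass to thread, suspend, and kill in the jdb session.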
> thread 0xe4b
ToeThread #24: [1] suspend 0xe4b
ToeThread #24: [1] step
>
Step completed: "thread=ToeThread #24: ", org.archive.crawler.frontier.WorkQueueFrontier.findEligibleURI(), line=729 bci=492
ToeThread #24: [1] kill 0xe4b new java.lang.Exception()
killing thread: ToeThread #24:
ToeThread #24: [1] instance of org.archive.crawler.framework.ToeThread(name='ToeThread #24: ', id=3659) killed
2014-08-27T01:31:42.982Z SEVERE Fatal exception in ToeThread #24: (in thread 'ToeThread #24: ')
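If you would rather confirm the kill from a terminal than from the web interface, you can search a saved copy of the log for that message. A small sketch, where the function name and the log file path are my own assumptions (point it at wherever your job writes its log):

```shell
# Hypothetical helper: print the "Fatal exception" lines logged for one ToeThread.
#   $1 = path to the log file (assumed; depends on your job setup)
#   $2 = ToeThread number, e.g. 24
toe_thread_fatal_lines() {
    # The trailing ":" keeps "#2" from also matching "#24".
    grep -F -- "SEVERE Fatal exception in ToeThread #$2:" "$1"
}

# Usage (path is a placeholder):
#   toe_thread_fatal_lines /path/to/job/log 24
```

The function returns a non-zero status when no matching line is found, so it can also be used in an if test.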