multithreading - Java ThreadPool usage
I am trying to write a multithreaded web crawler.
My main entry class has the following code:
    ExecutorService exec = Executors.newFixedThreadPool(numberOfCrawlers);
    while (true) {
        url = frontier.get();
        if (url == null) return;
        exec.execute(new URLCrawler(this, url));
    }
The URLCrawler fetches the specified URL, parses the HTML, extracts links from it, and schedules unseen links back to the frontier.
A frontier is a queue of uncrawled URLs. The problem is how to write the get() method. If the queue is empty, it should wait until any URLCrawlers finish and then try again. It should return null only when the queue is empty and there is no currently active URLCrawler.
My first idea was to use an AtomicInteger to count the currently working URLCrawlers, and an auxiliary object for notifyAll()/wait() calls. Each crawler increments the count of working URLCrawlers when it starts, decrements it when it exits, and notifies the object that it has completed.
But I have read that notify()/notifyAll() and wait() are somewhat deprecated as a means of thread communication.
What should I use in this work pattern? It is similar to M producers and N consumers; the question is how to deal with the exhaustion of producers.
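A minimal sketch of that first idea, under stated assumptions: the WaitNotifyFrontier class, its add/get/done methods, and the tiny in-memory link graph are hypothetical stand-ins for the question's frontier and URLCrawler. One deliberate change from the idea as described: the active counter is incremented inside get(), under the same lock that checks the queue, rather than when the crawler starts running. Otherwise the main loop could observe an empty queue and zero active crawlers in the gap between submitting a task and the task starting, and stop too early. Because the counter is only touched while holding the monitor, a plain int suffices.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical frontier guarded by its own monitor. get() blocks while the
// queue is empty but a crawler is still active; it returns null only when the
// queue is empty AND no crawler is active.
class WaitNotifyFrontier {
    private final Deque<String> queue = new ArrayDeque<>();
    private final Set<String> seen = new HashSet<>();
    private int active; // URLs handed out whose crawler has not called done() yet

    synchronized void add(String url) {
        if (seen.add(url)) {     // skip already-seen links
            queue.addLast(url);
            notifyAll();         // a waiting get() may now succeed
        }
    }

    synchronized String get() throws InterruptedException {
        while (queue.isEmpty()) {
            if (active == 0) return null; // nothing queued, nothing running
            wait();                       // re-check after any wake-up, spurious or not
        }
        active++; // count the crawler now, before it is even submitted
        return queue.pollFirst();
    }

    synchronized void done() {
        active--;
        notifyAll(); // the last crawler finishing lets get() return null
    }
}

class WaitNotifyDemo {
    // Hypothetical link graph standing in for fetched and parsed pages.
    static final Map<String, List<String>> LINKS = Map.of(
            "a", List.of("b", "c"),
            "b", List.of("c", "d"),
            "c", List.of("a"),
            "d", List.of());

    // Crawls from seed "a" and returns the number of pages crawled.
    static int crawl(int numberOfCrawlers) {
        WaitNotifyFrontier frontier = new WaitNotifyFrontier();
        AtomicInteger crawled = new AtomicInteger();
        ExecutorService exec = Executors.newFixedThreadPool(numberOfCrawlers);
        frontier.add("a");
        try {
            String url;
            while ((url = frontier.get()) != null) {
                final String u = url;
                exec.execute(() -> {
                    try {
                        crawled.incrementAndGet();
                        for (String link : LINKS.getOrDefault(u, List.of())) {
                            frontier.add(link); // schedule links back to the frontier
                        }
                    } finally {
                        frontier.done(); // always release the active slot
                    }
                });
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            exec.shutdown(); // let pool threads exit once the crawl is over
        }
        return crawled.get();
    }
}
```

With this contract the main loop from the question works unchanged: get() returning null is the termination signal, and it can only happen once every handed-out URL has been processed.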
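If the goal is to avoid raw wait()/notifyAll(), java.util.concurrent.locks provides the same mechanism with an explicit ReentrantLock and a Condition. The sketch below assumes the same hypothetical add/get/done contract (these names are illustrative, not from the question): Condition.await() releases the lock while waiting, just as wait() releases the monitor, and signalAll() plays the role of notifyAll().

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical frontier using Lock/Condition instead of synchronized + wait().
// get() blocks while the queue is empty but crawlers are still active, and
// returns null once the queue is empty and no crawler is active.
class LockConditionFrontier {
    private final Lock lock = new ReentrantLock();
    private final Condition changed = lock.newCondition(); // signalled by add() and done()
    private final Deque<String> queue = new ArrayDeque<>();
    private final Set<String> seen = new HashSet<>();
    private int active; // guarded by lock

    void add(String url) {
        lock.lock();
        try {
            if (seen.add(url)) { // only schedule unseen links
                queue.addLast(url);
                changed.signalAll();
            }
        } finally {
            lock.unlock();
        }
    }

    String get() throws InterruptedException {
        lock.lock();
        try {
            while (queue.isEmpty()) {
                if (active == 0) return null; // idle and empty: crawl finished
                changed.await();              // releases the lock while waiting
            }
            active++; // count the crawler before handing out its URL
            return queue.pollFirst();
        } finally {
            lock.unlock();
        }
    }

    void done() {
        lock.lock();
        try {
            active--;
            changed.signalAll(); // may let a waiting get() return null
        } finally {
            lock.unlock();
        }
    }
}
```

Each URLCrawler would call add() for every extracted link and done() in a finally block; the main loop stays as in the question. The choice between this and synchronized/wait() is mostly about style and features (timed or interruptible waits, multiple conditions per lock); both are supported tools.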
I think the use of wait/notify is justified in this case. I can't think of any straightforward way to do this with j.u.c. In a class, let's call it Coordinator:
    class Coordinator {
        private final int numOfCrawlers;
        private int waiting;

        Coordinator(int numOfCrawlers) { this.numOfCrawlers = numOfCrawlers; }

        public boolean shouldTryAgain() throws InterruptedException {
            synchronized (this) {
                waiting++;
                if (waiting >= numOfCrawlers) {
                    // Everybody is waiting: terminate
                    return false;
                } else {
                    wait(); // a spurious wake-up is OK
                    // Woken up for whatever reason: try again
                    waiting--;
                    return true;
                }
            }
        }

        // Call after adding URLs to the frontier, to wake up waiting threads
        public void hasEnqueued() {
            synchronized (this) {
                notifyAll();
            }
        }
    }
Then,
    ExecutorService exec = Executors.newFixedThreadPool(numberOfCrawlers);
    while (true) {
        url = frontier.get();
        if (url == null) {
            if (!coordinator.shouldTryAgain()) {
                // All threads are waiting; no possibility of new jobs
                exec.shutdown();
                return;
            } else {
                // There may be other jobs; try again
                continue;
            }
        }
        exec.execute(new URLCrawler(this, url));
    } // while (true)