I am a bit new to Nutch. I am crawling a URL that redirects to another URL. Now, when analysing the crawl results, I get the content of the first URL along with the status code: temp redirected (second URL name). My question is why I am not getting the content and details of the second URL. Is the redirected URL getting crawled or not? Please help.
Again, in the omnipotent nutch-default.xml, there is an attribute that controls how Nutch handles redirection.
<property>
  <name>http.redirect.max</name>
  <value>0</value>
  <description>The maximum number of redirects the fetcher will follow when trying to fetch a page. If set to negative or 0, the fetcher won't follow redirected URLs, instead it will record them for later fetching.</description>
</property>
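As the description mentions, with the default value of 0 the fetcher won't follow redirected URLs; it only records them for later fetching. That is why the first URL shows up with a temp redirected status and the content of the second URL is missing. I still have not figured out how to force the URLs in db_redir_temp to be fetched. However, if you change the configuration right at the beginning, I assume the problem might go away.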
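For reference, changing the configuration would normally mean overriding the property in conf/nutch-site.xml rather than editing nutch-default.xml. The snippet below is only a sketch: the value 3 is an assumption chosen for illustration, not something tested here, and any positive number should tell the fetcher to follow that many redirects during the fetch itself.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Override of the default (0). The value 3 is an assumed example;
       pick whatever redirect depth suits your crawl. -->
  <property>
    <name>http.redirect.max</name>
    <value>3</value>
    <description>Follow up to 3 redirects while fetching a page, instead of recording the redirect target for later fetching.</description>
  </property>
</configuration>
```

With a setting like this in place before the first crawl, the content of the second URL should be fetched in the same round instead of the first URL only being marked as temp redirected.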