Nutch redirection handling issue

I am a bit new to Nutch. I am crawling a URL that redirects to another URL. When I analyse the crawl results, I get the content of the first URL along with the status code "temp redirected" (and the name of the second URL). My question is: why am I not getting the content and details of the second URL? Is the redirected URL being crawled or not? Please help.

Again, the omnipotent nutch-default.xml contains a property that controls how Nutch handles redirection:

<property>
  <name>http.redirect.max</name>
  <value>0</value>
  <description>The maximum number of redirects the fetcher will follow when
  trying to fetch a page. If set to negative or 0, fetcher won't immediately
  follow redirected URLs, instead it will record them for later fetching.
  </description>
</property>

As the description says, the fetcher does not follow redirected URLs immediately; it records them for later fetching instead. I still have not figured out how to force the URLs marked db_redir_temp to be fetched. However, if you change this configuration right at the beginning, I assume the problem might go away.
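A minimal sketch of "changing the configuration at the beginning", assuming the usual Nutch practice of overriding defaults in conf/nutch-site.xml rather than editing nutch-default.xml itself. The value 3 is an arbitrary example, not a recommended setting. Per the property description above, a positive value should make the fetcher follow up to that many redirects while fetching the page, instead of only recording them for later.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Overrides the default of 0 from nutch-default.xml so the fetcher
       follows redirects immediately. 3 is an illustrative value only. -->
  <property>
    <name>http.redirect.max</name>
    <value>3</value>
  </property>
</configuration>
```

This only changes how future fetches behave; it is a sketch of the idea, not a tested fix for URLs already recorded as db_redir_temp.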

Brazie
