java - Parsing XML with Jsoup


I have the following XML, which represents a news article:

<content>
   text blalalala
   <h2>small subtitle</h2>
   more text blbla
   <ul class="list">
      <li>list item 1</li>
      <li>list item 2</li>
   </ul>
   <br />
   more freakin text
</content>

I know the format isn't ideal, but I have to take it as it is.

The article should look like this:

  • some text blalalala
  • small subtitle
  • list items
  • even more freakin text

I parse the XML with Jsoup. I can get the text within the <content> tag with doc.ownText(), but then I have no idea where the other stuff (the subtitle) should be placed, because I get only one big string.

Would it be better to use an event-based parser (I hate them :() or is there a possibility to do something like doc.getTextUntilTagAppears("tagname")?

Edit: for clarification, I know how to get the elements under <content>. My problem is getting the text within <content>, broken up every time it is interrupted by an element.

I learned that I can get all the text within content with .textNodes(), which works great, but then again I don't know where each text node belongs in the article (one is at the top before the h2, the other one is at the bottom).

Jsoup has a fantastic selector-based syntax. See here.

If you want the subtitle:

Document doc = Jsoup.parse(new File("path-to-your-xml"), "UTF-8"); // parse the file into a Document (throws IOException)

You know the subtitle is in an h2 element:

Element subtitle = doc.select("h2").first(); // the first h2 element that appears

And if you have the list:

Elements listItems = doc.select("ul.list > li");
for (Element item : listItems)
    System.out.println(item.text()); // print the list's items one after another
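
The selectors above find individual elements, but they do not say where the loose text sits relative to them. One way to keep everything in article order is to walk the direct children of <content> one by one. The sketch below is only an illustration: it uses the JDK's built-in DOM parser rather than Jsoup so that it runs without extra dependencies, and the names ContentWalker and orderedParts are made up here. In Jsoup, the analogous walk would loop over the element's childNodes() and check whether each node is a TextNode or an Element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Jsoup or of the original answer.
class ContentWalker {

    // Returns the direct children of the root element in document order.
    // Text nodes become "text: ...", elements become "<tag>: <their text>",
    // and elements without text (such as <br />) become just the tag name.
    static List<String> orderedParts(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }

        List<String> parts = new ArrayList<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                String text = child.getTextContent().trim();
                if (!text.isEmpty()) {
                    parts.add("text: " + text); // skip whitespace-only nodes
                }
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                // Collapse inner whitespace so nested items read as one line.
                String text = child.getTextContent().trim().replaceAll("\\s+", " ");
                String tag = child.getNodeName();
                parts.add(text.isEmpty() ? tag : tag + ": " + text);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        String xml = "<content> text blalalala <h2>small subtitle</h2> more text blbla "
                + "<ul class=\"list\"> <li>list item 1</li> <li>list item 2</li> </ul>"
                + "<br /> more freakin text </content>";
        for (String part : orderedParts(xml)) {
            System.out.println(part);
        }
    }
}
```

Because each child is visited in the order it appears, the text before the h2 and the text after the list stay in their own slots instead of being merged into one big string.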
