I have the following XML, which represents a news article:
<content> text blalalala <h2>small subtitle</h2> more text blbla <ul class="list"> <li>list item 1</li> <li>list item 2</li> </ul> <br /> more freakin text </content>
I know the format isn't ideal, but I have to take it as it is.
The article should look like:
- some text blalalala
- small subtitle
- list items
- even more freakin text
I parse the XML with jsoup. I can get the text within the <content> tag with doc.ownText(), but then I have no idea where the other stuff (the subtitle) is placed, because I get only one big string.
Would it be better to use an event-based parser (I hate them :() or is there a possibility to do something like doc.getTextUntilTagAppears("tagName")?
Edit: For clarification, I know how to get the elements under <content>. My problem is getting the text within <content>, which is broken up every time it is interrupted by an element.
I learned that I can get the text within <content> with .textNodes(), which works great, but then again I don't know where each text node belongs in the article (one at the top before the h2, the other one at the bottom).
jsoup has a fantastic selector-based syntax (see the jsoup selector documentation).
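If you want the subtitle:
Document doc = Jsoup.parse(new File("path-to-your-xml"), "UTF-8"); // parse the file into a Document (Jsoup.parse(String) would treat the string as markup, not as a path)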
You know the subtitle is in an h2 element:
Element subtitle = doc.select("h2").first(); // the first h2 element that appears
And if you have a list:
Elements listItems = doc.select("ul.list > li");
for (Element item : listItems)
    System.out.println(item.text()); // print the list's items one after another
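To keep the loose text and the elements in the order they appear in the article, one option is to walk childNodes() of <content>, which returns text nodes and elements interleaved in document order. The following is only a minimal sketch, assuming the input is parsed with jsoup's XML parser and that only the direct children of <content> matter; the class name ContentOrder and the method orderedParts are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.parser.Parser;

import java.util.ArrayList;
import java.util.List;

public class ContentOrder {

    // Hypothetical helper: returns the pieces of <content> in document order.
    static List<String> orderedParts(String xml) {
        // Parse as XML so jsoup does not wrap the input in html/body.
        Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
        Element content = doc.selectFirst("content");

        List<String> parts = new ArrayList<>();
        // childNodes() yields TextNode and Element instances interleaved,
        // in the same order as they occur in the source.
        for (Node child : content.childNodes()) {
            if (child instanceof TextNode) {
                String text = ((TextNode) child).text().trim();
                if (!text.isEmpty()) {
                    parts.add(text); // loose text between elements
                }
            } else if (child instanceof Element) {
                Element el = (Element) child;
                if (el.tagName().equals("ul")) {
                    // One entry per list item.
                    for (Element li : el.children()) {
                        parts.add(li.text());
                    }
                } else if (!el.text().isEmpty()) {
                    parts.add(el.text()); // e.g. the h2 subtitle; empty <br /> is skipped
                }
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        String xml = "<content> text blalalala <h2>small subtitle</h2> more text blbla "
                + "<ul class=\"list\"> <li>list item 1</li> <li>list item 2</li> </ul> "
                + "<br /> more freakin text </content>";
        for (String part : orderedParts(xml)) {
            System.out.println("- " + part);
        }
    }
}
```

Because every child is visited in order, each text node ends up in the right place relative to the h2 and the list, which neither ownText() nor textNodes() alone can tell you.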