Galloping Ghost of the Japanese Coast
Let's say you have an XML document containing authors.
The Nokogiri tutorial tells you that you can do this:
authors = doc.xpath("//author")
And it shows you will get output like this:
<author>Kernighan</author>
<author>Ritchie</author>
<author>Matsumoto</author>
How do you get rid of all those tags?
Instead of reading the Nokogiri documentation as I should have, I tried to process this output further.
A regex worked fine, but you have to worry about an exception when there is no match: match returns nil, and calling [1] on nil raises NoMethodError. It's also ugly.
require 'nokogiri'
doc = Nokogiri::XML(body)
auths = []
authors = doc.xpath("//author")
for i in 0..authors.length - 1
  # capture the text between the tags; raises NoMethodError if match returns nil
  auths[i] = /.*<author>(.*)<\/author>.*/.match(authors[i].to_s)[1]
end
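If you do stick with a regex, one way to sidestep the exception is String#[] with a regexp and a capture index, which returns nil instead of raising when nothing matches. This is a minimal sketch of that alternative, not the code I originally ran; the sample strings are made up for illustration:

```ruby
# Made-up sample strings: one matches, one does not.
snippets = ["<author>Kernighan</author>", "<editor>Nobody</editor>"]

# String#[](regexp, 1) returns capture group 1, or nil when there is no match,
# so a missing <author> tag yields nil rather than a NoMethodError.
names = snippets.map { |s| s[%r{<author>(.*?)</author>}, 1] }

p names          # => ["Kernighan", nil]
p names.compact  # => ["Kernighan"]
```

It is still string surgery on already parsed data, though, so it shares the ugliness of the original approach.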
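I also tried string substitution, which also worked fine. I didn't test a no-match case, but sub returns the string unchanged when there is nothing to replace, so at worst you would get the original text back rather than an exception.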
doc = Nokogiri::XML(body)
auths = []
authors = doc.xpath("//author")
for i in 0..authors.length - 1
  # strip the opening and closing tags from the serialized Node
  auths[i] = authors[i].to_s.sub("<author>", "").sub("</author>", "")
end
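I knew I was re-parsing already parsed data and thought there should be an option to suppress the tags. I got some good advice to look more closely at Nokogiri, and I came up with this approach of popping Nodes off the NodeSet.
Two caveats: pop takes Nodes from the end of the NodeSet, so auths comes out in reverse document order (shift takes them from the front instead). And because popping empties the NodeSet, if you need to know how many Nodes there were originally, save the count (or a copy) before the first pop.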
doc = Nokogiri::XML(body)
auths = []
authors = doc.xpath("//author")
# pop removes and returns the last Node; inner_text drops the tags
while authors.length > 0
  auths << authors.pop.inner_text
end
I think there might be an approach where you can iterate over the NodeSet without worrying about its length and without using pop, but I haven't figured it out yet.