<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Nlp on the art of simplicity</title><link>https://naoko.github.io/tags/nlp/</link><description>Recent content in Nlp on the art of simplicity</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Sun, 12 Apr 2020 00:00:00 +0000</lastBuildDate><atom:link href="https://naoko.github.io/tags/nlp/index.xml" rel="self" type="application/rss+xml"/><item><title>Stanza - A Python NLP Library for Many Human Languages</title><link>https://naoko.github.io/posts/2020-04-12-stanza-/</link><pubDate>Sun, 12 Apr 2020 00:00:00 +0000</pubDate><guid>https://naoko.github.io/posts/2020-04-12-stanza-/</guid><description>&lt;p&gt;I tested out &lt;a href="https://stanfordnlp.github.io/stanza/"&gt;Stanza&lt;/a&gt;.
The English tokenizer definitely works.
I ran a quick test with Japanese, and the output was somewhat unexpected.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;import stanza

# Use &amp;#34;ja&amp;#34; for the Japanese model, or &amp;#34;en&amp;#34; for the English model

stanza.download(&amp;#34;ja&amp;#34;)
nlp = stanza.Pipeline(&amp;#34;ja&amp;#34;)
doc = nlp(&amp;#34;皆さんおはようございます！　ご機嫌いかがですか？&amp;#34;)

for i, sentence in enumerate(doc.sentences):
    print(f&amp;#34;===== Sentence {i+1} tokens =====&amp;#34;)
    print(*[f&amp;#34;word: {word.text}\t upos: {word.upos} xpos: {word.xpos}&amp;#34; for word in sentence.words], sep=&amp;#34;\n&amp;#34;)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The output is:&lt;/p&gt;</description></item></channel></rss>