Japanese Word Count (Morphological Analysis)

Runs morphological analysis (word segmentation) on Japanese text and reports the word count, the total token count, and word frequencies. Because it does not depend on spaces to find word boundaries, it can count words in Japanese sentences, which are written without spaces between words.


Results shown: Word Count (excluding punctuation), Total Tokens (including punctuation), and a Top 20 Word Frequency table (Rank, Word, Count). Enter text to see word frequency.

Tips

  • Unlike English or German, where spaces separate words, Japanese requires automatic detection of word boundaries. This tool estimates those boundaries with TinySegmenter, a lightweight statistical segmenter.
  • The Top 20 Word Frequency table is handy for checking whether a blog post or SEO content overuses a particular keyword unnaturally.
  • Punctuation and brackets are each counted as one token, so the tool shows "Total Tokens" and "Word Count (excluding punctuation)" as two separate figures.
  • Proper nouns, neologisms, and other rare words can sometimes be split unnaturally depending on context, because the tool has no dictionary to look them up in. For use cases that need strict, dictionary-based morphological analysis, consider a dedicated tool such as MeCab.
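
The two figures and the frequency table described in the tips can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the function names countTokens and topWords are hypothetical, and the rule that a token made only of punctuation, symbols, or whitespace is "not a word" is an assumption, since the tool's exact rule is not stated.

```javascript
// Assumption: a token made up only of punctuation, symbols, or whitespace
// is not counted as a word. The tool's real rule may differ.
const NON_WORD = /^[\p{P}\p{S}\s]+$/u;

// Count every token, and separately the tokens that are words.
function countTokens(tokens) {
  const words = tokens.filter((t) => t.length > 0 && !NON_WORD.test(t));
  return { totalTokens: tokens.length, wordCount: words.length };
}

// Return the `limit` most frequent words as { rank, word, count } rows.
function topWords(tokens, limit = 20) {
  const freq = new Map();
  for (const t of tokens) {
    if (t.length === 0 || NON_WORD.test(t)) continue;
    freq.set(t, (freq.get(t) || 0) + 1);
  }
  return [...freq.entries()]
    .sort((a, b) => b[1] - a[1]) // stable sort: ties keep first-seen order
    .slice(0, limit)
    .map(([word, count], i) => ({ rank: i + 1, word, count }));
}

// Hand-segmented sample tokens (not produced by TinySegmenter).
const sample = ["すもも", "も", "もも", "も", "もも", "の", "うち", "。"];
console.log(countTokens(sample)); // { totalTokens: 8, wordCount: 7 }
console.log(topWords(sample, 3));
```

With the sample above, the final "。" is included in the total token count but left out of both the word count and the frequency table.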

FAQ

Why is morphological analysis needed to count Japanese words?
English text is generally separated into words by spaces, but Japanese sentences have no such separators. Splitting on spaces therefore cannot give an accurate Japanese word count, so a morphological analysis (word segmentation) technique that infers word boundaries from the sequence of characters is needed.
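
To make the point concrete, here is a small sketch of a naive space-based counter. The function name countBySpaces is hypothetical and only serves the illustration.

```javascript
// Naive word count: split on whitespace and drop empty pieces.
function countBySpaces(text) {
  return text.split(/\s+/).filter((piece) => piece.length > 0).length;
}

console.log(countBySpaces("Plums and peaches are both peaches")); // 6
console.log(countBySpaces("すもももももももものうち")); // 1
```

The English sentence yields one count per word, while the whole Japanese sentence comes back as a single "word" because it contains no spaces.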

How accurate is the segmentation?
TinySegmenter, the library this tool uses, is a lightweight statistical segmenter with no dictionary, so its accuracy falls a bit short of dictionary-based morphological analyzers such as MeCab. It's accurate enough for everyday text, but segmentation may be off for text with a lot of technical terms or neologisms.

Is the text I enter sent to a server?
No. The entire morphological analysis runs in JavaScript inside your browser, so the text you enter is never sent to any server.

What is the word frequency table useful for?
It's useful for checking whether a specific keyword is repeated unnaturally often in a blog post or SEO article, spotting repetitive phrasing, and analyzing general trends in a piece of text.

How is this different from a regular character counter?
A regular character counter only counts characters, and a typical word counter splits on spaces, so neither gives a word count for Japanese. This tool is built specifically for Japanese morphological analysis, and shows word frequency in addition to the word count.

Side Note — "Sumomo mo Momo mo Momo no Uchi" and the Difficulty of Word Segmentation

Japanese is written without spaces between words (wakachi-gaki, the practice of writing with spaces between words, is not part of standard orthography), which is one of the biggest challenges in Japanese natural language processing. A famous example is the tongue-twister "すもももももももものうち" (sumomo mo momo mo momo no uchi, roughly "plums and peaches are both kinds of peach"). A human can intuitively split it as "sumomo / mo / momo / mo / momo / no / uchi," but for a machine with no dictionary, deciding where the boundaries fall is extremely difficult.

TinySegmenter, the library this tool uses, is a lightweight Japanese segmentation library created by Taku Kudo, a researcher also known for his work at Google and as the author of MeCab. It has no dictionary at all. Instead, it splits text using a statistically trained model that infers word boundaries from patterns in character-type transitions (hiragana, katakana, kanji, digits, and so on). At only tens of kilobytes in size, it loads and runs quickly right in the browser.
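
To show what a "character-type transition" looks like, here is a toy sketch that cuts the text wherever the character type changes. This is not TinySegmenter's algorithm, which uses a trained model. The function names charType and splitByCharType are hypothetical, and the code point ranges are a simplification chosen for illustration.

```javascript
// Classify one character by code point range (simplified ranges).
function charType(ch) {
  const cp = ch.codePointAt(0);
  if (cp >= 0x3040 && cp <= 0x309f) return "hiragana";
  if (cp >= 0x30a0 && cp <= 0x30ff) return "katakana";
  if (cp >= 0x4e00 && cp <= 0x9fff) return "kanji";
  if ((cp >= 0x30 && cp <= 0x39) || (cp >= 0xff10 && cp <= 0xff19)) return "digit";
  if ((cp >= 0x41 && cp <= 0x5a) || (cp >= 0x61 && cp <= 0x7a)) return "latin";
  return "other";
}

// Toy rule: start a new chunk whenever the character type changes.
function splitByCharType(text) {
  const chunks = [];
  let prevType = null;
  for (const ch of text) {
    const type = charType(ch);
    if (type === prevType) {
      chunks[chunks.length - 1] += ch; // same type: extend the current chunk
    } else {
      chunks.push(ch); // type changed: open a new chunk
    }
    prevType = type;
  }
  return chunks;
}

console.log(splitByCharType("東京タワーに行った"));
// [ '東京', 'タワー', 'に', '行', 'った' ]
console.log(splitByCharType("すもももももももものうち").length); // 1
```

The first example shows that type changes are a useful signal, although the verb 行った still gets cut in the middle. The second shows the limit of the rule: the tongue-twister is all hiragana, so it comes back as a single chunk. Handling such cases is where a statistically trained model is needed.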

Full-scale morphological analysis engines such as MeCab or Kuromoji require dictionary data in the range of several to tens of megabytes. Because TinySegmenter needs no dictionary at all, this tool can complete the entire analysis in the browser without sending any data to a server. It trades some accuracy for that dictionary-free approach, but it's still plenty practical for getting a general word count on everyday text.
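
As a rough sketch of what "entirely in the browser" means in code, the function below takes a segmenter object and calls it locally, with no network request involved. The name segmentLocally and the per-character stand-in are hypothetical; the stand-in exists only so the sketch runs without the library. Dropping whitespace-only tokens is also an assumption about how such a tool might clean up its input.

```javascript
// Segment text with an injected segmenter and keep the non-blank tokens.
// Everything happens in-process: no fetch, no XMLHttpRequest, no server call.
function segmentLocally(text, segmenter) {
  return segmenter.segment(text).filter((token) => token.trim() !== "");
}

// Stand-in segmenter (hypothetical): one token per character.
const perCharacterStub = { segment: (text) => Array.from(text) };

console.log(segmentLocally("もも の うち", perCharacterStub));
// [ 'も', 'も', 'の', 'う', 'ち' ]

// In a page that has loaded the TinySegmenter script, pass the real one instead:
//   const tokens = segmentLocally(inputText, new TinySegmenter());
```

Passing the segmenter in as a parameter keeps the counting logic independent of the library, so a dictionary-based analyzer could be swapped in later if higher accuracy is needed.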