JavaCC and Unicode issue: why can't \u696d be handled in JavaCC although it belongs to the range "\u4e00"-"\u9fff"?

We're trying to use JavaCC to parse source code that is encoded in UTF-8 (the language is Japanese). In our JavaCC grammar, we have a declaration like:


If the parser meets a string like “日建フェンス工業”, it fails because of the character 業. If I remove that character, it works as expected. The code point of 業 is “\u696d”, and as you can see in the declaration, it should belong to the range “\u4e00”-“\u9fff”.
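To rule out the range itself, here is a small diagnostic sketch in plain Java (not JavaCC). It checks that 業 really is U+696D and lies inside U+4E00..U+9FFF. It also shows what the same UTF-8 bytes look like when decoded with a different charset, which is one possible cause of this kind of failure. The class and method names (`CjkRangeCheck`, `inCjkRange`, `allInRange`) are hypothetical, and the idea that the input might be decoded with the wrong charset is only an assumption, since the post does not show how the input stream is created.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic helper; not part of JavaCC or of the original grammar.
class CjkRangeCheck {

    // True if the code point lies in the range the grammar is said to use (U+4E00..U+9FFF).
    static boolean inCjkRange(int codePoint) {
        return codePoint >= 0x4E00 && codePoint <= 0x9FFF;
    }

    // True if every code point of the string lies in that range.
    static boolean allInRange(String s) {
        return s.codePoints().allMatch(CjkRangeCheck::inCjkRange);
    }

    public static void main(String[] args) {
        String gyou = "\u696d"; // the character 業, written as an escape to avoid source-encoding issues

        // The character itself is inside the range.
        System.out.println(Integer.toHexString(gyou.codePointAt(0)) + " in range: " + allInRange(gyou));

        // The same character as UTF-8 bytes, as it would be stored in the source file.
        byte[] utf8 = gyou.getBytes(StandardCharsets.UTF_8);

        // Decoded as UTF-8, the lexer would see one char inside the range.
        String decodedOk = new String(utf8, StandardCharsets.UTF_8);
        System.out.println("UTF-8 decode in range: " + allInRange(decodedOk));

        // Decoded with a single-byte charset (an assumed misconfiguration),
        // the lexer would see several chars, none of them inside the range.
        String decodedWrong = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println("ISO-8859-1 decode length: " + decodedWrong.length()
                + ", in range: " + allInRange(decodedWrong));
    }
}
```

If the first check passes but the parser still rejects the character, it may be worth printing the code points the lexer actually receives, to see whether they match what the file contains.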

Any suggestion on this?

PS: If we rewrote this grammar using ANTLR, what would it look like?

Thank you so much
