
Encoding errors in Asian Search Phrase reports

Products

Webtrends Analytics 9.x
Webtrends Analytics 8.x

Issue

A profile using a data source from Asian web sites shows pound (#) symbols, squares or random characters for search keywords and phrases.

Resolution

This issue occurs frequently for Webtrends customers processing Chinese web logs. There are two Chinese character sets, Simplified and Traditional. Simplified is used mainly in mainland China, while Traditional is used in Taiwan and in some Chinese-speaking areas outside mainland China. A web site in Simplified Chinese may also attract traffic from Traditional Chinese search engines. This is not the only cross-border scenario: any Asian Search Phrases report may also include visitors from Korea or Japan.

The profile, however, can display only one language at a time, and no single legacy double-byte character set covers all of these languages. As a result, it may be necessary to switch between languages to view all the phrases in a report. The best option is Unicode (UTF-8), located at the bottom of the language pull-down list in the Preferences menu. In some cases, UTF-8 resolves these issues.
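The following is a minimal sketch of why a single legacy character set cannot show every phrase, while UTF-8 can. It assumes Python's built-in codecs (gb2312, big5, shift_jis, euc_kr, utf-8) stand in for the character sets a profile might use. The helper names encodable_in and misread are hypothetical and are not part of Webtrends.

```python
# Sketch, assuming Python's built-in codecs stand in for the character sets
# a profile might display; encodable_in and misread are hypothetical helpers.

LEGACY_AND_UNICODE_CODECS = ("gb2312", "big5", "shift_jis", "euc_kr", "utf-8")


def encodable_in(text: str, codec: str) -> bool:
    """Return True if every character of `text` can be stored in `codec`."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def misread(text: str, written_as: str, read_as: str) -> str:
    """Store `text` in one character set and read it back with another,
    replacing undecodable bytes, as a mismatched language setting would."""
    return text.encode(written_as).decode(read_as, errors="replace")


# Sample phrases ("search") in three languages.
phrases = {"Simplified Chinese": "搜索", "Japanese": "検索", "Korean": "검색"}
for language, phrase in phrases.items():
    # List every codec that can hold the whole phrase.
    fits = [c for c in LEGACY_AND_UNICODE_CODECS if encodable_in(phrase, c)]
    print(f"{language} {phrase!r}: {fits}")

# A GB2312 phrase read back as Big5 comes out as different characters.
print(misread("搜索", "gb2312", "big5"))
```

Only utf-8 should appear in every row. A legacy set such as gb2312 has no Hangul, so the Korean phrase cannot be stored in it at all.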

The language of a search phrase can be identified by examining the search engine that produced it. A search phrase that originates from Google Japan is likely to be in Japanese, whereas one that originates from Yahoo Taiwan is likely to be in Traditional Chinese.
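The heuristic above can be sketched as a lookup on the referring search engine's host name. The table is an illustrative assumption that uses country-code host labels only. It is not a list maintained by Webtrends, and hosts without a country label return no guess.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical hint table: country-code host labels mapped to the language a
# search phrase is likely written in. Illustrative only, not Webtrends data.
COUNTRY_LANGUAGE_HINTS = {
    "jp": "Japanese",
    "tw": "Traditional Chinese",
    "hk": "Traditional Chinese",
    "cn": "Simplified Chinese",
    "kr": "Korean",
}


def guess_language(referrer: str) -> Optional[str]:
    """Guess a search phrase's language from the referring engine's host.

    Labels are checked right to left, so a country-code TLD (google.co.jp)
    is found first; hosts with no hint (www.example.com) give None.
    """
    host = urlsplit(referrer).hostname or ""
    for label in reversed(host.split(".")):
        if label in COUNTRY_LANGUAGE_HINTS:
            return COUNTRY_LANGUAGE_HINTS[label]
    return None


print(guess_language("http://www.google.co.jp/search?q=test"))     # Japanese
print(guess_language("http://tw.search.yahoo.com/search?p=test"))  # Traditional Chinese
```

Checking every label, not only the top-level domain, matters because some regional engines use a country subdomain under a .com host, as with tw.search.yahoo.com.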

By default, a profile is displayed in the language specified under “Analysis > Language” when editing the profile. Also select “Enable encoding conversion” to handle the different encodings that search engines use for search phrases.
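To illustrate the general idea behind encoding conversion, the sketch below recovers the raw bytes of a search-phrase parameter from a referrer URL. It tries UTF-8 first and otherwise falls back to a caller-chosen legacy codec. This is an assumption about the technique, not Webtrends' actual algorithm. The function names, the example.cn host, and the q parameter are hypothetical.

```python
from typing import Optional, Tuple
from urllib.parse import quote, unquote_to_bytes, urlsplit


def raw_query_value(url: str, param: str) -> Optional[bytes]:
    """Return the undecoded bytes of one query parameter, or None if absent."""
    for pair in urlsplit(url).query.split("&"):
        name, _, value = pair.partition("=")
        if name == param:
            # '+' means space in form-encoded queries; a literal '+' arrives as %2B.
            return unquote_to_bytes(value.replace("+", " "))
    return None


def convert_search_phrase(
    url: str, param: str, fallback_codec: str
) -> Optional[Tuple[str, str]]:
    """Decode a search phrase to Unicode; return (text, codec used).

    UTF-8 is tried first because its byte structure is strict, so most
    legacy double-byte phrases fail to validate as UTF-8. Otherwise the
    caller's fallback codec (for example, one chosen from the search
    engine's country) is used, replacing any bytes it cannot decode.
    """
    raw = raw_query_value(url, param)
    if raw is None:
        return None
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode(fallback_codec, errors="replace"), fallback_codec


# Hypothetical referrer carrying a GB2312-encoded phrase.
gb_url = "http://search.example.cn/s?q=" + quote("搜索".encode("gb2312"))
print(convert_search_phrase(gb_url, "q", fallback_codec="gb2312"))
```

Guessing among legacy codecs without a hint is unreliable, because the same bytes often decode without error in several of them. That is why the fallback is passed in rather than detected.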

To change the displayed language, click the question mark symbol (?) in the navigation bar above the report (in 8.0x, select “Preferences” instead), then navigate to “Preferences > Language.” It may be necessary to switch between several languages to read all the results.