How do I troubleshoot HTML page title retrieval?

Products

Webtrends Analytics 9.x
Webtrends Analytics 8.x
Webtrends Enterprise 7.x
Webtrends Professional 7.x
Webtrends Small Business 7.x

Cause

Webtrends Analytics can retrieve HTML page titles, i.e., the contents of the <title> tag of each page, by indexing the web site specified in the Web Site URL field (found under Analysis > Home when editing a profile). Occasionally, Webtrends cannot retrieve the page titles from the web server because of network issues or outages. Other causes include server-side redirects and proxy configurations that require HTTP traffic to be routed through ports other than 80.

Resolution

The first step in troubleshooting this issue is to verify that page title retrieval is enabled on the profile. By default, this option is disabled when a new profile is created. To enable it for a new profile, check the Advanced profile options box during the first step of the New Profile Wizard, then check the box for Retrieve HTML Page titles under the General step of the profile creation process. To enable it on an existing profile, edit the profile and check the box for Retrieve HTML Page titles under Analysis > General.

Next, navigate to Administration > Application Settings > System Management > Hosts, select Standard Analysis Engine(s), and verify that Threads for HTML Page Title Lookups is set to a value high enough that it does not limit the performance of this feature. By default, this value is 100.

If the feature is enabled on the profile and configured properly, yet still fails to return HTML page titles, the next step is to manually send a GET request from the Webtrends server to a page on the web server.

Open a command line window and type ‘telnet’.

Type the following commands to configure and enable logging for the session:
set logfile c:\path\filename.log
set localecho
set term vt100
…where path and filename are the location and name of the log file that will be created.

Now type:

open domain.com 80

…where domain.com is the web server. The message returned will show “Connecting to domain.com…”

The connecting message is not followed by a status message indicating success or failure. To confirm the connection, open another command line window and enter the command ‘netstat -b -n’; a successful connection to the web server on port 80 is listed for telnet.exe.
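The same connectivity check can be sketched in Python, which may be convenient where telnet is not installed. This is only an illustration and not part of Webtrends: the function name can_connect and the timeout value are assumptions, and domain.com is the placeholder host used in this article.

```python
import socket


def can_connect(host, port=80, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened.

    This mirrors 'open domain.com 80' in telnet: it only proves that the
    port is reachable, not that the web server answers HTTP requests.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False


# Example call (run from the Webtrends server; replace the placeholder host):
# print(can_connect("domain.com", 80))
```

A result of False from the Webtrends server points to a network, firewall, or proxy problem rather than a profile setting.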

The following command, when entered, will overwrite the existing text at the top of the screen. The command is also case-sensitive and must be typed without error; if you make a mistake, start a new session and try again.

GET /page.ext HTTP/1.0

…where /page.ext is the name of the page, including its extension.

Assuming the syntax is correct, hitting Enter twice will return something similar to the following, both on-screen and in the log file:

HTTP/1.1 200 OK
Connection: close
Date: Sat, 21 Mar 2009 22:15:37 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
X-AspNet-Version: 1.1.4322
Set-Cookie: ASP.NET_SessionId=13sqoz45ziig5g45jhvfciak; path=/
Cache-Control: no-cache, no-store
Pragma: no-cache
Expires: -1
Content-Type: text/html; charset=utf-8
Content-Length: 24803
Set-Cookie: BIGipServerwww.webtrends.com_http=4064361482.20480.0000; expires=Sun, 22-Mar-2009 04:15:37 GMT; path=/

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
        <head>

                <title>Webtrends Marketing Web Analytics and Web Statistics</title>



Connection to host lost.

Press any key to continue…

Type ‘quit’ to end the telnet session.
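The telnet session above can also be approximated with a short Python script, which avoids the retyping problem. This is a minimal sketch, not a Webtrends tool: the function names build_get_request, fetch_raw, and status_line are hypothetical, and the request deliberately matches the one typed into telnet (a GET line followed by a blank line, with no additional headers).

```python
import socket


def build_get_request(path):
    """Build the request typed into telnet: the GET line plus a blank line."""
    return f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii")


def fetch_raw(host, path, port=80, timeout=10.0):
    """Send the request and return the raw response (headers and body)."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_get_request(path))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                # The server closed the connection, which telnet reports
                # as "Connection to host lost."
                break
            chunks.append(data)
    return b"".join(chunks)


def status_line(raw_response):
    """Return the first line of the response, such as 'HTTP/1.1 200 OK'."""
    return raw_response.split(b"\r\n", 1)[0].decode("iso-8859-1")


# Example call (replace the placeholder host and page):
# raw = fetch_raw("domain.com", "/page.ext")
# print(status_line(raw))
```

A status line other than 200 (for example a 301 or 302 redirect) is worth noting, since server-side redirects are one of the causes listed above.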

In the example above, the inclusion of the <title> line indicates a successful page title retrieval. If a page has no title, or if its <title> tag is not between the <head> and </head> tags, Webtrends will not be able to retrieve the page title. Page title retrieval only works consistently on static pages. If a page is dynamically generated, there may be no title to retrieve, or the web server may take too long to respond to the page title retrieval request.
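To check a saved response against the rule above without reading it by eye, something like the following Python sketch could be used. It is an assumption-laden illustration: the class HeadTitleParser and the function extract_head_title are hypothetical names, and the logic only approximates the "title inside the head section" rule described here; it is not Webtrends' own parsing code.

```python
from html.parser import HTMLParser


class HeadTitleParser(HTMLParser):
    """Collect the text of the first <title> found inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.in_title = False
        self.title = None
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "title" and self.in_head and self.title is None:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.title = "".join(self._parts).strip()
        elif tag == "head":
            self.in_head = False

    def handle_data(self, data):
        if self.in_title:
            self._parts.append(data)


def extract_head_title(html_text):
    """Return the title text, or None if no <title> appears inside <head>."""
    parser = HeadTitleParser()
    parser.feed(html_text)
    parser.close()
    return parser.title


# Example with a fragment shaped like the response shown earlier:
# extract_head_title("<html><head><title>Example</title>")
```

A return value of None for a page that visibly has a title in the browser suggests the title is placed outside the head section or is written by script after the page loads.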