Products
Webtrends Analytics 9.x
Webtrends Analytics 8.x
Issue
Two key symptoms indicate this issue with log file analysis. In the first, analysis of a data source that contains the current daily log file shows licensed page view usage rising rapidly, with the day-to-day increase growing larger each time. In the second, reanalyzing logs that have already been analyzed increases the page view count, even though previously analyzed logs should not consume any additional page views.
These two symptoms share the same root cause: the log files are being modified between analyses. Normally, once a log file has been analyzed, it is ignored on the next attempt. However, if a log file has been modified since the last analysis, Webtrends treats it as a new log file that it has not yet analyzed. Specifically, Webtrends creates a checksum on the first analysis in order to uniquely identify the log file in the future. The name and the location of the log file can change, but if the contents change in any way, Webtrends no longer recognizes it. If log files have been modified, either manually or by a script (e.g., when scrubbing logs to remove unwanted entries), the behavior described above results.
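The checksum behavior described above can be illustrated with a minimal sketch. The algorithm Webtrends actually uses is not documented here, so SHA-256 is only an assumption, and `content_checksum` and `AnalyzedLogRegistry` are hypothetical names introduced for illustration.

```python
import hashlib


def content_checksum(data: bytes) -> str:
    """Return a digest that depends only on the file contents.

    SHA-256 is an assumption; the real Webtrends checksum may differ.
    """
    return hashlib.sha256(data).hexdigest()


class AnalyzedLogRegistry:
    """Hypothetical stand-in for the record of logs already analyzed."""

    def __init__(self) -> None:
        self._seen = set()

    def should_analyze(self, data: bytes) -> bool:
        """Return True if these contents have not been analyzed before."""
        digest = content_checksum(data)
        if digest in self._seen:
            return False  # same contents: skipped, whatever the file name or path
        self._seen.add(digest)
        return True


registry = AnalyzedLogRegistry()

original = b"GET /index.html 200\nGET /about.html 200\n"
scrubbed = b"GET /index.html 200\n"  # same log after one entry was removed

first = registry.should_analyze(original)       # first analysis
repeat = registry.should_analyze(original)      # unchanged log, analyzed again
after_edit = registry.should_analyze(scrubbed)  # modified log

print(first, repeat, after_edit)
```

Because the digest is computed from the contents alone, renaming or moving the file would not change the result, while removing a single entry makes the log look new.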
Another cause is a data source that points to the ‘live’ log file of the web server or data collection server. The live log file collects data in real time, and normally, at a certain point (usually at a set time or upon reaching a set size), it splits and saves a static log file. The live log file itself, however, continues to grow. If it is included among the other log files in the data source, every analysis treats it as a new log file because it is constantly changing.
Because the live log file contains the same data as the next splitlog it will create, the log file entries are analyzed first in the live log and then again after the splitlog is created. This issue is compounded in environments where the log files do not split daily and/or analysis takes place more than once a day (the default is twice daily).
As an example, consider a web site that consistently receives 100 hits in each twelve-hour interval between analyses, with the log file configured to split only once a month. After the first analysis of the live log file, 100 page views will have been used. On the second analysis, 300 page views will have been used (the live log file now contains 200 hits, and 100 page views had already been used). On the third analysis, 600 page views will have been used (the live log now contains 300 hits, and 300 page views had already been used). On the fourth analysis, 1000 page views will have been used (the live log now contains 400 hits, and 600 page views had already been used). On the fifth analysis, 1500 page views will have been used (the live log now contains 500 hits, and 1000 page views had already been used).
The above example covers only five analyses, or about two and a half days, but a configuration like this run for longer periods can exhaust page views quickly, even for low-traffic sites. Even on web servers where the logs are rotated daily, assuming the default scheduled analysis runs every twelve hours, the extra analysis can waste page views and render report data inaccurate.
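The arithmetic in the example above can be reproduced with a short sketch. It assumes, as the example does, that every hit counts as one page view and that the whole live log is counted again on each analysis. The function name `cumulative_page_views` is hypothetical.

```python
def cumulative_page_views(hits_per_interval: int, analyses: int) -> list:
    """Return the running total of page views used after each analysis.

    Assumes the live log is never split during the period, so each
    analysis counts every hit recorded so far.
    """
    totals = []
    used = 0
    for n in range(1, analyses + 1):
        live_log_hits = hits_per_interval * n  # the live log keeps growing
        used += live_log_hits                  # the whole log is counted again
        totals.append(used)
    return totals


totals = cumulative_page_views(100, 5)
print(totals)  # [100, 300, 600, 1000, 1500]

# Thirty days at two analyses per day is 60 analyses.
month_total = cumulative_page_views(100, 60)[-1]
print(month_total)  # 183000
```

After n analyses the total is 100 × n(n + 1) / 2, so usage grows quadratically. Over thirty days the site in the example actually receives only 6,000 hits, yet 183,000 page views would be used.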
Resolution
This issue can be avoided by preventing Webtrends from analyzing dynamic log files. If log files must be scrubbed or modified, finalize all changes prior to analysis. Where data sources are concerned, ideally, splitlogs will be moved to a location other than where the live log file is located. If the location cannot be changed, configure splitlogs to be saved with a naming convention different from that used by the live log file, then specify an expression with a wildcard in the data source that matches the splitlogs and omits the live log file from analysis.
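The naming-convention approach can be sketched as follows. The file names and the pattern are hypothetical, and Python's `fnmatch` is used only to illustrate wildcard matching; the wildcard syntax accepted in a Webtrends data source may differ.

```python
from fnmatch import fnmatchcase

# Hypothetical naming convention: the live log has no date suffix,
# while each splitlog is saved with an underscore and a date.
LIVE_LOG = "access.log"
SPLITLOG_PATTERN = "access_*.log"

files_in_directory = [
    "access.log",             # live log, still growing
    "access_2024-01-01.log",  # static splitlog
    "access_2024-01-02.log",  # static splitlog
]

# Keep only the files that match the splitlog pattern.
selected = [name for name in files_in_directory
            if fnmatchcase(name, SPLITLOG_PATTERN)]

print(selected)
```

The pattern requires an underscore after `access`, so the live log does not match even though it sits in the same directory as the splitlogs.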