Hard drive space issues with compressed log files/log files retrieved via FTP

Products

Webtrends Analytics 9.x
Webtrends Analytics 8.x
Webtrends Enterprise 7.x
Webtrends Professional 7.x
Webtrends Small Business 7.x

Issue

When log files are accessed via FTP and the path ends in a wildcard such as “*.*” or “*.log”, Webtrends must examine every file that matches the pattern to determine which logs actually need to be analyzed. Before it can do this, all of those logs must be downloaded and cached locally. If there are 1,000 log files in the FTP folder, all 1,000 are downloaded to the local cache directory before Webtrends looks inside them to determine which ones need to be analyzed. This can consume a lot of hard drive space, even if only temporarily. (Incidentally, with FTP the analysis itself also takes considerably longer: FTP transfer is usually slower than direct network access, so transferring a large number of log files this way can take a long time.) If hard drive space runs out before all of the files are downloaded, the analysis will fail.
The same scenario can occur if the log files are compressed, even if they are stored locally. If there are 1,000 zipped log files in a local directory, and the data source directory path uses something like “*.*”, “*.zip”, or “*.gz” for the file name, all 1,000 of those files need to be opened and extracted before Webtrends can peek inside and determine which ones actually need to be analyzed. Just as with the FTP process, if hard drive space runs out before this is done, the analysis will fail.

Resolution

This problem can be alleviated by using date macros. Information on how to format date macros is available from the Data Source edit screen in the Webtrends Administration Console. While editing a Data Source, click the “?” symbol in the upper left-hand corner of the edit screen, click the “Path Examples” link at the end of the first sentence in item 2 of the “To specify the location of your web activity data:” section, and then click the “Log File Path Macros” link at the bottom of that page.
One drawback of using a date macro is the risk of missing data. If the FTP connection is down or the Webtrends server is off-line for a few days, the next analysis for the profile can miss several days' worth of log files if only one date macro is used. Take the following example, based on the IIS log file naming convention:
ftp:///logs/w3svc/ex%date-1%%yy%%mm%%dd%.log
If today were January 30, 2005, this macro would instruct Webtrends to download and look at a log file named “ex050129.log” to determine whether it needed to be analyzed. If the FTP connection had been down for the previous two days, the log files for the 27th and 28th would be missed, because the analyses that would have requested them never ran.
The best way to alleviate this concern is to use several paths to go back more than one day. For instance, assuming the same scenario and log file naming convention above, consider using the following paths in the Data Source:
ftp:///logs/w3svc/ex%date-1%%yy%%mm%%dd%.log
ftp:///logs/w3svc/ex%date-2%%yy%%mm%%dd%.log
ftp:///logs/w3svc/ex%date-3%%yy%%mm%%dd%.log
ftp:///logs/w3svc/ex%date-4%%yy%%mm%%dd%.log
ftp:///logs/w3svc/ex%date-5%%yy%%mm%%dd%.log
This would instruct Webtrends to download and look at the following log files to determine which needed to be analyzed:
ex050129.log
ex050128.log
ex050127.log
ex050126.log
ex050125.log
The same convention can be followed to download even more logs simply by adding paths with a larger date offset for each one. This method avoids the long transfer time of downloading all of the files, as well as the need for large amounts of hard drive space to temporarily accommodate all of the logs.
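The expansion described above can be sketched in a few lines of Python. This is only an illustration for checking which file names a set of paths should resolve to, not part of Webtrends. The helper names macro_paths and expected_log_names are hypothetical, and the sketch assumes the macro behaves as in the example above: %date-N% shifts the date back N days, and %yy%%mm%%dd% expands to a two-digit year, month, and day.

```python
from datetime import date, timedelta


def macro_paths(base, days_back, prefix="ex", suffix=".log"):
    """Build one Data Source path per day offset, from 1 to days_back.

    Hypothetical helper: it only formats strings in the macro style
    shown in this article; it does not talk to Webtrends.
    """
    return [
        f"{base}{prefix}%date-{n}%%yy%%mm%%dd%{suffix}"
        for n in range(1, days_back + 1)
    ]


def expected_log_names(today, days_back, prefix="ex", suffix=".log"):
    """List the file names those paths should resolve to on a given day.

    Assumes %date-N% means "N days before today" and that
    %yy%%mm%%dd% is a two-digit year, month, and day.
    """
    names = []
    for n in range(1, days_back + 1):
        day = today - timedelta(days=n)
        names.append(f"{prefix}{day.strftime('%y%m%d')}{suffix}")
    return names


if __name__ == "__main__":
    # The host portion of the path is left blank, as in the article.
    for path in macro_paths("ftp:///logs/w3svc/", 5):
        print(path)
    for name in expected_log_names(date(2005, 1, 30), 5):
        print(name)
```

Raising days_back widens the window of days that a delayed analysis can still pick up, at the cost of a few more files being downloaded and inspected on each run.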
The same method can be used for compressed files to accomplish the same purpose. Accessing compressed files locally is usually not as time-consuming as the FTP process, but the hard drive space issues are just as valid. Even so, limiting the number of files that must be extracted will still provide some performance increase.