What is the effect of limiting table sizes or memory usage in Webtrends’ reports?

Products

Webtrends Analytics 9.x
Webtrends Analytics 8.x

Question

What is the effect of limiting table sizes or memory usage in Webtrends’ reports?

Answer

Traffic analysis is, by nature, a very processor-intensive task that involves large volumes of unique and important data. Some of Webtrends’ largest clients process hundreds of thousands of unique visitors, hundreds of millions of page views, and more than 50 gigabytes of traffic data per day.

Why do I need to limit table sizes?

When a site’s traffic approaches these volumes, it is sometimes necessary to impose constraints on the processing methods so that the analysis remains possible within the limitations of the operating system. To address those limitations, Webtrends has added a buffer for some statistics. This lets users drastically reduce the system resources required without losing the most valuable report statistics.

To decide which tables and values to limit when this problem occurs, Webtrends broke down the memory usage of each table and graph in the report. Webtrends then analyzed which types of statistics became more important and which became less significant as traffic increased.

What actually happens when tables are limited?

Since Webtrends reports have over 300 tables and graphs to analyze and generate, the analysis process is no trivial matter. Webtrends has found that as traffic increases, trending and behavioral data become much more important for evaluating a site’s success. For example, the fact that one particular visitor out of 120,000 is the most frequent visitor is much less significant than the fact that 12,000 visitors viewed the new product release page.

After weighing the memory cost of each statistic, Webtrends identified usage statistics that could be limited in order to accommodate huge volumes of traffic. One example is the “top users” statistics, which are removed once a certain large number of unique values is reached. For a site with huge volumes of traffic, the top users table is rarely significant, since the top ten visitors out of perhaps hundreds of thousands become less and less meaningful to the site’s traffic statistics.

At this point, what is usually wanted is the total number of users; there is less interest in the actual ranking of specific users. It is important to note that this artificial limit, which is used to increase performance and keep the analysis process from causing a memory overflow in the operating system, does not affect any of the overall totals for any values. For instance, the total of all page views will not be limited to 200,000.

The totals are stored in a separate table that is never limited, and never needs to be. The total table just keeps track of the count of a given item such as page views, visits, and visitors.

The reason that the “top” statistics are so memory intensive can be understood with a brief description of the analysis process. By default, the Webtrends internal databases keep all the data from the log files as summary data. This means that usage levels and statistics are stored, rather than the original sequential, hit-by-hit log data. Webtrends therefore does not create an entry for each page view. Instead, it creates one entry for each unique page and then increments a hit counter each time that page is viewed again. The same is true for the rest of the tables, such as visitors, documents, and so on.
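The difference between hit-by-hit data and summary data can be sketched in a few lines of Python. This is only an illustration of the idea described above, not Webtrends code; the function name `summarize` and the sample page paths are hypothetical.

```python
from collections import Counter

def summarize(hits):
    """Collapse sequential hit-by-hit entries into one counter per unique page."""
    page_views = Counter()
    for page in hits:
        # One entry per unique page; repeat views only increment its counter.
        page_views[page] += 1
    return page_views

# Five sequential hits (hypothetical sample data).
hits = ["/home", "/products", "/home", "/home", "/contact"]
summary = summarize(hits)

print(len(hits))     # number of raw hits
print(len(summary))  # number of entries actually stored
print(summary["/home"])
```

The memory needed for the summary grows with the number of unique values rather than with the number of hits, which is why tables with very many unique entries, such as visitors, are the expensive ones.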

It may then be asked, “Why not simply limit the table of summary data to a lower number of values?” Unfortunately, this is not possible without compromising the statistics. Top user rankings require the most frequent users to be ordered by total site usage over the entire report period, so a limited set of data cannot be tracked accurately until all of the log entries have been analyzed and the users have been ranked by total activity. This defeats the purpose of limiting the table in the first place, because the limitation is not in the database but in the physical memory that the operating system makes available during analysis.

When traffic data becomes very large, the number of unique values for a table can cause Webtrends to exceed the maximum memory allocation that the operating system allows, as is the case for the top users table. To prevent this allocation violation, Webtrends has the built-in ability to limit each table to a certain number of elements. Typically, no table is limited to fewer than 200,000 elements, with the exception of the Top Companies table. A limit of 200,000 elements prevents any major impact on the data in the report.
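The relationship between a limited table and the unlimited totals can be sketched as follows. This is a minimal illustration under stated assumptions, not the Webtrends implementation: the class name `LimitedTable` is hypothetical, and the behavior when the limit is reached (new unique values are simply not added) is an assumption made for the example.

```python
class LimitedTable:
    """Per-value counters capped at max_elements, plus an uncapped total."""

    def __init__(self, max_elements):
        self.max_elements = max_elements
        self.counts = {}            # limited: one entry per tracked unique value
        self.total = 0              # separate total, never limited
        self.limit_reached = False

    def record(self, key):
        # The total is always updated, whether or not the key is tracked.
        self.total += 1
        if key in self.counts:
            self.counts[key] += 1
        elif len(self.counts) < self.max_elements:
            self.counts[key] = 1
        else:
            # Assumption: once the table is full, new unique values are dropped.
            self.limit_reached = True

table = LimitedTable(max_elements=3)
for visitor in ["a", "b", "a", "c", "d", "e", "a"]:
    table.record(visitor)

print(table.total)          # all 7 hits are counted
print(len(table.counts))    # only 3 unique visitors are tracked individually
print(table.limit_reached)
```

In this sketch the memory used by `counts` is bounded by `max_elements`, while `total` is a single number and costs the same no matter how much traffic is processed.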

As in the example of the Top Users table, it is impossible to put an arbitrary limit on the number of unique elements without compromising the statistical integrity. “Top” statistics that require rankings and those that can have a very large number of unique entries are the most memory intensive.
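A small worked example shows why a low limit compromises rankings. It compares an exact ranking with one built from a table that stops accepting new unique values after a cap. The function names and the sample data are hypothetical, and the drop-new-values behavior is an assumption made only for this illustration.

```python
from collections import Counter

def exact_top(visitors, n):
    """Rank visitors using a counter for every unique visitor."""
    return Counter(visitors).most_common(n)

def capped_top(visitors, n, max_elements):
    """Rank visitors using a table that ignores new visitors once it is full."""
    counts = {}
    for v in visitors:
        if v in counts:
            counts[v] += 1
        elif len(counts) < max_elements:
            counts[v] = 1
        # Assumption: visitors first seen after the cap are not tracked.
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:n]

# Visitor "c" arrives late but is by far the most active.
visitors = ["a", "b", "c", "c", "c", "c"]

exact = exact_top(visitors, 1)
capped = capped_top(visitors, 1, max_elements=2)

print(exact)   # the true most active visitor
print(capped)  # the capped table never saw "c", so the ranking is wrong
```

With a very small cap the most active visitor is lost entirely, which is why the limit is kept large enough that only sites with an extremely high number of unique values ever reach it.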

Rest assured that, although there is a limit to the number of unique “top” users that can be individually tracked within the operating system memory, this limit does not affect any of the overall totals for any other values. For instance, the total number of visitors that Webtrends reports will not be limited to 200,000 just because the “top” visitors table has a limit. Your home page may have 500,000 visits or visitors, or more. The only limitation is that you will not be able to track who, for example, the 103rd most active visitor to that page was. Each of the total tables simply keeps track of the count of a given item such as page views, visits, and visitors.