How does the Webtrends Data Scheduler differ from other Webtrends data export methods?


Webtrends Analytics 9.x



When Webtrends builds the report database for a profile, it creates two types of tables for each report, whether the report has one dimension or two. The only difference is that a two-dimensional report has an additional table set for the second dimension.

The first table type is the “analysis” table. The analysis table contains all of the data that we collect for the report. It also includes summary information for the report. The summary information is the totals for any of the measures in the report. As an example, with the Pages report, even though all of the pages on the site may not be displayed in the report itself, the Page Views total needs to contain the actual number of page views for all pages and all visitors. This number is incremented whether or not a page actually makes it into any of the tables.

The second table type is the “report” table. Where the analysis table contains all of the data based on the dimension(s) in the report, the report table is more limited in that it only contains the data that is displayed in the report itself.
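The relationship between the totals, the analysis table, and the report table can be illustrated with a minimal sketch. This is not the Webtrends engine: the function name, the row limits, and the use of hit counts to decide what appears in the report are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical row limits; real Webtrends limits are far larger and configurable.
ANALYSIS_LIMIT = 3
REPORT_LIMIT = 2


def analyze(hits, analysis_limit=ANALYSIS_LIMIT, report_limit=REPORT_LIMIT):
    """Toy model of one report: a summary total plus two capped tables."""
    analysis = Counter()      # stands in for the "analysis" table
    total_page_views = 0      # summary total for the Page Views measure

    for page in hits:
        # The total is incremented for every hit, tracked or not.
        total_page_views += 1
        # A page gets a row only if it already has one or there is room.
        if page in analysis or len(analysis) < analysis_limit:
            analysis[page] += 1

    # The "report" table holds only the rows that will be displayed
    # (assumed here to be the rows with the highest counts).
    report = analysis.most_common(report_limit)
    return total_page_views, analysis, report


hits = ["/a", "/b", "/c", "/a", "/d", "/e", "/a"]
total, analysis, report = analyze(hits)
print(total)                   # 7: every hit counts toward the total
print(sum(analysis.values()))  # 5: hits for /d and /e never got a row
print(report[0])               # ('/a', 3)
```

The point of the sketch is that the summary total can be larger than the sum of the rows in either table.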

This leads us into the area of table limits. We must place reasonable limits on how much data can be contained in both the analysis and report tables for a couple of reasons. First and foremost is the sheer volume of data. The whole purpose of the Webtrends application is to provide data that depicts the trends occurring on a web site. It was never meant to provide 100% of every little detail on the site. While the actual volume of information that can be captured is quite large, attempting to accumulate every scrap of data is simply not feasible or worth the effort in most cases. There is no way to use all of that detail to come up with a sensible analysis of what is actually happening. This is why we rely so much on looking at totals, spikes, and dips in the overall data.

The second reason for the limits has to do with limitations in the operating system and its associated file system. Attempting to get too much data into a single profile can cause analysis failures, either because the operating system runs out of memory or because it cannot write files large enough to contain the data. The advent of the 64-bit engine has provided some relief, but we still must maintain strict limits on table sizes to keep the analysis efficient. A much higher memory cap does no good if the analysis engine takes longer to analyze the data than the time span the data covers. For instance, if we are analyzing 12 hours of data but the table limits have been raised so high that the analysis takes 16 hours, the profile falls further behind after each analysis. By contrast, if it takes 6 hours to analyze 12 hours of data, we are at 50% utilization.
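The utilization figure is simply analysis time divided by the time span of the data analyzed. As a small worked example (the function names here are illustrative, not part of any Webtrends tool):

```python
def utilization(analysis_hours, data_hours):
    """Analysis time as a fraction of the time span of the data analyzed."""
    if data_hours <= 0:
        raise ValueError("data_hours must be positive")
    return analysis_hours / data_hours


def keeps_up(analysis_hours, data_hours):
    """A profile keeps up only if analysis finishes faster than data arrives."""
    return utilization(analysis_hours, data_hours) < 1.0


print(f"{utilization(6, 12):.0%}")   # 50%
print(f"{utilization(16, 12):.0%}")  # 133%
print(keeps_up(16, 12))              # False: the profile falls further behind
```

Anything at or above 100% means the profile can never catch up without reducing table limits, trimming the data collected, or splitting the profile.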

With these factors in mind, keeping tables to reasonable limits not only ensures that the profile runs efficiently, but also leaves room to raise limits when necessary without jeopardizing the profile's ability to run successfully. As a profile approaches utilization levels that put it at risk, we need to evaluate what is and is not really necessary to include in the profile. Often we find that a profile can be streamlined because it captures data that provides no business value. If streamlining is not possible, splitting the data into separate profiles allows us to overcome the issue.

Report table data requires more memory than analysis table data. The main reason is that the report tables must carry the calculations needed to organize and display the data in the reports themselves. For this reason the report table limits are always smaller than the analysis table limits. This does not mean that the analysis tables contain less detail than the report tables. Rather, the analysis tables keep the data in a simpler form, because that data is not needed directly for the reports. The report table contains the most relevant data. If an element in the analysis table becomes more relevant than an element in the report table, the less relevant element is demoted to the analysis table and the more relevant element is promoted to the report table.
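The promotion and demotion described above can be pictured with a short sketch. It assumes, purely for illustration, that relevance is a single numeric measure such as page views and that the report table holds the top few elements. How Webtrends actually scores relevance is not described here.

```python
def report_rows(analysis, report_limit):
    """Return the elements that would occupy the report table.

    `analysis` maps element name -> relevance score (hypothetical measure).
    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(analysis, key=lambda name: (-analysis[name], name))
    return ranked[:report_limit]


analysis = {"/home": 50, "/about": 20, "/pricing": 10}

before = report_rows(analysis, report_limit=2)
print(before)  # ['/home', '/about']

# /pricing gains relevance and overtakes /about ...
analysis["/pricing"] += 15

after = report_rows(analysis, report_limit=2)
print(after)   # ['/home', '/pricing']: /pricing promoted, /about demoted
```

In this model /about is not lost when it is demoted. It remains in the analysis table and can be promoted again later.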

As for exporting the data, in the past the only data available for export was the data contained within the report tables. This was true regardless of the export method: manual export, scheduled export, ODBC export, or retrieval through Web Services. The new “Data Scheduler” option changes this. We can now offer access to the data contained within the analysis tables. There are still limitations, but it opens up a number of possibilities that we have never had before.

The most significant limitation concerns the analysis table itself. There are still limits on how much data an analysis table can contain. Once the limit has been reached, no new data can be added to the table. In many cases an analysis table will go through a process called “smart trimming,” in which older, less relevant data is discarded in favor of newer, more relevant data. This is an oversimplification of how smart trimming works, but the details are a separate discussion outside the scope of this topic. Nonetheless, we need to be aware that data captured from the analysis table can be affected by the table limits themselves, so it is not always a complete solution.

The second limitation has to do with what data can actually be retrieved via Analytics Data Capture. Only the following profile types, reports, and tables are currently supported.

Supported profile types:

  • All Enterprise Webtrends Analytics profiles
  • All Marketing Lab Warehouse profiles

Unsupported profile types:

  • Express or parent/child profiles
  • Streaming media profiles
  • SmartView profiles

Supported report types:

Standard Tables:

  • Pages (TOP_PAGES)
  • Top Visitors (TOP_USERS)
  • Entry Pages (TOP_ENTRY_PAGE)
  • Exit Pages (TOP_EXIT_PAGE)
  • Search Keywords (TOP_SEARCH_KEYWORDS)
  • Search Engines (TOP_SEARCH_ENGINES)
  • Search Phrases (TOP_SEARCH_KEYPHRASES)
  • Search Engines with Phrases (TOP_SEARCH_PHRASE_ENGINE)
  • Organizations (TOP_COMPANY_NAMES)

Custom Reports:

All custom reports are supported except those with the following configurations:

  • SmartView-compliant reports
  • Drilldowns
  • Butterfly Reports using Step of Interest dimensions (1st or 2nd dimension)
  • Reports using Time Period dimensions (1st or 2nd dimension)
  • Custom Reports using Scenario Analysis steps as a measure
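The support rules above can be captured in a small lookup, which may be handy when checking a list of reports before scheduling an export. This is only a sketch: the table identifiers are taken from the list above, but the `CustomReport` class, its field names, and both functions are hypothetical and not part of any Webtrends API.

```python
from dataclasses import dataclass

# Standard tables listed as supported, keyed by table identifier.
SUPPORTED_STANDARD_TABLES = {
    "TOP_PAGES": "Pages",
    "TOP_USERS": "Top Visitors",
    "TOP_ENTRY_PAGE": "Entry Pages",
    "TOP_EXIT_PAGE": "Exit Pages",
    "TOP_SEARCH_KEYWORDS": "Search Keywords",
    "TOP_SEARCH_ENGINES": "Search Engines",
    "TOP_SEARCH_KEYPHRASES": "Search Phrases",
    "TOP_SEARCH_PHRASE_ENGINE": "Search Engines with Phrases",
    "TOP_COMPANY_NAMES": "Organizations",
}


@dataclass
class CustomReport:
    """Hypothetical description of a custom report; each flag is an exclusion."""
    smartview_compliant: bool = False
    is_drilldown: bool = False
    butterfly_with_step_of_interest: bool = False
    uses_time_period_dimension: bool = False
    uses_scenario_step_measure: bool = False


def standard_table_name(table_id):
    """Return the report name for a supported standard table, else None."""
    return SUPPORTED_STANDARD_TABLES.get(table_id)


def custom_report_exportable(report):
    """A custom report is supported only if none of the exclusions apply."""
    return not any(vars(report).values())


print(standard_table_name("TOP_PAGES"))                         # Pages
print(custom_report_exportable(CustomReport()))                 # True
print(custom_report_exportable(CustomReport(is_drilldown=True)))  # False
```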