Organizations are increasingly implementing an alternative technique for collecting website traffic information rather than relying on web server log files. This technique is called client-side data collection, or "data tagging" for short. Popularized by Webtrends and other web analytics vendors that provide hosted services, data tagging solves many problems associated with web server log file analysis. With data tagging, web traffic data is more accurate because traffic normally hidden by cache or proxy servers is tracked. IT administration is eased because data collection is centralized in one location, rather than site data being dispersed among several log files from multiple web servers that may also be geographically dispersed. And web data can be collected from specialized applications, such as application servers and browser applications (e.g., Flash). For all these benefits, client-side collection has a few drawbacks: implementing data tagging requires some development work to ensure that data tags are inserted and maintained on web pages.
What is a Page View? A hit to any file classified as a page (such as html, htm, psp, and asp pages). For sites still using frames, a single viewed page may actually consist of several HTML documents.
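The classification described above can be sketched as a simple extension check. This is an illustrative assumption, not any vendor's actual configuration; the extension list mirrors the examples in the definition.

```python
# Hypothetical page-view classifier: a hit counts as a page view when the
# requested file's extension is on the configured "pages" list.
from urllib.parse import urlparse
import os

PAGE_EXTENSIONS = {".html", ".htm", ".psp", ".asp"}  # assumed list, per the text

def is_page_view(url: str) -> bool:
    path = urlparse(url).path
    ext = os.path.splitext(path)[1].lower()
    # Treat extension-less paths (e.g. "/") as pages, a common default
    return ext in PAGE_EXTENSIONS or ext == ""

print(is_page_view("http://example.com/index.html"))  # True
print(is_page_view("http://example.com/logo.gif"))    # False
```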
Why might client-side data tagging solutions sometimes produce lower page view counts than traditional server-side log file web analytics?
1. Server-Side Redirects: These are captured in web server log files but not by tagging, since the redirect happens on the server. For example, when a visitor types http://webtrends.com, they are redirected to http://www.webtrends.com. This shows as two page views in your log files (one for http://webtrends.com and one for http://www.webtrends.com) but counts as only one page view for tagging (the landing page, http://www.webtrends.com).
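A minimal sketch of why the same visit yields two log-file page views but only one tagged view. The log entries and counting rules here are illustrative assumptions, following the redirect example above.

```python
# Illustrative: a visitor requests webtrends.com, the server answers with a
# 301 redirect, and the browser then requests www.webtrends.com.
log_hits = [
    ("GET", "http://webtrends.com/", 301),      # the redirect itself is logged
    ("GET", "http://www.webtrends.com/", 200),  # the landing page is logged
]

# A log-file analyzer counts both requests as page views...
log_page_views = len(log_hits)

# ...but the JavaScript tag only fires on the rendered landing page.
tagged_page_views = sum(1 for _, _, status in log_hits if status == 200)

print(log_page_views, tagged_page_views)  # 2 1
```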
2. File Type Configuration: Some web analytics tools are configured to count certain file types as page views. For example, images that signify a particular step in a process may be given page view status so that the process or scenario can be tracked. Some application processes are driven by a single servlet, so every step shares the same page URL; each step is distinguished by serving a different image. By excluding all images except the ones that represent steps, the process can be measured. This, however, results in double counting: one page view for the servlet and one for the image.
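The double counting described above can be sketched as follows; the URLs and the file-type rule are hypothetical.

```python
# Each step of a hypothetical servlet-based process logs two hits: the servlet
# itself and a step-marker image that has been promoted to page-view status.
hits = [
    "/app/checkout",   # servlet page (same URL for every step)
    "/img/step2.gif",  # step-marker image, configured to count as a page
]

# Analyzer's assumed page-view configuration: the servlet plus the step images.
PAGE_LIKE = {"/app/checkout", "/img/step2.gif"}

page_views = sum(1 for h in hits if h in PAGE_LIKE)
print(page_views)  # 2 page views recorded for what was a single step
```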
3. Bots, Crawlers, and Spiders: These terms all refer to the same thing: an automated program that travels from website to website, processing and caching pages for search engines. A spider looks at all the pages of your website, uses that information to rank you in search results, and caches a copy of each page on the search engine's servers for quick reference and in case your site ever goes down. Spiders jump from link to link on the Internet and run endlessly; even if you never submit your website to a search engine, odds are your site will still be spidered. Most sites therefore experience some degree of “log-file bloat”: extra page views contained in a log file data source that would naturally be excluded by the more accurate client-side data tagging solution. Spider and bot traffic that never appears in a client-side data tagging solution is usually present in log files to some degree, because there are currently over 300 spiders, robots, and crawlers actively hitting sites on the Internet. Log file solutions rely on filters in their configuration settings to exclude known spiders, but since anyone may release a new crawler onto the Internet at any time, these manually maintained settings are usually out of date and do not reflect the spiders active at any given moment.
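Log-file tools rely on a user-agent exclusion list like the sketch below. The bot names are real, common crawlers, but the filtering logic is a simplified assumption; the point is that any manually maintained list is inevitably incomplete.

```python
# Simplified spider filter: exclude hits whose user-agent matches a known list.
KNOWN_BOTS = ["googlebot", "bingbot", "slurp"]  # a real list runs to hundreds

hits = [
    {"url": "/index.html", "ua": "Mozilla/5.0 (Windows NT 10.0)"},
    {"url": "/index.html", "ua": "Googlebot/2.1 (+http://www.google.com/bot.html)"},
    {"url": "/index.html", "ua": "BrandNewCrawler/0.1"},  # unknown bot slips through
]

def is_bot(user_agent: str) -> bool:
    ua = user_agent.lower()
    return any(bot in ua for bot in KNOWN_BOTS)

human_page_views = sum(1 for h in hits if not is_bot(h["ua"]))
print(human_page_views)  # 2: the unknown crawler is still counted as a visitor
```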
4. Web Monitoring Automated Tools & Scripts: Most enterprises run automated scripts and spiders that test site availability, periodically exercise application functions, and record uptime. These tools produce page views in a log file scenario but not in a data tagging solution. Also, if the site includes frames, the frame pages will be counted by a log file solution, creating further “log-file bloat.”
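The discrepancy arises because monitoring scripts fetch HTML without executing JavaScript: the request lands in the web server's log as a page view, but the client-side tag never fires. A typical uptime check might look like this sketch (the URL and error handling are assumptions):

```python
# A basic availability check: fetch the page and verify the HTTP status.
# The request is recorded in the web server's log, but since no JavaScript
# runs, the client-side data tag does not execute.
import urllib.request

def check_uptime(url: str) -> bool:
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status == 200
    except (OSError, ValueError):
        return False
```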
5. Filters: New implementations of an analytics product can also result in lower page view counts, most often because of a filter set on the customer’s profile. Since log file analysis products count page views regardless of origin, this can cause a discrepancy between log file solutions and data tagging solutions. To avoid these problems, the customer can use SmartSource, which acts somewhat like an automatic filter, since only the pages of interest are tagged.
6. The Tag Doesn’t Execute: A visitor may click a link to another page, or click Stop in the browser, before the Webtrends tag executes, for example if the tag is at the bottom of the page or the site has poor page performance. Customers concerned about this situation may elect to put the Webtrends tag at the top of the page.
7. The Webtrends Tag Is Not Called from All Pages of Your Site: Your log files collect all requests to your web servers, whereas tagging only collects information from the pages containing the tag. If the tag is missing from certain pages, those page views will obviously not be counted.
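One way to catch untagged pages is a simple source scan. In this sketch the tag filename `webtrends.js` and the static-file layout are assumptions; adjust both to match your actual deployment.

```python
# Scan static HTML files and report any that do not reference the tag script.
from pathlib import Path

TAG_MARKER = "webtrends.js"  # assumed name of the deployed tag include

def untagged_pages(root: str) -> list[str]:
    missing = []
    for page in Path(root).rglob("*.htm*"):  # matches .htm and .html
        if TAG_MARKER not in page.read_text(errors="ignore"):
            missing.append(str(page))
    return sorted(missing)
```

Running this against the document root before each release gives a quick check that no page view is silently going uncounted.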