You are considering moving to SmartSource Data Collector (SDC) logs and currently use a web log format such as IIS or Apache. To prepare for this move, you need to know how much disk space will be required.

The minimum requirements for the SDC state that you should expect about 200 MB of log data for every 1 million page views collected.
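
As a rough illustration of that guideline, the following Python sketch turns a page-view count into a disk-space estimate. The function name and the 2.5 million page-view figure are hypothetical and used only for the example; the 200 MB per million page views comes from the minimum requirements quoted above.

```python
# Guideline from the SDC minimum requirements quoted above.
MB_PER_MILLION_PAGE_VIEWS = 200


def estimate_sdc_disk_mb(page_views: int) -> float:
    """Return the approximate SDC log size in MB for a number of page views."""
    return page_views / 1_000_000 * MB_PER_MILLION_PAGE_VIEWS


# Hypothetical traffic figure: 2.5 million page views per month.
monthly_mb = estimate_sdc_disk_mb(2_500_000)
print(f"Estimated SDC log volume: {monthly_mb:.0f} MB per month")
```

Multiply the result by the number of months of logs you plan to retain to get a first approximation of the total space needed.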


Many things can affect the size of web log files, including the number of fields enabled, the amount of data in each field, and how much traffic is being recorded. SDC logs have a consistent format, with the possible addition of some fields such as the Authenticated Username field. As a result, it’s difficult to make an even comparison of the two.

To test the differences, two log files of 50 hits each were generated. The first was an IIS-formatted web log with the typical fields enabled and populated. This particular log did not have cookies or authenticated user names in the hits, as many customers do not have these configured. The second was an SDC log containing the typical fields, including a populated cookie field and a query string with many of the possible SDC parameters in it.

The IIS logs in this test averaged 12 KB.

The SDC logs in this test averaged 64 KB.

This works out to the IIS web logs containing roughly 20% of the amount of raw data in the SDC logs. While at first glance this seems like a reasonable assessment, we must take into account the type of data in the logs.
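
The raw comparison can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: it assumes the 12 KB and 64 KB figures are the sizes of the two 50-hit test files, and the variable names are made up for the example.

```python
# Figures from the test described above.
HITS_PER_TEST_LOG = 50
IIS_TEST_LOG_KB = 12
SDC_TEST_LOG_KB = 64

# Share of raw data in the IIS log relative to the SDC log.
iis_to_sdc_ratio = IIS_TEST_LOG_KB / SDC_TEST_LOG_KB

# Average size of a single hit in each format.
iis_kb_per_hit = IIS_TEST_LOG_KB / HITS_PER_TEST_LOG
sdc_kb_per_hit = SDC_TEST_LOG_KB / HITS_PER_TEST_LOG

print(f"IIS log size relative to SDC log: {iis_to_sdc_ratio:.2%}")
print(f"IIS: {iis_kb_per_hit:.2f} KB per hit, SDC: {sdc_kb_per_hit:.2f} KB per hit")
```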

Probably the largest thing to account for is images. The web logs would likely contain a large number of hits, one for each picture on your page. In the SDC logs, however, you would likely see a single hit for each page load. If we assume there are 10 images for each web page (not an unreasonable number for all the buttons and graphics), then the SDC logs would contain roughly 10 percent of the hits found in a web log. Ten percent of 64 KB is about 6.4 KB, which brings the size of an SDC log to about 7 KB for the same traffic, compared with the 12 KB IIS web log.
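
The image adjustment can be sketched as follows. The 10 percent share is the round figure used in the text; the stricter "one page hit plus ten image hits" count is an extra assumption added here only to show how sensitive the result is to the way hits are counted.

```python
# Figures from the test and the image assumption described above.
SDC_TEST_LOG_KB = 64
IMAGES_PER_PAGE = 10

# Round figure used in the text: SDC records about 10 percent of the hits.
sdc_hit_share = 0.10
sdc_same_traffic_kb = SDC_TEST_LOG_KB * sdc_hit_share

# Stricter count (an assumption beyond the text): each page view produces
# one page hit plus one hit per image in the web log, so SDC sees 1 in 11.
strict_hit_share = 1 / (1 + IMAGES_PER_PAGE)
strict_same_traffic_kb = SDC_TEST_LOG_KB * strict_hit_share

print(f"SDC size at 10% of hits: {sdc_same_traffic_kb:.1f} KB")
print(f"SDC size at 1 in 11 hits: {strict_same_traffic_kb:.1f} KB")
```

Either way the SDC log lands in the 6 to 7 KB range, below the 12 KB IIS log.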

Next is spider and robot traffic. One of the advantages of SDC logs is that spiders do not typically run JavaScript, so your SDC logs may contain far fewer hits. The difference can vary from 10% to 30% depending on how much spider traffic your site receives. If we assume the web log grows by about 20% because of this traffic, its size rises from 12 KB to about 15 KB, compared with the SDC’s 7 KB.
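
To see how the spider assumption changes the web log size across the 10% to 30% range, a small sketch along these lines may help. The function name is hypothetical; the 12 KB starting point is the IIS test log from above.

```python
# Size of the IIS test log before spider traffic is considered.
IIS_TEST_LOG_KB = 12


def weblog_with_spiders_kb(base_kb: float, spider_increase: float) -> float:
    """Grow the web log size by the share of extra hits caused by spiders."""
    return base_kb * (1 + spider_increase)


# Low, middle and high ends of the range mentioned in the text.
for spider_increase in (0.10, 0.20, 0.30):
    size_kb = weblog_with_spiders_kb(IIS_TEST_LOG_KB, spider_increase)
    print(f"{spider_increase:.0%} spider traffic: {size_kb:.1f} KB web log")
```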

As you can see, although the SDC records more information per hit than an IIS log, it’s quite possible for the SDC log to be significantly smaller. Keeping these factors and the minimum requirements in mind, you can use the size of your current web logs to approximate the disk space that SDC logs will require for your site.
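
Putting the pieces together, the sketch below estimates SDC log size from an existing web log size. It is only an approximation built from the test figures in this article; the function name, parameter names, and defaults are assumptions for illustration, and your own field configuration, image counts, and spider traffic may differ considerably.

```python
def estimate_sdc_size(
    weblog_size: float,
    sdc_to_iis_per_hit: float = 64 / 12,  # per-hit size ratio from the 50-hit test
    sdc_hit_share: float = 0.10,          # SDC records about 1 hit per 10 web log hits
    spider_increase: float = 0.20,        # web log growth attributed to spiders
) -> float:
    """Approximate SDC log size in the same unit as weblog_size."""
    # Remove the portion of the web log attributed to spiders and robots.
    without_spiders = weblog_size / (1 + spider_increase)
    # Keep only the page-level hits, then scale for the larger SDC hit size.
    return without_spiders * sdc_hit_share * sdc_to_iis_per_hit


# Hypothetical example: 10 GB of web logs per month.
print(f"Estimated SDC logs: {estimate_sdc_size(10):.1f} GB per month")
```

With the article's defaults, the SDC logs come out a little under half the size of the web logs. Compare that result against the 200 MB per million page views guideline and plan for the larger of the two.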