IIS Log Files Format, Search Engine Spider in Internet Information Services

Websites’ log files contain both user and search engine spider information for webmasters and SEO professionals to analyze. Log files also have information that web analytics tools (e.g. Google Analytics) aren’t able to provide.

SEO Glossary has a short description of a log file:

A Log file (or web server log) records all information about a website’s incoming and outgoing traffic, including search engine spider activities, that can be analyzed by webmasters to improve a site’s SEO.

What is IIS?

Wikipedia defines “Internet Information Services”:

Internet Information Services (IIS) – formerly called Internet Information Server – is a web server application and set of feature extension modules created by Microsoft for use with Microsoft Windows. It is the most used web server after Apache HTTP Server.

If your website is hosted on Microsoft Internet Information Services (IIS), you should:

  • Log (record) all the requests to your site
  • Extract the information from the log files
  • Review and analyze the log files on a regular basis

IIS Log Format, IIS Log Field Definitions

An IIS log file can have 22 fields. Microsoft’s official IIS log site provides definitions for all the IIS log fields:

==============

EXAMPLE

FIELD – [Field Name]

Definition.

==============

Date – [date]

Date when the request occurs.

Time – [time]

Time (Coordinated Universal Time, UTC) when the request occurs.
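
Because the date and time are logged as two separate fields in UTC, it can help to combine them into one timezone-aware timestamp before analysis. A minimal Python sketch, assuming the YYYY-MM-DD and HH:MM:SS layouts of the W3C log format; the function name and the sample values are illustrative only:

```python
from datetime import datetime, timezone

def parse_iis_timestamp(date_field: str, time_field: str) -> datetime:
    """Combine the IIS [date] and [time] fields into one UTC datetime."""
    naive = datetime.strptime(f"{date_field} {time_field}", "%Y-%m-%d %H:%M:%S")
    # IIS logs the time in UTC, so attach that timezone explicitly.
    return naive.replace(tzinfo=timezone.utc)

# Made-up sample values, not taken from a real log.
stamp = parse_iis_timestamp("2012-03-05", "14:07:33")
print(stamp.isoformat())  # 2012-03-05T14:07:33+00:00
```

From here the timestamp can be converted to a local timezone with astimezone() if your reports are not in UTC.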

Service Name and Instance Number – [s-sitename]

Internet service name and instance number that is running on the client.

Server Name – [s-computername]

Name of the server where the log file entry is generated.

Server IP Address – [s-ip]

Server’s IP.

Method – [cs-method]

The HTTP method used when the web page is requested, e.g. GET or POST.

URI Stem – [cs-uri-stem]

The target of the request (e.g. a file on your site). For example, if the home page of your site is requested: index.asp.

URI Query – [cs-uri-query]

A Uniform Resource Identifier (URI) query is needed only when the client requests dynamic web pages. The URI query (cs-uri-query) contains the portion of the URL after the “?” symbol.
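
To make the split between cs-uri-stem and cs-uri-query concrete, here is a small Python sketch that divides a requested URL the same way. The function name and the sample URL are hypothetical; the hyphen stands for an empty field, as it does elsewhere in IIS logs:

```python
from urllib.parse import urlsplit, parse_qs

def split_stem_and_query(url: str) -> tuple:
    """Return (stem, query), mirroring how the two IIS fields divide a URL."""
    parts = urlsplit(url)
    # Use a hyphen when a part is empty, like an empty IIS log field.
    return parts.path or "-", parts.query or "-"

# Hypothetical request for a dynamic page.
stem, query = split_stem_and_query("/products.asp?id=42&lang=en")
print(stem)             # /products.asp
print(query)            # id=42&lang=en
print(parse_qs(query))  # {'id': ['42'], 'lang': ['en']}
```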

Server Port – [s-port]

Server port number configured for the service.

User Name – [cs-username]

Name of the authenticated user who accesses the server. Anonymous users are indicated by a hyphen (-).

Client IP Address – [c-ip]

Client’s IP.

Protocol Version – [cs-version]

Protocol version used when the request is made by the client. The protocol can be either HTTP or FTP, e.g. HTTP/1.1.

User Agent – [cs(User-Agent)]

User agent is the browser type that the client used.

  • When a human user requests a specific web page (URL) of your site through a browser, the name and version of the browser are recorded.
  • When a search engine spider (e.g. Googlebot) requests a specific web page (URL) of your site, the search engine spider information/name is recorded.

Cookie – [cs(Cookie)]

Content of the cookie sent or received.

Referrer – [cs(Referer)]

Referrer is the website that referred the visit to your site.

  • The IIS log captures the referrer as a referral URL.
  • Web analytics tools (e.g. Google Analytics) also provide referral site / referral URL information.
  • If a user is referred through a search engine’s organic results (e.g. Google’s organic SERP), then the user’s search query will be included in the referrer. Google Webmaster Tools also provides a report of Google users’ search queries.
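
As a rough illustration of pulling a search query out of a logged referrer, the Python sketch below reads one query-string parameter from the referral URL. The parameter name "q" is an assumption (search engines use different parameter names, and a referrer may carry no query at all), and the sample referrer is made up:

```python
from urllib.parse import urlsplit, parse_qs

def search_query_from_referrer(referrer: str, param: str = "q"):
    """Return the search phrase carried in a referrer URL, or None."""
    if referrer in ("", "-"):  # "-" means no referrer was logged
        return None
    values = parse_qs(urlsplit(referrer).query).get(param)
    return values[0] if values else None

# Hypothetical referrer value, not taken from a real log.
ref = "http://www.google.com/search?q=iis+log+format&hl=en"
print(search_query_from_referrer(ref))  # iis log format
```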

Host – [cs-host]

The host header name, e.g. www.gordonchoi.com

HTTP Status – [sc-status]

HTTP status code.

  • Microsoft provides this official list of HTTP Status Codes on 1xx, 2xx, 3xx, 4xx, and 5xx.
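
When reviewing many log rows, it is often enough to group sc-status values by their class first. A minimal Python sketch; the helper name is hypothetical, and anything that is not a three-digit code starting with 1 to 5 is reported as "unknown":

```python
def status_class(sc_status: str) -> str:
    """Map an sc-status value such as '404' to its class, e.g. '4xx'."""
    if len(sc_status) == 3 and sc_status.isdigit() and sc_status[0] in "12345":
        return sc_status[0] + "xx"
    return "unknown"

for code in ("200", "302", "404", "503", "-"):
    print(code, status_class(code))
```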

Protocol Substatus – [sc-substatus]

Substatus error code.

Win32 Status – [sc-win32-status]

Windows status code.

Bytes Sent – [sc-bytes]

Number of bytes the server sends to the client during the request.

Bytes Received – [cs-bytes]

Number of bytes the server receives from the client during the request.

Time Taken – [time-taken]

Length of time the request takes, in milliseconds (ms).
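
The field names defined above are also written into the log file itself, on a header line that starts with #Fields:, so a parser can read them from there instead of hard-coding column positions. The Python sketch below shows one way to do this. The log excerpt is invented for illustration, it uses only a subset of the 22 fields, and the function name is hypothetical:

```python
# Hypothetical excerpt of an IIS W3C log; the values are made up.
SAMPLE_LOG = """\
#Software: Microsoft Internet Information Services 7.5
#Fields: date time cs-method cs-uri-stem cs-uri-query c-ip cs(User-Agent) sc-status time-taken
2012-03-05 14:07:33 GET /index.asp - 192.0.2.10 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200 187
2012-03-05 14:07:41 GET /missing.asp id=7 192.0.2.44 Mozilla/5.0+(Windows+NT+6.1) 404 15
"""

def parse_iis_log(text):
    """Yield one dict per request, keyed by the names in the #Fields line."""
    fields = []
    for line in text.splitlines():
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#") and fields:
            # Values are space separated; spaces inside a value appear as "+".
            yield dict(zip(fields, line.split(" ")))

rows = list(parse_iis_log(SAMPLE_LOG))
print(rows[0]["cs-uri-stem"], rows[0]["sc-status"])    # /index.asp 200
print(rows[1]["cs-uri-query"], rows[1]["time-taken"])  # id=7 15
```

Reading the field list from the header keeps the parser working when a site logs a different selection of fields.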

Log File Analysis

Through the IIS log files:

  • Webmasters can review site and/or web page errors through the HTTP status codes (sc-status). Actions should be taken depending on the status code returned (e.g. 302, 404, 500, 503).
  • Webmasters / SEO professionals can review different search engine spiders’ behavior on the site through the cs(User-Agent) field. After analyzing the search engine spider information, actions may be taken to optimize how Googlebot or other spiders access/crawl your site.
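
Both kinds of review can be sketched in a few lines of Python once the log rows have been parsed. The rows below are invented sample data, and matching "googlebot" as a substring of the user agent is a simplification:

```python
from collections import Counter

# Hypothetical, already-parsed log rows: (cs-uri-stem, sc-status, cs(User-Agent)).
ROWS = [
    ("/index.asp", "200", "Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html)"),
    ("/old-page.asp", "404", "Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html)"),
    ("/old-page.asp", "404", "Mozilla/5.0+(compatible;+bingbot/2.0;++http://www.bing.com/bingbot.htm)"),
    ("/report.asp", "500", "Mozilla/5.0+(Windows+NT+6.1)"),
    ("/index.asp", "200", "Mozilla/5.0+(Windows+NT+6.1)"),
]

# How often each status code was returned.
status_counts = Counter(status for _, status, _ in ROWS)

# URLs that returned a 4xx or 5xx status, with hit counts.
error_urls = Counter(stem for stem, status, _ in ROWS if status[0] in "45")

# Error responses served specifically to Googlebot.
googlebot_errors = [(stem, status) for stem, status, ua in ROWS
                    if "googlebot" in ua.lower() and status[0] in "45"]

print(status_counts)
print(error_urls.most_common())
print(googlebot_errors)  # [('/old-page.asp', '404')]
```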

Tools to analyze IIS log files:

  • Log Parser – You will need to write some specific SQL queries when using Log Parser.
  • Most programming languages (e.g. C, PHP, Python) allow you to apply customized queries to IIS logs, but coding skills/experience in the programming language you choose is required.
  • Nihuo Weblog Analyzer – Allows you to generate a wide range of website statistics from your IIS log file and present them in over 80 different reports, without SQL query writing skills or programming language experience.

Search Engine Spiders in IIS

The behavior of search engine spiders on your Internet Information Services (IIS) hosted website can be found in your site’s IIS log files. In the IIS logs, the cs(User-Agent) field records the search engine spiders.

  • Google’s spider: Googlebot
  • Bing’s spider: msnbot or Bingbot
  • Yahoo’s spider: Slurp
  • Yandex’s spider: YandexBot
  • Baidu’s spider: Baiduspider
  • Sogou’s spider: Sogou+web+spider or New-Sogou-Spider
  • Soso’s spider: Sosospider
  • Youdao’s spider: YoudaoBot
  • Naver’s spider: Yeti
  • Daum’s spider: Daumoa
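
The spider names listed above can be turned into a simple lookup for labeling log rows. The Python sketch below does plain substring matching on the cs(User-Agent) value; the table and function names are hypothetical. Note that a user agent is self-reported by the client, so a match alone does not prove the request came from the real spider:

```python
# Search engine -> lowercase substrings to look for, based on the list above.
SPIDER_TOKENS = {
    "Google": ("googlebot",),
    "Bing": ("msnbot", "bingbot"),
    "Yahoo": ("slurp",),
    "Yandex": ("yandexbot",),
    "Baidu": ("baiduspider",),
    "Sogou": ("sogou+web+spider", "new-sogou-spider"),
    "Soso": ("sosospider",),
    "Youdao": ("youdaobot",),
    "Naver": ("yeti",),
    "Daum": ("daumoa",),
}

def identify_spider(user_agent: str):
    """Return the search engine whose spider token appears in the user agent."""
    ua = user_agent.lower()
    for engine, tokens in SPIDER_TOKENS.items():
        if any(token in ua for token in tokens):
            return engine
    return None  # probably a browser or an unlisted bot

print(identify_spider("Mozilla/5.0+(compatible;+bingbot/2.0;++http://www.bing.com/bingbot.htm)"))  # Bing
print(identify_spider("Mozilla/5.0+(Windows+NT+6.1)"))  # None
```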

The spiders of each search engine appear as follows in the IIS log’s cs(User-Agent) field.

Google’s Search Engine Spiders

Google Web Search:

  • Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html)

Google Image:

  • Googlebot-Image/1.0

Google Feed:

  • Feedfetcher-Google;+(+http://www.google.com/feedfetcher.html;+feed-id=xxxxx)

Google Ads:

  • AdsBot-Google+(+http://www.google.com/adsbot.html)
  • Mediapartners-Google

Google Ads for Mobile:

  • AdsBot-Google-Mobile+(+http://www.google.com/mobile/adsbot.html)+Mozilla+(iPhone;+U;+CPU+iPhone+OS+3+0+like+Mac+OS+X)+AppleWebKit+(KHTML,+like+Gecko)+Mobile+Safari

Bing Search Engine Spiders

Bing Web Search:

  • Mozilla/5.0+(compatible;+bingbot/2.0;++http://www.bing.com/bingbot.htm)
  • msnbot/2.0b+(+http://search.msn.com/msnbot.htm)

Bing Ads:

  • msnbot-media/1.1+(+http://search.msn.com/msnbot.htm)

Yahoo Search Engine Spider

Yahoo Slurp Web Search:

  • Mozilla/5.0+(compatible;+Yahoo!+Slurp/3.0;+http://help.yahoo.com/help/us/ysearch/slurp)
  • Mozilla/5.0+(compatible;+Yahoo!+Slurp;+http://help.yahoo.com/help/us/ysearch/slurp)

Baidu Search Engine Spiders

Baidu Web Search:

  • Baiduspider+(+http://www.baidu.com/search/spider.htm)
  • Mozilla/5.0+(compatible;+Baiduspider/2.0;++http://www.baidu.com/search/spider.html)

Baidu Video Search:

  • Baiduspider-video+(+http://www.baidu.com/search/spider.htm)

You can block Baiduspider’s access to your web pages by writing instructions for Baiduspider in your site’s robots.txt.
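
Before publishing such rules, you can check what they would allow. The Python sketch below feeds a hypothetical pair of robots.txt lines, which keep Baiduspider out of the whole site, to the standard library's robots.txt parser. The rules and the tested path are examples only, and real crawlers may interpret robots.txt slightly differently:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules: keep Baiduspider out of the whole site.
ROBOTS_TXT = """\
User-agent: Baiduspider
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("Baiduspider", "/index.asp"))  # False
print(parser.can_fetch("Googlebot", "/index.asp"))    # True
```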

Sogou’s Search Engine Spider

Sogou Web Search:

  • New-Sogou-Spider/1.0+(compatible;+MSIE+5.5;+Windows+98)
  • Sogou+web+spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)

Soso’s Search Engine Spider

Soso Web Search:

  • Sosospider+(+http://help.soso.com/webspider.htm)

Yahoo China’s Search Engine Spider

Yahoo Slurp China:

  • Yahoo!+Slurp+China

Youdao Search Engine Spider

Youdao Web Search:

  • Mozilla/5.0+(compatible;+YoudaoBot/1.0;+http://www.youdao.com/help/webmaster/spider/;+)

Yandex’s Search Engine Spider

Yandex Web Search:

  • Mozilla/5.0+(compatible;+YandexBot/3.0;++http://yandex.com/bots)

Naver’s Search Engine Spider

Naver Web Search:

  • Yeti/1.0+(NHN+Corp.;+http://help.naver.com/robots/)

Daum’s Search Engine Spider

Daum Web Search:

  • Mozilla/5.0+(compatible;+MSIE+or+Firefox+mutant;+not+on+Windows+server;++http://ws.daum.net/aboutWebSearch.html)+Daumoa/2.0
