Monday, December 19, 2011

Comparing hostip and geoip

Which is the best system for geolocating IP addresses?

Seeking to analyze web log files by the country the requests come from, I recently proposed a simple plugin wrapping the hostip.info API. Of course, the major competitor in the field is MaxMind GeoIP, and they provide a free, lighter version of their database. This database can be queried through their Java API or a geoip Grails plugin.
The time has come for numbers. How does each module perform on the same set of IP addresses? Even with 3% of uncertain cases, MaxMind GeoIP Lite clearly proves to be the best.
The test
I collected 22647 unique IP addresses from web log files (the remoteIP HTTP field) and submitted each of them to both libraries. For both, I took the latest version available on December 12th, 2011: the hostip database and the free version of MaxMind GeoIP, GeoLiteCity.
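For illustration, here is a minimal Python sketch of how such a list of unique addresses could be built. It is not the code used for this test: it assumes a Common Log Format style file where the remote IP is the first whitespace-separated field, and the function name unique_ips is made up for the example.

```python
import ipaddress


def unique_ips(log_lines):
    """Return the sorted unique remote IPs found in the first field of each line.

    Assumption: the remote IP is the first whitespace-separated field,
    as in the Common Log Format. Adapt the field index to your own logs.
    """
    seen = set()
    for line in log_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        try:
            ip = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # first field is a hostname or garbage, not an IP
        seen.add(str(ip))
    return sorted(seen)
```

With a real file, something like `unique_ips(open("access.log"))` would do, the file name being of course a placeholder.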
The results
The resolution rate (how often a location is returned for an IP address) is 100% for GeoIP Lite, whereas it is only 28.7% for hostip. I made a couple of checks with the web application to make sure there was no obvious bug in my code.
Both systems give the same answer in 25.5% of all cases. This means that the answers conflict in 3.1% of the cases (712 addresses). Here are a few examples where the two systems give different results (these were randomly drawn out of the 712):
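To make the definition concrete, the resolution rate can be sketched in a few lines of Python. This assumes the lookups have already been collected in a dictionary mapping each IP to a country code, with None (or an empty string) when the library returned nothing. The name resolution_rate and that data layout are assumptions for the example, not part of either library.

```python
def resolution_rate(lookups):
    """Fraction of IPs for which a country was returned.

    lookups: dict mapping IP -> country code, or None/"" when unresolved
    (a hypothetical layout chosen for this sketch).
    """
    if not lookups:
        return 0.0
    resolved = sum(1 for country in lookups.values() if country)
    return resolved / len(lookups)
```

Multiplying the result by 100 gives percentages comparable to the ones quoted above.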
ip              geoip hostip
141.0.8.191     US    AU
201.240.195.232 PE    AU
201.240.242.245 PE    CA
209.184.116.225 US    GB
209.85.226.81   US    BE
24.65.2.74      CA    CN
41.201.33.106   DZ    AF
78.220.216.13   FR    DE
78.224.73.14    FR    DE
80.239.243.104  PL    PH
95.122.191.231  ES    DE
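The agreement and conflict figures can be obtained along the following lines. Again, this is only a hedged Python sketch, not the original code: it assumes two dictionaries mapping IP to country code (None when unresolved), and the names compare and sample_conflicts are invented for the illustration. Only addresses resolved by both systems can agree or conflict.

```python
import random


def compare(geoip, hostip):
    """Split the IPs resolved by both systems into agreements and conflicts."""
    agree, conflict = [], []
    for ip, g in geoip.items():
        h = hostip.get(ip)
        if not g or not h:
            continue  # unresolved by at least one system: not comparable
        (agree if g == h else conflict).append(ip)
    return agree, conflict


def sample_conflicts(conflict, k, seed=0):
    """Draw up to k conflicting IPs at random (seeded, so reproducible)."""
    rng = random.Random(seed)
    return sorted(rng.sample(conflict, min(k, len(conflict))))


# Two rows from the table above, as a tiny usage example.
geoip = {"141.0.8.191": "US", "78.220.216.13": "FR"}
hostip = {"141.0.8.191": "AU", "78.220.216.13": "DE"}
agree, conflict = compare(geoip, hostip)
print(len(agree), len(conflict))
```

The conflict rate is then the number of conflicts divided by the total number of addresses: with the figures of this test, 712 / 22647 rounds to 3.1%.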

Conclusions
hostip.info, being a community-based initiative, had my first vote, but I was clearly confronted with its nearly three quarters of unresolved IP addresses. This observation pushed me towards the MaxMind solution (thanks for providing a subset of your library for free), and the gain is quite clear. I have not put any real effort into trying to resolve the 3.1% of conflicting answers. However, as these IPs come from search engine logs, I had a look at a few queries where the language clearly distinguishes between the two candidate countries (CA/CN or FR/DE, for example). In the few lines I checked, GeoIP was the better guess.
Without much surprise at this stage, I would say the MaxMind solution is clearly the best today. But there is certainly some room for a community-based solution.
