To analyze web log files by the country each request came from, I recently proposed a simple plugin wrapping the hostip.info API. The major competitor in the field is, of course, MaxMind GeoIP, which provides a free, lighter version of its database. This database can be queried through MaxMind's Java API or through a geoip Grails plugin.
The time has come for numbers. How does each module perform on the same set of IP addresses? Even with about 3% of uncertain cases, MaxMind GeoIP Lite is clearly the better of the two.
I extracted 22647 unique IP addresses from web log files (the remoteIP HTTP field) and submitted all of them to both libraries. For each library, I took the latest version available on December 12th, 2011: the hostip database and the free version of MaxMind GeoIP, GeoLiteCity.
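Collecting the unique addresses can be sketched as follows. This is only an illustration, not the code I actually ran: it assumes the remote IP is the first whitespace-separated field of each log line (as in the common log format), and the function name `unique_remote_ips` is made up for the example.

```python
import ipaddress
from typing import Iterable, Set


def unique_remote_ips(log_lines: Iterable[str]) -> Set[str]:
    """Collect the distinct, syntactically valid IPs found in the first field.

    Assumption: the remote IP is the first whitespace-separated token
    of each line. Adapt the field index to your own log format.
    """
    seen: Set[str] = set()
    for line in log_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        candidate = fields[0]
        try:
            ipaddress.ip_address(candidate)  # raises ValueError if not an IP
        except ValueError:
            continue  # a hostname or garbage, not an IP address
        seen.add(candidate)
    return seen
```

Calling it on an open file handle (`unique_remote_ips(open("access.log"))`, with a hypothetical file name) returns the set of addresses to submit to each library.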
The resolution rate (how often a location is returned for an IP address) is 100% for GeoIP Lite, but only 28.7% for hostip. I ran a couple of checks against the web application to make sure there was no obvious bug in my code.
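For clarity, here is how the resolution rate can be computed. It is a minimal sketch under one assumption: the results of each library have already been gathered in a dictionary mapping every IP to a country code, with `None` (or an empty string) when nothing was returned. The name `resolution_rate` is hypothetical.

```python
from typing import Dict, Optional


def resolution_rate(results: Dict[str, Optional[str]]) -> float:
    """Percentage of IPs for which a country code was returned."""
    if not results:
        return 0.0
    # None and "" both count as "no location given back"
    resolved = sum(1 for country in results.values() if country)
    return 100.0 * resolved / len(results)
```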
Both systems give the same answer in 25.5% of all cases. This means that the answers conflict in 3.1% of cases. Here are a few examples where the two systems give different results (randomly drawn from the 712 conflicting addresses):
ip               geoip  hostip
220.127.116.11   US     AU
18.104.22.168    PE     AU
22.214.171.124   PE     CA
126.96.36.199    US     GB
188.8.131.52     US     BE
184.108.40.206   CA     CN
220.127.116.11   DZ     AF
18.104.22.168    FR     DE
22.214.171.124   FR     DE
126.96.36.199    PL     PH
188.8.131.52     ES     DE
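The agreement and conflict figures, and a random draw like the one above, can be obtained along these lines. Again this is a sketch, not my original code: it assumes two dictionaries mapping IP to country code (`None` when unresolved), and `compare` and its parameters are names invented for the example. With the numbers above, 712 conflicts out of 22647 addresses gives about 3.1%.

```python
import random
from typing import Dict, List, Optional, Tuple

Conflict = Tuple[str, str, str]  # (ip, geoip country, hostip country)


def compare(
    geoip: Dict[str, Optional[str]],
    hostip: Dict[str, Optional[str]],
    sample_size: int = 10,
    seed: int = 0,
) -> Tuple[float, float, List[Conflict]]:
    """Return (% agreement, % conflict, random sample of conflicts).

    Percentages are relative to all submitted IPs. An IP counts as a
    conflict only when both libraries answer and the answers differ.
    """
    all_ips = sorted(set(geoip) | set(hostip))
    agree = 0
    conflicts: List[Conflict] = []
    for ip in all_ips:
        a, b = geoip.get(ip), hostip.get(ip)
        if not a or not b:
            continue  # unresolved by at least one library
        if a == b:
            agree += 1
        else:
            conflicts.append((ip, a, b))
    total = len(all_ips)
    if total == 0:
        return 0.0, 0.0, []
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    sample = rng.sample(conflicts, min(sample_size, len(conflicts)))
    return 100.0 * agree / total, 100.0 * len(conflicts) / total, sample
```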
Since hostip.info comes from a community-based initiative, it had my first vote, but I was confronted with the nearly three quarters of IP addresses it left unresolved. This observation pushed me towards the MaxMind solution (thanks for providing part of your library for free), and the gain is quite clear. I have not put any real effort into resolving the 3.1% of conflicting answers. However, as these IPs come from search engine logs, I looked at a few queries whose language clearly distinguishes between the two candidate countries (CA/CN or FR/DE, for example). In the few lines I checked, geoip made the better guess.
Without much surprise at this stage, I would say the MaxMind solution is clearly the best today. But there is certainly room for a community-based solution.