I'm using Nominatim to geocode natural-language location descriptions for a research project. I spent some time reading through the source code (in particular, website/search.php), but I can't make heads or tails of how the "importance" score is calculated.

From what I can tell, there is a baseline calculation followed by numerous tweaks. One line, for example, says:

$aResult['importance'] = $aResult['importance'] + ($iCountWords*0.1); // 0.1 is a completely arbitrary number but something in the range 0.1 to 0.5 would seem right

I also noticed in the documentation that Nominatim uses Wikipedia to improve the ranking of results, but once again I found nothing more specific than "the importance value is calculated as log(totalcount)/log(max totalcount)." I assume that "totalcount" is the number of internal links to an article about a specific location in the result set, and "max totalcount" is the maximum of that value across the entire result set. Even so, this only tells me the scoring contribution from Wikipedia, not how the baseline score is calculated.
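To make my reading of that formula concrete, here is a small Python sketch. The function name, the parameter names, and the handling of zero or degenerate counts are my own assumptions, not Nominatim's code.

```python
import math


def wikipedia_importance(total_count: int, max_total_count: int) -> float:
    """log(totalcount) / log(max totalcount), as quoted from the documentation."""
    if total_count < 1 or max_total_count < 2:
        # log(0) is undefined and log(1) == 0 would cause a division by zero.
        # Returning 0.0 in these cases is an assumption made for this sketch.
        return 0.0
    return math.log(total_count) / math.log(max_total_count)
```

Under this reading, an article with 100 incoming links scores about 0.5 when the maximum is 10,000, so the logarithm compresses large differences in link counts into a 0 to 1 range.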

My question is, what properties of the OSM data go into the calculation, and then how is the importance score actually calculated? What special tweaks and thresholds should I be aware of?

asked 03 Aug '13, 16:36

aweissman


For in-depth technical discussion of Nominatim, you'd be better off asking on the geocoding mailing list:

http://lists.openstreetmap.org/listinfo/geocoding

answered 03 Aug '13, 18:01

Andy Allan

From the mailing list: https://lists.openstreetmap.org/pipermail/geocoding/2013-August/000916.html

Most of the importance score does indeed come from the Wikipedia link count. If no article can be found for an object, the base score is derived from the object's rank (country, county, city, etc.).

There are a few minor tweaks to this Wikipedia importance. The one you found, in the line you cited above, is the reranking by exactness of match with the query. The more words from the query appear verbatim in the display name of the result (that's the one including the address), the higher it gets ranked.
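As a rough illustration of this exactness reranking, a Python sketch could look like the following. The 0.1 weight comes from the search.php line quoted in the question; the function names, the case-insensitive whole-word matching, and the tokenization are assumptions made for the example, and Nominatim may split and compare words differently.

```python
import re


def count_matching_words(query: str, display_name: str) -> int:
    """Count distinct query words that also occur as whole words in display_name."""
    # Assumption: words are runs of word characters, compared case-insensitively.
    name_words = set(re.findall(r"\w+", display_name.lower()))
    query_words = set(re.findall(r"\w+", query.lower()))
    return len(query_words & name_words)


def rerank_by_exactness(importance: float, query: str, display_name: str,
                        weight: float = 0.1) -> float:
    """Add `weight` per matching word, mirroring importance + iCountWords * 0.1."""
    return importance + count_matching_words(query, display_name) * weight
```

With these assumptions, a query of "Berlin Germany" against the display name "Berlin, Germany" matches two words, so a base importance of 0.5 would rise to roughly 0.7.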

The second reranking is related to the viewbox. If you supply a viewbox parameter, then anything within or close to the viewbox is ranked higher. (e.g. https://github.com/twain47/Nominatim/blob/master/website/search.php#L976)

There is also a small tweak that takes the importance of the address members into account, but it only has an effect when objects have equal importance. (e.g. https://github.com/twain47/Nominatim/blob/master/website/search.php#L1241)
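One way to picture a tie-break of this kind is a sort on a pair of keys, where the second key only matters when the first is equal. The Python sketch below is purely illustrative: the `Result` type and its `address_importance` field are hypothetical names, and the linked search.php may implement the tweak differently.

```python
from typing import List, NamedTuple


class Result(NamedTuple):
    name: str
    importance: float
    address_importance: float  # hypothetical: combined importance of address members


def rank_results(results: List[Result]) -> List[Result]:
    """Sort by importance, using address_importance only to break exact ties."""
    return sorted(results,
                  key=lambda r: (r.importance, r.address_importance),
                  reverse=True)
```

Because tuples compare element by element, a result with higher importance always ranks first, no matter how its address members score.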

answered 04 Dec, 14:10

Potdeyaourt

edited 04 Dec, 14:10
