I'm currently downloading OSM data of the Netherlands, with the plan to import and query the data in MongoDB.

However, I have a very specific use-case. I need to do a full-text search of an address string, which can contain anything from the street address to the postal code, (sub-)district, province, state, country, etc. For example, I would like to search for any of the following strings in the database:

  • Grand Parkview Asoke Flr. 11, Unit 444/12 Sukhumvit 21 10110 Bangkok
  • Grand Parkview Asoke, Sukhumvit
  • Sukhumvit 21 Bangkok
  • etc.

In all of these cases, the "address" that best matches every one of these searches would be something like the following:

{
    country: "Thailand",
    city: "Bangkok",
    city_district: "Watthana District",
    neighborhood: "Sukhumvit",
    postal_code: 10110,
    street: "Sukhumvit"
}

The problem is that OSM data isn't set up like this, i.e. there is no node that contains all of this information. Rather, there are different nodes such as streets, districts, neighborhoods etc. that are linked with each other in one way or another. My question is how I can search for the entire address string, like "Grand Parkview Asoke Flr. 11 Unit 444/12 Sukhumvit 21 10110 Bangkok", to determine what other information we know about this address. For example, in this case we know that this address is in Thailand.

So, should I restructure the data to be able to perform such searches? Or would there be another way of achieving such geocoding?

If anything is unclear, please ask for further elaboration.

P.S. I know about Nominatim, but I have to run this locally. A local Nominatim installation requires a server with 32 GB+ of RAM for the planet, which is why I would like to try to implement my specific needs with MongoDB, in the hope of ending up with a less resource-intensive application.

asked 08 May '14, 03:29

TomM

edited 08 May '14, 03:31


Nominatim doesn't require 32 GB of RAM; you can run it on a machine with much less. The weakest point is the initial data loading which greatly benefits from having more RAM (for caching); if you run an initial data load on a machine with, say, only 4 GB of RAM and no SSDs then it will very likely take between two and four weeks to complete.

Having said that, the data import and using the data are two separate issues; you could for example run the import on a high-memory rented server somewhere, and then download a database dump to your local low-memory machine for using it. There's already a geocoder based on Apache Solr which uses only the data import part of Nominatim and builds its own search on top of that.
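For the "using the data" half, a free-text lookup against a local Nominatim install comes down to one HTTP request. Below is a minimal sketch that only builds the request URL. The base URL `http://localhost/nominatim` and the helper name `build_search_url` are assumptions for illustration, and the exact path may differ by Nominatim version (older installs expose `search.php`). `q`, `format`, `addressdetails` and `limit` are standard Nominatim search parameters; `addressdetails=1` asks for the address breakdown (country, city, and so on) in the result.

```python
from urllib.parse import urlencode

# Assumption: base URL of a locally hosted Nominatim instance (hypothetical).
BASE_URL = "http://localhost/nominatim"


def build_search_url(query, base_url=BASE_URL, limit=1):
    """Build a free-text search URL that also requests the address breakdown."""
    params = {
        "q": query,            # the free-text address string
        "format": "json",      # ask for a JSON response
        "addressdetails": 1,   # include country, city, etc. in each result
        "limit": limit,        # maximum number of results
    }
    return f"{base_url}/search?{urlencode(params)}"


url = build_search_url("Sukhumvit 21 Bangkok")
print(url)
```

Fetching that URL with any HTTP client should return a list of candidate places, each with an `address` object when `addressdetails=1` is set.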

Nominatim contains a lot of logic to try and build an address hierarchy like the one you're after, and it is not very likely that you will be able to re-invent a better version of this wheel without spending a huge amount of time. "MongoDB vs Postgres" is probably just a side issue - I should be very surprised if building a proper hierarchy from the whole planet's data were any faster on a MongoDB powered system than on a Postgres powered one.
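To make the "address hierarchy" idea concrete, here is a toy sketch of the general approach: follow the links between places to flatten each one into a single address document, then rank those documents by how many query tokens they share. This is not Nominatim's algorithm. The records, the `parent` field, and all function names are made up for illustration, and real OSM data needs far more work (geometry-based containment, name variants, postcodes, ranking).

```python
import re

# Hypothetical, hand-made records standing in for linked OSM-derived places.
# Each record points to its parent; none carries the full address on its own.
PLACES = {
    1: {"type": "country", "name": "Thailand", "parent": None},
    2: {"type": "city", "name": "Bangkok", "parent": 1},
    3: {"type": "city_district", "name": "Watthana District", "parent": 2},
    4: {"type": "neighborhood", "name": "Sukhumvit", "parent": 3},
    5: {"type": "street", "name": "Sukhumvit 21", "parent": 4},
    6: {"type": "city", "name": "Chiang Mai", "parent": 1},
}


def flatten(place_id, places=PLACES):
    """Walk the parent chain and collect one flat address document."""
    doc = {}
    while place_id is not None:
        place = places[place_id]
        doc[place["type"]] = place["name"]
        place_id = place["parent"]
    return doc


def tokens(text):
    """Lower-case the text and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def score(query, doc):
    """Count query tokens that appear anywhere in the flattened document."""
    return len(tokens(query) & tokens(" ".join(doc.values())))


def best_match(query, places=PLACES):
    """Return the flattened document sharing the most tokens with the query."""
    docs = [flatten(pid, places) for pid in places]
    return max(docs, key=lambda d: score(query, d))


result = best_match(
    "Grand Parkview Asoke Flr. 11, Unit 444/12 Sukhumvit 21 10110 Bangkok"
)
print(result["country"])
```

The flattened documents are the kind of record you could store and index for text search; the hard part, as noted above, is deriving the parent links reliably from raw OSM data.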

answered 08 May '14, 08:26

Frederik Ramm ♦

Interesting, so if I rent a 32 GB system and then transfer the resulting database to another system with fewer resources, what kind of system would you recommend? E.g. would 2 GB of RAM be enough? And would the requirements be lower if I use the Apache Solr based geocoder?

(08 May '14, 09:16) TomM

The usual advice is to start with a small extract, experiment with it, and progressively try bigger extracts until you tackle the whole planet.

(08 May '14, 21:57) Vincent de P... ♦

I don't know about the requirements of the Solr based geocoder. 2 GB of RAM would certainly be enough to run Nominatim but I can't say how fast it would be; it could take a few seconds to resolve a query.

(09 May '14, 08:57) Frederik Ramm ♦

Hello Tom, maybe you can have a look at "Pelias", which has now been added to the OSM overview page: http://wiki.openstreetmap.org/wiki/Search_engines

(09 May '14, 15:10) stephan75

question asked: 08 May '14, 03:29

question was seen: 6,350 times

last updated: 09 May '14, 15:10