I'm currently downloading OSM data of the Netherlands, with the plan to import and query the data in MongoDB. However, I have a very specific use-case. I need to do a full-text search of an address string, which can contain anything from the street address to the postal code, (sub-)district, province, state, country, etc. For example, I would like to search for any of the following strings in the database:
In all of these cases, the "address" that best matches all of these searches would be something like the following:
The problem is that OSM data isn't set up like this, i.e. there is no single record carrying the full address hierarchy. So, should I restructure the data to be able to perform such searches? Or is there another way of achieving such geocoding? If anything is unclear, please ask for further elaboration.

P.S. I know about Nominatim, but I have to run this locally. Nominatim's local installation requires a server with 32GB+ of RAM for the planet, which is why I would like to try to implement my specific needs using MongoDB, in the hope of building a less resource-intensive application.

asked 08 May '14, 03:29 TomM
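One way to restructure the data, along the lines the question suggests, is to denormalize each OSM element into a single document that concatenates all of its address tags into one searchable string, then score free-text queries by token overlap. A minimal pure-Python sketch of that idea (the tag selection and the scoring are illustrative assumptions, not any OSM or Nominatim scheme):

```python
# Sketch: denormalize OSM address tags into one searchable string,
# then rank documents by how many query tokens they contain.
# The tag names follow common OSM "addr:*" tagging; the scoring is a
# naive illustration, not what Nominatim actually does.

def flatten_address(tags):
    """Join whatever address tags are present into one lowercase string."""
    order = ["addr:housenumber", "addr:street", "addr:postcode",
             "addr:city", "addr:province", "addr:country"]
    return " ".join(tags[k] for k in order if k in tags).lower()

def search(docs, query):
    """Rank denormalized documents by query-token overlap (best first)."""
    q_tokens = set(query.lower().split())
    scored = []
    for doc in docs:
        haystack = flatten_address(doc)
        score = sum(1 for t in q_tokens if t in haystack)
        if score:
            scored.append((score, doc))
    return [doc for score, doc in sorted(scored, key=lambda p: -p[0])]

# Hypothetical example documents:
docs = [
    {"addr:street": "Dorpsstraat", "addr:housenumber": "1",
     "addr:postcode": "1234 AB", "addr:city": "Amsterdam",
     "addr:country": "Netherlands"},
    {"addr:street": "Kerkstraat", "addr:city": "Utrecht",
     "addr:country": "Netherlands"},
]

results = search(docs, "Dorpsstraat Amsterdam")
```

In MongoDB itself, a text index on the flattened field (e.g. `db.addresses.createIndex({search: "text"})` and querying with `$text`) would play the role of the naive `search` function above, but note that this only gives keyword matching, not the hierarchical ranking a real geocoder does.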
Nominatim doesn't require 32 GB of RAM; you can run it on a machine with much less. The weakest point is the initial data loading, which greatly benefits from having more RAM (for caching); if you run an initial data load on a machine with, say, only 4 GB of RAM and no SSDs, then it will very likely take between two and four weeks to complete.

Having said that, the data import and using the data are two separate issues; you could, for example, run the import on a high-memory rented server somewhere, and then download a database dump to your local low-memory machine for using it. There's already a geocoder based on Apache Solr which uses only the data import part of Nominatim and builds its own search on top of that.

Nominatim contains a lot of logic to try and build an address hierarchy like the one you're after, and it is not very likely that you will be able to re-invent a better version of this wheel without spending a huge amount of time. "MongoDB vs Postgres" is probably just a side issue - I should be very surprised if building a proper hierarchy from the whole planet's data were any faster on a MongoDB-powered system than on a Postgres-powered one.

answered 08 May '14, 08:26 Frederik Ramm ♦

Interesting, so if I rent a 32GB system and then transfer the resulting database to another system with fewer resources, what kind of system would you recommend? E.g. would 2GB of RAM be enough? And would the requirements be lower if I used the Apache Solr-based geocoder?
(08 May '14, 09:16)
TomM
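The import-then-transfer workflow described in the answer could look roughly like the following. Nominatim stores its data in PostgreSQL, so the standard pg_dump/pg_restore tools apply; the database name, output path, and job count here are illustrative assumptions:

```python
# Sketch of the "import on a big machine, use on a small one" workflow.
# The database name, dump path, and parallelism are illustrative
# assumptions; adjust them to your own setup.
import shlex

def dump_command(db="nominatim", out="/tmp/nominatim.dump"):
    # Custom-format dump (-Fc) is compressed and restorable with pg_restore.
    return ["pg_dump", "-Fc", "-f", out, db]

def restore_command(db="nominatim", dump="/tmp/nominatim.dump", jobs=2):
    # -j parallelizes the restore; keep it low on a low-memory machine.
    return ["pg_restore", "-d", db, "-j", str(jobs), dump]

# Run dump_command() on the high-memory server, copy the dump file over,
# then run restore_command() on the local low-memory machine.
print(shlex.join(dump_command()))
print(shlex.join(restore_command()))
```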
The usual advice is to start with a small extract, experiment with it, and progressively try bigger extracts until you tackle the whole planet.
(08 May '14, 21:57)
Vincent de P... ♦
I don't know about the requirements of the Solr based geocoder. 2 GB of RAM would certainly be enough to run Nominatim but I can't say how fast it would be; it could take a few seconds to resolve a query.
(09 May '14, 08:57)
Frederik Ramm ♦
Hello Tom, maybe you can have a look at "Pelias", which was recently added to the OSM overview page: http://wiki.openstreetmap.org/wiki/Search_engines
(09 May '14, 15:10)
stephan75