Hi, I have successfully installed Nominatim with a single Postgres database in my infrastructure. However, our infrastructure team has informed me that the planet database is too large, and they recommend working with smaller databases for better performance.

My initial idea is to set up separate Nominatim instances for different regions and create a gateway that exposes the API as a unified deployment. To achieve this, I'm considering making the 'countryCode' parameter mandatory in user requests and then forwarding each request to the appropriate Nominatim instance based on that code. For example, if the 'countryCode' is 'AT' (Austria), I would route the request to the Europe region instance.

However, I'm concerned about potential issues, especially when a user requests an address in a place like Hawaii, which falls within the Oceania region in my scheme. In that case, for a 'countryCode' of 'US', I may need to query at least two regions simultaneously and somehow combine the results.

My primary concern is not the increased complexity of this deployment (multiple databases and API servers), as the infrastructure constraints take precedence. I'm more interested in understanding potential functional issues that I might not be aware of at the moment. Has anyone faced a similar challenge, or does anyone have insights on best practices for handling such a multi-region Nominatim deployment?

asked 31 Oct '23, 14:31
MattCon
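[Editor's note: for illustration, here is a minimal sketch of the routing layer described above. The region endpoints and the country-to-region table are hypothetical assumptions; a real deployment would need a complete mapping for all ISO 3166-1 codes. Countries spanning multiple extracts fan out to several instances, and the results are merged naively here - real code would deduplicate and re-rank, e.g. by the 'importance' field Nominatim returns.]

```python
import requests  # any HTTP client would do

# Illustrative mapping - hypothetical internal host names.
REGION_ENDPOINTS = {
    "europe": "http://nominatim-europe.internal/search",
    "north-america": "http://nominatim-north-america.internal/search",
    "oceania": "http://nominatim-oceania.internal/search",
}

COUNTRY_TO_REGIONS = {
    "at": ["europe"],
    # The US spans several extracts (e.g. Hawaii), so fan out to all of them.
    "us": ["north-america", "oceania"],
}

def search(query: str, country_code: str) -> list:
    """Forward a search to every Nominatim instance that may cover the
    country and concatenate the JSON result lists."""
    results = []
    for region in COUNTRY_TO_REGIONS[country_code.lower()]:
        resp = requests.get(
            REGION_ENDPOINTS[region],
            params={
                "q": query,
                "countrycodes": country_code.lower(),
                "format": "jsonv2",
            },
            timeout=10,
        )
        resp.raise_for_status()
        results.extend(resp.json())
    return results
```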
Let me guess, your IT people want to run everything "in the cloud" and somehow a single cloud node (or whatever the term in the particular cloud infrastructure is) cannot allocate the resources needed. And now you are on a quest to spend several person-weeks of design time to implement a workaround for this infrastructure shortcoming, resulting in a system that is more complex and more prone to failure, and where (unlike when running a "standard" Nominatim) your organisation will be the only group on earth using it - so the chances of finding others with similar problems (should problems arise) are near zero. Brilliant plan!

Less cynically: I think your plan can work as long as you avoid splitting countries. Nominatim does not like it if a country boundary is not fully there. This could require some work if you are dealing with countries that have overseas territories, like France. You might have to cut your regions out of a planet file with Osmium and ensure referential integrity on country border relations - something that is not guaranteed in e.g. Geofabrik download files (where the Europe file does not contain French DOM-TOM areas like Martinique).

I would still strongly advise against your plan. If you cannot allocate the ~1 TB of disk space and 64 GB of RAM to run a full-blown Nominatim server (and finding these resources will almost certainly be cheaper than continuing down the path you have started on!), then you might consider reducing the size of the database itself, for example by dropping the data that is only needed for keeping it up to date - more on that in the comments below.
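[Editor's note: as an illustration of the extraction step, here is a minimal sketch. The polygon files are hypothetical and would have to be drawn so that no country's boundary relation is split across extracts (e.g. France and its DOM-TOM areas end up in one file). Osmium's "smart" strategy pulls in objects crossing the boundary to keep relations referentially complete; by default it only completes multipolygon relations, so the relation types are widened here - check the osmium-tool documentation for your version.]

```python
import subprocess

# Hypothetical region polygons, drawn to keep whole countries together.
REGIONS = {
    "europe": "polygons/europe-with-dom-tom.geojson",
    "oceania": "polygons/oceania.geojson",
}

for name, polygon in REGIONS.items():
    subprocess.run(
        [
            "osmium", "extract",
            "--polygon", polygon,
            # "smart" completes referenced ways/relations across the cut;
            # widening the relation types also covers boundary relations.
            "--strategy", "smart",
            "--option", "types=any",
            "--output", f"extracts/{name}.osm.pbf",
            "planet-latest.osm.pbf",
        ],
        check=True,
    )
```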
answered 31 Oct '23, 14:59
Frederik Ramm ♦

Thanks Frederik for your answer, you made me smile. I want to assure you that, except for this question, no more time was spent on this cursed path, and no more will be spent. The critical resource here is space, so I'm interested in the database-shrinking option you outlined. Could you please elaborate or point me to existing resources on this? Thanks a lot!
(31 Oct '23, 16:00)
MattCon
When Nominatim does the initial import, it uses a number of extra database tables that are not required for production use, but are needed during the import process and - if later incremental updates are desired - also during updates.

If you do not need incremental updates (and instead can live with, say, doing a full new data load once a month), then you can simply remove the excess information from your database after the import, see https://nominatim.org/release-docs/latest/admin/Import/#dropping-data-required-for-dynamic-updates - and after that copy the much-reduced database to the actual production machine.

If you would like to do incremental updates, then a variant of the above can be achieved by setting up PostgreSQL logical replication between a beefy "import server" (which has the extra tables needed for updates, and keeps them) and one or more "followers" that do the actual request serving but only hold the subset of tables required for that - there's an easy option for scaling here, since PostgreSQL logical replication allows you to replicate tables selectively.
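[Editor's note: for the no-updates variant, recent Nominatim versions expose the dropping step as the `nominatim freeze` command, documented at the link above. To make the replication variant concrete, here is a minimal sketch using psycopg2. The host names, publication/subscription names, and the table list are illustrative assumptions - the exact set of tables Nominatim needs for serving depends on the version, so verify against your schema before using anything like this.]

```python
import psycopg2

# Tables the frontends query when serving; names here are illustrative.
SERVING_TABLES = ["placex", "search_name", "location_property_osmline"]

# On the import server: publish only the serving tables.
pub = psycopg2.connect("host=import-server.internal dbname=nominatim")
pub.autocommit = True
with pub.cursor() as cur:
    cur.execute(
        "CREATE PUBLICATION nominatim_serving FOR TABLE "
        + ", ".join(SERVING_TABLES)
    )
pub.close()

# On each follower: subscribe to that publication. CREATE SUBSCRIPTION
# cannot run inside a transaction block, hence autocommit.
sub = psycopg2.connect("host=follower-1.internal dbname=nominatim")
sub.autocommit = True
with sub.cursor() as cur:
    cur.execute(
        "CREATE SUBSCRIPTION follower_1 "
        "CONNECTION 'host=import-server.internal dbname=nominatim' "
        "PUBLICATION nominatim_serving"
    )
sub.close()
```

Note that logical replication additionally requires wal_level = logical on the import server, and the follower must already have matching table definitions (plus, for Nominatim, the indexes used for searching), e.g. restored from a schema-only dump.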
(31 Oct '23, 18:10)
Frederik Ramm ♦