Hello all, I really need help on this front! I've set up a Nominatim server on GCP's Compute Engine. It works well enough, but now I have 100 million unique addresses that I need to forward-geocode through my service, and I'm trying to use multiprocessing to speed things up. Even 100 addresses processed simultaneously stalls the service. My VM has 128 GB of RAM and 24 CPUs, and I followed the configuration recommendations from the installation guide. Does anyone have any best practices for setting up the service to handle HUGE bulk loads? Would switching from Apache to nginx help? Reproducible Code Example in Python:
asked 20 Sep '20, 23:58 rirhun
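Since the code example did not survive in the thread, here is a minimal sketch of what bounded-concurrency bulk geocoding against a self-hosted Nominatim instance might look like. The base URL, worker count, and timeout are assumptions to adjust for your own setup; the `/search` endpoint with `q`, `format`, and `limit` parameters is the standard Nominatim search API.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical local endpoint -- point this at your own Nominatim instance.
BASE_URL = "http://localhost:8080"
# A bounded pool: past roughly 2x your CPU count, extra parallel
# requests tend to reduce throughput rather than increase it.
MAX_WORKERS = 50

def search_url(base_url, address):
    """Build a Nominatim /search request URL for one free-form address."""
    return f"{base_url}/search?{urlencode({'q': address, 'format': 'jsonv2', 'limit': 1})}"

def geocode(address):
    """Fetch the top result for one address; returns (lat, lon) or None."""
    with urlopen(search_url(BASE_URL, address), timeout=30) as resp:
        results = json.load(resp)
    return (results[0]["lat"], results[0]["lon"]) if results else None

def geocode_bulk(addresses):
    """Geocode many addresses through one fixed-size thread pool,
    instead of spawning unbounded processes that stall the server."""
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        return list(pool.map(geocode, addresses))

if __name__ == "__main__":
    print(geocode_bulk(["Brandenburger Tor, Berlin", "Eiffel Tower, Paris"]))
```

The key design choice is the fixed-size pool: the client, not the server, enforces the concurrency ceiling, so a 100-million-row job streams through at a steady rate instead of flooding the service.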
For installations with a high load, you should at least switch your server to php-fpm. In my experience it is also worth switching to nginx, as it is much better at coping with many parallel requests. Your system setup should be able to manage 600 requests/s. (It depends on how the VM is set up, in particular how fast access to the disks is.)

On a general note: it is not really worth increasing the number of parallel requests indefinitely. Your server has a fixed number of CPUs, and that limits the amount of parallel work it can do. With too many parallel requests, they get in each other's way, which actually slows you down. In your case I would expect that beyond 50 parallel requests you won't see much increase in throughput.

answered 21 Sep '20, 09:09 lonvia

Thanks for the insightful response. I'm using an SSD, so disk access should be relatively fast. I tried nginx, but it wasn't working: the server kept complaining "file not found" with regard to the php-fpm socket, even though the path to the socket was correct.
(21 Sep '20, 17:55)
rirhun
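For reference, a "file not found" error from php-fpm behind nginx is often not about the socket path at all, but about nginx failing to pass the right `SCRIPT_FILENAME` to php-fpm. A minimal sketch of the relevant nginx location block is below; the socket path and document root are assumptions that must match your own php-fpm pool and Nominatim website directory.

```nginx
# Hypothetical paths -- match them to your php-fpm pool config
# (listen = ...) and your Nominatim build's website directory.
root /srv/nominatim/build/website;
index search.php;

location ~ \.php$ {
    include fastcgi_params;
    # Without SCRIPT_FILENAME, php-fpm answers "File not found."
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_pass unix:/var/run/php-fpm-nominatim.sock;
}
```

If the socket path in `fastcgi_pass` were actually wrong, nginx would log a "connect() failed" error instead; "file not found" coming back as a response means the connection succeeded but php-fpm could not resolve the script path.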