This is a static archive of our old OpenStreetMap Help Site. Please post any new questions and answers at community.osm.org.

osm2pgsql --append is extremely slow

So I was following https://switch2osm.org/serving-tiles/manually-building-a-tile-server-18-04-lts/ to load a "gis" database with tiles of Azerbaijan and everything worked RELATIVELY quickly.

But after I tried to append a map of the central part of Russia, the import speed was horrible:

Processing: Node(1580k 5.1k/s) Way(0k 0.00k/s) Relation(0 0.00/s)

And it didn't consume any resources: the 3 CPUs were used more by X.Org than by osm2pgsql, and there was plenty of RAM available (2.2G out of 9 was used). The command I ran for this is as follows:

osm2pgsql -d gis --append --slim  -G --hstore --tag-transform-script ~/src/openstreetmap-carto/openstreetmap-carto.lua -C 2500 --number-processes 1 -S ~/src/openstreetmap-carto/openstreetmap-carto.style ~/data/central-fed-district-latest.osm.pbf

When I added the Azerbaijan tiles, I used the "--create" option instead of "--append", and the node processing speed was around 300 kb/sec (which is extremely low as well, but not THAT low). I can't figure out what might be the reason for this, because there are plenty of resources available.

So my question is how do I speed this up, if possible. Thanks.

osm2pgsql

asked 26 Feb '20, 10:23

kartman1
38●6●7●11
accept rate: 0%

edited 26 Feb '20, 10:27

3 Answers:

Don't use --append. Merge the .osm.pbf files you want to import with osmium (or osmosis), and then import the combined file. Appending is super slow because it uses the same processing chain as updating, and for every new object that comes in the update processing chain needs to check if the new information causes a change to existing data.
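
A minimal sketch of that merge-then-import approach, assuming both extracts sit in ~/data. The Azerbaijan file name and the merged.osm.pbf output name are assumptions; the osm2pgsql options are the ones from the question with --append swapped for --create. The script only prints the commands unless it is started with RUN= set to empty.

```shell
#!/bin/sh
# Dry run by default: commands are only printed. Start with RUN= to execute them.
RUN=${RUN-echo}

DATA="$HOME/data"
CARTO="$HOME/src/openstreetmap-carto"
MERGED="$DATA/merged.osm.pbf"   # hypothetical output name

# azerbaijan-latest.osm.pbf is an assumed file name; the second file is from the question.
$RUN osmium merge "$DATA/azerbaijan-latest.osm.pbf" \
    "$DATA/central-fed-district-latest.osm.pbf" -o "$MERGED"

# Fresh import of the combined file. --create replaces what is already in the database.
$RUN osm2pgsql -d gis --create --slim -G --hstore \
    --tag-transform-script "$CARTO/openstreetmap-carto.lua" \
    -C 2500 --number-processes 1 \
    -S "$CARTO/openstreetmap-carto.style" "$MERGED"
```

Because --create starts from scratch, the Azerbaijan data already in the database does not need to be kept; it comes back in as part of the merged file.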

answered 26 Feb '20, 10:36

Frederik Ramm ♦
82.5k●92●720●1273
accept rate: 23%

Then why (if this operation is so inefficient) doesn't it consume more processing power? I don't really mind it doing some extra work, it's just weird that it refuses to use any resources. That is just being lazy at best.

(26 Feb '20, 11:57) kartman1

My suspicion is that it is limited by disk I/O.

(26 Feb '20, 12:00) Frederik Ramm ♦

OMG I downloaded the PBF file for the whole Asia (7.7 Gb) and left it working for the night and when I got back, the speed of the Node Processing was 12 kb/s!!! Wtf is that! Wtf is that application! How do people even use it! My pc is a production grade machine and even though I run a VM it was configured to use more than a half of everything from the host system! Wtf is going on! Do I have to wait a whole month (if I'm lucky!) to load the database every time something small changes as well?!!

(27 Feb '20, 07:46) kartman1

Please, calm down. Lose the "OMG"s and the "Wtf"s and the multiple exclamation marks. This is not high school. -- OSM data import is very I/O intensive. You absolutely must have a (local, not SAN) SSD/NVMe disk if you want to get any performance out of it, and when using virtualisation you must make sure that you have the right settings (disk drivers, emulation, whatever) as to not interfere with disk I/O performance. You can have the fattest production grade machine with 256 GB RAM and 64 CPU cores, the Asia import will still run for a week if you have rotating hard disk or bad virtualization. -- There are also some recommendations about tuning your PostgreSQL database for faster imports which you can find using your favourite search engine.

(27 Feb '20, 10:09) Frederik Ramm ♦

@kartman1 Thanks for giving everyone a laugh this morning!

To take a step back, the answer suggested "Merge the .osm.pbf files you want to import with osmium (or osmosis), and then import the combined file". You at the time were trying to process Azerbaijan and part of Russia. No-one suggested "download the whole of Asia and try and load that". I'd suggest that you browse around http://download.geofabrik.de/asia.html and decide what bits you want. If it's just 2 or more downloadable regions, then download them and merge using osmium or osmosis. If it's part of a larger area, then perhaps cut the bit you want out of that larger area. There's documentation linked from the OSM wiki - for example https://wiki.openstreetmap.org/wiki/Osmium#Osmium_Tool points to https://osmcode.org/osmium-tool/ and one of the things that that mentions is "Create geographical extracts from OSM files".
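
For the "cut the bit you want out" case, a rough sketch using osmium's extract command might look like the following. The bounding box values and the output file name are placeholders, not taken from this thread; replace them with the area you actually need. As written the script only prints the command; start it with RUN= set to empty to execute it.

```shell
#!/bin/sh
# Dry run by default: the command is only printed. Start with RUN= to execute it.
RUN=${RUN-echo}

SOURCE="$HOME/data/asia-latest.osm.pbf"
EXTRACT="$HOME/data/my-region.osm.pbf"   # hypothetical output name

# Bounding box as LEFT,BOTTOM,RIGHT,TOP in degrees (placeholder values).
BBOX="44.0,38.0,51.0,42.0"

$RUN osmium extract -b "$BBOX" "$SOURCE" -o "$EXTRACT"
```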

(27 Feb '20, 10:31) SomeoneElse ♦

Helpfully, another question has more detail about "how to obtain a smaller area from within a larger download".

(27 Feb '20, 10:40) SomeoneElse ♦

"Thanks for giving everyone a laugh this morning!" - you are welcome, I had to pay with something.

(27 Feb '20, 13:17) kartman1


Well... no Postgres optimization helped. I moved the VM from an expanding disk to a fixed one and still saw no improvement. After searching a bit more carefully on this topic, I found similar issues (on far superior machines) with processing on a VM. So it is no surprise that playing around/tweaking/tiddling/widdling/etc. with conf files didn't help at all. As a result I had to borrow a machine with an i7-8700 CPU, 32 GB RAM, a Samsung 970 EVO 1TB SSD, and Windows 10. And the performance was:

Processing: Node(1160123k 321.1k/s) Way(146288k 7.23k/s) Relation(1128670 634.08/s) parse time: 25620s
Node stats: total(1160123207), max(7248671584) in 3613s
Way stats: total(146288690), max(776865119) in 20227s
Relation stats: total(1128706), max(10758453) in 1780s

This answer is marked "community wiki".

answered 03 Mar '20, 07:53

kartman1
38●6●7●11
accept rate: 0%

edited 03 Mar '20, 08:22

Is this for Asia, and using a similar osm2pgsql command line as shown in your initial post (minus the --append)? Can you share the output of "osm2pgsql -v"?

(03 Mar '20, 08:54) Frederik Ramm ♦

It is for Asia with the following command: osm2pgsql -d gis --create --slim -G --hstore --tag-transform-script ~/src/openstreetmap-carto/openstreetmap-carto.lua -C 10500 --drop --disable-parallel-indexing --number-processes 5 -S ~/src/openstreetmap-carto/openstreetmap-carto.style ~/data/asia-latest.osm.pbf

(03 Mar '20, 11:50) kartman1

Silly question - do you actually want to load data for the whole of Asia?

(03 Mar '20, 11:52) SomeoneElse ♦

Yeah I needed Asian maps from the start.

(03 Mar '20, 12:00) kartman1

Do you have the osm2pgsql version for us (osm2pgsql -v)?

(03 Mar '20, 12:30) Frederik Ramm ♦

osm2pgsql version 1.2.0 (1.2.0-269-g7892613) (64 bit id space)

(03 Mar '20, 12:34) kartman1

This import, with this osm2pgsql version and this command line (though dropping the -C10500 and adding a --flat-nodes file on the SSD) took 3 hours and 25 minutes on a reasonably big SSD machine here, without virtualisation, plain Ubuntu 16.04. (Node stats: total(1161737945), max(7259603803) in 432s; Way stats: total(146494361), max(777866045) in 3440s; Relation stats: total(1129804), max(10780102) in 490s). This matches my expectations from importing full planet files. I tried the same import without --flat-nodes and everything else the same, and it took 8 hours 25 minutes instead (Node stats: total(1161737945), max(7259603803) in 2173s; Way stats: total(146494361), max(777866045) in 18040s; Relation stats: total(1129804), max(10780102) in 1755s).
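
As a sketch of the faster variant described here, this is the command line from the comment above with -C 10500 dropped and --flat-nodes added. The flat-nodes path is an assumption; it should point at a file on the local SSD. The script only prints the command unless it is started with RUN= set to empty.

```shell
#!/bin/sh
# Dry run by default: the command is only printed. Start with RUN= to execute it.
RUN=${RUN-echo}

CARTO="$HOME/src/openstreetmap-carto"
FLAT_NODES="$HOME/data/flat-nodes.bin"   # hypothetical path, should live on the SSD

# Node locations go into the flat-nodes file instead of the in-memory cache (-C)
# and the database.
$RUN osm2pgsql -d gis --create --slim -G --hstore \
    --tag-transform-script "$CARTO/openstreetmap-carto.lua" \
    --flat-nodes "$FLAT_NODES" \
    --drop --disable-parallel-indexing --number-processes 5 \
    -S "$CARTO/openstreetmap-carto.style" \
    "$HOME/data/asia-latest.osm.pbf"
```

As far as I know, the size of the flat-nodes file depends on the highest node ID rather than on the size of the extract, so it is worth checking that the SSD has a few tens of GB free before starting.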

(03 Mar '20, 16:19) Frederik Ramm ♦


You should definitely upgrade to osm2pgsql 1.2.1 (https://github.com/openstreetmap/osm2pgsql/releases). Although that may appear to be a "minor" release, it actually fixes an important memory-related bug caused by libosmium that could cause out-of-memory issues when running osm2pgsql, due to GBs of RAM needed for area building. I personally experienced that issue, and upgrading to 1.2.1 fixed it.

As others stated: harddisks are totally dead for this kind of work. Building ways and relations requires fast random access to data stored on disk. Best is NVMe SSD, but I run osm2pgsql happily against a USB3 connected 2TB SATA EVO.

VMs are not evil per se. I have successfully imported the whole of Europe (> 20GB PBF) on a 12 GB RAM (+ swap configured) Ubuntu 19.10 VirtualBox instance running PostgreSQL 11, running on an eight-year-old Core i7-2600 with 16GB RAM and Windows 10 as host system, with the described USB3 connected disk.

The import took about 2-3 days if I remember well, which is not too bad; the stats for nodes/ways/relations per second were comparable to your figures, I think. Building osm2pgsql with LuaJIT support did indeed seem to improve processing speed by about 25%, as mentioned on the osm2pgsql GitHub site: https://github.com/openstreetmap/osm2pgsql

answered 03 Mar '20, 21:57

mboeringa
1.5k●2●15●27
accept rate: 9%

edited 04 Mar '20, 08:25