
How can I get osmosis to use more than two CPUs/cores?

The simplest version of the command I'm using is (pretty formatting for easier reading - it's usually one line):

osmosis --read-pbf-fast file="north-america-latest.osm.pbf" \
        --bounding-polygon file="holyoke_ma.poly" --write-xml file="holyoke_ma.osm"

For multiple extracts I add --tee ##, where ## is the number of output files being written. That gives something like this (again split across lines for readability; it's usually one line):

osmosis --read-pbf-fast file="north-america-latest.osm.pbf" --tee 4 \
        --bounding-polygon file="city1.poly" --write-xml file="city1.osm" \
        --bounding-polygon file="city2.poly" --write-xml file="city2.osm" \
        --bounding-polygon file="city3.poly" --write-xml file="city3.osm" \
        --bounding-polygon file="city4.poly" --write-xml file="city4.osm"

I've also tried adding workers=## to --read-pbf-fast, where ## is the number of cores on the server. That gives something like this (again split across lines for readability; it's usually one line):

osmosis --read-pbf-fast workers=4 file="north-america-latest.osm.pbf" --tee 4 \
        --bounding-polygon file="city1.poly" --write-xml file="city1.osm" \
        --bounding-polygon file="city2.poly" --write-xml file="city2.osm" \
        --bounding-polygon file="city3.poly" --write-xml file="city3.osm" \
        --bounding-polygon file="city4.poly" --write-xml file="city4.osm"

In every attempt, on servers with 2 or more cores, the process tops out at nearly 200% CPU use as reported by top. In mpstat -P ALL (available after running apt-get install sysstat), one CPU usually sits at about 20-30% and the others are mostly idle.

I verified that a process can use all cores (4 in this case) by running sysbench --test=cpu --cpu-max-prime=20000 --num-threads=4 run (available after running apt-get install sysbench). That pushes the process to 400% in top.

How do I get osmosis to use all of the available CPUs/cores?
Where (if anywhere) should --buffer entries be placed? Before the --bounding-polygon flag? Before the --write-xml flag?
Should I limit my --tee usage? Is the limit tied to the number of cores at all? Sometimes I have hundreds (up to a thousand or so) of .poly files to process - can I put them all into a single osmosis command with a very large --tee value?

pbf polygon osmosis

asked 16 Jan '14, 19:12

JamesChevalier

One Answer:

I would suggest placing one --buffer before and one after each --bounding-polygon task. I haven't tried more than about 50 --tee outputs, but I guess more would still be possible. Keep in mind, though, that the number of "point in polygon" checks osmosis has to make is the size of your input file multiplied by the number of --bounding-polygon tasks you're using: each object is checked against every polygon, even when there are thousands of them. It is therefore more efficient to first divide your input file into a few smaller regions and then extract your files from those.
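As a rough sketch of that buffer placement, the helper below prints (but does not run) an osmosis command for any number of .poly files. The function name build_osmosis_cmd and the rule that city1.poly becomes city1.osm are assumptions made for illustration; only --read-pbf-fast, --tee, --buffer, --bounding-polygon and --write-xml come from the discussion above.

```shell
#!/bin/sh
# Print (do not run) an osmosis command that cuts one extract per .poly file.
# A --buffer is placed before and after every --bounding-polygon task.
# Assumption: output names are derived from the .poly names
# (city1.poly -> city1.osm); build_osmosis_cmd is a hypothetical helper.
build_osmosis_cmd() {
    input=$1
    shift
    # After the shift, $# is the number of .poly files, which is also
    # the number of outputs that --tee has to provide.
    printf 'osmosis --read-pbf-fast file="%s" --tee %d' "$input" "$#"
    for poly in "$@"; do
        out="${poly%.poly}.osm"
        printf ' --buffer --bounding-polygon file="%s" --buffer --write-xml file="%s"' "$poly" "$out"
    done
    printf '\n'
}

build_osmosis_cmd north-america-latest.osm.pbf city1.poly city2.poly
```

Printing the command first makes it easy to inspect before running it, and the same helper can be called with a long list of .poly files when testing how large a --tee value is practical.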

Here's an older blog entry that describes how we used to run the Geofabrik extracts. Nowadays we use the history splitter, which offers better performance when doing a large number of polygon splits at once.
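To make the earlier point about splitting into regions concrete, here is a back-of-the-envelope count of "point in polygon" checks. All figures are hypothetical and chosen only to keep the arithmetic simple. The sketch also assumes the regional extracts do not overlap, are equal in size, and each hold an equal share of the polygons.

```shell
#!/bin/sh
# Hypothetical figures, for illustration only; these are not measurements.
objects=1000000      # objects in the large input file
polygons=1000        # total number of .poly files to extract
regions=10           # intermediate regional extracts (assumed equal in size)

# One pass: every object is checked against every polygon.
single_pass=$((objects * polygons))

# Two stages: every object is checked against the regional polygons first,
# then each regional extract is checked only against its own share of the
# polygons. Assumes the regions do not overlap.
stage1=$((objects * regions))
stage2=$((regions * (objects / regions) * (polygons / regions)))
two_stage=$((stage1 + stage2))

echo "single pass: $single_pass checks"
echo "two stage:   $two_stage checks"
```

With these made-up numbers the two-stage approach needs roughly a tenth of the checks. Real savings depend on how the polygons are distributed across the regions.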

answered 16 Jan '14, 19:26

Frederik Ramm ♦

Thanks! (I actually came across your blog post the other day, and thought about contacting you directly). I've mostly used a country OSM file (like germany-latest.osm.pbf), but I did run some tests with a 'state' level file (like brandenburg-latest.osm.pbf). Those definitely finished quicker, but still didn't make full use of the CPU. I'll try again with your suggestion of buffer placement.

(16 Jan '14, 19:37) JamesChevalier