NOTICE: help.openstreetmap.org is no longer in use from 1st March 2024. Please use the OpenStreetMap Community Forum

The simplest version of the command I'm using is (pretty formatting for easier reading - it's usually one line):

osmosis --read-pbf-fast file="north-america-latest.osm.pbf" \
        --bounding-polygon file="holyoke_ma.poly" --write-xml file="holyoke_ma.osm"

To produce several extracts in one run, I add --tee ##, with ## being the number of output files being written. So that's something like (pretty formatting for easier reading - it's usually one line):

osmosis --read-pbf-fast file="north-america-latest.osm.pbf" --tee 4 \
        --bounding-polygon file="city1.poly" --write-xml file="city1.osm" \
        --bounding-polygon file="city2.poly" --write-xml file="city2.osm" \
        --bounding-polygon file="city3.poly" --write-xml file="city3.osm" \
        --bounding-polygon file="city4.poly" --write-xml file="city4.osm"

I've tried adding workers=## to the mix, with ## being the number of cores on the server. That results in something like (pretty formatting for easier reading - it's usually one line):

osmosis --read-pbf-fast workers=4 file="north-america-latest.osm.pbf" --tee 4 \
        --bounding-polygon file="city1.poly" --write-xml file="city1.osm" \
        --bounding-polygon file="city2.poly" --write-xml file="city2.osm" \
        --bounding-polygon file="city3.poly" --write-xml file="city3.osm" \
        --bounding-polygon file="city4.poly" --write-xml file="city4.osm"

In all of my attempts with 2 or more cores, the process tops out at nearly 200% CPU use (as shown in top). In mpstat -P ALL (available after running apt-get install sysstat), one CPU usually sits at about 20-30% and the others are mostly idle.

I verified that I can get a process to use all cores (4 in this case) by running sysbench --test=cpu --cpu-max-prime=20000 --num-threads=4 run (which you can get after running apt-get install sysbench) ... This would spike the process to 400% (when viewed in top).

  • How do I get osmosis to use all of the available CPUs/cores?
  • Where (if anywhere) should --buffer entries be placed? Before the --bounding-polygon flag? Before the --write-xml flag?
  • Should I keep a limit to my --tee usage? Does that tie to the number of cores at all? Sometimes I have hundreds (up to a thousand or so) of .poly files to process - can I put them all into a single osmosis command with a very large tee value?

asked 16 Jan '14, 19:12

JamesChevalier


I would suggest placing one --buffer before and after each --bounding-polygon task. I haven't tried more than about 50 --tee outputs, but I guess more would still be possible. Keep in mind, though, that the number of point-in-polygon checks osmosis has to make is the size of your input file multiplied by the number of --bounding-polygon tasks you're using: each object is checked against every one of your (possibly thousands of) polygons. It is therefore more efficient to first divide your input file into a few smaller regions and then extract your files from those.
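
As a rough sketch of that buffer placement, the helper below only prints an osmosis command and does not run it. The function name build_osmosis_cmd and the file names are hypothetical, and the command has not been timed against real data, so whether it raises CPU use is an open question. It assumes file names without spaces and relies on --buffer using its default capacity.

```shell
#!/bin/sh
# Dry run: print an osmosis command with a --buffer task before and after
# every --bounding-polygon task. Nothing is executed.
# Usage: build_osmosis_cmd INPUT.pbf POLY...
# Assumption: file names contain no spaces.
build_osmosis_cmd() (
  # The ( ) body runs in a subshell, so these variables stay local.
  input=$1
  shift
  # After the shift, $# is the number of .poly files, i.e. the tee count.
  cmd="osmosis --read-pbf-fast file=$input --tee $#"
  for poly in "$@"; do
    cmd="$cmd --buffer --bounding-polygon file=$poly"
    cmd="$cmd --buffer --write-xml file=${poly%.poly}.osm"
  done
  printf '%s\n' "$cmd"
)

# Hypothetical file names, for illustration only.
build_osmosis_cmd north-america-latest.osm.pbf city1.poly city2.poly
```

Printing the command first makes it easy to check that the --tee count matches the number of --bounding-polygon tasks before committing to a long run.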

Here's an older blog entry that describes how we used to run the Geofabrik extracts. We nowadays use the history splitter which offers better performance when doing a large number of polygon splits at once.
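
The earlier suggestion of first dividing the input into smaller regions could be sketched as a two-stage plan. This is an illustration under assumptions, not the Geofabrik setup: the function name two_stage_plan, the region polygon massachusetts.poly and the city file names are all hypothetical, and the script only prints the two commands.

```shell
#!/bin/sh
# Dry run: print a two-stage extraction plan instead of executing it.
# Stage 1 cuts one regional PBF out of the large input; stage 2 cuts the
# city extracts out of that smaller file, so each city polygon is only
# tested against the objects of the region.
# Usage: two_stage_plan INPUT.pbf REGION.poly CITY.poly...
# Assumption: file names contain no spaces.
two_stage_plan() (
  # The ( ) body runs in a subshell, so these variables stay local.
  input=$1
  region_poly=$2
  shift 2
  region_pbf="${region_poly%.poly}.osm.pbf"
  # Stage 1: large input -> regional extract.
  printf '%s\n' "osmosis --read-pbf-fast file=$input --bounding-polygon file=$region_poly --write-pbf file=$region_pbf"
  # Stage 2: regional extract -> one XML file per city polygon.
  stage2="osmosis --read-pbf-fast file=$region_pbf --tee $#"
  for poly in "$@"; do
    stage2="$stage2 --bounding-polygon file=$poly --write-xml file=${poly%.poly}.osm"
  done
  printf '%s\n' "$stage2"
)

# Hypothetical file names, for illustration only.
two_stage_plan north-america-latest.osm.pbf massachusetts.poly holyoke_ma.poly springfield_ma.poly
```

With many city polygons, the cost of the extra stage-1 pass over the large file is spread across all of them, which is where the saving would be expected to come from.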

answered 16 Jan '14, 19:26

Frederik Ramm ♦

Thanks! (I actually came across your blog post the other day, and thought about contacting you directly). I've mostly used a country OSM file (like germany-latest.osm.pbf), but I did run some tests with a 'state' level file (like brandenburg-latest.osm.pbf). Those definitely finished quicker, but still didn't make full use of the CPU. I'll try again with your suggestion of buffer placement.

(16 Jan '14, 19:37) JamesChevalier

