This is a static archive of our old OpenStreetMap Help Site. Please post any new questions and answers at community.osm.org.

Configuring osm2pgsql to import objects with some particular attributes only (and all dependent objects)

I downloaded a world PBF file and am going to use osm2pgsql. I'm only interested in a small subset of the data:

administrative boundaries (type=boundary&boundary=administrative)
all kinds of cities, towns, villages etc. (place=...) - I actually just need their coordinates/polygons, names and populations

Is there a way to configure osm2pgsql so that the only nodes, ways and polygons created in the database are those that make up administrative boundaries and cities/towns/etc., with their respective names, populations and coordinates?

As I understand it, I should:

not use any hstore-related command line options when importing
remove everything from the default.style file and only list attributes that I'm interested in:
- boundary:administrative (how do I only import objects with attributes containing a particular value?)
- type:boundary
- place
- name
- population
- population:date
- source:population

Will this make the osm2pgsql tool store only the objects that I need (and dependent objects) and discard all other objects?

By dependent objects I mean, for example, the nodes that make up a country boundary. These nodes won't have the attributes listed above, but they are still needed for the boundaries. How do I import these, but not the others?

Am I taking the right course of action, or should I do it in a different way?

The final goal is to create a database with the minimum amount of data that I can query for simple geocoding by place name and use to render an administrative map showing administrative boundaries only, with no other map features. Does that make sense?

I'm not concerned about the time it will take to import, i.e. it makes no difference for me whether it will be 2 hours or 24 hours. The only concern is the final amount of data stored in the database, which I would like to reduce as much as possible.

osm2pgsql

asked 02 Jul '17, 06:06

meglio

edited 02 Jul '17, 06:11

Hi @meglio, did you find a way to accomplish this? I'm looking to import only a fraction of all the features too.

(30 Oct '18, 17:24) pierrebonbon

Unfortunately, no.

(31 Oct '18, 01:23) meglio

@meglio that's a bummer, I was hoping that there was some way to make the import leaner. :-(

(05 Nov '18, 12:05) pierrebonbon

2 Answers:

You could consider using osmfilter and osmconvert to preprocess the .osm.pbf file even before it gets to osm2pgsql.
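A minimal sketch of what such a preprocessing pipeline might look like, written as a small Python helper that only builds the command lines (it prints them rather than running anything). The file names are hypothetical, and the filter expression is my assumption of how the two conditions from the question would be written for osmfilter, so please check it against osmfilter's own help before relying on it.

```python
import shlex

# Hypothetical file names; adjust to your setup.
PLANET_PBF = "planet.osm.pbf"
FILTERED_PBF = "admin-places.osm.pbf"

# Assumed osmfilter expression: keep administrative boundaries and anything
# carrying a place tag (an empty value after "=" is meant as "any value").
KEEP_FILTER = "boundary=administrative or place="


def build_prefilter_commands(planet_pbf=PLANET_PBF, filtered_pbf=FILTERED_PBF):
    """Return the three command lines of the prefilter pipeline as argv lists."""
    planet_o5m = "planet.o5m"      # intermediate file, hypothetical name
    filtered_o5m = "filtered.o5m"  # intermediate file, hypothetical name
    return [
        # osmfilter does not read PBF, so convert to o5m first.
        ["osmconvert", planet_pbf, "-o=" + planet_o5m],
        # Keep only objects matching the filter expression.
        ["osmfilter", planet_o5m, "--keep=" + KEEP_FILTER, "-o=" + filtered_o5m],
        # Convert the filtered extract back to PBF for osm2pgsql.
        ["osmconvert", filtered_o5m, "-o=" + filtered_pbf],
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before being run by hand.
    for argv in build_prefilter_commands():
        print(shlex.join(argv))
```

As far as I understand, osmfilter keeps the nodes and ways that the matching objects depend on, which is what the question calls dependent objects.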

answered 02 Jul '17, 22:03

Richard ♦

Thanks @Richard. Should I consider --ignore-dependencies if I only need the geometries of my target objects and not the separate nodes / ways that they consist of?

(03 Jul '17, 10:54) meglio

I would not advise using the --ignore-dependencies option, because you will need the nodes: they are the only OSM object type that holds coordinates. Here is an example of how to prefilter OSM data and even update the database with prefiltered data: https://wiki.openstreetmap.org/wiki/Openptmap/Installation#Fill_the_Database

(08 Jul '17, 20:36) Marqqs

osm2pgsql manages two sets of tables. The first set, the geometry tables, consists of planet_osm_polygon, planet_osm_line, planet_osm_roads, and planet_osm_point. These tables will contain the geometries you are interested in, and because they already contain ready-made geometries, no dependencies on other objects exist. Using the procedure you have outlined you will end up with a fairly minimal data set in these tables, although - and I believe this depends on the osm2pgsql version you use - there might be some duplication, with both boundary lines and boundary polygons being generated.
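For illustration, here is a rough sketch of a reduced style file along the lines of the procedure in the question, generated by a few lines of Python. The four-column layout (OSM types, key, data type, flags) follows the shipped default.style; the choice of flags per key is my assumption, modelled on that file, and the sketch does not attempt to restrict boundary to the value administrative.

```python
# Each row: (osm_types, key, data_type, flags).
# Keys are taken from the question; the flags are assumptions modelled on default.style.
STYLE_ROWS = [
    ("node,way", "boundary", "text", "linear"),
    ("node,way", "place", "text", "polygon"),
    ("node,way", "name", "text", "linear"),
    ("node,way", "population", "text", "linear"),
    ("node,way", "population:date", "text", "linear"),
    ("node,way", "source:population", "text", "linear"),
]


def render_style(rows=STYLE_ROWS):
    """Render the rows as whitespace-separated style file lines."""
    lines = ["# OsmType  Tag                 Type  Flags"]
    for osm_types, key, data_type, flags in rows:
        lines.append(f"{osm_types:<10}{key:<20}{data_type:<6}{flags}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Redirect the output to a file, e.g. a hypothetical minimal.style.
    print(render_style(), end="")
```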

The second set of tables, the slim tables, contains more or less raw OSM objects, in the aptly named planet_osm_nodes, planet_osm_ways, and planet_osm_rels tables. These tables are only created if you import with --slim, and they will contain nearly all OSM objects, independent of your style file. That's several hundred GB in the case of a full planet import. Using --slim is definitely required if you want to import incremental updates, and even when you don't, --slim reduces RAM usage during the initial import. If you only use --slim to reduce RAM usage, combine it with --drop to drop the slim tables after the import.

Long story short, you can reduce the total database size significantly using the process you outline, but only if you don't need incremental updates.
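To make the --slim / --drop trade-off concrete, here is a small sketch that assembles the osm2pgsql command line depending on whether incremental updates are needed. The database name and file names are hypothetical placeholders, and the helper only builds the argument list; it does not run the import.

```python
import shlex


def build_import_command(pbf_path, style_path, database="gis", need_updates=False):
    """Build an osm2pgsql argv list for an initial import.

    --slim is always used to keep RAM usage down; --drop is added only when
    no incremental updates are planned, so the slim tables are removed
    after the import.
    """
    argv = ["osm2pgsql", "--create", "--slim"]
    if not need_updates:
        argv.append("--drop")
    argv += ["--database", database, "--style", style_path, pbf_path]
    return argv


if __name__ == "__main__":
    # Hypothetical file names for a one-off import without updates.
    print(shlex.join(build_import_command("admin-places.osm.pbf", "minimal.style")))
```

With need_updates=True the slim tables stay in the database, which is where most of the disk usage described above comes from.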

answered 02 Jul '17, 09:26

Frederik Ramm ♦

@frederik-ramm "no dependencies on other objects exist" - this is a gotcha moment, many thanks! If I understand correctly, slim tables are populated first as a normalized version of the data file, then geometry tables are populated out of those. If I need to experiment with osm2pgsql and let it populate various content for the geometry tables, is there a way to skip production of the slim tables on every run, and let the tool reuse the existing slim tables instead?

(03 Jul '17, 08:21) meglio

@frederik, could you please also clarify whether my understanding is correct: whatever columns I specify in the .style file, if all values are null/empty for a particular object, the object is discarded and will not be present in the geometry tables. If so, how do I save the names of cities, but discard all other objects that contain a name attribute?

(03 Jul '17, 09:52) meglio