Hello,

I'm using the OSM planet file, imported into Postgres with osm2pgsql. I'm trying to calculate a 'greenery index' for each road on the planet, and one of the approaches involves working out what portion of each road lies within a 30 meter buffer of a green space (a polygon in this instance).

The query works fine and produces the expected results, but it takes incredibly long. The simple version of the query below takes approximately 35 hours using the full OSM planet dataset.

A vastly simplified version of my query looks like the following:

SELECT
   road.osm_id,
   -- the fuller version of the query has a lot more inside this select
   count(*)
FROM planet_osm_line AS road
INNER JOIN planet_osm_polygon_feature AS building ON ST_Intersects(road.way, building.wayb)
group by road.osm_id;

My tables and indexes are defined as follows:

gis=# \d+ planet_osm_line
                                         Table "public.planet_osm_line"
   Column   |           Type            | Collation | Nullable | Default | Storage  | Stats target | Description
------------+---------------------------+-----------+----------+---------+----------+--------------+-------------
 osm_id     | bigint                    |           |          |         | plain    |              |
 access     | text                      |           |          |         | extended |              |
 area       | text                      |           |          |         | extended |              |
 bicycle    | text                      |           |          |         | extended |              |
 bridge     | text                      |           |          |         | extended |              |
 embankment | text                      |           |          |         | extended |              |
 foot       | text                      |           |          |         | extended |              |
 harbour    | text                      |           |          |         | extended |              |
 highway    | text                      |           |          |         | extended |              |
 historic   | text                      |           |          |         | extended |              |
 landuse    | text                      |           |          |         | extended |              |
 leisure    | text                      |           |          |         | extended |              |
 name       | text                      |           |          |         | extended |              |
 network    | text                      |           |          |         | extended |              |
 natural    | text                      |           |          |         | extended |              |
 place      | text                      |           |          |         | extended |              |
 route      | text                      |           |          |         | extended |              |
 tracktype  | text                      |           |          |         | extended |              |
 tunnel     | text                      |           |          |         | extended |              |
 water      | text                      |           |          |         | extended |              |
 waterway   | text                      |           |          |         | extended |              |
 wetland    | text                      |           |          |         | extended |              |
 width      | text                      |           |          |         | extended |              |
 wood       | text                      |           |          |         | extended |              |
 z_order    | integer                   |           |          |         | plain    |              |
 way_area   | real                      |           |          |         | plain    |              |
 way        | geometry(LineString,3857) |           |          |         | main     |              |
Indexes:
    "planet_osm_line_osm_id_idx" btree (osm_id)
    "planet_osm_line_way_idx" gist (way)
    "road_way" btree (highway)
Access method: heap


gis=# \d+ planet_osm_polygon_feature
                                   Table "public.planet_osm_polygon_feature"
   Column   |          Type           | Collation | Nullable | Default | Storage  | Stats target | Description
------------+-------------------------+-----------+----------+---------+----------+--------------+-------------
 osm_id     | bigint                  |           |          |         | plain    |              |
 access     | text                    |           |          |         | extended |              |
 area       | text                    |           |          |         | extended |              |
 bicycle    | text                    |           |          |         | extended |              |
 bridge     | text                    |           |          |         | extended |              |
 embankment | text                    |           |          |         | extended |              |
 foot       | text                    |           |          |         | extended |              |
 harbour    | text                    |           |          |         | extended |              |
 highway    | text                    |           |          |         | extended |              |
 historic   | text                    |           |          |         | extended |              |
 landuse    | text                    |           |          |         | extended |              |
 leisure    | text                    |           |          |         | extended |              |
 name       | text                    |           |          |         | extended |              |
 network    | text                    |           |          |         | extended |              |
 natural    | text                    |           |          |         | extended |              |
 place      | text                    |           |          |         | extended |              |
 route      | text                    |           |          |         | extended |              |
 tracktype  | text                    |           |          |         | extended |              |
 tunnel     | text                    |           |          |         | extended |              |
 water      | text                    |           |          |         | extended |              |
 waterway   | text                    |           |          |         | extended |              |
 wetland    | text                    |           |          |         | extended |              |
 width      | text                    |           |          |         | extended |              |
 wood       | text                    |           |          |         | extended |              |
 z_order    | integer                 |           |          |         | plain    |              |
 way_area   | real                    |           |          |         | plain    |              |
 way        | geometry(Geometry,3857) |           |          |         | main     |              |
 wayb       | geometry(Geometry,3857) |           |          |         | main     |              |
Indexes:
    "feature_wayb" gist (wayb) CLUSTER
    "planet_osm_polygon_feature_leisure_idx" btree (leisure)
    "planet_osm_polygon_feature_natural_idx" btree ("natural")
    "planet_osm_polygon_feature_osm_id_idx" btree (osm_id)
    "planet_osm_polygon_feature_waterway_idx" btree (waterway)
    "planet_osm_polygon_feature_way_idx" gist (way)
Access method: heap

The planet_osm_polygon_feature.wayb field is just created with st_buffer(way,30).

Here is my query plan:

gis=# explain SELECT road.osm_id, count(*)
FROM planet_osm_line AS road
INNER JOIN planet_osm_polygon_feature AS building ON ST_Intersects(road.way, building.wayb)
group by road.osm_id;
                                                                      QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=1001.15..222836071810.48 rows=42674141 width=16)
   Group Key: road.osm_id
   ->  Gather Merge  (cost=1001.15..222833724732.72 rows=384067269 width=16)
         Workers Planned: 9
         ->  Partial GroupAggregate  (cost=0.99..222786057281.88 rows=42674141 width=16)
               Group Key: road.osm_id
               ->  Nested Loop  (cost=0.99..222449112574.99 rows=67303593096 width=8)
                     ->  Parallel Index Scan using planet_osm_line_osm_id_idx on planet_osm_line road  (cost=0.57..9513398.17 rows=19532914 width=306)
                     ->  Index Scan using feature_wayb on planet_osm_polygon_feature building  (cost=0.41..11383.39 rows=455 width=2634)
                           Index Cond: (wayb && road.way)
                           Filter: st_intersects(road.way, wayb)
(11 rows)

And here are the row counts:

gis=# select count(*) from planet_osm_line;
   count
-----------
 175796232
(1 row)

gis=# select count(*) from planet_osm_polygon_feature;
  count
---------
 4551875
(1 row)

My Postgres config is:

shared_buffers = 128GB
work_mem = 512MB
effective_io_concurrency = 200
max_worker_processes = 12
max_parallel_workers_per_gather = 12
max_parallel_workers = 12

The server is a Xeon E5-1650 v3, with 256GB DDR4, and a 2TB SSD (which Postgres is allocated to use).

During the ~35 hour execution time, I see that the server uses approximately 80% of the CPU cores pretty much solidly, and there's almost zero disk IO.

I'd like to find a way of reducing the query time. I'm iterating on the SQL rules quite frequently, and the reprocessing delay is quite burdensome.

I fully realise I'm attempting to join two massive tables, I've indexed it as sensibly as I can, and my query plan seems to look okay. If I run the query just on the England dataset (which I realise is tiny in comparison to the planet), then it only takes a few minutes.

Anyway, if anyone has any ideas on how to improve a ~35 hour execution time, I'm all ears! Thanks in advance!

asked 30 Nov, 17:33

samcrawford's gravatar image

samcrawford
313
accept rate: 0%

Be the first one to answer this question!
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:

×213
×122

question asked: 30 Nov, 17:33

question was seen: 44 times

last updated: 30 Nov, 17:33

powered by OSQA