NOTICE: help.openstreetmap.org is no longer in use from 1st March 2024. Please use the OpenStreetMap Community Forum

Follow-up to my previous (bad) question. I managed to catch one of these freezes and monitor the syslog and server behavior. The "freeze" does not need a server restart. It would actually work itself out in about 5-10 minutes.

First of all, this is what I see on the map that is using my tile server. Requests for (pre-rendered) tiles are all stuck.

Here's what my VPS monitoring looks like. Look at the bottom chart "Bandwidth Public". As you can see, the server is operating normally and suddenly has a huge drop in outbound bandwidth, which means the server stopped serving tiles. All other system stats look normal.

Looking at the syslog, here is the section of output code during the freeze. The freeze began at around 15:33 and ended about 10 minutes later. The anomaly code is at 15:37, where there were a bunch of outputs like Created slice User Slice of root and Reached target Sockets. However the freeze began at 15:33, so 4 minutes before the weird output began.

During the freeze, there was a 1 minute period at 15:39 when it successfully served some tiles, before freezing up for another 3-4 minutes. After 15:44, the tile server went back to normal.

Apr 16 15:33:49 tilerserver2 renderd[2542]: DEBUG: Got command RenderLow fd(12) xml(ajt), z(17), x(36325), y(47623), mime(image/png), options()
Apr 16 15:33:49 tilerserver2 renderd[2542]: DEBUG: Connection 0, fd 12 closed, now 0 left
Apr 16 15:33:51 tilerserver2 renderd[2542]: DEBUG: DONE TILE ajt 17 36320-36327 47616-47623 in 2.639 seconds
Apr 16 15:35:01 tilerserver2 CRON[26475]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Apr 16 15:35:01 tilerserver2 CRON[26477]: (root) CMD (if [ -x /etc/munin/plugins/apt_all ]; then /etc/munin/plugins/apt_all update 7200 12 >/dev/null; elif [ -x /etc/munin/plugins/apt ]; then /etc/munin/plugins/apt update 7200 12 >/dev/null; fi)
Apr 16 15:35:01 tilerserver2 CRON[26478]: (munin) CMD (if [ -x /usr/bin/munin-cron ]; then /usr/bin/munin-cron; fi)
Apr 16 15:37:56 tilerserver2 systemd[1]: Created slice User Slice of root.
Apr 16 15:37:56 tilerserver2 systemd[1]: Starting User Manager for UID 0...
Apr 16 15:37:56 tilerserver2 systemd[1]: Started Session 2590 of user root.
Apr 16 15:37:56 tilerserver2 systemd[26804]: Reached target Sockets.
Apr 16 15:37:56 tilerserver2 systemd[26804]: Reached target Timers.
Apr 16 15:37:56 tilerserver2 systemd[26804]: Reached target Paths.
Apr 16 15:37:56 tilerserver2 systemd[26804]: Reached target Basic System.
Apr 16 15:37:56 tilerserver2 systemd[26804]: Reached target Default.
Apr 16 15:37:56 tilerserver2 systemd[26804]: Startup finished in 30ms.
Apr 16 15:37:56 tilerserver2 systemd[1]: Started User Manager for UID 0.
Apr 16 15:38:22 tilerserver2 systemd[1]: Started Session 2591 of user root.
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Got incoming connection, fd 12, number 1
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Got incoming request with protocol version 2
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Got command RenderLow fd(12) xml(ajt), z(17), x(36576), y(47867), mime(image/png), options()
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Connection 0, fd 12 closed, now 0 left
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: START TILE ajt 17 36576-36583 47864-47871, age 3.02 days
Apr 16 15:39:09 tilerserver2 renderd[2542]: Rendering projected coordinates 17 36576 47864 -> -8854465.356560|5400734.670520 -8852019.371654|5403180.655425 to a 8 x 8 tile
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Got incoming connection, fd 12, number 1
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Got incoming request with protocol version 2
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Got command RenderLow fd(12) xml(ajt), z(17), x(36576), y(47866), mime(image/png), options()
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Connection 0, fd 12 closed, now 0 left
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Got incoming connection, fd 12, number 1
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Got incoming request with protocol version 2
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Got command RenderLow fd(12) xml(ajt), z(17), x(36577), y(47866), mime(image/png), options()
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Connection 0, fd 12 closed, now 0 left
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Got incoming connection, fd 12, number 1
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Got incoming request with protocol version 2
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Got command RenderLow fd(12) xml(ajt), z(17), x(36576), y(47865), mime(image/png), options()
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Connection 0, fd 12 closed, now 0 left
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Got incoming connection, fd 12, number 1
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Got incoming connection, fd 13, number 2
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Got incoming request with protocol version 2
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Got command RenderLow fd(12) xml(ajt), z(17), x(36577), y(47868), mime(image/png), options()
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Connection 0, fd 12 closed, now 1 left
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Got incoming request with protocol version 2
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Got command RenderLow fd(13) xml(ajt), z(17), x(36577), y(47865), mime(image/png), options()
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Connection 0, fd 13 closed, now 0 left
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Got incoming connection, fd 12, number 1
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Got incoming request with protocol version 2
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Got command RenderLow fd(12) xml(ajt), z(17), x(36576), y(47868), mime(image/png), options()
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Connection 0, fd 12 closed, now 0 left
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Got incoming connection, fd 12, number 1
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Got incoming request with protocol version 2
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Got command RenderLow fd(12) xml(ajt), z(17), x(36578), y(47865), mime(image/png), options()
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Connection 0, fd 12 closed, now 0 left
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Got incoming connection, fd 12, number 1
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Got incoming request with protocol version 2
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Got command RenderLow fd(12) xml(ajt), z(17), x(36578), y(47867), mime(image/png), options()
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Connection 0, fd 12 closed, now 0 left
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Got incoming connection, fd 12, number 1
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Got incoming request with protocol version 2
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Got command RenderLow fd(12) xml(ajt), z(17), x(36578), y(47866), mime(image/png), options()
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Connection 0, fd 12 closed, now 0 left
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Got incoming connection, fd 12, number 1
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Got incoming request with protocol version 2
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Got command RenderLow fd(12) xml(ajt), z(17), x(36577), y(47867), mime(image/png), options()
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Connection 0, fd 12 closed, now 0 left
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Got incoming connection, fd 12, number 1
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Got incoming request with protocol version 2
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Got command RenderLow fd(12) xml(ajt), z(17), x(36578), y(47868), mime(image/png), options()
Apr 16 15:39:09 tilerserver2 renderd[2542]: DEBUG: Connection 0, fd 12 closed, now 0 left
Apr 16 15:39:13 tilerserver2 renderd[2542]: DEBUG: DONE TILE ajt 17 36576-36583 47864-47871 in 3.652 seconds
Apr 16 15:40:01 tilerserver2 CRON[26922]: (root) CMD (if [ -x /etc/munin/plugins/apt_all ]; then /etc/munin/plugins/apt_all update 7200 12 >/dev/null; elif [ -x /etc/munin/plugins/apt ]; then /etc/munin/plugins/apt update 7200 12 >/dev/null; fi)
Apr 16 15:40:01 tilerserver2 CRON[26923]: (munin) CMD (if [ -x /usr/bin/munin-cron ]; then /usr/bin/munin-cron; fi)
Apr 16 15:40:34 tilerserver2 systemd[1]: Started Session 2594 of user root.
Apr 16 15:43:45 tilerserver2 systemd[1]: Started Session 2595 of user root.
Apr 16 15:44:08 tilerserver2 renderd[2542]: DEBUG: Got incoming connection, fd 12, number 1
Apr 16 15:44:08 tilerserver2 renderd[2542]: DEBUG: Got incoming request with protocol version 2
Apr 16 15:44:08 tilerserver2 renderd[2542]: DEBUG: Got command RenderLow fd(12) xml(ajt), z(17), x(36536), y(47857), mime(image/png), options()

Apache2 access.log during a freeze. There is a gap between 14:01 and 14:06 when server is not taking any requests (the freeze).

69.158.20.237 - - [23/Apr/2019:14:01:17 +0000] "GET /hot/17/36646/47835.png HTTP/1.1" 200 18392 "https://www.mysite.ca/map/43.651611&-79.360015&17" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36"
69.158.20.237 - - [23/Apr/2019:14:01:17 +0000] "GET /hot/17/36646/47833.png HTTP/1.1" 200 17406 "https://www.mysite.ca/map/43.651611&-79.360015&17" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36"
69.158.20.237 - - [23/Apr/2019:14:01:17 +0000] "GET /hot/17/36646/47836.png HTTP/1.1" 200 30092 "https://www.mysite.ca/map/43.651611&-79.360015&17" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36"
184.175.22.23 - - [23/Apr/2019:14:01:18 +0000] "GET /hot/14/4572/5978.png HTTP/1.1" 200 73673 "https://www.mysite.ca/map/43.658775&-79.464724&13" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36"
69.158.20.237 - - [23/Apr/2019:14:01:18 +0000] "GET /hot/17/36647/47835.png HTTP/1.1" 200 16931 "https://www.mysite.ca/map/43.653908&-79.353299&17" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36"
72.139.207.185 - - [23/Apr/2019:14:06:14 +0000] "GET /hot/14/4560/5972.png HTTP/1.1" 200 31375 "https://www.mysite.ca/map/43.741305&-79.805202&14" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36"
72.139.207.185 - - [23/Apr/2019:14:06:14 +0000] "GET /hot/14/4561/5972.png HTTP/1.1" 200 49864 "https://www.mysite.ca/map/43.741305&-79.805202&14" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36"
72.139.207.185 - - [23/Apr/2019:14:06:14 +0000] "GET /hot/14/4557/5972.png HTTP/1.1" 200 17749 "https://www.mysite.ca/map/43.741305&-79.805202&14" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36"
184.175.22.23 - - [23/Apr/2019:14:06:14 +0000] "GET /hot/13/2284/2990.png HTTP/1.1" 200 88895 "https://www.mysite.ca/map/43.666105&-79.444001&14" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36"

asked 16 Apr '19, 17:10

valachio's gravatar image

valachio
418913
accept rate: 0%

edited 23 Apr '19, 16:52


Maybe have a look to the load at that moment. I see you have munin running, what do munin graphs look like?

permanent link

answered 17 Apr '19, 05:58

yvecai's gravatar image

yvecai
1.5k1226
accept rate: 9%

The system load at the moment is normal. See the bottom chart (Bandwidth Public). The dip on the right is when the freeze happened. All other system stats look normal, apart from a very small spike in CPU.

I looked through the munin logs. Is there a particular one I should look at? There are munin-graph, munin-update, munin-limits, munin-html, munin-node. I looked through all of them. I tried to compare the munin logs when during a freeze, and when the tile server was serving normally, I couldn't see a difference in the logs.

(23 Apr '19, 15:48) valachio
During the freeze, there was a 1 minute period at 15:39 when it successfully served some tiles, before freezing up for another 3-4 minutes. After 15:44, the tile server went back to normal.

Looking at the log, it was serving tiles OK to 15:39 and no requests pending at 15:39, so it's not surprising that things went quiet until 15:44. IF you were trying to send requests in in that time they weren't reaching apache (or at least not mod_tile).

permanent link

answered 19 Apr '19, 01:17

SomeoneElse's gravatar image

SomeoneElse ♦
36.9k71370866
accept rate: 16%

Sorry for delayed response.

The website is always serving ~100 users at any point of time, so there is a constant stream of requests coming in. I was trying to use the website myself at the time and map tiles were not loading.

If the requests were not reaching Apache/mod_tile at that time, any idea why not? Did Apache freeze? Did mod_tile freeze? Why does the issue resolve itself in 10 minutes?

(23 Apr '19, 15:40) valachio
2

I found something. I looked through Apache2's access.log. During the freeze, access.log showed no requests during the freeze. On average there should be about 10 requests per second.

So it would seem that the requests are not reaching Apache...

Apache's error.log showed no errors during the freeze

(23 Apr '19, 16:18) valachio
2

Then maybe the requests did not reach the server at all: network issue?

(23 Apr '19, 20:51) yvecai

(at the risk of stating the bleeding obvious) perhaps you could script some web page accesses locally to show up periodically in apache's access log to see if they stop when the server "freezes". If they don't, it's an external network issue.

(23 Apr '19, 21:23) SomeoneElse ♦

@yvecai The server is running fine during the freeze. But it does seem any tile requests are being ignored by Apache for some reason... What is causing Apache to ignore tile requests for 5 minutes at a time, randomly?

(24 Apr '19, 03:24) valachio
Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:

×287
×204
×105

question asked: 16 Apr '19, 17:10

question was seen: 3,121 times

last updated: 24 Apr '19, 03:24

NOTICE: help.openstreetmap.org is no longer in use from 1st March 2024. Please use the OpenStreetMap Community Forum