This is a static archive of our old OpenStreetMap Help Site. Please post any new questions and answers at community.osm.org.

OSM Minutely Diffs process

I'm curious whether anyone knows where to find proper documentation that walks through how OSM pulls minutely diff files from the API database.

I know the documentation states that they use osmosis --read-apid. Warning: this gets a little into the weeds, but I am trying to find the proper channel to ask my question.

Looking at osmosis source code, this is the query passed to the DB:

(CREATE TEMPORARY TABLE tmp_nodes ON COMMIT DROP AS SELECT node_id, version FROM nodes WHERE (((xid_to_int4(xmin) >= ## AND xid_to_int4(xmin) <= ##))) AND redaction_id IS NULL;)

I don't see how OSM is able to read the nodes table and filter it down by transaction IDs in under a minute. Scaling up from my small API db, it would seem that the nodes table in the OSM production API db would have to be around 350GB alone.

Looking here - https://hardware.openstreetmap.org/servers/katla.openstreetmap.org/

The data shows that the main server housing the API db has 252 GB of RAM. This wouldn't be enough to read the entire nodes table into RAM, so I must be missing something.

If anyone has an idea how this is accomplished or where it is documented, I would be really interested in hearing about it.

Thanks.

asked 15 Feb '17, 16:06

Cellington


One Answer:

That's easy: we have an extra index on the nodes table, defined as follows:

CREATE INDEX nodes_xmin_idx ON nodes USING btree (xid_to_int4(xmin))

There are also equivalent indexes on the ways and relations tables.
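
To illustrate why an index on the expression makes the range filter cheap, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in. It is not the production setup: SQLite has no xmin system column or xid_to_int4 function, so an ordinary column named txid and the expression abs(txid) are hypothetical substitutes, and the table contents are made up. The point is only that when the WHERE clause repeats the indexed expression, the planner can search the index instead of scanning the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified stand-in for the nodes table; txid is a hypothetical
# ordinary column playing the role of PostgreSQL's xmin.
conn.execute(
    "CREATE TABLE nodes (node_id INTEGER, version INTEGER, "
    "txid INTEGER, redaction_id INTEGER)"
)

# Made-up data: 10,000 node versions, one transaction id each.
conn.executemany(
    "INSERT INTO nodes VALUES (?, 1, ?, NULL)",
    ((i, i) for i in range(10_000)),
)

# Index on an expression, analogous in spirit to nodes_xmin_idx.
conn.execute("CREATE INDEX nodes_txid_idx ON nodes (abs(txid))")

# Same shape as the osmosis query: a range on the indexed expression.
query = (
    "SELECT node_id, version FROM nodes "
    "WHERE abs(txid) >= 5000 AND abs(txid) <= 5060 "
    "AND redaction_id IS NULL"
)

# The fourth column of EXPLAIN QUERY PLAN holds the plan description.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
rows = conn.execute(query).fetchall()

print(plan)
print(len(rows))
```

On PostgreSQL, the equivalent check would be to run EXPLAIN on the osmosis query and see whether the plan uses nodes_xmin_idx rather than a sequential scan.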

answered 15 Feb '17, 16:51

TomH ♦♦

edited 15 Feb '17, 16:57

Ah, an additional index; that makes sense. Thanks, @TomH.

(15 Feb '17, 17:14) Cellington
