OSM Minutely Diffs process

I'm curious whether anyone knows where to find proper documentation that walks through how OSM generates minutely diff files from the API database.

I know the documentation states that they use osmosis --read-apid. Warning: this gets a little into the weeds, but I am trying to find the proper channel to ask my question.

Looking at osmosis source code, this is the query passed to the DB:

(CREATE TEMPORARY TABLE tmp_nodes ON COMMIT DROP AS SELECT node_id, version FROM nodes WHERE (((xid_to_int4(xmin) >= ## AND xid_to_int4(xmin) <= ##))) AND redaction_id IS NULL;)

I don't see how OSM is able to read the nodes table and filter it down by transaction ID in under a minute. Scaling up from my small API db, it would seem that the nodes table alone in the OSM production API db would have to be around 350 GB.

Looking here - https://hardware.openstreetmap.org/servers/katla.openstreetmap.org/

The data shows that the main server housing the API db has 252 GB of RAM. That wouldn't be enough to read the entire nodes table into RAM, so I must be missing something.

If anyone has an idea how this is accomplished or where it is documented, I would be really interested in hearing about it.

Thanks.

diffs apidb osmosis

asked 15 Feb '17, 16:06

Cellington

One Answer:

That's easy: we have an extra index on the nodes table, defined as follows:

CREATE INDEX nodes_xmin_idx ON nodes USING btree (xid_to_int4(xmin))

There are also equivalent indexes on the ways and relations tables.
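To illustrate why an index like this removes the need to read the whole table, here is a minimal sketch using SQLite from Python as a stand-in for PostgreSQL. It is not the OSM setup: SQLite has no xmin system column, so a hypothetical TEXT column named txid plays that role, CAST(txid AS INTEGER) stands in for xid_to_int4(xmin), and the table contents and bounds are made up. The point is only that a range filter on an expression can be answered from an index built on that same expression instead of a full table scan.

```python
import sqlite3

# Stand-in for the apidb nodes table. SQLite has no xmin system column, so a
# hypothetical TEXT column "txid" plays the role of the transaction ID here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (node_id INTEGER, version INTEGER, txid TEXT)")
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?, ?)",
    [(i, 1, str(1000 + i)) for i in range(10000)],
)

# Same shape as the Osmosis query: a range filter on an expression over the
# transaction ID. CAST(... AS INTEGER) stands in for xid_to_int4(xmin).
QUERY = (
    "SELECT node_id, version FROM nodes "
    "WHERE CAST(txid AS INTEGER) >= ? AND CAST(txid AS INTEGER) <= ?"
)
BOUNDS = (5000, 5010)  # made-up transaction ID range


def query_plan(connection):
    """Return SQLite's plan for QUERY as one string (last column is the detail)."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, BOUNDS).fetchall()
    return " | ".join(row[-1] for row in rows)


# Plan with no index on the expression.
plan_before = query_plan(conn)

# Expression index, analogous in spirit to nodes_xmin_idx above.
conn.execute("CREATE INDEX nodes_txid_idx ON nodes (CAST(txid AS INTEGER))")
plan_after = query_plan(conn)

rows = conn.execute(QUERY, BOUNDS).fetchall()

print("without index:", plan_before)
print("with index:   ", plan_after)
print("matching rows:", len(rows))
```

On a real PostgreSQL apidb, the comparable check would be to run EXPLAIN on the Osmosis query and see whether the plan mentions nodes_xmin_idx. The filter has to use the indexed expression as written for the planner to consider the index.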

answered 15 Feb '17, 16:51

TomH ♦♦

edited 15 Feb '17, 16:57

Ah, an additional index. That makes sense. Thanks, @TomH.

(15 Feb '17, 17:14) Cellington