This is a static archive of our old OpenStreetMap Help Site. Please post any new questions and answers at community.osm.org.

Function to parse addr:flats tag to array in PostgreSQL?

I have been using the addr:flats tag on apartment buildings which specifies a range of flat references, together with the building:flats tag which specifies the total number (quantity) of flats. The latter information is actually the bit I'm most interested in, as I am using the data (amongst other purposes) to compute the number of leaflets required for delivery in a district (during political elections). I have two reservations though:

The building:flats tag seems little used, for some reason
The value in it is technically redundant as it is computable from the addr:flats data.

So I wondered if anyone has written or knows of a user-defined function I can use within PostgreSQL to parse an addr:flats tag into a comprehensive array of distinct values. As well as the addr:flats value, the function would need to be passed the addr:interpolation value (or null). The number of flats is then simply the length of the returned array. This would be functionally similar to parsing a page printing sequence - eg this kind of thing - but I'm not much of a programmer and hoped I could steal someone else's work!

function addr_flats building_flats postgresql parsing

asked 14 Oct '18, 10:37

Libarch
31●3●3●5
accept rate: 0%

One Answer:

I've had a go myself, and this is what I came up with. It seems to work OK on my test data, but any comments or improvements would be very welcome:

CREATE OR REPLACE FUNCTION addr_flats_to_array(addr_flats text, addr_interpolation text DEFAULT NULL) RETURNS text[] AS $$
DECLARE
    part_range text[];
    part text;
    return_array text[];
    start_val int;
    end_val int;
    incr int;
BEGIN
    IF addr_flats IS NULL THEN
        RETURN NULL;
    ELSIF addr_flats ~ ',' THEN -- commas not permitted (other checks here?)
        RAISE NOTICE 'addr_flats_to_array: Illegal char in [%]', addr_flats;
        RETURN NULL;            
    END IF;
    FOREACH part IN ARRAY regexp_split_to_array(addr_flats, ';')
    LOOP
        part_range := regexp_split_to_array(part, '-');
        CASE array_length(part_range, 1)
            WHEN 1 THEN -- this case is treated as a single reference, not a range
                return_array := array_append(return_array, part);
                CONTINUE;
            WHEN 2 THEN -- this case is a range, ie two references separated by a hyphen
                IF part_range[1] ~ '^\d+$' AND part_range[2] ~ '^\d+$' THEN -- both values are integers
                    start_val := CAST(part_range[1] AS int);
                    end_val := CAST(part_range[2] AS int);
                    IF end_val < start_val THEN
                        RAISE NOTICE 'addr_flats_to_array: Non-incremental range "%" in [%]', part, addr_flats;
                        RETURN NULL;
                    END IF;
                    CASE addr_interpolation
                        WHEN 'odd' THEN
                            IF mod(start_val, 2) != 1 THEN
                                RAISE NOTICE 'addr_flats_to_array: Even start value "%" in odd-interpolated ranges [%]', start_val, addr_flats;
                                RETURN NULL;
                            END IF;
                            incr := 2;
                        WHEN 'even' THEN
                            IF mod(start_val, 2) != 0 THEN
                                RAISE NOTICE 'addr_flats_to_array: Odd start value "%" in even-interpolated ranges [%]', start_val, addr_flats;
                                RETURN NULL;
                            END IF;
                            incr := 2;
                        ELSE
                            incr := 1;
                    END CASE;
                    FOR i IN start_val..end_val BY incr LOOP
                        return_array := array_append(return_array, CAST(i AS text));
                    END LOOP;
                ELSIF length(part_range[1]) = 1 AND length(part_range[2]) = 1 THEN -- both references are single non-integer chars
                    start_val := ascii(part_range[1]);
                    end_val := ascii(part_range[2]);
                    IF start_val < ascii('A') OR end_val > ascii('Z') OR end_val < start_val THEN -- only capital letters can be used in a range
                        RAISE NOTICE 'addr_flats_to_array: Malformed alphabetic range "%" in [%]', part, addr_flats;
                        RETURN NULL;
                    END IF;
                    FOR i IN start_val..end_val LOOP
                        return_array := array_append(return_array, chr(i));
                    END LOOP;
                END IF;
            ELSE -- this shouldn't happen; eg '1-3-9' etc
                RAISE NOTICE 'addr_flats_to_array: Malformed range "%" in [%]', part, addr_flats;
                RETURN NULL;
        END CASE;
    END LOOP;
    RETURN return_array;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

answered 15 Oct '18, 15:52

Libarch
31●3●3●5
accept rate: 0%