I have been using the addr:flats tag on apartment buildings which specifies a range of flat references, together with the building:flats tag which specifies the total number (quantity) of flats. The latter information is actually the bit I'm most interested in, as I am using the data (amongst other purposes) to compute the number of leaflets required for delivery in a district (during political elections). I have two reservations though:

  1. The building:flats tag seems little used, for some reason
  2. The value in it is technically redundant as it is computable from the addr:flats data.

So I wondered if anyone has written or knows of a user-defined function I can use within PostgreSQL to parse an addr:flats tag into a comprehensive array of distinct values. As well as the addr:flats value, the function would need to be passed the addr:interpolation value (or null). The number of flats is then simply the length of the returned array. This would be functionally similar to parsing a page printing sequence - eg this kind of thing - but I'm not much of a programmer and hoped I could steal someone else's work!

asked 14 Oct '18, 10:37

Libarch's gravatar image

Libarch
31135
accept rate: 0%


I've had a go myself, and this is what I came up with. It seems to work OK on my test data, but any comments or improvements would be very welcome:

CREATE OR REPLACE FUNCTION addr_flats_to_array(addr_flats text, addr_interpolation text DEFAULT NULL) RETURNS text[] AS $$
DECLARE
    part_range text[];
    part text;
    return_array text[];
    start_val int;
    end_val int;
    incr int;
BEGIN
    IF addr_flats IS NULL THEN
        RETURN NULL;
    ELSIF addr_flats ~ ',' THEN -- commas not permitted (other checks here?)
        RAISE NOTICE 'addr_flats_to_array: Illegal char in [%]', addr_flats;
        RETURN NULL;            
    END IF;
    FOREACH part IN ARRAY regexp_split_to_array(addr_flats, ';')
    LOOP
        part_range := regexp_split_to_array(part, '-');
        CASE array_length(part_range, 1)
            WHEN 1 THEN -- this case is treated as a single reference, not a range
                return_array := array_append(return_array, part);
                CONTINUE;
            WHEN 2 THEN -- this case is a range, ie two references separated by a hyphen
                IF part_range[1] ~ '^\d+$' AND part_range[2] ~ '^\d+$' THEN -- both values are integers
                    start_val := CAST(part_range[1] AS int);
                    end_val := CAST(part_range[2] AS int);
                    IF end_val < start_val THEN
                        RAISE NOTICE 'addr_flats_to_array: Non-incremental range "%" in [%]', part, addr_flats;
                        RETURN NULL;
                    END IF;
                    CASE addr_interpolation
                        WHEN 'odd' THEN
                            IF mod(start_val, 2) != 1 THEN
                                RAISE NOTICE 'addr_flats_to_array: Even start value "%" in odd-interpolated ranges [%]', start_val, addr_flats;
                                RETURN NULL;
                            END IF;
                            incr := 2;
                        WHEN 'even' THEN
                            IF mod(start_val, 2) != 0 THEN
                                RAISE NOTICE 'addr_flats_to_array: Odd start value "%" in even-interpolated ranges [%]', start_val, addr_flats;
                                RETURN NULL;
                            END IF;
                            incr := 2;
                        ELSE
                            incr := 1;
                    END CASE;
                    FOR i IN start_val..end_val BY incr LOOP
                        return_array := array_append(return_array, CAST(i AS text));
                    END LOOP;
                ELSIF length(part_range[1]) = 1 AND length(part_range[2]) = 1 THEN -- both references are single non-integer chars
                    start_val := ascii(part_range[1]);
                    end_val := ascii(part_range[2]);
                    IF start_val < ascii('A') OR end_val > ascii('Z') OR end_val < start_val THEN -- only capital letters can be used in a range
                        RAISE NOTICE 'addr_flats_to_array: Malformed alphabetic range "%" in [%]', part, addr_flats;
                        RETURN NULL;
                    END IF;
                    FOR i IN start_val..end_val LOOP
                        return_array := array_append(return_array, chr(i));
                    END LOOP;
                END IF;
            ELSE -- this shouldn't happen; eg '1-3-9' etc
                RAISE NOTICE 'addr_flats_to_array: Malformed range "%" in [%]', part, addr_flats;
                RETURN NULL;
        END CASE;
    END LOOP;
    RETURN return_array;
END;
$$ LANGUAGE plpgsql IMMUTABLE;
permanent link

answered 15 Oct '18, 15:52

Libarch's gravatar image

Libarch
31135
accept rate: 0%

Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:

×141
×15
×4
×1
×1

question asked: 14 Oct '18, 10:37

question was seen: 234 times

last updated: 15 Oct '18, 15:52

powered by OSQA