Hi,

As there's no documentation on Nominatim's search.php, I'm trying to make sense of the code myself. I noticed that, to tokenize search queries, the PHP code invokes a stored procedure in PostgreSQL, which in turn invokes a custom Nominatim module compiled from C code. Coming from the Java world, I am wondering:

Is that a common LAPP pattern?

I understand that string-crunching is faster in C than in an interpreted language, but many things still happen in PHP in the Nominatim implementation. What's the reason for invoking such a toolchain? Google didn't help me find an explanation, so I thought I'd just ask.

Thanks!

asked 26 Apr '13, 15:14

konstantin


String standardisation and tokenising are also required when indexing data, and that is done almost entirely in PL/pgSQL and C. For consistency, the PHP search code then calls those same functions.

So the PostgreSQL module acts as a common point between the indexing and search code.

The other reason is as you suggest: there is a significant speed gain from writing this frequent operation in C. During the search phase this speed matters less, but during indexing it is essential.
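To illustrate the consistency point, here is a minimal Python sketch (not Nominatim's actual code). The names `normalise`, `build_index` and `search` are hypothetical, and the accent-stripping rule is only an assumed stand-in for whatever the real module does. It shows why a lookup only works when the query passes through the same standardisation as the indexed data.

```python
import unicodedata


def normalise(text: str) -> list[str]:
    """Hypothetical stand-in for the shared standardisation step."""
    # Decompose accented characters, drop the combining marks, lowercase,
    # then split on whitespace to get tokens.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower().split()


def build_index(names: list[str]) -> dict[str, set[str]]:
    """Indexing side: map every normalised token to the names containing it."""
    index: dict[str, set[str]] = {}
    for name in names:
        for token in normalise(name):
            index.setdefault(token, set()).add(name)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Search side: reuse the very same normalise() before the lookup."""
    tokens = normalise(query)
    if not tokens:
        return set()
    return set.intersection(*(index.get(token, set()) for token in tokens))


index = build_index(["Café Müller", "Cafe Central"])
matches = search(index, "CAFE muller")
```

Because both sides share one function, the query "CAFE muller" resolves to the same tokens that were stored for "Café Müller". Looking up the raw string "Müller" in the index would find nothing, which is the kind of mismatch a single shared implementation avoids.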

answered 26 Apr '13, 16:05

twain

edited 26 Apr '13, 16:19

I haven't looked at the Nominatim code, but the pattern doesn't seem wrong to me:

While it may look like "just a string tokenizer", chances are that some complicated logic like word stemming is necessary. Two reasons for doing that inside the database come to mind: either we want to use the exact same algorithm that the database is using (so reusing code that's available in the DB makes sense), or the process requires fetching data from the DB (so going back and forth between PHP and PostgreSQL would kill performance).

The richness and ease of use of server-side languages and extensions is actually one of PostgreSQL's killer features compared to other databases. It's a great tool that can make PostgreSQL look more like a software platform than an RDBMS. Learn to love it :)

As for the idea that string-crunching is better done in C than in an interpreted language: it is generally false. Interpreted languages are often very good at string-crunching, and beating that in C can take enormous effort. I doubt this is the reason for Nominatim's pattern.

answered 26 Apr '13, 15:54

Vincent de Phily ♦

Having looked at the code, I assure you: there's nothing complicated in the C module being invoked. Also, no data is fetched from the DB itself.

Twain's explanation sounds reasonable to me: the same logic used at indexing time needs to be invoked during search. Thanks!

(26 Apr '13, 16:12) konstantin

While you are entirely right about string handling in C vs. interpreted languages in the general case, in this particular case PL/pgSQL's string handling is particularly poor, and the operations involved (character table lookups and reductions in simple ASCII) are particularly well suited to C.

(26 Apr '13, 16:18) twain
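
For readers wondering what "character table lookups and reductions" might look like, the following is a rough Python sketch of the general technique, written only for illustration. It is not the C module's code: the table contents and the name `reduce_to_ascii` are hypothetical, and a real table would cover far more characters.

```python
# Hypothetical lookup table: each character maps to its reduced form.
# Punctuation maps to a space so it acts as a token separator.
REDUCTIONS = str.maketrans({
    "ä": "a", "ö": "o", "ü": "u", "ß": "ss",
    "é": "e", "è": "e", "ñ": "n",
    "-": " ", ",": " ", ".": " ",
})


def reduce_to_ascii(text: str) -> str:
    """Lowercase, apply the per-character table, keep only ASCII."""
    translated = text.lower().translate(REDUCTIONS)
    # Anything the table did not map into ASCII is dropped in this sketch.
    ascii_only = "".join(ch for ch in translated if ch.isascii())
    # Collapse the runs of whitespace left behind by removed characters.
    return " ".join(ascii_only.split())


street = reduce_to_ascii("Königs-Allee, 5.")
```

The work per character is a single table lookup, which is the sort of operation that maps naturally onto an array indexed by character code in C.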