I am looking for a tool (or tools) that can find profane language in tag values within a specific country or region. I'd also like to find cases where a tag value contains a Unicode character that is not expected in that part of the world. Does such a tool exist?

Please refrain from fixing your findings in OSM via automated edits.

Thanks, but I have no intention of making edits. I just want to understand what is in, and going into, OSM.

Not sure that it is altogether wise to try and solve the Scunthorpe problem. The context of OSM is likely to produce a very high false-positive rate.

However, an obvious approach would be a tool chain: osmium-tool => OPL format => standard UNIX pattern-matching tools (grep, sed, etc.). Osmium can do the bbox filtering and the format conversion. The OPL (one object per line) format lets you examine just the tag keys and values (e.g., by pulling out that field with awk or perl). The UNIX tools offer many ways of matching regular expressions (for instance, tr or a grep character class could be used to pick out specific Unicode characters if that was a particular issue, though not every tr implementation handles multi-byte characters).
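If you would rather have the matching step in one script than in a chain of awk and grep, here is a rough Python sketch of the same idea. It assumes each OPL line carries its tags in a space-separated field starting with `T`, written as `key=value` pairs joined by commas, with special characters escaped as `%<hex code point>%`. The function names (`parse_tags`, `unexpected_chars`, `build_word_pattern`, `scan`) are made up for illustration, and the script guess is a crude approximation that takes the first word of the Unicode character name, since the Python standard library has no real script property. The word list is a placeholder you would supply yourself.

```python
import re
import unicodedata

# OPL escapes special characters as %<hex code point>% (assumed here).
_ESCAPE = re.compile(r"%([0-9a-fA-F]+)%")


def opl_unescape(text):
    """Turn %<hex>% escapes back into the characters they stand for."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def parse_tags(opl_line):
    """Return the tags of one OPL line as a dict (empty if it has none)."""
    for field in opl_line.rstrip("\n").split(" "):
        if field.startswith("T") and len(field) > 1:
            tags = {}
            for pair in field[1:].split(","):
                key, _, value = pair.partition("=")
                tags[opl_unescape(key)] = opl_unescape(value)
            return tags
    return {}


def unexpected_chars(value, allowed_scripts=frozenset({"LATIN"})):
    """List letters whose (approximate) script is not in allowed_scripts.

    The script is guessed from the first word of the Unicode character
    name, e.g. 'CYRILLIC' or 'CJK'. This is a heuristic, not a real
    script lookup.
    """
    found = []
    for ch in value:
        if not ch.isalpha():
            continue
        name = unicodedata.name(ch, "")
        script = name.split(" ")[0] if name else "UNKNOWN"
        if script not in allowed_scripts:
            found.append(ch)
    return found


def build_word_pattern(words):
    """Compile a whole-word, case-insensitive pattern from a non-empty list.

    Matching whole words only avoids the worst of the Scunthorpe problem,
    but will still produce false positives on legitimate names.
    """
    alternatives = "|".join(re.escape(w) for w in words)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def scan(lines, allowed_scripts, word_pattern):
    """Yield (object_id, key, value, odd_chars, word_hit) for suspect tags."""
    for line in lines:
        object_id = line.split(" ", 1)[0]
        for key, value in parse_tags(line).items():
            odd = unexpected_chars(value, allowed_scripts)
            hit = word_pattern.search(value) is not None
            if odd or hit:
                yield object_id, key, value, odd, hit
```

To use it, convert your regional extract to OPL (with osmium-tool that should be something like `osmium cat extract.osm.pbf -f opl`) and pass the resulting lines, for example from `sys.stdin`, to `scan()` together with the scripts you expect in that region and your own word list. Treat the output as a list of candidates to look at by hand, not as a verdict.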

