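If possible, I'd recommend using a library, because address parsing can be difficult. Take a look at the Indirizzo Ruby gem (https://github.com/daveworth/Indirizzo), which makes this easy: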
require 'Indirizzo'
address = Indirizzo::Address.new("7707 Foo Bar Blvd")
address.number
=> "7707"
address.street
=> ["foo bar blvd", "foo bar boulevard"]
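Even if you don't use the Indirizzo library itself, reading its source code can be very useful for seeing how its authors approached the problem. For example, it has finely tuned regular expressions for matching the different parts of an address: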
Match = {
# FIXME: shouldn't have to anchor :number and :zip at start/end
:number => /^(\d+\W|[a-z]+)?(\d+)([a-z]?)\b/io,
:street => /(?:\b(?:\d+\w*|[a-z'-]+)\s*)+/io,
:city => /(?:\b[a-z][a-z'-]+\s*)+/io,
:state => State.regexp,
:zip => /\b(\d{5})(?:-(\d{4}))?\b/o,
:at => /\s(at|@|and|&)\s/io,
:po_box => /\b[P|p]*(OST|ost)*\.*\s*[O|o|0]*(ffice|FFICE)*\.*\s*[B|b][O|o|0][X|x]\b/
}
These files in its source code give more detail:
- https://github.com/daveworth/Indirizzo/blob/master/lib/indirizzo/address.rb
- https://github.com/daveworth/Indirizzo/blob/master/lib/indirizzo/constants.rb
- https://github.com/daveworth/Indirizzo/blob/master/lib/indirizzo/numbers.rb
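To get a feel for how patterns like these behave without installing the gem, here is a minimal sketch that copies the :number and :zip patterns from the Match hash quoted above into plain Ruby. The helper name extract_number_and_zip and the sample address are made up for illustration and are not part of Indirizzo; taking the last ZIP-shaped match is also a choice of this sketch, so that a five-digit house number is not mistaken for the ZIP code.

```ruby
# Patterns copied from the Match hash quoted above (the /o flag is dropped
# because these literals contain no interpolation).
NUMBER = /^(\d+\W|[a-z]+)?(\d+)([a-z]?)\b/i
ZIP    = /\b(\d{5})(?:-(\d{4}))?\b/

# Hypothetical helper, not part of Indirizzo: pulls the house number and
# ZIP code out of a free-form address string.
def extract_number_and_zip(text)
  # Capture group 2 of NUMBER holds the digits of the house number.
  number = text[NUMBER, 2]
  # scan returns one [zip, plus4] pair per match; keep the last pair so a
  # five-digit house number at the start is not taken for the ZIP.
  zip, plus4 = text.scan(ZIP).last
  { number: number, zip: zip, plus4: plus4 }
end

result = extract_number_and_zip("7707 Foo Bar Blvd, Springfield 12345-6789")
puts result[:number]
puts result[:zip]
puts result[:plus4]
```

For the sample string this should print 7707, 12345 and 6789 on separate lines. Real addresses are much messier than this, which is the reason the library route is usually less painful.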
(That said, I also generally agree with @drhenner's comment: to make things easier on yourself, you could simply accept this data in separate fields.)
Edit: To give a more specific answer on how to remove the street suffix (e.g. "Blvd"), you can use Indirizzo's regex constants (such as Suffix_Type from constants.rb) like this:
address = Indirizzo::Address.new("7707 Foo Bar Blvd", :expand_streets => false)
address.street.map {|street| street.gsub(Indirizzo::Suffix_Type.regexp, '').strip }
=> ["foo bar"]
(Note that I also passed :expand_streets => false to the initializer, to avoid getting both the "Blvd" and "Boulevard" alternatives, since we're discarding the suffix anyway.)
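If pulling in the gem isn't an option, the same idea can be roughly approximated in plain Ruby. The sketch below is only an illustration: STREET_SUFFIXES is a deliberately tiny, hypothetical list (Indirizzo's Suffix_Type covers far more), and strip_street_suffix is a made-up helper name. Unlike the gsub call above, it removes only a suffix at the very end of the string, so a name such as "St James Street" keeps its leading "St".

```ruby
# Hypothetical, deliberately small suffix list for illustration only;
# it is not taken from Indirizzo's constants.rb.
STREET_SUFFIXES = %w[avenue ave boulevard blvd drive dr road rd street st].freeze

# Matches a single suffix (optionally followed by a period) at the end of
# the string, preceded by whitespace so a bare "Street" is left alone.
SUFFIX_AT_END = /\s+(?:#{Regexp.union(STREET_SUFFIXES).source})\.?\s*\z/i

# Hypothetical helper: drops the trailing street suffix, if there is one.
def strip_street_suffix(street)
  street.sub(SUFFIX_AT_END, '').strip
end

puts strip_street_suffix("Foo Bar Blvd")
puts strip_street_suffix("St James Street")
```

This should print "Foo Bar" and "St James". A hand-rolled list like this will miss plenty of real-world suffixes and abbreviations, so treat it as a starting point rather than a replacement for the library's constants.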