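I've written a VBA script that uses regular expressions to parse the company name, phone, and fax from a web page. When I run the script, I get this information perfectly. The problem, however, is that I used three different expressions, and to make them work I created three different regex objects: rxp, rxp1, and rxp2.
My question: how can I create a single regex object in which I can use all three patterns, unlike what I did below?
This is the script (working):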
Sub GetInfo()
    Const Url$ = "https://www.austrade.gov.au/SupplierDetails.aspx?ORGID=ORG0120000508&folderid=1736"
    Dim rxp As New RegExp, rxp1 As New RegExp, rxp2 As New RegExp

    With New XMLHTTP60
        .Open "GET", Url, False
        .send

        rxp.Pattern = "Company Name:(\s[\w\s]+)"
        rxp1.Pattern = "Phone:(\s\+[\d\s]+)"
        rxp2.Pattern = "Fax:(\s\+[\d\s]+)"

        If rxp.Execute(.responseText).Count > 0 Then
            [A1] = rxp.Execute(.responseText).Item(0).SubMatches(0)
        End If
        If rxp1.Execute(.responseText).Count > 0 Then
            [B1] = rxp1.Execute(.responseText).Item(0).SubMatches(0)
        End If
        If rxp2.Execute(.responseText).Count > 0 Then
            [C1] = rxp2.Execute(.responseText).Item(0).SubMatches(0)
        End If
    End With
End Sub
References to add to the library to run the script above:
Microsoft XML, v6.0
Microsoft VBScript Regular Expressions
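You can build a regex with alternation, enable global matching with rxp.Global = True, and capture the known string into Group 1 and the unknown part into Group 2. You can then assign the right value to each variable by checking the Group 1 value: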
Sub GetInfo()
    Const Url$ = "https://www.austrade.gov.au/SupplierDetails.aspx?ORGID=ORG0120000508&folderid=1736"
    Dim rxp As New RegExp
    Dim ms As MatchCollection
    Dim m As Match
    Dim cname As String, phone As String, fax As String

    With New XMLHTTP60
        .Open "GET", Url, False
        .send

        ' Group 1 holds the label, Group 2 holds the value that follows it
        rxp.Pattern = "(Phone|Company Name|Fax):\s*(\+?[\w\s]*\w)"
        rxp.Global = True   ' return every match, not just the first one
        Set ms = rxp.Execute(.responseText)

        ' Route each value to its variable based on the label in Group 1
        For Each m In ms
            If m.SubMatches(0) = "Company Name" Then cname = m.SubMatches(1)
            If m.SubMatches(0) = "Phone" Then phone = m.SubMatches(1)
            If m.SubMatches(0) = "Fax" Then fax = m.SubMatches(1)
        Next

        Debug.Print cname, phone, fax
    End With
End Sub
Output:
Vaucraft Braford Stud +61 7 4942 4859 +61 7 4942 0618
See the regex demo.
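Pattern details:
- (Phone|Company Name|Fax) - Capturing group 1: any one of the three alternatives
- :\s* - a colon followed by 0+ whitespace characters
- (\+?[\w\s]*\w) - Capturing group 2:
  - \+? - an optional +
  - [\w\s]* - 0 or more letters, digits, _ or whitespace characters
  - \w - a single letter, digit, or _.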
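To try the pattern described above without hitting the network, here is a minimal sketch that runs it against a hard-coded string. The sample markup, the procedure name DemoSinglePattern, and the use of late binding via CreateObject (so no library references are required) are assumptions made for illustration; they are not part of the original answer. Note that \s also matches line breaks, so the pattern relies on each value being followed by a character that is neither a word character nor whitespace, such as the < of a closing tag; the sample string imitates that.

```vba
Sub DemoSinglePattern()
    ' Hypothetical stand-in for .responseText; each value is followed by
    ' a closing tag, which stops the [\w\s]* part of the pattern.
    Dim sample As String
    sample = "<p>Company Name: Vaucraft Braford Stud</p>" & _
             "<p>Phone: +61 7 4942 4859</p>" & _
             "<p>Fax: +61 7 4942 0618</p>"

    ' Late binding: no reference to the VBScript regex library is needed.
    Dim rxp As Object
    Set rxp = CreateObject("VBScript.RegExp")
    rxp.Pattern = "(Phone|Company Name|Fax):\s*(\+?[\w\s]*\w)"
    rxp.Global = True

    Dim ms As Object, m As Object
    Dim cname As String, phone As String, fax As String

    Set ms = rxp.Execute(sample)
    For Each m In ms
        ' SubMatches(0) is the label, SubMatches(1) is the captured value
        Select Case m.SubMatches(0)
            Case "Company Name": cname = m.SubMatches(1)
            Case "Phone": phone = m.SubMatches(1)
            Case "Fax": fax = m.SubMatches(1)
        End Select
    Next

    Debug.Print "Company: " & cname
    Debug.Print "Phone: " & phone
    Debug.Print "Fax: " & fax
End Sub
```

Select Case is used here only as a compact alternative to the three If statements; either form dispatches on the Group 1 value in the same way.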