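I am trying to generate a choropleth map by modifying an SVG map (http://upload.wikimedia.org/wikipedia/commons/5/5f/USA_Counties_with_FIPS_and_names.svg) that depicts all counties in the US. The basic approach is the one described by Flowing Data (http://flowingdata.com/2009/11/12/how-to-make-a-us-county-thematic-map-using-free-tools/). Since SVG is basically just XML, the approach uses the BeautifulSoup (http://www.crummy.com/software/BeautifulSoup/bs4/doc/) parser.
The problem is that the parser does not capture all of the path elements in the SVG file. The following captured only 149 paths (out of more than 3,000):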
#Open SVG file
svg=open(shp_dir+'USA_Counties_with_FIPS_and_names.svg','r').read()
#Parse SVG
soup = BeautifulSoup(svg, selfClosingTags=['defs','sodipodi:namedview'])
#Identify counties
paths = soup.findAll('path')
len(paths)
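However, I know that more exist, both from physical inspection and from the fact that ElementTree (https://docs.python.org/2/library/xml.etree.elementtree.html) methods capture 3,143 paths with the following routine: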
import xml.etree.ElementTree as ET
#Parse SVG
tree = ET.parse(shp_dir+'USA_Counties_with_FIPS_and_names.svg')
#Capture element
root = tree.getroot()
#Compile list of IDs from file
ids=[]
for child in root:
    #Tags are namespace-qualified, hence the substring test
    if 'path' in child.tag:
        ids.append(child.attrib['id'])
len(ids)
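As a side note on the routine above, here is a minimal sketch of why the test is `'path' in child.tag` rather than a plain equality check. It uses a tiny made-up SVG string with hypothetical IDs, not the real county file. ElementTree stores the namespace inside the tag name, and iterating over `root` visits only its direct children, whereas `iter()` walks all descendants.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Tiny stand-in document; the IDs are hypothetical, not taken from the real map
SAMPLE_SVG = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<path id="01001" d="M0 0L1 1"/>'
    '<path id="01003" d="M1 1L2 2"/>'
    '<g><path id="nested" d="M2 2L3 3"/></g>'
    '</svg>'
)

root = ET.fromstring(SAMPLE_SVG)

# ElementTree writes qualified tags as "{namespace}localname"
path_tag = "{%s}path" % SVG_NS

# Direct children of the root only, like the loop in the routine above
direct_ids = [child.attrib["id"] for child in root if child.tag == path_tag]

# Every matching descendant, including paths nested inside groups
all_ids = [el.attrib["id"] for el in root.iter(path_tag)]

print(root[0].tag)
print(direct_ids)
print(all_ids)
```

If the real file nests any paths inside `<g>` groups, the two counts would differ. I have not checked whether that is the case for this map.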
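The problem with ElementTree is that I have not figured out how to write the ElementTree object back out in a way that is not completely messed up.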
#Define style template string
style='font-size:12px;fill-rule:nonzero;stroke:#FFFFFF;stroke-opacity:1;'+\
    'stroke-width:0.1;stroke-miterlimit:4;stroke-dasharray:none;'+\
    'stroke-linecap:butt;marker-start:none;stroke-linejoin:bevel;fill:'
#For each path...
for child in root:
    #...if it is a path....
    if 'path' in child.tag:
        try:
            #...update the style to the new string with a county-specific color...
            child.attrib['style']=style+col_map[child.attrib['id']]
        except KeyError:
            #...if it's not a county we have in the ACS, leave it alone
            child.attrib['style']=style+'#d0d0d0'+'\n'
#Write modified SVG to disk
tree.write(shp_dir+'mhv_by_cty.svg')
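The modify/write routine above produces this monstrosity:
My primary question is: why did BeautifulSoup fail to capture all of the path tags? Second, why does the image written out from the ElementTree object have all of that extracurricular activity going on? Any advice would be greatly appreciated.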
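A possibly related detail, offered as a sketch rather than a diagnosis: when ElementTree serializes a namespaced document whose namespace has not been registered, it invents prefixes such as `ns0:` for every element. Calling `ET.register_namespace` before writing keeps plain `<svg>` and `<path>` tags. The example below uses a tiny made-up SVG string with a hypothetical ID, and I am only assuming this is relevant to the output above.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Tiny stand-in document; the ID is hypothetical, not taken from the real map
SAMPLE_SVG = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<path id="01001" d="M0 0L1 1"/>'
    '</svg>'
)

root = ET.fromstring(SAMPLE_SVG)

# Restyle every path, as the routine in the question does
for el in root.iter("{%s}path" % SVG_NS):
    el.set("style", "fill:#d0d0d0")

# Without registration, ElementTree generates its own prefix for the SVG namespace
before = ET.tostring(root, encoding="unicode")

# Map the SVG namespace to the empty prefix so it is written as the default namespace
ET.register_namespace("", SVG_NS)
after = ET.tostring(root, encoding="unicode")

print(before)
print(after)
```

`register_namespace` changes a module-wide mapping, so it only needs to be called once, before `tree.write(...)`.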