YAML(以及其他相对自由的格式,如 HTML、JSON、XML)最好不要使用正则表达式进行转换,很容易在一个示例中工作,并在下一个具有额外空格、不同缩进等的情况下中断。
在这种情况下使用 YAML 解析器并不是一件简单的事情,因为许多人要么期望文件中只有一个 YAML 文档(并且在 Markdown 部分上将其作为无关的东西而拒绝),要么期望在文件中包含多个 YAML 文档(并且因为 Markdown 不是 YAML 而拒绝) )。此外,大多数 YAML 解析器会丢弃有用的东西,例如注释和重新排序映射键。
多年来,我一直在为我的 ToDo 项目使用类似的格式(YAML 标头,后跟 reStructuredText),并使用一个小型 Python 程序来提取和更新这些文件。给定这样的输入:
---
ID: 51 # one of the key/values to preserve
post_title: Here's my post title
author: Frank Meeuwsen
post_date: 2014-07-03 22:10:11
post_excerpt: ""
layout: post
permalink: >
https://myurl.com/slug
published: true
sw_timestamp:
- "399956"
sw_open_thumbnail_url:
- >
https://myurl.com/wp-content/uploads/2014/08/Featured_image.jpg
sw_cache_timestamp:
- "408644"
swp_open_thumbnail_url:
- >
https://myurl.com/wp-content/uploads/2014/08/Featured_image.jpg
swp_open_graph_image_data:
- '["https://i0.wp.com/myurl.com/wp-content/uploads/2014/08/Featured_image.jpg?fit=800%2C400&ssl=1",800,400,false]'
swp_cache_timestamp:
- "410228"
---
additional stuff that is not YAML
and more
and more
这个程序 1:
import sys
import ruamel.yaml
from pathlib import Path
def extract(file_name, position=0):
doc_nr = 0
if not isinstance(file_name, Path):
file_name = Path(file_name)
yaml_str = ""
with file_name.open() as fp:
for line_nr, line in enumerate(fp):
if line.startswith('---'):
if line_nr == 0: # don't count --- on first line as next document
continue
else:
doc_nr += 1
if position == doc_nr:
yaml_str += line
return ruamel.yaml.round_trip_load(yaml_str, preserve_quotes=True)
def reinsert(ofp, file_name, data, position=0):
doc_nr = 0
inserted = False
if not isinstance(file_name, Path):
file_name = Path(file_name)
with file_name.open() as fp:
for line_nr, line in enumerate(fp):
if line.startswith('---'):
if line_nr == 0:
ofp.write(line)
continue
else:
doc_nr += 1
if position == doc_nr:
if inserted:
continue
ruamel.yaml.round_trip_dump(data, ofp)
inserted = True
continue
ofp.write(line)
data = extract('input.yaml')
for k in list(data.keys()):
if k not in ['ID', 'post_title', 'author', 'post_date', 'layout', 'published']:
del data[k]
reinsert(sys.stdout, 'input.yaml', data)
你得到这个输出:
---
ID: 51 # one of the key/values to preserve
post_title: Here's my post title
author: Frank Meeuwsen
post_date: 2014-07-03 22:10:11
layout: post
published: true
---
additional stuff that is not YAML
and more
and more
请注意以下评论ID
线路得到妥善保存。
¹ This was done using ruamel.yaml https://pypi.python.org/pypi/ruamel.yaml a YAML 1.2 parser, which tries to preserve as much information as possible on round-trips, of which I am the author.