Extracting a dictionary from a string

2024-01-15

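I am calling a function that returns a string containing a dictionary. How can I extract that dictionary, bearing in mind that the first and last lines may also contain "{" and "}"?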

This is a {testing string} example
This {is} a testing {string} example
{"website": "stackoverflow",
"type": "question",
"date": "10-09-2020"
}
This is a {testing string} example
This {is} a testing {string} example

I need to extract this value as a dictionary variable:

{"website": "stackoverflow",
"type": "question",
"date": "10-09-2020"
}

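Updated answer

Taking the comments from @martineau and @ekhumoro into account, the edited code below contains a function that searches the string and extracts every valid dict it finds. This is more robust than my earlier answer, because real-world dict contents vary and this logic (hopefully) accounts for that.

Example code: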

import json
import re

def extract_dict(s) -> list:
    """Extract all valid dicts from a string.
    
    Args:
        s (str): A string possibly containing dicts.
    
    Returns:
        A list containing all valid dicts.
    
    """
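    results = []
    # Flatten the string so the pattern can match across line breaks.
    s_ = ' '.join(s.split('\n')).strip()
    # Non-greedy match: each candidate runs from a '{' to the nearest '}'.
    exp = re.compile(r'(\{.*?\})')
    for i in exp.findall(s_):
        try:
            results.append(json.loads(i))
        except json.JSONDecodeError:
            # Not valid JSON (e.g. '{is}'), so skip it.
            pass
    return results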
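To make the two stages of the function concrete, here is a minimal sketch that runs the same pattern on a short, made-up string (the names `sample`, `candidates`, and `parsed` are illustrative and not from the answer). It shows that the regex collects every brace-delimited span, and that `json.loads` is what filters out the spans that are not dicts.

```python
import json
import re

# Made-up sample with one non-JSON brace group and one valid dict.
sample = 'This {is} a test {"a": 1} end'

# Stage 1: the regex collects every '{...}' span, valid or not.
candidates = re.findall(r'(\{.*?\})', sample)
print(candidates)

# Stage 2: json.loads keeps only the spans that parse as JSON.
parsed = []
for candidate in candidates:
    try:
        parsed.append(json.loads(candidate))
    except json.JSONDecodeError:
        print("rejected:", candidate)

print(parsed)
```

Here `{is}` is matched by the regex but rejected by the JSON parser, so only the real dict ends up in `parsed`.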

Test string:

The OP's original string has been extended with multiple dicts, a numeric value in the last field, and a list value.

s = """
This is a {testing string} example
This {is} a testing {string} example
{"website": "stackoverflow",
"type": "question",
"date": 5
}
{"website": "stackoverflow",
"type": "question",
"date": "2020-09-11"
}
{"website": "stackoverflow",
"type": "question",
"dates": ["2020-09-11", "2020-09-12"]
}
This is a {testing string} example
This {is} a testing {string} example
"""

Output:

As the OP stated, there is usually only one dict in the string, so (obviously) results[0] can be used.

>>> results = extract_dict(s)
>>> results
[{'website': 'stackoverflow', 'type': 'question', 'date': 5},
 {'website': 'stackoverflow', 'type': 'question', 'date': '2020-09-11'},
 {'website': 'stackoverflow', 'type': 'question', 'dates': ['2020-09-11', '2020-09-12']}]
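
One limitation worth noting: the non-greedy pattern in extract_dict ends each candidate at the first closing brace, so a dict that contains another dict is cut short, fails json.loads, and is skipped. The following is a minimal sketch of one possible alternative, not part of the original answer, that uses the standard library's `json.JSONDecoder.raw_decode` to parse from each opening brace. The helper name `extract_dicts_nested` and the sample string are made up for illustration.

```python
import json

def extract_dicts_nested(text):
    """Hypothetical helper: collect top-level JSON objects, nested ones included."""
    decoder = json.JSONDecoder()
    found = []
    pos = text.find('{')
    while pos != -1:
        try:
            # raw_decode parses one JSON value starting at pos and
            # returns it together with the index where it ended.
            obj, end = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            # Not valid JSON at this brace, so move on to the next one.
            pos = text.find('{', pos + 1)
            continue
        if isinstance(obj, dict):
            found.append(obj)
        # Resume after the decoded object so inner dicts are not repeated.
        pos = text.find('{', end)
    return found

nested = 'x {not json} y {"site": "stackoverflow", "meta": {"votes": 3}} z'
nested_results = extract_dicts_nested(nested)
print(nested_results)
```

Because the JSON parser itself decides where each object ends, there is no need to flatten the string or guess at the closing brace.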

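Original answer:

Ignore this section. Although the code works, it is tailored to the OP's specific requirements and is not robust for other uses.

This example uses a regular expression to identify the start of the dict ({") and the end of the dict ("}), extracts everything in between, and then converts that string into a proper dict. Because newlines would complicate the regex, I simply flatten the string first.

Following a comment from @jizhihaoSAMA, I have updated the code to use json.loads to convert the string to a dict, as it is cleaner. If you would rather avoid the extra import, eval also works, but it is not recommended.

Example code: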

import json
import re

s = """
This is a {testing string} example
This {is} a testing {string} example
{"website": "stackoverflow",
"type": "question",
"date": "10-09-2020"
}
This is a {testing string} example
This {is} a testing {string} example
"""

s_ = ' '.join(s.split('\n')).strip()
d = json.loads(re.findall(r'(\{\".*\"\s?\})', s_)[0])

>>> d
{'website': 'stackoverflow', 'type': 'question', 'date': '10-09-2020'}
>>> d['website']
'stackoverflow'
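
On the remark that eval is not recommended: eval executes arbitrary expressions, which is risky on strings from an outside source. If json.loads is not an option, the standard library's `ast.literal_eval` may be a safer substitute, since it only accepts Python literals. The sketch below is an illustration rather than part of the original answer, and the variable names are made up. Note that `ast.literal_eval` reads Python syntax, so JSON-only tokens such as true, false, and null would be rejected.

```python
import ast

text = '{"website": "stackoverflow", "type": "question", "date": "10-09-2020"}'

# literal_eval builds the dict without executing any code.
d_literal = ast.literal_eval(text)
print(d_literal['website'])

# Anything that is not a plain literal (such as a function call) is refused.
rejected = False
try:
    ast.literal_eval('__import__("os").getcwd()')
except (ValueError, SyntaxError) as exc:
    rejected = True
    print("rejected:", exc)
```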

