Extracting sentences inside nested parentheses with Python

2024-01-03

I have multiple .txt files in a directory. Here is a sample of one of my .txt files:

kkkkk;

  select xx("xE'", PUT(xx.xxxx.),"'") jdfjhf:jhfjj from xxxx_x_xx_L ;
quit; 

/* 1.xxxxx FROM xxxx_x_Ex_x */ 
proc sql; ("TRUuuuth");
hhhjhfjs as fdsjfsj:
select * from djfkjd to jfkjs
(
SELECT abc AS abc1, abc_2_ AS efg, abc_fg, fkdkfj_vv, jjsflkl_ff, fjkdsf_jfkj
    FROM &xxx..xxx_xxx_xxE
where ((xxx(xx_ix as format 'xxxx-xx') gff &jfjfsj_jfjfj.) and 
      (xxx(xx_ix as format 'xxxx-xx') lec &jgjsd_vnv.))
 );


jjjjjj;

  select xx("xE'", PUT(xx.xxxx.),"'") jdfjhf:jhfjj from xxxx_x_xx_L ;
quit; 

/* 1.xxxxx FROM xxxx_x_Ex_x */ ()
proc sql; ("CUuuiiiiuth");
hhhjhfjs as fdsjfsj:
select * from djfkjd to jfkjs
(SELECT abc AS abc1, abc_2_ AS efg, abc_fg, fkdkfj_vv, jjsflkl_ff, fjkdsf_jfkj
    FROM &xxx..xxx_xxx_xxE
where ((xxx(xx_ix as format 'xxxx-xx') gff &jfjfsj_jfjfj.) and 
      (xxx(xx_ix as format 'xxxx-xx') lec &jgjsd_vnv.))(( ))
 );

I am trying to extract every sentence enclosed in (possibly nested) parentheses in my .txt files.

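I have tried several approaches, such as stacking (https://stackoverflow.com/questions/58908686/extract-strings-inside-inconsistent-nested-brackets), but when the code parses one of the .txt files I get a "list index out of range" error. I guess it is because some of the parentheses have nothing written inside them.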

I have also been trying regex (https://stackoverflow.com/questions/58888951/python-regex-to-capture-all-words-within-nested-parentheses?noredirect=1#comment104062204_58888951), using the following code:

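import re

with open('lan sample text file.txt', 'r') as fd:
    lines = fd.read()

    # Keywords of interest (not used below)
    check = {"Select", "select", "SELECT", "from", "FROM", "From"}

    # Captures from the first '(' up to (not including) the last ')' on each line;
    # '.' does not match '\n', and re.MULTILINE only changes ^ and $
    items = re.findall(r"(\(.*)\)", lines, re.MULTILINE)
    for x in items:
        print(x)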

But my output is:

("xE'", PUT(xx.xxxx.),"'"
("TRUuuuth"
((xxx(xx_ix as format 'xxxx-xx') gff &jfjfsj_jfjfj.
(xxx(xx_ix as format 'xxxx-xx') lec &jgjsd_vnv.)
("xE'", PUT(xx.xxxx.),"'"
("CUuuiiiiuth"
((xxx(xx_ix as format 'xxxx-xx') gff &jfjfsj_jfjfj.
(xxx(xx_ix as format 'xxxx-xx') lec &jgjsd_vnv.)
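The truncated matches follow from two properties of the pattern: `.` does not match a newline unless `re.DOTALL` is set, so every match stays on one line, and the greedy `.*` runs to the last `)` on that line, which then falls outside the capture group. Python's built-in `re` module also has no recursive patterns, so no single `re` expression can balance parentheses of arbitrary depth. This short sketch reproduces the effect on one line copied from the sample file, and on a made-up two-line string shows that switching on `re.DOTALL` overshoots instead.

```python
import re

# One line copied from the sample file.
line = "where ((xxx(xx_ix as format 'xxxx-xx') gff &jfjfsj_jfjfj.) and "

# Greedy: .* runs to the LAST ')' on the line, which sits outside the group.
greedy = re.findall(r"(\(.*)\)", line)
# Non-greedy: .*? stops at the FIRST ')', usually an inner one.
lazy = re.findall(r"\(.*?\)", line)

# A made-up two-line string: with re.DOTALL, '.' also matches '\n',
# so one greedy match swallows everything between the first '(' and last ')'.
text = "a (x) b\nc (y) d"
spanning = re.findall(r"\(.*\)", text, re.DOTALL)

print(greedy)    # ["((xxx(xx_ix as format 'xxxx-xx') gff &jfjfsj_jfjfj."]
print(lazy)      # ["((xxx(xx_ix as format 'xxxx-xx')"]
print(spanning)  # ['(x) b\nc (y)']
```

Neither the greedy nor the lazy match is balanced (three `(` against one `)`), which is why a counter or stack that walks the text character by character is the usual fix.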

My desired output should look like this:

("xE'", PUT(xx.xxxx.),"'")
("TRUuuuth")
(
SELECT abc AS abc1, abc_2_ AS efg, abc_fg, fkdkfj_vv, jjsflkl_ff, fjkdsf_jfkj
    FROM &xxx..xxx_xxx_xxE
where ((xxx(xx_ix as format 'xxxx-xx') gff &jfjfsj_jfjfj.) and 
      (xxx(xx_ix as format 'xxxx-xx') lec &jgjsd_vnv.))
 )
("xE'", PUT(xx.xxxx.),"'")
("CUuuiiiiuth")
(SELECT abc AS abc1, abc_2_ AS efg, abc_fg, fkdkfj_vv, jjsflkl_ff, fjkdsf_jfkj
    FROM &xxx..xxx_xxx_xxE
where ((xxx(xx_ix as format 'xxxx-xx') gff &jfjfsj_jfjfj.) and 
      (xxx(xx_ix as format 'xxxx-xx') lec &jgjsd_vnv.))(( ))
 )

I'll admit my solution is not optimized, but it solves your problem.

Solution (just replace test.txt with your file name):

result = []
with open('test.txt', 'r') as fd:
    # To keep track of '(' and ')' parentheses
    parentheses_stack = []
    # To keep track of the complete text wrapped by ()
    complete_word = []
    # Iterate through each line in the file
    for line in fd:
        # Iterate over each character in the line
        for char in line:
            # Push every open '(' onto parentheses_stack
            if char == '(':
                parentheses_stack.append(char)
            # Pop one open '(' from parentheses_stack when you find a ')'
            if char == ')':
                if parentheses_stack:
                    parentheses_stack.pop()
                # The outermost ')' was just closed: keep it
                if not parentheses_stack:
                    complete_word.append(char)
            # Collect characters between the first '(' and its matching ')'
            if parentheses_stack:
                complete_word.append(char)
            elif complete_word:
                # Save complete_word once every '(' has been popped from parentheses_stack
                result.append(''.join(complete_word))
                complete_word = []

for res in result:
    print(res)

Result:

WS:python rameshrv$ python3 /Users/rameshrv/Documents/python/test.py
("xE'", PUT(xx.xxxx.),"'")
("TRUuuuth")
(
SELECT abc AS abc1, abc_2_ AS efg, abc_fg, fkdkfj_vv, jjsflkl_ff, fjkdsf_jfkj
    FROM &xxx..xxx_xxx_xxE
where ((xxx(xx_ix as format 'xxxx-xx') gff &jfjfsj_jfjfj.) and 
      (xxx(xx_ix as format 'xxxx-xx') lec &jgjsd_vnv.))
 )
("xE'", PUT(xx.xxxx.),"'")
()
("CUuuiiiiuth")
(SELECT abc AS abc1, abc_2_ AS efg, abc_fg, fkdkfj_vv, jjsflkl_ff, fjkdsf_jfkj
    FROM &xxx..xxx_xxx_xxE
where ((xxx(xx_ix as format 'xxxx-xx') gff &jfjfsj_jfjfj.) and 
      (xxx(xx_ix as format 'xxxx-xx') lec &jgjsd_vnv.))(( ))
 )
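Two gaps remain between this result and the desired output: the empty `()` after the comment in the second block is kept, and only one file is read although the question mentions several .txt files in a directory. The following is one possible way to close both, assuming UTF-8 files; `iter_groups`, `extract_from_directory` and the `skip_empty` flag are hypothetical helpers, not part of any library. An integer depth counter replaces the stack (only its size is ever used), depth is carried across lines so a group may span many lines, a `)` with no matching `(` is ignored, and `skip_empty` drops groups whose inside is only whitespace. Like the loop above, it would still miscount a parenthesis that appears inside a quoted string.

```python
from pathlib import Path


def iter_groups(lines, skip_empty=True):
    """Yield each top-level (...) group from an iterable of text lines.

    Nesting depth is carried across lines, so one group may span many lines.
    A ')' with no matching '(' is ignored. With skip_empty=True, groups such
    as "()" or "( )" whose inside is only whitespace are not yielded.
    """
    depth = 0
    current = []  # characters of the group being collected
    for line in lines:
        for char in line:
            if char == "(":
                depth += 1
            # Inside a group (including its closing ')'): keep the character
            if depth > 0:
                current.append(char)
            if char == ")" and depth > 0:
                depth -= 1
                if depth == 0:
                    group = "".join(current)
                    current = []
                    if not (skip_empty and group[1:-1].strip() == ""):
                        yield group


def extract_from_directory(directory, pattern="*.txt"):
    """Return {file name: [groups]} for every file in directory matching pattern."""
    results = {}
    # Assumption: the files are UTF-8 encoded
    for path in sorted(Path(directory).glob(pattern)):
        with path.open(encoding="utf-8") as fd:
            results[path.name] = list(iter_groups(fd))
    return results
```

For example, `extract_from_directory("txt_files")`, where `txt_files` is a placeholder for your folder, returns a dict keyed by file name; printing each group of the sample file should give the six blocks of the desired output, without the empty `()`. Passing a file object straight to `iter_groups` streams it line by line instead of loading it whole.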