此篇博客是将前面的内容进行整合并进一步提升,真正实现一个简单表达式语法的编译器
如果有不了解的地方请查看下面链接
词法分析
LR(1)分析(一)
LR(1)分析(二)
这里说的程序语言编译器是指将算术表达式部分进行翻译,暂时不包括优化以及目标语言生成部分,只产生四元式,后面部分根据博主的时间安排以及个人能力(担心有余而力不足算术表达式转换成四元式属于相对较为容易的任务,只需要在对表达式进行规范LR(1)分析当遇到归约时执行相应的语义动作即可实现对算术表达式的翻译,因此只需要对先前的规范LR(1)分析中的分析过程稍做改进即可实现,但这样的话难度有些太低了,心里总有些不舒服。因此我计划做一下几方面的调整:
- 将ACTION表与GOTO表存入mysql数据库中这样无需每次启动程序都需要进行预测分析表的获取节省时间,提高效率。
- 插入类型检查与类型转换部分
- 加入数组声明、赋值与计算部分(先挖坑如果计组考完有时间就完善)
- 最后形成一个交互界面
由于前面已经有了一定的基础,下面有些细节我就不再展开,请参考前面的博客。
程序语言特点分析以及翻译模式
这里的语言并不是现存的哪一种只是我构思出的一个程序语言,他的语法规则为:
S->i=E # 对应赋值
E->E+T # 对应加法
E->E-T # 对应减法
E->-T # 对应负数
E->T # 这里设计是因为加减的优先级低于乘除
T->T*F # 乘
T->T/F # 除
T->F # 这里是对应括号
F->(E) # 括号的部分
F->i
S->i:D # 说明
D->int(m) #int 类型
D->real(n) #real类型
这里给出上述语法规则的翻译模式
S->i=E {p=lookup(i.name);
if(p!=NULL && ((i.type==int&&E.type ==int) || (i.type==real&&E.type ==real)))
emit(p'='E.place)
if(p!=NULL && (i.type==int&&E.type==real)
u=newTemp;
emit(u'=''inttoreal'p);
emit(u'='E.place);
if(p!=NULL && (i.type==real&&E.type==int)
u=newTemp;
emit(u'=''inttoreal'E.place);
emit(p'='u);
}
E->E1+T {
if (T.type == int and E1.type == int)
emit(E.place'='E1.place'int+'T.place)
E.type = int;
if (E.type = real and T.type = int)
u = newTemp;
emit(u'=''inttoreal'T.place);
emit(E.place'='u'real+'E1.place);
E.type = real;
if (T.type = real and E.type = int)
u = newTemp;
emit(u'=''inttoreal'E.place);
emit(u'='E1.place'real+'T.place);
E.type = real;
if(T.type = real and E.type = real)
emit(E.place'='E1.place+'real+'T.place);
E.type = real;}
E->E1-T {
if (T.type == int and E1.type == int)
emit(E.place'='E1.place'int-'T.place);
E.type = int;
if (E1.type = real and T.type = int)
u = newTemp;
emit(u'=''inttoreal'T.place;
emit(E.place'='u'real-'E1.place);
E.type = real;
if (T.type = real and E.type = int)
u = newTemp;
emit(u'=''inttoreal'E1.place);
emit(E'='u'real-'T.place);
E.type = real;
if(T.type = real and E.type = real)
emit(E.place'='E1.place+'real-'T.place);
E.type = real;}
E->-T {E.place = newTemp; emit(E.place'=''uminus'T.place)
E.type = T.type;}
E->T {E.place = T.place;E.type = T.type;}
E->T*F {
if (T.type == int and F.type == int)
emit(E.place'='T.place'int*'F.place);
E.type = int;
if (T.type = real and F.type = int)
u = newTemp;
emit(u'=''inttoreal'F.place;
emit(E.place'='u'real*'T.place);
E.type = real;
if (T.type = real and F.type = int)
u = newTemp;
emit(u'=''inttoreal'F.place);
emit(E'='u'real*'T.place);
E.type = real;
if(T.type = real and F.type = real)
emit(E.place'='F.place+'real*'T.place);
E.type = real;}
E->T/F {
if (T.type == int and F.type == int)
emit(E.place'='T.place'int/'F.place);
E.type = int;
if (T.type = real and F.type = int)
u = newTemp;
emit(u'=''inttoreal'F.place;
emit(E.place'='u'real/'T.place);
E.type = real;
if (T.type = real and F.type = int)
u = newTemp;
emit(u'=''inttoreal'F.place);
emit(E'='u'real/'T.place);
E.type = real;
if(T.type = real and F.type = real)
emit(E.place'='F.place+'real/T.place);
E.type = real;}
T->F {T.place = F.place; T.type = F.type;}
F->(E) {F.place = E.place;F.type = E.type;}
F->i { F.place = i.name;F.type = i.type;}
if(p!= NULL){
F.place = p;
F.type = p.type
}
S->i:D {enter(id.name,D.type,offset); // 这里的offset可忽略
i.type = D.type;}
D->real{D.type = real(m);}
D->int {D.type = int(n);}
ACTION表与GOTO表存入MySql数据库
这一步并不是必须的只是“锦上添花”。将表存入MySql中这里主要考虑的是在真正形成软件时,可以无需将形成两张预测分析表的代码部分一并打包,这样可以保护(知识产权)当然现在无需考虑,而且将LR(1)预测分析表的构造较为困难使用python会比其他语言降低一点点开发难度(但也异常困难),后面的预测分析相对较为容易,我想导入MySql数据库的另一个原因是我原先打算此后的部分用java开发网站的但因为时间确实非常紧张而且SpringBoot我也只是才起步,相比之下Python的网站开发框架我更熟悉经验更多,因此没有实施。
这里我对上面描述的语法进行了略微改变将int和real分别用m和n代替以便在进行语法分析时进行抽象。该语法得到的预测分析表如下(部分)
下面要做的便是将建立的表存入MySQl数据库关于MYSQL的具体使用请参看大佬博客MySQL现在我们仅需知道如何使用就可以了。如果你之前没有接触过MySql也完全没关系,网上下载安装的教程很多,下载安装一个尝试一点新的东西我觉得是没坏处的! 用python的话要导入pymysql包
def sqlStore(ACTION,GOTO,all_x,name_list):
"""
将LR分析表得到的ACTION表与GOTO表存进MySql以缩短程序加载时间
:param ACTION:ACTION表
:param GOTO:GOTO表
:return:成功失败标志
"""
l1 = []
l2 = []
for x in all_x:
if x not in name_list:
l1.append(x)
else:
l2.append(x)
if '#' not in l1:
l1.append('#')
conn = pymysql.Connect(
host='127.0.0.1',
port=3306,
user='root',
password='760830',
database='my_db',
)
cursor = conn.cursor()
sql = "CREATE TABLE ACTION(`STATE` INT,"
for key in l1:
s = '`'+key+'`'+" CHAR(10),"
sql += s
sql = sql[:len(sql)-1] + ")"
cursor.execute(sql)
sql = "INSERT INTO ACTION(STATE,"
for key in l1:
if key.islower():
s = key+','
sql += s
else:
s = "`"+key + '`,'
sql += s
sql = sql[:len(sql)-1] + ") VALUES ("
temp_sql = str(sql)
for state in ACTION.keys():
sql += str(state)+","
for x in l1:
if x in ACTION[state].keys():
sql += "\""+ACTION[state][x][0]+"\","
else:
sql += "\"ERROR\","
sql = sql[:len(sql)-1]
sql += ")"
print(sql)
cursor.execute(sql)
sql = temp_sql
conn.commit()
sql = "CREATE TABLE GOTO(`STATE` INT,"
for key in l2:
s = '`' + key + '`' + " int,"
sql += s
sql = sql[:len(sql) - 1] + ")"
cursor.execute(sql)
sql = "INSERT INTO GOTO(STATE,"
for key in l2:
s = key + ','2q222
sql += s
sql = sql[:len(sql) - 1] + ") VALUES ("
temp_sql = str(sql)
for state in ACTION.keys():
sql += str(state) + ","
if state not in GOTO.keys():
for x in l2:
sql += "-1,"
else:
for x in l2:
if x in GOTO[state].keys():
sql +=str(GOTO[state][x][0]) + ","
else:
sql += "-1,"
sql = sql[:len(sql) - 1]
sql += ")"
print(sql)
cursor.execute(sql)
conn.commit()
sql = temp_sql
conn.close()
这里我遇到了蛮多困难的,首先这个sql语言写起来就有些困难,我在mysql终端调试了很久才解决,大家要按我这个格式来:
还有一种用%s占位的写法我遇到了点问题如果您能够实现请务必教教我!
效果展示:
GOTO表:
ACTION表
呼呼这样就将预测分析表存入数据库中了,那么在进行下面的步骤时就不再需要反复重新生成了,当然如果要改变文法的话还是要重新生成的。注这里的数据库查看我用的是pycharm自带的数据库查看工具这边也推荐使用Navicat也很不错
下面就是根据语法规则以及语义规则来进行语法分析并生成四元式。
这里就涉及到将用户输入的表达式通过词法分析抽象成产生式中的符号,然后才能进行语法分析以及语义动作的执行。
下面对之前写的词法分析器按照本次项目的需要进行裁剪
词法分析器改造
之前写的词法分析器的能力比本次项目需求高因此对其进行裁剪。
class Analysis:
def __init__(self):
self.reserveWord = ["int"] # 此处只保留int和real两个关键字
self.operatorOrDelimiter = ["+", "-", "*", "/","(",")",":","="] # 此处运算符号只保留加减乘除
self.Operator = ["+", "-", "*", "/",")","(",":","="]
self.token = "" # 得到的单词
self.result = [] # 储存扫描得到的单词信息结果
def isReserve(self, target): # 判断是否为关键字
if target in self.reserveWord:
return True
return False
def isMark(self, inString, pos):
flag = False
for i in inString:
pos += 1
if i.isalpha() or i.isdigit() or i == '_':
self.token += str(i)
flag = True
elif i in self.operatorOrDelimiter: # 遇到算术
flag = True
break
else:
flag = False
break
return flag, pos
def IsDigits(self, inString, pos):
flag = False
for i in inString:
pos += 1
if i.isdigit():
self.token += str(i)
flag = True
elif i == '.' and i not in self.token and 'e' not in self.token and 'E' not in self.token:
self.token += str(i)
elif i == 'e' or i == 'E' and i not in self.token: # 处理含E或e的合法指数情况
self.token += str(i)
elif i == '.' and i in self.token:
flag = False
self.token += str(i)
break
else:
if i in self.Operator or i == ' ' or i == '\n':
flag = True
break
else:
flag = False
self.token += i
break
return flag, pos
def IsOperator(self, inString, pos):
if len(inString) == 1:
self.token += str(inString[0])
return pos
for i in inString[0:]:
pos += 1
if i in self.operatorOrDelimiter:
self.token += str(i)
else:
break
return pos
def scan(self, inString, row, col):
"""
扫描字符串
:param col: 储存当前扫描的列
:param row: 储存当前扫描的行
:type inString: 待处理的字符串
:return: 对字符串的判断结果,类型为列表
"""
inString = str(inString).strip() # 去除字符串两端可能含有的空格
self.token = ""
if inString[0].isdigit():
judge, index = self.IsDigits(inString, 0)
if judge:
self.result.append([self.token, "常数", (row, col)])
if index < len(inString):
self.scan(inString[index-1:], row, col)
else:
print(index)
self.result.append([self.token, "ERROR", (row, col)])
index +=1
if index == len(inString) and judge == False:
if inString[-1].isdigit():
self.result.append([inString[-1], "常数", (row, col)])
elif inString[-1].isalpha():
self.result.append([inString[-1], "标识符", (row, col)])
elif inString[-1] in self.Operator:
self.result.append([inString[-1], "操作符", (row, col)])
elif inString[0].isalpha():
judge, index = self.isMark(inString, 0)
print(inString)
print(judge)
print(index)
if self.isReserve(self.token):
self.result.append([self.token, "关键字", (row, col)])
else:
self.result.append([self.token, "标识符", (row, col)])
if judge and index-1 < len(inString) and not inString[index-1].isalpha():
# print(inString[index-1:])
self.scan(inString[index-1:], row, col)
else:
if judge == False:
analysis.result.append("ERROR")
return
elif inString[0] in self.Operator:
index = self.IsOperator(inString, 0)
if self.token in self.Operator:
self.result.append([self.token, "操作符", (row, col)])
else:
self.result.append([self.token, "ERROR", (row, col)])
if index <= len(inString) and index - 1 >= 0:
self.scan(inString[index - 1:], row, col)
else:
analysis.result.append([self.token, "ERROR", (row, col)])
return
if __name__ == "__main__":
analysis = Analysis()
initial = input("请输入代码:").split(' ')
col = 1
for s in initial:
analysis.scan(s, 1, col)
col += 1
for res in analysis.result:
print(res)
仅保留了需要用到的分析,其余的均裁去。注意这里为了方便分隔符我设置的默认是分号。
那么这个模块已经完成下面就正式开始分析啦!我们采用的是一句一句分析!
语法分析+语义分析与中间代码生成
这里也是十分有难度写起来,要把输入的表达式经词法分析后抽象成相应的输入串供语法分析,以及建立属性的关系。只好硬着头皮上啦!
效果展示如下:
对表达式的分析:
d:int;c:real;a:int;a=c+d
分析结果:
具体代码:
class EX_Analysis:
def __init__(self,express):
self.analysis = Analysis()
self.express = express.split(';')
print(self.express)
# 与数据库建立连接
self.conn = pymysql.Connect(
host='127.0.0.1',
port=3360,
user='root',
password='760830',
database='my_db',
)
self.cursor = self.conn.cursor()
self.rule_sets = []
self.name_list = []
filename = 'rule.txt'
expresses = expressGet(filename)
infinite, name_list, project_equals, all_x, self.rule_sets = init_new(expresses)
states = LR1_state_get(name_list, project_equals, all_x, infinite)
ACTION1, GOTO1 = LR1_table_create(states, all_x, name_list, self.rule_sets, infinite)
sqlStore(ACTION1, GOTO1, all_x, name_list)
self.record = {} # 用于记录已经声明的标识符
self.mark_num = 0
def generate(self,stack_op,rule_express, op):
new_mark = Mark(rule_express[0])
new_mark.place = "T" + str(self.mark_num)
self.mark_num += 1
if stack_op[-2].type == stack_op[-1].type:
new_mark.type = stack_op[-1].type
print("(" + stack_op[-1].type + op + "," + stack_op[-2].place + "," + stack_op[
-1].place + "," + new_mark.place + ")")
elif stack_op[-2].type == 'int' and stack_op[-1].type == 'real':
u = 'T' + str(self.mark_num)
self.mark_num += 1
print("(inttoreal," + stack_op[-2].place + ",-," + u+")")
print("(real" + op + "," + u + "," + stack_op[-1].place + "," + new_mark.place + ")")
elif stack_op[-2].type == 'real' and stack_op[-1].type == 'int':
u = 'T' + str(self.mark_num)
self.mark_num += 1
print("(inttoreal," + stack_op[-1].place + ",-," + u+")")
print("(real" + op + "," + stack_op[-2].place + "," + u + "," + new_mark.place + ")")
stack_op.pop()
stack_op.pop()
stack_op.append(new_mark)
def four_type_express_generate(self):
print(self.rule_sets)
for expresses in self.express:
self.analysis.result = []
for express in expresses.split(' '):
self.analysis.scan(express,1,1)
type_words = self.analysis.result
s = ""
print(type_words)
for word in type_words:
if word[1] == "操作符":
s += word[0]
if word[1] == "常数" or word[1] == "标识符":
s += 'i'
if word[1] == "关键字":
if word[0] == 'int':
s += 'n'
elif word[0] == 'real':
s += 'm'
pos = 0
# 下面正式开始语法分析并在归约时执行相应的语义动作
stack_state = []
stack_state.append(0)
stack_op = []
s += '#'
while pos < len(s):
X = s[pos]
if X.islower():
sql = "SELECT "+X+" FROM ACTION WHERE STATE ="+str(stack_state[-1])
else:
sql = "SELECT `" + X + "` FROM ACTION WHERE STATE =" + str(stack_state[-1])
try:
self.cursor.execute(sql)
result = self.cursor.fetchone()
ac = result[0]
#print(ac)
if 's' in ac:
stack_state.append(int(ac[1:]))
infinite = Mark(type_words[pos][0])
if type_words[pos][1] == "常数":
if '.' in type_words[pos][0]:
infinite.type = "real"
else:
infinite.type = "int"
stack_op.append(infinite)
elif type_words[pos][1] == "标识符":
if type_words[pos][0] in self.record.keys():
infinite = self.record[type_words[pos][0]] # 如果已经存在符号表中则直接赋值
else:
self.record[type_words[pos][0]] = infinite
stack_op.append(infinite)
elif type_words[pos][1] == "关键字":
infinite.type = type_words[pos][0]
stack_op.append(infinite)
pos += 1
elif 'r' in ac:
rule = int(ac[1:])
rule_express = self.rule_sets[rule-1]
pop_len = len(rule_express[3:])
flag = False
if rule == 1:
if stack_op[-2].type == stack_op[-1].type:
print("(=," + stack_op[-1].place + ",-," + stack_op[-2].name + ")")
mark = Mark(rule_express[0])
mark.type = stack_op[-1].type
stack_op.pop()
stack_op.pop()
stack_op.append(mark)
elif stack_op[-1].type == " real" and stack_op[-2] == "int":
print("(inttoreal,"+stack_op[-2].name+",-,"+stack_op[-2].name+")")
print("(=," + stack_op[-1].place + ",-," + stack_op[-2].name + ")")
mark = Mark(rule_express[0])
mark.type = stack_op[-1].type
stack_op.pop()
stack_op.pop()
stack_op.append(mark)
elif stack_op[-1].type == " int" and stack_op[-2] == "real":
print("(inttoreal," + stack_op[-1].name + ",-," + stack_op[-1].name + ")")
print("(=," + stack_op[-1].place + ",-," + stack_op[-2].name + ")")
mark = Mark(rule_express[0])
mark.type = stack_op[-1].type
stack_op.pop()
stack_op.pop()
stack_op.append(mark)
else:
print("(=," + stack_op[-1].place + ",-," + stack_op[-2].name + ")")
if rule == 3:
self.generate(stack_op,rule_express,"+")
if rule == 4:
self.generate(stack_op, rule_express, "-")
if rule == 5:
new_mark = Mark(rule_express[0])
new_mark.place = "T" + str(self.mark_num)
new_mark.type = stack_op[-1].type
print("(uminus," + stack_op[-1].place + ",-," + new_mark.place + ")")
stack_op.pop()
stack_op.append(new_mark)
self.mark_num += 1
if rule == 6 or rule == 10 or rule == 9:
new_mark = Mark(rule_express[0])
new_mark.place = stack_op[-1].place
new_mark.type = stack_op[-1].type
stack_op.pop()
stack_op.append(new_mark)
if rule == 7:
self.generate(stack_op,rule_express,"*")
if rule == 8:
self.generate(stack_op,rule_express,"/")
if rule == 11:
new_mark = Mark(rule_express[0])
new_mark.place = stack_op[-1].name
new_mark.type = stack_op[-1].type
stack_op.pop()
stack_op.append(new_mark)
if rule == 2:
stack_op[-2].type = stack_op[-1].type
if rule == 12:
new_mark = Mark(rule_express[0])
new_mark.place = stack_op[-1].name
new_mark.type = 'real'
stack_op.pop()
stack_op.append(new_mark)
if rule == 13:
new_mark = Mark(rule_express[0])
new_mark.place = stack_op[-1].name
new_mark.type = 'int'
stack_op.pop()
stack_op.append(new_mark)
for i in range(pop_len):
if len(stack_state) > 0:
stack_state.pop()
else:
flag = True
break
if flag:
print("ERROR")
break
try:
sql = "SELECT " + rule_express[0] + " FROM GOTO WHERE STATE =" + str(stack_state[-1])
self.cursor.execute(sql)
result = self.cursor.fetchone()
ac = result[0]
if ac == -1:
print("wrong!2")
break
else:
stack_state.append(ac)
except:
print(sql)
print("wrong!3")
break
elif ac == 'acc':
break
except:
print(sql)
traceback.print_exc()
break
这样主体的逻辑代码就已经完成设计了,总的来说是对前面几个实验的一个综合,难度不是特别的大,但博主也改了很久很久,呜呜呜这个过程也是异常的艰辛。下面就是如何把这个以一种非常友好的方式呈现出来,之前用的都是图形化界面来呈现,这次我想用点不一样的,想试试使用网页的形式来呈现,让更多的人能够较为直接的体验。
作品呈现
我采用的框架是flask主要原因是django我之前用过而且已经非常熟悉,这次想尝试一点新的东西试一下,而且flask相比django更加轻量,更适合本次的项目。
具体效果展示如下: