awk not matching all of my entries

2023-12-14

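I'm trying to make a "script" (essentially a single awk command) that extracts the prototypes of the C functions in a .c file, so that I can generate the .h header automatically. I'm new to awk, so I don't know all the details.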

Here is an example of the source .c file:

dict_t dictup(dict_t d, const char * key, const char * newval)
{

  int i = dictlook(&d, key);

  if (i == DICT_NOT_FOUND) {

    fprintf(stderr, "key \"%s\" doesn't exist.\n", key);
    dictdump(&d);
  }
  else {

    strncpy(d.entry[i].val, newval, DICTENT_VALLENGTH);
  }

  return d;
}


dict_t* dictrm(dict_t* d, const char * key) {

  int i = dictlook(d, key);

  if (i == DICT_NOT_FOUND) {

    fprintf(stderr, "key \"%s\" doesn't exist.\n", key);
    dictdump(d);
  }
  else {
    d->entry[i] = d->entry[--d->size];
  }
  if ( ((float)d->size)/d->maxsize < 0.25 ) {
    d->maxsize /= 2;
    d->entry = realloc(d->entry, d->maxsize*sizeof(dictent_t));
  }

  return d;
}

What I want to generate:

dict_t dictup(dict_t d, const char * key, const char *newval); 
dict_t* dictrm(dict_t* d, const char * key);

My full regex command looks like this:

 awk '/^[a-zA-Z*_]+[:space:]+[a-zA-Z*_]+[:space:]*\(.*?\)/{ print $0 }' dict3.c

But it gives me nothing. So I tried trimming it down, just to see whether I could get anything at all. I tried this:

awk '/^[a-zA-Z*_]+[:space:]+[a-zA-Z*_]+/{ print $0 }' dict3.c

I got this:

dictent_t* dictentcreate(const char * key, const char * val) 
dict_t* dictcreate() 
dict_t* dictadd(dict_t* d, const char * key, const char * val) 
dict_t dictup(dict_t d, const char * key, const char * newval) 
dict_t* dictrm(dict_t* d, const char * key) {

This raises a lot of questions!

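Why doesn't the first regex work?
Why does the second one catch some declarations but not all of them? I assure you there is no whitespace before any of the declarations. I assume it doesn't catch other parts of the code, such as variable declarations, because of the indentation.
Third question: why does it capture the whole line when I only want the matched expression?
Finally, how do I add a ; at the end of each match?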

Note: the question has changed quite a bit since I wrote this answer.

Replace [:space:] with [[:space:]] (and, since awk regexes have no lazy .*? quantifier, plain .* does the same job):

$ awk '/^[a-zA-Z*_]+[[:space:]]+[a-zA-Z*_]+[[:space:]]*[(].*[)]/{ print $0 }' dict3.c
dictent_t* dictentcreate(const char * key, const char * val)  
dict_t* dictcreate() 
void dictdestroy(*dict_t d) 
void dictdump(dict_t *d) 
int dictlook(dict_t *d, const char * key) 
int dictget(char* s, dict_t *d, const char *key)
dict_t* dictadd(dict_t* d, const char * key, const char * val)
dict_t dictup(dict_t d, const char * key, const char *newval) 
dict_t* dictrm(dict_t* d, const char * key)

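The reason is that [:space:] is an ordinary bracket expression, so it matches any single one of the characters :, s, p, a, c or e. That is not what you want. It also explains why the second command caught only some declarations: a line like dict_t dictup(... matches only because the c in dict_t can act as the "space" between two runs of letters (di + c + t_t), whereas lines beginning with void or int contain none of those characters before the first real space, so they are skipped.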
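A minimal sketch to see the difference for yourself, assuming a POSIX shell and a POSIX-compliant awk (for example gawk or a recent mawk). The function names buggy_match and fixed_match are hypothetical, and the sample lines are copied from the output above.

```shell
# Hypothetical helpers: each reads lines on stdin and prints the ones that match.

# Original pattern: [:space:] is the character set { : s p a c e }, not whitespace.
# 2>/dev/null hides any warning an awk may print about this suspicious pattern.
buggy_match() {
  awk '/^[a-zA-Z*_]+[:space:]+[a-zA-Z*_]+/' 2>/dev/null
}

# Fixed pattern: [[:space:]] is the POSIX whitespace class inside a bracket expression.
fixed_match() {
  awk '/^[a-zA-Z*_]+[[:space:]]+[a-zA-Z*_]+/'
}

sample='dict_t dictup(dict_t d, const char * key, const char *newval)
int dictlook(dict_t *d, const char * key)'

echo "buggy:"; printf '%s\n' "$sample" | buggy_match   # only the dict_t line
echo "fixed:"; printf '%s\n' "$sample" | fixed_match   # both lines
```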

What you want is [[:space:]], which matches any whitespace character.
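For the last two questions (printing only the matched text instead of the whole line, and appending a ;): print $0 always prints the entire record. To print just the matched part you can use match(), which sets RSTART and RLENGTH, together with substr(). A minimal sketch, assuming a POSIX awk; the function name protos is hypothetical. Because awk has no lazy quantifier, [^)]* stands in for .*? and stops at the first closing parenthesis, so prototypes with function-pointer parameters would need more work.

```shell
# Hypothetical helper: reads C source on stdin, prints each prototype followed by ";".
protos() {
  awk '
    # match() sets RSTART/RLENGTH to the position and length of the match,
    # so substr() returns only the prototype rather than the whole line ($0).
    # [^)]* stops at the first ")" (awk has no lazy .*? operator).
    match($0, /^[a-zA-Z*_]+[[:space:]]+[a-zA-Z*_]+[[:space:]]*[(][^)]*[)]/) {
      print substr($0, RSTART, RLENGTH) ";"
    }
  '
}

printf '%s\n' \
  'dict_t dictup(dict_t d, const char * key, const char * newval)' \
  '{' \
  '  int i = dictlook(&d, key);' \
  'dict_t* dictrm(dict_t* d, const char * key) {' | protos
# prints:
# dict_t dictup(dict_t d, const char * key, const char * newval);
# dict_t* dictrm(dict_t* d, const char * key);
```

Indented lines such as the int i = ... statement are skipped because the regex is anchored with ^ and requires a letter, * or _ in the first column.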

Sun/Solaris

Sun/Solaris awk is well known to be full of bugs. If you are on that platform, try nawk, /usr/xpg4/bin/awk or /usr/xpg6/bin/awk.

Using sed

A very similar approach works with sed. This uses a regex based on yours (note that \+ and \t are GNU sed extensions):

$ sed -n '/^[a-zA-Z_*]\+[ \t]\+[a-zA-Z*]\+ *[(]/p' dict3.c
dictent_t* dictentcreate(const char * key, const char * val)  
dict_t* dictcreate() 
void dictdestroy(*dict_t d) 
void dictdump(dict_t *d) 
int dictlook(dict_t *d, const char * key) 
int dictget(char* s, dict_t *d, const char *key)
dict_t* dictadd(dict_t* d, const char * key, const char * val)
dict_t dictup(dict_t d, const char * key, const char *newval) 
dict_t* dictrm(dict_t* d, const char * key)

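The -n option tells sed not to print anything unless we explicitly ask it to. The /.../p construct tells sed to print the line if the regex between the slashes matches.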
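The same -n plus p idea also works with the p flag of the s command, which prints a line only when the substitution succeeded. That makes it easy to keep just the prototype and append the ;. A sketch in portable POSIX BRE syntax (no \+ or \t); the function name sed_protos is hypothetical.

```shell
# Hypothetical helper: prints "prototype;" for each line that starts a function definition.
sed_protos() {
  # \( \) capture the prototype; plain ( ) are literal characters in a BRE.
  # [^)]* stops at the first ")" and .*$ discards the rest of the line (e.g. " {").
  # Because of -n, only lines where s/// succeeded are printed (the p flag).
  sed -n 's/^\([a-zA-Z_*][a-zA-Z_*]*[[:space:]][[:space:]]*[a-zA-Z_*][a-zA-Z_*]*[[:space:]]*([^)]*)\).*$/\1;/p'
}

printf '%s\n' 'dict_t* dictrm(dict_t* d, const char * key) {' '  int i = dictlook(d, key);' | sed_protos
# prints: dict_t* dictrm(dict_t* d, const char * key);
```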

All of the improvements to the regex suggested by Ed Morton apply here as well.

Using perl

The same can also be done with perl:

perl -ne  'print if /^[a-zA-Z_*]+[ \t]+[a-zA-Z*]+ *[(]/' dict3.c
