c******n posts: 4965 | 1 [The following text is reposted from the Programming board]
From: creation (努力自由泳50m/45sec !), Board: Programming
Subject: anyone familiar with javacc?
Posted at: BBS 未名空间站 (Thu May 7 22:28:05 2015, US Eastern)
got a question. thanks |
|
m***r posts: 29 | 2 I am trying to extend Java. In addition to a lexer and parser,
I need a function that can infer the type of an expression
in Java. Even better if it can give the range of possible types during parsing.
No other requirements. Does any guru know of such a tool?
I checked JavaCC, SableCC, and some other tools, but could not find
such a capability. Is there any third-party implementation based on such
CC tools, such as a compiler-compiler-based full Java compiler (which
must contain such functions)? I say CC-based because |
|
D*******I posts: 29 | 3 I'm not sure, but I think JavaCC has a visitor pattern where you can add
functions to each node of the parse tree.
I suggest you send an email to professor Jens p******[email protected]; he's one
of the people who know the most about this. |
|
j******r posts: 201 | 4 Has anyone ever used javacc? |
|
c****r posts: 185 | 5 The performance of a parser generator is far less important than that of the
parser it generates. Most parser generators take just a few seconds to
generate a parser. In practice, LL parser generators like ANTLR and JavaCC seem more
popular than LR ones. If you are doing research, I doubt you can
motivate your problem. |
|
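I**********s posts: 441 | 6 JavaCC works well on the Java platform. As for LR parsers proper, there is
no popular implementation: the commonly used yacc, bison, byacc and so on are all
LALR(1). The advantage of LR(1) is that it accepts a strictly larger class of grammars
than LALR(1), although not every context-free grammar (ambiguous grammars, for
instance, are not LR(1)). LL grammars can also be converted to LR(1). Knuth's original
LR(1) algorithm was long considered too slow, with parsing tables that are too large.
Even now, an LR(1) parser generator built on Knuth's original algorithm often needs a
long time to finish. For example, I heard of a C++ implementation that took 20 minutes
on a grammar with about 120 tokens and 350 productions. For another implementation
(in Python?), the author said of a grammar with about 100 tokens and 500
productions: "I let it run for nearly three days, and it was far from
finishing, but using nearly 16GB of memory" |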
|
c****r posts: 185 | 7 Most of a grammar can be recognized by LL(1) or LALR(1).
The rest can be handled as special cases, for example with lookahead in javacc.
Theoretically that makes it LL(k), and the complexity is exponential in k.
But in practice it is not a problem. |
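To illustrate the "special case" lookahead mentioned above, here is a minimal hand-written sketch in Java rather than generated JavaCC output. It assumes a toy language in which a statement is either `ID = ...` or `ID ( ... )`; the class and method names are hypothetical. Both alternatives start with an identifier, so one token is not enough, and the code peeks at the second token, much as a `LOOKAHEAD(2)` at a choice point would in a JavaCC grammar.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: pick between two alternatives that share their first
// token by looking at the second token as well (two tokens of lookahead).
class LookaheadSketch {
    // Classifies a token list as "assignment" (ID = ...) or "call" (ID ( ...).
    static String classify(List<String> tokens) {
        if (tokens.isEmpty() || !isIdentifier(tokens.get(0))) {
            return "error";
        }
        // Both alternatives begin with an identifier, so inspect token 2.
        String second = tokens.size() > 1 ? tokens.get(1) : "";
        switch (second) {
            case "=": return "assignment";
            case "(": return "call";
            default:  return "error";
        }
    }

    static boolean isIdentifier(String s) {
        return s.matches("[A-Za-z_][A-Za-z_0-9]*");
    }

    public static void main(String[] args) {
        System.out.println(classify(Arrays.asList("x", "=", "1")));
        System.out.println(classify(Arrays.asList("f", "(", "x", ")")));
    }
}
```

Only the one ambiguous choice point pays for the extra lookahead; the rest of the grammar can stay LL(1), which is why the exponential worst case rarely matters.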
|
I**********s posts: 441 | 8 "Specially taking care of" things can be a pain when you need to modify the
grammar, and you may not be sure the modified grammar is equivalent to the one
before. Indeed, a lot of things can be solved by precedence and other
conflict-resolution methods. yacc/bison and now javacc can be used for most
compiler development tasks. But there are still people puzzled by the
"mysterious reduce/reduce conflict" who hope to find an LR(1) solution. Anyway,
I hope what I did can be an answer to this. |
|
f****y posts: 70 | 9 I am using javacc to do some simple parsing.
The tokens are defined as follows:
TOKEN :
{
|
|
|
< FUNC: ["a"-"z"] ( ["a"-"z","A"-"Z","_","0"-"9"] )* >
|
|
< ID: ( ["a"-"z","A"-"Z","_","0"-"9"] )+ >
}
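FUNC is a function name and ID is an arbitrary string, so FUNC(ID) is a valid
input.
The problem is that the definition of ID includes FUNC, so when the lexer sees
something like test, it cannot tell whether it is a FUNC or an ID.
The rule here is that a function with no arguments omits the parentheses, so both done and test(d) can be functions.
So I don't know how to define the tokens so that function names and IDs can be told apart during parsing. |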
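One common way around this kind of overlap is to let the lexer produce a single word token and let the parser decide from position whether a word is a function name or an argument. When two JavaCC token definitions match the same longest text, the one listed first in the grammar file wins, so the lexer alone cannot separate FUNC from ID. The following is a hand-written Java sketch of the idea, not JavaCC output; the class name, method names, and the simplified one-argument grammar `call := WORD [ "(" WORD ")" ]` are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: one WORD token; the parser assigns the FUNC/ID role.
class FuncOrId {
    private static final Pattern TOKEN =
            Pattern.compile("\\s*([A-Za-z_0-9]+|[()])");

    // Splits the input into words and parentheses; rejects anything else.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                break;
            }
            tokens.add(m.group(1));
            pos = m.end();
        }
        if (!input.substring(pos).trim().isEmpty()) {
            throw new IllegalArgumentException("unexpected input at " + pos);
        }
        return tokens;
    }

    static boolean isWord(String s) {
        return s.matches("[A-Za-z_0-9]+");
    }

    // The first word is the function name; a word in parentheses is the ID.
    static String parseCall(String input) {
        List<String> t = tokenize(input);
        if (t.isEmpty() || !isWord(t.get(0))
                || !Character.isLowerCase(t.get(0).charAt(0))) {
            throw new IllegalArgumentException("expected a function name");
        }
        if (t.size() == 1) {
            return "func=" + t.get(0) + " arg=none";
        }
        if (t.size() == 4 && t.get(1).equals("(")
                && isWord(t.get(2)) && t.get(3).equals(")")) {
            return "func=" + t.get(0) + " arg=" + t.get(2);
        }
        throw new IllegalArgumentException("expected NAME or NAME(ID)");
    }

    public static void main(String[] args) {
        System.out.println(parseCall("done"));
        System.out.println(parseCall("test(d)"));
    }
}
```

The lowercase-first-letter check mirrors the FUNC pattern in the question; everything else about telling the two apart comes from where the word appears.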
|
j******r posts: 201 | 10 Lexical actions can use TOKEN_MGR_DECLS to define global variables.
How do I define a global variable inside a bnf_production? |
|
j******r posts: 201 | 11 yes
If you have a string "\t\tx" and the tab length is 4, the parser
I am using (javacc) will tell me the column number of 'x' is 8, not 3.
And the editor will show the column as 8 too.
Anyway, if I want to read a file and display it in another terminal,
and I want the display to be exactly as it is in the original one, I still need
to know the length of each '\t'. |
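The difference between the two numbers can be sketched in plain Java. This is an illustration only, not JavaCC's own column tracking; the class and method names are hypothetical, and tab stops are assumed to fall every `tabWidth` columns, counted from 0.

```java
// Hypothetical sketch: character index versus visual column under tab stops.
class TabColumns {
    // 0-based visual column of the character at charIndex in line.
    static int visualColumn(String line, int charIndex, int tabWidth) {
        int col = 0;
        for (int i = 0; i < charIndex; i++) {
            if (line.charAt(i) == '\t') {
                // Jump to the next tab stop.
                col += tabWidth - (col % tabWidth);
            } else {
                col++;
            }
        }
        return col;
    }

    // Replaces each tab with the spaces needed to reach the next tab stop.
    static String expandTabs(String line, int tabWidth) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\t') {
                int spaces = tabWidth - (out.length() % tabWidth);
                for (int s = 0; s < spaces; s++) {
                    out.append(' ');
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(visualColumn("\t\tx", 2, 4));
        System.out.println("[" + expandTabs("ab\tx", 4) + "]");
    }
}
```

For "\t\tx" with a tab width of 4, the character index of 'x' is 2 (the third character) while its visual column is 8, which matches the two numbers in the post. Expanding tabs before display is one way to make a second terminal reproduce the original layout regardless of its own tab setting.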
|
z***y posts: 42 | 12 javacc. (like LEX/YACC)
They have a C grammar available. |
|
|
m******t posts: 2416 | 14
That's what I'd do, too. A primitive finite state machine, basically. Some
others might come along and tell you to use javacc, don't listen to them. 8-) |
|
b******y posts: 9224 | 15 I've used javacc before, but it's a bit complicated. You need to come up with the
grammar, and it cannot use left recursion, I think.
So normally I would just use Pattern/regex, since it is standard and easy
to use. |
|
h**j posts: 2033 | 16 used antlr before, not bad...
or you can check out javacc |
|
k***r posts: 4260 | 17 Regex can only handle very simple cases. Sometimes you have no choice but to use something like lex, or javacc/antlr. |
|
c*****t posts: 1879 | 18 I want to introduce a new way of dealing with file pattern matching.
It uses a project called CookCC, which I started.
Basically, instead of writing complicated lexer patterns using Pattern
for non-trivial text files, it is actually easier to write a lexer/parser
using CookCC. However, the traditional way of writing a lexer/parser
(such as with Antlr / JavaCC) means writing code in a
proprietary file format that doesn't have good Java editing support.
So here is how CookCC does it. I am |
|
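b******y posts: 9224 | 19
Actually, as far as compilers go, I looked into recursive descent before; javacc and ANTLR are both
pretty good. I also wrote a simple interpreter that can run the BASIC language, all in Java.
There are actually quite a few treatments of compilers and algorithms written in Java.
Overall, except for operating-system-level software, Java can hold its own against C++. |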
|
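c*****t posts: 1879 | 20 It's still a small job: about 3-4 weeks of work for me personally. The main effort
is adding a target language, since a fast, good lexer/parser has many places that can be optimized.
I just put out a 0.3 release. It has a fairly unique way of writing a lexer/parser.
Unfortunately, someone built something similar in Python back in '98 (though not quite as good). Sadly,
nobody seems to have noticed; I only found out after I had finished. Antlr/JavaCC and the like still use the rather old-fashioned approach. |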
|
|