c**t posts: 2744 | 1 http://www.devx.com/xml/Article/16921/0/page/1
The XML parser world is a dynamic one. As standards change, the parsers
change as well, and XML parsers are becoming more sophisticated. For most
programming projects, the parser, at minimum, must support DOM Level 2, SAX
2, XSLT, and Namespaces. All the parsers discussed here provide these
capabilities; however, there are distinct differences in performance,
reliability, and conformance to standards. In this article, I'll compare the
latest parsers fro |
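For reference, the baseline features listed above (DOM Level 2 and Namespaces in particular) are exposed in Java through the JDK's standard JAXP API. A minimal sketch, assuming the parser bundled with the JDK; the class name NsDomDemo and the urn:example:orders namespace are made up for illustration:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class: parses a string into a DOM Level 2 tree with namespaces enabled.
class NsDomDemo {
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // JAXP factories are not namespace-aware by default; turn it on explicitly.
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException("parse failed", e);
        }
    }

    // Returns "{namespaceURI}localName" for the document element.
    static String rootName(String xml) {
        Element root = parse(xml).getDocumentElement();
        return "{" + root.getNamespaceURI() + "}" + root.getLocalName();
    }
}
```

Without setNamespaceAware(true), getLocalName() returns null for the parsed elements, which is a common source of confusion when moving code between parsers.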
|
S****n posts: 46 | 2 Where? In Java?
http://www-uk.hpl.hp.com/people/ak/java/hex.html
import hplb.hex.Parser;
import hplb.hex.HTML;
import org.w3c.dom.Document;
...
Parser parser = new Parser();
HTML.applyHacks(parser);
Document doc = parser.parse(System.in); |
|
q*****i posts: 30 | 3 In my paper I said I used a parser to assign syntactic tags. Then a reviewer gave me a hard time,
saying: a parser does much more than assigning syntactic tags to sentences. It
normally outputs the syntactic structure of a sentence in the form of a
parse tree.
What I want to know is whether a syntactic parser exists to produce a parse tree, or to analyze sentence
structure, with a parse tree as the result. In other words, what is the relationship between these
syntactic tags and the parse tree?
My understanding is: a parser is designed to recognize the constituents in the
sentence and their relations, and its result would be a set of syntactic tags
assigned to the sentence |
|
z****g posts: 2497 | 4 Why are you dividing the file into strings?
A SAX parser is a streaming parser: it goes through the document event by event (element by element), not line by line.
If you use JDOM, it will read the whole file in.
For large files, a SAX parser performs better than a DOM
parser. |
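To illustrate the streaming model: a SAX parser pushes start/end-element events to a handler as it reads, so no tree is built and memory use does not grow with the document. A minimal sketch using the JDK's built-in SAX parser; SaxCounter is a hypothetical name:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: counts elements with a given name in one streaming pass.
class SaxCounter {
    static int count(String xml, String elementName) {
        // Callback handler: the parser calls it for each start tag it reads.
        class Counter extends DefaultHandler {
            int n = 0;
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals(elementName)) n++;
            }
        }
        Counter handler = new Counter();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return handler.n;
    }
}
```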
|
w*r posts: 2421 | 5 Man, you have no other choice: your XML doc has multiple XML declaration
headers.
What can you expect from the parser? Magic? No! All you can do is
design your own 'feeder' for the parser: skip the declaration part and feed the
records to the parser. Both 1 and 2 will work; it just depends on how big your
file is. If it's millions and millions of records, I suggest 2; for a small number of
records, 1 is okay.
And, my |
|
c*****t posts: 1879 | 6 For LR parsers, when a set of tokens can't be reduced, there is an
error. At this point, you can decide on a case-by-case basis whether to
manipulate the parsing stack or directly generate an error.
For LL parsers, if your lookahead token isn't what you expect (such
as a missing ';'), you can insert it pretty easily. Usually LL parsers
are a bit easier for this kind of stuff.
I'd say don't waste your time on it. It's really not that important. |
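A toy illustration of the LL case: a hand-written recursive-descent (LL(1)) parser that, on a missing ';', records an error and behaves as if the token had been inserted, then keeps going. The grammar (NAME = NUMBER ;) and the class name LlRecovery are hypothetical, and the tokenizer is deliberately crude:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy LL(1) parser for statements of the form  name = number ;
// A missing ';' is reported and treated as inserted (single-token insertion).
class LlRecovery {
    private final String[] tokens;
    private int pos = 0;
    final List<String> errors = new ArrayList<>();

    LlRecovery(String source) {
        // Crude tokenizer: split on whitespace, with ';' and '=' as separate tokens.
        this.tokens = source.replace(";", " ; ").replace("=", " = ").trim().split("\\s+");
    }

    private String peek() { return pos < tokens.length ? tokens[pos] : "<eof>"; }

    private void expect(String t) {
        if (peek().equals(t)) { pos++; return; }
        if (t.equals(";")) {
            // Recovery: report, then act as if ';' were present (consume nothing).
            errors.add("missing ';' before '" + peek() + "'");
        } else {
            errors.add("expected '" + t + "' but found '" + peek() + "'");
            pos++; // panic mode: skip the offending token
        }
    }

    // program := statement*
    int parseProgram() {
        int statements = 0;
        while (!peek().equals("<eof>")) {
            parseStatement();
            statements++;
        }
        return statements;
    }

    // statement := NAME '=' NUMBER ';'
    private void parseStatement() {
        pos++;        // NAME (not checked in this sketch)
        expect("=");
        pos++;        // NUMBER (not checked in this sketch)
        expect(";");
    }
}
```

For the input "x = 1 y = 2; z = 3;" it parses three statements and reports a single error, missing ';' before 'y', instead of stopping at the first problem.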
|
g***l posts: 21 | 7 What is the definition of the types of parsers?
I found some parsers, but I don't know exactly how to program
them. Can someone give me some hints? It would be especially good if
there were some tutorials on parser programming.
Thanks |
|
l***r posts: 459 | 8 [ The following text is forwarded from the Programming board; original post below ]
From: laoer (xiaoer), Board: Programming
Subject: question on XML parser
Posted at: The unknown SPACE (Thu Jun 13 18:45:20 2002) WWW-POST
Greetings,
I only have XSD files and request files in XML format. I wonder which XML
parser I should use to parse response files in XML format? Or should I have a
DTD file at least?
Thanks. |
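On the DTD question: a parser can read the response files with neither, and if validation is wanted, an XSD can be used on its own; no DTD is required. A minimal sketch with the JDK's javax.xml.validation API; XsdCheck is a hypothetical name:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper: validates an XML document against an XSD; no DTD involved.
class XsdCheck {
    static boolean isValid(String xsd, String xml) {
        try {
            SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = sf.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, validation errors are thrown as SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the document (or the schema itself) is invalid
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, with a schema that declares a qty element of type xs:int, "<qty>5</qty>" is accepted and "<qty>five</qty>" is rejected.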
|
e****y posts: 26 | 9 Can anyone suggest an XML parser?
I use the parser to parse incoming XML data and then store it
in a relational DB.
So it would be wonderful if I could get the source code of the parser.
Thanks a lot! |
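For the store-into-a-relational-DB use case, one common shape is to stream the XML with StAX (javax.xml.stream, part of the JDK) and turn each record into a column-to-value map. A minimal sketch, assuming flat records whose child elements contain only text; XmlToRows and the record layout are hypothetical:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: turns records like <row><id>1</id><name>a</name></row>
// into one column->value map per record, ready to bind to an SQL INSERT.
class XmlToRows {
    static List<Map<String, String>> rows(String xml, String recordTag) {
        List<Map<String, String>> result = new ArrayList<>();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            Map<String, String> current = null;
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.END_ELEMENT && r.getLocalName().equals(recordTag)) {
                    current = null; // record finished
                } else if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = r.getLocalName();
                    if (name.equals(recordTag)) {
                        current = new LinkedHashMap<>();
                        result.add(current);
                    } else if (current != null) {
                        // Reads a text-only child element and moves to its end tag.
                        current.put(name, r.getElementText());
                    }
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new RuntimeException(e);
        }
        return result;
    }
}
```

Each map could then be bound to a JDBC PreparedStatement and sent with addBatch()/executeBatch(); that part is left out because it needs a database connection.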
|
a*****a posts: 1429 | 10 [ The following text is forwarded from the Linux board; original post below ]
From: atnalta (professional debugger), Board: Linux
Subject: PERL XML parser
Posted at: Unknown Space - 未名空间 (Tue Jul 6 14:58:18 2004) WWW-POST
I'm developing a scripting tool in Perl and need to use an XML parser.
I'm familiar with Perl and with XML, but have never combined
the two. I've searched CPAN and got 70+ modules, and I don't know
which is the most popular. I just need a SAX parser. Any suggestions? |
|
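z****c posts: 602 | 11 [ The following text is forwarded from the Programming board ]
From: zodiac (King of Nuts), Board: Programming
Subject: Expat XML parser can't handle Chinese?
Posted at: BBS 未名空间站 (Sat Oct 8 14:59:44 2005), relayed
I'm using Python's Expat XML parser to process XML files containing Chinese characters, and it always reports
errors like "XML not well-formed, invalid token". With all-English
XML there is no such problem. Why is this? Does Python have an XML parser that can handle Chinese? |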
|
a********5 posts: 1631 | 12 It's probably about implementing a simple abstract syntax tree parser; the parser itself can be built with a state machine. |
|
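E**********e posts: 1736 | 14 Pretty happy today. I found a fairly simple XML parser online, and it's quite efficient. I tested it on real
XML files and was satisfied. But when I switched to Experian's XML files, it had a bug. I spent
a whole day debugging and finally got it working. It's just a few simple functions using recursion, and
it handled the Experian problem very well. Speaking of which, I have to give myself some credit: my learning
ability is pretty strong. A senior software engineer in the IT group wrote a parser in C# using the most brute-force
algorithm; printed out, it runs over 10 pages.
I also have to praise Python. I found Python very effective for processing XML files: XML converts into a
dictionary data structure, then to JSON, to a DataFrame, to MongoDB. Everything suddenly connects. |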
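The recursive approach described here maps naturally onto a DOM tree. A rough Java counterpart of the XML-to-dict step, as a sketch only: XmlToMap is a hypothetical name, namespaces are handled by keeping local names, and attributes and mixed content are ignored:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: recursive DOM -> nested Map, similar in spirit to XML -> Python dict.
// Leaf elements become strings; repeated child names become lists.
class XmlToMap {
    static Map<String, Object> convert(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            Element root = f.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            Map<String, Object> out = new LinkedHashMap<>();
            out.put(root.getLocalName(), toValue(root));
            return out;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    @SuppressWarnings("unchecked")
    private static Object toValue(Element e) {
        Map<String, Object> children = new LinkedHashMap<>();
        NodeList nodes = e.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node n = nodes.item(i);
            if (n.getNodeType() != Node.ELEMENT_NODE) continue; // skip text/whitespace nodes
            String key = n.getLocalName();                       // namespace prefix dropped
            Object value = toValue((Element) n);                 // recursion
            Object existing = children.get(key);
            if (existing == null) {
                children.put(key, value);
            } else if (existing instanceof List) {
                ((List<Object>) existing).add(value);
            } else {
                List<Object> list = new ArrayList<>();
                list.add(existing);
                list.add(value);
                children.put(key, list);
            }
        }
        // No element children: treat as a leaf and return its trimmed text.
        return children.isEmpty() ? e.getTextContent().trim() : children;
    }
}
```

For <report><name>A</name><item>1</item><item>2</item></report> it returns {report={name=A, item=[1, 2]}}; converting the leaf strings to numbers or dates would be a separate step.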
|
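d*****u posts: 17243 | 15 In NLP, "tagging" usually means labeling parts of speech and the like; it does not produce structure.
A parser obviously has to produce structure.
And more advanced parsers not only parse the existing words
but also insert empty categories and so on. |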
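To make the distinction concrete: the word/TAG sequence a tagger produces is only the pre-terminal layer of a constituency tree, while a parser also builds the phrase nodes above it. A small illustrative sketch; ParseTree, the labels, and the example sentence are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration: a constituency parse tree versus a flat tag sequence.
class ParseTree {
    final String label;
    final String word;            // non-null only for pre-terminal (POS) nodes
    final List<ParseTree> kids;

    ParseTree(String label, String word) { this.label = label; this.word = word; this.kids = List.of(); }
    ParseTree(String label, ParseTree... kids) { this.label = label; this.word = null; this.kids = Arrays.asList(kids); }

    // Flat "word/TAG" sequence: what a tagger outputs.
    List<String> tags() {
        List<String> out = new ArrayList<>();
        collect(this, out);
        return out;
    }

    private static void collect(ParseTree t, List<String> out) {
        if (t.word != null) { out.add(t.word + "/" + t.label); return; }
        for (ParseTree k : t.kids) collect(k, out);
    }

    // Bracketed structure: what a parser outputs.
    String bracketed() {
        if (word != null) return "(" + label + " " + word + ")";
        StringBuilder sb = new StringBuilder("(" + label);
        for (ParseTree k : kids) sb.append(' ').append(k.bracketed());
        return sb.append(')').toString();
    }
}
```

For "the dog barks", tags() gives [the/DT, dog/NN, barks/VBZ] while bracketed() gives (S (NP (DT the) (NN dog)) (VP (VBZ barks))). The tags can be read off the tree, but the tree cannot be recovered from the tags alone.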
|
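l*******s posts: 1258 | 16 I think you and the reviewer are both right; you're just talking about two different aspects.
The parser's algorithm looks at the relations among the words in the sentence, and the form the parser's final output takes is a set of
tags. |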
|
l***r posts: 459 | 17 [ The following text is forwarded from the Programming board; original post below ]
From: laoer (You know what!), Board: Programming
Subject: a question on XML parser
Posted at: Unknown Space - 未名空间 (Tue Jun 15 22:29:14 2004) WWW-POST
Greetings,
I have several "<.." in one file. Right now, I first split this
file into many strings, each string being one XML record. Then I use a Java SAX
parser to parse each one. It turns out to be very slow at splitting and
parsing. Is there a better way, like parsing all the records in this file in one
pass?
Tha |
|
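z****g posts: 2497 | 18 Reread the SAX parser sample code.
Your understanding is wrong.
A SAX parser reads the elements one after another, in sequence.
Also, your XML doc seems to have some problems:
why are there so many XML declarations and DTDs? Like the header of an HTML page, there
should be only one. |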
|
l***r posts: 459 | 19
Really? What's my mistake?
It should be fine, because this file is created by a commercial program. And my
SAX parser works with this format. |
|
g******a posts: 730 | 20 Thanks a lot. :P
But as far as I can see, it's a parser/lexer for Java.
I need a parser/lexer for C, but written in Java.
Is there any such tool? |
|
b******y posts: 9224 | 21 Given an HTML page, I want to transform it into XML (proper HTML), then use
a good Java XPath parser to retrieve/scrape relevant content.
Any suggestions for a good Java XPath parser?
Thanks in advance, |
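Once the page has been cleaned up into well-formed markup, the JDK's own javax.xml.xpath API can evaluate XPath directly, without a separate library. A minimal sketch; XPathScrape is a hypothetical name, and it assumes the cleaned-up HTML has no default namespace (if it keeps xmlns="http://www.w3.org/1999/xhtml", bare names like //td may not match unless a NamespaceContext is set):

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: evaluates an XPath expression over well-formed markup
// (e.g. XHTML produced by an HTML clean-up step) and returns the matched text.
class XPathScrape {
    static String[] select(String xhtml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(xhtml)), XPathConstants.NODESET);
            String[] out = new String[nodes.getLength()];
            for (int i = 0; i < out.length; i++) out[i] = nodes.item(i).getTextContent();
            return out;
        } catch (XPathExpressionException e) {
            throw new RuntimeException(e);
        }
    }
}
```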
|
g*********s posts: 1782 | 22 Hi, can any guru give some suggestions on how to develop a smart parser/compiler
given a set of syntax definitions?
By "smart" I mean the parser can skip some errors and continue, like gcc
does in some situations. I hope to learn some general principles on this
issue.
Thanks! |
|
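c*****t posts: 1879 | 23 All the test cases here have been tested (though the parser part is still only in svn;
I still need to put the compressed table into the code generator).
http://code.google.com/p/cookcc/source/browse/trunk/tests/
The code generator currently only supports Java. Doing C/C++ isn't that hard either (after all, it's a
template approach).
Right now the input is XML, mainly because yacc/lex input files are actually quite complex, and hand-writing
a parser for them is too hard. As for generating it automatically, well, that's what I'm writing now.
As for mixing code and grammar, I have an idea that doesn't suit C/C++ but suits
Java / C# / Python very well. That way you can directly use existing Java/C#/Python editors
(along with their features, such as refactoring, context-sensitive help,
etc.). I've already worked out a prototype by hand; only the implementation remains. |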
|
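c*****t posts: 1879 | 24 That's still minor: about 3-4 weeks of work for me personally. Adding a target language is the main effort,
since a fast, good lexer/parser has lots of places to optimize.
I just put out a 0.3 release. It has a fairly unique way of writing lexers/parsers. Unfortunately,
someone already did something similar in Python back in '98 (though a bit weaker). Sadly, nobody seems
to have noticed it; I only found out after I finished. Antlr/JavaCC etc. still use the older approach. |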
|
b***y posts: 2799 | 25 ☆─────────────────────────────────────☆
zodiac (King of Nuts) wrote on (Sat Oct 8 14:59:44 2005):
I'm using Python's Expat XML parser to process XML files containing Chinese characters, and it always reports
errors like "XML not well-formed, invalid token". With all-English
XML there is no such problem. Why is this? Does Python have an XML parser that can handle Chinese?
☆─────────────────────────────────────☆
rebatezhq (test) wrote on (Sat Oct 8 22:13:50 2005):
Check whether your CodePage is set correctly. I'm not sure whether expat supports GBK.
You could convert to UTF-8; that should work.
☆─────────────────────────────────────☆
yhnb (yhnb) wrote on (Sun Oct 9 19:38:06 2005):
xmlproc can handle gb2312 |
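The convert-to-UTF-8 advice follows from the XML rule that a document without an encoding declaration is read as UTF-8 (or UTF-16 with a byte-order mark), so undeclared GBK bytes often surface as "not well-formed (invalid token)". The same idea sketched in Java: decode the bytes with the known charset and give the parser characters instead of bytes. CharsetParse is a hypothetical name, and the charset is assumed to be known in advance:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical helper: decode bytes with a known charset first, then hand the
// parser a character stream so it never has to guess the byte encoding.
class CharsetParse {
    static String rootText(byte[] data, Charset charset) {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(reader))
                    .getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

The other fix is to declare the real encoding (for example encoding="GBK") in the XML declaration, provided the parser in use supports that encoding.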
|
c*****t posts: 1879 | 26 AST is nothing. You can easily create one by hand.
For example, for the following simple calculator code:
print 2 * 3 + 5 * 7;
x = 1;
while (x < 10) {
if (x < 5)
print x * x;
else
print x + x;
x = x + 1;
}
The entire code for it, including AST classes, interpreter, lexer + parser,
in CookCC (which generates Java) is 310 lines.
https://code.google.com/p/cookcc/source/browse/trunk/tests/java/parser/calc/calc.xcc
Like I said, it is trivial if y... |
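To illustrate the "create one by hand" point without CookCC: a hand-written recursive-descent parser that builds a small AST and evaluates it. This is a swapped-in technique, not the CookCC calc.xcc grammar; Calc, Num, Var and Bin are hypothetical names, it needs Java 16+ for records, and it covers only the expression part of the calculator language above:

```java
import java.util.Map;

// Hypothetical sketch, not CookCC output: hand-written AST plus a tiny
// recursive-descent parser for integer expressions with + - * / and variables.
class Calc {
    interface Expr { int eval(Map<String, Integer> env); }

    record Num(int value) implements Expr {
        public int eval(Map<String, Integer> env) { return value; }
    }
    record Var(String name) implements Expr {
        public int eval(Map<String, Integer> env) { return env.get(name); }
    }
    record Bin(char op, Expr left, Expr right) implements Expr {
        public int eval(Map<String, Integer> env) {
            int a = left.eval(env), b = right.eval(env);
            switch (op) {
                case '+': return a + b;
                case '-': return a - b;
                case '*': return a * b;
                default:  return a / b;
            }
        }
    }

    private final String src;
    private int pos = 0;

    private Calc(String src) { this.src = src.replace(" ", ""); }

    static Expr parse(String src) { return new Calc(src).expr(); }

    // expr := term (('+' | '-') term)*
    private Expr expr() {
        Expr e = term();
        while (pos < src.length() && (peek() == '+' || peek() == '-')) {
            char op = src.charAt(pos++);
            e = new Bin(op, e, term());
        }
        return e;
    }

    // term := factor (('*' | '/') factor)*
    private Expr term() {
        Expr e = factor();
        while (pos < src.length() && (peek() == '*' || peek() == '/')) {
            char op = src.charAt(pos++);
            e = new Bin(op, e, factor());
        }
        return e;
    }

    // factor := NUMBER | NAME | '(' expr ')'
    private Expr factor() {
        if (peek() == '(') { pos++; Expr e = expr(); pos++; return e; } // second pos++ skips ')'
        int start = pos;
        if (Character.isDigit(peek())) {
            while (pos < src.length() && Character.isDigit(peek())) pos++;
            return new Num(Integer.parseInt(src.substring(start, pos)));
        }
        while (pos < src.length() && Character.isLetter(peek())) pos++;
        return new Var(src.substring(start, pos));
    }

    private char peek() { return src.charAt(pos); }
}
```

For example, Calc.parse("2 * 3 + 5 * 7") evaluates to 41, matching the first print statement; the precedence comes from the expr/term/factor layering rather than from a generated table.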
|
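j*a posts: 14423 | 27 You can do it in JS, with hardly any dependencies:
// result holds the HTML text to parse; $ is jQuery
var parser = new DOMParser();
var doc = parser.parseFromString(result, "text/html");
$('table tr', doc).each(function() {
    // handle each table row here
}); |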
|
A*******n posts: 625 | 29 Does anyone know what the error below means?
XML for Analysis parser: The driver property was not recognized.
It seems to be related to parsing Excel to XML.
Thanks |
|
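v*****x posts: 2 | 30 I've just started learning about XML parsers, and there's one question I still can't figure out.
Please enlighten me :-〉
1) Using msxml, I can parse an XML file and get the node contents. But
when generating an XML file, if the file's DTD changes, does the application have to change too?
For example, I used XML to define a
must I modify the original application? Is there a better way?
2) If I use a parser under VC, is msxml the only option? Are there better products/tools?
My questions may be naive, but thanks to anyone who can answer. |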
|
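c***y posts: 8 | 31 only a parser for xml is not enough.
The project uses schemas; how do I support them?
What do I need, and what does each piece do?
First, I need an XML parser;
second, I also need an XML schema processor (???)
Is that right? Then how do the two work together...
I'm very confused, but the project deadline is imminent :((
I chose Oracle XDK; it seems so-so...
but I don't know how to use the two together!!!
Extremely urgent!!!! |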
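Regarding how the two pieces fit together: in JAXP the schema processor is compiled into a Schema object and attached to the parser factory, so a single parse both builds the DOM and validates it. A sketch using the JDK's built-in implementation rather than Oracle XDK; ValidatingParse is a hypothetical name:

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical sketch: one parse that builds the DOM and validates it against an XSD.
class ValidatingParse {
    static Document parse(String xsd, String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);  // schema validation works on namespace-aware parses
            dbf.setSchema(schema);        // the schema processor plugged into the parser
            DocumentBuilder builder = dbf.newDocumentBuilder();
            // With the JDK's default handler, validation errors are typically only
            // printed, so install one that turns them into exceptions.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema or document rejected: " + e.getMessage(), e);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```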
|
j*****q posts: 26 | 32 It depends on which level you want to use the parser at.
If you use JavaScript and VBScript on Windows, or use Visual Basic, you
can easily use the parser because the system hides the complicated
interaction.
If you want to use it at the C++ level, you need to know COM.
If you are familiar with Java, Xerces is a handy lib. It's easy to understand
and use. |
|
c*****t posts: 198 | 33 I am using Xerces now. Honestly, it is good in that it has everything and it
is free. It is not that good for me, since it is too big and not lightweight at all (compiling
it takes half an hour). This parser conforms to the XML 1.0
standard. It builds a C++ library that you can use.
I tried it on Linux.
Personally, I don't think an XML parser should be this complex. I don't even like the XML
standard itself: it is not very convenient for programming. |
|
P*****x posts: 72 | 34 A DOM parser is easy to use; a SAX parser is faster, with a smaller memory footprint. |
|
h*i posts: 3446 | 36 I'm working on a domain-specific language (DSL) for Information Visualization, with syntax
roughly like SQL. I'd like to write its parser in a functional programming language, but I've never
written a parser. Any suggestions? |
|
b*********o posts: 46 | 37 In an interview today I was asked how to write a JSON parser using a state machine. I'm not very clear on it;
could the experts explain how this works? |
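One way to read "use a state machine" for JSON: keep an explicit state (which tokens are allowed next) plus a stack of open containers, which makes it a pushdown automaton. A minimal validator sketch; JsonStateMachine and its state names are hypothetical, and number and string rules are simplified (escapes are skipped, not checked):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: a JSON validator driven by an explicit state plus a stack.
class JsonStateMachine {
    enum State { VALUE, VALUE_OR_END, KEY, KEY_OR_END, COLON, AFTER_VALUE }

    static boolean isValid(String s) {
        Deque<Character> stack = new ArrayDeque<>(); // open containers: '{' or '['
        State state = State.VALUE;
        int i = 0;
        while (true) {
            while (i < s.length() && Character.isWhitespace(s.charAt(i))) i++;
            if (i == s.length()) return state == State.AFTER_VALUE && stack.isEmpty();
            char c = s.charAt(i);
            if (state == State.VALUE || state == State.VALUE_OR_END) {
                if (c == ']' && state == State.VALUE_OR_END) {   // empty array
                    stack.pop(); i++; state = State.AFTER_VALUE;
                } else if (c == '{') {
                    stack.push('{'); i++; state = State.KEY_OR_END;
                } else if (c == '[') {
                    stack.push('['); i++; state = State.VALUE_OR_END;
                } else {
                    int end = scalarEnd(s, i);                     // string, number, literal
                    if (end < 0) return false;
                    i = end; state = State.AFTER_VALUE;
                }
            } else if (state == State.KEY || state == State.KEY_OR_END) {
                if (c == '}' && state == State.KEY_OR_END) {     // empty object
                    stack.pop(); i++; state = State.AFTER_VALUE;
                } else if (c == '"') {
                    i = stringEnd(s, i);
                    if (i < 0) return false;
                    state = State.COLON;
                } else return false;
            } else if (state == State.COLON) {
                if (c != ':') return false;
                i++; state = State.VALUE;
            } else {                                               // AFTER_VALUE
                if (stack.isEmpty()) return false;                 // trailing garbage
                char open = stack.peek();
                if (c == ',') { i++; state = open == '{' ? State.KEY : State.VALUE; }
                else if ((c == '}' && open == '{') || (c == ']' && open == '[')) { stack.pop(); i++; }
                else return false;
            }
        }
    }

    // Index just past a string, number or literal starting at i, or -1 (simplified rules).
    private static int scalarEnd(String s, int i) {
        if (s.charAt(i) == '"') return stringEnd(s, i);
        for (String lit : new String[] {"true", "false", "null"})
            if (s.startsWith(lit, i)) return i + lit.length();
        int j = i;
        if (s.charAt(j) == '-') j++;
        if (j >= s.length() || !Character.isDigit(s.charAt(j))) return -1;
        while (j < s.length() && (Character.isDigit(s.charAt(j)) || ".eE+-".indexOf(s.charAt(j)) >= 0)) j++;
        return j;
    }

    // Index just past the closing quote of the string starting at i, or -1.
    private static int stringEnd(String s, int i) {
        for (int j = i + 1; j < s.length(); j++) {
            char c = s.charAt(j);
            if (c == '\\') j++;            // skip the escaped character
            else if (c == '"') return j + 1;
        }
        return -1;
    }
}
```

A full parser would build values at the points where this sketch only changes state.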
|
d****n posts: 1637 | 38 The question is how to design an XML parser, not how to parse an XML file. |
|
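p***r posts: 4702 | 39 Does it support XPath and the like? This stuff has to be done as a complete package, or conform to ISO; otherwise it's pretty much useless.
Does your XML parser support namespaces? |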
|
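E**********e posts: 1736 | 40 I don't know XPath yet.
Right now it converts XML to a Python dict. Namespaces are stripped by a dedicated function.
This parser is for processing our work data, so for now there's no need to consider ISO, right? It might still hit bugs on other vendors'
XML, but if it does, I can debug and improve it. The core is recursion. It also
has auxiliary data-conversion features: the XML text is converted into the corresponding data types.
Overall, it's enough for processing Experian's credit-report XML. |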
|
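E**********e posts: 1736 | 41 There's a lot of junk on mitbbs, flooding every board; some people can't stand seeing anyone praise themselves. You're one of them.
Would I be bored enough to post a fake thread just to attract a great master like you?
I really should be happy. I can't do a software engineer's job, but as a data scientist my coding ability is adequate. That's also
the direction I'm working to improve. At work you should sometimes find tasks for yourself. Once this parser is done, I can bypass
IT. Not that IT does a bad job; different groups sometimes have different schedules, and you have to wait. |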
|
x***n posts: 39 | 42 1. Chop your monolithic(?) file (a collection of XMLs) into a collection of
XML files, and parse them one by one.
2. Find a fast way to feed an XML document (part of the file) to a parser,
then do the second parse for the second XML DOCUMENT (unfortunately
it's the second part of your physical file), and so on.
1 or 2. |
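Option 2 can be sketched as a small feeder: strip each embedded document's XML declaration and simple DOCTYPE, wrap what is left in one synthetic root element, and run a single SAX pass. Assumptions, all hypothetical: the DOCTYPEs have no internal subset and the records do not depend on them (for example, for entity definitions), and the whole file fits in memory; for millions of records the same splicing would have to happen on a streaming Reader instead. ConcatenatedXml and the records wrapper element are made-up names:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical feeder for a file that is really several XML documents glued together.
class ConcatenatedXml {
    static List<String> recordNames(String concatenated) {
        String body = concatenated
                .replaceAll("<\\?xml\\s[^>]*\\?>", "")   // XML declarations
                .replaceAll("<!DOCTYPE[^>]*>", "");       // simple DOCTYPEs (no [...] subset)
        String wrapped = "<records>" + body + "</records>";
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            int depth = 0;
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (depth == 1) names.add(qName);         // direct children of the wrapper = records
                depth++;
            }
            @Override
            public void endElement(String uri, String local, String qName) { depth--; }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(wrapped)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return names;
    }
}
```

Without the stripping step, the second XML declaration alone makes the combined input fail, because an XML declaration is only allowed at the very start of a document.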
|
|
g*****g posts: 34805 | 44 Basically I want to parse some online resources and grab some
data. It would be ideal if the lib could correct HTML errors and
make it XML-like.
How is the HTML parser at
http://htmlparser.sourceforge.net/ |
|
t*******e posts: 684 | 45 Apache Commons - JXPath
Almost all XML parsers support XPath access. |
|
|
b******y posts: 9224 | 47 Jericho HTML Parser is also very good |
|
I**********s posts: 441 | 48 I work on parser generation.
What do you want to talk about? |
|
c*****t posts: 1879 | 49 Ha, good.
I created a new project called CookCC (hosted on Google Code), which
contains both a lexer and a parser generator (LALR(1) only). It's
written in Java. It basically replaces my yoolex and yooparse projects
I wrote N years ago, since Java is much easier to maintain and extend.
Right now I am working on Java code generation using FreeMarker
templates. I am wondering if anyone wants to join me on this project
to work on code generation for other languages, or to help with testing.
The perf |
|
|