All topics - Topic: htmlunit
b******d
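Posts: 794
9
From thread: Java board - htmlunit and multithreading question
A few days ago, with guidance from the experts here, I built a web crawler.
It runs 12 threads. On my local desktop (i7 2600 / 12 GB RAM / SSD) it does fine: one query takes a little over 2 minutes. Then I deployed it to a VPS (CPU unknown, only 1 GB of RAM; anything bigger is too expensive to keep up), and it became very slow.
Could anyone advise on how to optimize the program?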
g*****g
Posts: 34805
10
From thread: Java board - htmlunit and multithreading question
Print out some timestamps and see where it is slow.
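One way to make timestamps readable in a multi-threaded crawler is to tag each measurement with the thread name and a step label. The following is only a sketch: `StepTimer`, `timed`, and the "fetch"/"parse" steps are made-up names for illustration, not part of the poster's program.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical helper: records how long each labeled step takes, per thread. */
final class StepTimer {
    // Thread-safe list so that all crawler threads can append to the same log.
    static final List<String> LOG = new CopyOnWriteArrayList<>();

    /** Runs the step, records "[thread] label took N ms", and returns the step's result. */
    static <T> T timed(String label, Callable<T> step) throws Exception {
        long start = System.nanoTime();
        try {
            return step.call();
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            LOG.add("[" + Thread.currentThread().getName() + "] "
                    + label + " took " + elapsedMs + " ms");
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-ins for the real crawler steps (fetching and parsing a page).
        String html = timed("fetch", () -> { Thread.sleep(20); return "<html></html>"; });
        int length = timed("parse", () -> html.length());
        System.out.println("parsed length: " + length);
        LOG.forEach(System.out::println);
    }
}
```

Wrapping the fetch and parse calls of each worker this way shows whether the time goes into network waits or into page processing, even when the log lines from different threads interleave.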
b******d
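Posts: 794
11
From thread: Java board - htmlunit and multithreading question
It's multi-threaded, so I doubt timestamps would show much. I do log output throughout the program; looking at the exceptions, they are mainly heap exceptions (out of memory). After I expanded the VPS memory to 4 GB it became fast, but the monthly fee is more than ten times that of the 512 MB plan.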
r*****l
Posts: 2859
12
From thread: Java board - htmlunit and multithreading question
1. Check if the CPU is fully loaded.
2. Check if memory is used up.
From your symptoms, it looks like a memory issue. Use jmap to dump the heap and
jhat to analyze memory usage.
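Before reaching for a heap dump, a rough in-process check can confirm whether the heap is close to its limit. This is a sketch using only `java.lang.Runtime`; the class and method names (`HeapCheck`, `usedHeapMb`, `maxHeapMb`) are invented for illustration, and it does not replace the jmap/jhat analysis suggested above.

```java
/** Hypothetical helper that reports heap usage and CPU count from inside the JVM. */
final class HeapCheck {
    private static final long MB = 1024L * 1024L;

    /** Heap currently in use by this JVM, in megabytes. */
    static long usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MB;
    }

    /** Maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    public static void main(String[] args) {
        // Logging this line periodically from the crawler shows how close
        // the heap gets to its limit before an OutOfMemoryError appears.
        System.out.println("heap used " + usedHeapMb() + " MB of " + maxHeapMb()
                + " MB, processors: " + Runtime.getRuntime().availableProcessors());
    }
}
```

If the used figure climbs steadily toward the maximum while the 12 threads run, that supports the memory diagnosis; the heap dump then shows which objects are responsible.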
g*****g
Posts: 34805
13
From thread: Java board - htmlunit and multithreading question
See if you can switch to Google App Engine.
b******d
Posts: 794
14
From thread: Java board - htmlunit and multithreading question
Is it like an auto-sized VPS? Then I would still have to pay for more computing
power and memory if my program is not efficient.
b******d
Posts: 794
15
From thread: Java board - htmlunit and multithreading question
Sorry, I mean thread, but it's still very heavy, and my design target is to
have 10 of those programs running together without hurting efficiency.
However, the requests have to be processed in parallel so that the waiting
time stays acceptable.
g*****g
Posts: 34805
16
From thread: Java board - htmlunit and multithreading question
I think you may want to consider AWS instead, and use it only when you need
it. You can use an instance big enough for your needs and shut it down
once you are finished.
b******d
Posts: 794
17
From thread: Java board - htmlunit and multithreading question
Thanks, but unfortunately I can't shut it down, because it is supposed to run
every few minutes to check the latest information from 8am to 9pm. I guess
that could cost a lot more than running a 4 GB VPS for a whole day :(
b******d
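Posts: 794
18
From thread: Java board - htmlunit and multithreading question
Brother Bug, one more thing: I want to use a web app to run a scheduled task. The interval should be configurable, and it should be possible to start or stop it from a web page. Is there a good framework for this?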
e*****t
Posts: 1005
19
From thread: Java board - htmlunit and multithreading question
I would recommend using Spring. You can use Spring's own implementation or
layer it on top of Quartz.
g*****g
Posts: 34805
20
From thread: Java board - htmlunit and multithreading question
For a simpler one, use TimerTask; for a more complicated one, Quartz. Both can
be wired through Spring. You can expose the bean as a web service where you can
set the parameters you need. Add a boolean that's checked every time it's
triggered and you have your stop/start.
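The TimerTask-plus-boolean idea can be sketched with the JDK alone, leaving out Spring and the web-service layer. `ToggleableJob` and its methods are hypothetical names; in a real web app a controller or service bean would call `setEnabled` and `schedule` in response to requests.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical periodic job that can be paused, resumed, or re-timed at runtime. */
final class ToggleableJob {
    private final Runnable work;
    private final AtomicBoolean enabled = new AtomicBoolean(true);
    private final AtomicInteger runs = new AtomicInteger();
    private Timer timer;

    ToggleableJob(Runnable work) {
        this.work = work;
    }

    /** (Re)starts the schedule with the given interval; any earlier schedule is cancelled. */
    synchronized void schedule(long intervalMs) {
        if (timer != null) {
            timer.cancel();
        }
        timer = new Timer("toggleable-job", true); // daemon thread, will not block JVM exit
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                if (!enabled.get()) {
                    return; // the flag is checked on every trigger: this is the stop/start switch
                }
                work.run(); // note: an uncaught exception here would kill the Timer thread
                runs.incrementAndGet();
            }
        }, 0, intervalMs);
    }

    void setEnabled(boolean on) {
        enabled.set(on);
    }

    int runCount() {
        return runs.get();
    }

    synchronized void shutdown() {
        if (timer != null) {
            timer.cancel();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ToggleableJob job = new ToggleableJob(() -> System.out.println("checking for updates"));
        job.schedule(100);      // run every 100 ms
        Thread.sleep(350);
        job.setEnabled(false);  // "stop": the timer keeps firing, but the work is skipped
        Thread.sleep(300);
        System.out.println("runs so far: " + job.runCount());
        job.shutdown();
    }
}
```

Pausing through the flag keeps the schedule alive, so resuming is just `setEnabled(true)`; changing the interval calls `schedule` again with the new value.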
j******3
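Posts: 299
21
Thanks for your reply. Where is that path supposed to be set?
I tried
vim .bashrc
and added the line
export CLASSPATH="/jar locations/*.jar"
then ran source .bashrc
I also used
sudo vim /etc/environment and added a similar line
CLASSPATH="/jar locations/*.jar"
and rebooted.
I still get the same error....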
T****U
Posts: 3344
22
Show me your current classpath.
j******3
Posts: 299
23
ThinkU, thanks for your help. That problem of mine is solved: just as you said, using * in the import made the error go away. I'm still learning. Thanks!
S*******e
Posts: 525
g*****g
Posts: 34805
25
If Java, HtmlUnit.
g*********g
Posts: 870
26
Looks like HtmlUnit is the one for me. Thanks, guys.
e*****t
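Posts: 1005
27
From thread: Java board - A question about htmlunit
This is rather needlessly roundabout.
To log in to the web page this way, your own program has to launch a browser, and then you somehow have to get the session ID and so on from that browser. It's not impossible, but it is completely superfluous. Besides, if you can drive the browser, one can equally argue there is a security problem.
I don't know whether this is an internal or an external tool. I think as long as your program doesn't store the user's password, there is no particular security problem. Otherwise, unless you bring in third-party authentication/authorization such as OAuth, one can always argue there is a security problem.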
g****s
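Posts: 1755
28
Thanks in advance!
I've recently been doing some data analysis that involves clicking a button on a web page and then clicking a button on the new page to download a file.
1 http://exac.broadinstitute.org/gene/ENSG00000169174
2 On this page, click one of All, Missense, or LoF
3 Then click "Export table to CSV"
4 Rename the downloaded file
I'm trying to implement this with HtmlUnit, but I don't know how to get the button, so I'm asking for help here.
Thanks again!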
g*****g
Posts: 34805
29
From thread: Programming board - A naive question about choosing a language
I would use Java and HtmlUnit.
w********o
Posts: 10088
30
From thread: Programming board - A naive question about choosing a language
Thanks, I haven't used HtmlUnit; I'll go learn it.
g*****g
Posts: 34805
31
From thread: Programming board - Newbie question
You can try Java + HtmlUnit. It's pretty simple, one day of programming
at most.
g*****g
Posts: 34805
32
Sure, try HtmlUnit.
g*****g
Posts: 34805
33
From thread: Programming board - How can I extract the updated content of a web page?
I use HtmlUnit in Java, which is pretty good.
c*****t
Posts: 1879
34
Several approaches:
1. Greasemonkey (Firefox add-on).
2. Use wget.
3. Use Java along with the HtmlUnit library.
c**t
Posts: 2744
35
Java: HtmlUnit;
Perl: LWP;
C#: REST
g*****g
Posts: 34805
36
HtmlUnit, a Java library, is a headless browser with a built-in JavaScript engine.
If you need to fill in a form before getting the data, or there is some
JavaScript processing the data, that can be handy.
g*****g
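Posts: 34805
37
From thread: Programming board - perl question
I really don't know about that; with Java's HtmlUnit it is very easy.
See whether Perl has an open-source headless browser.
If you do it by hand, you just send back all the cookies you read.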
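The "send back all the cookies you read" step can be illustrated without a network connection. The sketch below is in Java rather than Perl and uses the JDK's `java.net.CookieManager` as the cookie jar; `CookieReplay`, `cookiesToSend`, and the example.com URLs and cookie value are made up for illustration.

```java
import java.io.IOException;
import java.net.CookieManager;
import java.net.URI;
import java.util.List;
import java.util.Map;

/** Hypothetical helper showing the store-then-replay cycle for cookies. */
final class CookieReplay {
    /**
     * Stores the Set-Cookie headers received from responseUri and returns the
     * cookie strings that should be sent with the next request to nextUri.
     */
    static List<String> cookiesToSend(URI responseUri, List<String> setCookieHeaders,
                                      URI nextUri) throws IOException {
        CookieManager jar = new CookieManager();
        // Step 1: remember what the server set on the sign-in response.
        jar.put(responseUri, Map.of("Set-Cookie", setCookieHeaders));
        // Step 2: ask the jar which cookies apply to the follow-up request.
        Map<String, List<String>> headers = jar.get(nextUri, Map.of());
        return headers.getOrDefault("Cookie", List.of());
    }

    public static void main(String[] args) throws IOException {
        List<String> cookies = cookiesToSend(
                URI.create("http://example.com/login"),
                List.of("JSESSIONID=abc123; Path=/"),
                URI.create("http://example.com/account"));
        // These values would go into the Cookie header of the next request.
        System.out.println(cookies);
    }
}
```

A headless browser such as HtmlUnit does this bookkeeping internally, which is why the manual route is only worth it when no such library is available.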
g*****g
Posts: 34805
38
From thread: Programming board - How do I write a bot?
If you use Java, you can use HtmlUnit; a task like this takes several lines.
You do have to include quite a number of jars, though.
g*****g
Posts: 34805
39
From thread: Programming board - How can I automatically fetch and save web page content?
With a sign-in page, there is definitely session management,
with cookies sent back and forth. Simple URL handling is not
good enough.
There are tools like Rational XDE Tester and LoadRunner that can
do HTML recording. They are probably the right tools for your background,
but they are not free.
An experienced Java developer can consider leveraging HtmlUnit,
a headless browser. It takes a couple of hours to handle a
two-page task like this.
f*******r
Posts: 901
40
From thread: Programming board - How can I automatically fetch and save web page content?
I just looked at HtmlUnit's website; it looks pretty good.
g*****g
Posts: 34805
41
I used Java/HtmlUnit to do this job before, though I think you can do it
in any language.
g*****g
Posts: 34805
42
A headless browser such as HtmlUnit, which has an embedded JS engine, can do this.
g*****g
Posts: 34805
43
Check out HtmlUnit; it's pretty easy to do.
m********s
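Posts: 55301
44
From thread: Programming board - A question for the experts about htmlunit.
So is it technically possible to trigger the program to perform certain operations directly, without logging in to the web page?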
g*****g
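Posts: 34805
g*****g
Posts: 34805
46
From thread: Programming board - You guys don't understand C++
What you said isn't entirely right either. Once you're on AWS, who doesn't end up using a bit of S3, SQS, and the like? Before long you're locked in.
The full deployment stack underneath is even more platform-specific.
Google's PaaS idea is good; the problem is that it only supports a crippled JVM. I wanted to run an app on it and use HtmlUnit to scrape some things, and Google's IDE said I was using unsupported classes and refused to compile. You might say that with your own code you can still work around it, but these days any app has dozens or hundreds of dependencies, so it is simply unusable.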
b***e
Posts: 1419
47
Take a look at PhantomJS. If you are a Java person, look into HtmlUnit.
g*****g
Posts: 34805
48
From thread: Programming board - Is there a convenient API or framework for web scraping?
HtmlUnit.
c********l
Posts: 8138
49
From thread: Programming board - Is there a convenient API or framework for web scraping?
Isn't Selenium's core just HtmlUnit?
l*******s
Posts: 1258
50
You probably need to build a scraper.
If you are hands-on and like to tinker, I strongly recommend HtmlUnit; it's a Java library.