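b******d posts: 794 | 9 A few days ago, with guidance from the experts here, I built a web crawler.
It runs 12 threads. On my local desktop it does fine: one query takes a little over 2 minutes (desktop: i7 2600 / 12 GB RAM / SSD). After I deployed it to a VPS (CPU unknown, only 1 GB of RAM, since anything bigger is too expensive for me), it became very slow.
Could someone advise how to optimize the program? |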
|
g*****g posts: 34805 | 10 Print some timestamps to see where it is slow. |
|
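b******d posts: 794 | 11 It's multi-threaded, so I doubt timestamps would show much. The program already logs everything; the exceptions are mainly heap exceptions (out of memory). After the VPS memory was expanded to 4 GB it became fast, but the monthly fee is more than ten times that of the 512 MB plan. |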
|
r*****l posts: 2859 | 12 1. Check if the CPU is fully loaded.
2. Check if memory is used up.
From your symptoms, it looks like a memory issue. Use jmap to dump the heap and jhat to analyze memory usage. |
|
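Before reaching for a full heap dump, the two suggestions above (timestamps and a memory check) can be combined in a few lines. The following is only a minimal sketch: `HeapProbe` and its method names are hypothetical, not from the thread, and it uses only the standard `Runtime` API. Including the thread name in each line keeps the output readable in a multi-threaded crawler.

```java
import java.time.LocalTime;

// Hypothetical helper (not from the thread): logs heap usage with a timestamp.
class HeapProbe {
    // Heap currently in use by this JVM, in MB.
    static long usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
    }

    // Upper limit the JVM will try to use (roughly what -Xmx controls), in MB.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    // One log line: time, thread name, caller-supplied label, heap numbers.
    static String snapshot(String label) {
        return String.format("%s [%s] %s heap used=%d MB, max=%d MB",
                LocalTime.now(), Thread.currentThread().getName(),
                label, usedHeapMb(), maxHeapMb());
    }

    public static void main(String[] args) {
        System.out.println(snapshot("before crawl"));
        byte[][] pages = new byte[16][1024 * 1024]; // stand-in for buffered pages
        System.out.println(snapshot("after buffering " + pages.length + " MB"));
    }
}
```

If the "used" figure climbs toward "max" as worker threads start, that points to memory rather than CPU, and a heap dump (for example `jmap -dump:format=b,file=heap.bin <pid>`) is the next step. Note that jhat was removed from the JDK in version 9, so on newer JDKs a different heap analyzer is needed.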
g*****g posts: 34805 | 13 See if you can switch to Google App Engine. |
|
b******d posts: 794 | 14 Is it like an auto-sized VPS? Then I'd still have to pay for more computing power and memory if my program is not efficient. |
|
b******d posts: 794 | 15 Sorry, I mean thread, but it's still very heavy, and my design target is to have 10 of those programs running together without hurting efficiency. However, the requests have to be processed in parallel so that the waiting time is acceptable. |
|
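One common way to keep requests parallel while capping how much is in flight at once is a fixed-size thread pool. The sketch below is an assumption about how the crawler could be structured, not the poster's code: `BoundedCrawler`, `fetchAll`, and the `fetcher` function are hypothetical names, and the fetch itself is passed in by the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch: at most `threads` fetches run at the same time.
class BoundedCrawler {
    static <T> List<T> fetchAll(List<String> urls, int threads, Function<String, T> fetcher) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<T>> tasks = new ArrayList<>();
            for (String url : urls) {
                tasks.add(() -> fetcher.apply(url));
            }
            List<T> results = new ArrayList<>();
            // invokeAll blocks until every task is done and keeps input order.
            for (Future<T> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while crawling", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a fetch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With this shape the thread count becomes a tuning knob: a smaller pool on the 1 GB VPS trades some waiting time for a lower peak memory footprint, which matters if ten such programs are meant to share one machine.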
g*****g posts: 34805 | 16 I think you may want to consider AWS instead, and use it only when you need it. You can use an instance big enough for your needs and shut it down once you are finished. |
|
b******d posts: 794 | 17 Thanks, but unfortunately I can't shut it down, because it is supposed to run every few minutes to check the latest information from 8am to 9pm. I guess that could cost a lot more than running a 4 GB VPS all day. :( |
|
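b******d posts: 794 | 18 Brother Bug, I'd also like to implement a scheduled task in a web app: the interval must be configurable, and it should be possible to start or stop it from a web page. Is there a good framework for this? |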
|
e*****t posts: 1005 | 19 I would recommend using Spring. You can use Spring's own implementation or layer it on top of Quartz. |
|
g*****g posts: 34805 | 20 For a simpler job, TimerTask; for a more complicated one, Quartz. Both can be wired through Spring. You can expose the bean as a web service where you can set the parameters you need. Add a boolean that is checked every time the task is triggered and you have your stop/start. |
|
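The "boolean checked on every trigger" idea can be sketched with the plain JDK `Timer`/`TimerTask`, leaving out the Spring wiring and the web service layer. `ToggleableJob` and its method names are hypothetical, and the 200 ms period in `main` is only for demonstration.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: the timer keeps firing, the flag decides whether work runs.
class ToggleableJob extends TimerTask {
    private final AtomicBoolean enabled = new AtomicBoolean(false);
    private final AtomicInteger runs = new AtomicInteger();
    private final Runnable work;

    ToggleableJob(Runnable work) {
        this.work = work;
    }

    // A web endpoint (not shown) would call this to start or stop the job.
    void setEnabled(boolean on) {
        enabled.set(on);
    }

    int runCount() {
        return runs.get();
    }

    @Override
    public void run() {
        if (!enabled.get()) {
            return; // "stopped": this trigger is skipped
        }
        work.run();
        runs.incrementAndGet();
    }

    public static void main(String[] args) throws InterruptedException {
        ToggleableJob job = new ToggleableJob(() -> System.out.println("checking latest data"));
        Timer timer = new Timer(true); // daemon thread, will not block JVM exit
        timer.scheduleAtFixedRate(job, 0, 200); // demo period: 200 ms
        job.setEnabled(true);
        Thread.sleep(1000);
        job.setEnabled(false);
        timer.cancel();
    }
}
```

The flag handles start and stop only. A `TimerTask` cannot be rescheduled once it has been scheduled, so changing the interval with this approach means cancelling the task and scheduling a fresh instance with the new period.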
j******3 posts: 299 | 21 Thanks for your reply. Where should that path be set?
I tried
vim .bashrc
and added the line
export CLASSPATH="/jar locations/*.jar"
then ran source .bashrc
I also used
sudo vim /etc/environment and added a similar line
CLASSPATH="/jar locations/*.jar"
and rebooted.
I still get the same error... |
|
T****U posts: 3344 | 22 Show me your current classpath. |
|
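One way to answer that request from inside the JVM, rather than by echoing the shell variable, is to print the `java.class.path` system property. This is a minimal sketch; `ClasspathDump` is a hypothetical name.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: shows the classpath the running JVM actually received.
class ClasspathDump {
    static List<String> entries() {
        String cp = System.getProperty("java.class.path", "");
        // File.pathSeparator is ":" on Unix-like systems and ";" on Windows.
        return Arrays.asList(cp.split(Pattern.quote(File.pathSeparator)));
    }

    public static void main(String[] args) {
        for (String entry : entries()) {
            System.out.println(entry);
        }
    }
}
```

A separate point worth checking, whether or not it was the cause here: the Java classpath wildcard takes the form `dir/*`, which picks up the jar files in that directory. A pattern such as `dir/*.jar` is not a classpath wildcard.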
j******3 posts: 299 | 23 ThinkU, thank you for your help. That problem is solved; it was just as you said: once I used * in the import, the error went away. I'm still learning. Thanks! |
|
g*********g posts: 870 | 26 Looks like HtmlUnit is the one for me. Thanks, guys. |
|
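e*****t posts: 1005 | 27 This is rather a pointless detour.
To log in to the web page this way, your own program has to launch a browser, and then you somehow have to get the session ID and so on out of the browser. I'm not saying it's impossible, but it is completely unnecessary. And if you can drive the browser anyway, one could equally argue there is a security problem.
I don't know whether this is an internal tool or an external one. I think that as long as your program does not store the user's password, there is no particular security problem. Otherwise, unless you bring in third-party authentication/authorization along the lines of OAuth, you can always argue there is a security problem. |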
|
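g****s posts: 1755 | 28 Thanks in advance!
I've recently been doing some data analysis that involves clicking a button on a web page and then clicking a button on the new page to download a file.
1 http://exac.broadinstitute.org/gene/ENSG00000169174
2 On this page, click one of All, Missense, or LoF
3 Then click "Export table to CSV"
4 Rename the downloaded file
I'm trying to implement this with HtmlUnit, but I don't know how to get the button, so I'm asking for help here.
Thanks again! |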
|
g*****g posts: 34805 | 29 I would use Java and HtmlUnit. |
|
g*****g posts: 34805 | 31 From thread: Programming board - Newbie question. You can try Java + HtmlUnit; it's pretty simple, one day of programming at most. |
|
g*****g posts: 34805 | 33 I use HtmlUnit in Java, which is pretty good. |
|
c*****t posts: 1879 | 34 Several approaches:
1. Greasemonkey (Firefox add-on).
2. Use wget.
3. Use Java along with the HtmlUnit library. |
|
c**t posts: 2744 | 35 Java: HtmlUnit;
Perl: LWP;
C#: REST |
|
g*****g posts: 34805 | 36 HtmlUnit, a Java library, provides a headless browser with a built-in JavaScript engine. If you need to fill in a form before getting the data, or if some JavaScript processes the data, that can be handy. |
|
g*****g posts: 34805 | 37 I really don't know about that; with Java's HtmlUnit it is easy.
See whether Perl has an open-source headless browser.
If you do it by hand, you just send back all the cookies you received. |
|
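The manual route (send back every cookie you received) can be sketched with the JDK's own `java.net.CookieManager`, without any HTTP traffic. `CookieEcho`, its method names, the `example.com` URLs, and the `sid` cookie are all hypothetical and used only for illustration.

```java
import java.io.IOException;
import java.net.CookieManager;
import java.net.URI;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: store Set-Cookie values, replay them on later requests.
class CookieEcho {
    private final CookieManager jar = new CookieManager();

    // Feed in the Set-Cookie header values from a response to `url`.
    void remember(String url, List<String> setCookieHeaders) {
        try {
            jar.put(URI.create(url), Collections.singletonMap("Set-Cookie", setCookieHeaders));
        } catch (IOException e) {
            throw new IllegalStateException("could not store cookies", e);
        }
    }

    // Build the Cookie header value to send with the next request to `url`.
    String cookieHeaderFor(String url) {
        try {
            Map<String, List<String>> headers =
                    jar.get(URI.create(url), Collections.<String, List<String>>emptyMap());
            List<String> cookies = headers.get("Cookie");
            return cookies == null ? "" : String.join("; ", cookies);
        } catch (IOException e) {
            throw new IllegalStateException("could not read cookies", e);
        }
    }

    public static void main(String[] args) {
        CookieEcho echo = new CookieEcho();
        // Pretend the login response set a session cookie.
        echo.remember("http://example.com/login", Arrays.asList("sid=abc123; Path=/"));
        System.out.println("Cookie: " + echo.cookieHeaderFor("http://example.com/data"));
    }
}
```

A headless browser such as HtmlUnit does this bookkeeping for you, which is a large part of why it keeps being recommended for pages behind a login.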
g*****g posts: 34805 | 38 If you use Java, you can use HtmlUnit; a task like this takes several lines. You do have to include quite a number of jars, though. |
|
g*****g posts: 34805 | 39 With a sign-in page, there is definitely session management, with cookies sent back and forth. Simple URL handling is not good enough.
There are tools like Rational XDE Tester and LoadRunner that can do HTML recording. They are probably the right tools for your background, but they are not free.
An experienced Java developer can consider leveraging HtmlUnit, a headless browser. It takes a couple of hours to handle a two-page task like this. |
|
f*******r posts: 901 | 40 I just looked at the HtmlUnit website; it looks pretty good. |
|
g*****g posts: 34805 | 41 I used Java/HtmlUnit to do this job before, though I think you can do it in any language. |
|
g*****g posts: 34805 | 42 Use a headless browser such as HtmlUnit, which has an embedded JS engine; that can do it. |
|
g*****g posts: 34805 | 43 Check HtmlUnit; it's pretty easy to do. |
|
m********s posts: 55301 | 44 So is it technically possible to trigger the program to perform certain operations directly, without logging in to the web page? |
|
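g*****g posts: 34805 | 46 What you say isn't entirely right either. Once you're on AWS, who doesn't end up using some S3, SQS and so on? Before you know it you're locked in.
The whole set of deployment tooling underneath is even more platform-specific.
Google's PaaS idea is good; the problem is that it only supports a crippled JVM. I wanted to run an app on it that uses HtmlUnit to scrape some data, and Google's IDE refused to compile it, saying I used unsupported classes. With your own code you can work around that, but these days any app has dozens or even hundreds of dependencies, so it is simply unusable. |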
|
b***e posts: 1419 | 47 Take a look at PhantomJS. If you are a Java person, look into HtmlUnit. |
|
l*******s posts: 1258 | 50 You probably need to build a scraper.
If you're hands-on and like to tinker, I strongly recommend HtmlUnit, which is Java. |
|