Page 2 - Summary of discussions about readlines - 话题女王

b******y
Posts: 9224

Random rand = new Random();
BufferedReader reader = new BufferedReader(new FileReader("bigfile.txt"));
// first pass: get the number of lines
int numLines = 0;
while (true)
{
    String line = reader.readLine();
    if (line == null) break;
    numLines++;
}
// reopen the file: the reader is at end-of-file after the first pass
reader.close();
reader = new BufferedReader(new FileReader("bigfile.txt"));
// second pass, randomly get a line
String line = null;
float threshHold = 1 / (float) numLines;
for (int i = 0; i < numLines; i++)
{
    line = reader.readLine();
    if (rand.nextFloat() < threshHold)
    {
        // we got the line, break
        break;... Read full post
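
For comparison, a random line can also be picked in a single pass with reservoir sampling. This is a different technique from the two-pass snippet above, not a completion of it. The function name and the in-memory sample standing in for bigfile.txt are assumptions made for illustration.

```python
import io
import random

def pick_random_line(lines, rng=random):
    """Return one line chosen uniformly at random in a single pass.

    Reservoir sampling with a reservoir of size one: the i-th line
    (1-based) replaces the current choice with probability 1/i.
    Returns None when there are no lines.
    """
    chosen = None
    for i, line in enumerate(lines, start=1):
        if rng.randrange(i) == 0:
            chosen = line
    return chosen

# Hypothetical stand-in for bigfile.txt so the sketch runs on its own.
sample = io.StringIO("alpha\nbeta\ngamma\n")
print(pick_random_line(sample).rstrip("\n"))
```

With a real file, the same function can be called as pick_random_line(open("bigfile.txt")). It never needs the line count, and only one line is kept in memory.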

w****w
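Posts: 521

From thread: Linux board - [Baozi] Algorithm question: comparing two columns of numbers

If you have enough memory, a hash solves it right away. If not, you have to do it in batches.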
#!/usr/bin/env python
smallfile = open("small.txt","r")
largefile = open("large.txt","r")
outfile=open("out.txt","w")
HASHSIZE=10*1024*1024
maxvalue=0
hash={}
while 1:
    line=largefile.readline()
    if not line:
        break
    v=int(line.strip())

    if v>maxvalue:
        hash={}
        while 1:
            r=smallfile.readline()
            if not r:
                break
            maxvalue=int(r.strip())
            hash[maxvalue]=1
            if le... Read full post
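
The "hash it, or batch it when memory is short" idea can be sketched as follows. The post is truncated, so this is not a reconstruction of it. The sketch assumes the goal is to list the values of the large file that also occur in the small file, and unlike the snippet above (which appears to track a running maximum) it makes no assumption about sort order. The function name, batch_size, and the sample data are hypothetical.

```python
import io

def values_in_both(small, large, batch_size):
    """Yield integers from `large` that also occur in `small`.

    Both arguments are seekable text streams with one integer per line.
    At most `batch_size` values from `small` are held in memory at once;
    `large` is rescanned from the start once per batch.
    """
    while True:
        batch = set()
        for line in small:
            if line.strip():
                batch.add(int(line))
            if len(batch) >= batch_size:
                break
        if not batch:
            return
        large.seek(0)
        for line in large:
            if line.strip() and int(line) in batch:
                yield int(line)

# Hypothetical sample data in place of small.txt and large.txt.
small = io.StringIO("3\n5\n8\n")
large = io.StringIO("1\n3\n4\n8\n9\n")
print(list(values_in_both(small, large, batch_size=2)))  # [3, 8]
```

The trade-off is one full scan of the large file per batch, so a larger batch_size means fewer passes. Results come out grouped by batch rather than in file order.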

l********s
Posts: 358

From thread: Programming board - Questions about C++ Linux Command Line Parsing

For people who also want to use the readline library, remember to link the "readline" and "ncurses" libraries in the Makefile.

G********7
Posts: 234

From thread: Programming board - A Python question

I want to convert data in a text file to CSV format; the program is below. Step 4 (# Text file to CSV) fails.
The error message is
“Traceback (most recent call last):
  File "C:\yw\LatLon\txt2csv.py", line 24, in <module>
    txt = f.readline()
ValueError: Mixing iteration and read methods would lose data”
Any advice is appreciated. Many thanks!
***********************************************
# Import Modules
import os,sys
# Variables
path = os.getcwd()
inputFile = os.path.join(path, "raw_files", "JAN 2013 FLVERTEXFLVertexFLVertex.txt")
CSVfile = os.path.join(path, "temp", "LatLon.csv")
print input... Read full post
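
In Python 2, this ValueError is raised when a file is first consumed by iteration (for line in f, or f.next()) and then read with f.readline(), because iteration uses a read-ahead buffer. The failing step 4 is not visible in the truncated post, so the following is only a sketch of one way to avoid mixing the two styles: skip header lines through the iterator itself and keep iterating. The function name, the whitespace-separated layout, and the sample data are assumptions.

```python
import csv
import io

def text_to_rows(lines, header_lines=1):
    """Skip `header_lines` lines, then split each remaining line on whitespace."""
    it = iter(lines)
    for _ in range(header_lines):
        next(it, None)      # consume headers via the iterator, not readline()
    for line in it:
        fields = line.split()
        if fields:          # ignore blank lines
            yield fields

# Hypothetical sample input; the real file layout is not shown in the post.
raw = io.StringIO("lat lon\n27.5 -81.2\n28.1 -82.0\n")
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(text_to_rows(raw))
print(out.getvalue())
```

The other consistent option is to use readline() everywhere and never iterate over the file object.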

B***i
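Posts: 724

From thread: Programming board - Reading Scala code is like reading classical Chinese

For example, say I have a file and want to find the second comment line (assume every comment line starts with #). This is easy in Python:
# the wrapper function name is hypothetical; the original snippet shows only the body
def second_comment(filename):
    count = 0
    f = open(filename)
    line = f.readline()
    while line:
        if line.startswith("#"):
            count += 1
            if count >= 2:
                return line
        line = f.readline()
Can anyone write this in Scala for me? A functional-programming style would be best, of course.
If the file is extremely large, how would it perform?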
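
This is not the Scala version the post asks for. It is a Python sketch of the same search in a lazier, more functional style, and it bears on the efficiency question: the generator and islice pull lines one at a time and stop at the match, so memory use does not grow with file size. The function name and parameters are illustrative assumptions.

```python
import io
from itertools import islice

def nth_comment(lines, n=2, marker="#"):
    """Return the n-th line starting with `marker`, or None if there are fewer."""
    comments = (line for line in lines if line.startswith(marker))
    return next(islice(comments, n - 1, None), None)

# Hypothetical sample input standing in for a real file.
sample = io.StringIO("# first\ncode\n# second\n# third\n")
print(repr(nth_comment(sample)))  # '# second\n'
```

With a real file, pass the open file object directly, for example nth_comment(open(filename)).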

w****w
Posts: 521

From thread: Programming board - Looking for the best solution

Translating my SQL above into Python:
#!/usr/bin/env python
import sys
from datetime import datetime, date
ids={}
fmt="%m/%d/%Y"
with open(sys.argv[1],"r") as fi:
    fi.readline()
    for x in fi.readlines():
        v=x.rstrip("\n").split("\t")
        if v[0] not in ids:
            ids[v[0]]=[]

        dat=datetime.strptime(v[1],fmt)
        ent=[dat,float(v[2]),float(v[2])]
        for y in ids[v[0]]:
            dif=ent[0]-y[0]
            if dif.days>=0 and dif.days<=365:
                y[2]+=ent[1]... Read full post

w****w
Posts: 521

From thread: Programming board - Looking for the best solution

The program below generates 22M transactions; it takes 8 minutes.
import sys
from datetime import date,timedelta
from random import randint
ID_NUM=100000
DAY_NUM=3650
StartDate=date(2004,1,1)
fmt="%m/%d/%Y"
with open(sys.argv[1],"w") as fo:
    fo.write("ID\tDate\tAmount\n")
    for i in xrange(22000000):
        RandonId=randint(1,ID_NUM)
        RandonDate=StartDate+ timedelta(days=randint(0,DAY_NUM))
        RandonAmt=5*randint(1,20)
        fo.write("%d\t%s\t%d\n" % (RandonId,RandonDate.strftime(fmt),RandonAmt))
The program below generates the ids; it takes 80 minutes.
import s... Read full post

w********p
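Posts: 948

From thread: Programming board - Asking Java 7/8 experts to take a look at this fork/join interview problem

I finally finished grinding out the program.
The company says they want to meet with me in person to discuss suggestions. What a headache; I have no idea what to suggest.
I would be grateful for criticism and advice from the experts here. Better to hear it here first than to be challenged on it by someone else. I will definitely send baozi as thanks.
https://github.com/selinaGit/SortNFiles
To catch your eye I am giving up a bit of privacy. I'm rather plain-looking, so please don't download it, or else you're a piggy 🐷
The main reference is the link below; I basically followed it as a template.
http://www.oracle.com/technetwork/articles/java/fork-join-42260
I'd like to ask why the link above uses a static method that returns a new object, rather than a non-static method "public Document fromFile(File file)", then creating a new WordCounter obj and calling obj.Document fromFile(File file).
I have no concurrency experience, so the question may be a bit silly. Any help is appreciated.
My understanding is that this way, before the fork, the file is read into a lis... Read full post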

u*********1
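Posts: 2518

From thread: Biology board - Can anyone talk about what knowledge you need to learn NGS data analysis from scratch?

As someone who once started from zero, I have quite a bit of firsthand experience with this.
A year ago I didn't even know what grep on Linux was. My boss said "grep" and I said, gre... what? greb? My boss shook his head and said you really have a lot to learn... But my boss is great and found ways to explain all the basics to me clearly... including what RAM is... how embarrassing...
For NGS/bioinformatics, I think the core idea is still how to use computational means to solve biological problems. It sounds simple, but not everyone really appreciates it. Python/bash/perl and the like are quick to get started with, but you will definitely not master them in two weeks. I have been working with Python for a year now and have still only scratched the surface; what you learn is mainly decided by your own project. If you only ever need to do simple processing on your text, and the text is not large, say 100 MB, you can always use for line in text, or readlines(). But with a very large text you can't use readlines(), because the cluster may not have enough memory to load the whole text.
So I think you just learn things as you need them, unless you come from a formal CS or computing background... Read full post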
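
The memory point can be made concrete. readlines() builds a list of every line in the file, while iterating over the file object yields one line at a time. This is a minimal sketch of the streaming style; the function name and the tiny FASTQ-like sample are assumptions for illustration.

```python
import io

def count_matching(lines, needle):
    """Count lines containing `needle`, holding only one line in memory at a time."""
    total = 0
    for line in lines:      # a file object yields lines lazily
        if needle in line:
            total += 1
    return total

# Hypothetical miniature input standing in for a multi-gigabyte file.
sample = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n")
print(count_matching(sample, "@read"))  # 2
```

On a real file this would be used as: with open(path) as fh: count_matching(fh, "@read"). Memory use stays roughly constant regardless of file size.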

A*****n
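Posts: 243

From thread: Statistics board - Clang, clang, clang, the book is up! Quick R guide.

Regarding 16.1.2: R can in fact read very complex input data too; read.table is just the most commonly used function. When reading data in Python, readlines and string.split are used most often, and the same can be done in R with readLines and strsplit. R also has some regular-expression support. If you write functions tailored to the specific form of the data, the result should be the same as a preprocessing program written in Python.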
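
As a small illustration of the Python side of that comparison, here is a sketch of the readlines-plus-split pattern, with a regular expression handling mixed delimiters. The input layout, the comment convention, and the function name are made up for the example.

```python
import io
import re

def parse_records(stream):
    """Read all lines, skip blanks and '#' comments, split on commas or whitespace."""
    rows = []
    for line in stream.readlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rows.append(re.split(r"[,\s]+", line))
    return rows

# Hypothetical messy input with inconsistent separators.
raw = io.StringIO("# id, name  score\n1, ann  9.5\n2,bob 7\n")
print(parse_records(raw))  # [['1', 'ann', '9.5'], ['2', 'bob', '7']]
```

The rough R counterpart of the two calls is readLines followed by strsplit, as the post says.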

R*********r
Posts: 225

From thread: Statistics board - Is there any R package that can scrape text from the web?

In R you can use readLines
> readLines('http://www.google.com')
[1] " charset=ISO-8859-1\">Google
