Discussion roundup for "textfile" - 话题女王

l*********r
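Posts: 136

From thread: JobHunting board - How is "click a button on a web page to generate a file for download" implemented?

This is done purely on the front end; not sure whether it meets the original poster's requirements.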

---------------------- html: ----------------------

Download
---------------------- JS: ----------------------
$scope.maketxtfile = function () {
    var textFile = null,
        makeTextFile = function (text) {
            var data = new Blob([text], { type: 'text/plain' });
            if (textFile !== null) {
                window.URL.revokeObjectURL(textFile);
            }
            textFile = win...

y*****e
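Posts: 712

From thread: JobHunting board - Two interview rounds at L: hit problems I had never seen both times and bombed on the spot...

Today was my second round... I came out of it sad and disappointed. After an afternoon at work I felt a little better. At first I thought I had simply performed badly and the problems were not easy: why does everyone else get the common, standard questions while I never do? Blaming luck is all too easy when you are upset, but now that I have calmed down, I think that was just making excuses for myself. I'm posting this interview report to build up my poor karma, and I hope future interviews go smoothly.
Round one had two problems:
1. first missing positive
2. Write a file line iterator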
Implement a (Java) Iterable object that iterates lines one by one from a
text file.
/** A reference to a file. */
public class TextFile implements Iterable<String>
{
    public TextFile(String fileName) { // please implement this
    /** Begin reading the file, line by li...
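
For comparison, here is a minimal Python sketch of the same idea: an iterable object that yields the lines of a text file one at a time. It is not the Java answer the interviewer asked for; the `encoding` parameter and the choice to strip the trailing newline are assumptions made for illustration.

```python
class TextFile:
    """Iterable over the lines of a text file (a sketch, not the original Java API)."""

    def __init__(self, file_name, encoding="utf-8"):
        # Only remember where the file is; nothing is opened yet.
        self.file_name = file_name
        self.encoding = encoding

    def __iter__(self):
        # A generator: the file is opened when iteration starts and closed
        # when the loop finishes, so only one line is in memory at a time.
        with open(self.file_name, encoding=self.encoding) as handle:
            for line in handle:
                yield line.rstrip("\n")
```

Because `__iter__` opens the file itself, every `for` loop over the same `TextFile` starts again from the first line.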

A******a
Posts: 61

From thread: Java board - how to use grep/sed to remove newlines? (repost)

[The following text is reposted from the Linux board]
Posted by: Ataraxia (静), board: Linux
Subject: how to use grep/sed to remove newlines?
Posted from: BBS 未名空间站 (Sat Oct 18 10:48:52 2008)
I am trying to grep some text from a file and then return some words:
grep "^>start" textfile.txt
but it always returns the results one per line:
A
B
C
I want them as: A B C ..., so I tried to use sed:
grep "^>start" textfile.txt | sed -e 's/\n//'
It doesn't work. Does anyone know how to solve this?
Thanks.
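
Why the sed attempt fails: sed reads its input one line at a time and strips the trailing newline before running the script, so `s/\n//` never sees a newline to remove. A minimal Python sketch of the intended result (keep the matching lines, then join them with spaces) might look like this; the function name and the sample data in the usage note are hypothetical.

```python
def join_matching_lines(lines, prefix=">start", sep=" "):
    """Keep the lines that start with `prefix` and join them on one line."""
    matches = [line.rstrip("\n") for line in lines if line.startswith(prefix)]
    return sep.join(matches)
```

For a real file, `join_matching_lines(open("textfile.txt"))` would stream the lines in; any iterable of strings works.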

v*****r
Posts: 2325

From thread: Programming board - spark is slower than java Mapreduce --scala big bulls pls advise

Spark beginner trying out the buzz tech.
Input: a 200 GB uncompressed data file stored in HDFS
37 worker nodes, each with 24 cores
Using Java MapReduce: 6-8 minutes
Using Spark: 37 minutes, in two 18-minute stages
"lightning fast cluster computing, 100x faster" ???!!!!
Big bulls please advise!
# sortMapper sorts the values for each key, then does some iteration over the grouped values
text = sc.textFile(input,1776) #24*37*2
text.map(mapper).filter(lambda x: x!=None).groupByKey().map(sortMapper).filter(lambda x: x...

p********7
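Posts: 549

From thread: JobHunting board - Bloomberg hasn't given up on me yet. How my second phone interview went.

For the second-to-last problem you can use hash tables, with a separate table for each word length. Define the letters as a = 1, b = 51, c = 101, d = 151, and so on; then cab = 1 + 51 + 101 = 153.
For a three-letter combination, anagrams always produce the same value.
Sorting every word before comparing costs too much; it is at least n log n.
For the last problem, just use a hash table directly. Load all the words from the text file into the table, then look up each word in the list:
if it is found, delete it; if not, do nothing. Whatever is left in the table is the answer.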
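
A quick Python sketch of both ideas, under a few assumptions: words are lowercase a-z, the letter values continue the a = 1, b = 51, c = 101 pattern in steps of 50, and the last problem asks for the words in the file that are not in the list. The function names are hypothetical. One caveat worth checking before relying on the sum as a key: anagrams always share a sum, but two non-anagrams of the same length can share one too (bb and ac both give 102), so a matching sum still needs a letter-by-letter check.

```python
from collections import Counter


def sum_key(word):
    """Sum of letter values with a = 1, b = 51, c = 101, ... (step of 50 assumed)."""
    return sum(1 + 50 * (ord(ch) - ord("a")) for ch in word)


def is_anagram(first, second):
    """Cheap length and sum comparison first, then an exact letter count."""
    if len(first) != len(second) or sum_key(first) != sum_key(second):
        return False
    return Counter(first) == Counter(second)  # rules out sum collisions


def words_left_in_table(file_words, list_words):
    """Load the file's words into a hash set, then delete every list word found."""
    table = set(file_words)
    for word in list_words:
        table.discard(word)  # does nothing when the word is absent
    return table
```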

i*******e
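Posts: 240

From thread: JobHunting board - A question from an L onsite

A very large text file contains records, one per line, in the same form as the rows of a database table.
How would you implement the GROUP BY feature of a SQL query? It should be as fast as possible.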
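
One common answer is hash aggregation in a single streaming pass: keep one running aggregate per group key in a hash table, so the file is read once and never sorted. Below is a minimal Python sketch, assuming comma-separated records, a SUM aggregate, and that the number of distinct keys fits in memory; the column indexes and the function name are hypothetical. If the keys do not fit in memory, the usual fallbacks are partitioning the file by a hash of the key or an external sort, neither of which is shown here.

```python
import csv
from collections import defaultdict


def group_by_sum(lines, key_col, value_col, delimiter=","):
    """Roughly: SELECT key, SUM(value) ... GROUP BY key, over an iterable of text lines."""
    totals = defaultdict(float)  # group key -> running sum
    for row in csv.reader(lines, delimiter=delimiter):
        totals[row[key_col]] += float(row[value_col])
    return dict(totals)
```

Passing an open file object as `lines` streams the data, so memory use grows with the number of groups rather than with the size of the file.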

c*****a
Posts: 808

From thread: JobHunting board - A MapReduce question from G

Here is a Spark version:
val file = spark.textFile("hdfs://documents")
val words = file.flatMap(l => l.split(" ")).map(w => (w, 1)).groupByKey(10000)
  .filter(p => p._2.size > 5000).map(_._1)
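
The Spark snippet keeps every word that occurs more than 5000 times. As a single-machine illustration of the same logic, here is a small Python sketch; it is not a replacement for the distributed version, and the function name and `min_count` parameter are hypothetical.

```python
from collections import Counter


def frequent_words(lines, min_count=5000):
    """Return the set of words that occur more than `min_count` times."""
    counts = Counter()
    for line in lines:
        # split() with no argument splits on any run of whitespace,
        # which differs slightly from the Scala split(" ") above.
        counts.update(line.split())
    return {word for word, count in counts.items() if count > min_count}
```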

dg
Posts: 546

From thread: PennySaver board - The Mailbox: 30% Off Clearance + FREE Shipping

Today only (July 18th), The Mailbox is offering up 30% off all clearance
items and 15% off all regular-priced items plus FREE shipping
http://www.theeducationcenter.com/TextFiles/Email/te_cybermonda

l****e
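Posts: 359

From thread: Flash board - Can buttons be generated dynamically with AS?

Total newbie here.
Can buttons be generated dynamically with ActionScript? The number of buttons has to be determined by content read from an external file, and so do the labels on the buttons...
In my clumsy way I managed to generate a row of text fields (TextField) dynamically, and they look right, but they cannot be clicked to trigger other actions...
Could an expert tell me how this should be done?
Many thanks.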

c*****t
Posts: 1879

From thread: Java board - how to use grep/sed to remove newlines? (repost)

echo `grep ">start" textfile.txt`

A******g
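Posts: 612

From thread: Java board - Question about building a GUI in Eclipse

When building a GUI in Eclipse, can windows, text fields, JButtons and so on only be added in code, rather than drawn directly?
Is Eclipse's Visual Editor meant for this? There is very little documentation online, I can't even find a screenshot, and according to the FAQ it still can't be used on a Mac.
Can NetBeans do this?
I don't want to spend much time laying out windows in code. It is only an input screen and doesn't need to be fancy.
I'd like to put most of my effort into the algorithm.
Thanks!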

d****n
Posts: 1637

From thread: Linux board - A question for everyone

patterns=(`cat patterns.txt`)
for pat in ${patterns[*]}
do
    echo "Looking for pattern $pat"
    sed -n "/^$pat/p" textfile.txt
done

A******g
Posts: 612

From thread: Programming board - Question about building a GUI in Eclipse (java)

w********m
Posts: 1137

From thread: Programming board - error of counting total line number in txt file on MSDOS (repost)

Just use Python.
Space O(1), time O(n):
cnt = 0
with open('file.txt', 'r') as infile:
    for _ in infile:
        cnt += 1
print(cnt)
Space O(n), time O(n/k):
import pyspark
sc = pyspark.SparkContext()
infile = sc.textFile('file.txt')
print(infile.count())

K*********n
Posts: 2852

From thread: Programming board - sbt error: object xxx is not a member of package yyy

Complete Scala beginner here. I wrote a bit of test code:
###### build.sbt #########
name := "Train NB"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "1.2.0",
"commons-io" % "commons-io" % "2.4",
"com.google.code.gson" % "gson" % "2.2.4",
"org.la4j" % "la4j" % "0.4.9"
)
javacOptions ++= Seq("-source", "1.7", "-target", "1.7")
####### scala code ##########
package app.mycompany
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apa...

b*b
Posts: 422

From thread: Economics board - A few links

Greenspan's testimony
http://www.bog.frb.fed.us/boarddocs/testimony/Testimony.cfm?WhichTestimonyIndex=63
George Soros' testimony
http://www.soros.org/textfiles/speeches/091598_Banking_and_Finance.txt
Fightback from HK
http://www.scmp.com/news/template/Front-Template.idc?artid=19980918002729026&top=front&template=Default.htx&maxfieldsize=2959
South China Morning Post
Paul Krugman's series on Slate
www.slate.com search Paul Krugman
(free registration)
Financial news:
http://www.cnnfn.com/
latest a

s****b
Posts: 2039

From thread: Mathematics board - count unique values in file with 1 million rows (repost)

You can do it programmatically, for example in Java:
Make a 2D array and load the Excel sheet into it.
For each value, find the _ and take the substring before it; put these into a new array.
Then, in the new array, start with the first value and loop from i=1: if any value == this value, delete that value, shift the array contents up one step, reduce the dimension by 1, increase i by 1, and repeat. When i reaches 1 million, put this value into the first position of a new 1D array and delete it.
Do it again with the second v...
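
A simpler alternative to the delete-and-shift loop, sketched in Python rather than Java: put the substring before the _ into a hash set and take its size, which is a single pass over the rows instead of one pass per distinct value. This is not the poster's method; the function name and the handling of values without an _ (kept whole) are assumptions.

```python
def count_unique_prefixes(values, sep="_"):
    """Count distinct substrings that appear before the first `sep` in each value."""
    # split(sep, 1)[0] returns the whole value when `sep` is absent.
    return len({value.split(sep, 1)[0] for value in values})
```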

b*****o
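Posts: 715

From thread: DataSciences board - Is big data the next big pit?

Big thumbs up for the expert's summary!!
This is the first I have heard of Spark. I just skimmed the documentation and have one question. In terms of the computation model, Spark is almost identical to Dryad/Flume. I can imagine that it makes some infrastructure optimizations (such as memory caching) that greatly reduce network cost and disk read/write cost. But it still doesn't seem very efficient for iterative algorithms.
Take the logistic regression example given in the documentation: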
val points = spark.textFile(...).map(parsePoint).cache()
var w = Vector.random(D) // current separating plane
for (i <- 1 to ITERATIONS) {
val gradient = points.map(p =>
(1 / (1 + exp(-p.y*(w dot p.x))) - 1) * p.y * p.x
).reduce(_ + _)
w -= gradient
}
println("Final separating plane: ...

B*****g
Posts: 34098

From thread: DataSciences board - Pig word count

-- Hive queries for Word Count
drop table if exists doc;
-- 1) create table to load whole file
create table doc(
text string
) row format delimited fields terminated by '\n' stored as textfile;
-- 2) loads plain text file
-- if the file is .csv, replace '\n' by ',' in step 1 (creation of doc table)
load data local inpath '/home/trendwise/Documents/sentiment/doc_data/wikipedia' overwrite into table doc;
-- Trick-1
-- 3) wordCount in single line
SELECT word, COUNT(*) FROM doc LATERAL VIEW explo...

s****h
Posts: 3979

From thread: DataSciences board - Spark question

The DataFrame API is only available from 1.3 on.
Loading the data into a dense vector is straightforward:
import org.apache.spark.mllib.linalg.{Vector, Vectors}
val path = "~/test60m.txt"
val minPartitions = 4

val parsed = sc.textFile(path, minPartitions)
  .map(_.trim)
  .filter(line => !(line.isEmpty || line.startsWith("#")))
val a = parsed.map({ line =>
  val fields = line.split(" ")
  (fields(0).toInt, fields(1))})
val dataLoad = a.map(x => (x._1, Vectors.dense(x._2.split(',').map(_.toDouble))))
But I don't know how to do this for a sparse vector.
I'm not sure how to combine it with the example below:
// Create a dense vector (...

E*********g
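Posts: 185

From thread: DataSciences board - Paid question about Spark API Java coding

The approach is right, and the format can be quite flexible: a List, a Vector, a Tuple, or several numeric values as parameters all work.
It depends on what data type your score function uses.
I'm not sure what you mean by this tree function.
Suppose your GBM model's scoring function is score(Tuple), with Tuple = (x1, x2, ...):
val data = sc.textFile(data_file)
  .map(_.split(","))
  .map(x => (x(0), (x(1), x(2), ...)))
// Data format: RDD(y, (x1, x2, ...)). If there are too many x's, split out y first,
// then split the rest. A Tuple is limited to 22 elements, so with more than 22 x's you may as well use a List directly.

val scores = data.map(x => (x._1, score(x._2)))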

E*********g
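Posts: 185

From thread: DataSciences board - Paid question about Spark API Java coding

: There is an id that serves as a unique key, so key-vector pairs can be generated.
:
: Before the tree function there is a function that assigns all the variables, passing values from the vector into variables such as
: XB_TOTAL_ACTIV_MNTH_P1Q; the tree then computes directly from those variables.
:
: This code was generated automatically by the Salford software, which is why it is written this way. Thousands of differently named variables are all hard
: coded.
:
: 1. I could rewrite the tree function to accept an external array carrying the feature values, but each tree needs only 5 or 6
: out of hundreds, and each tree uses different features. I could make the change with a bulk replace in TextPad,
: but then every tree would have to define all of the hundreds of variables even though it uses only 5 or 6. Is that OK?
Yes, that works.
One class per tree is a bad design. Could it be made a single class with different parameters?
:
: Is it possible to leave it unchanged and keep two functions? One assignment function from the vector to XB_TOTAL_ACTIV_MNTH_P1Q=?,
: and one tree function that computes from XB_TOTAL_ACTIV_MNTH_P1Q? Does the Spark API allow custom...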

p*****n
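Posts: 1415

From thread: _America board - Introducing "Liberalism and American Identity"

新世纪 (New Century)
http://www.ncn.org/liujunning/textfiles/16b.htm
The Revival and Reconstruction of the American Liberal Tradition: Introducing "Liberalism and American Identity"
Liberalism, as America's dominant ideology, has been under fierce attack ever since the antiwar student movement of 1968. Some have even declared that American liberalism is dead as an ideology. The author argues that American liberals and Democrats, who have always identified with liberalism, have also failed to defend effectively the tradition of liberty that Americans take pride in. Yet liberals must meet this challenge. Without a clear ideological identity, however, American liberals will be unable to convey a clear political message in the future. "Liberalism and American Identity", written by Patrick Garry, a research fellow at Columbia University's journalism research center, attempts to convey one clear message on this front: the moment for the revival and reconstruction of the American liberal tradition has arrived.
In "Liberalism and American Identity", the author gives a systematic and thorough account of the importance and meaning of liberalism in American politics. The book is the first work since American liberalism came under fierce attack in the 1980s to attempt to rejuvenate American political liberalism.
In the first chapter, the author discusses in detail the crisis and decline that liberalism went through in the 1980s, as well as the revival of the liberal tradition in the United States ...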
