Roundup of discussions about sqldf - 话题女王

p****e
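Posts: 165

From thread: Statistics board - R users, please take a look: how do I use SQL in R? It really shouldn't be this slow...

A question for everyone: I want to import a csv file into R and then use SQL to select different subsets of it. When I call the sqldf package, R just keeps loading and is painfully slow; in the end I had to force quit.
library(sqldf)
d = read.csv("abc.csv")
newd <- sqldf("select * from d where day = '2012-10-01'", row.names=TRUE)
At that point R hung.
I wondered whether I had to convert the csv into a data frame before I could query it, so I tried this:
d = read.csv("abc.csv")
df <- as.data.frame(d)
newd <- sqldf("select * from df where day = '2012-10-01'", row.names=TRUE)
Still hung... had to force quit.
I kept trying and wondered whether I could use read.csv.sql instead:
d <- read.csv.sql("abc.csv", eol = "\r")
Still hung... force quit again...
Could the experts please help: if I want to use sqldf, ...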
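
One option that may be relevant to the question above: read.csv.sql in sqldf takes a sql argument, so the filter can be applied while the file is imported rather than after the whole CSV has been loaded into a data frame. The following is only a sketch under assumptions: the toy file, its columns (day, value) and the temp path are made up for illustration, and whether this avoids the hang depends on the real abc.csv (size, line endings), which the post does not show.

```r
library(sqldf)

# Hypothetical toy file standing in for abc.csv (columns are assumptions).
csv_path <- tempfile(fileext = ".csv")
toy <- data.frame(
  day   = c("2012-10-01", "2012-10-02", "2012-10-01"),
  value = c(1.5, 2.5, 3.5)
)
# quote = FALSE: the SQLite import behind read.csv.sql does not strip quotes.
write.csv(toy, csv_path, row.names = FALSE, quote = FALSE)

# Inside the sql string the table is always called "file".
newd <- read.csv.sql(
  csv_path,
  sql = "select * from file where day = '2012-10-01'"
)
print(newd)
```

Only the rows matching the WHERE clause are returned to R, which keeps the resulting data frame small when the subset is small.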

m*****s
Posts: 9

From thread: Statistics board - Help with an R problem, waiting online

install.packages("sqldf")
library(sqldf)
help(sqldf)

d*******1
Posts: 854

From thread: Statistics board - Could some expert help me debug this R and sqlite code

I'm running sqlite syntax inside R. The first command below runs without errors, but the second one fails:
all<- sqldf("select a.*, b.cutnum from all_resx a, cut b where a.cut=b.cut")
allx<- sqldf("select a.*, b.cut cutx from all a, cut b where b.cutnum-a.cutnum=1")
The error message is:
Error in sqliteExecStatement(con, statement, bind.data) :
RS-DBI driver: (error in statement: near "all": syntax error)
Calls: sqldf ... dbGetQuery -> sqliteQuickSQL -> sqliteExecStatement -> .Call
Execution halted
Could some expert give me...
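
A likely reading of the error message above, offered as a guess rather than a confirmed diagnosis: ALL is a reserved keyword in SQLite, so "from all a" fails to parse near "all". One simple workaround is to store the intermediate result under a name that is not a keyword. In the sketch below, all_cut and the toy columns are hypothetical; only all_resx, cut, cutnum and cutx come from the post.

```r
library(sqldf)

# Toy stand-ins for the poster's tables (contents are assumptions).
all_resx <- data.frame(id = 1:3, cut = c("a", "b", "c"))
cut      <- data.frame(cut = c("a", "b", "c"), cutnum = 1:3)

# Same first query, but the result is saved as all_cut instead of all.
all_cut <- sqldf("select a.*, b.cutnum from all_resx a, cut b where a.cut = b.cut")

# The second query now refers to all_cut, which is not a reserved word.
allx <- sqldf("select a.*, b.cut cutx from all_cut a, cut b where b.cutnum - a.cutnum = 1")
print(allx)
```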

s*********e
Posts: 1051

From thread: Statistics board - R is a bit disappointing

> n <- 1000000
> set.seed(2013)
> ldf <- data.frame(id1 = sample(n, n), id2 = sample(n / 100, n, replace = TRUE), x1 = rnorm(n), x2 = runif(n))
> rdf <- data.frame(id1 = sample(n, n), id2 = sample(n / 100, n, replace = TRUE), y1 = rnorm(n), y2 = runif(n))
>
> # METHOD 1: MERGE
> system.time(join1 <- merge(ldf, rdf, by = c("id1", "id2")))
   user  system elapsed
 54.028  11.229  65.673
>
> # METHOD 2: PLYR
> # library(plyr)
> # system.time(join2 <- plyr::join(ldf, rdf, by = c("id1", "id2"), type =...

s*********e
Posts: 1051

From thread: Statistics board - How to compute sum of revenue for each day each person in R?

well, you can use "proc sql" in R as well ^_^.
take a look at sqldf package http://code.google.com/p/sqldf/.
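
To make the "proc sql in R" suggestion concrete, here is a minimal sketch of summing revenue per person per day with sqldf. The data frame sales and its columns (person, day, revenue) are invented for illustration; the original thread's data is not shown here.

```r
library(sqldf)

# Hypothetical data: one row per transaction.
sales <- data.frame(
  person  = c("a", "a", "b", "a"),
  day     = c("d1", "d1", "d1", "d2"),
  revenue = c(1, 2, 5, 4)
)

# GROUP BY works as it would in PROC SQL.
totals <- sqldf("
  select person, day, sum(revenue) as total_revenue
  from sales
  group by person, day
")
print(totals)
```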

q**j
Posts: 10612

From thread: Statistics board - [[Desperate plea for help]] R experts, please take a look

Author: statcompute (statcompute), Board: Statistics
Subject: Re: How to compute sum of revenue for each day each person in
Posted from: BBS 未名空间站 (Fri May 22 23:55:47 2009)
well, you can use "proc sql" in R as well ^_^.
take a look at sqldf package http://code.google.com/p/sqldf/.

and 2

c******y
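Posts: 3269

From thread: Statistics board - A question about data processing in R, waiting online

I didn't express myself clearly earlier.
What I meant is that, compared with the other approaches in R, sqldf is less efficient when the data is large.
R has no particularly efficient SQL package, which is one of R's weak points.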

c******y
Posts: 3269

From thread: Statistics board - A question about data processing in R, waiting online

sqldf is often fairly concise and easy to understand, and convenient to use
Agree, it's just not efficient.
Therefore, if users do ETL in R, I'd recommend other R packages instead of sqldf, so they can get a better idea of how R manipulates data.
If users prefer SQL, I would suggest the same as you do: let the pro-software do SQL.

d*****g
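Posts: 4081

From thread: Statistics board - [[Desperate plea for help]] R experts, please take a look

Expert, do you mean the last one in the fourth example on the page you gave me, the animal marker one?
That example looks a bit simple, though; I have a lot of data.
Also, if I write it the way this example does, can it compare the data values and find the ones that are the same?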
sqldf("select * from SNP4 left join SNP1x using (Animal, Marker)")

d*****g
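Posts: 4081

From thread: Statistics board - [[Desperate plea for help]] R experts, please take a look

For example, my data is structured like this:
> A
1 2 3 4 5
2 4 2 4 1
1 4 5 6 7
1 2 3 5 9
8 6 2 4 3
> B
7 3
5 1
If I use the statement from above,
sqldf("select * from A left join B using (A$V5, B$V1)")
can I get the following result?
1 2 3 4 5 1
2 4 2 4 1 NA
1 4 5 6 7 3
1 2 3 5 9 NA
8 6 2 4 3 NA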
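
A note on the statement in this post: USING expects column names shared by both tables, and R's $ syntax is not valid inside SQL, so the query as written would not run. A sketch of one way to get the requested result is a LEFT JOIN with an ON condition. It assumes the columns are named V1..V5 in A and V1, V2 in B, as the post implies; the alias B_V2 is made up to avoid a duplicate column name.

```r
library(sqldf)

# The poster's matrices as data frames, assuming default V1..Vn column names.
A <- data.frame(
  V1 = c(1, 2, 1, 1, 8),
  V2 = c(2, 4, 4, 2, 6),
  V3 = c(3, 2, 5, 3, 2),
  V4 = c(4, 4, 6, 5, 4),
  V5 = c(5, 1, 7, 9, 3)
)
B <- data.frame(V1 = c(7, 5), V2 = c(3, 1))

# Match A.V5 against B.V1 and bring along B.V2; unmatched rows get NA.
res <- sqldf("
  select A.*, B.V2 as B_V2
  from A left join B on A.V5 = B.V1
")
print(res)
```

Rows of A whose V5 has no match in B's V1 come back with NA in the extra column, which is the shape of the output the post asks for.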

w********m
Posts: 1137

From thread: Statistics board - R users, please take a look: how do I use SQL in R? It really shouldn't be this slow...

Sadly, R doesn't have native support for SQL. The creators of R/S at AT&T Labs designed it for file systems such as UNIX or Linux: they just use a scripting language to manipulate or query data. Third-party access through sqldf or RODBC is somewhat unstable.

s*********e
Posts: 1051

From thread: Statistics board - R users, please take a look: how do I use SQL in R? It really shouldn't be this slow...

it might not be the issue of sqldf. how big is csv file?
when people have a hammer, everything will look like a nail. ;-)

w***a
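Posts: 432

From thread: Statistics board - R users, please take a look: how do I use SQL in R? It really shouldn't be this slow...

Sorry, I took another look and sqldf is actually still just R; I had thought it connected to the data.....
I tried it and it seemed to work fine, though of course my data isn't big, just a few hundred rows.
Are you sure the data was read in correctly? Check with
head(d)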

D**u
Posts: 288

From thread: Statistics board - data.table is amazing

I think sqldf (working with SQLite) still has the edge over data.table on big data.

s*********e
Posts: 1051

From thread: Statistics board - data.table is amazing

Sqldf is slow, period

D**u
Posts: 288

From thread: Statistics board - data.table is amazing

OK, I am going to try the hdf5 + data.table combination and compare it to RSQLite + sqldf. That is the best approach I can think of right now.

s*********e
Posts: 1051

From thread: Statistics board - data.table is amazing

data.table is 100X faster than sqldf() for row extraction from a data.frame.
http://statcompute.wordpress.com/2013/01/01/efficiecy-of-extrac

l******n
Posts: 9344

From thread: Statistics board - A question about Data Manipulation in R

sqldf is actually pretty good; with enough memory it is quite fast

proc

l******n
Posts: 9344

From thread: Statistics board - A question about data processing in R, waiting online

Using sqldf is probably the easiest way

l******n
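Posts: 9344

From thread: Statistics board - A question about data processing in R, waiting online

Actually R doesn't need one: just process the data directly with an efficient SQL tool and then import the result into R. R is mainly for data processing, not BI. sqldf is often fairly concise and easy to understand, and convenient to use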

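Posts: 1

From thread: Statistics board - Statistics master's student graduating in May seeking referrals & advice

Sorry for the late reply! I was so happy to get my first reply~~
I think I'm fairly good. I have used R throughout my internship and my studies, and I'm quite comfortable with stringr, dplyr, ggplot2, sqldf, randomForest, xgboost, foreach, apply and the like. For the internship I have to read a lot of code written by my predecessors, write plenty of if/else so that programs run through every possible case, and optimize algorithms for speed. I also often help my boss debug, write functions, and write small automated programs for data processing and plotting tasks. In class I use R for data processing and for statistical learning projects. I have also done Kaggle projects in R, with decent results.
I feel that what I work with now is fairly fixed, and there is still a lot of new stuff to learn~~ Do you think R skills help a lot in finding a job?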

s*********e
Posts: 1051

From thread: DataSciences board - Merging two dataframes in R is too slow

try sqldf with index. there is no overhead for data table conversion.
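
A rough sketch of what "sqldf with index" could look like, based on the pattern of passing several SQL statements to one sqldf call. The names ldf, rdf, id1 and id2 are borrowed from the merge benchmark elsewhere on this page, the tiny data and the index names idx_ldf and idx_rdf are made up, and no speed-up is claimed or measured here. The main. prefix in the final statement refers to the tables already uploaded (and indexed) in the SQLite database, so the data frames are not uploaded a second time.

```r
library(sqldf)

# Tiny stand-ins for the two data frames to be merged (assumed shapes).
ldf <- data.frame(id1 = 1:4, id2 = c(1L, 1L, 2L, 2L), x1 = c(10, 20, 30, 40))
rdf <- data.frame(id1 = 2:5, id2 = c(1L, 2L, 2L, 1L), y1 = c(0.2, 0.3, 0.4, 0.5))

# Statements run in order; the result of the last one is returned.
joined <- sqldf(c(
  "create index idx_ldf on ldf(id1, id2)",
  "create index idx_rdf on rdf(id1, id2)",
  "select * from main.ldf inner join main.rdf using (id1, id2)"
))
print(joined)
```

On real data it would be worth timing this against merge() with system.time before relying on it.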
