Statistics board - Another R question
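c***z
Posts: 6348
1
How can I get the modification date of a file on an FTP server?
For example, this one: ftp://ftp.bls.gov/pub/special.requests/cew/2011/
The page shows the modification dates, but file.info() returned nothing.
I found a way to scrape an HTTP page, but it does not work on an FTP page.
Does anyone have experience with this? Thanks!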
c***z
Posts: 6348
2
Here are my code and the result:
> sourcefile
[1] "ftp://ftp.bls.gov/pub/special.requests/cew/2011/2011.q1-q3.county_high_level.zip"
> file.info(sourcefile)

size
ftp://ftp.bls.gov/pub/special.requests/cew/2011/2011.q1-q3.county_high_level.zip NA

isdir
ftp://ftp.bls.gov/pub/special.requests/cew/2011/2011.q1-q3.county_high_level.zip NA

mode
ftp://ftp.bls.gov/pub/special.requests/cew/2011/2011.q1-q3.county_high_level.zip

mtime
ftp://ftp.bls.gov/pub/special.requests/cew/2011/2011.q1-q3.county_high_level.zip

ctime
ftp://ftp.bls.gov/pub/special.requests/cew/2011/2011.q1-q3.county_high_level.zip

atime
ftp://ftp.bls.gov/pub/special.requests/cew/2011/2011.q1-q3.county_high_level.zip

uid
ftp://ftp.bls.gov/pub/special.requests/cew/2011/2011.q1-q3.county_high_level.zip NA

gid
ftp://ftp.bls.gov/pub/special.requests/cew/2011/2011.q1-q3.county_high_level.zip NA

uname
ftp://ftp.bls.gov/pub/special.requests/cew/2011/2011.q1-q3.county_high_level.zip

grname
ftp://ftp.bls.gov/pub/special.requests/cew/2011/2011.q1-q3.county_high_level.zip
>
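A note on why every field is NA here: file.info() looks its argument up on the local file system, so an ftp:// URL is treated as a local path that does not exist. A minimal sketch of the difference, using a temporary local file (the names remote, tmp, info.remote and info.local are introduced here for illustration only); nothing in it touches the network:

```r
# file.info() only consults the local file system.
remote <- "ftp://ftp.bls.gov/pub/special.requests/cew/2011/"
info.remote <- file.info(remote)
is.na(info.remote$size)   # no local path with that name, so the fields are NA

# A real local file, by contrast, has a size and a modification time.
tmp <- tempfile()
writeLines("hello", tmp)
info.local <- file.info(tmp)
info.local$size
info.local$mtime
unlink(tmp)               # clean up the temporary file
```

So the modification date has to come from the FTP server's own directory listing rather than from file.info().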
c***z
Posts: 6348
3
I found that getURL() by itself already returns the modification time.
Can anyone show me how to parse out the file name and the modification time?
> files <- getURL(sourcelink, ftp.use.epsv = FALSE, dirlistonly = FALSE)
> files <- strsplit(files, "\n")
> files <- unlist(files)
> files
[1] "-r-xr-xr-x 1 owner group 15780895 Mar 26 16:16 2011.q1-q3.county_high_level.zip"
[2] "-r-xr-xr-x 1 owner group 128178060 Mar 26 17:02 2011.q1-q3.end.zip"
[3] "dr-xr-xr-x 1 owner group 0 Mar 28 10:05 county"
[4] "dr-xr-xr-x 1 owner group 0 Mar 28 10:05 county_high_level"
[5] "dr-xr-xr-x 1 owner group 0 Mar 28 10:05 csa"
[6] "dr-xr-xr-x 1 owner group 0 Mar 28 10:05 microsa"
[7] "dr-xr-xr-x 1 owner group 0 Mar 28 10:05 msa"
[8] "dr-xr-xr-x 1 owner group 0 Mar 28 10:05 national"
[9] "dr-xr-xr-x 1 owner group 0 Mar 28 10:05 size"
[10] "dr-xr-xr-x 1 owner group 0 Mar 28 10:05 state"
c***z
Posts: 6348
4
My naive idea is to strsplit again.
Is there a better way? Thanks!
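One possible alternative to a second strsplit, sketched here under the assumption that every line follows the nine-field layout shown in the listing above: describe the whole line with one regular expression and pull the fields out with regexec() and regmatches() from base R. The names pattern, fields and parsed are introduced for this sketch, and the sample lines are copied from the listing in the thread.

```r
# Sample lines copied from the listing earlier in the thread.
files <- c(
  "-r-xr-xr-x 1 owner group 15780895 Mar 26 16:16 2011.q1-q3.county_high_level.zip",
  "-r-xr-xr-x 1 owner group 128178060 Mar 26 17:02 2011.q1-q3.end.zip",
  "dr-xr-xr-x 1 owner group 0 Mar 28 10:05 county"
)

# One capture group per field of interest: size, month, day, time, name.
# \\s+ tolerates any amount of padding between columns.
pattern <- paste0(
  "^\\S+\\s+\\d+\\s+\\S+\\s+\\S+\\s+",          # permissions, links, owner, group
  "(\\d+)\\s+([A-Za-z]{3})\\s+(\\d{1,2})\\s+",  # size, month, day
  "(\\S+)\\s+(.+)$"                             # time, file name
)
m <- regmatches(files, regexec(pattern, files))

# Each match is c(full match, group 1, ..., group 5); drop the full match.
fields <- do.call(rbind, lapply(m, function(x) x[-1]))

parsed <- data.frame(
  size   = as.numeric(fields[, 1]),
  month  = fields[, 2],
  day    = as.integer(fields[, 3]),
  time   = fields[, 4],
  name   = fields[, 5],
  is.dir = substr(files, 1, 1) == "d",  # directories start with "d"
  stringsAsFactors = FALSE
)
parsed
```

Because the name is captured with (.+), a file name containing spaces would survive intact, which splitting on spaces cannot guarantee. A line that does not fit the pattern yields an empty match, so real listings would need a check for that before the rbind step.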
c*********t
Posts: 340
5
I can't think of a better way; I'm not very familiar with RCurl.
But here is a clumsy approach for the OP's reference:
since the lines are fixed-width, just find the position of the column you want :)
> grep("M",unlist(strsplit(files[1],"")))
47
> substr(files,47,47+11)
[1] "Mar 26 16:16" "Mar 26 17:02" "Mar 28 10:05" "Mar 28 10:05" "Mar 28 10:05" "Mar 28 10:05" "Mar 28 10:05" "Mar 28 10:05" "Mar 28 10:05"
[10] "Mar 28 10:05"
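c***z
Posts: 6348
6
substr won't work, because the elements of files do not all have the same length;
there are three different lengths: 20, 21, and 28.
And one more complaint: R's date classes are a real pain.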
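When the timestamp does not start at the same column on every line, one option is to search for it by shape instead of by position. The sketch below uses regexpr() and regmatches() from base R to find "Mon DD HH:MM" wherever it occurs, then builds a Date without relying on the locale-dependent "%b" format. The year is not part of the listing, so assumed.year is purely an assumption made for this example; the other names (stamp.pattern, stamps, parts, month.num, ftp.date) are also introduced here.

```r
# Sample lines copied from the listing earlier in the thread.
files <- c(
  "-r-xr-xr-x 1 owner group 15780895 Mar 26 16:16 2011.q1-q3.county_high_level.zip",
  "-r-xr-xr-x 1 owner group 128178060 Mar 26 17:02 2011.q1-q3.end.zip",
  "dr-xr-xr-x 1 owner group 0 Mar 28 10:05 county"
)

# Find the first "Mon DD HH:MM" in each line, whatever column it starts at.
stamp.pattern <- "[A-Z][a-z]{2} +[0-9]{1,2} +[0-9]{2}:[0-9]{2}"
stamps <- regmatches(files, regexpr(stamp.pattern, files))

# Split each stamp into month, day and time.
parts <- do.call(rbind, strsplit(stamps, " +"))

# month.abb is a built-in constant with English abbreviations,
# so the lookup does not depend on the session locale.
month.num <- match(parts[, 1], month.abb)

assumed.year <- 2012L  # ASSUMPTION: the listing itself carries no year
ftp.date <- as.Date(sprintf("%d-%02d-%02d",
                            assumed.year, month.num, as.integer(parts[, 2])))
ftp.date
```

Note that regmatches() silently drops lines with no match, so stamps can end up shorter than files; comparing the two lengths is a cheap sanity check. Some FTP servers also print a year in place of the time for older files, which this pattern does not cover.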
c***z
Posts: 6348
7
This is what I did; it does get the FTP modification time.
But because of the recursion, it still doesn't run well.
# list of contents
filestubs <- getURL(sourcelink, ftp.use.epsv = FALSE, dirlistonly = FALSE)
filestubs <- strsplit(filestubs, "\n")
filestubs <- unlist(filestubs)
files <- as.data.frame(filestubs)

# obtain names and modification times
for (i in seq_along(filestubs)) {
  # i <- 1
  # split on runs of spaces so padded columns do not produce empty fields
  temp <- strsplit(filestubs[i], " +")
  temp <- unlist(temp)

  # fields are counted from the end of the line: name is last
  temp.name <- temp[length(temp)]
  files$name[i] <- temp.name

  # month and day; the listing has no year, so mod.year
  # (not defined in this snippet) supplies it
  temp.date <- paste(temp[length(temp)-3], temp[length(temp)-2], mod.year, sep = ' ')
  temp.date <- strptime(temp.date, "%b %d %Y")
  files$ftp.date[i] <- as.Date(temp.date)

  temp.size <- temp[length(temp)-4]
  files$size[i] <- temp.size
}

files$link <- paste(sourcelink, files$name, sep = '')
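If the recursion mentioned above means descending into the subdirectories (the entries whose line starts with "d"), one way to keep it manageable is to separate "fetch a listing" from "walk the tree". The sketch below is only an illustration under that assumption: walk, list.dir and fake.listing are hypothetical names, the host example.invalid is a placeholder, and the al.zip entry is made up so that the example can run without a network connection.

```r
# Hypothetical stand-in for the FTP server: directory URL -> listing lines.
fake.listing <- list(
  "ftp://example.invalid/2011/" = c(
    "-r-xr-xr-x 1 owner group 15780895 Mar 26 16:16 2011.q1-q3.county_high_level.zip",
    "dr-xr-xr-x 1 owner group 0 Mar 28 10:05 county"
  ),
  "ftp://example.invalid/2011/county/" = c(
    "-r-xr-xr-x 1 owner group 1024 Mar 28 10:05 al.zip"  # made-up entry
  )
)
list.dir <- function(url) fake.listing[[url]]

# Walk a directory tree: collect links to files, descend into directories.
walk <- function(url, list.fun) {
  lines <- list.fun(url)
  if (length(lines) == 0) return(character(0))
  names <- sub("^.* ", "", lines)        # text after the last space
  is.dir <- substr(lines, 1, 1) == "d"   # directory entries start with "d"
  links <- paste(url, names, sep = "")
  sub.links <- unlist(lapply(links[is.dir], function(d) {
    walk(paste(d, "/", sep = ""), list.fun)  # recurse with a trailing slash
  }))
  c(links[!is.dir], sub.links)
}

all.files <- walk("ftp://example.invalid/2011/", list.dir)
all.files
```

Against a real server, list.dir could instead wrap the getURL() and strsplit() calls from the post above. Taking the text after the last space as the name breaks for names that contain spaces, so this is a starting point rather than a complete solution.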