Reposted

Reading numbers with many decimal places into R

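When reading Excel and CSV data into R, I keep running into problems with decimal places: the values are not displayed correctly. It is long past time to stop using the read.csv function.

Here are two better packages for reading data, both from Hadley: readxl & readr

Test data:

Function overview: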

  readxl::read_excel("test.xlsx", col_names = FALSE, col_types = rep("numeric", 3))

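col_types accepts four values: "blank", "numeric", "date" or "text". "blank" skips the column (newer readxl releases spell this "skip" and accept additional types); the other three are self-explanatory.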

  vignette("column-types", package = "readr")  # see this documentation
  library(readr)
  read_csv("test.csv", col_names = FALSE, col_types = cols(X1 = "d", X2 = col_skip(), X3 = "d"))
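To make the call above reproducible without the original test.csv, here is a minimal sketch that writes a small headerless file to a temporary path and reads it back with the same col_types. The file contents and the variable names `path` and `a` are assumptions chosen for illustration; they are not the post's actual test data.

```r
library(readr)

# Hypothetical three-column file without a header row (illustration only).
path <- tempfile(fileext = ".csv")
writeLines(c("0.000003,a,1237594.545546",
             "1.5,b,2.25"), path)

# Without a header, readr names the columns X1, X2, X3.
# X1 and X3 are parsed as doubles; X2 is skipped entirely.
a <- read_csv(path, col_names = FALSE,
              col_types = cols(X1 = "d", X2 = col_skip(), X3 = "d"))

names(a)          # column names that survive the skip
sapply(a, class)  # column classes after parsing
```

Spelling out col_types means readr does not have to guess the column types from the data, so each column comes back with the type you asked for.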

readr offers a richer set of col_types:

  • col_logical() [l], containing only T, F, TRUE or FALSE.

  • col_integer() [i], integers.

  • col_double() [d], doubles.

  • col_character() [c], everything else.

  • col_date(format = "") [D]: Y-m-d dates.

  • col_datetime(format = "") [T]: ISO8601 date times

  • col_number() [n], finds the first number in the field. A number is defined as a sequence of -, "0-9", decimal_mark and grouping_mark. This is useful for currencies and percentages.

decimal_mark is set in locale(); see the help document vignette("locales") for details.
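As a rough illustration of how decimal_mark and grouping_mark change number parsing, the sketch below calls readr's parse_number() and parse_double() directly on character vectors. The sample strings and the locale object `eu` are made up for this example.

```r
library(readr)

# Default locale: "." is the decimal mark and "," the grouping mark.
us_value <- parse_number("$1,234.56")

# A locale with the marks swapped, as used in many European countries
# (hypothetical sample values).
eu <- locale(decimal_mark = ",", grouping_mark = ".")
eu_value   <- parse_number("1.234,56", locale = eu)
eu_doubles <- parse_double(c("1,5", "3,75"), locale = eu)

# parse_number() also drops surrounding non-numeric text such as "%".
pct <- parse_number("45%")
```

The same locale object can be passed to read_csv() or read_delim() through their locale argument, so every numeric column in the file is parsed with those marks.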

You can also manually specify other column types:

  • col_skip() [_, -], don't import this column.

  • col_date(format), dates with given format.

  • col_datetime(format, tz), date times with given format. If the timezone is UTC (the default), this is >20x faster than loading then parsing with strptime().

  • col_time(format), times. Returned as number of seconds past midnight.

  • col_factor(levels, ordered), parse a fixed set of known values into a factor.

Example

  read_csv("iris.csv", col_types = cols(
    Sepal.Length = "d",
    Sepal.Width = "d",
    Petal.Length = "d",
    Petal.Width = "d",
    Species = col_factor(c("setosa", "versicolor", "virginica"))
  ))

After reading the data in, we often see output like this:

  a$X3
  [1] 3.000000e-06 1.237595e+06

Solution:

  formattable::digits(a$X3, 7)
  [1] 0.0000030       1237594.5455460

The formattable package has many other uses; for details see: http://renkun.me/formattable/
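The scientific notation shown earlier is a display issue: the stored doubles keep their full precision, and only the printed form is shortened. As a base-R alternative to formattable (not the approach used in the original post), here is a minimal sketch with sprintf(); the vector `x` holds assumed values similar to those in the output above.

```r
# Assumed values resembling a$X3 from the post (illustration only).
x <- c(0.000003, 1237594.545546)

# Default printing shows about 7 significant digits and may switch to
# scientific notation; the stored values are unchanged.
print(x)

# Fixed notation with exactly 7 decimal places. The result is a character
# vector, so use it for display only, not for further arithmetic.
fixed <- sprintf("%.7f", x)
print(fixed)

# Converting back shows that no precision was lost at this number of decimals.
round_trip <- as.numeric(fixed)
```

Unlike sprintf(), formattable::digits() keeps the vector numeric and only changes how it prints, which is more convenient when the column is still needed for calculations.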

Original post: https://segmentfault.com/a/1190000004881108