I'm working in R and have a data frame, dd_2006, with numeric vectors. When I first imported the data, I needed to remove $'s, decimal points, and blank spaces from 3 of my variables: sumofcost, sumofcases, and sumofunits. To do that, I used str_replace_all. However, once I used str_replace_all, the vectors were converted to characters. So I used as.numeric(var) to convert the vectors to numeric, but NAs were introduced, even though there were no NAs in the vectors when I ran the code below before running the as.numeric code.
> sum(is.na(dd_2006$sumofcost))
[1] 0
> sum(is.na(dd_2006$sumofcases))
[1] 0
> sum(is.na(dd_2006$sumofunits))
[1] 0
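One possible reason the check above reports no NAs: `is.na()` on a character vector only flags real `NA` values. An empty string such as `""`, which a stripping step can leave behind, is not counted, yet `as.numeric()` turns it into `NA`. Here is a minimal sketch with made-up values. The vector `raw` is hypothetical and is not the question's data.

```r
# Hypothetical character column after cleaning; "" stands in for a value
# that a cleanup step reduced to an empty string
raw <- c("0", "365", "", "96416")

# is.na() on the character vector: "" is a valid string, not NA
na_before <- sum(is.na(raw))

# Converting to numeric turns "" into NA
converted <- as.numeric(raw)
na_after <- sum(is.na(converted))

# Count empty or whitespace-only strings directly instead
blank_count <- sum(trimws(raw) == "")

print(c(before = na_before, after = na_after, blank = blank_count))
```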
Here is the code after import, beginning with removing the $ from the vector. In the str(dd_2006) output I deleted some of the variables for the sake of space, so the column #s in the str_replace_all code below don't match the output I've posted here (but they do in the original code):
library("stringr")
dd_2006$sumofcost <- str_sub(dd_2006$sumofcost, 2, )  # 2 = the first # after the $
# removes the decimal pt, the zeros after it, and commas
dd_2006[ ,9] <- str_replace_all(dd_2006[ ,9], ".00", "")
dd_2006[ ,9] <- str_replace_all(dd_2006[ ,9], ",", "")
dd_2006[ ,10] <- str_replace_all(dd_2006[ ,10], ".00", "")
dd_2006[ ,10] <- str_replace_all(dd_2006[ ,10], ",", "")
dd_2006[ ,11] <- str_replace_all(dd_2006[ ,11], ".00", "")
dd_2006[ ,11] <- str_replace_all(dd_2006[ ,11], ",", "")
str(dd_2006)

'data.frame':   12604 obs. of  14 variables:
 $ cmhsp      : Factor w/ 46 levels "allegan","ausable valley",..: 1 1 1
 $ fy         : Factor w/ 1 level "2006": 1 1 1 1 1 1 1 1 1 1 ...
 $ population : Factor w/ 1 level "dd": 1 1 1 1 1 1 1 1 1 1 ...
 $ sumofcases : chr  "0" "1" "0" "0" ...
 $ sumofunits : chr  "0" "365" "0" "0" ...
 $ sumofcost  : chr  "0" "96416" "0" "0" ...
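As an alternative to the step-by-step cleanup (this is not the asker's code), you could strip `$`, commas, and whitespace with a single character class and convert in the same step. That way no `"."` pattern is involved at all. Removing `".00"` is also unnecessary, because `as.numeric()` parses `"96416.00"` directly. This sketch assumes the raw values look like `"$96,416.00"`. The sample vector is made up.

```r
# Hypothetical raw values in the format described in the question
raw_cost <- c("$0.00", "$96,416.00", "$1,000.00", " $100.00")

# Remove dollar signs, commas, and whitespace;
# inside a bracket expression "$" is a literal character
cleaned <- gsub("[$,[:space:]]", "", raw_cost)

# as.numeric() parses "96416.00" as 96416, so the ".00" suffix can stay
cost <- as.numeric(cleaned)
print(cost)
```

For the question's data, the same expression could be applied to all three columns at once. Assuming columns 9 to 11 are the ones used in the original code, that would be `dd_2006[9:11] <- lapply(dd_2006[9:11], function(x) as.numeric(gsub("[$,[:space:]]", "", x)))`.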
I found a response to a question similar to mine here, using the following code:
# create a dummy data.frame
d <- data.frame(char = letters[1:5],
                fake_char = as.character(1:5),
                fac = factor(1:5),
                char_fac = factor(letters[1:5]),
                num = 1:5,
                stringsAsFactors = FALSE)
Let's have a glance at the data.frame:
> d
  char fake_char fac char_fac num
1    a         1   1        a   1
2    b         2   2        b   2
3    c         3   3        c   3
4    d         4   4        d   4
5    e         5   5        e   5
And let's run:
> sapply(d, mode)
       char   fake_char         fac    char_fac         num
"character" "character"   "numeric"   "numeric"   "numeric"
> sapply(d, class)
       char   fake_char         fac    char_fac         num
"character" "character"    "factor"    "factor"   "integer"
Now you may ask, "Where's the anomaly?" Well, I've bumped into quite peculiar things in R, and this is not the most confounding one, but it can confuse you, especially if you read this before rolling into bed.
Here goes: the first 2 columns are character. I've deliberately called the 2nd one fake_char. Spot the similarity between this character variable and the one Dirk created in his reply. It's a numerical vector converted to character. The 3rd and 4th columns are factors, and the last one is "purely" numeric.
If you use the transform function, you can convert fake_char to numeric, but not the char variable itself.
> transform(d, char = as.numeric(char))
  char fake_char fac char_fac num
1   NA         1   1        a   1
2   NA         2   2        b   2
3   NA         3   3        c   3
4   NA         4   4        d   4
5   NA         5   5        e   5
Warning message:
In eval(expr, envir, enclos) : NAs introduced by coercion

But if you do the same thing on fake_char and char_fac, you'll be lucky and get away with no NAs:
> transform(d, fake_char = as.numeric(fake_char), char_fac = as.numeric(char_fac))
  char fake_char fac char_fac num
1    a         1   1        1   1
2    b         2   2        2   2
3    c         3   3        3   3
4    d         4   4        4   4
5    e         5   5        5   5
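The `char_fac` result above only looks lucky because the factor's internal codes happen to be 1 to 5. `as.numeric()` on a factor returns the level codes, not the labels. This matters if a numeric column was imported as a factor. Below is a short illustration with a made-up factor. The two conversion idioms shown are the ones recommended in R FAQ 7.10.

```r
# A factor whose labels are numbers, as when numeric text is read in as factors
f <- factor(c("10", "5", "10", "20"))

# Returns the internal level codes; the levels sort as "10", "20", "5"
codes <- as.numeric(f)

# Convert via the labels instead
values <- as.numeric(as.character(f))
values_alt <- as.numeric(levels(f))[f]

print(codes)
print(values)
```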
So I tried the above code in my script, but I still came up with NAs (this time without a warning message about coercion).
# changing sumofcases, sumofcost, and sumofunits to numeric
dd_2006_1 <- transform(dd_2006,
                       sumofcases = as.numeric(sumofcases),
                       sumofunits = as.numeric(sumofunits),
                       sumofcost = as.numeric(sumofcost))
> sum(is.na(dd_2006_1$sumofcost))
[1] 12
> sum(is.na(dd_2006_1$sumofcases))
[1] 7
> sum(is.na(dd_2006_1$sumofunits))
[1] 11
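The missing coercion warning is itself a clue. In current R versions, `as.numeric()` warns about strings it cannot parse, such as `"m"`, but it converts empty or whitespace-only strings to `NA` without any warning. The small sketch below records whether a warning was raised. The helper name `converts_with_warning` is hypothetical.

```r
# Hypothetical helper: TRUE if as.numeric() signals a warning for x
converts_with_warning <- function(x) {
  warned <- FALSE
  withCallingHandlers(
    as.numeric(x),
    warning = function(w) {
      warned <<- TRUE                  # record that a warning occurred
      invokeRestart("muffleWarning")   # then silence it
    }
  )
  warned
}

print(converts_with_warning("m"))    # unparseable text
print(converts_with_warning(""))     # empty string
print(converts_with_warning("  "))   # whitespace only
print(is.na(as.numeric(c("", "  "))))
```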
I've used table(dd_2006$sumofcases) etc. to look at the observations and see if there were any characters I missed, but there weren't any. Any thoughts on why the NAs are popping up, and how to get rid of them?
As Anando pointed out, the problem is somewhere in your data, and we can't really help without a reproducible example. That said, here's a code snippet to help you pin down the records in your data that are causing the problems:
test = as.character(c(1, 2, 3, 4, 'm'))
v = as.numeric(test)  # NAs introduced by coercion
ix.na = is.na(v)
which(ix.na)          # row index of our problem = 5
test[ix.na]           # shows the problematic record, "m"
Instead of guessing why NAs are being introduced, pull out the records that are causing the problem and address them directly/individually until the NAs go away.
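The same idea can be applied to several columns at once. This is a minimal sketch that assumes the columns are character or factor vectors. The function name `bad_values` and the toy data frame are hypothetical.

```r
# Hypothetical helper: for each column, return the raw values that become NA
# when converted to numeric, together with their row numbers
bad_values <- function(df, cols) {
  lapply(df[cols], function(x) {
    x <- as.character(x)                   # factors -> their labels
    v <- suppressWarnings(as.numeric(x))
    ix <- which(is.na(v) & !is.na(x))      # NA created by conversion, not already NA
    data.frame(row = ix, value = x[ix], stringsAsFactors = FALSE)
  })
}

toy <- data.frame(sumofcases = c("0", "1", "", "4"),
                  sumofunits = c("0", "365", "m", "12"),
                  stringsAsFactors = FALSE)

res <- bad_values(toy, c("sumofcases", "sumofunits"))
print(res)
```

On the question's data this would be called as `bad_values(dd_2006, c("sumofcases", "sumofunits", "sumofcost"))`. Note that an empty string shows up as a blank `value`, which is easy to overlook in printed output.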
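UPDATE: It looks like the problem is in your calls to str_replace_all. The pattern ".00" is a regular expression, so the unescaped "." matches any character, not just a literal decimal point. I don't know the stringr library, but I think you can accomplish the same thing with gsub and an escaped dot, like this: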
> v2 = c("1.00", "2.00", "3.00")
> gsub("\\.00", "", v2)
[1] "1" "2" "3"
I'm not entirely sure what this step accomplishes, though:

> sum(as.numeric(v2) != as.numeric(gsub("\\.00", "", v2)))  # illustrate that the vectors are equivalent
[1] 0

Unless it achieves some specific purpose for you, I'd suggest dropping this step from your preprocessing entirely, since it doesn't appear necessary and seems to be giving you problems.
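Here is an illustration of the suspected cause. By default, both `gsub()` and stringr's `str_replace_all()` treat the pattern as a regular expression, so an unescaped `.` matches any character. With `".00"`, a value such as `"100.00"` loses its `"100"` as well as its `".00"`. It becomes an empty string, which `as.numeric()` turns into `NA`. The base-R sketch below uses made-up values. In stringr, `fixed(".00")` or the pattern `"\\.00"` should behave like the escaped versions shown here.

```r
vals <- c("100.00", "1,000.00", "365.00", "96416.00")

# Unescaped: "." matches any character, so "100" itself is a match
regex_result <- gsub(".00", "", vals)

# An escaped dot, or a literal (fixed) match, only removes the ".00" suffix
escaped_result <- gsub("\\.00", "", vals)
fixed_result <- gsub(".00", "", vals, fixed = TRUE)

print(regex_result)
print(escaped_result)
print(as.numeric(regex_result))
```

If this is what happened, it would explain both the NAs without a coercion warning and some values that are silently wrong. In this sample, `"1,000.00"` becomes `"10"`.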