I have a massive dataframe seems like this:
df = data.frame(year = c(rep(1998,5),rep(1999,5)), loc = c(10,rep(14,4),rep(10,2),rep(14,3)), sitA = c(rep(0,3),1,1,0,1,0,1,1), sitB = c(1,0,1,0,1,rep(0,4),1), n = c(2,13,2,9,4,7,2,7,7,4)) df year loc sitA sitB n 1 1998 10 0 1 2 2 1998 14 0 0 13 3 1998 14 0 1 2 4 1998 14 1 0 9 5 1998 14 1 1 4 6 1999 10 0 0 7 7 1999 10 1 0 2 8 1999 14 0 0 7 9 1999 14 1 0 7 10 1999 14 1 1 4
As you can see, there are years, localities, two different situation (denoted as
sitA
andsitB
) and finally the counts of these records (columnn
).I wanted to create a new data frame which reflects the counts for only year and localities where counts for situation A and B stored in the columns conditionally such as desired output below:
df.new year loc sitB.0.sitA.0 sitB.0.sitA.1 sitB.1.sitA.0 sitB.1.sitA.1 1 1998 10 0 0 2 0 2 1998 14 13 9 2 4 3 1999 10 7 2 0 0 4 1999 14 7 7 0 4
The tricky part as you can realize is that the original dataframe doesn't include all of the conditions. It only has the ones where the count is above 0. So the new dataframe should have "0" for the missing conditions in the original dataframe. Therefore, well known functions such as melt (reshape) or aggregate failed to solve my issue. A little help would be appreciated.
score:3
A tidyverse
method, we first append the column names to the values for sit..
columns. Then we unite
and combine them into one column and finaly spread
the values.
library(tidyverse)
df[3:4] <- lapply(names(df)[3:4], function(x) paste(x, df[, x], sep = "."))
df %>%
unite(key, sitA, sitB, sep = ".") %>%
spread(key, n, fill = 0)
# year loc sitA.0.sitB.0 sitA.0.sitB.1 sitA.1.sitB.0 sitA.1.sitB.1
#1 1998 10 0 2 0 0
#2 1998 14 13 2 9 4
#3 1999 10 7 0 2 0
#4 1999 14 7 0 7 4
If the position of the columns is not fixed you can use grep
first
cols <- grep("^sit", names(df))
df[cols] <- lapply(names(df)[cols], function(x) paste(x, df[, x], sep = "."))
Credit To: stackoverflow.com