Sometimes, you’d like to look at delimited files on the command line:
cat test.csvdate,name,status tomorrow,Porthy,melancholy yesterday,,morose someday,, Mon,Echawr,ecstatic ,Llano,blue ,Llany,suspicious ,,sleeping ,Peny, ,Llandy,piqued yesterday,Llaisy,oppressed
This is often annoying. Because the columns don’t line up, it’s hard to see what values go with what column. Fortunately, there’s a fantastic command line utility that will make aligned tables for you:
cat test.csv | column -s , -txdate name status tomorrow Porthy melancholy yesterday morose someday Mon Echawr ecstatic Llano blue Llany suspicious sleeping Peny Llandy piqued yesterday Llaisy oppressed
Oh. Except column doesn’t understand how to handle empty fields. So this is actually worse than before, because now we have a result that looks like it’s correctly aligned, but actually is not. column silently fails. (Although I have since discovered that this behavior of column was a bug, and it has been fixed in at least some distributions.)
I suppose we could fix this by just using sed to add some sort of ‘this field was empty’ value to the empty fields, and then columnate:
cat test.csv | sed "s/,,/,NULL,/g" | column -s , -txdate name status tomorrow Porthy melancholy yesterday NULL morose someday NULL Mon Echawr ecstatic Llano blue Llany suspicious NULL sleeping Peny Llandy piqued yesterday Llaisy oppressed
Oh. That almost works, but we forgot to consider all the possible cases. We need to worry about leading empty fields (beginning of line followed by delimiter), consecutive internal empty fields (delimiter delimiter delimiter), and trailing empty fields (delimiter followed by newline). Like this:
cat test.csv | sed "s/,,/,NULL,/g" | sed "s/,,/,NULL,/g" | sed "s/^,/NULL,/g" | sed "s/,$/,NULL/g" | column -s , -txdate name status tomorrow Porthy melancholy yesterday NULL morose someday NULL NULL Mon Echawr ecstatic NULL Llano blue NULL Llany suspicious NULL NULL sleeping NULL Peny NULL NULL Llandy piqued yesterday Llaisy oppressed
Nice. Although … that command line seems very complicated. Better to spend a few hours learning about zparseopts and abstract it into a shell function:
source fillna.zsh
cat test.csv | fillna --separator , --fill-value NULL | column -s , -txdate name status tomorrow Porthy melancholy yesterday NULL morose someday NULL NULL Mon Echawr ecstatic NULL Llano blue NULL Llany suspicious NULL NULL sleeping NULL Peny NULL NULL Llandy piqued yesterday Llaisy oppressed
Very nice. Although … this seems like a problem that other people must have encountered before.
Ah. Yes, yes it is. Better to spend a couple of days learning the DSLs for miller, tab, xsv, and q to conduct a comprehensive evaluation …