I was creating a new filesystem on a NetBSD box today, and wondered about the appropriate value for the “average file size” parameter. My first question was “what is the average file size in my data set?”, which I figured I could answer, since I had some representative data handy. I put together a quick one-liner to find out:
find /export -type f -print | perl -ne 'chomp; $count++; $total += (stat())[7]; END { print "$count files $total bytes total ", $total/$count, " byte average\n"; }'
After running this, I was surprised at how large the “average” size was, which led me to wonder which average the parameter is actually asking for: mean or median? While I was at it, I decided to calculate the mode as well.
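To make the difference between the three concrete, here is a minimal sketch of how all of them could be computed in one pass. It is not the approach used in this post: it swaps perl for sort and awk, and the function name size_stats is made up for illustration. It assumes one file size in bytes per line on stdin, and on a tie it reports the smallest size as the mode.

```shell
# size_stats (hypothetical helper): read one file size in bytes per line
# on stdin and print the count, mean, median, and mode.
size_stats() {
    sort -n | awk '
        NF == 0 { next }                 # ignore blank lines
        {
            size[++n] = $1               # input is sorted, so size[] is too
            total += $1
            if (++seen[$1] > best) {     # strict ">" keeps the smallest size on ties
                best = seen[$1]
                mode = $1
            }
        }
        END {
            if (n == 0) { print "no files"; exit }
            if (n % 2)
                median = size[(n + 1) / 2]
            else
                median = (size[n / 2] + size[n / 2 + 1]) / 2
            printf "%d files, mean %.1f, median %.1f, mode %s\n", n, total / n, median, mode
        }'
}
```

One way to feed it, reusing the stat call from the one-liner above, would be: find /export -type f -print | perl -lne '$s = (stat $_)[7]; print $s if defined $s' | size_stats. For the sizes 100, 200, 200, 500 and 4000 it prints "5 files, mean 1000.0, median 200.0, mode 200", which shows how a single large file drags the mean far above the median.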