Added Readability Statistics
I was going to add some monthly reports for the progress meter, but I’ve been wanting to add readability statistics for a while. So this is what I concentrated on today.
I use the GNU “style” command to come up with the readability statistics. I spent several hours writing my own parsers in PHP, but the stats didn’t come out exactly the same as the style command’s. That led me to run some sample text through several different readability utilities on the net. I expected at least some of the stats, like the Flesch Reading Ease or the SMOG Index, to agree. I discovered something interesting … nothing matched up exactly.
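For reference, here is a minimal Python sketch of the two scores as they are usually published. It is not the style command’s code, and other tools (style included) may use variants or different constants. The counts are made up purely to show how a handful of extra words or syllables shifts the results.

```python
import math

def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease formula: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def smog_grade(polysyllables: int, sentences: int) -> float:
    """SMOG grade, scaled to a 30-sentence sample.

    polysyllables: number of words with three or more syllables.
    """
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291

# Hypothetical counts for the same passage from two tools that
# disagree on only a few words and syllables.
a = flesch_reading_ease(words=120, sentences=8, syllables=180)
b = flesch_reading_ease(words=124, sentences=8, syllables=190)
print(f"Flesch: {a:.1f} vs {b:.1f}")  # Flesch: 64.7 vs 61.5
print(f"SMOG:   {smog_grade(12, 8):.2f} vs {smog_grade(14, 8):.2f}")  # SMOG:   10.13 vs 10.69
```

Both formulas are built entirely on word, sentence and syllable counts, so any disagreement in counting flows straight into the scores.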
One reason is hyphenated words. Sometimes a hyphenated word is treated as one longer, multisyllabic (and therefore complex) word, and sometimes as several separate words. Every formula is built on the word and syllable counts, so once those disagree, nothing else is going to match either.
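To see how much the hyphen decision matters, here is a rough Python sketch that counts the same phrase both ways. The vowel-run syllable counter and the function names are my own hypothetical stand-ins; real tools use dictionaries or more elaborate rules.

```python
import re

def count_syllables(word: str) -> int:
    """Crude heuristic: count runs of vowels (a, e, i, o, u, y), minimum 1."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def tokenize(text: str, split_hyphens: bool) -> list[str]:
    """Extract words, optionally breaking hyphenated compounds apart."""
    pattern = r"[A-Za-z]+" if split_hyphens else r"[A-Za-z]+(?:-[A-Za-z]+)*"
    return re.findall(pattern, text)

def stats(text: str, split_hyphens: bool) -> tuple[int, int, int]:
    """Return (word count, syllable count, words with 3+ syllables)."""
    words = tokenize(text, split_hyphens)
    syllables = [count_syllables(w) for w in words]
    return len(words), sum(syllables), sum(1 for s in syllables if s >= 3)

sample = "A well-known, easy-to-read sample."
print(stats(sample, split_hyphens=False))  # (4, 9, 1)
print(stats(sample, split_hyphens=True))   # (7, 9, 0)
```

The syllable total is the same, but the word count jumps from 4 to 7 and "easy-to-read" stops being a complex word, so words per sentence, syllables per word and the polysyllable count all change at once.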
Another possible source of word-count discrepancies is ellipses. How does the parser treat them? Does it actually distinguish between omission ellipses, beginning ellipses and ending ellipses?
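As a small illustration, here are three hypothetical word counters in Python (none of them claimed to be what style or any online tool does), run on the same sentence with the ellipsis written two ways. They don't even try to tell the kinds of ellipses apart, and they still disagree.

```python
import re

# The same sentence, with the ellipsis written two different ways.
SPACED = "I discovered something interesting \u2026 nothing matched up exactly."
ATTACHED = "I discovered something interesting...nothing matched up exactly."

def whitespace_count(text: str) -> int:
    """Every whitespace-separated token is a word, even a bare ellipsis."""
    return len(text.split())

def skip_punct_count(text: str) -> int:
    """Whitespace tokens, ignoring tokens with no letters or digits."""
    return sum(1 for tok in text.split() if re.search(r"\w", tok))

def letter_run_count(text: str) -> int:
    """Runs of letters and apostrophes; all other punctuation separates words."""
    return len(re.findall(r"[A-Za-z']+", text))

for label, text in (("spaced", SPACED), ("attached", ATTACHED)):
    counts = (whitespace_count(text), skip_punct_count(text), letter_run_count(text))
    print(label, counts)
# spaced (9, 8, 8)
# attached (7, 7, 8)
```

A spaced ellipsis can be counted as an extra word, while an attached one can glue two words into one, so the same eight-word sentence comes out as anywhere from 7 to 9 words.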
There are bound to be a dozen other reasons why the figures weren’t exact matches.
Generally, the sentence counts matched up. And that’s about it.
I believe these statistics generators can be pretty useful. But the results from different implementations of the same algorithms sometimes vary much more than one would expect.