If you use too few bins, the histogram doesn't portray the data very well. If you use too many bins, you get a "broken comb" look, which also doesn't give a sense of the distribution. One solution is to create a graph that shows every value: either a dot plot or a cumulative frequency distribution, which doesn't require any bins.

If you want to create a frequency distribution with equally spaced bins, you need to decide how many bins to use (or how wide each should be). The decision clearly depends on the number of values. If you have lots of values, your graph will look better and be more informative if you have lots of bins. This Wikipedia page lists several methods for deciding bin width from the number of observations. The simplest method is to set the number of bins equal to the square root of the number of values you are binning.

This page from Hideaki Shimazaki explains an alternative method. It is a bit more complicated to calculate, but seems to do a great job. Scroll past that to see the theory and explanation, then keep scrolling to find links to the papers that explain the method.

Conventional wisdom dictates that a "broken look" resulting from a histogram with many bins is undesirable. This clashes with the need to show individual outliers, digit preference, bimodality, data gaps, and other features. I believe that histograms need to be both summary and descriptive measures.
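
To make the square-root rule and the bin-free cumulative distribution mentioned above concrete, here is a minimal Python sketch. The function names (`sqrt_bin_count`, `equal_width_histogram`, `ecdf`) are made up for this illustration, and rounding the square root up is an assumption; some tools round differently. It is not Shimazaki's method, only the simple rule.

```python
import math

def sqrt_bin_count(values):
    """Square-root rule: number of bins = sqrt(number of values), rounded up (assumed)."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    return math.ceil(math.sqrt(n))

def equal_width_histogram(values, num_bins):
    """Count values in num_bins equally spaced bins between min and max."""
    lo, hi = min(values), max(values)
    # If every value is identical the range is zero; fall back to width 1.0.
    width = (hi - lo) / num_bins or 1.0
    edges = [lo + i * width for i in range(num_bins + 1)]
    counts = [0] * num_bins
    for v in values:
        # The maximum value belongs to the last bin, not a bin past the end.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    return edges, counts

def ecdf(values):
    """Cumulative frequency distribution: shows every value, needs no bins."""
    xs = sorted(values)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]

if __name__ == "__main__":
    data = list(range(100))            # made-up example data
    bins = sqrt_bin_count(data)
    edges, counts = equal_width_histogram(data, bins)
    print(bins, counts)
    print(ecdf([3, 1, 2]))
```

With few values the rule gives few bins, and with many values it gives many, which matches the advice above. The cumulative version sidesteps the choice entirely, at the cost of being less familiar to read than a histogram.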