Tuesday, October 21, 2008

Password Habits

Data on users' password habits is hard to come by. Most published research relies on data collected from control groups. While those results are fairly good and representative, some inaccuracy is likely to creep in because of how control groups are usually assembled and who tends to participate in them. (I was unable to find any study on the subject that explains its selection criteria well. Why is that never discussed in depth?) Fortunately for us, these inaccuracies are rather insignificant, and the collected information can still be extrapolated to the general public. After all, most of us share similar preferences when it comes to remembering things.
Here I would like to present some real-life statistics. Although they are based on only ~48000 samples, they should give a good view of password selection habits. Only the actual results are shown; it is left to the reader to draw any conclusions.

Background information:
  • Source data is a list of 48595 username-password pairs, coming partly from a public discussion board (28595) and partly from a corporate network resource (20000). The users' awareness of information security is unknown, but we can assume with a great deal of certainty that their expertise spans the complete spectrum from 'casual user' to 'technically inclined'.
  • We can also assume that the average age for the 20000 list is 20+ (people working in the company have most likely already been through college, the army, etc.)
  • Alphanumeric and general characters are allowed. Minimum password length is 6.
  • The initial password generated by the administrator is 10 characters long and consists of mixed-case letters and digits, e.g. UaI7VyijSt
  • For passwords from the public discussion board: users whose last access date was no more than a week after their registration date were removed. This cleans the list of one-time users, who presumably chose a common, easy-to-remember combination. It should remove a large share of non-representative passwords and give us better statistics.
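
The one-time-user filter described in the last point could be sketched roughly as follows. This is only an illustration: the record layout (username, password, registration date, last access date) and the function name are assumptions, not the tooling actually used for this post.

```python
from datetime import datetime, timedelta

# Hypothetical record layout: (username, password, registered, last_access).
def drop_one_time_users(records, min_active=timedelta(weeks=1)):
    """Keep only accounts that were active for longer than min_active.

    Accounts whose last access is no more than a week after registration
    are treated as one-time users and removed.
    """
    return [r for r in records if r[3] - r[2] > min_active]


sample = [
    ("alice", "qwerty", datetime(2008, 1, 1), datetime(2008, 1, 3)),   # 2 days
    ("bob", "Tr1cky!pw", datetime(2008, 1, 1), datetime(2008, 6, 1)),  # months
    ("carol", "123456", datetime(2008, 1, 1), datetime(2008, 1, 8)),   # exactly 7 days
]
kept = drop_one_time_users(sample)
```

With the sample above only "bob" survives; an account active for exactly one week is still dropped, since the rule removes differences "no greater than a week".
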
Results:
  1. Overall distribution by length (X axis - length, Y axis - distribution percentage):
  2. Combination match in a publicly available wordlist (~3349730 words): 5.12%
    Distribution by length:
  3. Consists solely of numbers: 11.91%
    Distribution by length:
  4. Top 30 most frequently occurring passwords:
  5. Has a numerical suffix (remaining characters are alphabetic): 19.83%
    Has a numerical prefix (remaining characters are alphabetic): 2.81%

    Top 30 suffixes/prefixes:
  6. Original passwords assigned by the server retained (under the assumption that passwords of the form UaI7VyijSt are indeed system-assigned and not user-chosen): 1.44%
  7. Capitalized (remaining characters are lowercase/numbers/general): 2.41%
  8. All letters are uppercase (remaining characters are either numbers or general): 0.19%
  9. Consists solely of one repeated character (e.g. aaaaaaa, 33333333): 0.74%
  10. A double pattern (e.g. funkyfunky): 2.84%
  11. Password is a username derivative (e.g. username: vikk -> password: Zvikk007): 1.52%
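
For readers who want to run similar counts on their own data, here is a minimal sketch of how several of the categories above might be detected. The exact matching rules behind the numbers in this post are not spelled out, so every rule below is an assumption (ASCII-only patterns, "username derivative" taken to mean a case-insensitive substring match), and the names classify and category_shares are made up for this example. Categories may overlap, so the percentages need not add up to 100.

```python
import re
from collections import Counter


def classify(username, password):
    """Return the set of (possibly overlapping) pattern labels for a password."""
    labels = set()
    if re.fullmatch(r"[0-9]+", password):
        labels.add("all_digits")
    if re.fullmatch(r"[A-Za-z]+[0-9]+", password):
        labels.add("numeric_suffix")   # letters followed by digits
    if re.fullmatch(r"[0-9]+[A-Za-z]+", password):
        labels.add("numeric_prefix")   # digits followed by letters
    if re.fullmatch(r"[A-Z][^A-Z]+", password):
        labels.add("capitalized")      # only the first character is uppercase
    if re.search(r"[A-Z]", password) and not re.search(r"[a-z]", password):
        labels.add("all_upper")        # has letters, none of them lowercase
    if password and len(set(password)) == 1:
        labels.add("single_char")      # e.g. aaaaaaa
    half, odd = divmod(len(password), 2)
    if half and not odd and password[:half] == password[half:]:
        labels.add("doubled")          # e.g. funkyfunky
    if username and username.lower() in password.lower():
        labels.add("username_derivative")
    return labels


def category_shares(pairs):
    """Percentage of (username, password) pairs that fall into each label."""
    counts = Counter()
    total = 0
    for username, password in pairs:
        total += 1
        counts.update(classify(username, password))
    return {label: 100.0 * n / total for label, n in counts.items()}


pairs = [("vikk", "Zvikk007"), ("bob", "funkyfunky"),
         ("x", "33333333"), ("y", "letmein")]
shares = category_shares(pairs)
```

Note that under these rules a password like 33333333 is counted as all-digits, single-character and doubled at the same time; whether the figures above treat such cases the same way is not known.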

1 comment:

Anonymous said...

trustno1, donkey, unreal, samsung???? wow! some really weird passwords there