Security of password authentication
Passwords are unavoidable nowadays. Almost all online services require you to choose one when you sign up for an account or some time after that. In principle this could work reasonably well, but in practice password authentication is often weaker than it should be. There are several reasons for that, and poorly designed password forms are probably one of them.
How to hack a user account
Let's assume an attacker wants to illegitimately access some account on a website. They have several options including:
1. stealing the password from the victim.
2. exploiting a vulnerability in the website.
3. guessing the password.
While (1) and (2) are important, I'll focus on (3) in this post because the security of a website is too complex to be covered here. For your information, possible countermeasures against (1) and (2) include HTTPS, malware and phishing protection, two-factor authentication and good development practices.
Guessing passwords
To find a password, an attacker could typically either:
1. repeatedly try to log in with different guesses, or
2. steal the database and recover the passwords from it.
In both cases, the two main factors are:
a. how fast they can check their guesses, online or offline
b. in what order they check them
In case (1), factor (a) can be countered by throttling login attempts, and in case (2) by storing the passwords in hashed form, using an expensive algorithm. In both cases, factor (b) can be countered by choosing strong passwords. I've summed it up in the rough attack-defense tree below:
Note that using an expensive algorithm also implicitly throttles login attempts since it slows down password verification. This is a good thing as long as logging in doesn't take too long for users. However, in some cases, this might not be enough of a protection and if your website is that sensitive, you should do a proper security analysis instead of the very rough one I'm providing you with.
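To make the idea of throttling login attempts more concrete, here is a minimal in-memory sketch. The class name, the limit of 5 failures and the 5-minute window are all assumptions of mine, not recommendations; a real website would probably keep these counters in shared storage and might also throttle per IP address.

```python
import time
from collections import defaultdict, deque


class LoginThrottle:
    """Allow at most `max_attempts` failed logins per account within `window` seconds."""

    def __init__(self, max_attempts=5, window=300.0, clock=time.monotonic):
        self.max_attempts = max_attempts  # assumed limit, tune for your website
        self.window = window              # assumed window in seconds
        self.clock = clock                # injectable so it can be tested without sleeping
        self._failures = defaultdict(deque)  # account name -> timestamps of recent failures

    def _prune(self, account):
        # Forget failures that are older than the window.
        failures = self._failures[account]
        cutoff = self.clock() - self.window
        while failures and failures[0] <= cutoff:
            failures.popleft()
        return failures

    def is_allowed(self, account):
        # Call this before checking the password.
        return len(self._prune(account)) < self.max_attempts

    def record_failure(self, account):
        self._prune(account).append(self.clock())

    def record_success(self, account):
        # A successful login clears the account's failure history.
        self._failures.pop(account, None)
```

The login handler would call is_allowed() first, refuse the attempt if it returns False, and otherwise call record_failure() or record_success() depending on the outcome of the password check.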
Choosing a good hash algorithm
There is already plenty of advice on how to hash passwords securely and I won't attempt to cover it all here. See the Wikipedia entry for password verification for more information on why hashing the passwords is important. The main point of hashing is to make it expensive for an attacker to guess passwords from your database.
My advice: store the hashes of passwords using a well-established algorithm such as PBKDF2 or bcrypt, and adjust the work factor of the chosen algorithm to find a balance between the time it takes to log in and the security requirements of your website.
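One rough way to pick a work factor is to measure how long a single hash takes on your own server and raise the iteration count until it reaches the delay you are willing to impose on each login. The sketch below does this for PBKDF2 with Python's standard hashlib module. The function names and the 0.1 second target are assumptions of mine; pick a target that fits your own requirements.

```python
import hashlib
import os
import time


def time_pbkdf2(iterations, password=b"benchmark password"):
    """Return the seconds taken by one PBKDF2-HMAC-SHA256 computation."""
    salt = os.urandom(16)
    start = time.perf_counter()
    hashlib.pbkdf2_hmac("sha256", password, salt, iterations)
    return time.perf_counter() - start


def calibrate_iterations(target_seconds=0.1, start=10_000, limit=50_000_000):
    """Double the iteration count until one hash takes at least target_seconds."""
    iterations = start
    while iterations < limit and time_pbkdf2(iterations) < target_seconds:
        iterations *= 2
    return iterations
```

A single timing is noisy, so the result is only a ballpark figure. It should be measured on the hardware that will actually verify passwords.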
Many web frameworks already handle password hashing. For instance, Django does it rather well, as seen in its documentation on password management. Symfony doesn't document it as well (see its documentation on authentication) but still provides good algorithms such as PBKDF2 and bcrypt. If your framework doesn't handle password hashing for you, try to find a good library such as passlib. That one even has a nice quickstart page on what algorithm to choose.
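If you are curious what such a framework or library does under the hood, here is a minimal sketch using only Python's standard library. The function names, the storage format and the iteration count are assumptions of mine for illustration. In a real project, prefer your framework or a library such as passlib.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumed work factor; tune it for your own hardware


def hash_password(password, iterations=ITERATIONS):
    """Return a string holding the algorithm, work factor, salt and hash."""
    salt = os.urandom(16)  # a fresh random salt for every password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    # Made-up storage format: algorithm$iterations$salt$hash
    return f"pbkdf2_sha256${iterations}${salt.hex()}${digest.hex()}"


def verify_password(password, stored):
    """Recompute the hash with the stored parameters and compare in constant time."""
    algorithm, iterations, salt_hex, digest_hex = stored.split("$")
    if algorithm != "pbkdf2_sha256":
        raise ValueError(f"unsupported algorithm: {algorithm}")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))
```

Storing the work factor and the salt next to the hash means you can raise the work factor later without invalidating the passwords that are already stored.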
Choosing a good password
Slowing down the attacker is one thing, but if your password is "123456", the attacker will probably crack it anyway because that's a likely choice for a user. That's why we should talk about password strength.
The best attacker will estimate how likely each password is to have been chosen by the user, and then try the most likely first. The likelihood of each password depends on many things and is thus hard to estimate. So far, the best estimates have come from the analysis of leaked databases (see this 2010 paper by Weir et al. for example). The likelihood is also influenced by the website's password policy, such as forbidden or required characters.
What is a good password? Given what was previously said, it's a password that an attacker won't try early, that is, a password that they don't think is likely compared to other ones. Unfortunately, you can't know for sure in what order the attacker will perform their guesses. Therefore, you have to be sufficiently unpredictable.
Let's consider a simple example where the attacker knows your password is a random string of 6 digits. Each combination is therefore equally likely, so no guessing order is better than any other. The attacker will need about (10 ^ 6) / 2 guesses on average. However, if they know you don't choose your password at random, which is the case of many users, they can try the more likely combinations like "123456" first, and need far fewer guesses on average.
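The 6-digit example can be checked with a few lines of code. The sketch below computes the average number of guesses for an attacker who tries passwords from most to least likely. The skewed distribution is entirely made up for illustration: I assume that half of the users pick one of 1,000 popular combinations and the other half pick uniformly among the rest.

```python
def expected_guesses(probabilities):
    """Average number of guesses when passwords are tried from most to least likely."""
    ordered = sorted(probabilities, reverse=True)
    return sum(rank * p for rank, p in enumerate(ordered, start=1))


n = 10 ** 6  # number of 6-digit strings

# Random choice: every 6-digit string is equally likely.
uniform = [1 / n] * n

# Hypothetical skew: half of the users pick one of 1,000 popular combinations.
popular = 1000
skewed = [0.5 / popular] * popular + [0.5 / (n - popular)] * (n - popular)

print(expected_guesses(uniform))
print(expected_guesses(skewed))
```

Under this made-up skew the attacker needs roughly half as many guesses on average. More importantly, they break into half of the accounts within the first 1,000 guesses.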
Choosing a password at random is the best you can do but that's not always practical. For instance, random passwords are often hard to remember. Some users therefore use password managers to help them generate and store a new random password for every website. Otherwise, you have to choose passwords that are both easy to remember and hard to guess. You'll find good advice in this 2014 article from Bruce Schneier.
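For the curious, generating a random password is not much code. Here is a minimal sketch with Python's secrets module, which draws from a cryptographically secure random source. The function name, the default length of 16 and the alphabet of letters and digits are my own choices, not a standard.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # 62 characters


def random_password(length=16, alphabet=ALPHABET):
    """Pick each character independently and uniformly at random."""
    return "".join(secrets.choice(alphabet) for _ in range(length))


print(random_password())
```

Every string of the chosen length is equally likely here, which is exactly the property that leaves the attacker with no good guessing order.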
Password policies
Users are only partially responsible for their choice of password. I think they share that responsibility with web developers. When you design your form for letting users choose a password, you'll have to come up with a password policy. That policy is supposed to reject bad passwords and accept good ones, but as explained in the previous paragraphs, it can't be perfect: it will necessarily accept some bad passwords or reject some good ones, or both. Then what should that policy be?
The (10 ^ 6) / 2 calculation I did earlier gave us the maximum security you can achieve with a string of 6 digits. More generally, that maximum depends on how many characters the password contains and what the characters can be (like digits, letters, etc): a password of n characters taken from k possible characters has at most k ^ n combinations. As a result, you know that below a certain length, a password is necessarily insecure. That's why many websites enforce a minimum length for passwords and I think they're right.
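This upper bound is easy to compute. The sketch below expresses it in bits, that is, the base-2 logarithm of the number of possible passwords, and derives the shortest length that can reach a given target. The function names and the 64-bit target in the example are assumptions of mine, not recommendations. Keep in mind that this is only a maximum, reached when the password is chosen uniformly at random.

```python
import math


def search_space_bits(alphabet_size, length):
    """Upper bound in bits: log2(alphabet_size ** length)."""
    return length * math.log2(alphabet_size)


def minimum_length(alphabet_size, target_bits):
    """Shortest length whose search space reaches target_bits."""
    return math.ceil(target_bits / math.log2(alphabet_size))


print(search_space_bits(10, 6))  # 6 digits, as in the earlier example
print(minimum_length(26, 64))    # lowercase letters only
print(minimum_length(95, 64))    # printable ASCII characters
```

Any password shorter than the minimum length cannot reach the target, no matter how it was chosen, which is why a length threshold never rejects a password that could have met the target.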
The password length policy is nice because it doesn't reject good passwords if the length threshold is chosen correctly. However, a long password is not necessarily good. Here are some common additional requirements for a password:
- Contain both lowercase and uppercase letters.
- Contain at least one digit.
- Contain at least one special character.
I think these requirements are harmful because they reject good passwords. Choosing a good password is hard enough, and some websites wrongly reject the good ones! Now users have to comply with the policy, which breaks their pattern for memorizing their passwords and might give them a false sense of security. What do you think they do then? I'll tell you my trick: append "A1!" to your password whenever this occurs. I don't mind telling you because many people do that sort of thing and attackers know it.
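Here is a small sketch of the three requirements above, so you can see the problem for yourself. The function name and the sample passwords are mine; the second sample is just an arbitrary string of 20 lowercase letters standing in for a randomly generated password.

```python
import string


def meets_composition_rules(password):
    """Mixed case, at least one digit and at least one special character."""
    return (
        any(c.islower() for c in password)
        and any(c.isupper() for c in password)
        and any(c.isdigit() for c in password)
        and any(c in string.punctuation for c in password)
    )


for candidate in ["Password1!", "gqtxvbnmrlkzpwhsdjfy", "passwordA1!"]:
    print(candidate, meets_composition_rules(candidate))
```

The policy accepts "Password1!" and "passwordA1!", which an attacker would try early, and rejects the 20-letter string, which would be far harder to guess if it were really chosen at random.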
Troy Hunt, in his 2011 article about bad password practices, covers other harmful requirements like "Your password must not contain special characters". Please read it if you ever want to build password authentication for a website. The advice about using HTTPS instead of just HTTP is also really important.
If you want to reject more bad passwords besides ones that are too short, consider giving zxcvbn a try. This library is excellent at detecting easily guessable passwords and, more importantly, it is designed to avoid flagging good passwords as bad.
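To give a feel for what a policy based on rejecting known bad passwords looks like, here is a deliberately simple sketch. It is not zxcvbn and is far cruder than it. The function name, the minimum length of 8 and the tiny blocklist are all assumptions of mine; a real blocklist would contain many thousands of entries taken from leaked databases.

```python
# Hypothetical tiny sample of a blocklist, for illustration only.
COMMON_PASSWORDS = {"password", "12345678", "qwertyuiop", "iloveyou"}


def reject_reason(password, min_length=8, blocklist=COMMON_PASSWORDS):
    """Return why the password is rejected, or None if it is accepted."""
    if len(password) < min_length:
        return "too short"
    if password.lower() in blocklist:
        return "too common"
    if len(set(password)) == 1:
        return "single repeated character"
    return None
```

Unlike the composition requirements, each of these checks only rejects passwords that an attacker would try early, so a password chosen at random is very unlikely to be turned down.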
Summary
For users:
- Choose good passwords: long enough and sufficiently hard to predict.
- Choose a new password for every website.
- A password manager might help you manage random passwords, which are secure but hard to remember.
For developers:
- Secure your website against vulnerabilities.
- Use HTTPS.
- Only store the hashes of passwords, not the passwords themselves.
- Choose a good hash function: one that would make it expensive for an attacker to recover passwords from your database. PBKDF2 and bcrypt with a suitable work factor are two good examples.
- Avoid harmful password constraints.
If you want to read more on password choice and policies, I highly recommend the NIST SP 800-63B standard, which has a very good section on password security.