Question
If one uses a grammatical passphrase, such as "I put my keys under the doormat because they're safe there.", how many words (on average) must it contain to be as secure as a password of about 14 characters, such as 28fjha9;582g-jg? (Say we allow 26 lowercase letters, 26 capitals, 10 digits and 10 punctuation marks, for a total of 72.)
I've heard people say that only a few thousand English words are commonly used, so let's say that there are 5000 possible words. If the passphrase were not grammatical, then using 8 words would give us better entropy than a 14-character password (assuming the words weren't all super short).
5000^8 ≈ 3.906*10^29 and 72^14 ≈ 1.006*10^26
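As a quick check of that arithmetic, here is a minimal Python sketch that redoes the comparison in bits and finds the smallest number of randomly chosen words that matches the password. The constant and function names are my own, and the 5000-word vocabulary is the assumption stated above, not a measured figure.

```python
import math

ALPHABET_SIZE = 72       # 26 lowercase + 26 capitals + 10 digits + 10 punctuation
PASSWORD_LENGTH = 14
VOCABULARY_SIZE = 5000   # assumed number of commonly used English words


def entropy_bits(symbols: int, length: int) -> float:
    """Bits of entropy when `length` symbols are each drawn uniformly at random."""
    return length * math.log2(symbols)


def min_random_words(vocabulary: int, target_space: int) -> int:
    """Smallest n with vocabulary**n >= target_space (exact integer arithmetic)."""
    count, space = 0, 1
    while space < target_space:
        space *= vocabulary
        count += 1
    return count


password_space = ALPHABET_SIZE ** PASSWORD_LENGTH
words_needed = min_random_words(VOCABULARY_SIZE, password_space)

print(f"14-char password: {entropy_bits(ALPHABET_SIZE, PASSWORD_LENGTH):.1f} bits")
print(f"8 random words:   {entropy_bits(VOCABULARY_SIZE, 8):.1f} bits")
print(f"random words needed to match the password: {words_needed}")
```

Under these assumptions 7 random words fall just short of the password and 8 exceed it. This only covers words picked uniformly at random; it says nothing yet about grammatical phrases.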
However, if our passphrase is grammatical, I would think this severely restricts the number of combinations.
Given this restriction on the passphrase, how long does it need to be in order to be as secure as a 14-character password randomly composed from ~ 72 characters? What if we apply some restriction to the password as well, expecting that it will be mostly composed of dictionary words and a few extra characters?
Explanation / Answer
Unfortunately, I don't think our industry has the information we would need to answer your question. You are right that going from a random sequence of words to a grammatically valid sequence of words leaves far fewer combinations, but it is not easy to determine how many combinations remain. And it is that number you need before you can compare grammatical passphrases to random passwords.
As you increase the number of words in a sequence, the number of possibilities to check grows exponentially, and you run out of time and computing power to check them all. So while you can check one specific sequence fairly quickly, you can't check all possible sequences very fast once you reach a certain number of words (6, maybe 7?). Pre-tagging words with their part of speech and constructing candidate sentences by selecting words with the appropriate tags would be faster than a brute-force approach, but I don't know how much faster. I don't think an attacker could enumerate 1.006*10^26 acceptable combinations before hitting the computational wall. I could be overestimating the burden of this work, but that's my initial suspicion.
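To make the part-of-speech idea concrete, here is a toy Python sketch. The lexicon, the tag names and the two sentence templates are all made up for illustration; a real attack would use a real tagged dictionary and far more templates, so the numbers only show the shape of the reduction, not its actual size.

```python
from itertools import product
from math import prod

# Hypothetical toy lexicon: each word is pre-tagged with one part of speech.
LEXICON = {
    "PRON": ["i", "we", "they"],
    "VERB": ["put", "hide", "keep", "leave"],
    "DET": ["my", "the"],
    "NOUN": ["keys", "doormat", "wallet", "phone", "car"],
    "PREP": ["under", "behind", "near"],
}

# Hypothetical sentence templates, written as sequences of tags.
TEMPLATES = [
    ("PRON", "VERB", "DET", "NOUN"),
    ("PRON", "VERB", "DET", "NOUN", "PREP", "DET", "NOUN"),
]


def template_count(template):
    """Number of word sequences that fit one tag template."""
    return prod(len(LEXICON[tag]) for tag in template)


def sentences(template):
    """Lazily enumerate every word sequence that fits the template."""
    for words in product(*(LEXICON[tag] for tag in template)):
        yield " ".join(words)


def unrestricted_count(length):
    """Number of sequences of that length when any word may fill any slot."""
    vocabulary = sum(len(words) for words in LEXICON.values())
    return vocabulary ** length


for template in TEMPLATES:
    n = len(template)
    print(f"{n} words: {template_count(template)} tagged vs "
          f"{unrestricted_count(n)} unrestricted")
```

Even with 17 words, the template for 7-word sentences allows 3600 sequences instead of roughly 410 million. The open question from the paragraph above is what those two numbers look like with a realistic vocabulary and grammar.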
As the CMU paper that @Steve-dl mentioned (Effect of Grammar on Security of Long Passwords) discusses, you also have the problem that attackers will further reduce the set of acceptable grammatical sequences to the set of likely grammatical sequences. This is where I've seen most passphrase cracking research focused. Attackers try to home in on the likely choices, using either word sequences pulled directly from public sources (song titles/lyrics, book passages, famous quotes, Wikipedia sentences, etc.) or partial-phrase combinator attacks ("iloveto" + wordlist entry). But the number of word combinations this produces will vary from attacker to attacker, so it is still difficult to say exactly how many candidates these approaches generate.
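As a rough illustration of the combinator idea, the Python sketch below builds the candidate set for a few phrase prefixes and a tiny wordlist, then checks whether a given passphrase falls inside it. The prefixes, the wordlist and the normalization rule (lowercase, letters and digits only) are assumptions of mine, not taken from any particular tool.

```python
from itertools import product

# Hypothetical inputs: common phrase prefixes and a tiny wordlist.
PREFIXES = ["iloveto", "ilike", "ihateto"]
WORDLIST = ["dance", "sing", "run", "read"]


def combinator_candidates(prefixes, words):
    """Yield every prefix + word concatenation, as a combinator attack would."""
    for prefix, word in product(prefixes, words):
        yield prefix + word


def normalize(passphrase):
    """Lowercase and drop everything except letters and digits (an assumed rule)."""
    return "".join(ch for ch in passphrase.lower() if ch.isalnum())


def is_covered(passphrase, prefixes, words):
    """True if the normalized passphrase is one of the generated candidates."""
    return normalize(passphrase) in set(combinator_candidates(prefixes, words))


print(len(list(combinator_candidates(PREFIXES, WORDLIST))), "candidates")
print(is_covered("I love to dance", PREFIXES, WORDLIST))
print(is_covered("I put my keys under the doormat", PREFIXES, WORDLIST))
```

The candidate count is just prefixes times wordlist entries, which is why it varies so much from attacker to attacker: it depends entirely on which lists they bring.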
So to have a 'secure' grammatically correct passphrase you would not only need to make sure it was formed from a sufficiently long combination of words, but also make sure it has not appeared in popular media that could be part of an attacker's wordlist.