Password Cracker - Generating Passwords
Language Modeling with Recurrent Neural Networks (LSTMs)

In today's article, I am going to show you how to exploit Recurrent Neural Networks (LSTMs) for password cracking. I will demonstrate how easy it is to derive and break certain types of passwords. Furthermore, you will find out how to generate an infinite amount of probable passwords, and, more importantly, you will learn how to pick the right passwords to increase the security of your data.
Disclaimer
This project was developed for purely educational use. Don't use it for any evil purposes.

Okay, now that I am sure that your hat is white (gray at most), we can proceed to our project.
Cracking Passwords
Without further ado, let's get straight to the point, because the underlying concept is very simple:
- Take a large dataset of leaked passwords.
- Train an RNN LSTM model on it.
- Sample new passwords from the trained model.
1. Data
I have prepared our input dataset from the top 85 million leaked WPA (Wi-Fi) passwords, which you can download here.
These passwords are initially sorted in descending order of probability.
I randomized and split them into the sets below (a short preprocessing sketch follows the list):
- Training set (10%) ~100 MB (9 M passwords)
- Testing set (90%) ~850 MB (76 M passwords)
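For reference, here is a minimal sketch of that shuffle-and-split step. It assumes the leaked passwords live in a plain-text file called passwords.txt with one password per line; the file names and the fixed random seed are purely illustrative.

import random

# Load the leaked passwords, one per line (file name is illustrative).
with open("passwords.txt", encoding="utf-8", errors="ignore") as f:
    passwords = f.read().splitlines()

# Undo the probability-sorted order before splitting.
random.seed(42)
random.shuffle(passwords)

# 10% for training, 90% held out for testing.
split = int(0.1 * len(passwords))
train, test = passwords[:split], passwords[split:]

with open("train.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(train))
with open("test.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(test))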
Now that our data is ready to go, let's proceed with some training.
2. RNN LSTM Model
We will use Text Predictor for our RNN LSTM model.
Before diving into Text Predictor's code, I recommend checking my previous article on RNN basics. It covers the essentials of LSTMs and shows how to generate text from a given dataset.
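To make the idea concrete before we move on, here is a rough character-level LSTM language model sketched in Keras. This is not Text Predictor's actual implementation; the vocabulary size, embedding size and number of LSTM units below are illustrative placeholders, not the hyperparameters used in this project.

import tensorflow as tf

VOCAB_SIZE = 98      # distinct characters in the password corpus (illustrative)
EMBEDDING_DIM = 64   # illustrative
LSTM_UNITS = 512     # illustrative

# A character-level language model: given a sequence of characters,
# predict a probability distribution over the next character.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, EMBEDDING_DIM),
    tf.keras.layers.LSTM(LSTM_UNITS, return_sequences=True),
    tf.keras.layers.Dense(VOCAB_SIZE),  # logits over the next character
])

model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

Trained on newline-separated passwords, such a model learns which character is likely to follow a given prefix, which is exactly what we need for sampling.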
3. Sampling Passwords

I have trained Text Predictor's RNN LSTM model on the passwords dataset for 130 thousand iterations over 9 epochs.
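Sampling simply means starting from a password boundary (a newline) and repeatedly drawing the next character from the model's predicted distribution. Below is a minimal sketch, assuming the Keras-style model above and hypothetical char2idx/idx2char vocabulary mappings (neither is part of Text Predictor's actual API).

import numpy as np

def sample_passwords(model, char2idx, idx2char, num_chars=10_000, temperature=1.0):
    """Generate num_chars characters and split them on newlines into candidate passwords."""
    seq = [char2idx["\n"]]  # start at a password boundary
    for _ in range(num_chars):
        # Feed the recent context and take the logits for the last position.
        logits = model.predict(np.array([seq[-100:]]), verbose=0)[0, -1]
        probs = np.exp(logits / temperature)
        probs /= probs.sum()
        next_id = int(np.random.choice(len(probs), p=probs))
        seq.append(next_id)
    text = "".join(idx2char[i] for i in seq[1:])
    # Each newline-separated chunk is one sampled password candidate.
    return [p for p in text.split("\n") if p]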

Results
After every thousand iterations, I performed a validation step that took the form of sampling 10 thousand characters - on average about 900 passwords. Then I checked whether each sampled password appears in the test set. To validate our model's performance, I came up with the hit_ratio metric (a quasi-precision):
hit_ratio = sampled_passwords_in_test_set / all_sampled_passwords
With this metric, we can find out what percentage of the passwords we have artificially generated were actually used by people.
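In code, this check boils down to a set membership test against the held-out passwords; the test.txt file name follows the earlier preprocessing sketch and is illustrative.

# Load the held-out test passwords into a set for fast lookups.
with open("test.txt", encoding="utf-8") as f:
    test_set = set(f.read().splitlines())

def hit_ratio(sampled_passwords, test_set):
    """Fraction of sampled passwords that actually occur in the test set."""
    hits = sum(1 for p in sampled_passwords if p in test_set)
    return hits / len(sampled_passwords)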

After 9 epochs and 130,000 learning iterations, we generated 909 passwords. It turned out that we managed to correctly guess 120 of them, which is about 13%!
Pretty neat, huh?

Here is a bunch of AI-generated passwords that were actually used by people.
richardmars
sierrasoftball
8aug1863
FalconGroovy
verstockt
hakensen
mccaitlin
playboyslayer
republicmaster
eddie123
Denversharon
marchand
humaniseront5
7december1789
15071600
Spatted2
jaredhomebrew
choco2007
doctorPacker
bac7er!o1o9!s7s
elliot1993
d3r!v@7!on
trickset
jonathancruise
mcjordan23
Family82
susanAwesome
Conclusions and Recommendations
As you can see above, it's definitely achievable to generate probable passwords, especially ones that follow human reasoning patterns and contain birthdays, relatives, pets, interests, places - in general, things that are somehow related to us. It's just how our brains usually work: we can easily memorize such passwords because they are important to us.
Unfortunately, it's also fairly easy to derive them with techniques like spidering, which focus on gathering information about a specific subject, backed by AI systems that process and analyze the collected data.
Even though it's possible to generate more and more probable passwords, we usually still need multiple attempts to find the correct one. The number of required tries depends on how plausible the password list is.
Some of you may now think that multiple password entry attempts can be easily detected and prevented, and you would be right, but only to some extent. Nowadays, the majority of authorization systems limit the number of password attempts to prevent brute-force attacks.
Apple is no different; this is what their limits look like.

I will give you a high-level overview of how this actually works without going into the implementation details. Each password attempt is persisted along with its timestamp. Once we exceed the
attempts / elapsed_time_from_last_attempt
quota, we get a lockdown and have to wait additional time before trying again.
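A toy version of such a throttling scheme might look like the sketch below. This is purely illustrative and is not Apple's actual implementation; the limits are made-up numbers and the attempt history lives in memory rather than in secure storage.

import time

class AttemptThrottle:
    """Toy lockout: allow at most max_attempts per sliding window of window_seconds."""

    def __init__(self, max_attempts=5, window_seconds=60.0):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self.timestamps = []  # persisted attempt history (here: just in memory)

    def attempt_allowed(self):
        now = time.time()
        # Drop attempts that fell out of the sliding window.
        self.timestamps = [t for t in self.timestamps if now - t < self.window_seconds]
        if len(self.timestamps) >= self.max_attempts:
            return False  # lockdown: the caller has to wait and retry later
        self.timestamps.append(now)
        return True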
A simple yet very effective system, but what if it gets hacked?
At the end of the day, the information about the last password entry attempt needs to be written to some sort of memory. Just as it can be written there, it can also be modified or removed. I am not saying that it's simple, because in the majority of cases it isn't, but in principle it's definitely possible.
Have you ever heard of GrayKey? It's a small piece of hardware developed to do exactly what I've described above with iOS devices.
Although it's neither easy nor cheap to bypass passcode entry limits, it's doable.
Are we helpless then?
No!
We can still protect our data or at least increase our security level.
How?
By making the job of GrayKey (and other similar devices) significantly harder.
Its ultimate job is to find the correct passcode combination to unlock the device. If we've decided to use a 4-digit numeric passcode, we've already lost the battle: it's only 10⁴ combinations, and it can be cracked in about 6 minutes on average.
What if we've decided to use a 6-digit numeric passcode? With 10⁶ combinations to check, we make the attacker compute and try longer, but definitely not long enough: only about 22 hours.
So whatâs the solution?
We can use a custom alphanumeric passcode. Let's say we use a 12-character one. Assuming that there are 95 unique characters to choose from, that's 95¹² combinations.
95¹² = 540360087662636962890625 [combinations]
It would take decades to crack, and I think we can settle for that level of security.
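For a quick sanity check of these numbers, here is a small back-of-the-envelope sketch. The attempts-per-second rate is an assumption chosen to roughly match the 6-digit figure above, not a measured GrayKey benchmark.

# Sanity-check the combination counts and crack-time estimates above.
ATTEMPTS_PER_SECOND = 13  # assumed hardware-assisted guessing rate, not a measured figure

def combinations(alphabet_size, length):
    return alphabet_size ** length

print(combinations(10, 4))    # 10000 (4-digit PIN)
print(combinations(10, 6))    # 1000000 (6-digit PIN)
print(combinations(95, 12))   # 540360087662636962890625 (12-character passcode)

# Worst-case time to exhaust the 6-digit space at the assumed rate:
hours = combinations(10, 6) / ATTEMPTS_PER_SECOND / 3600
print(f"~{hours:.0f} hours")  # ~21 hours, in line with the ~22 hours quoted above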
If you are going to remember only one thing from this article, let it be the following.
Pick long, complex and, preferably, not human-invented passwords. Do not reuse them.
Strong password example: h66-gPq-1vm-cYFA-a4Gb-12mN
I would recommend using tools designed just for this like iCloud Keychain or 1Password.
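If you prefer to generate such a password yourself rather than let a password manager do it, Python's standard secrets module is enough. Here is a minimal sketch; the dash-separated grouping is purely cosmetic.

import secrets
import string

def strong_password(groups=6, group_length=4):
    """Generate a random alphanumeric password split into dash-separated groups."""
    alphabet = string.ascii_letters + string.digits
    parts = [
        "".join(secrets.choice(alphabet) for _ in range(group_length))
        for _ in range(groups)
    ]
    return "-".join(parts)

print(strong_password())  # a different random password on every call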
Now that you know how to secure your data (or at least how to pick your passwords), let's move on to another fascinating topic.
What's next?
Since generating passwords with recurrent neural networks proved to be successful, I encourage you to experiment with different sequential inputs like audio, video, images, time series, etc. There is still a lot to discover in the RNN field, and I invite you to do so. Let me know about your findings!
Don't forget to check the project's GitHub page.
Questions? Comments? Feel free to leave your feedback in the comments section or contact me directly at https://gsurma.github.io.
And don't forget to clap if you enjoyed this article.
