Password Cracker - Generating Passwords🔑🔓

Language Modeling with Recurrent Neural Networks (LSTMs)

In today’s article, I am going to show you how to use LSTMs for password cracking. I will demonstrate how easy it is to derive and break certain types of passwords. Furthermore, you will find out how to generate an unlimited number of probable passwords and, more importantly, you will learn how to pick the right passwords to increase the security of your data.


This project was developed for purely educational use. Don’t use it for any evil purposes.


Okay, now that I am sure that your hat is white (gray at most), we can proceed to our project.

Cracking Passwords

Without further ado, let’s get straight to the point, because the underlying concept is very simple.

  1. Get a large dataset of leaked passwords.
  2. Train an RNN LSTM model on it.
  3. Sample new passwords.

1. Data

I have prepared our input dataset from the top 85 million leaked WPA (Wi-Fi) passwords, which are available for download.

These passwords are initially sorted in descending order of probability.

I randomized and split them into:

  • Training set (10%) ~100 MB (9 M passwords)
  • Testing set (90%) ~850 MB (76 M passwords)
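The shuffle-and-split step can be sketched roughly as follows. This is a minimal illustration, not the script I actually used: the function name, the seed and the toy data are all assumptions, and for the full 85 million passwords you would probably want something more memory-friendly than an in-memory list.

```python
import random


def shuffle_and_split(passwords, train_fraction=0.1, seed=42):
    """Shuffle passwords and split them into (train, test) lists."""
    shuffled = list(passwords)             # copy, so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded only to make the split reproducible
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical stand-in for the real password list.
sample = [f"password{i}" for i in range(100)]
train, test = shuffle_and_split(sample)
print(len(train), len(test))  # 10 90
```

Shuffling first matters here: the source list is sorted by probability, so an unshuffled split would put all the most common passwords into a single set.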

As our data is ready to go, let’s proceed with some training.

2. RNN LSTM Model

We will use Text Predictor for our RNN LSTM model.

Before diving into Text Predictor’s code, I recommend checking my previous article on the basics of RNNs. There you will grasp the essentials of LSTMs and learn how to generate text from a given dataset.

3. Sampling Passwords


I trained Text Predictor’s RNN LSTM model on the password dataset with the following hyperparameters for 130 thousand iterations over 9 epochs.


After every thousand iterations, I ran a validation step that sampled 10 thousand characters, which is about 900 passwords on average. Then I checked whether each sampled password was in the test set. To measure the model’s performance, I came up with the hit_ratio metric (quasi-precision).

hit_ratio = sampled_passwords_in_test_set / all_sampled_passwords

This metric tells us what percentage of the passwords we’ve artificially generated were actually used by people.

After 9 epochs and 130,000 learning iterations, we generated 909 passwords. It turned out that 120 of them were correct guesses, which is about 13%!
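For the curious, the hit_ratio check can be sketched in a few lines. This is a simplified illustration under my own assumptions: the sampled text holds one password per line, duplicates are counted each time they appear, and the toy data below is made up rather than taken from the real run.

```python
def hit_ratio(sampled_text, test_set):
    """Fraction of sampled passwords that also appear in the test set."""
    # Assumes one password per line; blank lines are skipped.
    sampled = [line for line in sampled_text.splitlines() if line]
    if not sampled:
        return 0.0
    hits = sum(1 for password in sampled if password in test_set)
    return hits / len(sampled)


# Hypothetical toy data, not real results.
test_set = {"summer2017", "iloveyou1", "charlie99"}
sampled_text = "summer2017\nqwzx81kk\ncharlie99\nmaria1984\n"
print(hit_ratio(sampled_text, test_set))  # 0.5
```

Keeping the test set in a Python set makes each membership check cheap, which matters when the test set holds tens of millions of entries.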

Pretty neat, huh?


Here is a bunch of AI-generated passwords that were actually used by people.


Conclusions and Recommendations

As you can see above, it’s definitely possible to generate probable passwords, especially ones that follow human reasoning patterns and contain birthdays, relatives, pets, interests, places: in general, things that are somehow related to us. That’s just how our brains usually work. We can easily memorize such passwords because they are important to us.

Unfortunately, it’s fairly easy to derive them with techniques that focus on gathering information about a specific subject, backed by AI systems that process and analyze the collected data.

Even though it’s possible to generate more and more probable passwords, we usually still need multiple attempts to find the correct one. The number of required tries usually depends on the quality of the password list: the more probable its entries, the fewer attempts we need.

Some of you may now think that multiple password entries can be easily detected or prevented, and you would be right, but only to some extent. Nowadays, the majority of authentication systems limit the number of password attempts to prevent brute-force attacks.

Apple is no different here. This is what their limits look like.

Apple’s iOS Security Guide

I will give you a high-level overview of how this works without going into implementation details. Each password attempt is persisted along with its timestamp. When we exceed the

attempts / elapsed_time_from_last_attempt

quota, we get locked out and have to wait additional time before trying again.
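To make the idea more concrete, here is a toy sketch of such a limiter. It is not Apple’s implementation: the class name, the number of free attempts and the doubling delay are all hypothetical values I picked for illustration.

```python
import time


class AttemptLimiter:
    """Toy lockout: after free_attempts failures, every further attempt
    must wait a growing delay, measured from the last failed attempt."""

    def __init__(self, free_attempts=3, base_delay=60.0, clock=time.monotonic):
        self.free_attempts = free_attempts  # hypothetical value
        self.base_delay = base_delay        # seconds, hypothetical value
        self.clock = clock                  # injectable, so it can be faked in tests
        self.failures = 0
        self.last_failure = None            # timestamp of the last failed attempt

    def required_wait(self):
        """Seconds that must pass after the last failure before the next try."""
        extra = self.failures - self.free_attempts
        if extra < 0:
            return 0.0
        return self.base_delay * (2 ** extra)  # doubles with every extra failure

    def try_password(self, guess, secret):
        now = self.clock()
        if self.last_failure is not None and now - self.last_failure < self.required_wait():
            return "locked"
        if guess == secret:
            self.failures = 0
            self.last_failure = None
            return "ok"
        self.failures += 1
        self.last_failure = now
        return "wrong"


# Usage with a fake clock, so that we don't have to actually wait.
fake_now = [0.0]
limiter = AttemptLimiter(clock=lambda: fake_now[0])
results = [limiter.try_password("0000", "4821") for _ in range(4)]
print(results)  # ['wrong', 'wrong', 'wrong', 'locked']
fake_now[0] = 61.0
print(limiter.try_password("4821", "4821"))  # ok
```

Note that the failure counter and the timestamp are just ordinary state stored somewhere, and the whole protection depends on that state staying intact.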

Simple, yet very effective system, but what if it gets hacked?

At the end of the day, the information about the last password entry attempt needs to be written to some sort of memory. Just as it gets written there, it can also be modified or removed. I am not saying that it’s simple, because in the majority of cases it isn’t, but in principle, it’s definitely possible.

Have you ever heard of GrayKey? It’s a small piece of hardware developed to do exactly what I’ve described above with iOS devices.

Although it’s neither easy nor cheap to bypass passcode entry limits, it’s doable.

Are we helpless then?


We can still protect our data, or at least increase our security level, by making the job of GrayKey (and similar devices) significantly harder.

Its ultimate job is to find the correct passcode to unlock the device. If we’ve decided to use a 4-digit numeric passcode, we’ve already lost the battle: there are only 10⁴ combinations, and it can be cracked in about 6 minutes on average.

What if we’ve decided to use a 6-digit numeric passcode? With 10⁶ combinations to check, we’ll make it compute and try for longer, but definitely not long enough: only about 22 hours.

So what’s the solution?

We can use a custom alphanumeric passcode. Let’s say we use a 12-character one. Assuming that there are 95 unique characters to choose from, that gives 95¹² combinations.

95¹²=540360087662636962890625 [combinations]

It would take decades at the very least to crack, and I think that we can settle for such a level of security.
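If you want to play with these numbers yourself, here is a rough back-of-the-envelope sketch. The guessing rate is an assumption that I back-derived from the 4-digit figure above (half of the 10⁴ combinations tried in about 6 minutes), so treat the resulting times as an order-of-magnitude estimate only.

```python
def keyspace(alphabet_size, length):
    """Number of possible passcodes of the given length."""
    return alphabet_size ** length


def average_crack_seconds(combinations, guesses_per_second):
    """On average, the correct passcode is found halfway through the search."""
    return combinations / 2 / guesses_per_second


# Assumed rate: 5,000 guesses in 6 minutes, roughly 13.9 guesses per second.
RATE = (10**4 / 2) / (6 * 60)
SECONDS_PER_YEAR = 365 * 24 * 3600

for label, size, length in [("4-digit", 10, 4), ("6-digit", 10, 6), ("12-char", 95, 12)]:
    combos = keyspace(size, length)
    years = average_crack_seconds(combos, RATE) / SECONDS_PER_YEAR
    print(f"{label}: {combos} combinations, ~{years:.3g} years on average")
```

The exact rate barely matters for the conclusion: the size of the search space grows exponentially with the passcode length, so every extra character multiplies the work by the size of the alphabet.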

If you are going to remember only one thing from this article, let it be the following.

Pick long, complex and preferably not human-invented passwords. Do not reuse them.

Strong password example: h66-gPq-1vm-cYFA-a4Gb-12mN

I would recommend using tools designed just for this like iCloud Keychain or 1Password.
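If you are curious what such a generator does under the hood, here is a minimal sketch built on Python’s secrets module. The group layout simply mimics the example password above, and the alphabet (letters and digits only) is my own assumption. It is an illustration, not a replacement for a proper password manager.

```python
import secrets
import string

# Assumed alphabet: ASCII letters and digits (62 characters).
ALPHABET = string.ascii_letters + string.digits


def generate_password(group_sizes=(3, 3, 3, 4, 4, 4)):
    """Random password in hyphen-separated groups, e.g. xxx-xxx-xxx-xxxx-xxxx-xxxx."""
    groups = [
        "".join(secrets.choice(ALPHABET) for _ in range(size))
        for size in group_sizes
    ]
    return "-".join(groups)


print(generate_password())
```

The important part is secrets.choice, which draws from the operating system’s cryptographically secure random source instead of the predictable generator behind the random module.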

Knowing how to secure your data (or at least how to pick passwords), let’s move on to another fascinating topic.

What’s next?

Since generating passwords with recurrent neural networks proved to be successful, I encourage you to experiment with different sequential inputs like audio, video, images, time series, etc. There is still lots to discover in the RNN field, and I invite you to do so. Let me know about your findings!

Questions? Comments? Feel free to leave your feedback in the comments section or contact me directly.

And don’t forget to 👏 if you enjoyed this article 🙂.
